CN103971702A - Sound monitoring method, device and system - Google Patents
Sound monitoring method, device and system
- Publication number
- CN103971702A (application CN201310332073.6A)
- Authority
- CN
- China
- Prior art keywords
- sound
- training
- detected
- event
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention provides a sound monitoring method, device and system, relating to the technical fields of sound signal processing and pattern recognition. The method comprises a sound training stage and a sound detection stage. The training stage comprises: S1, acquiring a training sound signal and extracting training sound features; S2, training sound event models according to the training sound features. The detection stage comprises: S3, extracting the sound features to be detected; S4, judging whether at least one sound event model matches the sound features to be detected; if so, determining that a violent event exists, and if not, determining that no violent event exists. By extracting sound features from the sound signal and comparing them with the trained sound event models, the invention determines whether a violent event is occurring in an elevator, realizes automatic monitoring of violent events in elevators, delivers monitoring results in real time, and effectively ensures detection accuracy.
Description
Technical Field
The present invention relates to the technical fields of sound signal processing and pattern recognition, and in particular to a sound monitoring method, device and system.
Background Art
With the rapid development of modern cities, elevators have become increasingly common and are now an indispensable means of vertical transportation in high-rise buildings, closely tied to residents' daily work and life. According to statistics from the relevant authorities, the annual demand for elevators in China has reached one third of the global total. At the same time, because an elevator car is a relatively enclosed space, it has become an ideal place for criminals to commit unlawful acts, creating numerous safety hazards in daily life. More and more criminals commit robbery, murder or sexual harassment in elevators, seriously threatening the life and property of passengers. The literature shows that elevator violence has grown rapidly in recent years; in 2012 alone, more than 62,000 elevator crimes were recorded. Effective monitoring of what happens inside elevators therefore has important practical significance for detecting, stopping and investigating elevator violence.
At present, camera-based video surveillance is widely used to monitor violence in elevators.
Although this achieves a certain effect, it still has the following problem: the level of automation is low, and violent events are discovered only when staff in the monitoring room watch or review the video. This approach consumes considerable manpower and material resources, and a person's attention drops noticeably after watching video footage for more than 20 minutes, greatly reducing accuracy.
Summary of the Invention
(1) Technical Problem to Be Solved
In view of the deficiencies of the prior art, the present invention provides a sound monitoring method, device and system capable of automatically monitoring violent events in elevators.
(2) Technical Solution
To achieve the above object, the present invention adopts the following technical solution:
A sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1. acquiring a training sound signal and extracting training sound features of the training sound signal;
S2. training sound event models according to the training sound features.
The sound detection stage comprises the steps of:
S3. acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected;
S4. judging whether at least one of the sound event models matches the sound features to be detected; if so, determining that a violent event exists; if not, determining that no violent event exists.
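To make the two-stage flow of S1–S4 above concrete, the following is a minimal, self-contained Python sketch. It is not taken from the application itself: the event names, the synthetic features standing in for real elevator audio, the use of scikit-learn's diagonal-covariance GaussianMixture, and the threshold value are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for the MFCC-style training features of two hypothetical
# violent-sound events (in practice these come from real elevator recordings).
train_feats = {"scream": rng.normal(0.0, 1.0, (500, 39)),
               "glass_break": rng.normal(3.0, 1.0, (500, 39))}

# S1-S2: train one diagonal-covariance GMM per sound event.
models = {name: GaussianMixture(n_components=8, covariance_type="diag",
                                random_state=0).fit(x)
          for name, x in train_feats.items()}

# S3-S4: score the features of the sound under test against every event model and
# flag a violent event if at least one model matches; the threshold is illustrative.
test_feats = rng.normal(0.0, 1.0, (200, 39))
threshold = -60.0
scores = {name: m.score(test_feats) for name, m in models.items()}  # mean log-likelihood per frame
violent = any(s > threshold for s in scores.values())
print(scores, "-> violent event" if violent else "-> no violent event")
```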
Preferably, step S1 comprises the steps of:
S11. preprocessing the acquired sound signal;
S12. performing a discrete Fourier transform on the preprocessed sound signal to obtain its power spectrum;
S13. obtaining the Mel-frequency cepstral coefficients of the power spectrum using a Mel filter bank;
S14. computing the first-order and second-order differences of the Mel-frequency cepstral coefficients and concatenating the difference coefficients with the Mel-frequency cepstral coefficients to form the sound features.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation.
The window function used in the windowing operation is a Hamming window, whose expression w(n) is
w(n) = 0.54 − 0.46·cos(2πn/(L−1)), 0 ≤ n ≤ L−1,
where n is the time index within the window and L is the window length.
The power spectrum in step S12 is computed as
Xa(k) = |Σn=0…N−1 x(n)·e^(−j2πnk/N)|², 0 ≤ k ≤ N−1,
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
Preferably, in step S2 the sound violence event models are trained with a Gaussian mixture model (GMM). The probability density function of an M-order GMM is
p(o|λ) = Σi=1…M ci·P(o|i,λ),
where o is the K-dimensional acoustic feature vector and each component P(o|i,λ) is a Gaussian density,
P(o|i,λ) = (2π)^(−K/2)·|Σi|^(−1/2)·exp(−½·(o−μi)ᵀ·Σi⁻¹·(o−μi)).
In these formulas, λ = {ci, μi, Σi; i = 1...M}, μi is the mean vector and Σi is the covariance matrix, i = 1, 2, ..., M. Here Σi is taken to be a diagonal matrix, Σi = diag(σi1², ..., σiK²).
Preferably, step S4 comprises the following steps:
S31. assuming there are N sound event models, each modelled by a Gaussian mixture model and denoted λ1, λ2, ..., λN; in the judgment stage, the observed feature set of the sound to be detected is O = {o1, o2, ..., oT}, where T is the number of frames of the input sound;
S32. computing the posterior probability that the sound to be detected belongs to the n-th sound event model, 1 ≤ n ≤ N;
S33. obtaining a pre-judgment result from the posterior probabilities;
S34. obtaining the final decision from the pre-judgment result.
Preferably, the posterior probability in step S32 is computed as
P(λn|O) = p(O|λn)·p(λn) / p(O),
where p(λn) is the prior probability of the n-th sound event model, p(O) is the probability of the feature set O to be detected under all sound event models, and p(O|λn) is the conditional probability that the n-th sound event model generates the feature set O to be detected.
Preferably, the pre-judgment result in step S33 is computed from the frame-level posteriors, where p(λn) is the prior probability of the n-th sound event model, p(O) is the probability of the feature set O to be detected under all sound event models, and P(λn|ot) is the probability that frame ot was generated by λn.
Preferably, the final decision in step S34 is obtained by comparing the pre-judgment result against a preset rejection threshold, denoted threshold, where p(λn) is the prior probability of the n-th sound event model and p(O) is the probability of the feature set O to be detected under all sound event models.
The present invention further provides a sound monitoring device comprising the following modules:
a sound training stage module, which acquires a training sound signal, extracts training sound features of the training sound signal, and trains sound event models according to the training sound features;
a sound detection stage module, which acquires a sound signal to be detected, extracts the sound features to be detected of the sound signal to be detected, and judges whether at least one of the sound event models matches the sound features to be detected; if so, it determines that a violent event exists; if not, it determines that no violent event exists.
The present invention further provides a sound monitoring system, characterized in that it comprises a microphone, a multi-channel signal collector and the sound monitoring device.
The microphone is installed in the elevator, collects sound signals and transmits them to the multi-channel signal collector.
The multi-channel signal collector receives the sound signals sent by the microphone and transmits them to the sound monitoring device.
The sound monitoring device processes the sound signals.
(3) Beneficial Effects
By providing a sound monitoring method, device and system, the present invention trains sound event models from the training sound features extracted from training sound signals, extracts the sound features to be detected from the sound signal to be detected, compares the extracted features with the trained sound event models, and determines whether a violent event is occurring in the elevator. This realizes automatic monitoring of violent events in elevators, delivers monitoring results in real time, effectively ensures detection accuracy, and gives monitoring staff a basis for further action.
Compared with the industrial cameras required for video surveillance, the microphones and associated acquisition equipment used in the present invention are inexpensive, which makes them easy to deploy widely.
Compared with industrial cameras, the microphones used in the present invention are small and can be placed in hidden corners, protecting the monitoring equipment from being damaged by criminals.
Compared with industrial cameras, the signals collected by the microphones are not affected by lighting, occlusion or disguise, making the monitoring approach more stable.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a sound monitoring method according to a preferred embodiment of the present invention;
Fig. 2 is a schematic flowchart of a sound monitoring method according to a preferred embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a sound monitoring device according to a preferred embodiment of the present invention;
Fig. 4 is a schematic architecture diagram of a sound monitoring system according to a preferred embodiment of the present invention.
Detailed Description of the Embodiments
The sound monitoring method, device and system proposed by the present invention are described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1:
As shown in Fig. 1, a sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1. acquiring a training sound signal and extracting training sound features of the training sound signal;
S2. training sound event models according to the training sound features.
The sound detection stage comprises the steps of:
S3. acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected;
S4. judging whether at least one of the sound event models matches the sound features to be detected; if so, determining that a violent event exists; if not, determining that no violent event exists.
By extracting training sound features from training sound signals to train sound event models, extracting the sound features to be detected from the sound signal to be detected, and comparing the extracted features with the trained sound event models, this embodiment of the present invention determines whether a violent event is occurring in the elevator, realizes automatic monitoring of violent events in elevators, delivers monitoring results in real time, effectively ensures detection accuracy, and gives monitoring staff a basis for further action.
The embodiments of the present invention are described in further detail below:
A sound monitoring method comprises a sound training stage and a sound detection stage.
The sound training stage comprises the steps of:
S1. acquiring a training sound signal and extracting training sound features of the training sound signal.
Preferably, step S1 comprises the steps of:
S11. preprocessing the acquired training sound signal;
S12. performing a discrete Fourier transform on the preprocessed sound signal to obtain its power spectrum;
S13. obtaining the Mel-frequency cepstral coefficients of the power spectrum using a Mel filter bank;
S14. computing the first-order and second-order differences of the Mel-frequency cepstral coefficients and concatenating the difference coefficients with the Mel-frequency cepstral coefficients to form the sound features.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation.
The window function used in the windowing operation is a Hamming window, whose expression w(n) is
w(n) = 0.54 − 0.46·cos(2πn/(L−1)), 0 ≤ n ≤ L−1,
where n is the time index within the window and L is the window length.
Preferably, the power spectrum in step S12 is computed as
Xa(k) = |Σn=0…N−1 x(n)·e^(−j2πnk/N)|², 0 ≤ k ≤ N−1,
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
S2. training sound event models according to the training sound features.
In this embodiment of the present invention, one GMM is built for each training sound signal. The probability density function of an M-order GMM is
p(o|λ) = Σi=1…M ci·P(o|i,λ),
where λ is the parameter set of the GMM, o is the K-dimensional acoustic feature vector, i is the hidden-state index (i.e., the index of the Gaussian component; an M-order GMM has M hidden states), and ci is the mixture weight of the i-th component, whose value corresponds to the prior probability of hidden state i, so that Σi=1…M ci = 1.
P(o|i,λ) is the Gaussian mixture component corresponding to the observation probability density function of hidden state i,
P(o|i,λ) = (2π)^(−K/2)·|Σi|^(−1/2)·exp(−½·(o−μi)ᵀ·Σi⁻¹·(o−μi)).
In step S2, the sound violence event models are trained with this Gaussian mixture model; the probability density function of the M-order GMM is as given above.
In these formulas, λ = {ci, μi, Σi; i = 1...M}, μi is the mean vector and Σi is the covariance matrix, i = 1, 2, ..., M. Here Σi is taken to be a diagonal matrix, Σi = diag(σi1², ..., σiK²).
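As a sketch of what the M-order diagonal-covariance density above computes, the following numpy function evaluates log p(o|λ) for a GMM with Σi = diag(σi²); the parameter values in the example are illustrative, not taken from the text. In practice the weights, means and variances would be fitted to the training features, for example with the EM algorithm (as scikit-learn's GaussianMixture does in the earlier sketch).

```python
import numpy as np

def gmm_logpdf(o, weights, means, variances):
    """log p(o | lambda) for an M-order GMM with diagonal covariances:
    p(o|lambda) = sum_i c_i * N(o; mu_i, Sigma_i), Sigma_i = diag(sigma_i^2)."""
    o = np.atleast_2d(o)                                   # (T, K) frames
    K = o.shape[1]
    # log N(o; mu_i, Sigma_i) for every frame and component -> shape (T, M)
    log_norm = -0.5 * (K * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1)
                       + np.sum((o[:, None, :] - means[None, :, :]) ** 2
                                / variances[None, :, :], axis=2))
    # log sum_i c_i N(...) computed with log-sum-exp for numerical stability
    a = np.log(weights)[None, :] + log_norm
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

# Illustrative 2-component model in K = 3 dimensions (values are not from the text).
w = np.array([0.4, 0.6])
mu = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
var = np.ones((2, 3))
print(gmm_logpdf(np.zeros((1, 3)), w, mu, var))
```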
The sound detection stage comprises the steps of:
S3. acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected.
Preferably, step S3 comprises the steps of:
S11'. preprocessing the acquired sound signal to be detected.
Preferably, the preprocessing in step S11' includes a framing operation and a windowing operation.
The purpose of framing is to divide the time signal into overlapping speech segments, i.e., frames. Each frame is typically about 30 ms long, with a frame shift of 10 ms.
The window function used in the windowing operation is a Hamming window, whose expression w(n) is
w(n) = 0.54 − 0.46·cos(2πn/(L−1)), 0 ≤ n ≤ L−1,
where n is the time index within the window and L is the window length.
S12'. performing a discrete Fourier transform on the preprocessed sound signal to obtain its power spectrum.
Preferably, the power spectrum in step S12' is computed as
Xa(k) = |Σn=0…N−1 x(n)·e^(−j2πnk/N)|², 0 ≤ k ≤ N−1,
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
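The following is a minimal numpy sketch of steps S11'–S12' above (framing, Hamming windowing, N-point DFT, squared magnitude). The 30 ms frame length and 10 ms frame shift follow the text; the 16 kHz sampling rate, the 512-point FFT and the synthetic test signal are assumptions.

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=30, shift_ms=10):
    """Split the signal into overlapping frames (30 ms frames, 10 ms shift, as in S11')."""
    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // shift)
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx]

def power_spectrum(frames, n_fft=512):
    """Apply a Hamming window and an N-point DFT, then take the squared magnitude (S12')."""
    w = np.hamming(frames.shape[1])          # w(n) = 0.54 - 0.46*cos(2*pi*n/(L-1))
    spec = np.fft.rfft(frames * w, n_fft)    # X(k), k = 0..N/2
    return np.abs(spec) ** 2                 # power spectrum |Xa(k)|^2

fs = 16000
x = np.random.default_rng(0).normal(size=fs)   # 1 s of noise as a stand-in signal
frames = frame_signal(x, fs)
P = power_spectrum(frames)
print(frames.shape, P.shape)                   # (98, 480) (98, 257)
```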
S13'. obtaining the Mel-frequency cepstral coefficients of the power spectrum using a Mel filter bank.
This embodiment of the present invention defines a filter bank of M filters (the number of filters is close to the number of critical bands). Triangular filters are used, with centre frequencies f(m), m = 0, 1, ..., M−1; in this embodiment M = 28. The spans of the triangular filters are equal on the Mel scale, and the frequency response of the m-th filter rises linearly from f(m−1) to its centre f(m), falls linearly from f(m) to f(m+1), and is zero elsewhere.
The Mel filter bank is then applied to the power spectrum to obtain the Mel filter-bank energies.
A discrete cosine transform (DCT) is then applied to obtain the Mel-frequency cepstral coefficients.
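A sketch of step S13' under common assumptions: M = 28 triangular filters as stated above, a frequency range from 0 Hz to half the sampling rate, a logarithm of the filter-bank energies before the DCT, and 13 retained cepstral coefficients; the last three choices are not specified in the text and are assumed here.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters=28, n_fft=512, fs=16000, f_low=0.0, f_high=None):
    """M = 28 triangular filters spaced equally on the Mel scale (as in the embodiment)."""
    f_high = f_high or fs / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(f_low), mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):          # rising and falling edge of each triangle
        fb[m - 1, bins[m - 1]:bins[m]] = (np.arange(bins[m - 1], bins[m]) - bins[m - 1]) / max(bins[m] - bins[m - 1], 1)
        fb[m - 1, bins[m]:bins[m + 1]] = (bins[m + 1] - np.arange(bins[m], bins[m + 1])) / max(bins[m + 1] - bins[m], 1)
    return fb

def mfcc(power_spec, n_ceps=13):
    """S13': Mel filter bank -> log energies -> DCT; keeps the first n_ceps coefficients."""
    fb = mel_filterbank(n_fft=2 * (power_spec.shape[1] - 1))
    log_mel = np.log(power_spec @ fb.T + 1e-10)
    return dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_ceps]

C = mfcc(np.abs(np.random.default_rng(0).normal(size=(98, 257))) ** 2)
print(C.shape)   # (98, 13)
```

Fed with the power spectrum P from the previous sketch, `mfcc(P)` returns one 13-dimensional cepstral vector per frame.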
S14'. computing the first-order and second-order differences of the Mel-frequency cepstral coefficients and concatenating the difference coefficients with the Mel-frequency cepstral coefficients to form the sound features.
If the cepstral vectors at times t and t+1 are ct and ct+1,
the first-order difference is computed as
Δct = ct+1 − ct,
and the second-order difference as
ΔΔct = Δct+1 − Δct.
The concatenated speech feature is
[ct Δct ΔΔct].
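A sketch of step S14' that follows the difference definitions above; setting the difference of the final frame to zero, so that the number of frames is preserved, is a boundary-handling assumption not stated in the text.

```python
import numpy as np

def add_deltas(ceps):
    """S14': append first- and second-order differences to the cepstra,
    delta_c[t] = c[t+1] - c[t] and delta2_c[t] = delta_c[t+1] - delta_c[t];
    the difference at the last frame is set to zero (a boundary-handling assumption)."""
    def diff(x):
        return np.vstack([x[1:] - x[:-1], np.zeros((1, x.shape[1]))])
    d1 = diff(ceps)
    d2 = diff(d1)
    return np.hstack([ceps, d1, d2])    # [c_t, delta_c_t, delta2_c_t], e.g. 13 -> 39 dimensions

feats = add_deltas(np.random.default_rng(0).normal(size=(98, 13)))
print(feats.shape)   # (98, 39)
```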
S4. judging whether at least one of the sound event models matches the sound features to be detected; if so, determining that a violent event exists; if not, determining that no violent event exists.
Preferably, step S4 comprises the following steps:
S31. assuming there are N sound event models, each modelled by a Gaussian mixture model and denoted λ1, λ2, ..., λN; in the judgment stage, the observed feature set of the sound to be detected is O = {o1, o2, ..., oT}, where T is the number of frames of the input sound;
S32. computing the posterior probability that the sound to be detected belongs to the n-th sound event model, 1 ≤ n ≤ N;
S33. obtaining a pre-judgment result from the posterior probabilities;
S34. obtaining the final decision from the pre-judgment result.
Preferably, the posterior probability in step S32 is computed as
P(λn|O) = p(O|λn)·p(λn) / p(O),
where p(λn) is the prior probability of the n-th sound event model, p(O) is the probability of the feature set O to be detected under all sound event models, and p(O|λn) is the conditional probability that the n-th sound event model generates the feature set O to be detected.
Preferably, the pre-judgment result in step S33 is computed from the frame-level posteriors, where p(λn) is the prior probability of the n-th sound event model, p(O) is the probability of the feature set O to be detected under all sound event models, and P(λn|ot) is the probability that frame ot was generated by λn.
Preferably, the final decision in step S34 is obtained by comparing the pre-judgment result against a preset rejection threshold, denoted threshold, where p(λn) is the prior probability of the n-th sound event model and p(O) is the probability of the feature set O to be detected under all sound event models.
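The following sketch walks through S31–S34 on synthetic data. The uniform priors p(λn), the use of summed frame log-likelihoods for p(O|λn), the argmax pre-judgment, and rejection when the pre-judged model's average frame log-likelihood falls below the preset threshold are all illustrative assumptions rather than the exact expressions of the embodiment.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# S31: N = 2 event models lambda_1, lambda_2, each a diagonal-covariance GMM
# fitted to synthetic training features (stand-ins for real event recordings).
models = [GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
          .fit(rng.normal(loc, 1.0, (300, 39))) for loc in (0.0, 3.0)]
priors = np.full(len(models), 1.0 / len(models))      # p(lambda_n): uniform prior (assumed)

O = rng.normal(0.0, 1.0, (120, 39))                   # feature set O = {o_1, ..., o_T} under test

# S32: posterior P(lambda_n | O) = p(O | lambda_n) p(lambda_n) / p(O), in the log domain.
loglik = np.array([m.score_samples(O).sum() for m in models])   # log p(O | lambda_n)
log_post = np.log(priors) + loglik
log_post -= np.logaddexp.reduce(log_post)             # divide by p(O)

# S33: pre-judgment = the event model with the largest posterior.
n_hat = int(np.argmax(log_post))

# S34: final decision - reject the pre-judgment if the average frame log-likelihood of the
# pre-judged model is below the preset rejection threshold (value and statistic assumed).
threshold = -60.0
accepted = models[n_hat].score_samples(O).mean() > threshold
print(n_hat, np.exp(log_post), "accepted" if accepted else "rejected")
```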
Embodiment 2:
As shown in Fig. 3, a sound monitoring device is characterized in that it comprises the following modules:
a sound training stage module, which acquires a training sound signal, extracts training sound features of the training sound signal, and trains sound event models according to the training sound features;
a sound detection stage module, which acquires a sound signal to be detected, extracts the sound features to be detected of the sound signal to be detected, and judges whether at least one of the sound event models matches the sound features to be detected; if so, it determines that a violent event exists; if not, it determines that no violent event exists.
Embodiment 3:
As shown in Fig. 4, a sound monitoring system is characterized in that it comprises a microphone, a multi-channel signal collector, and the sound monitoring device described in Embodiment 2.
The microphone is installed in the elevator, collects sound signals and transmits them to the multi-channel signal collector.
The multi-channel signal collector receives the sound signals sent by the microphone and transmits them to the sound monitoring device.
The sound monitoring device processes the sound signals.
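The following sketch shows how the three components of Embodiment 3 could cooperate in software. The chunking generator stands in for the multi-channel signal collector, and the energy gate is only a placeholder for the feature extraction and GMM matching of Embodiment 1; the sampling rate, chunk length and threshold are assumptions.

```python
import numpy as np

def collector_chunks(n_chunks=5, fs=16000, chunk_s=1.0, seed=0):
    """Stand-in for the multi-channel signal collector: yields 1 s chunks of microphone audio."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(scale=0.01, size=int(fs * chunk_s))   # synthetic elevator audio

def sound_monitoring_device(chunk):
    """Placeholder for the sound monitoring device; in the full system this would run the
    feature extraction and GMM matching pipeline of Embodiment 1."""
    return float(np.mean(chunk ** 2)) > 1e-3                   # crude energy gate, illustrative only

for i, chunk in enumerate(collector_chunks()):
    status = "possible violent event" if sound_monitoring_device(chunk) else "normal"
    print(f"chunk {i}: {status}")
```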
In summary, the embodiments of the present invention provide a sound monitoring method, device and system that train sound event models from the training sound features extracted from training sound signals, extract the sound features to be detected from the sound signal to be detected, compare the extracted features with the trained sound event models, and determine whether a violent event is occurring in the elevator. This realizes automatic monitoring of violent events in elevators, delivers monitoring results in real time, effectively ensures detection accuracy, and gives monitoring staff a basis for further action.
Compared with the industrial cameras required for video surveillance, the microphones and associated acquisition equipment used in the embodiments of the present invention are inexpensive, which makes them easy to deploy widely.
Compared with industrial cameras, the microphones used in the embodiments of the present invention are small and can be placed in hidden corners, protecting the monitoring equipment from being damaged by criminals.
Compared with industrial cameras, the signals collected by the microphones are not affected by lighting, occlusion or disguise, making the monitoring approach more stable.
It should be noted that, in this document, the terms "comprise", "include" or any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that comprises the element.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, without departing from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310332073.6A | 2013-08-01 | 2013-08-01 | Sound monitoring method, device and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310332073.6A | 2013-08-01 | 2013-08-01 | Sound monitoring method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103971702A true CN103971702A (en) | 2014-08-06 |
Family
ID=51241116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310332073.6A (Pending) | Sound monitoring method, device and system | 2013-08-01 | 2013-08-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103971702A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477798A (en) * | 2009-02-17 | 2009-07-08 | 北京邮电大学 | Method for analyzing and extracting audio data of set scene |
CN101587710A (en) * | 2009-07-02 | 2009-11-25 | 北京理工大学 | A Multi-codebook Coding Parameter Quantization Method Based on Audio Emergency Classification |
CN102509545A (en) * | 2011-09-21 | 2012-06-20 | 哈尔滨工业大学 | Real time acoustics event detecting system and method |
CN102799899A (en) * | 2012-06-29 | 2012-11-28 | 北京理工大学 | Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model) |
CN103177722A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Tone-similarity-based song retrieval method |
CN103226948A (en) * | 2013-04-22 | 2013-07-31 | 山东师范大学 | Audio scene recognition method based on acoustic events |
Non-Patent Citations (2)
Title |
---|
蒋刚 et al.: 《工业机器人》 (Industrial Robots), 31 January 2011 *
韩纪庆 et al.: 《音频信息检索理论与技术》 (Theory and Technology of Audio Information Retrieval), 31 March 2011 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105679313A (en) * | 2016-04-15 | 2016-06-15 | 福建新恒通智能科技有限公司 | Audio recognition alarm system and method |
CN110800053A (en) * | 2017-06-13 | 2020-02-14 | 米纳特有限公司 | Method and apparatus for obtaining event indications based on audio data |
CN107527617A (en) * | 2017-09-30 | 2017-12-29 | 上海应用技术大学 | Monitoring method, apparatus and system based on voice recognition |
CN107910019A (en) * | 2017-11-30 | 2018-04-13 | 中国科学院微电子研究所 | Human body sound signal processing and analyzing method |
CN111326172A (en) * | 2018-12-17 | 2020-06-23 | 北京嘀嘀无限科技发展有限公司 | Conflict detection method and device, electronic equipment and readable storage medium |
WO2020140552A1 (en) * | 2018-12-31 | 2020-07-09 | 瑞声声学科技(深圳)有限公司 | Haptic feedback method |
CN110223715A (en) * | 2019-05-07 | 2019-09-10 | 华南理工大学 | It is a kind of based on sound event detection old solitary people man in activity estimation method |
CN110223715B (en) * | 2019-05-07 | 2021-05-25 | 华南理工大学 | A method for estimating activity in the home of the elderly living alone based on sound event detection |
CN111599379A (en) * | 2020-05-09 | 2020-08-28 | 北京南师信息技术有限公司 | Conflict early warning method, device, equipment, readable storage medium and triage system |
CN111599379B (en) * | 2020-05-09 | 2023-09-29 | 北京南师信息技术有限公司 | Conflict early warning method, device, equipment, readable storage medium and triage system |
CN113670434A (en) * | 2021-06-21 | 2021-11-19 | 深圳供电局有限公司 | Transformer substation equipment sound abnormality identification method and device and computer equipment |
CN113421544A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN113421544B (en) * | 2021-06-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103971702A (en) | Sound monitoring method, device and system | |
CN107527617A (en) | Monitoring method, apparatus and system based on voice recognition | |
CN103971700A (en) | Voice monitoring method and device | |
CN102522082B (en) | Recognizing and locating method for abnormal sound in public places | |
CN106251874A (en) | A kind of voice gate inhibition and quiet environment monitoring method and system | |
CN102014278A (en) | Intelligent video monitoring method based on voice recognition technology | |
CN102664006A (en) | Abnormal voice detecting method based on time-domain and frequency-domain analysis | |
CN102426835A (en) | Switch cabinet partial discharge signal identification method based on support vector machine model | |
CN110364141A (en) | Alarm method of elevator typical abnormal sound based on deep single classifier | |
Kim et al. | Hierarchical approach for abnormal acoustic event classification in an elevator | |
Choi et al. | Selective background adaptation based abnormal acoustic event recognition for audio surveillance | |
KR101250668B1 (en) | Method for recogning emergency speech using gmm | |
CN109243486A (en) | A kind of winged acoustic detection method of cracking down upon evil forces based on machine learning | |
CN105812721A (en) | Tracking monitoring method and tracking monitoring device | |
Park et al. | Sound learning–based event detection for acoustic surveillance sensors | |
Zhang et al. | A pairwise algorithm for pitch estimation and speech separation using deep stacking network | |
Spadini et al. | Sound event recognition in a smart city surveillance context | |
Warule et al. | Hilbert-Huang Transform-Based Time-Frequency Analysis of Speech Signals for the Identification of Common Cold | |
Wan et al. | Recognition of potential danger to buried pipelines based on sounds | |
Galgali et al. | Speaker profiling by extracting paralinguistic parameters using mel frequency cepstral coefficients | |
Khanum et al. | Speech based gender identification using feed forward neural networks | |
CN115526205A (en) | Intelligent excavator voiceprint identification method and system based on convolutional neural network | |
Vozáriková et al. | Acoustic events detection using MFCC and MPEG-7 descriptors | |
Vozáriková et al. | Surveillance system based on the acoustic events detection | |
Estrebou et al. | Voice recognition based on probabilistic SOM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140806 |