CN103971702A

CN103971702A - Sound monitoring method, device and system

Info

Publication number: CN103971702A
Application number: CN201310332073.6A
Authority: CN
Inventors: 何勇军; 孙广路; 谢怡宁; 刘嘉辉
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2013-08-01
Filing date: 2013-08-01
Publication date: 2014-08-06

Abstract

The invention provides a sound monitoring method, device and system, and relates to the technical fields of sound signal processing and pattern recognition. The method includes steps: training sound stage and detecting sound stage, the training sound stage includes: S1, obtaining training sound signal, extracting training sound features; S2, according to training sound features, training sound event model; detecting sound stage includes: S3, extracting Sound features to be detected; S4. Judging whether there is at least one sound event model matching the sound features to be detected in the sound event model, if yes, then determine that there is a violent event; if not, determine that there is no violent event . The present invention extracts the sound features of the sound signal, compares the extracted sound features with the trained sound event model, analyzes whether there is a violent event in the elevator, realizes the automatic monitoring of the violent event in the elevator, and provides real-time monitoring As a result, the detection accuracy can be effectively guaranteed.

Description

Sound monitoring method, device and system

技术领域technical field

本发明涉及声音信号处理和模式识别技术领域，具体涉及一种声音监控方法、装置及系统。The invention relates to the technical field of sound signal processing and pattern recognition, in particular to a sound monitoring method, device and system.

背景技术Background technique

随着现代化城市的高速发展,电梯的使用越来越普遍并已成为高层建筑必不可少的垂直交通工具，与居民的日常工作和生活密切相关。据有关部门统计，截止目前，我国电梯的年需求量已达到全球的三分之一。与此同时，由于电梯相对封闭，已成为犯罪分子实施不法行为的极佳场所，这为人们的日常生活带来了众多的安全隐患。越来越多的犯罪分子在电梯中实施抢劫、杀人或者是性骚扰，严重威胁着乘梯者的生命财产安全。文献表明，近年来的电梯暴力事件呈现迅速增长趋势，仅2012年一年，有记录在案的电梯犯罪事件就高达6.2万多起。因此，对电梯内发生的事件进行有效的监控无疑将对电梯暴力事件的发现、制止和侦破等有着重要的现实意义。With the rapid development of modern cities, the use of elevators has become more and more common and has become an indispensable vertical transportation tool for high-rise buildings, which is closely related to the daily work and life of residents. According to the statistics of relevant departments, as of now, the annual demand for elevators in my country has reached one-third of the world's total. At the same time, because the elevator is relatively closed, it has become an excellent place for criminals to carry out illegal activities, which has brought many potential safety hazards for people's daily life. More and more criminals are committing robbery, murder or sexual harassment in elevators, seriously threatening the life and property safety of elevator riders. Literature shows that in recent years, elevator violence has shown a rapid growth trend. In 2012 alone, there were more than 62,000 recorded elevator crimes. Therefore, the effective monitoring of the incidents in the elevator will undoubtedly have important practical significance for the discovery, prevention and detection of elevator violence.

目前广泛采用摄像头视频监控的方式对电梯内的暴力事件实现有效监控。At present, the method of camera video surveillance is widely used to effectively monitor the violence in the elevator.

虽然取得了一定的效果，但尚存在着如下问题：监控的智能化程度低，依赖于监控室工作人员的观察或翻看视频来发现暴力事件。显然，这种监控方式将耗费大量人力物力、而且人看视频图像超过20分钟其注意力将明显下降，准确率也大打折扣。Although certain effects have been achieved, there are still the following problems: the intelligence level of monitoring is low, and violent incidents are found by the observation of staff in the monitoring room or looking through videos. Obviously, this monitoring method will consume a lot of manpower and material resources, and people's attention will obviously decrease after watching the video image for more than 20 minutes, and the accuracy rate will be greatly reduced.

发明内容Contents of the invention

（一）解决的技术问题(1) Solved technical problems

针对现有技术的不足，本发明提供一种声音监控方法、装置及系统，能够自动实现电梯内暴力事件的监控。Aiming at the deficiencies of the prior art, the present invention provides a sound monitoring method, device and system, which can automatically realize the monitoring of violent events in the elevator.

（二）技术方案(2) Technical solutions

为实现以上目的，本发明通过以下技术方案予以实现：To achieve the above object, the present invention is achieved through the following technical solutions:

一种声音监控方法，包括训练声音阶段和检测声音阶段，A sound monitoring method comprising a training sound phase and a detection sound phase,

所述训练声音阶段包含步骤：The training voice stage comprises the steps of:

S1、获取训练声音信号，提取所述训练声音信号的训练声音特征；S1. Obtain a training sound signal, and extract training sound features of the training sound signal;

S2、根据所述训练声音特征，训练声音事件模型；S2. Train a sound event model according to the training sound features;

所述检测声音阶段包含步骤：The detection sound stage includes steps:

S3、获取待检测声音信号，提取所述待检测声音信号的待检测声音特征；S3. Obtain the sound signal to be detected, and extract the sound feature to be detected of the sound signal to be detected;

S4、判断所述声音事件模型中是否存在至少一个与所述待检测声音特征匹配的声音事件模型，如为是，则判定存在暴力事件；如为否，判定不存在暴力事件。S4. Judging whether there is at least one sound event model matching the sound feature to be detected among the sound event models, if yes, it is determined that there is a violent event; if it is no, it is determined that there is no violent event.

优选的，步骤S1中包含步骤：Preferably, step S1 includes steps:

S11、对所述获取的声音信号进行预处理；S11. Preprocessing the acquired sound signal;

S12、对预处理过后的声音信号作离散傅立叶变换，求得功率谱；S12. Discrete Fourier transform is performed on the preprocessed sound signal to obtain a power spectrum;

S13、基于梅尔滤波器组求得所述功率谱的梅尔倒谱系数；S13. Obtain the Mel cepstrum coefficients of the power spectrum based on the Mel filter bank;

S14、计算所述梅尔倒谱系数的一阶差分和二阶差分，将所述一阶差分和二阶差分的系数与所述梅尔倒谱系数拼接，形成声音特征。S14. Calculate the first-order difference and the second-order difference of the Mel cepstral coefficients, and splice the coefficients of the first-order difference and the second-order differences with the Mel cepstral coefficients to form sound features.

优选的，步骤S11中的预处理包括分帧操作和加窗操作；Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation;

其中，加窗操作采用的窗函数为汉明窗，表达式w(n)为：Among them, the window function used in the windowing operation is the Hamming window, and the expression w(n) is:

式中n为时间序号，L为窗长；In the formula, n is the time sequence number, and L is the window length;

步骤S12中所述的求功率谱的表达式X_a(k)为：The expression X _a (k) of asking power spectrum described in step S12 is:

${X x}_{a a} ((k k)) = = {| | | | {Σ Σ}_{n no = = 00}^{N N - - 11} x x ((n no)) {e e}^{- - j j 22 kπ kπ / / N N} | | | |}^{22},, 00 \leq \leq k k \leq \leq N N$

式中x(n)为加窗后的语音帧，N表示傅立叶变换的点数，j表示虚数单位。In the formula, x(n) is the speech frame after windowing, N represents the number of Fourier transform points, and j represents the imaginary number unit.

优选的，步骤S2中的通过高斯混合模型来训练声音暴力事件模型，所述的M阶高斯混合模型的概率密度函数如下：Preferably, in step S2, the Gaussian mixture model is used to train the sound violence event model, and the probability density function of the M-order Gaussian mixture model is as follows:

$P P ((o o | | λ λ)) = = {Σ Σ}_{i i = = 11}^{M m} {c c}_{i i} P P ((o o | | i i,, λ λ))$

其中， $P (o | i, λ) = N (o, μ_{i}, Σ_{i}) = \frac{1}{{(2 π)}^{K / 2} {| Σ_{i} |}^{1 / 2}} \exp {- \frac{{(o - μ_{i})}^{T} Σ_{i}^{- 1} (o - μ_{i})}{2}}$ in, $P (o | i, λ) = N (o, μ_{i}, Σ_{i}) = \frac{1}{{(2 π)}^{K / 2} {| Σ_{i} |}^{1 / 2}} \exp {- \frac{{(o - μ_{i})}^{T} Σ_{i}^{- 1} (o - μ_{i})}{2}}$

式中，λ＝{c_i,μ_i,Σ_i;(i＝1...M)}，μ_i为均值矢量，Σ_i为协方差矩阵，i＝1,2,..M。矩阵Σ_i在这里采用对角阵： In the formula, λ={c _i ,μ _i ,Σ _i ;(i=1...M)}, μ _i is the mean vector, Σ _i is the covariance matrix, i=1,2,...M. The matrix Σ _i takes a diagonal matrix here:

${c c}_{i i} = = \frac{11}{T T} {Σ Σ}_{i i = = 11}^{T T} P P (({q q}_{t t} = = i i | | {o o}_{t t},, λ λ))$

${μ μ}_{i i} = = \frac{{Σ Σ}_{t t = = 11}^{T T} P P (({q q}_{t t} = = i i | | {o o}_{t t},, λ λ)) {o o}_{t t}}{{Σ Σ}_{t t = = 11}^{T T} P P (({q q}_{t t} = = i i | | {o o}_{t t},, λ λ))}$

优选的，步骤S4包含以下步骤：Preferably, step S4 includes the following steps:

S31、假定声音事件模型有N个，每个声音事件模型通过一个高斯混合模型建模，分别为λ₁,λ₂,...,λ_N，在判断阶段，输入的观测所述待检测声音特征集为O＝{o₁,o₂,...,o_T}，T为输入声音的帧数；S31. It is assumed that there are N sound event models, and each sound event model is modeled by a Gaussian mixture model, which are λ ₁ , λ ₂ ,..., λ _N , and in the judgment stage, the input observations of the sound to be detected The feature set is O={o ₁ ,o ₂ ,...,o _T }, T is the number of frames of the input sound;

S32、计算所述待检测声音为第n个声音事件模型的后验概率，1≤n≤N；S32. Calculate the posterior probability that the sound to be detected is the nth sound event model, 1≤n≤N;

S33、根据所述后验概率得到预判结果；S33. Obtain a prediction result according to the posterior probability;

S34、根据所述预判结果得到最终的判决结果。S34. Obtain a final judgment result according to the prediction result.

优选的，preferred,

步骤S32中的计算后验概率表达式为：The calculation posterior probability expression in the step S32 is:

$p p (({λ λ}_{n no} | | O o)) = = \frac{p p ((O o | | {λ λ}_{n no})) p p (({λ λ}_{n no}))}{p p ((O o))}$

$= = \frac{p p ((O o | | {λ λ}_{n no})) p p (({λ λ}_{n no}))}{{Σ Σ}_{m m = = 11}^{N N} p p ((O o | | {λ λ}_{m m})) p p (({λ λ}_{m m}))}$

$P P (({λ λ}_{n no})) = = \frac{11}{N N},, n no = = 1,2 1,2,, N N . .$

式中，p(λ_n)为第n个声音事件模型的先验概率；p(O)为所有声音事件模型条件下所述待检测声音特征集O的概率；p(O|λ_n)为第n个声音事件模型产生所述待检测声音特征集O的条件概率。In the formula, p(λ _n ) is the prior probability of the nth sound event model; p(O) is the probability of the sound feature set O to be detected under the conditions of all sound event models; p(O|λ _n ) is The nth sound event model generates the conditional probability of the sound feature set O to be detected.

优选的，步骤S33中的计算预判结果表达式为：Preferably, the calculation prediction result expression in step S33 is:

${n no}^{* *} = = \underset{11 \leq \leq n no \leq \leq N N}{arg arg max max} ln ln P P (({λ λ}_{n no} | | O o)) = = \underset{11 \leq \leq n no \leq \leq N N}{arg arg max max} {Σ Σ}_{t t = = 11}^{T T} ln ln P P (({λ λ}_{n no} | | {o o}_{t t}))$

式中，p(λ_n)为第n个声音事件模型的先验概率；p(O)为所有声音事件模型条件下所述待检测声音特征集O的概率；P(λ_n|o_t)为o_t产生于λ_n的概率；In the formula, p(λ _n ) is the prior probability of the nth sound event model; p(O) is the probability of the sound feature set O to be detected under the conditions of all sound event models; P(λ _n |o _t ) is the probability that o _t is generated from λ _n ;

优选的，步骤S34中的计算判决结果表达式为：Preferably, the calculation judgment result expression in step S34 is:

式中，p(λ_n)为第n个声音事件模型的先验概率；p(O)为所有声音事件模型条件下所述待检测声音特征集O的概率；为o_t产生于的概率；threshold为预设的拒识门限。In the formula, p (λ _n ) is the prior probability of the nth sound event model; p (O) is the probability of the sound feature set O to be detected under all sound event model conditions; for o _t produced in probability; threshold is the preset rejection threshold.

本发明还一种声音监控装置，包含以下模块：The present invention also provides a sound monitoring device, comprising the following modules:

训练声音阶段模块，获取训练声音信号，提取所述训练声音信号的训练声音特征；根据所述训练声音特征，训练声音事件模型；Training sound stage module, obtain training sound signal, extract the training sound feature of described training sound signal; According to described training sound feature, train sound event model;

检测声音阶段模块，获取待检测声音信号，提取所述待检测声音信号的待检测声音特征；判断所述声音事件模型中是否存在至少一个与所述待检测声音特征匹配的声音事件模型，如为是，则判定存在暴力事件；如为否，判定不存在暴力事件。Detecting the sound stage module, obtaining the sound signal to be detected, extracting the sound feature to be detected of the sound signal to be detected; judging whether there is at least one sound event model matching the sound feature to be detected in the sound event model, such as If yes, it is determined that there is a violent incident; if no, it is determined that there is no violent incident.

本发明还提供了一种声音监控系统，其特征在于，包括麦克风，多路信号采集器，还包括声音监控装置；The present invention also provides a sound monitoring system, which is characterized in that it includes a microphone, a multi-channel signal collector, and a sound monitoring device;

所述麦克风安装于电梯内，采集声音信号，传送给多路信号采集器；The microphone is installed in the elevator, collects sound signals, and transmits them to the multi-channel signal collector;

所述多路信号采集器，接收麦克风发送的声音信号，传送给声音监控装置；The multi-channel signal collector receives the sound signal sent by the microphone and transmits it to the sound monitoring device;

所述声音监控装置对声音信号进行处理。The sound monitoring device processes sound signals.

（三）有益效果(3) Beneficial effects

本发明通过提供一种声音监控方法、装置及系统，通过提取训练声音信号的训练声音特征，训练声音事件模型；通过提取待检测声音信号的待检测声音特征，将所提取的待检测声音特征与训练声音事件模型做比较，分析得出电梯内是否存在暴力事件，实现了电梯内暴力事件的自动监控，实时给出监控结果，能有效保证检测的准确率，为监控人员的下一步处理提供依据。The present invention provides a sound monitoring method, device and system, and trains the sound event model by extracting the training sound features of the training sound signal; by extracting the sound features to be detected of the sound signal to be detected, the extracted sound features to be detected are combined with Train the sound event model for comparison, analyze whether there is a violent incident in the elevator, realize the automatic monitoring of violent incidents in the elevator, and give the monitoring results in real time, which can effectively ensure the accuracy of the detection and provide a basis for the next step of the monitoring personnel. .

本发明所采用的设备与视频监控所需要的工业相机相比，麦克风及其相关的采集设备具有成本低廉的优势，便于推广使用。Compared with the industrial cameras required for video surveillance, the equipment used in the present invention has the advantage of low cost and is convenient for popularization and use.

本发明所采用的麦克风相比与视频监控所需要的工业相机，体积小，便于布置在隐藏的角落，避免受犯罪分子的破坏，使得监控设备更加安全。Compared with the industrial camera required for video surveillance, the microphone adopted in the present invention is smaller in size and convenient to be arranged in hidden corners to avoid being damaged by criminals and make the monitoring equipment safer.

本发明所采用的麦克风相比与视频监控所需要的工业相机，采集信号不受光照、遮挡和伪装等因素的影响，使得监控方式更加稳定。Compared with the industrial camera required for video monitoring, the microphone used in the present invention can collect signals without being affected by factors such as illumination, occlusion, and camouflage, so that the monitoring method is more stable.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. Those skilled in the art can also obtain other drawings based on these drawings without creative work.

图1为本发明优选实施例的一种声音监控方法的流程示意图；Fig. 1 is a schematic flow chart of a sound monitoring method in a preferred embodiment of the present invention;

图2为本发明优选实施例的一种声音监控方法的流程示意图；Fig. 2 is a schematic flow chart of a sound monitoring method in a preferred embodiment of the present invention;

图3为本发明优选实施例的一种声音监控装置的结构示意图；Fig. 3 is a schematic structural diagram of a sound monitoring device according to a preferred embodiment of the present invention;

图4为本发明优选实施例的一种声音监控系统的架构示意图。Fig. 4 is a schematic structural diagram of a sound monitoring system according to a preferred embodiment of the present invention.

具体实施方式Detailed ways

下面对于本发明所提出的一种声音监控方法、装置及系统，结合附图和实施例详细说明。A sound monitoring method, device and system proposed by the present invention will be described in detail below with reference to the accompanying drawings and embodiments.

实施例1：Example 1:

如图1所示，一种声音监控方法，包括训练声音阶段和检测声音阶段，As shown in Figure 1, a kind of sound monitoring method, comprises training sound stage and detection sound stage,

S2、根据所述训练声音特征训练声音事件模型；S2. Train a sound event model according to the training sound features;

本发明实施例通过提供一种声音监控方法，通过提取训练声音信号的训练声音特征，训练声音事件模型；通过提取待检测声音信号的待检测声音特征，将所提取的待检测声音特征与训练声音事件模型做比较，分析得出电梯内是否存在暴力事件，实现了电梯内暴力事件的自动监控，实时给出监控结果，能有效保证检测的准确率，为监控人员的下一步处理提供依据。The embodiment of the present invention provides a sound monitoring method, by extracting the training sound features of the training sound signal to train the sound event model; by extracting the sound features to be detected of the sound signal to be detected, the extracted sound features to be detected and the training sound By comparing the event models, it is analyzed whether there is a violent incident in the elevator, which realizes the automatic monitoring of violent incidents in the elevator, and gives the monitoring results in real time, which can effectively ensure the accuracy of the detection and provide a basis for the next step of the monitoring personnel.

下面对本发明实施例继续进行详细的阐述：The embodiment of the present invention continues to be described in detail below:

优选的，步骤S1中包含步骤：Preferably, step S1 includes steps:

S11、对获取的训练声音信号进行预处理；S11. Preprocessing the acquired training sound signal;

优选的，步骤S12中所述的求功率谱的表达式X_a(k)为：Preferably, the expression X _a (k) of the power spectrum described in step S12 is:

本发明实施例为每个训练声音信号建立一个GMM。M阶GMM的概率密度函数如下：In the embodiment of the present invention, a GMM is established for each training sound signal. The probability density function of the M-order GMM is as follows:

$P P ((o o | | λ λ)) = = {Σ Σ}_{i i = = 11}^{M m} P P ((o o | | i i,, λ λ)) = = {Σ Σ}_{i i = = 11}^{M m} {c c}_{i i} P P ((o o | | i i,, λ λ))$

其中，λ为GMM模型的参数集；o为K维的声学特征矢量；i为隐状态号，也就是高斯分量的序号，M阶GMM就有M个隐状态；c_i为第i个分量的混合权值，其值对应为隐状态i的先验概率，因此有：Among them, λ is the parameter set of the GMM model; o is the K-dimensional acoustic feature vector; i is the hidden state number, that is, the serial number of the Gaussian component, and M-order GMM has M hidden states; c _i is the i-th component The mixed weight value corresponds to the prior probability of the hidden state i, so there are:

${Σ Σ}_{i i = = 11}^{M m} {c c}_{i i} = = 11$

P(o|i,λ)为高斯混合分量，对应隐状态i的观察概率密度函数，P(o|i,λ) is a Gaussian mixture component, corresponding to the observation probability density function of hidden state i,

其中，步骤S2中的通过高斯混合模型来训练声音暴力事件模型，所述的M阶高斯混合模型的概率密度函数如下：Wherein, in step S2, the Gaussian mixture model is used to train the sound violence event model, and the probability density function of the M-order Gaussian mixture model is as follows:

优选的，步骤S3中包含步骤：Preferably, step S3 includes steps:

S11’、对获取的待检测声音信号进行预处理；S11', preprocessing the acquired sound signal to be detected;

优选的，步骤S11’中的预处理包括分帧操作和加窗操作；Preferably, the preprocessing in step S11' comprises framing and windowing operations;

其中，分帧的目的在于将时间信号分割为相互交叠的语音片断，即帧。每帧长度通常为30ms左右，帧移为10ms。Among them, the purpose of framing is to divide the time signal into overlapping speech segments, ie frames. The length of each frame is usually about 30ms, and the frame shift is 10ms.

S12’、对预处理过后的声音信号作离散傅立叶变换，求得功率谱；S12', performing discrete Fourier transform on the preprocessed sound signal to obtain the power spectrum;

优选的，步骤S12’中所述的求功率谱的表达式X_a(k)为：Preferably, the expression X _a (k) for calculating the power spectrum described in step S12' is:

S13’、基于梅尔滤波器组求得所述功率谱的梅尔倒谱系数；S13', obtain the Mel cepstrum coefficients of the power spectrum based on the Mel filter bank;

本发明实施例定义一个有M个滤波器的滤波器组(滤波器的个数和临界带的个数相近)，采用的滤波器为三角滤波器，中心频率为f(m)，m=0,2,…,M-1，本发明实施例取M=28。滤波器组中每个三角滤波器的跨度在梅尔标度上是相等的，三角滤波器的频率响应定义为：The embodiment of the present invention defines a filter bank with M filters (the number of filters is similar to the number of critical bands), and the filter used is a triangular filter with a center frequency of f(m), m=0 ,2,...,M-1, M=28 in the embodiment of the present invention. The span of each triangular filter in the filter bank is equal on the Mel scale, and the frequency response of a triangular filter is defined as:

${H h}_{m m} ((k k)) = = \{\begin{matrix} 00 & k k < < f f ((m m - - 11)) ork ork > > f f ((m m + + 11)) \\ \frac{22 ((k k - - f f ((m m - - 11))))}{((f f ((m m + + 11)) - - f f ((m m - - 11)))) ((f f ((m m)) - - f f ((m m - - 11))))} & f f ((m m - - 11)) < < k k < < f f ((m m)) \\ \frac{22 ((f f ((m m + + 11)) - - k k))}{((f f ((m m + + 11)) - - f f ((m m - - 11)))) ((f f ((m m + + 11)) - - f f ((m m))))} & f f ((m m)) \leq \leq k k \leq \leq f f ((m m + + 11)) \end{matrix}$

接下来对功率谱加梅尔滤波器组：Next to the power spectrum Gamel filter bank:

$S S ((m m)) = = ln ln (({Σ Σ}_{k k = = 00}^{N N - - 11} {| | {X x}_{a a} ((k k)) | |}^{22} {H h}_{m m} ((k k)))),, 00 \leq \leq m m \leq \leq M m$

然后作离散余弦变换(DCT)得到梅尔倒谱系数：Then do the discrete cosine transform (DCT) to get the Mel cepstral coefficients:

$c c ((n no)) = = {Σ Σ}_{m m = = 00}^{M m - - 11} S S ((m m)) cos cos ((nπ nπ ((m m - - 0.5 0.5)) / / M m)),, 00 \leq \leq n no \leq \leq M m$

S14’、计算所述梅尔倒谱系数的一阶差分和二阶差分，将所述一阶差分和二阶差分的系数与所述梅尔倒谱系数拼接，形成声音特征。S14'. Calculate the first-order difference and the second-order difference of the Mel cepstrum coefficients, and splice the coefficients of the first-order difference and the second-order differences with the Mel cepstral coefficients to form sound features.

如果t和t+1时刻的倒谱向量为c_t和c_t+1，If the cepstrum vectors at time t and t+1 are c _t and c _t+1 ,

一阶差分的计算方法为：The calculation method of the first order difference is:

Δc_t＝c_t+1-c_t Δc _t =c _t+1 -c _t

二阶差分为：The second order difference is:

ΔΔc_t＝Δc_t+1-Δc_t ΔΔc _t = Δc _t+1 -Δc _t

拼接后的语音特征为：The speech features after splicing are:

[c_t Δc_t ΔΔc_t][c _t Δc _t ΔΔc _t ]

优选的，步骤S32中的计算后验概率表达式为：Preferably, the calculation posterior probability expression in step S32 is:

$P P (({λ λ}_{n no})) = = \frac{11}{N N},, n no = = 1,2 1,2,, N N . .$

实施例2：Example 2:

如图3所示，一种声音监控装置，其特征在于，包含以下模块：As shown in Figure 3, a sound monitoring device is characterized in that it comprises the following modules:

实施例3：Example 3:

如图4所示，一种声音监控系统，其特征在于，包括麦克风，多路信号采集器，还包括如实施例2中所述的声音监控装置；As shown in Figure 4, a kind of sound monitoring system is characterized in that, comprises microphone, multi-channel signal collector, also comprises the sound monitoring device as described in embodiment 2;

综上，本发明实施例通过提供一种声音监控方法、装置及系统，通过提取训练声音信号的训练声音特征，训练声音事件模型；通过提取待检测声音信号的待检测声音特征，将所提取的待检测声音特征与训练声音事件模型做比较，分析得出电梯内是否存在暴力事件，实现了电梯内暴力事件的自动监控，实时给出监控结果，能有效保证检测的准确率，为监控人员的下一步处理提供依据。To sum up, the embodiment of the present invention provides a sound monitoring method, device and system, by extracting the training sound features of the training sound signal, training the sound event model; by extracting the sound features to be detected of the sound signal to be detected, the extracted Comparing the sound features to be detected with the training sound event model, it is analyzed whether there is a violent incident in the elevator, which realizes the automatic monitoring of violent incidents in the elevator, and the monitoring results are given in real time, which can effectively ensure the accuracy of the detection and provide for the monitoring personnel. Provide basis for further processing.

本发明实施例所采用的设备与视频监控所需要的工业相机相比，麦克风及其相关的采集设备具有成本低廉的优势，便于推广使用。Compared with the industrial cameras required for video surveillance, the equipment used in the embodiment of the present invention has the advantage of low cost and is convenient for popularization and use.

本发明实施例所采用的麦克风相比与视频监控所需要的工业相机，体积小，便于布置在隐藏的角落，避免受犯罪分子的破坏，使得监控设备更加安全。Compared with the industrial cameras required for video surveillance, the microphones used in the embodiments of the present invention are smaller in size and convenient to be placed in hidden corners to avoid damage by criminals and make the monitoring equipment safer.

本发明实施例所采用的麦克风相比与视频监控所需要的工业相机，采集信号不受光照、遮挡和伪装等因素的影响，使得监控方式更加稳定。Compared with the industrial cameras required for video surveillance, the microphones used in the embodiments of the present invention can collect signals without being affected by factors such as illumination, occlusion, and camouflage, making the monitoring method more stable.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that, in this document, the terms "comprising", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion such that a process, method, article or apparatus comprising a set of elements includes not only those elements, It also includes other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.

以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。The above embodiments are only used to illustrate the technical solutions of the present invention, rather than to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it can still be described in the foregoing embodiments Modifications are made to the recorded technical solutions, or equivalent replacements are made to some of the technical features; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A sound monitoring method is characterized in that the method comprises a training sound stage and a detection sound stage,

the training sound phase comprises the steps of:

s1, acquiring a training voice signal, and extracting the training voice feature of the training voice signal;

s2, training a sound event model according to the training sound characteristics;

the sound detection stage comprises the following steps:

s3, acquiring a sound signal to be detected, and extracting the sound characteristic to be detected of the sound signal to be detected;

s4, judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.

2. The sound monitoring method of claim 1, wherein the step S1 or the step S3 includes the steps of:

s11, preprocessing the acquired sound signal;

s12, performing discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;

s13, obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;

s14, calculating the first order difference and the second order difference of the Mel cepstrum coefficient, and splicing the coefficients of the first order difference and the second order difference with the Mel cepstrum coefficient to form sound characteristics.

3. The sound monitoring method of claim 2,

the preprocessing in step S11 includes a framing operation and a windowing operation;

wherein, the window function adopted by the windowing operation is a hamming window, and the expression w (n) is:

wherein n is a time sequence number, and L is a window length;

the expression X for power spectrum calculation described in step S12_a(k) Comprises the following steps:

where x (N) is the windowed speech frame, N represents the number of points in the Fourier transform, and j represents the imaginary unit.

4. The sound monitoring method of claim 1, wherein the violent acoustic event model is trained by a gaussian mixture model in step S2, and the probability density function of the M-th order gaussian mixture model is as follows:

wherein,

wherein λ ═ c_i,μ_i,Σ_i;(i＝1...M)}，μ_iAs mean vector, sigma_iIs a covariance matrix, i ═ 1,2,. M. Matrix sigma_iHere, a diagonal matrix is used:

5. the sound monitoring method of claim 1, wherein the step S4 includes the steps of:

s31, assuming that N sound event models are provided, each sound event model passes throughModeling by a Gaussian mixture model of lambda₁,λ₂,...,λ_NIn the judging stage, the input observation voice feature set to be detected is O ═ O₁,o₂,...,o_TT is the frame number of the input sound;

s32, calculating the posterior probability of the nth sound event model of the sound to be detected, wherein N is more than or equal to 1 and less than or equal to N;

s33, obtaining a pre-judgment result according to the posterior probability;

and S34, obtaining a final judgment result according to the pre-judgment result.

6. The sound monitoring method of claim 5,

the expression for calculating the posterior probability in step S32 is:

in the formula, p (lambda)_n) Is the prior probability of the nth acoustic event model; p (O) is the probability of the sound feature set O to be detected under all the sound event model conditions; p (O | λ)_n) And generating the conditional probability of the sound feature set O to be detected for the nth sound event model.

7. The sound monitoring method of claim 5,

the expression of the calculation prejudgment result in the step S33 is:

in the formula, p (lambda)_n) Is the prior probability of the nth acoustic event model; p (O) is the probability of the sound feature set O to be detected under all the sound event model conditions; p (lambda)_n|o_t) Is o_tGenerated from lambda_nThe probability of (c).

8. The sound monitoring method of claim 5,

the calculation decision result expression in step S34 is:

in the formula, p (lambda)_n) Is the prior probability of the nth acoustic event model; p (O) is the probability of the sound feature set O to be detected under all the sound event model conditions;is o_tGenerated fromThe probability of (d); the threshold is a preset rejection threshold.

9. A sound monitoring device, comprising:

the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;

the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.

10. A sound monitoring system, comprising a microphone, a multi-channel signal collector, and a sound monitoring apparatus according to claim 9;

the microphone is arranged in the elevator, collects sound signals and transmits the sound signals to the multi-path signal collector;

the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;

the sound monitoring device processes the sound signal.