
CN103971702A - Sound monitoring method, device and system - Google Patents

Sound monitoring method, device and system

Info

Publication number
CN103971702A
CN103971702A (application CN201310332073.6A)
Authority
CN
China
Prior art keywords
sound
training
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310332073.6A
Other languages
Chinese (zh)
Inventor
何勇军 (He Yongjun)
孙广路 (Sun Guanglu)
谢怡宁 (Xie Yining)
刘嘉辉 (Liu Jiahui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN201310332073.6A priority Critical patent/CN103971702A/en
Publication of CN103971702A publication Critical patent/CN103971702A/en
Pending legal-status Critical Current

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides a sound monitoring method, device and system, and relates to the technical field of sound signal processing and sound pattern recognition. The method includes a sound training stage and a sound detection stage. The sound training stage includes the steps of: S1, acquiring training sound signals and extracting training sound features; S2, training sound event models from the training sound features. The sound detection stage includes the steps of: S3, extracting the features of the sound to be detected; S4, judging whether at least one of the sound event models matches the features of the sound to be detected; if so, judging that a violent event exists, and if not, judging that no violent event exists. By extracting the sound features of sound signals and comparing them with the trained sound event models, the method determines through analysis whether a violent event is occurring in an elevator, achieves automatic monitoring of violent events in the elevator, provides monitoring results in real time, and effectively guarantees detection accuracy.

Description

Sound monitoring method, device and system
Technical Field
The invention relates to the technical field of sound signal processing and pattern recognition, in particular to a sound monitoring method, device and system.
Background
With the rapid development of modern cities, elevators have become increasingly common and are now an indispensable means of vertical transportation in high-rise buildings, closely tied to residents' daily work and life. According to statistics from the relevant departments, the annual demand for elevators in China currently accounts for one third of global demand. At the same time, because the elevator car is a relatively enclosed space, it has become a convenient place for criminals to carry out illegal activities, bringing many potential safety hazards to people's daily lives. More and more criminals commit robbery, murder or sexual harassment in elevators, seriously threatening the life and property of elevator passengers. The literature shows that elevator violence incidents have grown rapidly in recent years; in 2012 alone, the number of elevator-related criminal incidents placed on record reached 6.2 thousand. Effective monitoring of events occurring in the elevator therefore has important practical significance for the discovery, prevention and investigation of elevator violence incidents.
At present, camera-based video surveillance is widely used to monitor violent events in elevators.
Although this achieves a certain effect, the following problems remain: the degree of automation is low, and violent events are discovered only when monitoring-room staff watch or review the video. Such monitoring consumes a great deal of manpower and material resources, and the attention of a person watching video feeds for more than 20 minutes drops noticeably, so the detection accuracy falls sharply.
Disclosure of Invention
Technical problem to be solved
Aiming at the defects of the prior art, the invention provides a sound monitoring method, a sound monitoring device and a sound monitoring system, which can automatically monitor violent incidents in an elevator.
(II) technical scheme
In order to achieve the purpose, the invention is realized by the following technical scheme:
A sound monitoring method comprises a training sound stage and a detection sound stage.
The training sound stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training a sound event model according to the training sound features;
the sound detection stage comprises the following steps:
S3, acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
Preferably, step S1 includes the steps of:
S11, preprocessing the acquired sound signal;
S12, performing a discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
S13, obtaining the Mel cepstral coefficients of the power spectrum based on a Mel filter bank;
S14, calculating the first-order and second-order differences of the Mel cepstral coefficients, and splicing the first-order and second-order difference coefficients with the Mel cepstral coefficients to form the sound features.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function used in the windowing operation is a Hamming window, whose expression w(n) is:
w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{L-1}\right), \quad 0 \le n \le L-1
where n is the time index within the frame and L is the window length;
the expression for the power spectrum X_a(k) in step S12 is:
X_a(k) = \left\| \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \right\|^2, \quad 0 \le k \le N
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
Preferably, in step S2, the violent acoustic event model is trained by a gaussian mixture model, and the probability density function of the M-th order gaussian mixture model is as follows:
<math> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <mi>o</mi> <mo>|</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>M</mi> </munderover> <msub> <mi>c</mi> <mi>i</mi> </msub> <mi>P</mi> <mrow> <mo>(</mo> <mi>o</mi> <mo>|</mo> <mi>i</mi> <mo>,</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> </mrow> </math>
wherein, <math> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <mi>o</mi> <mo>|</mo> <mi>i</mi> <mo>,</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>N</mi> <mrow> <mo>(</mo> <mi>o</mi> <mo>,</mo> <msub> <mi>&mu;</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>&Sigma;</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <msup> <mrow> <mo>(</mo> <mn>2</mn> <mi>&pi;</mi> <mo>)</mo> </mrow> <mrow> <mi>K</mi> <mo>/</mo> <mn>2</mn> </mrow> </msup> <msup> <mrow> <mo>|</mo> <msub> <mi>&Sigma;</mi> <mi>i</mi> </msub> <mo>|</mo> </mrow> <mrow> <mn>1</mn> <mo>/</mo> <mn>2</mn> </mrow> </msup> </mrow> </mfrac> <mi>exp</mi> <mo>{</mo> <mo>-</mo> <mfrac> <mrow> <msup> <mrow> <mo>(</mo> <mi>o</mi> <mo>-</mo> <msub> <mi>&mu;</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mi>T</mi> </msup> <msubsup> <mi>&Sigma;</mi> <mi>i</mi> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mrow> <mo>(</mo> <mi>o</mi> <mo>-</mo> <msub> <mi>&mu;</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> <mn>2</mn> </mfrac> <mo>}</mo> </mrow> </math>
wherein λ ═ ciii;(i=1...M)},μiAs mean vector, sigmaiIs a covariance matrix, i ═ 1,2,. M.Matrix sigmaiHere, a diagonal matrix is used:
<math> <mrow> <msub> <mi>c</mi> <mi>i</mi> </msub> <mo>=</mo> <mfrac> <mn>1</mn> <mi>T</mi> </mfrac> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>T</mi> </munderover> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>t</mi> </msub> <mo>=</mo> <mi>i</mi> <mo>|</mo> <msub> <mi>o</mi> <mi>t</mi> </msub> <mo>,</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> </mrow> </math>
<math> <mrow> <msub> <mi>&mu;</mi> <mi>i</mi> </msub> <mo>=</mo> <mfrac> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>t</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>T</mi> </munderover> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>t</mi> </msub> <mo>=</mo> <mi>i</mi> <mo>|</mo> <msub> <mi>o</mi> <mi>t</mi> </msub> <mo>,</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> <msub> <mi>o</mi> <mi>t</mi> </msub> </mrow> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>t</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>T</mi> </munderover> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>t</mi> </msub> <mo>=</mo> <mi>i</mi> <mo>|</mo> <msub> <mi>o</mi> <mi>t</mi> </msub> <mo>,</mo> <mi>&lambda;</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mrow> </math>
Preferably, step S4 includes the steps of:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the judging stage, the input observed sound feature set to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model for the sound to be detected, where 1 ≤ n ≤ N;
S33, obtaining a pre-judgment result according to the posterior probability;
S34, obtaining the final judgment result according to the pre-judgment result.
Preferably, the expression for calculating the posterior probability in step S32 is:
p(\lambda_n \mid O) = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{p(O)} = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{\sum_{m=1}^{N} p(O \mid \lambda_m)\, p(\lambda_m)}
P(\lambda_n) = \frac{1}{N}, \quad n = 1, 2, \ldots, N
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability of the n-th sound event model generating the sound feature set O to be detected.
Preferably, the expression of the calculation anticipation result in step S33 is:
<math> <mrow> <msup> <mi>n</mi> <mo>*</mo> </msup> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi>max</mi> </mrow> <mrow> <mn>1</mn> <mo>&le;</mo> <mi>n</mi> <mo>&le;</mo> <mi>N</mi> </mrow> </munder> <mi>ln</mi> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>&lambda;</mi> <mi>n</mi> </msub> <mo>|</mo> <mi>O</mi> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mrow> <mi>arg</mi> <mi>max</mi> </mrow> <mrow> <mn>1</mn> <mo>&le;</mo> <mi>n</mi> <mo>&le;</mo> <mi>N</mi> </mrow> </munder> <munderover> <mi>&Sigma;</mi> <mrow> <mi>t</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>T</mi> </munderover> <mi>ln</mi> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>&lambda;</mi> <mi>n</mi> </msub> <mo>|</mo> <msub> <mi>o</mi> <mi>t</mi> </msub> <mo>)</mo> </mrow> </mrow> </math>
in the formula, p (lambda)n) Is the prior probability of the nth acoustic event model; p (O) is the probability of the sound feature set O to be detected under all the sound event model conditions; p (lambda)n|ot) Is otGenerated from lambdanThe probability of (d);
Preferably, the expression for calculating the final judgment result in step S34 is:
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; p(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}; and threshold is a preset rejection threshold.
The invention also provides a sound monitoring device, which comprises the following modules:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.
The invention also provides a sound monitoring system which is characterized by comprising a microphone, a multi-channel signal collector and a sound monitoring device;
the microphone is arranged in the elevator, collects sound signals and transmits the sound signals to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
(III) advantageous effects
The invention provides a sound monitoring method, device and system in which a sound event model is trained from the training sound features extracted from a training sound signal; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and the analysis determines whether a violent event is occurring in the elevator. Automatic monitoring of violent events in the elevator is thus achieved, the monitoring result is given in real time, the detection accuracy is effectively guaranteed, and a basis is provided for the follow-up handling by monitoring personnel.
Compared with the industrial camera required for video monitoring, the microphone and its associated acquisition equipment are low in cost and easy to popularize and use.
Compared with the industrial camera required for video monitoring, the microphone adopted by the invention is small, can easily be placed in a concealed corner, and is less likely to be damaged by criminals, so the monitoring equipment is safer.
Compared with the industrial camera required for video monitoring, the signals collected by the microphone adopted by the invention are not affected by factors such as illumination, occlusion and disguise, so the monitoring mode is more stable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for monitoring sound according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for monitoring sound according to a preferred embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a sound monitoring apparatus according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a sound monitoring system according to a preferred embodiment of the present invention.
Detailed Description
The following describes a sound monitoring method, a sound monitoring device, and a sound monitoring system according to the present invention in detail with reference to the accompanying drawings and embodiments.
Example 1:
As shown in FIG. 1, a sound monitoring method includes a training sound stage and a detection sound stage.
The training sound stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training a sound event model according to the training sound features;
the sound detection stage comprises the following steps:
S3, acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
The embodiment of the invention provides a sound monitoring method in which a sound event model is trained from the training sound features extracted from a training sound signal; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and the analysis determines whether a violent event is occurring in the elevator. Automatic monitoring of violent events in the elevator is thus achieved, the monitoring result is given in real time, the detection accuracy is effectively guaranteed, and a basis is provided for the follow-up handling by monitoring personnel.
The following examples of the present invention will be explained in detail:
A sound monitoring method comprises a training sound stage and a detection sound stage.
The training sound stage comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
Preferably, step S1 includes the steps of:
S11, preprocessing the acquired training sound signal;
S12, performing a discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
S13, obtaining the Mel cepstral coefficients of the power spectrum based on a Mel filter bank;
S14, calculating the first-order and second-order differences of the Mel cepstral coefficients, and splicing the first-order and second-order difference coefficients with the Mel cepstral coefficients to form the sound features.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function used in the windowing operation is a Hamming window, whose expression w(n) is:
w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{L-1}\right), \quad 0 \le n \le L-1
where n is the time index within the frame and L is the window length;
Preferably, the expression for the power spectrum X_a(k) in step S12 is:
X_a(k) = \left\| \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \right\|^2, \quad 0 \le k \le N
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
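For illustration only (not part of the patent text), a minimal sketch of the framing, Hamming windowing and power-spectrum computation of steps S11-S12 could look as follows; the sampling rate, frame length (30 ms) and frame shift (10 ms) are assumptions taken from the detection-stage description below:

```python
import numpy as np

def frame_signal(x, frame_len=480, frame_shift=160):
    """Split a 1-D signal into overlapping frames (30 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    return np.stack([x[i * frame_shift: i * frame_shift + frame_len]
                     for i in range(n_frames)])

def power_spectrum(frames, n_fft=512):
    """Hamming-window each frame and return its power spectrum X_a(k)."""
    L = frames.shape[1]
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / (L - 1))  # Hamming window w(n)
    spec = np.fft.rfft(frames * w, n=n_fft, axis=1)               # discrete Fourier transform
    return np.abs(spec) ** 2                                      # power spectrum X_a(k)
```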
S2, training a sound event model according to the training sound characteristics;
The embodiment of the invention establishes a GMM for each training sound signal. The probability density function of an M-th order GMM is as follows:
P(o \mid \lambda) = \sum_{i=1}^{M} P(o, i \mid \lambda) = \sum_{i=1}^{M} c_i\, P(o \mid i, \lambda)
where λ is the parameter set of the GMM; o is a K-dimensional acoustic feature vector; i is the hidden-state index, i.e. the index of a Gaussian component (an M-th order GMM has M hidden states); and c_i is the mixture weight of the i-th component, i.e. the prior probability of hidden state i, so that:
\sum_{i=1}^{M} c_i = 1
P(o|i, λ) is a Gaussian mixture component, i.e. the observation probability density function of hidden state i.
In step S2, the violent sound event model is trained with a Gaussian mixture model, and the probability density function of an M-th order Gaussian mixture model is as follows:
P(o \mid \lambda) = \sum_{i=1}^{M} c_i\, P(o \mid i, \lambda)
where
P(o \mid i, \lambda) = N(o; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{K/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{(o - \mu_i)^{T} \Sigma_i^{-1} (o - \mu_i)}{2} \right\}
and \lambda = \{c_i, \mu_i, \Sigma_i;\ i = 1 \ldots M\}, \mu_i is the mean vector and \Sigma_i is the covariance matrix, i = 1, 2, \ldots, M; the matrix \Sigma_i is taken here to be diagonal. The parameters are re-estimated as:
c_i = \frac{1}{T} \sum_{t=1}^{T} P(q_t = i \mid o_t, \lambda)
\mu_i = \frac{\sum_{t=1}^{T} P(q_t = i \mid o_t, \lambda)\, o_t}{\sum_{t=1}^{T} P(q_t = i \mid o_t, \lambda)}
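As an illustrative sketch only, one Gaussian mixture model with diagonal covariance can be fitted per sound event class; the use of scikit-learn's EM implementation and the number of mixture components are assumptions, since the patent only specifies that the parameters c_i, μ_i and Σ_i are estimated from the training features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_event_model(features, n_components=16):
    """features: (T, K) array of spliced MFCC features for one sound event class."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',    # Sigma_i restricted to a diagonal matrix
                          max_iter=100, random_state=0)
    gmm.fit(features)                                # EM re-estimation of c_i, mu_i, Sigma_i
    return gmm

# one model per sound event class (hypothetical class names):
# models = {name: train_event_model(feats) for name, feats in training_sets.items()}
```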
The sound detection stage comprises the following steps:
S3, acquiring a sound signal to be detected, and extracting the sound features to be detected of the sound signal to be detected;
Preferably, step S3 includes the steps of:
S11', preprocessing the acquired sound signal to be detected;
Preferably, the preprocessing in step S11' includes a framing operation and a windowing operation;
framing divides the time-domain signal into overlapping segments, i.e. frames; each frame is typically about 30 ms long, with a frame shift of 10 ms.
The window function used in the windowing operation is a Hamming window, whose expression w(n) is:
w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{L-1}\right), \quad 0 \le n \le L-1
where n is the time index within the frame and L is the window length;
S12', performing a discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
Preferably, the expression for the power spectrum X_a(k) in step S12' is:
X_a(k) = \left\| \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \right\|^2, \quad 0 \le k \le N
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
S13', obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;
In the embodiment of the invention, a filter bank with M filters is defined (the number of filters is close to the number of critical bands). The filters are triangular, with center frequencies f(m), m = 0, 1, ..., M-1; the embodiment takes M = 28. The spans of the triangular filters are equal on the Mel scale, and the frequency response of the m-th triangular filter is defined as:
H_m(k) = \begin{cases} 0, & k < f(m-1) \ \text{or}\ k > f(m+1) \\ \dfrac{2\,(k - f(m-1))}{(f(m+1) - f(m-1))\,(f(m) - f(m-1))}, & f(m-1) < k < f(m) \\ \dfrac{2\,(f(m+1) - k)}{(f(m+1) - f(m-1))\,(f(m+1) - f(m))}, & f(m) \le k \le f(m+1) \end{cases}
Next, the Mel filter bank is applied to the power spectrum and the log energy is taken:
S(m) = \ln\left( \sum_{k=0}^{N-1} |X_a(k)|^2\, H_m(k) \right), \quad 0 \le m \le M
Then a discrete cosine transform (DCT) is applied to obtain the Mel cepstral coefficients:
c(n) = \sum_{m=0}^{M-1} S(m) \cos\left( \frac{\pi n (m - 0.5)}{M} \right), \quad 0 \le n \le M
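As an illustrative sketch only, the Mel filter bank and the DCT of step S13' could be implemented as below; the Hz-to-Mel conversion constants, the unit-peak triangle normalization and the standard DCT-II phase (m + 0.5) are common conventions assumed here rather than taken from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=28, n_fft=512, sr=16000):
    """Triangular Mel filters H_m(k), shape (n_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)  # f(m-1), f(m), f(m+1)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            H[m - 1, k] = (k - left) / max(center - left, 1)      # rising edge
        for k in range(center, right):
            H[m - 1, k] = (right - k) / max(right - center, 1)    # falling edge
    return H

def mfcc(power_spec, H):
    """Log Mel energies S(m), then a DCT giving cepstral coefficients c(n)."""
    S = np.log(power_spec @ H.T + 1e-10)            # S(m), one row per frame
    M = H.shape[0]
    n = np.arange(M)[:, None]
    m = np.arange(M)[None, :]
    dct = np.cos(np.pi * n * (m + 0.5) / M)         # DCT-II basis
    return S @ dct.T                                # c(n) per frame
```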
S14', calculating the first-order and second-order differences of the Mel cepstral coefficients, and splicing the first-order and second-order difference coefficients with the Mel cepstral coefficients to form the sound features.
Let the cepstral vectors at times t and t+1 be c_t and c_{t+1}.
The first-order difference is computed as:
\Delta c_t = c_{t+1} - c_t
The second-order difference is:
\Delta\Delta c_t = \Delta c_{t+1} - \Delta c_t
The spliced sound feature is:
[c_t \ \ \Delta c_t \ \ \Delta\Delta c_t]
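As an illustrative sketch only, the differences and the splicing of step S14' could be computed as follows; repeating the last frame to keep the feature length equal to the number of frames is an assumption, since the patent does not state how the boundary is handled:

```python
import numpy as np

def add_deltas(cep):
    """cep: (T, M) cepstral coefficients -> (T, 3*M) spliced feature [c_t, dc_t, ddc_t]."""
    d1 = np.diff(cep, axis=0, append=cep[-1:])   # delta c_t = c_{t+1} - c_t (last frame repeated)
    d2 = np.diff(d1, axis=0, append=d1[-1:])     # delta-delta c_t = delta c_{t+1} - delta c_t
    return np.hstack([cep, d1, d2])
```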
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
Preferably, step S4 includes the steps of:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the judging stage, the input observed sound feature set to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model for the sound to be detected, where 1 ≤ n ≤ N;
S33, obtaining a pre-judgment result according to the posterior probability;
S34, obtaining the final judgment result according to the pre-judgment result.
Preferably, the expression of the calculated posterior probability in step S32 is:
p(\lambda_n \mid O) = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{p(O)} = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{\sum_{m=1}^{N} p(O \mid \lambda_m)\, p(\lambda_m)}
P(\lambda_n) = \frac{1}{N}, \quad n = 1, 2, \ldots, N
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability of the n-th sound event model generating the sound feature set O to be detected.
Preferably, the expression for calculating the pre-judgment result in step S33 is:
n^{*} = \arg\max_{1 \le n \le N} \ln P(\lambda_n \mid O) = \arg\max_{1 \le n \le N} \sum_{t=1}^{T} \ln P(\lambda_n \mid o_t)
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(λ_n|o_t) is the probability that o_t was generated by λ_n;
Preferably, the expression for calculating the final judgment result in step S34 is:
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; p(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}; and threshold is a preset rejection threshold.
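As an illustrative sketch only, the detection stage (steps S31-S34) could score an input feature set against all trained models, take the best log posterior under a uniform prior, and accept it only above a rejection threshold; the threshold value and the exact form of the accept/reject rule are assumptions, since the patent leaves them as preset parameters:

```python
import numpy as np

def detect(features, models, threshold=-50.0):
    """features: (T, 3*M) spliced features; models: dict name -> trained GaussianMixture."""
    names = list(models)
    # log p(O | lambda_n): sum of the per-frame log-likelihoods under each model
    loglik = np.array([models[name].score_samples(features).sum() for name in names])
    # with a uniform prior P(lambda_n) = 1/N, the log posterior is loglik minus a constant
    logpost = loglik - np.logaddexp.reduce(loglik)
    best = int(np.argmax(logpost))                   # pre-judgment result n*
    if loglik[best] / len(features) > threshold:     # hypothetical rejection rule
        return names[best]                           # a model matched: violent event detected
    return None                                      # rejected: no violent event
```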
Example 2:
As shown in FIG. 3, a sound monitoring apparatus includes the following modules:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.
Example 3:
As shown in FIG. 4, a sound monitoring system is characterized by comprising a microphone, a multi-channel signal collector, and a sound monitoring apparatus as described in Embodiment 2;
the microphone is arranged in the elevator, collects sound signals and transmits the sound signals to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
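As an illustrative sketch only, the microphone, signal collector and monitoring device of this embodiment could be tied together in a simple processing loop, reusing the helper functions sketched above; read_frame_block() and raise_alarm() are hypothetical placeholders for the collector interface and the operator notification:

```python
def monitoring_loop(models, sr=16000, window_s=2.0):
    """Continuously score short audio windows captured from the elevator microphone."""
    while True:
        audio = read_frame_block(sr, window_s)             # hypothetical collector interface
        frames = frame_signal(audio)                       # S11: framing + windowing
        feats = add_deltas(mfcc(power_spectrum(frames),    # S12-S14: power spectrum, MFCC, deltas
                                mel_filterbank()))
        event = detect(feats, models)                      # S4: match against the event models
        if event is not None:
            raise_alarm(event)                             # notify the monitoring staff
```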
To sum up, the embodiment of the present invention provides a sound monitoring method, apparatus and system in which a sound event model is trained from the training sound features extracted from a training sound signal; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and the analysis determines whether a violent event is occurring in the elevator. Automatic monitoring of violent events in the elevator is thus achieved, the monitoring result is given in real time, the detection accuracy is effectively guaranteed, and a basis is provided for the follow-up handling by monitoring personnel.
Compared with the industrial camera required for video monitoring, the microphone and its associated acquisition equipment are low in cost and easy to popularize and use.
Compared with the industrial camera required for video monitoring, the microphone adopted by the embodiment of the invention is small, can easily be placed in a concealed corner, and is less likely to be damaged by criminals, so the monitoring equipment is safer.
Compared with the industrial camera required for video monitoring, the signals collected by the microphone adopted by the embodiment of the invention are not affected by factors such as illumination, occlusion and disguise, so the monitoring mode is more stable.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A sound monitoring method is characterized in that the method comprises a training sound stage and a detection sound stage,
the training sound phase comprises the steps of:
S1, acquiring a training sound signal and extracting the training sound features of the training sound signal;
S2, training a sound event model according to the training sound features;
the sound detection stage comprises the following steps:
S3, acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
2. The sound monitoring method of claim 1, wherein the step S1 or the step S3 includes the steps of:
S11, preprocessing the acquired sound signal;
S12, performing a discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
S13, obtaining the Mel cepstral coefficients of the power spectrum based on a Mel filter bank;
S14, calculating the first-order and second-order differences of the Mel cepstral coefficients, and splicing the first-order and second-order difference coefficients with the Mel cepstral coefficients to form the sound features.
3. The sound monitoring method of claim 2,
the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function used in the windowing operation is a Hamming window, whose expression w(n) is:
w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{L-1}\right), \quad 0 \le n \le L-1
where n is the time index within the frame and L is the window length;
the expression for the power spectrum X_a(k) in step S12 is:
X_a(k) = \left\| \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \right\|^2, \quad 0 \le k \le N
where x(n) is the windowed speech frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
4. The sound monitoring method of claim 1, wherein in step S2 the violent sound event model is trained with a Gaussian mixture model, and the probability density function of an M-th order Gaussian mixture model is as follows:
P(o \mid \lambda) = \sum_{i=1}^{M} c_i\, P(o \mid i, \lambda)
where
P(o \mid i, \lambda) = N(o; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{K/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{(o - \mu_i)^{T} \Sigma_i^{-1} (o - \mu_i)}{2} \right\}
and \lambda = \{c_i, \mu_i, \Sigma_i;\ i = 1 \ldots M\}, \mu_i is the mean vector and \Sigma_i is the covariance matrix, i = 1, 2, \ldots, M; the matrix \Sigma_i is taken here to be diagonal.
5. The sound monitoring method of claim 1, wherein the step S4 includes the steps of:
S31, assuming there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the judging stage, the input observed sound feature set to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model for the sound to be detected, where 1 ≤ n ≤ N;
S33, obtaining a pre-judgment result according to the posterior probability;
S34, obtaining the final judgment result according to the pre-judgment result.
6. The sound monitoring method of claim 5,
the expression for calculating the posterior probability in step S32 is:
p(\lambda_n \mid O) = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{p(O)} = \frac{p(O \mid \lambda_n)\, p(\lambda_n)}{\sum_{m=1}^{N} p(O \mid \lambda_m)\, p(\lambda_m)}, \qquad P(\lambda_n) = \frac{1}{N},\ n = 1, 2, \ldots, N
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability of the n-th sound event model generating the sound feature set O to be detected.
7. The sound monitoring method of claim 5,
the expression for calculating the pre-judgment result in step S33 is:
n^{*} = \arg\max_{1 \le n \le N} \ln P(\lambda_n \mid O) = \arg\max_{1 \le n \le N} \sum_{t=1}^{T} \ln P(\lambda_n \mid o_t)
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(λ_n|o_t) is the probability that o_t was generated by λ_n.
8. The sound monitoring method of claim 5,
the expression for calculating the final judgment result in step S34 is:
where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; p(λ_{n*}|o_t) is the probability that o_t was generated by λ_{n*}; and threshold is a preset rejection threshold.
9. A sound monitoring device, comprising:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.
10. A sound monitoring system, comprising a microphone, a multi-channel signal collector, and a sound monitoring apparatus according to claim 9;
the microphone is arranged in the elevator, collects sound signals and transmits the sound signals to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
CN201310332073.6A 2013-08-01 2013-08-01 Sound monitoring method, device and system Pending CN103971702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310332073.6A CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310332073.6A CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Publications (1)

Publication Number Publication Date
CN103971702A true CN103971702A (en) 2014-08-06

Family

ID=51241116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310332073.6A Pending CN103971702A (en) 2013-08-01 2013-08-01 Sound monitoring method, device and system

Country Status (1)

Country Link
CN (1) CN103971702A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679313A (en) * 2016-04-15 2016-06-15 福建新恒通智能科技有限公司 Audio recognition alarm system and method
CN107527617A (en) * 2017-09-30 2017-12-29 上海应用技术大学 Monitoring method, apparatus and system based on voice recognition
CN107910019A (en) * 2017-11-30 2018-04-13 中国科学院微电子研究所 Human body sound signal processing and analyzing method
CN110223715A (en) * 2019-05-07 2019-09-10 华南理工大学 It is a kind of based on sound event detection old solitary people man in activity estimation method
CN110800053A (en) * 2017-06-13 2020-02-14 米纳特有限公司 Method and apparatus for obtaining event indications based on audio data
CN111326172A (en) * 2018-12-17 2020-06-23 北京嘀嘀无限科技发展有限公司 Conflict detection method and device, electronic equipment and readable storage medium
WO2020140552A1 (en) * 2018-12-31 2020-07-09 瑞声声学科技(深圳)有限公司 Haptic feedback method
CN111599379A (en) * 2020-05-09 2020-08-28 北京南师信息技术有限公司 Conflict early warning method, device, equipment, readable storage medium and triage system
CN113421544A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN113670434A (en) * 2021-06-21 2021-11-19 深圳供电局有限公司 Transformer substation equipment sound abnormality identification method and device and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477798A (en) * 2009-02-17 2009-07-08 北京邮电大学 Method for analyzing and extracting audio data of set scene
CN101587710A (en) * 2009-07-02 2009-11-25 北京理工大学 A kind of many code books coding parameter quantification method based on the audio emergent event classification
CN102509545A (en) * 2011-09-21 2012-06-20 哈尔滨工业大学 Real time acoustics event detecting system and method
CN102799899A (en) * 2012-06-29 2012-11-28 北京理工大学 Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
CN103177722A (en) * 2013-03-08 2013-06-26 北京理工大学 Tone-similarity-based song retrieval method
CN103226948A (en) * 2013-04-22 2013-07-31 山东师范大学 Audio scene recognition method based on acoustic events

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477798A (en) * 2009-02-17 2009-07-08 北京邮电大学 Method for analyzing and extracting audio data of set scene
CN101587710A (en) * 2009-07-02 2009-11-25 北京理工大学 A kind of many code books coding parameter quantification method based on the audio emergent event classification
CN102509545A (en) * 2011-09-21 2012-06-20 哈尔滨工业大学 Real time acoustics event detecting system and method
CN102799899A (en) * 2012-06-29 2012-11-28 北京理工大学 Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
CN103177722A (en) * 2013-03-08 2013-06-26 北京理工大学 Tone-similarity-based song retrieval method
CN103226948A (en) * 2013-04-22 2013-07-31 山东师范大学 Audio scene recognition method based on acoustic events

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
蒋刚 (Jiang Gang) et al.: "Industrial Robots" (《工业机器人》), 31 January 2011 *
韩纪庆 (Han Jiqing) et al.: "Audio Information Retrieval: Theory and Technology" (《音频信息检索理论与技术》), 31 March 2011 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679313A (en) * 2016-04-15 2016-06-15 福建新恒通智能科技有限公司 Audio recognition alarm system and method
CN110800053A (en) * 2017-06-13 2020-02-14 米纳特有限公司 Method and apparatus for obtaining event indications based on audio data
CN107527617A (en) * 2017-09-30 2017-12-29 上海应用技术大学 Monitoring method, apparatus and system based on voice recognition
CN107910019A (en) * 2017-11-30 2018-04-13 中国科学院微电子研究所 Human body sound signal processing and analyzing method
CN111326172A (en) * 2018-12-17 2020-06-23 北京嘀嘀无限科技发展有限公司 Conflict detection method and device, electronic equipment and readable storage medium
WO2020140552A1 (en) * 2018-12-31 2020-07-09 瑞声声学科技(深圳)有限公司 Haptic feedback method
CN110223715A (en) * 2019-05-07 2019-09-10 华南理工大学 It is a kind of based on sound event detection old solitary people man in activity estimation method
CN110223715B (en) * 2019-05-07 2021-05-25 华南理工大学 Home activity estimation method for solitary old people based on sound event detection
CN111599379A (en) * 2020-05-09 2020-08-28 北京南师信息技术有限公司 Conflict early warning method, device, equipment, readable storage medium and triage system
CN111599379B (en) * 2020-05-09 2023-09-29 北京南师信息技术有限公司 Conflict early warning method, device, equipment, readable storage medium and triage system
CN113670434A (en) * 2021-06-21 2021-11-19 深圳供电局有限公司 Transformer substation equipment sound abnormality identification method and device and computer equipment
CN113421544A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN113421544B (en) * 2021-06-30 2024-05-10 平安科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103971702A (en) Sound monitoring method, device and system
Liu et al. A sound monitoring system for prevention of underground pipeline damage caused by construction
CN103971700A (en) Voice monitoring method and device
CN107527617A (en) Monitoring method, apparatus and system based on voice recognition
CN110444202B (en) Composite voice recognition method, device, equipment and computer readable storage medium
US20080215318A1 (en) Event recognition
Yang et al. Acoustics recognition of construction equipments based on LPCC features and SVM
Kiktova et al. Comparison of different feature types for acoustic event detection system
KR101250668B1 (en) Method for recogning emergency speech using gmm
Choi et al. Selective background adaptation based abnormal acoustic event recognition for audio surveillance
CN105812721A (en) Tracking monitoring method and tracking monitoring device
CN115631765A (en) Belt carrier roller sound anomaly detection method based on deep learning
CN115512688A (en) Abnormal sound detection method and device
Wijayakulasooriya Automatic recognition of elephant infrasound calls using formant analysis and hidden markov model
CN105352541B (en) A kind of transformer station high-voltage side bus auxiliary monitoring system and its monitoring method based on power network disaster prevention disaster reduction system
Vozáriková et al. Surveillance system based on the acoustic events detection
Agarwal et al. Security threat sounds classification using neural network
CN104064197B (en) Method for improving speech recognition robustness on basis of dynamic information among speech frames
Spadini et al. Sound event recognition in a smart city surveillance context
CN102881099B (en) Anti-theft alarming method and device applied to ATM
Khanum et al. Speech based gender identification using feed forward neural networks
CN101614698A (en) Mass spectral monitoring device and monitor method
Estrebou et al. Voice recognition based on probabilistic SOM
CN108182950B (en) Improved method for decomposing and extracting abnormal sound characteristics of public places through empirical wavelet transform
Jena et al. Gender classification by pitch analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140806