CN103971702A - Sound monitoring method, device and system - Google Patents
- Publication number: CN103971702A
- Application number: CN201310332073.6A
- Authority
- CN
- China
- Prior art keywords
- sound
- training
- detected
- Prior art date: 2013-08-01
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The invention provides a sound monitoring method, device and system, and relates to the technical field of sound signal processing and sound pattern recognition. The method comprises a sound training stage and a sound detection stage. The sound training stage comprises the steps of: S1, acquiring training sound signals and extracting training sound features; S2, training sound event models according to the training sound features. The sound detection stage comprises the steps of: S3, extracting the features of the sound to be detected; S4, judging whether at least one of the sound event models matches the features of the sound to be detected; if so, judging that a violent event exists, and if not, judging that no violent event exists. By extracting the sound features of sound signals and comparing them with the trained sound event models, the method determines by analysis whether a violent event is occurring in an elevator, achieves automatic monitoring of violent events in elevators, provides monitoring results in real time, and effectively guarantees detection accuracy.
Description
Technical Field
The invention relates to the technical field of sound signal processing and pattern recognition, in particular to a sound monitoring method, device and system.
Background
With the rapid development of modern cities, elevators have become ever more common and are now an indispensable means of vertical transportation in high-rise buildings, closely tied to residents' daily work and life. According to statistics from the relevant departments, the annual demand for elevators in China currently accounts for one third of global demand. At the same time, because an elevator car is relatively enclosed, it is an attractive place for criminals to carry out illegal activities, which brings many potential safety hazards to people's daily lives. More and more criminals commit robbery, homicide or sexual assault in elevators, seriously threatening the life and property of elevator passengers. The literature shows that elevator violence has grown rapidly in recent years; in 2012 alone, recorded elevator crime cases were as high as 62,000. Effective monitoring of events occurring in the elevator therefore has important practical significance for the discovery, prevention and investigation of elevator violence.
At present, camera-based video surveillance is the widely adopted way of monitoring events in elevators.
Although it achieves a certain effect, the following problems remain: the degree of automation is low, and violent events are discovered only when monitoring-room staff watch or review the video. This approach consumes considerable manpower and material resources; moreover, the attention of a person watching video feeds drops markedly after about 20 minutes, and the detection accuracy falls accordingly.
Disclosure of Invention
Technical problem to be solved
Aiming at the deficiencies of the prior art, the invention provides a sound monitoring method, device and system, which can automatically monitor violent events in an elevator.
(II) Technical solution
In order to achieve the above purpose, the invention is realized by the following technical solution:
a sound monitoring method comprises a training sound stage and a detection sound stage,
the training sound phase comprises the steps of:
S1, acquiring a training sound signal, and extracting the training sound features of the training sound signal;
s2, training a sound event model according to the training sound characteristics;
the sound detection stage comprises the following steps:
s3, acquiring a sound signal to be detected, and extracting the sound characteristic to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
Preferably, step S1 includes the steps of:
s11, preprocessing the acquired sound signal;
s12, performing discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
s13, obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;
s14, calculating the first order difference and the second order difference of the Mel cepstrum coefficient, and splicing the coefficients of the first order difference and the second order difference with the Mel cepstrum coefficient to form sound characteristics.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function adopted by the windowing operation is a Hamming window, whose expression w(n) is:

w(n) = 0.54 - 0.46·cos(2πn/(L-1)), 0 ≤ n ≤ L-1

where n is the time index and L is the window length;
the expression X_a(k) for the power spectrum calculation in step S12 is:

X_a(k) = | Σ_{n=0}^{N-1} x(n)·e^{-j2πnk/N} |², 0 ≤ k ≤ N-1

where x(n) is the windowed sound frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
Preferably, in step S2 the violent sound event model is trained with a Gaussian mixture model, and the probability density function of the M-th order Gaussian mixture model is as follows:

P(o|λ) = Σ_{i=1}^{M} c_i·P(o|i,λ)

wherein each mixture component is the Gaussian density

P(o|i,λ) = N(o; μ_i, Σ_i) = 1/((2π)^{K/2}·|Σ_i|^{1/2}) · exp{ -(1/2)·(o-μ_i)^T·Σ_i^{-1}·(o-μ_i) }

and λ = {c_i, μ_i, Σ_i; i = 1...M}, μ_i is the mean vector, Σ_i is the covariance matrix, i = 1, 2, ..., M. The matrix Σ_i is here taken to be a diagonal matrix:

Σ_i = diag(σ_{i,1}², σ_{i,2}², ..., σ_{i,K}²)
preferably, step S4 includes the steps of:
S31, assuming that there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the decision stage, the input set of observed sound features to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model given the sound to be detected, where 1 ≤ n ≤ N;
s33, obtaining a pre-judgment result according to the posterior probability;
and S34, obtaining a final judgment result according to the pre-judgment result.
Preferably, the expression for calculating the posterior probability in step S32 is:

p(λ_n|O) = p(O|λ_n)·p(λ_n) / p(O)

where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability that the n-th sound event model generates the sound feature set O.
Preferably, the expression for calculating the pre-judgment result in step S33 is:

n* = argmax_{1≤n≤N} Σ_{t=1}^{T} log p(λ_n|o_t), with p(λ_n|o_t) = p(o_t|λ_n)·p(λ_n) / p(o_t)

where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(λ_n|o_t) is the probability that frame o_t is generated by λ_n;
Preferably, the decision result in step S34 is calculated as: a violent event is judged to exist if

(1/T)·Σ_{t=1}^{T} log p(λ_{n*}|o_t) ≥ threshold

and no violent event is judged to exist otherwise; where p(λ_{n*}|o_t) is the probability that frame o_t is generated by λ_{n*}, the model selected in the pre-judgment, and threshold is a preset rejection threshold.
The invention also provides a sound monitoring device, which comprises the following modules:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.
The invention also provides a sound monitoring system, which comprises a microphone, a multi-channel signal collector and a sound monitoring device;
the microphone is arranged in the elevator, collects sound signals and transmits them to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
(III) Advantageous effects
The invention provides a sound monitoring method, device and system in which a sound event model is trained from the training sound features extracted from training sound signals; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and whether a violent event exists in the elevator is determined by analysis. Automatic monitoring of violent events in the elevator is thereby achieved, the monitoring result is given in real time, detection accuracy is effectively guaranteed, and a basis is provided for further handling by monitoring personnel.
Compared with the industrial camera required for video monitoring, the microphone and its associated acquisition equipment are low in cost and convenient to popularize.
Compared with an industrial camera, the microphone adopted by the invention is small, is easy to place in a hidden corner, and is thus protected from being damaged by criminals, making the monitoring equipment safer.
Compared with an industrial camera, the signals collected by the microphone adopted by the invention are not affected by factors such as illumination, occlusion and disguise, so the monitoring is more stable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating a method for monitoring sound according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for monitoring sound according to a preferred embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a sound monitoring apparatus according to a preferred embodiment of the present invention;
fig. 4 is a schematic diagram of a sound monitoring system according to a preferred embodiment of the present invention.
Detailed Description
The following describes a sound monitoring method, a sound monitoring device, and a sound monitoring system according to the present invention in detail with reference to the accompanying drawings and embodiments.
Example 1:
as shown in fig. 1, a sound monitoring method includes a training sound stage and a detection sound stage,
the training sound phase comprises the steps of:
S1, acquiring a training sound signal, and extracting the training sound features of the training sound signal;
s2, training a sound event model according to the training sound characteristics;
the sound detection stage comprises the following steps:
s3, acquiring a sound signal to be detected, and extracting the sound characteristic to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
The embodiment of the invention provides a sound monitoring method in which a sound event model is trained from the training sound features extracted from training sound signals; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and whether a violent event exists in the elevator is determined by analysis. Automatic monitoring of violent events in the elevator is thereby achieved, the monitoring result is given in real time, detection accuracy is effectively guaranteed, and a basis is provided for further handling by monitoring personnel.
The embodiments of the present invention are explained in detail below:
a sound monitoring method comprises a training sound stage and a detection sound stage,
the training sound phase comprises the steps of:
S1, acquiring a training sound signal, and extracting the training sound features of the training sound signal;
preferably, step S1 includes the steps of:
s11, preprocessing the acquired training sound signal;
s12, performing discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
s13, obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;
s14, calculating the first order difference and the second order difference of the Mel cepstrum coefficient, and splicing the coefficients of the first order difference and the second order difference with the Mel cepstrum coefficient to form sound characteristics.
Preferably, the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function adopted by the windowing operation is a Hamming window, whose expression w(n) is:

w(n) = 0.54 - 0.46·cos(2πn/(L-1)), 0 ≤ n ≤ L-1

where n is the time index and L is the window length;
preferably, the expression X_a(k) for the power spectrum calculation in step S12 is:

X_a(k) = | Σ_{n=0}^{N-1} x(n)·e^{-j2πnk/N} |², 0 ≤ k ≤ N-1

where x(n) is the windowed sound frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
S2, training a sound event model according to the training sound characteristics;
The embodiment of the invention establishes a GMM for each training sound class. The probability density function of an M-th order GMM is as follows:

P(o|λ) = Σ_{i=1}^{M} c_i·P(o|i,λ)

where λ is the parameter set of the GMM; o is a K-dimensional sound feature vector; i is the hidden-state number, i.e. the index of a Gaussian component, an M-th order GMM having M hidden states; c_i is the mixture weight of the i-th component, i.e. the prior probability corresponding to hidden state i, so that:

Σ_{i=1}^{M} c_i = 1

P(o|i,λ) is a Gaussian mixture component, i.e. the observation probability density function corresponding to hidden state i:
P(o|i,λ) = N(o; μ_i, Σ_i) = 1/((2π)^{K/2}·|Σ_i|^{1/2}) · exp{ -(1/2)·(o-μ_i)^T·Σ_i^{-1}·(o-μ_i) }

wherein λ = {c_i, μ_i, Σ_i; i = 1...M}, μ_i is the mean vector and Σ_i is the covariance matrix, i = 1, 2, ..., M. The matrix Σ_i is here taken to be a diagonal matrix:

Σ_i = diag(σ_{i,1}², σ_{i,2}², ..., σ_{i,K}²)
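As an illustrative sketch only (not part of the original disclosure), the EM training of one GMM per event class can be written in Python; the use of scikit-learn's GaussianMixture, the order M = 8 and the function name are assumptions of this sketch:

```python
from sklearn.mixture import GaussianMixture

def train_event_models(feature_sets, M=8):
    """Fit one diagonal-covariance GMM per sound event class via EM.

    feature_sets: list of (n_frames, K) arrays of spliced training sound
    features, one per violent-event class. M = 8 is an assumption for
    illustration; the description only requires an M-th order GMM.
    """
    models = []
    for feats in feature_sets:
        gmm = GaussianMixture(n_components=M, covariance_type='diag',
                              max_iter=200, random_state=0)
        gmm.fit(feats)  # EM estimates the weights c_i, means mu_i, diagonal Sigma_i
        models.append(gmm)
    return models
```

The diagonal covariance_type matches the diagonal Σ_i above and keeps the number of parameters per component low.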
the sound detection stage comprises the following steps:
s3, acquiring a sound signal to be detected, and extracting the sound characteristic to be detected of the sound signal to be detected;
preferably, step S3 includes the steps of:
s11', preprocessing the acquired sound signal to be detected;
preferably, the preprocessing in step S11' includes a framing operation and a windowing operation;
the purpose of framing is, among other things, to divide the time signal into overlapping speech segments, i.e. frames. Each frame is typically around 30ms in length and the frame is shifted by 10 ms.
Wherein the window function adopted by the windowing operation is a Hamming window, whose expression w(n) is:

w(n) = 0.54 - 0.46·cos(2πn/(L-1)), 0 ≤ n ≤ L-1

where n is the time index and L is the window length;
s12', performing discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
preferably, the expression X_a(k) for the power spectrum calculation in step S12' is:

X_a(k) = | Σ_{n=0}^{N-1} x(n)·e^{-j2πnk/N} |², 0 ≤ k ≤ N-1

where x(n) is the windowed sound frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
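For illustration (an assumption of this write-up, not of the patent), steps S11' and S12' can be sketched in Python/NumPy as follows; the 512-point FFT and the helper names are choices of the sketch:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30, shift_ms=10):
    """Split the signal into overlapping frames (30 ms frames, 10 ms shift),
    as described for the framing operation; assumes len(x) >= one frame."""
    flen = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    n = 1 + (len(x) - flen) // shift
    return np.stack([x[i * shift:i * shift + flen] for i in range(n)])

def power_spectrum(frames, n_fft=512):
    """Apply the Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(L-1)),
    then take the squared magnitude of an N-point DFT, i.e. X_a(k)."""
    L = frames.shape[1]
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / (L - 1))
    return np.abs(np.fft.rfft(frames * w, n=n_fft)) ** 2
```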
S13', obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;
In the embodiment of the present invention, a filter bank with M triangular filters is defined (the number of filters is close to the number of critical bands), with center frequencies f(m), m = 0, 1, ..., M-1; in the embodiment of the present invention M = 28. The spans of the triangular filters are equal on the Mel scale, and the frequency response of the m-th triangular filter is defined as:

H_m(k) = 0 for k < f(m-1);
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)) for f(m-1) ≤ k ≤ f(m);
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)) for f(m) ≤ k ≤ f(m+1);
H_m(k) = 0 for k > f(m+1).

Next, the Mel filter bank is applied to the power spectrum and the logarithm is taken:

S(m) = ln( Σ_{k=0}^{N-1} X_a(k)·H_m(k) ), m = 0, 1, ..., M-1

Then a discrete cosine transform (DCT) is applied to obtain the Mel cepstral coefficients:

c(n) = Σ_{m=0}^{M-1} S(m)·cos( πn(m + 1/2)/M )
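A minimal NumPy/SciPy sketch of the Mel filter bank and the DCT step above; the 8 kHz sampling rate, the 512-point FFT and keeping 13 coefficients are assumptions of the sketch (the embodiment fixes only M = 28 filters):

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters=28, n_fft=512, fs=8000):
    """Triangular filters whose spans are equal on the Mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    freqs = imel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * freqs / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)  # rising slope
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)  # falling slope
    return fb

def mfcc(pspec, fb, n_ceps=13):
    """Log Mel-filter energies S(m), then DCT to Mel cepstral coefficients c(n)."""
    S = np.log(pspec @ fb.T + 1e-10)  # small offset guards against log(0)
    return dct(S, type=2, axis=1, norm='ortho')[:, :n_ceps]
```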
s14', calculating the first order difference and the second order difference of the Mel cepstrum coefficient, and splicing the coefficients of the first order difference and the second order difference with the Mel cepstrum coefficient to form sound characteristics.
If the cepstral vectors at times t and t+1 are c_t and c_{t+1}, the first-order difference is calculated as:

Δc_t = c_{t+1} - c_t

and the second-order difference is:

ΔΔc_t = Δc_{t+1} - Δc_t

The spliced sound feature is then:

[c_t Δc_t ΔΔc_t]
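The difference computation and splicing can be sketched as follows; padding the last frame so that the feature matrix keeps T rows is an assumption of the sketch:

```python
import numpy as np

def add_deltas(c):
    """Splice [c_t, delta c_t, delta-delta c_t] per the expressions above."""
    d = np.diff(c, axis=0, append=c[-1:])   # delta c_t = c_{t+1} - c_t
    dd = np.diff(d, axis=0, append=d[-1:])  # delta-delta c_t = delta c_{t+1} - delta c_t
    return np.hstack([c, d, dd])
```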
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
Preferably, step S4 includes the steps of:
S31, assuming that there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the decision stage, the input set of observed sound features to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model given the sound to be detected, where 1 ≤ n ≤ N;
s33, obtaining a pre-judgment result according to the posterior probability;
and S34, obtaining a final judgment result according to the pre-judgment result.
Preferably, the expression for calculating the posterior probability in step S32 is:

p(λ_n|O) = p(O|λ_n)·p(λ_n) / p(O)

where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability that the n-th sound event model generates the sound feature set O.
Preferably, the expression for calculating the pre-judgment result in step S33 is:

n* = argmax_{1≤n≤N} Σ_{t=1}^{T} log p(λ_n|o_t), with p(λ_n|o_t) = p(o_t|λ_n)·p(λ_n) / p(o_t)

where p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(λ_n|o_t) is the probability that frame o_t is generated by λ_n;
Preferably, the decision result in step S34 is calculated as: a violent event is judged to exist if

(1/T)·Σ_{t=1}^{T} log p(λ_{n*}|o_t) ≥ threshold

and no violent event is judged to exist otherwise; where p(λ_{n*}|o_t) is the probability that frame o_t is generated by λ_{n*}, the model selected in the pre-judgment, and threshold is a preset rejection threshold.
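A sketch of the decision stage S31-S34 under the reconstruction above, assuming the GMMs were trained with scikit-learn as in the training sketch; the scoring normalisation is an assumption:

```python
import numpy as np

def detect_violent_event(O, models, priors, threshold):
    """MAP pre-judgment over the N event models, then a rejection threshold.

    O: (T, K) feature set to be detected; models: trained GaussianMixture
    objects lambda_1..lambda_N; priors: p(lambda_n); threshold: preset
    rejection threshold.
    """
    # log p(O|lambda_n) + log p(lambda_n); the common term log p(O) is
    # dropped since it does not change the argmax
    scores = np.array([m.score_samples(O).sum() + np.log(p)
                       for m, p in zip(models, priors)])
    n_star = int(np.argmax(scores))               # pre-judgment (S33)
    avg = models[n_star].score_samples(O).mean()  # average per-frame log-likelihood
    violent = avg >= threshold                    # final decision (S34)
    return n_star, violent
```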
Example 2:
as shown in fig. 3, a sound monitoring apparatus includes the following modules:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
The detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound features to be detected of the sound signal to be detected; judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
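For illustration, the two modules can be composed as below, assuming the helper functions from the sketches of embodiment 1 are in scope; the class name, the uniform priors and the threshold value are assumptions of the sketch:

```python
class SoundMonitoringDevice:
    """Illustrative composition of the two modules; names are not from the patent."""

    def __init__(self):
        self.models, self.priors = [], []

    def training_stage(self, training_signals, fs):
        """Training sound stage module: extract features, train event models."""
        feats = [self._features(x, fs) for x in training_signals]  # S1
        self.models = train_event_models(feats)                    # S2
        self.priors = [1.0 / len(self.models)] * len(self.models)  # uniform (assumption)

    def detection_stage(self, x, fs, threshold=-60.0):
        """Detection sound stage module; the threshold value is an assumption."""
        O = self._features(x, fs)                                             # S3
        _, violent = detect_violent_event(O, self.models, self.priors,
                                          threshold)                          # S4
        return violent

    def _features(self, x, fs):
        fb = mel_filterbank(fs=fs)
        return add_deltas(mfcc(power_spectrum(frame_signal(x, fs)), fb))
```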
Example 3:
as shown in fig. 4, a sound monitoring system comprises a microphone, a multi-channel signal collector, and a sound monitoring apparatus as described in embodiment 2;
the microphone is arranged in the elevator, collects sound signals and transmits them to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
To sum up, the embodiments of the present invention provide a sound monitoring method, device and system in which a sound event model is trained from the training sound features extracted from training sound signals; the sound features to be detected are then extracted from the sound signal to be detected and compared with the trained sound event models, and whether a violent event exists in the elevator is determined by analysis. Automatic monitoring of violent events in the elevator is thereby achieved, the monitoring result is given in real time, detection accuracy is effectively guaranteed, and a basis is provided for further handling by monitoring personnel.
Compared with the industrial camera required for video monitoring, the microphone and its associated acquisition equipment are low in cost and convenient to popularize.
Compared with an industrial camera, the microphone adopted by the embodiment of the invention is small, is easy to place in a hidden corner, and is thus protected from being damaged by criminals, making the monitoring equipment safer.
Compared with an industrial camera, the signals collected by the microphone adopted by the embodiment of the invention are not affected by factors such as illumination, occlusion and disguise, so the monitoring is more stable.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A sound monitoring method is characterized in that the method comprises a training sound stage and a detection sound stage,
the training sound phase comprises the steps of:
S1, acquiring a training sound signal, and extracting the training sound features of the training sound signal;
s2, training a sound event model according to the training sound characteristics;
the sound detection stage comprises the following steps:
s3, acquiring a sound signal to be detected, and extracting the sound characteristic to be detected of the sound signal to be detected;
S4, judging whether at least one of the sound event models matches the sound features to be detected; if so, judging that a violent event exists; if not, judging that no violent event exists.
2. The sound monitoring method of claim 1, wherein the step S1 or the step S3 includes the steps of:
s11, preprocessing the acquired sound signal;
s12, performing discrete Fourier transform on the preprocessed sound signal to obtain a power spectrum;
s13, obtaining a Mel cepstrum coefficient of the power spectrum based on a Mel filter bank;
s14, calculating the first order difference and the second order difference of the Mel cepstrum coefficient, and splicing the coefficients of the first order difference and the second order difference with the Mel cepstrum coefficient to form sound characteristics.
3. The sound monitoring method of claim 2,
the preprocessing in step S11 includes a framing operation and a windowing operation;
wherein the window function adopted by the windowing operation is a Hamming window, whose expression w(n) is:

w(n) = 0.54 - 0.46·cos(2πn/(L-1)), 0 ≤ n ≤ L-1

wherein n is the time index and L is the window length;
the expression X_a(k) for the power spectrum calculation in step S12 is:

X_a(k) = | Σ_{n=0}^{N-1} x(n)·e^{-j2πnk/N} |², 0 ≤ k ≤ N-1

wherein x(n) is the windowed sound frame, N is the number of points of the Fourier transform, and j is the imaginary unit.
4. The sound monitoring method of claim 1, wherein in step S2 the violent sound event model is trained with a Gaussian mixture model, and the probability density function of the M-th order Gaussian mixture model is as follows:

P(o|λ) = Σ_{i=1}^{M} c_i·P(o|i,λ)

wherein

P(o|i,λ) = N(o; μ_i, Σ_i) = 1/((2π)^{K/2}·|Σ_i|^{1/2}) · exp{ -(1/2)·(o-μ_i)^T·Σ_i^{-1}·(o-μ_i) }

and λ = {c_i, μ_i, Σ_i; i = 1...M}, μ_i is the mean vector, Σ_i is the covariance matrix, i = 1, 2, ..., M, and the matrix Σ_i is a diagonal matrix:

Σ_i = diag(σ_{i,1}², σ_{i,2}², ..., σ_{i,K}²)
5. the sound monitoring method of claim 1, wherein the step S4 includes the steps of:
S31, assuming that there are N sound event models, each modeled by a Gaussian mixture model, denoted λ_1, λ_2, ..., λ_N; in the decision stage, the input set of observed sound features to be detected is O = {o_1, o_2, ..., o_T}, where T is the number of frames of the input sound;
S32, calculating the posterior probability of the n-th sound event model given the sound to be detected, where 1 ≤ n ≤ N;
s33, obtaining a pre-judgment result according to the posterior probability;
and S34, obtaining a final judgment result according to the pre-judgment result.
6. The sound monitoring method of claim 5,
the expression for calculating the posterior probability in step S32 is:

p(λ_n|O) = p(O|λ_n)·p(λ_n) / p(O)

wherein p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(O|λ_n) is the conditional probability that the n-th sound event model generates the sound feature set O.
7. The sound monitoring method of claim 5,
the expression for calculating the pre-judgment result in step S33 is:

n* = argmax_{1≤n≤N} Σ_{t=1}^{T} log p(λ_n|o_t), with p(λ_n|o_t) = p(o_t|λ_n)·p(λ_n) / p(o_t)

wherein p(λ_n) is the prior probability of the n-th sound event model; p(O) is the probability of the sound feature set O to be detected under all sound event models; and p(λ_n|o_t) is the probability that frame o_t is generated by λ_n.
8. The sound monitoring method of claim 5,
the decision result in step S34 is calculated as: a violent event is judged to exist if

(1/T)·Σ_{t=1}^{T} log p(λ_{n*}|o_t) ≥ threshold

and no violent event is judged to exist otherwise; wherein p(λ_{n*}|o_t) is the probability that frame o_t is generated by λ_{n*}, the model selected in the pre-judgment, and threshold is a preset rejection threshold.
9. A sound monitoring device, comprising:
the training sound stage module is used for acquiring a training sound signal and extracting the training sound characteristic of the training sound signal; training a sound event model according to the training sound characteristics;
the detection sound stage module is used for acquiring a sound signal to be detected and extracting the sound characteristic to be detected of the sound signal to be detected; judging whether at least one sound event model matched with the sound features to be detected exists in the sound event models, and if so, judging that a violent event exists; if not, judging that no violent event exists.
10. A sound monitoring system, comprising a microphone, a multi-channel signal collector, and a sound monitoring apparatus according to claim 9;
the microphone is arranged in the elevator, collects sound signals and transmits them to the multi-channel signal collector;
the multi-channel signal collector receives the sound signals sent by the microphone and transmits the sound signals to the sound monitoring device;
the sound monitoring device processes the sound signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310332073.6A CN103971702A (en) | 2013-08-01 | 2013-08-01 | Sound monitoring method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103971702A (en) | 2014-08-06
Family
ID=51241116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310332073.6A Pending CN103971702A (en) | 2013-08-01 | 2013-08-01 | Sound monitoring method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103971702A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477798A (en) * | 2009-02-17 | 2009-07-08 | 北京邮电大学 | Method for analyzing and extracting audio data of set scene |
CN101587710A (en) * | 2009-07-02 | 2009-11-25 | 北京理工大学 | A kind of many code books coding parameter quantification method based on the audio emergent event classification |
CN102509545A (en) * | 2011-09-21 | 2012-06-20 | 哈尔滨工业大学 | Real time acoustics event detecting system and method |
CN102799899A (en) * | 2012-06-29 | 2012-11-28 | 北京理工大学 | Special audio event layered and generalized identification method based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model) |
CN103177722A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Tone-similarity-based song retrieval method |
CN103226948A (en) * | 2013-04-22 | 2013-07-31 | 山东师范大学 | Audio scene recognition method based on acoustic events |
Non-Patent Citations (2)
Title |
---|
JIANG, Gang et al.: "Industrial Robots" (《工业机器人》), 31 January 2011 *
HAN, Jiqing et al.: "Audio Information Retrieval: Theory and Technology" (《音频信息检索理论与技术》), 31 March 2011 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105679313A (en) * | 2016-04-15 | 2016-06-15 | 福建新恒通智能科技有限公司 | Audio recognition alarm system and method |
CN110800053A (en) * | 2017-06-13 | 2020-02-14 | 米纳特有限公司 | Method and apparatus for obtaining event indications based on audio data |
CN107527617A (en) * | 2017-09-30 | 2017-12-29 | 上海应用技术大学 | Monitoring method, apparatus and system based on voice recognition |
CN107910019A (en) * | 2017-11-30 | 2018-04-13 | 中国科学院微电子研究所 | Human body sound signal processing and analyzing method |
CN111326172A (en) * | 2018-12-17 | 2020-06-23 | 北京嘀嘀无限科技发展有限公司 | Conflict detection method and device, electronic equipment and readable storage medium |
WO2020140552A1 (en) * | 2018-12-31 | 2020-07-09 | 瑞声声学科技(深圳)有限公司 | Haptic feedback method |
CN110223715A (en) * | 2019-05-07 | 2019-09-10 | 华南理工大学 | It is a kind of based on sound event detection old solitary people man in activity estimation method |
CN110223715B (en) * | 2019-05-07 | 2021-05-25 | 华南理工大学 | Home activity estimation method for solitary old people based on sound event detection |
CN111599379A (en) * | 2020-05-09 | 2020-08-28 | 北京南师信息技术有限公司 | Conflict early warning method, device, equipment, readable storage medium and triage system |
CN111599379B (en) * | 2020-05-09 | 2023-09-29 | 北京南师信息技术有限公司 | Conflict early warning method, device, equipment, readable storage medium and triage system |
CN113670434A (en) * | 2021-06-21 | 2021-11-19 | 深圳供电局有限公司 | Transformer substation equipment sound abnormality identification method and device and computer equipment |
CN113421544A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN113421544B (en) * | 2021-06-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103971702A (en) | Sound monitoring method, device and system | |
Liu et al. | A sound monitoring system for prevention of underground pipeline damage caused by construction | |
CN103971700A (en) | Voice monitoring method and device | |
CN107527617A (en) | Monitoring method, apparatus and system based on voice recognition | |
CN110444202B (en) | Composite voice recognition method, device, equipment and computer readable storage medium | |
US20080215318A1 (en) | Event recognition | |
Yang et al. | Acoustics recognition of construction equipments based on LPCC features and SVM | |
Kiktova et al. | Comparison of different feature types for acoustic event detection system | |
KR101250668B1 (en) | Method for recogning emergency speech using gmm | |
Choi et al. | Selective background adaptation based abnormal acoustic event recognition for audio surveillance | |
CN105812721A (en) | Tracking monitoring method and tracking monitoring device | |
CN115631765A (en) | Belt carrier roller sound anomaly detection method based on deep learning | |
CN115512688A (en) | Abnormal sound detection method and device | |
Wijayakulasooriya | Automatic recognition of elephant infrasound calls using formant analysis and hidden markov model | |
CN105352541B (en) | A kind of transformer station high-voltage side bus auxiliary monitoring system and its monitoring method based on power network disaster prevention disaster reduction system | |
Vozáriková et al. | Surveillance system based on the acoustic events detection | |
Agarwal et al. | Security threat sounds classification using neural network | |
CN104064197B (en) | Method for improving speech recognition robustness on basis of dynamic information among speech frames | |
Spadini et al. | Sound event recognition in a smart city surveillance context | |
CN102881099B (en) | Anti-theft alarming method and device applied to ATM | |
Khanum et al. | Speech based gender identification using feed forward neural networks | |
CN101614698A (en) | Mass spectral monitoring device and monitor method | |
Estrebou et al. | Voice recognition based on probabilistic SOM | |
CN108182950B (en) | Improved method for decomposing and extracting abnormal sound characteristics of public places through empirical wavelet transform | |
Jena et al. | Gender classification by pitch analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20140806 |