CN105895116B - Double-track voice break-in analysis method - Google Patents
- Publication number
- CN105895116B (application CN201610209686.4A; also published as CN105895116A)
- Authority
- CN
- China
- Prior art keywords
- time
- call
- endpoint
- end point
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a double-track voice break-in analysis method. Valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording. According to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a single time axis. All time points are then traversed from front to back, analysing whether each endpoint is a start-position endpoint or an end-position endpoint. The method can capture barge-in (call insertion) and talk-grabbing (call snatching) between two or more roles as soon as they occur and trigger follow-up handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
Description
Technical Field
The invention belongs to the technical field of customer service calls, and particularly relates to a double-track voice break-in analysis method.
Background
Voice customer service is customer service conducted mainly by telephone, and talk-grabbing and barge-in often occur between two or more roles during a service call. Talk-grabbing refers to the situation where, the moment one party finishes speaking, the other starts immediately, leaving no time interval in between; this is a somewhat impolite conversational habit that the other party may perceive as pushy. Barge-in refers to one party cutting in to voice an opinion while the other is still speaking, which is considerably less polite. Both phenomena seriously degrade the quality of customer service.
Disclosure of Invention
The invention aims to provide a double-track voice talk-grabbing and barge-in analysis method, so as to solve the problems of talk-grabbing and barge-in in the customer service process.
The invention is realised as follows: the double-track voice break-in analysis method comprises the following steps:
step one, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
step two, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
step three, where two endpoints lie next to each other, the former being the start endpoint of role A's speech and the latter the end endpoint of role B's speech, a barge-in (call insertion) has occurred;
step four, where two endpoints lie next to each other, the former being the end endpoint of role A's speech and the latter the start endpoint of role B's speech, with the time difference between the two endpoints less than 200 ms, a talk-grab (call snatching) has occurred.
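The four steps above can be sketched in code. The following is a minimal, hypothetical Python reading of the pairing rules; the `Endpoint` structure, the function names and the VAD input format are assumptions, not part of the patent:

```python
# Hypothetical sketch of steps 1-4: endpoints from two channels are tiled on
# one time axis, then adjacent endpoint pairs from different channels are
# classified as barge-in (insertion) or talk-grabbing.
from dataclasses import dataclass

@dataclass
class Endpoint:
    time_ms: int      # position on the shared time axis
    channel: str      # e.g. "A" (agent) or "B" (customer)
    kind: str         # "start" or "end" of a speech segment

GRAB_THRESHOLD_MS = 200  # per the patent: a gap under 200 ms counts as a grab

def tile_endpoints(segments):
    """segments: list of (start_ms, end_ms, channel) from VAD on each track."""
    endpoints = []
    for start, end, ch in segments:
        endpoints.append(Endpoint(start, ch, "start"))
        endpoints.append(Endpoint(end, ch, "end"))
    endpoints.sort(key=lambda e: e.time_ms)
    return endpoints

def classify_pairs(endpoints):
    events = []
    for prev, cur in zip(endpoints, endpoints[1:]):
        if prev.channel == cur.channel:
            continue
        # Step three: one role starts speaking, then the other role's segment
        # ends next -> the segments overlap, i.e. a barge-in.
        if prev.kind == "start" and cur.kind == "end":
            events.append(("insertion", prev.time_ms, cur.time_ms))
        # Step four: one role ends, the other starts within 200 ms -> a grab.
        elif prev.kind == "end" and cur.kind == "start" \
                and cur.time_ms - prev.time_ms < GRAB_THRESHOLD_MS:
            events.append(("grab", prev.time_ms, cur.time_ms))
    return events
```

For example, a segment list `[(0, 1000, "A"), (800, 2000, "B")]` would yield one insertion event spanning the 800-1000 ms overlap.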
The invention also adopts the following technical measures:
the valid voice endpoint in step one comprises three attributes of a start time, an end time and a speaker.
The endpoint types in step two include start and end.
The method for analysing the endpoint types comprises the following steps:
step one, check the type of the current endpoint;
step two, if it is a start-position endpoint, judge whether the stack top already holds a start position;
step three, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
step four, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
step five, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
step six, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
step seven, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
step eight, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
step nine, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
step ten, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
step eleven, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
step twelve, sort and record all barge-in information, where each barge-in segment comprises a start time, an end time, a type and a barge-in direction.
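The patent states the traversal above informally; the sketch below is one hedged Python reconstruction of steps one to twelve. The stack discipline (push on a start endpoint, pop on the matching end) and all identifiers are assumptions, not the patent's definitive implementation:

```python
def analyse_endpoints(endpoints):
    """endpoints: chronologically sorted (time_ms, channel, kind) tuples,
    kind in {"start", "end"}. Returns recorded insertion/grab events."""
    stack = []       # holds the currently open "start" endpoint, if any
    events = []
    last_end = None  # most recent "end" endpoint, for the 200 ms grab test
    for time_ms, channel, kind in endpoints:
        if kind == "start":
            if stack:
                _, top_channel = stack[-1]
                if top_channel == channel:
                    # Step four: the same speaker starts again before
                    # ending - the data is erroneous.
                    raise ValueError("speaker restarted without ending")
                # Step five: a different speaker started while one was
                # still talking - record a barge-in and pop the stack top.
                events.append(("insertion", time_ms, channel))
                stack.pop()
                stack.append((time_ms, channel))  # assumed: track new speaker
            else:
                # Assumed reading of the 200 ms rule: a start shortly after
                # the other role's end is a talk-grab.
                if last_end is not None and last_end[1] != channel \
                        and time_ms - last_end[0] < 200:
                    events.append(("grab", time_ms, channel))
                stack.append((time_ms, channel))  # step six: push, continue
        else:  # kind == "end"
            if stack and stack[-1][1] == channel:
                stack.pop()   # step nine: a normal segment end
            # step ten: a mismatched end follows an already-recorded
            # barge-in, so nothing further is recorded here
            last_end = (time_ms, channel)
    return events
```

On the overlapping input `[(0, "A", "start"), (800, "B", "start"), (1000, "A", "end"), (2000, "B", "end")]` this records a single insertion by role B at 800 ms.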
The invention has the following advantages and positive effects: when barge-in or talk-grabbing occurs between two or more roles, the double-track voice analysis method can capture the phenomenon in time and carry out subsequent handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
Drawings
Fig. 1 is a flowchart of a method for analyzing a double-channel speech break-in provided by an embodiment of the present invention;
fig. 2 is a flowchart of the method for analysing endpoint types according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The application of the principles of the present invention will be further described with reference to the accompanying figures 1 and 2 and the specific embodiments.
The double-track voice break-in analysis method comprises the following steps:
S101, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
S102, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
S103, where two endpoints lie next to each other, the former being the start endpoint of role A's speech and the latter the end endpoint of role B's speech, a barge-in (call insertion) has occurred;
S104, where two endpoints lie next to each other, the former being the end endpoint of role A's speech and the latter the start endpoint of role B's speech, with the time difference between the two endpoints less than 200 ms, a talk-grab (call snatching) has occurred.
The valid speech endpoint in S101 contains three attributes, a start time, an end time, and a speaker.
The endpoint type in S102 includes start and end.
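As a concrete illustration of the two records S101 and S102 describe — a VAD segment (start time, end time, speaker) and the unified endpoint (time point, channel, endpoint type) — they might be modelled as follows. The field names are assumptions, not terms from the patent:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    start_ms: int
    end_ms: int
    speaker: str   # which channel/role produced the speech (S101)

class Endpoint(NamedTuple):
    time_ms: int   # position on the shared time axis
    channel: str
    kind: str      # "start" or "end", the two endpoint types of S102

def to_endpoints(seg: Segment):
    """Split one VAD segment into its two timeline endpoints."""
    return [Endpoint(seg.start_ms, seg.speaker, "start"),
            Endpoint(seg.end_ms, seg.speaker, "end")]
```

Tiling then amounts to collecting `to_endpoints` output for every segment of both channels and sorting by `time_ms`.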
The method for analysing the endpoint types comprises the following steps:
S201, check the type of the current endpoint;
S202, if it is a start-position endpoint, judge whether the stack top already holds a start position;
S203, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
S204, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
S205, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
S206, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
S207, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
S208, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
S209, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
S210, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
S211, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
S212, sort and record all barge-in information, where each segment comprises a start time, an end time, a type (barge-in or talk-grab) and a direction (who interrupted whom).
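The record S212 describes — one entry per detected overlap, with start time, end time, type and direction — could be sketched as a small structure. The field names and the direction encoding below are assumptions:

```python
from dataclasses import dataclass

@dataclass
class OverlapEvent:
    start_ms: int    # when the overlap or gap begins
    end_ms: int      # when it ends
    kind: str        # "barge-in" or "grab", the two types S212 names
    direction: str   # who interrupted whom, e.g. "B->A" (assumed encoding)

# For instance, role B barging in on role A from 800 ms to 1000 ms:
event = OverlapEvent(800, 1000, "barge-in", "B->A")
```

A quality-inspection pipeline could then sort such events by `start_ms` and aggregate them per call, as S212 suggests.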
When barge-in or talk-grabbing occurs between two or more roles, the double-track voice analysis method can capture the phenomenon in time and carry out subsequent handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (1)
1. A double-track voice break-in analysis method, characterised by comprising the following steps:
step one, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
step two, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
step three, all time points are traversed from front to back, analysing whether each endpoint is a start-position endpoint or an end-position endpoint;
the valid speech endpoint in step one comprises three attributes: a start time, an end time and a speaker;
the endpoint types in step two include start and end;
the method for analysing the endpoint types comprises the following steps:
step 1, check the type of the current endpoint;
step 2, if it is a start-position endpoint, judge whether the stack top already holds a start position;
step 3, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
step 4, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
step 5, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
step 6, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
step 7, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
step 8, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
step 9, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
step 10, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
step 11, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
step 12, sort and record all barge-in information, where each barge-in segment comprises a start time, an end time, a type and a barge-in direction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209686.4A CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209686.4A CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105895116A CN105895116A (en) | 2016-08-24 |
CN105895116B true CN105895116B (en) | 2020-01-03 |
Family
ID=57012984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610209686.4A Active CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105895116B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109600526A (en) * | 2019-01-08 | 2019-04-09 | 上海上湖信息技术有限公司 | Customer service quality determining method and device, readable storage medium storing program for executing |
CN111147669A (en) * | 2019-12-30 | 2020-05-12 | 科讯嘉联信息技术有限公司 | Full real-time automatic service quality inspection system and method |
CN112511698B (en) * | 2020-12-03 | 2022-04-01 | 普强时代(珠海横琴)信息技术有限公司 | Real-time call analysis method based on universal boundary detection |
CN113066496A (en) * | 2021-03-17 | 2021-07-02 | 浙江百应科技有限公司 | Method for analyzing call robbing of two conversation parties in audio |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001265368A (en) * | 2000-03-17 | 2001-09-28 | Omron Corp | Voice recognition device and recognized object detecting method |
CN102522081A (en) * | 2011-12-29 | 2012-06-27 | 北京百度网讯科技有限公司 | Method for detecting speech endpoints and system |
CN103811009A (en) * | 2014-03-13 | 2014-05-21 | 华东理工大学 | Smart phone customer service system based on speech analysis |
CN104052610A (en) * | 2014-05-19 | 2014-09-17 | 国家电网公司 | Informatization intelligent conference dispatching management device and using method |
WO2015001492A1 (en) * | 2013-07-02 | 2015-01-08 | Family Systems, Limited | Systems and methods for improving audio conferencing services |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8914288B2 (en) * | 2011-09-01 | 2014-12-16 | At&T Intellectual Property I, L.P. | System and method for advanced turn-taking for interactive spoken dialog systems |
JP2015169827A (en) * | 2014-03-07 | 2015-09-28 | 富士通株式会社 | Speech processing device, speech processing method, and speech processing program |
- 2016-04-06: CN application CN201610209686.4A granted as patent CN105895116B (status: active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001265368A (en) * | 2000-03-17 | 2001-09-28 | Omron Corp | Voice recognition device and recognized object detecting method |
CN102522081A (en) * | 2011-12-29 | 2012-06-27 | 北京百度网讯科技有限公司 | Method for detecting speech endpoints and system |
WO2015001492A1 (en) * | 2013-07-02 | 2015-01-08 | Family Systems, Limited | Systems and methods for improving audio conferencing services |
CN103811009A (en) * | 2014-03-13 | 2014-05-21 | 华东理工大学 | Smart phone customer service system based on speech analysis |
CN104052610A (en) * | 2014-05-19 | 2014-09-17 | 国家电网公司 | Informatization intelligent conference dispatching management device and using method |
Also Published As
Publication number | Publication date |
---|---|
CN105895116A (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105895116B (en) | Double-track voice break-in analysis method | |
US9571638B1 (en) | Segment-based queueing for audio captioning | |
CN105979106B (en) | A kind of the ringing tone recognition methods and system of call center system | |
US20140343941A1 (en) | Visualization interface of continuous waveform multi-speaker identification | |
US10798135B2 (en) | Switch controller for separating multiple portions of call | |
US20150310863A1 (en) | Method and apparatus for speaker diarization | |
EP3127114B1 (en) | Situation dependent transient suppression | |
CN103190139B (en) | For providing the system and method for conferencing information | |
EP1755324A1 (en) | Unified messaging with transcription of voicemail messages | |
WO2014069076A1 (en) | Conversation analysis device and conversation analysis method | |
CN101951432A (en) | Method, device and mobile terminal for adding contact information into address book | |
US10504538B2 (en) | Noise reduction by application of two thresholds in each frequency band in audio signals | |
CN109644192B (en) | Audio delivery method and apparatus with speech detection period duration compensation | |
CN104023110A (en) | Voiceprint recognition-based caller management method and mobile terminal | |
US10540983B2 (en) | Detecting and reducing feedback | |
US11050871B2 (en) | Storing messages | |
CN112995422A (en) | Call control method and device, electronic equipment and storage medium | |
US10192566B1 (en) | Noise reduction in an audio system | |
CN113808592B (en) | Method and device for transferring call record, electronic equipment and storage medium | |
US20130244623A1 (en) | Updating Contact Information In A Mobile Communications Device | |
CN105338197A (en) | Processing method when voice service is interrupted, processing system and terminal | |
WO2014069121A1 (en) | Conversation analysis device and conversation analysis method | |
CN105704327B (en) | A kind of method and system of rejection phone | |
WO2015019662A1 (en) | Analysis subject determination device and analysis subject determination method | |
US20210398537A1 (en) | Transcription of communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20200309 Address after: 519000 room 105-58115, No. 6, Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province (centralized office area) Patentee after: Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd Address before: 100085 cloud base 4 / F, tower C, Software Park Plaza, building 4, No. 8, Dongbei Wangxi Road, Haidian District, Beijing Patentee before: Puqiang Information Technology (Beijing) Co., Ltd. |