CN105895116B - Double-track voice break-in analysis method - Google Patents
- Publication number
- CN105895116B (application CN201610209686.4A; also published as CN105895116A)
- Authority
- CN
- China
- Prior art keywords
- time
- call
- endpoint
- end point
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a double-track voice break-in analysis method. Valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording. According to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a single time axis. All time points are then traversed from front to back, analysing whether each endpoint is a start-position endpoint or an end-position endpoint. The method can capture barge-in (call insertion) and talk-grabbing (call snatching) between two or more roles as soon as they occur and trigger follow-up handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
Description
Technical Field
The invention belongs to the technical field of customer service calls, and particularly relates to a double-track voice break-in analysis method.
Background
Voice customer service is customer service conducted mainly by telephone, and talk-grabbing and barge-in often occur between two or more roles during a service call. Talk-grabbing refers to the situation where, the moment one party finishes speaking, the other starts immediately, leaving no time interval in between; this is a somewhat impolite conversational habit that the other party may perceive as pushy. Barge-in refers to one party cutting in to voice an opinion while the other is still speaking, which is considerably less polite. Both phenomena seriously degrade the quality of customer service.
Disclosure of Invention
The invention aims to provide a double-track voice talk-grabbing and barge-in analysis method, so as to solve the problems of talk-grabbing and barge-in in the customer service process.
The invention is realised as follows: the double-track voice break-in analysis method comprises the following steps:
step one, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
step two, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
step three, where two endpoints lie next to each other, the former being the start endpoint of role A's speech and the latter the end endpoint of role B's speech, a barge-in (call insertion) has occurred;
step four, where two endpoints lie next to each other, the former being the end endpoint of role A's speech and the latter the start endpoint of role B's speech, with the time difference between the two endpoints less than 200 ms, a talk-grab (call snatching) has occurred.
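The four steps above can be sketched in code. The following is a minimal, hypothetical Python reading of the pairing rules; the `Endpoint` structure, the function names and the VAD input format are assumptions, not part of the patent:

```python
# Hypothetical sketch of steps 1-4: endpoints from two channels are tiled on
# one time axis, then adjacent endpoint pairs from different channels are
# classified as barge-in (insertion) or talk-grabbing.
from dataclasses import dataclass

@dataclass
class Endpoint:
    time_ms: int      # position on the shared time axis
    channel: str      # e.g. "A" (agent) or "B" (customer)
    kind: str         # "start" or "end" of a speech segment

GRAB_THRESHOLD_MS = 200  # per the patent: a gap under 200 ms counts as a grab

def tile_endpoints(segments):
    """segments: list of (start_ms, end_ms, channel) from VAD on each track."""
    endpoints = []
    for start, end, ch in segments:
        endpoints.append(Endpoint(start, ch, "start"))
        endpoints.append(Endpoint(end, ch, "end"))
    endpoints.sort(key=lambda e: e.time_ms)
    return endpoints

def classify_pairs(endpoints):
    events = []
    for prev, cur in zip(endpoints, endpoints[1:]):
        if prev.channel == cur.channel:
            continue
        # Step three: one role starts speaking, then the other role's segment
        # ends next -> the segments overlap, i.e. a barge-in.
        if prev.kind == "start" and cur.kind == "end":
            events.append(("insertion", prev.time_ms, cur.time_ms))
        # Step four: one role ends, the other starts within 200 ms -> a grab.
        elif prev.kind == "end" and cur.kind == "start" \
                and cur.time_ms - prev.time_ms < GRAB_THRESHOLD_MS:
            events.append(("grab", prev.time_ms, cur.time_ms))
    return events
```

For example, a segment list `[(0, 1000, "A"), (800, 2000, "B")]` would yield one insertion event spanning the 800-1000 ms overlap.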
The invention also adopts the following technical measures:
the valid voice endpoint in step one comprises three attributes of a start time, an end time and a speaker.
The endpoint types in step two include start and end.
The method for analysing the endpoint types comprises the following steps:
step one, check the type of the current endpoint;
step two, if it is a start-position endpoint, judge whether the stack top already holds a start position;
step three, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
step four, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
step five, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
step six, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
step seven, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
step eight, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
step nine, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
step ten, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
step eleven, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
step twelve, sort and record all barge-in information, where each barge-in segment comprises a start time, an end time, a type and a barge-in direction.
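The patent states the traversal above informally; the sketch below is one hedged Python reconstruction of steps one to twelve. The stack discipline (push on a start endpoint, pop on the matching end) and all identifiers are assumptions, not the patent's definitive implementation:

```python
def analyse_endpoints(endpoints):
    """endpoints: chronologically sorted (time_ms, channel, kind) tuples,
    kind in {"start", "end"}. Returns recorded insertion/grab events."""
    stack = []       # holds the currently open "start" endpoint, if any
    events = []
    last_end = None  # most recent "end" endpoint, for the 200 ms grab test
    for time_ms, channel, kind in endpoints:
        if kind == "start":
            if stack:
                _, top_channel = stack[-1]
                if top_channel == channel:
                    # Step four: the same speaker starts again before
                    # ending - the data is erroneous.
                    raise ValueError("speaker restarted without ending")
                # Step five: a different speaker started while one was
                # still talking - record a barge-in and pop the stack top.
                events.append(("insertion", time_ms, channel))
                stack.pop()
                stack.append((time_ms, channel))  # assumed: track new speaker
            else:
                # Assumed reading of the 200 ms rule: a start shortly after
                # the other role's end is a talk-grab.
                if last_end is not None and last_end[1] != channel \
                        and time_ms - last_end[0] < 200:
                    events.append(("grab", time_ms, channel))
                stack.append((time_ms, channel))  # step six: push, continue
        else:  # kind == "end"
            if stack and stack[-1][1] == channel:
                stack.pop()   # step nine: a normal segment end
            # step ten: a mismatched end follows an already-recorded
            # barge-in, so nothing further is recorded here
            last_end = (time_ms, channel)
    return events
```

On the overlapping input `[(0, "A", "start"), (800, "B", "start"), (1000, "A", "end"), (2000, "B", "end")]` this records a single insertion by role B at 800 ms.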
The invention has the following advantages and positive effects: when barge-in or talk-grabbing occurs between two or more roles, the double-track voice analysis method can capture the phenomenon in time and carry out subsequent handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
Drawings
Fig. 1 is a flowchart of a method for analyzing a double-channel speech break-in provided by an embodiment of the present invention;
fig. 2 is a flowchart of the method for analysing endpoint types according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The application of the principles of the present invention will be further described with reference to the accompanying figures 1 and 2 and the specific embodiments.
The double-track voice break-in analysis method comprises the following steps:
S101, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
S102, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
S103, where two endpoints lie next to each other, the former being the start endpoint of role A's speech and the latter the end endpoint of role B's speech, a barge-in (call insertion) has occurred;
S104, where two endpoints lie next to each other, the former being the end endpoint of role A's speech and the latter the start endpoint of role B's speech, with the time difference between the two endpoints less than 200 ms, a talk-grab (call snatching) has occurred.
The valid speech endpoint in S101 contains three attributes, a start time, an end time, and a speaker.
The endpoint type in S102 includes start and end.
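As a concrete illustration of the two records S101 and S102 describe — a VAD segment (start time, end time, speaker) and the unified endpoint (time point, channel, endpoint type) — they might be modelled as follows. The field names are assumptions, not terms from the patent:

```python
from typing import NamedTuple

class Segment(NamedTuple):
    start_ms: int
    end_ms: int
    speaker: str   # which channel/role produced the speech (S101)

class Endpoint(NamedTuple):
    time_ms: int   # position on the shared time axis
    channel: str
    kind: str      # "start" or "end", the two endpoint types of S102

def to_endpoints(seg: Segment):
    """Split one VAD segment into its two timeline endpoints."""
    return [Endpoint(seg.start_ms, seg.speaker, "start"),
            Endpoint(seg.end_ms, seg.speaker, "end")]
```

Tiling then amounts to collecting `to_endpoints` output for every segment of both channels and sorting by `time_ms`.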
The method for analysing the endpoint types comprises the following steps:
S201, check the type of the current endpoint;
S202, if it is a start-position endpoint, judge whether the stack top already holds a start position;
S203, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
S204, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
S205, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
S206, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
S207, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
S208, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
S209, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
S210, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
S211, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
S212, sort and record all barge-in information, where each segment comprises a start time, an end time, a type (barge-in or talk-grab) and a direction (who interrupted whom).
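The record S212 describes — one entry per detected overlap, with start time, end time, type and direction — could be sketched as a small structure. The field names and the direction encoding below are assumptions:

```python
from dataclasses import dataclass

@dataclass
class OverlapEvent:
    start_ms: int    # when the overlap or gap begins
    end_ms: int      # when it ends
    kind: str        # "barge-in" or "grab", the two types S212 names
    direction: str   # who interrupted whom, e.g. "B->A" (assumed encoding)

# For instance, role B barging in on role A from 800 ms to 1000 ms:
event = OverlapEvent(800, 1000, "barge-in", "B->A")
```

A quality-inspection pipeline could then sort such events by `start_ms` and aggregate them per call, as S212 suggests.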
When barge-in or talk-grabbing occurs between two or more roles, the double-track voice analysis method can capture the phenomenon in time and carry out subsequent handling, thereby discouraging these impolite calling patterns and providing a quality guarantee for customer service.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (1)
1. A double-track voice break-in analysis method, characterised by comprising the following steps:
step one, valid-speech endpoint detection is performed on the recording streams of the two channels by a voice activity detection technique, determining from which second to which second each party speaks across the whole recording;
step two, according to the valid speech endpoints of the two channel recordings, the endpoint times of all segments are processed uniformly: each endpoint is described by three attributes (time point, channel and endpoint type), and all endpoints are tiled on a time axis;
step three, all time points are traversed from front to back, analysing whether each endpoint is a start-position endpoint or an end-position endpoint;
the valid speech endpoint in step one comprises three attributes: a start time, an end time and a speaker;
the endpoint types in step two include start and end;
the method for analysing the endpoint types comprises the following steps:
step 1, check the type of the current endpoint;
step 2, if it is a start-position endpoint, judge whether the stack top already holds a start position;
step 3, if the stack top holds a start position, judge whether the current start position belongs to the same role as the one on the stack;
step 4, if they are the same, the data is erroneous: one person cannot start speaking again without having finished speaking;
step 5, if they differ, a barge-in has occurred; record the barge-in information and pop the endpoint at the top of the stack;
step 6, if the stack top holds no start position, push the current start position, advance the position by 1, and continue the loop;
step 7, if the current endpoint is an end-position endpoint, judge whether the stack top holds a start position;
step 8, if the stack top holds a start position, judge whether it belongs to the same role as the current end position;
step 9, if they are the same, this is a normal endpoint pair with no barge-in; record the time point of the end position;
step 10, if they differ, the data is erroneous: the earlier barge-in was already recorded and is not recorded again;
step 11, if the stack top holds no start position, judge whether the current end position and the start position of the previous endpoint lie within 200 ms of each other; if so, a talk-grab has occurred; record the time of the talk-grab and pop the stack-top endpoint;
step 12, sort and record all barge-in information, where each barge-in segment comprises a start time, an end time, a type and a barge-in direction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209686.4A CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610209686.4A CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105895116A CN105895116A (en) | 2016-08-24 |
CN105895116B true CN105895116B (en) | 2020-01-03 |
Family
ID=57012984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610209686.4A Active CN105895116B (en) | 2016-04-06 | 2016-04-06 | Double-track voice break-in analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105895116B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109600526A (en) * | 2019-01-08 | 2019-04-09 | 上海上湖信息技术有限公司 | Customer service quality determining method and device, readable storage medium storing program for executing |
CN111147669A (en) * | 2019-12-30 | 2020-05-12 | 科讯嘉联信息技术有限公司 | Full real-time automatic service quality inspection system and method |
CN112511698B (en) * | 2020-12-03 | 2022-04-01 | 普强时代(珠海横琴)信息技术有限公司 | Real-time call analysis method based on universal boundary detection |
CN113066496A (en) * | 2021-03-17 | 2021-07-02 | 浙江百应科技有限公司 | Method for analyzing call robbing of two conversation parties in audio |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001265368A (en) * | 2000-03-17 | 2001-09-28 | Omron Corp | Voice recognition device and recognized object detecting method |
CN102522081A (en) * | 2011-12-29 | 2012-06-27 | 北京百度网讯科技有限公司 | Method for detecting speech endpoints and system |
CN103811009A (en) * | 2014-03-13 | 2014-05-21 | 华东理工大学 | Smart phone customer service system based on speech analysis |
CN104052610A (en) * | 2014-05-19 | 2014-09-17 | 国家电网公司 | Informatization intelligent conference dispatching management device and using method |
WO2015001492A1 (en) * | 2013-07-02 | 2015-01-08 | Family Systems, Limited | Systems and methods for improving audio conferencing services |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8914288B2 (en) * | 2011-09-01 | 2014-12-16 | At&T Intellectual Property I, L.P. | System and method for advanced turn-taking for interactive spoken dialog systems |
JP2015169827A (en) * | 2014-03-07 | 2015-09-28 | 富士通株式会社 | Speech processing device, speech processing method, and speech processing program |
- 2016-04-06: CN application CN201610209686.4A granted as patent CN105895116B (status: active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001265368A (en) * | 2000-03-17 | 2001-09-28 | Omron Corp | Voice recognition device and recognized object detecting method |
CN102522081A (en) * | 2011-12-29 | 2012-06-27 | 北京百度网讯科技有限公司 | Method for detecting speech endpoints and system |
WO2015001492A1 (en) * | 2013-07-02 | 2015-01-08 | Family Systems, Limited | Systems and methods for improving audio conferencing services |
CN103811009A (en) * | 2014-03-13 | 2014-05-21 | 华东理工大学 | Smart phone customer service system based on speech analysis |
CN104052610A (en) * | 2014-05-19 | 2014-09-17 | 国家电网公司 | Informatization intelligent conference dispatching management device and using method |
Also Published As
Publication number | Publication date |
---|---|
CN105895116A (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105895116B (en) | Double-track voice break-in analysis method | |
US9571638B1 (en) | Segment-based queueing for audio captioning | |
CN105979106B (en) | A kind of the ringing tone recognition methods and system of call center system | |
US20140343941A1 (en) | Visualization interface of continuous waveform multi-speaker identification | |
US10798135B2 (en) | Switch controller for separating multiple portions of call | |
US20150310863A1 (en) | Method and apparatus for speaker diarization | |
EP3127114B1 (en) | Situation dependent transient suppression | |
CN103190139B (en) | For providing the system and method for conferencing information | |
EP1755324A1 (en) | Unified messaging with transcription of voicemail messages | |
WO2014069076A1 (en) | Conversation analysis device and conversation analysis method | |
CN101951432A (en) | Method, device and mobile terminal for adding contact information into address book | |
US10504538B2 (en) | Noise reduction by application of two thresholds in each frequency band in audio signals | |
CN109644192B (en) | Audio delivery method and apparatus with speech detection period duration compensation | |
CN104023110A (en) | Voiceprint recognition-based caller management method and mobile terminal | |
US10540983B2 (en) | Detecting and reducing feedback | |
US11050871B2 (en) | Storing messages | |
CN112995422A (en) | Call control method and device, electronic equipment and storage medium | |
US10192566B1 (en) | Noise reduction in an audio system | |
CN113808592B (en) | Method and device for transferring call record, electronic equipment and storage medium | |
US20130244623A1 (en) | Updating Contact Information In A Mobile Communications Device | |
CN105338197A (en) | Processing method when voice service is interrupted, processing system and terminal | |
WO2014069121A1 (en) | Conversation analysis device and conversation analysis method | |
CN105704327B (en) | A kind of method and system of rejection phone | |
WO2015019662A1 (en) | Analysis subject determination device and analysis subject determination method | |
US20210398537A1 (en) | Transcription of communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20200309 Address after: 519000 room 105-58115, No. 6, Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province (centralized office area) Patentee after: Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd Address before: 100085 cloud base 4 / F, tower C, Software Park Plaza, building 4, No. 8, Dongbei Wangxi Road, Haidian District, Beijing Patentee before: Puqiang Information Technology (Beijing) Co., Ltd. |