CN116975260A

CN116975260A - Complaint work order processing method, device, equipment and medium based on semantic mining

Info

Publication number: CN116975260A
Application number: CN202211150650.5A
Authority: CN
Inventors: 马建辉; 杨威; 张文圳; 郭鹏; 姚坤; 刘文吉; 颜涛; 郭宝; 杨丹; 阙鋆淑
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Design Institute Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Design Institute Co Ltd
Priority date: 2022-09-21
Filing date: 2022-09-21
Publication date: 2023-10-31

Abstract

The embodiments of the application relate to the technical field of communication and disclose a complaint work order processing method, device, equipment and medium based on semantic mining. The complaint work order processing method provided by the application helps to locate complaint reasons efficiently and accurately, promotes optimization and resolution, and assists in capturing user sensitivity characteristics and predicting repeat complaints, thereby reducing the number of complaining users, improving complaint resolution efficiency and improving perceived satisfaction.

Description

Complaint work order processing method, device, equipment and medium based on semantic mining

Technical Field

The embodiment of the application relates to the technical field of communication, in particular to a complaint work order processing method, device, equipment and medium based on semantic mining.

Background

With the rapid development of mobile networks, mobile communication services have become an indispensable means of communication and service access in people's daily lives. Under intense market competition, user satisfaction with the service experience becomes particularly important. Providing a better perceived experience and improving user satisfaction is not only the core aim of every major operator, but also its lifeline.

The network problem complaint work order is direct feedback on what mobile communication users perceive, and it reflects how users rate their network experience and how willing they are to stay on the network. Traditional complaint work order processing focuses on retrospective analysis of complaint problems using the complaint time, the complaining user's number and the associated signaling data, or relies on simple manual classification, so the rich information contained in the complaint content of the work order is ignored. The specific defects are as follows:

1. Complaint work orders contain a large amount of text, so manual processing is time-consuming and labor-intensive, and analysis efficiency is extremely low;

2. Individual comprehension varies, so the understanding of useful information is biased and prone to omissions and errors;

3. Hotspots are difficult to capture, and without systematic rule support, clustering and summarization are difficult.

Disclosure of Invention

In view of the above problems, the embodiments of the present application provide a method, an apparatus, a device, and a medium for processing a complaint work order based on semantic mining, which are used to solve the problems of low processing efficiency and low processing accuracy of the complaint work order in the prior art.

According to a first aspect of an embodiment of the present application, there is provided a complaint work order processing method based on semantic mining, including:

acquiring complaint work order data, and preprocessing the complaint work order data to acquire a complaint data set, wherein the preprocessing comprises data cleaning, data selection and text segmentation;

extracting text characteristic information of the complaint data set to obtain complaint work order text data, storing the complaint work order text data in a database in a structured form, forming a corpus after expert classification, and training on the corpus to obtain trained model data;

determining an anti-shake language model according to the model data, and carrying out semantic mining on the complaint work order text data according to a rule base of grammar semantics and a complaint dictionary by utilizing the anti-shake language model;

clustering the semantic feature data to be classified by adopting a clustering algorithm, and determining a classification subject corresponding to the complaint text data;

and analyzing complaint reasons and user behaviors corresponding to the complaint worksheet data according to the classification subjects and the user characteristic data corresponding to the complaint worksheet data, and determining classification results of the complaint worksheet data under the corresponding classification subjects.

In some optional implementations, the data cleaning of the complaint work order data includes: deleting redundant data and/or erroneous data in the complaint work order data; and/or,

the data selection of the complaint work order data includes: selecting suitable data relevant to the set application field from the complaint work order data; and/or,

the text segmentation of the complaint work order data includes: carrying out Chinese word segmentation and paragraph segmentation on the complaint work order data.

In some optional implementations, the Chinese word segmentation of the complaint work order data includes:

converting the complaint work order data into text data, inputting the text data into a word segmentation model for word segmentation, tagging parts of speech with a word segmentation engine during word segmentation, extracting keywords, and outputting the words related to the problem description according to the word frequency record.

In some optional implementations, extracting the text characteristic information of the complaint data set includes: extracting technical terms, high-frequency words and set template information from the complaint data set.

In some optional implementations, the method further includes training the anti-shake language model;

in the training process, the anti-shake language model learns, from a training set, the mapping relation between words and classification labels in the training samples and the relation between contexts in the training samples, judges the probabilities corresponding to the word sequences forming sentences, and outputs the group of word sequences with the maximum probability according to the judging result.

In some alternative implementations, the method further includes:

after training the anti-shake language model, testing the anti-shake language model by adopting a test set, and optimizing model parameters of the anti-shake language model based on the test result and the correlation between the test set and the training set.

In some optional implementations, the clustering the semantic feature data to be classified using a clustering algorithm includes:

clustering the semantic feature data to be classified by adopting an improved K-means clustering algorithm according to the complaint dictionary, and determining the classification subjects corresponding to the complaint text data, wherein the number of clusters in the improved K-means clustering algorithm is determined according to a Bayesian information criterion, and the initial clustering centers are determined by topic keywords according to the complaint dictionary.

According to a second aspect of the embodiment of the present application, there is provided a complaint work order processing device based on semantic mining, including:

the preprocessing module is used for acquiring complaint work order data and preprocessing the complaint work order data to obtain a complaint data set, wherein the preprocessing comprises data cleaning, data selection and text segmentation;

the model training module is used for extracting text characteristic information of the complaint data set to obtain complaint work order text data, storing the complaint work order text data in a database in a structured form, forming a corpus after expert classification, and training on the corpus to obtain trained model data;

the text mining module is used for determining an anti-shake language model according to the model data, and carrying out semantic mining on the complaint work order text data according to a rule base of grammar semantics and a complaint dictionary by utilizing the anti-shake language model;

the clustering module is used for clustering the semantic feature data to be classified by adopting a clustering algorithm and determining a classification subject corresponding to the complaint text data;

the classification determining module is used for analyzing complaint reasons and user behaviors corresponding to the complaint work order data according to the classification subjects and the user characteristic data corresponding to the complaint work order data, and determining classification results of the complaint work order data under the corresponding classification subjects.

According to a third aspect of the embodiment of the present application, a complaint work order processing device based on semantic mining is provided, which includes a processor and a memory, where the memory stores a computer program executable by the processor, and the computer program implements the processing method according to any one of the above when executed by the processor.

According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a controller, implements the processing method of any one of the above.

In the method, device, equipment and medium for processing complaint work orders based on semantic mining, the complaint work order data is preprocessed and its features are extracted to obtain structured data that a machine can recognize and analyze. The semantic content of the structured complaint work order text data is then analyzed in depth with the anti-shake language model based on semantic mining to obtain complaint semantic feature data, which is clustered into the corresponding classification subjects. Finally, the complaint reasons and user behaviors are analyzed on the basis of the clustering result and the user characteristic data to obtain the classification result of the complaint work order data. The complaint work order processing method provided by the application therefore helps to locate complaint reasons efficiently and accurately, promotes optimization and resolution, and assists in capturing user sensitivity characteristics and predicting repeat complaints, thereby reducing the number of complaining users, improving complaint resolution efficiency and improving perceived satisfaction.

The foregoing is only an overview of the technical solutions of the embodiments of the present application. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, specific embodiments of the present application are set out below.

Drawings

The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 is a schematic flow diagram of a complaint work order processing method based on semantic mining according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a complaint work order processing method based on semantic mining according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of a complaint work order processing method based on semantic mining according to an embodiment of the present application;

FIG. 4 is a graph showing the comparison of time consumption indexes of a conventional K-means clustering algorithm and an improved K-means clustering algorithm according to an embodiment of the present application;

FIG. 5 is a contrast diagram of recall ratio indexes of a conventional K-means clustering algorithm and an improved K-means clustering algorithm provided by an embodiment of the present application;

FIG. 6 is a graph showing the comparison of precision indexes of a conventional K-means clustering algorithm and an improved K-means clustering algorithm provided by an embodiment of the present application;

FIG. 7 is a flow chart corresponding to a complaint work order processing method based on semantic mining according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a process for implementing user emotion classification by a method for processing a complaint work order based on semantic mining according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a specific implementation process of a method for processing a complaint work order based on semantic mining to implement a problem of locating and classifying a boundary of the complaint work order according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a complaint work order processing device based on semantic mining according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a complaint work order processing device based on semantic mining according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein.

As users complain, mobile operators accumulate a large number of complaint work orders. Besides relatively clear information such as time, place and number, these work orders contain a large amount of useful but fuzzy text information such as phenomena, emotions and expectations; even the same complaint problem can look quite different because individuals express themselves differently and use different channels. For this non-standardized text complaint information, existing complaint processing systems cannot analyze hotspot problems intelligently and effectively, and can only classify problems simply through inefficient manual work. On this basis, the inventors consider that splitting the complaint content field of the complaint work order and mining its semantic content in depth is of great significance for complaint root cause analysis, solution execution and user sensitivity identification. The application therefore provides a complaint work order processing method based on semantic mining. The method analyzes the complaint content of the work order in depth, extracts key complaint information and summarizes the points on which user complaints concentrate. This helps to locate complaint reasons efficiently and accurately, promotes optimization and resolution, and can assist in capturing user sensitivity characteristics and predicting repeat complaints. The complaint work order processing method based on semantic mining provided by the embodiment of the application can therefore reduce the number of complaining users, improve complaint resolution efficiency and improve perceived satisfaction.

In view of the defects of the traditional way of processing complaint work orders, the complaint work order processing method based on semantic mining is implemented with an intelligent recognition analysis algorithm. The intelligent recognition analysis algorithm applied in the embodiment of the application mainly comprises a text mining algorithm and a clustering algorithm: the text mining algorithm extracts and mines features of the complaint work orders to obtain complaint work order text information that is useful for classification analysis, the clustering algorithm clusters the complaint work orders, and the complaint work orders are classified on the basis of the clustering result. The text mining algorithm in the embodiments of the present application is based on an ASLM (Anti-Shake Language Model), and is hereinafter abbreviated as the ASLM text mining algorithm. The ASLM text mining algorithm performs text mining through in-depth analysis of the complaint work orders and intelligently identifies their semantic content, so that effective information useful for analyzing the root cause of user complaints, user sensitivity characteristics and the like is mined. The clustering algorithm provided by the embodiment of the application is an improved K-means clustering algorithm based on topic keywords, which classifies complaint text information accurately into given topic categories. The number of clusters is determined by the BIC (Bayesian information criterion), and the initial clustering centers are no longer selected at random but are determined by specified topic keywords. Finally, the essential problem behind the user complaint is delimited according to the clustering result.

In the complaint work order classification method, the classification result of the intelligent recognition analysis module is added on top of the existing XDR (External Data Representation) based delimitation. Combining the two techniques delimits and locates user complaint problems more accurately and improves user satisfaction.

The complaint work order processing method based on semantic mining provided by the embodiment of the application is mainly characterized in that the semantic content of the complaint work order of a user is mined by intelligently analyzing the complaint work order, then the complaint work order is classified based on the complaint dictionary established aiming at the characteristics of the user, and the complaint dictionary is automatically updated in real time according to the classification result, so that the purpose of more accurate complaint classification is achieved. The method, the device, the equipment and the medium for processing the complaint work orders based on semantic mining provided by the embodiment of the application are further described in detail below with reference to fig. 1 to 11.

Referring to fig. 1, a schematic flow chart of a complaint work order processing method based on semantic mining according to some embodiments of the present application is provided. The complaint work order processing method based on semantic mining provided in this embodiment can be applied to the complaint work order processing device based on semantic mining provided in fig. 11. In this embodiment, the complaint work order processing method based on semantic mining includes S02, S04, S06, S08, and S010, which are described in detail below.

S02: acquiring complaint work order data, and preprocessing the complaint work order data to obtain a complaint data set, wherein the preprocessing comprises data cleaning, data selection and text segmentation.

Complaint work order data is acquired from the complaint work order system. The complaint work order system collects the complaint content of users, including complaints accepted through telephone hotlines and complaints accepted through online business halls. For hotline complaints, the source is the record of the complaint content made by customer service personnel after the call. For online business hall complaints, the source is the complaint text that customers enter in the online business hall. After the complaint work order data is obtained from the complaint work order system, its content needs to be preprocessed centrally to obtain clean complaint work order data. The preprocessing may specifically include, but is not limited to, sequentially performing data cleaning, data selection and text segmentation on the complaint work order data. Data cleaning deletes dirty data, that is, useless data, from the complaint work order data. Data selection selects the data that meets the set conditions from the complaint work order data. Text segmentation segments the text data of the complaint work order according to set segmentation rules. The complaint work order data mainly refers to the detailed complaint work order table obtained from the complaint work order system. After the foregoing preprocessing, the detailed complaint work order table is converted into a complaint data set.
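The three preprocessing operations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the record layout (`id`, `content`), the domain keyword list and the punctuation-based paragraph splitting rule are all hypothetical, and Chinese word segmentation is left out of this sketch.

```python
import re

def clean(records):
    """Data cleaning: drop empty (erroneous) and duplicate (redundant) records."""
    seen = set()
    cleaned = []
    for rec in records:
        text = (rec.get("content") or "").strip()
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append({**rec, "content": text})
    return cleaned

def select(records, domain_keywords):
    """Data selection: keep records that mention at least one domain keyword."""
    return [r for r in records if any(k in r["content"] for k in domain_keywords)]

def split_paragraphs(text):
    """Paragraph segmentation: split on sentence-ending punctuation (assumed rule)."""
    return [p.strip() for p in re.split(r"[。！？!?；;\n]+", text) if p.strip()]

def preprocess(records, domain_keywords):
    """Run cleaning, selection and segmentation in sequence."""
    dataset = []
    for rec in select(clean(records), domain_keywords):
        dataset.append({**rec, "segments": split_paragraphs(rec["content"])})
    return dataset

# Hypothetical raw work orders
raw = [
    {"id": 1, "content": "家里信号很差。经常掉线！"},
    {"id": 2, "content": "家里信号很差。经常掉线！"},  # duplicate of id 1
    {"id": 3, "content": ""},                         # empty record
    {"id": 4, "content": "咨询积分兑换"},              # outside the network domain
]
complaint_dataset = preprocess(raw, ["信号", "掉线", "网速"])
```

Only the first record survives in this example, split into two segments; the duplicate, the empty record and the out-of-domain record are dropped.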

S04: extracting text characteristic information of the complaint data set to obtain complaint work order text data, storing the complaint work order text data in a database in a structured form, forming a corpus after expert classification, and training on the corpus to obtain trained model data.

After the complaint work order data has been preprocessed, a clean complaint data set is obtained. Feature extraction is then performed on the complaint text data in the complaint data set, and the complaint work order data in the complaint data set is converted into complaint work order text data in Chinese text form. All complaint work order text data is then stored in a database in a structured form, which facilitates subsequent machine recognition and analysis. The structured complaint work order text data in the database is classified using expert knowledge to form a corpus, so the complaint work order data in the corpus carries the corresponding classification labels. A text feature extraction model is trained with the data in the corpus to obtain a corpus model, and the model data of the corpus is obtained after training.
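Structured storage followed by expert labelling might look like the sketch below. It assumes an in-memory SQLite database and a hypothetical table layout (`complaint_text` with `order_id`, `content`, `keywords`, `label`); the source does not specify the database or the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaint_text ("
    " order_id INTEGER PRIMARY KEY,"
    " content  TEXT NOT NULL,"
    " keywords TEXT,"
    " label    TEXT)"  # stays NULL until an expert assigns a class
)

# Structured complaint work order text data (hypothetical rows)
rows = [
    (1, "家里信号很差经常掉线", "信号 掉线"),
    (2, "网速太慢视频卡顿", "网速 卡顿"),
    (3, "手机无法上网", "上网"),
]
conn.executemany(
    "INSERT INTO complaint_text (order_id, content, keywords) VALUES (?, ?, ?)",
    rows,
)

# Expert classification: labels supplied by domain experts (hypothetical)
expert_labels = {1: "coverage", 2: "capacity"}
conn.executemany(
    "UPDATE complaint_text SET label = ? WHERE order_id = ?",
    [(label, order_id) for order_id, label in expert_labels.items()],
)

# The labelled rows form the corpus used for training
corpus = conn.execute(
    "SELECT content, label FROM complaint_text"
    " WHERE label IS NOT NULL ORDER BY order_id"
).fetchall()
```

Keeping the label in the same table as the text means an unlabelled row is simply one that has not yet entered the corpus.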

S06: determining an anti-shake language model according to the model data, and carrying out semantic mining on the complaint work order text data according to a rule base of grammar semantics and a complaint dictionary by utilizing the anti-shake language model.

According to the relation between complaint reasons and keywords established in the complaint dictionary, and in combination with the data of the corpus model, an anti-shake language model (ASLM, Anti-Shake Language Model) is constructed using a method based on Chinese word segmentation and rule matching. The anti-shake language model is then used to analyze the complaint work order text data intelligently. The anti-shake language model is a machine learning model, and intelligent classification with a machine learning model is in essence a knowledge mining process: semantic mining is performed on the complaint work order text data in combination with the rule base of grammar and semantics and the complaint dictionary. In this embodiment, a text classification technique based on machine learning is used for intelligent text mining of the complaint work order text data. Classification modeling of complaint work orders draws on accumulated expert business knowledge, and the optimal classification of each text is calculated, so that complaint reasons can subsequently be judged automatically.

S08: clustering the semantic feature data to be classified by adopting a clustering algorithm, and determining a classification subject corresponding to the complaint text data.
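The rule-matching side of this step can be illustrated with a small lookup from complaint reasons to keywords. The dictionary contents and the function name `match_reasons` below are hypothetical, and the sketch covers only keyword matching, not the probabilistic part of the anti-shake language model.

```python
# Hypothetical complaint dictionary: complaint reason -> trigger keywords
COMPLAINT_DICTIONARY = {
    "weak coverage": ["信号差", "没信号", "无信号"],
    "call drop": ["掉线", "掉话", "通话中断"],
    "slow data": ["网速慢", "卡顿", "上不了网"],
}

def match_reasons(text, dictionary=COMPLAINT_DICTIONARY):
    """Return candidate reasons, ordered by how many dictionary keywords occur."""
    hits = {}
    for reason, keywords in dictionary.items():
        count = sum(text.count(k) for k in keywords)
        if count:
            hits[reason] = count
    # Most hits first; ties broken alphabetically for a stable result
    return sorted(hits, key=lambda r: (-hits[r], r))

reasons = match_reasons("最近家里没信号，打电话总是掉线，掉线好几次了")
```

Here "掉线" occurs twice and "没信号" once, so "call drop" is ranked ahead of "weak coverage".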

The embodiment of the application adopts an intelligent text clustering algorithm to divide the whole set of complaint text data into different categories according to a certain rule. Specifically, each topic corresponding to the complaint text data may be determined based on the keywords determined in the complaint dictionary. And then classifying the complaint text data into different classification subjects by using a clustering algorithm.
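A minimal sketch of clustering with keyword-determined initial centers is given below. It assumes bag-of-words vectors over the topic keywords and squared Euclidean distance, and it fixes the number of clusters to the number of topics; selecting that number with the Bayesian information criterion is not shown. The topic names, keywords and tokenized documents are hypothetical.

```python
def vectorize(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    return [float(tokens.count(w)) for w in vocab]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(docs, topic_keywords, max_iter=20):
    """K-means whose initial centers come from topic keywords, not random picks."""
    topics = list(topic_keywords)
    vocab = sorted({w for kws in topic_keywords.values() for w in kws})
    points = [vectorize(d, vocab) for d in docs]
    centers = [vectorize(topic_keywords[t], vocab) for t in topics]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [topics[a] for a in assignment]

topic_keywords = {"coverage": ["信号", "覆盖"], "speed": ["网速", "卡顿"]}
docs = [
    ["家里", "信号", "差"],
    ["网速", "慢", "视频", "卡顿"],
    ["地下室", "没有", "覆盖"],
    ["下载", "网速", "慢"],
]
labels = seeded_kmeans(docs, topic_keywords)
```

Because each initial center is tied to a named topic, every resulting cluster already carries a classification subject, which random initialization would not provide.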

S010: analyzing complaint reasons and user behaviors corresponding to the complaint work order data according to the classification subjects and the user characteristic data corresponding to the complaint work order data, and determining classification results of the complaint work order data under the corresponding classification subjects.

The user characteristic data may include, but is not limited to, the user's brand information, activity information, preference information, power-down information and the like. Examples of classification subjects are user emotion problems and delimitation and positioning. The classification result for a user emotion problem is a grade of the user's emotion, for example a grade indicating how negative the emotion is. Based on that grade, a classification result for user behavior, such as a repeat complaint, an escalated complaint or porting the number to another network, can be further predicted. Delimitation and positioning may refer to locating the cause of the user complaint, such as an equipment fault, so the classification subjects may further include fault problems. Classification results for fault problems include, for example, terminal faults, base station faults, network capacity problems, interference problems and server problems.

In summary, the complaint work order processing method based on semantic mining according to the embodiment of the application preprocesses the complaint work order data and then extracts its features to obtain structured data that a machine can recognize and analyze. The semantic content of the structured complaint work order text data is analyzed in depth with the anti-shake language model based on semantic mining to obtain complaint semantic feature data, which is then clustered into the corresponding classification subjects. The complaint reasons and user behaviors corresponding to the complaint work order data are then analyzed on the basis of the clustering result and the user characteristic data to obtain the classification results of the complaint work order data under the corresponding classification subjects. The method can thus refine the key information in the complaint work order and summarize the points on which user complaints concentrate. It not only helps to locate complaint reasons efficiently and accurately and promotes optimization and resolution, but also assists in capturing user sensitivity characteristics and predicting repeat complaints, thereby reducing the number of complaining users, improving complaint resolution efficiency and improving perceived satisfaction.

Referring to fig. 2, a schematic flow chart of a complaint work order processing method based on semantic mining according to some embodiments of the present application is shown. This embodiment differs from the embodiment shown in fig. 1 only in that the steps for preprocessing the complaint work order are further limited; the rest is the same. Therefore, only the preprocessing step of the complaint work order is described for this embodiment. Specifically, in this embodiment, S02 may include S021, S022, and/or S023.

S021: performing data cleaning on the complaint work order data to delete redundant data and/or erroneous data in the complaint work order data.

S022: performing data selection on the complaint work order data to select suitable data relevant to the set application field from the complaint work order data.

S023: performing text segmentation on the complaint work order data, that is, Chinese word segmentation and paragraph segmentation of the complaint work order data.

Specifically, in some embodiments, S023 further comprises: converting the complaint work order data into text data, inputting the text data into a word segmentation model for word segmentation, tagging parts of speech with a word segmentation engine during word segmentation, extracting keywords, and outputting the words related to the problem description according to the word frequency record.
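Word segmentation with part-of-speech tagging and frequency-based keyword extraction can be sketched as below. The source does not name the word segmentation engine, so this example substitutes a simple forward maximum matching segmenter over a tiny hypothetical lexicon; a production system would use a full segmentation engine and lexicon.

```python
from collections import Counter

# Hypothetical lexicon: word -> part-of-speech tag (n noun, v verb, a adjective, d adverb)
LEXICON = {
    "家里": "n", "信号": "n", "网速": "n",
    "掉线": "v", "很": "d", "经常": "d", "差": "a", "慢": "a",
}
MAX_LEN = max(len(w) for w in LEXICON)

def segment(text):
    """Forward maximum matching; characters not in the lexicon are tagged 'x'."""
    i, out = 0, []
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in LEXICON:
                out.append((word, LEXICON[word]))
                i += size
                break
        else:  # no lexicon word starts at position i
            out.append((text[i], "x"))
            i += 1
    return out

def keywords(texts, keep_pos=("n", "v"), top=3):
    """Count nouns and verbs across all texts and return the most frequent ones."""
    counts = Counter(
        word for t in texts for word, pos in segment(t) if pos in keep_pos
    )
    return counts.most_common(top)

tagged = segment("家里信号很差")
top_words = keywords(["家里信号很差", "信号经常掉线", "经常掉线"])
```

Filtering on part of speech before counting keeps function words such as "很" out of the keyword list even when they are frequent.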

In some embodiments, extracting text feature information from the complaint data set in S04 specifically includes: extracting technical terms, high-frequency words and set template information from the complaint data set.

The anti-shake language model constructed in S06 is a probabilistic model. When a general text mining algorithm is used for classification, classification accuracy can fall because of fluctuations over time. In this scheme, short-term model processing is added on top of the language-model text classification algorithm. After this processing the classification effect of the language model is more stable, which achieves the anti-shake (jitter prevention) purpose and makes the classification result more accurate. Specifically, the anti-shake language model provided by the embodiment of the application models the content of the complaint work order text data as follows:

For a sentence S = w_1 w_2 … w_m consisting of m words (w_i denotes the i-th of the m words), the anti-shake language model defines, for the sentence S, the probability P(S/M) of the occurrence of the sequence of words. The probability P(S/M) can be represented by a multivariate model, as shown in formula (1):

In addition, the anti-shake language model assumes that all words x_n in the complaint work order text data are independent of one another, and defines the probability P(C/S) that the classification C of the complaint work order text data is associated with the text content S (sentence S), as shown in formula (2):

In the above formula, P(C) is the prior probability that any piece of text content is associated with class C, and the generation probability P(x_n/C) can be represented as a two-state mixture model.
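Formulas (1) and (2) themselves are not reproduced in this text. For orientation only, the block below gives standard forms that fit the surrounding description: an n-gram factorization for the sentence probability, a naive Bayes posterior under the word-independence assumption, and a two-component mixture for the generation probability. All three are assumptions rather than the formulas of the application; the context length k, the mixture weight λ and the component names P_long and P_short are hypothetical, and the conventional bar notation P(S | M) is used for what the text writes as P(S/M).

```latex
% Assumed n-gram form for formula (1); k is a hypothetical context length
P(S \mid M) = \prod_{i=1}^{m} P\left(w_i \mid w_{i-k+1}, \ldots, w_{i-1}\right)

% Assumed naive Bayes form for formula (2), using independence of the words x_n
P(C \mid S) = \frac{P(C) \prod_{n} P(x_n \mid C)}{P(S)}

% Assumed two-state mixture for the generation probability
P(x_n \mid C) = \lambda \, P_{\mathrm{long}}(x_n \mid C)
              + (1 - \lambda) \, P_{\mathrm{short}}(x_n \mid C),
\qquad 0 \le \lambda \le 1
```

Under this reading, the short-term component would track recent complaint wording while the long-term component stabilizes the estimate, which is one plausible way to obtain the anti-shake behavior described earlier.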

The anti-shake language model provided by the embodiment of the application is an efficient text mining classification algorithm, and the improvement of the algorithm mode of the algorithm is required to be trained and learned. Therefore, in some embodiments, the complaint text processing method based on semantic mining provided by the application further includes training an anti-shake language model, learning, by a training set, a mapping relationship between words and classification labels in a training sample and a relationship between contexts in the training sample in the learning process, determining the size of probabilities corresponding to word sequences forming sentences, and outputting a group of word sequences with the largest probabilities as output according to the determined results.

The anti-shake language model learns a plurality of words through a training set and considers the relation between contexts during learning. It judges the probability corresponding to each word sequence according to the magnitudes of P(S/M) and P(C/S) and, according to the result, outputs the sequence most probably corresponding to sentence S, together with its probability, as the semantic feature data of the complaint text data.

In some embodiments, to further evaluate the classification effect of the anti-shake language model, a test set may be used to test the model after training, and the model parameters are optimized based on the test result and the correlation between the test set and the training set. If the classification effect is good when the correlation between the test set and the training set is high, the samples in the training set have a large influence on the model. In general, the larger the sample set, the better the classification effect of the trained anti-shake language model. However, the relationship between the number of samples and the word segmentation effect is not linear: once the number of samples reaches a certain level, further increasing the sample scale does not necessarily continue to improve the classification effect. The anti-shake language model performs well in the closed test on the test set. After anti-shake processing, the text representation method of the language model can eliminate the influence of time variation and extract the characteristic information of the samples to a large extent, so that text classification can be realized more efficiently.

Referring to fig. 3, a flowchart of a method for processing a complaint work order based on semantic mining according to some embodiments of the present application is shown. In this embodiment, S08 may specifically be S081, while S02, S04, S06 and S010 are the same as those shown in fig. 1. S081 is described in further detail below.

S081: clustering the semantic feature data to be classified by adopting an improved K-means clustering algorithm according to the complaint dictionary, and determining the classification subjects corresponding to the complaint text data, wherein the number of clusters in the improved K-means clustering algorithm is determined according to the Bayesian information criterion, and the initial clustering centers are the topic keywords determined according to the complaint dictionary.

The conventional K-means algorithm includes the following steps:

(11) Selecting a clustering center: the traditional K-means algorithm randomly selects K points as initial cluster centers.

(12) Attribution determination of sample class: the distance from each object (semantic feature data corresponding to complaint work order text data) to each cluster center is calculated, and the data object is classified into the class of the cluster center closest to the data object.

(13) Optimization of a clustering center: new cluster centers are calculated until the cluster centers of two consecutive iterations no longer change, which indicates that the adjustment of the data objects is finished and the cluster criterion function has converged.

(14) In each iteration, it is examined whether the classification of each sample is correct, and if not, the classification is adjusted. After all the data are adjusted, the clustering center is modified again, and the next iteration is carried out.
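
Steps (11) to (14) can be sketched as below. This is a minimal pure-Python illustration, assuming the semantic feature data are already numeric vectors (tuples) and that squared Euclidean distance is used; the function names, the `seed` parameter and the toy points are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(component) / len(points) for component in zip(*points))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step (11): pick K points at random as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Steps (12) and (14): assign every object to its nearest center,
        # which re-checks and adjusts the class of each sample.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Step (13): recompute each center from its current members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # no change between two iterations
            break
        centers = new_centers
    return centers, labels

# Two well-separated toy groups of feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because the initial centers are random, a different `seed` can change the intermediate iterations and, on less clearly separated data, the final clusters, which is the sensitivity that the improved algorithm is intended to address.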

The specific steps of the improved K-means clustering algorithm provided by the embodiment of the application are as follows:

(21) Selecting a clustering center: the improved K-means clustering algorithm estimates the number of clusters based on the Bayesian information criterion (Bayesian Information Criterion, BIC) when selecting cluster centers. The theoretical source of the BIC criterion is Bayesian probability theory, and it is generally applied to the problem of optimal model selection. In essence, the BIC value weighs two indexes of a specific model, its complexity and its degree of fit, and seeks an optimal balance between them so as to determine the optimal model. From a given set of candidate models, the model corresponding to the maximum BIC value is selected.

(22) Attribution determination of sample class: when determining sample class attribution, the improved K-means clustering algorithm measures whether the number of topics can be further optimized by calculating the BIC corresponding to the current model. That is, when it cannot be determined whether two classes need to be merged, the BIC values of the whole sample set in the merged case and the non-merged case are compared, and the classification corresponding to the larger value is adopted.

(23) Optimization of a clustering center: when the BIC value reaches its maximum in (22), the current number of topics is the optimal result under the criterion.

(24) If, in one iteration of the algorithm, all data objects are already correctly classified according to the BIC maximum principle, no adjustment is performed, the cluster centers do not change, and the algorithm ends.

In the improved K-means clustering algorithm, the user designates keywords for K topics, an initial clustering center is selected for each topic, and the K-means clustering algorithm is then executed. Because neither the number of clusters nor the initial centers are chosen at random, the clustering effect of the improved algorithm is generally superior to that of the unimproved conventional K-means clustering algorithm.
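
The improved procedure can be sketched as follows, under stated assumptions. The source gives no BIC formula, so the code uses one common spherical-Gaussian form in which a larger BIC is better. The seed vectors stand in for topic-keyword vectors taken from the complaint dictionary. All names and numbers here are hypothetical.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_seeded(points, seeds, max_iter=100):
    """K-means whose initial centers are supplied instead of drawn at random."""
    centers = [tuple(s) for s in seeds]
    k = len(centers)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def bic(points, centers, labels):
    """Assumed BIC form: log-likelihood minus (parameters / 2) * log(n)."""
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    variance = max(sse / (d * max(n - k, 1)), 1e-12)  # shared per-dimension variance
    sizes = [labels.count(j) for j in range(k)]
    log_lik = sum(s * math.log(s / n) for s in sizes if s > 0)
    log_lik -= 0.5 * n * d * math.log(2 * math.pi * variance)
    log_lik -= sse / (2 * variance)
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, variance
    return log_lik - 0.5 * n_params * math.log(n)

def choose_clustering(points, seed_candidates):
    """seed_candidates maps each candidate K to K seed vectors; keep the max-BIC result."""
    best = None
    for k, seeds in seed_candidates.items():
        centers, labels = kmeans_seeded(points, seeds)
        score = bic(points, centers, labels)
        if best is None or score > best[0]:
            best = (score, k, centers, labels)
    return best

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
seed_candidates = {
    1: [(5.0, 5.0)],
    2: [(0.0, 0.0), (10.0, 10.0)],
    3: [(0.0, 0.0), (10.0, 10.0), (11.0, 11.0)],
}
best_score, best_k, best_centers, best_labels = choose_clustering(points, seed_candidates)
```

Comparing the K = 2 and K = 3 candidates here plays the role of the merge or no-merge comparison in step (22): the extra cluster lowers the fitting error slightly but is penalized for the added parameters.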

To demonstrate that the improved K-means clustering algorithm provided by the embodiment of the application has a better clustering effect than the conventional K-means clustering algorithm, the clustering quality of the two algorithms is evaluated. The indexes commonly used in clustering quality evaluation are precision, recall and F-measure, where the F-measure is computed from the first two indexes and is a comprehensive index combining them; the larger the F-measure, the better the clustering result. These are given by formulas (3), (4) and (5), respectively:

Wherein N_ij represents the number of samples belonging to the i-th original class in the j-th result set, N_j is the total number of samples in the j-th result set, and N_i represents the total number of samples of the i-th original class.
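
Formulas (3) to (5) are not reproduced in this text. The sketch below assumes the standard definitions that match the symbols defined above: precision is N_ij / N_j, recall is N_ij / N_i, and the F-measure is the harmonic mean 2PR / (P + R). The function names and the toy labels are hypothetical.

```python
from collections import Counter

def contingency(original_classes, result_sets):
    """Count N_ij: samples of original class i that landed in result set j."""
    return Counter(zip(original_classes, result_sets))

def precision(n_ij, n_j):
    return n_ij / n_j if n_j else 0.0

def recall(n_ij, n_i):
    return n_ij / n_i if n_i else 0.0

def f_measure(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def scores(original_classes, result_sets, i, j):
    """Precision, recall and F-measure of result set j with respect to class i."""
    n_ij = contingency(original_classes, result_sets)[(i, j)]
    n_j = result_sets.count(j)
    n_i = original_classes.count(i)
    p, r = precision(n_ij, n_j), recall(n_ij, n_i)
    return p, r, f_measure(p, r)

# Toy example: 5 samples of class "A", 3 of class "B", clustered into sets 0 and 1.
original = ["A", "A", "A", "A", "A", "B", "B", "B"]
result = [0, 0, 0, 0, 1, 1, 1, 0]
p, r, f = scores(original, result, "A", 0)
```

Here result set 0 holds five samples, four of which belong to class "A", and class "A" has five samples in total, so both precision and recall for the pair ("A", 0) are 4/5.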

The conventional K-means clustering algorithm is compared with the improved K-means algorithm of the embodiment of the application on three indexes: recall, precision and time consumption.

The testing process is as follows: from the complaint text records, 18000 records belonging respectively to category 1 and category 2 are selected, and 24000 records are extracted from the remaining text records as noise text. The two clustering algorithms are executed on the mixed sample space, with vector space dimensions from 100 to 400 in steps of 100. The comparison of time consumption, recall and precision between the conventional K-means clustering algorithm and the improved K-means algorithm of the embodiment of the application is shown in fig. 4, 5 and 6, respectively. In fig. 4 to 6, line A represents the index corresponding to the conventional K-means clustering algorithm, and line B represents the index corresponding to the improved K-means algorithm of the embodiment of the present application.

The experimental results show that the BIC-based optimization algorithm provided by the embodiment of the application retains the advantages of the conventional K-means clustering algorithm while overcoming its sensitivity to the initial clustering centers, so that it not only consumes less time but is also superior to the conventional K-means clustering algorithm in recall and precision.

Fig. 7 is a flow chart corresponding to a method for processing a complaint work order based on semantic mining according to an embodiment of the present application. The following describes the complaint work order processing method based on semantic mining according to the embodiment of the present application in detail based on the description of fig. 7.

As shown in fig. 7, in the present embodiment, the processing of the complaint work order data involves four modules: corpus input, complaint work order text model training, the intelligent text mining algorithm, and the improved clustering algorithm. The corpus input module corresponds to S02 above, the complaint work order text model training module to S04, the intelligent text mining algorithm module to S06, and the improved clustering algorithm module to S08 and S010.

In the corpus input module, the complaint work order system constructs a complaint content detail table from complaints accepted through telephone hotlines and online business halls, and preprocesses the complaint work order data in the table (data cleaning, data selection and text segmentation) to convert it into a complaint data set. After text feature information is extracted from the complaint data set, the data are stored in a database in structured form for classification by experts. The experts classify the structured complaint work order text data and add the corresponding classification labels, yielding corpus data that form a corpus, and the corpus is trained to form model data. In the intelligent text mining algorithm module, an ASLM intelligent mining algorithm, that is, the anti-shake language model, is constructed from the trained model data, and the anti-shake language model is then used to extract text features from the complaint work order text data to obtain semantic text data. In the improved clustering algorithm module, the semantic text data are clustered with the improved K-means clustering algorithm, using the relation between complaint reasons and keywords established in the complaint dictionary, to obtain the classification subjects corresponding to the complaint work orders. The complaint reasons and user behaviors of the complaint work orders are then analyzed based on the user feature data and the clustering results to obtain a classification of complaint users, a classification summary of complaint reasons, and a classification summary of complaint work orders.
The classification summary of the complaint work orders is based on the classification results of the complaint work orders and is used to predict behaviors that complaint users may exhibit subsequently. The corpus input module further takes in user feature information, which includes user brand information, activity information, preferential information, startup and shutdown information and the like; this user feature information is input into the user feature extraction model to obtain the user feature data.

To describe the complaint work order processing method provided by the embodiment of the application more clearly, two application scenarios are taken as examples below.

The first application scenario is user emotion classification based on the processing of complaint work orders. In scenario one, the process of implementing user emotion classification according to the complaint work order processing method based on semantic mining of the embodiment of the application is shown in fig. 8 and is as follows: after the complaint text is obtained, it is input into a module implemented by an AI intelligent classification method based on the semantic content of the complaint work order, and word segmentation is performed. During word segmentation, a word segmentation engine labels the parts of speech, keywords are then extracted, and words related to the user's emotion are given according to word frequency records, for example: complaint to the letter department, repeated complaint, not solved, network change and the like; such words can express the user's emotion. After the sensitive words are analyzed, semantic analysis is carried out with the anti-shake language model, and classification modeling of the complaint work orders is carried out based on the improved K-means clustering algorithm and the complaint reasons and keywords established in the complaint dictionary, to obtain the analysis results of the complaint work orders. Information is extracted from the analysis results for subsequent classification prediction in emotion analysis, and the user's emotion is classified according to the analysis results to obtain the user's emotion classification level. Finally, based on the user's emotion level, the user's tendency toward a second complaint, complaint escalation, and number-portability network switching is predicted.
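
The word-frequency step for emotion-related words can be illustrated with a short sketch. It assumes the text has already been segmented into words or phrases by a word segmentation engine, and that a lexicon of sensitive words is available; the lexicon and the tokens below are hypothetical examples, not data from the application.

```python
from collections import Counter

def sensitive_word_frequencies(tokens, lexicon):
    """Count only the tokens found in the sensitive-word lexicon, most frequent first."""
    counts = Counter(token for token in tokens if token in lexicon)
    return counts.most_common()

# Hypothetical segmented complaint text and sensitive-word lexicon.
tokens = ["repeated complaint", "not solved", "signal",
          "repeated complaint", "network change"]
lexicon = {"repeated complaint", "not solved", "network change"}

frequencies = sensitive_word_frequencies(tokens, lexicon)
```

The resulting frequency list is the kind of record that the later semantic analysis and emotion classification steps would consume.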

The second application scenario is the classification of delimiting and positioning problems of complaint work orders based on the processing of complaint work orders. In scenario two, the process of delimiting, positioning and classifying the problems of complaint work orders according to the complaint work order processing method based on semantic mining of the embodiment of the application is shown in fig. 9 and is as follows: after the complaint text is obtained, it is input into a module implemented by an AI intelligent classification method based on the semantic content of the complaint work order, and word segmentation is performed. During word segmentation, a word segmentation engine labels the parts of speech, keywords are then extracted, and words related to the problem description are given according to word frequency records, for example: cannot access the network, cannot hear clearly, no sound, no signal, slow network access, poor signal, calls cannot be connected, started recently, has existed for a long time, others have no problem, no problem when switching on or off, and the like. These words can point directly or indirectly to the cause of the complaint fault. After the complaint-related words are analyzed, semantic analysis, classification modeling and information extraction are carried out and the analysis results are given. Finally the complaint problem is classified, so as to determine whether it is a terminal problem, a base station fault, a network capacity problem, an interference problem or a server problem.

The complaint work order processing method based on semantic mining greatly improves the time efficiency and accuracy of complaint work order processing. Manual processing handles about 200 work orders per day, whereas intelligent classification with the method can handle more than 2000 per day, which saves the labor of about 10 people per day and is about 10 times as efficient as manual processing. Compared with manual processing, in which useful information is lost, the accuracy of complaint delimitation improves by 48% after root cause vocabulary is introduced, and the accuracy of predicting secondary complaints improves by 53% after the user re-complaint characteristic results are introduced.

From the above, the complaint work order processing method based on semantic mining provided by the embodiment of the application provides at least one of the following beneficial effects:

1. On the basis of the semantic content mining algorithm, to solve the problem that the accuracy of the semantic recognition algorithm fluctuates over time, the scheme adopts an anti-shake model algorithm; introducing the anti-shake scheme into the algorithm improves accuracy and makes it more stable.

2. For summarizing the causes of complaint problems, an improved K-means clustering algorithm based on topic keywords is adopted, through which complaint texts are accurately classified into the given topic categories. The number of clusters is determined by the BIC criterion, and the initial clustering centers are no longer selected at random but are determined by the topic keywords designated by the user.

3. The user feature identification method based on complaint work order semantic content mining splits the complaint work order information, mines the work order semantic content, extracts effective information, and combines the XDR-based delimiting method with the complaint-text-based delimiting method, which facilitates accurate positioning and prediction of complaint problems.

Referring to fig. 10, a schematic structural diagram of a complaint work order processing device 100 based on semantic mining according to some embodiments of the present application is shown. The device 100 includes a preprocessing module 101, a model training module 102, a text mining module 103, a clustering module 104, and a classification determination module 105. The preprocessing module 101 is configured to obtain complaint work order data and preprocess it to obtain a complaint data set, where the preprocessing includes data cleaning, data selection and text segmentation. The model training module 102 is configured to extract text feature information from the complaint data set to obtain complaint work order text data, store the complaint work order text data in a database in structured form, form a corpus after classification by experts, and train the corpus to obtain trained model data. The text mining module 103 is configured to determine an anti-shake language model according to the model data, and to perform semantic mining on the complaint work order text data with the anti-shake language model according to a rule base of grammar semantics and a complaint dictionary, to obtain complaint work order semantic feature data. The clustering module 104 is configured to cluster the semantic feature data to be classified by using a clustering algorithm, and determine a classification topic corresponding to the complaint text data. The classification determination module 105 is configured to analyze, according to the classification subject and the user feature data corresponding to the complaint work order data, the complaint cause and user behavior corresponding to the complaint work order data, and determine a classification result of the complaint work order data under the corresponding classification subject.

In an alternative implementation manner, the preprocessing module 101 is further configured to perform data cleaning on the complaint work order data, so as to delete redundant data and/or error data in the complaint work order data; and/or selecting data from the complaint work order data to select proper data relevant to the set application field from the complaint work order data; and/or text segmentation is carried out on the complaint worksheet data so as to carry out Chinese word segmentation and paragraph segmentation on the complaint worksheet data.
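
The three preprocessing operations can be sketched as below. This is an illustrative sketch only: the domain terms are hypothetical, and the regular-expression tokenizer is a stand-in for a real Chinese word segmentation engine, which the source does not name.

```python
import re

def clean(records):
    """Data cleaning: drop empty records and exact duplicates, keeping first occurrences."""
    seen, kept = set(), []
    for text in records:
        normalized = " ".join(text.split())  # collapse redundant whitespace
        if normalized and normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept

def select(records, domain_terms):
    """Data selection: keep records that mention at least one domain term."""
    return [r for r in records if any(term in r.lower() for term in domain_terms)]

def segment(record):
    """Text segmentation: split into sentences, then into word tokens.

    The \\w+ tokenizer is a placeholder for a word segmentation engine.
    """
    sentences = [s.strip() for s in re.split(r"[.!?;。！？；]+", record) if s.strip()]
    return [re.findall(r"\w+", s.lower()) for s in sentences]

raw_records = [
    "No signal at home.  Calls drop!",
    "No signal at home. Calls drop!",   # duplicate after whitespace cleanup
    "",
    "Please update my billing address.",
]
cleaned = clean(raw_records)
selected = select(cleaned, domain_terms=("signal", "network", "call"))
segmented = [segment(r) for r in selected]
```

Each stage takes the output of the previous one, so the three functions can be enabled or skipped independently, in line with the and/or wording of this implementation.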

In an alternative implementation, model training module 102 is also configured to extract terms, high frequency words, and set template information from the complaint data set.

In an alternative implementation manner, the complaint work order processing device based on semantic mining further includes a text mining training module (not shown in fig. 10), which is configured to train the anti-shake language model: through a training set, learn the mapping relationship between words and classification labels in the training samples and the relationship between contexts in the training samples, determine the probabilities corresponding to the word sequences forming sentences, and output the group of word sequences with the largest probability according to the result of the determination.

In an alternative implementation, the complaint work order processing device based on semantic mining further includes a text mining test module (not shown in fig. 10) for testing the anti-shake language model with a test set after training the anti-shake language model, and optimizing model parameters of the anti-shake language model based on the result of the test and the correlation between the test set and the training set.

In an alternative implementation manner, the clustering module 104 is further configured to cluster the semantic feature data to be classified according to the complaint dictionary by using an improved K-means clustering algorithm, and determine a classification topic corresponding to the complaint text data, where the number of clusters in the improved K-means clustering algorithm is determined according to the Bayesian information criterion, and the initial cluster centers are the topic keywords determined according to the complaint dictionary.

According to the complaint work order processing device based on semantic mining, through preprocessing and feature extraction on complaint work order data, structured data which can be recognized and analyzed by a machine are obtained, semantic content of structured complaint work order text data is deeply analyzed based on an anti-shake language model of semantic mining, complaint semantic feature data are obtained and clustered to corresponding classification subjects, and based on a clustering result and user feature data, complaint work order data corresponding complaint reasons and user behaviors are analyzed, so that classification results of the complaint work order data are obtained. Obviously, the complaint work order processing method provided by the application is beneficial to efficiently and accurately positioning the complaint reasons, promoting the optimization solution, assisting in capturing the user sensitivity characteristics and carrying out the re-complaint prediction of the user, thereby reducing the number of complaint users, improving the complaint solving efficiency and promoting the perception satisfaction.

Fig. 11 is a schematic structural diagram of complaint work order processing equipment based on semantic mining according to an embodiment of the present application. The embodiment of the application does not limit the specific implementation of the complaint work order processing equipment based on semantic mining.

As shown in fig. 11, the complaint work order processing equipment may include: a processor 1102, a communication interface (Communications Interface) 1104, a memory 1106, and a communication bus 1108.

The processor 1102, the communication interface 1104, and the memory 1106 communicate with each other via the communication bus 1108. The communication interface 1104 is used for communicating with network elements of other devices, such as clients or other servers. The processor 1102 is configured to execute a program 1110, and may specifically execute the relevant steps of the semantic mining-based complaint work order processing method provided in any of the foregoing embodiments.

In particular, program 1110 may include program code comprising computer-executable instructions.

The processor 1102 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included in the complaint work order processing equipment may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 1106 is used for storing the program 1110. The memory 1106 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.

According to the complaint work order processing equipment based on semantic mining, through preprocessing and feature extraction of the complaint work order data, structured data which can be recognized and analyzed by a machine are obtained, semantic content of structured complaint work order text data is deeply analyzed based on an anti-shake language model of semantic mining, the complaint semantic feature data is obtained and clustered to corresponding classification subjects, and based on a clustering result and user feature data, the complaint work order data corresponding complaint reasons and user behaviors are analyzed, so that classification results of the complaint work order data are obtained. Obviously, the complaint work order processing method provided by the application is beneficial to efficiently and accurately positioning the complaint reasons, promoting the optimization solution, assisting in capturing the user sensitivity characteristics and carrying out the re-complaint prediction of the user, thereby reducing the number of complaint users, improving the complaint solving efficiency and promoting the perception satisfaction.

In addition, the embodiment of the application also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the complaint work order processing method based on semantic mining provided in any of the foregoing embodiments. Since each step of the complaint work order processing method based on semantic mining has been described in detail above and the technical effects obtained are similar, the description is not repeated here.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of the embodiments of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed application requires more features than are expressly recited in each claim.

Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims

1. A complaint work order processing method based on semantic mining is characterized by comprising the following steps:

According to a preset anti-shake language model, a grammar semantic rule base and a complaint dictionary, carrying out semantic mining on the complaint work order text data to obtain complaint work order semantic feature data; the preset anti-shake language model is obtained in advance according to XXXX

2. The method of claim 1, wherein the data cleansing of the complaint worksheet data comprises: deleting redundant data and/or error data in the complaint work order data; and/or,

3. The method of claim 2, wherein said chinese word segmentation of said complaint worksheet data comprises:

4. The method of claim 1, wherein extracting the text feature information from the complaint data set comprises: extracting technical terms, high-frequency words and set template information from the complaint data set.

5. The method of claim 1, further comprising training the anti-shake language model;

6. The method as recited in claim 5, further comprising:

7. The method of claim 1, wherein clustering the semantic feature data to be classified using a clustering algorithm comprises:

8. A complaint work order processing device based on semantic mining, comprising:

9. A complaint work order processing apparatus based on semantic mining, characterized by comprising a processor and a memory, wherein a computer program executable by the processor is stored in the memory, which computer program, when executed by the processor, implements the processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a controller, implements the processing method according to any one of claims 1 to 7.