CN107943966A - Abnormal individual character decision method and device based on microblogging text - Google Patents
- Publication number
- CN107943966A (application CN201711211558.4A)
- Authority
- CN
- China
- Prior art keywords
- preset time
- microblogging text
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Primary Health Care (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Computational Linguistics (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an abnormal individual character decision method and device based on microblog text. The method includes: obtaining a second preset number of microblog text items posted by a first preset number of users within a preset time period; performing emotion recognition on these text items with a support vector machine and labeling them, which yields a third preset number of emotion categories; counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets; computing the joint probability density of the multidimensional data sets to obtain a joint probability density value for each of them; and, when a joint probability density value is below a density threshold, determining that the user's emotion within the corresponding preset time unit is abnormal. The invention thus converts the emotions expressed in the public's microblog texts into multidimensional data sets and computes their joint probability density values in batch, so that abnormal individuals can be detected quantitatively with a comparatively simple implementation.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an abnormal individual character decision method and device based on microblog text.
Background
Existing microblog-based methods for determining abnormal individual character fall mainly into two groups. The first mines a user's abnormal individual character from user behavior patterns. It holds that the key problems of anomaly detection are establishing normal usage profiles and using those profiles to compare and judge the current user's behavior. A behavior pattern is a regularity exhibited during program execution or user operation; once behavior or wording appears that differs from the user's usual normal pattern, it must be considered whether the user's mood has become abnormal. However, mining abnormal individual character from behavior patterns is specific to each user: the habitual social behavior pattern of every user must be tracked, described and modeled before potentially abnormal behavior can be compared and detected, which is time-consuming to implement.
The second method detects abnormal individual character from the interactions between users on a social network. A relationship network, that is, a social network graph, is built from a user's microblog texts and the interactions with friends (likes, comments and so on), and individuals with abnormal emotions are detected from this emotional-interaction model graph. A user's abnormal state is closely related to that of his or her friends on social media. This approach is based on a large-scale Twitter data set from a real-world social platform and systematically studies the correlation between users' stress states and their social interactions. It first defines a set of abnormality-related textual, visual and social attributes, and then proposes a new hybrid model, a factor graph model combined with a convolutional neural network, that uses Twitter content and social-interaction information for stress detection. Experiments show that the number of sparse connections (i.e. connections without triangles) in the social structure of users with abnormal individual character is about 14% higher than for non-abnormal users, which indicates that the friend structure of such users tends to be less connected and less complex. However, detecting abnormal individual character from interactions on a social network requires mining the interactions between a user and the related friends, and this emotion mining is often difficult. In addition, the structural features of social networks are not distinct: beyond sparse connections and triangles, many more complex structures may be involved, which makes it hard to find regularities and to detect abnormal individual character.
Summary of the invention
In view of the defects in the prior art, the present invention provides an abnormal individual character decision method and device based on microblog text, to solve the problems that abnormal individual character determination in the prior art is time-consuming and laborious, or that the complex social structures involved in emotion mining make it difficult to find regularities and detect abnormal individual character.
In a first aspect, an embodiment of the present invention provides an abnormal individual character decision method based on microblog text, the method including:
Obtaining a second preset number of microblog text items of a first preset number of users within a preset time period;
Performing emotion recognition on the second preset number of microblog text items with a support vector machine and labeling them, to obtain a third preset number of emotion categories;
Counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, the length of the preset time period being a multiple of the preset time unit;
Computing the joint probability density of the multidimensional data sets to obtain the joint probability density value of each multidimensional data set;
When a joint probability density value is below the density threshold, determining that the user's emotion within the preset time unit is abnormal.
Optionally, the third preset number of emotion categories is 5, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
Optionally, counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets includes:
Classifying the second preset number of microblog text items with the support vector machine;
For each user among the first preset number of users, determining the five-dimensional data set of that user.
Optionally, choosing the density threshold includes:
Based on the second preset number of microblog text items, obtaining multiple five-dimensional data sets for the first preset number of users and all preset time units in the preset time period;
Computing the joint probability density of the multiple five-dimensional data sets in batch;
Dividing the multiple five-dimensional data sets into a cross-validation set and a test set;
For different candidate thresholds, running experiments on the cross-validation set with the joint probability density function to obtain multiple groups of experimental results;
Taking the threshold that gives the highest accuracy among the experimental results as the density threshold for the test set.
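Optionally, the joint probability density function is expressed by the following formula:
p(X) = \frac{1}{(2\pi)^{5/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right)
In the formula, X = (X(1), ..., X(5)) is the five-dimensional variable set, μ is the mean vector whose k-th component is the mean of the k-th data column, and Σ is the covariance matrix of the five-dimensional data set.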
In a second aspect, an embodiment of the present invention provides an abnormal individual character decision device based on microblog text, the device including:
A text data acquisition module, configured to obtain a second preset number of microblog text items of a first preset number of users within a preset time period;
A text emotion recognition module, configured to perform emotion recognition on the second preset number of microblog text items with a support vector machine and label them, to obtain a third preset number of emotion categories;
A data set statistics module, configured to count the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, the length of the preset time period being a multiple of the preset time unit;
A density value calculation module, configured to compute the joint probability density of the multidimensional data sets and obtain the joint probability density value of each multidimensional data set;
A determination module, configured to determine, when a joint probability density value is below the density threshold, that the user's emotion within the preset time unit is abnormal.
Optionally, the third preset number of emotion categories is 5, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
Optionally, the data set statistics module includes:
A microblog text classification unit, configured to classify the second preset number of microblog text items with the support vector machine;
A data set determination unit, configured to determine, for each user among the first preset number of users, the five-dimensional data set of that user.
Optionally, the device further includes a density threshold acquisition module, which includes:
A data set acquisition unit, configured to obtain, based on the second preset number of microblog text items, multiple five-dimensional data sets for the first preset number of users and all preset time units in the preset time period;
A density value calculation unit, configured to compute the joint probability density values of the multiple five-dimensional data sets in batch;
A data set grouping unit, configured to divide the multiple five-dimensional data sets into a cross-validation set and a test set;
An experiment unit, configured to run experiments on the cross-validation set with the joint probability density function for different candidate thresholds, to obtain multiple groups of experimental results;
A density threshold determination unit, configured to take the threshold that gives the highest accuracy among the experimental results as the density threshold for the test set.
Optionally, the joint probability density function is expressed by the following formula:
p(X) = \frac{1}{(2\pi)^{5/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right)
In the formula, X = (X(1), ..., X(5)) is the five-dimensional variable set, μ is the mean vector whose k-th component is the mean of the k-th data column, and Σ is the covariance matrix of the five-dimensional data set.
As can be seen from the above technical solution, the embodiment of the present invention obtains a second preset number of microblog text items of a first preset number of users within a preset time period; performs emotion recognition on these text items with a support vector machine and labels them, to obtain a third preset number of emotion categories; counts the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, the length of the preset time period being a multiple of the preset time unit; computes the joint probability density of the multidimensional data sets to obtain the joint probability density value of each of them; and, when a joint probability density value is below the density threshold, determines that the user's emotion within the preset time unit is abnormal. The invention thus converts the emotions expressed in the public's microblog texts into multidimensional data sets and computes their joint probability density values in batch, so that abnormal individuals can be detected quantitatively with a comparatively simple implementation.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the abnormal individual character decision method based on microblog text provided by an embodiment of the present invention;
Fig. 2 is the classification model with which the support vector machine provided by an embodiment of the present invention processes microblog text data;
Fig. 3 is a schematic flowchart of the abnormal individual character decision method based on microblog text provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the result of multivariate Gaussian distribution processing;
Fig. 5 is the anomaly detection verification diagram of embodiment one;
Fig. 6 is the anomaly detection verification diagram of embodiment two;
Fig. 7 is a block diagram of the abnormal individual character decision device based on microblog text provided by an embodiment of the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of the abnormal individual character decision method based on microblog text provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
101, obtaining a second preset number of microblog text items of a first preset number of users within a preset time period;
102, performing emotion recognition on the second preset number of microblog text items with a support vector machine and labeling them, to obtain a third preset number of emotion categories;
103, counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, the length of the preset time period being a multiple of the preset time unit;
104, computing the joint probability density of the multidimensional data sets to obtain the joint probability density value of each multidimensional data set;
105, when a joint probability density value is below the density threshold, determining that the user's emotion within the preset time unit is abnormal.
The invention thus converts the emotions expressed in the public's microblog texts into multidimensional data sets and computes their joint probability density values in batch, so that abnormal individuals can be detected quantitatively with a comparatively simple implementation.
Each step of the abnormal individual character decision method based on microblog text provided by the embodiment of the present invention is described in detail below with reference to the drawings and embodiments.
First, step 101 is introduced: obtaining a second preset number of microblog text items of a first preset number of users within a preset time period.
The preset time period may be one day, one month, one year and so on; a person skilled in the art can set it according to the specific scenario, and it is not limited here. In one embodiment, the preset time period is one calendar month.
The first preset number may be 100, 1000, 10000, 100000 and so on; a person skilled in the art can set it according to the specific scenario, and it is not limited here. In one embodiment, the first preset number is 100.
Similarly, the second preset number may be 100, 1000, 10000, 100000 and so on; a person skilled in the art can set it according to the specific scenario, and it is not limited here. In one embodiment, the second preset number is 10000.
In the embodiment of the present invention, 10000 microblog texts of 100 microblog users are collected.
Second, step 102 is introduced: performing emotion recognition on the second preset number of microblog text items with a support vector machine and labeling them, to obtain a third preset number of emotion categories.
The third preset number may be 3, 4, 5 or more; a person skilled in the art can set it as needed. In one embodiment, the third preset number is 5, i.e. the emotion categories are neutral, happy, surprised, sad and angry.
In one embodiment of the present invention, a support vector machine performs emotion recognition on the above 10000 microblog texts, so that each microblog text item corresponds to neutral, happy, surprised, sad or angry. As shown in Fig. 2, the support vector machine sorts the input microblog text data into 5 types. To simplify the subsequent quantitative calculation, an embodiment of the present invention replaces the 5 emotions with the labels "0, 1, 2, 3, 4", i.e. the labels 0, 1, 2, 3 and 4 mark the microblog texts corresponding to neutral, happy, surprised, sad and angry respectively. The microblog text data of the 5 label classes is vectorized and feature selection is performed; the weight (TF*IDF) of each feature is then calculated; finally, the model is trained and used for prediction to obtain the classification results of the microblog text data.
In the embodiment of the present invention, the support vector machine thus extracts vector features from the microblog texts of the training set and the test set and then gives emotion classification results for the test set, which helps ensure the accuracy of the emotion classification.
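The feature-weighting step mentioned above can be illustrated with a minimal sketch. It assumes the posts are already tokenized and uses the plain definition TF * IDF = (term count / document length) * log(N / document frequency); the patent does not state which TF*IDF variant it uses, and the function name tf_idf and the sample posts are hypothetical. The support vector machine training itself is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-tokenized posts (not taken from the patent's data).
posts = [
    ["happy", "sunny", "day"],
    ["sad", "rainy", "day"],
    ["happy", "happy", "news"],
]
weights = tf_idf(posts)
```

Terms that occur in every document receive weight 0 under this definition, so only terms that distinguish some posts from others contribute to the feature vector handed to the classifier.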
Third, step 103 is introduced: counting the emotion-labeled microblog text data by preset time unit to obtain multidimensional data sets, the length of the preset time period being a multiple of the preset time unit.
The preset time unit may be one day, one month, one year and so on, and may be a part of the preset time period, i.e. the preset time period may be a multiple of the preset time unit; a person skilled in the art can set it according to the specific scenario, and it is not limited here. In one embodiment, the preset time period is one calendar month.
One embodiment of the present invention performs statistical classification on the labeled microblog texts and, for each user among the first preset number of users, determines the five-dimensional data set of that user, thereby obtaining each user's microblog posting activity in each month.
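The counting step can be sketched as follows. This is an illustrative sketch only: it assumes each labeled post is available as a (user, time unit, label) record with labels 0 to 4, and that the five dimensions are the per-label post counts. The record layout, the identifiers and the function name count_emotions are hypothetical.

```python
from collections import defaultdict

NUM_EMOTIONS = 5  # labels 0..4: neutral, happy, surprised, sad, angry

def count_emotions(labeled_posts):
    """Map (user, time_unit) to a list of five per-label post counts."""
    counts = defaultdict(lambda: [0] * NUM_EMOTIONS)
    for user, time_unit, label in labeled_posts:
        counts[(user, time_unit)][label] += 1
    return dict(counts)

# Hypothetical records: (user id, time unit, emotion label).
records = [
    ("u1", "2016-05", 0),
    ("u1", "2016-05", 1),
    ("u1", "2016-05", 1),
    ("u2", "2016-05", 3),
    ("u2", "2016-05", 4),
]
vectors = count_emotions(records)
```

Each value in vectors is one row of the five-dimensional data set for a given user and time unit.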
Fourth, step 104 is introduced: computing the joint probability density of the multidimensional data sets to obtain the joint probability density value of each multidimensional data set.
In one embodiment of the present invention, the joint probability density values of the five-dimensional data sets are computed in batch with the formula:
p(X) = \frac{1}{(2\pi)^{5/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(X-\mu)^{T}\Sigma^{-1}(X-\mu)\right)
In the formula, X = (X(1), ..., X(5)) is the five-dimensional variable data, μ is the mean vector whose k-th component is the mean of the k-th data column, and Σ is the covariance matrix of the five-dimensional data set.
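As a rough illustration of this calculation, the sketch below evaluates a multivariate Gaussian density for every row of a data set using only the Python standard library. It assumes the sample covariance (division by n - 1, as MATLAB's cov does) and inverts the covariance matrix by Gauss-Jordan elimination; the function names are hypothetical. The functions work for any dimension, and the toy input is two-dimensional only so that the result can be checked by hand.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mu):
    """Sample covariance (divides by n - 1)."""
    n, d = len(rows), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def inverse_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(d)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug], det

def joint_density(rows):
    """Gaussian density of each row under the data set's own mean and covariance."""
    mu = mean_vector(rows)
    inv, det = inverse_and_det(covariance_matrix(rows, mu))
    d = len(mu)
    norm = math.sqrt((2 * math.pi) ** d * abs(det))
    result = []
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        quad = sum(diff[i] * inv[i][j] * diff[j]
                   for i in range(d) for j in range(d))
        result.append(math.exp(-0.5 * quad) / norm)
    return result

# Toy 2-D data: mean (0, 0), covariance (4/3) * identity.
samples = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
densities = joint_density(samples)
```

For the toy data every point lies at the same distance from the mean, so all four densities equal 3 / (8π) * exp(-0.75). With real five-dimensional count data the covariance matrix can be singular when there are too few samples, in which case this sketch raises an error rather than returning a value.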
Finally, step 105 is introduced: when a joint probability density value is below the density threshold, determining that the user's emotion within the preset time unit is abnormal.
In this embodiment, a suitable density threshold is selected on the basis of the joint probability density values computed in batch. When a joint probability density is below this threshold, the user's emotion is judged to be abnormal in that month or in some period within that month, and the abnormal user is marked. The microblog texts from that month or period can also be checked to verify whether these users really showed abnormal emotions, which improves the accuracy of the proposed model.
It should be noted that, in the embodiment of the present invention, the density threshold is chosen by the following steps:
Based on the second preset number of microblog text items, obtaining multiple five-dimensional data sets for the first preset number of users and all preset time units in the preset time period;
Computing the joint probability density of the multiple five-dimensional data sets in batch;
Dividing the multiple five-dimensional data sets into a cross-validation set and a test set;
For different candidate thresholds, running experiments on the cross-validation set with the joint probability density function to obtain multiple groups of experimental results;
Taking the threshold that gives the highest accuracy among the experimental results as the density threshold for the test set.
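The threshold selection steps above can be sketched as a simple search over candidate thresholds. The sketch assumes that ground-truth abnormal/normal labels are available for the cross-validation set (the text does not say how they are obtained) and that accuracy is the fraction of samples whose "density below threshold" flag matches the label. All data values, the candidate list and the function names are hypothetical.

```python
def accuracy(densities, is_abnormal, threshold):
    """Fraction of samples whose 'density < threshold' flag matches the label."""
    hits = sum((p < threshold) == label
               for p, label in zip(densities, is_abnormal))
    return hits / len(densities)

def select_threshold(densities, is_abnormal, candidates):
    """Return the candidate threshold with the highest accuracy."""
    return max(candidates, key=lambda t: accuracy(densities, is_abnormal, t))

# Hypothetical cross-validation densities and ground-truth labels.
cv_densities = [3e-04, 1e-04, 9e-05, 2e-06, 5e-08]
cv_labels = [False, False, False, True, True]
candidates = [1e-07, 4e-05, 2e-04]
best = select_threshold(cv_densities, cv_labels, candidates)
```

If several candidates tie on accuracy, max returns the first of them, so the order of the candidate list acts as a tie-breaker in this sketch.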
Embodiment one
This embodiment takes the public microblog emotion data of May 2016 shown in Fig. 3 as an example. A support vector machine identifies the emotion of the microblog text data, giving the five-dimensional data set (partial) shown in Table 1.
Table 1: Emotion classification statistics of the microblog texts published by users
In this embodiment, multivariate Gaussian distribution processing is applied to the above five-dimensional data set, as shown in Fig. 4. The joint probability density of the five-dimensional data is calculated as follows:
Input:
o Data: N x D array holding N data samples of dimension D; a 21 x 5 matrix in this embodiment
o Mu: 1 x D array, the mean of the data set
o Sigma: D x D array, the covariance matrix of the data set
Output:
o prob: N x 1 array, the probability density of the N data points.
The MATLAB code for calculating the joint probability density is as follows:
[nbData, nbVar] = size(Data);  % N samples (rows) by D dimensions (columns)
Mu = mean(Data, 1);  % mean of each dimension
Sigma = cov(Data);  % covariance matrix
Data = Data - repmat(Mu, nbData, 1);  % center the samples
% compute the joint probability density
prob = sum((Data / Sigma) .* Data, 2);
prob = exp(-0.5 * prob) / sqrt((2*pi)^nbVar * (abs(det(Sigma)) + realmin));
This embodiment calculates the joint probability density of the above five-dimensional data set with the joint probability density function formula; the results are shown in Table 2.
Table 2: Joint probability density values
Abnormal users are marked according to the set density threshold (4e-05), such as the user flagged in Table 2, whose joint probability density value is 2.13e-06.
Finally, the user's microblog text data from May 2016 is used to verify whether abnormal emotion actually occurred, as shown in Fig. 5.
Embodiment two
This embodiment uses the public microblog emotion data of January 2016. A support vector machine identifies the emotion of the microblog text data, giving the five-dimensional data set (partial) shown in Table 3.
Table 3: Emotion classification statistics of the microblog texts published by users
In this embodiment, multivariate Gaussian distribution processing is applied to the above five-dimensional data set, as shown in Fig. 4. The joint probability density of the five-dimensional data is calculated as follows:
Input:
o Data: N x D array holding N data samples of dimension D; a 15 x 5 matrix in this embodiment
o Mu: 1 x D array, the mean of the data set
o Sigma: D x D array, the covariance matrix of the data set
Output:
o prob: N x 1 array, the probability density of the N data points.
The MATLAB code for calculating the joint probability density is as follows:
[nbData, nbVar] = size(Data);  % N samples (rows) by D dimensions (columns)
Mu = mean(Data, 1);  % mean of each dimension
Sigma = cov(Data);  % covariance matrix
Data = Data - repmat(Mu, nbData, 1);  % center the samples
% compute the joint probability density
prob = sum((Data / Sigma) .* Data, 2);
prob = exp(-0.5 * prob) / sqrt((2*pi)^nbVar * (abs(det(Sigma)) + realmin));
This embodiment calculates the joint probability density of the above five-dimensional data set with the joint probability density code; the results are shown in Table 4.
Table 4: Joint probability density values
Abnormal users are marked according to the set density threshold (4e-05), such as the users flagged in Table 4, whose joint probability density values are 1.46e-08, 1.09e-07 and 7.65e-08.
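The marking step amounts to a comparison against the threshold, as in the short sketch below. The threshold 4e-05 and the three small density values are the ones stated in this embodiment; the user identifiers, the fourth "normal" value and the function name flag_abnormal are hypothetical and only serve as contrast.

```python
DENSITY_THRESHOLD = 4e-05  # threshold stated in the embodiments

def flag_abnormal(densities_by_user, threshold=DENSITY_THRESHOLD):
    """Return the users whose joint probability density is below the threshold."""
    return sorted(u for u, p in densities_by_user.items() if p < threshold)

# User ids are hypothetical; the three small densities are those reported for
# Table 4, and the larger value is a made-up "normal" user for contrast.
densities_by_user = {
    "user_a": 1.46e-08,
    "user_b": 1.09e-07,
    "user_c": 7.65e-08,
    "user_d": 3.2e-04,
}
flagged = flag_abnormal(densities_by_user)
```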
Finally, the users' microblog text data from January 2016 is used to verify whether abnormal emotion actually occurred, as shown by the shaded content in Fig. 6.
The embodiment of the present invention additionally provides a kind of abnormal individual character decision maker based on microblogging text, as shown in fig. 7, described
Device includes:
Text data acquisition module 701, second for obtaining the first default quantity user in preset time period are default
Quantity bar microblogging text data;
Text emotion identification module 702, for presetting quantity bar microblogging textual data to described second using support vector machines
According to progress emotion recognition and mark, obtain the 3rd default quantity kind emotion;
Data set statistical module 703, for according to preset time unit to the microblogging text data with affective tag into
Row statistics, obtains cube;The length of the preset time period is the several times of the preset time unit;
Density value computing module 704, for carrying out joint probability density calculating to the cube, obtains each more
The joint probability density value of dimension data collection;
Determination module 705, for when joint probability density value is less than density value threshold value, judging the user in preset time
Emotion in unit occurs abnormal.
In one embodiment, the described second default quantity kind emotion is 5 kinds, it is respectively neutral, happy, surprised, sad and
Anger, corresponding label are 0,1,2,3 and 4.
In one embodiment, the data set statistics module includes:
a microblog text classification unit, configured to classify the second preset number of microblog text data items using the support vector machine;
a data set determination unit, configured to determine, for each user among the first preset number of users, the five-dimensional data set of that user.
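As a rough illustration of how a five-dimensional data set per user and time unit might be assembled from labeled posts, consider the sketch below. It assumes that each component is the count of posts carrying one of the five labels (0 to 4) within one time unit; the source does not state here whether counts or proportions are used. The tuple layout, the week-style time keys and the function name `build_vectors` are hypothetical.

```python
from collections import defaultdict

# Label order given in the description: 0..4.
EMOTIONS = ["neutral", "happy", "surprised", "sad", "angry"]


def build_vectors(labeled_posts):
    """Map (user, time_unit) to a list of per-emotion post counts.

    labeled_posts is an iterable of (user, time_unit, label) tuples,
    where label is an integer in 0..4 (assumed layout).
    """
    vectors = defaultdict(lambda: [0] * len(EMOTIONS))
    for user, time_unit, label in labeled_posts:
        vectors[(user, time_unit)][label] += 1
    return dict(vectors)


# Hypothetical labeled posts, as a classifier might emit them.
posts = [
    ("u1", "2016-W01", 0),
    ("u1", "2016-W01", 1),
    ("u1", "2016-W01", 1),
    ("u1", "2016-W02", 3),
    ("u2", "2016-W01", 4),
]
vectors = build_vectors(posts)
print(vectors[("u1", "2016-W01")])  # [1, 2, 0, 0, 0]
```

Keying by (user, time unit) gives one five-dimensional sample per user per time unit, which matches the statement that the preset time period is a multiple of the preset time unit.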
In one embodiment, the device further includes a density value threshold acquisition module; the density value threshold acquisition module includes:
a data set acquisition unit, configured to obtain, based on the second preset number of microblog text data items, multiple five-dimensional data sets according to the first preset number of users and all the preset time units corresponding to the preset time period;
a density value calculation unit, configured to calculate the joint probability density values of the multiple five-dimensional data sets in batch;
a data set grouping unit, configured to divide the multiple five-dimensional data sets into a cross-validation set and a test set;
an experiment unit, configured to run experiments on the cross-validation set according to the joint probability density function based on different thresholds, obtaining multiple groups of experimental results;
a density value threshold determination unit, configured to determine the threshold corresponding to the highest accuracy among the multiple groups of experimental results as the density value threshold of the test set.
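In one embodiment, the joint probability density function is expressed by the following formula:
$$P(x^{(k)}; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}\left(x^{(k)}-\mu\right)^{T} \Sigma^{-1} \left(x^{(k)}-\mu\right)\right]$$
where $x^{(k)}$ is the five-dimensional variable set, $\mu$ is the mean of the k-th column of data, and $\Sigma$ is the covariance matrix of the five-dimensional data set; that is, each element of the covariance matrix is the covariance between different components of the five-dimensional data.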
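For illustration, the joint probability density can be evaluated as sketched below, assuming the function is the standard multivariate Gaussian density with mean vector μ, covariance matrix Σ and normalizing constant (2π)^(n/2)|Σ|^(1/2), with n = 5 for the five-dimensional data. The sketch uses only the Python standard library and solves Σy = (x − μ) by Gaussian elimination rather than forming Σ⁻¹ explicitly; the helper names and the sample vectors are hypothetical.

```python
import math


def solve_and_logdet(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting.

    Returns (y, log|det(matrix)|). Raises ValueError for a singular matrix.
    """
    n = len(matrix)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        log_det += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / a[i][i]
    return y, log_det


def joint_density(x, mean, cov):
    """Multivariate Gaussian density of x for the given mean and covariance."""
    n = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y, log_det = solve_and_logdet(cov, diff)
    mahalanobis_sq = sum(d * yi for d, yi in zip(diff, y))
    log_p = -0.5 * (n * math.log(2 * math.pi) + log_det + mahalanobis_sq)
    return math.exp(log_p)


# Hypothetical five-dimensional example with an identity covariance matrix.
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
mean = [3.0, 2.0, 1.0, 1.0, 0.5]
typical = [3.0, 2.0, 1.0, 1.0, 0.5]
unusual = [0.0, 0.0, 0.0, 6.0, 7.0]
print(joint_density(typical, mean, identity) > joint_density(unusual, mean, identity))  # True
```

Working in log space before the final exponential keeps the intermediate arithmetic stable when the density is very small, as with the values of order 1e-08 reported in Table 4.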
It should be noted that the abnormal individual character decision device based on microblog text provided by the embodiments of the present invention corresponds one-to-one with the above method. The implementation details of the method apply equally to the device, so the embodiments of the present invention do not describe the device in further detail.
In the specification of the present invention, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the claims and the specification of the present invention.
Claims (10)
- 1. An abnormal individual character decision method based on microblog text, characterized in that the method includes:
  obtaining a second preset number of microblog text data items of a first preset number of users within a preset time period;
  performing emotion recognition and labeling on the second preset number of microblog text data items using a support vector machine, obtaining a third preset number of emotion types;
  compiling statistics on the emotion-labeled microblog text data by preset time unit, obtaining multidimensional data sets, the length of the preset time period being a multiple of the preset time unit;
  performing joint probability density calculation on the multidimensional data sets, obtaining the joint probability density value of each multidimensional data set;
  when a joint probability density value is less than a density value threshold, determining that the user's emotion in the preset time unit is abnormal.
- 2. The abnormal individual character decision method according to claim 1, characterized in that the third preset number of emotion types is 5, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
- 3. The abnormal individual character decision method according to claim 1, characterized in that compiling statistics on the emotion-labeled microblog text data by preset time unit to obtain the multidimensional data sets includes:
  classifying the second preset number of microblog text data items using the support vector machine;
  for each user among the first preset number of users, determining the five-dimensional data set of that user.
- 4. The abnormal individual character decision method according to claim 1, characterized in that selecting the density value threshold includes:
  based on the second preset number of microblog text data items, obtaining multiple five-dimensional data sets according to the first preset number of users and all the preset time units corresponding to the preset time period;
  calculating the joint probability density of the multiple five-dimensional data sets in batch;
  dividing the multiple five-dimensional data sets into a cross-validation set and a test set;
  based on different thresholds, running experiments on the cross-validation set according to the joint probability density function, obtaining multiple groups of experimental results;
  taking the threshold corresponding to the highest accuracy among the multiple groups of experimental results as the density value threshold of the test set.
- 5. The abnormal individual character decision method according to claim 4, characterized in that the joint probability density function is expressed by the following formula:
  $$P(x^{(k)}; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}\left(x^{(k)}-\mu\right)^{T} \Sigma^{-1} \left(x^{(k)}-\mu\right)\right]$$
  where $x^{(k)}$ is the five-dimensional variable set, $\mu$ is the mean of the k-th column of data, and $\Sigma$ is the covariance matrix of the five-dimensional data set; the covariance measures the degree to which each dimension deviates from its mean.
- 6. An abnormal individual character decision device based on microblog text, characterized in that the device includes:
  a text data acquisition module, configured to obtain a second preset number of microblog text data items of a first preset number of users within a preset time period;
  a text emotion recognition module, configured to perform emotion recognition and labeling on the second preset number of microblog text data items using a support vector machine, obtaining a third preset number of emotion types;
  a data set statistics module, configured to compile statistics on the emotion-labeled microblog text data by preset time unit, obtaining multidimensional data sets, the length of the preset time period being a multiple of the preset time unit;
  a density value calculation module, configured to perform joint probability density calculation on the multidimensional data sets, obtaining the joint probability density value of each multidimensional data set;
  a determination module, configured to determine that the user's emotion in the preset time unit is abnormal when a joint probability density value is less than a density value threshold.
- 7. The abnormal individual character decision device according to claim 6, characterized in that the third preset number of emotion types is 5, namely neutral, happy, surprised, sad and angry, with corresponding labels 0, 1, 2, 3 and 4.
- 8. The abnormal individual character decision device according to claim 6, characterized in that the data set statistics module includes:
  a microblog text classification unit, configured to classify the second preset number of microblog text data items using the support vector machine;
  a data set determination unit, configured to determine, for each user among the first preset number of users, the five-dimensional data set of that user.
- 9. The abnormal individual character decision device according to claim 6, characterized in that the device further includes a density value threshold acquisition module; the density value threshold acquisition module includes:
  a data set acquisition unit, configured to obtain, based on the second preset number of microblog text data items, multiple five-dimensional data sets according to the first preset number of users and all the preset time units corresponding to the preset time period;
  a density value calculation unit, configured to calculate the joint probability density values of the multiple five-dimensional data sets in batch;
  a data set grouping unit, configured to divide the multiple five-dimensional data sets into a cross-validation set and a test set;
  an experiment unit, configured to run experiments on the cross-validation set according to the joint probability density function based on different thresholds, obtaining multiple groups of experimental results;
  a density value threshold determination unit, configured to determine the threshold corresponding to the highest accuracy among the multiple groups of experimental results as the density value threshold of the test set.
- 10. The abnormal individual character decision device according to claim 9, characterized in that the joint probability density function is expressed by the following formula:
  $$P(x^{(k)}; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}\left(x^{(k)}-\mu\right)^{T} \Sigma^{-1} \left(x^{(k)}-\mu\right)\right]$$
  where $x^{(k)}$ is the five-dimensional variable set, $\mu$ is the mean of the k-th column of data, and $\Sigma$ is the covariance matrix of the five-dimensional data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711211558.4A CN107943966A (en) | 2017-11-28 | 2017-11-28 | Abnormal individual character decision method and device based on microblogging text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107943966A true CN107943966A (en) | 2018-04-20 |
Family
ID=61950153
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711211558.4A Pending CN107943966A (en) | 2017-11-28 | 2017-11-28 | Abnormal individual character decision method and device based on microblogging text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107943966A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492135A (en) * | 2018-10-27 | 2019-03-19 | 平安科技(深圳)有限公司 | A kind of data checking method and device based on data processing |
CN109522556A (en) * | 2018-11-16 | 2019-03-26 | 北京九狐时代智能科技有限公司 | A kind of intension recognizing method and device |
CN110597703A (en) * | 2018-06-13 | 2019-12-20 | 中国移动通信集团浙江有限公司 | Regression testing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104268197A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Industry comment data fine grain sentiment analysis method |
WO2016182156A1 (en) * | 2015-05-14 | 2016-11-17 | 디투이모션 주식회사 | Mobile terminal for detecting abnormal activity and system including same |
Non-Patent Citations (1)
Title |
---|
XIAO SUN 等: "Detecting users’ anomalous emotion using social media for business intelligence", 《JOURNAL OF COMPUTATIONAL SCIENCE》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110597703A (en) * | 2018-06-13 | 2019-12-20 | 中国移动通信集团浙江有限公司 | Regression testing method and device |
CN109492135A (en) * | 2018-10-27 | 2019-03-19 | 平安科技(深圳)有限公司 | A kind of data checking method and device based on data processing |
CN109492135B (en) * | 2018-10-27 | 2024-03-19 | 平安科技(深圳)有限公司 | Data auditing method and device based on data processing |
CN109522556A (en) * | 2018-11-16 | 2019-03-26 | 北京九狐时代智能科技有限公司 | A kind of intension recognizing method and device |
CN109522556B (en) * | 2018-11-16 | 2024-03-12 | 北京九狐时代智能科技有限公司 | Intention recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20090193344A1 (en) | Community mood representation | |
CN104572958B (en) | A kind of sensitive information monitoring method based on event extraction | |
CN103729359B (en) | A kind of method and system recommending search word | |
Raykov et al. | Basic statistics: An introduction with R | |
CN109408823B (en) | A kind of specific objective sentiment analysis method based on multi-channel model | |
Biernacki et al. | A generative model for rank data based on insertion sort algorithm | |
CN106646158A (en) | Transformer fault diagnosis improving method based on multi-classification support vector machine | |
CN104636631A (en) | Diabetes mellitus probability calculation method based on large data of diabetes mellitus system | |
CN108256016A (en) | Personal abnormal emotion detection method and device based on personal microblogging | |
Imon et al. | Identification of multiple outliers in logistic regression | |
CN109740655A (en) | Article score in predicting method based on matrix decomposition and neural collaborative filtering | |
CN107943966A (en) | Abnormal individual character decision method and device based on microblogging text | |
Gerhana et al. | Comparison of naive Bayes classifier and C4. 5 algorithms in predicting student study period | |
CN105787662A (en) | Mobile application software performance prediction method based on attributes | |
CN108733791A (en) | network event detection method | |
CN110851593B (en) | Complex value word vector construction method based on position and semantics | |
CN106951565B (en) | File classification method and the text classifier of acquisition | |
CN104809104A (en) | Method and system for identifying micro-blog textual emotion | |
CN106445914A (en) | Microblog emotion classifier establishing method and device | |
Lesany et al. | Recognition and classification of single and concurrent unnatural patterns in control charts via neural networks and fitted line of samples | |
CN115391670A (en) | Knowledge graph-based internet behavior analysis method and system | |
CN103279549B (en) | A kind of acquisition methods of target data of destination object and device | |
CN109242403A (en) | A kind of demand management method and computer equipment | |
CN113261975A (en) | Deep learning-based electrocardiogram classification method | |
CN103678709A (en) | Recommendation system attack detection method based on time series data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180420 |