CN110276068A - Law merit analysis method and device - Google Patents
- Publication number
- CN110276068A (application CN201910379141.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- case
- vector
- prediction
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
An embodiment of the present invention provides a legal case analysis method and device. The method includes: performing word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence; obtaining multiple word vectors from the words of the sentence sequence, encoding each word vector with a first recurrent neural network, and obtaining a task text vector for each analysis task; applying max pooling to the task text vectors of the element judgment tasks, to obtain an overall task text vector for the element judgment tasks; encoding, with a second recurrent neural network, the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task, to obtain a first hidden vector for the charge prediction task; and inputting the first hidden vector of the charge prediction task into a charge prediction model, to obtain a charge prediction result. The legal case analysis method and device provided by the embodiments of the present invention can improve the accuracy of analysis.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a legal case analysis method and device.
Background

With the rapid development of artificial intelligence, applying AI to the judicial domain has become an inevitable trend. In recent years there has been extensive interdisciplinary research combining artificial intelligence and law. In the last century, many scholars applied mathematical statistics and keyword-matching algorithms to analyze legal cases. With the development of machine learning, more researchers began to extract text features manually in order to analyze case facts automatically. As deep learning advanced rapidly, many scholars focused on using neural networks to extract the information contained in text, further improving the quality of case analysis.

However, these methods generally cannot cope with two problems in real scenarios: the distribution of cases is extremely unbalanced, and similar charges are very easily confused. In practice, many charges and law articles occur very rarely, and conventional deep learning models cannot analyze such cases accurately. In other words, conventional deep learning methods can only analyze the case facts of the most common charges, and the prior art cannot distinguish cases with similar charges well, so it lacks practicality.

In summary, the existing technology can only analyze the case facts of a few high-frequency charges and cannot distinguish cases with similar charges; its accuracy and coverage of case analysis are therefore low.
Summary of the invention
Embodiments of the present invention provide a legal case analysis method and device, to solve, or at least partially solve, the low accuracy of existing legal case analysis methods.
In a first aspect, an embodiment of the present invention provides a legal case analysis method, comprising:

performing word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities;

obtaining multiple word vectors from the words of the sentence sequence, the event sequence and the named entities; encoding each word vector with a first recurrent neural network; and obtaining a task text vector for each analysis task from the encoding result, the task hidden vectors and a correlation matrix; wherein the analysis tasks include element judgment tasks and a charge prediction task; the elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; and the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task;

applying max pooling to the task text vectors of the element judgment tasks, to obtain an overall task text vector for the element judgment tasks; encoding, with a second recurrent neural network, the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task, to obtain a first hidden vector for the charge prediction task; and inputting the first hidden vector of the charge prediction task into a charge prediction model, to obtain a charge prediction result for the case description text to be analyzed;

wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
Preferably, the analysis tasks further include a related law article prediction task and a duration prediction task.

Correspondingly, after the overall task text vector of the element judgment tasks is obtained, the method includes:

encoding, with the second recurrent neural network, the overall task text vector of the element judgment tasks and the task text vectors of the charge prediction task, the related law article prediction task and the duration prediction task, to obtain first hidden vectors for the charge prediction task, the related law article prediction task and the duration prediction task;

inputting the first hidden vectors of the charge prediction task, the related law article prediction task and the duration prediction task into the charge prediction model, a related law article prediction model and a duration prediction model respectively, to obtain a charge prediction result, a related law article prediction result and a duration prediction result for the case description text to be analyzed;

wherein the related law article prediction model and the duration prediction model are both obtained by training on the sample legal documents.
Preferably, after the task text vector of each analysis task is obtained from the encoding result, the task hidden vectors and the correlation matrix, the method further includes:

inputting the task text vector of each element judgment task into the element judgment model corresponding to that task, to obtain the result of the element judgment task;

wherein the element judgment model corresponding to each element judgment task is obtained by training on the sample legal documents.
Preferably, performing word segmentation and named entity recognition on the case description text to be analyzed, to obtain the sentence sequence, the event sequence and the named entities, specifically includes:

performing word segmentation and part-of-speech tagging on the case description text to be analyzed and deleting stop words, to obtain multiple sentences, each sentence consisting of several words and the part of speech of each word;

screening the multiple sentences against a pre-built trigger vocabulary, retaining the sentences that describe important facts relevant to the case, to form the sentence sequence;

obtaining the events and the named entities described in the case description text to be analyzed according to preset rules, syntactic dependencies, and the words and corresponding parts of speech of each sentence in the sentence sequence; and ordering the events by time of occurrence to form the event sequence.
Preferably, obtaining multiple word vectors from the words of the sentence sequence, the event sequence and the named entities specifically includes:

concatenating the words of the sentence sequence in the chronological order of the events in the event sequence, to obtain a word sequence;

mapping the word sequence through a word vector table obtained by pre-training, to obtain the original word vector of each word in the sentence sequence;

for each word in the sentence sequence, extending its original word vector according to the event described by the sentence containing the word and according to whether the word is a named entity, to obtain the word vector of that word, thereby obtaining the multiple word vectors.
Preferably, obtaining the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix specifically includes:

for each analysis task, obtaining weights over the encoding result from the encoding result, the task hidden vector of that analysis task and the correlation matrix, and computing the weighted sum of the encoding result with those weights, to obtain the task text vector of the analysis task.
Preferably, the first recurrent neural network is a long short-term memory (LSTM) network, and the second recurrent neural network is also an LSTM network.
In a second aspect, an embodiment of the present invention provides a legal case analysis device, comprising:

a data processing module, configured to perform word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities;

a fact encoding module, configured to obtain multiple word vectors from the words of the sentence sequence, the event sequence and the named entities, encode each word vector with a first recurrent neural network, and obtain a task text vector for each analysis task from the encoding result, the task hidden vectors and a correlation matrix; wherein the analysis tasks include element judgment tasks and a charge prediction task; the elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; and the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task;

a task sequence prediction module, configured to apply max pooling to the task text vectors of the element judgment tasks to obtain an overall task text vector for the element judgment tasks, encode, with a second recurrent neural network, the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task to obtain a first hidden vector for the charge prediction task, and input the first hidden vector of the charge prediction task into a charge prediction model to obtain a charge prediction result for the case description text to be analyzed;

wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the legal case analysis method provided by any possible implementation of the first aspect are implemented.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the legal case analysis method provided by any possible implementation of the first aspect.
The legal case analysis method and device provided by the embodiments of the present invention analyze a legal case based on the dependency between legal elements and the charge. They can distinguish cases with similar charges according to the elements, and they apply to the case facts of all charges rather than only the most common ones, thereby greatly improving the accuracy of case analysis while achieving higher case coverage.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the legal case analysis method provided by an embodiment of the present invention;

Fig. 2 is a structural diagram of the legal case analysis device provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

To overcome the above problems of the prior art, the embodiments of the present invention provide a legal case analysis method and device. The inventive concept is to analyze, with trained models, multiple legal elements relevant to judging the charge, and to obtain a more accurate charge prediction result from the element analysis results and a machine learning model.
Fig. 1 is a flow diagram of the legal case analysis method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S101: perform word segmentation and named entity recognition on the case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities.

Specifically, the case description text to be analyzed is a passage describing the case facts.

Each sentence in the sentence sequence is a word sequence, obtained by segmenting one sentence of the case description text to be analyzed (a sentence here meaning a span separated by a comma, semicolon or full stop).

Chinese text can be segmented with any existing Chinese word segmentation package, for example the open-source package thulac.

For each sentence in the sentence sequence, if the sentence contains a specific trigger word, the event contained in the sentence can be obtained; in this way all events contained in the sentence sequence can be extracted.

For example, if a sentence in the sentence sequence contains the word "hit", the sentence contains an attack event.

Named entities include at least person names, place names and organization names. These have distinctive textual features, so each named entity among the words of the sentence sequence can be extracted.
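The trigger-word screening and event detection described above can be sketched in plain Python. The trigger vocabulary and event labels below are illustrative assumptions, not the patent's actual lexicon, and the input is assumed to be already segmented (in practice a Chinese segmenter such as thulac would produce it):

```python
# Sketch: screen sentences against a trigger vocabulary and collect events.
# TRIGGERS maps hypothetical trigger words to event types; the patent's real
# trigger vocabulary is built in advance and is not disclosed here.
TRIGGERS = {"hit": "attack", "stole": "theft", "fled": "escape"}

def screen_sentences(sentences):
    """Keep only sentences containing a trigger word; pair each with events.

    `sentences` is a list of word lists (segmented, stop words removed).
    Returns (sentence_sequence, event_sequence); sentence order stands in
    for chronological order in this sketch.
    """
    sentence_sequence, event_sequence = [], []
    for words in sentences:
        events = [TRIGGERS[w] for w in words if w in TRIGGERS]
        if events:  # retain only sentences describing case-relevant facts
            sentence_sequence.append(words)
            event_sequence.extend(events)
    return sentence_sequence, event_sequence

sents = [["the", "defendant", "hit", "the", "victim"],
         ["it", "was", "raining"],
         ["he", "fled", "the", "scene"]]
seq, events = screen_sentences(sents)  # drops the irrelevant middle sentence
```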
Step S102: obtain multiple word vectors from the words of the sentence sequence, the event sequence and the named entities; encode each word vector with a first recurrent neural network; and obtain a task text vector for each analysis task from the encoding result, the task hidden vectors and a correlation matrix.

The analysis tasks include element judgment tasks and a charge prediction task. The elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element. The number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task.
Specifically, a deep learning model reads the serialized words in the form of a word vector sequence. After the sentence sequence, the event sequence and the named entities are obtained, a word vector sequence can be built for the words of the sentence sequence using any model for generating word vectors, combined with the event sequence and the named entities. The word vector sequence contains multiple word vectors, one for each word in the sentence sequence.

The model for generating word vectors can be any of Word2vec, GloVe, FastText and the like; the embodiment of the present invention does not specifically limit this.
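The lookup-and-extend construction of word vectors can be illustrated with a toy table. The table values and the two extra features (an event-type id for the containing sentence and a named-entity flag) are assumptions for illustration; in the patent the table comes from pre-training and the extension is derived from the event sequence and named entities:

```python
# Sketch: map each word to its pre-trained vector, then extend it with
# event and named-entity features.
WORD_VECTORS = {                 # toy stand-in for a pre-trained vector table
    "defendant": [0.1, 0.3],
    "hit":       [0.7, 0.2],
    "victim":    [0.4, 0.9],
}
UNK = [0.0, 0.0]                 # fallback for out-of-vocabulary words

def extend_vectors(words, event_id, entities):
    """Return one extended vector per word: embedding + [event_id, is_entity]."""
    out = []
    for w in words:
        base = WORD_VECTORS.get(w, UNK)
        out.append(base + [float(event_id), 1.0 if w in entities else 0.0])
    return out

vecs = extend_vectors(["defendant", "hit", "victim"],
                      event_id=1, entities={"defendant", "victim"})
```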
After the word vector sequence is obtained, each word vector in it can be encoded with the first recurrent neural network, capturing the contextual semantics of each sentence. The encoding result is a second hidden vector sequence (equivalently, a second hidden vector matrix). The length of the second hidden vector sequence equals the length of the word vector sequence, i.e. the number of second hidden vectors equals the number of words in the sentence sequence.

For each word vector fed into the first recurrent neural network, the network outputs a new vector, called a second hidden vector.
To obtain text vectors relevant to each analysis task, an attention mechanism maps the second hidden vector sequence into different task text spaces, producing a task text vector for each analysis task.

The analysis tasks include at least the element judgment tasks and the charge prediction task. The elements are multiple legal elements relevant to judging the charge, so there are multiple element judgment tasks, each used to judge and predict the value of a different legal element. The elements are predetermined: there are as many element judgment tasks as there are elements.
For example, for criminal cases the elements may include ten items such as profit motive, buying and selling, death, violence, state organ or state functionary, public place, illegal possession, injury, subjective intent, and occurrence during production or business operations.

The meanings of these ten elements are: profit motive — whether the defendant (or suspect) acted for the purpose of profit; buying and selling — whether the defendant's (or suspect's) conduct involved buying or selling; death — whether the victim died; violence — whether the defendant (or suspect) committed the crime by violent means; state organ or state functionary — whether the case involves a state organ or state functionary; public place — whether the case occurred in a public place; illegal possession — whether the defendant (or suspect) acted for the purpose of illegal possession; injury — whether the victim was injured; subjective intent — whether the defendant (or suspect) committed the crime with premeditated intent; during production or business operations — whether the case occurred during production or business operations.
For different types of administrative cases (such as public security cases, traffic violation cases and industrial and commercial administration cases), corresponding sets of elements can be used to judge the charge.

It can be understood that each element judgment task has a corresponding task text vector.

To implement the attention mechanism, a task hidden vector is defined for each analysis task; thus the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task. The task hidden vector serves as the query vector of the attention.

The correlation matrix represents the degree of correlation between the encoding result and each task hidden vector.

The charge prediction task is used to predict the charge.
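The attention step can be sketched in plain Python. Scoring each second hidden vector against the task query through the correlation matrix (a bilinear score h · (A·q)) is one common realization of the mechanism; the patent does not spell out its exact scoring function, so treat the formula, names and toy values below as assumptions:

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def task_text_vector(hidden_seq, task_query, corr):
    """Attention: weight each second hidden vector by softmax(h . (A q)),
    then return the weighted sum as the task text vector."""
    key = matvec(corr, task_query)        # correlation matrix applied to query
    weights = softmax([dot(h, key) for h in hidden_seq])
    dim = len(hidden_seq[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_seq))
            for i in range(dim)]

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy second hidden vectors
q = [1.0, 0.0]                            # task hidden vector (query)
A = [[1.0, 0.0], [0.0, 1.0]]              # identity correlation matrix (toy)
t = task_text_vector(H, q, A)
```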
Step S103: apply max pooling to the task text vectors of the element judgment tasks, to obtain an overall task text vector for the element judgment tasks; encode, with a second recurrent neural network, the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task, to obtain a first hidden vector for the charge prediction task; and input the first hidden vector of the charge prediction task into a charge prediction model, to obtain a charge prediction result for the case description text to be analyzed.

Since there is one task text vector per element judgment task, i.e. several of them, charge prediction is made easier by summarizing them into a single overall task text vector, defined as the result of max pooling over the task text vectors of all element judgment tasks:
t_attr = max_pooling([t_1, t_2, ..., t_k])

t_attr,i = max(t_1,i, t_2,i, ..., t_k,i)
where t_attr denotes the overall task text vector of the element judgment tasks; t_1, t_2, ..., t_k denote the task text vectors of the individual element judgment tasks; k is a positive integer, the number of element judgment tasks; t_attr,i denotes the i-th component of t_attr, with 1 ≤ i ≤ d_1 and d_1 the dimension of a task text vector; and t_1,i, t_2,i, ..., t_k,i denote the i-th components of the individual task text vectors.
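The component-wise max pooling above can be sketched directly:

```python
def max_pooling(task_vectors):
    """Component-wise max over the k task text vectors t_1..t_k:
    t_attr[i] = max(t_1[i], ..., t_k[i])."""
    return [max(col) for col in zip(*task_vectors)]

# Toy task text vectors for three element judgment tasks.
t1, t2, t3 = [0.2, 0.9, 0.1], [0.5, 0.3, 0.4], [0.1, 0.2, 0.8]
t_attr = max_pooling([t1, t2, t3])  # one overall vector for all element tasks
```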
The second recurrent neural network captures the dependencies between the analysis tasks. When the analysis tasks include the element judgment tasks and the charge prediction task, the overall task text vector of the element judgment tasks is t_attr and the task text vector of the charge prediction task is t_accu. The vectors t_attr and t_accu form a task sequence, ordered element judgment tasks first and charge prediction task second; feeding this sequence through the second recurrent neural network yields the first hidden vector h_attr of the element judgment tasks and the first hidden vector h_accu of the charge prediction task:

[h_attr, h_accu] = RNN([t_attr, t_accu])

where RNN denotes the operation performed by the second recurrent neural network.
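The pass of the task sequence through the second recurrent neural network can be sketched with a vanilla RNN cell (h_t = tanh(W·x_t + U·h_{t-1})). The patent prefers an LSTM, so this plain cell and the toy weights are simplifying assumptions:

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rnn(task_sequence, W, U):
    """Run a vanilla RNN cell over the task text vectors; return one
    first hidden vector per analysis task."""
    h = [0.0] * len(W)                      # zero initial state
    hidden = []
    for x in task_sequence:
        wx, uh = matvec(W, x), matvec(U, h)
        h = [math.tanh(a + b) for a, b in zip(wx, uh)]
        hidden.append(h)
    return hidden

W = [[0.5, 0.0], [0.0, 0.5]]                # toy input weights
U = [[0.1, 0.0], [0.0, 0.1]]                # toy recurrent weights
# [h_attr, h_accu] = RNN([t_attr, t_accu]) with toy task text vectors:
h_attr, h_accu = rnn([[1.0, 0.0], [0.0, 1.0]], W, U)
```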
After the first hidden vector h_accu of the charge prediction task is obtained, h_accu is input into the charge prediction model, which maps the first hidden vector h_accu into the label space of the charge prediction task and obtains the charge prediction result.

For example, for criminal cases the labels of the charge prediction task may include larceny, robbery, intentional injury, corruption and the like; for traffic violation cases the labels may include speeding, failing to obey traffic signals, violating traffic prohibition signs, and intentionally obscuring or defacing license plates.

The charge prediction model can be any trained classifier, for example a support vector machine, an artificial neural network or a decision tree.
For example, when a trained fully connected neural network is used as the charge prediction model, for h_accu the model outputs the charge prediction result y_accu over the label set Y_accu = {larceny, robbery, ...}, computed as
y_accu = softmax(W_accu · h_accu + b_accu)
where softmax denotes the operation performed by the charge prediction model, and W_accu and b_accu are the parameters of the charge prediction model.
It can be understood that the charge prediction result y_accu is a vector, and the value of each dimension of y_accu represents the probability of the corresponding label; that is, each element value of y_accu represents the probability of the corresponding charge label.
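The mapping into the label space can be sketched as a linear layer followed by softmax; the label set, weights and hidden vector below are illustrative, not the patent's trained parameters.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_charge(h_accu, W_accu, b_accu, labels):
    """Map the first hidden vector of the charge prediction task into
    the charge label space: y_accu = softmax(W_accu @ h_accu + b_accu).
    Each dimension of y_accu is the probability of one charge label."""
    y_accu = softmax(W_accu @ h_accu + b_accu)
    return y_accu, labels[int(np.argmax(y_accu))]

labels = ["larceny", "robbery", "intentional injury"]  # illustrative labels
W_accu = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b_accu = np.zeros(3)
h_accu = np.array([1.0, 0.2])                          # illustrative hidden vector
y_accu, charge = predict_charge(h_accu, W_accu, b_accu, labels)
```

With these toy parameters the logits are [2.0, 0.2, -1.2], so the highest-probability label is the first one.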
The first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
It can be understood that training on sample legal documents, with parameter adjustment, yields the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model.
Sample legal documents are legal documents in which the final legal consequence has been determined. For a criminal case, for example, this may be a court judgment; for an administrative case it may be an administrative penalty decision issued by an administrative organ.
Because the embodiment of the present invention analyses a legal case on the basis of the dependence between legal elements and the charge, it can distinguish cases with similar charges according to the elements, and it is applicable to the case facts of all cases rather than only to those of common charges, so that the accuracy of case analysis is greatly improved and a higher case coverage is achieved.
On the basis of the above embodiments, the analysis tasks further include a related-law-article prediction task and a term prediction task.
Specifically, to make the case analysis more comprehensive, the related law articles and the term of the penalty may also be analysed; the analysis tasks therefore further include a related-law-article prediction task and a term prediction task.
The related-law-article prediction task predicts the related law articles.
The term prediction task predicts the duration of the penalty. For a criminal case, for example, the duration of the penalty is the prison term; for different administrative cases it may be the duration of administrative detention, of a suspension of business for rectification, or of a temporary suspension of a driving licence.
Correspondingly, after the overall task text vector of the element judgment tasks is obtained, the second recurrent neural network encodes the overall task text vector of the element judgment tasks and the task text vectors corresponding to the charge prediction task, the related-law-article prediction task and the term prediction task, yielding the first hidden vectors corresponding to the charge prediction task, the related-law-article prediction task and the term prediction task.
Specifically, when the analysis tasks comprise the element judgment tasks, the charge prediction task, the related-law-article prediction task and the term prediction task, the overall task text vector t_attr of the element judgment tasks, the task text vector t_accu of the charge prediction task, the task text vector t_law of the related-law-article prediction task and the task text vector t_time of the term prediction task are arranged, in the order element judgment, charge prediction, related-law-article prediction, term prediction, into a task sequence. The second recurrent neural network, capturing the dependence between the analysis tasks, encodes this task sequence and yields the first hidden vector h_attr of the element judgment tasks, the first hidden vector h_accu of the charge prediction task, the first hidden vector h_law of the related-law-article prediction task and the first hidden vector h_time of the term prediction task.
For example, when a long short-term memory neural network is used as the second recurrent neural network, the first hidden vectors of the analysis tasks are computed as
h_attr, h_accu, h_law, h_time = LSTM(t_attr, t_accu, t_law, t_time)
where LSTM denotes the operation performed by the long short-term memory neural network.
The charge prediction task depends on the element judgment tasks; the related-law-article prediction task depends on the element judgment tasks and the charge prediction task; and the term prediction task depends on the element judgment tasks, the charge prediction task and the related-law-article prediction task.
The first hidden vector of the charge prediction task, the first hidden vector of the related-law-article prediction task and the first hidden vector of the term prediction task are input into the charge prediction model, the related-law-article prediction model and the term prediction model respectively, yielding the charge prediction result, the related-law-article prediction result and the term prediction result for the case description text to be analysed.
The related-law-article prediction model and the term prediction model are likewise obtained by training on sample legal documents.
Specifically, after the first hidden vector h_accu of the charge prediction task, the first hidden vector h_law of the related-law-article prediction task and the first hidden vector h_time of the term prediction task are obtained, h_accu, h_law and h_time are input into the charge prediction model, the related-law-article prediction model and the term prediction model respectively, which map them into the label spaces of the charge prediction task, the related-law-article prediction task and the term prediction task, yielding the charge prediction result, the related-law-article prediction result and the term prediction result.
The specific steps of obtaining the charge prediction result, the related-law-article prediction result and the term prediction result are similar to those of obtaining the charge prediction result in the above embodiment, and are not repeated here.
It can be understood that training on sample legal documents, with parameter adjustment, yields the related-law-article prediction model and the term prediction model.
In the embodiment of the present invention in which the analysis tasks include the element judgment tasks, the charge prediction task, the related-law-article prediction task and the term prediction task, the related law articles are predicted on the basis of the dependence between the legal elements, the charge and the law articles, and the term is predicted on the basis of the dependence between the legal elements, the charge, the law articles and the term, so that more accurate related-law-article and term prediction results are obtained, improving the accuracy and comprehensiveness of case analysis.
On the basis of the above embodiments, after the task text vector of each analysis task is obtained from the encoding result, the task hidden vectors and the correlation matrix, the method further includes: inputting the task text vector of each element judgment task into the element judgment model of that task, and obtaining the result of the element judgment task.
Each element judgment model is likewise obtained by training on sample legal documents.
Specifically, after the task text vectors t_1, t_2, ..., t_k of the element judgment tasks are obtained, t_1, t_2, ..., t_k are input into the corresponding element judgment models, and the predicted value of each element is obtained as the result of the element judgment tasks.
The predicted value of the i-th element is computed as
y_i = softmax(W_i · t_i + b_i)
where t_i denotes the task text vector of the i-th element judgment task; y_i denotes the predicted value of the i-th element; 1 ≤ i ≤ k; k is a positive integer denoting the number of elements; W_i and b_i are the parameters of the i-th element judgment model; and the label space is Y_attr = {yes, no}.
It can be understood that y_i is a vector, and the value of each dimension of y_i represents the probability of the corresponding label. For example, y_i = [0.1, 0.9] indicates that the probability that the i-th element takes the value "no" is 90% and the probability that it takes the value "yes" is 10%.
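One element judgment model can be sketched as a two-way softmax classifier over {yes, no}; the parameters and the task text vector below are illustrative values chosen so that "no" receives high probability, mirroring the y_i = [0.1, 0.9] example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def judge_element(t_i, W_i, b_i):
    """y_i = softmax(W_i @ t_i + b_i) over the label space {yes, no};
    each dimension of y_i is the probability of the corresponding label."""
    return softmax(W_i @ t_i + b_i)

# Illustrative parameters for the i-th element judgment model.
W_i = np.array([[1.0, -1.0],    # row 0 scores the label "yes"
                [-1.0, 1.0]])   # row 1 scores the label "no"
b_i = np.zeros(2)
t_i = np.array([-1.0, 1.0])     # task text vector of element i
y_i = judge_element(t_i, W_i, b_i)
```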
It can be understood that training on sample legal documents, with parameter adjustment, yields each element judgment model.
By obtaining the predicted value of each element through the element judgment models and the task text vectors of the element judgment tasks, the embodiment of the present invention allows the key points of the case to be understood more fully, improving the comprehensiveness and intelligence of case analysis.
On the basis of the above embodiments, the specific steps of segmenting the case description text to be analysed and performing named entity recognition, obtaining the sentence sequence, the event sequence and the named entities, include: segmenting the case description text, performing part-of-speech tagging, and deleting stop words, obtaining a plurality of sentences, each consisting of several words together with the part of speech of each word.
Specifically, each sentence of the case description text to be analysed is segmented, part-of-speech tagging is applied to each segmented word, stop words are deleted, and the case description text is converted into an original sentence sequence s = {s_1, s_2, ..., s_m}, which contains the sentences s_1, s_2, ..., s_m, m denoting the number of sentences in the original sequence.
In the processing of natural-language data (or text), stop words are words or characters that are automatically filtered out before or after the data is processed, in order to save storage space and improve processing efficiency.
For the analysis of legal cases, the stop words are mainly the function words of human language, which are extremely common and, compared with other words, carry little substantive meaning.
Each sentence s_j in the original sequence is a word sequence s_j = {w_j1, w_j2, ..., w_jn} with corresponding parts of speech c_j = {c_j1, c_j2, ..., c_jn}, where n denotes the number of words in sentence s_j; w_ji denotes the i-th word of the j-th sentence; 1 ≤ j ≤ m; 1 ≤ i ≤ n; c_ji denotes the part of speech of w_ji, the i-th word of the j-th sentence; and c_ji ∈ C, C being the part-of-speech table.
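The preprocessing step can be sketched as follows. A real system would use a Chinese word segmenter and tagger; the whitespace tokenizer, toy POS lookup and stop-word list below are illustrative assumptions, not the patent's actual tooling.

```python
# Illustrative English stop-word list standing in for the real one.
STOP_WORDS = {"the", "a", "by", "with", "in"}

def preprocess(case_text, pos_lookup):
    """Split the case description text into sentences, tokenize each
    sentence, attach a POS tag to every word, and drop stop words,
    producing the original sentence sequence s = {s_1, ..., s_m}."""
    sentences = [s.strip() for s in case_text.split(".") if s.strip()]
    original_sequence = []
    for sent in sentences:
        tagged = [(w, pos_lookup.get(w, "n"))   # default tag: noun
                  for w in sent.split()
                  if w.lower() not in STOP_WORDS]
        original_sequence.append(tagged)
    return original_sequence

pos_lookup = {"Lee": "np", "stole": "v", "fled": "v"}  # toy POS table
s = preprocess("Lee stole the property. Lee fled with the goods.", pos_lookup)
```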
The sentences are then screened against a pre-built trigger vocabulary: the sentences describing facts material to the case are retained and form the sentence sequence.
After the original sequence is obtained, its sentences can be screened against the pre-built trigger vocabulary to detect the significant facts involved in the case description text: the sentences describing facts material to the case are retained, the sentences not describing such facts are deleted, and the retained sentences form the sentence sequence s' = {s'_1, s'_2, ..., s'_m'}, where m' denotes the number of sentences in the sentence sequence.
A sentence containing an event trigger word is regarded as containing the event corresponding to that trigger word. For example, "hit" is a trigger word, so a sentence in the sentence sequence that contains the word "hit" contains an attack event.
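The trigger-word screening can be sketched as a simple filter: a sentence is kept only if it contains at least one word from the pre-built trigger vocabulary, since such a sentence is assumed to describe an event material to the case. The trigger vocabulary and sentences below are illustrative.

```python
TRIGGER_WORDS = {"steal", "hit", "flee", "injure"}  # illustrative vocabulary

def filter_sentences(sentences):
    """Retain the sentences describing case-relevant facts, forming
    the sentence sequence s'."""
    return [sent for sent in sentences
            if any(w in TRIGGER_WORDS for w in sent)]

original = [
    ["Lee", "did", "steal", "property"],
    ["the", "weather", "was", "fine"],      # no trigger word: dropped
    ["Lee", "did", "hit", "the", "owner"],
]
sentence_sequence = filter_sentences(original)
```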
According to preset rules, the syntactic dependency relations, and the words of each sentence in the sentence sequence together with their parts of speech, the events described by the case description text to be analysed and the named entities are obtained, and the events are arranged in the chronological order of their occurrence to form the event sequence.
Specifically, according to features such as the syntactic dependency relations and the parts of speech, preset rules are applied to the words of each sentence in the sentence sequence to extract the named entities; from the extracted entities such as personal names and place names, attributes of the corresponding events such as the place of occurrence, the persons involved and the time of occurrence can be extracted, thereby obtaining the described events together with, for each event, its place of occurrence, persons involved and time of occurrence.
For example, a preset rule may state that the object of the verb "hit" is the victim of an attack, so that the persons involved in the attack can be determined from the words before and after the verb "hit": the subject is the attacker and the object is the victim.
After the events are obtained, the actual timeline of the events can be reconstructed: the events are arranged according to the chronological order in which they occurred, rather than the order in which they appear in the sentence sequence, forming the event sequence. For each event in the event sequence, besides its type, the place of occurrence, the persons involved and the time of occurrence are recorded where available.
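The rule-based participant extraction can be sketched as below for the "hit" rule described above; the tag conventions and the single rule are illustrative, not the patent's full rule set or its dependency parser.

```python
def extract_attack_event(tagged_sentence):
    """For a sentence containing the trigger verb 'hit', take the nearest
    np/n word before the verb as the subject (attacker) and the nearest
    np/n word after it as the object (victim)."""
    words = [w for w, _ in tagged_sentence]
    if "hit" not in words:
        return None
    i = words.index("hit")
    attacker = next((w for w, tag in reversed(tagged_sentence[:i])
                     if tag in ("np", "n")), None)
    victim = next((w for w, tag in tagged_sentence[i + 1:]
                   if tag in ("np", "n")), None)
    return {"event": "attack", "attacker": attacker, "victim": victim}

sent = [("Lee", "np"), ("hit", "v"), ("owner", "n")]
event = extract_attack_event(sent)
```

This is a crude stand-in for syntactic dependency analysis: it assumes subject-verb-object word order, which real rules would not.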
For example, suppose the case description text to be analysed is "Lee broke into a house to steal property, was discovered by the owner during the theft, fought with the owner causing the owner to bleed and be injured, and Lee fled immediately". Chinese word segmentation and part-of-speech tagging (where, for example, v denotes a verb, p a preposition, n a noun, np a personal name, d an adverb, c a conjunction and w punctuation) yields (Lee, np) (break in, v) (steal, v) (property, n) (,, w) (theft, v) (process, n) (during, f) (by, p) (owner, n) (discover, v) (,, w) (immediately, d) (and, c) (owner, n) (occur, v) (fight, v) (cause, v) (owner, n) (bleed, v) (injured, v) (,, w) (Lee, np) (immediately, d) (flee, v) (., w). Deleting stop words such as "by" and "and" leaves (Lee, np) (break in, v) (steal, v) (property, n) (theft, v) (process, n) (owner, n) (discover, v) (immediately, d) (owner, n) (fight, v) (owner, n) (bleed, v) (injured, v) (Lee, np) (immediately, d) (flee, v). Named entity recognition on the resulting sentence identifies the entities (Lee, np) and (owner, n). Event detection then yields the event sequence: event 1, theft, person: Lee; event 2, attack, persons: Lee, owner.
By screening the sentences with trigger words, the embodiment of the present invention can filter out irrelevant facts and reduce input noise, thereby reducing the amount of data to be processed and improving the accuracy of the analysis.
On the basis of the above embodiments, the specific steps of obtaining the plurality of term vectors from the words of the sentence sequence, the event sequence and the named entities include: splicing the words of the sentence sequence according to the chronological order of the events in the event sequence, obtaining a word sequence.
Specifically, the words of the sentence sequence s' are spliced according to the chronological order of the events, yielding an input word sequence w = {w_1, w_2, ..., w_l}, where l denotes the number of words.
The word sequence is then mapped through a pre-trained word vector table, yielding the original term vector of each word in the sentence sequence.
Pre-training the term vectors produces the word vector table; the pre-training may use any method such as Word2vec, GloVe or FastText, which the embodiment of the present invention does not specifically limit. The input word sequence is mapped through this word vector table to obtain the original term vector of each word.
For each word of the sentence sequence, the original term vector of the word is extended according to the event described by the sentence in which the word appears and according to whether the word is a named entity, yielding the term vector of the word and thus the plurality of term vectors.
Specifically, for each word of the sentence sequence, several elements are appended to the original term vector of the word; these appended elements indicate the event described by the sentence in which the word appears and which named entity, if any, the word is, so that the original term vector of the word is extended into the term vector of the word. After every word of the sentence sequence has been extended, the resulting term vectors form the term vector sequence
v = {v_1, v_2, ..., v_l}, v_i ∈ R^d
where v_1, v_2, ..., v_l are the term vectors corresponding to the words w_1, w_2, ..., w_l, l denotes the number of words, and d denotes the dimension of the term vectors.
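The term-vector extension can be sketched as below. The embedding table is illustrative, and the exact form of the appended elements is not specified by the text; here they are assumed to be a one-hot event indicator plus a single named-entity flag.

```python
import numpy as np

def extend_word_vector(word, word_vectors, event_id, n_events, is_entity):
    """Extend the original term vector of a word with a one-hot indicator
    of the event described by its sentence and a named-entity flag,
    yielding the word's final term vector v."""
    base = word_vectors[word]             # original pretrained term vector
    event_flags = np.zeros(n_events)
    event_flags[event_id] = 1.0
    return np.concatenate([base, event_flags, [1.0 if is_entity else 0.0]])

word_vectors = {"Lee": np.array([0.1, 0.3]),  # toy pretrained vectors
                "hit": np.array([0.7, 0.2])}
v_lee = extend_word_vector("Lee", word_vectors, event_id=1, n_events=2,
                           is_entity=True)    # sentence describes event 1
```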
By extending the original term vector of each word according to the event described by the sentence in which the word appears and according to whether the word is a named entity, the embodiment of the present invention obtains term vectors that better describe the context of each word, so that more accurate element judgment results and charge prediction results can be obtained from the term vectors.
On the basis of the above embodiments, the specific steps of obtaining the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix include: for each analysis task, obtaining the weights of the encoding result from the encoding result, the task hidden vector of the analysis task and the correlation matrix, and computing the weighted sum of the encoding result with these weights, obtaining the task text vector of the analysis task.
It can be understood that the encoding result is the second hidden vector sequence
h = {h_1, h_2, ..., h_l}, h_j ∈ R^{d_1}
where d_1 denotes the dimension of the second hidden vectors. The second hidden vector sequence h contains l second hidden vectors; that is, the length of the second hidden vector sequence equals the length of the word sequence w.
The task hidden vectors form the task vector sequence u = {u_1, u_2, ..., u_p}, where u_i denotes the task hidden vector of the i-th analysis task; 1 ≤ i ≤ p; p denotes the number of analysis tasks.
For example, when there are 10 element judgment tasks and the other analysis tasks are the charge prediction task, the related-law-article prediction task and the term prediction task, p = 13.
For the i-th analysis task, its task text vector t_i can be obtained from its task hidden vector u_i, the second hidden vector sequence h and the correlation matrix W_a, as follows.
First, the weight vector α of the analysis task is obtained; α consists of the weights of the second hidden vectors of the second hidden vector sequence h, computed as
α_j = softmax_j(u_i^T W_a h_j)
where α_j denotes the weight of the j-th second hidden vector of the second hidden vector sequence h; 1 ≤ j ≤ l.
After the weight vector α of the analysis task is obtained, t_i is computed as
t_i = Σ_j α_j h_j
Through the above steps, the task text vector of each analysis task can be obtained.
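The attention computation above can be sketched as follows; the dimensions and the randomly drawn u_i, W_a and h_j are illustrative, not trained parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def task_text_vector(u_i, W_a, h_seq):
    """Attention sketch: score each second hidden vector h_j against the
    task hidden vector u_i through the correlation matrix W_a, normalize
    the scores into weights alpha with softmax, and take the weighted
    sum t_i = sum_j alpha_j * h_j as the task text vector."""
    scores = np.array([u_i @ W_a @ h_j for h_j in h_seq])
    alpha = softmax(scores)
    t_i = np.sum(alpha[:, None] * np.stack(h_seq), axis=0)
    return alpha, t_i

rng = np.random.default_rng(1)
d_u, d1, l = 3, 4, 5                      # illustrative dimensions
u_i = rng.standard_normal(d_u)            # task hidden vector
W_a = rng.standard_normal((d_u, d1))      # correlation matrix
h_seq = [rng.standard_normal(d1) for _ in range(l)]
alpha, t_i = task_text_vector(u_i, W_a, h_seq)
```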
By obtaining, from the correlation between the encoding result and the task hidden vectors, a weight of the encoding result for each task hidden vector, and computing the weighted sum of the encoding result with these weights, the embodiment of the present invention obtains task text vectors that characterise the features of each analysis task more accurately, and thus more accurate case analysis results.
On the basis of the above embodiments, the first recurrent neural network is a long short-term memory neural network, and the second recurrent neural network is a long short-term memory neural network.
Specifically, the first recurrent neural network and the second recurrent neural network may each be a gated recurrent neural network. A gated recurrent neural network adjusts the structure of a simple recurrent neural network by adding a gating mechanism that controls the transmission of information within the network: the gates control how much of the information in the memory cell is to be retained, how much is to be discarded, and how much of the new state information is to be saved into the memory cell. This allows a gated recurrent neural network to learn dependencies of relatively long span without suffering from vanishing or exploding gradients.
Common gated recurrent neural networks include the long short-term memory neural network and the gated recurrent unit.
Preferably, the first recurrent neural network and the second recurrent neural network are long short-term memory neural networks.
A long short-term memory neural network (Long Short-Term Memory, LSTM) is a recurrent neural network over time, suitable for processing and predicting significant events whose intervals and delays in a time series are relatively long. It is a special gated recurrent neural network, and thus a special recurrent neural network.
In an ordinary recurrent neural network, the memory cell has no means of scaling the value of information: it treats the state information of every moment alike, so that useless information is often stored in the memory cell while genuinely useful information is crowded out. LSTM improves on exactly this point. Unlike a recurrent neural network of ordinary structure, which has only one kind of network state, LSTM divides the network state into an internal state and an external state. The external state of LSTM is similar to the state of an ordinary recurrent neural network, being both the output of the hidden layer at the current moment and the input of the hidden layer at the next moment; the internal state is peculiar to LSTM.
LSTM has three control units called "gates": the input gate, the output gate and the forget gate, of which the input gate and the forget gate are the key to LSTM's ability to remember long-term dependencies. The input gate determines how much of the current network state is to be saved into the internal state, while the forget gate determines how much of the past state information is to be discarded. Finally, the output gate determines how much of the current internal state is to be output to the external state.
Through this selective memorising and forgetting of state information, LSTM can learn dependencies over longer time intervals than an ordinary recurrent neural network.
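The gating described above can be sketched as a single LSTM cell step; the weight layout (four stacked gate blocks) is one common convention, and the random weights and dimensions are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """Minimal LSTM cell: the input gate i controls how much new
    information enters the internal state c, the forget gate f controls
    how much past state is discarded, and the output gate o controls how
    much of the internal state reaches the external state h."""
    z = W @ x + U @ h_prev + b        # stacked pre-activations, 4*d_h rows
    d_h = h_prev.shape[0]
    i = sigmoid(z[:d_h])              # input gate
    f = sigmoid(z[d_h:2 * d_h])       # forget gate
    o = sigmoid(z[2 * d_h:3 * d_h])   # output gate
    g = np.tanh(z[3 * d_h:])          # candidate state
    c = f * c_prev + i * g            # internal state update
    h = o * np.tanh(c)                # external state (hidden output)
    return h, c

rng = np.random.default_rng(2)
d_x, d_h = 3, 2
W = rng.standard_normal((4 * d_h, d_x))
U = rng.standard_normal((4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = lstm_cell(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, U, b)
```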
By using a long short-term memory neural network as the first recurrent neural network, the embodiment of the present invention better captures the semantic information of the context of each sentence; by using a long short-term memory neural network as the second recurrent neural network, it better captures the dependence between the analysis tasks, thereby obtaining more accurate analysis results and improving the accuracy of the analysis.
Fig. 2 is a structural schematic diagram of a legal case analysis device according to an embodiment of the present invention. On the basis of the above embodiments, as shown in Fig. 2, the device includes a data processing module 201, a fact encoding module 202 and a task sequence prediction module 203, in which:
the data processing module 201 segments the case description text to be analysed and performs named entity recognition, obtaining the sentence sequence, the event sequence and the named entities;
the fact encoding module 202 obtains the plurality of term vectors from the words of the sentence sequence, the event sequence and the named entities, encodes each term vector with the first recurrent neural network, and obtains the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix; the analysis tasks include the element judgment tasks and the charge prediction task; the elements are a plurality of legal elements relevant to judging the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task;
the task sequence prediction module 203 applies max pooling to the task text vectors of the element judgment tasks, obtaining the overall task text vector of the element judgment tasks, encodes the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task with the second recurrent neural network, obtaining the first hidden vector of the charge prediction task, and inputs the first hidden vector of the charge prediction task into the charge prediction model, obtaining the charge prediction result of the case description text to be analysed;
the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
Specifically, the data processing module 201 segments the case description text to be analysed and performs named entity recognition on the segmented words, obtaining the sentence sequence, the event sequence and the named entities.
After the sentence sequence, the event sequence and the named entities are obtained, the fact encoding module 202 generates, for each word of the sentence sequence, using any suitable term-vector model together with the event sequence and the named entities, a term vector sequence comprising the plurality of term vectors; it encodes each term vector of the term vector sequence with the first recurrent neural network, capturing the semantic information of the context of each sentence, the encoding result being the second hidden vector sequence or second hidden vector matrix; and through an attention mechanism it maps the second hidden vector sequence, according to the task hidden vectors and the correlation matrix, into the text space of each task, obtaining the task text vectors of the different analysis tasks.
The task sequence prediction module 203 applies max pooling to the task text vectors of the element judgment tasks, obtaining the overall task text vector of the element judgment tasks; it arranges the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task, in the order of element judgment followed by charge prediction, into a task sequence, captures the dependence between the analysis tasks with the second recurrent neural network, and encodes the overall task text vector of the element judgment tasks and the task text vector of the charge prediction task, obtaining the first hidden vector of the charge prediction task; it inputs the first hidden vector of the charge prediction task into the charge prediction model, which maps the first hidden vector of the charge prediction task into the label space of the charge prediction task and outputs the charge prediction result.
The legal case analysis device according to the embodiment of the present invention is used to execute the legal case analysis method of the above embodiments of the present invention; the specific methods and processes by which its modules realise the corresponding functions are detailed in the embodiments of the legal case analysis method above, and are not repeated here.
Since the legal case analysis device is used for the legal case analysis method of the foregoing embodiments, the descriptions and definitions in the legal case analysis method of the foregoing embodiments can be used for understanding the execution modules of the embodiment of the present invention.
Because the embodiment of the present invention analyses a legal case on the basis of the dependence between legal elements and the charge, it can distinguish cases with similar charges according to the elements, and it is applicable to the case facts of all cases rather than only to those of common charges, so that the accuracy of case analysis is greatly improved and a higher case coverage is achieved.
Fig. 3 is the structural block diagram according to electronic equipment provided in an embodiment of the present invention.Content based on the above embodiment, such as
Shown in Fig. 3, which may include: processor (processor) 301, memory (memory) 302 and bus 303;Its
Therein, the processor 301 and the memory 302 communicate with each other via the bus 303. The processor 301 is configured to call computer program instructions that are stored in the memory 302 and executable on the processor 301, so as to perform the legal case analysis method provided by the above method embodiments, for example: performing word segmentation and named entity recognition on the fact description text of a case to be analyzed, to obtain a sentence sequence, an event sequence and named entities; obtaining multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities; encoding each word vector using a first recurrent neural network, and obtaining, according to the encoding result, the task hidden vectors and the correlation matrix, a task text vector corresponding to each analysis task, wherein the analysis tasks include element judgment tasks and a charge prediction task, the elements are multiple legal elements relevant to judging the charge, the number of element judgment tasks is the same as the number of elements, each element judgment task corresponding to one legal element, and the number of task hidden vectors is the same as the number of analysis tasks, each task hidden vector corresponding to one analysis task; max-pooling the task text vectors corresponding to the element judgment tasks, to obtain an overall element task text vector; encoding, using a second recurrent neural network, the overall element task text vector and the task text vector corresponding to the charge prediction task, to obtain a first hidden vector corresponding to the charge prediction task; and inputting the first hidden vector corresponding to the charge prediction task into a charge prediction model, to obtain the charge prediction result for the fact description text of the case to be analyzed. The first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
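As an illustration only (not part of the claimed subject matter), the second stage of the method above can be sketched in numpy, with a plain tanh recurrence standing in for the second recurrent neural network and a linear-softmax layer standing in for the charge prediction model; all sizes, weights and the label count are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hidden size (illustrative)
n_elem = 3                              # number of element-judgment tasks

# Task text vectors as produced by the per-task attention step;
# random placeholders here.
elem_task_vecs = rng.normal(size=(n_elem, d))   # one per element task
charge_task_vec = rng.normal(size=d)            # charge-prediction task

# Max-pool the element task text vectors into one overall vector.
overall_elem_vec = elem_task_vecs.max(axis=0)

# A toy tanh recurrence stands in for the second recurrent network: it
# reads the overall element vector first, then the charge task vector,
# and its final state is the "first hidden vector" of the charge task.
W_in = rng.normal(0, 0.1, (d, d))
W_rec = rng.normal(0, 0.1, (d, d))
h = np.zeros(d)
for v in (overall_elem_vec, charge_task_vec):
    h = np.tanh(W_in @ v + W_rec @ h)
first_hidden = h

# A linear + softmax layer as a stand-in charge prediction model.
n_charges = 5                           # illustrative label count
W_out = rng.normal(0, 0.1, (n_charges, d))
logits = W_out @ first_hidden
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(int(probs.argmax()))              # index of the predicted charge
```

In the trained system each of these placeholders is a learned component; the sketch only fixes the data flow of max pooling, sequence encoding and final classification.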
Another embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transient computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to perform the legal case analysis method provided by the above method embodiments, for example: performing word segmentation and named entity recognition on the fact description text of a case to be analyzed, to obtain a sentence sequence, an event sequence and named entities; obtaining multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities; encoding each word vector using a first recurrent neural network, and obtaining, according to the encoding result, the task hidden vectors and the correlation matrix, a task text vector corresponding to each analysis task, wherein the analysis tasks include element judgment tasks and a charge prediction task, the elements are multiple legal elements relevant to judging the charge, the number of element judgment tasks is the same as the number of elements, each element judgment task corresponding to one legal element, and the number of task hidden vectors is the same as the number of analysis tasks, each task hidden vector corresponding to one analysis task; max-pooling the task text vectors corresponding to the element judgment tasks, to obtain an overall element task text vector; encoding, using a second recurrent neural network, the overall element task text vector and the task text vector corresponding to the charge prediction task, to obtain a first hidden vector corresponding to the charge prediction task; and inputting the first hidden vector corresponding to the charge prediction task into a charge prediction model, to obtain the charge prediction result for the fact description text of the case to be analyzed. The first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
In addition, the logic instructions in the memory 302 described above may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Another embodiment of the present invention provides a non-transient computer-readable storage medium storing computer instructions. The computer instructions cause a computer to perform the legal case analysis method provided by the above method embodiments, for example: performing word segmentation and named entity recognition on the fact description text of a case to be analyzed, to obtain a sentence sequence, an event sequence and named entities; obtaining multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities; encoding each word vector using a first recurrent neural network, and obtaining, according to the encoding result, the task hidden vectors and the correlation matrix, a task text vector corresponding to each analysis task, wherein the analysis tasks include element judgment tasks and a charge prediction task, the elements are multiple legal elements relevant to judging the charge, the number of element judgment tasks is the same as the number of elements, each element judgment task corresponding to one legal element, and the number of task hidden vectors is the same as the number of analysis tasks, each task hidden vector corresponding to one analysis task; max-pooling the task text vectors corresponding to the element judgment tasks, to obtain an overall element task text vector; encoding, using a second recurrent neural network, the overall element task text vector and the task text vector corresponding to the charge prediction task, to obtain a first hidden vector corresponding to the charge prediction task; and inputting the first hidden vector corresponding to the charge prediction task into a charge prediction model, to obtain the charge prediction result for the fact description text of the case to be analyzed. The first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the method of each embodiment, or certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A legal case analysis method, characterized by comprising:
performing word segmentation and named entity recognition on the fact description text of a case to be analyzed, to obtain a sentence sequence, an event sequence and named entities;
obtaining multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities, encoding each word vector using a first recurrent neural network, and obtaining, according to the encoding result, the task hidden vectors and the correlation matrix, a task text vector corresponding to each analysis task; wherein the analysis tasks include element judgment tasks and a charge prediction task; the elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks is the same as the number of elements, each element judgment task corresponding to one legal element; the number of task hidden vectors is the same as the number of analysis tasks, each task hidden vector corresponding to one analysis task;
max-pooling the task text vectors corresponding to the element judgment tasks, to obtain an overall element task text vector; encoding, using a second recurrent neural network, the overall element task text vector and the task text vector corresponding to the charge prediction task, to obtain a first hidden vector corresponding to the charge prediction task; and inputting the first hidden vector corresponding to the charge prediction task into a charge prediction model, to obtain the charge prediction result for the fact description text of the case to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
2. The legal case analysis method according to claim 1, wherein the analysis tasks further include a relevant-law-article prediction task and a penalty duration prediction task;
correspondingly, after obtaining the overall element task text vector, the method comprises:
encoding, using the second recurrent neural network, the overall element task text vector and the task text vectors corresponding to the charge prediction task, the relevant-law-article prediction task and the penalty duration prediction task, to obtain the first hidden vectors corresponding to the charge prediction task, the relevant-law-article prediction task and the penalty duration prediction task;
inputting the first hidden vectors corresponding to the charge prediction task, the relevant-law-article prediction task and the penalty duration prediction task into the charge prediction model, a relevant-law-article prediction model and a penalty duration prediction model respectively, to obtain the charge prediction result, the relevant-law-article prediction result and the penalty duration prediction result for the fact description text of the case to be analyzed;
wherein the relevant-law-article prediction model and the penalty duration prediction model are both obtained by training on the sample legal documents.
3. The legal case analysis method according to claim 1, wherein, after obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix, the method further comprises:
inputting the task text vector corresponding to each element judgment task into the element judgment model corresponding to that task, to obtain the result of that element judgment task;
wherein the element judgment model corresponding to each element judgment task is obtained by training on the sample legal documents.
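Outside the claim language, the per-element judgment models of claim 3 could be as simple as one independent binary classifier per legal element; the following sketch assumes sigmoid classifiers with arbitrary sizes and weights (the claim does not fix the model family):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_elem = 8, 3                        # illustrative sizes

# Task text vectors for the element-judgment tasks (placeholders for
# the attention output described in claim 6).
elem_task_vecs = rng.normal(size=(n_elem, d))

# One logistic classifier per element: "does this legal element hold?"
W = rng.normal(0, 0.1, (n_elem, d))     # one weight vector per element model
b = np.zeros(n_elem)

scores = np.einsum("ed,ed->e", W, elem_task_vecs) + b
probs = 1.0 / (1.0 + np.exp(-scores))   # sigmoid per element
judgments = probs > 0.5                 # boolean result per element task
print(judgments.shape)
```

Because every element has its own model and its own task text vector, the element results can be computed independently and in parallel.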
4. The legal case analysis method according to claim 1, wherein the specific steps of performing word segmentation and named entity recognition on the fact description text of the case to be analyzed, to obtain the sentence sequence, the event sequence and the named entities, comprise:
performing word segmentation and part-of-speech tagging on the fact description text of the case to be analyzed and deleting stop words, to obtain multiple sentences, each sentence containing several words and the part of speech corresponding to each word;
screening the multiple sentences according to a pre-built trigger vocabulary, and retaining the sentences that describe material facts related to the case, to form the sentence sequence;
obtaining, according to preset rules, syntactic dependency relations, and the words and corresponding parts of speech contained in each sentence of the sentence sequence, the events and named entities described by the fact description text of the case to be analyzed, and arranging the events in chronological order of occurrence to form the event sequence.
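Purely as an illustration of the preprocessing steps in claim 4 (trigger-vocabulary filtering and chronological event ordering), here is a toy English-language sketch; the sentences, stop-word list and trigger vocabulary are invented, and a real system would use a Chinese segmenter, POS tagger and dependency parser as the claim describes:

```python
import re

# Toy fact description already split into sentences.
sentences = [
    "the defendant entered the shop at 21:05",
    "the weather that evening was mild",
    "the defendant stole a wallet at 21:10",
    "the defendant fled the scene at 21:12",
]
stop_words = {"the", "a", "that", "at"}
trigger_vocab = {"entered", "stole", "fled"}   # verbs signalling case events

def tokens(s):
    # crude tokenizer + stop-word removal (stands in for segmentation)
    return [w for w in re.findall(r"[a-z0-9:]+", s.lower())
            if w not in stop_words]

# Keep only sentences containing a trigger word (material case facts).
sentence_sequence = [s for s in sentences if trigger_vocab & set(tokens(s))]

# Extract (time, trigger) events and sort chronologically.
events = []
for s in sentence_sequence:
    time = re.search(r"\d\d:\d\d", s).group()
    trig = next(w for w in tokens(s) if w in trigger_vocab)
    events.append((time, trig))
event_sequence = sorted(events)          # "HH:MM" strings sort chronologically
print(event_sequence)
```

The non-material sentence about the weather is dropped by the trigger filter, and the surviving events are ordered by their time of occurrence, mirroring the two screening steps of the claim.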
5. The legal case analysis method according to claim 1, wherein the specific steps of obtaining the multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities comprise:
splicing the words contained in the sentence sequence according to the chronological order of occurrence of the events in the event sequence, to obtain a word sequence;
mapping the word sequence according to a word vector table obtained by pre-training, to obtain the original word vector of each word contained in the sentence sequence;
for each word contained in the sentence sequence, extending the original word vector of the word according to the event described by the sentence in which the word is located and according to whether the word is a named entity, to obtain the word vector corresponding to the word, thereby obtaining the multiple word vectors.
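One plausible reading of the extension step in claim 5 is to append an event indicator and a named-entity flag to each pre-trained embedding; the sketch below makes that concrete, with the vocabulary, embedding table and feature layout all being illustrative assumptions rather than the patent's actual encoding:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative pre-trained word-vector table (in practice obtained by
# pre-training on a legal corpus).
vocab = {"defendant": 0, "stole": 1, "wallet": 2}
emb_dim = 4
emb_table = rng.normal(size=(len(vocab), emb_dim))

def word_vector(word, event_id, n_events, is_entity):
    """Original embedding extended with a one-hot indicator of the event
    described by the word's sentence, plus a named-entity flag."""
    base = emb_table[vocab[word]]
    event_onehot = np.zeros(n_events)
    event_onehot[event_id] = 1.0
    return np.concatenate([base, event_onehot, [float(is_entity)]])

# "stole" occurs in the sentence describing event 1 of 3, not an entity.
v = word_vector("stole", event_id=1, n_events=3, is_entity=False)
print(v.shape)                           # emb_dim + n_events + 1 dimensions
```

The extension leaves the pre-trained part of the vector untouched, so the recurrent encoder still sees the distributional semantics of each word alongside the event and entity information.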
6. The legal case analysis method according to claim 1, wherein the specific steps of obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix comprise:
for each analysis task, obtaining the weights corresponding to the encoding result according to the encoding result, the task hidden vector corresponding to the analysis task and the correlation matrix, and computing a weighted sum of the encoding result according to those weights, to obtain the task text vector corresponding to the analysis task.
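A bilinear score is one common way to combine an encoder state, a task hidden vector and a correlation matrix into attention weights; the claim does not fix the exact formula, so the score used in this numpy sketch (and all sizes and weights) should be read as an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
T, d = 6, 8                              # 6 encoder states, hidden size 8

H = rng.normal(size=(T, d))              # encoding result: one state per word
u = rng.normal(size=d)                   # hidden vector of one analysis task
M = rng.normal(0, 0.1, (d, d))           # learned correlation matrix

# One bilinear score per encoder state, softmax-normalized into weights.
scores = H @ M @ u
scores -= scores.max()                   # numerical stability
alpha = np.exp(scores) / np.exp(scores).sum()

# Weighted sum of the encoder states -> task text vector for this task.
task_text_vec = alpha @ H
print(task_text_vec.shape)
```

Running this once per analysis task, with that task's own hidden vector `u`, yields one task text vector per task from the same shared encoding result, which is the multi-task sharing the claim relies on.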
7. The legal case analysis method according to any one of claims 1 to 6, wherein the first recurrent neural network is a long short-term memory (LSTM) network, and the second recurrent neural network is a long short-term memory network.
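Claim 7 fixes both recurrent networks to LSTMs. A minimal LSTM cell in numpy (sizes and random initialization are illustrative) shows the gating structure that the claim refers to:

```python
import numpy as np

rng = np.random.default_rng(4)
d_in, d_h = 4, 3                         # illustrative input/hidden sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate: input (i), forget (f), output (o),
# candidate (g); each acts on the concatenated [input, hidden] vector.
W = {k: rng.normal(0, 0.1, (d_h, d_in + d_h)) for k in "ifog"}
b = {k: np.zeros(d_h) for k in "ifog"}

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z + b["i"])     # input gate
    f = sigmoid(W["f"] @ z + b["f"])     # forget gate
    o = sigmoid(W["o"] @ z + b["o"])     # output gate
    g = np.tanh(W["g"] @ z + b["g"])     # candidate cell state
    c = f * c + i * g                    # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

h = np.zeros(d_h)
c = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):     # run over a 5-step input sequence
    h, c = lstm_step(x, h, c)
print(h.shape)
```

The separate cell state `c` with its forget gate is what lets an LSTM carry information across a long fact description, which plain recurrences like the toy ones in the earlier sketches cannot do reliably.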
8. A legal case analysis apparatus, characterized by comprising:
a data processing module, configured to perform word segmentation and named entity recognition on the fact description text of a case to be analyzed, to obtain a sentence sequence, an event sequence and named entities;
a fact encoding module, configured to obtain multiple word vectors according to the words contained in the sentence sequence, the event sequence and the named entities, encode each word vector using a first recurrent neural network, and obtain, according to the encoding result, the task hidden vectors and the correlation matrix, a task text vector corresponding to each analysis task; wherein the analysis tasks include element judgment tasks and a charge prediction task; the elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks is the same as the number of elements, each element judgment task corresponding to one legal element; the number of task hidden vectors is the same as the number of analysis tasks, each task hidden vector corresponding to one analysis task;
a task sequence prediction module, configured to max-pool the task text vectors corresponding to the element judgment tasks to obtain an overall element task text vector, encode, using a second recurrent neural network, the overall element task text vector and the task text vector corresponding to the charge prediction task, to obtain a first hidden vector corresponding to the charge prediction task, and input the first hidden vector corresponding to the charge prediction task into a charge prediction model, to obtain the charge prediction result for the fact description text of the case to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the legal case analysis method according to any one of claims 1 to 7.
10. A non-transient computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the legal case analysis method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910379141.1A CN110276068B (en) | 2019-05-08 | 2019-05-08 | Legal case analysis method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910379141.1A CN110276068B (en) | 2019-05-08 | 2019-05-08 | Legal case analysis method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276068A true CN110276068A (en) | 2019-09-24 |
CN110276068B CN110276068B (en) | 2020-08-28 |
Family
ID=67959767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910379141.1A Active CN110276068B (en) | 2019-05-08 | 2019-05-08 | Legal case analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276068B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110928987A (en) * | 2019-10-18 | 2020-03-27 | 平安科技(深圳)有限公司 | Legal provision retrieval method based on neural network hybrid model and related equipment |
CN111325387A (en) * | 2020-02-13 | 2020-06-23 | 清华大学 | Interpretable law automatic decision prediction method and device |
CN111382333A (en) * | 2020-03-11 | 2020-07-07 | 昆明理工大学 | Case element extraction method in news text sentence based on case correlation joint learning and graph convolution |
CN111460834A (en) * | 2020-04-09 | 2020-07-28 | 北京北大软件工程股份有限公司 | French semantic annotation method and device based on L STM network |
CN111523313A (en) * | 2020-07-03 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Model training and named entity recognition method and device |
CN111552808A (en) * | 2020-04-20 | 2020-08-18 | 北京北大软件工程股份有限公司 | Administrative illegal case law prediction method and tool based on convolutional neural network |
CN111797221A (en) * | 2020-06-16 | 2020-10-20 | 北京北大软件工程股份有限公司 | Similar case recommendation method and device |
CN112100212A (en) * | 2020-09-04 | 2020-12-18 | 中国航天科工集团第二研究院 | Case scenario extraction method based on machine learning and rule matching |
CN113157880A (en) * | 2021-03-25 | 2021-07-23 | 科大讯飞股份有限公司 | Element content obtaining method, device, equipment and storage medium |
US11256856B2 (en) | 2017-10-17 | 2022-02-22 | Handycontract Llc | Method, device, and system, for identifying data elements in data structures |
US11475209B2 (en) | 2017-10-17 | 2022-10-18 | Handycontract Llc | Device, system, and method for extracting named entities from sectioned documents |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239445A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | The method and system that a kind of media event based on neutral net is extracted |
CN107818138A (en) * | 2017-09-28 | 2018-03-20 | 银江股份有限公司 | A kind of case legal regulation recommends method and system |
CN108009284A (en) * | 2017-12-22 | 2018-05-08 | 重庆邮电大学 | Using the Law Text sorting technique of semi-supervised convolutional neural networks |
CN108304911A (en) * | 2018-01-09 | 2018-07-20 | 中国科学院自动化研究所 | Knowledge Extraction Method and system based on Memory Neural Networks and equipment |
WO2018147653A1 (en) * | 2017-02-08 | 2018-08-16 | 사회복지법인 삼성생명공익재단 | Method, device and computer program for generating survival rate prediction model |
CN109308355A (en) * | 2018-09-17 | 2019-02-05 | 清华大学 | Legal decision prediction of result method and device |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018147653A1 (en) * | 2017-02-08 | 2018-08-16 | 사회복지법인 삼성생명공익재단 | Method, device and computer program for generating survival rate prediction model |
CN107239445A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | The method and system that a kind of media event based on neutral net is extracted |
CN107818138A (en) * | 2017-09-28 | 2018-03-20 | 银江股份有限公司 | A kind of case legal regulation recommends method and system |
CN108009284A (en) * | 2017-12-22 | 2018-05-08 | 重庆邮电大学 | Using the Law Text sorting technique of semi-supervised convolutional neural networks |
CN108304911A (en) * | 2018-01-09 | 2018-07-20 | 中国科学院自动化研究所 | Knowledge Extraction Method and system based on Memory Neural Networks and equipment |
CN109308355A (en) * | 2018-09-17 | 2019-02-05 | 清华大学 | Legal decision prediction of result method and device |
Non-Patent Citations (1)
Title |
---|
LIU Zonglin et al.: "A Multi-task Learning Model for Legal Judgment Prediction Incorporating Charge Keywords", Journal of Tsinghua University (Science and Technology) *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11475209B2 (en) | 2017-10-17 | 2022-10-18 | Handycontract Llc | Device, system, and method for extracting named entities from sectioned documents |
US11256856B2 (en) | 2017-10-17 | 2022-02-22 | Handycontract Llc | Method, device, and system, for identifying data elements in data structures |
WO2021072892A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Legal provision search method based on neural network hybrid model, and related device |
CN110928987A (en) * | 2019-10-18 | 2020-03-27 | 平安科技(深圳)有限公司 | Legal provision retrieval method based on neural network hybrid model and related equipment |
CN110928987B (en) * | 2019-10-18 | 2023-07-25 | 平安科技(深圳)有限公司 | Legal provision retrieval method and related equipment based on neural network hybrid model |
CN111325387A (en) * | 2020-02-13 | 2020-06-23 | 清华大学 | Interpretable law automatic decision prediction method and device |
CN111325387B (en) * | 2020-02-13 | 2023-08-18 | 清华大学 | Interpretable law automatic decision prediction method and device |
CN111382333A (en) * | 2020-03-11 | 2020-07-07 | 昆明理工大学 | Case element extraction method in news text sentence based on case correlation joint learning and graph convolution |
CN111382333B (en) * | 2020-03-11 | 2022-06-21 | 昆明理工大学 | Case element extraction method in news text sentence based on case correlation joint learning and graph convolution |
CN111460834A (en) * | 2020-04-09 | 2020-07-28 | 北京北大软件工程股份有限公司 | French semantic annotation method and device based on L STM network |
CN111460834B (en) * | 2020-04-09 | 2023-06-06 | 北京北大软件工程股份有限公司 | French semantic annotation method and device based on LSTM network |
CN111552808A (en) * | 2020-04-20 | 2020-08-18 | 北京北大软件工程股份有限公司 | Administrative illegal case law prediction method and tool based on convolutional neural network |
CN111797221A (en) * | 2020-06-16 | 2020-10-20 | 北京北大软件工程股份有限公司 | Similar case recommendation method and device |
CN111797221B (en) * | 2020-06-16 | 2023-12-08 | 北京北大软件工程股份有限公司 | Similar case recommending method and device |
CN111523313A (en) * | 2020-07-03 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Model training and named entity recognition method and device |
CN111523313B (en) * | 2020-07-03 | 2020-09-29 | 支付宝(杭州)信息技术有限公司 | Model training and named entity recognition method and device |
CN112100212A (en) * | 2020-09-04 | 2020-12-18 | 中国航天科工集团第二研究院 | Case scenario extraction method based on machine learning and rule matching |
CN113157880B (en) * | 2021-03-25 | 2023-01-17 | 科大讯飞股份有限公司 | Element content obtaining method, device, equipment and storage medium |
CN113157880A (en) * | 2021-03-25 | 2021-07-23 | 科大讯飞股份有限公司 | Element content obtaining method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110276068B (en) | 2020-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276068A (en) | Law merit analysis method and device | |
CN110825901A (en) | Image-text matching method, device and equipment based on artificial intelligence and storage medium | |
ALRashdi et al. | Deep learning and word embeddings for tweet classification for crisis response | |
CN108197098A (en) | A kind of generation of keyword combined strategy and keyword expansion method, apparatus and equipment | |
CN111209384A (en) | Question and answer data processing method and device based on artificial intelligence and electronic equipment | |
CN108335693B (en) | Language identification method and language identification equipment | |
CN108549658A (en) | A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree | |
CN111522987A (en) | Image auditing method and device and computer readable storage medium | |
CN111563158B (en) | Text ranking method, ranking apparatus, server and computer-readable storage medium | |
CN110457585B (en) | Negative text pushing method, device and system and computer equipment | |
CN112507912B (en) | Method and device for identifying illegal pictures | |
CN110188195A (en) | A kind of text intension recognizing method, device and equipment based on deep learning | |
CN108229527A (en) | Training and video analysis method and apparatus, electronic equipment, storage medium, program | |
Pardos et al. | Imputing KCs with representations of problem content and context | |
Altadmri et al. | A framework for automatic semantic video annotation: Utilizing similarity and commonsense knowledge bases | |
CN108268629A (en) | Image Description Methods and device, equipment, medium, program based on keyword | |
CN108229170A (en) | Utilize big data and the software analysis method and device of neural network | |
CN109271624A (en) | A kind of target word determines method, apparatus and storage medium | |
CN110287314A (en) | Long text credibility evaluation method and system based on Unsupervised clustering | |
CN111985207A (en) | Method and device for acquiring access control policy and electronic equipment | |
CN114372532A (en) | Method, device, equipment, medium and product for determining label marking quality | |
CN112818212B (en) | Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium | |
Sethi et al. | Large-scale multimedia content analysis using scientific workflows | |
CN115329176A (en) | Search request processing method and device, computer equipment and storage medium | |
O'Keefe et al. | Deep learning and word embeddings for tweet classification for crisis response |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||