
CN110276068A - Legal case analysis method and device - Google Patents

Legal case analysis method and device

Info

Publication number
CN110276068A
Authority
CN
China
Prior art keywords
task
case
vector
prediction
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910379141.1A
Other languages
Chinese (zh)
Other versions
CN110276068B (en)
Inventor
肖朝军
钟皓曦
曾国洋
刘知远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201910379141.1A
Publication of CN110276068A
Application granted
Publication of CN110276068B
Active legal status
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a legal case analysis method and device. The method includes: performing word segmentation and named entity recognition on a case description text to be analyzed to obtain a sentence sequence; obtaining multiple word vectors from the words in the sentence sequence, encoding each word vector with a first recurrent neural network, and obtaining a task text vector for each analysis task; applying max pooling to the task text vectors of the element judgment tasks to obtain an overall task text vector for element judgment; encoding, with a second recurrent neural network, the overall task text vector and the task text vector of the charge (cause-of-action) prediction task to obtain the first hidden vector of the charge prediction task; and inputting this first hidden vector into a charge prediction model to obtain the charge prediction result. The legal case analysis method and device provided by embodiments of the present invention can improve analysis accuracy.

Description

Legal case analysis method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a legal case analysis method and device.
Background technique
With the rapid development of artificial intelligence, applying AI to assist the judicial domain has become an inevitable trend of the times. In recent years there has been a great deal of cross-disciplinary research between artificial intelligence and law. In the last century, many scholars applied mathematical statistics and keyword-matching algorithms to analyze the facts of legal cases. As machine learning developed, more scholars began extracting text features manually in order to analyze case facts automatically. With the rapid development of deep learning, many scholars have focused on using neural networks to extract the information contained in text, further improving the quality of case analysis. However, these methods generally cannot handle the real-world situation in which the distribution of cases is extremely unbalanced and similar charges are easily confused. In practice, many charges and law articles appear very infrequently, and traditional deep learning models cannot analyze such cases accurately. In other words, traditional deep learning methods can only analyze the case facts of the most common charges and causes of action, and the prior art cannot distinguish cases with similar charges well, so it has poor practicality.
In summary, the existing technology can only analyze the case facts of a few high-frequency charges and cannot distinguish cases with similar charges; its accuracy and coverage for case analysis are therefore both low.
Summary of the invention
Embodiments of the present invention provide a legal case analysis method and device to solve, or at least partially solve, the defect of low accuracy in existing legal case analysis methods.
In a first aspect, an embodiment of the present invention provides a legal case analysis method, comprising:
performing word segmentation and named entity recognition on a case description text to be analyzed to obtain a sentence sequence, an event sequence and named entities;
obtaining multiple word vectors from the words in the sentence sequence, the event sequence and the named entities; encoding each word vector with a first recurrent neural network; and obtaining a task text vector for each analysis task from the encoding result, the task hidden vectors and a correlation matrix; wherein the analysis tasks include element judgment tasks and a charge prediction task; the elements are multiple legal elements relevant to determining the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; and the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task;
applying max pooling to the task text vectors of the element judgment tasks to obtain an overall task text vector for element judgment; encoding, with a second recurrent neural network, the overall task text vector and the task text vector of the charge prediction task to obtain the first hidden vector of the charge prediction task; and inputting this first hidden vector into a charge prediction model to obtain the charge prediction result of the case description text to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
Preferably, the analysis tasks further include a related-law-article prediction task and a duration prediction task.
Correspondingly, after the overall task text vector of the element judgment tasks is obtained:
the second recurrent neural network encodes the overall task text vector of the element judgment tasks together with the task text vectors of the charge prediction task, the related-law-article prediction task and the duration prediction task, obtaining the first hidden vectors of the charge prediction, related-law-article prediction and duration prediction tasks;
the first hidden vectors of the charge prediction, related-law-article prediction and duration prediction tasks are input into a charge prediction model, a related-law-article prediction model and a duration prediction model respectively, obtaining the charge prediction result, the related-law-article prediction result and the duration prediction result of the case description text to be analyzed;
wherein the related-law-article prediction model and the duration prediction model are both obtained by training on the sample legal documents.
Preferably, after the task text vector of each analysis task is obtained from the encoding result, the task hidden vectors and the correlation matrix, the method further includes:
inputting the task text vector of each element judgment task into the corresponding element judgment model to obtain the result of that element judgment task;
wherein each element judgment model is obtained by training on the sample legal documents.
Preferably, the specific steps of performing word segmentation and named entity recognition on the case description text to obtain the sentence sequence, the event sequence and the named entities include:
performing word segmentation and part-of-speech tagging on the case description text and deleting stop words to obtain multiple sentences, each sentence containing several words and the part of speech of each word;
screening the sentences against a pre-built trigger vocabulary, retaining the sentences that describe important case-related facts, and forming the sentence sequence;
obtaining the events described by the case description text and the named entities from preset rules, syntactic dependencies, and the words and corresponding parts of speech of each sentence in the sentence sequence, and arranging the events in chronological order of occurrence to form the event sequence.
Preferably, the specific steps of obtaining multiple word vectors from the words in the sentence sequence, the event sequence and the named entities include:
splicing the words of the sentence sequence in the chronological order of the events in the event sequence to obtain a word sequence;
mapping the word sequence through a pre-trained word vector table to obtain the original word vector of each word in the sentence sequence;
for each word in the sentence sequence, extending its original word vector according to the event described by the sentence containing the word and whether the word is a named entity, obtaining the word vector of that word and thereby the multiple word vectors.
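The extension step above can be sketched as follows: each word's pretrained embedding is concatenated with a one-hot indicator of its sentence's event type and an is-named-entity flag. This is a minimal illustration; the embedding dimension, the vocabularies and the feature layout are assumptions for the sketch, not values taken from the patent.

```python
import numpy as np

# Toy pretrained word-vector table (stand-in for Word2vec/GloVe/FastText output).
EMBED = {"stole": np.array([0.2, -0.1]), "phone": np.array([0.5, 0.3])}
EVENT_TYPES = ["theft", "assault"]  # one-hot slots for the sentence's event

def extend_word_vector(word, sentence_event, is_entity):
    """Concatenate the original embedding with event and entity features."""
    base = EMBED.get(word, np.zeros(2))          # original word vector lookup
    event_onehot = np.array([1.0 if sentence_event == e else 0.0
                             for e in EVENT_TYPES])
    entity_flag = np.array([1.0 if is_entity else 0.0])
    return np.concatenate([base, event_onehot, entity_flag])

v = extend_word_vector("stole", "theft", is_entity=False)
print(v.shape)  # (5,): 2 embedding dims + 2 event slots + 1 entity flag
```

Under this layout, every word in the sentence sequence yields one extended vector, and together they form the multiple word vectors fed to the first recurrent neural network.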
Preferably, the specific steps of obtaining the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix include:
for each analysis task, obtaining a weight for each encoding result from the encoding result, the task hidden vector of that analysis task and the correlation matrix, and computing a weighted sum of the encoding results with these weights to obtain the task text vector of that analysis task.
Preferably, the first recurrent neural network is a long short-term memory (LSTM) network, and the second recurrent neural network is also an LSTM network.
In a second aspect, an embodiment of the present invention provides a legal case analysis device, comprising:
a data processing module for performing word segmentation and named entity recognition on the case description text to be analyzed, obtaining the sentence sequence, the event sequence and the named entities;
a fact encoding module for obtaining multiple word vectors from the words in the sentence sequence, the event sequence and the named entities, encoding each word vector with the first recurrent neural network, and obtaining the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix; wherein the analysis tasks include the element judgment tasks and the charge prediction task, the elements are multiple legal elements relevant to determining the charge, the number of element judgment tasks equals the number of elements with each element judgment task corresponding to one legal element, and the number of task hidden vectors equals the number of analysis tasks with each task hidden vector corresponding to one analysis task;
a task sequence prediction module for applying max pooling to the task text vectors of the element judgment tasks to obtain the overall task text vector of element judgment, encoding the overall task text vector and the task text vector of the charge prediction task with the second recurrent neural network to obtain the first hidden vector of the charge prediction task, and inputting this first hidden vector into the charge prediction model to obtain the charge prediction result of the case description text to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the legal case analysis method provided by any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the legal case analysis method provided by any possible implementation of the first aspect.
The legal case analysis method and device provided by embodiments of the present invention analyze a legal case based on the dependencies between legal elements and the charge. Cases with similar charges can be distinguished by their elements, and the method applies to the case facts of all charges rather than being limited to the common ones, thereby greatly improving the accuracy of case analysis and achieving higher case coverage.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the legal case analysis method provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the legal case analysis device provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To overcome the above problems of the prior art, embodiments of the present invention provide a legal case analysis method and device. The inventive concept is to analyze the multiple legal elements relevant to determining the charge with trained models, and then obtain a more accurate charge prediction result from the element analysis results and a machine learning model.
Fig. 1 is a flow diagram of the legal case analysis method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes: Step S101, performing word segmentation and named entity recognition on the case description text to be analyzed, obtaining a sentence sequence, an event sequence and named entities.
Specifically, the case description text to be analyzed is a passage describing the facts of a case.
Each sentence in the sentence sequence is a word sequence, obtained by segmenting one sentence of the case description text (a sentence delimited by a comma, semicolon or full stop).
For Chinese text, any existing Chinese word segmentation package can be used, such as the open-source package thulac.
For each sentence in the sentence sequence, if the sentence contains certain specific words, the event the sentence describes can be obtained, and thereby all events contained in the sentence sequence.
For example, if a sentence in the sentence sequence contains the word "hit", the sentence describes an assault event.
Named entities include at least person names, place names and organization names. These have distinctive textual features, so each named entity among the words of the sentence sequence can be extracted.
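Step S101 can be sketched as follows, with a hand-made trigger vocabulary and a fixed entity list standing in for a real segmenter (such as thulac) and a real NER model. The vocabularies, the English sample text and the whitespace "segmentation" are all illustrative assumptions for the sketch.

```python
# trigger word -> event type, and a toy gazetteer of known named entities
TRIGGER_EVENTS = {"hit": "assault", "stole": "theft"}
KNOWN_ENTITIES = {"Zhang San": "PERSON", "Beijing": "PLACE"}

def analyze_fact_text(text):
    """Split a fact description into sentences, keep those containing an
    event trigger, and collect the named entities that appear."""
    sentences = [s.strip() for s in text.replace(";", ".").split(".") if s.strip()]
    sentence_seq, event_seq, entities = [], [], []
    for sent in sentences:
        words = sent.split()  # stand-in for real Chinese word segmentation
        events = [TRIGGER_EVENTS[w] for w in words if w in TRIGGER_EVENTS]
        if events:            # retain only sentences describing case facts
            sentence_seq.append(words)
            event_seq.extend(events)
        for name, kind in KNOWN_ENTITIES.items():
            if name in sent:
                entities.append((name, kind))
    return sentence_seq, event_seq, entities

sents, events, ents = analyze_fact_text(
    "Zhang San stole a phone in Beijing. He hit the owner; then he fled.")
print(events)  # ['theft', 'assault']
```

The sentence without any trigger ("then he fled") is screened out, mirroring the trigger-vocabulary filtering step described above; the retained events are already in textual (roughly chronological) order.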
Step S102: obtain multiple word vectors from the words in the sentence sequence, the event sequence and the named entities; encode each word vector with the first recurrent neural network; and obtain the task text vector of each analysis task from the encoding result, the task hidden vectors and the correlation matrix.
The analysis tasks include the element judgment tasks and the charge prediction task. The elements are multiple legal elements relevant to determining the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task.
Specifically, a deep learning model reads the serialized words in the form of a word vector sequence. After the sentence sequence, the event sequence and the named entities are obtained, a word vector sequence can be generated for the words of the sentence sequence using any model for generating word vectors, combined with the event sequence and the named entities. The word vector sequence contains multiple word vectors, each corresponding to one word of the sentence sequence.
The model for generating word vectors can be any of Word2vec, GloVe, FastText and the like; the embodiments of the present invention do not specifically limit this.
After the word vector sequence is obtained, the first recurrent neural network encodes each word vector in the sequence, capturing the contextual semantic information of the sentences. The encoding result is a second hidden vector sequence (or second hidden vector matrix) whose length equals that of the word vector sequence; that is, the number of second hidden vectors equals the number of words in the sentence sequence.
Each word vector input to the first recurrent neural network yields a new output vector, called a second hidden vector.
To obtain text vectors relevant to each analysis task, an attention mechanism maps the second hidden vector sequence into different task text spaces, yielding the task text vector of each analysis task.
The analysis tasks include at least the element judgment tasks and the charge prediction task. Since the elements are multiple legal elements relevant to determining the charge, there are multiple element judgment tasks, each used to judge and predict the value of a different legal element. The elements are predetermined: there are as many element judgment tasks as there are elements.
For example, for criminal cases the elements may include ten items such as profit motive, buying and selling, death, violence, state organ or state functionary, public place, illegal possession, injury, subjective intent, and during production or operation.
The meanings of these ten elements are as follows: profit motive, whether the defendant (or suspect) acted for the purpose of profit; buying and selling, whether the defendant's conduct involved purchase and sale; death, whether the victim died; violence, whether the defendant committed the crime by violent means; state organ or state functionary, whether the case involves a state organ or a state functionary; public place, whether the case occurred in a public place; illegal possession, whether the defendant acted for the purpose of illegal possession; injury, whether the victim was injured; subjective intent, whether the defendant committed the crime intentionally; during production or operation, whether the case occurred in the course of production or operation.
For different types of administrative cases (such as public security cases, traffic violation cases and industrial and commercial administration cases), corresponding elements can be used to determine the charge.
It can be understood that each element judgment task has a corresponding task text vector.
To implement the attention mechanism, a task hidden vector is defined for each analysis task; thus the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task. The task hidden vector serves as the query vector.
The correlation matrix represents the degree of correlation between the encoding results and each task hidden vector.
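This attention step can be sketched in numpy: for an analysis task with query vector u and correlation matrix M, each second hidden vector h_i is scored by u^T M h_i, the scores are softmax-normalized into weights, and the task text vector is the weighted sum of the hidden vectors. The shapes, the random inputs and the identity choice of M are illustrative assumptions, not values from the patent.

```python
import numpy as np

def task_text_vector(H, u, M):
    """H: (n, d) second hidden vectors; u: (d,) task hidden (query) vector;
    M: (d, d) correlation matrix. Returns the (d,) task text vector."""
    scores = H @ M.T @ u              # relevance of each position to the task
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()              # softmax over the n positions
    return alpha @ H                  # weighted sum of encoder outputs

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))           # 6 encoded words, hidden size 4
u = rng.normal(size=4)                # query vector for one analysis task
M = np.eye(4)                         # identity correlation as a toy choice
t = task_text_vector(H, u, M)
print(t.shape)  # (4,)
```

Running this once per analysis task (each with its own u) produces the per-task text vectors used in the following steps.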
The charge prediction task is used to predict the charge of the case.
Step S103: apply max pooling to the task text vectors of the element judgment tasks to obtain their overall task text vector; encode, with the second recurrent neural network, the overall task text vector and the task text vector of the charge prediction task to obtain the first hidden vector of the charge prediction task; and input this first hidden vector into the charge prediction model to obtain the charge prediction result of the case description text to be analyzed.
Since there are multiple element judgment task text vectors, for ease of charge prediction they are combined into a single overall task text vector by element-wise max pooling:
t_attr = max_pooling([t_1, t_2, ..., t_k])
t_attr,i = max(t_1,i, t_2,i, ..., t_k,i)
where t_attr is the overall task text vector of the element judgment tasks; t_1, t_2, ..., t_k are the task text vectors of the individual element judgment tasks; k is a positive integer, the number of element judgment tasks; t_attr,i is the i-th component of t_attr, with 1 ≤ i ≤ d1, where d1 is the dimension of a task text vector; and t_1,i, t_2,i, ..., t_k,i are the i-th components of the element judgment task text vectors.
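The element-wise max pooling above reduces to two lines of numpy: stack the k element-judgment task vectors and take the per-dimension maximum. The sample vectors below are illustrative.

```python
import numpy as np

def overall_task_vector(task_vectors):
    """task_vectors: list of k vectors t_1..t_k, each of dimension d1.
    Returns t_attr with t_attr[i] = max_j t_j[i]."""
    return np.stack(task_vectors).max(axis=0)

t_attr = overall_task_vector([np.array([1.0, -2.0, 0.5]),
                              np.array([0.0,  3.0, 0.2])])
print(t_attr)  # per-dimension maxima: 1.0, 3.0, 0.5
```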
The second recurrent neural network is used to capture the dependencies between the analysis tasks. When the analysis tasks are the element judgment tasks and the charge prediction task, the overall task text vector of the element judgment tasks is t_attr and the task text vector of the charge prediction task is t_accu. t_attr and t_accu form a task sequence in the order (element judgment, charge prediction), and the second recurrent neural network yields the first hidden vector h_attr of the element judgment tasks and the first hidden vector h_accu of the charge prediction task:
[h_attr, h_accu] = RNN([t_attr, t_accu])
where RNN denotes the operation performed by the second recurrent neural network.
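A bare-bones sketch of this second recurrent pass: a single vanilla RNN cell run over the task sequence, so each task's first hidden vector depends on the tasks before it. The random weights and dimensions are illustrative assumptions; the patent itself prefers an LSTM for this network.

```python
import numpy as np

def rnn_over_tasks(task_seq, Wx, Wh, b):
    """Run a vanilla RNN cell over the ordered task text vectors,
    returning one first hidden vector per analysis task."""
    h = np.zeros(Wh.shape[0])
    hidden = []
    for t in task_seq:                    # fixed task order, so later tasks
        h = np.tanh(Wx @ t + Wh @ h + b)  # depend on earlier ones
        hidden.append(h)
    return hidden                         # [h_attr, h_accu, ...]

rng = np.random.default_rng(1)
d, dh = 4, 3                              # task-vector and hidden sizes (toy)
Wx = rng.normal(size=(dh, d))
Wh = rng.normal(size=(dh, dh))
b = np.zeros(dh)
h_attr, h_accu = rnn_over_tasks([rng.normal(size=d), rng.normal(size=d)],
                                Wx, Wh, b)
print(h_accu.shape)  # (3,)
```

With four analysis tasks, the same loop over [t_attr, t_accu, t_law, t_time] would yield the four first hidden vectors discussed later.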
After the first hidden vector h_accu of the charge prediction task is obtained, it is input into the charge prediction model, which maps h_accu into the label space of the charge prediction task and produces the charge prediction result.
For example, for criminal cases the labels of the charge prediction task may include larceny, robbery, intentional injury, corruption and so on; for traffic violation cases the labels may include speeding, failing to obey traffic signals, violating traffic prohibition signs, and intentionally obscuring a vehicle license plate.
The charge prediction model can be any trained classifier, such as a support vector machine, an artificial neural network or a decision tree.
For example, with a trained fully connected neural network as the charge prediction model and label set Y_accu = {larceny, robbery, ...}, the charge prediction result the model outputs for h_accu is:
y_accu = softmax(W_accu · h_accu + b_accu)
where softmax denotes the operation performed by the charge prediction model, and W_accu and b_accu are the parameters of the charge prediction model.
It can be understood that y_accu is the charge prediction result: a vector in which the value of each dimension is the probability of the corresponding charge label.
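The charge-prediction head is then a one-layer softmax classifier over h_accu. The sketch below uses random weights and a three-label set as illustrative assumptions; in the patent the parameters come from training on sample legal documents.

```python
import numpy as np

def predict_charge(h_accu, W, b, labels):
    """y_accu = softmax(W @ h_accu + b), returned as label -> probability."""
    z = W @ h_accu + b
    y = np.exp(z - z.max())
    y /= y.sum()                      # one probability per charge label
    return dict(zip(labels, y))

rng = np.random.default_rng(2)
labels = ["larceny", "robbery", "intentional injury"]
W = rng.normal(size=(3, 4))           # (num labels, hidden size), toy values
b = np.zeros(3)
y_accu = predict_charge(rng.normal(size=4), W, b, labels)
print(max(y_accu, key=y_accu.get))    # the most probable charge label
```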
The first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
It can be understood that training on sample legal documents adjusts the parameters and yields the first recurrent neural network, each task hidden vector, the correlation matrix, the second recurrent neural network and the charge prediction model.
Sample legal documents are legal documents that determine final legal consequences: for criminal cases, court judgments; for administrative cases, the administrative penalty decisions issued by administrative organs.
Embodiments of the present invention analyze legal cases based on the dependencies between legal elements and the charge. Cases with similar charges can be distinguished by their elements, and the method applies to the case facts of all charges rather than being limited to the common ones, greatly improving the accuracy of case analysis and achieving higher case coverage.
Content based on the various embodiments described above, analysis task further include: related law article prediction task and duration prediction task.
Specifically, in order to further increase merit analysis it is comprehensive, related law article and duration can also be analyzed, Thus analysis task further includes related law article prediction task and duration prediction task.
Related law article predicts task, for predicting related law article.
Duration prediction task, for predicting the duration of punishment.For example, for criminal case, punishment when a length of prison term;It is right In different administrative cases, the duration of punishment can be respectively the duration of administrative detention, the duration suspended business to bring up to standard and provisionally suspend driving The duration etc. of card.
Correspondingly, after the overall task text vector of the element judgment tasks is obtained, the second recurrent neural network encodes the overall task text vector of the element judgment tasks together with the task text vectors corresponding to the charge prediction task, the related law article prediction task and the duration prediction task, obtaining the first hidden vectors corresponding to the charge prediction task, the related law article prediction task and the duration prediction task.
Specifically, when the analysis tasks include the element judgment tasks, the charge prediction task, the related law article prediction task and the duration prediction task, a task sequence is formed from the overall task text vector t_attr of the element judgment tasks, the task text vector t_accu corresponding to the charge prediction task, the task text vector t_law corresponding to the related law article prediction task and the task text vector t_time corresponding to the duration prediction task, in the order element judgment, charge prediction, related law article prediction, duration prediction. The second recurrent neural network captures the dependencies between the analysis tasks and encodes this task sequence, obtaining the first hidden vector h¹_attr corresponding to the element judgment tasks, the first hidden vector h¹_accu corresponding to the charge prediction task, the first hidden vector h¹_law corresponding to the related law article prediction task and the first hidden vector h¹_time corresponding to the duration prediction task.
For example, when a long short-term memory (LSTM) network is used as the second recurrent neural network, the first hidden vector corresponding to each analysis task is computed as
(h¹_attr, h¹_accu, h¹_law, h¹_time) = LSTM(t_attr, t_accu, t_law, t_time)
where LSTM denotes the operation performed by the long short-term memory network.
The charge prediction task depends on the element judgment tasks; the related law article prediction task depends on the element judgment tasks and the charge prediction task; and the duration prediction task depends on the element judgment tasks, the charge prediction task and the related law article prediction task.
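The sequential encoding of the task sequence can be sketched in plain NumPy. This is a minimal illustrative LSTM cell with assumed dimensions and randomly initialized parameters, not the trained second recurrent neural network of the embodiment:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(task_vectors, params):
    """Run a single-layer LSTM over the task sequence (element judgment,
    charge, law article, duration) and return one first hidden vector
    per analysis task."""
    W, U, b = params          # input weights, recurrent weights, bias (4 gates stacked)
    d = U.shape[1]
    h = np.zeros(d)           # external (hidden) state
    c = np.zeros(d)           # internal (cell) state
    hidden = []
    for t in task_vectors:
        z = W @ t + U @ h + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        hidden.append(h)
    return hidden             # [h1_attr, h1_accu, h1_law, h1_time]

rng = np.random.default_rng(0)
d_in, d_hid = 8, 6            # assumed vector sizes
params = (rng.standard_normal((4 * d_hid, d_in)) * 0.1,
          rng.standard_normal((4 * d_hid, d_hid)) * 0.1,
          np.zeros(4 * d_hid))
t_attr, t_accu, t_law, t_time = rng.standard_normal((4, d_in))
h_attr, h_accu, h_law, h_time = lstm_encode([t_attr, t_accu, t_law, t_time], params)
```

Because each step conditions on the previous hidden state, the first hidden vector of a later task depends on every earlier task in the sequence, which is how the stated dependency chain (elements → charge → law articles → duration) is realized.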
The first hidden vector corresponding to the charge prediction task, the first hidden vector corresponding to the related law article prediction task and the first hidden vector corresponding to the duration prediction task are input into the charge prediction model, the related law article prediction model and the duration prediction model respectively, obtaining the charge prediction result, the related law article prediction result and the duration prediction result for the case description text of the case to be analyzed.
Wherein, the related law article prediction model and the duration prediction model are both obtained by training on sample legal documents.
Specifically, after the first hidden vector h¹_accu corresponding to the charge prediction task, the first hidden vector h¹_law corresponding to the related law article prediction task and the first hidden vector h¹_time corresponding to the duration prediction task are obtained, h¹_accu, h¹_law and h¹_time are input into the charge prediction model, the related law article prediction model and the duration prediction model respectively. Each model maps its first hidden vector onto the label space of the corresponding task, obtaining the charge prediction result, the related law article prediction result and the duration prediction result.
The specific steps for obtaining the charge prediction result, the related law article prediction result and the duration prediction result are similar to the steps for obtaining the charge prediction result in the above embodiment, and are not repeated here.
It can be understood that the related law article prediction model and the duration prediction model are obtained by training on sample legal documents and adjusting the parameters accordingly.
When the analysis tasks include the element judgment tasks, the charge prediction task, the related law article prediction task and the duration prediction task, the embodiment of the present invention predicts the related law articles based on the dependencies among the legal elements, the charge and the related law articles, and predicts the duration based on the dependencies among the legal elements, the charge, the related law articles and the duration. More accurate related law article prediction results and duration prediction results can thus be obtained, improving both the accuracy and the comprehensiveness of case analysis.
Based on the above embodiments, after the task text vector corresponding to each analysis task is obtained according to the encoding result, the task hidden vectors and the correlation matrix, the method further includes: inputting the task text vector corresponding to each element judgment task into the element judgment model corresponding to that task, and obtaining the results of the element judgment tasks.
Wherein, each element judgment task has a corresponding element judgment model, and all of these models are obtained by training on sample legal documents.
Specifically, after the task text vectors t_1, t_2, ..., t_k corresponding to the analysis tasks are obtained, t_1, t_2, ..., t_k are input into the corresponding element judgment models, and the predicted value of each element is obtained as the result of the element judgment tasks.
The predicted value of any element is computed as
y_i = softmax(W_i t_i + b_i)
where t_i denotes the task text vector corresponding to the i-th element judgment task; y_i denotes the predicted value of the i-th element; 1 ≤ i ≤ k; k is a positive integer denoting the number of elements; W_i and b_i denote the parameters of the i-th element judgment model; and the label space is Y_attr = {yes, no}.
It can be understood that y_i is a vector in which the value of each dimension represents the probability of the corresponding label. For example, y_i = [0.1, 0.9] indicates that the probability that the i-th element takes the value "no" is 90%, and the probability that it takes the value "yes" is 10%.
It can be understood that each element judgment model is obtained by training on sample legal documents and adjusting the parameters accordingly.
By obtaining the predicted value of each element through the element judgment models and the task text vectors corresponding to the element judgment tasks, the embodiment of the present invention makes the key points of a case easier to grasp fully, improving the comprehensiveness and intelligence of case analysis.
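The per-element prediction y_i = softmax(W_i t_i + b_i) can be sketched as follows; the dimensions and the random parameters are placeholders standing in for the trained element judgment models:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def judge_elements(task_vectors, heads):
    """Apply y_i = softmax(W_i t_i + b_i) for every element judgment task.
    Each y_i is a distribution over the label space {yes, no}."""
    return [softmax(W @ t + b) for t, (W, b) in zip(task_vectors, heads)]

rng = np.random.default_rng(2)
k, d = 3, 6                                 # number of elements and vector size (assumed)
t = rng.standard_normal((k, d))             # task text vectors t_1 .. t_k
heads = [(rng.standard_normal((2, d)), np.zeros(2)) for _ in range(k)]
y = judge_elements(t, heads)                # one two-dimensional probability vector per element
```

Each element has its own (W_i, b_i), matching the statement that every element judgment task has its own model.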
Based on the above embodiments, the specific steps of performing word segmentation and named entity recognition on the case description text of the case to be analyzed to obtain the sentence sequence, the event sequence and the named entities include: performing word segmentation and part-of-speech tagging on the case description text and deleting stop words, obtaining multiple sentences, each of which consists of several words and their corresponding parts of speech.
Specifically, each sentence in the case description text is segmented into words, each resulting word is tagged with its part of speech, and stop words are deleted, converting the case description text into an original sentence sequence s = {s_1, s_2, ..., s_m}, where m denotes the number of sentences in the original sequence.
Stop words are words or characters that are automatically filtered out before or after processing natural language data (or text), in order to save storage space and improve processing efficiency.
For legal case analysis, stop words mainly consist of the function words of human language. Function words are extremely common and, compared with other words, carry little concrete meaning.
Each sentence s_j in the original sequence is a word sequence s_j = {w_j1, w_j2, ..., w_jn} with corresponding parts of speech c_j = {c_j1, c_j2, ..., c_jn}, where n denotes the number of words in sentence s_j; w_ji denotes the i-th word in the j-th sentence; 1 ≤ j ≤ m; 1 ≤ i ≤ n; c_ji denotes the part of speech of w_ji, the i-th word in the j-th sentence; and c_ji ∈ C, where C is the part-of-speech table.
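A minimal sketch of the stop-word deletion step; the tagged tokens and the tiny stop-word list are illustrative (a real system would obtain them from a Chinese segmenter and a curated stop-word table):

```python
# Illustrative stop words: 的 (particle), 被 (passive marker), 中 (in), 了 (aspect particle)
STOP_WORDS = {"的", "被", "中", "了"}

def to_sentence(tagged_tokens):
    """Drop stop words, keeping (word, pos) pairs for one sentence."""
    return [(w, pos) for w, pos in tagged_tokens if w not in STOP_WORDS]

# tokens as they might come out of a segmenter with POS tagging
raw = [("李某", "np"), ("被", "p"), ("业主", "n"), ("发现", "v"), ("了", "u")]
sentence = to_sentence(raw)
# sentence == [("李某", "np"), ("业主", "n"), ("发现", "v")]
```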
The multiple sentences are filtered according to a pre-built trigger vocabulary, and the sentences describing key facts related to the case are retained to form the sentence sequence.
After the original sequence is obtained, the sentences in it can be filtered according to the pre-built trigger vocabulary to detect the key facts involved in the case description text. Sentences describing key facts related to the case are retained, sentences not describing such facts are deleted, and the retained sentences form the sentence sequence s' = {s'_1, s'_2, ..., s'_m'}, where m' denotes the number of sentences in the sentence sequence.
A sentence containing an event trigger word is considered to contain the event corresponding to that trigger word. For example, "hit" is a trigger word; if a sentence in the sentence sequence contains the word "hit", that sentence contains an assault event.
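The trigger-based filtering can be sketched as follows; the trigger vocabulary shown is a tiny illustrative stand-in for the pre-built one:

```python
# Trigger vocabulary mapping trigger words to event types (illustrative):
# 殴打 = assault, 盗窃 = theft, 逃离 = escape
TRIGGERS = {"殴打": "assault", "盗窃": "theft", "逃离": "escape"}

def filter_sentences(sentences):
    """Keep only sentences containing at least one event trigger word,
    tagging each kept sentence with the events it contains."""
    kept = []
    for sent in sentences:                  # sent is a list of (word, pos) pairs
        events = [TRIGGERS[w] for w, _ in sent if w in TRIGGERS]
        if events:
            kept.append((sent, events))
    return kept

s1 = [("李某", "np"), ("盗窃", "v"), ("财物", "n")]   # contains the theft trigger
s2 = [("天气", "n"), ("晴朗", "a")]                   # describes no key fact
kept = filter_sentences([s1, s2])
# only s1 survives, tagged with the "theft" event
```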
According to preset rules, syntactic dependency relations, and the words and corresponding parts of speech contained in each sentence of the sentence sequence, the events and named entities described in the case description text of the case to be analyzed are obtained, and the events are arranged in chronological order of occurrence to form the event sequence.
Specifically, according to features such as syntactic dependencies and parts of speech, preset rules are applied to the words of each sentence in the sentence sequence to extract the named entities. From the extracted entities such as person names and place names, attributes of the related events, such as the place of occurrence, the persons involved and the time of occurrence, can be extracted, thereby obtaining the described events together with the place of occurrence, persons involved and time of occurrence associated with each.
For example, a preset rule may state that the object of the verb "hit" is the victim of an assault event, so that the persons involved in the assault can be determined from the words before and after the verb "hit": the subject is the perpetrator and the object is the victim.
After the events are obtained, the actual timeline of the incident can be reconstructed: the events are ordered by the time at which they occurred, rather than by the order in which they appear in the sentence sequence, forming the event sequence. For each event in the event sequence, besides its type, the place of occurrence, the persons involved and the time of occurrence are recorded if they were extracted.
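A toy version of such a preset rule, using word position as a crude stand-in for a syntactic dependency parse (the embodiment's actual rules are not specified here):

```python
# For the trigger verb 殴打 ("hit"): the word before it (subject) is taken
# as the perpetrator, the word after it (object) as the victim.
def extract_assault(sentence):
    """Extract an assault event with its persons from one tagged sentence."""
    words = [w for w, _ in sentence]
    if "殴打" not in words:
        return None
    i = words.index("殴打")
    return {"event": "assault",
            "perpetrator": words[i - 1] if i > 0 else None,
            "victim": words[i + 1] if i + 1 < len(words) else None}

sent = [("李某", "np"), ("殴打", "v"), ("业主", "n")]   # "Li hit the owner"
event = extract_assault(sent)
# {'event': 'assault', 'perpetrator': '李某', 'victim': '业主'}
```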
For example, suppose the case description text of the case to be analyzed is "Li broke into a house to steal property, was discovered by the owner during the theft, then fought with the owner, leaving the owner bleeding and injured, and Li immediately fled." After Chinese word segmentation and part-of-speech tagging (where, e.g., v denotes a verb, p a preposition, n a noun, np a person name, d an adverb and w a punctuation mark), the tagged result is (Li, np) (broke in, v) (steal, v) (property, n) (,, w) (theft, v) (process, n) (during, f) (by, p) (owner, n) (discovered, v) (,, w) (then, d) (with, c) (owner, n) (occurred, v) (fight, v) (made, v) (owner, n) (bleed, v) (injured, v) (,, w) (Li, np) (immediately, d) (fled, v) (., w). After deleting stop words such as "by" and "during", the result is (Li, np) (broke in, v) (steal, v) (property, n) (theft, v) (process, n) (owner, n) (discovered, v) (then, d) (with, c) (owner, n) (occurred, v) (fight, v) (owner, n) (bleed, v) (injured, v) (Li, np) (immediately, d) (fled, v). Named entity recognition over the retained text yields the entities (Li, np) and (owner, n). Event detection then yields the event sequence: event 1, a theft event with person Li; event 2, an assault event with persons Li and owner.
By filtering sentences through trigger words, the embodiment of the present invention can screen out irrelevant facts and reduce input noise, thereby reducing the amount of data to be processed and improving analysis accuracy.
Based on the above embodiments, the specific steps of obtaining multiple word vectors from the words contained in the sentence sequence, the event sequence and the named entities include: concatenating the words contained in the sentence sequence according to the chronological order of the events in the event sequence, obtaining a word sequence.
Specifically, the words contained in the sentence sequence s' are concatenated according to the chronological order of the events, obtaining an input word sequence w = {w_1, w_2, ..., w_l}, where l denotes the number of words.
The word sequence is mapped through a word vector table obtained by pre-training, obtaining the original word vector of each word contained in the sentence sequence.
The word vector table is obtained by pre-training the word vectors. Pre-training may use any method such as Word2vec, GloVe or FastText; the embodiment of the present invention does not specifically limit this.
The input word sequence is mapped through the above word vector table to obtain the original word vector of each word.
For each word contained in the sentence sequence, the original word vector of the word is extended according to the event described by the sentence in which the word appears and according to whether the word is a named entity, obtaining the word vector corresponding to the word and thereby the multiple word vectors.
Specifically, for each word contained in the sentence sequence, several elements are appended to the original word vector of the word according to the event described by the sentence containing the word and according to whether the word is a named entity (and, if so, which one). The appended elements indicate the event described by the containing sentence and which named entity the word is, extending the original word vector of the word into the word vector corresponding to the word.
After every word contained in the sentence sequence has been extended, the multiple word vectors are obtained, forming the word vector sequence
v = {v_1, v_2, ..., v_l}
where l denotes the number of words, d denotes the dimension of a word vector, and v_1, v_2, ..., v_l are the word vectors corresponding to the words w_1, w_2, ..., w_l respectively.
By extending the original word vector of each word according to the event described by its sentence and according to whether the word is a named entity, the embodiment of the present invention enables the word vector to better describe the word's context, so that more accurate element judgment results and charge prediction results can be obtained from the word vectors.
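The extension of an original word vector with event and entity indicator features can be sketched as follows; the feature inventories and the one-hot encoding are assumptions, since the embodiment does not fix the exact form of the appended elements:

```python
import numpy as np

EVENT_TYPES  = ["theft", "assault", "escape"]   # assumed event inventory
ENTITY_TYPES = ["person", "place"]              # assumed entity inventory

def extend(orig_vec, event, entity):
    """Append one-hot event and entity indicator features to the
    original (pre-trained) word vector."""
    ev = np.zeros(len(EVENT_TYPES))
    if event in EVENT_TYPES:
        ev[EVENT_TYPES.index(event)] = 1.0
    en = np.zeros(len(ENTITY_TYPES))
    if entity in ENTITY_TYPES:
        en[ENTITY_TYPES.index(entity)] = 1.0
    return np.concatenate([orig_vec, ev, en])

orig = np.ones(4)                       # toy 4-dimensional pre-trained vector
v = extend(orig, "theft", "person")     # extended to 4 + 3 + 2 = 9 dimensions
```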
Based on the above embodiments, the specific steps of obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix include: for each analysis task, obtaining the weights of the encoding result according to the encoding result, the task hidden vector corresponding to the analysis task and the correlation matrix, and computing a weighted sum of the encoding result according to these weights, obtaining the task text vector corresponding to the analysis task.
It can be understood that the encoding result is the second hidden vector sequence
h = {h_1, h_2, ..., h_l}
where d_1 denotes the dimension of a second hidden vector. The second hidden vector sequence h contains l second hidden vectors, i.e. the length of the second hidden vector sequence equals the length of the word sequence w.
The task hidden vectors form a task vector sequence u = {u_1, u_2, ..., u_p}, where u_i denotes the task hidden vector corresponding to the i-th analysis task; 1 ≤ i ≤ p; and p denotes the number of analysis tasks.
For example, if there are 10 element judgment tasks and the other analysis tasks are the charge prediction task, the related law article prediction task and the duration prediction task, then p = 13.
For the i-th analysis task, the task text vector t_i corresponding to the analysis task can be obtained from the task hidden vector u_i corresponding to the analysis task, the second hidden vector sequence h and the correlation matrix W_a. The specific steps are as follows.
First, the weight vector α of the analysis task is obtained; it consists of the weight of each second hidden vector in the second hidden vector sequence h. The weights are computed as
α_j = softmax_j(u_i^T W_a h_j)
where α_j denotes the weight of the j-th second hidden vector in the second hidden vector sequence h and 1 ≤ j ≤ l.
After the weight vector α of the analysis task is obtained, t_i is computed as
t_i = Σ_{j=1..l} α_j h_j
Through the above steps, the task text vector corresponding to each analysis task can be obtained.
Based on the correlation between the encoding result and the task hidden vectors, the embodiment of the present invention obtains a weight for the encoding result with respect to each task hidden vector and computes the weighted sum of the encoding result accordingly, obtaining a task text vector that characterizes the features of each analysis task more accurately and thereby yielding more accurate case analysis results.
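The attention computation above can be sketched in NumPy; the bilinear scoring form u_i^T W_a h_j is one natural reading of how the correlation matrix relates the task hidden vector to the second hidden vectors, and all dimensions are assumed:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def task_text_vector(u_i, h, W_a):
    """Score each second hidden vector h_j against the task hidden vector
    u_i through the correlation matrix W_a, then take the weighted sum of
    h to get the task text vector t_i."""
    scores = np.array([u_i @ W_a @ h_j for h_j in h])
    alpha = softmax(scores)        # weight vector over the l positions
    return alpha @ h               # t_i = sum_j alpha_j * h_j

rng = np.random.default_rng(3)
l, d1, du = 5, 6, 4                # sequence length and dimensions (assumed)
h = rng.standard_normal((l, d1))   # second hidden vector sequence
u_i = rng.standard_normal(du)      # task hidden vector of the i-th analysis task
W_a = rng.standard_normal((du, d1))  # correlation matrix
t_i = task_text_vector(u_i, h, W_a)
```

Running the same function with a different u_i over the same h yields a different t_i, which is how one shared encoding is mapped into p task-specific text spaces.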
Based on the above embodiments, the first recurrent neural network is a long short-term memory network, and the second recurrent neural network is a long short-term memory network.
Specifically, the first and second recurrent neural networks can be gated recurrent neural networks.
A gated recurrent neural network adjusts the structure of a simple recurrent neural network by adding a gating mechanism that controls the flow of information through the network. The gating mechanism controls how much of the information in the memory cell is retained and how much is discarded, as well as how much of the new state information is saved into the memory cell. This enables a gated recurrent neural network to learn dependencies spanning relatively long ranges without suffering from vanishing or exploding gradients.
Common gated recurrent neural networks include the long short-term memory network and the gated recurrent unit.
Preferably, the first and second recurrent neural networks use long short-term memory networks.
A long short-term memory network (Long Short-term Memory, abbreviated LSTM) is a type of temporal recurrent neural network suitable for processing and predicting events in a time series that are separated by relatively long intervals and delays. It is a special kind of gated recurrent neural network, and thus a special kind of recurrent neural network.
In a general recurrent neural network, the memory cell has no ability to weigh the value of information: it treats the state information at every moment equally. As a result, useless information is often stored in the memory cell and crowds out the information that is actually useful. The LSTM improves on the recurrent neural network of general structure from exactly this starting point. Unlike a recurrent neural network of general structure, which has only one kind of network state, the LSTM divides the network state into an internal state and an external state. The external state of the LSTM is similar to the state in a recurrent neural network of general structure: it is both the output of the hidden layer at the current time step and the input of the hidden layer at the next time step. The internal state is unique to the LSTM.
The LSTM has three control units called "gates": the input gate, the output gate and the forget gate. The input gate and the forget gate are the key to the LSTM's ability to remember long-term dependencies. The input gate determines how much of the current network state is saved into the internal state, and the forget gate determines how much of the past state information is discarded. Finally, the output gate determines how much of the internal state at the current time step is output to the external state.
Through this selective memorization and forgetting of state information, the LSTM can learn dependencies over longer time intervals than a general recurrent neural network.
By using a long short-term memory network as the first recurrent neural network, the embodiment of the present invention can better capture the semantic information relating each sentence to its context; by using a long short-term memory network as the second recurrent neural network, it can better capture the dependencies between the analysis tasks, thereby obtaining more accurate analysis results and improving the accuracy of analysis.
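In standard notation (σ the logistic sigmoid, ⊙ the element-wise product, c_t the internal state and h_t the external state), the three gates described above are conventionally written as:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget term f_t ⊙ c_{t-1} and the input term i_t ⊙ tanh(·) are exactly the selective retention and saving of state information described above, and o_t ⊙ tanh(c_t) is the internal state gated out to the external state.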
Fig. 2 is a structural schematic diagram of the legal case analysis device provided by an embodiment of the present invention. Based on the content of the above embodiments, as shown in Fig. 2, the device includes a data processing module 201, a fact encoding module 202 and a task sequence prediction module 203, wherein:
the data processing module 201 is configured to perform word segmentation and named entity recognition on the case description text of the case to be analyzed, obtaining the sentence sequence, the event sequence and the named entities;
the fact encoding module 202 is configured to obtain multiple word vectors from the words contained in the sentence sequence, the event sequence and the named entities, encode each word vector using the first recurrent neural network, and obtain the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix; wherein the analysis tasks include the element judgment tasks and the charge prediction task; the elements are multiple legal elements relevant to judging the charge; the number of element judgment tasks equals the number of elements, each element judgment task corresponding to one legal element; and the number of task hidden vectors equals the number of analysis tasks, each task hidden vector corresponding to one analysis task;
the task sequence prediction module 203 is configured to perform max pooling over the task text vectors corresponding to the element judgment tasks to obtain the overall task text vector of the element judgment tasks, encode the overall task text vector of the element judgment tasks and the task text vector corresponding to the charge prediction task using the second recurrent neural network to obtain the first hidden vector corresponding to the charge prediction task, and input that first hidden vector into the charge prediction model to obtain the charge prediction result for the case description text of the case to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
Specifically, the data processing module 201 segments the case description text of the case to be analyzed and performs named entity recognition on the resulting words, obtaining the sentence sequence, the event sequence and the named entities.
After the fact encoding module 202 obtains the sentence sequence, the event sequence and the named entities, it can use any suitable word vector model, combined with the event sequence and the named entities, to obtain a word vector sequence containing one word vector for each word in the sentence sequence. The first recurrent neural network encodes each word vector in this sequence, capturing the semantic information relating each sentence to its context; the encoding result is the second hidden vector sequence (or second hidden vector matrix). Using the attention mechanism, the second hidden vector sequence is mapped into the text space of each task according to the task hidden vectors and the correlation matrix, obtaining the task text vector corresponding to each analysis task.
The task sequence prediction module 203 performs max pooling over the task text vectors corresponding to the element judgment tasks to obtain the overall task text vector of the element judgment tasks. It forms a task sequence from the overall task text vector of the element judgment tasks and the task text vector corresponding to the charge prediction task, ordered as element judgment followed by charge prediction, and uses the second recurrent neural network to capture the dependencies between the analysis tasks and encode the sequence, obtaining the first hidden vector corresponding to the charge prediction task. That first hidden vector is then input into the charge prediction model, which maps it onto the label space of the charge prediction task, obtaining the charge prediction result.
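The max pooling that produces the overall task text vector is element-wise over the k element-judgment task text vectors; a minimal sketch with toy values:

```python
import numpy as np

def overall_task_vector(element_task_vectors):
    """Element-wise max pooling over the k element-judgment task text
    vectors, giving the overall task text vector t_attr."""
    return np.max(element_task_vectors, axis=0)

# two element judgment tasks, 3-dimensional task text vectors (toy sizes)
t = np.array([[1.0, 0.2, 3.0],
              [0.5, 2.0, 1.0]])
t_attr = overall_task_vector(t)   # → [1.0, 2.0, 3.0]
```

Pooling keeps the overall vector the same size regardless of how many elements a charge has, so the downstream task sequence always has a fixed shape.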
The legal case analysis device provided by the embodiment of the present invention is used to execute the legal case analysis method provided by the above embodiments of the present invention. The specific methods and processes by which each module of the device realizes its function are detailed in the embodiments of the above legal case analysis method and are not repeated here.
Since the legal case analysis device is used for the legal case analysis method of the foregoing embodiments, the descriptions and definitions in the legal case analysis method of the foregoing embodiments can be used to understand each execution module in the embodiment of the present invention.
The embodiment of the present invention analyzes legal cases based on the dependencies between legal elements and charges. It can distinguish cases with similar charges according to the elements, and it is applicable to all categories of cases rather than only to a few common ones, thereby greatly improving the accuracy of case analysis while achieving higher case coverage.
Fig. 3 is a structural block diagram of the electronic device provided by an embodiment of the present invention. Based on the content of the above embodiments, as shown in Fig. 3, the electronic device may include a processor 301, a memory 302 and a bus 303, where the processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to call the computer program instructions stored in the memory 302 and runnable on the processor 301, so as to execute the legal case analysis method provided by the above method embodiments, for example: performing word segmentation and named entity recognition on the case description text of the case to be analyzed, obtaining the sentence sequence, the event sequence and the named entities; obtaining multiple word vectors from the words contained in the sentence sequence, the event sequence and the named entities, encoding each word vector using the first recurrent neural network, and obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix, wherein the analysis tasks include the element judgment tasks and the charge prediction task, the elements are multiple legal elements relevant to judging the charge, the number of element judgment tasks equals the number of elements with each element judgment task corresponding to one legal element, and the number of task hidden vectors equals the number of analysis tasks with each task hidden vector corresponding to one analysis task; performing max pooling over the task text vectors corresponding to the element judgment tasks to obtain the overall task text vector of the element judgment tasks, encoding the overall task text vector of the element judgment tasks and the task text vector corresponding to the charge prediction task using the second recurrent neural network to obtain the first hidden vector corresponding to the charge prediction task, and inputting that first hidden vector into the charge prediction model to obtain the charge prediction result for the case description text of the case to be analyzed; wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
Another embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, enable the computer to execute the legal case analysis method provided by the above method embodiments, for example: performing word segmentation and named entity recognition on the case description text of the case to be analyzed, obtaining the sentence sequence, the event sequence and the named entities; obtaining multiple word vectors from the words contained in the sentence sequence, the event sequence and the named entities, encoding each word vector using the first recurrent neural network, and obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix, wherein the analysis tasks include the element judgment tasks and the charge prediction task, the elements are multiple legal elements relevant to judging the charge, the number of element judgment tasks equals the number of elements with each element judgment task corresponding to one legal element, and the number of task hidden vectors equals the number of analysis tasks with each task hidden vector corresponding to one analysis task; performing max pooling over the task text vectors corresponding to the element judgment tasks to obtain the overall task text vector of the element judgment tasks, encoding the overall task text vector of the element judgment tasks and the task text vector corresponding to the charge prediction task using the second recurrent neural network to obtain the first hidden vector corresponding to the charge prediction task, and inputting that first hidden vector into the charge prediction model to obtain the charge prediction result for the case description text of the case to be analyzed; wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the charge prediction model are all obtained by training on sample legal documents.
In addition, the logical instructions in the above memory 302 can be realized in the form of software functional units and, when sold or used as an independent product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
Another embodiment of the present invention provides a non-transient computer-readable storage medium. The non-transient computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the legal case analysis method provided by each of the above method embodiments, for example: performing word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities; obtaining multiple word vectors according to each word included in the sentence sequence, the event sequence and the named entities; encoding each word vector using a first recurrent neural network, and obtaining a task text vector corresponding to each analysis task according to the encoding result, task hidden vectors and a correlation matrix, wherein the analysis tasks include element judgment tasks and a cause-of-action prediction task, the elements are multiple legal elements relevant to deciding the cause of action, the number of element judgment tasks is the same as the number of elements with each element judgment task corresponding to one legal element, and the number of task hidden vectors is the same as the number of analysis tasks with each task hidden vector corresponding to one analysis task; performing max pooling on the task text vectors corresponding to the element judgment tasks to obtain an overall task text vector of the element judgment tasks; encoding the overall task text vector of the element judgment tasks and the task text vector corresponding to the cause-of-action prediction task using a second recurrent neural network, to obtain a first hidden vector corresponding to the cause-of-action prediction task; and inputting the first hidden vector corresponding to the cause-of-action prediction task into a cause-of-action prediction model, to obtain a cause-of-action prediction result for the case description text to be analyzed; wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the cause-of-action prediction model are all obtained by training on sample legal documents.
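As a rough illustration of the second-stage encoding above, the overall element vector and the per-task text vectors can be fed to a recurrent cell as a short sequence. The sketch below uses a plain tanh RNN step for brevity (the claims specify a long short-term memory network), with randomly initialised toy weights:

```python
import numpy as np

def rnn_encode(xs, W_x, W_h):
    """Run a simple tanh RNN over the sequence xs; return all hidden states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)  # one recurrent step
        states.append(h)
    return states

d = 4
rng = np.random.default_rng(1)
W_x = rng.normal(size=(d, d)) * 0.1   # toy input weights
W_h = rng.normal(size=(d, d)) * 0.1   # toy recurrent weights

# e.g. the overall element vector followed by one task text vector
sequence = [np.ones(d), np.zeros(d)]
hidden_states = rnn_encode(sequence, W_x, W_h)
# hidden_states[-1] plays the role of the "first hidden vector"
# that is passed on to the corresponding prediction model.
```

Running the tasks through one shared recurrence, rather than encoding each in isolation, lets later tasks condition on the states produced for earlier ones.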
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by means of software plus a necessary general hardware platform, and certainly may also be implemented by hardware. Based on this understanding, the above technical solution, in essence, or the part thereof contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A legal case analysis method, characterized by comprising:
performing word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities;
obtaining multiple word vectors according to each word included in the sentence sequence, the event sequence and the named entities, encoding each word vector using a first recurrent neural network, and obtaining a task text vector corresponding to each analysis task according to the encoding result, task hidden vectors and a correlation matrix; wherein the analysis tasks include element judgment tasks and a cause-of-action prediction task; the elements are multiple legal elements relevant to deciding the cause of action; the number of the element judgment tasks is the same as the number of the elements, and each element judgment task corresponds to one legal element; the number of the task hidden vectors is the same as the number of the analysis tasks, and each task hidden vector corresponds to one analysis task;
performing max pooling on the task text vector corresponding to each element judgment task to obtain an overall task text vector of the element judgment tasks, encoding the overall task text vector of the element judgment tasks and the task text vector corresponding to the cause-of-action prediction task using a second recurrent neural network to obtain a first hidden vector corresponding to the cause-of-action prediction task, and inputting the first hidden vector corresponding to the cause-of-action prediction task into a cause-of-action prediction model, to obtain a cause-of-action prediction result for the case description text to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the cause-of-action prediction model are all obtained by training on sample legal documents.
2. The legal case analysis method according to claim 1, wherein the analysis tasks further include: a related law article prediction task and a duration prediction task;
correspondingly, after the overall task text vector of the element judgment tasks is obtained, the method includes:
encoding, using the second recurrent neural network, the overall task text vector of the element judgment tasks, the task text vector corresponding to the cause-of-action prediction task, the task text vector corresponding to the related law article prediction task and the task text vector corresponding to the duration prediction task, to obtain the first hidden vectors corresponding to the cause-of-action prediction task, the related law article prediction task and the duration prediction task;
inputting the first hidden vector corresponding to the cause-of-action prediction task, the first hidden vector corresponding to the related law article prediction task and the first hidden vector corresponding to the duration prediction task into the cause-of-action prediction model, a related law article prediction model and a duration prediction model, respectively, to obtain a cause-of-action prediction result, a related law article prediction result and a duration prediction result for the case description text to be analyzed;
wherein the related law article prediction model and the duration prediction model are both obtained by training on the sample legal documents.
3. The legal case analysis method according to claim 1, wherein after the task text vector corresponding to each analysis task is obtained according to the encoding result, the task hidden vectors and the correlation matrix, the method further includes:
inputting the task text vector corresponding to each element judgment task into the element judgment model corresponding to that element judgment task, to obtain the result of the element judgment task;
wherein the element judgment model corresponding to each element judgment task is obtained by training on the sample legal documents.
4. The legal case analysis method according to claim 1, wherein the specific steps of performing word segmentation and named entity recognition on the case description text to be analyzed to obtain the sentence sequence, the event sequence and the named entities include:
performing word segmentation and part-of-speech tagging on the case description text to be analyzed and deleting stop words, to obtain multiple sentences, each sentence including several words and the part of speech corresponding to each word;
screening the multiple sentences according to a pre-built trigger vocabulary, and retaining the sentences that describe material facts related to the case, to form the sentence sequence;
obtaining, according to preset rules, syntactic dependencies, and the words included in each sentence of the sentence sequence together with their corresponding parts of speech, the several events and each named entity described by the case description text to be analyzed, and arranging the several events in chronological order of their occurrence to form the event sequence.
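The screening and ordering steps in this claim can be pictured as follows; the trigger vocabulary, sentence records and `time` fields here are hypothetical stand-ins — a real system would derive events and their order from the parse rather than carry ready-made timestamps:

```python
# Hypothetical trigger vocabulary built in advance.
TRIGGERS = {"stole", "injured"}

def keep_material_sentences(sentences):
    """Retain only sentences whose words contain a trigger word."""
    return [s for s in sentences if TRIGGERS & set(s["words"])]

# Toy segmented sentences; "time" marks when the described event occurred.
sentences = [
    {"words": ["the", "defendant", "stole", "a", "phone"], "time": 2},
    {"words": ["the", "weather", "was", "cold"], "time": 1},
    {"words": ["he", "injured", "the", "victim"], "time": 1},
]

kept = keep_material_sentences(sentences)       # drops the weather sentence
events = sorted(kept, key=lambda s: s["time"])  # chronological event sequence
```

The trigger filter discards sentences that carry no case-relevant event, so downstream encoding only sees material facts.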
5. The legal case analysis method according to claim 1, wherein the specific steps of obtaining the multiple word vectors according to each word included in the sentence sequence, the event sequence and the named entities include:
splicing the words included in the sentence sequence according to the chronological order of occurrence of the events in the event sequence, to obtain a word sequence;
mapping the word sequence according to a word vector table obtained by pre-training, to obtain the original word vector of each word included in the sentence sequence;
for each word included in the sentence sequence, extending the original word vector of the word according to the event described by the sentence in which the word is located and whether the word is a named entity, to obtain the word vector corresponding to the word, thereby obtaining the multiple word vectors.
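The extension step in this claim amounts to appending extra features to each pre-trained embedding. A minimal sketch — the two-indicator layout is an assumption; the patent only says the vector is extended according to the sentence's event and the word's named-entity status:

```python
import numpy as np

def extend_word_vector(base_vec, in_event, is_entity):
    """Append two 0/1 indicator features to a pre-trained word vector:
    whether the word's sentence describes an event, and whether the
    word itself is a named entity (layout is an assumption)."""
    flags = np.array([float(in_event), float(is_entity)])
    return np.concatenate([base_vec, flags])

base = np.zeros(4)  # stand-in for an embedding looked up in the vector table
v = extend_word_vector(base, in_event=True, is_entity=False)
print(v.shape)  # (6,)
```

The recurrent encoder then consumes these extended vectors, so event membership and entity status are visible to every downstream task.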
6. The legal case analysis method according to claim 1, wherein the specific steps of obtaining the task text vector corresponding to each analysis task according to the encoding result, the task hidden vectors and the correlation matrix include:
for each analysis task, obtaining the weights corresponding to the encoding result according to the encoding result, the task hidden vector corresponding to the analysis task and the correlation matrix, and performing a weighted summation of the encoding result according to those weights, to obtain the task text vector corresponding to the analysis task.
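The weighting in claim 6 can be read as a bilinear attention: the correlation matrix relates each per-word encoding to the task hidden vector, the scores are normalised, and the encodings are summed under those weights. A toy sketch (the softmax normalisation is an assumption; the claim only specifies weights followed by a weighted sum):

```python
import numpy as np

def task_text_vector(H, u, A):
    """H: (T, d) per-word encodings from the first recurrent network;
    u: (d,) hidden vector of this analysis task;
    A: (d, d) correlation matrix shared across tasks."""
    scores = H @ (A @ u)              # one relevance score per word position
    w = np.exp(scores - scores.max())
    w /= w.sum()                      # softmax -> attention weights
    return w @ H                      # weighted sum of the encodings

H = np.eye(3, 4)  # 3 words, hidden size 4 (toy values)
u = np.ones(4)
A = np.eye(4)
v = task_text_vector(H, u, A)
# with identity A and equal scores, v is just the average of the word encodings
```

Because each task has its own hidden vector but the correlation matrix is shared, every task attends to the same encodings yet extracts a different task-specific summary.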
7. The legal case analysis method according to any one of claims 1 to 6, wherein the first recurrent neural network is a long short-term memory neural network, and the second recurrent neural network is a long short-term memory neural network.
8. A legal case analysis device, characterized by comprising:
a data processing module, configured to perform word segmentation and named entity recognition on a case description text to be analyzed, to obtain a sentence sequence, an event sequence and named entities;
a fact encoding module, configured to obtain multiple word vectors according to each word included in the sentence sequence, the event sequence and the named entities, encode each word vector using a first recurrent neural network, and obtain a task text vector corresponding to each analysis task according to the encoding result, task hidden vectors and a correlation matrix; wherein the analysis tasks include element judgment tasks and a cause-of-action prediction task; the elements are multiple legal elements relevant to deciding the cause of action; the number of the element judgment tasks is the same as the number of the elements, and each element judgment task corresponds to one legal element; the number of the task hidden vectors is the same as the number of the analysis tasks, and each task hidden vector corresponds to one analysis task;
a task sequence prediction module, configured to perform max pooling on the task text vector corresponding to each element judgment task to obtain an overall task text vector of the element judgment tasks, encode the overall task text vector of the element judgment tasks and the task text vector corresponding to the cause-of-action prediction task using a second recurrent neural network to obtain a first hidden vector corresponding to the cause-of-action prediction task, and input the first hidden vector corresponding to the cause-of-action prediction task into a cause-of-action prediction model, to obtain a cause-of-action prediction result for the case description text to be analyzed;
wherein the first recurrent neural network, the task hidden vectors, the correlation matrix, the second recurrent neural network and the cause-of-action prediction model are all obtained by training on sample legal documents.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the legal case analysis method according to any one of claims 1 to 7.
10. A non-transient computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the legal case analysis method according to any one of claims 1 to 7.
CN201910379141.1A 2019-05-08 2019-05-08 Legal case analysis method and device Active CN110276068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910379141.1A CN110276068B (en) 2019-05-08 2019-05-08 Legal case analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910379141.1A CN110276068B (en) 2019-05-08 2019-05-08 Legal case analysis method and device

Publications (2)

Publication Number Publication Date
CN110276068A true CN110276068A (en) 2019-09-24
CN110276068B CN110276068B (en) 2020-08-28

Family

ID=67959767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910379141.1A Active CN110276068B (en) 2019-05-08 2019-05-08 Legal case analysis method and device

Country Status (1)

Country Link
CN (1) CN110276068B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018147653A1 (en) * 2017-02-08 2018-08-16 사회복지법인 삼성생명공익재단 Method, device and computer program for generating survival rate prediction model
CN107239445A (en) * 2017-05-27 2017-10-10 中国矿业大学 The method and system that a kind of media event based on neutral net is extracted
CN107818138A (en) * 2017-09-28 2018-03-20 银江股份有限公司 A kind of case legal regulation recommends method and system
CN108009284A (en) * 2017-12-22 2018-05-08 重庆邮电大学 Using the Law Text sorting technique of semi-supervised convolutional neural networks
CN108304911A (en) * 2018-01-09 2018-07-20 中国科学院自动化研究所 Knowledge Extraction Method and system based on Memory Neural Networks and equipment
CN109308355A (en) * 2018-09-17 2019-02-05 清华大学 Legal decision prediction of result method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Zonglin et al.: "A multi-task learning model for legal judgment prediction incorporating charge keywords", Journal of Tsinghua University (Science and Technology) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475209B2 (en) 2017-10-17 2022-10-18 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
US11256856B2 (en) 2017-10-17 2022-02-22 Handycontract Llc Method, device, and system, for identifying data elements in data structures
WO2021072892A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Legal provision search method based on neural network hybrid model, and related device
CN110928987A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Legal provision retrieval method based on neural network hybrid model and related equipment
CN110928987B (en) * 2019-10-18 2023-07-25 平安科技(深圳)有限公司 Legal provision retrieval method and related equipment based on neural network hybrid model
CN111325387A (en) * 2020-02-13 2020-06-23 清华大学 Interpretable law automatic decision prediction method and device
CN111325387B (en) * 2020-02-13 2023-08-18 清华大学 Interpretable law automatic decision prediction method and device
CN111382333A (en) * 2020-03-11 2020-07-07 昆明理工大学 Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN111382333B (en) * 2020-03-11 2022-06-21 昆明理工大学 Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN111460834A (en) * 2020-04-09 2020-07-28 北京北大软件工程股份有限公司 French semantic annotation method and device based on L STM network
CN111460834B (en) * 2020-04-09 2023-06-06 北京北大软件工程股份有限公司 French semantic annotation method and device based on LSTM network
CN111552808A (en) * 2020-04-20 2020-08-18 北京北大软件工程股份有限公司 Administrative illegal case law prediction method and tool based on convolutional neural network
CN111797221A (en) * 2020-06-16 2020-10-20 北京北大软件工程股份有限公司 Similar case recommendation method and device
CN111797221B (en) * 2020-06-16 2023-12-08 北京北大软件工程股份有限公司 Similar case recommending method and device
CN111523313A (en) * 2020-07-03 2020-08-11 支付宝(杭州)信息技术有限公司 Model training and named entity recognition method and device
CN111523313B (en) * 2020-07-03 2020-09-29 支付宝(杭州)信息技术有限公司 Model training and named entity recognition method and device
CN112100212A (en) * 2020-09-04 2020-12-18 中国航天科工集团第二研究院 Case scenario extraction method based on machine learning and rule matching
CN113157880B (en) * 2021-03-25 2023-01-17 科大讯飞股份有限公司 Element content obtaining method, device, equipment and storage medium
CN113157880A (en) * 2021-03-25 2021-07-23 科大讯飞股份有限公司 Element content obtaining method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110276068B (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN110276068A (en) Legal case analysis method and device
CN110825901A (en) Image-text matching method, device and equipment based on artificial intelligence and storage medium
ALRashdi et al. Deep learning and word embeddings for tweet classification for crisis response
CN108197098A (en) A kind of generation of keyword combined strategy and keyword expansion method, apparatus and equipment
CN111209384A (en) Question and answer data processing method and device based on artificial intelligence and electronic equipment
CN108335693B (en) Language identification method and language identification equipment
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN111522987A (en) Image auditing method and device and computer readable storage medium
CN111563158B (en) Text ranking method, ranking apparatus, server and computer-readable storage medium
CN110457585B (en) Negative text pushing method, device and system and computer equipment
CN112507912B (en) Method and device for identifying illegal pictures
CN110188195A (en) A kind of text intension recognizing method, device and equipment based on deep learning
CN108229527A (en) Training and video analysis method and apparatus, electronic equipment, storage medium, program
Pardos et al. Imputing KCs with representations of problem content and context
Altadmri et al. A framework for automatic semantic video annotation: Utilizing similarity and commonsense knowledge bases
CN108268629A (en) Image Description Methods and device, equipment, medium, program based on keyword
CN108229170A (en) Utilize big data and the software analysis method and device of neural network
CN109271624A (en) A kind of target word determines method, apparatus and storage medium
CN110287314A (en) Long text credibility evaluation method and system based on Unsupervised clustering
CN111985207A (en) Method and device for acquiring access control policy and electronic equipment
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN112818212B (en) Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium
Sethi et al. Large-scale multimedia content analysis using scientific workflows
CN115329176A (en) Search request processing method and device, computer equipment and storage medium
O'Keefe et al. Deep learning and word embeddings for tweet classification for crisis response

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant