
CN111460834A - Legal provision semantic annotation method and device based on LSTM network


Info

Publication number
CN111460834A
Authority
CN
China
Prior art keywords
text
analyzed
words
lstm
labels
Legal status
Granted
Application number
CN202010273691.8A
Other languages
Chinese (zh)
Other versions
CN111460834B (en)
Inventor
莫同
李雨萌
骆旭辉
刘亚亭
张艺璇
Current Assignee
Beijing Peking University Software Engineering Co ltd
Original Assignee
Beijing Peking University Software Engineering Co ltd
Application filed by Beijing Peking University Software Engineering Co ltd
Priority to CN202010273691.8A
Publication of CN111460834A
Application granted
Publication of CN111460834B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and device for semantic annotation of legal provisions based on an LSTM network. The method comprises: obtaining a text to be analyzed; analyzing it to obtain all of its words and the part-of-speech tag of each word; converting the words into D-dimensional word vectors and inputting them into a fully-connected neural network to obtain feature codes; comparing the part-of-speech tags of the text to be analyzed with those of the texts in a preset database to obtain a best-matching text; obtaining a final vector representation; and inputting the final vector representation into the fully-connected neural network, which outputs the semantic role label of each word in the text to be analyzed.

Description

Legal provision semantic annotation method and device based on LSTM network
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a legal provision semantic annotation method and device based on an LSTM network.
Background
Existing methods for shallow semantic analysis, such as semantic role labeling, mostly depend on some degree of syntactic parsing or on manually engineered features. Syntactic parsing has its own error rate, so its errors propagate into the results of the subsequent semantic analysis. Semantic role labeling therefore still poses a number of technical challenges in natural language processing. In recent years, the rapid development of deep learning has greatly improved semantic role labeling for English and Chinese, with good results on datasets from multiple domains.
However, the number of cases and laws in the judicial field keeps growing, which puts heavy pressure on legal practitioners. Even professional lawyers find it difficult to be familiar with every law, and extracting case-relevant content from large volumes of legal text takes considerable time and effort, so working efficiency is low. Assisting these practitioners with artificial intelligence has therefore become an urgent problem.
Disclosure of Invention
In view of the above, the present invention provides a method and device for semantic annotation of legal provisions based on an LSTM network. It addresses two prior-art problems: obtaining case-related content from large volumes of legal text requires a great deal of time and effort, and working efficiency is low.
To achieve this purpose, the invention adopts the following technical solution: a legal provision semantic annotation method based on an LSTM network, comprising the following steps:
acquiring a text and preprocessing the text to acquire the text to be analyzed;
analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech labels corresponding to the words, converting all the words into D-dimensional word vectors by adopting a word vector model, and inputting all the D-dimensional word vectors into a fully-connected neural network to obtain the feature codes of all the words;
comparing part-of-speech labels of the text to be analyzed with part-of-speech labels of texts in a preset database to obtain a best matching text in the preset database, and vectorizing semantic role labels of the best matching text and position information corresponding to the semantic role labels to obtain a feature vector;
combining the feature codes and the feature vectors to obtain a final vector representation;
and inputting the final vector representation into a fully-connected neural network, and outputting the semantic role labels of each word in the text to be analyzed.
Further, the acquiring a text and preprocessing the text to acquire a text to be analyzed includes:
performing standardization on the text to obtain a text to be analyzed in a standard data input form, where a text in the standard data input form is a text whose central predicate has been specified.
Further, the central predicate includes:
administrative subject, administrative counterpart, time, place.
Further, the analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech tags corresponding to the words includes:
splitting the text to be analyzed according to a legal dictionary by adopting a Chinese word segmentation tool and a part of speech tagging tool;
and acquiring all words of the text to be analyzed and the part-of-speech labels corresponding to the words.
Further, inputting all the D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all the words, including:
sequentially inputting all the D-dimensional word vectors into a fully-connected neural network provided with a feature encoder, the feature encoder comprising four stacked bidirectional LSTM layers, namely a first, a second, a third and a fourth LSTM layer;
the first LSTM layer encodes the D-dimensional word vectors as its input, each subsequent LSTM layer takes the output of the previous layer as its input, and the fourth LSTM layer outputs the feature encoding.
Further, the comparing the part-of-speech tag of the text to be analyzed with the part-of-speech tag of the text in a preset database to obtain the best matching text in the preset database includes:
taking the central predicate as the center, matching the part-of-speech tag string of the text to be analyzed against the part-of-speech tag string of each text in the preset database, extending toward both sides;
and calculating the matching degree according to the matching length of the character string to obtain the best matching text.
Further, vectorizing the semantic role labels of the best matching text and the position information corresponding to the semantic role labels to obtain a feature vector includes:
vectorizing the semantic role annotation of the best matching text to obtain a first vector representation;
vectorizing the distance between the semantic role label and the central predicate to obtain a second vector representation;
the first vector representation and the second vector representation are combined into a feature vector.
Further, the inputting the final vector representation into a fully-connected neural network and outputting a semantic role label of each word in the text to be analyzed includes:
inputting the final vector into a fully-connected neural network, wherein a softmax layer is arranged in the fully-connected neural network, semantic role labeling is carried out on each word by adopting a softmax classifier on the softmax layer, and the softmax layer outputs the semantic role labeling.
Further, the word vector model includes:
the word2vec model, the GloVe model, or the BERT language model.
An embodiment of the application provides a legal provision semantic annotation device based on an LSTM network, comprising:
the preprocessing module is used for acquiring a text and preprocessing the text to acquire the text to be analyzed;
the first processing module is used for analyzing and processing the text to be analyzed so as to obtain all words of the text to be analyzed and part-of-speech labels corresponding to the words, converting all the words into D-dimensional word vectors by adopting a word vector model, and inputting all the D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all the words;
the second processing module is used for comparing part-of-speech labels of the text to be analyzed with part-of-speech labels of texts in a preset database to obtain a best matching text in the preset database, and vectorizing semantic role labels of the best matching text and position information corresponding to the semantic role labels to obtain a feature vector;
an obtaining module, configured to combine the feature codes and the feature vectors to obtain a final vector representation;
and the output module is used for inputting the final vector representation into a fully-connected neural network and outputting the semantic role labels of each word in the text to be analyzed.
By adopting the technical scheme, the invention can achieve the following beneficial effects:
the method first vectorizes the legal provision text and predicts its part-of-speech tags; it then uses the part-of-speech tagging result to find the most similar provision in the database and obtains a vector representation of that provision's semantic role labels; finally, it feeds the data into the LSTM network to obtain the semantic role label of each word.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of the steps of a legal provision semantic annotation method based on an LSTM network according to the present invention;
FIG. 2 is a schematic structural diagram of a legal provision semantic annotation device based on an LSTM network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
A specific legal provision semantic annotation method based on the LSTM network provided in the embodiments of the present application is described below with reference to the accompanying drawings.
As shown in FIG. 1, the LSTM-based legal provision semantic annotation method provided in this embodiment of the present application includes:
S101, acquiring a text and preprocessing the text to obtain the text to be analyzed;
The method is mainly intended to help staff look up legal provisions. First, the provision text is acquired and preprocessed. Preprocessing standardizes the text into a standard data input form: the central predicate of each input text is specified, and the text with its specified central predicate is the text to be analyzed.
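A minimal sketch of what a "standard data input form" might look like, assuming it is simply the normalized text plus its specified central predicate and that predicate's character offset. The record name `AnalysisInput`, the function `standardize`, and the normalization steps (NFKC folding of full-width characters and removal of all whitespace) are hypothetical illustrations. They are not taken from the patent.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisInput:
    """Hypothetical standard input form: normalized text plus its central predicate."""
    text: str
    predicate: str
    predicate_offset: int  # character offset of the central predicate in `text`


def standardize(raw_text: str, predicate: str) -> AnalysisInput:
    """Normalize a provision text and record its specified central predicate.

    Assumed normalization (not from the source): NFKC folds full-width
    characters to their standard forms, and all whitespace is removed, which
    is harmless for Chinese text.
    """
    text = "".join(unicodedata.normalize("NFKC", raw_text).split())
    pred = "".join(unicodedata.normalize("NFKC", predicate).split())
    offset = text.find(pred)
    if not pred or offset < 0:
        raise ValueError(f"central predicate {predicate!r} not found in text")
    return AnalysisInput(text=text, predicate=pred, predicate_offset=offset)
```

For example, `standardize("行政机关 应当责令改正", "责令")` yields the text `行政机关应当责令改正` with `predicate_offset` 6, because the predicate follows the six characters of 行政机关应当.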
S102, analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech labels corresponding to the words, converting all the words into D-dimensional word vectors by adopting a word vector model, and inputting all the D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all the words;
The text to be analyzed is split into words, and the part-of-speech tag of each word is produced at the same time. A word vector model then converts each word into a D-dimensional word vector, and the D-dimensional word vectors are input into the fully-connected neural network for training to obtain the feature codes of the words. A D-dimensional word vector represents a Chinese word as a vector of dimension D.
Any conventional word vector model may be used; no special requirements are placed on it.
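For a static embedding model such as word2vec or GloVe, the conversion to D-dimensional vectors is essentially a table lookup. The sketch below assumes that the pretrained vectors have already been loaded into a plain dictionary and that out-of-vocabulary words map to a zero vector. Both the zero-vector fallback and the name `to_word_vectors` are assumptions. A contextual model such as BERT would instead compute a vector per occurrence, which this sketch does not cover.

```python
from typing import Dict, List, Sequence


def to_word_vectors(words: Sequence[str],
                    table: Dict[str, List[float]],
                    dim: int) -> List[List[float]]:
    """Map each word to a D-dimensional vector via a pretrained lookup table.

    Unknown words receive a zero vector (an assumed OOV policy); vectors of the
    wrong length are rejected so that every row has exactly `dim` entries.
    """
    vectors = []
    for w in words:
        vec = table.get(w)
        if vec is None:
            vec = [0.0] * dim
        elif len(vec) != dim:
            raise ValueError(f"vector for {w!r} has length {len(vec)}, expected {dim}")
        vectors.append(list(vec))  # copy so callers cannot mutate the table
    return vectors
```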
S103, comparing part-of-speech labels of the text to be analyzed with part-of-speech labels of the text in a preset database to obtain a best matching text in the preset database, and vectorizing semantic role labels of the best matching text and position information corresponding to the semantic role labels to obtain a feature vector;
A database of legal provision texts is preset, and the provision texts in the database are part-of-speech tagged. The part-of-speech tags of the given text to be analyzed are compared with those of the provision texts in the database, and the database text with the highest matching degree is taken as the best matching text. The semantic role labels of the best matching text, together with the position information corresponding to those labels, are then vectorized to obtain a feature vector.
S104, combining the feature codes and the feature vectors to obtain the final vector representation;
The feature code of each word obtained in step S102 is concatenated with the feature vector of the corresponding word of the best matching text obtained in step S103 to obtain the final vector representation.
And S105, inputting the final vector representation into a fully-connected neural network, and outputting the semantic role label of each word in the text to be analyzed.
The final vector representation is input into the fully-connected neural network, and a softmax classifier in that network identifies and outputs the semantic role label of each word in the text to be analyzed.
The LSTM-based legal provision semantic annotation method thus works in three stages. First, it vectorizes the provision text and predicts its part-of-speech tags. Second, it uses the part-of-speech tagging result to find the most similar provision in the database and obtains a vector representation of that provision's semantic role labels. Finally, it feeds the data into the LSTM network to obtain the semantic role label of each word.
In some embodiments, obtaining a text and preprocessing the text to obtain a text to be analyzed includes:
performing standardization on the text to obtain a text to be analyzed in a standard data input form, where a text in the standard data input form is a text whose central predicate has been specified.
Preferably, the central predicate includes:
administrative subject, administrative counterpart, time, place.
In some embodiments, analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech tags corresponding to the words includes:
splitting a text to be analyzed according to a legal dictionary by adopting a Chinese word segmentation tool and a part-of-speech tagging tool;
and acquiring all words of the analyzed text and part-of-speech labels corresponding to the words.
Specifically, a Chinese word segmentation tool segments the provision text into its words, and a part-of-speech tagging tool tags each of those words with its part of speech. Because the text to be analyzed is split according to the legal dictionary, the relevant legal terms from the dictionary are obtained as words, for example words filling semantic roles such as administrative subject, administrative counterpart, time and place.
It should be noted that the Chinese word segmentation tool and the part-of-speech tagging tool used in this application are existing tools and are not described further here.
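The source relies on existing segmentation and tagging tools, so the following is only a stand-in showing how a legal dictionary can drive segmentation. It uses forward maximum matching, a classic dictionary-based method that is swapped in here for illustration. At each position it takes the longest dictionary word and reads that word's part-of-speech tag from the same dictionary. The function name `fmm_segment`, the dictionary format (word mapped to tag), the maximum word length, and the fallback tag `"x"` for unmatched characters are all hypothetical.

```python
from typing import Dict, List, Tuple


def fmm_segment(text: str, lexicon: Dict[str, str],
                max_len: int = 6) -> List[Tuple[str, str]]:
    """Forward maximum matching segmentation with dictionary-supplied POS tags.

    At each position, the longest lexicon entry (up to `max_len` characters)
    is taken; if no entry matches, a single character is emitted with the
    placeholder tag "x".
    """
    result = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                result.append((candidate, lexicon[candidate]))
                i += size
                break
        else:  # no dictionary word starts here
            result.append((text[i], "x"))
            i += 1
    return result
```

With a toy dictionary `{"行政机关": "n", "应当": "v", "责令": "v", "改正": "v"}`, the text `行政机关应当责令改正` segments into `行政机关 / 应当 / 责令 / 改正`. A production system would more likely load the legal dictionary into an existing segmenter as a user dictionary.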
In some embodiments, inputting all D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all words includes:
sequentially inputting all D-dimensional word vectors into a fully-connected neural network provided with a feature encoder, the feature encoder comprising four stacked bidirectional LSTM layers, namely a first, a second, a third and a fourth LSTM layer;
the first LSTM layer encodes the D-dimensional word vectors as its input, each subsequent LSTM layer takes the output of the previous layer as its input, and the fourth LSTM layer outputs the feature encoding.
Specifically, all D-dimensional word vectors are input in sequence into a feature encoder built from bidirectional LSTMs. The encoder consists of four stacked bidirectional LSTM layers. The first layer takes the D-dimensional word vectors as input, the second layer takes the output of the first layer as input, and so on, each layer taking the output of the previous layer as input. Finally, the fourth layer outputs the feature encoding $W_i$ of each word $i$. To alleviate the vanishing-gradient problem that arises in multi-layer LSTM structures, this application introduces a highway LSTM structure.
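The four-layer stacked bidirectional encoder described above can be sketched as a plain-Python forward pass. This is an illustration under stated assumptions, not the patent's implementation:

- The weights are random and untrained.
- The class names `LSTMCell`, `BiLSTMLayer` and `StackedBiLSTMEncoder` and the hidden size are hypothetical.
- Each layer concatenates its forward and backward outputs, so layers after the first receive 2H-dimensional inputs.
- The highway connections mentioned in the text are not modeled.

In practice, a framework layer such as PyTorch's `torch.nn.LSTM(input_size, hidden_size, num_layers=4, bidirectional=True)` would normally be used instead.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single-direction LSTM with random (untrained) weights, for illustration only."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        # 4 * hid_dim rows (input, forget, candidate, output gates), each acting on [x; h].
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                  for _ in range(4 * hid_dim)]
        self.b = [0.0] * (4 * hid_dim)

    def step(self, x, h, c):
        z = x + h  # list concatenation [x; h]
        pre = [sum(wk * zk for wk, zk in zip(row, z)) + b for row, b in zip(self.w, self.b)]
        H = self.hid
        i = [sigmoid(v) for v in pre[0:H]]
        f = [sigmoid(v) for v in pre[H:2 * H]]
        g = [math.tanh(v) for v in pre[2 * H:3 * H]]
        o = [sigmoid(v) for v in pre[3 * H:4 * H]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h, c = [0.0] * self.hid, [0.0] * self.hid
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class BiLSTMLayer:
    """Runs one LSTM left-to-right and another right-to-left, concatenating per token."""

    def __init__(self, in_dim, hid_dim, rng):
        self.fwd = LSTMCell(in_dim, hid_dim, rng)
        self.bwd = LSTMCell(in_dim, hid_dim, rng)

    def __call__(self, xs):
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with tokens
        return [hf + hb for hf, hb in zip(forward, backward)]


class StackedBiLSTMEncoder:
    """num_layers stacked BiLSTMs; layer k consumes the output of layer k-1."""

    def __init__(self, d, hid_dim, num_layers=4, seed=0):
        rng = random.Random(seed)
        self.layers = []
        in_dim = d
        for _ in range(num_layers):
            self.layers.append(BiLSTMLayer(in_dim, hid_dim, rng))
            in_dim = 2 * hid_dim  # next layer sees [forward; backward]

    def encode(self, word_vectors):
        xs = word_vectors
        for layer in self.layers:
            xs = layer(xs)
        return xs  # one 2*hid_dim feature encoding W_i per word
```

Each returned vector plays the role of the feature encoding $W_i$. Its length is twice the hidden size because the two directions are concatenated.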
In some embodiments, comparing the part-of-speech tag of the text to be analyzed with the part-of-speech tag of the text in the preset database to obtain the best matching text in the preset database includes:
matching character strings to two sides by taking the central predicate as the center of part-of-speech labels of the text to be analyzed and part-of-speech labels of the text in a preset database;
and calculating the matching degree according to the matching length of the character string to obtain the best matching text.
Preferably, vectorizing the semantic role labels of the best matching text and the position information corresponding to the semantic role labels to obtain a feature vector includes:
vectorizing the semantic role label of the best matching text to obtain a first vector representation;
vectorizing the distance between the semantic role label and the central predicate to obtain a second vector representation;
the first vector representation and the second vector representation are combined into a feature vector.
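Specifically, a longest string matching method is used. With the central predicate $V$ as the center, the part-of-speech tag string of the text to be analyzed is matched toward both sides against that of each database text $i$, giving a matched length $L_i$. The best matching text $S_{sim}$ is the one with the longest match:
$$S_{sim} = \arg\max_i L_i$$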
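One possible reading of the center-out matching above, sketched under assumptions. The two tag sequences are aligned on their central predicates, tags are compared outward to the left and to the right until they first differ, and $L_i$ counts the agreeing tags, including the predicate once. The function names, the database format (tag list plus predicate index), and first-occurrence tie-breaking are hypothetical details not fixed by the text.

```python
from typing import List, Sequence, Tuple


def center_match_length(query_tags: Sequence[str], query_pred: int,
                        cand_tags: Sequence[str], cand_pred: int) -> int:
    """L_i: number of agreeing POS tags when both sequences are aligned on the
    central predicate and compared outward on both sides (predicate counted once)."""
    if query_tags[query_pred] != cand_tags[cand_pred]:
        return 0
    length = 1
    k = 1  # extend to the left until tags differ or a sequence runs out
    while (query_pred - k >= 0 and cand_pred - k >= 0
           and query_tags[query_pred - k] == cand_tags[cand_pred - k]):
        length += 1
        k += 1
    k = 1  # extend to the right
    while (query_pred + k < len(query_tags) and cand_pred + k < len(cand_tags)
           and query_tags[query_pred + k] == cand_tags[cand_pred + k]):
        length += 1
        k += 1
    return length


def best_match(query_tags: Sequence[str], query_pred: int,
               database: List[Tuple[Sequence[str], int]]) -> Tuple[int, int]:
    """Return (index, L) of S_sim = argmax_i L_i; ties go to the first text."""
    scores = [center_match_length(query_tags, query_pred, tags, pred)
              for tags, pred in database]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Python's `max` returns the first maximal index, which gives the tie-breaking rule stated above. An empty database raises `ValueError`.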
The semantic role labeling result of the best matching text $S_{sim}$ is then vectorized to obtain a $dim_1$-dimensional vector representation $R_{sim}$. At the same time, the relative distance between each semantic role and the central predicate is encoded to obtain a $dim_2$-dimensional vector representation $PE_{sim}$. The vectors $R_{sim}$ and $PE_{sim}$ are concatenated with the feature codes of the text to be analyzed to obtain the final vector representation.
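The text does not specify how $R_{sim}$ and $PE_{sim}$ are computed, so the sketch below makes two labeled assumptions. $R_{sim}$ is a one-hot vector over a hypothetical role inventory, so $dim_1$ equals the number of roles. $PE_{sim}$ is a sinusoidal encoding of the signed distance to the central predicate, borrowed from Transformer positional encodings. The role names and function names are illustrative only.

```python
import math
from typing import List, Sequence


def one_hot_role(label: str, role_set: Sequence[str]) -> List[float]:
    """R_sim sketch: one-hot vector over the role inventory (dim_1 = len(role_set))."""
    vec = [0.0] * len(role_set)
    vec[list(role_set).index(label)] = 1.0
    return vec


def distance_encoding(distance: int, dim2: int) -> List[float]:
    """PE_sim sketch: sinusoidal encoding of the signed distance to the predicate.

    Even components use sin, odd components use cos, with the Transformer-style
    frequency 1 / 10000^(2j / dim2) shared by each sin/cos pair (an assumption).
    """
    vec = []
    for k in range(dim2):
        rate = 10000 ** ((k // 2 * 2) / dim2)
        angle = distance / rate
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec


def role_features(roles: Sequence[str], pred_index: int,
                  role_set: Sequence[str], dim2: int) -> List[List[float]]:
    """Per-token [R_sim; PE_sim] for the best matching text."""
    return [one_hot_role(r, role_set) + distance_encoding(j - pred_index, dim2)
            for j, r in enumerate(roles)]
```

With the hypothetical inventory `["O", "PRED", "ADMIN_SUBJECT", "ADMIN_COUNTERPART", "TIME", "PLACE"]` and `dim2 = 4`, the predicate token itself (distance 0) gets the position part `[0.0, 1.0, 0.0, 1.0]`.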
In some embodiments, inputting the final vector representation into a fully-connected neural network and outputting the semantic role label of each word in the text to be analyzed includes:
and inputting the final vector into a fully-connected neural network, wherein a softmax layer is arranged in the fully-connected neural network, semantic role labeling is carried out on each word by adopting a softmax classifier in the softmax layer, and the softmax layer outputs the semantic role labeling.
Specifically, the output $W_i$ of the last bidirectional LSTM layer is concatenated with the vectors $R_{sim}$ and $PE_{sim}$ obtained in step S103 to form the final vector representation $[W_i; R_{sim}; PE_{sim}]$. This representation is input into the fully-connected neural network, and a multi-class result is obtained through the softmax layer. The output of the softmax layer is the semantic role label of each word in the text to be analyzed with respect to the given predicate.
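The concatenation and classification step can be sketched as follows: build $[W_i; R_{sim}; PE_{sim}]$, apply one fully-connected layer, and take the softmax over role labels. This is only a sketch. The single linear layer with random, untrained weights, the class name `SoftmaxTagger`, and the label names are assumptions, since the text does not give the network's depth or size.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxTagger:
    """One fully-connected layer plus softmax over role labels (random, untrained weights)."""

    def __init__(self, in_dim: int, labels: Sequence[str], seed: int = 0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def predict(self, w_i, r_sim, pe_sim):
        x = list(w_i) + list(r_sim) + list(pe_sim)  # final representation [W_i; R_sim; PE_sim]
        if len(x) != len(self.w[0]):
            raise ValueError("concatenated size does not match the layer input size")
        logits = [sum(wk * xk for wk, xk in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best], probs
```

The layer's input size must equal the encoder output size plus $dim_1$ plus $dim_2$. Checking this explicitly catches the most common wiring mistake.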
Preferably, the word vector model provided in the present application includes:
the word2vec model, the GloVe model, or the BERT language model.
As shown in FIG. 2, the present application provides a legal provision semantic annotation device based on an LSTM network, including:
the preprocessing module 201 is configured to acquire a text and preprocess the text to acquire a text to be analyzed;
the first processing module 202 is configured to analyze a text to be analyzed to obtain all words of the text to be analyzed and part-of-speech tags corresponding to the words, convert all the words into D-dimensional word vectors by using a word vector model, and input all the D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all the words;
the second processing module 203 is configured to compare part-of-speech tags of the text to be analyzed with part-of-speech tags of the text in the preset database to obtain a best-matching text in the preset database, and vectorize semantic role tags of the best-matching text and position information corresponding to the semantic role tags to obtain feature vectors;
an obtaining module 204, configured to combine the feature codes and the feature vectors to obtain a final vector representation;
and the output module 205 is configured to input the final vector representation into a fully-connected neural network, and output a semantic role label of each word in the text to be analyzed.
The LSTM-based legal provision semantic annotation device operates as follows:
- The preprocessing module 201 acquires a text and preprocesses it to obtain the text to be analyzed.
- The first processing module 202 analyzes the text to be analyzed to obtain all of its words and their part-of-speech tags. It converts all the words into D-dimensional word vectors using a word vector model and inputs them into the fully-connected neural network to obtain the feature codes of all the words.
- The second processing module 203 compares the part-of-speech tags of the text to be analyzed with those of the texts in the preset database to obtain the best matching text. It then vectorizes the semantic role labels of that text and the position information corresponding to those labels to obtain the feature vectors.
- The obtaining module 204 combines the feature codes and the feature vectors to obtain a final vector representation.
- The output module 205 inputs the final vector representation into the fully-connected neural network and outputs the semantic role label of each word in the text to be analyzed.

The invention provides a method and device for semantic annotation of legal provisions based on an LSTM network. A text is obtained and preprocessed to produce the text to be analyzed. That text is analyzed to obtain all of its words and their part-of-speech tags. A word vector model converts the words into D-dimensional word vectors, which are input into a fully-connected neural network to obtain the feature codes of all the words. The part-of-speech tags of the text to be analyzed are compared with those of the texts in a preset database to obtain the best-matching text, whose semantic role labels and their position information are vectorized into feature vectors. The feature codes and feature vectors are combined into a final vector representation. This representation is input into the fully-connected neural network, which outputs the semantic role label of each word in the text to be analyzed. In this way, elements such as the actor, the recipient, the time and the place in a legal provision are analyzed automatically, improving the working efficiency of legal staff.
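The device can be pictured as a composition of five callables, one per module 201 to 205, invoked in the order described above. This is a hypothetical interface, not the patent's implementation. The class name `AnnotationDevice` and the callable signatures are assumptions, and module 203 is simplified to receive only the part-of-speech tags.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class AnnotationDevice:
    """Wires modules 201-205 in order; each module is an injected callable."""
    preprocess: Callable[[str], Any]                                  # module 201
    first_processing: Callable[[Any], Tuple[list, list, list]]        # module 202 -> (words, tags, feature codes)
    second_processing: Callable[[Sequence[str]], List[List[float]]]   # module 203: tags -> per-word feature vectors
    obtain: Callable[[List[float], List[float]], List[float]]         # module 204: combine code and vector
    output: Callable[[List[float]], str]                              # module 205: final vector -> role label

    def annotate(self, raw_text: str) -> List[Tuple[str, str]]:
        text = self.preprocess(raw_text)
        words, tags, codes = self.first_processing(text)
        feats = self.second_processing(tags)
        finals = [self.obtain(c, f) for c, f in zip(codes, feats)]
        return [(w, self.output(v)) for w, v in zip(words, finals)]
```

In a usage test, each field can be filled with a toy function, for example `str.strip` for module 201 and list concatenation for module 204. The concrete models from the earlier steps can then be substituted without changing the wiring.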
It should be understood that the apparatus embodiments above correspond to the method embodiments described above. For specific details, each may be read with reference to the other, and those details are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A legal provision semantic annotation method based on an LSTM network, characterized by comprising the following steps:
acquiring a text and preprocessing the text to acquire the text to be analyzed;
analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech labels corresponding to the words, converting all the words into D-dimensional word vectors by adopting a word vector model, and inputting all the D-dimensional word vectors into a fully-connected neural network to obtain the feature codes of all the words;
comparing part-of-speech labels of the text to be analyzed with part-of-speech labels of texts in a preset database to obtain a best matching text in the preset database, and vectorizing semantic role labels of the best matching text and position information corresponding to the semantic role labels to obtain a feature vector;
combining the feature codes and the feature vector to obtain a final vector representation;
and inputting the final vector representation into a fully-connected neural network, and outputting the semantic role labels of each word in the text to be analyzed.
2. The method of claim 1, wherein acquiring and preprocessing the text to obtain the text to be analyzed comprises:
standardizing the text to obtain the text to be analyzed in a standard data input form, wherein the text to be analyzed in the standard data input form is a text in which a central predicate is specified.
3. The method of claim 2, wherein the central predicate comprises:
an administrative subject, an administrative counterpart, a time, and a place.
4. The method according to claim 1, wherein the analyzing the text to be analyzed to obtain all words of the text to be analyzed and part-of-speech tags corresponding to the words comprises:
segmenting the text to be analyzed on the basis of a legal dictionary, using a Chinese word segmentation tool and a part-of-speech tagging tool;
and acquiring all words of the text to be analyzed and the part-of-speech labels corresponding to the words.
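Claim 4 does not name the segmentation or tagging tool. The following is a minimal sketch under stated assumptions: it uses classic dictionary-based forward maximum matching, which is a swapped-in technique and not necessarily the tool the applicant used, together with a small hypothetical legal dictionary that maps each entry to a part-of-speech tag. It shows how a legal dictionary can keep multi-character legal terms intact.

```python
from typing import Dict, List, Tuple

# Hypothetical legal dictionary: word -> part-of-speech tag (illustrative only)
LEGAL_DICT: Dict[str, str] = {
    "行政机关": "n",
    "依法": "d",
    "作出": "v",
    "行政处罚": "n",
    "决定": "n",
}


def segment_fmm(text: str, dictionary: Dict[str, str], max_len: int = 6) -> List[Tuple[str, str]]:
    """Forward maximum matching: at each position, take the longest dictionary word.

    Characters not covered by the dictionary become single-character tokens tagged "x".
    """
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        # try the longest candidate first, shrinking one character at a time
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        tokens.append((match, dictionary.get(match, "x")))
        i += len(match)
    return tokens


print(segment_fmm("行政机关依法作出行政处罚决定", LEGAL_DICT))
```

A production system would more likely call an existing segmenter with a user dictionary. The greedy longest-match rule is used here only because it is short and makes the dictionary's role visible.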
5. The method of claim 1, wherein inputting all the D-dimensional word vectors into a fully-connected neural network to obtain the feature codes of all the words comprises:
sequentially inputting all the D-dimensional word vectors into the fully-connected neural network, wherein the fully-connected neural network is provided with a feature encoder, and the feature encoder comprises a 4-layer stacked bidirectional LSTM consisting of a first-layer LSTM, a second-layer LSTM, a third-layer LSTM and a fourth-layer LSTM;
the first-layer LSTM takes the D-dimensional word vectors as input, the input to each subsequent LSTM layer is the output of the previous layer, and the fourth-layer LSTM outputs the feature codes.
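In the stacking described in claim 5, each layer's output feeds the next layer. The following is a minimal pure-Python sketch of the forward pass of that 4-layer stacked bidirectional LSTM. It assumes standard LSTM gating, concatenation of the forward and backward hidden states at each layer, and untrained random weights. The hidden size, seed, and initialization range are illustrative assumptions.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One-direction LSTM with input, forget, output and candidate gates."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # weights act on [x; h_prev; bias]
        # 4 * hidden_dim rows: pre-activations for the i, f, o, g gates
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(4 * hidden_dim)]

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector):
        z = x + h_prev + [1.0]
        pre = [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]
        H = self.hidden_dim
        i = [sigmoid(v) for v in pre[0:H]]
        f = [sigmoid(v) for v in pre[H:2 * H]]
        o = [sigmoid(v) for v in pre[2 * H:3 * H]]
        g = [math.tanh(v) for v in pre[3 * H:]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class BiLSTMLayer:
    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)

    def __call__(self, xs: List[Vector]) -> List[Vector]:
        forward = self.fwd.run(xs)
        backward = self.bwd.run(xs[::-1])[::-1]  # re-align to the original word order
        return [hf + hb for hf, hb in zip(forward, backward)]


class StackedBiLSTMEncoder:
    """Layer 1 reads the D-dimensional word vectors; each later layer reads the previous output."""

    def __init__(self, d: int, hidden_dim: int, num_layers: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.layers = []
        in_dim = d
        for _ in range(num_layers):
            self.layers.append(BiLSTMLayer(in_dim, hidden_dim, rng))
            in_dim = 2 * hidden_dim  # forward + backward states

    def encode(self, word_vectors: List[Vector]) -> List[Vector]:
        xs = word_vectors
        for layer in self.layers:
            xs = layer(xs)
        return xs  # feature codes: one 2*hidden_dim vector per word


encoder = StackedBiLSTMEncoder(d=8, hidden_dim=5)
rng = random.Random(1)
word_vectors = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
feature_codes = encoder.encode(word_vectors)
```

Only the data flow is illustrated here. A real encoder would be trained, and this sketch makes no claim about the hidden size or training procedure the applicant used.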
6. The method of claim 2, wherein the comparing the part-of-speech tag of the text to be analyzed with the part-of-speech tag of the text in a preset database to obtain the best matching text in the preset database comprises:
taking the central predicate as the center, matching the part-of-speech label sequence of the text to be analyzed against the part-of-speech label sequences of the texts in the preset database as character strings, extending toward both sides;
and calculating a matching degree according to the length of the matched character string to obtain the best matching text.
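Claim 6 leaves the exact matching-degree formula open. The following is a minimal sketch under three assumptions. First, the two part-of-speech sequences are aligned on their central predicates. Second, the match extends outward on each side until the first mismatch. Third, the matching degree is the matched length divided by the length of the longer sequence, which is a hypothetical normalization.

```python
from typing import List, Optional, Sequence, Tuple


def center_match_length(q: Sequence[str], qc: int, t: Sequence[str], tc: int) -> int:
    """Count POS tags that agree when q and t are aligned on their central predicates
    (indices qc and tc), extending to both sides until the first mismatch."""
    if q[qc] != t[tc]:
        return 0
    length = 1
    k = 1  # extend to the right of the predicate
    while qc + k < len(q) and tc + k < len(t) and q[qc + k] == t[tc + k]:
        length += 1
        k += 1
    k = 1  # extend to the left of the predicate
    while qc - k >= 0 and tc - k >= 0 and q[qc - k] == t[tc - k]:
        length += 1
        k += 1
    return length


def matching_degree(q: Sequence[str], qc: int, t: Sequence[str], tc: int) -> float:
    # hypothetical normalization: matched length over the longer sequence
    return center_match_length(q, qc, t, tc) / max(len(q), len(t))


def best_match(q: Sequence[str], qc: int,
               database: List[Tuple[List[str], int, str]]) -> Tuple[Optional[str], float]:
    """database entries: (POS tags, central predicate index, record id)."""
    best_id, best_score = None, -1.0
    for tags, tc, rid in database:
        score = matching_degree(q, qc, tags, tc)
        if score > best_score:
            best_id, best_score = rid, score
    return best_id, best_score


db = [
    (["n", "v", "n"], 1, "A"),
    (["n", "d", "v", "n"], 2, "B"),
    (["v", "n", "n"], 0, "C"),
]
best_id, score = best_match(["n", "d", "v", "n", "n"], 2, db)
```

With this toy database, record "B" agrees on four tags around the predicate and wins, with a degree of 4/5.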
7. The method of claim 6, wherein vectorizing the semantic role labels of the best matching text and the position information corresponding to the semantic role labels to obtain a feature vector comprises:
vectorizing the semantic role labels of the best matching text to obtain a first vector representation;
vectorizing the distance between the semantic role label and the central predicate to obtain a second vector representation;
the first vector representation and the second vector representation are combined into a feature vector.
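Claim 7 does not specify the encodings. The following is a minimal sketch that assumes three things: one-hot vectors over a hypothetical semantic role inventory for the first vector representation; a one-hot encoding of the signed distance to the central predicate, clipped to a hypothetical maximum, for the second; and per-word concatenation to combine them.

```python
from typing import List

Vector = List[float]

ROLE_LABELS = ["O", "V", "A0", "A1", "TMP", "LOC"]  # hypothetical role inventory


def one_hot(index: int, size: int) -> Vector:
    v = [0.0] * size
    v[index] = 1.0
    return v


def role_vectors(labels: List[str]) -> List[Vector]:
    """First vector representation: one-hot semantic role label per position."""
    return [one_hot(ROLE_LABELS.index(lab), len(ROLE_LABELS)) for lab in labels]


def distance_vectors(n: int, center: int, max_dist: int = 3) -> List[Vector]:
    """Second vector representation: one-hot signed distance to the central predicate,
    clipped to [-max_dist, max_dist]."""
    size = 2 * max_dist + 1
    out = []
    for pos in range(n):
        d = max(-max_dist, min(max_dist, pos - center))
        out.append(one_hot(d + max_dist, size))
    return out


def feature_vectors(labels: List[str], center: int, max_dist: int = 3) -> List[Vector]:
    first = role_vectors(labels)
    second = distance_vectors(len(labels), center, max_dist)
    return [a + b for a, b in zip(first, second)]  # combine by concatenation


feats = feature_vectors(["A0", "TMP", "V", "A1"], center=2)
```

Learned embeddings for labels and distances would be a natural alternative. One-hot vectors simply keep the sketch dependency-free.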
8. The method of claim 1, wherein inputting the final vector representation into a fully-connected neural network and outputting a semantic role label for each word in the text to be analyzed comprises:
inputting the final vector representation into a fully-connected neural network provided with a softmax layer, wherein the softmax layer uses a softmax classifier to assign a semantic role label to each word and outputs the semantic role labels.
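The output step of claim 8, together with the compounding step of claim 1, can be sketched as follows. The sketch makes two assumptions: compounding means per-word concatenation of a feature code and a feature vector, and the fully-connected layer is a single linear map with hand-set weights. A trained model would learn these weights; the labels and numbers here are purely illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def compound(feature_codes: List[Vector], feature_vectors: List[Vector]) -> List[Vector]:
    # assumption: "compounding" = per-word concatenation
    return [code + feat for code, feat in zip(feature_codes, feature_vectors)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_words(final_vectors: List[Vector], weights: List[Vector],
                   biases: List[float], labels: List[str]) -> List[Tuple[str, float]]:
    """Linear layer + softmax per word; returns (best label, its probability)."""
    results = []
    for x in final_vectors:
        logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        probs = softmax(logits)
        best = max(range(len(labels)), key=lambda k: probs[k])
        results.append((labels[best], probs[best]))
    return results


labels = ["O", "V", "A0"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
final = compound([[2.0], [0.0]], [[0.0], [3.0]])
predictions = classify_words(final, weights, biases, labels)
```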
9. The method of any one of claims 1 to 8, wherein the word vector model comprises:
a word2vec model, a GloVe model, or a BERT language model.
10. A French semantic annotation device based on an LSTM network, characterized by comprising:
the preprocessing module is used for acquiring a text and preprocessing the text to acquire the text to be analyzed;
the first processing module is used for analyzing and processing the text to be analyzed so as to obtain all words of the text to be analyzed and part-of-speech labels corresponding to the words, converting all the words into D-dimensional word vectors by adopting a word vector model, and inputting all the D-dimensional word vectors into a fully-connected neural network to obtain feature codes of all the words;
the second processing module is used for comparing part-of-speech labels of the text to be analyzed with part-of-speech labels of texts in a preset database to obtain a best matching text in the preset database, and vectorizing semantic role labels of the best matching text and position information corresponding to the semantic role labels to obtain a feature vector;
an obtaining module, configured to combine the feature codes and the feature vector to obtain a final vector representation;
and the output module is used for inputting the final vector representation into a fully-connected neural network and outputting the semantic role labels of each word in the text to be analyzed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273691.8A CN111460834B (en) 2020-04-09 2020-04-09 French semantic annotation method and device based on LSTM network

Publications (2)

Publication Number Publication Date
CN111460834A 2020-07-28
CN111460834B 2023-06-06

Family

ID=71681233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273691.8A Active CN111460834B (en) 2020-04-09 2020-04-09 French semantic annotation method and device based on LSTM network

Country Status (1)

Country Link
CN (1) CN111460834B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014110980A1 (en) * 2013-01-21 2014-07-24 Liu Shugen Ideographical member identification and extraction method and machine-translation and manual-correction interactive translation method based on ideographical members
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on depth learning and distributed semantic features
CN106202010A (en) * 2016-07-12 2016-12-07 重庆兆光科技股份有限公司 The method and apparatus building Law Text syntax tree based on deep neural network
CN109767758A (en) * 2019-01-11 2019-05-17 中山大学 Vehicle-mounted voice analysis method, system, storage medium and equipment
CN110276068A (en) * 2019-05-08 2019-09-24 清华大学 Law merit analysis method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱鹏飞 (Zhu Pengfei): "Research on Chinese Automatic Semantic Role Labeling Based on Bi-LSTM" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177406A (en) * 2021-04-23 2021-07-27 珠海格力电器股份有限公司 Text processing method and device, electronic equipment and computer readable medium
CN113177406B (en) * 2021-04-23 2023-07-07 珠海格力电器股份有限公司 Text processing method, text processing device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN111460834B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN112163416B (en) Event joint extraction method for merging syntactic and entity relation graph convolution network
CN113743119A (en) Chinese named entity recognition module, method and device and electronic equipment
CN110046356B (en) Label-embedded microblog text emotion multi-label classification method
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN115098634B (en) Public opinion text emotion analysis method based on semantic dependency relationship fusion characteristics
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN116775872A (en) Text processing method and device, electronic equipment and storage medium
CN113157918A (en) Commodity name short text classification method and system based on attention mechanism
CN113934909A (en) Financial event extraction method based on pre-training language and deep learning model
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN114970536B (en) Combined lexical analysis method for word segmentation, part-of-speech tagging and named entity recognition
CN116340513A (en) Multi-label emotion classification method and system based on label and text interaction
CN111460834B (en) French semantic annotation method and device based on LSTM network
CN114492796A (en) Multitask learning sign language translation method based on syntax tree
CN114528400A (en) Unified low-sample relation extraction method and device based on multi-selection matching network
CN117390189A (en) Neutral text generation method based on pre-classifier
CN117131856A (en) Traffic accident text causal relation extraction method based on problem guidance
CN116680407A (en) Knowledge graph construction method and device
CN114297408A (en) Relation triple extraction method based on cascade binary labeling framework
CN114611489A (en) Text logic condition extraction AI model construction method, extraction method and system
CN114298052A (en) Entity joint labeling relation extraction method and system based on probability graph
CN112487134A (en) Scientific and technological text problem extraction method based on extremely simple abstract strategy
CN117807999B (en) Domain self-adaptive named entity recognition method based on countermeasure learning
CN113297845B (en) Resume block classification method based on multi-level bidirectional circulation neural network
CN111402012B (en) E-commerce defective product identification method based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant