CN115080924A - Software license clause extraction method based on natural language understanding - Google Patents
Software license clause extraction method based on natural language understanding
- Publication number
- CN115080924A
- Authority
- CN
- China
- Prior art keywords
- license
- clause
- text data
- sentence
- software
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/105—Arrangements for software license management or administration, e.g. for managing licenses at corporate level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a software license clause extraction method based on natural language understanding. First, a license text data set is constructed, and the words, phrases, or sentences related to rights clauses and obligation clauses in the licenses are labeled; the labeled data set is used to train a license clause extraction model that automatically extracts license clauses from license text. Then, for a given piece of open source software, all license text data in it is extracted and denoised, and the clauses it involves are extracted with the trained clause extraction model. Finally, for each extracted clause, the corresponding attitude polarity is obtained through syntactic analysis. The method helps software developers quickly understand and select licenses, and helps software development projects avoid the legal and economic risks caused by improper license use.
Description
Technical Field
The invention belongs to the technical field of software systems and artificial intelligence, and particularly relates to a software license clause extraction method based on natural language understanding.
Background
In recent years, as software development technology has improved and demand for software has grown, millions of applications have appeared in the software market. Under intense market competition, developers often rely on third-party open source software for general-purpose functions in order to shorten the development cycle, so that they can concentrate on developing their own distinctive features. Open source software licenses arose against this background of widespread open source use. A license is a descriptive text issued by the software owner that restricts or imposes requirements on actions such as modifying, sharing, and releasing the software when others use it, and it has legal force. Used properly, licenses bring convenience to the software market; used improperly, they can lead to complex software copyright disputes and create legal risk.
In practical development, attention to a license centers mostly on its key clauses and their polarity, so semantic understanding of licenses and extraction of their key information is essential. Meanwhile, as developers' needs diversify, many custom licenses continue to appear. If semantic-understanding goals such as automatic extraction of license clauses and their attitude polarity can be achieved, new custom licenses can be handled more readily and developers' needs can be met, making it easier for developers to understand and choose licenses.
Furthermore, almost all open source software projects reference additional open source code or modules. When an imported open source module carries its own license, its clauses may be incompatible with the project's overall license, and the resulting legal and economic risks cannot be ignored. Ensuring license compatibility depends on developers fully understanding the license clauses, so automatic extraction of license clauses is necessary.
Existing research has many shortcomings. Researchers have long focused on identifying license names and versions, which is essentially pattern matching; it recognizes only a handful of popular licenses and cannot cope with the large number of newly created custom licenses. For license clause extraction, earlier work mainly modeled licenses through manual summarization, which does not transfer to new licenses. Some scholars have proposed automatic clause extraction techniques, but their accuracy is low and their usability remains to be verified.
Disclosure of Invention
To address these shortcomings, the invention provides a software license clause extraction method based on natural language understanding. Compared with traditional methods, it is not limited to a fixed range of licenses, and it extracts software license clauses more accurately, more comprehensively, and more automatically, with semantic understanding built on that extraction.
The invention is realized through the following technical solution:
a software license clause extraction method based on natural language understanding, comprising the following steps:
step S1, constructing a license text data set and labeling the license clauses in it;
step S2, using the labeled license text data set from step S1 to train a license clause extraction model that automatically extracts license clauses from license text;
step S3, extracting all license text data in a given piece of open source software;
step S4, denoising the license text data obtained in step S3;
step S5, using the clause extraction model built in step S2 to extract the license clauses involved in the license text data processed in step S4;
step S6, for the extracted license clauses, obtaining the attitude polarity of each clause through syntactic analysis.
In the above technical solution, in step S1, the words, phrases, or sentences in the license text data set that relate to rights clauses such as Distribute, Modify, and Commercial Use, and to obligation clauses such as Include Copyright, Disclose Source, and State Changes, are labeled manually; each label indicates whether the word, phrase, or sentence is the beginning, middle, or end of a given clause, or belongs to no clause.
In the above technical solution, step S2 includes the following steps:
step S21, performing word embedding with a pre-trained GloVe model, converting natural-language words into corresponding vector representations;
step S22, extracting context features with a bidirectional long short-term memory network model;
step S23, using a conditional random field algorithm to compute the state transition information of the predicted categories from the output of the bidirectional long short-term memory network model, and solving the conditional probability distribution of the clause sequence in combination with the existing category labels, thereby obtaining a valid clause prediction result;
step S24, repeating the above steps in training until the accuracy of the model meets the requirement, then stopping training to obtain the final clause extraction model.
In the above technical solution, step S3 includes the following steps:
step S31, traversing each text file in the given open source software, finding all dedicated license files, and obtaining the license text data they contain;
step S32, traversing each code file in the given open source software, finding the names of all third-party packages the code files import, and obtaining by query the license name and license text declared by each third-party package;
step S33, traversing each code file in the given open source software and obtaining the license-related content declared between code lines by analyzing code comments; if a license is declared only by name, its full text is obtained from that name;
step S34, aggregating all license text data obtained from the open source software in steps S31-S33 as the analysis data for subsequent license clause extraction.
In the above technical solution, the denoising preprocessing operation in step S4 is to remove punctuation marks, numbers, spaces, line feeds, and tabs from the license text data obtained in step S3.
In the above technical solution, step S6 includes the following steps:
step S61, for each extracted license clause, finding the sentence in which it occurs and parsing that sentence with a probabilistic context-free grammar to obtain its syntax tree;
step S62, according to the syntax tree of the sentence in which each clause occurs, analyzing the part of speech of each word belonging to the clause, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition;
step S63, according to the syntax tree of the sentence in which each clause occurs, finding the part of the sentence that lies outside the clause and is, in the syntactic structure, the main sentence to which the clause belongs, analyzing the part of speech of each word in that part, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition;
step S64, traversing the key word set of the clause and looking for negative words; if any is found, the attitude polarity of the clause is judged to be negative; otherwise, looking for words that express necessity, and if any such word appears, the attitude polarity of the clause is regarded as mandatory; otherwise, the attitude polarity of the clause is regarded as permissive; the attitude polarity of the clause is obtained in this manner.
The invention also provides a computer-readable storage medium, storing a computer program which, when executed, implements the steps of the method described above.
The advantages and beneficial effects of the invention are as follows:
(1) Building on natural language understanding, the method extracts license clauses with named entity recognition and determines the attitude polarity of each clause through syntactic analysis, achieving automatic semantic understanding and key information extraction for software licenses.
(2) The method helps software developers quickly understand and select licenses, and helps software development projects avoid the legal and economic risks caused by improper license use.
(3) Experiments on open source software systems show that, compared with existing work on open source software license clause extraction, the method achieves some improvement in accuracy and can provide more accurate, more comprehensive, and more automatic license semantic understanding for software development.
Drawings
Fig. 1 is a basic flowchart of a software license term extraction method based on natural language understanding proposed by the present invention.
For a person skilled in the art, other relevant figures can be obtained from the above figures without inventive effort.
Detailed Description
In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.
Referring to Fig. 1, a software license clause extraction method based on natural language understanding includes the following steps:
Step S1, constructing a license text data set and labeling the license clauses in it, specifically including the following steps:
Step S11, collecting as much license text data as possible (this embodiment collects 400 open source software licenses that are popular in the open source software field and custom licenses that exist in open source project code).
Step S12, manually labeling the license clauses in the collected license text, for training the supervised learning model in step S2. Specifically, the words, phrases, or sentences in the license text that relate to rights clauses such as Distribute, Modify, and Commercial Use, and to obligation clauses such as Include Copyright, Disclose Source, and State Changes, are labeled manually; each label indicates whether the word, phrase, or sentence is the beginning, middle, or end of a given clause, or belongs to no clause.
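The labeling scheme of step S12 can be illustrated with a small sketch. The patent states only that each label marks the beginning, middle, or end of a clause, or no clause; the tag names `B-`, `I-`, `E-`, and `O`, the function `label_tokens`, and the sample sentence are assumptions made for illustration.

```python
def label_tokens(tokens, spans):
    """Assign begin/middle/end/outside tags to a token list.

    spans: list of (start, end_exclusive, clause_name) over token indices.
    The B-/I-/E-/O tag names are an assumed convention, not taken from the patent.
    A one-token span receives only a B- tag.
    """
    tags = ["O"] * len(tokens)
    for start, end, clause in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span out of range: {(start, end, clause)}")
        tags[start] = f"B-{clause}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{clause}"
        if end - start > 1:
            tags[end - 1] = f"E-{clause}"
    return tags


tokens = "You may not use the software for commercial purposes".split()
# Hypothetical annotation: tokens 3..8 ("use ... purposes") form a Commercial Use clause.
tags = label_tokens(tokens, [(3, 9, "CommercialUse")])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Token-level tags of this kind are what a sequence labeling model such as the one in step S2 is trained to predict.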
Step S2, using the labeled license text data set from step S1 to train, under supervision, a clause extraction model based on named entity recognition, so as to automatically extract clauses from license text, specifically including the following steps:
Step S21, performing word embedding with a pre-trained GloVe model, that is, using the latent linguistic knowledge in the pre-trained model to convert natural-language words into corresponding vector representations.
Step S22, extracting context features with a bidirectional long short-term memory (Bi-LSTM) network model. A recurrent neural network (RNN) is well suited to prediction problems whose input is sequential (time-series) data. A long short-term memory (LSTM) network adds a forget gate, an input gate, an output gate, and a cell state alongside the hidden state of the RNN, in order to alleviate the RNN's vanishing gradients, exploding gradients, and weak handling of long-range dependencies. A Bi-LSTM consists of two independent LSTMs, one reading the sequence forward and one backward, so that the features obtained at each time step carry both past and future information, which gives comparatively better efficiency and performance in text feature extraction.
Step S23, applying a conditional random field (CRF) algorithm to the prediction output of the Bi-LSTM model: the state transition information of the predicted categories, that is, the feature functions of the CRF, is computed, and the conditional probability distribution of the clause sequence is then solved in combination with the existing category labels, yielding a valid clause prediction result.
Step S24, repeating the above steps in training until the accuracy of the model has not improved within 1000 training rounds, then stopping training to obtain the final clause extraction model.
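Two parts of step S2 lend themselves to a short sketch: how transition scores in a CRF layer change the tag sequence that per-token scores alone would give (step S23), and the stopping rule of step S24. The code below is a minimal illustration in plain Python, not the patent's implementation. The score values are made up, the dictionaries stand in for Bi-LSTM outputs and learned CRF weights, Viterbi decoding is used as the standard way to find the best sequence, and "1000 training rounds" is read as a patience counter, which is an assumption.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions: one dict per token mapping tag -> score (stand-in for Bi-LSTM output).
    transitions: dict mapping (previous_tag, tag) -> score (stand-in for CRF weights);
                 pairs that are absent score 0.0.
    """
    if not emissions:
        return []
    best = [{tag: emissions[0][tag] for tag in tags}]  # best path score ending in each tag
    back = []                                          # back-pointers for each later position
    for scores_t in emissions[1:]:
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, tag), 0.0))
            scores[tag] = best[-1][prev] + transitions.get((prev, tag), 0.0) + scores_t[tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def train_with_patience(train_step, evaluate, patience=1000, max_rounds=100_000):
    """Run training rounds until accuracy has not improved for `patience` rounds."""
    best_acc, best_round, round_no = float("-inf"), 0, 0
    for round_no in range(1, max_rounds + 1):
        train_step()
        acc = evaluate()
        if acc > best_acc:
            best_acc, best_round = acc, round_no
        elif round_no - best_round >= patience:
            break
    return best_acc, best_round, round_no


# Toy scores (made up): emissions alone would pick "O" for the second token,
# but the transition scores favour B -> E, so the decoder returns ["B", "E"].
TAGS = ["O", "B", "E"]
EMISSIONS = [{"O": 0.5, "B": 2.0, "E": 0.0}, {"O": 1.0, "B": 0.0, "E": 0.8}]
TRANSITIONS = {("B", "E"): 1.0, ("B", "O"): -2.0, ("O", "E"): -5.0, ("E", "E"): -5.0}
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In a real model, `train_step` and `evaluate` would wrap the optimizer update and a validation pass; here they are placeholders supplied by the caller.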
Step S3, for a given piece of open source software, extracting all license text data in it, specifically including the following steps:
Step S31, traversing each text file in the given open source software, finding all dedicated license files, and obtaining the license text data they contain.
Step S32, traversing each code file in the given open source software, obtaining by regular-expression matching the names of all third-party packages the code file imports, and obtaining by query the license name and license text declared by each third-party package.
Step S33, traversing each code file in the given open source software and obtaining the license-related content declared between code lines by analyzing code comments; if a license is declared only by name, its full text must be obtained from that name (a database of known licenses can be built in advance and queried by license name).
Step S34, aggregating all license text data obtained from the open source software in steps S31-S33 as the analysis data for subsequent license clause extraction.
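The collection pass of steps S31-S33 can be sketched as a single directory walk. The patent does not name a programming language for the scanned code or any file-name rules, so the sketch assumes Python source files (`import`/`from` statements and `#` comments) and treats files named LICENSE, LICENCE, or COPYING as dedicated license files; the function name `collect_license_data` and all three regular expressions are hypothetical. Looking up the license of each third-party package, and resolving a license name to its full text, are left out because they depend on an external registry or database.

```python
import os
import re

# Assumed conventions, not taken from the patent.
LICENSE_FILE_RE = re.compile(r"^(LICEN[SC]E|COPYING)(\..*)?$", re.IGNORECASE)
IMPORT_RE = re.compile(r"^[ \t]*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
COMMENT_RE = re.compile(r"#[ \t]*(.*license.*)$", re.IGNORECASE | re.MULTILINE)


def read_text(path):
    """Read a file as text, skipping bytes that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return fh.read()


def collect_license_data(root):
    """Return (license_texts, imported_packages, license_comments) for a source tree."""
    texts, packages, comments = [], set(), []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if LICENSE_FILE_RE.match(name):          # S31: dedicated license files
                texts.append(read_text(path))
            elif name.endswith(".py"):
                content = read_text(path)
                packages.update(IMPORT_RE.findall(content))               # S32: imported names
                comments.extend(c.strip() for c in COMMENT_RE.findall(content))  # S33: comments
    return texts, sorted(packages), comments
```

The imported names include standard-library modules as well as third-party packages; separating the two would be part of the query step that this sketch omits.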
Step S4, preprocessing the obtained license text, specifically, using regular-expression matching to remove punctuation marks, numbers, redundant spaces, line feeds, tabs, and the like from the text.
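A minimal sketch of this denoising step is shown below. The patent does not give the regular expressions it uses; the function name `denoise`, the choice of ASCII punctuation, and the decision to replace removed characters with a space (so that adjacent words are not fused) are assumptions.

```python
import re
import string

# ASCII punctuation and digits; the exact character set is an assumption.
_PUNCT_DIGITS = re.compile(f"[{re.escape(string.punctuation)}0-9]")
_WHITESPACE = re.compile(r"\s+")  # spaces, line feeds, tabs


def denoise(text):
    """Strip punctuation and digits, then collapse all whitespace runs to one space."""
    text = _PUNCT_DIGITS.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


print(denoise("Copyright (c) 2022,\tFoo Inc.\n\nAll  rights reserved."))
```

Because sentence-ending punctuation is removed here, a pipeline that later needs sentence boundaries (as step S61 does) would have to record them before this step or recover them from the original text.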
Step S5, for all license text extracted from the given open source software and preprocessed in step S4, extracting the clauses it involves with the clause extraction model obtained in step S2.
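Once the model has assigned a tag to every token, the clauses themselves are recovered by grouping consecutive tags. The sketch below assumes the same hypothetical `B-`/`I-`/`E-`/`O` tag names as a begin/middle/end/none scheme; the patent does not specify tag names or this grouping routine.

```python
def extract_clauses(tokens, tags):
    """Group tagged tokens into (clause_name, text) pairs.

    Assumes tags of the form B-<name>, I-<name>, E-<name>, or O (an assumed convention).
    Middle or end tags that do not continue an open clause are ignored.
    """
    clauses, label, words = [], None, []

    def flush():
        nonlocal label, words
        if words:
            clauses.append((label, " ".join(words)))
        label, words = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B":
            flush()                      # close any clause still open
            label, words = name, [token]
        elif prefix in ("I", "E") and name == label:
            words.append(token)
            if prefix == "E":
                flush()
        else:
            flush()
    flush()
    return clauses


sample_tokens = "You may not use the software for commercial purposes".split()
sample_tags = ["O", "O", "O", "B-CommercialUse"] + ["I-CommercialUse"] * 4 + ["E-CommercialUse"]
print(extract_clauses(sample_tokens, sample_tags))
```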
Step S6, for the extracted license clauses, obtaining the attitude polarity of each clause through syntactic analysis, specifically including the following steps:
Step S61, for each extracted license clause, finding the sentence in which it occurs and parsing that sentence with a probabilistic context-free grammar (PCFG) to obtain its syntax tree.
Step S62, according to the syntax tree of the sentence in which each clause occurs, analyzing the part of speech of each word belonging to the clause, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition.
Step S63, according to the syntax tree of the sentence in which each clause occurs, finding the part of the sentence that lies outside the clause and is, in the syntactic structure, the main sentence to which the clause belongs, analyzing the part of speech of each word in that part, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition.
Step S64, traversing the "key word set" of the clause and looking for negative words; following the principle that any negation makes the clause negative, if a negative word is found, the attitude polarity of the clause is judged to be negative (Cannot); otherwise, looking for words that express necessity, and if any such word appears, the attitude polarity of the clause is regarded as mandatory (Must); otherwise, the attitude polarity of the clause is regarded as permissive (Can). The attitude polarity of the clause is obtained in this manner.
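Steps S62-S64 can be sketched as a part-of-speech filter followed by a three-way decision. The sketch takes words that are already tagged, as they would be at the leaves of the syntax tree from step S61; the parsing itself is not shown. Penn Treebank tag prefixes (`VB`, `MD`, `IN`) and both word lists are assumptions, since the patent does not list its negative or necessity words.

```python
# Assumed Penn Treebank prefixes: VB* verbs, MD modal verbs, IN prepositions.
KEY_POS_PREFIXES = ("VB", "MD", "IN")
# Hypothetical word lists; the patent does not enumerate them.
NEGATIVE_WORDS = {"cannot", "without", "prohibit", "prohibits", "prohibited", "forbidden"}
NECESSITY_WORDS = {"must", "shall", "require", "requires", "required"}


def key_word_set(clause_tagged, context_tagged=()):
    """S62 + S63: keep verbs, modal verbs, and prepositions from the clause and its main sentence."""
    tagged = list(clause_tagged) + list(context_tagged)
    return {word.lower() for word, pos in tagged if pos.startswith(KEY_POS_PREFIXES)}


def attitude_polarity(key_words):
    """S64: negation is checked first, then necessity, otherwise the clause is permissive."""
    words = {w.lower() for w in key_words}
    if words & NEGATIVE_WORDS:
        return "Cannot"
    if words & NECESSITY_WORDS:
        return "Must"
    return "Can"


must_clause = [("must", "MD"), ("include", "VB"), ("the", "DT"), ("notice", "NN")]
can_clause = [("may", "MD"), ("distribute", "VB"), ("copies", "NNS")]
cannot_clause = [("use", "VB"), ("without", "IN"), ("permission", "NN")]
for clause in (must_clause, can_clause, cannot_clause):
    print(attitude_polarity(key_word_set(clause)))
```

Under Penn Treebank tagging the word "not" is an adverb (RB) and would not pass this filter, so an implementation following this sketch would need to decide how such negations enter the key word set.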
The invention has been described in an illustrative manner, and it is to be understood that any simple variations, modifications or other equivalent changes which can be made by one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.
Claims (8)
1. A software license clause extraction method based on natural language understanding, comprising the following steps:
step S1, constructing a license text data set and labeling the license clauses in it;
step S2, using the labeled license text data set from step S1 to train a license clause extraction model that automatically extracts license clauses from license text;
step S3, extracting all license text data in a given piece of open source software;
step S4, denoising the license text data obtained in step S3;
step S5, using the clause extraction model built in step S2 to extract the license clauses involved in the license text data processed in step S4;
step S6, for the extracted license clauses, obtaining the attitude polarity of each clause through syntactic analysis.
2. The natural language understanding-based software license clause extraction method according to claim 1, characterized in that: in step S1, the words, phrases, or sentences in the license text data set that relate to rights clauses and obligation clauses are labeled, each label indicating whether the word, phrase, or sentence is the beginning, middle, or end of a given clause, or belongs to no clause.
3. The natural language understanding-based software license clause extraction method according to claim 2, characterized in that: in step S1, the rights clauses comprise the Distribute, Modify, and Commercial Use clauses, and the obligation clauses comprise the Include Copyright, Disclose Source, and State Changes clauses.
4. The natural language understanding-based software license clause extraction method according to claim 1, characterized in that: step S2 includes the following steps:
step S21, performing word embedding with a pre-trained GloVe model, converting natural-language words into corresponding vector representations;
step S22, extracting context features with a bidirectional long short-term memory network model;
step S23, using a conditional random field algorithm to compute the state transition information of the predicted categories from the output of the bidirectional long short-term memory network model, and solving the conditional probability distribution of the clause sequence in combination with the existing category labels to obtain a valid clause prediction result;
step S24, repeating the above steps in training until the accuracy of the model meets the requirement, then stopping training to obtain the final clause extraction model.
5. The natural language understanding-based software license clause extraction method according to claim 1, characterized in that: step S3 includes the following steps:
step S31, traversing each text file in the given open source software, finding all dedicated license files, and obtaining the license text data they contain;
step S32, traversing each code file in the given open source software, finding the names of all third-party packages the code files import, and obtaining by query the license name and license text declared by each third-party package;
step S33, traversing each code file in the given open source software and obtaining the license-related content declared between code lines by analyzing code comments; if a license is declared only by name, its full text is obtained from that name;
step S34, aggregating all license text data obtained from the open source software in steps S31-S33 as the analysis data for subsequent license clause extraction.
6. The natural language understanding-based software license term extraction method according to claim 1, characterized in that: the denoising preprocessing operation of step S4 is to remove punctuation marks, numbers, spaces, line feeds, and tabs from the license text data obtained in step S3.
7. The natural language understanding-based software license clause extraction method according to claim 1, characterized in that: step S6 includes the following steps:
step S61, for each extracted license clause, finding the sentence in which it occurs and parsing that sentence with a probabilistic context-free grammar to obtain its syntax tree;
step S62, according to the syntax tree of the sentence in which each clause occurs, analyzing the part of speech of each word belonging to the clause, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition;
step S63, according to the syntax tree of the sentence in which each clause occurs, finding the part of the sentence that lies outside the clause and is, in the syntactic structure, the main sentence to which the clause belongs, analyzing the part of speech of each word in that part, and adding the word to the clause's "key word set" if it is a verb, a modal verb, or a preposition;
step S64, traversing the key word set of the clause and looking for negative words; if any is found, the attitude polarity of the clause is judged to be negative; otherwise, looking for words that express necessity, and if any such word appears, the attitude polarity of the clause is regarded as mandatory; otherwise, the attitude polarity of the clause is regarded as permissive; the attitude polarity of the clause is obtained in this manner.
8. A computer-readable storage medium, characterized in that a computer program is stored which, when executed, realizes the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210875400.1A CN115080924B (en) | 2022-07-25 | 2022-07-25 | Software license clause extraction method based on natural language understanding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210875400.1A CN115080924B (en) | 2022-07-25 | 2022-07-25 | Software license clause extraction method based on natural language understanding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115080924A | 2022-09-20 |
CN115080924B | 2022-11-15 |
Family
ID=83243686
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210875400.1A | Active | CN115080924B (en) | 2022-07-25 | 2022-07-25 | Software license clause extraction method based on natural language understanding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115080924B (en) |
Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070271190A1 (en) * | 2006-05-19 | 2007-11-22 | Foster Glen A | Discovering licenses in software files |
US20120253793A1 (en) * | 2011-04-01 | 2012-10-04 | Rima Ghannam | System for natural language understanding |
US20130297626A1 (en) * | 2012-03-23 | 2013-11-07 | AVG Technologies CZ,s.r.o | Systems and methods for extraction of policy information |
US20170011119A1 (en) * | 2015-07-06 | 2017-01-12 | Rima Ghannam | System for Natural Language Understanding |
JP2017091101A (en) * | 2015-11-06 | 2017-05-25 | 日本電信電話株式会社 | Clause identification device, method, and program |
CN106934254A (en) * | 2017-02-15 | 2017-07-07 | 中国银联股份有限公司 | The analysis method and device of a kind of licensing of increasing income |
CN106951743A (en) * | 2017-03-22 | 2017-07-14 | 上海英慕软件科技有限公司 | A kind of software code infringement detection method |
CN107783960A (en) * | 2017-10-23 | 2018-03-09 | 百度在线网络技术(北京)有限公司 | Method, apparatus and equipment for Extracting Information |
CN109063421A (en) * | 2018-06-28 | 2018-12-21 | 东南大学 | A kind of analysis of open source licensing compliance and conflicting detection method |
CN109062904A (en) * | 2018-08-23 | 2018-12-21 | 上海互教教育科技有限公司 | Logical predicate extracting method and device |
CN109154939A (en) * | 2016-04-08 | 2019-01-04 | 培生教育公司 | The system and method generated for automated content polymerization |
CN109933664A (en) * | 2019-03-12 | 2019-06-25 | 中南大学 | A kind of fine granularity mood analysis improved method based on emotion word insertion |
CN110609983A (en) * | 2019-08-19 | 2019-12-24 | 广州利科科技有限公司 | Structured decomposition method for policy file |
CN110674639A (en) * | 2019-09-24 | 2020-01-10 | 拾音智能科技有限公司 | Natural language understanding method based on pre-training model |
CN110705265A (en) * | 2019-08-27 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Contract clause risk identification method and device |
US20200285716A1 (en) * | 2019-03-07 | 2020-09-10 | International Business Machines Corporation | Detection and monitoring of software license terms and conditions |
CN111753089A (en) * | 2020-06-28 | 2020-10-09 | 深圳壹账通智能科技有限公司 | Topic clustering method and device, electronic equipment and storage medium |
CN112084309A (en) * | 2020-09-17 | 2020-12-15 | 北京中科微澜科技有限公司 | License selection method and system based on open source software map |
CN112364165A (en) * | 2020-11-12 | 2021-02-12 | 上海犇众信息技术有限公司 | Automatic classification method based on Chinese privacy policy terms |
CN113128227A (en) * | 2020-01-14 | 2021-07-16 | 普天信息技术有限公司 | Entity extraction method and device |
CN113268714A (en) * | 2021-06-03 | 2021-08-17 | 西南大学 | Automatic extraction method for license terms of open source software |
CN114254653A (en) * | 2021-12-23 | 2022-03-29 | 深圳供电局有限公司 | Scientific and technological project text semantic extraction and representation analysis method |
CN114365158A (en) * | 2019-09-12 | 2022-04-15 | 维亚奈系统公司 | Visual creation and monitoring of machine learning models |
CN114417851A (en) * | 2021-12-03 | 2022-04-29 | 重庆邮电大学 | Emotion analysis method based on keyword weighted information |
- 2022-07-25: CN application CN202210875400.1A, patent CN115080924B, status Active
Patent Citations (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070271190A1 (en) * | 2006-05-19 | 2007-11-22 | Foster Glen A | Discovering licenses in software files |
US20120253793A1 (en) * | 2011-04-01 | 2012-10-04 | Rima Ghannam | System for natural language understanding |
US20130297626A1 (en) * | 2012-03-23 | 2013-11-07 | AVG Technologies CZ,s.r.o | Systems and methods for extraction of policy information |
US20170011119A1 (en) * | 2015-07-06 | 2017-01-12 | Rima Ghannam | System for Natural Language Understanding |
JP2017091101A (en) * | 2015-11-06 | 2017-05-25 | 日本電信電話株式会社 | Clause identification device, method, and program |
CN109154939A (en) * | 2016-04-08 | 2019-01-04 | 培生教育公司 | System and method for automatic content aggregation generation |
CN106934254A (en) * | 2017-02-15 | 2017-07-07 | 中国银联股份有限公司 | Analysis method and device for open source licenses |
CN106951743A (en) * | 2017-03-22 | 2017-07-14 | 上海英慕软件科技有限公司 | Software code infringement detection method |
CN107783960A (en) * | 2017-10-23 | 2018-03-09 | 百度在线网络技术(北京)有限公司 | Method, apparatus and device for extracting information |
CN109063421A (en) * | 2018-06-28 | 2018-12-21 | 东南大学 | Open source license compliance analysis and conflict detection method |
CN109062904A (en) * | 2018-08-23 | 2018-12-21 | 上海互教教育科技有限公司 | Logical predicate extracting method and device |
US20200285716A1 (en) * | 2019-03-07 | 2020-09-10 | International Business Machines Corporation | Detection and monitoring of software license terms and conditions |
CN109933664A (en) * | 2019-03-12 | 2019-06-25 | 中南大学 | Improved fine-grained sentiment analysis method based on sentiment word embedding |
CN110609983A (en) * | 2019-08-19 | 2019-12-24 | 广州利科科技有限公司 | Structured decomposition method for policy file |
CN110705265A (en) * | 2019-08-27 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Contract clause risk identification method and device |
CN114365158A (en) * | 2019-09-12 | 2022-04-15 | 维亚奈系统公司 | Visual creation and monitoring of machine learning models |
CN110674639A (en) * | 2019-09-24 | 2020-01-10 | 拾音智能科技有限公司 | Natural language understanding method based on pre-training model |
CN113128227A (en) * | 2020-01-14 | 2021-07-16 | 普天信息技术有限公司 | Entity extraction method and device |
CN111753089A (en) * | 2020-06-28 | 2020-10-09 | 深圳壹账通智能科技有限公司 | Topic clustering method and device, electronic equipment and storage medium |
CN112084309A (en) * | 2020-09-17 | 2020-12-15 | 北京中科微澜科技有限公司 | License selection method and system based on open source software map |
CN112364165A (en) * | 2020-11-12 | 2021-02-12 | 上海犇众信息技术有限公司 | Automatic classification method based on Chinese privacy policy terms |
CN113268714A (en) * | 2021-06-03 | 2021-08-17 | 西南大学 | Automatic extraction method for license terms of open source software |
CN114417851A (en) * | 2021-12-03 | 2022-04-29 | 重庆邮电大学 | Emotion analysis method based on keyword weighted information |
CN114254653A (en) * | 2021-12-23 | 2022-03-29 | 深圳供电局有限公司 | Scientific and technological project text semantic extraction and representation analysis method |
Non-Patent Citations (1)
Title |
---|
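Zhang Hehua et al.: "Research and Application of an Intelligent Checking Tool for BIM Models", Journal of Information Technology in Civil Engineering and Architecture * |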
Also Published As
Publication number | Publication date |
---|---|
CN115080924B (en) | 2022-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101041B (en) | Entity relationship extraction method, device, equipment and medium based on semantic similarity | |
Yih et al. | Semantic parsing for single-relation question answering | |
US11163956B1 (en) | System and method for recognizing domain specific named entities using domain specific word embeddings | |
CN112836046A (en) | Entity recognition method for policy and regulation texts in the "four insurances and one fund" domain | |
WO2024131111A1 (en) | Intelligent writing method and apparatus, device, and nonvolatile readable storage medium | |
CN108874774B (en) | Service calling method and system based on intention understanding | |
CN112100354A (en) | Man-machine conversation method, device, equipment and storage medium | |
WO2024067276A1 (en) | Video tag determination method and apparatus, device and medium | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
CN113609838B (en) | Document information extraction and mapping method and system | |
CN115357719B (en) | Power audit text classification method and device based on improved BERT model | |
CN113919366A (en) | Semantic matching method and device for power transformer knowledge question answering | |
CN113779227B (en) | Case fact extraction method, system, device and medium | |
CN111738018A (en) | Intention understanding method, device, equipment and storage medium | |
Lee | Natural Language Processing: A Textbook with Python Implementation | |
CN113988071A (en) | Intelligent dialogue method and device based on financial knowledge graph and electronic equipment | |
CN113869054B (en) | Deep learning-based power field project feature recognition method | |
CN115080924B (en) | Software license clause extraction method based on natural language understanding | |
CN113177121A (en) | Text topic classification method and device, electronic equipment and storage medium | |
WO2023169301A1 (en) | Text processing method and apparatus, and electronic device | |
WO2023087935A1 (en) | Coreference resolution method, and training method and apparatus for coreference resolution model | |
WO2023173541A1 (en) | Text-based emotion recognition method and apparatus, device, and storage medium | |
Yin | Fuzzy information recognition and translation processing in English interpretation based on a generalized maximum likelihood ratio algorithm | |
WO2022227196A1 (en) | Data analysis method and apparatus, computer device, and storage medium | |
Liao et al. | The SG-CIM entity linking method based on BERT and entity name embeddings | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||