CN118396786A - Contract document auditing method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN118396786A (application CN202410411157.7A)
- Authority
- CN
- China
- Prior art keywords
- contract
- document
- text
- target
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
Abstract
The application discloses a contract document auditing method and device, an electronic device and a computer-readable storage medium. In an embodiment of the application, an acquired target contract document is parsed to determine its contract type, contract user information and topic. A first machine learning model then calculates, according to the contract type, an element value for each text unit in the target contract document, and the text units whose element-value rank exceeds a predetermined sequence number in the ranking of element values are determined to be element units. A second machine learning model calculates an audit value for each element unit from the contract user information and the semantics and position information of that element unit. Finally, the risk degree of the target contract document is determined from the difference between the audit value and a preset threshold. The auditing method thus takes into account both the users who sign the contract and the scenario in which it is signed: the element units are determined with the user information and the contract scenario in view, and the risk degree of the contract is determined from those element units, thereby greatly reducing the probability that the contract gives rise to disputes in actual use.
Description
Technical Field
The present application relates to the field of text processing technologies, and in particular, to a method and apparatus for auditing a contract document, an electronic device, and a computer readable storage medium.
Background
With the rapid development of office automation and computer technology, electronic documents are widely used in daily life and work. Because electronic documents can be generated and processed quickly on electronic devices such as computers and mobile phones, and transferred conveniently between devices over the Internet or other networks, they have become an integral part of social activities and technological production activities. Everyday and business communication has likewise shifted from paper to electronic media, so the volume of electronic documents in use has grown considerably and the demands placed on them have become more diverse.
Electronic documents are now widely used in contract auditing, where the completeness of a contract's terms and their validity and reasonableness must be reviewed. Once a contract is signed, its text binds the contracting parties. Contract review therefore has to attend to the semantic logic and correctness of the contract content, the accuracy of its wording, the validity of the signing, and the compliance and consistency of the signatures, so as to avoid errors and risks.
Notably, because there are many types of contract, the parts that matter to a user may differ between contract types and between scenarios. However, existing review relies mainly on manual work, which is inefficient and cannot account for the differences between users and scenarios, so contracts cannot be verified effectively and accurately. A solution that enables such auditing of contract documents is therefore needed.
Disclosure of Invention
The embodiments of the application provide a contract document auditing method and device, an electronic device and a computer-readable storage medium, which address the drawback of the prior art that manual auditing cannot take the scenario and the user into account to audit in a targeted manner.
To achieve this technical purpose, an embodiment of the application provides a contract document auditing method, comprising the following steps:
acquiring a target contract document;
parsing the target contract document to determine the contract type, contract user information and topic corresponding to the target contract document, wherein the contract user information comprises: signing user information of the contracting parties, drafting user information of the party drafting the contract, and related-party user information of the parties related to the contract;
calculating, by using a first machine learning model according to the contract type, an element value of each text unit in the target contract document, wherein the element value indicates the degree to which the text unit contributes to the topic of the target contract document;
determining, according to the ranking result of the element values, the text units whose element-value rank exceeds a predetermined sequence number to be element units;
calculating, by using a second machine learning model, an audit value of each element unit according to the contract user information and the semantics and position information of each element unit; and
determining the risk degree of the target contract document according to the difference between the audit value and a preset threshold, wherein the risk degree indicates the probability that the target contract document gives rise to disputes in actual use.
Another embodiment of the present application provides a contract document auditing device, comprising:
an acquisition module, configured to acquire a target contract document;
a parsing module, configured to parse the target contract document to determine the contract type, contract user information and topic corresponding to the target contract document, wherein the contract user information comprises: signing user information of the contracting parties, drafting user information of the party drafting the contract, and related-party user information of the parties related to the contract;
a first calculation module, configured to calculate, by using a first machine learning model according to the contract type, an element value of each text unit in the target contract document, wherein the element value indicates the degree to which the text unit contributes to the topic of the target contract document;
a first determining module, configured to determine, according to the ranking result of the element values, the text units whose element-value rank exceeds a predetermined sequence number to be element units;
a second calculation module, configured to calculate, by using a second machine learning model, an audit value of each element unit according to the contract user information and the semantics and position information of each element unit; and
a second determining module, configured to determine the risk degree of the target contract document according to the difference between the audit value and a preset threshold, wherein the risk degree indicates the probability that the target contract document gives rise to disputes in actual use.
An embodiment of the application also provides an electronic device, comprising:
a memory for storing a program; and
a processor for running the program stored in the memory to execute the contract document auditing method provided by the embodiments of the application.
An embodiment of the application also provides a computer-readable storage medium storing a computer program executable by a processor, wherein the program, when executed by the processor, implements the contract document auditing method provided by the embodiments of the application.
With the contract document auditing method and device, the electronic device and the computer-readable storage medium provided by the embodiments of the application, an acquired target contract document is parsed to determine its contract type, contract user information and topic. A first machine learning model then calculates, according to the contract type, an element value for each text unit in the target contract document, and the text units whose element-value rank exceeds a predetermined sequence number in the ranking of element values are determined to be element units. A second machine learning model calculates an audit value for each element unit from the contract user information and the semantics and position information of that element unit. Finally, the risk degree of the target contract document is determined from the difference between the audit value and a preset threshold. The auditing method thus takes into account both the users who sign the contract and the scenario in which it is signed: the element units are determined with the user information and the contract scenario in view, and the risk degree of the contract is determined from those element units, thereby greatly reducing the probability that the contract gives rise to disputes in actual use.
The foregoing is only an overview of the technical solution of the application. Specific embodiments are set out below so that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the application are more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of one embodiment of the contract document auditing method provided by the present application;
FIG. 2 is a schematic structural diagram of an embodiment of the contract document auditing device provided by the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the electronic device provided by the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
Example 1
FIG. 1 is a flow chart of one embodiment of the contract document auditing method provided by the present application. As shown in FIG. 1, the contract document auditing method of the embodiment of the application may include:
S101: Acquire a target contract document.
In step S101, a specified target contract document may be acquired by a server. In the embodiments of the application, the target contract document may be any type of contract document, for example a business contract, a labor contract or a lease contract. In particular, as contract business grows more complex, a single contract may involve several principals; a business contract, for example, may involve Party A, Party B, Party C and so on. Accordingly, in step S101 the relevant contract document may be parsed from previous contract processing history data. Alternatively, the contract document of each contract may be stored in advance on a separate server, or on the server on which the contract document auditing method of the embodiment is deployed, so that in step S101 the pre-stored contract documents are acquired and the target contract document is parsed from them or obtained from them directly.
S102: Parse the target contract document to determine the contract type, contract user information and topic.
In step S102, the target contract document acquired in step S101 may be parsed to determine the contract type, contract user information and topic corresponding to the contract document. For a business contract, for example, the contract type may be a trade contract, a service contract, a loan contract and so on; the contract user information may include information on the contracting principals such as Party A, Party B and Party C; and the contract topic may be the subject matter of the contract, the contract amount, the contract term and so on. In particular, step S102 may further parse out user information on the related parties involved in the contract document. A business contract, for example, may involve related parties such as a guarantor or a witness in addition to the contracting principals Party A, Party B and Party C.
For example, in step S102 the contract type may be determined from the contract terms in the target contract document acquired in step S101. If the contract terms include content such as the subject matter of the transaction, the transaction price and the transaction method, the contract may be determined to be a trade contract. The contract user information may likewise be determined from the principal information referred to in the contract terms: if the terms include a party's name, address and contact details, that party may be determined to be a contracting principal of the contract. The contract topic may be determined from information relating to the contract terms, such as the contract amount and the contract duration.
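The term-based type determination described above can be illustrated with a minimal sketch. It assumes a simple keyword count; the keyword lists, the `TYPE_KEYWORDS` table and the function name `guess_contract_type` are hypothetical and are not taken from the application, which does not specify how the terms are matched.

```python
from typing import Dict, List

# Hypothetical keyword lists per contract type (illustrative only;
# the application does not define them).
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "trade": ["subject matter", "price", "delivery"],
    "service": ["service", "deliverable", "acceptance"],
    "loan": ["principal", "interest", "repayment"],
}


def guess_contract_type(clauses: List[str]) -> str:
    """Return the contract type whose keywords occur most often in the clauses."""
    text = " ".join(clauses).lower()
    scores = {
        contract_type: sum(text.count(keyword) for keyword in keywords)
        for contract_type, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword occurs at all, do not guess a type.
    return best if scores[best] > 0 else "unknown"


clauses = [
    "The price of the subject matter is 100.",
    "Delivery within 30 days.",
]
contract_type = guess_contract_type(clauses)
```

A real system would more plausibly use a trained classifier here; the keyword count only makes the idea of inferring the type from the terms concrete.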
S103: Calculate the element value of each text unit in the target contract document by using a first machine learning model according to the contract type.
In step S103, the element value of each text unit in the target contract document may be calculated by using the first machine learning model according to the contract type determined in step S102. For a business contract, for example, the element value may be the degree of association between the important content referred to in the contract terms (such as the contract subject matter, contract amount and contract period) and the user or the content. In particular, in step S103 a predetermined reference target document may also be acquired according to the contract type. For example, if the contract type determined in step S102 is a sales contract, a predetermined sales contract reference target document may be acquired in step S103.
For example, in step S103 the similarity between the target contract document and the reference target documents of the contract type determined in step S102 may be calculated. If that contract type is a sales contract, the similarity between the target contract document and a sales contract reference target document may be calculated. In addition, the element value of each text unit may be calculated according to the degree of association between each text unit in the target contract document and each text unit in the reference target document. For example, if a text unit in the target contract document is strongly associated with a text unit in the reference target document, it may be determined to be an important text unit and given a high element value.
For a business contract, for example, the similarity may be the degree to which important content referred to in the contract terms, such as the contract subject matter, contract amount and contract period, matches. In particular, the similarity may be calculated against a predetermined library of reference target documents. For example, if the contract type determined in step S102 is a sales contract, the similarity between the target contract document and each reference target document in the sales contract reference target document library may be calculated.
The similarity may also be calculated from the degree of matching between the contract terms in the target contract document and those in the reference target document: the more closely the terms match, the higher the similarity.
The reference target documents may then be ranked by similarity, and the highest-ranked one taken as the reference target document of the same contract type as the target contract document.
For a business contract, for example, the business contract reference target document with the highest similarity may be used as the reference target document of the same contract type as the target contract document. In particular, the reference target document may be further determined according to a predetermined ranking threshold. For example, if the predetermined ranking threshold is 0.8, a reference target document whose similarity is greater than 0.8 may be determined to be a reference target document of the same contract type as the target contract document.
For example, the reference target documents may be ranked according to the similarity calculated above. Suppose the calculated similarities are:
Reference target document 1: 0.9
Reference target document 2: 0.7
Reference target document 3: 0.6
Reference target document 1 may then be determined to be the reference target document of the same contract type as the target contract document.
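The ranking step can be sketched as follows, using the three similarity values from the example above. The function names `rank_references` and `pick_reference` and the optional `min_similarity` cut-off are assumptions made for illustration; how the similarities themselves are computed is left open here.

```python
from typing import Dict, List, Optional, Tuple


def rank_references(similarities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort reference documents by similarity, highest first."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


def pick_reference(similarities: Dict[str, float],
                   min_similarity: float = 0.0) -> Optional[str]:
    """Return the top-ranked reference document, or None if none exceeds the cut-off."""
    ranked = rank_references(similarities)
    if not ranked or ranked[0][1] <= min_similarity:
        return None
    return ranked[0][0]


# Similarity values taken from the example in the text.
scores = {
    "reference target document 1": 0.9,
    "reference target document 2": 0.7,
    "reference target document 3": 0.6,
}
chosen = pick_reference(scores, min_similarity=0.8)
```

With the 0.8 threshold mentioned earlier, only reference target document 1 qualifies, which matches the outcome stated in the text.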
Then, a first degree of association between each text unit in the target contract document and each text unit in the reference target document may be calculated.
In this step, the first degree of association between each text unit in the target contract document and each text unit in the reference target document determined above may be calculated. For a business contract, for example, the first degree of association may be the degree to which important content referred to in the contract terms, such as the contract subject matter, contract amount and contract period, corresponds. In particular, the first degree of association may be further assessed against a predetermined association threshold. For example, if the predetermined association threshold is 0.5, text units whose first degree of association is greater than 0.5 may be determined to be text units with a high first degree of association.
For example, the first degree of association may be calculated from the degree of matching between each text unit in the target contract document and each text unit in the reference target document: if a text unit in the target contract document closely matches a text unit in the reference target document, that text unit may be determined to have a high first degree of association.
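As a rough illustration of a matching-based first degree of association, the sketch below scores a text unit by its best token overlap with any text unit of the reference document. Jaccard overlap is an assumption standing in for whatever the first machine learning model actually computes, and the names `tokens`, `jaccard` and `first_association` are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-case word tokens of a text unit."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two text units, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def first_association(unit: str, reference_units: List[str]) -> float:
    """Best match between one text unit and any unit of the reference document."""
    return max((jaccard(unit, ref) for ref in reference_units), default=0.0)


reference = ["The buyer shall pay the price.", "Delivery within 30 days."]
score = first_association("The buyer shall pay the price in full.", reference)
is_high = score > 0.5  # 0.5 is the example threshold from the text
```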
In addition, a second degree of association between each text unit in the target contract document and the contract user information may be calculated.
Specifically, the second degree of association between each text unit in the target contract document and the contract user information determined in step S102 may be calculated. For a business contract, for example, the second degree of association may be the degree of association between the important content referred to in the contract terms (such as the contract subject matter, contract amount and contract period) and the contract user information. In particular, the second degree of association may be further assessed against a predetermined association threshold. For example, if the predetermined association threshold is 0.5, text units whose second degree of association is greater than 0.5 may be determined to be text units with a high second degree of association.
For example, the second degree of association may be calculated from the degree of matching between each text unit in the target contract document and the contract user information: if a text unit in the target contract document closely matches an item of the contract user information, that text unit may be determined to have a high second degree of association.
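One simple way to picture a matching-based second degree of association is the fraction of contract user information items that a text unit mentions. This is only a sketch under that assumption; the field names in `user_info` and the function name `second_association` are hypothetical, and the application does not state how the matching is performed.

```python
from typing import Dict


def second_association(unit: str, user_info: Dict[str, str]) -> float:
    """Fraction of user-information values that appear verbatim in the text unit."""
    if not user_info:
        return 0.0
    text = unit.lower()
    hits = sum(1 for value in user_info.values() if value and value.lower() in text)
    return hits / len(user_info)


# Hypothetical contract user information (signing parties and a related party).
user_info = {
    "party_a": "Acme Ltd",
    "party_b": "Beta Inc",
    "guarantor": "Gamma Bank",
}
score = second_association("Acme Ltd shall pay Beta Inc within 30 days.", user_info)
```

Here two of the three users are named in the text unit, so the score is two thirds, which is above the 0.5 example threshold.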
Then, for each text unit in the target contract document, an element value thereof may be determined according to the first degree of association and the second degree of association above a predetermined threshold.
In the contract document auditing method according to the embodiment of the application, for each text unit in the target contract document, the element value thereof can be determined according to the determined first association degree and second association degree. For example, in the case of a commercial contract, the element value may be a weight value of important content such as a contract target, a contract amount, a contract period limit, or the like, which are involved in contract terms. In particular, in the contract document auditing method of the embodiment of the present application, the element value may be further determined according to a predetermined element value threshold.
For example, if the determined first degree of association is 0.9 and the second degree of association is 0.7, an average value of 0.8 of the two degrees of association may be taken as the element value of the text unit, or the element value of the text unit may also be calculated in a weighted average manner in accordance with a weight value set in advance for the first degree of association with the reference target document and for the second degree of association with the contractual user.
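The averaging described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `element_value` and the weight parameters `w_first` and `w_second` are hypothetical, and equal weights reproduce the plain average of the 0.9 and 0.7 example.

```python
def element_value(first_assoc: float, second_assoc: float,
                  w_first: float = 0.5, w_second: float = 0.5) -> float:
    """Weighted average of the first and second degrees of association.

    w_first and w_second are hypothetical preset weights; with equal
    weights the result is the plain average of the two degrees.
    """
    total = w_first + w_second
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return (w_first * first_assoc + w_second * second_assoc) / total


# Plain average of the example values 0.9 and 0.7.
print(round(element_value(0.9, 0.7), 2))
# Weighted variant that favours the reference target document.
print(round(element_value(0.9, 0.7, w_first=0.75, w_second=0.25), 2))
```

Normalising by the sum of the weights keeps the element value on the same scale as the two degrees of association, even if the preset weights do not sum to 1.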
S104: According to the ranking result of the element values, determining the text units whose element values are ranked greater than the predetermined sequence number as element units.
In step S104, according to the contract document auditing method of the embodiment of the present application, each text unit may be ranked according to the element value calculated in step S103, and the text units whose ranking of the element value is greater than the predetermined sequence number may be determined as the element units. For example, in the case of a business contract, the element units may be the important content of the contract subject, contract amount, contract period limit, and the like, which are involved in the contract terms. In particular, in the contract document auditing method of the embodiment of the present application, in step S104, the element units may be further determined according to a predetermined sequence number. For example, if the predetermined order number is 3, text units having an order of element values greater than 3 may be determined as element units.
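The selection in step S104 can be sketched as follows. The wording "ranked greater than the predetermined sequence number" is ambiguous; this sketch assumes that text units are sorted by element value in descending order and that the units within the first `top_n` positions are kept. The function name and the sample values are hypothetical.

```python
from typing import Dict, List


def select_element_units(element_values: Dict[str, float], top_n: int = 3) -> List[str]:
    """Rank text units by element value and keep the top_n highest.

    Assumption: the "predetermined sequence number" is read as a cutoff
    on the rank position after sorting in descending order.
    """
    ranked = sorted(element_values.items(), key=lambda item: item[1], reverse=True)
    return [unit for unit, _ in ranked[:top_n]]


values = {
    "contract subject matter": 0.9,
    "contract amount": 0.8,
    "contract term": 0.7,
    "notice address": 0.2,
}
print(select_element_units(values, top_n=3))
```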
S105: Calculating the auditing value of each element unit according to the contract user information, the semantics of each element unit and the position information by using a second machine learning model.
In step S105, according to the contract document auditing method of the embodiment of the present application, the auditing value of each element unit may be calculated from the contract user information, the semantics of each element unit, and the position information determined in step S102, using the second machine learning model. For example, in the case of a business contract, the audit value may be a risk value for the critical content, contract amount, contract period, etc., referred to in the contract terms. In particular, in the contract document auditing method of the embodiment of the present application, in step S105, a predetermined third machine learning model may also be further acquired according to the contract type. For example, if the contract type determined in step S102 is a trade contract, a predetermined third machine learning model related to the trade contract may be acquired according to the trade contract type in step S105. In particular, in an embodiment of the present application, the third machine learning model may be a standard term text based machine learning model.
For example, in step S105, semantic parsing may be performed on each determined element unit to generate an element text word sequence. For example, if a certain element unit is "user A pays xx yuan", which relates to contract performance, it may be semantically parsed to generate an element text word sequence such as "user A", "pays", and "xx yuan". In addition, an element vector may be generated from the position information of the element unit. For example, if a certain element unit is located at line 10 of the contract terms, an element vector may be further generated based on the semantically parsed content. In addition, the contract user information and the element vector may also be input to the second machine learning model to obtain an audit value. For example, if the contract user information is the name, address, contact details, etc. of Party A, this information and the element vector may be input to the second machine learning model to obtain the audit value. For example, if user A is the Party A that is to make the payment under the contract, the degree to which user A corresponds to the name of Party A may be used as the audit value.
Specifically, semantic parsing may be performed on each element unit determined in step S104 to generate an element text word sequence. For example, in the case of a business contract, the element text word sequence may be a word sequence of the important content involved in the contract terms, such as the contract subject matter, contract amount, and contract term. In particular, in the contract document auditing method of the embodiment of the application, semantic parsing can be further performed according to a predetermined semantic parsing model. For example, if the predetermined semantic parsing model is a word vector model, the element units may be semantically parsed according to the word vector model to generate an element text word sequence.
For example, semantic parsing may be performed based on the textual content of the element units to generate an element text word sequence. For example, if a certain element unit is "user A pays xx yuan", which relates to contract performance, it may be semantically parsed to generate an element text word sequence such as "user A", "pays", and "xx yuan".
An element vector may then be generated based on the element text word sequence and the position information of the element unit. For example, in the case of a business contract, the element vector may be a word vector sequence of the important content involved in the contract terms, such as the contract subject matter, contract amount, and contract term. In the case of "user A pays xx yuan" above, which relates to contract performance, an element vector may be generated from the position of each parsed word in the contract. In particular, in the contract document auditing method of the embodiment of the present application, the element vector may be further generated according to a predetermined word vector model. For example, if the predetermined word vector model is a Word2Vec model, the Word2Vec model may be used to generate element vectors for the element text word sequence.
The contractual user information and element vector may then be input to the second machine learning model to obtain the audit value.
According to the contract document auditing method of the embodiment of the present application, the contract user information determined in step S102 and the generated element vectors, for example those for "user A", "pays", and "xx yuan", may be input to the second machine learning model to obtain an audit value. For example, in the case of a business contract, the audit value may be a risk value for content referred to in the contract terms that may be at risk, such as the contract subject matter, contract amount, and contract term.
S106: Determining the risk degree of the target contract document according to the difference between the auditing value and the preset threshold value.
In step S106, according to the contract document auditing method of the embodiment of the present application, the risk degree of the target contract document may be determined according to the difference between the auditing value calculated in step S105 and the preset threshold. For example, in the case of a business contract, the risk level may be the dispute probability of the important content of the contract subject, the contract amount, the contract period limit, etc., involved in the contract terms. In particular, in the method for auditing a contract document according to the embodiment of the present application, in step S106, the risk level of the target contract document may be further determined according to a preset threshold. For example, if the preset threshold is 0.5, a contract document whose difference between the audit value and the preset threshold is greater than 0.5 may be determined to be a high risk contract document.
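The comparison in step S106 can be sketched as follows. The text does not state the scale of the audit value, the sign of the difference, or how the audit values of several element units are combined. This sketch therefore assumes a signed difference (audit value minus preset threshold) and uses the largest gap over all element units; the names `risk_degree`, `preset_threshold` and `high_risk_gap` are hypothetical.

```python
from typing import Iterable


def risk_degree(audit_values: Iterable[float],
                preset_threshold: float = 0.5,
                high_risk_gap: float = 0.5) -> str:
    """Classify a contract from the audit values of its element units.

    Assumptions: the difference is audit value minus preset threshold,
    and the largest difference over all element units decides the result.
    """
    gaps = [value - preset_threshold for value in audit_values]
    if not gaps:
        raise ValueError("at least one audit value is required")
    return "high" if max(gaps) > high_risk_gap else "normal"


print(risk_degree([0.6, 1.2]))  # largest gap exceeds the 0.5 margin
print(risk_degree([0.6, 0.9]))  # no gap exceeds the 0.5 margin
```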
In addition, in the contract document auditing method according to the embodiment of the application, the first machine learning model used can be trained before or during auditing. For example, a predetermined user question set may be obtained based on the contract type. In particular, in the case of a business contract, the user question set may be a set of common questions related to the contract subject matter, contract amount, contract term, and the like. For example, if the contract type determined in step S102 is a buy-sell contract, common questions related to buy-sell contracts may be acquired as the predetermined user question set according to the buy-sell contract type.
Then, for each text unit, a preset text corresponding to at least one element unit is acquired according to the contract type determined in step S102. For example, in the case of a commercial contract, the text corresponding to the element unit may be standard terms related to important content such as the contract subject matter, the contract amount, and the contract term, or qualified terms that have been reviewed and confirmed by a professional. In particular, in the method for auditing a contract document according to the embodiment of the application, the preset text corresponding to the element unit can be further acquired from a predetermined element unit text library. For example, if the predetermined element unit text library contains a preset text related to purchase and sale contracts, that preset text may be acquired as the text corresponding to the element unit according to the purchase and sale contract type. Of course, in the embodiment of the application, a plurality of related contract key information documents can also be recalled from a corpus according to the contract type and preprocessed; long documents can be divided according to their clause structure, and related document paragraphs can be screened by the relevance between custom question keywords and the keywords in each paragraph and used as reference element texts.
Then, the first machine learning model may be used to calculate the correlation between the text unit and each reference element text in a preset reference element text library, so as to determine the degree of correspondence between the text unit and the reference text.
For example, the correlation may be calculated according to the degree of matching between the acquired text unit and each reference element text in the preset reference element text library. For example, if the acquired text unit is "contract subject", a certain reference element text in the preset reference element text library is "subject", the correlation may be calculated according to the degree of matching between "contract subject" and "subject".
In addition, in the embodiment of the application, the association degree between the text units can be further obtained, and the text units with the association degree larger than the preset threshold value are divided into text segments. For example, in the case of a business contract, the degree of association may be a semantic relatedness between text elements. In particular, in the contract document auditing method according to the embodiment of the present application, the association degree may be further calculated according to a predetermined association degree calculation model. For example, if the predetermined relevance calculation model is a TF-IDF model, semantic relevance between text units may be calculated according to the TF-IDF model to obtain relevance. For example, the relevance may be calculated from semantic relevance between the acquired text units. For example, if the acquired text units are "contract subject" and "contract amount", the degree of association may be calculated from the semantic correlation between "contract subject" and "contract amount". In addition, the text segments can be divided according to a predetermined association threshold. For example, if the predetermined relevance threshold is 0.5, text units having a relevance greater than 0.5 may be divided into text segments.
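The TF-IDF-based association degree and the division into text segments can be sketched as follows. This is a sketch under stated assumptions: text units are assumed to be already tokenized, the smoothed IDF formula and cosine similarity are choices made for illustration, and only adjacent text units are compared when forming segments. The function names are hypothetical. Note that TF-IDF captures word overlap rather than deeper semantics.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(units: List[List[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per tokenized text unit."""
    n = len(units)
    doc_freq = Counter(term for unit in units for term in set(unit))
    vectors = []
    for unit in units:
        counts = Counter(unit)
        vectors.append({
            term: (count / len(unit)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_into_segments(units: List[List[str]], threshold: float = 0.5) -> List[List[int]]:
    """Append a unit to the current segment while its association degree
    with the preceding unit exceeds the threshold; otherwise start a new one."""
    vectors = tfidf_vectors(units)
    segments: List[List[int]] = []
    for i in range(len(units)):
        if segments and cosine(vectors[i - 1], vectors[i]) > threshold:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments


units = [
    ["contract", "amount", "payment"],
    ["contract", "amount", "payment", "due"],
    ["seal", "signature"],
]
print(group_into_segments(units, threshold=0.5))
```

The returned lists contain the indices of the text units in each segment, so the original text can be reassembled in order.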
Finally, the element value of the text unit may be determined based on the correlation and the degree of association. For example, in the case of a commercial contract, the element value may be the contribution of the text unit to the contract subject matter, contract amount, contract term, and the like. In particular, in the contract document auditing method of the embodiment of the present application, the element value may be further calculated according to a predetermined element value calculation model. For example, if the predetermined element value calculation model is a weighted average model, the correlation and the degree of association may be weighted-averaged according to the weighted average model to calculate the element value.
When the number of element values greater than the predetermined element threshold is greater than the predetermined number threshold, it is determined that the training of the first machine learning model was successful. For example, if the predetermined training success judgment model is an accuracy rate model, the training result of the first machine learning model may be judged according to the accuracy rate model to determine that the training is successful.
For example, if 4 element values among the element values calculated for a certain training sample are greater than a predetermined element threshold value of 0.8 during training, it may be determined that the training of the first machine learning model was successful.
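The training success criterion can be sketched as a simple count. The text gives an element threshold of 0.8 and an example with 4 qualifying element values, but does not state the number threshold; the value 3 used below is an assumption, as is the function name.

```python
from typing import Iterable


def training_succeeded(element_values: Iterable[float],
                       element_threshold: float = 0.8,
                       count_threshold: int = 3) -> bool:
    """Return True when more than count_threshold element values
    exceed element_threshold. count_threshold = 3 is an assumed value."""
    qualifying = sum(1 for value in element_values if value > element_threshold)
    return qualifying > count_threshold


# Four values above 0.8, as in the example in the text.
print(training_succeeded([0.95, 0.9, 0.85, 0.81, 0.4]))
```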
In addition, the method for auditing the contract document according to the embodiment of the application can acquire a predetermined third machine learning model according to the contract type after acquiring the element text word sequence. The third machine learning model may be, for example, a machine model based on standard clause text or on clause text validated by a professional.
For example, in the case of a business contract, the third machine learning model may be a model for judging word rationality. For example, a predetermined third machine learning model may be acquired according to the contract type determined in step S102. For example, if the contract type determined in step S102 is a sales contract, a third machine learning model related to the sales contract may be acquired as a predetermined third machine learning model according to the sales contract type.
Then, in an embodiment of the application, a third machine learning model may be used to calculate a confidence value for each word in the sequence of element text words, where the confidence value indicates the rationality of each word at a corresponding location in the sequence of element text words. For example, in the case of a business contract, the confidence value may be a composite score of factors such as how frequently words appear in the terms of the contract, the collocation of words with other words, and the like. In particular, in the contract document auditing method according to the embodiment of the present application, the trusted value may be further calculated according to a predetermined trusted value calculation model. For example, if the predetermined confidence value calculation model is a bayesian network model, the confidence value for each word in the sequence of element text words may be calculated according to the bayesian network model.
For example, the confidence value may be calculated based on factors such as the frequency of occurrence of each word in the sequence of element text words in the contractual terms, the collocation of the words with other words, and the like. In addition, a trusted value may be calculated based on training results of the third machine learning model. For example, if the training result of the third machine learning model is:
The word "contract" has a confidence value of 0.9
The word "target" has a confidence value of 0.8
The word "of" has a confidence value of 0.7
then the confidence value of the word "contract" may be determined to be 0.9, the confidence value of the word "target" to be 0.8, and the confidence value of the word "of" to be 0.7.
Words in the sequence of element text words may then be ordered by the size of the confidence value. For example, in the case of a business contract, a word with a larger confidence value is more plausible at its corresponding location in the sequence of element text words. For example, words in the sequence of element text words may be ordered according to the calculated confidence values. For example, if the element text word sequence is "contract", "target", "object", "of", and the confidence values are 0.9, 0.8, 0.7, 0.6, respectively, the words may be ordered according to the magnitude of the confidence values, resulting in an ordered word sequence of "contract", "target", "object", "of".
Finally, words whose ranking order is less than a predetermined ranking threshold may be determined to be error candidate words.
According to the contract document auditing method of the embodiment of the present application, words whose ranking order is less than the predetermined ranking threshold can be determined to be error candidate words. For example, in the case of a business contract, the predetermined ranking threshold may be 0.5. In particular, in the contract document auditing method according to the embodiment of the present application, the error candidate words may be further determined according to a predetermined error candidate word judgment model. For example, if the predetermined error candidate word judgment model is a rule-based model, words having a ranking order smaller than the predetermined ranking threshold may be judged according to the rule-based model to determine error candidate words. For example, the error candidate words may be determined according to a predetermined ranking threshold of 0.5. For example, if the sorted word sequence is "contract", "target", "object", "of", then the word "of", whose ranking order is less than 0.5, may be determined to be an error candidate word.
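The ordering and selection of error candidate words can be sketched as follows. The text expresses the ranking threshold as 0.5 without stating how a rank position maps to that scale, so this sketch substitutes a positional cutoff: words are sorted by confidence value in descending order and those after the first `rank_cutoff` positions are flagged. The function name, the cutoff parameter and the sample words are illustrative assumptions.

```python
from typing import List, Tuple


def error_candidates(word_scores: List[Tuple[str, float]], rank_cutoff: int) -> List[str]:
    """Sort words by confidence value (highest first) and return the words
    that fall after the first rank_cutoff positions.

    Assumption: a positional cutoff stands in for the ranking threshold,
    whose exact scale is not specified in the text.
    """
    ranked = sorted(word_scores, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[rank_cutoff:]]


scores = [("target", 0.8), ("of", 0.6), ("contract", 0.9), ("object", 0.7)]
print(error_candidates(scores, rank_cutoff=3))
```

Passing the words as a list of pairs rather than a dictionary allows the same word to appear more than once in the element text word sequence.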
In addition, according to the embodiment of the application, the contract auditing method can also parse the target contract document to acquire signature data of the contract user. Typically, the signer of a contract needs to sign the contract to indicate approval and make it take effect. For example, if the signer is a company, the company seal may be affixed; if the signer is an individual, the individual may sign or affix a personal seal. Thus, according to the embodiment of the present application, the target contract document input in step S101 may be parsed to acquire signature data of the contract user. For example, in the case of a business contract, the signature data may be an electronic signature or a handwritten signature made by the contract user on the contract document. In particular, in the contract document auditing method according to the embodiment of the application, the parsing can be further performed according to a predetermined parsing model. For example, if the predetermined parsing model is a rule-based parsing model, the target contract document may be parsed according to the rule-based parsing model to obtain signature data.
For example, signature data may be obtained from an electronic signature or handwritten signature area in a target contract document. In addition, signature data may be acquired according to an analysis result of a predetermined analysis model. For example, if the analysis result of the predetermined analysis model is:
Electronic signature data: { "type": "electronic", "data": "base64 encoded electronic signature data" }
Handwritten signature data: { "type": "handwritten", "data": "base64 encoded handwritten signature data" }
then the electronic signature data and the handwritten signature data may be used as signature data, respectively.
According to the contract document auditing method provided by the embodiment of the application, the image characteristics of the acquired signature data can be extracted to determine the type of the signature data. For example, in the case of a commercial contract, the image features may be stroke features, pressure features, etc. of the signature data. In particular, in the contract document auditing method according to the embodiment of the present application, the image features may be further extracted according to a predetermined image feature extraction model. For example, if the predetermined image feature extraction model is a convolutional neural network-based model, image feature extraction may be performed on signature data according to the convolutional neural network-based model.
For example, image features may be extracted from stroke features, pressure features, etc. of the signature data. In addition, the type of signature data may be determined based on the extraction result of the predetermined image feature extraction model. For example, if the extraction result of the predetermined image feature extraction model is:
Electronic signature data: { "type": "electronic", "features": "extracted electronic signature image feature" }
Handwritten signature data: { "type": "handwritten", "features": "extracted handwritten signature image feature" }
then the electronic signature data and the handwritten signature data may be used as different types of signature data, respectively.
When the signature data is handwriting signature data, the reference signature data of the user is extracted from a preset signature database according to the user information corresponding to the signature data.
According to the contract document auditing method, when the type of the extracted signature data is handwriting signature data, the reference signature data of the user can be extracted from a preset signature database according to the user information corresponding to the signature data. For example, in the case of a business contract, the reference signature data may be a handwritten signature signed by the contracting user on other contract documents. In particular, in the contract document auditing method according to the embodiment of the application, reference signature data can be further queried according to a predetermined signature database query model. For example, if the predetermined signature database query model is a hash table-based model, a preset signature database may be queried according to the hash table-based model to extract reference signature data.
For example, the reference signature data may be queried based on information such as a name, an identification card number, etc. in the user information corresponding to the signature data. Furthermore, the reference signature data may be extracted based on a query result of a predetermined signature database query model. For example, if the query result of the predetermined signature database query model is:
Reference signature data: { "user_id": "user ID", "data": "base64 encoded reference signature data" }
then this data may be used as the reference signature data for the user.
According to the contract document auditing method of the embodiment of the application, the similarity between the extracted handwritten signature data and the acquired reference signature data can be calculated by using a fourth machine learning model. For example, in the case of a commercial contract, the similarity may be a stroke similarity, a pressure similarity, or the like between the handwritten signature data and the reference signature data. In particular, in the contract document auditing method according to the embodiment of the present application, the similarity may be further calculated according to a predetermined similarity calculation model. For example, if the predetermined similarity calculation model is a dynamic time warping-based model, the similarity between the handwritten signature data and the reference signature data may be calculated from the dynamic time warping-based model.
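A dynamic time warping comparison of two handwritten signatures can be sketched as follows. This is an illustrative sketch, not the fourth machine learning model itself: each signature is assumed to be a sequence of (x, y, pressure) samples, the point distance is Euclidean, and the mapping from DTW distance to a similarity in (0, 1] is an assumption chosen for illustration. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # assumed sample layout: (x, y, pressure)


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic dynamic time warping distance between two point sequences."""
    if not a or not b:
        raise ValueError("both signatures must contain at least one point")
    cost = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def dtw_similarity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Map the length-normalized DTW distance to (0, 1]; 1.0 means identical.
    The mapping 1 / (1 + d) is an assumption made for this sketch."""
    normalized = dtw_distance(a, b) / (len(a) + len(b))
    return 1.0 / (1.0 + normalized)


reference = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (2.0, 2.0, 1.0)]
# Same stroke written more slowly: one sample is repeated.
slower = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (2.0, 2.0, 1.0)]
print(dtw_similarity(reference, slower))
```

Because the warping path may repeat samples, two signatures that trace the same stroke at different speeds can still have a distance of zero, which is the property that makes the technique suitable for handwriting.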
For example, the similarity may be calculated from the stroke similarity, pressure similarity, etc. between the handwritten signature data and the reference signature data. In addition, the similarity may be calculated based on a training result of the fourth machine learning model. For example, if the training result of the fourth machine learning model is that the similarity between the handwritten signature data and the reference signature data is 0.9, then the similarity between the handwritten signature data and the reference signature data may be determined to be 0.9.
For example, the type of the acquired signature data may be determined, where the data types include coordinate point data (signature position x, signature position y, signature pressure p, and signature time t) and image data. According to the acquired coordinate point data and image data, a handwriting signature comparison model is called to compare them with data in a handwriting library and determine whether the signatures were made by the same person.
When the calculated similarity is greater than the predetermined signature similarity, the signature data is determined to be the signature of the user. For example, in the case of a commercial contract, the predetermined signature similarity may be 0.8. In particular, in the contract document auditing method according to the embodiment of the present application, whether the signature data is the signature of the user may be further determined according to a predetermined signature similarity determination model. For example, if the predetermined signature similarity determination model is a threshold-based model, the similarity may be evaluated according to the threshold-based model to determine whether the signature data is the signature of the user.
For example, whether the signature data is the signature of the user may be determined based on a predetermined signature similarity of 0.8. For example, if the calculated similarity is 0.9, the signature data may be determined to be the signature of the user.
When the signature data is determined to be seal data, in the contract document auditing method according to the embodiment of the application, in the case of a commercial contract, the image features may be shape features, texture features, and the like of the seal data. In particular, in the contract document auditing method according to the embodiment of the present application, the image features may be further extracted according to a predetermined image feature extraction model. For example, if the predetermined image feature extraction model is a convolutional neural network-based model, image feature extraction may be performed on the signature data according to the convolutional neural network-based model.
Similarly, when the type of the extracted signature data is seal data, the reference seal data of the user is extracted from a preset signature database according to the user information corresponding to the signature data. For example, in the case of a business contract, the reference seal data may be a seal that the contract user has affixed to other contract documents. In particular, in the contract document auditing method according to the embodiment of the application, the reference seal data can be further queried according to a predetermined signature database query model. For example, if the predetermined signature database query model is a hash table-based model, a preset signature database may be queried according to the hash table-based model to extract the reference seal data.
According to the contract document auditing method provided by the embodiment of the application, the similarity between the extracted seal data and the extracted reference seal data can be calculated by further using a fifth machine learning model. For example, in the case of a commercial contract, the similarity may be a shape similarity, a texture similarity, or the like between the seal data and the reference seal data. In particular, in the contract document auditing method according to the embodiment of the present application, the similarity may be further calculated according to a predetermined similarity calculation model. For example, if the predetermined similarity calculation model is a correlation-based model, the similarity between the seal data and the reference seal data may be calculated from the correlation-based model.
According to the contract document auditing method provided by the embodiment of the application, when the calculated similarity is greater than the preset seal similarity, the seal data is determined to be the seal of the user. For example, in the case of a commercial contract, the predetermined stamp similarity may be 0.8. For example, it may be determined whether the stamp data is the stamp of the user based on a predetermined stamp similarity of 0.8. For example, if the calculated similarity is 0.9, it may be determined that the stamp data is the stamp of the user. If it is 0.7, it can be determined that the stamp data is not the stamp of the user or is a suspected stamp.
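The threshold decision used for both handwritten signatures and seals can be sketched as follows. The function name and the result labels are hypothetical; the default of 0.8 and the strict "greater than" comparison follow the examples in the text.

```python
def verify_mark(similarity: float, required_similarity: float = 0.8) -> str:
    """Decide whether a signature or seal belongs to the user.

    Returns "verified" when the similarity strictly exceeds the
    predetermined similarity, otherwise "suspect".
    """
    return "verified" if similarity > required_similarity else "suspect"


print(verify_mark(0.9))  # above the 0.8 threshold
print(verify_mark(0.7))  # below the 0.8 threshold
```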
According to the contract document auditing method of the embodiment of the present application, the acquired target contract document is parsed to determine the contract type, the contract user information, and the theme corresponding to the target contract document. The element value of each text unit in the target contract document is then calculated using a first machine learning model according to the contract type, and the text units whose element values are ranked greater than a predetermined sequence number are determined as element units according to the ranking result of the element values. An audit value of each element unit is calculated using a second machine learning model according to the contract user information and the semantics and position information of each element unit. Finally, the risk degree of the target contract document is determined according to the difference between the audit value and the preset threshold. The auditing method of the embodiment of the application can therefore take into account both the user who signs the contract and the contract signing scenario when auditing the target contract document, so that the element units are determined with full consideration of the user information and the contract scenario and the risk degree of the contract is determined based on those element units, thereby greatly reducing the probability of disputes arising from the contract in actual use.
Example two
Fig. 2 is a schematic structural diagram of an embodiment of a contract document auditing apparatus provided by the application. As shown in fig. 2, the contract document auditing apparatus provided by the embodiment of the present application may include: the device comprises an acquisition module 21, a parsing module 22, a first calculation module 23, a first determination module 24, a second calculation module 25 and a second determination module 26.
The acquisition module 21 may be used to acquire a target contract document.
According to the contract document auditing apparatus of the embodiment of the present application, the acquisition module 21 can acquire the specified target contract document through a server. In embodiments of the present application, the target contract document may be any type of contract document, for example a business contract, a labor contract, a lease contract, and the like. In particular, as contract business becomes increasingly complex, a single contract may involve several parties; a business contract, for example, may involve Party A, Party B, Party C, and so on. For this reason, in the contract document auditing apparatus of the embodiment of the present application, the acquisition module 21 may read previous contract processing history data and parse the relevant contract documents out of it. Alternatively, the contract documents of each contract may be stored in advance on a separate server, or on the server where the contract document auditing apparatus according to the embodiment of the present application is deployed, so that the acquisition module 21 can acquire the stored contract documents and parse out, or directly obtain, the target contract document from them.
The parsing module 22 may be used to parse the target contract document to determine contract types, contract user information, and topics.
According to the contract document auditing apparatus provided by the embodiment of the application, the parsing module 22 can parse the target contract document acquired by the acquisition module 21 to determine the contract type, the contract user information and the topic corresponding to the contract document. For example, in the case of a business contract, the contract type may be a trade contract, a service contract, a loan contract, etc.; the contract user information may include information on the contracting parties such as Party A, Party B and Party C; and the contract topic may be the contract subject matter, the contract amount, the contract term, etc. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the parsing module 22 may further parse out information on related parties involved in the contract document. In a business contract, for example, related parties such as a guarantor or a witness may be involved in addition to the contracting parties (Party A, Party B, Party C, etc.).
For example, the parsing module 22 may determine the contract type based on the contract terms in the target contract document acquired by the acquisition module 21. For example, if the contract terms include content such as the transaction subject matter, the transaction price and the transaction method, the contract may be determined to be a trade contract. In addition, the contract user information may be determined based on the party information referred to in the contract terms. For example, if the contract terms include the name, address and contact details of a party, that party may be determined to be a contracting party of the contract. In addition, the contract topic may be determined based on information related to the contract terms, the contract amount, the contract duration, and the like.
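The keyword-based determination of the contract type described above can be illustrated with a minimal sketch. The source does not specify how the matching is performed, so the keyword table `TYPE_KEYWORDS`, the function name `guess_contract_type` and the use of plain substring counting are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical keyword table; the source only names trade-related content as an example.
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "trade contract": ["subject matter", "price", "delivery"],
    "lease contract": ["rent", "lease term", "premises"],
}


def guess_contract_type(clauses: List[str]) -> Optional[str]:
    """Return the contract type whose keywords occur most often, or None."""
    text = " ".join(clauses).lower()
    # Plain substring counting: crude, but enough to show the idea.
    scores = {
        contract_type: sum(text.count(keyword) for keyword in keywords)
        for contract_type, keywords in TYPE_KEYWORDS.items()
    }
    best_type = max(scores, key=lambda t: scores[t])
    return best_type if scores[best_type] > 0 else None
```

A production parser would more likely rely on a trained classifier; the rule table is used here only because it mirrors the example given in the text.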
The first calculation module 23 may be configured to calculate the element values of the respective text units in the target contract document using the first machine learning model according to the contract type.
According to the contract document auditing apparatus of the embodiment of the present application, the first calculation module 23 may calculate the element values of the respective text units in the target contract document using the first machine learning model according to the contract type determined by the parsing module 22. For example, in the case of a commercial contract, the element value may be the degree of association of important content, such as contract subject, contract amount, contract period, etc., with the user or content, as referred to in the contract terms. In particular, in the contract document auditing apparatus of the embodiment of the present application, the first calculation module 23 may further acquire a predetermined reference target document according to the contract type. For example, if the contract type determined by the parsing module 22 is a buy-sell contract, the first calculation module 23 may acquire a predetermined buy-sell contract reference target document according to the buy-sell contract type.
For example, the first calculation module 23 may calculate the similarity with the contract type of the reference target document according to the contract type determined by the parsing module 22. For example, if the contract type determined by the parsing module 22 is a sales contract, then the similarity between the target contract document and the sales contract reference target document may be calculated from the sales contract reference target document. In addition, the element value of each text unit may be calculated according to the degree of association between each text unit in the target contract document and each text unit in the reference target document. For example, if the degree of association between a certain text unit in the target contract document and a certain text unit in the reference target document is high, it may be determined that the text unit is an important text unit and a high element value may be given.
For example, in the case of a business contract, the similarity may be the degree of matching of the important content involved in the contract terms, such as the contract subject matter, the contract amount and the contract term. In particular, in the contract document auditing apparatus of the embodiment of the present application, the similarity may be further calculated against a predetermined reference target document library. For example, if the contract type determined by the parsing module 22 is a sales contract, the similarity between the target contract document and the sales contract reference target documents in the library may be calculated.
For example, the predetermined reference target document may be obtained based on the contract type determined by parsing module 22. For example, if the contract type determined by the parsing module 22 is a buy and sell contract, a predetermined buy and sell contract reference target document may be obtained according to the buy and sell contract type. In addition, the similarity may also be calculated based on the degree of matching between the contract terms in the target contract document and the contract terms in the reference target document. For example, if the contract terms in the target contract document match the contract terms in the reference target document to a higher degree, it may be determined that the similarity is higher.
The ranking may then be performed according to similarity to find the highest ranked reference target document as the reference target document of the same contract type as the target contract document.
According to the contract document auditing apparatus provided by the embodiment of the application, the reference target document with the highest similarity ranking can be used, based on the similarity calculated as above, as the reference target document that belongs to the same contract type as the target contract document. For example, in the case of a business contract, the business contract reference target document with the highest similarity ranking may be used as the reference target document of the same contract type as the target contract document. In particular, in the contract document auditing apparatus of the embodiment of the present application, the reference target document may be further determined according to a predetermined ranking threshold. For example, if the predetermined ranking threshold is 0.8, a reference target document whose similarity is greater than 0.8 may be determined to be a reference target document of the same contract type as the target contract document.
For example, the reference target documents may be ranked according to the similarity calculated as above. For example, if the calculated similarities are:
Reference target document 1: 0.9
Reference target document 2: 0.7
Reference target document 3: 0.6
then reference target document 1 may be determined to be the reference target document of the same contract type as the target contract document.
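The ranking step in this example can be sketched as follows. The function name `rank_references` and the dictionary layout are assumptions for illustration; the similarity values are the ones given in the example above.

```python
from typing import Dict, List, Tuple


def rank_references(similarities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort reference documents from most to least similar."""
    return sorted(similarities.items(), key=lambda item: item[1], reverse=True)


# Similarity values taken from the example in the text.
similarities = {
    "reference target document 1": 0.9,
    "reference target document 2": 0.7,
    "reference target document 3": 0.6,
}
ranked = rank_references(similarities)
best_name, best_score = ranked[0]  # the top-ranked reference document
```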
Then, a first degree of association between each text unit and each text unit in the reference target document may be calculated.
In this step, according to the contract document auditing apparatus of the embodiment of the present application, the first degree of association between each text unit and each text unit in the reference target document determined as above can be calculated. For example, in the case of a business contract, the first degree of association may be a degree of correspondence of the important content, such as contract objective, contract amount, contract period limit, and the like, involved in the contract terms. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the first association degree may be further determined according to a predetermined association degree threshold. For example, if the predetermined relevance threshold is 0.5, text units having a first relevance greater than 0.5 may be determined to be text units having a higher first relevance.
For example, the first degree of association may be calculated based on the degree of matching between each text unit in the target contract document and each text unit in the reference target document. For example, if a certain text unit in the target contract document matches a certain text unit in the reference target document to a higher degree, it may be determined that the text unit has a higher first degree of relevance.
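The source does not say how the "degree of matching" between two text units is measured. As one possible stand-in, the sketch below uses Jaccard overlap of whitespace-separated tokens and takes the best match over all reference text units. Both function names and the choice of Jaccard overlap are assumptions; in the described apparatus this role would presumably be played by the first machine learning model.

```python
from typing import List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased, whitespace-separated tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def first_association(text_unit: str, reference_units: List[str]) -> float:
    """Best overlap between one text unit and any reference text unit."""
    return max(
        (token_overlap(text_unit, ref) for ref in reference_units),
        default=0.0,
    )
```

With a threshold of 0.5, as in the example above, a text unit would count as having a higher first degree of association when `first_association(...) > 0.5`.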
In the contract document auditing apparatus according to the embodiment of the application, the second degree of association between each text unit in the target contract document and the contract user information determined by the parsing module 22 can be calculated. For example, in the case of a business contract, the second degree of association may be a degree of association of important content, such as contract subject, contract amount, contract period, etc., referred to in terms of the contract, with the contract user information. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the second association degree may be further determined according to a predetermined association degree threshold. For example, if the predetermined relevance threshold is 0.5, text units having a second relevance greater than 0.5 may be determined to be text units having a higher second relevance.
For example, the second degree of association may be calculated based on the degree of matching between each text unit in the target contract document and the contract user information. For example, if a certain text unit in the target contract document matches a certain information in the contract user information to a higher degree, it may be determined that the text unit has a higher second degree of association.
Then, for each text unit in the target contract document, its element value may be determined according to those first and second degrees of association that are above a predetermined threshold.
In the contract document auditing apparatus according to the embodiment of the application, the element value of each text unit in the target contract document may be determined according to the determined first degree of association and second degree of association. For example, in the case of a commercial contract, the element value may be a weight value of important content such as a contract target, a contract amount, a contract period limit, or the like, which are involved in contract terms. In particular, in the contract document audit apparatus of the embodiment of the present application, the element value may be further determined according to a predetermined element value threshold.
For example, if the determined first degree of association is 0.9 and the second degree of association is 0.7, the average of the two, 0.8, may be taken as the element value of the text unit. Alternatively, the element value of the text unit may be calculated as a weighted average, using weights set in advance for the first degree of association (with the reference target document) and for the second degree of association (with the contract user).
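A minimal sketch of this calculation is given below. The function name `element_value` and the default weights are assumptions; with equal weights the function reduces to the plain average used in the example.

```python
def element_value(
    first_association: float,
    second_association: float,
    weight_first: float = 0.5,
    weight_second: float = 0.5,
) -> float:
    """Weighted average of the two degrees of association."""
    total_weight = weight_first + weight_second
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (
        weight_first * first_association + weight_second * second_association
    ) / total_weight
```

For the values in the text, `element_value(0.9, 0.7)` is approximately 0.8. Giving more weight to the reference document, for instance `element_value(0.9, 0.7, 0.75, 0.25)`, yields approximately 0.85.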
The first determining module 24 may be configured to determine, as the element unit, a text unit whose element value is ranked greater than a predetermined sequence number according to the ranking result of the element values.
According to the contract document auditing apparatus of the embodiment of the present application, the first determination module 24 may sort the text units according to the element values calculated by the first calculation module 23, and determine the text units whose element values rank above the predetermined sequence number as the element units. For example, in the case of a business contract, the element units may be the important content involved in the contract terms, such as the contract subject matter, the contract amount and the contract term. For example, if the predetermined sequence number is 3, text units whose element values rank above 3 may be determined as element units.
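The selection of element units can be sketched as follows. The source does not state precisely what ranking "greater than" the predetermined sequence number means; this sketch assumes it means keeping the `top_n` text units with the highest element values. The function name and the sample clause names are hypothetical.

```python
from typing import Dict, List


def select_element_units(element_values: Dict[str, float], top_n: int = 3) -> List[str]:
    """Return the top_n text units by element value (assumed interpretation)."""
    ranked = sorted(
        element_values,
        key=lambda unit: element_values[unit],
        reverse=True,
    )
    return ranked[:top_n]


# Illustrative values only; not taken from the source.
values = {
    "amount clause": 0.9,
    "term clause": 0.8,
    "party clause": 0.85,
    "header": 0.2,
    "footer": 0.1,
}
element_units = select_element_units(values, top_n=3)
```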
The second calculation module 25 may be configured to calculate the audit value for each element unit based on the contract user information, the semantics of each element unit, and the location information using a second machine learning model.
According to the contract document auditing apparatus of the embodiment of the present application, the second calculation module 25 may calculate the auditing value of each element unit according to the contract user information, the semantics of each element unit, and the position information determined by the parsing module 22 using the second machine learning model. For example, in the case of a business contract, the audit value may be a risk value for the critical content, contract amount, contract period, etc., referred to in the contract terms. In particular, in the contract document auditing apparatus of the embodiment of the present application, the second calculation module 25 may further acquire a predetermined third machine learning model according to the contract type. For example, if the contract type determined by the parsing module 22 is a buy-sell contract, the second calculation module 25 may obtain a predetermined third machine learning model related to the buy-sell contract based on the buy-sell contract type. In particular, in an embodiment of the present application, the third machine learning model may be a standard term text based machine learning model.
For example, the second calculation module 25 may perform semantic parsing on each determined element unit to generate an element text word sequence. For example, if a certain element unit is "user a pays xx element", which relates to contract performance, it may be semantically parsed to generate an element text word sequence such as "user a", "pays" and "xx element". In addition, an element vector may be generated using the position information of the element unit. For example, if a certain element unit is located at line 10 of the contract terms, the element vector may be generated from that position together with the semantically parsed content. The contract user information and the element vector may then be input to the second machine learning model to obtain the audit value. For example, if the contract user information is the Party A name, the Party A address, the Party A contact details, etc., this information and the element vector may be input to the second machine learning model to obtain the audit value. For example, if user A is the Party A who is to make the payment under the contract, the degree to which user A corresponds to the Party A name may be used as the audit value.
Specifically, the second calculation module 25 may perform semantic parsing on each element unit determined by the first determination module 24 to generate an element text word sequence. For example, in the case of a business contract, the element text word sequence may be a word sequence of the important content involved in the contract terms, such as the contract subject matter, the contract amount and the contract term. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the semantic parsing may be performed according to a predetermined semantic parsing model. For example, if the predetermined semantic parsing model is a word vector model, the element units may be semantically parsed according to the word vector model to generate the element text word sequence.
For example, semantic parsing may be performed based on the textual content of the element units to generate an element text word sequence. For example, if a certain element unit is "user a pays xx element" related to contract performance, it may be semantically parsed to generate element text word sequences such as "user a", "pays" and "xx element".
An element vector may then be generated based on the element text word sequence and the location information of the element unit. For example, in the case of a business contract, the element vector may be a word vector sequence of important content such as contract subject, contract amount, contract period, etc., that is involved in the terms of the contract. In the case of "user a pays xx element" related to contract performance as above, an element vector may be generated from the position of each parsed word in the contract. In particular, in the contract document auditing apparatus of the embodiment of the present application, the element vector may be further generated according to a predetermined word vector model. For example, if the predetermined Word vector model is a Word2Vec model, the Word2Vec model may be used to generate element vectors for element text Word sequences.
The contractual user information and element vector may then be input to the second machine learning model to obtain the audit value.
According to the contract document auditing apparatus of an embodiment of the present application, the contract user information determined by the parsing module 22 and the generated element vectors, for example those for "user a", "payment" and "xx element", may be input to the second machine learning model to obtain an audit value. For example, in the case of a business contract, the audit value may be a risk value for the content referred to in the contract terms that may be at risk, such as the contract subject matter, the contract amount and the contract term.
The second determination module 26 may be configured to determine a risk level of the target contract document based on a difference between the audit value and a preset threshold.
According to the contract document auditing apparatus of the embodiment of the present application, the second determining module 26 may determine the risk level of the target contract document according to the difference between the auditing value calculated by the second calculating module 25 and the preset threshold value. For example, in the case of a business contract, the risk level may be the dispute probability of the important content of the contract subject, the contract amount, the contract period limit, etc., involved in the contract terms. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the second determination module 26 may further determine the risk level of the target contract document according to a preset threshold. For example, if the preset threshold is 0.5, a contract document whose difference between the audit value and the preset threshold is greater than 0.5 may be determined to be a high risk contract document.
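The threshold comparison can be sketched as follows. The source gives 0.5 both as the preset threshold and as the size of the difference that marks a high-risk document, and does not say whether the difference is signed or how several element units are combined. This sketch therefore assumes an absolute difference and flags the document when any element unit exceeds the gap; `risk_level`, `preset_threshold` and `high_risk_gap` are hypothetical names.

```python
from typing import List


def risk_level(
    audit_values: List[float],
    preset_threshold: float = 0.5,
    high_risk_gap: float = 0.5,
) -> str:
    """Return 'high' if any audit value is far from the preset threshold."""
    if not audit_values:
        return "normal"
    # Assumption: the "difference" is absolute and the worst unit decides.
    largest_gap = max(abs(value - preset_threshold) for value in audit_values)
    return "high" if largest_gap > high_risk_gap else "normal"
```

A graded scale (for example low, medium, high) could be obtained in the same way by comparing `largest_gap` against several cut-offs.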
In addition, in the contract document auditing apparatus according to the embodiment of the present application, the first machine learning model may also be trained before or while the audit is performed. For example, a predetermined user question set may be obtained based on the contract type. In particular, in the case of a business contract, the user question set may be a set of common questions related to the contract subject matter, the contract amount, the contract term, and the like. For example, if the contract type determined by the parsing module 22 is a sales contract, common questions associated with sales contracts may be obtained as the predetermined user question set.
Then, for each text unit, a preset text corresponding to at least one element unit may be acquired according to the contract type determined by the parsing module 22. For example, in the case of a commercial contract, the text corresponding to the element unit may be standard terms related to important content such as the contract subject matter, the contract amount and the contract term, or qualified terms that have been reviewed and confirmed by a professional. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the second determination module 26 may further obtain the preset text corresponding to the element unit from a predetermined element unit text library. For example, if the predetermined element unit text library contains preset texts related to sales contracts, those preset texts may be acquired as the text corresponding to the element unit according to the sales contract type. Of course, in the embodiment of the application, a plurality of related contract key information documents may also be recalled from a corpus according to the contract type and preprocessed; long documents may be divided according to their clause structure, and relevant document paragraphs may be screened according to the relevance between custom question keywords and the keywords in each paragraph, the screened paragraphs being used as reference element texts.
Then, the first machine learning model may be used to calculate the correlation between the text unit and each reference element text in a preset reference element text library, so as to determine the degree of correspondence between the text unit and the reference text.
For example, the correlation may be calculated according to the degree of matching between the acquired text unit and each reference element text in the preset reference element text library. For example, if the acquired text unit is "contract subject", a certain reference element text in the preset reference element text library is "subject", the correlation may be calculated according to the degree of matching between "contract subject" and "subject".
In addition, in the embodiment of the application, the degree of association between text units may be obtained, and text units whose degree of association is greater than a preset threshold may be grouped into text segments. For example, in the case of a business contract, the degree of association may be the semantic relatedness between text units. In particular, in the contract document auditing apparatus of the embodiment of the present application, the degree of association may be calculated according to a predetermined association degree calculation model. For example, if the predetermined association degree calculation model is a TF-IDF model, the semantic relatedness between text units may be calculated according to the TF-IDF model to obtain the degree of association. For example, if the acquired text units are "contract subject" and "contract amount", the degree of association may be calculated from the semantic relatedness between "contract subject" and "contract amount". In addition, the text segments may be divided according to a predetermined association threshold. For example, if the predetermined association threshold is 0.5, text units whose degree of association is greater than 0.5 may be grouped into one text segment.
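The TF-IDF based grouping can be illustrated with a small, dependency-free sketch. It assumes whitespace tokenization, a smoothed IDF, cosine similarity between the TF-IDF vectors, and grouping of adjacent text units only; none of these details is given in the source, and all function names are hypothetical. Note that TF-IDF as written here captures shared vocabulary rather than deeper semantics.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(units: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per text unit."""
    docs = [Counter(unit.lower().split()) for unit in units]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in doc.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_segments(units: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Group adjacent units whose association exceeds the threshold."""
    if not units:
        return []
    vectors = tfidf_vectors(units)
    segments = [[units[0]]]
    for i in range(1, len(units)):
        if cosine(vectors[i - 1], vectors[i]) > threshold:
            segments[-1].append(units[i])
        else:
            segments.append([units[i]])
    return segments
```

For instance, the units "payment amount due" and "payment amount due monthly" share most of their vocabulary and end up in one segment, while "governing law jurisdiction" starts a new one.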
Finally, the element value of the text unit may be determined based on the correlation and the degree of association. For example, in the case of a commercial contract, the element value may be the contribution of the text unit to the contract subject matter, the contract amount, the contract term, and the like. In particular, in the contract document auditing apparatus of the embodiment of the present application, the element value may be further calculated according to a predetermined element value calculation model. For example, if the predetermined element value calculation model is a weighted average model, the correlation and the degree of association may be weighted-averaged according to the weighted average model to calculate the element value.
When the number of element values greater than the predetermined element threshold exceeds the predetermined number threshold, the training of the first machine learning model may be determined to be successful. For example, if the predetermined training success judgment model is an accuracy model, the training result of the first machine learning model may be judged according to the accuracy model to determine whether the training is successful.
For example, if 4 element values among the element values calculated for a certain training sample are greater than a predetermined element threshold value of 0.8 during training, it may be determined that the training of the first machine learning model was successful.
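The training success criterion can be sketched as a simple count. The element threshold of 0.8 comes from the example above; the number threshold is not stated in the source, so the default of 3 below is an assumption, as is the function name.

```python
from typing import List


def training_succeeded(
    element_values: List[float],
    element_threshold: float = 0.8,
    count_threshold: int = 3,
) -> bool:
    """True when enough element values exceed the element threshold."""
    high_count = sum(value > element_threshold for value in element_values)
    return high_count > count_threshold
```

With four values above 0.8, as in the example, `training_succeeded([0.85, 0.9, 0.95, 0.81, 0.5])` returns `True` under the assumed number threshold of 3.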
In addition, the contract document auditing apparatus according to the embodiment of the present application may acquire a predetermined third machine learning model according to the contract type after acquiring the element text word sequence. The third machine learning model may be, for example, a machine model based on standard clause text or on clause text validated by a professional.
For example, in the case of a business contract, the third machine learning model may be a model for judging word rationality. For example, a predetermined third machine learning model may be obtained based on the contract type determined by parsing module 22. For example, if the contract type determined by the parsing module 22 is a sales contract, a third machine learning model associated with the sales contract may be obtained as a predetermined third machine learning model according to the sales contract type.
Then, in an embodiment of the application, the third machine learning model may be used to calculate a confidence value for each word in the element text word sequence, where the confidence value indicates how reasonable each word is at its position in the sequence. For example, in the case of a business contract, the confidence value may be a composite score of factors such as how frequently the word appears in contract terms and how it collocates with other words. In particular, in the contract document auditing apparatus of the embodiment of the present application, the confidence value may be further calculated according to a predetermined confidence value calculation model. For example, if the predetermined confidence value calculation model is a Bayesian network model, the confidence value of each word in the element text word sequence may be calculated according to the Bayesian network model.
For example, the confidence value may be calculated based on factors such as how frequently each word in the element text word sequence appears in the contract terms and how the word collocates with other words. In addition, the confidence value may be calculated based on the training result of the third machine learning model. For example, if the training result of the third machine learning model is:
The word "contract" has a confidence value of 0.9
The word "target" has a confidence value of 0.8
The word "of" has a confidence value of 0.7
then the confidence value of the word "contract" may be determined to be 0.9, that of the word "target" to be 0.8, and that of the word "of" to be 0.7.
Words in the element text word sequence may then be ordered by confidence value. For example, in the case of a business contract, a larger confidence value indicates that the word is more reasonable at its position in the element text word sequence. For example, if the element text word sequence is "contract", "target", "object", "of", with confidence values of 0.9, 0.8, 0.7 and 0.6 respectively, ordering the words by confidence value gives the ordered word sequence "contract", "target", "object", "of".
Finally, words whose ranking order is less than a predetermined ranking threshold may be determined to be error candidate words.
According to the contract document auditing apparatus of the embodiment of the application, words whose ranking order is smaller than the predetermined ranking threshold can be determined as error candidate words. For example, in the case of a business contract, the predetermined ranking threshold may be 0.5. In particular, in the contract document auditing apparatus of the embodiment of the present application, the error candidate words may be further judged according to a predetermined error candidate word judgment model. For example, if the predetermined error candidate word judgment model is a rule-based model, words whose ranking order is smaller than the predetermined ranking threshold may be judged according to the rule-based model to determine the error candidate words. For example, with a predetermined ranking threshold of 0.5, if the ordered word sequence is "contract", "target", "object", "of", the word "of", whose ranking order is less than 0.5, may be determined to be an error candidate word.
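The source expresses the cut-off both as a ranking order and as the value 0.5, and does not reconcile the two. The sketch below takes one possible reading: words are sorted by confidence value and those whose confidence falls below a cut-off are reported as error candidates. The function name, the `cutoff` parameter and the sample confidence values are assumptions for illustration only.

```python
from typing import List, Tuple


def error_candidates(
    word_confidences: List[Tuple[str, float]],
    cutoff: float = 0.5,
) -> List[str]:
    """Sort words by confidence and return those below the cut-off."""
    ranked = sorted(word_confidences, key=lambda pair: pair[1], reverse=True)
    return [word for word, confidence in ranked if confidence < cutoff]


# Illustrative values only; "objct" stands in for a likely typing error.
sample = [("contract", 0.9), ("objct", 0.3), ("target", 0.8)]
candidates = error_candidates(sample)
```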
In addition, according to the embodiment of the application, the target contract document may also be parsed to acquire signature data of the contract user. Typically, the signer of a contract needs to sign it to indicate approval and bring it into effect. For example, a signer that is a company may affix the company seal, and a signer who is an individual may sign or affix a personal seal. Thus, according to the embodiment of the present application, the target contract document acquired by the acquisition module 21 can be parsed to acquire signature data of the contract user. For example, in the case of a business contract, the signature data may be an electronic signature or a handwritten signature placed on the contract document by the contract user. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the parsing may be further performed according to a predetermined parsing model. For example, if the predetermined parsing model is a rule-based parsing model, the target contract document may be parsed according to the rule-based parsing model to obtain the signature data.
For example, signature data may be obtained from an electronic signature or handwritten signature area in a target contract document. In addition, signature data may be acquired according to an analysis result of a predetermined analysis model. For example, if the analysis result of the predetermined analysis model is:
Electronic signature data: { "type": "electronic", "data": "base64 encoded electronic signature data" }
Handwritten signature data: { "type": "handwritten", "data": "base64 encoded handwritten signature data" }
then the electronic signature data and the handwritten signature data may each be used as signature data.
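As an illustration only, a parse result shaped like the records above could be validated and decoded as follows. This is a minimal sketch, not the parsing model of the application: the function name `decode_signature_record` is hypothetical, and it assumes that each record is a JSON object whose "data" field holds standard base64 text.

```python
import base64
import json

# Signature types that appear in the example parse results (assumed exhaustive here).
KNOWN_TYPES = {"electronic", "handwritten"}


def decode_signature_record(record_json: str) -> tuple[str, bytes]:
    """Return (signature type, raw signature bytes) for one parse-result record."""
    record = json.loads(record_json)
    signature_type = record["type"]
    if signature_type not in KNOWN_TYPES:
        raise ValueError(f"unknown signature type: {signature_type!r}")
    # validate=True rejects characters outside the base64 alphabet.
    payload = base64.b64decode(record["data"], validate=True)
    return signature_type, payload
```

The decoded bytes would then be passed to the image feature extraction step described next.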
According to the contract document auditing device provided by the embodiment of the application, the image characteristics of the acquired signature data can be extracted to determine the type of the signature data. For example, in the case of a commercial contract, the image features may be stroke features, pressure features, etc. of the signature data. In particular, in the contract document audit apparatus of the embodiment of the present application, the image features may be further extracted according to a predetermined image feature extraction model. For example, if the predetermined image feature extraction model is a convolutional neural network-based model, image feature extraction may be performed on signature data according to the convolutional neural network-based model.
For example, image features may be extracted from stroke features, pressure features, etc. of the signature data. In addition, the type of signature data may be determined based on the extraction result of the predetermined image feature extraction model. For example, if the extraction result of the predetermined image feature extraction model is:
Electronic signature data: { "type": "electronic", "features": "extracted electronic signature image feature" }
Handwritten signature data: { "type": "handwritten", "features": "extracted handwritten signature image feature" }
then the electronic signature data and the handwritten signature data may be treated as different types of signature data.
When the signature data is handwritten signature data, the reference signature data of the user is extracted from a preset signature database according to the user information corresponding to the signature data.
According to the contract document auditing apparatus provided by the embodiment of the present application, when the type of the extracted signature data is handwritten signature data, the reference signature data of the user can be extracted from the preset signature database according to the user information corresponding to the signature data. For example, in the case of a business contract, the reference signature data may be a handwritten signature made by the contract user on other contract documents. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the reference signature data may be further queried according to a predetermined signature database query model. For example, if the predetermined signature database query model is a hash table-based model, the preset signature database may be queried according to the hash table-based model to extract the reference signature data.
For example, the reference signature data may be queried based on information such as a name, an identification card number, etc. in the user information corresponding to the signature data. Furthermore, the reference signature data may be extracted based on a query result of a predetermined signature database query model. For example, if the query result of the predetermined signature database query model is:
Reference signature data: { "user_id": "user ID", "data": "base64 encoded reference signature data" }
then this data may be used as the reference signature data of the user.
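A hash table-based query of this kind can be sketched with an ordinary Python dictionary. The sketch assumes that the preset signature database is available as a list of records and that the name and identification number together identify a user; the field names `name` and `id_number` and both function names are hypothetical.

```python
from typing import Optional

ReferenceRecord = dict[str, str]
UserKey = tuple[str, str]  # (name, identification number)


def build_signature_index(records: list[dict[str, str]]) -> dict[UserKey, ReferenceRecord]:
    """Index the signature database by (name, id_number) for constant-time lookup."""
    index: dict[UserKey, ReferenceRecord] = {}
    for record in records:
        key = (record["name"], record["id_number"])
        index[key] = {"user_id": record["user_id"], "data": record["data"]}
    return index


def find_reference_signature(
    index: dict[UserKey, ReferenceRecord], name: str, id_number: str
) -> Optional[ReferenceRecord]:
    """Return the reference signature record for the user, or None if absent."""
    return index.get((name, id_number))
```

Returning `None` for an unknown user leaves the caller free to decide how a missing reference signature should affect the audit.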
According to the contract document auditing apparatus of the embodiment of the present application, the similarity between the extracted handwritten signature data and the acquired reference signature data can be calculated using the fourth machine learning model. For example, in the case of a commercial contract, the similarity may be a stroke similarity, a pressure similarity, or the like between the handwritten signature data and the reference signature data. In particular, in the contract document auditing apparatus of the embodiment of the present application, the similarity may be further calculated according to a predetermined similarity calculation model. For example, if the predetermined similarity calculation model is a dynamic time warping-based model, the similarity between the handwritten signature data and the reference signature data may be calculated according to the dynamic time warping-based model.
For example, the similarity may be calculated from a stroke similarity, a pressure similarity, etc. between the handwritten signature data and the reference signature data. In addition, the similarity may be determined based on the result produced by the fourth machine learning model. For example, if the result of the fourth machine learning model is that the similarity between the handwritten signature data and the reference signature data is 0.9, then the similarity between the handwritten signature data and the reference signature data may be determined to be 0.9. For example, the type of the acquired signature data may be determined, where the data types include coordinate point data (signature position x, signature position y, signature pressure p, and signature time t) and image data; according to the acquired coordinate point data and image data, a handwritten signature comparison model is called to compare them with the data in a handwriting library and determine whether the signatures were written by the same person.
When the calculated similarity is greater than the predetermined signature similarity, the signature data is determined to be the signature of the user. For example, in the case of a commercial contract, the predetermined signature similarity may be 0.8. In particular, in the contract document auditing apparatus according to the embodiment of the present application, whether the signature data is the signature of the user may be further determined according to a predetermined signature similarity determination model. For example, if the predetermined signature similarity determination model is a threshold-based model, the similarity may be judged according to the threshold-based model to determine whether the signature data is the signature of the user.
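The dynamic time warping comparison and the threshold test can be illustrated with the sketch below. It is not the fourth machine learning model itself: it assumes that each signature is a sequence of (x, y, pressure) points, and the mapping from warping distance to a similarity in (0, 1] is an assumption chosen for illustration. All function names are hypothetical.

```python
import math

Point = tuple[float, float, float]  # (x, y, pressure)


def dtw_distance(a: list[Point], b: list[Point]) -> float:
    """Classic dynamic time warping distance with Euclidean point cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def signature_similarity(a: list[Point], b: list[Point]) -> float:
    """Map the length-normalised DTW distance to (0, 1]; 1.0 means identical shape."""
    if not a or not b:
        raise ValueError("both signatures need at least one point")
    return 1.0 / (1.0 + dtw_distance(a, b) / (len(a) + len(b)))


def is_user_signature(similarity: float, threshold: float = 0.8) -> bool:
    """Threshold-based decision: accept only when similarity exceeds the threshold."""
    return similarity > threshold
```

Because dynamic time warping aligns points that were sampled at different speeds, a signature written more slowly than the reference can still reach a high similarity, which a point-by-point comparison would miss.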
For example, whether the signature data is the signature of the user may be determined based on a predetermined signature similarity of 0.8: if the calculated similarity is 0.9, the signature data may be determined to be the signature of the user.
When the signature data is determined to be seal data, in the case of a commercial contract, the image features may be shape features, texture features, and the like of the seal data. In particular, in the contract document auditing apparatus of the embodiment of the present application, the image features may be further extracted according to a predetermined image feature extraction model. For example, if the predetermined image feature extraction model is a convolutional neural network-based model, image feature extraction may be performed on the signature data according to the convolutional neural network-based model. Similarly, when the type of the extracted signature data is seal data, the reference seal data of the user is extracted from the preset signature database according to the user information corresponding to the signature data. For example, in the case of a business contract, the reference seal data may be a seal that the contract user affixed to other contract documents. In particular, in the contract document auditing apparatus according to the embodiment of the present application, the reference seal data may be further queried according to a predetermined signature database query model. For example, if the predetermined signature database query model is a hash table-based model, the preset signature database may be queried according to the hash table-based model to extract the reference seal data.
According to the contract document auditing device provided by the embodiment of the application, the similarity between the extracted seal data and the extracted reference seal data can be calculated by further using a fifth machine learning model. For example, in the case of a commercial contract, the similarity may be a shape similarity, a texture similarity, or the like between the seal data and the reference seal data. In particular, in the contract document auditing apparatus of the embodiment of the present application, the similarity may be further calculated according to a predetermined similarity calculation model. For example, if the predetermined similarity calculation model is a correlation-based model, the similarity between the seal data and the reference seal data may be calculated from the correlation-based model.
According to the contract document auditing apparatus provided by the embodiment of the present application, when the calculated similarity is greater than the predetermined seal similarity, the seal data is determined to be the seal of the user. For example, in the case of a commercial contract, the predetermined seal similarity may be 0.8. If the calculated similarity is 0.9, it may be determined that the seal data is the seal of the user. If the calculated similarity is 0.7, it may be determined that the seal data is not the seal of the user, or that it is a suspected seal.
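One possible correlation-based similarity is the Pearson correlation between the pixel intensities of the two seal images. The sketch below assumes that both images have already been converted to grayscale, resized to the same dimensions, and flattened into lists; these preprocessing steps and the function names are assumptions for illustration, not details given in the application.

```python
import math


def correlation_similarity(seal: list[float], reference: list[float]) -> float:
    """Pearson correlation of two equally sized, flattened grayscale images."""
    if len(seal) != len(reference) or not seal:
        raise ValueError("images must be non-empty and of equal size")
    mean_seal = sum(seal) / len(seal)
    mean_ref = sum(reference) / len(reference)
    dev_seal = [p - mean_seal for p in seal]
    dev_ref = [p - mean_ref for p in reference]
    numerator = sum(s * r for s, r in zip(dev_seal, dev_ref))
    denominator = math.sqrt(sum(s * s for s in dev_seal) * sum(r * r for r in dev_ref))
    if denominator == 0.0:
        # A uniform image has no variance; treated as "no match" (assumption).
        return 0.0
    return numerator / denominator


def is_user_seal(similarity: float, threshold: float = 0.8) -> bool:
    """Accept the seal only when similarity exceeds the predetermined threshold."""
    return similarity > threshold
```

With the example values from the text, a similarity of 0.9 passes the 0.8 threshold and a similarity of 0.7 does not.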
According to the contract document auditing apparatus of the embodiment of the present application, the acquired target contract document is parsed to determine the contract type, the contract user information and the theme corresponding to the target contract document. The element value of each text unit in the target contract document is then calculated using a first machine learning model according to the contract type, and the text units whose element values have a sorting order greater than a predetermined sequence number are determined as element units according to the sorting result of the element values. An audit value of each element unit is calculated using a second machine learning model according to the contract user information, the semantics of each element unit and the position information. Finally, the risk degree of the target contract document is determined according to the difference between the audit value and a predetermined threshold. In this way, the auditing scheme of the embodiment of the present application takes the users who sign the contract and the contract signing scenario into account when auditing the target contract document: the element units are determined with full consideration of the user information and the contract scenario, and the risk degree of the contract is determined based on these element units, thereby greatly reducing the probability of disputes arising from the contract in actual use.
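The selection and risk steps summarized above can be sketched as follows, with the outputs of the two machine learning models replaced by plain dictionaries. Two points are assumptions, since the text does not fix them: element values are sorted in ascending order so that a sorting order greater than the sequence number selects the highest values, and the risk degree is taken as the largest shortfall of any audit value below the threshold. The function names are hypothetical.

```python
def select_element_units(element_values: dict[str, float], sequence_number: int) -> list[str]:
    """Keep the text units ranked after position `sequence_number` in ascending order."""
    ranked = sorted(element_values, key=element_values.get)  # lowest value first
    return ranked[sequence_number:]


def risk_degree(audit_values: dict[str, float], threshold: float) -> float:
    """Largest amount by which any audit value falls short of the threshold (assumption)."""
    if not audit_values:
        return 0.0
    return max(max(0.0, threshold - value) for value in audit_values.values())


# Hypothetical model outputs for four text units of one contract.
element_values = {"payment": 0.9, "header": 0.1, "liability": 0.7, "footer": 0.2}
element_units = select_element_units(element_values, sequence_number=2)
audit_values = {"liability": 0.4, "payment": 0.8}
contract_risk = risk_degree(audit_values, threshold=0.6)
```

Here the two highest-valued units, "liability" and "payment", become element units, and the low audit value of "liability" is what drives the risk degree above zero.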
Example III
The internal functions and structure of the contract document auditing apparatus are described above; the apparatus may be implemented as an electronic device. Fig. 3 is a schematic structural diagram of an embodiment of an electronic device provided by the present application. As shown in fig. 3, the electronic device comprises a memory 31 and a processor 32.
The memory 31 is used for storing a program. In addition to the program described above, the memory 31 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like.
The memory 31 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The processor 32 is not limited to a Central Processing Unit (CPU), but may be a processing chip such as a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), an embedded Neural-network Processing Unit (NPU), or an Artificial Intelligence (AI) chip. The processor 32 is coupled to the memory 31 and executes the program stored in the memory 31; when run, the program executes the contract document auditing method according to the first embodiment.
Further, as shown in fig. 3, the electronic device may further include: communication component 33, power component 34, audio component 35, display 36, and other components. Only some of the components are schematically shown in fig. 3, which does not mean that the electronic device only comprises the components shown in fig. 3.
The communication component 33 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 3G, 4G, or 5G, or a combination thereof. In one exemplary embodiment, the communication component 33 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 33 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 34 provides power to the various components of the electronic device. The power component 34 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 35 is configured to output and/or input audio signals. For example, the audio component 35 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may be further stored in the memory 31 or transmitted via the communication component 33. In some embodiments, the audio component 35 further comprises a speaker for outputting audio signals.
The display 36 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may detect not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.
Claims (10)
1. A contract document auditing method, comprising:
acquiring a target contract document;
analyzing the target contract document to determine the contract type, contract user information and theme corresponding to the target contract document, wherein the contract user information comprises: signing user information of both contracting parties, drafting user information of the contract drafting party, and related-party user information of parties related to the contract;
calculating, using a first machine learning model according to the contract type, an element value of each text unit in the target contract document, wherein the element value indicates the degree of contribution of each text unit to the theme of the target contract document;
determining, according to the sorting result of the element values, text units whose element values have a sorting order greater than a predetermined sequence number as element units;
calculating an audit value of each element unit according to the contract user information, the semantics of each element unit and the position information by using a second machine learning model;
and determining the risk degree of the target contract document according to the difference between the audit value and a predetermined threshold, wherein the risk degree indicates the probability of disputes caused by the target contract document in actual use.
2. The method of claim 1, wherein calculating the element values for each text unit in the target contract document using a first machine learning model according to the contract type comprises:
calculating, according to the contract type, the similarity between the contract type and the contract type of each reference target document;
sorting according to the similarity, and taking the reference target document with the highest similarity ranking as the reference target document of the same contract type as the target contract document;
calculating a first association degree between each text unit in the target contract document and each text unit in the reference target document;
calculating a second association degree between each text unit in the target contract document and the contract user information;
for each text unit in the target contract document, determining an element value according to the first association degree and the second association degree which are higher than a preset threshold value.
3. The method of claim 2, wherein calculating an audit value for each element unit based on the contract user information, semantics of each element unit, and location information using a second machine learning model comprises:
for each determined element unit, performing semantic parsing to generate an element text word sequence;
generating element vectors according to the element text word sequences and the position information of the element units;
inputting the contract user information and the element vector into the second machine learning model to obtain the audit value.
4. The contract document auditing method according to claim 2 or 3, characterized in that the method further comprises:
acquiring a preset user problem group according to the contract type;
for each text unit, acquiring a preset text corresponding to at least one element unit according to the contract type;
calculating, using the first machine learning model, the correlation between the text unit and each reference element text in a preset reference element text library;
acquiring the association degree between the text units, and dividing the text units whose association degree is greater than a preset threshold value into text segments;
determining an element value of the text unit according to the correlation and the association degree;
and determining that the training of the first machine learning model is successful when the number of element values greater than the predetermined element threshold is greater than a predetermined number threshold.
5. The method of contract document auditing according to claim 1, characterized in that the method further comprises:
for each determined element unit, performing semantic parsing to generate an element text word sequence;
acquiring a predetermined third machine learning model according to the contract type;
calculating, using the third machine learning model, a confidence value for each word in the element text word sequence, wherein the confidence value indicates the rationality of each word at its corresponding position in the element text word sequence;
sorting the words in the element text word sequence according to the magnitude of the confidence value;
determining words whose ranking order is less than a predetermined ranking threshold as error candidate words.
6. The method of contract document auditing according to claim 1, characterized in that the method further comprises:
analyzing the target contract document to obtain signature data of contract users;
extracting image features of the signature data to determine a type of the signature data;
when the signature data is handwritten signature data, extracting reference signature data of the user from a preset signature database according to user information corresponding to the signature data;
calculating a similarity between the handwritten signature data and the reference signature data using a fourth machine learning model;
and when the similarity is greater than the predetermined signature similarity, determining the signature data as the signature of the user.
7. The method of contract document auditing according to claim 1, characterized in that the method further comprises:
analyzing the target contract document to obtain signature data of contract users;
extracting image features of the signature data to determine a type of the signature data;
when the signature data is seal data, extracting reference seal data of the user from a preset signature database according to user information corresponding to the signature data;
calculating the similarity between the seal data and the reference seal data by using a fifth machine learning model;
and when the similarity is greater than the preset seal similarity, determining the seal data as the seal of the user.
8. A contract document auditing apparatus, comprising:
an acquisition module, configured to acquire a target contract document;
an analysis module, configured to analyze the target contract document to determine the contract type, contract user information and theme corresponding to the target contract document, wherein the contract user information comprises: signing user information of both contracting parties, drafting user information of the contract drafting party, and related-party user information of parties related to the contract;
a first calculation module, configured to calculate, using a first machine learning model according to the contract type, an element value of each text unit in the target contract document, wherein the element value indicates the degree of contribution of each text unit to the theme of the target contract document;
a first determining module, configured to determine, according to the sorting result of the element values, text units whose element values have a sorting order greater than a predetermined sequence number as element units;
a second calculation module, configured to calculate an audit value of each element unit according to the contract user information, the semantics of each element unit and the position information by using a second machine learning model;
and a second determining module, configured to determine the risk degree of the target contract document according to the difference between the audit value and a predetermined threshold, wherein the risk degree indicates the probability of disputes caused by the target contract document in actual use.
9. An electronic device, comprising:
a memory for storing a program;
A processor for executing the program stored in the memory to perform the contract document auditing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which is executable by a processor, characterized in that the program, when executed by the processor, implements the contract document auditing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410411157.7A CN118396786A (en) | 2024-04-07 | 2024-04-07 | Contract document auditing method and device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410411157.7A CN118396786A (en) | 2024-04-07 | 2024-04-07 | Contract document auditing method and device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118396786A (en) | 2024-07-26 |
Family
ID=91993254
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410411157.7A Pending CN118396786A (en) | 2024-04-07 | 2024-04-07 | Contract document auditing method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118396786A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118606920A (en) * | 2024-08-07 | 2024-09-06 | 支付宝(杭州)信息技术有限公司 | Transaction processing method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||