CN114359912B - Software page key information extraction method and system based on graph neural network - Google Patents
- Publication number: CN114359912B (application CN202210279500.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- text line
- lines
- neural network
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention belongs to the technical field of software page information extraction, and particularly relates to a software page key information extraction method and system based on a graph neural network. The method comprises: S1, inputting a web page picture and outputting the coordinate information of all text lines on the picture; S2, cropping all text lines according to the obtained text line coordinate information and recognizing them to obtain the character information of each text line; S3, combining the web page picture, the text line coordinate information and the text line character information, and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model; S4, performing key-value pair matching according to the categories of the text lines, and, if the matching succeeds, outputting the text information corresponding to the required key-value pairs. The system comprises a text line detection module, a text line recognition module, a text line classification module and a text line key-value pair matching module. The method is highly universal and can be applied to all software text types.
Description
Technical Field
The invention belongs to the technical field of software page information extraction, and particularly relates to a software page key information extraction method and system based on a graph neural network.
Background
The RPA application scenario typically encounters the task of web page or software page specific text information extraction. The task needs to acquire all the text information on the page by means of Optical Character Recognition (OCR) technology, and then extract the required field content through some post-processing operations (such as regular matching according to keywords, etc.).
In recent years, with the development of artificial intelligence, deep neural networks have been widely applied in the field of OCR, for example in document recognition, certificate recognition and bill recognition. Compared with traditional OCR algorithms, deep neural networks significantly improve both the application range and the recognition accuracy of OCR. However, the most commonly used convolutional neural networks (CNN) tend to focus only on local features of the image, ignoring the interrelationships between those local features. A graph neural network, in contrast, can regard local features of the image as graph nodes and learn the interrelations among the nodes. In specific scenes such as software interfaces, the text lines on an image are strongly interrelated, so a graph neural network can learn more useful information.
Key information extraction refers to extracting required, specified field information from the text in an image. For example, specific fields such as name, gender, ethnicity and identification card number are extracted from an identification card picture. A typical software interface contains many pieces of text information, of which only a few key items are useful in actual business. To extract all the useful key information from all the text information, a series of complicated post-processing methods, such as template matching, must be designed. When designing a template, the character information of each text line, the position information of each text line, and so on, all need to be considered. Setting different post-processing rules for different software interfaces consumes substantial labor and time.
One of the existing key information extraction methods is to determine whether a matching relationship exists between a template image and a character string of an image to be detected based on template matching according to a preset template. For example, after all text information on the picture is identified, some regular rules are set according to text features of the key fields to match with all text lines on the picture, and the text line successfully matched with the regular rule of the corresponding key field is the key information.
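As a concrete illustration of this template/regular-expression approach, a minimal sketch follows; the field names and patterns are hypothetical examples, not rules from the patent:

```python
import re

# Hypothetical regular-expression rules for two key fields; in a real
# template-matching system, each software interface would need its own
# hand-written set of rules like these.
FIELD_RULES = {
    "id_number": re.compile(r"\d{17}[\dXx]"),   # 18-char ID card number
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),   # ISO-style date
}

def match_key_fields(text_lines):
    """Return {field: first text line whose content matches its rule}."""
    found = {}
    for field, rule in FIELD_RULES.items():
        for line in text_lines:
            if rule.search(line):
                found.setdefault(field, line)
    return found
```

The brittleness of this approach is exactly the drawback the patent describes: any change in text format or layout breaks the hand-written rules.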
In addition, a deep neural network-based method is used for classifying all text boxes in the image extracted by the OCR algorithm. For example, if the picture to be tested is an identification card picture, all text boxes in the picture can be classified into categories such as name, nationality, date of birth, address and the like, so that the key information extraction is completed.
However, the method based on template matching is very dependent on the layout of the image text, and once the text layout of the image to be detected is inconsistent with the preset template text layout, the extraction of the key information is wrong or fails. In addition, the interface text layouts of different application software are different, and a universal matching template is difficult to design. For example, to extract a name field from a picture, it is generally necessary to design a matching pattern by first searching the field for the keyword "name" and then matching the text boxes of 2-3 Chinese characters from the text box on the right side of the "name" field. If the interface typesetting of certain software is not arranged from left to right but arranged from top to bottom, the actual name is below the keyword 'name'. In this case, the matching pattern set in the past cannot be applied. Therefore, the template matching based method is difficult to have good versatility.
The method based on deep neural network classification is to assign a category to all text lines in the picture. For example, to extract information from an identification card, all text line fields on the identification card can be classified into categories such as "name", "gender", "date of birth", "address", "identification card number", and the like. When a certain key field needs to be extracted, corresponding field information can be extracted only according to the corresponding category of the key field. This approach does not need to rely on a specific template, but does require all the categories to be unambiguous. The text types on different application software are very different, and all the categories are difficult to exhaust. Therefore, the deep neural network classification-based method can only be used for specific scenes, and is not very universal.
Based on the above problems, it is very important to design a method and a system for extracting key information of a software page based on a graph neural network, which have strong universality and can be applied to all software text types.
For example, Chinese patent application No. CN201911163754.8 describes a method, an apparatus, a terminal device and a server for accessing a web page. The method includes: acquiring an access request for a target webpage, the access request carrying preset keywords; acquiring the position information of the keywords in the target webpage and the page data of the target webpage; and displaying the page data of the target webpage according to the position information. Although displaying the page data according to the position information of the keywords lets the user quickly find the content related to the searched keywords, thereby improving the user experience, the method has the defect that it can only be used in a specific scene and is not very universal.
Disclosure of Invention
The invention provides a software page key information extraction method and system based on a graph neural network, which have strong universality and can be applied to all software text types, and aims to solve the problems that the existing key information extraction method can only be used in specific scenes and does not have good universality in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
the software page key information extraction method based on the graph neural network comprises the following steps:
S1, passing the input web page picture through a DBNet text detection algorithm and outputting the coordinate information of all text lines on the web page picture;
S2, cropping all text lines from the picture according to the obtained text line coordinate information and recognizing them with a CRNN text recognition algorithm to obtain the character information of each text line;
S3, combining the input web page picture with the obtained text line coordinate information and text line character information, and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model;
S4, extracting the text line coordinate information features and text line character information features of any two text lines, fusing them to obtain a fusion feature, and performing key-value pair matching in combination with the categories of the text lines; if the matching succeeds, outputting the text information corresponding to all required key-value pairs.
Preferably, the categories of the text line described in step S3 include three categories of "key", "value", and "other".
Preferably, step S3 includes the steps of:
S31, extracting features of the web page picture with a CNN backbone network, and processing the features of all text lines into a uniform dimension with an ROI Pooling layer; extracting the visual feature v_i of each text line with CNN + ROI Pooling, extracting the semantic feature s_i of each text line with a long short-term memory network (LSTM), and fusing the visual feature v_i and the semantic feature s_i to obtain the fused feature f_i, where ⊕ denotes the splicing (concatenation) operation:

f_i = v_i ⊕ s_i
S32, establishing a graph neural network model with the fused feature f_i of each text line, and constructing an undirected graph with each text line as a graph node, the undirected graph being expressed as G = (V, E), wherein V = {f_1, ..., f_N} represents the fused features of all text lines, and E = {w_ij} represents the weights of the edges between pairs of nodes in the undirected graph;

constructing a feature vector r_ij that considers the spatial relationship between text lines:

r_ij = [ (x_j - x_i)/w_i, (y_j - y_i)/h_i, w_i/h_i, w_j/h_j, w_j/w_i, h_j/h_i ]

wherein (x_i, y_i) denotes the center point coordinates of the i-th text line, (x_j, y_j) denotes the center point coordinates of the j-th text line, (w_i, h_i) denotes the width and height of the i-th text line, and (w_j, h_j) denotes the width and height of the j-th text line; (x_j - x_i)/w_i and (y_j - y_i)/h_i represent the distance between the two text lines; w_i/h_i and w_j/h_j represent the aspect ratio of each of the two text lines; and w_j/w_i and h_j/h_i represent the difference in scale between the two text lines.
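A minimal sketch of such a spatial relation feature follows; since the patent's formula image is not reproduced in this text, the exact six components and their normalization are an assumption based on the described terms (center-point distances, per-line aspect ratios, relative scale):

```python
def spatial_relation(box_i, box_j):
    """Each box is (cx, cy, w, h): center point plus width and height.
    Returns a plausible 6-dim relation vector: normalized center offsets,
    the two aspect ratios, and the relative width/height between lines."""
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    return [
        (xj - xi) / wi,   # horizontal distance, normalized by box i width
        (yj - yi) / hi,   # vertical distance, normalized by box i height
        wi / hi,          # aspect ratio of text line i
        wj / hj,          # aspect ratio of text line j
        wj / wi,          # relative width
        hj / hi,          # relative height
    ]
```

Normalizing by the first box's size keeps the feature invariant to overall image scale, which matters because the same interface may be captured at different resolutions.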
Preferably, step S3 further includes the following step:

S33, computing the weight w_ij of the edge between graph nodes i and j from the spatial relation feature r_ij:

w_ij = MLP( Norm( W_r · r_ij ) )

wherein W_r is a linear transformation used to raise the dimension of r_ij, Norm(·) represents the normalization process, and MLP(·) represents a multi-layer neural network.
Preferably, step S3 further includes the following step:

S34, iterating the nodes of the undirected graph G with the following formula, the number of iterations being a hyper-parameter that can be adjusted as required:

h_i^(t+1) = ReLU( W_g · Σ_j w_ij · h_j^(t) )

wherein ReLU represents the ReLU activation function, W_g is a linear transformation, and h_i^(t) represents the i-th graph node at the t-th iteration;
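A minimal sketch of this iterative node update follows, implemented as weighted neighbor aggregation, a linear transformation and a ReLU activation; the exact update rule and the fixed weight matrix are illustrative assumptions:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def iterate_nodes(nodes, edge_w, W, steps=2):
    """nodes: list of feature vectors (one per graph node / text line);
    edge_w[i][j]: edge weight between nodes i and j; W: linear
    transformation applied to the aggregated neighbor features.
    Runs `steps` rounds of message passing (steps is the hyper-parameter)."""
    for _ in range(steps):
        new_nodes = []
        for i in range(len(nodes)):
            agg = [0.0] * len(nodes[i])
            for j in range(len(nodes)):
                for k in range(len(agg)):
                    agg[k] += edge_w[i][j] * nodes[j][k]
            new_nodes.append(relu(matvec(W, agg)))
        nodes = new_nodes
    return nodes
```

In a trained model W would be a learned parameter and edge_w would come from the MLP over the spatial relation features; here both are supplied directly to keep the sketch self-contained.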
and S35, completing the construction of the graph neural network model.
Preferably, step S4 includes the steps of:
S41, extracting the semantic feature s_i of the character information of each text line with a long short-term memory network (LSTM), and fusing it with the coordinate features of the four vertices (x1, y1), (x2, y2), (x3, y3), (x4, y4) of each text line to obtain the fused feature f_ij:

f_ij = s_i ⊕ s_j ⊕ p_i ⊕ p_j

wherein s_i and s_j respectively represent the semantic features of the i-th text line and the j-th text line; p_i represents the four vertex coordinates of the i-th text line together with its width and height (w_i, h_i); and p_j represents the four vertex coordinates of the j-th text line together with its width and height (w_j, h_j).
S42, sending the fused feature f_ij into a classifier: when the two text lines do not belong to the same key-value pair, the output category is 0; when the two text lines belong to the same key-value pair, the output category is 1.
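A minimal sketch of the pair-feature construction that precedes the 0/1 classifier follows; the exact layout of the concatenated geometric part is an assumption:

```python
def pair_feature(sem_i, sem_j, verts_i, verts_j):
    """Concatenate the two semantic feature vectors with the flattened
    four-vertex coordinates of each text line, yielding one fused feature
    for the binary same-key-value-pair classifier. The ordering of the
    geometric part is an illustrative choice, not the patent's exact one."""
    feat = list(sem_i) + list(sem_j)
    for (x, y) in verts_i + verts_j:
        feat += [x, y]
    return feat
```

A trained binary classifier (e.g., a small MLP with a sigmoid output) would then map this vector to 0 or 1; only that feature construction step is sketched here.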
The invention also provides a software page key information extraction system based on the graph neural network, which comprises the following steps:
the text line detection module, used for passing the input web page picture through a DBNet text detection algorithm and outputting the coordinate information of all text lines on the web page picture;
the text line recognition module, used for cropping all text lines from the picture according to the obtained text line coordinate information and recognizing them with a CRNN text recognition algorithm to obtain the character information of each text line;
the text line classification module, used for combining the input web page picture with the obtained text line coordinate information and text line character information and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model;
and the text line key-value pair matching module, used for extracting the text line coordinate information features and text line character information features of any two text lines, fusing them to obtain a fusion feature, and performing key-value pair matching in combination with the categories of the text lines.
Preferably, the software page key information extraction system based on the graph neural network further comprises;
and the key value pair output module is used for outputting text information corresponding to all required key value pairs when the key value pairs are successfully matched.
Preferably, the text line classification module further includes:
the graph neural network model module is used for constructing a graph neural network model;
and the classification module is used for outputting the categories of all text lines.
Compared with the prior art, the invention has the following beneficial effects: (1) the invention creatively applies the graph neural network to key information extraction for RPA application software and can directly output all key-value pairs in a software picture, helping to extract the desired key information and greatly reducing the complexity of manually setting rules to search for key information at a later stage; (2) the key information extraction method of the invention integrates the visual features of the image, the semantic features of the text and the position features of the text lines, greatly improving the accuracy of key information extraction; (3) the contrastive learning method adopted for key-value pair matching requires only a small number of text-box category annotation samples, yielding a good key-value pair matching effect and strong system generalization.
Drawings
FIG. 1 is a flow chart of a method for extracting key information of a software page based on a graph neural network according to the present invention;
FIG. 2 is a functional architecture diagram of the software page key information extraction system based on graph neural network in the present invention;
FIG. 3 is a functional architecture diagram of the text line classification module of the present invention;
fig. 4 is a flowchart illustrating capturing a picture from an RPA to extracting key information according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention, the following description will explain the embodiments of the present invention with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
Example 1:
as shown in FIG. 1, the invention provides a software page key information extraction method based on a graph neural network, which comprises the following steps;
s1, the input webpage picture passes through a DBNet text detection algorithm, and all text line coordinate information on the webpage picture is output;
s2, cutting out all text lines and identifying according to the obtained text line coordinate information through a CRNN text identification algorithm to obtain character information of each text line;
s3, combining the input web page picture with the obtained text line coordinate information and text line character information, and outputting the category of all text lines through a text line classification algorithm based on a graph neural network model;
s4, respectively extracting the text line coordinate information features and the text line character information features of any two text lines, fusing to obtain fusion features, and simultaneously performing key value pair matching by combining the categories of the text lines; and if the matching is successful, outputting the text information corresponding to all the required key value pairs.
Further, the categories of the text line described in step S3 include three categories of "key", "value", and "other".
The purposes of the classification are, on the one hand, to extract all keys and values in the picture and, on the other hand, to filter out invalid text lines. A general classification network extracts visual features of an image through a series of convolution operations and classifies pictures according to those visual features. In the present task, however, text lines are being classified, and the differences between their visual features are not obvious, so classification based on visual features alone cannot achieve a good classification effect. The category of a text line is strongly related to its semantic information and position information: keys such as "name" and "date" are specific texts, and the "value" is generally located to the right of or below the "key". Therefore, taking the position information and semantic information of the text lines as input to the network improves the classification accuracy of the text lines.
As shown in fig. 3, step S3 includes the following steps:
S31, extracting features of the web page picture with a CNN backbone network, and processing the features of all text lines into a uniform dimension with an ROI Pooling layer; extracting the visual feature v_i of each text line with CNN + ROI Pooling, extracting the semantic feature s_i of each text line with a long short-term memory network (LSTM), and fusing the visual feature v_i and the semantic feature s_i to obtain the fused feature f_i, where ⊕ denotes the splicing (concatenation) operation:

f_i = v_i ⊕ s_i
S32, establishing a graph neural network model with the fused feature f_i of each text line, and constructing an undirected graph with each text line as a graph node, the undirected graph being expressed as G = (V, E), wherein V = {f_1, ..., f_N} represents the fused features of all text lines, and E = {w_ij} represents the weights of the edges between pairs of nodes in the undirected graph;

constructing a feature vector r_ij that considers the spatial relationship between text lines:

r_ij = [ (x_j - x_i)/w_i, (y_j - y_i)/h_i, w_i/h_i, w_j/h_j, w_j/w_i, h_j/h_i ]

wherein (x_i, y_i) denotes the center point coordinates of the i-th text line, (x_j, y_j) denotes the center point coordinates of the j-th text line, (w_i, h_i) denotes the width and height of the i-th text line, and (w_j, h_j) denotes the width and height of the j-th text line; (x_j - x_i)/w_i and (y_j - y_i)/h_i represent the distance between the two text lines; w_i/h_i and w_j/h_j represent the aspect ratio of each of the two text lines; and w_j/w_i and h_j/h_i represent the difference in scale between the two text lines.
S33, computing the weight w_ij of the edge between graph nodes i and j from the spatial relation feature r_ij:

w_ij = MLP( Norm( W_r · r_ij ) )

wherein W_r is a linear transformation used to raise the dimension of r_ij, Norm(·) represents the normalization process, and MLP(·) represents a multi-layer neural network.
S34, iterating the nodes of the undirected graph G with the following formula, the number of iterations being a hyper-parameter that can be adjusted as required:

h_i^(t+1) = ReLU( W_g · Σ_j w_ij · h_j^(t) )

wherein ReLU represents the ReLU activation function, W_g is a linear transformation, and h_i^(t) represents the i-th graph node at the t-th iteration;
and S35, completing the construction of the graph neural network model.
ROI Pooling is an operation that can process features of different dimensions into the same dimension, and is widely used in mainstream two-stage object detection algorithms (e.g., Faster R-CNN).
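A minimal pure-Python sketch of ROI max pooling follows, showing how regions of different sizes are mapped to one uniform output dimension; production implementations in two-stage detectors are optimized GPU kernels, so this is for illustration only:

```python
def roi_max_pool(fmap, box, out_h=2, out_w=2):
    """fmap: 2-D list (H x W feature map); box: (x0, y0, x1, y1) region
    in feature-map coordinates. Splits the region into an out_h x out_w
    grid and max-pools each cell, so a region of any size yields a
    feature of one fixed dimension (out_h * out_w)."""
    x0, y0, x1, y1 = box
    pooled = []
    for gy in range(out_h):
        row = []
        for gx in range(out_w):
            ys = y0 + (y1 - y0) * gy // out_h
            ye = max(ys + 1, y0 + (y1 - y0) * (gy + 1) // out_h)
            xs = x0 + (x1 - x0) * gx // out_w
            xe = max(xs + 1, x0 + (x1 - x0) * (gx + 1) // out_w)
            row.append(max(fmap[y][x] for y in range(ys, ye)
                                       for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Each text-line bounding box, whatever its width, thus produces a feature of the same shape, which is what lets the classifier treat all text lines uniformly.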
Step S4 includes the steps of:
S41, extracting the semantic feature s_i of the character information of each text line with a long short-term memory network (LSTM), and fusing it with the coordinate features of the four vertices (x1, y1), (x2, y2), (x3, y3), (x4, y4) of each text line to obtain the fused feature f_ij:

f_ij = s_i ⊕ s_j ⊕ p_i ⊕ p_j

wherein s_i and s_j respectively represent the semantic features of the i-th text line and the j-th text line; p_i represents the four vertex coordinates of the i-th text line together with its width and height (w_i, h_i); and p_j represents the four vertex coordinates of the j-th text line together with its width and height (w_j, h_j).
S42, sending the fused feature f_ij into a classifier: when the two text lines do not belong to the same key-value pair, the output category is 0; when the two text lines belong to the same key-value pair, the output category is 1.
The invention divides key information extraction into two steps, namely text line classification and text line key-value pair matching. Text line classification classifies all detected text lines into three categories (key, value and other) without needing to distinguish specific key-value categories, which greatly enhances universality and allows the method to be applied to all software text types. Text line key-value pair matching pairs all keys and values, binding each text line belonging to the "key" category with the corresponding text line belonging to the "value" category, so that as long as the key corresponding to certain key information is input, the corresponding value can be obtained.
As shown in fig. 2, the present invention further provides a software page key information extraction system based on the graph neural network, including:
the text line detection module, used for passing the input web page picture through a DBNet text detection algorithm and outputting the coordinate information of all text lines on the web page picture;
the text line recognition module, used for cropping all text lines from the picture according to the obtained text line coordinate information and recognizing them with a CRNN text recognition algorithm to obtain the character information of each text line;
the text line classification module, used for combining the input web page picture with the obtained text line coordinate information and text line character information and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model;
and the text line key-value pair matching module, used for extracting the text line coordinate information features and text line character information features of any two text lines, fusing them to obtain a fusion feature, and performing key-value pair matching in combination with the categories of the text lines.
And the key value pair output module is used for outputting the text information corresponding to all the required key value pairs when the key value pairs are successfully matched.
Further, the text line classification module further includes:
the graph neural network model module is used for constructing a graph neural network model;
and the classification module is used for outputting the categories of all text lines.
Based on the technical scheme of the invention, in the specific implementation and operation process, the specific implementation flow of the invention is described by using the flow chart from capturing pictures by the RPA to extracting key information shown in FIG. 4.
As shown in fig. 4, the specific implementation flow is as follows:
1. capturing pictures of application software pages with an RPA (Robotic Process Automation) tool as input, and configuring the names of the key information fields that need to be output;
2. inputting the picture into a text detector, and detecting all text line coordinates in the picture;
3. cutting out all text lines from the original image according to the text line coordinates detected in the step 2, inputting the text lines into a text recognizer, and recognizing the character content of each text line;
4. inputting the original image, the coordinates of the text lines output by the text detector and the content of the text lines output by the text recognizer into a text line classifier to obtain the categories (keys, values and other) of all the text lines;
5. inputting each text line belonging to the key and all text lines belonging to the value into a key-value matcher for matching, and binding the current key and the value if matching is successful;
6. matching the name of the key according to the name of the key information field set in the step 1;
7. the "value" bound to it is output according to the "key" corresponding to the name.
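The seven-step flow can be illustrated end to end with mock data standing in for the detector, recognizer and classifier outputs; the nearest-neighbor heuristic below is an illustrative stand-in for the trained key-value matcher, not the patent's learned method:

```python
def extract_key_info(ocr_lines, wanted_key):
    """ocr_lines: list of dicts with 'text', 'box' (x, y, w, h) and the
    'category' that a trained text line classifier would assign
    ('key'/'value'/'other'). A nearest-right/below distance heuristic
    stands in for the learned key-value matcher of step 5."""
    keys = [l for l in ocr_lines if l["category"] == "key"]
    values = [l for l in ocr_lines if l["category"] == "value"]

    def distance(k, v):
        kx, ky, kw, kh = k["box"]
        vx, vy, _, _ = v["box"]
        # prefer values just to the right of, or level with, the key
        return abs(vx - (kx + kw)) + abs(vy - ky)

    for k in keys:
        if wanted_key in k["text"] and values:
            best = min(values, key=lambda v: distance(k, v))
            return best["text"]   # the "value" bound to this "key"
    return None
```

Given the field name configured in step 1, the function returns the text of the value bound to the matching key, mirroring steps 6 and 7 of the flow.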
The invention creatively applies the graph neural network to key information extraction for RPA application software and can directly output all key-value pairs in a software picture, helping to extract the desired key information and greatly reducing the complexity of manually setting rules to search for key information at a later stage. The key information extraction method of the invention integrates the visual features of the image, the semantic features of the text and the position features of the text lines, greatly improving the accuracy of key information extraction. The contrastive learning method adopted for key-value pair matching requires only a small number of text-box category annotation samples, yielding a good key-value pair matching effect and strong system generalization.
The foregoing has outlined rather broadly the preferred embodiments and principles of the present invention and it will be appreciated that those skilled in the art may devise variations of the present invention that are within the spirit and scope of the appended claims.
Claims (5)
1. The software page key information extraction method based on the graph neural network is characterized by comprising the following steps:
S1, passing the input web page picture through a DBNet text detection algorithm and outputting the coordinate information of all text lines on the web page picture;
S2, cropping all text lines from the picture according to the obtained text line coordinate information and recognizing them with a CRNN text recognition algorithm to obtain the character information of each text line;
S3, combining the input web page picture with the obtained text line coordinate information and text line character information, and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model;
S4, extracting the text line coordinate information features and text line character information features of any two text lines, fusing them to obtain a fusion feature, and performing key-value pair matching in combination with the categories of the text lines; if the matching succeeds, outputting the text information corresponding to all required key-value pairs;
the categories of the text line in step S3 include three categories of "key", "value", and "other";
step S3 includes the following steps:
S31, extracting features of the web page picture with a CNN backbone network, and processing the features of all text lines into a uniform dimension with an ROI Pooling layer; extracting the visual feature v_i of each text line with CNN + ROI Pooling, extracting the semantic feature s_i of each text line with a long short-term memory network (LSTM), and fusing the visual feature v_i and the semantic feature s_i to obtain the fused feature f_i, where ⊕ denotes the splicing (concatenation) operation:

f_i = v_i ⊕ s_i
S32, establishing a graph neural network model with the fused feature f_i of each text line, and constructing an undirected graph with each text line as a graph node, the undirected graph being expressed as G = (V, E), wherein V = {f_1, ..., f_N} represents the fused features of all text lines, and E = {w_ij} represents the weights of the edges between pairs of nodes in the undirected graph;

constructing a feature vector r_ij that considers the spatial relationship between text lines:

r_ij = [ (x_j - x_i)/w_i, (y_j - y_i)/h_i, w_i/h_i, w_j/h_j, w_j/w_i, h_j/h_i ]

wherein (x_i, y_i) denotes the center point coordinates of the i-th text line, (x_j, y_j) denotes the center point coordinates of the j-th text line, (w_i, h_i) denotes the width and height of the i-th text line, and (w_j, h_j) denotes the width and height of the j-th text line; (x_j - x_i)/w_i and (y_j - y_i)/h_i represent the distance between the two text lines; w_i/h_i and w_j/h_j represent the aspect ratio of each of the two text lines; and w_j/w_i and h_j/h_i represent the difference in scale between the two text lines;
the weight of the edge between nodes i and j is then computed as e_ij = M(N(W r_ij)), wherein W is a linear transformation used to increase the dimension of r_ij, N(·) represents normalization, and M(·) represents a multi-layer neural network;
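The spatial-relationship feature between two text lines can be computed directly from their boxes; the component ordering below follows the reconstructed r_ij above and is an illustrative assumption, since the original formula image is not preserved:

```python
def spatial_relation(box_i, box_j):
    """Spatial-relationship feature r_ij between two text lines.

    Each box is (cx, cy, w, h): center point, width, height.
    Components: center offsets, each line's aspect ratio, and the
    relative width/height of the two lines.
    """
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    return [xj - xi, yj - yi, wi / hi, wj / hj, wj / wi, hj / hi]
```

For example, two lines on the same row with centers 60 pixels apart produce a zero vertical offset and ratios that describe their relative shapes.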
S34, iterating the nodes V_i of the undirected graph G with the following formula, where the number of iterations T is a hyper-parameter that can be adjusted as required:

V_i^(t+1) = σ( W^(t) Σ_j e_ij V_j^(t) )

wherein σ represents the ReLU activation function, W^(t) is a linear transformation, and V_i^(t) denotes the i-th graph node at the t-th iteration;
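The node update in S34 can be sketched in a few lines of plain Python; for simplicity the linear transformation W is taken as the identity, so each node becomes the ReLU of the edge-weighted sum of all node features (a toy sketch, not the trained model):

```python
def relu(v):
    # Elementwise ReLU activation.
    return [max(0.0, x) for x in v]

def iterate(nodes, weights, T):
    """Run T rounds of graph-node updates.

    nodes:   list of feature vectors, one per text line (graph node)
    weights: weights[i][j] is the edge weight e_ij between nodes i and j
    """
    n, d = len(nodes), len(nodes[0])
    for _ in range(T):
        new_nodes = []
        for i in range(n):
            agg = [0.0] * d
            for j in range(n):
                for k in range(d):
                    agg[k] += weights[i][j] * nodes[j][k]
            new_nodes.append(relu(agg))
        nodes = new_nodes
    return nodes
```

With uniform edge weights, one iteration averages the node features and clips negative components to zero, which is exactly the weighted-sum-plus-ReLU update of the formula above.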
and S35, completing the construction of the graph neural network model.
2. The method for extracting key information from a software page based on a graph neural network as claimed in claim 1, wherein step S4 comprises the following steps:
S41, extracting the semantic feature s_i of the character content of each text line with a long short-term memory network (LSTM), and fusing it with the coordinate-information feature formed by the four vertex coordinates of each text line to obtain the fused feature c_ij for a pair of text lines:

c_ij = [ s_i, s_j, p_i, p_j, w_i, h_i, w_j, h_j ]

wherein s_i and s_j respectively represent the semantic features of the i-th and j-th text lines; p_i denotes the vertex coordinates of the i-th text line; p_j denotes the vertex coordinates of the j-th text line; w_i and h_i denote the width and height of the i-th text line; and w_j and h_j denote the width and height of the j-th text line;
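A sketch of the pair-feature fusion in S41; the semantic vectors stand in for LSTM outputs, and each box (x, y, w, h) stands in for the four-vertex coordinate feature (the exact layout of c_ij is an assumption reconstructed from the claim text):

```python
def pair_feature(sem_i, sem_j, box_i, box_j):
    """Fused feature c_ij for a candidate key-value pair of text lines.

    sem_i / sem_j: toy semantic vectors (LSTM outputs in the claim)
    box_i / box_j: (x, y, w, h) geometry of each text line
    """
    return list(sem_i) + list(sem_j) + list(box_i) + list(box_j)
```

In the full system this fused vector would be fed to a matcher that decides whether the "key" line and the "value" line belong together.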
3. A software page key information extraction system based on a graph neural network, applied to the software page key information extraction method based on a graph neural network as claimed in any one of claims 1-2, characterized in that the system comprises:
a text line detection module for outputting the coordinate information of all text lines on the web page image through the DBNet text detection algorithm;
a text line recognition module for cropping out all text lines according to the obtained text line coordinate information and recognizing them through the CRNN text recognition algorithm to obtain the character content of each text line;
a text line classification module for combining the input web page image with the obtained text line coordinate information and character content, and outputting the categories of all text lines through a text line classification algorithm based on a graph neural network model;
and a text line key-value pair matching module for separately extracting the coordinate-information features and character-information features of any two text lines, fusing them to obtain a fused feature, and performing key-value pair matching in combination with the text line categories.
4. The software page key information extraction system based on a graph neural network as claimed in claim 3, further comprising:
a key-value pair output module for outputting the text content corresponding to all required key-value pairs when key-value pair matching succeeds.
5. The software page key information extraction system based on a graph neural network as claimed in claim 3, wherein the text line classification module further comprises:
the graph neural network model module is used for constructing a graph neural network model;
and the classification module is used for outputting the categories of all text lines.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210279500.8A CN114359912B (en) | 2022-03-22 | 2022-03-22 | Software page key information extraction method and system based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114359912A CN114359912A (en) | 2022-04-15 |
CN114359912B true CN114359912B (en) | 2022-06-24 |
Family
ID=81095001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210279500.8A Active CN114359912B (en) | 2022-03-22 | 2022-03-22 | Software page key information extraction method and system based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114359912B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117079288B (en) * | 2023-10-19 | 2023-12-29 | 华南理工大学 | Method and model for extracting key information for recognizing Chinese semantics in scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN112257841A (en) * | 2020-09-03 | 2021-01-22 | 北京大学 | Data processing method, device and equipment in graph neural network and storage medium |
CN112464781A (en) * | 2020-11-24 | 2021-03-09 | 厦门理工学院 | Document image key information extraction and matching method based on graph neural network |
CN114187595A (en) * | 2021-12-14 | 2022-03-15 | 中国科学院软件研究所 | Document layout recognition method and system based on fusion of visual features and semantic features |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11403488B2 (en) * | 2020-03-19 | 2022-08-02 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for recognizing image-based content presented in a structured layout |
CN114037985A (en) * | 2021-11-04 | 2022-02-11 | 北京有竹居网络技术有限公司 | Information extraction method, device, equipment, medium and product |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN112257841A (en) * | 2020-09-03 | 2021-01-22 | 北京大学 | Data processing method, device and equipment in graph neural network and storage medium |
CN112464781A (en) * | 2020-11-24 | 2021-03-09 | 厦门理工学院 | Document image key information extraction and matching method based on graph neural network |
CN114187595A (en) * | 2021-12-14 | 2022-03-15 | 中国科学院软件研究所 | Document layout recognition method and system based on fusion of visual features and semantic features |
Non-Patent Citations (3)
Title |
---|
Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition; Yang Hu et al.; arXiv; 2021-06-14; pp. 1-15 *
Automatic Summarization Method Based on Primary-Secondary Relationship Features; Zhang Ying et al.; Computer Science; 2020-06-15; pp. 16-21 *
Research on Image Text Extraction Technology Based on Deep Learning; Jiang Liangwei et al.; Information Systems Engineering; 2020-03-20 (No. 03); pp. 89-90 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||