WO2020164281A1 - Table parsing method, medium and computer device based on text positioning and recognition - Google Patents
Table parsing method, medium and computer device based on text positioning and recognition
- Publication number
- WO2020164281A1 (PCT/CN2019/118422)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- picture
- layout
- position information
- recognition
- table layout
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
Definitions
- This application relates to the field of computer processing technology, and in particular to a table analysis method, medium and computer equipment based on text positioning and recognition.
- Deep learning is developing rapidly in the field of image recognition; it has surpassed traditional methods in both accuracy and efficiency and has attracted wide attention in the field. Deep learning is a new field of machine learning research. Its motivation lies in establishing and simulating a neural network that mimics the human brain for analysis and learning; it imitates the mechanisms of the human brain to interpret data such as images, sounds, and text.
- The recognition of a table refers to converting the table in a table picture into editable table text; this process requires both text recognition and image recognition.
- The existing technical solution performs table analysis based on the presence of table lines; when there are no table lines, the table cannot be extracted from the table-format picture.
- the present application provides a form analysis method and corresponding device based on text positioning and recognition, which mainly realizes the positioning and recognition of text in form pictures by using established deep learning models, and improves the efficiency and accuracy of form picture recognition.
- This application also provides a computer device and a readable storage medium for executing the table analysis method based on text positioning and recognition of this application.
- the present application provides a method for analyzing table images based on text positioning and recognition, the method including:
- Inputting the form picture into a pre-trained text positioning network to obtain position information of the characters in the form picture includes:
- a rectangular coordinate system is established, and the coordinates of each vertex of the rectangular frame are obtained as the position information.
- This application provides a form analysis method based on text positioning and recognition.
- The position information of the characters in the form picture is obtained; the form picture is graphically segmented according to the position information to obtain the cell pictures, which are recognized to obtain the cell character content; the first table layout of the table picture is extracted according to the position information; and a table file of the table picture is generated according to the first table layout and the cell character content.
- the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
- This application can detect whether the table picture contains grid lines; if it does, extract a second table layout of the table picture; compare the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
- This application can additionally detect whether there are table lines in the table picture. When table lines are present, they are extracted directly, and the second table layout formed by the extracted table lines is compared with the obtained first table layout to verify whether the first table layout is valid.
- This application uses the text positioning network and the text recognition network to parse table pictures, and is therefore compatible with cases where table lines are absent, present, or incomplete, giving it a wide scope of application.
- The present application may further calculate a comparison result between the second table layout and the first table layout, the comparison result being expressed as the differences between the first table layout and the second table layout.
- When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained. Through this mechanism, the application can learn flexibly and intelligently adjust the pre-trained text positioning network, so that the analysis of table images becomes increasingly accurate.
- FIG. 1 is a flowchart of a table parsing method based on text positioning recognition in an embodiment
- Figure 2 is a schematic diagram of a prior-art text positioning network based on scene text detection (EAST)
- FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture in an embodiment
- FIG. 4 is a structural block diagram of a table analysis device based on text positioning recognition in an embodiment
- Fig. 5 is a block diagram of the internal structure of a computer device in an embodiment.
- An embodiment of the present application provides a table analysis method based on text positioning and recognition. As shown in FIG. 1, the method includes the following steps:
- the deep network training is performed by inputting multiple target samples in advance, and the text positioning network capable of positioning the text of the table picture and the text recognition network capable of recognizing the text of the table picture are trained. Specifically, feature point extraction and feature fusion are performed on the sample picture, and finally the text positioning network and the text recognition network are output.
- the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
- Deep network training is a new field in machine learning research. Its motivation is to establish and simulate a neural network that simulates the human brain for analysis and learning. It mimics the mechanism of the human brain to interpret data, such as images, sounds, and text.
- The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) or CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to recognize the text and obtain the character string corresponding to the location information.
- Figure 2 is a text positioning network based on EAST (scene text detection).
- the text positioning network used in this application is an improvement based on the EAST text positioning network.
- The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2; the score map is then brightened and smoothed, and during training dice loss is used in place of focal loss.
- LSTM is a time recurrent neural network, which is suitable for processing and predicting important events with relatively long intervals and delays in time series.
- Inputting the form picture into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several associated character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and obtaining the coordinates of each vertex of the rectangular frame as the position information.
- FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
- the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
- the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
- the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
- the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
- the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
- For example, the coordinates of the four vertices of the smallest rectangular frame surrounding a certain character string combination obtained through the text positioning network are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
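The vertex-to-corner selection described above can be sketched in Python as follows. The helper name and the coordinate convention (image coordinates, with Y increasing downwards) are illustrative assumptions, not taken from the patent.

```python
# Hypothetical helper: derive the upper-left and lower-right corners of a
# standard rectangle from the four vertex coordinates returned by the text
# positioning network.
def standard_rect(vertices):
    """vertices: list of (x, y) tuples for the four corners of a text box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Upper-left = (min X, min Y); lower-right = (max X, max Y),
    # assuming image coordinates with Y increasing downwards.
    return (min(xs), min(ys)), (max(xs), max(ys))

box = [(10, 20), (10, 48), (95, 20), (95, 48)]
top_left, bottom_right = standard_rect(box)  # ((10, 20), (95, 48))
```

The same minimum/maximum selection works even if the network emits the four vertices in an arbitrary order.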
- a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
- the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
- the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
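The cell-cutting step just described can be illustrated with a small Python sketch. The array layout (a 2-D image indexed rows-first) and the function name are assumptions for illustration; a real implementation would pass each crop to the text recognition network.

```python
import numpy as np

# Illustrative sketch: cut each cell picture out of the table image using
# the rectangular frames produced by the positioning step.
def crop_cells(image, rects):
    """image: 2-D numpy array; rects: list of ((x1, y1), (x2, y2)) boxes."""
    cells = []
    for (x1, y1), (x2, y2) in rects:
        cells.append(image[y1:y2, x1:x2])  # numpy indexes rows (Y) first
    return cells

img = np.arange(100).reshape(10, 10)
cells = crop_cells(img, [((2, 3), (6, 8))])  # one 5-row x 4-column crop
```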
- the character recognition network is a classic character recognition CRNN network, and the cell character content that can be edited is obtained through the network.
- Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing rectangular frames whose corner points share the same abscissa into the same column and rectangular frames whose corner points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
- the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
- the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
- the abscissas of the rectangular boxes in the same column are the same or similar.
- This application can determine that two points are located in the same row when their ordinates are the same or differ by no more than a preset range, and that two points are located in the same column when their abscissas are the same or differ by no more than the preset range.
- this application divides the vertices of the rectangular frame with the same or similar ordinates into the same row, and divides the same or similar abscissas into the same column.
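A minimal sketch of this row/column grouping, assuming a sort-then-threshold clustering on the upper-left coordinates; the tolerance value, function names, and data layout are illustrative, not from the patent.

```python
# Boxes whose upper-left ordinates differ by at most `tol` share a row;
# likewise for abscissas and columns.
def group_coords(values, tol):
    """Cluster sorted 1-D coordinates that fall within `tol` of the last member."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def table_shape(rects, tol=5):
    """rects: list of ((x1, y1), (x2, y2)). Returns (rows, columns)."""
    rows = group_coords([y1 for (_, y1), _ in rects], tol)
    cols = group_coords([x1 for (x1, _), _ in rects], tol)
    return len(rows), len(cols)  # the first table layout: total rows x columns

rects = [((10, 10), (50, 30)), ((60, 12), (110, 32)),
         ((11, 60), (52, 80)), ((61, 61), (112, 81))]
n_rows, n_cols = table_shape(rects)  # a 2 x 2 layout
```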
- the first table layout includes at least the number of rows and columns of the table.
- The table title (name content) usually has a text length that spans several columns, so it can be removed first.
- Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout, and filling the cell characters correspondingly into the cells of the drawn table to generate a table file of the table picture.
- The table corresponding to the table picture is drawn, and it contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in CSV or JSON format for data analysis and processing by a program, thereby realizing the analysis of the table image.
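The fill-and-save step can be sketched as below. The (row, column, text) triple format for recognized cells is an assumption for illustration; the patent only specifies that the output can be saved as CSV or JSON.

```python
import csv
import io

# Minimal sketch: draw an empty rows x cols grid, fill each recognised
# string into its cell, and serialise the grid to CSV.
def build_csv(n_rows, n_cols, cells):
    """cells: iterable of (row, col, text) from the recognition step."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, c, text in cells:
        grid[r][c] = text
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()

out = build_csv(2, 2, [(0, 0, "name"), (0, 1, "score"),
                       (1, 0, "Ann"), (1, 1, "97")])
```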
- Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if it does, extracting the second table layout of the table picture; and comparing the second table layout with the first table layout, where the first table layout is verified as valid when the comparison shows that the two layouts are consistent.
- The second table layout can be extracted through morphological opening and closing operations from image processing.
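The idea behind using a morphological opening to find table lines can be shown on a single binary row. Real systems would typically use a library routine such as OpenCV's morphological operations with a long thin kernel; this pure-numpy version, with its function name and window size, is only a sketch of the principle: after an opening with a wide horizontal window, only runs of foreground pixels at least `k` wide survive, i.e. candidate grid-line segments, while isolated text pixels are removed.

```python
import numpy as np

def open_horizontal(row, k):
    """1-D morphological opening: keep only runs of >= k foreground pixels.

    row: 1-D 0/1 array; returns an array of the same length.
    """
    n = len(row)
    eroded = np.zeros(n, dtype=int)
    for i in range(n - k + 1):
        if row[i:i + k].all():
            eroded[i] = 1  # erosion: window is fully foreground
    opened = np.zeros(n, dtype=int)
    for i in range(n):
        if eroded[max(0, i - k + 1):i + 1].any():
            opened[i] = 1  # dilation restores the surviving run's extent
    return opened

row = np.array([0, 1, 1, 1, 1, 1, 0, 1, 0])  # one long run plus a stray pixel
opened = open_horizontal(row, 4)             # the stray pixel is removed
```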
- the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
- The present application may also calculate a comparison result between the second table layout and the first table layout, the comparison result being expressed as the differences between the first table layout and the second table layout.
- When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
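A hedged sketch of this verification step: compare the layout inferred from text positions (first) with the layout extracted from grid lines (second), count the differing points, and flag the positioning network for retraining past a preset threshold. The dictionary format mapping (row, col) to a cell box is an illustrative assumption.

```python
# Each layout: dict mapping (row, col) -> cell box tuple.
def layout_diff(first, second):
    """Return the (row, col) keys where the two layouts disagree."""
    keys = set(first) | set(second)
    return [k for k in keys if first.get(k) != second.get(k)]

def needs_retraining(first, second, preset=2):
    """True when the number of differing points exceeds the preset value."""
    return len(layout_diff(first, second)) > preset

a = {(0, 0): (0, 0, 10, 10), (0, 1): (10, 0, 20, 10)}
b = {(0, 0): (0, 0, 10, 10), (0, 1): (10, 0, 21, 10), (1, 0): (0, 10, 10, 20)}
```

With these inputs the layouts differ at two points: one shifted cell box and one cell present only in the grid-line layout.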
- the present application provides a form image analysis device based on text positioning recognition, including:
- the input module 11 is used to input form pictures to a pre-trained text positioning network to obtain position information of characters in the form pictures.
- the deep network training is performed by inputting multiple target samples in advance, and the text positioning network capable of positioning the text of the table picture and the text recognition network capable of recognizing the text of the table picture are trained. Specifically, feature point extraction and feature fusion are performed on the sample picture, and finally the text positioning network and the text recognition network are output.
- the target sample includes at least a picture sample and the coordinates of a marked rectangular frame with text.
- Deep network training is a new field in machine learning research. Its motivation is to establish and simulate a neural network that simulates the human brain for analysis and learning. It mimics the mechanism of the human brain to interpret data, such as images, sounds, and text.
- The general idea of this application is a text detection and recognition process based on deep network training: positioning networks such as Faster R-CNN (deep-learning-based object detection) or CTPN (natural scene text detection) locate the text in the picture to obtain its location information, and the region indicated by the location information is then input into an RNN-based text recognition network, such as CRNN, to recognize the text and obtain the character string corresponding to the location information.
- Figure 2 is a text positioning network based on EAST (scene text detection).
- the text positioning network used in this application is an improvement based on the EAST text positioning network.
- The text positioning network used in this application connects an LSTM (Long Short-Term Memory network) after the score map in the network structure shown in FIG. 2; the score map is then brightened and smoothed, and during training dice loss is used in place of focal loss.
- LSTM is a time recurrent neural network, which is suitable for processing and predicting important events with relatively long intervals and delays in time series.
- Inputting the form picture into the pre-trained text positioning network to obtain the position information of the characters in the form picture specifically includes: inputting the form picture into the pre-trained text positioning network; taking several associated character strings as a character string combination; obtaining the smallest rectangular frame surrounding the character string combination; and establishing a rectangular coordinate system and obtaining the coordinates of each vertex of the rectangular frame as the position information.
- FIG. 3 is a schematic diagram of obtaining position information of characters in the table picture.
- the table picture contains several character string combinations. After the text positioning network is used, the smallest rectangular frame wrapping each character string combination is output.
- the position information of the characters in the table picture is expressed as the coordinate value of the smallest rectangular frame that wraps the combination of character strings.
- the coordinates of the four vertices of the rectangular frame surrounding the character string combination can be directly obtained through the character positioning network.
- the position information is expressed as the coordinate values of the upper left corner and the lower right corner of the rectangular frame.
- the minimum and maximum values of the X axis and the minimum and maximum values of the Y axis constitute the coordinates of the upper left corner and the lower right corner of the rectangular frame, thereby obtaining a standard rectangular frame.
- For example, the coordinates of the four vertices of the smallest rectangular frame surrounding a certain character string combination obtained through the text positioning network are A(X1, Y1), B(X1, Y2), C(X2, Y1), and D(X2, Y2); according to the magnitudes of X1, X2, Y1, and Y2, the coordinates of the upper-left and lower-right corners of the rectangle are selected.
- The segmentation module 12 is configured to perform graphic segmentation on the table picture according to the position information, segment out the cell picture corresponding to the position information, and input the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content.
- a rectangular frame is determined according to the position information, and a cell picture is determined according to the rectangular frame.
- the present application performs image segmentation on the form picture according to the rectangular frame, and cuts out the cell picture corresponding to the rectangular frame from the form picture, wherein each cell picture contains a character string combination.
- the present application inputs the cell picture to the text recognition network to recognize the content of the character string combination in the cell picture to obtain the cell character content.
- the character recognition network is a classic character recognition CRNN network, and the cell character content that can be edited is obtained through the network.
- the extraction module 13 is configured to extract the first table layout of the table picture according to the position information.
- Extracting the first table layout of the table picture according to the position information specifically includes: extracting from the position information the coordinate values of the upper-left and lower-right corner points of each rectangular frame; according to those coordinate values, dividing rectangular frames whose corner points share the same abscissa into the same column and rectangular frames whose corner points share the same ordinate into the same row; and counting the total number of rows and the total number of columns as the first table layout.
- the rectangular frame wrapping each character string combination is divided into the positions of the rows and columns corresponding to the table pictures according to the overlap ratio of the position information in the horizontal direction and the vertical direction.
- the ordinates of the vertices of the rectangular boxes in the same row are the same or similar
- the abscissas of the rectangular boxes in the same column are the same or similar.
- This application can determine that two points are located in the same row when their ordinates are the same or differ by no more than a preset range, and that two points are located in the same column when their abscissas are the same or differ by no more than the preset range.
- this application divides the vertices of the rectangular frame with the same or similar ordinates into the same row, and divides the same or similar abscissas into the same column.
- the first table layout includes at least the number of rows and columns of the table.
- The table title (name content) usually has a text length that spans several columns, so it can be removed first.
- the generating module 14 is configured to generate a table file of the table picture according to the first table layout and the cell character content.
- Generating a table file of the table picture according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout, and filling the cell characters correspondingly into the cells of the drawn table to generate a table file of the table picture.
- The table corresponding to the table picture is drawn, and it contains the same number of cells as there are character string combinations. Further, this application fills the recognized cell character content into the cells of the table to generate a table file, whose content can be saved in CSV or JSON format for data analysis and processing by a program, thereby realizing the analysis of the table image.
- Before the form picture is input into the pre-trained text positioning network and the position information of the characters in the form picture is obtained, the method further includes: detecting whether the form picture contains grid lines; if it does, extracting the second table layout of the table picture; and comparing the second table layout with the first table layout, where the first table layout is verified as valid when the comparison shows that the two layouts are consistent.
- The second table layout can be extracted through morphological opening and closing operations from image processing.
- the present application can verify the reliability of the first table layout and the second table layout by comparing the first table layout with the second table layout.
- The present application may also calculate a comparison result between the second table layout and the first table layout, the comparison result being expressed as the differences between the first table layout and the second table layout.
- When the comparison result shows that the number of points of difference between the first table layout and the second table layout is greater than a preset value, the text positioning network is retrained to improve the recognition accuracy of the solution.
- an embodiment of the present application provides a computer-readable storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.
- the computer-readable storage medium stores computer-readable instructions, and when the program is executed by a processor, the table analysis method based on text positioning and recognition according to any one of the technical solutions is implemented.
- The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
- a storage device includes any medium that stores or transmits information in a readable form by a device (for example, a computer, a mobile phone), and may be a read-only memory, a magnetic disk, or an optical disk.
- The computer-readable storage medium provided by the embodiment of the application can realize: inputting a form picture into a pre-trained text positioning network to obtain the position information of the characters in the form picture; performing graphic segmentation on the form picture according to the position information and segmenting out the cell picture corresponding to the position information; inputting the cell picture into a pre-trained text recognition network for character recognition to obtain the cell character content; extracting the first table layout of the table picture according to the position information; and generating a table file of the table picture according to the first table layout and the cell character content.
- the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
- the present application provides a computer device.
- the computer device includes a processor 303, a memory 305, an input unit 307, and a display unit 309.
- the memory 305 may be used to store the application program 301 and various functional modules, and the processor 303 runs the application program 301 stored in the memory 305 to execute various functional applications and data processing of the device.
- the memory 305 may be internal memory or external memory, or include both internal memory and external memory.
- the internal memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory.
- External storage can include hard disks, floppy disks, ZIP disks, USB flash drives, tapes, etc.
- the memory disclosed in this application includes but is not limited to these types of memory.
- the memory 305 disclosed in this application is merely an example and not a limitation.
- the input unit 307 is used for receiving input of signals and receiving keywords input by the user.
- the input unit 307 may include a touch panel and other input devices.
- The touch panel can collect the user's touch operations on or near it (for example, operations performed with a finger, stylus, or any other suitable object or accessory on or near the touch panel) and drive the corresponding connection device according to a preset program; other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control buttons and switch buttons), a trackball, a mouse, and a joystick.
- the display unit 309 may be used to display information input by the user or information provided to the user and various menus of the computer device.
- the display unit 309 may take the form of a liquid crystal display, an organic light emitting diode, or the like.
- The processor 303 is the control center of the computer device. It connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory.
- the one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the recognition module 12, the extraction module 13, and the generation module 14 shown in FIG. 4.
- the computer device includes a memory 305 and a processor 303.
- the memory 305 stores computer-readable instructions.
- the processor 303 executes the steps of a table analysis method based on character positioning recognition described in the above embodiment.
- the computer device provided by the embodiment of the application can input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segment the table image according to the position information to cut out the cell images corresponding to the position information; input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extract a first table layout of the table image according to the position information; and generate a table file of the table image according to the first table layout and the cell character content.
- the established deep learning model can be used to locate and recognize the text in the table image, which improves the efficiency and accuracy of the table image recognition.
- the present application can also detect whether the table image contains grid lines; if it does, extract a second table layout of the table image; compare the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
- This application can additionally detect whether table lines exist in the table image. When table lines are present, they are extracted directly, and the obtained first table layout is compared with the second table layout formed by the extracted table lines to verify whether the first table layout is valid.
- This application uses the text positioning network and the text recognition network to parse the table pictures, which can be compatible with the situations where there is no table line and the table line or the table line is incomplete, and the scope of application is wide.
- the computer-readable storage medium provided in the embodiment of the present application can implement the above-mentioned embodiment of the table analysis method based on text positioning and recognition.
- the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
- Image Analysis (AREA)
Abstract
A table parsing method based on text localization and recognition, the method comprising: inputting a table image into a pre-trained text-localization network to obtain position information of the characters in the table image (S11); segmenting the table image according to the position information, cutting out the cell images corresponding to the position information, and inputting the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content (S12); extracting a first table layout of the table image according to the position information (S13); and generating a table file of the table image according to the first table layout and the cell character content (S14). The method uses established deep-learning models to localize and recognize the text in a table image, improving the efficiency and accuracy of table-image recognition.
Description
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on February 13, 2019, with application number 201910115364.7 and invention title "Table parsing method, medium and computer device based on text localization and recognition", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer processing technology, and in particular to a table parsing method, medium and computer device based on text localization and recognition.
Background
Deep learning is currently developing rapidly in the field of image recognition, where it has surpassed traditional methods in both accuracy and efficiency and attracted wide attention. Deep learning is a new area of machine-learning research; its motivation is to build neural networks that simulate the human brain's analysis and learning, imitating the brain's mechanisms to interpret data such as images, sound and text. Table recognition refers to converting the table in a table image into editable table text, a process that requires both text recognition and image recognition.
In the prior art, deep learning has also been applied to parse tables in table images, but existing solutions detect and recognize the table lines in the image through deep learning, which has at least the following drawback:
Existing solutions parse tables on the assumption that table lines are present; when a table-format image has no table lines, the table cannot be extracted.
Summary of the Invention
This application provides a table parsing method based on text localization and recognition and a corresponding apparatus, which mainly use established deep-learning models to localize and recognize the text in a table image, improving the efficiency and accuracy of table-image recognition.
This application also provides a computer device and a readable storage medium for executing the table parsing method based on text localization and recognition of this application.
To solve the above problems, this application adopts the following technical solutions:
In a first aspect, this application provides a table-image parsing method based on text localization and recognition, the method comprising:
inputting a table image into a pre-trained text-localization network to obtain position information of the characters in the table image;
segmenting the table image according to the position information, cutting out the cell images corresponding to the position information, and inputting the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content;
extracting a first table layout of the table image according to the position information;
generating a table file of the table image according to the first table layout and the cell character content;
wherein inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image comprises:
inputting the table image into the pre-trained text-localization network;
obtaining several consecutive character strings in the table image as a string group;
obtaining the smallest rectangular box enclosing the string group;
establishing a rectangular coordinate system and obtaining the coordinates of the vertices of the rectangular box as the position information.
Compared with the prior art, the technical solution of this application has at least the following advantages:
1. This application provides a table parsing method based on text localization and recognition: a table image is input into a pre-trained text-localization network to obtain position information of the characters in the table image; the table image is segmented according to the position information to cut out the cell images corresponding to the position information, and the cell images are input into a pre-trained text-recognition network for character recognition to obtain the cell character content; a first table layout of the table image is extracted according to the position information; and a table file of the table image is generated according to the first table layout and the cell character content. This application uses established deep-learning models to localize and recognize the text in a table image, improving the efficiency and accuracy of table-image recognition.
2. This application inputs a table image into a pre-trained text-localization network; obtains several consecutive character strings in the table image as a string group; obtains the smallest rectangular box enclosing the string group; and establishes a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information. Through this mechanism the application obtains the position information of the text in the table image, improving the accuracy and efficiency of text localization.
3. This application can detect whether the table image contains grid lines; if it does, extract a second table layout of the table image; compare the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verify that the first table layout is valid. The application can additionally detect whether table lines exist in the table image, extract them directly when present, and compare the obtained first table layout with the second table layout formed by the extracted table lines to check whether the first table layout is valid. By parsing table images through the text-localization and text-recognition networks, this application is compatible with images that have no table lines, complete table lines, or incomplete table lines, and therefore has a wide scope of application.
4. This application can further compute the comparison result between the second table layout and the first table layout, the comparison result being expressed as the difference points between the first table layout and the second table layout; when the number of difference points is greater than a preset value, the text-localization network is retrained. Through this mechanism the application can learn flexibly and intelligently adjust the pre-trained text-localization network, so that the parsing results of table images become increasingly accurate.
Brief Description of the Drawings
FIG. 1 is a flowchart of a table parsing method based on text localization and recognition in one embodiment;
FIG. 2 shows a prior-art text-localization network based on scene text detection;
FIG. 3 is a schematic diagram of the obtained position information of characters in the table image in one embodiment;
FIG. 4 is a structural block diagram of a table parsing apparatus based on text localization and recognition in one embodiment;
FIG. 5 is a block diagram of the internal structure of a computer device in one embodiment.
The realization of the purpose, functional features and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings.
Some of the flows described in the specification, claims and drawings of this application contain operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as S11 and S12 are merely used to distinguish different operations; the numbers themselves do not represent any execution order. In addition, these flows may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they limit "first" and "second" to different types.
Those of ordinary skill in the art will understand that, unless specifically stated, the singular forms "a", "an", "the" and "said" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may also be present. Furthermore, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those of ordinary skill in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art and, unless specifically defined as herein, will not be interpreted in an idealized or overly formal sense.
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
Referring to FIG. 1, an embodiment of this application provides a table parsing method based on text localization and recognition. As shown in FIG. 1, the method includes the following steps:
S11: Input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image.
In this embodiment, a deep network is trained in advance by inputting multiple target samples, producing the text-localization network capable of localizing text in table images and the text-recognition network capable of recognizing text in table images. Specifically, feature-point extraction and feature fusion are performed on the sample images, and the text-localization network and the text-recognition network are finally output. The target samples include at least the sample images and the annotated coordinates of the rectangular boxes containing text.
Training deep networks is a new area of machine-learning research; its motivation is to build neural networks that simulate the human brain's analysis and learning, imitating the brain's mechanisms to interpret data such as images, sound and text.
The overall idea of this application is a text detection and recognition process based on trained deep networks. Specifically, localization networks such as Faster R-CNN (a deep-learning-based object-detection technique) and CTPN (natural-scene text detection) detect and localize the text in the image to obtain its position information, and the region indicated by that position information is then fed into an RNN-based text-recognition network such as CRNN to recognize the text and obtain the character string corresponding to that position information.
Referring to FIG. 2, FIG. 2 shows a text-localization network based on EAST (scene text detection). The text-localization network used in this application is an improvement on the EAST network. Specifically, an LSTM (long short-term memory network) is connected after the score map in the network structure shown in FIG. 2 to brighten and smooth the score map, and dice loss replaces focal loss during training. LSTM is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in time series.
Further, inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image specifically includes: inputting the table image into the pre-trained text-localization network; obtaining several consecutive character strings in the table image as a string group; obtaining the smallest rectangular box enclosing the string group; and establishing a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
Referring to FIG. 3, FIG. 3 is a schematic diagram of the obtained position information of characters in the table image. As shown in FIG. 3, the table image contains several string groups. After passing through the text-localization network, the smallest rectangular box wrapping each string group is output. In this embodiment, the position information of the characters in the table image is expressed as the coordinate values of the smallest rectangular box wrapping the string group. Through the text-localization network, this application directly obtains the coordinates of the four vertices of the rectangular box wrapping the string group. Specifically, the position information is expressed as the coordinate values of the upper-left and lower-right corners of the rectangular box. In practice, because table text is essentially horizontal, the minimum and maximum X-axis values and the minimum and maximum Y-axis values of the four coordinates in the obtained Quad Geometry output are taken to form the coordinates of the upper-left and lower-right corners of the rectangular box, yielding a standard rectangular box. For example, the coordinates of the four vertices of the smallest rectangular box wrapping a certain string group obtained through the text-localization network are A(X1, Y1), A(X1, Y2), A(X2, Y1) and A(X2, Y2); according to the magnitudes of X1, X2, Y1 and Y2, the coordinate values of the upper-left and lower-right corner points of the rectangle are selected.
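The patent gives no code, but the corner-selection step described above reduces to taking the per-axis minima and maxima of the four quad vertices. A minimal sketch (the function name `quad_to_rect` is ours, not from the source):

```python
def quad_to_rect(quad):
    """Collapse the four (x, y) vertices output by the text-localization
    network into an axis-aligned box: (top-left, bottom-right).
    This works because table text is essentially horizontal."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    # Upper-left corner is the smallest x and y; lower-right the largest.
    return (min(xs), min(ys)), (max(xs), max(ys))
```

For the example vertices A(X1, Y1), A(X1, Y2), A(X2, Y1), A(X2, Y2), the result is the same regardless of the order in which the network emits them.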
S12: Segment the table image according to the position information, cut out the cell images corresponding to the position information, and input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content.
In this embodiment, a rectangular box is determined from the position information, and a cell image is determined from the rectangular box. Specifically, the table image is segmented according to the rectangular box, and the cell image corresponding to the rectangular box is cropped from the table image, where each cell image contains one string group.
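As an illustrative sketch only (the source does not specify an implementation), cropping a cell image out of the table image given one such rectangle is a straightforward array slice; the image is represented here as a nested list of pixel rows:

```python
def crop_cell(image, top_left, bottom_right):
    """Cut the cell sub-image bounded by the rectangle out of the table
    image. `image` is a list of pixel rows; coordinates are (x, y)."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    # Select the rows in [y1, y2) and, within each, the columns in [x1, x2).
    return [row[x1:x2] for row in image[y1:y2]]
```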
Further, the cell image is input into the text-recognition network to recognize the content of the string group in the cell image and obtain the cell character content. In this embodiment, the text-recognition network is the classic CRNN text-recognition network, through which the editable cell character content is obtained.
S13: Extract a first table layout of the table image according to the position information.
In this embodiment, extracting the first table layout of the table image according to the position information specifically includes: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and computing the total number of rows and the total number of columns as the first table layout.
In this embodiment, the rectangular boxes wrapping the string groups are assigned to row and column positions in the table image according to the overlap ratios of the position information in the horizontal and vertical directions. The ordinates of the vertices of rectangular boxes in the same row are identical or close, and the abscissas of rectangular boxes in the same column are identical or close. The application may stipulate that two points are judged to lie in the same row when their ordinates are identical or differ within a preset range, and that two points are judged to lie in the same column when their abscissas are identical or differ within a preset range. Based on this principle, rectangular boxes whose vertex ordinates are identical or close are assigned to the same row, and those whose abscissas are identical or close are assigned to the same column.
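The row/column assignment described above can be sketched as one-dimensional clustering with a tolerance: coordinates within a preset range fall into the same row (or column) index. All names and the tolerance value below are illustrative assumptions, not from the source:

```python
def cluster(values, tol):
    """Group 1-D coordinates: values whose gap to the previous value is
    within `tol` share a row/column index."""
    groups = []
    for v in sorted(set(values)):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return {v: i for i, g in enumerate(groups) for v in g}

def layout_from_boxes(boxes, tol=5):
    """boxes: list of ((x1, y1), (x2, y2)) upper-left/lower-right corners.
    Returns (row count, column count, per-box (row, col) positions)."""
    row_of = cluster([b[0][1] for b in boxes], tol)  # cluster ordinates
    col_of = cluster([b[0][0] for b in boxes], tol)  # cluster abscissas
    cells = [(row_of[b[0][1]], col_of[b[0][0]]) for b in boxes]
    return len(set(row_of.values())), len(set(col_of.values())), cells
```

The row and column counts returned here form the N x M first table layout the text goes on to describe.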
Continuing with FIG. 3, the abscissas of the vertices of rectangular boxes in the same column are identical or close, while the abscissa ranges of different columns do not intersect. Rectangular boxes in the same row have overlapping ordinate ranges, while the ordinate ranges of different rows do not intersect.
In this embodiment, the first table layout includes at least the number of rows and columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table image can be extracted, and further the N x M layout format of the table image is obtained.
S14: Generate a table file of the table image according to the first table layout and the cell character content.
In this embodiment, generating the table file of the table image according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table image.
In this embodiment, after the first table layout of the table image is extracted, the table corresponding to the table image is drawn, the table containing the same number of cells as there are string groups. Further, the recognized cell character contents are filled into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for programmatic data analysis and processing, thereby achieving the parsing of the table image.
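Filling the recognized contents into the drawn layout and saving as csv, as described above, can be sketched as follows (csv is one of the two formats the text names; the function and argument names are our own):

```python
import csv
import io

def build_csv(n_rows, n_cols, cells):
    """Draw an n_rows x n_cols table and fill each recognized cell.
    `cells` maps (row, col) -> recognized cell character content."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for (r, c), text in cells.items():
        grid[r][c] = text
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()
```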
In this embodiment, before inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image, the method further includes: detecting whether the table image contains grid lines; if it does, extracting a second table layout of the table image; comparing the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verifying that the first table layout is valid. In one possible design, if the table in the table image has grid lines, the second table layout can be extracted through morphological opening and closing operations.
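The source extracts the second table layout with morphological opening and closing operations; as a simpler stand-in that needs no imaging library, ruling lines in a binarized image can be found as rows and columns whose ink coverage exceeds a threshold. This projection heuristic is our substitution for illustration, not the patent's method:

```python
def detect_ruling_lines(binary, ratio=0.9):
    """binary: 2-D list of 0/1 pixels (1 = ink). Returns the y-coordinates
    of horizontal ruling lines and the x-coordinates of vertical ones:
    rows/columns whose ink coverage is at least `ratio` of their length."""
    h, w = len(binary), len(binary[0])
    rows = [y for y in range(h) if sum(binary[y]) >= ratio * w]
    cols = [x for x in range(w) if sum(r[x] for r in binary) >= ratio * h]
    return rows, cols
```

The detected line coordinates, clustered the same way as the text boxes, yield the second table layout used for the consistency check.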
In fact, by comparing the first table layout with the second table layout, this application can simultaneously verify the reliability of both layouts.
Preferably, this application can also compute the comparison result between the second table layout and the first table layout, the comparison result being expressed as the difference points between the first table layout and the second table layout; when the number of difference points is greater than a preset value, the text-localization network is retrained to improve the recognition accuracy of this solution.
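A hedged sketch of the validity check and retraining trigger, representing each layout as a set of (row, column) cell positions; this representation is our assumption, since the patent only speaks of "difference points" and a preset value:

```python
def layout_diff(layout_a, layout_b):
    """Count the difference points between two layouts, each given as a
    set of (row, col) cell positions: cells present in one but not both."""
    return len(layout_a ^ layout_b)

def needs_retraining(layout_a, layout_b, threshold):
    """The text-localization network is retrained when the number of
    difference points exceeds the preset value."""
    return layout_diff(layout_a, layout_b) > threshold
```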
Referring to FIG. 4, in another embodiment, this application provides a table-image parsing apparatus based on text localization and recognition, comprising:
an input module 11, configured to input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image.
In this embodiment, a deep network is trained in advance by inputting multiple target samples, producing the text-localization network capable of localizing text in table images and the text-recognition network capable of recognizing text in table images. Specifically, feature-point extraction and feature fusion are performed on the sample images, and the text-localization network and the text-recognition network are finally output. The target samples include at least the sample images and the annotated coordinates of the rectangular boxes containing text.
Training deep networks is a new area of machine-learning research; its motivation is to build neural networks that simulate the human brain's analysis and learning, imitating the brain's mechanisms to interpret data such as images, sound and text.
The overall idea of this application is a text detection and recognition process based on trained deep networks. Specifically, localization networks such as Faster R-CNN (a deep-learning-based object-detection technique) and CTPN (natural-scene text detection) detect and localize the text in the image to obtain its position information, and the region indicated by that position information is then fed into an RNN-based text-recognition network such as CRNN to recognize the text and obtain the character string corresponding to that position information.
Referring to FIG. 2, FIG. 2 shows a text-localization network based on EAST (scene text detection). The text-localization network used in this application is an improvement on the EAST network. Specifically, an LSTM (long short-term memory network) is connected after the score map in the network structure shown in FIG. 2 to brighten and smooth the score map, and dice loss replaces focal loss during training. LSTM is a recurrent neural network suitable for processing and predicting important events with relatively long intervals and delays in time series.
Further, inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image specifically includes: inputting the table image into the pre-trained text-localization network; obtaining several consecutive character strings in the table image as a string group; obtaining the smallest rectangular box enclosing the string group; and establishing a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
Continuing with FIG. 3, FIG. 3 is a schematic diagram of the obtained position information of characters in the table image. As shown in FIG. 3, the table image contains several string groups. After passing through the text-localization network, the smallest rectangular box wrapping each string group is output. In this embodiment, the position information of the characters in the table image is expressed as the coordinate values of the smallest rectangular box wrapping the string group. Through the text-localization network, this application directly obtains the coordinates of the four vertices of the rectangular box wrapping the string group. Specifically, the position information is expressed as the coordinate values of the upper-left and lower-right corners of the rectangular box. In practice, because table text is essentially horizontal, the minimum and maximum X-axis values and the minimum and maximum Y-axis values of the four coordinates in the obtained Quad Geometry output are taken to form the coordinates of the upper-left and lower-right corners of the rectangular box, yielding a standard rectangular box. For example, the coordinates of the four vertices of the smallest rectangular box wrapping a certain string group obtained through the text-localization network are A(X1, Y1), A(X1, Y2), A(X2, Y1) and A(X2, Y2); according to the magnitudes of X1, X2, Y1 and Y2, the coordinate values of the upper-left and lower-right corner points of the rectangle are selected.
a segmentation module 12, configured to segment the table image according to the position information, cut out the cell images corresponding to the position information, and input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content.
In this embodiment, a rectangular box is determined from the position information, and a cell image is determined from the rectangular box. Specifically, the table image is segmented according to the rectangular box, and the cell image corresponding to the rectangular box is cropped from the table image, where each cell image contains one string group.
Further, the cell image is input into the text-recognition network to recognize the content of the string group in the cell image and obtain the cell character content. In this embodiment, the text-recognition network is the classic CRNN text-recognition network, through which the editable cell character content is obtained.
an extraction module 13, configured to extract a first table layout of the table image according to the position information.
In this embodiment, extracting the first table layout of the table image according to the position information specifically includes: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and computing the total number of rows and the total number of columns as the first table layout.
In this embodiment, the rectangular boxes wrapping the string groups are assigned to row and column positions in the table image according to the overlap ratios of the position information in the horizontal and vertical directions. The ordinates of the vertices of rectangular boxes in the same row are identical or close, and the abscissas of rectangular boxes in the same column are identical or close. The application may stipulate that two points are judged to lie in the same row when their ordinates are identical or differ within a preset range, and that two points are judged to lie in the same column when their abscissas are identical or differ within a preset range. Based on this principle, rectangular boxes whose vertex ordinates are identical or close are assigned to the same row, and those whose abscissas are identical or close are assigned to the same column.
Continuing with FIG. 3, the abscissas of the vertices of rectangular boxes in the same column are identical or close, while the abscissa ranges of different columns do not intersect. Rectangular boxes in the same row have overlapping ordinate ranges, while the ordinate ranges of different rows do not intersect.
In this embodiment, the first table layout includes at least the number of rows and columns of the table. The table title, whose text spans multiple columns, can be removed first. Through the above rules, the number of rows N and the number of columns M of the table image can be extracted, and further the N x M layout format of the table image is obtained.
a generation module 14, configured to generate a table file of the table image according to the first table layout and the cell character content.
In this embodiment, generating the table file of the table image according to the first table layout and the cell character content specifically includes: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table image.
In this embodiment, after the first table layout of the table image is extracted, the table corresponding to the table image is drawn, the table containing the same number of cells as there are string groups. Further, the recognized cell character contents are filled into the corresponding cells of the table to generate a table file, whose content can be saved in csv or json format for programmatic data analysis and processing, thereby achieving the parsing of the table image.
In this embodiment, before inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image, the method further includes: detecting whether the table image contains grid lines; if it does, extracting a second table layout of the table image; comparing the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verifying that the first table layout is valid. In one possible design, if the table in the table image has grid lines, the second table layout can be extracted through morphological opening and closing operations.
In fact, by comparing the first table layout with the second table layout, this application can simultaneously verify the reliability of both layouts.
Preferably, this application can also compute the comparison result between the second table layout and the first table layout, the comparison result being expressed as the difference points between the first table layout and the second table layout; when the number of difference points is greater than a preset value, the text-localization network is retrained to improve the recognition accuracy of this solution.
In another embodiment, an embodiment of this application provides a computer-readable storage medium, which may be a non-volatile readable storage medium. Computer-readable instructions are stored on the computer-readable storage medium, and when the program is executed by a processor, the table parsing method based on text localization and recognition of any of the technical solutions is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or mobile phone), and may be a read-only memory, a magnetic disk, an optical disk, etc.
The computer-readable storage medium provided by the embodiment of this application can input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segment the table image according to the position information to cut out the cell images corresponding to the position information, and input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extract a first table layout of the table image according to the position information; and generate a table file of the table image according to the first table layout and the cell character content. This application uses established deep-learning models to localize and recognize the text in a table image, improving the efficiency and accuracy of table-image recognition.
In addition, in yet another embodiment, this application provides a computer device. As shown in FIG. 5, the computer device includes a processor 303, a memory 305, an input unit 307, a display unit 309 and other components. Those skilled in the art will understand that the structural components shown in FIG. 5 do not constitute a limitation on all computer devices, which may include more or fewer components than shown, or combine certain components. The memory 305 may be used to store the application program 301 and the functional modules, and the processor 303 runs the application program 301 stored in the memory 305 to execute the various functional applications and data processing of the device. The memory 305 may be internal memory or external memory, or include both. The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory or random-access memory. The external memory may include hard disks, floppy disks, ZIP disks, USB flash drives, magnetic tapes, etc. The memory disclosed in this application includes, but is not limited to, these types of memory. The memory 305 disclosed in this application is merely an example and not a limitation.
The input unit 307 is used to receive signal input and keywords entered by the user. The input unit 307 may include a touch panel and other input devices. The touch panel can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program; the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse and a joystick. The display unit 309 may be used to display information input by the user, information provided to the user, and the various menus of the computer device, and may take the form of a liquid-crystal display, an organic light-emitting diode display, or the like. The processor 303 is the control center of the computer device; it connects the various parts of the entire computer through various interfaces and lines, and executes various functions and processes data by running or executing the software programs and/or modules stored in the memory 305 and calling the data stored in the memory. The one or more processors 303 shown in FIG. 5 can execute and realize the functions of the input module 11, the recognition module 12, the extraction module 13 and the generation module 14 shown in FIG. 4.
In one implementation, the computer device includes a memory 305 and a processor 303. Computer-readable instructions are stored in the memory 305, and when executed by the processor, the computer-readable instructions cause the processor 303 to execute the steps of the table parsing method based on text localization and recognition described in the above embodiments.
The computer device provided by the embodiment of this application can input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segment the table image according to the position information to cut out the cell images corresponding to the position information, and input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extract a first table layout of the table image according to the position information; and generate a table file of the table image according to the first table layout and the cell character content. This application uses established deep-learning models to localize and recognize the text in a table image, improving the efficiency and accuracy of table-image recognition.
In another embodiment, this application can also detect whether the table image contains grid lines; if it does, extract a second table layout of the table image; compare the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verify that the first table layout is valid. The application can additionally detect whether table lines exist in the table image, extract them directly when present, and compare the obtained first table layout with the second table layout formed by the extracted table lines to check whether the first table layout is valid. By parsing table images through the text-localization and text-recognition networks, this application is compatible with images that have no table lines, complete table lines, or incomplete table lines, and therefore has a wide scope of application.
The computer-readable storage medium provided by the embodiment of this application can implement the above embodiments of the table parsing method based on text localization and recognition; for the specific function implementation, see the description in the method embodiments, which is not repeated here.
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware; the computer-readable instructions can be stored in a computer-readable storage medium, and when executed, the program may include the flows of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random-access memory (RAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the scope of protection of this application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (20)
- A table-image parsing method based on text localization and recognition, characterized in that the method comprises: inputting a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segmenting the table image according to the position information, cutting out the cell images corresponding to the position information, and inputting the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extracting a first table layout of the table image according to the position information; and generating a table file of the table image according to the first table layout and the cell character content; wherein inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image comprises: inputting the table image into the pre-trained text-localization network; obtaining several consecutive character strings in the table image as a string group; obtaining the smallest rectangular box enclosing the string group; and establishing a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
- The table-image parsing method based on text localization and recognition according to claim 1, characterized by further comprising: inputting samples of table images to train a deep network, thereby training the text-localization network and the text-recognition network.
- The table-image parsing method based on text localization and recognition according to claim 1, characterized in that extracting the first table layout of the table image according to the position information comprises: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and computing the total number of rows and the total number of columns as the first table layout.
- The table-image parsing method based on text localization and recognition according to claim 1, characterized in that generating the table file of the table image according to the first table layout and the cell character content comprises: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table image.
- The table-image parsing method based on text localization and recognition according to claim 1, characterized in that, after extracting the first table layout of the table image according to the position information, the method comprises: detecting whether the table image contains grid lines; if the table image contains grid lines, extracting a second table layout of the table image; comparing the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verifying that the first table layout is valid.
- The table-image parsing method based on text localization and recognition according to claim 5, characterized in that, after generating the first table layout of the table image according to the position information, the method comprises: computing the comparison result between the second table layout and the first table layout; and, when the number of difference points between the first table layout and the second table layout is greater than a preset value, retraining the text-localization network.
- A table-image parsing apparatus based on text localization and recognition, characterized in that the apparatus comprises: an input module, configured to input a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; a recognition module, configured to segment the table image according to the position information, cut out the cell images corresponding to the position information, and input the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; an extraction module, configured to extract a first table layout of the table image according to the position information; and a generation module, configured to generate a table file of the table image according to the first table layout and the cell character content; wherein the input module is further configured to: input the table image into the pre-trained text-localization network; obtain several consecutive character strings in the table image as a string group; obtain the smallest rectangular box enclosing the string group; and establish a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
- The table-image parsing apparatus based on text localization and recognition according to claim 7, characterized in that the apparatus further comprises: a training module, configured to input samples of table images to train a deep network, thereby training the text-localization network and the text-recognition network.
- The table-image parsing apparatus based on text localization and recognition according to claim 7, characterized in that the extraction module is further configured to: extract the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assign rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and compute the total number of rows and the total number of columns as the first table layout.
- The table-image parsing apparatus based on text localization and recognition according to claim 7, characterized in that the apparatus is further configured to: detect whether the table image contains grid lines; if the table image contains grid lines, extract a second table layout of the table image; compare the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verify that the first table layout is valid.
- The table-image parsing apparatus based on text localization and recognition according to claim 10, characterized in that the apparatus is further configured to: compute the comparison result between the second table layout and the first table layout; and, when the number of difference points between the first table layout and the second table layout is greater than a preset value, retrain the text-localization network.
- A computer-readable storage medium, characterized in that computer-readable instructions are stored on the computer-readable storage medium, and when executed by a processor, the computer-readable instructions implement the following steps: inputting a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segmenting the table image according to the position information, cutting out the cell images corresponding to the position information, and inputting the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extracting a first table layout of the table image according to the position information; and generating a table file of the table image according to the first table layout and the cell character content; wherein inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image comprises: inputting the table image into the pre-trained text-localization network; obtaining several consecutive character strings in the table image as a string group; obtaining the smallest rectangular box enclosing the string group; and establishing a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
- The computer-readable storage medium according to claim 12, characterized in that extracting the first table layout of the table image according to the position information comprises: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and computing the total number of rows and the total number of columns as the first table layout.
- The computer-readable storage medium according to claim 12, characterized in that generating the table file of the table image according to the first table layout and the cell character content comprises: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table image.
- The computer-readable storage medium according to claim 12, characterized in that, after extracting the first table layout of the table image according to the position information, the steps comprise: detecting whether the table image contains grid lines; if the table image contains grid lines, extracting a second table layout of the table image; comparing the second table layout with the first table layout; and, when the comparison shows that the first table layout is consistent with the second table layout, verifying that the first table layout is valid.
- The computer-readable storage medium according to claim 15, characterized in that, after generating the first table layout of the table image according to the position information, the steps comprise: computing the comparison result between the second table layout and the first table layout; and, when the number of difference points between the first table layout and the second table layout is greater than a preset value, retraining the text-localization network.
- A computer device, characterized in that it comprises a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, implement the following steps: inputting a table image into a pre-trained text-localization network to obtain position information of the characters in the table image; segmenting the table image according to the position information, cutting out the cell images corresponding to the position information, and inputting the cell images into a pre-trained text-recognition network for character recognition to obtain the cell character content; extracting a first table layout of the table image according to the position information; and generating a table file of the table image according to the first table layout and the cell character content; wherein inputting the table image into the pre-trained text-localization network to obtain the position information of the characters in the table image comprises: inputting the table image into the pre-trained text-localization network; obtaining several consecutive character strings in the table image as a string group; obtaining the smallest rectangular box enclosing the string group; and establishing a rectangular coordinate system to obtain the coordinates of the vertices of the rectangular box as the position information.
- The computer device according to claim 17, characterized by further comprising: inputting samples of table images to train a deep network, thereby training the text-localization network and the text-recognition network.
- The computer device according to claim 17, characterized in that extracting the first table layout of the table image according to the position information comprises: extracting the coordinate values of the upper-left and lower-right corner points of the rectangular box from the position information; according to those coordinate values, assigning rectangular boxes whose points share the same abscissa to the same column and rectangular boxes whose points share the same ordinate to the same row; and computing the total number of rows and the total number of columns as the first table layout.
- The computer device according to claim 17, characterized in that generating the table file of the table image according to the first table layout and the cell character content comprises: drawing a table according to the first table layout; and filling the cell characters into the corresponding cells of the drawn table to generate the table file of the table image.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910115364.7A CN109961008B (zh) | 2019-02-13 | 2019-02-13 | 基于文字定位识别的表格解析方法、介质及计算机设备 |
CN201910115364.7 | 2019-02-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020164281A1 true WO2020164281A1 (zh) | 2020-08-20 |
Family
ID=67023672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/118422 WO2020164281A1 (zh) | 2019-02-13 | 2019-11-14 | 基于文字定位识别的表格解析方法、介质及计算机设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109961008B (zh) |
WO (1) | WO2020164281A1 (zh) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111985459A (zh) * | 2020-09-18 | 2020-11-24 | 北京百度网讯科技有限公司 | 表格图像校正方法、装置、电子设备和存储介质 |
CN112036304A (zh) * | 2020-08-31 | 2020-12-04 | 平安医疗健康管理股份有限公司 | 医疗票据版面识别的方法、装置及计算机设备 |
CN112132794A (zh) * | 2020-09-14 | 2020-12-25 | 杭州安恒信息技术股份有限公司 | 审计视频的文字定位方法、装置、设备和可读存储介质 |
CN112200117A (zh) * | 2020-10-22 | 2021-01-08 | 长城计算机软件与系统有限公司 | 表格识别方法及装置 |
CN112364726A (zh) * | 2020-10-27 | 2021-02-12 | 重庆大学 | 基于改进east的零件喷码字符定位的方法 |
CN112686258A (zh) * | 2020-12-10 | 2021-04-20 | 广州广电运通金融电子股份有限公司 | 体检报告信息结构化方法、装置、可读存储介质和终端 |
CN112712014A (zh) * | 2020-12-29 | 2021-04-27 | 平安健康保险股份有限公司 | 表格图片结构解析方法、系统、设备和可读存储介质 |
CN112800904A (zh) * | 2021-01-19 | 2021-05-14 | 深圳市玩瞳科技有限公司 | 一种根据手指指向识别图片中字符串的方法及装置 |
CN113128490A (zh) * | 2021-04-28 | 2021-07-16 | 湖南荣冠智能科技有限公司 | 一种处方信息扫描和自动识别方法 |
CN113378789A (zh) * | 2021-07-08 | 2021-09-10 | 京东数科海益信息科技有限公司 | 单元格位置的检测方法、装置和电子设备 |
CN113392811A (zh) * | 2021-07-08 | 2021-09-14 | 北京百度网讯科技有限公司 | 一种表格提取方法、装置、电子设备及存储介质 |
CN113538291A (zh) * | 2021-08-02 | 2021-10-22 | 广州广电运通金融电子股份有限公司 | 卡证图像倾斜校正方法、装置、计算机设备和存储介质 |
CN113743072A (zh) * | 2021-08-03 | 2021-12-03 | 合肥工业大学 | 家谱登记表的信息抽取方法及其装置、电子设备 |
CN113762260A (zh) * | 2020-09-09 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | 一种版面图片的处理方法、装置、设备及存储介质 |
CN113850175A (zh) * | 2021-09-22 | 2021-12-28 | 上海妙一生物科技有限公司 | 一种单据识别方法、装置、设备及存储介质 |
CN114155544A (zh) * | 2021-11-15 | 2022-03-08 | 深圳前海环融联易信息科技服务有限公司 | 一种无线表格识别方法、装置、计算机设备及存储介质 |
CN114170616A (zh) * | 2021-11-15 | 2022-03-11 | 嵊州市光宇实业有限公司 | 基于图纸组的电力工程物资信息采集及分析系统和方法 |
CN114612921A (zh) * | 2022-05-12 | 2022-06-10 | 中信证券股份有限公司 | 表单识别方法、装置、电子设备和计算机可读介质 |
CN115841679A (zh) * | 2023-02-23 | 2023-03-24 | 江西中至科技有限公司 | 图纸表格提取方法、系统、计算机及可读存储介质 |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961008B (zh) * | 2019-02-13 | 2024-07-16 | 平安科技(深圳)有限公司 | 基于文字定位识别的表格解析方法、介质及计算机设备 |
CN110334647A (zh) * | 2019-07-03 | 2019-10-15 | 云南电网有限责任公司信息中心 | 一种基于图像识别的参数格式化方法 |
CN110347994B (zh) * | 2019-07-12 | 2023-06-30 | 北京香侬慧语科技有限责任公司 | 一种表格处理方法和装置 |
CN110532968B (zh) * | 2019-09-02 | 2023-05-23 | 苏州美能华智能科技有限公司 | 表格识别方法、装置和存储介质 |
CN110826393B (zh) * | 2019-09-17 | 2022-12-30 | 中国地质大学(武汉) | 钻孔柱状图信息自动提取方法 |
CN110956087B (zh) * | 2019-10-25 | 2024-04-19 | 北京懿医云科技有限公司 | 一种图片中表格的识别方法、装置、可读介质和电子设备 |
CN110895696A (zh) * | 2019-11-05 | 2020-03-20 | 泰康保险集团股份有限公司 | 一种图像信息提取方法和装置 |
CN111178353A (zh) * | 2019-12-16 | 2020-05-19 | 中国建设银行股份有限公司 | 一种图像文字的定位方法和装置 |
CN111368744B (zh) * | 2020-03-05 | 2023-06-27 | 中国工商银行股份有限公司 | 图片中非结构化表格识别方法及装置 |
CN111382717B (zh) * | 2020-03-17 | 2022-09-09 | 腾讯科技(深圳)有限公司 | 一种表格识别方法、装置和计算机可读存储介质 |
CN111428723B (zh) * | 2020-04-02 | 2021-08-24 | 苏州杰锐思智能科技股份有限公司 | 字符识别方法及装置、电子设备、存储介质 |
CN111639637B (zh) * | 2020-05-29 | 2023-08-15 | 北京百度网讯科技有限公司 | 表格识别方法、装置、电子设备和存储介质 |
CN111753727B (zh) * | 2020-06-24 | 2023-06-23 | 北京百度网讯科技有限公司 | 用于提取结构化信息的方法、装置、设备及可读存储介质 |
CN111783735B (zh) * | 2020-07-22 | 2021-01-22 | 欧冶云商股份有限公司 | 一种基于人工智能的钢材单据解析系统 |
CN112149506B (zh) * | 2020-08-25 | 2025-01-03 | 北京来也网络科技有限公司 | 结合rpa和ai的图像中的表格生成方法、设备及存储介质 |
CN113807158A (zh) * | 2020-12-04 | 2021-12-17 | 四川医枢科技股份有限公司 | 一种pdf内容提取方法、装置及设备 |
CN112541332B (zh) * | 2020-12-08 | 2023-06-23 | 北京百度网讯科技有限公司 | 表单信息抽取方法、装置、电子设备及存储介质 |
CN112733855B (zh) * | 2020-12-30 | 2024-04-09 | 科大讯飞股份有限公司 | 表格结构化方法、表格恢复设备及具有存储功能的装置 |
CN113553892A (zh) * | 2020-12-31 | 2021-10-26 | 内蒙古卫数数据科技有限公司 | 一种基于深度学习和ocr的检验、体检报告单结果提取方法 |
CN113065405B (zh) * | 2021-03-08 | 2022-12-23 | 南京苏宁软件技术有限公司 | 图片识别方法、装置、计算机设备和存储介质 |
CN113297308B (zh) * | 2021-03-12 | 2023-09-22 | 贝壳找房(北京)科技有限公司 | 表格结构化信息提取方法、装置及电子设备 |
CN112906695B (zh) * | 2021-04-14 | 2022-03-08 | 数库(上海)科技有限公司 | 适配多类ocr识别接口的表格识别方法及相关设备 |
CN113298167B (zh) * | 2021-06-01 | 2024-10-15 | 北京思特奇信息技术股份有限公司 | 一种基于轻量级神经网络模型的文字检测方法及系统 |
CN113609906B (zh) * | 2021-06-30 | 2024-06-21 | 南京信息工程大学 | 一种面向文献的表格信息抽取方法 |
CN113569677B (zh) * | 2021-07-16 | 2024-07-16 | 国网天津市电力公司 | 一种基于扫描件的纸质试验报告生成方法 |
CN113989822B (zh) * | 2021-12-24 | 2022-03-08 | 中奥智能工业研究院(南京)有限公司 | 基于计算机视觉和自然语言处理的图片表格内容提取方法 |
CN114782969A (zh) * | 2022-05-11 | 2022-07-22 | 北京鼎泰智源科技有限公司 | 一种基于生成对抗网络的图像表格数据提取方法及装置 |
CN117875293B (zh) * | 2024-01-08 | 2024-08-30 | 北京当镜数字科技有限公司 | 一种业务表单模板快速数字化的生成方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908136A (zh) * | 2009-06-08 | 2010-12-08 | 比亚迪股份有限公司 | 一种表格识别处理方法及系统 |
US20150169972A1 (en) * | 2013-12-12 | 2015-06-18 | Aliphcom | Character data generation based on transformed imaged data to identify nutrition-related data or other types of data |
CN105512611A (zh) * | 2015-11-25 | 2016-04-20 | 成都数联铭品科技有限公司 | 一种表格图像检测识别方法 |
CN108805076A (zh) * | 2018-06-07 | 2018-11-13 | 浙江大学 | 环境影响评估报告书表格文字的提取方法及系统 |
CN109961008A (zh) * | 2019-02-13 | 2019-07-02 | 平安科技(深圳)有限公司 | 基于文字定位识别的表格解析方法、介质及计算机设备 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104517112B (zh) * | 2013-09-29 | 2017-11-28 | 北大方正集团有限公司 | 一种表格识别方法与系统 |
CN105426856A (zh) * | 2015-11-25 | 2016-03-23 | 成都数联铭品科技有限公司 | 一种图像表格文字识别方法 |
-
2019
- 2019-02-13 CN CN201910115364.7A patent/CN109961008B/zh active Active
- 2019-11-14 WO PCT/CN2019/118422 patent/WO2020164281A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101908136A (zh) * | 2009-06-08 | 2010-12-08 | 比亚迪股份有限公司 | 一种表格识别处理方法及系统 |
US20150169972A1 (en) * | 2013-12-12 | 2015-06-18 | Aliphcom | Character data generation based on transformed imaged data to identify nutrition-related data or other types of data |
CN105512611A (zh) * | 2015-11-25 | 2016-04-20 | 成都数联铭品科技有限公司 | 一种表格图像检测识别方法 |
CN108805076A (zh) * | 2018-06-07 | 2018-11-13 | 浙江大学 | 环境影响评估报告书表格文字的提取方法及系统 |
CN109961008A (zh) * | 2019-02-13 | 2019-07-02 | 平安科技(深圳)有限公司 | 基于文字定位识别的表格解析方法、介质及计算机设备 |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112036304A (zh) * | 2020-08-31 | 2020-12-04 | 平安医疗健康管理股份有限公司 | 医疗票据版面识别的方法、装置及计算机设备 |
CN113762260A (zh) * | 2020-09-09 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | 一种版面图片的处理方法、装置、设备及存储介质 |
CN112132794A (zh) * | 2020-09-14 | 2020-12-25 | 杭州安恒信息技术股份有限公司 | 审计视频的文字定位方法、装置、设备和可读存储介质 |
CN111985459A (zh) * | 2020-09-18 | 2020-11-24 | 北京百度网讯科技有限公司 | 表格图像校正方法、装置、电子设备和存储介质 |
CN111985459B (zh) * | 2020-09-18 | 2023-07-28 | 北京百度网讯科技有限公司 | 表格图像校正方法、装置、电子设备和存储介质 |
CN112200117A (zh) * | 2020-10-22 | 2021-01-08 | 长城计算机软件与系统有限公司 | 表格识别方法及装置 |
CN112200117B (zh) * | 2020-10-22 | 2023-10-13 | 长城计算机软件与系统有限公司 | 表格识别方法及装置 |
CN112364726A (zh) * | 2020-10-27 | 2021-02-12 | 重庆大学 | 基于改进east的零件喷码字符定位的方法 |
CN112364726B (zh) * | 2020-10-27 | 2024-06-04 | 重庆大学 | 基于改进east的零件喷码字符定位的方法 |
CN112686258A (zh) * | 2020-12-10 | 2021-04-20 | 广州广电运通金融电子股份有限公司 | 体检报告信息结构化方法、装置、可读存储介质和终端 |
CN112712014A (zh) * | 2020-12-29 | 2021-04-27 | 平安健康保险股份有限公司 | 表格图片结构解析方法、系统、设备和可读存储介质 |
CN112712014B (zh) * | 2020-12-29 | 2024-04-30 | 平安健康保险股份有限公司 | 表格图片结构解析方法、系统、设备和可读存储介质 |
CN112800904A (zh) * | 2021-01-19 | 2021-05-14 | 深圳市玩瞳科技有限公司 | 一种根据手指指向识别图片中字符串的方法及装置 |
CN113128490A (zh) * | 2021-04-28 | 2021-07-16 | 湖南荣冠智能科技有限公司 | 一种处方信息扫描和自动识别方法 |
CN113128490B (zh) * | 2021-04-28 | 2023-12-05 | 湖南荣冠智能科技有限公司 | 一种处方信息扫描和自动识别方法 |
CN113392811B (zh) * | 2021-07-08 | 2023-08-01 | 北京百度网讯科技有限公司 | 一种表格提取方法、装置、电子设备及存储介质 |
CN113378789A (zh) * | 2021-07-08 | 2021-09-10 | 京东数科海益信息科技有限公司 | 单元格位置的检测方法、装置和电子设备 |
CN113392811A (zh) * | 2021-07-08 | 2021-09-14 | 北京百度网讯科技有限公司 | 一种表格提取方法、装置、电子设备及存储介质 |
CN113378789B (zh) * | 2021-07-08 | 2023-09-26 | 京东科技信息技术有限公司 | 单元格位置的检测方法、装置和电子设备 |
CN113538291A (zh) * | 2021-08-02 | 2021-10-22 | 广州广电运通金融电子股份有限公司 | 卡证图像倾斜校正方法、装置、计算机设备和存储介质 |
CN113538291B (zh) * | 2021-08-02 | 2024-05-14 | 广州广电运通金融电子股份有限公司 | 卡证图像倾斜校正方法、装置、计算机设备和存储介质 |
CN113743072A (zh) * | 2021-08-03 | 2021-12-03 | 合肥工业大学 | 家谱登记表的信息抽取方法及其装置、电子设备 |
CN113850175A (zh) * | 2021-09-22 | 2021-12-28 | 上海妙一生物科技有限公司 | 一种单据识别方法、装置、设备及存储介质 |
CN114170616A (zh) * | 2021-11-15 | 2022-03-11 | 嵊州市光宇实业有限公司 | 基于图纸组的电力工程物资信息采集及分析系统和方法 |
CN114155544A (zh) * | 2021-11-15 | 2022-03-08 | 深圳前海环融联易信息科技服务有限公司 | 一种无线表格识别方法、装置、计算机设备及存储介质 |
CN114612921B (zh) * | 2022-05-12 | 2022-07-19 | 中信证券股份有限公司 | 表单识别方法、装置、电子设备和计算机可读介质 |
CN114612921A (zh) * | 2022-05-12 | 2022-06-10 | 中信证券股份有限公司 | 表单识别方法、装置、电子设备和计算机可读介质 |
CN115841679B (zh) * | 2023-02-23 | 2023-05-05 | 江西中至科技有限公司 | 图纸表格提取方法、系统、计算机及可读存储介质 |
CN115841679A (zh) * | 2023-02-23 | 2023-03-24 | 江西中至科技有限公司 | 图纸表格提取方法、系统、计算机及可读存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN109961008A (zh) | 2019-07-02 |
CN109961008B (zh) | 2024-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020164281A1 (zh) | 基于文字定位识别的表格解析方法、介质及计算机设备 | |
WO2020107765A1 (zh) | 语句分析处理方法、装置、设备以及计算机可读存储介质 | |
WO2020253112A1 (zh) | 测试策略的获取方法、装置、终端及可读存储介质 | |
WO2019156332A1 (ko) | 증강현실용 인공지능 캐릭터의 제작 장치 및 이를 이용한 서비스 시스템 | |
WO2012161359A1 (ko) | 사용자 인터페이스 방법 및 장치 | |
WO2015065006A1 (en) | Multimedia apparatus, online education system, and method for providing education content thereof | |
WO2020107761A1 (zh) | 广告文案处理方法、装置、设备及计算机可读存储介质 | |
WO2018090740A1 (zh) | 一种基于混合现实技术实现陪伴的方法及装置 | |
WO2011068284A1 (ko) | 어학학습 전자기기 구동 방법, 시스템 및 이를 응용한 동시통역 학습기 | |
CN111665941A (zh) | 一种面向虚拟实验的多模态语义融合人机交互系统和方法 | |
WO2024048854A1 (ko) | 지식 기반 질의 응답을 위한 구조적 주의 집중 기제 기반의 추론 방법 및 이를 수행하기 위한 컴퓨팅 장치 | |
WO2018088664A1 (ko) | 러프 셋을 이용한 형태소 품사 태깅 코퍼스 오류 자동 검출 장치 및 그 방법 | |
WO2020159140A1 (ko) | 전자 장치 및 이의 제어 방법 | |
WO2016182393A1 (ko) | 사용자의 감성을 분석하는 방법 및 디바이스 | |
WO2020045909A1 (en) | Apparatus and method for user interface framework for multi-selection and operation of non-consecutive segmented information | |
WO2014069815A1 (ko) | 학습용 마스크 디스플레이 장치 및 학습용 마스크 표시 방법 | |
WO2012034469A1 (zh) | 基于手势的人机交互方法及系统、计算机存储介质 | |
WO2023224433A1 (en) | Information generation method and device | |
WO2020022645A1 (en) | Method and electronic device for configuring touch screen keyboard | |
CN112016077A (zh) | 一种基于滑动轨迹模拟的页面信息获取方法、装置和电子设备 | |
WO2023068495A1 (ko) | 전자 장치 및 그 제어 방법 | |
WO2022145723A1 (en) | Method and apparatus for detecting layout | |
WO2023277421A1 (ko) | 수어의 형태소 단위 분할 방법, 형태소 위치 예측 방법 및 데이터 증강 방법 | |
CN117671426A (zh) | 基于概念蒸馏和clip的可提示分割模型预训练方法及系统 | |
WO2015109772A1 (zh) | 数据处理设备和数据处理方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19915547 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.10.2021) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19915547 Country of ref document: EP Kind code of ref document: A1 |