CN111368693A - Identification method and device for identity card information - Google Patents

Info

Publication number
CN111368693A
Authority
CN
China
Prior art keywords
address
information
character set
text information
address information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010129302.4A
Other languages
Chinese (zh)
Inventor
冯程
吴昀蓁
易显维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202010129302.4A priority Critical patent/CN111368693A/en
Publication of CN111368693A publication Critical patent/CN111368693A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method and a device for identifying identity card information, and relates to the technical field of computers. One specific implementation mode of the method comprises the steps of receiving identity card picture information and obtaining a slice of a target column; inputting the slice into a recognition model based on a preset character set to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set; and searching address information matched with the text information according to a preset address library, and correcting and outputting the text information according to the address information. Therefore, the embodiment of the invention can solve the problem of low identification efficiency of the existing identification card address and issuing authority column.

Description

Identification method and device for identity card information
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for identifying identity card information.
Background
The conventional OCR process for identity cards generally comprises two steps of text detection and text recognition. Text detection refers to framing out an area containing text, and text recognition refers to recognizing horizontal lines of text. When the recognition is performed, a given character set is required, and a common Chinese character set is usually adopted when the character set is selected, wherein the number of characters in the set is 3755.
OCR stands for Optical Character Recognition. It uses optical and computer technology to read characters printed or written on paper and convert them into a format that a computer can accept and a person can understand. Identity card OCR refers to the process of recognizing the characters in the fields of an identity card by an OCR method.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
When the address field and the issuing authority field of an identity card are recognized, the address often includes uncommon characters that are not in the common character set, while a Chinese character set that includes all uncommon characters has more than 9000 characters. The model must be trained before it is used to recognize identity cards and, with a character set of that size, training is both slow and difficult, so the recognition efficiency for the address field and the issuing authority field is low.
In addition, the model refers to some potential law existing in the data, and the model training refers to a process of finding out the law, which is also called "learning", and the process is completed by executing some learning algorithm. The data used by the training process is referred to as "training data", where each sample is referred to as a "training sample" and the set of training samples is referred to as a "training set". The length of time required to train out the model is the model training speed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying identity card information, which can solve the problem of low efficiency in identifying the address of an identity card and the column of an issuing authority.
In order to achieve the above object, according to an aspect of the embodiments of the present invention, a method for identifying identity card information is provided, including receiving identity card picture information, and obtaining a slice of a target field; inputting the slice into a recognition model based on a preset character set to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set; and searching address information matched with the text information according to a preset address library, and correcting and outputting the text information according to the address information.
Optionally, selecting a non-repetitive character set includes:
establishing a python list, and traversing each sample in a training set of the recognition model;
each sample is cut into individual characters and placed in a python list to obtain a non-repetitive character set through the unique function of python.
Optionally, the method further comprises:
the character set is stored as a text file for recall.
Optionally, obtaining a slice of the target field includes:
and intercepting the slice on the identity card picture according to the area coordinate of the target column.
Optionally, comprising:
the identification model adopts a CRNN model.
Optionally, searching for address information matching the text information includes:
coding the text information based on a preset address dictionary;
and calculating cosine similarity between codes corresponding to the address information in the address library according to the coded text information to obtain the address information with the highest cosine similarity.
Optionally, after calculating the cosine similarity between codes corresponding to the address information in the address library, the method includes:
and sorting the address information in the address base from high to low according to the cosine similarity value.
Optionally, the method further comprises:
before coding the text information or the address information in the address base, extracting key words of the text information or the address information in the address base, and removing preset fixed words.
In addition, the invention also provides an identification device of the identity card information, which comprises an acquisition module and a processing module, wherein the acquisition module is used for receiving the identity card picture information and acquiring the slice of the target field; the processing module is used for inputting the slice into the recognition model based on a preset character set so as to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set; and searching address information matched with the text information according to a preset address library, and correcting and outputting the text information according to the address information.
Optionally, the processing module selects a non-repetitive character set, including:
establishing a python list, and traversing each sample in a training set of the recognition model;
each sample is cut into individual characters and placed in a python list to obtain a non-repetitive character set through the unique function of python.
Optionally, the processing module is further configured to:
the character set is stored as a text file for recall.
Optionally, the obtaining module obtains the slice of the target field, including:
and intercepting the slice on the identity card picture according to the area coordinate of the target column.
Optionally, comprising:
the identification model adopts a CRNN model.
Optionally, the searching, by the processing module, for address information matching the text information includes:
coding the text information based on a preset address dictionary;
and calculating cosine similarity between codes corresponding to the address information in the address library according to the coded text information to obtain the address information with the highest cosine similarity.
Optionally, after the processing module calculates the cosine similarity between the codes corresponding to the address information in the address library, the processing module includes:
and sorting the address information in the address base from high to low according to the cosine similarity value.
Optionally, the processing module is further configured to:
before coding the text information or the address information in the address base, extracting key words of the text information or the address information in the address base, and removing preset fixed words.
One embodiment of the above invention has the following advantages or benefits: the slice of the target column is obtained by receiving the picture information of the identity card; inputting the slice into a recognition model based on a preset character set to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set; according to a preset address library, address information matched with the text information is searched, and then the text information is corrected according to the address information and output, so that the technical problem that the existing identification card address and issuing authority field identification efficiency is low is solved.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
Fig. 1 is a schematic view of a main flow of an identification method of identity card information according to a first embodiment of the present invention;
Fig. 2 is a schematic view of a main flow of an identification method of identity card information according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of a main flow of building a recognition model according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of main modules of an identification apparatus of identity card information according to an embodiment of the present invention;
Fig. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
Fig. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of an identification method of identification card information according to a first embodiment of the present invention, as shown in fig. 1, the identification method of identification card information includes:
Step S101, receiving the identity card picture information and acquiring the slice of the target field.
In some embodiments, since the identity card is not easily bent, the positions of the target columns in different identity cards are fixed, and therefore, the slice can be cut from the identity card picture according to the area coordinates of the target columns, so as to obtain the slice of the target columns.
For example, the target field area may be rectangular, that is, coordinates of four vertices of the rectangle may be stored in advance, and then the slice is intercepted according to the coordinates of the four vertices.
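The cropping step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the image is assumed to be a plain list of pixel rows, and the function name `crop_field`, the sample image, and the field coordinates are all hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # assumed representation: rows of grayscale pixel values


def crop_field(image: Image, vertices: Sequence[Tuple[int, int]]) -> Image:
    """Cut an axis-aligned rectangular slice given its four (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # Right and bottom edges are exclusive, as in ordinary Python slicing.
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 4x6 "image" and pre-stored field coordinates, for illustration only.
image = [[10 * r + c for c in range(6)] for r in range(4)]
address_field = [(1, 1), (4, 1), (4, 3), (1, 3)]
field_slice = crop_field(image, address_field)
```

Because the field positions are fixed, the four vertices only need to be stored once per field and can be reused for every card image of the same size.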
Step S102, inputting the slice into a recognition model based on a preset character set to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set.
In some embodiments, the non-repeating character set is selected by:
A python list is established, and each sample in the training set of the recognition model is traversed. Each sample is cut into individual characters, which are placed in the python list, and a non-repetitive character set is obtained through the unique function of python. Python is a cross-platform computer programming language, and the unique function performs element deduplication.
Preferably, the character set is stored as a text file for recall.
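A minimal sketch of the character set construction and storage might look like the following. The text refers to a "unique function" without naming a library; core Python has no built-in of that name, so this sketch substitutes `set()` for deduplication (`numpy.unique` would be another option). The function names, the one-character-per-line file layout, and the sample labels are assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def build_charset(labels: Iterable[str]) -> List[str]:
    """Collect every character of every training label, then deduplicate."""
    chars: List[str] = []
    for label in labels:
        chars.extend(label)  # a str iterates character by character
    # set() removes duplicates; sorting makes the order reproducible.
    return sorted(set(chars))


def save_charset(charset: List[str], path: str) -> None:
    """Store the character set as a text file, one character per line."""
    Path(path).write_text("\n".join(charset), encoding="utf-8")


def load_charset(path: str) -> List[str]:
    """Read the character set back for later use."""
    return Path(path).read_text(encoding="utf-8").split("\n")


# Hypothetical training labels, for illustration only.
sample_labels = ["武汉市黄陂区", "武汉市江夏区"]
charset = build_charset(sample_labels)
```

Sorting is a design choice rather than a requirement of the text: it keeps the character-to-index mapping stable between the training run and later calls.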
In other embodiments, the recognition model may employ a CRNN model. The CRNN recognition method (from "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition") includes a CNN (Convolutional Neural Network, the convolutional layer, which extracts a feature sequence from the input image), an RNN (Recurrent Neural Network, the recurrent layer, which predicts the label (true value) distribution of the feature sequence obtained from the convolutional layer), and CTC (Connectionist Temporal Classification, the transcription layer, which converts the label distribution obtained from the recurrent layer into the final recognition result through operations such as deduplication and merging). Preferably, the non-repetitive character set of the training set is selected as the character set of the CRNN.
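The "deduplication and merging" performed by the transcription layer can be illustrated with greedy (best-path) CTC decoding, sketched below. This is one common decoding strategy and is an assumption here; the text does not say which decoder is used. The blank index of 0, the function name, and the toy frame scores are also hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label; characters use indices 1..N


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      charset: Sequence[str]) -> str:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    decoded: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and differs from the
        # label of the previous frame (repeated frames collapse to one).
        if label != BLANK and label != previous:
            decoded.append(charset[label - 1])
        previous = label
    return "".join(decoded)


def one_hot_frames(path: Sequence[int], num_classes: int) -> List[List[float]]:
    """Helper for the toy example: turn a label path into per-frame scores."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in path]


toy_charset = ["武", "汉", "市"]
frames = one_hot_frames([1, 1, 0, 2, 2, 0, 3], num_classes=4)
text = ctc_greedy_decode(frames, toy_charset)
```

A blank frame between two identical labels keeps both characters, which is how CTC can still output doubled characters.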
Step S103, searching address information matched with the text information according to a preset address library, and correcting and outputting the text information according to the address information.
In some embodiments, when address information matching the text information is searched, the text information may be encoded based on a preset address dictionary. The cosine similarity between the encoded text information and the code of each piece of address information in the address library is then calculated to obtain the address information with the highest cosine similarity. That is, the address information with the highest cosine similarity can be substituted for the text information and output, thereby realizing the correction of the text information.
For example, if the address dictionary includes 5000 words, the text information and each piece of address information in the address library can each be encoded as a 5000-dimensional vector. For each word of the address dictionary, the corresponding position of the vector is marked 1 if the text information (or the address information) contains that word, and 0 if it does not. The address information with the highest cosine similarity is taken as the reference standard address and can be used directly as the text information, thereby correcting the text information. For example, the text information is 'yellow leather region in Wuhan City', the address information with the highest similarity value is 'yellow and wavelike region in Wuhan City', and the similarity value is 0.447214.
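The encoding and similarity calculation can be sketched as below, assuming the text has already been split into words (the text does not specify a tokenizer). The four-word dictionary, the function names, and the sample tokens are hypothetical; a real address dictionary would have thousands of entries.

```python
import math
from typing import Dict, List, Sequence


def encode(tokens: Sequence[str], dictionary: Dict[str, int]) -> List[int]:
    """Mark 1 at the position of every dictionary word present, else 0."""
    vector = [0] * len(dictionary)
    for token in tokens:
        index = dictionary.get(token)
        if index is not None:  # words outside the dictionary are ignored
            vector[index] = 1
    return vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical dictionary: word -> position in the vector.
address_dictionary = {"武汉": 0, "黄陂": 1, "江夏": 2, "洪山": 3}
recognized = encode(["武汉", "黄皮"], address_dictionary)  # "黄皮" is not a dictionary word
candidate = encode(["武汉", "黄陂"], address_dictionary)
similarity = cosine_similarity(recognized, candidate)
```

With 0/1 vectors the similarity depends only on how many dictionary words the two addresses share relative to how many each contains.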
In a further embodiment, after the cosine similarity between the codes corresponding to the address information in the address library is calculated, the address information in the address library can be sorted from high to low according to the cosine similarity value.
In addition, before the text information or the address information in the address library is encoded, keywords of the text information or of the address information may be extracted (for example, by semantic-based keyword extraction, word2vec + Kmeans, and the like), and preset fixed words may be removed. For example, the removed fixed words can be the common "province, city, district, town, village".
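Removing the fixed words can be sketched with plain string replacement. The list below uses the Chinese characters for "province, city, district, town, village"; the function name and sample address are hypothetical, and the semantic keyword extraction mentioned in the text is not shown.

```python
from typing import Sequence

# Assumed fixed words: province, city, district, town, village.
FIXED_WORDS = ("省", "市", "区", "镇", "村")


def strip_fixed_words(text: str, fixed_words: Sequence[str] = FIXED_WORDS) -> str:
    """Delete every occurrence of each preset fixed word from the text."""
    for word in fixed_words:
        text = text.replace(word, "")
    return text


keywords = strip_fixed_words("湖北省武汉市黄陂区")
```

Plain replacement is deliberately naive: it would also delete these characters when they occur inside a place name, so a production system would probably restrict removal to suffix positions.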
Preferably, the address information matched with the text information in the address base, that is, the address information with the highest similarity is obtained through a term frequency-inverse document frequency algorithm (tf-idf algorithm). The tf-idf algorithm is a weighting technique for information retrieval (information retrieval) and text mining (text mining).
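A character-level tf-idf match could be sketched as follows. The smoothed idf formula used here, ln((1 + n) / (1 + df)) + 1, is one common variant and is an assumption, as are character-level terms, the function names, and the three-entry address library.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_table(library: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency of each character in the library."""
    n = len(library)
    df = Counter(ch for address in library for ch in set(address))
    return {ch: math.log((1 + n) / (1 + count)) + 1.0 for ch, count in df.items()}


def tfidf(text: str, idf: Dict[str, float], unseen_idf: float) -> Dict[str, float]:
    """Weight each character by its term frequency times its idf."""
    counts = Counter(text)
    return {ch: (c / len(text)) * idf.get(ch, unseen_idf) for ch, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(ch, 0.0) for ch, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def best_match(text: str, library: List[str]) -> Tuple[str, float]:
    """Return the library address most similar to the text, with its score."""
    idf = idf_table(library)
    unseen_idf = math.log(1 + len(library)) + 1.0  # document frequency 0
    query = tfidf(text, idf, unseen_idf)
    scored = [(address, cosine(query, tfidf(address, idf, unseen_idf)))
              for address in library]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical address library and misrecognized text, for illustration only.
library = ["武汉市黄陂区", "武汉市江夏区", "武汉市洪山区"]
matched_address, score = best_match("武汉市黄皮区", library)
```

Compared with 0/1 encoding, tf-idf gives less weight to characters shared by every address (such as the city name) and more to the characters that distinguish one district from another. `best_match` raises `ValueError` on an empty library.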
In summary, for the recognition of the address field and the issuing authority field of the identity card, the identification method of the present invention reduces the character set in order to improve the model training speed; the character set is reduced by selecting the non-repetitive character set of the training set.
That is to say, when identity card information is recognized, the common Chinese character set, which contains 3755 characters, is usually selected as the character set. The address and issuing authority fields, however, often contain uncommon characters that the common character set does not cover, while a character set that includes the uncommon characters contains as many as 9000 characters, so training is time-consuming and a satisfactory model is difficult to train. The invention uses the minimal non-repetitive set of characters in the training set as the character set. This set has only 3374 characters, contains the required uncommon characters, and is smaller than the common character set, so the training time is reduced and the prediction accuracy is improved.
Fig. 2 is a schematic diagram of a main flow of an identification method of identity card information according to a second embodiment of the present invention, as shown in fig. 2, the identification method of identity card information includes:
Step S201, receiving the identity card picture information, and obtaining a slice of the target field.
Step S202, based on a preset character set, inputting the slice into a recognition model to obtain corresponding text information.
The character set is a non-repetitive character set, which can be obtained by establishing a python list and traversing each sample in the training set of the recognition model; each sample is cut into single characters, which are put into the python list, and the non-repetitive character set is obtained based on the unique function of python.
Step S203, extracting the keywords of the text information, and removing the preset fixed words.
Step S204, encoding the text information based on a preset address dictionary.
Step S205, according to the coded text information, calculating the cosine similarity between the codes corresponding to the address information in the address base.
Step S206, according to the cosine similarity value, the address information in the address base is sorted from high to low.
Step S207, the address information with the highest cosine similarity is obtained.
Step S208, correcting the text information according to the address information to output the corrected text information.
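Steps S203 to S208 can be chained into one function, as in the sketch below. It assumes single characters as dictionary words and derives the dictionary from the address library; neither choice is stated in the text. For 0/1 vectors the cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|), so sets stand in for the vectors. All names and sample data are hypothetical.

```python
import math
from typing import List, Set, Tuple

FIXED_WORDS = ("省", "市", "区", "镇", "村")  # assumed preset fixed words


def keywords(text: str, dictionary: Set[str]) -> Set[str]:
    """S203 + S204: drop fixed words, keep only characters in the dictionary."""
    for word in FIXED_WORDS:
        text = text.replace(word, "")
    return {ch for ch in text if ch in dictionary}


def binary_cosine(a: Set[str], b: Set[str]) -> float:
    """S205: cosine similarity of two 0/1 vectors, written on their supports."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def correct_address(text: str, library: List[str]) -> Tuple[str, List[Tuple[float, str]]]:
    """Return the corrected address and the full ranking (S206 to S208)."""
    dictionary = {ch for address in library for ch in address}
    query = keywords(text, dictionary)
    ranked = sorted(
        ((binary_cosine(query, keywords(address, dictionary)), address)
         for address in library),
        key=lambda pair: pair[0],
        reverse=True,  # S206: highest similarity first
    )
    best_address = ranked[0][1]  # S207
    return best_address, ranked  # S208: best_address replaces the text


# Hypothetical address library and misrecognized text, for illustration only.
library = ["武汉市江夏区", "武汉市黄陂区", "武汉市洪山区"]
corrected, ranking = correct_address("武汉市黄皮区", library)
```

Returning the whole ranking alongside the best match lets a caller apply a minimum-similarity threshold before accepting a correction, which the text does not require but which may be useful in practice. The function raises `IndexError` on an empty library.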
Fig. 3 is a schematic diagram of a main flow of constructing a recognition model according to a third embodiment of the present invention, and as shown in fig. 3, the method of constructing a recognition model includes:
Step S301, obtaining the identity card picture information, and intercepting the slice of the target field to obtain a training sample.
Step S302, a python list is established, and each training sample is traversed.
Step S303, cutting each training sample into a single character, and putting the single character into a python list to obtain a non-repetitive character set through an unique function of python.
Step S304, storing the non-repeated character set as a text file for calling.
Step S305, calling a non-repetitive character set, and inputting a training sample into the CRNN model for model training.
Step S306, obtaining a recognition model.
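Before step S305 can run, the text labels of the training samples have to be turned into integer sequences over the stored character set. The sketch below shows one conventional way to do this for a CTC-trained model, reserving index 0 for the blank label; the text does not describe this detail, so the index layout and all names are assumptions. The model training itself is not shown.

```python
from typing import Dict, List, Sequence

BLANK_INDEX = 0  # assumed: index 0 is reserved for the CTC blank label


def build_label_map(charset: Sequence[str]) -> Dict[str, int]:
    """Map each character of the stored character set to an index from 1."""
    return {ch: index for index, ch in enumerate(charset, start=1)}


def encode_label(text: str, label_map: Dict[str, int]) -> List[int]:
    """Turn a training label into the integer targets used by the CTC loss."""
    # A KeyError here means the label contains a character outside the set.
    return [label_map[ch] for ch in text]


def num_output_classes(charset: Sequence[str]) -> int:
    """Size of the model's output layer: one class per character plus blank."""
    return len(charset) + 1


# Hypothetical character set, for illustration only.
toy_charset = ["区", "市", "武", "汉"]
label_map = build_label_map(toy_charset)
targets = encode_label("武汉市", label_map)
```

Under this layout, a character set of 3374 characters gives an output layer of 3375 classes, compared with 3756 for the 3755-character common set, which is where the smaller character set reduces the size of the classification problem.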
Fig. 4 is a schematic diagram of main modules of an identification apparatus for identification card information according to an embodiment of the present invention, and as shown in fig. 4, the identification apparatus 400 for identification card information includes an obtaining module 401 and a processing module 402. The obtaining module 401 receives the information of the id card picture, and obtains the slice of the target field. The processing module 402 inputs the slice into the recognition model based on a preset character set to obtain corresponding text information; wherein, the character set is a selected non-repetitive character set; and searching address information matched with the text information according to a preset address library, and correcting and outputting the text information according to the address information.
In some embodiments, the processing module 402 selects a non-repeating character set, including:
establishing a python list, and traversing each sample in a training set of the recognition model;
each sample is cut into individual characters and placed in a python list to obtain a non-repetitive character set through the unique function of python.
In a further embodiment, the processing module 402 is further configured to: the character set is stored as a text file for recall.
In other embodiments, the obtaining module 401 obtains a slice of the target field, including: and intercepting the slice on the identity card picture according to the area coordinate of the target column.
It is worth noting that the recognition model may employ a CRNN model.
As another embodiment, the processing module 402 searches for address information matching the text information, including:
coding the text information based on a preset address dictionary; and calculating cosine similarity between codes corresponding to the address information in the address library according to the coded text information to obtain the address information with the highest cosine similarity.
In a further embodiment, after the processing module 402 calculates the cosine similarity between the codes corresponding to the address information in the address library, the method includes:
and sorting the address information in the address base from high to low according to the cosine similarity value.
In another further embodiment, the processing module 402 is further configured to:
before coding the text information or the address information in the address base, extracting key words of the text information or the address information in the address base, and removing preset fixed words.
It should be noted that the identification method of the identification card information and the identification apparatus of the identification card information according to the present invention have a corresponding relationship in the specific implementation content, and therefore, the repeated content is not described again.
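The division of work between the obtaining module 401 and the processing module 402 might be expressed as in the sketch below. The recognition model and the address matcher are injected as plain callables so that the structure can be shown without a trained model; the class names, field names, and stub functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # assumed representation: rows of pixel values
Box = Tuple[int, int, int, int]  # left, top, right, bottom (exclusive)


@dataclass
class ObtainingModule:  # corresponds to module 401
    field_box: Box

    def get_slice(self, card_image: Image) -> Image:
        left, top, right, bottom = self.field_box
        return [row[left:right] for row in card_image[top:bottom]]


@dataclass
class ProcessingModule:  # corresponds to module 402
    recognize: Callable[[Image], str]    # stand-in for the recognition model
    match_address: Callable[[str], str]  # stand-in for the address library lookup

    def process(self, field_slice: Image) -> str:
        return self.match_address(self.recognize(field_slice))


@dataclass
class IdentificationApparatus:  # corresponds to apparatus 400
    obtaining: ObtainingModule
    processing: ProcessingModule

    def identify(self, card_image: Image) -> str:
        return self.processing.process(self.obtaining.get_slice(card_image))


# Stub components, for illustration only.
apparatus = IdentificationApparatus(
    ObtainingModule(field_box=(0, 0, 2, 1)),
    ProcessingModule(
        recognize=lambda s: "武汉市黄皮区",
        match_address=lambda t: {"武汉市黄皮区": "武汉市黄陂区"}.get(t, t),
    ),
)
result = apparatus.identify([[1, 2, 3], [4, 5, 6]])
```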
Fig. 5 shows an exemplary system architecture 500 of an identification method or an identification apparatus of identification card information to which an embodiment of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 501, 502, 503. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the identification method of the identity card information provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the identification apparatus of the identity card information is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the computer system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor comprising an acquisition module and a processing module. The names of these modules do not, in some cases, constitute a limitation of the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: receive identity card picture information and obtain a slice of a target field; input the slice into a recognition model based on a preset character set to obtain corresponding text information, wherein the character set is a selected non-repetitive character set; and search a preset address library for address information matching the text information, then correct and output the text information according to the address information.
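The first two steps above can be illustrated with a minimal Python sketch, assuming the training samples are plain label strings and the picture is a row-major grid of pixel values. The function names `build_charset` and `crop_field` and the `(left, top, right, bottom)` box convention are hypothetical and do not come from the patent. Claim 2 refers to a "unique function" of Python; the standard library has no built-in with that name, so this sketch substitutes `dict.fromkeys`, which removes duplicates while keeping first-seen order.

```python
from typing import Iterable, List, Sequence, Tuple


def build_charset(samples: Iterable[str]) -> List[str]:
    """Collect every character of every training label, then deduplicate."""
    chars: List[str] = []
    for sample in samples:
        # Extending a list with a string appends its characters one by one.
        chars.extend(sample)
    # Assumption: dict.fromkeys stands in for the "unique function" of claim 2.
    return list(dict.fromkeys(chars))


def crop_field(
    image: Sequence[Sequence[int]], box: Tuple[int, int, int, int]
) -> List[List[int]]:
    """Cut the slice of one field out of the picture by its region coordinates.

    `box` is (left, top, right, bottom) with exclusive right/bottom edges,
    a convention chosen for this sketch only.
    """
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]
```

The resulting character list could then be written to a text file, one character per line, so that the recognition model can load the same set at training and inference time.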
According to the technical solution of the embodiments of the present invention, the problem that existing approaches recognize the address and issuing authority fields of an identity card with low efficiency can be solved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for recognizing identity card information, characterized by comprising:
receiving identity card picture information and acquiring a slice of a target field;
inputting the slice into a recognition model based on a preset character set to obtain corresponding text information, wherein the character set is a selected non-repetitive character set;
and searching a preset address library for address information matching the text information, and correcting and outputting the text information according to the address information.
2. The method of claim 1, wherein selecting a non-repetitive character set comprises:
establishing a Python list, and traversing each sample in a training set of the recognition model;
splitting each sample into individual characters and placing them in the Python list, so as to obtain a non-repetitive character set through the unique function of Python.
3. The method of claim 2, further comprising:
storing the character set as a text file for later use.
4. The method of claim 1, wherein obtaining a slice of the target field comprises:
cropping the slice from the identity card picture according to the region coordinates of the target field.
5. The method of claim 1, wherein:
the recognition model is a CRNN model.
6. The method of claim 1, wherein searching for address information matching the text information comprises:
encoding the text information based on a preset address dictionary;
and calculating the cosine similarity between the encoded text information and the code corresponding to each piece of address information in the address library, to obtain the address information with the highest cosine similarity.
7. The method of claim 6, wherein after calculating the cosine similarity with the codes corresponding to the address information in the address library, the method further comprises:
sorting the address information in the address library from high to low by cosine similarity value.
8. The method of claim 6, further comprising:
before encoding the text information or the address information in the address library, extracting keywords from the text information or the address information in the address library, and removing preset fixed words.
9. An apparatus for recognizing identity card information, comprising:
an acquisition module, configured to receive identity card picture information and acquire a slice of a target field;
a processing module, configured to input the slice into a recognition model based on a preset character set to obtain corresponding text information, wherein the character set is a selected non-repetitive character set; and to search a preset address library for address information matching the text information, and correct and output the text information according to the address information.
10. The apparatus of claim 9, wherein the processing module selecting a non-repetitive character set comprises:
establishing a Python list, and traversing each sample in a training set of the recognition model;
splitting each sample into individual characters and placing them in the Python list, so as to obtain a non-repetitive character set through the unique function of Python.
11. The apparatus of claim 10, wherein the processing module is further configured to:
store the character set as a text file for later use.
12. The apparatus of claim 9, wherein the acquisition module acquiring the slice of the target field comprises:
cropping the slice from the identity card picture according to the region coordinates of the target field.
13. The apparatus of claim 9, wherein:
the recognition model is a CRNN model.
14. The apparatus of claim 9, wherein the processing module searching for address information matching the text information comprises:
encoding the text information based on a preset address dictionary;
and calculating the cosine similarity between the encoded text information and the code corresponding to each piece of address information in the address library, to obtain the address information with the highest cosine similarity.
15. The apparatus of claim 14, wherein after calculating the cosine similarity with the codes corresponding to the address information in the address library, the processing module is further configured to:
sort the address information in the address library from high to low by cosine similarity value.
16. The apparatus of claim 14, wherein the processing module is further configured to:
before encoding the text information or the address information in the address library, extract keywords from the text information or the address information in the address library, and remove preset fixed words.
17. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
18. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
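The address correction recited in claims 6 to 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the "preset address dictionary" is taken to be a character-to-index map built from the address library, the encoding is a bag-of-characters count vector, and the fixed words in `FIXED_WORDS` are hypothetical examples. All function names are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical fixed words removed before encoding (claim 8).
FIXED_WORDS: Tuple[str, ...] = ("省", "市")


def strip_fixed_words(text: str, fixed_words: Sequence[str] = FIXED_WORDS) -> str:
    """Remove preset fixed words so that only distinguishing characters remain."""
    for word in fixed_words:
        text = text.replace(word, "")
    return text


def encode(text: str, dictionary: Dict[str, int]) -> List[int]:
    """Encode text as character counts; characters outside the dictionary are ignored."""
    vec = [0] * len(dictionary)
    for ch in text:
        idx = dictionary.get(ch)
        if idx is not None:
            vec[idx] += 1
    return vec


def cosine(a: Sequence[int], b: Sequence[int]) -> float:
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_addresses(text: str, library: Sequence[str]) -> List[Tuple[float, str]]:
    """Score every library address against the text, sorted high to low (claim 7)."""
    stripped = [strip_fixed_words(addr) for addr in library]
    # Assumption: the address dictionary is derived from the library itself.
    dictionary = {ch: i for i, ch in enumerate(dict.fromkeys("".join(stripped)))}
    query = encode(strip_fixed_words(text), dictionary)
    scored = [(cosine(query, encode(s, dictionary)), addr)
              for s, addr in zip(stripped, library)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def correct_address(text: str, library: Sequence[str]) -> str:
    """Return the best-matching library address, or the text itself if nothing matches."""
    ranked = rank_addresses(text, library)
    if ranked and ranked[0][0] > 0:
        return ranked[0][1]
    return text
```

For example, with a library containing "北京市西城区" and "北京市东城区", a recognized string in which one character was misread, such as "北京巿西城区", still shares the most characters with the first entry and would be corrected to it. A production system would likely add a minimum similarity threshold before overwriting the recognized text.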
CN202010129302.4A 2020-02-28 2020-02-28 Identification method and device for identity card information Pending CN111368693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010129302.4A CN111368693A (en) 2020-02-28 2020-02-28 Identification method and device for identity card information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010129302.4A CN111368693A (en) 2020-02-28 2020-02-28 Identification method and device for identity card information

Publications (1)

Publication Number Publication Date
CN111368693A true CN111368693A (en) 2020-07-03

Family

ID=71210216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010129302.4A Pending CN111368693A (en) 2020-02-28 2020-02-28 Identification method and device for identity card information

Country Status (1)

Country Link
CN (1) CN111368693A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569839A (en) * 2021-08-31 2021-10-29 重庆紫光华山智安科技有限公司 Certificate identification method, system, device and medium
CN114155543A (en) * 2021-12-08 2022-03-08 北京百度网讯科技有限公司 Neural network training method, document image understanding method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106375942A (en) * 2016-09-20 2017-02-01 杭州联络互动信息科技股份有限公司 Method and device for transmission of data information
CN107194407A (en) * 2017-05-18 2017-09-22 网易(杭州)网络有限公司 A kind of method and apparatus of image understanding
CN109102844A (en) * 2018-08-24 2018-12-28 北京锐客科技有限公司 A kind of clinical test source data automatic Verification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liang Kaimeng et al.: "Analysis and Mining of Customer Review Verses Based on an E-commerce Platform" (基于电商平台的客户评论诗句分析与挖掘) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569839A (en) * 2021-08-31 2021-10-29 重庆紫光华山智安科技有限公司 Certificate identification method, system, device and medium
CN113569839B (en) * 2021-08-31 2024-02-09 重庆紫光华山智安科技有限公司 Certificate identification method, system, equipment and medium
CN114155543A (en) * 2021-12-08 2022-03-08 北京百度网讯科技有限公司 Neural network training method, document image understanding method, device and equipment

Similar Documents

Publication Publication Date Title
US20200013386A1 (en) Method and apparatus for outputting voice
CN107679119B (en) Method and device for generating brand derivative words
US11055373B2 (en) Method and apparatus for generating information
CN107766492B (en) Image searching method and device
CN113657113B (en) Text processing method and device and electronic equipment
CN114861889B (en) Deep learning model training method, target object detection method and device
CN111368697A (en) Information identification method and device
CN112988753B (en) Data searching method and device
CN110910178A (en) Method and device for generating advertisement
CN112148841B (en) Object classification and classification model construction method and device
CN111368693A (en) Identification method and device for identity card information
CN110874532A (en) Method and device for extracting keywords of feedback information
CN110852057A (en) Method and device for calculating text similarity
CN113742485A (en) Method and device for processing text
CN113590756A (en) Information sequence generation method and device, terminal equipment and computer readable medium
CN112784596A (en) Method and device for identifying sensitive words
CN111783433A (en) Text retrieval error correction method and device
CN113239687B (en) Data processing method and device
CN110929512A (en) Data enhancement method and device
CN113486148A (en) PDF file conversion method and device, electronic equipment and computer readable medium
CN110413603B (en) Method and device for determining repeated data, electronic equipment and computer storage medium
CN110796137A (en) Method and device for identifying image
CN111339776A (en) Resume parsing method and device, electronic equipment and computer-readable storage medium
CN112579080A (en) Method and device for generating user interface code
CN111178353A (en) Image character positioning method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220930
Address after: 25 Financial Street, Xicheng District, Beijing 100033
Applicant after: CHINA CONSTRUCTION BANK Corp.
Address before: 25 Financial Street, Xicheng District, Beijing 100033
Applicant before: CHINA CONSTRUCTION BANK Corp.
Applicant before: Jianxin Financial Science and Technology Co.,Ltd.