CN114913535A - Method, device, equipment and medium for identifying three-dimensional multi-plane text - Google Patents
Method, device, equipment and medium for identifying three-dimensional multi-plane text
- Publication number: CN114913535A (application CN202210310193.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- text
- plane
- feature vector
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/044 — Neural networks: recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention provides a method for recognizing text located on three-dimensional multiple planes, comprising the following steps: acquiring image data containing text blocks on continuous multiple planes; processing the image data to obtain feature vector data; obtaining plane information data in the image data according to the feature vector data; obtaining text image information data in the image data according to the feature vector data; obtaining text data according to the plane information data and the text image information data; and obtaining character content data according to the text data. The method can recognize characters in a three-dimensional scene.
Description
Technical Field
The invention relates to the technical field of character recognition, in particular to a method, a device, equipment and a medium for recognizing a three-dimensional multi-plane text.
Background
In recent years, with the development of deep learning techniques, text recognition on text pictures has achieved high accuracy. However, traditional artificial-intelligence character recognition cannot understand characters in a complex scene: a three-dimensional scene is composed of multiple planes, and when characters lie on each of those planes, traditional character recognition technology cannot recognize them. Character detection and recognition methods in the prior art can only solve the task of recognizing text in a single two-dimensional plane and cannot recognize characters in text images spanning continuous multiple planes.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a method, an apparatus, a device and a medium for recognizing a text located in a three-dimensional multi-plane, which can recognize a text in a three-dimensional scene.
To achieve the above and other related objects, the present invention provides a method for recognizing a three-dimensional multi-plane text, including:
acquiring image data containing text blocks on continuous multiple planes;
processing the image data to obtain feature vector data;
obtaining plane information data in the image data according to the feature vector data;
obtaining text image information data in the image data according to the feature vector data;
obtaining text data according to the plane information data and the text image information data;
and obtaining character content data according to the text data.
In an embodiment of the present invention, the processing of the image data to obtain feature vector data includes:
inputting the image data as a parameter into a residual network of a convolutional neural network to obtain intermediate data;
inputting the intermediate data as a parameter into a pooling layer of the convolutional neural network to obtain the feature vector data;
the feature vector data comprises single-channel pixel-level text score feature map data and multi-channel geometry feature map data.
In an embodiment of the present invention, the obtaining of plane information data in the image data according to the feature vector data includes:
inputting the single-channel pixel-level text score feature map data in the feature vector data into a fully connected layer network of a convolutional neural network to obtain plane information data in the image data;
the plane information data comprises plane quantity data and plane parameter data;
the plane parameter data comprises an encoding, a normal and an offset;
the loss function for regressing the plane parameters of the fully connected layer network is expressed as follows (the formula is given only as an image in the original publication):
wherein S^* represents the required plane quantity data;
S represents the preset plane quantity data in the network;
P_i^* represents the three-dimensional coordinates of the predicted target point;
P_j represents the three-dimensional coordinates of the point on the plane.
In an embodiment of the present invention, the obtaining of text image information data in the image data according to the feature vector data includes:
inputting the multi-channel geometry feature map data in the feature vector data into a convolutional neural network to obtain the text image information data in the image data;
the text image information data comprises position data and text block direction data of a text block in the image data.
In an embodiment of the present invention, the step of obtaining the text data according to the plane information data and the text image information data includes:
obtaining perspective transformation data according to plane parameter data in the plane information data;
and obtaining text data according to the perspective transformation data and the position data of the text block in the text image information data.
In one embodiment of the present invention, the perspective transformation data is expressed as a perspective transformation matrix M (the matrix is given only as an image in the original publication);
wherein M represents the perspective transformation matrix, i.e., the perspective transformation data;
θ represents the rotation angle of the text box, the text box being the border marked around the periphery of the character region;
t_x represents the translation parameter along the x-axis direction in the perspective transformation;
t_y represents the translation parameter along the y-axis direction in the perspective transformation;
m represents the magnification parameter in the perspective transformation.
In an embodiment of the present invention, the translation parameter t_x along the x-axis direction in the perspective transformation is expressed as: t_x = d_l*cosθ - d_t*sinθ - u;
the translation parameter t_y along the y-axis direction in the perspective transformation is expressed as: t_y = d_t*cosθ + d_l*sinθ - v;
the width w of the feature map after the affine transformation is expressed as: w = m*(d_l + d_r);
wherein d_t represents the distance from the feature point to the top of the text box, the feature point being the point that generates the single-channel pixel-level text score feature map and the multi-channel geometry feature maps;
d_b represents the distance from the feature point to the bottom of the text box;
d_l represents the distance from the feature point to the leftmost part of the text box;
d_r represents the distance from the feature point to the rightmost part of the text box;
h represents the height of the feature map after the affine transformation;
x, y and z represent the coordinates of the picture obtained through the perspective transformation;
u, v and w represent the coordinates of the feature point, and [x, y, z] = M[u, v, w].
The invention also provides a device for recognizing the three-dimensional multi-plane text, which comprises:
the data acquisition module is used for acquiring image data containing text blocks on continuous multiple planes;
the extraction module is used for processing the image data to obtain feature vector data;
the first processing module is used for obtaining plane information data in the image data according to the feature vector data;
the second processing module is used for obtaining text image information data in the image data according to the feature vector data;
the text processing module is used for obtaining text data according to the plane information data and the text image information data; and
and the character recognition module is used for obtaining character content data according to the text data.
The invention also provides computer equipment which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the identification method of the three-dimensional multi-plane text.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method for recognizing a text located in three-dimensional multi-planes.
As described above, the present invention provides a method, an apparatus, a device and a medium for recognizing text located on three-dimensional multiple planes, which can solve the task of recognizing text on multiple planes in a three-dimensional scene and can recognize characters in text images spanning continuous multiple planes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flow chart illustrating a method for recognizing a three-dimensional multi-plane text according to the present invention.
Fig. 2 is a flowchart illustrating an embodiment of step S20 in fig. 1.
Fig. 3 is a flowchart illustrating an embodiment of step S50 in fig. 1.
Fig. 4 is a schematic structural diagram of a recognition apparatus for three-dimensional multi-plane text according to the present invention.
Fig. 5 is a schematic structural diagram of the extraction module shown in fig. 4.
Fig. 6 is a schematic structural diagram of the text processing module in fig. 4.
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Element number description:
10. a data acquisition module;
20. an extraction module; 21. a residual module; 22. a pooling module;
30. a first processing module;
40. a second processing module;
50. a text processing module; 51. a perspective transformation module; 52. a text extraction module;
60. and a character recognition module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Please refer to fig. 1-7. It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, a method for recognizing a text in three-dimensional multi-plane according to an embodiment of the present invention may be applied to a text recognition process, and the method for recognizing a text in three-dimensional multi-plane may include the steps of:
step S10, image data of text blocks containing continuous multi-planes is acquired.
And step S20, processing the image data to obtain feature vector data.
Step S30 is to obtain plane information data in the image data based on the feature vector data.
And step S40, obtaining text image information data in the image data according to the characteristic vector data.
And step S50, obtaining text data according to the plane information data and the text image information data.
And step S60, obtaining character content data according to the text data.
In one embodiment of the present invention, when step S10 is performed, image data containing text blocks on continuous multiple planes is acquired. Specifically, this image data may be a picture that includes multiple planes; for example, a picture of a wall corner includes three mutually perpendicular planes. Each plane may carry corresponding text, and the three planes are connected to one another, so that image data containing text blocks on continuous planes is obtained.
Referring to fig. 2, in an embodiment of the invention, when step S20 is performed, the image data is processed to obtain feature vector data. The sub-steps of step S20 may include:
Step S21, inputting the image data as a parameter into the residual network of the convolutional neural network in the character recognition model to obtain intermediate data.
Step S22, inputting the intermediate data as a parameter into the pooling layer of the convolutional neural network in the character recognition model to obtain feature vector data, where the feature vector data comprises single-channel pixel-level text score feature map data and multi-channel geometry feature map data.
In an embodiment of the present invention, specifically, when performing step S21, feature extraction may first be performed on the image data: the image data is input as a parameter into the residual network of the convolutional neural network in the character recognition model to obtain intermediate data. Convolutional Neural Networks (CNNs) are a class of feed-forward neural networks that contain convolution computations and have a deep structure. A convolutional neural network has representation-learning ability and can perform shift-invariant classification of input information according to its hierarchical structure. It has a multi-layer structure that may include an input layer, hidden layers and an output layer. The input layer can process multidimensional data; before input data enters the convolutional neural network, it needs to be normalized in the channel or time/frequency dimension. The hidden layers may include convolutional layers, pooling layers, fully connected layers, residual networks, and the like. A residual network improves accuracy by increasing effective depth, and because its residual blocks use skip connections, it alleviates the vanishing-gradient problem caused by increasing depth in a neural network.
In one embodiment of the present invention, specifically, when performing step S22, the obtained intermediate data may be input as a parameter into the pooling layer of the convolutional neural network to obtain the feature vector data. After the convolutional layer performs feature extraction, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer applies a preset pooling function that replaces the value at a single point in the feature map with a statistic of its neighboring region. The pooling layer selects pooling regions in the same way a convolution kernel scans the feature map, controlled by the pooling size, stride and padding.
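By way of illustration only, the feature extraction of steps S21 and S22 could be sketched as follows. This is a minimal sketch, not the disclosed implementation: the block depths, kernel sizes and the five geometry channels (d_t, d_b, d_l, d_r, θ) are assumptions consistent with the feature maps described above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: the skip connection mitigates vanishing gradients."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class FeatureExtractor(nn.Module):
    """Steps S21-S22 sketch: residual network -> pooling layer -> feature vector data."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, 7, stride=2, padding=3)       # image data in
        self.res = nn.Sequential(ResidualBlock(64), ResidualBlock(64))
        self.pool = nn.MaxPool2d(2)                                # pooling layer
        self.score_head = nn.Conv2d(64, 1, 1)  # single-channel pixel-level text score map
        self.geo_head = nn.Conv2d(64, 5, 1)    # assumed geometry channels: d_t, d_b, d_l, d_r, theta

    def forward(self, image):
        inter = self.res(self.stem(image))     # intermediate data (step S21)
        feat = self.pool(inter)                # pooled features (step S22)
        return torch.sigmoid(self.score_head(feat)), self.geo_head(feat)
```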
In one embodiment of the present invention, when step S30 is performed, plane information data within the image data is obtained based on the feature vector data. Specifically, the single-channel pixel-level text score feature map data in the feature vector data can be input into a fully connected layer network of the convolutional neural network to obtain the plane information data in the image data;
the plane information data comprises plane quantity data and plane parameter data;
the plane parameter data comprises an encoding, a normal and an offset;
the loss function for regressing the plane parameters of the fully connected layer network is expressed as follows (the formula is given only as an image in the original publication):
wherein S^* represents the required plane quantity data,
S represents the preset plane quantity data in the network,
P_i^* represents the three-dimensional coordinates of the predicted target point,
P_j represents the three-dimensional coordinates of the point on the plane.
In an embodiment of the present invention, specifically, the single-channel pixel-level text score feature map data in the feature vector data may be input as a parameter into a fully connected layer of the convolutional neural network to obtain the plane information data in the image data. The fully connected layer in a convolutional neural network is equivalent to the hidden layer in a conventional feed-forward neural network. It is located at the end of the hidden layers of the convolutional neural network and only passes signals to other fully connected layers. In the fully connected layer the feature map loses its spatial topology: it is flattened into vectors and passed through the activation function. The convolutional and pooling layers of the convolutional neural network extract features from the input data, and the fully connected layer combines the extracted features non-linearly to produce the output. The plane parameter data may include an encoding, a normal and an offset. The encoding encodes the predicted plane to determine its position in the picture. The normal determines the direction of the plane. The offset is the depth information of the plane, i.e., it determines the plane's depth. Since the planes within the picture are three-dimensional, a plane cannot be determined by the normal alone. The plane information data in the image data can thus be obtained through the fully connected layer of the convolutional neural network.
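As a hedged sketch of step S30, the plane-regression head might look like the following. The patent does not disclose layer sizes, and its loss formula is published only as an image, so the dimensions and the Chamfer-style point-to-plane loss below are assumptions, not the disclosed method:

```python
import torch
import torch.nn as nn

class PlaneHead(nn.Module):
    """Step S30 sketch: regress plane quantity data and plane parameter data
    (normal and offset per plane) from the single-channel text score map."""
    def __init__(self, max_planes=8, feat_dim=64 * 64):  # illustrative sizes
        super().__init__()
        self.trunk = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim, 256), nn.ReLU())
        self.count = nn.Linear(256, 1)                    # plane quantity data
        self.params = nn.Linear(256, max_planes * 4)      # per plane: normal (3) + offset (1)

    def forward(self, score_map):
        h = self.trunk(score_map)
        return self.count(h), self.params(h).view(-1, 4)

def plane_loss(pred_points, gt_points):
    """Assumed Chamfer-style loss between predicted target points P_i^* and
    points P_j on the ground-truth planes (the actual formula is not
    reproduced in this text). Shapes: (N, 3) and (M, 3)."""
    d = torch.cdist(pred_points, gt_points)   # pairwise point distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```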
In one embodiment of the present invention, when step S40 is performed, text image information data within the image data is obtained based on the feature vector data. Specifically, the multi-channel geometry feature map data in the feature vector data may be input as a parameter into a convolutional layer of the convolutional neural network. A convolutional layer performs feature extraction on its input data; it contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feed-forward neural network. Each neuron in the convolutional layer is connected to multiple neurons in a nearby region of the previous layer, the size of that region depending on the size of the convolution kernel. The convolutional layer can thus output the text image information data within the image data, which may include the position data and direction data of the text blocks within the image data.
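Correspondingly, a minimal head for step S40 (again an illustrative assumption rather than the disclosed architecture) could map the multi-channel geometry feature maps to text-block position and direction data:

```python
import torch.nn as nn

class TextGeometryHead(nn.Module):
    """Step S40 sketch: predict text-block position data (four box distances)
    and direction data (rotation angle) from the multi-channel geometry maps."""
    def __init__(self, in_channels=5):  # channel count is an assumption
        super().__init__()
        self.position = nn.Conv2d(in_channels, 4, 1)   # d_t, d_b, d_l, d_r per pixel
        self.direction = nn.Conv2d(in_channels, 1, 1)  # theta per pixel

    def forward(self, geo_maps):
        return self.position(geo_maps), self.direction(geo_maps)
```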
Referring to fig. 3, in an embodiment of the invention, when step S50 is performed, the text data is obtained according to the plane information data and the text image information data. The sub-steps of step S50 may include:
Step S51, obtaining perspective transformation data according to the plane parameter data in the plane information data.
Step S52, obtaining text data according to the perspective transformation data and the position data of the text block in the text image information data, wherein the text data is text data in which the text blocks on different planes have been transformed to the same plane, upright and horizontally aligned.
In an embodiment of the present invention, specifically, when performing steps S51 and S52, the perspective transformation data, i.e., the perspective transformation matrix, may be generated according to the plane parameter data in the plane information data. The perspective transformation matrix is then multiplied by the position data of the text block to obtain the transformed feature matrix, i.e., the text data, which is character data in which the character blocks on different planes have been transformed to the same plane, upright and horizontally aligned. The perspective transformation process can be expressed as:
t_x = d_l*cosθ - d_t*sinθ - u
t_y = d_t*cosθ + d_l*sinθ - v
w = m*(d_l + d_r)
[x, y, z] = M[u, v, w]
wherein M represents the perspective transformation matrix;
h represents the height of the feature map after the affine transformation;
w represents the width of the feature map after the affine transformation;
d_t represents the distance from the feature point to the top of the text box, the feature point being the point that generates the single-channel pixel-level text score feature map and the multi-channel geometry feature maps, and the text box being the border marked around the periphery of the character region;
d_b represents the distance from the feature point to the bottom of the text box;
d_l represents the distance from the feature point to the leftmost part of the text box;
d_r represents the distance from the feature point to the rightmost part of the text box;
θ represents the rotation angle of the text box;
x, y and z represent the coordinates of the picture obtained through the perspective transformation;
u, v and w represent the coordinates of the feature point;
t_x represents the translation parameter along the x-axis direction in the perspective transformation;
t_y represents the translation parameter along the y-axis direction in the perspective transformation;
m represents the magnification parameter in the perspective transformation.
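The formulas above can be checked with a small numerical sketch. The matrix M is published only as an image, so the rotation-scale-translation form of build_M below is an assumption consistent with the parameters θ, m, t_x and t_y named in the text:

```python
import numpy as np

def perspective_params(d_l, d_t, d_r, theta, u, v, m=1.0):
    """Compute the transform parameters from the text-box geometry (formulas above)."""
    t_x = d_l * np.cos(theta) - d_t * np.sin(theta) - u
    t_y = d_t * np.cos(theta) + d_l * np.sin(theta) - v
    w = m * (d_l + d_r)  # width of the feature map after the affine transformation
    return t_x, t_y, w

def apply_transform(M, u, v, w):
    """[x, y, z] = M [u, v, w] in homogeneous coordinates."""
    x, y, z = M @ np.array([u, v, w])
    return x, y, z

def build_M(theta, t_x, t_y, m):
    """Assumed form of M: rotation by theta, scaling by m, translation (t_x, t_y)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[m * c, -m * s, t_x],
                     [m * s,  m * c, t_y],
                     [0.0,    0.0,   1.0]])
```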
In one embodiment of the present invention, when step S60 is performed, character content data is obtained based on the text data. Because the text data has the text blocks from different planes transformed to the same plane, upright and horizontally aligned, the character information in the text data can be recognized directly by a text recognition network. The text recognition network may comprise a fully convolutional network, a bidirectional LSTM network, a fully connected layer and a CTC decoder, so the final result can be output through the text recognition network to obtain the corresponding character content data.
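The text recognition network named above (fully convolutional network, bidirectional LSTM, fully connected layer, CTC decoding) could be sketched as follows; the patent names the components but not their sizes, so every dimension here is an illustrative assumption:

```python
import torch
import torch.nn as nn

class TextRecognizer(nn.Module):
    """Step S60 sketch: fully convolutional network -> bidirectional LSTM ->
    fully connected layer -> CTC decoding. Dimensions are illustrative."""
    def __init__(self, num_classes, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                      # x: (batch, 1, 32, width), already rectified
        f = self.conv(x)                       # (batch, 128, 8, width/2)
        f = f.permute(0, 3, 1, 2).flatten(2)   # one feature vector per horizontal step
        logits = self.fc(self.lstm(f)[0])
        return logits.log_softmax(-1)          # feed to nn.CTCLoss / greedy CTC decoding
```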
It can be seen that, in the above scheme, feature extraction is performed on image data containing text blocks on continuous multiple planes to obtain feature vector data; the plane quantity, plane parameters, text-block positions and text-block directions are then obtained from the feature vector data; text data is then obtained through a perspective transformation according to these quantities, with the text blocks on different planes transformed to the same plane, upright and horizontally aligned; and the final character content data is obtained by performing character recognition on this text data. Characters in text images lying on continuous multiple planes in a three-dimensional scene can thus be recognized.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Referring to fig. 4, the present invention further provides a device for recognizing a three-dimensional multi-plane text, where the device for recognizing a three-dimensional multi-plane text corresponds to the method for recognizing a three-dimensional multi-plane text in the foregoing embodiment one by one. The device for recognizing the three-dimensional multi-plane text can comprise a data acquisition module 10, an extraction module 20, a first processing module 30, a second processing module 40, a text processing module 50 and a word recognition module 60. The functional modules are explained in detail as follows:
the data acquisition module 10 may be used to acquire image data containing text blocks of successive multi-planes. The extraction module 20 may be configured to process the image data to obtain feature vector data. The first processing module 30 may be configured to obtain plane information data in the image data according to the feature vector data. The second processing module 40 may be configured to obtain text image information data in the image data according to the feature vector data. The text processing module 50 is configured to obtain text data according to the plane information data and the text image information data. The text recognition module 60 may be configured to obtain text content data according to the text data.
In one embodiment of the present invention, the data acquisition module 10 may be used to acquire image data containing text blocks on continuous multiple planes. Specifically, this image data may be a picture that includes multiple planes; for example, a picture of a wall corner includes three mutually perpendicular planes. Each plane may carry corresponding text, and the planes are connected to one another, so that image data containing text blocks on continuous planes is obtained.
Referring to fig. 5, in an embodiment of the invention, the extraction module 20 may include a residual module 21 and a pooling module 22. The residual module 21 may be configured to input the image data as a parameter into the residual network of the convolutional neural network in the character recognition model to obtain intermediate data, and the pooling module 22 may be configured to input the intermediate data as a parameter into the pooling layer of the convolutional neural network in the character recognition model to obtain the feature vector data.
In an embodiment of the present invention, the first processing module 30 may be configured to obtain plane information data in the image data according to the feature vector data, where the plane information data includes plane quantity data and plane parameter data. Specifically, the feature vector data may be used as a parameter and input into a full-link layer of the convolutional neural network to obtain plane information data in the image data.
In an embodiment of the present invention, the second processing module 40 may be configured to obtain text image information data in the image data according to the feature vector data. Specifically, the feature vector data may be used as a parameter and input into a convolution layer in the convolutional neural network, and the convolution layer may output text image information data in the image data, where the text image information data may include position data of a text block and text block direction data in the image data.
Referring to fig. 6, in an embodiment of the present invention, the text processing module 50 may include a perspective transformation module 51 and a text extraction module 52. The perspective transformation module 51 may be configured to obtain perspective transformation data according to the plane parameter data in the plane information data, and the text extraction module 52 may be configured to obtain text data according to the perspective transformation data and the position data of the text block in the text image information data, the text data being text data in which the text blocks on different planes have been transformed to the same plane, upright and horizontally aligned.
In one embodiment of the present invention, the character recognition module 60 may be configured to obtain character content data according to the text data. Because the text data has the text blocks from different planes transformed to the same plane, upright and horizontally aligned, the character information in the text data can be recognized directly by a text recognition network. The text recognition network may comprise a fully convolutional network, a bidirectional LSTM network, a fully connected layer and a CTC decoder, so the final result can be output through the text recognition network to obtain the corresponding character content data.
The recognition device for text located in three-dimensional multiple planes can thus recognize characters in text images lying on continuous multiple planes in a three-dimensional scene.
For specific limitations of the recognition device located in the three-dimensional multi-planar text, reference may be made to the above limitations of the recognition method located in the three-dimensional multi-planar text, and details are not repeated here. The modules in the device for recognizing three-dimensional multi-plane text can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
Referring to fig. 7, the present invention also provides a computer device including a processor, a memory, a network interface and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes non-volatile and/or volatile storage media, internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The network interface of the computer device is used for communicating with an external client through a network connection. The computer program is executed by a processor to implement the functions or steps of a recognition method for three-dimensional multi-planar text.
In one embodiment of the invention, a computer device may include a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
image data containing text blocks of successive multi-planes is acquired.
And processing the image data to obtain feature vector data.
And obtaining plane information data in the image data according to the feature vector data.
And obtaining text image information data in the image data according to the feature vector data.
And obtaining text data according to the plane information data and the text image information data.
And obtaining the character content data according to the text data.
It should be noted that, the functions or steps that can be implemented by the computer-readable storage medium or the computer device can be referred to the related descriptions of the server side and the client side in the foregoing method embodiments, and are not described here one by one to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
In the description of the present specification, reference to the description of the terms "present embodiment," "example," "specific example," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The embodiments of the invention disclosed above are intended merely to aid in the explanation of the invention. The examples are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (10)
1. A method for recognizing a text located in three-dimensional multi-planes, comprising:
acquiring image data containing text blocks on continuous multiple planes;
processing the image data to obtain feature vector data;
obtaining plane information data in the image data according to the feature vector data;
obtaining text image information data in the image data according to the feature vector data;
obtaining text data according to the plane information data and the text image information data;
and obtaining character content data according to the text data.
2. The method of claim 1, wherein the step of processing the image data to obtain feature vector data comprises:
inputting the image data as a parameter into a residual network of a convolutional neural network to obtain intermediate data;
inputting the intermediate data as a parameter into a pooling layer of the convolutional neural network to obtain the feature vector data;
the feature vector data comprises single-channel pixel-level text score feature map data and multi-channel geometry feature map data.
3. The method for recognizing the text located in the three-dimensional multi-plane according to claim 1, wherein the obtaining plane information data in the image data according to the feature vector data comprises:
inputting the single-channel pixel-level text score feature map data in the feature vector data into a fully connected layer network of a convolutional neural network to obtain the plane information data in the image data;
the plane information data comprises plane quantity data and plane parameter data;
the plane parameter data comprises an encoding, a normal and an offset;
the loss function for regressing the plane parameters of the fully connected layer network is expressed as follows (the formula is given only as an image in the original publication):
wherein S^* represents the required plane quantity data;
S represents the preset plane quantity data in the network;
P_i^* represents the three-dimensional coordinates of the predicted target point;
P_j represents the three-dimensional coordinates of the point on the plane.
4. The method for recognizing the text located in three-dimensional and multi-plane according to claim 1, wherein the obtaining the text image information data in the image data according to the feature vector data comprises:
inputting the multi-channel geometry feature map data in the feature vector data into a convolutional neural network to obtain the text image information data in the image data;
the text image information data comprises position data and text block direction data of a text block in the image data.
5. The method for recognizing the text positioned on the three-dimensional and multi-plane according to claim 1, wherein the step of obtaining the text data based on the plane information data and the text image information data comprises:
obtaining perspective transformation data according to plane parameter data in the plane information data;
and obtaining text data according to the perspective transformation data and the position data of the text block in the text image information data.
6. The method for recognizing the text located in the three-dimensional multi-plane according to claim 5, wherein the perspective transformation data is expressed as a perspective transformation matrix M (the matrix is given only as an image in the original publication);
wherein M represents the perspective transformation matrix, i.e., the perspective transformation data;
θ represents the rotation angle of the text box, the text box being the border marked around the periphery of the character region;
t_x represents the translation parameter along the x-axis direction in the perspective transformation;
t_y represents the translation parameter along the y-axis direction in the perspective transformation;
m represents the magnification parameter in the perspective transformation.
7. The method for recognizing the text located in the three-dimensional multi-plane according to claim 6, wherein:
the translation parameter t_x along the x-axis direction in the perspective transformation is expressed as: t_x = d_l*cosθ - d_t*sinθ - u;
the translation parameter t_y along the y-axis direction in the perspective transformation is expressed as: t_y = d_t*cosθ + d_l*sinθ - v;
the width w of the feature map after the affine transformation is expressed as: w = m*(d_l + d_r);
wherein d_t represents the distance from the feature point to the top of the text box, the feature point being the point that generates the single-channel pixel-level text score feature map and the multi-channel geometry feature maps;
d_b represents the distance from the feature point to the bottom of the text box;
d_l represents the distance from the feature point to the leftmost part of the text box;
d_r represents the distance from the feature point to the rightmost part of the text box;
h represents the height of the feature map after the affine transformation;
x, y and z represent the coordinates of the picture obtained through the perspective transformation;
u, v and w represent the coordinates of the feature point, and [x, y, z] = M[u, v, w].
8. An apparatus for recognizing a three-dimensional multi-plane text, comprising:
the data acquisition module is used for acquiring image data containing text blocks on continuous multiple planes;
the extraction module is used for processing the image data to obtain feature vector data;
the first processing module is used for obtaining plane information data in the image data according to the feature vector data;
the second processing module is used for obtaining text image information data in the image data according to the feature vector data;
the text processing module is used for obtaining text data according to the plane information data and the text image information data; and
and the character recognition module is used for obtaining character content data according to the text data.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for recognition of text lying in three-dimensional multi-planes according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method for recognition of text lying in three-dimensional multi-planes according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210310193.5A CN114913535B (en) | 2022-03-28 | 2022-03-28 | Identification method, device, equipment and medium for three-dimensional multi-plane text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210310193.5A CN114913535B (en) | 2022-03-28 | 2022-03-28 | Identification method, device, equipment and medium for three-dimensional multi-plane text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114913535A true CN114913535A (en) | 2022-08-16 |
CN114913535B CN114913535B (en) | 2024-07-26 |
Family
ID=82763521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210310193.5A Active CN114913535B (en) | 2022-03-28 | 2022-03-28 | Identification method, device, equipment and medium for three-dimensional multi-plane text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114913535B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956171A (en) * | 2019-11-06 | 2020-04-03 | 广州供电局有限公司 | Automatic nameplate identification method and device, computer equipment and storage medium |
CN112926616A (en) * | 2019-12-06 | 2021-06-08 | 顺丰科技有限公司 | Image matching method and device, electronic equipment and computer-readable storage medium |
US20210295114A1 (en) * | 2018-12-07 | 2021-09-23 | Huawei Technologies Co., Ltd. | Method and apparatus for extracting structured data from image, and device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210295114A1 (en) * | 2018-12-07 | 2021-09-23 | Huawei Technologies Co., Ltd. | Method and apparatus for extracting structured data from image, and device |
CN110956171A (en) * | 2019-11-06 | 2020-04-03 | 广州供电局有限公司 | Automatic nameplate identification method and device, computer equipment and storage medium |
CN112926616A (en) * | 2019-12-06 | 2021-06-08 | 顺丰科技有限公司 | Image matching method and device, electronic equipment and computer-readable storage medium |
Non-Patent Citations (2)
Title |
---|
宁煜西; 周铭; 李广强; 王宁: "Recognition of key information in flight-tracking video based on a convolutional neural network", Journal of Air Force Early Warning Academy, no. 05, 15 October 2018 (2018-10-15) *
张蕾; 赵海霞; 普杰信; 刘宏: "Indoor scene classification using stereo vision", Computer Engineering and Applications, no. 34, 24 October 2011 (2011-10-24) *
Also Published As
Publication number | Publication date |
---|---|
CN114913535B (en) | 2024-07-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112036395B (en) | Text classification recognition method and device based on target detection | |
CN107704857A (en) | A kind of lightweight licence plate recognition method and device end to end | |
CN110516541B (en) | Text positioning method and device, computer readable storage medium and computer equipment | |
CN108171122A (en) | The sorting technique of high-spectrum remote sensing based on full convolutional network | |
Follmann et al. | A rotationally-invariant convolution module by feature map back-rotation | |
CN110245621B (en) | Face recognition device, image processing method, feature extraction model, and storage medium | |
CN111310758A (en) | Text detection method and device, computer equipment and storage medium | |
CN108241861A (en) | A kind of data visualization method and equipment | |
CN112101386A (en) | Text detection method and device, computer equipment and storage medium | |
CN111414913B (en) | Character recognition method, recognition device and electronic equipment | |
CN114170231A (en) | Image semantic segmentation method and device based on convolutional neural network and electronic equipment | |
CN115083571A (en) | Pathological section processing method, computer device and storage medium | |
JP7552287B2 (en) | OBJECT DETECTION METHOD, OBJECT DETECTION DEVICE, AND COMPUTER PROGRAM | |
WO2024179388A1 (en) | Plankton object detection and classification method based on multi-layer neural network architecture | |
CN117237623B (en) | Semantic segmentation method and system for remote sensing image of unmanned aerial vehicle | |
CN106709490A (en) | Character recognition method and device | |
CN114913535B (en) | Identification method, device, equipment and medium for three-dimensional multi-plane text | |
CN111598100A (en) | Vehicle frame number identification method and device, computer equipment and storage medium | |
CN117746018A (en) | Customized intention understanding method and system for plane scanning image | |
CN117058554A (en) | Power equipment target detection method, model training method and device | |
CN107220651B (en) | Method and device for extracting image features | |
CN117037136A (en) | Scene text recognition method, system, equipment and storage medium | |
CN116128792A (en) | Image processing method and related equipment | |
CN114155540A (en) | Character recognition method, device and equipment based on deep learning and storage medium | |
Bhargavi et al. | A survey on recent deep learning architectures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |