CN115278108A - Writing shooting method and device, learning machine and storage medium - Google Patents
- Publication number
- CN115278108A CN115278108A CN202210886866.1A CN202210886866A CN115278108A CN 115278108 A CN115278108 A CN 115278108A CN 202210886866 A CN202210886866 A CN 202210886866A CN 115278108 A CN115278108 A CN 115278108A
- Authority
- CN
- China
- Prior art keywords
- camera
- cameras
- shooting
- writing
- desktop
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY › H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION › H04N5/262—Studio circuits for special effects › H04N5/2624—for obtaining an image composed of whole input images, e.g. split-screen
- H—ELECTRICITY › H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION › H04N5/262—Studio circuits for special effects › H04N5/265—Mixing
Abstract
The embodiments of this application disclose a writing shooting method and device, a learning machine, and a storage medium. The method is applied to a learning machine provided with a camera module comprising a plurality of cameras. The field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera, and the spacing between the cameras is determined by the number of cameras and the shooting range. The method comprises: controlling all cameras to shoot the desktop, on which a text object is placed, to obtain a plurality of sub-images; detecting, among the sub-images, a target sub-image containing a target object, where the target object is related to a writing action received by the text object; and keeping the camera corresponding to the target sub-image shooting while closing the other cameras. This scheme solves the prior-art technical problem that the definition of the shot content cannot be guaranteed when a wide-angle camera is used.
Description
Technical Field
The embodiments of this application relate to the technical field of learning machines, and in particular to a writing shooting method and device, a learning machine, and a storage medium.
Background
At present, artificial intelligence technology is widely applied across industries. In the education industry, for example, it is applied to learning machines, which serve as auxiliary devices during a user's learning process. In these applications the camera is a key device: it can shoot content such as the user's handwriting practice and handwritten answers, after which artificial intelligence technology recognizes the handwritten content, enabling functions such as automatically collecting wrong questions and finding weak knowledge points, so that the user can study in a targeted manner.
Understandably, the clearer the camera's shot, the more accurate the subsequent recognition of answers. Referring to FIG. 1, when the learning machine 1 is fixed on the desktop 2 and shoots the workbook or test paper 3 placed on the desktop 2, the camera 4 must be tilted toward the desktop 2, so the depth of field and the field angle of the camera affect both the shooting definition and the shooting range. When the workbook or test paper 3 is large, only a wide-angle camera can capture it in full; however, a wide-angle camera produces low definition at the edges of the image, which hinders subsequent recognition.
In summary, ensuring the definition of the shot content when shooting with a learning machine's camera is a technical problem that urgently needs to be solved.
Disclosure of Invention
This application provides a writing shooting method and device, a learning machine, and a storage medium, aiming to solve the prior-art technical problem that the definition of shot content cannot be guaranteed when a wide-angle camera is used.
In a first aspect, an embodiment of this application provides a writing shooting method applied to a learning machine. The learning machine is provided with a camera module comprising a plurality of cameras; the field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera; and the spacing between the cameras is determined by the number of cameras and the shooting range;
the method comprises the following steps:
controlling all the cameras to shoot the desktop to obtain a plurality of sub-images, where a text object is placed on the desktop and each camera corresponds to one sub-image;
detecting, among the plurality of sub-images, a target sub-image containing a target object, where the target object is related to a writing action received by the text object;
and keeping the camera corresponding to the target sub-image shooting, and closing the other cameras.
In a second aspect, an embodiment of this application further provides a writing shooting device applied to a learning machine. The learning machine is provided with a camera module comprising a plurality of cameras; the field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera; and the spacing between the cameras is determined by the number of cameras and the shooting range;
the device comprises:
a first shooting unit, configured to control all the cameras to shoot the desktop to obtain a plurality of sub-images, where a text object is placed on the desktop and each camera corresponds to one sub-image;
a first detection unit, configured to detect, among the plurality of sub-images, a target sub-image containing a target object, where the target object is related to a writing action received by the text object;
and a second shooting unit, configured to keep the camera corresponding to the target sub-image shooting and to close the other cameras.
In a third aspect, an embodiment of this application further provides a learning machine comprising a camera module, one or more processors, and a memory. The camera module comprises a plurality of cameras; the field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera; and the spacing between the cameras is determined by the number of cameras and the shooting range;
the camera module is used for shooting according to instructions from the processor;
the memory is used for storing one or more programs;
and when the one or more programs are executed by the one or more processors, the one or more processors implement the writing shooting method of the first aspect.
In a fourth aspect, embodiments of this application further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the writing shooting method of the first aspect.
With the writing shooting method and device, learning machine, and storage medium above, all cameras are controlled to shoot a desktop on which a text object is placed, the target sub-image containing the target object is identified among the resulting sub-images, and then only the camera corresponding to the target sub-image keeps shooting while the other cameras are closed to obtain the writing picture. This solves the prior-art problem that the definition of shot content cannot be guaranteed with a wide-angle camera: dividing the area among multiple cameras reduces each camera's shooting range, so that with an unchanged pixel count each camera describes less content and therefore renders it more clearly, while the union of the cameras' shooting ranges serves as the shooting range of the learning machine's camera module, guaranteeing a field of view (that is, a shooting range) large enough to meet users' actual needs. Furthermore, because the field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera, a reasonable field angle can be set so that each camera's field of view is smaller and the edge of the shooting range is closer to the center, improving edge definition and facilitating subsequent recognition and analysis. Moreover, determining the spacing between the cameras from the number of cameras and the maximum shooting range ensures a reasonable arrangement of the cameras within the learning machine.
Drawings
FIG. 1 is a schematic illustration of the placement of a prior-art learning machine;
FIG. 2 is a schematic distribution diagram of cameras according to an embodiment of the present application;
FIG. 3 is a schematic distribution diagram of cameras according to an embodiment of the present application;
FIG. 4 is a schematic distribution diagram of cameras according to an embodiment of the present application;
FIG. 5 is a schematic distribution diagram of cameras according to an embodiment of the present application;
FIG. 6 is a schematic position diagram of a reflector and a camera according to an embodiment of the present application;
FIG. 7 is a captured image provided by an embodiment of the present application;
FIG. 8 is a schematic imaging diagram of a shooting range according to an embodiment of the present application;
FIG. 9 is a flowchart of a writing shooting method according to an embodiment of the present application;
FIG. 10 is a flowchart of a writing shooting method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the relative positions of a learning machine and a desktop according to an embodiment of the present application;
FIG. 12 is a plan view of triangle ABC in FIG. 11;
FIG. 13 is a plan view of triangle AED in FIG. 11;
FIG. 14 is a schematic diagram of the relationship between field angle and shooting range according to an embodiment of the present application;
FIG. 15 is a schematic diagram of camera spacing according to an embodiment of the present application;
FIG. 16 is a schematic diagram of another camera spacing according to an embodiment of the present application;
FIG. 17 is a schematic structural diagram of a writing shooting device according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of a learning machine according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the application, not to limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present application rather than all structures.
One embodiment of the present application provides a writing shooting method that may be performed by a writing shooting apparatus. The apparatus may be implemented in software and/or hardware and integrated into writing shooting equipment, which may consist of one physical entity or of two or more; the embodiments do not limit this. The writing shooting equipment may be a tablet computer, a notebook computer, a mobile phone, a learning machine, and the like; here a learning machine is taken as the example. As an auxiliary device for the user's learning, the learning machine can make study plans, display teaching courses, recommend exercises, shoot the user's learning or question-answering process, correct the exercises the user answers, and so on; it may also be called a learning device, a learning terminal, etc. In use, the learning machine can be fixed on the desktop so as to shoot the user's learning or writing process on the desktop. The manner of fixing is not limited here; for example, a stand may be placed on the desktop and the learning machine placed on the stand. In this embodiment the structure of the stand is fixed, so when the learning machine is placed on the stand, its included angle with the desktop is fixed and known.
The learning machine is provided with at least one operating system, under which at least one application program can be installed. An application may come with the operating system or be downloaded from a third-party device or server; this is not limited here. The learning machine is also provided with a display screen, which may have a touch function, and with a communication device through which online courses, exercise answers, and the like can be obtained.
In one embodiment, the learning machine is further provided with a camera module comprising a plurality of cameras. The field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera, and the spacing between the cameras is determined by the number of cameras and the shooting range.
For example, FIGS. 2 to 5 are schematic distribution diagrams of cameras provided by embodiments of the present application; they show the arrangements (that is, the relative positions) of the cameras 51 when the camera module 5 contains two, three, five, or seven cameras. The installation positions of the cameras in the learning machine can be determined from the actual shooting requirements; for example, several cameras may be installed on one frame edge of the learning machine facing the user, or across several such frame edges, in which case the cameras can be understood as the learning machine's front cameras. In an embodiment, the camera module may further be provided with reflectors, where each camera may be configured with its own reflector or several cameras may share one; a reflector reflects external light into the camera for imaging. FIG. 6 is a schematic position diagram of a reflector and a camera provided by an embodiment of the present application: referring to FIG. 6, the reflector 52 is located in front of the camera 51 at an angle, so as to reflect the content to be photographed 6 below into the camera for imaging. Optionally, the reflector is made larger than the area covered by the camera's field angle, so that when the camera shoots via the reflector it cannot capture content outside the reflector.
Illustratively, the cameras are of the same type, e.g., all fixed-focus or all zoom cameras. Further, each camera shoots with the same parameters (such as focal length and field angle) and therefore has the same shooting range, where the shooting range is the extent of the real world the camera can capture; since the learning machine here is fixed on the desktop and shoots it, the shooting range in the embodiments is the extent of the desktop the camera can capture. Within that range, the farther part corresponds to a larger actual area per unit of the camera's photosensitive area. For example, referring to FIG. 7, a captured image provided by an embodiment of the present application, the near horizontal extent (the lower end of FIG. 7) covers less than one line of text while the far horizontal extent (the upper end of FIG. 7) covers more than one line (part of the text in FIG. 7 is mosaicked). Yet the far and near extents occupy the same number of pixels in the captured picture. FIG. 8 is a schematic imaging diagram of the shooting range provided by an embodiment of the present application, taking a 5 x 5 pixel image as an example: when the camera 51 shoots the content to be photographed 6, a 5 x 5 pixel image is obtained, in which the far extent (as at the upper end of FIG. 7) is rendered by the 5 horizontal pixels of the top row and the near extent (as at the lower end of FIG. 7) by the 5 horizontal pixels of the bottom row. Content of the same physical size is therefore rendered with more pixels when near, and so appears clearer. On this basis, the embodiment sets a suitable field angle for each camera (generally smaller than the maximum field angle of a single camera), making the camera's shooting range suitably smaller so that more pixels present the content within it; the captured image is then clearer, distortion is relatively smaller, and subsequent image processing is easier. The field angle may also be called the field of view, and its size determines the visual range (here, the shooting range) of an optical device (here, a camera). In the embodiment, when the learning machine is fixed on the desktop, its included angle with the desktop is fixed; a suitable field angle is then set for each camera by combining the camera's installation position on the learning machine, the shooting range of the desktop required in the learning machine's application, and the camera's vertical height above the desktop. The shooting range used here is the range the user expects the camera to capture, and in principle, once a suitable field angle is set, the camera can cover the required shooting range. It should be noted that the shooting ranges of adjacently positioned cameras intersect or adjoin, and the union of the shooting ranges may be regarded as the total shooting range of the camera module.
Further, the spacing between the cameras installed in the learning machine may be determined by combining the number of cameras with their shooting ranges. For example, with two cameras, the maximum spacing is the length of the edge of the shooting range farthest from the camera, which keeps the two shooting ranges close together and avoids missing any content. In one embodiment, given the learning machine's application scenario, the total shooting range of the camera module should be greater than or equal to the area of the books or test papers the user studies with; since these are at most A3-sized, the total shooting range should cover at least an A3 area. A3 is 297 x 420 mm, so the total shooting range of the camera module is 297 x 420 mm. Since the total shooting range and the number of cameras are both known, a suitable shooting range can be set for each camera, from which its field angle and the camera spacing are determined, and in turn the installation positions of the cameras in the learning machine.
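As an illustrative sketch of this sizing logic (the single-row arrangement, the 10 mm overlap, and the function name are assumptions for illustration, not taken from the embodiment):

```python
# Sketch: n cameras arranged in a row tile an A3-sized total shooting range
# (297 mm x 420 mm) along its long side, with a small overlap between
# neighbouring ranges so that no content is missed.

A3_W_MM, A3_H_MM = 420.0, 297.0   # A3 in landscape: 420 mm x 297 mm

def per_camera_range(n_cameras: int, overlap_mm: float = 10.0) -> float:
    """Width (mm) each camera must cover when n cameras tile the long side."""
    if n_cameras < 1:
        raise ValueError("need at least one camera")
    if n_cameras == 1:
        return A3_W_MM
    # Each camera covers an equal slice plus the overlap shared with a neighbour.
    return A3_W_MM / n_cameras + overlap_mm

w2 = per_camera_range(2)   # two cameras: 210 mm slice + 10 mm overlap = 220 mm each
```

With the slice width chosen this way, the union of the n ranges always covers the full 420 mm side, matching the requirement above that the union of the shooting ranges forms the module's total shooting range.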
FIG. 9 is a flowchart of a writing shooting method according to an embodiment of the present application. Referring to FIG. 9, the writing shooting method includes:
Step 110: control all the cameras to shoot the desktop to obtain a plurality of sub-images, where a text object is placed on the desktop and each camera corresponds to one sub-image.
Here the learning machine is fixed on the desktop and the camera module is on, so the learning machine can control each camera of the module to shoot. In one embodiment, a camera may shoot the desktop directly or via its reflector; either way, the desktop is captured. A text object is placed on the desktop, the text object being a book or paper containing text content — here, a book or a test paper — and each camera's shooting of the desktop specifically means shooting the text object.
Optionally, a camera may shoot continuously or capture only a single image; the embodiment does not limit this. The image shot by each camera is recorded as a sub-image. Each sub-image shows the sub-region of the text object within the corresponding camera's shooting range, and splicing the cameras' sub-images yields an image containing the complete text object. The sub-images acquired here are those shot by the cameras at the same moment.
Step 120: detect, among the plurality of sub-images, a target sub-image containing a target object, where the target object is related to the writing action received by the text object.

For example, after each sub-image is acquired, it is identified to determine whether it contains the target object. The target object is an object related to the writing action performed when the user writes on the text object; in one embodiment, it is a pen and/or a human hand. Users generally operate on the text object with a pen (a writing pen, a point-and-read pen, etc.) or a hand. When the target object is a pen, it can specifically be the pen tip and pen point: the pen tip is the part of the pen that contacts the text object, and the pen point is the part connecting the pen tip to the pen holder. The manner of identifying whether a sub-image contains the target object is not limited here; for example, a (trained) neural network model may be used for the identification, or image recognition or image tracking technology may be applied.
For example, if the target object is identified in a sub-image, it is determined that the user may be writing on the part of the text object captured in that sub-image, i.e., a writing action has occurred. Generally only one sub-image contains the target object at any moment; the sub-image containing it is recorded as the target sub-image.
Step 130: keep the camera corresponding to the target sub-image shooting, and close the other cameras.
For example, after the target sub-image is determined, the camera that captured it (i.e., the camera corresponding to the target sub-image) is determined and controlled to continue shooting continuously, while the other cameras are closed to save the learning machine's hardware resources.
Optionally, after the camera corresponding to the target sub-image continues shooting, the continuously acquired sub-images form the user's writing picture, from which the learning machine can identify the track written by the user and determine the writing content for subsequent functions: in an answering scene, judging from the writing content whether the answer is correct; in a learning scene, determining from it the user's mastery of the knowledge.
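The flow of steps 110 to 130 can be sketched as a minimal control loop (the camera-capture and detector interfaces here are hypothetical stand-ins for illustration, not APIs from the embodiment):

```python
from typing import Callable, List, Optional

def select_writing_camera(
    capture_all: Callable[[], List[object]],    # step 110: one sub-image per camera
    contains_target: Callable[[object], bool],  # stub for pen/hand detection
) -> Optional[int]:
    """Shoot with every camera and return the index of the camera whose
    sub-image contains the target object (pen tip / hand), or None if the
    user is not writing anywhere."""
    sub_images = capture_all()                  # step 110
    for idx, img in enumerate(sub_images):
        if contains_target(img):                # step 120
            return idx                          # step 130: keep this camera only
    return None                                 # no writing: keep scanning with all cameras
```

The caller would keep the returned camera shooting and power down the rest; a None result means all cameras stay on and the scan repeats.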
In the writing shooting method above, all cameras are controlled to shoot a desktop on which a text object is placed, the target sub-image containing the target object is identified among the resulting sub-images, and then only the camera corresponding to the target sub-image keeps shooting while the other cameras are closed to obtain the writing picture. Furthermore, because the field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop to shoot it, together with the shooting range required of each camera, a reasonable field angle can be set so that each camera's field of view is smaller and the edge of the shooting range is closer to the center, improving edge definition and facilitating subsequent recognition and analysis. Moreover, determining the camera spacing from the number of cameras and the maximum shooting range ensures a reasonable arrangement of the cameras within the learning machine.
In one embodiment, because the user moves while writing, i.e., the writing position changes, the camera that captures the target object also changes, and the learning machine must promptly determine which camera captures the target object in order to track the writing. Accordingly, while keeping the camera corresponding to the target sub-image shooting, the method further includes: continuously detecting the target object in the target sub-image. After keeping the camera corresponding to the target sub-image shooting and closing the other cameras, the method further includes: when the target sub-image is detected not to contain the target object, returning to the operation of controlling all the cameras to shoot the desktop.
For example, while the camera corresponding to the target sub-image shoots the writing picture, the learning machine examines the real-time target sub-images it continuously captures to determine whether they contain the target object. If they do, the user is still writing within that camera's shooting range; if not, the user has changed writing position or finished writing, and further judgment is needed. For that judgment, the learning machine again controls all cameras to shoot the desktop — i.e., it returns to the operation of controlling all cameras to shoot the desktop — acquires the sub-images from every camera, identifies the target sub-image containing the target object, keeps the corresponding camera shooting, and closes the others. Note that if no sub-image contains the target object, i.e., the learning machine recognizes no target object, it concludes that the user is not writing and continues controlling every camera to shoot so as to keep detecting the target object.
Optionally, a time length is preset, and a specific value of the time length may be set according to an actual situation. And when the target sub-image is detected not to contain the target object, starting timing, and continuously identifying whether the target sub-image obtained in real time contains the target object. And if the target object is detected, continuing to keep the camera corresponding to the target subimage for shooting, closing other cameras, and if the timed duration reaches the preset time length and the target object is not detected yet, determining that the user does not write in the current shooting range, and at the moment, returning to execute the operation of controlling all the cameras to shoot the desktop so as to avoid the influence of the writing pause of the user on the detection result.
The target object is detected in real time when the camera corresponding to the target sub-image continuously shoots, and when the target object is not detected, all the cameras are controlled to shoot again so as to detect the target sub-image containing the target object again and further control the corresponding camera to shoot.
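The camera-switching behaviour described above (keep one camera shooting, start a timer on the first missed detection, and fall back to scanning with all cameras only once the preset duration elapses) can be sketched as a small state machine. This is an illustrative sketch, not the patent's implementation; frames are stood in for by arbitrary objects and `contains_target` is an assumed detector callback.

```python
def choose_camera(frames, contains_target):
    """Scan the sub-images from all cameras; return the index of the first
    one containing the target object (e.g. pen tip or hand), else None."""
    for i, frame in enumerate(frames):
        if contains_target(frame):
            return i
    return None

class WritingTracker:
    """Keeps one camera active while the target stays in view.

    A missed detection starts a timer; only after `timeout` seconds without
    the target does the tracker return to scanning mode, in which all
    cameras shoot and a new active camera is chosen. This mirrors the
    preset-duration rule, so a brief writing pause does not trigger a rescan.
    """

    def __init__(self, timeout=2.0):
        self.timeout = timeout
        self.active = None       # index of the camera kept shooting
        self.miss_since = None   # timestamp of the first consecutive miss

    def step(self, now, frames, contains_target):
        if self.active is None:
            # Scanning mode: all cameras shoot; pick the one seeing the target.
            self.active = choose_camera(frames, contains_target)
            self.miss_since = None
        elif contains_target(frames[self.active]):
            self.miss_since = None          # target still in view: keep camera
        elif self.miss_since is None:
            self.miss_since = now           # first miss: start timing
        elif now - self.miss_since >= self.timeout:
            self.active = None              # timed out: rescan with all cameras
        return self.active
```

With a 2-second timeout, a camera that loses the target at t = 2 is only abandoned once the target has stayed absent until t ≥ 4, at which point all cameras shoot again.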
Fig. 10 is a flowchart of a writing shooting method according to an embodiment of the present application. In this embodiment, the camera module includes two cameras with the same field angle. The field angle is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop and shoots it, and by the shooting range required of each camera. The calculation formula of the field angle is as follows:
Fov=2arctan(s/h)
h=AF/sinα
Fov represents the field angle; s represents half the length of the farthest shooting edge when the camera shoots, the farthest shooting edge being determined according to the shooting range; AF represents the vertical height between the learning machine and the desktop when the learning machine is fixed on the desktop; h represents the length of the line connecting the camera and the midpoint of the farthest shooting edge when the learning machine is fixed on the desktop and shoots it; and α represents the included angle between that line and the desktop.
For example, the description here takes a camera module containing two cameras. Fig. 11 is a schematic diagram of the relative position between a learning machine and a desktop according to an embodiment of the present application. Referring to fig. 11, when the learning machine 1 is fixed on a desk, each camera 51 is positioned at the top of the side of the learning machine 1 facing the user. The included angle between the learning machine 1 and the desktop is denoted θ. The total shooting range of the camera module needs to cover an A3-sized area, that is, the transverse shooting range (parallel to the contact line between the learning machine and the desktop) must reach 420 mm and the longitudinal shooting range must reach 297 mm. With two cameras 51, the shooting range required of each camera 51 on the desktop should therefore be at least 297 × 210 mm, i.e. each camera's transverse shooting range should be at least 210 mm. Taking one camera as an example, within its shooting range the edge farthest from the camera and parallel to the contact edge between the learning machine and the desktop (edge BC in fig. 11) should be at least 210 mm; its specific value may be set according to the actual situation. Denote edge BC the farthest shooting edge and its midpoint as point D, so that BD = DC. For ease of understanding, mark the camera as point A, draw a perpendicular from the camera to the bottom edge of the learning machine (the edge touching the desktop) and mark the foot as point E; this gives triangle ABC and triangle AED. Fig. 12 is a plan view of triangle ABC in fig. 11, and fig. 13 is a plan view of triangle AED in fig. 11. Triangle ABC is isosceles; angle BAC is the field angle of the camera, and side AD is the distance from point A (the camera) to side BC (the farthest shooting edge), with AD perpendicular to BC. Denote the length of AD as h, i.e. h is the length of the line connecting the camera and the midpoint of the farthest shooting edge when the learning machine is fixed on the desktop and shoots it. From the trigonometric relation, tan(Fov/2) = s/h, where s is half the length of the farthest shooting edge. Once the required shooting range is determined, the value of s is set, so Fov follows once h is obtained. For triangle AED, drawing a perpendicular from point A to edge ED with the foot marked as point F yields triangle AFD, referring to fig. 13. The length of AF is the vertical distance from the camera to the desktop, i.e. the vertical height between the learning machine and the desktop when the learning machine is fixed on it. The length of AD is h, angle AED is θ, and side AE is the width of the learning machine (the length of the edge not parallel to the desktop when fixed on it), denoted d. As can be seen from fig. 13, AF is determined by d and θ: AF = d × sin θ. With AF and AE known, the length of FE follows from trigonometry or the Pythagorean theorem; since the length of DE is known (297 mm at the maximum shooting range), FD can be obtained, and then AD, i.e. h, follows from the Pythagorean theorem, giving Fov. Alternatively, once the learning machine is fixed, the angle between the desktop and the line connecting the camera to the farthest shooting edge, i.e. angle ADE, is known when the camera shoots the desktop; it can be understood that this angle is independent of the height between the camera and the desktop. Denote angle ADE as α, so α also represents the angle between line AD and the desktop, and sin α = AF/h. Since α and AF are determined, h follows from this formula, and Fov follows from h and s. It can further be noted, referring to fig. 13, that the sine rule in triangle AED gives h/sin θ = d/sin α, so the relationship between h and d is h = d × sin θ/sin α. Fig. 14 is a schematic diagram of the relationship between field angle and shooting range according to an embodiment of the present application. Referring to fig. 14, for a fixed field angle, the closer the farthest shooting edge is to the camera, the smaller the shooting range: in fig. 14, edge a1 is closer to the camera than edge a2, so the shooting range with a1 as the farthest shooting edge is smaller than that with a2.
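The two routes to h set out above can be checked numerically against each other. The sketch below is an illustrative aid with made-up dimensions (not from the patent); it encodes Fov = 2·arctan(s/h), h = AF/sin α, AF = d·sin θ, and the sine-rule form h = d·sin θ/sin α.

```python
import math

def fov_from_height(s, h):
    """Fov = 2*arctan(s/h): s is half the farthest shooting edge,
    h the distance from the camera to that edge's midpoint."""
    return 2 * math.atan(s / h)

def slant_from_angle(af, alpha):
    """h = AF / sin(alpha): AF is the camera's vertical height above the
    desktop, alpha the angle between line AD and the desktop."""
    return af / math.sin(alpha)

def slant_from_machine(d, theta, alpha):
    """Sine rule in triangle AED: h/sin(theta) = d/sin(alpha), where d is
    the learning machine's width and theta its tilt against the desktop."""
    return d * math.sin(theta) / math.sin(alpha)

# Illustrative dimensions: a 0.2 m wide machine tilted 70 degrees, line AD
# meeting the desktop at 30 degrees, s = 105 mm (half of a 210 mm edge).
d, theta, alpha, s = 0.2, math.radians(70), math.radians(30), 0.105
af = d * math.sin(theta)                  # AF = d * sin(theta)
h1 = slant_from_angle(af, alpha)          # route 1: via AF and alpha
h2 = slant_from_machine(d, theta, alpha)  # route 2: sine rule
fov_deg = math.degrees(fov_from_height(s, h1))
```

Both routes agree on h, and for dimensions of this order the resulting field angle comes out at a few tens of degrees.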
In one embodiment, the camera module includes two cameras, and the distance between the two cameras is less than or equal to the length of the farthest shooting edge when a camera shoots the desktop, the farthest shooting edge being determined according to the shooting range. For example, taking fig. 11 as an example, the length of the farthest shooting edge (edge BC) of one camera is 2s, where the value of s is preset. When the distance between the two cameras is 2s, their maximum shooting ranges do not overlap: fig. 15 is a schematic diagram of the camera spacing provided in an embodiment of the present application, and when the spacing is denoted L and L = 2s, the plane containing the cameras and the farthest shooting edges of their shooting ranges (i.e. the plane of triangle ABC) is as shown in fig. 15; the two farthest shooting edges are adjacent, i.e. the two shooting ranges abut. Fig. 16 is a schematic diagram of another camera spacing provided in an embodiment of the present application. Referring to fig. 16, when the spacing between the two cameras is 0, the two cameras are effectively a single camera; the plan view of the cameras and the farthest shooting edges is as shown in fig. 16, and the two shooting ranges completely overlap. Based on this, the spacing between the two cameras should be greater than 0 and less than or equal to the length of the farthest shooting edge.
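The spacing constraint reduces to a one-line check: with each camera covering a farthest edge of length 2s, a spacing L in (0, 2s] leaves the two ranges overlapping or exactly abutting, and the combined transverse coverage is 2s + L. A small illustrative sketch:

```python
def valid_spacing(spacing, s):
    """True when 0 < L <= 2*s: L = 0 collapses the two views into one,
    while L > 2*s opens a gap between the two shooting ranges."""
    return 0 < spacing <= 2 * s

def transverse_coverage(spacing, s):
    """Combined transverse range of two cameras: each covers 2*s and the
    overlap is (2*s - L), so the union spans 2*s + L. None if L is invalid."""
    return 2 * s + spacing if valid_spacing(spacing, s) else None
```

For the A3 example above (s = 105 mm), a spacing of 210 mm makes the two ranges abut and spans the full 420 mm transverse requirement.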
On this basis, referring to fig. 10, the writing photographing method includes:
In one embodiment, the text object includes the topic of at least one exercise. The type of exercise is not limited here.
For example, when the camera module adopts the structure shown in fig. 2, the two cameras are arranged horizontally side by side, so the sub-images they capture are also in a horizontal side-by-side relationship. Based on this relationship, the two sub-images can be stitched into a single image, which can be regarded as an image taken by the camera module and which contains the complete text object. The stitched image is referred to here as the text image.
In one embodiment, during stitching, the arrangement of the sub-images is determined based on the relative positions of the cameras. Then, for each pair of adjacent sub-images, the identical part of the two sub-images is found. It can be understood that, because the shooting ranges of adjacent cameras partially overlap, the sub-images they capture share some identical content, which can be located during stitching. It can also be understood that, because adjacent cameras occupy different positions on the learning machine, the shared content in the two sub-images may differ somewhat, so the shared content in at least one sub-image needs to be transformed to align it across the two sub-images. Aligning identical content across adjacent sub-images is an established technique and is not described here. Once the shared content of adjacent sub-images is aligned, the images can be stitched based on that shared content.
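A production stitcher would use feature matching and a homography (e.g. with OpenCV) to align the shared content, as noted above. The toy sketch below shows only the core idea of joining on shared content, reduced to one dimension, with each sub-image stood in for by a string of columns; it is entirely illustrative.

```python
def stitch(left, right):
    """Join two overlapping 'sub-images' (strings of columns) on the widest
    run where left's tail equals right's head - the shared content captured
    by both cameras. Falls back to plain concatenation if nothing matches."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + right
```

For example, `stitch("ABCDE", "DEFGH")` joins the two pieces on the shared columns "DE".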
And step 230, performing text recognition on the text image to determine the title content in the text object.
Illustratively, text recognition is performed on the text image to obtain the text content of the text object; here, that text content is the topic content of the exercises. The technical means used for text recognition is not limited: for example, a neural network for text recognition may be trained, and the topic content obtained by feeding the text image into it; or the text image may be processed by optical character recognition (OCR) to obtain the topic content it contains. In general, in a text image, text (including Chinese characters, English characters, numerals, symbols, and the like) appears only in the text object, so the text content recognized from the text image can be taken as the topic content of the text object. Alternatively, the text object in the text image is identified first, and text recognition is then applied to the text object to obtain its topic content.
And step 240, searching answer content corresponding to the topic content in a preset topic library.
The exercise bank contains a large number of exercise topics (i.e., topic content) and the corresponding answer content. The exercise bank may be stored in the learning machine or in a background server of the learning machine. Optionally, the exercise bank may be classified by subject and grade.
Illustratively, after the topic content is obtained, one or more topic libraries are accessed to find the same problem as the topic content in the topic library. In one embodiment, one or more exercises most similar to the topic content are searched in the topic library by calculating the topic similarity, and then the answer content of the searched exercises is obtained and used as the answer content corresponding to the topic content. At this time, the obtained answer content may be used to determine the accuracy of the current answer of the user.
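The similarity lookup can be sketched with the standard library's `difflib`; a real system would likely use a more OCR-robust similarity measure, and the exercise bank here is a made-up dict mapping topic text to answer text.

```python
import difflib

def find_answer(topic, bank):
    """Return the answer of the exercise in `bank` most similar to `topic`,
    using a character-level similarity ratio as a stand-in for the
    topic-similarity computation described above."""
    best = max(bank, key=lambda q: difflib.SequenceMatcher(None, topic, q).ratio())
    return bank[best]
```

Even a slightly different rendering of the topic (e.g. spacing lost during recognition) still retrieves the intended answer, because it remains the closest match in the bank.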
Optionally, when the target object cannot be detected, the multiple cameras need to be controlled again to shoot so as to detect the target object again, and in the process, the topic content does not need to be identified.
And step 260, keeping the camera corresponding to the target sub-image for shooting, closing other cameras, and recognizing a writing track corresponding to the writing action according to the target sub-image acquired in real time.
In the shooting process, after each target sub-image is obtained, the target sub-image is compared with the previous target sub-image to determine a writing track newly written in the current target sub-image, and according to the method, the writing track written in real time by the user can be obtained. Meanwhile, the writing sequence of the writing tracks can be obtained.
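The frame-to-frame comparison above can be sketched in pure Python on binarised target sub-images; a real pipeline would first binarise the camera frames and suppress the hand region, whereas this toy operates directly on 0/1 grids.

```python
def new_strokes(prev, curr):
    """Return the pixels inked in `curr` but not in `prev` - the trace
    newly written between two consecutive target sub-images."""
    return {(r, c)
            for r, row in enumerate(curr)
            for c, v in enumerate(row)
            if v and not prev[r][c]}
```

Accumulating these per-frame sets in order also yields the writing sequence of the trace, as noted above.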
And 270, determining the writing content according to the writing track.
Optionally, a text library is pre-constructed in which the trajectories of texts are recorded; the currently recognized writing trajectory is then compared with the trajectories in the text library to determine which text it represents, yielding the writing content. It can be understood that when a user writes several texts in succession, whether a new text has begun can be determined from the position interval and the pause time between writing trajectories. For example, when the user writes "我们" ("we"), after the character "我" is completed there is a certain position interval, and a pause, before the first stroke of the character "们"; based on this position interval and pause time it can be determined that a new text, "们", is being written.
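The position-interval-plus-pause rule for deciding when a new character begins can be sketched as follows; the threshold values and the stroke representation (start time, end time, horizontal start position) are assumptions for illustration only.

```python
def segment_characters(strokes, min_gap=20.0, min_pause=0.6):
    """Group strokes into characters. A new character starts when BOTH the
    horizontal gap and the pause since the previous stroke exceed the
    thresholds, mirroring the position-interval + pause-time rule above.

    strokes: list of (t_start, t_end, x_start), in writing order.
    """
    characters, current = [], []
    for stroke in strokes:
        if current:
            t_start, _, x_start = stroke
            _, prev_end, prev_x = current[-1]
            if x_start - prev_x > min_gap and t_start - prev_end > min_pause:
                characters.append(current)   # close the previous character
                current = []
        current.append(stroke)
    if current:
        characters.append(current)
    return characters
```

Requiring both conditions avoids splitting a character whose strokes are widely spaced but written without pause, and vice versa.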
Optionally, a neural network for text recognition is constructed, and after the writing trajectory is input into the neural network, the writing content output by the neural network can be obtained.
Illustratively, the writing content is compared with the corresponding answer content, if the writing content is the same as the corresponding answer content, the writing content is determined to be correct, namely the user answers the question accurately, otherwise, the writing content is determined to be wrong, namely the user answers the question incorrectly. This process may be considered a process of modifying the answer. It can be understood that since the writing content is determined in real time during the writing process, the writing content determined in real time is compared with the answer content, and real-time correction in the answering process can be realized.
Here, only the camera corresponding to the target sub-image is shooting, and that camera may capture the topic content of only one or a few exercises. It is therefore necessary first to determine which topic the user is currently answering and only then to judge whether the writing content is accurate. In this case, the comparison of the writing content with the answer content in this step may include: determining the topic content contained in the target sub-image according to the relative positions of the cameras; and comparing the writing content with the answer content corresponding to that topic content.
For example, since the learning machine knows the relative positions of the cameras, it can determine which part of the text object (i.e. which region of the text image) the shooting range of the camera corresponding to the target sub-image covers, and hence which topic content that part contains; only the answer content corresponding to that topic content is then compared with the writing content. Optionally, after the topic content corresponding to the current target sub-image is determined, the exercise the user is answering can be determined from the writing position. Generally, the layout of each type of exercise is fixed, and the relative positions of the answer area and the topic content are fairly stable: for multiple-choice, fill-in-the-blank and true-or-false questions the answer area lies within or just after the topic content, while for subjective and application questions it lies below the topic content. The exercise currently being answered can therefore be determined by combining the writing position with the relative positions of the several nearby topic contents, and only the answer content of that exercise is obtained for judging the accuracy of the answer.
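Mapping the active camera to the topics it can see, and then grading against only those topics' answers, can be sketched as below; the interval representation of topic positions and the whitespace-insensitive comparison are illustrative assumptions, not the patent's method.

```python
def topics_in_view(cam_range, topics):
    """Keep the topics whose horizontal extent overlaps the active camera's
    shooting range. topics: list of (x_min, x_max, topic_id); cam_range is
    derived from the camera's relative position on the learning machine."""
    lo, hi = cam_range
    return [tid for (a, b, tid) in topics if a < hi and b > lo]

def grade(written, answer):
    """Real-time check of one answer: compare the recognised writing content
    with the bank's answer content, ignoring surrounding whitespace."""
    return written.strip() == answer.strip()
```

Restricting the comparison to the topics in view keeps the real-time check cheap: only the answer content of the exercises the active camera can actually see needs to be consulted.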
In this method, when the text object includes exercise topics and the user answers them, all the cameras are first controlled to shoot the desktop on which the text object is placed; the captured sub-images are stitched into a text image containing the complete text object; the text image is recognized to obtain the topic content, and the corresponding answer content is searched for in the exercise bank. The target sub-image containing the target object is then identified among the sub-images, the camera corresponding to it is kept shooting while the other cameras are turned off, the user's writing trajectory is obtained from the captured target sub-images, and the writing content is recognized from the trajectory. Finally, the writing content is compared with the answer content to determine whether the user answered accurately, so that real-time correction is achieved while the user answers. Moreover, by shooting in regions with multiple cameras, the range of each camera is reduced; with the pixel count unchanged, each camera records fewer details over a smaller area, so the content it captures is clearer, which improves the accuracy of text recognition and thus guarantees the accuracy of real-time correction.
Further, since the field angle of the camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop and shoots it, and by the maximum shooting range required of each camera, a reasonable field angle can be set for the camera according to the actual shooting requirements. A smaller field angle brings the edge of the shooting range closer to the center, improving edge clarity and facilitating subsequent recognition and analysis. The spacing between the cameras is determined by the number of cameras and the maximum shooting range, which ensures a reasonable arrangement of the cameras in the learning machine.
Fig. 17 is a schematic structural diagram of a writing camera according to an embodiment of the present application. The writing and shooting device is applied to a learning machine, the learning machine is provided with a camera module, the camera module comprises a plurality of cameras, the field angle of the cameras is determined by the vertical height of the cameras and the shooting range required by each camera when the learning machine is fixed on a desktop and the desktop is shot, and the distance between the cameras is determined according to the number of the cameras and the shooting range. Referring to fig. 17, the writing photographing apparatus includes a first photographing unit 301, a first detecting unit 302, and a second photographing unit 303.
The first shooting unit 301 is configured to control all the cameras to shoot the desktop to obtain a plurality of sub-images, a text object is placed on the desktop, and each camera corresponds to one sub-image; a first detecting unit 302, configured to detect a target sub-image containing a target object, among a plurality of sub-images, where the target object is related to a writing action received by the text object; and a second shooting unit 303, configured to keep the camera corresponding to the target sub-image shooting, and turn off other cameras.
In one embodiment of the present application, the second shooting unit 303 includes: a continuous detection subunit, configured to continuously detect the target object in the target sub-image while the camera corresponding to the target sub-image is kept shooting; and a closing subunit, configured to turn off the other cameras. The apparatus further includes: a return execution unit, configured to, while the camera corresponding to the target sub-image is kept shooting and the other cameras are turned off, return to the operation of controlling all the cameras to shoot the desktop when it is detected that the target sub-image does not contain the target object.
In one embodiment of the present application, the second photographing unit 303 includes: the real-time identification subunit is used for keeping the camera corresponding to the target sub-image for shooting, and identifying a writing track corresponding to the writing action according to the target sub-image obtained in real time when other cameras are closed; and the writing content determining subunit is used for determining the writing content according to the writing track.
In one embodiment of the present application, the textbook includes a title of at least one problem, and the apparatus further comprises: the image splicing unit is used for controlling all the cameras to shoot the desktop to obtain a plurality of sub-images, and then splicing the sub-images according to the relative positions of the cameras to obtain a text image containing a complete text object; the title recognition unit is used for performing text recognition on the text image so as to determine the title content in the text object; the answer searching unit is used for searching answer contents corresponding to the question contents in a preset question library; and the content comparison unit is used for comparing the writing content with the answer content after the writing content is determined according to the writing track so as to determine whether the writing content is accurate or not.
In one embodiment of the present application, the content comparison unit includes: the title determining subunit is used for determining the title content contained in the target sub-image according to the relative position between the cameras; and the answer comparison subunit is used for comparing the writing content with the answer content corresponding to the question content.
In an embodiment of the present application, the camera module includes two cameras, the angle of view of each camera is the same, and a calculation formula of the angle of view is:
Fov=2arctan(s/h)
h=AF/sinα
wherein Fov represents the field angle; s represents half the length of the farthest shooting edge when the camera shoots, the farthest shooting edge being determined according to the shooting range; AF represents the vertical height between the learning machine and the desktop when the learning machine is fixed on the desktop; h represents the length of the line connecting the camera and the midpoint of the farthest shooting edge when the learning machine is fixed on the desktop and shoots it; and α represents the included angle between that line and the desktop.
In one embodiment of the present application, the camera module includes two cameras, a distance between the two cameras is less than or equal to a length of a farthest shot edge when the desktop is shot by the camera, and the farthest shot edge is determined according to the shooting range.
The writing shooting device provided by the embodiment is contained in the learning machine, can be used for executing the writing shooting method provided by any embodiment, and has corresponding functions and beneficial effects.
Fig. 18 is a schematic structural diagram of a learning machine according to an embodiment of the present application. Specifically, as shown in fig. 18, the learning machine includes a processor 40, a memory 41, and a camera module 42; the number of the processors 40 in the learning machine can be one or more, and one processor 40 is taken as an example in fig. 18; the processor 40, the memory 41, and the camera module 42 in the learning machine may be connected by a bus or other means, and fig. 18 illustrates an example in which these are connected by a bus.
The camera module 42 includes a plurality of cameras. The field angle of each camera is determined by the vertical height between the camera and the desktop when the learning machine is fixed on the desktop and shoots it, and by the shooting range required of each camera; the distances between the cameras are determined according to the number of cameras and the shooting range. The camera module 42 shoots according to instructions from the processor 40.
The memory 41, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules in the writing photographing method in the embodiments of the present application (e.g., the first photographing unit 301, the first detecting unit 302, and the second photographing unit 303 in the writing photographing apparatus). The processor 40 executes various functional applications of the learning machine and data processing by running software programs, instructions and modules stored in the memory 41, that is, implements the writing photographing method provided by any of the above-described embodiments.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the learning machine, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the learning machine over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The learning machine may further include an input device operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function control of the learning machine. The learning machine may also include output devices, which may include a display screen, speakers, etc., and communication means for data communication with a background server or other devices.
The learning machine comprises the writing shooting device provided by the embodiment, can be used for executing the writing shooting method provided by any embodiment, and has corresponding functions and beneficial effects.
The embodiment of the present application also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are used to perform the relevant operations in the writing and shooting method provided in any embodiment of the present application, and have corresponding functions and advantages.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile forms of computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises it.
It should be noted that the foregoing describes only the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will appreciate that the present application is not limited to the particular embodiments described herein, and that many obvious modifications, rearrangements, and substitutions may be made without departing from the scope of the application. Therefore, although the present application has been described in some detail with reference to the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present application is determined by the appended claims.
Claims (10)
1. A writing shooting method, applied to a learning machine, wherein the learning machine is provided with a camera module, the camera module comprises a plurality of cameras, the field angle of each camera is determined by the vertical height between the camera and a desktop and by the shooting range required of each camera when the learning machine is fixed on the desktop to shoot the desktop, and the distance between the plurality of cameras is determined according to the number of cameras and the shooting range;
the method comprises the following steps:
controlling all the cameras to shoot the desktop to obtain a plurality of sub-images, wherein a text object is placed on the desktop, and each camera corresponds to one sub-image;
detecting a target sub-image containing a target object in a plurality of sub-images, wherein the target object is related to the writing action received by the text object;
and keeping the camera corresponding to the target sub-image shooting, and closing the other cameras.
2. The writing shooting method according to claim 1, wherein, while the camera corresponding to the target sub-image is kept shooting, the method further comprises:
continuously detecting a target object in the target sub-image;
after keeping the camera corresponding to the target sub-image shooting and closing the other cameras, the method further comprises:
and when the target sub-image is detected not to contain the target object, returning to execute the operation of controlling all the cameras to shoot the desktop.
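The camera-selection logic of claims 1 and 2 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the `Camera` class and the `contains_target` detector are hypothetical stand-ins for the device's actual camera API and target-object (writing hand / pen tip) detector.

```python
from typing import Callable, Dict, List, Optional

class Camera:
    """Hypothetical camera handle; open/close/capture stand in for the
    device's real camera API."""
    def __init__(self, cam_id: int) -> None:
        self.cam_id = cam_id
        self.is_open = False

    def open(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False

    def capture(self) -> Dict[str, int]:
        # A real camera would return an image frame; a tagged stub suffices here.
        return {"camera": self.cam_id}

def select_writing_camera(
    cameras: List[Camera],
    contains_target: Callable[[Dict[str, int]], bool],
) -> Optional[Camera]:
    """Shoot with all cameras, keep the one whose sub-image contains the
    target object, and close the rest. Returns None when no sub-image
    contains the target, in which case the caller re-runs the all-camera
    capture (the fallback described in claim 2)."""
    for cam in cameras:
        cam.open()
    sub_images = {cam: cam.capture() for cam in cameras}
    target_cam = next(
        (cam for cam, img in sub_images.items() if contains_target(img)), None
    )
    if target_cam is not None:
        for cam in cameras:
            if cam is not target_cam:
                cam.close()
    return target_cam
```

When the detector later stops finding the target in the kept camera's frames, the same function is simply called again with all cameras, restoring the full-desktop view.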
3. The writing shooting method according to claim 1, wherein, while the camera corresponding to the target sub-image is kept shooting and the other cameras are closed, the method further comprises:
recognizing a writing track corresponding to the writing action according to the target sub-image acquired in real time;
and determining the writing content according to the writing track.
4. The writing shooting method according to claim 3, wherein the text object includes the question content of at least one question,
after the controlling all the cameras to shoot the desktop to obtain a plurality of sub-images, the method further comprises:
splicing the plurality of sub-images according to the relative positions between the cameras to obtain a text image containing the complete text object;
performing text recognition on the text image to determine the question content in the text object;
searching a preset question library for answer content corresponding to the question content;
and after determining the written content according to the writing track, the method further comprises:
comparing the written content with the answer content to determine whether the written content is accurate.
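The splicing step of claim 4 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cameras sit in a fixed left-to-right row and that adjacent views overlap by a known, pre-calibrated number of pixel columns; a real device would calibrate this overlap from the camera geometry or estimate it by feature matching.

```python
import numpy as np

def stitch_left_to_right(sub_images: list, overlap_px: int) -> np.ndarray:
    """Concatenate sub-images in camera order, dropping the overlapping
    left columns of each subsequent view so desktop content is not
    duplicated in the stitched text image."""
    panorama = sub_images[0]
    for img in sub_images[1:]:
        panorama = np.hstack([panorama, img[:, overlap_px:]])
    return panorama
```

The stitched text image is then handed to text recognition to extract the question content.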
5. The writing shooting method according to claim 4, wherein the comparing the written content with the answer content comprises:
determining the question content contained in the target sub-image according to the relative positions between the cameras;
and comparing the written content with the answer content corresponding to that question content.
6. The writing shooting method according to claim 1, wherein the camera module comprises two cameras, the two cameras have the same field angle, and the field angle is calculated as:

Fov = 2 arctan(s/h)

h = AF / sin α

wherein Fov represents the field angle; s represents half the length of the farthest shot edge when the camera shoots the desktop, the farthest shot edge being determined according to the shooting range; AF represents the vertical height between the camera and the desktop when the learning machine is fixed on the desktop; h represents the length of the line connecting the camera to the midpoint of the farthest shot edge when the learning machine is fixed on the desktop and shoots the desktop; and α represents the angle between that connecting line and the desktop.
7. The writing shooting method according to claim 1, wherein the camera module comprises two cameras, the distance between the two cameras is less than or equal to the length of the farthest shot edge when the cameras shoot the desktop, and the farthest shot edge is determined according to the shooting range.
8. A writing shooting device, applied to a learning machine, wherein the learning machine is provided with a camera module, the camera module comprises a plurality of cameras, the field angle of each camera is determined by the vertical height between the camera and a desktop and by the shooting range required of each camera when the learning machine is fixed on the desktop to shoot the desktop, and the distance between the plurality of cameras is determined according to the number of cameras and the shooting range;
the device comprises:
a first shooting unit, configured to control all the cameras to shoot the desktop to obtain a plurality of sub-images, wherein a text object is placed on the desktop and each camera corresponds to one sub-image;
a first detection unit, configured to detect a target sub-image containing a target object from among the plurality of sub-images, wherein the target object is related to a writing action received by the text object;
and a second shooting unit, configured to keep the camera corresponding to the target sub-image shooting and to close the other cameras.
9. A learning machine, comprising a camera module, one or more processors, and a memory, wherein the camera module comprises a plurality of cameras, the field angle of each camera is determined by the vertical height between the camera and a desktop and the shooting range required of each camera when the learning machine is fixed on the desktop and shoots the desktop, and the distance between the plurality of cameras is determined according to the number of cameras and the shooting range;
the camera module is used for shooting according to the instruction of the processor;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the writing shooting method of any one of claims 1-7.
10. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the writing shooting method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210886866.1A CN115278108B (en) | 2022-07-26 | 2022-07-26 | Writing shooting method, device, learning machine and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115278108A true CN115278108A (en) | 2022-11-01 |
CN115278108B CN115278108B (en) | 2024-09-24 |
Family
ID=83769679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210886866.1A Active CN115278108B (en) | 2022-07-26 | 2022-07-26 | Writing shooting method, device, learning machine and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110677621A (en) * | 2019-09-03 | 2020-01-10 | RealMe Chongqing Mobile Communications Co., Ltd. | Camera calling method and device, storage medium and electronic equipment |
CN110691193A (en) * | 2019-09-03 | 2020-01-14 | RealMe Chongqing Mobile Communications Co., Ltd. | Camera switching method and device, storage medium and electronic equipment |
CN112333382A (en) * | 2020-10-14 | 2021-02-05 | Vivo Mobile Communication (Hangzhou) Co., Ltd. | Shooting method and device and electronic equipment |
CN112770056A (en) * | 2021-01-20 | 2021-05-07 | Vivo Mobile Communication (Hangzhou) Co., Ltd. | Shooting method, shooting device and electronic equipment |
CN113905179A (en) * | 2018-07-27 | 2022-01-07 | Huawei Technologies Co., Ltd. | Method for switching camera by terminal and terminal |
CN114567731A (en) * | 2022-03-28 | 2022-05-31 | Guangdong Genius Technology Co., Ltd. | Target shooting method and device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||