
WO2019176235A1 - Image generation method, image generation device, and image generation system - Google Patents

Image generation method, image generation device, and image generation system

Info

Publication number
WO2019176235A1
WO2019176235A1 (PCT/JP2018/048149)
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection target
model
target image
generation
Prior art date
Application number
PCT/JP2018/048149
Other languages
French (fr)
Japanese (ja)
Inventor
クリンキグト,マルティン
小味 弘典
俊明 垂井
村上 智一
Original Assignee
株式会社日立産業制御ソリューションズ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立産業制御ソリューションズ
Priority to CN201880089706.6A (CN111742342A)
Publication of WO2019176235A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • the present invention relates to an image generation method, an image generation apparatus, and an image generation system.
  • Patent Document 1: Japanese Patent Laid-Open No. 2012-088787 (JP 2012-088787 A)
  • the object tracking unit extracts a region in which a recognition target is shown from the image of each frame constituting the moving image.
  • the image conversion unit performs geometric conversion on the image in this region.
  • a recognition target sample is generated based on the transformed image, and the region cutout unit sets regions for the frame images constituting the moving image.
  • the image composition unit 35 generates a non-recognition target sample image based on an image obtained by combining a plurality of regions within the images of the set regions.
  • the learning unit learns a recognition target using the recognition target sample and the non-recognition target sample.
  • in Patent Document 1, however, how to deal with cases where training sample images are difficult to obtain is not considered in detail, and the cost burden on the user of obtaining training sample images remains unresolved.
  • the purpose of the present invention is therefore to generate images for machine learning training from data such as vector models and 3D models using a neural network, and to use the generated images for machine learning training, thereby improving the efficiency of machine learning training and the accuracy of image detection.
  • one representative image generation method of the present invention includes a background image acquisition step of acquiring a background image by an image selection unit, a detection target image specifying step of specifying, by the image selection unit, a detection target image including metadata from a source image, a model generation step of generating, by a model creation unit, a detection target image model corresponding to the detection target image, and a detection target image establishing step of establishing a final image by combining, by the model creation unit, the background image and the detection target image model.
  • by generating images for machine learning training from data such as vector models and 3D models with a neural network and using the generated images for machine learning, the efficiency of machine learning training and the accuracy of image detection can be improved.
  • in machine learning for object detection, a large number of images of the object to be detected are required in order to train the machine learning system.
  • for example, when trying to detect a person carrying a white cane (hereinafter also referred to as a "white cane user") using machine learning, learning has conventionally not been possible without a large amount of footage of white cane users.
  • in the present invention, therefore, even when footage of the detection target (for example, a white cane user) is scarce, a large amount of white cane user footage is generated from footage of general pedestrians (available in large quantities) and a small amount of white cane user footage, so that machine learning can be reinforced efficiently.
  • FIG. 1 is a diagram showing an overall system configuration of hardware according to an embodiment of the present invention. As shown in FIG. 1, this system includes a central server 100, a client terminal 130, a client terminal 140, and a network (Internet LAN or the like) 150. The central server 100, the client terminal 130, and the client terminal 140 may be communicably connected to each other via the network 150.
  • the central server 100 is a device that performs image generation requested from the client terminals 130 and 140 via the network 150.
  • the central server 100 can include functional units that perform functions such as image selection, model creation, image processing, and machine learning in the image generation process.
  • the central server 100 may include a unit (for example, a storage unit 120) that stores image data such as a background image and a detection target image, which will be described later, and model data such as a vector model and a 3D model.
  • the client terminal 130 and the client terminal 140 are devices for transmitting an image generation request to the central server 100 via the network 150.
  • the user can input image generation conditions to the client terminal 130 and the client terminal 140.
  • the user may designate a detection target to be described later and a background image used for image generation using the client terminal 130 or the client terminal 140.
  • Instructions such as conditions input at the client terminal 130 and the client terminal 140 are transmitted to the central server 100 via the network 150.
  • the central server 100 is a device that performs image generation requested from the client terminals 130 and 140 via the network 150. As shown in FIG. 1, the central server 100 includes a processing unit 110 that performs each function of image generation, and a storage unit 120 that stores information used for the image generation.
  • the processing unit 110 includes functional units for performing each function according to the embodiment of the present invention. Specifically, the processing unit 110 comprises an image selection unit 112 that acquires a background image and identifies a detection target image including metadata from a source image, a model creation unit 114 that generates a detection target image model corresponding to the detection target image and establishes a final image by combining the background image and the detection target image model, an image processing unit 116 that performs image processing on the model, and a machine learning unit 118 that performs each step of the machine learning detection accuracy improvement process and the machine learning image creation capability improvement process.
  • the processing unit 110 functions as each functional unit described above when an arithmetic processing unit such as a CPU (Central Processing Unit) in the apparatus executes a control program stored in the memory.
  • the storage unit 120 includes an image database 122 and an image / model database 124.
  • the image database 122 is a database (device or logical storage area) that stores background images used for image generation and data of detection target images described later.
  • the image database 122 may store, for example, image data indicating the state of the station platform as shown in FIG. 7 and metadata included in the image.
  • in one embodiment, the storage unit 120 may receive images specified by the user (a source image, a background image, a desired detection target image) from the client terminal 130 or the client terminal 140 and store the received image data in the image database 122 in a format usable for image generation.
  • the image / model database 124 is a database (device or logical storage area) that stores specific images and the models associated with them in a mutually associated form. For example, as shown in FIGS. 5 and 6 described later, a model used for image generation (a vector model, a point cloud, etc.) and a realistic image associated with that model may be stored in the image / model database 124.
  • the image database 122 and the image / model database 124 of the storage unit 120 may be realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk.
  • the client terminal 130 is a device for transmitting an image generation request to the central server 100 via the network 150.
  • the client terminal 130 includes a processing unit 132 that executes commands sent from other functional units in the terminal, an instruction receiving unit 134 that receives instructions from the user (such as image generation conditions), an image selection unit 136 that selects images (source images, background images, detection target images), a communication unit 138 that manages exchanges with the central server 100 and other network terminals (for example, the client terminal 140), and a storage unit 139 that stores information (image data, commands from the user, etc.).
  • the user can use the client terminal 130 to input image generation conditions and to specify a source image, a background image, or a detection target image used for image generation.
  • the client terminal 130 may transmit conditions and instructions input from the user to the central server 100.
  • the client terminal 140 is a device for transmitting an image generation request to the central server 100 via the network 150.
  • the client terminal 140 includes a processing unit 142 that executes commands sent from other functional units in the terminal, an instruction receiving unit 144 that receives instructions from the user (such as image generation conditions), a communication unit 146 that manages exchanges with the central server 100 and other network terminals (for example, the client terminal 130), and a storage unit 148 that stores information (image data, commands from the user, etc.).
  • the client terminal 140 differs from the client terminal 130 in that it does not have an image selection unit such as the image selection unit 136.
  • in that case, the image used for image generation may be selected by the image selection unit 112 of the central server 100 according to a user instruction, or the image selection unit 112 of the central server 100 may select it automatically (for example, at random).
  • FIG. 2 is a flowchart showing the flow of the image generation method according to the first embodiment of the present invention.
  • first, in step S200, a background image is acquired.
  • the expression to acquire includes obtaining, receiving, securing, procuring, selecting, and specifying.
  • the background image is an image that becomes a final image by arranging a detection target image described later.
  • the background image may be an image showing various environments such as a station platform, an airport boarding gate, a concert or sports game venue, or a shopping mall.
  • this background image may be designated by the user (for example, via the image selection unit 136 of the client terminal 130), or may be selected by the image selection unit 112 of the central server 100 from among the images stored in the image database 122 of the storage unit 120 according to an instruction input by the user (for example, via the client terminal 140).
  • next, in step S220, a detection target image is specified.
  • the expression specifying includes selecting, setting, designating, identifying, and detecting.
  • the detection target image is an image in which an object that the user wants to place in the background image is captured.
  • the detection target image may be an image showing an object to be trained by the machine learning unit so that it can be detected from within the image.
  • the detection target image may include, for example, a person carrying a white cane, a person wearing specific clothes, luggage exceeding a predetermined size, a certain kind of animal, and the like.
  • the detection target image may include metadata.
  • the metadata here may include information indicating the position of the detection target image (such as two-dimensional coordinates), its shape (rectangular, round) and size (length and height in pixels), and its nature (labels such as person, animal, luggage, or car). This metadata may be used for the machine learning training described later.
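  • as a concrete illustration of such metadata, the following is a minimal Python sketch of one possible record format; the field names and the to_training_label helper are illustrative assumptions and are not taken from the publication.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DetectionTargetMetadata:
    """Hypothetical metadata record for one detection target image."""
    position: Tuple[int, int]   # two-dimensional coordinates (x, y) in pixels
    shape: str                  # e.g. "rectangle" or "round"
    size: Tuple[int, int]       # (width, height) measured in pixels
    label: str                  # nature of the target, e.g. "person", "animal", "luggage"

    def to_training_label(self) -> dict:
        """Convert the record into a plain dict usable as a training annotation."""
        x, y = self.position
        w, h = self.size
        return {"bbox": [x, y, w, h], "category": self.label, "shape": self.shape}

# Example: a person with a white cane occupying a 40 x 120 pixel rectangle
meta = DetectionTargetMetadata(position=(320, 150), shape="rectangle",
                               size=(40, 120), label="person_with_white_cane")
print(meta.to_training_label())
```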
  • the detection target image may be specified by the image selection unit 112 of the central server 100 from the source image input by the user, or may be specified directly by a user instruction.
  • as an example, when the background image is designated as an image of a construction site, suppose the instruction receiving unit 134 of the client terminal 130 or the instruction receiving unit 144 of the client terminal 140 receives from the user a request designating "a person not wearing a helmet" appearing in the source image as the detection target image. In this case, the instruction receiving unit 134 that received this request transmits the user's instruction to the central server 100, and the image selection unit 112 of the central server 100 may specify a person not wearing a helmet in the source image as the detection target image, in accordance with the user's request.
  • next, in step S240, a detection target image model is created.
  • the expression creating includes generating, forming, preparing, and producing.
  • the detection target image model is a model that embodies the shape and structure of the object shown in the detection target image.
  • as the detection target image model, for example, a vector model, a point cloud, or a 3D model may be used.
  • creation of the detection target image model may be performed automatically by, for example, a known model creation tool.
  • as will be described later, the detection target image model created here may be processed by the image processing unit 116 of the central server 100 and by a generative adversarial network described later, so that it is finished as an image closer to a real image.
  • next, a final image is established.
  • the expression establishing includes setting, creating, building, and providing.
  • the final image is an image generated by combining the background image acquired in step S200 and the detection target image model created in step S240.
  • the final image may be generated by the model creation unit 114 of the central server 100 combining the background image and the detection target image model. Details of the process of combining the background image and the detection target image model to generate the final image will be described with reference to FIG.
  • in this way, by acquiring a background image, identifying a detection target image including metadata from the source image, generating a detection target image model corresponding to the detection target image, and combining the background image and the detection target image model to establish the final image, an image for machine learning can be generated even when actual detection target images are few and difficult to obtain.
  • the camera parameter calculation method according to the first embodiment of the present invention will be described with reference to FIG.
  • the background image 320, the ticket gate 321, the horizontal line 323, and the detection target image model 327 are used for the calculation of the camera parameters.
  • the model creation unit 114 can place the detection target image model 327 in the background image 320 at an appropriate position, size, and orientation.
  • a reference object is first identified from the background image 320.
  • the reference object here is an object that serves as a guide for the size of the detection target image model arranged in the background image 320.
  • the reference object may be one whose size is generally known or can easily be estimated.
  • the ticket gate 321 may be identified as a reference object.
  • camera parameters are calculated based on the dimensional elements (height, length, etc.) of the identified reference object. Specifically, a reference such as a horizontal line 323 aligned with the position of the ticket gate 321 identified as the reference object in the background image 320 is set, and the length and height of the ticket gate 321 are measured in pixels.
  • the final image can be generated by combining the detection target image model 327 with the background image 320 based on the camera parameters calculated here.
  • in this way, the camera parameters are calculated based on the dimensional elements of the reference object, and based on the calculated camera parameters the detection target image model can be placed in the background image at an appropriate size, position, and orientation.
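  • as a rough illustration of this scale estimation, the sketch below (in Python) derives a pixels-per-meter factor from a reference object of assumed known height and uses it to size a detection target model; the numbers, function names, and the simplification of ignoring perspective differences in depth are all assumptions for illustration only.

```python
def pixels_per_meter(ref_height_px: float, ref_height_m: float) -> float:
    """Scale factor implied by a reference object (e.g. a ticket gate) of known real height."""
    return ref_height_px / ref_height_m

def target_height_in_pixels(target_height_m: float, scale: float) -> int:
    """Pixel height at which a detection target of known real height should be drawn."""
    return round(target_height_m * scale)

# Assumed example: the ticket gate is about 1.0 m tall and spans 80 px along the horizontal line
scale = pixels_per_meter(ref_height_px=80.0, ref_height_m=1.0)
# A roughly 1.7 m pedestrian placed at a similar depth would then be drawn about 136 px tall
print(target_height_in_pixels(1.7, scale))
```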
  • in step S400, a background image is acquired.
  • the background image acquisition here is substantially the same as the background image acquisition in step S200 of FIG. 2, and its description is therefore omitted here.
  • a vector model representing the detection target image is arranged on the background image acquired in step S400.
  • a vector model is a model that expresses the shape and structure of an object shown in a detection target image by a space vector.
  • FIG. 5 is a diagram illustrating an example of a vector model and an image according to the first embodiment of the present invention.
  • a vector model 531 shown in FIG. 5 is a vector model of the human body.
  • the vector model 531 may be arranged in the background image at an appropriate position, size, and orientation based on the camera parameters described with reference to FIG.
  • next, in step S440, the vector model is adjusted.
  • the adjustment of the vector model may be performed by the image processing unit 116 of the central server 100.
  • the adjustment of the vector model is processing that uses generally known image processing techniques to finish the vector model placed in step S420 as an image closer to an actual image (hereinafter also referred to as a "realistic image").
  • the adjustment of the vector model here may be performed by, for example, a generative adversarial network (GAN).
  • the image processing unit 116 may compare the vector model placed in the background image with the models stored in the image / model database of the storage unit 120 of the central server 100, and select the image whose model has the highest similarity to the vector model.
  • the vector model 531 may be converted into a realistic image 532 such as a person holding a white cane.
  • the final image may be generated by superimposing the selected image on the vector model.
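  • a minimal sketch of this lookup-and-composite idea follows, assuming each stored model is reduced to an array of keypoints and that similarity is plain Euclidean distance; the data layout, the 18-keypoint format, and the simple paste-style compositing are illustrative assumptions rather than the publication's implementation.

```python
import numpy as np

def most_similar_image(query_keypoints: np.ndarray, model_image_pairs: list) -> np.ndarray:
    """Return the realistic image whose stored vector model is closest to the query model."""
    best_image, best_dist = None, float("inf")
    for stored_keypoints, realistic_image in model_image_pairs:
        dist = np.linalg.norm(query_keypoints - stored_keypoints)  # similarity proxy
        if dist < best_dist:
            best_image, best_dist = realistic_image, dist
    return best_image

def composite(background: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Overwrite the background region at (top, left) with the selected realistic image."""
    out = background.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

# Illustrative usage with random stand-ins for the image / model database contents
db = [(np.random.rand(18, 2), np.random.rand(120, 40, 3)) for _ in range(5)]
query = np.random.rand(18, 2)                       # vector model placed in the background
final = composite(np.random.rand(480, 640, 3), most_similar_image(query, db), 150, 320)
```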
  • in step S460, machine learning training is performed using the final image generated in step S440.
  • the machine learning here may be performed by the machine learning unit 118 of the central server 100.
  • the training on the final image obtained by combining the background image and the detection target image may be performed using a technique for training a neural network, such as a generative adversarial network. Since the details of the machine learning training process are described later, their description is omitted here.
  • in step S480, the machine learning system trained in step S460 is actually applied.
  • a machine learning system trained with images generated by the method of the present embodiment is useful for applications where it is difficult to obtain actual image data, such as accident detection for fully autonomous vehicles, structural crack detection, and natural disaster simulation.
  • the image generation according to the present invention may use a vector model and a realistic image corresponding to the vector model.
  • the detection target image 641 can be obtained by photographing the detection target, which is its source, in a laboratory environment.
  • a generally known edge / orientation detection algorithm such as Openpose is applied to the detection target image 641 to generate a vector model superimposed on the detection target image 641.
  • the vector model 643 and the detection target image 641 are stored in the image / model database 124 of the storage unit 120 as a model / image pair 645 associated with each other, so that each functional unit of the processing unit 110 of the central server 100 can access them.
  • in this way, the generative adversarial network can be trained easily because the vector model and the realistic image are stored in association with each other.
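  • the sketch below shows one way such model / image pairs might be accumulated; the extract_keypoints routine is only a stand-in for an edge / orientation detector such as Openpose (whose actual API is not reproduced here), and the record layout is an assumption.

```python
import numpy as np

def extract_keypoints(image: np.ndarray) -> list:
    """Stand-in for an Openpose-style detector returning (x, y) joint coordinates."""
    # A real implementation would run pose estimation; here dummy points are returned.
    return [(0.0, 0.0)] * 18

def store_model_image_pair(database: list, photo: np.ndarray) -> None:
    """Associate a photographed detection target image with its vector model and keep both."""
    keypoints = extract_keypoints(photo)
    database.append({"vector_model": keypoints, "image": photo})

pairs = []                              # stands in for the image / model database 124
lab_photo = np.zeros((240, 120, 3))     # detection target photographed in a lab environment
store_model_image_pair(pairs, lab_photo)
print(len(pairs), len(pairs[0]["vector_model"]))
```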
  • an image showing the state of the station platform is acquired as the background image 301.
  • the camera parameters of the background image 301 are calculated based on reference objects such as the track 302 and the ticket gate 303.
  • the model creation unit 114 places a vector model 304 corresponding to the detection target image requested by the user in the background image 301 in accordance with the position, size, and orientation specified by the calculated camera parameter.
  • the image processing unit 116 generates a realistic image 305 corresponding to the vector model 304 and inserts it into the background image 301 at the same position, size, and orientation as the vector model 304.
  • the final image 309 is obtained by combining the realistic image 305 corresponding to the detection target image and the background image 301. Further, as described above, this final image 309 may be used to train a machine learning method such as a neural network or a support vector machine. As described above, even when the number of detection targets is small and it is difficult to obtain, the machine learning image can be generated by using the invention of this embodiment.
  • in step S400, a background image is acquired.
  • in step S420, the vector model is placed on the background image.
  • in step S440, the vector model is adjusted. Since these steps are substantially the same as those of the image generation method described with reference to FIG. 4, their description is omitted here.
  • the present invention is not limited to this, and the final image may be used for another purpose. Accordingly, in step S800, the final image generated is provided after adjusting the vector model in step S440. This final image may be provided, for example, to the party who requested the image generation, or may be transmitted to a third party. In addition to machine learning training, the final image may be applied to advertising, face recognition, object detection, image processing, and the like. As described above, the image generation method according to the present invention may be applied not only to train machine learning but also to various fields.
  • the process for establishing the detection target image according to the second embodiment of the present invention solves the above problem not with a vector model or a realistic image, but by simply inserting a partial image into the background image as the detection target image.
  • an image showing the state of the station platform is acquired as the background image 301.
  • the background image 301 includes reference objects such as the track 302 and the ticket gate 303.
  • a position and size are set as specified by the camera parameters calculated based on reference objects such as the track 302 and the ticket gate 303 (or by the image generation conditions input by the user).
  • the partial image 314 is defined as the detection target image. The partial image is, for example, an image of arbitrary size and shape that occupies a certain area in the background image.
  • the shape of the area of the partial image 314 according to the present invention is not limited to a rectangle and may be any shape.
  • a final image having a smaller file size than the final image obtained by the image generation method described in the first embodiment can be obtained.
  • alternatively, an image that matches the size of the partial image 314 may be inserted into the area of the partial image 314.
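  • a minimal sketch of inserting such a partial image into a background region is shown below, using NumPy array slicing and simple nearest-neighbour resizing; this is an assumed illustration, not the publication's implementation.

```python
import numpy as np

def insert_partial_image(background: np.ndarray, partial: np.ndarray,
                         top: int, left: int, height: int, width: int) -> np.ndarray:
    """Place a partial image into the background at the region given by the camera parameters."""
    # Nearest-neighbour resize of the partial image to the designated region size
    rows = np.arange(height) * partial.shape[0] // height
    cols = np.arange(width) * partial.shape[1] // width
    resized = partial[rows][:, cols]
    out = background.copy()
    out[top:top + height, left:left + width] = resized
    return out

# Illustrative usage: a 60 x 30 patch placed into a 480 x 640 background
bg = np.zeros((480, 640, 3), dtype=np.uint8)
patch = np.full((60, 30, 3), 255, dtype=np.uint8)
final = insert_partial_image(bg, patch, top=200, left=300, height=120, width=60)
```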
  • the process for establishing the detection target image solves the above problem by inserting or replacing a clear partial image as a detection target image in the background image.
  • an image showing the state of the station platform in which a candidate object (for example, a person) 324 for the detection target image appears is acquired as the background image 301.
  • when this candidate object 324 is, for example, unclear or partially missing, it may be difficult to generate a model that accurately represents it.
  • the partial image 325 is drawn so as to surround the whole or part of the candidate object 324.
  • the partial image 325 may be designated by a user via a GUI or the like, or may be automatically generated by the machine learning unit 118.
  • the image processing unit 116 of the central server 100 performs image processing on the area designated by the partial image 325, thereby finishing the candidate object 324 as the detection target image 326.
  • by treating the background image 301 in which the detection target image 326 is reflected as the final image 329, a final image that can be used for machine learning is obtained.
  • the partial image 325 here is substantially the same as the partial image 314 described in the second embodiment.
  • according to the third embodiment, even when the detection target image is unclear or incomplete, a high-quality image can be generated and the effect of reducing the image file size can be obtained.
  • an aspect of the present invention relates to the use of an image generated by the above-described image generation method for machine learning training.
  • an example of improving the image generation capability of machine learning will be described with respect to a generative adversarial network, but the present invention is not limited thereto and may be applied to any machine learning method, such as a support vector machine.
  • the generative adversarial network is composed of two networks, a generation network (generator) and an identification network (discriminator), which learn by competing against each other. Specifically, when a pair consisting of a basic image and a target image that the network is desired to generate is input, the generator creates and outputs a created image as its result. It is better if this created image is similar to the target image. The discriminator compares the created image with the target image to judge the accuracy of the created image. In this way, the generator learns to deceive the discriminator, and the discriminator learns to discriminate more accurately.
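  • the following PyTorch-style sketch illustrates the alternating generator / discriminator updates described above in the simplest possible form (unconditional, on flat vectors rather than real images); it is a generic GAN skeleton assumed for illustration, not the network used in the publication.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator operating on 64-dimensional "images" for brevity
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.Tanh())
D = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_batch = torch.randn(8, 64)           # stands in for target images
for step in range(100):
    # Discriminator: learn to tell target images from created images
    z = torch.randn(8, 16)                # stands in for the basic image / vector model input
    fake = G(z).detach()
    loss_d = bce(D(real_batch), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: learn to produce created images that the discriminator accepts
    z = torch.randn(8, 16)
    loss_g = bce(D(G(z)), torch.ones(8, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```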
  • in step 1110, the detection target that is the source of the detection target image is photographed in a laboratory environment to obtain the detection target image. For example, as shown in FIG. 11, by photographing a person carrying a white cane, a person carrying a white cane can be obtained as the detection target image.
  • in step 1120, a vector model corresponding to the detection target image is generated by applying a generally known edge / orientation detection algorithm such as Openpose to the detection target image obtained in step 1110. For example, when the detection target image is a person, as shown in FIG. 11, a vector model representing the person's head, shoulders, arms, torso, legs, and the like may be generated.
  • an edge extraction technique may be applied. These edges may be expressed by splines or the like.
  • in step 1130, the detection target image captured in step 1110 and the vector model generated in step 1120 may be stored in the image / model database 124 of the storage unit 120 as a model / image pair associated with each other.
  • in step 1140, the background image in which the vector model is placed is input as a basic image to the generative adversarial network (sometimes called a second neural network).
  • in step 1150, the generator of the generative adversarial network converts the vector model into a realistic image based on the model / image pairs associated in step 1130, so that an image in which the realistic image corresponding to the vector model is reflected in the background image is created as the created image.
  • the generative adversarial network then compares the detection target (target image) photographed in step 1110 with the created image created in step 1150.
  • the discriminator of the generative adversarial network may also compare the metadata of the target image and of the created image (information defining the position, shape, size, properties, etc. of the objects in the images).
  • the discriminator may compare the target image and the created image using a predetermined similarity criterion. This similarity criterion may be, for example, a threshold on the degree to which two or more images are similar to each other. If the target image and the created image meet the predetermined similarity criterion (that is, if it is determined that they are sufficiently similar to each other), the parameters of the generative adversarial network are adjusted. This parameter adjustment includes, for example, fixing the conditions used for creating the created image so that they can be applied to other image generation.
  • in this way, by inputting the basic image to the generative adversarial network, creating a created image based on the basic image, comparing the created image with the target image, and adjusting the parameters of the generative adversarial network when the created image and the target image achieve the predetermined similarity criterion, a generative adversarial network capable of generating high-quality final images can be obtained.
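  • as a concrete (and deliberately simple) reading of such a similarity criterion, the sketch below compares two images by mean absolute pixel difference against a threshold; the metric and the threshold value are assumptions for illustration only.

```python
import numpy as np

def meets_similarity_criterion(created: np.ndarray, target: np.ndarray,
                               threshold: float = 0.05) -> bool:
    """True when the created image is judged sufficiently similar to the target image."""
    # Mean absolute difference on images normalised to [0, 1]; smaller means more similar
    diff = np.mean(np.abs(created.astype(float) - target.astype(float)))
    return diff <= threshold

target = np.random.rand(64, 64, 3)
created = target + np.random.normal(scale=0.01, size=target.shape)
if meets_similarity_criterion(created, target):
    print("similarity criterion met: fix the generation conditions for reuse")
```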
  • one detection target 1203 may be captured by a plurality of cameras 1207, 1208, and 1209.
  • by using the detection target images captured by the respective cameras 1207, 1208, and 1209 and the background images captured by those cameras in the image generation method described above, final images showing the same detection target 1203 from different perspectives can be generated.
  • when the detection target moves, it is necessary to represent the detection target model as an image series in order to express the movement of the detection target.
  • the detection target 1213 advances in the direction indicated by the arrow 1215 as shown in FIG.
  • the movement of the detection target 1213 is imaged by the camera 1217. Therefore, by using the image captured by the camera 1217 in the image generation method described above, a detection target model (such as a vector model) can be generated for each frame of movement of the detection target 1213.
  • an image series that smoothly represents the movement of the detection target 1213 can be obtained. Note that not only when the detection target moves, but also an image showing the same detection target in different illumination environments (for example, morning and night, or natural light and artificial light) can be generated.
  • the model creation unit 114 may generate a first detection target model corresponding to the first detection target image and a second detection target image model corresponding to the second detection target image. Next, as described above, the model creation unit 114 acquires the first background image. Finally, the model creation unit 114 may insert the first detection target image model and the second detection target image model into the first background image.
  • an aspect of the present invention relates to using an image generated by the above-described image generation method for machine learning training.
  • an example of improving the object detection accuracy of machine learning will be described with respect to neural networks such as Faster-RCNN and SVM, but the present invention is not limited to these and may be applied to any object detection algorithm or machine learning method.
  • the metadata may be information defining characteristics such as the position, shape, size, and property of the detection target model 402 in the image 401.
  • the image 401 may be a final image generated by any of the image generation methods described above (for example, the final image 309 in FIG. 3, the final image 315 in FIG. 9, the final image 329 in FIG. 10, etc.).
  • the image 401 including the detection target model may be provided to the object detection neural network.
  • the target image 404 is provided to the object detection network.
  • the target image 404 is, for example, an image including a target object 405 that is the same as or similar to the detection target model 402 shown in the image 401, and is an image that is a target for object detection.
  • the object detection network performs object detection optimization 403 on the target image 404 and attempts to identify the target object 405 in the target image 404 based on the metadata of the detection target model 402. Specifically, the object detection network compares the metadata of the detection target model 402 with the objects shown in the target image 404 and specifies the object that best matches the metadata. As shown in FIG. 10, the object detection network may indicate the specified target object 405 by a square area 406 or the like surrounding it.
  • the identification accuracy of the object detection network is calculated based on the result of the object detection.
  • this identification accuracy is a quantitative expression of factors such as how well the object identified by the object detection network matches the detection target model 402, whether all target objects have been identified, and whether objects other than the target objects have been incorrectly identified. For example, the identification accuracy may be expressed as a percentage such as 75% or 91%. As an example, when 9 out of 10 target objects are correctly identified, the calculated identification accuracy may be 90%.
  • the calculated identification accuracy may be compared with a predetermined identification accuracy criterion (a predetermined accuracy threshold). If the calculated identification accuracy does not meet the predetermined criterion, it may be determined that the object detection optimization described above should be repeated (that is, better identification accuracy is sought by repeating object detection).
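  • the sketch below shows one simple way of turning detection results into such a percentage, counting a detection as correct when its box overlaps a ground-truth box (taken from the metadata) with an intersection-over-union above 0.5; the IoU rule and the 95% criterion used in the example are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def identification_accuracy(detections, ground_truth, iou_threshold=0.5):
    """Fraction of ground-truth targets matched by at least one detection."""
    hits = sum(any(iou(d, g) >= iou_threshold for d in detections) for g in ground_truth)
    return hits / len(ground_truth)

# Example: 9 of 10 targets correctly identified gives 90%, below a 95% criterion, so repeat
gt = [(i * 50, 0, 40, 80) for i in range(10)]
det = gt[:9]
acc = identification_accuracy(det, gt)
print(f"{acc:.0%}", "- repeat object detection" if acc < 0.95 else "- criterion met")
```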
  • in this way, the metadata associated with the detection target image model is provided to the object detection network, and the object detection network can identify the detection target in the target image based on that metadata.
  • the present invention may be configured as a client / server architecture.
  • the user 1401 may specify a desired background image and a desired detection target via a terminal 1402 such as a computer, a tablet PC, or a smartphone.
  • the server on the cloud 1403 may generate a final image using the detection target 1409 and the background image 1408 specified by the user 1401 and/or the data stored in the storage unit 1404.
  • the camera 1405 may be directly connected to the cloud 1403, and an image or video captured by the camera 1405 may be transmitted to the image generation service provider without going through the user terminal.
  • the user may also convey a desired detection target by other means, such as e-mail, telephone, or smartphone.
  • 100 central server, 110 processing unit, 112 image selection unit, 114 model creation unit, 116 image processing unit, 118 machine learning unit, 120 storage unit, 122 image database, 124 image / model database, 130 client terminal, 140 client terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The purpose of the present invention is to generate, by means of a neural network, an image for machine-learning training from data such as a vector model or a 3D model, and to improve the efficiency of the machine-learning training or the accuracy of object detection in the image by using the generated image in machine learning. To this end, when it is difficult to obtain images for machine-learning training, the present invention can generate images for machine-learning training in large quantities by combining an image of the object to be detected with a background image representing a desired background and applying a generative adversarial network to the combined image.

Description

Image generation method, image generation apparatus, and image generation system
 The present invention relates to an image generation method, an image generation apparatus, and an image generation system.
 In recent years, detection accuracy for recognizing a specific object from a captured image has been improved by using machine learning methods (neural networks such as deep learning) in image processing technology. One method for optimizing these machine learning techniques is to input a large number of images as training samples into the system to train the machine learning.
 For example, Japanese Patent Laid-Open No. 2012-088787 (Patent Document 1) describes a system that performs high-accuracy image recognition by automatically collecting image patterns that contain a recognition target and image patterns that do not, and using the collected image patterns for machine learning. This publication states: "The object tracking unit extracts a region in which the recognition target appears from the image of each frame constituting the moving image. The image conversion unit applies a geometric transformation to the image in this region, and a recognition target sample is generated based on the transformed image. The region cutout unit sets regions for the frame images constituting the moving image, and the image composition unit 35 generates a non-recognition target sample image based on an image obtained by combining a plurality of regions within the images of the set regions. The learning unit learns the recognition target using the recognition target samples and the non-recognition target samples."
 JP 2012-088787 A
 When training sample images are readily available, the performance of machine learning is easy to improve; however, when training sample images are difficult or impossible to obtain, it is hard to improve the accuracy of image object detection by machine learning. The user therefore has to bear the cost of obtaining sample images with which to train machine learning. Patent Document 1, however, does not consider in detail how to deal with cases where training sample images are difficult to obtain, and the cost burden on the user of obtaining training sample images remains unresolved.
 Accordingly, the present invention aims to generate images for machine learning training from data such as vector models and 3D models using a neural network, and to use the generated images for machine learning training, thereby improving the efficiency of machine learning training and the accuracy of image detection.
 In order to solve the above problems, one representative image generation method of the present invention includes a background image acquisition step of acquiring a background image by an image selection unit, a detection target image specifying step of specifying, by the image selection unit, a detection target image including metadata from a source image, a model generation step of generating, by a model creation unit, a detection target image model corresponding to the detection target image, and a detection target image establishing step of establishing a final image by combining, by the model creation unit, the background image and the detection target image model.
 By generating images for machine learning training from data such as vector models and 3D models with a neural network and using the generated images for machine learning, the efficiency of machine learning training and the accuracy of image detection can be improved.
 FIG. 1 is a diagram showing the overall hardware system configuration according to an embodiment of the present invention.
 FIG. 2 is a flowchart of the image generation method according to the first embodiment of the present invention.
 FIG. 3 is a diagram for explaining the camera parameter calculation method according to the first embodiment of the present invention.
 FIG. 4 is a flowchart of a modification of the image generation method according to the first embodiment of the present invention.
 FIG. 5 is a diagram showing an example of a vector model and an image according to the first embodiment of the present invention.
 FIG. 6 is a diagram showing an example of the association between a vector model and an image according to the first embodiment of the present invention.
 FIG. 7 is a diagram showing an example of the process for establishing a detection target image according to the first embodiment of the present invention.
 FIG. 8 is a flowchart of an image generation method according to an embodiment of the present invention.
 FIG. 9 is a diagram showing an example of the process for establishing a detection target image according to the second embodiment of the present invention.
 FIG. 10 is a diagram showing an example of the process for establishing a detection target image according to the third embodiment of the present invention.
 FIG. 11 is a diagram showing an example of the process for improving the ability to create machine learning images according to the fourth embodiment of the present invention.
 FIG. 12 is a diagram showing an example of a method of generating images according to the fifth embodiment of the present invention.
 FIG. 13 is a diagram showing an example of a method of generating images according to the fifth embodiment of the present invention.
 FIG. 14 is a diagram showing an example of the process for improving the detection accuracy of machine learning according to the sixth embodiment of the present invention.
 FIG. 15 is a diagram showing an example of the system architecture according to an embodiment of the present invention.
 FIG. 16 is a conceptual diagram explaining an example of an embodiment of the present invention.
 Hereinafter, a conventional example and the first embodiment of the present invention will be described with reference to the drawings. The present invention is not limited to these embodiments. In the description of the drawings, the same parts are denoted by the same reference numerals.
 First, an example of the concept of the embodiments of the present invention will be described with reference to FIG. 16.
 In machine learning for object detection, a large number of images of the object to be detected are required in order to train the machine learning system. For example, when trying to detect a person carrying a white cane (hereinafter also referred to as a "white cane user") using machine learning, learning has conventionally not been possible without a large amount of footage of white cane users. In the present invention, therefore, even when footage of the detection target (for example, a white cane user) is scarce, a large amount of white cane user footage is generated from footage of general pedestrians (available in large quantities) and a small amount of white cane user footage, so that machine learning can be reinforced efficiently.
[Configuration of the image generation system]
 FIG. 1 is a diagram showing the overall hardware system configuration according to an embodiment of the present invention. As shown in FIG. 1, this system includes a central server 100, a client terminal 130, a client terminal 140, and a network 150 (the Internet, a LAN, or the like). The central server 100, the client terminal 130, and the client terminal 140 may be communicably connected to each other via the network 150.
 The central server 100 is a device that performs the image generation requested from the client terminals 130 and 140 via the network 150. Specifically, the central server 100 can include functional units that perform functions such as image selection, model creation, image processing, and machine learning in the image generation process. The central server 100 may also include a unit (for example, a storage unit 120) that stores image data such as background images and detection target images (described later), and model data such as vector models and 3D models.
 The client terminal 130 and the client terminal 140 are devices for transmitting image generation requests to the central server 100 via the network 150. Specifically, the user can input image generation conditions to the client terminal 130 and the client terminal 140. For example, the user may use the client terminal 130 or the client terminal 140 to designate a detection target (described later) and a background image used for image generation. Instructions such as the conditions input at the client terminal 130 and the client terminal 140 are transmitted to the central server 100 via the network 150.
[Configuration of the central server 100]
 As described above, the central server 100 is a device that performs the image generation requested from the client terminals 130 and 140 via the network 150. As shown in FIG. 1, the central server 100 includes a processing unit 110 that performs each function of image generation and a storage unit 120 that stores the information used for image generation.
 The processing unit 110 includes functional units for performing each function according to the embodiment of the present invention. Specifically, the processing unit 110 comprises an image selection unit 112 that acquires a background image and identifies a detection target image including metadata from a source image, a model creation unit 114 that generates a detection target image model corresponding to the detection target image and establishes a final image by combining the background image and the detection target image model, an image processing unit 116 that performs image processing on the model, and a machine learning unit 118 that performs each step of the machine learning detection accuracy improvement process and the machine learning image creation capability improvement process.
 The processing unit 110 functions as each of the functional units described above when an arithmetic processing unit such as a CPU (Central Processing Unit) inside the apparatus executes a control program stored in memory.
 The storage unit 120 includes an image database 122 and an image / model database 124. The image database 122 is a database (a device or a logical storage area) that stores background images used for image generation and data of detection target images described later. The image database 122 may store, for example, image data showing the state of a station platform as shown in FIG. 7, together with the metadata included in such images. In one embodiment, the storage unit 120 may receive images specified by the user (a source image, a background image, a desired detection target image) from the client terminal 130 or the client terminal 140 and store the received image data in the image database 122 in a format usable for image generation. The image / model database 124 is a database (a device or a logical storage area) that stores specific images and the models associated with them in a mutually associated form. For example, as shown in FIGS. 5 and 6 described later, a model used for image generation (a vector model, a point cloud, etc.) and a realistic image associated with that model may be stored in the image / model database 124. The image database 122 and the image / model database 124 of the storage unit 120 may be realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk.
[Configuration of the client terminal 130]
 As described above, the client terminal 130 is a device for transmitting image generation requests to the central server 100 via the network 150. The client terminal 130 includes a processing unit 132 that executes commands sent from other functional units in the terminal, an instruction receiving unit 134 that receives instructions from the user (such as image generation conditions), an image selection unit 136 that selects images (source images, background images, detection target images), a communication unit 138 that manages exchanges with the central server 100 and other network terminals (for example, the client terminal 140), and a storage unit 139 that stores information (image data, commands from the user, etc.). As described above, in one embodiment, the user can use the client terminal 130 to input image generation conditions and to specify a source image, a background image, or a detection target image used for image generation. The client terminal 130 may transmit the conditions and instructions input by the user to the central server 100.
[Configuration of the client terminal 140]
 Like the client terminal 130, the client terminal 140 is a device for transmitting image generation requests to the central server 100 via the network 150. Like the client terminal 130, the client terminal 140 includes a processing unit 142 that executes commands sent from other functional units in the terminal, an instruction receiving unit 144 that receives instructions from the user (such as image generation conditions), a communication unit 146 that manages exchanges with the central server 100 and other network terminals (for example, the client terminal 130), and a storage unit 148 that stores information (image data, commands from the user, etc.). The client terminal 140 differs from the client terminal 130 in that it does not have an image selection unit such as the image selection unit 136. Therefore, when an image generation request is sent from a terminal without an image selection unit, such as the client terminal 140, the image used for image generation may be selected by the image selection unit 112 of the central server 100 according to a user instruction, or selected automatically (for example, at random) by the image selection unit 112.
 Next, background image acquisition, detection target identification, detection target image model creation, and detection target image establishment in the first embodiment will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the image generation method according to the first embodiment of the present invention.
 First, in step S200, a background image is acquired. In this specification, the expression "acquire" includes obtaining, receiving, securing, procuring, selecting, and specifying. The background image is an image that becomes the final image once a detection target image (described later) is placed in it. The background image may be, for example, an image showing various environments such as a station platform, an airport boarding gate, a concert or sports venue, or a shopping mall. This background image may be designated by the user (for example, via the image selection unit 136 of the client terminal 130), or may be selected by the image selection unit 112 of the central server 100 from among the images stored in the image database 122 of the storage unit 120 according to an instruction input by the user (for example, via the client terminal 140).
Next, in step S220, a detection target image is identified. In this specification, the expression "identify" includes selecting, choosing, setting, designating, recognizing, and detecting. The detection target image is an image showing an object that the user wants to place in the background image. For example, the detection target image may be an image showing the object that a machine learning unit is to be trained to detect within an image. The detection target image may show, for example, a person holding a white cane, a person wearing specific clothing, a piece of luggage exceeding a predetermined size, or a certain kind of animal. The detection target image may also include metadata. The metadata here may include information representing the position of the detection target image, such as two-dimensional coordinates; information indicating the shape (rectangular, round) and size (length and height in pixels) of the detection target image; and information representing the nature of the detection target image (labels such as person, animal, luggage, or car). This metadata may be used for the machine learning training described later.
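As an illustration only, the following is a minimal sketch of such metadata in Python; the field names (label, bbox, shape) and the (left, top, right, bottom) box convention are assumptions for illustration, not terms defined by the embodiment.

```python
# A minimal sketch of metadata attached to a detection target image.
# Field names and the box convention are illustrative assumptions.
detection_target_metadata = {
    "label": "person_with_white_cane",   # nature of the detection target
    "bbox": (420, 310, 484, 480),        # position/size in pixel coordinates
    "shape": "rectangle",                # shape of the annotated region
}
```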
The detection target image may be identified from a source image input by the user by the image selection unit 112 of the central server 100, or it may be identified directly by a user instruction. As an example, suppose the background image is designated as an image of a construction site, and the instruction receiving unit 134 of the client terminal 130 or the instruction receiving unit 144 of the client terminal 140 receives from the user a request designating "a person not wearing a helmet" shown in the source image as the detection target image. In this case, the instruction receiving unit 134 that received the request transmits the user's instruction to the central server 100, and the image selection unit 112 of the central server 100 may identify a person not wearing a helmet within the source image as the detection target image in accordance with the user's request.
Next, in step S240, a detection target image model is created. In this specification, the expression "create" includes generating, producing, forming, preparing, and making. The detection target image model is a model that embodies the shape and structure of the object shown in the detection target image. As the detection target image model, for example, a vector model, a point cloud, or a 3D model may be used. The detection target image model may be created automatically by, for example, a well-known model creation tool. As will be described later, the detection target image model created here may be processed by the image processing unit 116 of the central server 100 and by the generative adversarial network described later, so that it is finished as an image closer to a real image.
Next, in step S260, a final image is established. In this specification, the expression "establish" includes setting up, setting, founding, generating, constructing, providing, and creating. The final image is an image generated by combining the background image acquired in step S200 with the detection target image model created in step S240. Specifically, the final image may be generated by the model creation unit 114 of the central server 100 combining the background image and the detection target image model. The details of the process of combining the background image and the detection target image model to generate the final image are described with reference to FIG. 7, so the description is omitted here.
In this way, by acquiring a background image, identifying a detection target image with metadata from a source image, generating a detection target image model corresponding to the detection target image, and combining the background image with the detection target image model to establish a final image, images for machine learning can be generated even when actual detection target images are scarce and difficult to obtain.
Next, the camera parameter calculation method according to the first embodiment of the present invention will be described with reference to FIG. 3. As shown in FIG. 3, a background image 320, a ticket gate 321, a horizontal line 323, and a detection target image model 327 are used for the calculation of the camera parameters.
In the step of establishing the detection target image described above, in order to place the detection target image model 327 in the background image 320 at an appropriate size and so on, the camera parameters of the background image 320 need to be calculated. Using the camera parameters calculated here, the model creation unit 114 can place the detection target image model 327 in the background image 320 at an appropriate position, size, and orientation.
To calculate the camera parameters, a reference object is first identified in the background image 320. The reference object here is an object that serves as a guide for the size of the detection target image model to be placed in the background image 320. For example, the reference object may be something whose size is generally known or easy to estimate. As an example, the ticket gate 321 may be identified as the reference object here. Next, the camera parameters are calculated based on the dimensional elements (height, length, and so on) of the identified reference object. Specifically, a reference such as the horizontal line 323 is set to match the position of the ticket gate 321 identified as the reference object in the background image 320, and the length and height of the ticket gate 321 are measured in pixels. Then, by using the ratio obtained by dividing the actual length and height of the ticket gate 321 by its length and height in pixels in the background image 320, the size that the detection target image model should have can easily be calculated. Accordingly, the final image can be generated by combining the detection target image model 327 with the background image 320 based on the camera parameters calculated here.
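The following is a minimal sketch of the scale estimation described above, under the simplifying assumption that a single real-size-to-pixel ratio near the reference object is sufficient; the function names and the example values (a ticket gate about 1.0 m tall, a person about 1.7 m tall) are illustrative.

```python
# A minimal sketch: estimate the on-image size of a detection target from a
# reference object of known real-world size.

def meters_per_pixel(reference_height_m: float, reference_height_px: float) -> float:
    """Ratio of real-world meters to image pixels near the reference object."""
    return reference_height_m / reference_height_px

def target_height_px(target_height_m: float, m_per_px: float) -> float:
    """Approximate pixel height for a target placed at a similar depth."""
    return target_height_m / m_per_px

scale = meters_per_pixel(reference_height_m=1.0, reference_height_px=120.0)
print(target_height_px(1.7, scale))  # roughly 204 pixels
```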
In this way, by calculating the camera parameters based on the dimensional elements of a reference object and combining the detection target image model with the background image based on the calculated camera parameters, the detection target image model can be placed at an appropriate size, position, and orientation.
Next, the flow of a modification of the image generation method according to the first embodiment of the present invention will be described with reference to FIG. 4.
First, in step S400, a background image is acquired. The background image acquisition here is substantially the same as the background image acquisition in step S200 of FIG. 2, so its description is omitted here.
Next, in step S420, a vector model representing the detection target image is placed in the background image acquired in step S400. A vector model is a model that expresses the shape and structure of the object shown in the detection target image using space vectors. An example of a vector model is shown in FIG. 5. FIG. 5 is a diagram illustrating an example of a vector model and an image according to the first embodiment of the present invention. The vector model 531 shown in FIG. 5 is a vector model of the human body. This vector model 531 may be placed in the background image at an appropriate position, size, and orientation based on, for example, the camera parameters described with reference to FIG. 3.
Next, in step S440, the vector model is adjusted. The adjustment of the vector model may be performed by the image processing unit 116 of the central server 100. Here, vector model adjustment means converting the vector model placed in step S420 into an image closer to a real image (hereinafter also referred to as a "realistic image") by using generally known image processing techniques. Specifically, the adjustment of the vector model here may be performed by, for example, a generative adversarial network (also referred to as a GAN). As an example, the image processing unit 116 may compare the vector model placed in the background image with the models stored in the image/model database of the storage unit 120 of the central server 100, and select the image whose model has the highest similarity to that vector model. For example, as shown in FIG. 5, the vector model 531 may be converted into a realistic image 532 such as a person holding a white cane. The image selected here may be superimposed on the vector model to generate the final image.
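The following is a minimal sketch of the "select the most similar stored model" step described above, assuming each vector model has already been encoded as a fixed-length numeric feature vector; the encoding, the cosine similarity measure, and the function names are assumptions for illustration.

```python
# A minimal sketch: pick the stored (vector model, realistic image) pair whose
# model is closest to the vector model placed in the background image.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def most_similar_pair(placed_model, model_image_pairs):
    """model_image_pairs: list of (model_vector, realistic_image) tuples."""
    return max(model_image_pairs,
               key=lambda pair: cosine_similarity(placed_model, pair[0]))
```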
Next, in step S460, machine learning training is performed using the final image generated in step S440. The machine learning here may be performed by the machine learning unit 118 of the central server 100. As described above, the final image obtained by combining the background image and the detection target image may be used with a technique for training a neural network such as a generative adversarial network. The details of the machine learning training process are described later, so the description is omitted here.
Next, in step S480, the machine learning system trained in step S460 is actually applied. A machine learning system trained on images generated by the method of this embodiment is considered useful in cases where actual training image data is difficult to obtain, such as accident detection for fully autonomous vehicles, crack detection in structures, and natural disaster simulation.
In this way, by placing and adjusting a vector model, high-quality images for machine learning can be obtained.
Next, an example of the association between vector models and images according to the first embodiment of the present invention will be described with reference to FIG. 6.
As described above, the image generation according to the present invention may use vector models and realistic images corresponding to them. Here, the association between a realistic image and a vector model is described. First, the detection target on which the detection target image 641 is based can be obtained by photographing it in a laboratory environment. Next, a vector model superimposed on the detection target image 641 is generated by applying a generally known edge/orientation detection algorithm, such as OpenPose, to the detection target image 641. This vector model 643 and the detection target image 641 are then stored in the image/model database 124 of the storage unit 120 as a model-image pair 645 associated with each other, so that they are accessible to the functional units of the processing unit 110 of the central server 100.
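The following is a minimal sketch of building such model-image pairs; extract_vector_model() stands in for whatever edge/orientation detector is used (for example, a wrapper around OpenPose) and is an assumed helper rather than a real API, as is the dictionary layout.

```python
# A minimal sketch: pair each lab-captured detection target image with the
# vector model extracted from it, for storage in the image/model database.
def build_model_image_pairs(detection_target_images, extract_vector_model):
    pairs = []
    for image in detection_target_images:
        vector_model = extract_vector_model(image)  # keypoints/edges of the object
        pairs.append({"model": vector_model, "image": image})
    return pairs
```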
In this way, storing vector models and realistic images in association with each other has the advantage that a generative adversarial network can be trained easily.
Next, an example of the process for establishing a detection target image according to the first embodiment of the present invention will be described with reference to FIG. 7.
As shown in FIG. 7, an image showing a station platform is acquired as the background image 301. As described with reference to FIG. 3, the camera parameters of the background image 301 are calculated based on reference objects such as the track 302 and the ticket gate 303. Next, the model creation unit 114 places the vector model 304 corresponding to the detection target image requested by the user in the background image 301, at the position, size, and orientation specified by the calculated camera parameters. Next, the image processing unit 116 generates a realistic image 305 corresponding to the vector model 304 and inserts it into the background image 301 at the same position, size, and orientation as the vector model 304. At this stage, image processing such as lighting adjustment and edge harmonization may be applied so that the realistic image blends into the background image 301. By combining the realistic image 305 corresponding to the detection target image with the background image 301 in this way, the final image 309 is obtained. As described above, this final image 309 may be used to train machine learning methods such as neural networks or support vector machines. Thus, even when detection targets are scarce and difficult to obtain, images for machine learning can be generated by using the invention of this embodiment.
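The following is a minimal compositing sketch using Pillow, assuming the realistic image of the detection target carries an alpha channel; the file names, coordinates, and size are placeholders standing in for values derived from the camera parameters.

```python
# A minimal sketch: insert a realistic detection target image into a background
# image at a given position and size.
from PIL import Image

def composite(background_path, target_path, top_left, target_size):
    background = Image.open(background_path).convert("RGBA")
    target = Image.open(target_path).convert("RGBA").resize(target_size)
    background.paste(target, top_left, mask=target)  # alpha channel used as mask
    return background.convert("RGB")

final_image = composite("station_platform.png", "person_white_cane.png",
                        top_left=(420, 310), target_size=(64, 170))
```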
In this embodiment, an example in which a single detection target image is combined with a background image has been described, but the present invention is not limited to this; by repeating the above steps, a plurality of detection target images can also be placed in a single background image.
Next, the flow of an image generation method according to an embodiment of the present invention will be described with reference to FIG. 8.
First, in step S400, a background image is acquired. In step S420, a vector model is placed in the background image. In step S440, the vector model is adjusted. These steps are substantially the same as in the image generation method described with reference to FIG. 4, so their description is omitted here.
An example of using the generated final image for machine learning training was described above, but the present invention is not limited to this, and the final image may be used for other purposes. Accordingly, in step S800, the final image generated after adjusting the vector model in step S440 is provided. This final image may, for example, be provided to the party that requested the image generation, or it may be transmitted to a third party. Besides machine learning training, the final image may be applied to advertising, face recognition, object detection, image processing, and the like. Thus, the image generation method according to the present invention may be applied not only to training machine learning but also to various other fields.
Next, an example of the process for establishing a detection target image according to the second embodiment of the present invention will be described with reference to FIG. 9.
Depending on the conditions of the image generation request, it may be difficult or unnecessary to generate a model (such as a vector model) representing the detection target image. For example, when there is no need to depict fine details of the detection target image (such as color, shape, or structure), there are cases where a coarser image than a vector model or a realistic image may be substituted in order to keep the file size of the final image small. Therefore, the process for establishing the detection target image according to the second embodiment of the present invention solves this problem by simply inserting a partial image into the background image as the detection target image, instead of a vector model or a realistic image.
In FIG. 9, an image showing a station platform is acquired as the background image 301. As in FIG. 7, the background image 301 in FIG. 9 contains reference objects such as the track 302 and the ticket gate 303. In the second embodiment, a partial image 314 is then defined as the detection target image, at the position and size specified by the camera parameters calculated from reference objects such as the track 302 and the ticket gate 303 (or by the image generation conditions input by the user). This partial image is, for example, made in an arbitrary size and shape and forms a fixed region within the background image. Although the partial image 314 shown in FIG. 9 is depicted as a rectangular region, the shape of the region of the partial image 314 according to the present invention is not limited to a rectangle and may be any shape. By taking the background image 301 into which the partial image 314 has been inserted as the final image 315, a final image with a smaller file size than the final image produced by the image generation method described in the first embodiment is obtained. Also, as shown in FIG. 9, an image matched to the size of the partial image 314 (for example, an image stored in the image database 122 or an image selected by the user) may be inserted into the region of the partial image 314.
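The following is a minimal sketch of this region-based insertion on plain numpy image arrays; the rectangular region, the nearest-neighbour resizing, and the argument names are illustrative assumptions (as noted above, the region could be any shape).

```python
# A minimal sketch: define a rectangular partial-image region in the background
# and, optionally, fill it with an image resized to that region.
import numpy as np

def insert_partial_image(background, top, left, height, width, fill=None):
    out = background.copy()
    if fill is not None:
        rows = np.linspace(0, fill.shape[0] - 1, height).astype(int)
        cols = np.linspace(0, fill.shape[1] - 1, width).astype(int)
        out[top:top + height, left:left + width] = fill[rows][:, cols]
    return out
```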
In this way, according to the second embodiment, the effect of keeping the file size of the image small is obtained.
Next, an example of the detection target image establishment process according to the third embodiment of the present invention will be described with reference to FIG. 10.
Depending on the selected background image, the object to be used as the detection target image may be unclear or incomplete, making it difficult to generate a model (such as a vector model) representing the detection target image. For example, if part of the object to be used as the detection target image is blurred, cut off, or appears multiple times, it is difficult to create an accurate vector model, and an image usable for machine learning training cannot be generated. Therefore, the process for establishing the detection target image according to the third embodiment of the present invention solves this problem by inserting a clear partial image into the background image, or substituting one, as the detection target image.
As shown in FIG. 10, an image showing a station platform containing a candidate object (for example, a person) 324 for the detection target image is acquired as the background image 301. However, because this candidate object 324 is, for example, unclear or partially missing, it may be difficult to generate a model that accurately represents it. In such a case, in the process for establishing the detection target image in this embodiment, a partial image 325 is drawn so as to surround all or part of the candidate object 324. This partial image 325 may, for example, be designated by the user via a GUI or the like, or it may be generated automatically by the machine learning unit 118. Next, the image processing unit 116 of the central server 100 applies image processing to the region designated by the partial image 325, thereby finishing the candidate object 324 as a detection target image 326, and by taking the background image 301 containing the detection target image 326 as the final image 329, a final image usable for machine learning is obtained. The partial image 325 here is substantially the same as the partial image 314 described in the second embodiment.
In this way, according to the third embodiment, a high-quality image can be generated even when the detection target image is unclear or incomplete, and the effect of keeping the file size of the image small is also obtained.
Next, an example of the machine learning image creation capability improvement process according to the fourth embodiment of the present invention will be described with reference to FIG. 11.
As described above, one aspect of the present invention relates to using images generated by the above image generation method for machine learning training. Below, an example of improving the image creation capability of machine learning is described for a generative adversarial network, but the present invention is not limited to this and may be applied to any machine learning method, such as a support vector machine.
A generative adversarial network consists of two networks, a generator and a discriminator, and learns by pitting two sets of data against each other. Specifically, when a pair consisting of a basic image and a target image that the network is asked to produce is input, the generator produces and outputs a created image as the result. The more similar this created image is to the target image, the better. The discriminator judges the accuracy of the created image by comparing it with the target image. In this way, the generator learns to deceive the discriminator, and the discriminator learns to discriminate more accurately.
In this embodiment, first, in step 1110, the detection target on which the detection target image is based is photographed in a laboratory environment to obtain the detection target image. For example, as shown in FIG. 11, by photographing a person holding a white cane, an image of a person holding a white cane is obtained as the detection target image. Next, in step 1120, a vector model corresponding to the detection target image is generated by applying a generally known edge/orientation detection algorithm, such as OpenPose, to the detection target image obtained in step 1110. For example, when the detection target image shows a person, a vector model representing the person's head, shoulders, arms, torso, legs, and so on may be generated, as shown in FIG. 11. When the detection target has clear edges, such as a suitcase or a car, an edge extraction technique may be applied instead, and these edges may be represented by splines or the like.
Next, in step 1130, the detection target image photographed in step 1110 and the vector model generated in step 1120 may be stored in the image/model database 124 of the storage unit 120 as a model-image pair associated with each other. Next, in step 1140, a background image in which this vector model has been placed is input as the basic image into the generative adversarial network (also referred to as the second neural network). Then, in step 1150, the generator of the generative adversarial network converts the vector model into a realistic image based on the model-image pairs associated in step 1130, thereby producing, as the created image, an image in which a realistic image corresponding to the vector model appears in the background image.
Next, the generative adversarial network compares the detection target (the target image) photographed in step 1110 with the created image produced in step 1150. Specifically, the discriminator of the generative adversarial network may compare the metadata of the target image and of the created image (information defining the position, shape, size, nature, and so on of the object shown in the image). Furthermore, the discriminator may compare the target image and the created image using a predetermined similarity criterion. This similarity criterion may be, for example, a threshold on the degree to which two or more images resemble each other. When the target image and the created image meet the predetermined similarity criterion (that is, when the target image and the created image are judged to be sufficiently similar to each other), the parameters of the generative adversarial network are adjusted. This parameter adjustment includes, for example, setting the conditions used to produce this created image so that they are also applied to the generation of other images.
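The following is a minimal sketch of the similarity check that gates the parameter adjustment, using a simple normalised pixel difference in place of the discriminator's own score; the measure and the 0.9 threshold are illustrative assumptions.

```python
# A minimal sketch: decide whether the created image is similar enough to the
# target image for the network parameters to be adjusted. Images are float
# arrays in [0, 1] of the same shape.
import numpy as np

def similarity(created: np.ndarray, target: np.ndarray) -> float:
    return 1.0 - float(np.mean(np.abs(created - target)))  # 1.0 means identical

def meets_similarity_criterion(created, target, threshold=0.9) -> bool:
    return similarity(created, target) >= threshold
```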
In this way, by inputting a basic image into the generative adversarial network, producing a created image based on the basic image, comparing the created image with the target image, and adjusting the parameters of the generative adversarial network when the created image and the target image meet a predetermined similarity criterion, a generative adversarial network capable of generating high-quality final images is obtained.
Next, an example of the image generation method according to the fifth embodiment of the present invention will be described with reference to FIGS. 12 and 13.
According to the image generation method of the present invention, it is possible not only to generate one model for one detection target but also to generate a plurality of detection target models for one detection target. As shown in FIG. 12, one detection target 1203 may be imaged by a plurality of cameras 1207, 1208, and 1209. By using the detection target images captured by the cameras 1207, 1208, and 1209, together with the background images captured by those same cameras, in the image generation method described above, final images showing the same detection target 1203 from different viewpoints can be generated.
When the detection target moves, the detection target model needs to be represented as an image sequence in order to express the movement of the detection target. For example, suppose the detection target 1213 moves in the direction of the arrow 1215, as shown in FIG. 13. The movement of the detection target 1213 is captured by the camera 1217. Accordingly, by using the video captured by the camera 1217 in the image generation method described above, a detection target model (such as a vector model) can be generated for each frame of the movement of the detection target 1213. By applying neural network image processing to each of these detection target models, an image sequence that smoothly represents the movement of the detection target 1213 is obtained. Besides the case where the detection target moves, images showing the same detection target under different lighting conditions (for example, morning and night, or natural light and artificial light) can also be generated.
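The following is a minimal sketch of building such a per-frame sequence; extract_vector_model() and refine_with_network() are the same kind of assumed helpers as before, standing in for the pose/edge extraction and the per-frame neural-network image processing respectively.

```python
# A minimal sketch: one detection target model (and its refined image) per
# frame of the captured movement.
def build_model_sequence(frames, extract_vector_model, refine_with_network):
    sequence = []
    for frame in frames:
        vector_model = extract_vector_model(frame)
        sequence.append(refine_with_network(vector_model, frame))
    return sequence
```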
Here, an example of generating detection target images that show a single detection target from different viewpoints has been described, but the present invention is not limited to this, and detection target image models representing a plurality of different objects can also be combined with the same background image. Specifically, the model creation unit 114 may generate a first detection target image model corresponding to a first detection target image and a second detection target image model corresponding to a second detection target image. Next, as described above, the model creation unit 114 acquires a first background image. Finally, the model creation unit 114 may insert the first detection target image model and the second detection target image model into the first background image.
In this way, by generating a plurality of detection target models for one detection target, or by generating an image sequence as the detection target model, images with a high training effect can be obtained.
Next, an example of the machine learning detection accuracy improvement process according to the sixth embodiment of the present invention will be described with reference to FIG. 14.
As described above, one aspect of the present invention relates to using images generated by the above image generation method for machine learning training. Below, an example of improving the object detection accuracy of machine learning is described for detectors such as Faster R-CNN and SVM, but the present invention is not limited to these and may be applied to any object detection algorithm or machine learning method.
First, in order to optimize a first neural network (also called the object detection neural network), metadata associated with the detection target image model is provided to the object detection neural network. As described above, this metadata may be information defining characteristics such as the position, shape, size, and nature of the detection target model 402 in the image 401. The image 401 may be a final image generated by any of the image generation methods described above (for example, the final image 309 in FIG. 7, the final image 315 in FIG. 9, or the final image 329 in FIG. 10). Alternatively, not only the metadata of the detection target model 402 but also the entire image 401 containing the detection target model may be provided to the object detection neural network.
Next, a target image 404 is provided to the object detection network. This target image 404 is, for example, an image containing a target object 405 that is the same as or similar to the detection target model 402 shown in the image 401, and is the image on which object detection is to be performed. Next, the object detection network performs object detection optimization 403 on the target image 404 and attempts to identify the target object 405 in the target image 404 based on the metadata of the detection target model 402. Specifically, the object detection network compares the metadata of the detection target model 402 with the objects shown in the target image 404 and identifies the object that best matches the metadata. As shown in FIG. 14, the object detection network may indicate the identified target object 405 with a rectangular region 406 or the like surrounding it.
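The following is a minimal sketch of checking a detection against the model's metadata, using bounding-box intersection-over-union and a label comparison; the box convention and metadata keys follow the illustrative layout sketched earlier, and the 0.5 threshold is an assumption.

```python
# A minimal sketch: does a detected box agree with the annotated box and label
# of the detection target model's metadata?
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def matches_metadata(detection, metadata, iou_threshold=0.5):
    return (detection["label"] == metadata["label"]
            and iou(detection["bbox"], metadata["bbox"]) >= iou_threshold)
```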
Next, the identification accuracy of the object detection network is calculated based on the result of the object detection. This identification accuracy is a process that evaluates factors such as how closely the objects identified by the object detection network match the detection target model 402, whether all target objects were identified, and whether objects other than the target objects were wrongly identified, and expresses the result in a quantitative form. The identification accuracy may be expressed as a percentage, such as 75% or 91%. As an example, if 9 out of 10 target objects are correctly identified, the calculated identification accuracy may be 90%. Next, the calculated identification accuracy may be compared with a predetermined identification accuracy criterion (a predetermined accuracy threshold). If the calculated identification accuracy does not meet the predetermined criterion, it may be decided to repeat the object detection optimization described above (that is, to seek better identification accuracy by repeating the object detection).
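The following is a minimal sketch of this accuracy calculation and the repeat decision, assuming a one-to-one pairing between detections and ground-truth metadata and reusing the matches_metadata() helper sketched above; the 0.95 criterion is an illustrative value.

```python
# A minimal sketch: fraction of target objects correctly identified, and the
# decision to repeat the optimisation when it falls below the criterion.
def identification_accuracy(detections, ground_truth, matches_metadata):
    correct = sum(1 for det, meta in zip(detections, ground_truth)
                  if matches_metadata(det, meta))
    return correct / len(ground_truth)  # e.g. 9 of 10 -> 0.90

def needs_another_pass(accuracy, criterion=0.95):
    return accuracy < criterion
```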
In this way, by providing the metadata associated with the detection target image model to the object detection network, having the object detection network identify the detection target image in the target image based on the metadata, calculating the identification accuracy from the result, and performing object detection optimization, the effect of improving the detection accuracy of the object detection network is obtained.
Next, an example of a system architecture according to an embodiment of the present invention will be described with reference to FIG. 15.
As described above, the present invention may be configured as a client-server architecture. Specifically, as shown in FIG. 15, a user 1401 may specify a desired background image and a desired detection target via a terminal 1402 such as a computer, tablet PC, or smartphone. The server on the cloud 1403 may then generate the final image using the detection target 1409 and background image 1408 specified by the user 1401 and/or the data stored in the storage unit 1404.
As another system architecture, a configuration that does not include the terminal 1402 is also possible. In this case, the camera 1405 may be connected directly to the cloud 1403, and images or video captured by the camera 1405 may be transmitted to the image generation service provider without passing through the user's terminal. In this case, the user may communicate the desired detection target by other means, such as e-mail, telephone, or smartphone.
100 central server, 110 processing unit, 112 image selection unit, 114 model creation unit, 116 image processing unit, 118 machine learning unit, 120 storage unit, 122 image database, 124 image/model database, 130 client terminal, 140 client terminal

Claims (15)

1.  An image generation method comprising:
     a background image acquisition step of acquiring a background image by an image selection unit;
     a detection target image identification step of identifying, by the image selection unit, a detection target image including metadata from a source image;
     a model generation step of generating, by a model creation unit, a detection target image model corresponding to the detection target image; and
     a detection target image establishment step of establishing a final image by combining, by the model creation unit, the background image and the detection target image model.
2.  The image generation method according to claim 1, wherein the metadata includes information representing a position of the detection target image, information indicating a shape and a size of the detection target image, and information representing a property of the detection target image.
3.  The image generation method according to claim 1, wherein the detection target image model is selected from a vector model, a 3D model, and a point cloud model.
4.  The image generation method according to claim 3, wherein the model generation step includes:
     a vector model generation step of generating, by the model creation unit, the vector model corresponding to the detection target image; and
     a step of applying, by a machine learning unit, image processing to the vector model to generate the detection target image model.
5.  The image generation method according to claim 1, wherein the detection target image establishment step establishes the final image through:
     an object identification step of identifying a reference object from the background image;
     a camera parameter calculation step of calculating camera parameters based on dimensional elements of the reference object; and
     a combining step of combining the detection target image model with the background image based on the calculated camera parameters.
6.  The image generation method according to claim 1, further comprising a machine learning detection accuracy improvement step, wherein the machine learning detection accuracy improvement step includes:
     a detection target image identification training step of providing metadata associated with the detection target image model to a first neural network in order to optimize the first neural network, and causing the first neural network to identify the detection target image from within a target image based on the metadata;
     an identification accuracy calculation step of calculating an identification accuracy from a result of the detection target image identification training step; and
     an identification accuracy determination step of comparing the identification accuracy with a predetermined identification accuracy criterion and deciding to repeat the detection target image identification training step when the identification accuracy does not meet the predetermined identification accuracy criterion.
7.  The image generation method according to claim 1, further comprising a machine learning image creation capability improvement step, wherein the machine learning image creation capability improvement step includes:
     a created image creation step of inputting a basic image into a second neural network and creating a created image based on the basic image;
     a comparison step of comparing the created image with a target image; and
     a parameter adjustment step of adjusting parameters of the second neural network when the created image and the target image achieve a predetermined similarity criterion.
8.  The image generation method according to claim 7, wherein the second neural network is a generative adversarial network.
9.  The image generation method according to claim 1, wherein the detection target image establishment step includes:
     a first target model generation step of generating, by the model creation unit, a first detection target image model corresponding to a first detection target image;
     a second target model generation step of generating, by the model creation unit, a second detection target image model corresponding to a second detection target image;
     a first background image acquisition step of acquiring a first background image by the model creation unit; and
     a step of inserting the first detection target image and the second detection target image into the first background image.
10.  The image generation method according to claim 1, wherein the detection target image model is an image sequence corresponding to the detection target image.
11.  The image generation method according to claim 1, wherein, in the detection target image establishment step, when an unclear portion exists in part of the source image, the model creation unit generates the final image by replacing that portion with, or inserting, another clear image.
12.  An image generation apparatus comprising:
     an image selection unit that acquires a background image and identifies a detection target image including metadata from a source image; and
     a model creation unit that generates a detection target image model corresponding to the detection target image and establishes a final image by combining the background image and the detection target image model.
13.  The image generation apparatus according to claim 12, wherein the detection target image model is selected from a vector model, a 3D model, and a point cloud model.
14.  The image generation apparatus according to claim 13, wherein the model creation unit generates the vector model corresponding to the detection target image, and the image generation apparatus further comprises a machine learning unit that applies image processing to the vector model to generate the detection target image model.
15.  An image generation system in which a central server and a client terminal are connected via a network, wherein:
     the client terminal has an image selection unit;
     the central server has a model creation unit;
     the image selection unit acquires a background image in accordance with a user input, identifies a detection target image including metadata from a source image, and transmits the background image and the detection target image to the central server;
     the central server receives the background image and the detection target image from the client terminal; and
     the model creation unit generates a detection target image model corresponding to the detection target image and establishes a final image by combining the background image and the detection target image model.
PCT/JP2018/048149 2018-03-12 2018-12-27 Image generation method, image generation device, and image generation system WO2019176235A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201880089706.6A CN111742342A (en) 2018-03-12 2018-12-27 Image generation method, image generation device, and image generation system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018043822A JP6719497B2 (en) 2018-03-12 2018-03-12 Image generation method, image generation device, and image generation system
JP2018-043822 2018-03-12

Publications (1)

Publication Number Publication Date
WO2019176235A1 true WO2019176235A1 (en) 2019-09-19

Family

ID=67908225

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/048149 WO2019176235A1 (en) 2018-03-12 2018-12-27 Image generation method, image generation device, and image generation system

Country Status (3)

Country Link
JP (1) JP6719497B2 (en)
CN (1) CN111742342A (en)
WO (1) WO2019176235A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102348368B1 (en) * 2020-05-21 2022-01-11 주식회사 누아 Device, method, system and computer readable storage medium for generating training data of machine learing model and generating fake image using machine learning model
CN118200663A (en) * 2024-03-15 2024-06-14 宁波艾腾湃数字技术有限公司 Interactive video playing method combined with digital three-dimensional model display

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012088787A (en) * 2010-10-15 2012-05-10 Canon Inc Image processing device, image processing method
JP5645079B2 (en) * 2011-03-31 2014-12-24 ソニー株式会社 Image processing apparatus and method, program, and recording medium
US8903167B2 (en) * 2011-05-12 2014-12-02 Microsoft Corporation Synthesizing training samples for object recognition
JP6008045B2 (en) * 2013-06-28 2016-10-19 日本電気株式会社 Teacher data generation device, method, program, and crowd state recognition device, method, program
CN105184253B (en) * 2015-09-01 2020-04-24 北京旷视科技有限公司 Face recognition method and face recognition system
US20180012411A1 (en) * 2016-07-11 2018-01-11 Gravity Jack, Inc. Augmented Reality Methods and Devices
CN107784316A (en) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 A kind of image-recognizing method, device, system and computing device
CN106682587A (en) * 2016-12-02 2017-05-17 厦门中控生物识别信息技术有限公司 Image database building method and device
EP3660787A4 (en) * 2017-07-25 2021-03-03 Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Training data generation method and generation apparatus, and image semantic segmentation method therefor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10240908A (en) * 1997-02-27 1998-09-11 Hitachi Ltd Video composing method
JPH10336577A (en) * 1997-05-30 1998-12-18 Matsushita Electric Ind Co Ltd Figure picture printing device
JP2014178957A (en) * 2013-03-15 2014-09-25 Nec Corp Learning data generation device, learning data creation system, method and program
WO2017154630A1 (en) * 2016-03-09 2017-09-14 日本電気株式会社 Image-processing device, image-processing method, and recording medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161295A (en) * 2019-12-30 2020-05-15 神思电子技术股份有限公司 Background stripping method for dish image
CN111161295B (en) * 2019-12-30 2023-11-21 神思电子技术股份有限公司 Dish image background stripping method
CN111310592B (en) * 2020-01-20 2023-06-16 杭州视在科技有限公司 Detection method based on scene analysis and deep learning
CN111310592A (en) * 2020-01-20 2020-06-19 杭州视在科技有限公司 Detection method based on scene analysis and deep learning
CN113362353A * 2020-03-04 2021-09-07 上海分众软件技术有限公司 Method for identifying advertising player frames using synthesized training pictures
WO2021224895A1 (en) * 2020-05-08 2021-11-11 Xailient Systems and methods for distributed data analytics
US11275970B2 (en) 2020-05-08 2022-03-15 Xailient Systems and methods for distributed data analytics
US12045720B2 (en) 2020-05-08 2024-07-23 Xailient Systems and methods for distributed data analytics
WO2022044369A1 (en) * 2020-08-26 2022-03-03 株式会社Jvcケンウッド Machine learning device and image processing device
WO2022044367A1 (en) * 2020-08-26 2022-03-03 株式会社Jvcケンウッド Machine learning device and far-infrared imaging device
JP7524676B2 (en) 2020-08-26 2024-07-30 株式会社Jvcケンウッド Machine learning device and image processing device
JP7528637B2 (en) 2020-08-26 2024-08-06 株式会社Jvcケンウッド Machine learning device and far-infrared imaging device
GB2610682A (en) * 2021-06-28 2023-03-15 Nvidia Corp Training object detection systems with generated images
GB2610682B (en) * 2021-06-28 2024-09-18 Nvidia Corp Training object detection systems with generated images
CN114120070A (en) * 2022-01-29 2022-03-01 浙江啄云智能科技有限公司 Image detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
JP6719497B2 (en) 2020-07-08
CN111742342A (en) 2020-10-02
JP2019159630A (en) 2019-09-19

Similar Documents

Publication Publication Date Title
WO2019176235A1 (en) Image generation method, image generation device, and image generation system
JP7262884B2 (en) Biometric face detection method, device, equipment and computer program
US11954904B2 (en) Real-time gesture recognition method and apparatus
JP7190842B2 (en) Information processing device, control method and program for information processing device
US9710912B2 (en) Method and apparatus for obtaining 3D face model using portable camera
CN105512627B Key point localization method and terminal
CN108388882B (en) Gesture recognition method based on global-local RGB-D multi-mode
CN110428490B (en) Method and device for constructing model
CN104599287B Object tracking method and device, and object recognition method and device
JP6628494B2 (en) Apparatus, program, and method for tracking object using discriminator learning based on real space information
US20110025834A1 Method and apparatus for identifying human body posture
JP7292492B2 (en) Object tracking method and device, storage medium and computer program
JP2000306095A (en) Image collation/retrieval system
CN111241932A (en) Automobile exhibition room passenger flow detection and analysis system, method and storage medium
WO2021184754A1 (en) Video comparison method and apparatus, computer device and storage medium
JP7314959B2 Personal authentication device, control method, and program
CN114359892B (en) Three-dimensional target detection method, three-dimensional target detection device and computer-readable storage medium
CN111259700B (en) Method and apparatus for generating gait recognition model
CN104978583B Human figure action recognition method and device
JP6377566B2 (en) Line-of-sight measurement device, line-of-sight measurement method, and program
CN115620403A (en) Living body detection method, electronic device, and storage medium
CN114581978A (en) Face recognition method and system
CN112926497B (en) Face recognition living body detection method and device based on multichannel data feature fusion
CN115375739A (en) Lane line generation method, apparatus, and medium
EP3646243B1 (en) Learning template representation libraries

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18909976; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 18909976; Country of ref document: EP; Kind code of ref document: A1