CN116250242A - Shooting device, server device and 3D data generation method
- Publication number
- CN116250242A (application number CN202180067706.8A)
- Authority
- CN
- China
- Prior art keywords
- imaging
- photographing
- image
- volume
- unit
- Prior art date
- Legal status
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Graphics (AREA)
- Studio Devices (AREA)
Abstract
The present technology relates to an imaging apparatus, a server apparatus, and a 3D data generation method that can perform volume imaging using a simple imaging apparatus. The server device is provided with a control unit that receives information relating to imaging from a plurality of imaging devices, and determines whether or not the plurality of imaging devices are capable of volume imaging based on the received information. The imaging device is provided with a control unit that transmits information on itself concerning imaging to the server device, and acquires candidates of a set value concerning volume imaging, which are transmitted from the server device based on a determination result of whether volume imaging is possible. The present disclosure can be applied to, for example, an image processing system or the like that provides volume photographing and reproduction based on a photographing apparatus such as a smart phone.
Description
Technical Field
The present technology relates to an imaging device, a server apparatus, and a 3D data generation method, and more particularly, to an imaging device, a server apparatus, and a 3D data generation method that enable volume imaging using a simple imaging device.
Background
The following technique exists: a 3D model of a subject is generated from moving images captured at a plurality of viewpoints, and a virtual viewpoint image of the 3D model corresponding to an arbitrary viewing position is generated, whereby a free-viewpoint image is provided. This technique is also referred to as volumetric capture or the like.
When capturing moving images for generating a 3D model, the subject is captured from different directions (viewpoints), so it is necessary to dispose a plurality of imaging devices that capture the subject at different locations, determine the positional relationship between the imaging devices, and synchronize the plurality of imaging devices during capture (for example, refer to Patent Document 1). Hereinafter, this capture for generating a 3D model is also referred to as volume photographing (volumetric photography).
At the present time, volume photographing is generally performed in a dedicated studio or the like using a dedicated instrument.
Patent document 1: Japanese Patent Application Laid-Open No. 2019-87791
However, it is desirable to be able to perform volume shooting simply, using an electronic device having a shooting function that users generally own, such as a smart phone or a tablet computer.
Disclosure of Invention
The present technology has been made in view of such a situation, and enables volume photographing to be performed using a simple photographing apparatus.
The imaging device according to the first aspect of the present technology includes a control unit that transmits its own information relating to imaging to a server device and acquires candidates of setting values relating to volume imaging that are transmitted from the server device based on a determination result of whether or not volume imaging is possible.
In the first aspect of the present technology, the imaging device's own information relating to imaging is transmitted to the server device, and candidates of setting values relating to volume imaging, which are transmitted from the server device based on the determination result of whether volume imaging is possible, are acquired.
The server device according to the second aspect of the present technology includes a control unit that receives information on photographing from a plurality of photographing devices and determines whether or not the plurality of photographing devices are capable of performing volume photographing based on the received information.
In a second aspect of the present technology, information relating to photographing is received from a plurality of photographing apparatuses, and it is determined whether or not the plurality of photographing apparatuses are capable of volume photographing based on the received information.
In the 3D data generation method according to the third aspect of the present technology, information relating to imaging is received from a plurality of imaging devices, whether the plurality of imaging devices can perform volume imaging is determined based on the received information, and data of a 3D model of an object is generated from imaging images captured by the plurality of imaging devices determined to be capable of volume imaging.
In a third aspect of the present technology, information relating to photographing is received from a plurality of photographing apparatuses, whether the plurality of photographing apparatuses are capable of volume photographing is determined based on the received information, and data of a 3D model of a subject is generated from photographed images photographed by the plurality of photographing apparatuses determined to be capable of volume photographing.
The photographing apparatus and the server device may be separate devices, or may be internal blocks constituting one device.
Drawings
Fig. 1 is a block diagram showing a configuration example of an image processing system to which the present disclosure is applied.
Fig. 2 is a diagram illustrating volume photographing and reproduction.
Fig. 3 is a diagram illustrating an example of a data format of 3D model data.
Fig. 4 is a block diagram showing a detailed configuration example of each device of the image processing system.
Fig. 5 is a flowchart illustrating a volume shooting playback process of the image processing system.
Fig. 6 is a flowchart illustrating details of the grouping process of the photographing apparatus in step S1 of fig. 5.
Fig. 7 is a diagram showing an example of a screen of a two-dimensional code.
Fig. 8 is a diagram showing an example of the camera calibration process.
Fig. 9 is a flowchart illustrating details of the camera calibration process of the photographing apparatus in step S3 of fig. 5.
Fig. 10 is a diagram showing an example of the camera calibration process.
Fig. 11 is a diagram showing an example of the camera calibration process.
Fig. 12 is a flowchart illustrating details of the volume imaging setting process in step S4 in fig. 5.
Fig. 13 is a diagram showing an example of capability information.
Fig. 14 is a flowchart illustrating details of the synchronous shooting process of the shooting device in step S5 of fig. 5.
Fig. 15 is a flowchart illustrating details of the synchronous shooting process of the other shooting apparatuses in step S5 of fig. 5.
Fig. 16 is a flowchart illustrating details of the offline modeling process in step S7 of fig. 5.
Fig. 17 is a diagram showing an example of 3D model data of an object.
Fig. 18 is a flowchart illustrating details of the content reproduction process in step S8 of fig. 5.
Fig. 19 is a flowchart illustrating details of the real-time modeling reproduction processing in step S9 of fig. 5.
Fig. 20 is a flowchart illustrating the automatic calibration process.
Fig. 21 is a diagram showing another example of the camera calibration process.
Fig. 22 is a flowchart illustrating a camera calibration process performed by the control apparatus.
Fig. 23 is a diagram showing an example of a feedback screen.
Fig. 24 is a flowchart illustrating a camera calibration process performed by the cloud server.
Fig. 25 is a flowchart illustrating details of the camera parameter calculation process of step S224 in fig. 24.
Fig. 26 is a block diagram showing a configuration example of an embodiment of a computer to which the present disclosure is applied.
Detailed Description
Hereinafter, a mode for carrying out the present technology (hereinafter, referred to as an embodiment) will be described with reference to the drawings. In the present specification and the drawings, components having substantially the same functional structure are denoted by the same reference numerals, and repetitive description thereof will be omitted. The description will be given in the following order.
1. Structural example of image processing System
2. Overview of volume shooting and reproduction
3. Detailed configuration example of each device of image processing system
4. Flow chart of volume shooting reproduction processing
5. Flow chart of grouping processing of photographing device
6. Flow chart of camera calibration process of photographing device
7. Flow chart of shooting setting process for volume shooting
8. Flow chart of synchronous shooting process of shooting device
9. Flow chart of offline modeling process
10. Flow chart of content reproduction processing
11. Flow chart of real-time modeling reproduction process
12. Flow chart of automatic calibration process
13. Camera calibration process without using calibration plate image
14. Computer structural example
< 1. Structural example of image processing System >
Fig. 1 shows a configuration example of an image processing system to which the present disclosure is applied, which provides volume shooting and reproduction using a shooting device such as a smart phone.
The image processing system 1 includes N (N > 1) imaging devices 11, a cloud server (server apparatus) 12, and a playback device 13.
The N imaging apparatuses 11 perform volume imaging of a predetermined subject OBJ, and transmit the resulting image data to the cloud server 12. The photographing device 11 is constituted by an electronic device having a photographing function, such as a smart phone, a tablet PC, a digital camera, or a game terminal. The example of fig. 1 shows a case where N=5, that is, 5 photographing apparatuses 11-1 to 11-5 are used to photograph the subject OBJ, but the number of photographing apparatuses 11 is arbitrary.
The cloud server 12 generates data (3D model data) MO of the 3D model of the object OBJ based on image data obtained by volume shooting. In addition, the cloud server 12 transmits the stored 3D model data MO of the predetermined object OBJ to the playback device 13 in response to a request from the playback device 13.
The playback device 13 acquires and plays back 3D model data MO of a predetermined object OBJ from the cloud server 12, thereby generating a free viewpoint image of the 3D model of the object OBJ from an arbitrary viewing position, and displays the free viewpoint image on a predetermined display. The playback device 13 is configured by, for example, an electronic device equipped with a display such as a smart phone or tablet 13A, a personal computer 13B, and a Head Mounted Display (HMD) 13C.
The N photographing apparatuses 11 and the cloud server 12 are connected via a predetermined network. The cloud server 12 and the playback device 13 are also connected via a predetermined network. The network connecting them may be any communication network: it may be a wired communication network, a wireless communication network, or both. The network may be composed of one communication network or of a plurality of communication networks. For example, the network may include a communication network or communication path of any communication standard, such as the Internet, a public telephone line network, a wide-area communication network for wireless mobile bodies such as a so-called 4G or 5G line, a WAN (Wide Area Network), a LAN (Local Area Network), a wireless communication network performing communication conforming to the Bluetooth (registered trademark) standard, a communication path for near field communication such as NFC (Near Field Communication), a communication path for infrared communication, or a communication network for wired communication conforming to the HDMI (registered trademark) (High-Definition Multimedia Interface) or USB (Universal Serial Bus) standard.
< 2. Overview of volume shooting and reproduction >
With reference to fig. 2 and 3, volume photographing and reproduction will be briefly described.
For example, as shown in fig. 2, a plurality of imaging devices capture, from its outer periphery, a predetermined imaging space in which a subject such as a person is disposed, thereby obtaining a plurality of captured images. Each captured image is composed of, for example, a moving image. In the example of fig. 2, 3 cameras CAM1 to CAM3 are arranged to surround the subject #ob1, but the number of cameras CAM is not limited to 3 and is arbitrary. The number of imaging devices CAM at the time of capture is the number of known viewpoints available when generating a free-viewpoint image, so the larger the number of viewpoints, the more accurately the free-viewpoint image can be represented. The subject #ob1 is a person performing a predetermined action.
Using the captured images obtained from different directions by the plurality of imaging devices CAM, a 3D object MO1 (3D modeling), which is a 3D model of the subject #ob1 to be displayed in the imaging space, is generated. For example, the 3D object MO1 is generated by Visual Hull or the like, which carves out the three-dimensional shape of the subject using the captured images from the different directions.
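For reference, the following is a minimal sketch of the Visual Hull idea in Python; it is not the implementation used by the present system, and the silhouette images, the 3x4 projection matrices, and the grid resolution are assumed inputs.

```python
import numpy as np

def visual_hull(silhouettes, projections, grid_min, grid_max, resolution=64):
    """Carve a voxel grid using binary silhouette images from multiple cameras.

    silhouettes: list of HxW boolean arrays (True = subject pixel)
    projections: list of 3x4 camera projection matrices P = K [R|t]
    grid_min, grid_max: 3-element arrays bounding the capture space
    Returns a boolean occupancy grid of shape (resolution, resolution, resolution).
    """
    axes = [np.linspace(grid_min[i], grid_max[i], resolution) for i in range(3)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    points = np.stack([xs, ys, zs, np.ones_like(xs)], axis=-1).reshape(-1, 4)  # homogeneous
    occupied = np.ones(points.shape[0], dtype=bool)

    for sil, P in zip(silhouettes, projections):
        h, w = sil.shape
        proj = points @ P.T                    # project voxel centers into the image
        uv = proj[:, :2] / proj[:, 2:3]        # perspective divide
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (proj[:, 2] > 0)
        hit = np.zeros(points.shape[0], dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        occupied &= hit                        # keep only voxels seen as foreground in every view

    return occupied.reshape(resolution, resolution, resolution)
```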
Then, the data (3D model data) of one or more of the 3D objects existing in the shooting space is transferred to the reproduction-side apparatus and reproduced. That is, in the reproduction-side apparatus, the 3D object is rendered based on the acquired data of the 3D object, whereby a 3D shape image is displayed on the viewing device of the viewer. Fig. 2 shows an example in which the viewing device is a display D1 or a head mounted display (HMD) D2.
The reproduction side can request only the 3D objects to be viewed from among the one or more 3D objects existing in the shooting space and display them on the viewing device. For example, the reproduction side assumes a virtual camera whose shooting range corresponds to the viewing range of the viewer, requests only the 3D objects captured by the virtual camera from among the many 3D objects existing in the shooting space, and displays them on the viewing device. The viewpoint of the virtual camera (virtual viewpoint) can be set at an arbitrary position so that the viewer can view the subject from an arbitrary viewpoint in the real world. An image representing the background of a predetermined space can be appropriately combined with the 3D objects.
Fig. 3 shows an example of a data format of general 3D model data.
The 3D model data is generally represented by 3D shape data representing a 3D shape (geometric information) of the subject and texture data representing color information of the subject.
The 3D shape data is represented by, for example, a point cloud form in which the three-dimensional position of the subject is represented by a set of points, a 3D mesh form called a polygon mesh in which vertices (Vertex) are connected to one another, a voxel form represented by a set of cubes called voxels, and the like.
The texture data includes, for example, a multi-texture format in which the texture is held as the captured images (two-dimensional texture images) captured by the respective imaging devices CAM, and a UV mapping format in which the two-dimensional texture image attached to each point or each polygon mesh of the 3D shape data is expressed and held in a UV coordinate system.
As shown in the upper part of fig. 3, 3D model data described by 3D shape data and a multi-texture format held as the plurality of captured images P1 to P8 captured by the respective capturing devices CAM is a ViewDependent format, in which the color information can change according to the virtual viewpoint (the position of the virtual camera).
On the other hand, as shown in the lower part of fig. 3, 3D model data described by 3D shape data and a UV mapping format in which the texture information of the object is mapped to a UV coordinate system is a ViewIndependent format, in which the color information is the same regardless of the virtual viewpoint (the position of the virtual camera).
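For reference, the following sketch merely illustrates how the two texture formats described above might be held in memory; the field names are hypothetical and are not defined by the present description.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class MultiTextureModel:            # ViewDependent: textures are the original camera images
    vertices: np.ndarray            # (V, 3) 3D shape data (e.g. polygon mesh vertices)
    faces: np.ndarray               # (F, 3) vertex indices
    camera_images: List[np.ndarray] # one two-dimensional texture image per capturing device CAM
    camera_params: List[np.ndarray] # 3x4 projection matrix per device, used to pick/blend colors

@dataclass
class UVMappedModel:                # ViewIndependent: one texture shared by all viewpoints
    vertices: np.ndarray            # (V, 3)
    faces: np.ndarray               # (F, 3)
    uv_coords: np.ndarray           # (V, 2) UV coordinates per vertex
    texture: np.ndarray             # single two-dimensional texture image
```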
Conventionally, volume imaging, which is imaging for generating 3D model data, is generally performed in a dedicated imaging room or the like using a dedicated instrument or the like.
However, the image processing system 1 of fig. 1 is a system capable of performing volume shooting using an electronic device such as a smart phone or a tablet computer which a user generally has.
< 3. Detailed structural example of each device of image processing System >
Fig. 4 is a block diagram showing a detailed configuration example of the photographing apparatus 11, the cloud server 12, and the reproducing apparatus 13.
(photographing apparatus 11)
The photographing apparatus 11 includes a communication unit 31, a messaging unit 32, a control unit 33, a sound output unit 34, a speaker 35, an image output unit 36, a display 37, a flash output unit 38, and a flash 39.
The imaging device 11 further includes a touch sensor 40, a gyro sensor 41, an acceleration sensor 42, a GPS sensor 43, a sensor input unit 44, and a synchronization signal generation unit 45.
The photographing apparatus 11 further includes 1 or more cameras 51 (51A to 51C), a microphone 52, a camera input/output section 53, an image processing section 54, an image compression section 55, a sound input section 56, a sound processing section 57, a sound compression section 58, and a stream transmission section 59.
The communication unit 31 is configured by various communication modules such as carrier communication for wireless mobile units such as a 4G line and a 5G line, wireless communication such as Wi-Fi (registered trademark), and wired communication such as 1000BASE-T, and communicates messages and data with the cloud server 12.
The messaging unit 32 communicates messages with the cloud server 12 via the communication unit 31. Regardless of the kind of communication performed by the communication section 31, the messaging section 32 supports an instant messaging scheme capable of exchanging messages with the cloud server 12 side. Examples of such instant messaging schemes include Jabber, XMPP, and SIP (Session Initiation Protocol).
The control unit 33 controls the overall operation of the photographing apparatus 11 based on the messages received by the messaging unit 32 and user operations detected by an operation unit, not shown. For example, the control unit 33 communicates messages with the cloud server 12 via the messaging unit 32, notifies the user of operation contents, or starts volume shooting (shooting of the subject) in response to a request from the cloud server 12 and transmits the image data obtained by shooting to the cloud server 12. The control unit 33 also transmits the capability information of the photographing apparatus 11 to the cloud server 12, and supplies sensor information, which is the detection results of the various sensors of the photographing apparatus 11, to the stream transmission unit 59. The control unit 33 supplies the setting value information of the photographing function supplied from the cloud server 12 to the camera input/output unit 53, the image processing unit 54, and the image compression unit 55, and performs predetermined settings. The setting value information of the photographing function includes, for example, setting values concerning exposure time, resolution, compression scheme, bit rate, and the like.
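For reference, the setting value information might look like the following hypothetical example; the keys and values are illustrative assumptions, not a format defined by the present description.

```python
# Setting values the control unit 33 might apply to the camera input/output,
# image processing, and image compression sections (illustrative only).
capture_settings = {
    "exposure_time_us": 1000,
    "gain": 1.0,
    "resolution": (1920, 1080),
    "frame_rate": 60,
    "codec": "H.264",
    "bitrate_bps": 10_000_000,
}
```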
An application program for executing volume photographing (hereinafter referred to as a volume photographing application) is installed in the photographing apparatus 11, and the control section 33 performs a process of controlling the entire operation of the photographing apparatus 11 by starting the execution of the volume photographing application.
The sound output unit 34 outputs a sound signal to the speaker 35 under the control of the control unit 33. The speaker 35 outputs sound based on the sound signal supplied from the sound output section 34. The image output unit 36 outputs an image signal to the display 37 under the control of the control unit 33. The display 37 displays an image based on the image signal supplied from the image output section 36. The flash output unit 38 outputs a light emission control signal to the flash 39 under the control of the control unit 33. The flash 39 emits light based on the light emission control signal from the flash output section 38.
The touch sensor 40 detects the touch position when the user touches the display 37, and supplies it as sensor information to the sensor input unit 44. The gyro sensor 41 detects angular velocity and supplies it as sensor information to the sensor input unit 44. The acceleration sensor 42 detects acceleration and supplies it as sensor information to the sensor input unit 44. The GPS sensor 43 receives GPS signals, GPS being one type of GNSS (Global Navigation Satellite System). The GPS sensor 43 detects the current position of the photographing apparatus 11 based on the received GPS signals and supplies it as sensor information to the sensor input section 44. The sensor input unit 44 acquires the sensor information supplied from each of the touch sensor 40, the gyro sensor 41, the acceleration sensor 42, and the GPS sensor 43, and supplies it to the control unit 33.
The synchronization signal generation unit 45 generates a synchronization signal based on an instruction from the control unit 33, and supplies the generated synchronization signal to 1 or more cameras 51. As means for generating the synchronization signal by the synchronization signal generating unit 45, the following method can be adopted depending on the type of network connecting the N photographing devices 11 and the cloud server 12.
The first synchronization method is a method of acquiring synchronization using clock information of carrier communication such as 5G lines. In carrier communication, since a highly accurate clock is provided for communication, synchronization can be obtained by matching time using the clock information.
The second synchronization method is a method of acquiring synchronization using the time information included in GPS signals. A GPS signal carries time information whose accuracy is on the order of that of a grandmaster clock used for PTP (Precision Time Protocol), and can therefore be used to match the time and acquire synchronization.
The third synchronization method is a method of performing time synchronization by multicast communication using Wi-Fi wireless communication. When the devices are connected under the same access point, a multicast packet can be transmitted so that the timing can be detected and synchronization acquired. Wi-Fi TimeSync over 802.11ac may also be used.
The fourth synchronization method is a method of performing time synchronization by multicast communication using carrier communication. In carrier communication over a 5G line, the communication delay is 1 msec or less, so the timing can be detected by transmitting a multicast packet over carrier communication, and synchronization can be acquired.
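For reference, the following is a minimal Python sketch of timing detection by multicast, in the spirit of the third and fourth synchronization methods; the multicast group, port, and single-beacon offset estimation are assumptions, and compensation for network delay is omitted.

```python
import socket
import struct
import time

GROUP, PORT = "239.0.0.1", 5007   # assumed multicast group/port for illustration

def send_sync_beacon():
    """Master side: multicast the current timestamp so receivers can detect a common timing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(struct.pack("!d", time.time()), (GROUP, PORT))

def receive_sync_beacon():
    """Participant side: join the group, receive the beacon, and estimate the clock offset."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, _ = sock.recvfrom(1024)
    master_time = struct.unpack("!d", data)[0]
    offset = time.time() - master_time   # includes network delay; acceptable when the delay is small
    return offset
```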
The cameras 51A to 51C are cameras of different camera types: for example, the camera 51A is an RGB camera that generates an RGB image from the result of receiving visible light (RGB), the camera 51B is an RGB-D camera that generates, together with an RGB image, a depth image storing a depth value, i.e., the distance to the subject, as the pixel value of each pixel, and the camera 51C is an infrared (IR) camera that generates an IR image from the result of receiving IR light. The RGB image, the depth image, and the IR image are collectively referred to as camera images when no particular distinction is made.
The cameras 51A to 51C perform predetermined settings for exposure time, gain, resolution, and the like based on the set value information supplied from the control section 33 via the camera input/output section 53. In addition, the cameras 51A to 51C output camera images obtained as a result of photographing to the camera input/output section 53.
The microphone 52 collects surrounding sounds and outputs the sounds to the sound input unit 56.
The camera input/output section 53 supplies the setting value information supplied from the control section 33 to each of the cameras 51A to 51C. In addition, the camera input/output section 53 supplies the camera images supplied from the cameras 51A to 51C to the image processing section 54.
The image processing unit 54 performs predetermined image signal processing such as demosaicing, color correction, distortion correction, and color space conversion on (RAW data of) the camera image supplied from the camera input/output unit 53.
The image compression unit 55 performs a predetermined compression encoding process on the image signal from the image processing unit 54 based on the set value specified by the control unit 33, and supplies the image stream after compression encoding to the stream transmission unit 59. Examples of the setting values specified by the control unit 33 include parameters such as compression scheme and bit rate.
The stream transmitting unit 59 transmits the image stream from the image compressing unit 55, the sound stream from the sound compressing unit 58, and the sensor information from the control unit 33 to the cloud server 12 via the communication unit 31. The sensor information from the control unit 33 is stored in an image stream of the camera image for each frame, for example, and transmitted.
The sound input unit 56 acquires the sound input from the microphone 52, and supplies the sound to the sound processing unit 57. The audio processing unit 57 performs predetermined audio signal processing such as noise removal processing on the audio signal from the audio input unit 56. The audio compression unit 58 performs a predetermined compression encoding process on the audio signal from the audio processing unit 57 based on the set value specified by the control unit 33, and supplies the compression-encoded audio stream to the stream transmission unit 59. Examples of the setting values specified by the control unit 33 include parameters such as compression scheme and bit rate.
(cloud server 12)
The cloud server 12 includes a controller 101, a messaging unit 102, a communication unit 103, a stream receiving unit 104, a calibration unit 105, a modeling task generating unit 106, and a task storage unit 107.
The cloud server 12 includes an offline modeling unit 108, a content management unit 109, a content storage unit 110, a real-time modeling unit 111, a stream transmission unit 112, and an auto calibration unit 113.
The controller 101 controls the operation of the cloud server 12 as a whole. For example, the controller 101 transmits predetermined messages to the photographing devices 11 and the playback device 13 via the messaging unit 102, thereby causing them to perform predetermined operations such as a photographing operation and a playback operation. The controller 101 also controls the offline modeling unit 108 and the like to perform offline modeling, or controls the real-time modeling unit 111 and the like to perform real-time modeling. In offline modeling, the generation of a 3D model based on volume photographing and the reproduction of the 3D model based on the generated 3D model data (display of a free-viewpoint image) are performed at different timings. In real-time modeling, on the other hand, the generation of a 3D model and the reproduction of the 3D model based on the generated 3D model data are performed as a series of processes. The controller 101 determines the voxel size and the bounding box size, which are the modeling parameters used when generating the 3D model, according to the imaging target area of each imaging device 11 and the user's settings. The voxel size represents the size of a voxel, and the bounding box size represents the processing range of voxels within which the 3D object is searched for.
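For reference, the following is a minimal sketch of the relationship between the two modeling parameters assumed here: the voxel grid dimensions are simply the bounding box extent divided by the voxel size.

```python
import numpy as np

def modeling_parameters(bounding_box_min, bounding_box_max, voxel_size):
    """Derive the voxel grid dimensions from a bounding box and a voxel size.

    bounding_box_min/max: 3-element arrays delimiting the processing range
    voxel_size: edge length of one voxel (same unit as the bounding box)
    """
    extent = np.asarray(bounding_box_max, dtype=float) - np.asarray(bounding_box_min, dtype=float)
    grid_dims = np.ceil(extent / voxel_size).astype(int)   # voxels per axis
    return grid_dims

# e.g. a 2 m x 2 m x 2 m box at 1 cm resolution -> a 200 x 200 x 200 voxel grid
print(modeling_parameters([0, 0, 0], [2.0, 2.0, 2.0], 0.01))
```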
The messaging unit 102 performs message communication with the photographing apparatus 11 or the reproducing apparatus 13 via the communication unit 103. The messaging unit 102 corresponds to, for example, the messaging unit 32 of the photographing apparatus 11.
The communication unit 103 is configured by various communication modules such as carrier communication, wireless communication such as Wi-Fi (registered trademark), and wired communication such as 1000BASE-T, and communicates messages and data with the imaging device 11 or the playback device 13.
The stream receiving unit 104 receives the image streams and sound streams transmitted from the respective photographing apparatuses 11 via the communication unit 103, and executes decoding processing corresponding to the predetermined compression encoding scheme used in the photographing apparatuses 11. The stream receiving unit 104 supplies the decoded camera images or sound signals to at least one of the calibration unit 105, the modeling task generating unit 106, and the real-time modeling unit 111. The stream receiving unit 104 also supplies the sensor information stored in and transmitted with the image streams to, for example, the calibration unit 105.
The calibration section 105 performs calibration based on the camera image and sensor information of each photographing device 11 according to an instruction from the controller 101, and calculates camera parameters of each camera 51. The camera parameters include an external parameter and an internal parameter, but in the case where the internal parameter is set to a fixed value, only the external parameter is calculated. The calculated camera parameters are supplied to the modeling task generation unit 106, the offline modeling unit 108, the real-time modeling unit 111, and the automatic calibration unit 113.
The modeling task generation unit 106 generates a modeling task as a task for generating a 3D model based on the image stream from the stream reception unit 104 and the camera parameters from the calibration unit 105 in accordance with an instruction from the controller 101, and stores the modeling task in the task storage unit 107. In the task storage unit 107, the modeling tasks sequentially supplied from the modeling task generation unit 106 are stored as a task queue.
The offline modeling unit 108 sequentially fetches 1 or more modeling tasks stored as a task queue in the task storage unit 107, and executes offline 3D modeling processing. Data of the 3D model of the object (3D model data) is generated by the 3D modeling process, and supplied to the content management unit 109. In the 3D modeling process, the offline modeling unit 108 may request the automatic calibration unit 113 for calibration processing at that time as needed, based on sensor information associated with image data of each frame of the predetermined imaging device 11, and update the calibration information.
The content management unit 109 stores and manages the 3D model data of the object generated by the offline modeling unit 108 as content in the content storage unit 110. The content management unit 109 acquires 3D model data, which is a predetermined object of the content specified by the controller 101, from the content storage unit 110, and transmits the 3D model data to the playback device 13 via the stream transmission unit 112. The content storage 110 stores 3D model data that is an object of content.
The real-time modeling unit 111 acquires the camera image supplied from the stream receiving unit 104, and performs 3D modeling processing in real time. Through the 3D modeling process, data of a 3D model of the object (3D model data) is generated in units of frames, and sequentially supplied to the stream transmitting unit 112. In the 3D modeling process, the real-time modeling unit 111 may request the automatic calibration unit 113 for calibration processing at that time as needed based on sensor information associated with image data of each frame of the predetermined imaging device 11, and update the calibration information.
The stream transmitting unit 112 transmits the 3D model data of the object supplied from the content managing unit 109 or the real-time modeling unit 111 to the playback device 13.
The automatic calibration unit 113 executes a calibration process to update the camera parameters when it is determined that the camera parameters calculated by the calibration unit 105 need to be updated based on the sensor information associated with the image data of each frame from the offline modeling unit 108 or the real-time modeling unit 111.
(reproduction apparatus 13)
The playback device 13 includes a communication unit 151, a messaging unit 152, a control unit 153, a sensor 154, a sensor input unit 155, a stream receiving unit 156, a playback unit 157, an image output unit 158, a display 159, a sound output unit 160, and a speaker 161.
The communication unit 151 is configured by various communication modules such as carrier communication, wireless communication such as Wi-Fi (registered trademark), and wired communication such as 1000BASE-T, and communicates messages and data with the cloud server 12.
The messaging unit 152 communicates messages with the cloud server 12 via the communication unit 151. The messaging unit 152 corresponds to the messaging unit 102 of the cloud server 12.
The control unit 153 controls the overall operation of the playback device 13 based on the messages received by the messaging unit 152 and viewer operations detected by an operation unit, not shown. For example, the control unit 153 causes the messaging unit 152 to transmit a message requesting predetermined content based on an operation by the viewer. The control unit 153 causes the playback unit 157 to play back the 3D model data of the object transmitted from the cloud server 12 in response to the request for the content. When causing the playback unit 157 to play back the 3D model data of the object, the control unit 153 controls the virtual viewpoint based on the sensor information supplied from the sensor input unit 155.
The sensor 154 is composed of a gyro sensor and an acceleration sensor, detects the viewing position of the viewer, and supplies the detected viewing position as sensor information to the sensor input unit 155. The sensor input unit 155 supplies the sensor information supplied from the sensor 154 to the control unit 153.
The stream receiving unit 156 is a receiving unit corresponding to the stream transmitting unit 112 of the cloud server 12, and receives 3D model data of an object transmitted from the cloud server 12 and supplies the received 3D model data to the reproducing unit 157.
The playback unit 157 plays back the 3D model of the object based on its 3D model data so that the viewing range corresponds to the viewing position supplied from the control unit 153. The free-viewpoint image of the 3D model obtained as a result of reproduction is supplied to the image output unit 158, and the sound is supplied to the sound output unit 160.
The image output unit 158 supplies the free viewpoint image from the playback unit 157 to the display 159 and displays the free viewpoint image. The display 159 displays the free viewpoint image supplied from the image output unit 158.
The audio output unit 160 causes the speaker 161 to output an audio signal from the playback unit 157. The speaker 161 outputs a sound based on the sound signal from the sound output section 160.
The photographing apparatus 11, the cloud server 12, and the reproducing apparatus 13 of the image processing system 1 each have the above configuration.
The details of the processing performed by the image processing system 1 will be described below.
< 4. Flow chart of volume shooting reproduction processing >
Fig. 5 is a flowchart of the volume photographing reproduction process based on the image processing system 1.
First, in step S1, the image processing system 1 performs a grouping process of grouping a plurality of (N) photographing apparatuses 11 participating in volume photographing into one group of photographing apparatuses 11. Details of this process will be described later with reference to fig. 6 and 7.
In step S2, the image processing system 1 executes a calibration imaging setting process in which the imaging devices 11 are set up for the camera calibration process. Each imaging device 11 sets the exposure time, resolution, and the like to predetermined values according to, for example, its own imaging functions and settable ranges. The setting values for the camera calibration process may be predetermined, or may be set (selected) by the user based on the capability information, as in the volume imaging setting process of step S4 described later.
In step S3, the image processing system 1 executes a camera calibration process of the photographing apparatus 11 that calculates camera parameters of each of the plurality of photographing apparatuses 11. Details of this process will be described later with reference to fig. 8 to 11.
In step S4, the image processing system 1 executes a volume shooting imaging setting process of setting the imaging device 11 for volume shooting. Details of this process will be described later with reference to fig. 12 and 13.
In step S5, the image processing system 1 executes synchronous shooting processing of the shooting devices 11 that perform shooting by synchronizing shooting timings of the plurality of shooting devices 11, respectively. Details of this process will be described later with reference to fig. 14 and 15.
In step S6, the image processing system 1 determines whether to perform offline modeling or real-time modeling. The offline modeling is a 3D modeling process that generates a 3D model at a timing different from that of the volume imaging, and the real-time modeling is a 3D modeling process that generates a 3D model in synchronization with the volume imaging. For example, when grouping the photographing devices 11, the user designates a predetermined 1 photographing device 11, and decides whether to perform offline modeling or real-time modeling.
In the case where it is determined in step S6 that offline modeling is performed, the following processing of steps S7 and S8 is performed.
In step S7, the image processing system 1 performs an offline modeling process of generating 3D model data based on image streams of the subject captured by the plurality of capturing apparatuses 11. Details of this process will be described later with reference to fig. 16 and 17.
In step S8, the image processing system 1 executes the following content reproduction processing: the playback device 13 is caused to display a free viewpoint image of the 3D model with the 3D model data generated in the offline modeling process as content. Details of this process will be described later with reference to fig. 18.
On the other hand, when it is determined in step S6 that the real-time modeling is performed, the following process of step S9 is performed.
In step S9, the image processing system 1 executes the real-time modeling reproduction processing as follows: 3D model data is generated based on image streams of the subject photographed by the plurality of photographing apparatuses 11, and the generated 3D model data is transmitted to the reproducing apparatus 13 and caused to display a free viewpoint image of the 3D model. Details of this process will be described later with reference to fig. 19.
The volume shooting reproduction processing based on the image processing system 1 ends in the above.
The processing of each step in fig. 5 may be performed continuously, or each step may be performed at different timings and with a predetermined interval.
< 5. Flowchart of packet processing of photographing apparatus >
Next, the details of the grouping processing of the photographing apparatus 11 performed as step S1 of fig. 5 will be described with reference to the flowchart of fig. 6.
Among the plurality of photographing apparatuses 11 participating in volume photographing, a predetermined 1 photographing apparatus 11 is referred to as a master apparatus, and the other photographing apparatuses 11 are referred to as participating apparatuses.
First, in step S21, the photographing device 11 serving as the master device starts the volume photographing application, receives input of a user ID, a password, and a group name, and transmits a group registration request to the cloud server 12. The user ID may be input by the user, or may be set automatically (without user input) using the terminal name, MAC address, or the like of the photographing apparatus 11. When the group registration request is transmitted to the cloud server 12, the location information of the master device is transmitted to the cloud server 12 together with it. The location information may be any information that makes it possible to determine, in step S24 described later, whether or not the plurality of imaging devices 11 constituting the group can perform volume imaging. The location information may be, for example, latitude and longitude information detected by the GPS sensor 43, latitude and longitude information obtained by triangulation of the electric field intensity of base stations used for carrier communication, or latitude and longitude information obtained by transmitting Wi-Fi access point information (for example, SSID and MAC address information) to a Wi-Fi location information service.
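For reference, a group registration request might be expressed as the following hypothetical message body; none of the field names are defined by the present description.

```python
# Hypothetical JSON-like body of a group registration request (illustrative only).
group_registration_request = {
    "user_id": "user01",
    "password": "********",
    "group_name": "LivingRoomCapture",
    "location": {                 # any of the location sources described above
        "latitude": 35.6812,
        "longitude": 139.7671,
        "source": "gps",          # e.g. "gps", "carrier_triangulation", "wifi_aps"
    },
}
```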
In step S22, the master device acquires the two-dimensional code returned from the cloud server 12 in accordance with the group registration request, and displays the two-dimensional code on the display 37.
Fig. 7 shows an example of a screen of the two-dimensional code displayed on the display 37 of the host device in step S22.
In the screen of fig. 7, the user ID (USERID), password (PASSWD), group name (GroupName) entered by the user in step S21 are displayed together with the text of "volume modeling service" which is the name of the service provided by the cloud server 12. In addition, the two-dimensional code is displayed together with the text "please take the following two-dimensional code with a camera".
In step S23, the photographing device 11 other than the main device, that is, each of the participating devices starts the volume photographing application, and photographs the two-dimensional code displayed on the display 37 of the main device. The participating device that has captured the two-dimensional code transmits group identification information identifying the registered group and its own position information to the cloud server 12. Alternatively, if the participating device captures a two-dimensional code, the volume capturing application may be automatically started, and the group identification information and the position information of the participating device may be transmitted to the cloud server 12. The group identification information may be, for example, a user ID or a group name.
In step S24, the cloud server 12 determines whether or not the master device and the participating device of the same group are capable of volume shooting, using the group identification information and the position information transmitted from the participating device. The controller 101 of the cloud server 12 determines whether or not the participating device is capable of volume shooting based on, for example, whether or not the participating device is located within a certain range from the position of the master device. When the participating device is located within a certain range from the position of the master device, the cloud server 12 determines that volume photographing is possible.
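For reference, the following is a minimal sketch of such a proximity determination from latitude and longitude using the haversine formula; the 50 m threshold is an assumed value, since the description only specifies "a certain range".

```python
import math

def within_range(lat1, lon1, lat2, lon2, max_distance_m=50.0):
    """Return True if two latitude/longitude positions are within max_distance_m metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * r * math.asin(math.sqrt(a))  # great-circle distance
    return distance <= max_distance_m
```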
When it is determined in step S24 that the participating device is capable of volume shooting, the cloud server 12 registers the participating device in the group indicated by the group identification information, that is, in the group of the master device, and transmits a registration completion message to the participating device.
On the other hand, when it is determined in step S24 that the participating device cannot perform volume shooting, the cloud server 12 transmits a group registration error to the participating device.
In step S27, the image processing system 1 determines whether or not the group registration is completed, and if it is determined that the group registration is not completed, the processing in steps S23 to S27 is repeated. For example, in the master device, when the display of the screen of the two-dimensional code of fig. 7 is ended and an operation to end the registration job is performed, it is determined that the group registration is completed.
When it is determined in step S27 that the group registration is completed, the grouping process of the photographing apparatus 11 of fig. 6 ends.
In the grouping processing described above, the group identification information is acquired by identifying the two-dimensional bar code and transmitted to the cloud server 12, but the group identification information may be input by the user and transmitted to the cloud server 12.
Group registration of the participating devices may also be performed by an operation similar to pressing a Wi-Fi (registered trademark) WPS button. As a specific process, for example, when the group registration button is pressed on the screen of the master device, the master device notifies the cloud server 12 that it is in the group-registration-accepting state and transmits its location information. When the user of a participating device presses the group registration button on its screen, a notification that the group registration button has been pressed and the location information are transmitted from the participating device to the cloud server 12. The cloud server 12 checks the location information of the participating device whose group registration button was pressed within a certain time after the group registration button of the master device was pressed, determines whether or not the participating device can be registered in the group of the master device, and transmits a registration completion message or a group registration error to the participating device.
In the above-described grouping process, it is determined whether or not the participating device is capable of volume shooting based on only the position information of the master device and the participating device, but it may be determined based on other contents.
For example, the determination may be made based on whether or not the master device and the participating device are connected to the same access point. Alternatively, the master device and the participating device may each photograph the same subject, and volume photographing may be determined to be possible when the same subject appears in both captured images. Alternatively, the master device may photograph the participating device, and volume photographing may be determined to be possible when a predetermined portion of the participating device (for example, an image displayed on its display) appears in the captured image; conversely, the participating device may photograph the master device. Further, a plurality of the above determination criteria may be combined.
Through the grouping processing described above, a general electronic device that the user ordinarily uses, such as a smart phone or a tablet PC, can easily be registered as a photographing device 11 that performs volume photographing, and can operate in cooperation with the other photographing devices 11 registered in the group.
< 6. Flowchart of camera calibration processing of photographing apparatus >
Next, details of the camera calibration process of the photographing apparatus 11 performed as step S3 of fig. 5 will be described.
In a dedicated imaging room or the like for volume imaging, a calibration plate or the like is prepared in advance for calibration of a camera, but it is necessary that calibration can be performed even without such a specially prepared plate or the like.
Accordingly, in the image processing system 1, the following camera calibration process is performed: as shown in fig. 8, processing in which one photographing device 11 displays a calibration plate image and all the other photographing devices 11 photograph the displayed calibration plate image and detect feature points is performed sequentially for all the grouped photographing devices 11, whereby the camera parameters of the photographing devices 11 are calculated. As the camera 51 of each photographing apparatus 11, a camera 51 on the same face as the display 37 is used.
In the example of fig. 8, the photographing device 11-5 among the 5 photographing devices 11 displays a calibration plate image, and the other photographing devices 11-1 to 11-4 photograph the calibration plate image of the photographing device 11-5 and detect feature points. The photographing devices 11-1 to 11-4 also sequentially display calibration plate images, and the other photographing devices 11 photograph the displayed calibration plate images and detect feature points.
The details of the camera calibration process of the photographing apparatus 11 performed as step S3 of fig. 5 will be described with reference to the flowchart of fig. 9.
First, in step S41, the cloud server 12 selects one photographing apparatus 11 that displays a calibration plate image. The cloud server 12 transmits a command (message) to display a calibration board image for the selected photographing apparatus 11 (hereinafter, also referred to as the selected photographing apparatus 11).
In step S42, the selection photographing apparatus 11 receives a command to display a calibration board image from the cloud server 12, and displays the calibration board image on the display 37.
In step S43, the other photographing apparatuses 11 than the selected photographing apparatus 11 photograph the calibration plate images displayed on the display 37 of the selected photographing apparatus 11 in synchronization with each other.
In step S44, the photographing apparatuses 11 other than the selected photographing apparatus 11 detect the feature points of the calibration plate image obtained by the photographing, and transmit feature point information, which is information on each detected feature point, to the cloud server 12.
In step S45, the calibration section 105 of the cloud server 12 acquires and stores the feature point information of the calibration plate image transmitted from each photographing device 11 via the communication section 103 or the like.
In step S46, the cloud server 12 determines whether or not the calibration board image has been displayed by all of the grouped photographing devices 11.
In the case where it is determined in step S46 that the calibration plate image has not been displayed by all the photographing devices 11, the process returns to step S41, and the processes of steps S41 to S46 described above are executed again. That is, the following processing is performed: one of the photographing devices 11 that has not yet displayed the calibration plate image is selected, and the other photographing devices 11 photograph the displayed calibration plate image, detect feature points, and transmit the feature point information to the cloud server 12.
On the other hand, when it is determined in step S46 that the calibration plate image has been displayed by all the photographing devices 11, the process proceeds to step S47, and the calibration section 105 of the cloud server 12 uses the stored feature point information to estimate the three-dimensional positions of the feature points and the internal and external parameters of each photographing device 11.
As a method of calculating the camera parameters of imaging devices from images captured by a plurality of imaging devices, there are, for example, algorithms that solve bundle adjustment and nonlinear optimization problems, as described in Iwamoto, Sugaya, and Kanatani, "Implementation and Evaluation of Bundle Adjustment for 3-D Reconstruction," Computer Vision and Image Media (CVIM), 2011.19 (2011): 1-8, and the like.
In step S48, the calibration unit 105 of the cloud server 12 supplies the inferred internal parameters and external parameters of each photographing device 11 to the offline modeling unit 108 and the real-time modeling unit 111, and ends the camera calibration process.
By the camera calibration processing described above, even in the case where there is no particularly prepared calibration board or the like, the camera parameters of the respective photographing devices 11 grouped can be calculated.
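For reference, the following is a minimal sketch of estimating camera parameters from checkerboard calibration plate images using OpenCV's chessboard calibration; it is illustrative only and is not the bundle-adjustment-based processing of the calibration section 105. The board size and square size are assumptions.

```python
import cv2
import numpy as np

def calibrate_from_board_images(images, board_size=(9, 6), square_size=0.02):
    """Estimate intrinsic and per-view extrinsic parameters from checkerboard images.

    images: list of grayscale images of the displayed calibration plate
    board_size: inner corner count of the checkerboard (assumed)
    square_size: edge length of one square in metres (assumed)
    """
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size

    obj_points, img_points = [], []
    for gray in images:
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)

    # Returns the RMS reprojection error, the intrinsic matrix, the distortion
    # coefficients, and the rotation/translation (extrinsics) of the board per view.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, images[0].shape[::-1], None, None)
    return K, dist, rvecs, tvecs
```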
In the camera calibration processing described above, on the premise that the same calibration plate image is displayed by the plurality of photographing devices 11 that are grouped, the plurality of photographing devices 11 sequentially display the calibration plate image.
However, when the calibration plate images displayed by the grouped photographing devices 11 are different from one another, as shown in fig. 10, the plurality of photographing devices 11 can display their calibration plate images simultaneously, and each photographing device 11 can detect the feature points of the calibration plate images displayed on the other photographing devices 11 all at once. The calibration plate images may differ in pattern shape or in color. The calibration plate images shown in fig. 10 are examples of a checkerboard-pattern plate image and a dot-pattern plate image, which differ in the number of grid squares (M×N, K×L) and in the size and density of the dots.
Alternatively, as with the photographing device 11-6 of fig. 11, one of the grouped photographing devices 11 may be dedicated to displaying the calibration plate image, and the user may move the photographing device 11-6 displaying the calibration plate image around like a physical calibration plate while the other photographing devices 11 capture it and detect the feature points of the calibration plate image.
< 7. Flowchart of imaging setting process for volume imaging >
Next, the details of the volume imaging setting process executed as step S4 in fig. 5 will be described with reference to the flowchart in fig. 12.
First, in step S61, the grouped plurality of photographing apparatuses 11 transmit their own capability information to the cloud server 12, respectively.
The capability information is information on the shooting function and settable range of the shooting device 11, and can include, for example, information on the following items.
1. Camera type (RGB, RGB-D, IR)
2. Camera settings 1 (exposure time, gain, zoom)
3. Camera settings 2 (shooting resolution, ROI, frame Rate)
4. Image coding setting (output resolution, compression coding mode, bit rate)
5. Camera synchronization mode (Type1, Type2, Type3)
6. Camera calibration mode (Type1, Type2, Type3)
The ROI of the camera setting 2 represents a range that can be set as a region of interest among the imaging ranges represented by the imaging resolution.
Fig. 13 shows an example of capability information possessed by a certain photographing apparatus 11.
According to the capability information of fig. 13, the camera of the photographing apparatus 11 is identified as "camera 1", the camera type is "RGB", any one of "500", "1000", "10000" can be set as the exposure time, any one of "30", "60", "120" can be set as the frame rate, and any one of "3840×2160", "1920×1080", "640×480" can be set as the photographing resolution.
In the image capturing apparatus 11, either "H.264" or "H.265" can be set as the compression encoding scheme, any one of "3840×2160", "1920×1080", and "640×480" can be set as the output resolution, and any one of "1M", "3M", "5M", "10M", "20M", and "50M" can be set as the bit rate.
In the photographing apparatus 11, any one of "Type1", "Type2", and "Type3" can be set as the camera synchronization method, and any one of "Type1", "Type2", and "Type3" can be set as the calibration method.
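For illustration, a hypothetical encoding of such capability information, with field names that are assumptions and not those of the present embodiment, might look as follows.

```python
# Hypothetical encoding of the capability information illustrated in Fig. 13.
import json

capability = {
    "camera_id": "camera1",
    "camera_type": ["RGB"],
    "exposure_time": [500, 1000, 10000],
    "frame_rate": [30, 60, 120],
    "capture_resolution": ["3840x2160", "1920x1080", "640x480"],
    "codec": ["H.264", "H.265"],
    "output_resolution": ["3840x2160", "1920x1080", "640x480"],
    "bit_rate_mbps": [1, 3, 5, 10, 20, 50],
    "sync_modes": ["Type1", "Type2", "Type3"],
    "calibration_modes": ["Type1", "Type2", "Type3"],
}

payload = json.dumps(capability)  # sent to the cloud server 12 in step S61
```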
In step S62, the cloud server 12 generates candidates of setting values of the respective photographing apparatuses 11 suitable for volume photographing based on the capability information of all the photographing apparatuses 11 set as one group. Some of these setting values are common to all photographing devices 11 set as one group, while others differ from device to device. For example, the camera synchronization method and the like are set in common for all the photographing devices 11, whereas the camera type may be set so that some of the photographing devices 11 are RGB and the remaining photographing devices 11 are IR.
In step S63, the cloud server 12 determines whether or not volume imaging, that is, imaging for generating a 3D model, is possible. The details of this determination are described later; for example, the imaging target areas of all the imaging devices 11 set as one group can be superimposed, and it can be determined whether or not there is an area common to a certain number of imaging devices 11 or more.
When it is determined in step S63 that the volume photographing is possible, the process proceeds to step S64, and the cloud server 12 transmits the candidates of the set value to each photographing apparatus 11, and each photographing apparatus 11 receives the candidates of the set value and presents the candidates to the user.
The user refers to the displayed candidates of the setting values and selects a desired candidate. Then, in step S65, each photographing apparatus 11 transmits the candidate of the setting value designated by the user to the cloud server 12.
In step S66, the cloud server 12 acquires candidates of the set values designated by the user and transmitted from the respective photographing apparatuses 11, transmits a setting signal for setting the set values designated by the user to the respective photographing apparatuses 11, and ends the volume photographing setting process.
On the other hand, when it is determined in step S63 that volume photographing is not possible, the process proceeds to step S67, and the cloud server 12 notifies each photographing apparatus 11 that volume photographing is not possible, and each photographing apparatus 11 presents this to the user. The volume imaging setting process then ends.
As described above, in the imaging setting process for volume imaging, it is determined whether or not volume imaging is possible based on the capability information of each imaging device 11, and a predetermined setting value is selected from the setting values at which volume imaging is possible.
A determination example of the setting value candidates of each imaging device 11 and a determination example of whether or not volume imaging is possible based on the capability information of each imaging device 11 will be described below. It is assumed that the capability information of each photographing apparatus 11 has been acquired. As the user's request values, a desired frame rate for the generation of the 3D model data and a desired processing time (how many minutes of content should be processed in how many minutes) are set.
[STEP1]
First, the controller 101 of the cloud server 12 determines the camera synchronization method based on the capability information of each photographing apparatus 11. For example, when the photographing apparatuses 11-1 to 11-N support the camera synchronization methods listed below, the controller 101 selects the camera synchronization method with the highest priority among the methods supported by all the photographing apparatuses 11; with a priority of Type1 > Type2 > … > TypeK, Type2 is selected (a minimal sketch of this selection follows the list).
Photographing apparatus 11-1: type1, type2, … Type K
Photographing apparatus 11-2: type2, … Type K
…
Photographing apparatus 11-N: type1, type2, … Type K
[STEP2]
Next, the controller 101 measures the communication band with each of the photographing devices 11 and determines the maximum bit rate [Mbps]. The maximum bit rates decided for the photographing apparatuses 11-1 to 11-N are as follows.
Photographing apparatus 11-1: X0 [Mbps]
Photographing apparatus 11-2: X1 [Mbps]
…
Photographing apparatus 11-N: Xn [Mbps]
[STEP3]
Next, the controller 101 calculates the possible combinations of camera type, resolution, frame rate, compression encoding scheme, and the like within the range of the maximum bit rate of each photographing device 11, and determines candidates of the setting value for each photographing device 11, for example as follows (a sketch of this enumeration follows the candidate lists).
Photographing apparatus 11-1:
Candidate 1 (RGB, 3840×2160, 30, H.265, 40M)
Candidate 2 (RGB, 3840×2160, 60, H.265, 50M)
Candidate 3 (RGB, 1920×1080, 30, H.265, 20M)
Candidate 4 (RGB-D, 1920×1080, 30, H.265, 40M)
Photographing apparatus 11-2:
Candidate 1 (RGB, 3840×2160, 30, H.265, 40M)
Candidate 2 (RGB, 3840×2160, 60, H.265, 50M)
Candidate 3 (RGB, 1920×1080, 120, H.265, 20M)
Candidate 4 (RGB-D, 1920×1080, 30, H.265, 40M)
…
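Reusing the hypothetical capability dictionary sketched earlier, the enumeration of [STEP3] can be pictured as follows; the filtering rule is an assumption and not the exact rule of the present embodiment.

```python
# Sketch of [STEP3]: enumerate (camera type, resolution, frame rate, codec, bit rate)
# combinations and keep only those whose bit rate fits within the measured maximum.
from itertools import product

def candidate_settings(capability, max_bitrate_mbps):
    combos = product(capability["camera_type"],
                     capability["capture_resolution"],
                     capability["frame_rate"],
                     capability["codec"],
                     capability["bit_rate_mbps"])
    return [c for c in combos if c[4] <= max_bitrate_mbps]

# e.g. keep only the candidates that fit in a measured 40 Mbps uplink for device 11-1:
# candidates_11_1 = candidate_settings(capability, max_bitrate_mbps=40)
```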
[STEP4]
The controller 101 determines whether or not volume imaging is possible using the determined candidates of the set values. The controller 101 can determine whether or not volume photographing is possible by the following method.
For example, the controller 101 calculates the imaging target area of each imaging device 11 based on the calibration result of each imaging device 11, and determines whether or not volume imaging is possible based on whether or not there is a common imaging area when the imaging target areas of the imaging devices 11 are superimposed. The common imaging area may be an area shared by the imaging target areas of all the imaging devices 11, or an area shared by the imaging target areas of a predetermined number or more of the imaging devices 11.
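One possible sketch of this overlap test is shown below; the axis-aligned boxes, grid step, and example values are assumptions, not the specific method of the present embodiment.

```python
# Sketch of the shared-imaging-region test: count, for each cell of a coarse grid,
# how many devices' imaging target areas (approximated as axis-aligned boxes derived
# from the calibration results) cover it.
import numpy as np

def shared_region_exists(boxes, n_required, step=0.1):
    lo = np.min([b[0] for b in boxes], axis=0)
    hi = np.max([b[1] for b in boxes], axis=0)
    xs, ys, zs = (np.arange(lo[i], hi[i] + step, step) for i in range(3))
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)
    count = np.zeros(grid.shape[:3], dtype=int)
    for mn, mx in boxes:
        count += np.all((grid >= np.asarray(mn)) & (grid <= np.asarray(mx)), axis=-1)
    return bool((count >= n_required).any())

# Two phones covering a 1 m cube around the origin and one phone offset by 0.5 m:
boxes = [((-0.5, -0.5, -0.5), (0.5, 0.5, 0.5))] * 2 + [((0.0, -0.5, -0.5), (1.0, 0.5, 0.5))]
print(shared_region_exists(boxes, n_required=3))  # True: the overlapping half is seen by all three
```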
Further, for example, the controller 101 calculates modeling parameters of the 3D model from the resolution and the imaging target area of each imaging device 11, and determines whether or not volume imaging is possible based on whether or not the calculated modeling parameters are within a predetermined range. The modeling parameters consist of the voxel size and the bounding box size. More specifically, the controller 101 determines the voxel size (how many millimeters per side) used when generating the 3D model from the resolution and the imaging target area of each imaging device 11. The controller 101 then determines whether or not volume imaging is possible based on whether the calculated voxel size lies between the lower limit Voxel_L and the upper limit Voxel_U of the voxel size (Voxel_L ≤ voxel size ≤ Voxel_U). The controller 101 may also add, as candidates of the setting value, a plurality of combinations of voxel size and bounding box size whose voxel sizes lie between the lower limit Voxel_L and the upper limit Voxel_U.
Further, for example, the controller 101 estimates a processing time required for 3D modeling from candidates of each set value of each imaging device 11, and determines whether or not volume imaging is possible based on whether or not the estimated 3D modeling processing time is within a predetermined time. More specifically, the controller 101 estimates the decoding processing time of the 3D model data from the resolution, frame rate, compression encoding scheme, bit rate, and the like of each imaging device 11, and estimates the rendering processing time of the free-viewpoint image based on the number of processing target pixels calculated from the voxel size and the bounding box size. The controller 101 determines whether the estimated 3D modeling processing time is within the processing time requested by the user in the case of offline modeling, and determines whether the estimated 3D modeling processing time is equal to or less than a predetermined value that can be processed in real time in the case of real-time modeling.
[STEP5]
When it is determined that volume imaging is possible, the controller 101 arranges the candidates of the setting value determined in [STEP3] in a predetermined priority order and presents them to the user.
For example, when giving priority to image quality, candidates of the set values are presented in the order of (1) a set value having a small voxel size, (2) a set value having a high resolution, and (3) a set value having a high frame rate.
In the case of prioritizing the frame rate, for example, candidates of the set values are presented in the order of (1) a set value having a high frame rate, (2) a set value having a small voxel size, and (3) a set value having a high resolution.
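As an illustration, this ordering can be expressed as a sort key; the candidate fields used below are assumed names.

```python
# Sketch of [STEP5]: order candidate settings by the selected priority.
def order_candidates(candidates, prioritize="image_quality"):
    # each candidate is assumed to carry voxel_size_mm, resolution (pixel count) and frame_rate
    if prioritize == "image_quality":
        key = lambda c: (c["voxel_size_mm"], -c["resolution"], -c["frame_rate"])
    else:  # prioritize the frame rate
        key = lambda c: (-c["frame_rate"], c["voxel_size_mm"], -c["resolution"])
    return sorted(candidates, key=key)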
As described above, the cloud server 12 determines the setting value candidates of each imaging device 11 based on the capability information of each imaging device 11, and automatically determines whether or not volume imaging is possible.
In the case of using a smartphone, a tablet computer, or the like that the user ordinarily owns as the photographing apparatus 11 that performs volume photographing, the performance of the electronic apparatuses possessed by users varies widely. By the above-described imaging setting process for volume imaging, the setting values relating to the imaging function can be determined optimally according to the performance of the imaging devices 11 participating in volume imaging.
< 8. Flowchart of synchronous shooting processing of shooting device >
Next, details of the synchronous shooting process of the shooting device 11 performed in step S5 of fig. 5 will be described.
There are at least two types of synchronous shooting process of the shooting device 11: the process described with reference to fig. 14 and the process described with reference to fig. 15. The synchronous shooting process of fig. 14 is a process of shooting synchronously using the clock information in the shooting devices 11, and the synchronous shooting process of fig. 15 is a process for the case where the clock information in the shooting devices 11 cannot be used. The latter is a shooting process that enables post-synchronization of the shot image data.
First, a synchronous shooting process in the case of synchronously shooting using clock information in the shooting device 11 will be described with reference to a flowchart of fig. 14. At the start of the execution of the processing of fig. 14, the clock information within the photographing apparatus 11 is synchronized with high accuracy by the first to fourth synchronization methods described above.
In step S81, the cloud server 12 acquires the current time from each of the grouped plurality of photographing apparatuses 11.
In step S82, the cloud server 12 determines, as the capturing start time, a time obtained by adding a predetermined time to the latest time among the acquired times of the respective photographing devices 11.
In step S83, the cloud server 12 transmits a predetermined command to each photographing apparatus 11 via the network, thereby controlling the cameras 51 of each photographing apparatus 11 to be in a standby state. The standby state is a state in which photographing can be performed based on a synchronization signal when the synchronization signal is input.
In step S84, the cloud server 12 transmits the capture start time determined in step S82 to each of the photographing devices 11.
In step S85, each of the photographing devices 11 acquires the capture start time transmitted from the cloud server 12 and sets it in the synchronization signal generation unit 45. When the capture start time arrives, the synchronization signal generation unit 45 starts generating and outputting the synchronization signal. The synchronization signal is output at the frame rate set in the photographing setting process of fig. 8 and supplied to the camera 51.
In step S86, the camera 51 of each photographing apparatus 11 photographs a subject to be a 3D model based on the inputted synchronization signal. The image data (image stream) of the subject obtained by shooting is temporarily stored in the shooting device 11 or transmitted to the cloud server 12 in real time.
In step S87, each photographing apparatus 11 determines whether the end of photographing has been instructed. The end of photographing may be determined by a user operation on a predetermined photographing apparatus 11 such as the master device, and the end may be instructed from that photographing apparatus 11 to the other photographing apparatuses 11 directly or via the cloud server 12.
If it is determined in step S87 that the end of shooting is not instructed, steps S86 and S87 are repeated. That is, shooting based on the synchronization signal is continued.
On the other hand, when it is determined in step S87 that the end of photographing is instructed, the process proceeds to step S88, and each photographing apparatus 11 stops photographing the subject, ending the synchronous photographing process. The generation of the synchronization signal by the synchronization signal generation unit 45 is also stopped.
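A minimal sketch of steps S81 to S84 on the cloud server 12 side is shown below; the margin value and the example timestamps are assumptions.

```python
# Sketch of steps S81-S84: take the latest of the reported device clocks and add a
# fixed margin to obtain a capture start time that every device can still meet.
import datetime

def decide_capture_start(device_times, margin_s=5.0):
    return max(device_times) + datetime.timedelta(seconds=margin_s)

start_time = decide_capture_start([
    datetime.datetime(2021, 10, 1, 12, 0, 0, 120000),
    datetime.datetime(2021, 10, 1, 12, 0, 0, 480000),
])
# the start time is then sent to each photographing apparatus 11, which arms its
# synchronization signal generation unit 45 to begin output at that moment
```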
Next, a synchronous shooting process in the case where clock information in the shooting device 11 cannot be used will be described with reference to the flowchart of fig. 15.
First, in step S101, the cloud server 12 transmits a predetermined command to each photographing apparatus 11 via the network, thereby causing each photographing apparatus 11 to start photographing.
In step S102, each photographing device 11 photographs a subject to be a 3D model based on the synchronization signal generated by the synchronization signal generation section 45 based on the internal clock. The image data (image stream) of the subject obtained by shooting is temporarily stored in the shooting device 11 or transmitted to the cloud server 12 in real time.
In step S103, the cloud server 12 determines whether or not the predetermined time has elapsed since the start of shooting.
When it is determined in step S103 that the predetermined time has elapsed since the start of shooting, in step S104 the cloud server 12 transmits a lighting instruction for the flash 39 to a predetermined shooting device 11, for example, the master device.
In step S105, the predetermined photographing apparatus 11 to which the lighting instruction has been transmitted lights the flash 39 based on the lighting instruction.
On the other hand, when it is determined in step S103 that the predetermined time has not elapsed since the start of shooting, steps S104 and S105 are skipped. Therefore, the processing of steps S104 and S105 is executed only at the timing at which the predetermined time has elapsed from the start of shooting.
In step S106, each photographing apparatus 11 determines whether the end of photographing is instructed. If it is determined in step S106 that the end of shooting is not instructed, steps S102 to S106 are repeated. That is, shooting based on the synchronization signal is continued.
On the other hand, when it is determined in step S106 that the end of photographing is instructed, the process proceeds to step S107, and each photographing apparatus 11 stops photographing the subject, ending the synchronous photographing process. The generation of the synchronization signal by the synchronization signal generation unit 45 is also stopped.
According to the synchronous shooting process of fig. 15, an image in which the flash emits light is included in each of the image streams shot by the grouped plurality of shooting apparatuses 11. When performing the 3D modeling processing, the captured images shot by the plurality of photographing apparatuses 11 can be synchronized by aligning the time information of the image streams so that the frames in which the flash emits light have the same time stamp.
Besides the method of lighting the flash 39 during shooting to align the time stamps, the captured images may also be synchronized by outputting a sound from the speaker 35 and aligning the time information of the image streams so that the frames in which the sound is recorded have the same time stamp.
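The time-stamp alignment can be pictured with a simple brightness-spike detector, as in the following sketch; the detection rule and the field names are assumptions.

```python
# Sketch of the post-synchronization of Fig. 15: find the flash frame in each stream
# as the largest jump in mean frame brightness, then shift every stream's time stamps
# so that the flash frames coincide.
import numpy as np

def flash_frame_index(mean_brightness):
    return int(np.argmax(np.diff(mean_brightness))) + 1

def align_streams(streams):
    # streams: list of dicts with per-frame "timestamps" (seconds) and "mean_brightness"
    flash_times = [s["timestamps"][flash_frame_index(s["mean_brightness"])] for s in streams]
    reference = flash_times[0]
    for stream, t in zip(streams, flash_times):
        stream["timestamps"] = [ts - t + reference for ts in stream["timestamps"]]
    return streams
```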
< 9. Flowchart of offline modeling process >
Next, the details of the offline modeling process performed as step S7 of fig. 5 will be described with reference to the flowchart of fig. 16.
First, in step S121, the plurality of photographing apparatuses 11 determine whether or not photographing is completed, and wait until it is determined that photographing is completed.
When it is determined in step S121 that the photographing is completed, the process proceeds to step S122, and each photographing apparatus 11 transmits an image stream of a photographed image obtained by photographing the subject to the cloud server 12 via a predetermined network. The captured image obtained by capturing the subject is subjected to predetermined image signal processing such as demosaicing processing in the image processing unit 54, is subjected to predetermined compression encoding processing in the image compression unit 55, and is then transmitted as an image stream from the stream transmission unit 59.
In step S123, the stream receiving unit 104 of the cloud server 12 receives the image streams transmitted from the respective photographing devices 11 via the communication unit 103, and supplies the image streams to the modeling task generating unit 106.
In step S124, the modeling task generation unit 106 of the cloud server 12 acquires the camera parameters of each photographing apparatus 11 from the calibration unit 105, generates a modeling task together with the image stream from the stream reception unit 104, and stores the modeling task in the task storage unit 107. In the task storage unit 107, the modeling tasks sequentially supplied from the modeling task generation unit 106 are stored as a task queue.
In step S125, the offline modeling unit 108 acquires one of the modeling tasks stored in the task storage unit 107.
In step S126, the offline modeling section 108 acquires the captured image of the i-th frame in the image stream of each capturing device 11 of the acquired modeling task. Here, the variable i indicating the frame number of the image stream has an initial value of "1".
In step S127, the offline modeling unit 108 generates a silhouette image from the acquired captured image of the i-th frame of each imaging device 11 and a background image set in advance. The silhouette image is an image representing the subject region by silhouette, and can be generated by using, for example, a background difference method of acquiring a difference between the captured image and the background image.
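A minimal background-difference sketch of this step is shown below; the threshold is an assumption.

```python
# Sketch of step S127: silhouette extraction by background differencing.
import cv2

def silhouette(captured_bgr, background_bgr, threshold=30):
    diff = cv2.absdiff(captured_bgr, background_bgr)          # per-pixel difference
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return mask                                               # 255 = subject, 0 = background
```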
In step S128, the offline modeling unit 108 generates shape data representing the 3D shape of the object from the N silhouette images corresponding to the respective imaging devices 11, for example, by Visual Hull. Visual Hull is a method of back-projecting the N silhouettes into three-dimensional space based on the camera parameters and carving out the three-dimensional shape as their intersection. The shape data representing the 3D shape of the object is represented, for example, by voxel data indicating, for each cell of a three-dimensional grid (voxel), whether or not it belongs to the object.
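As an illustration only, a brute-force voxel carving over an assumed bounding box could look like the following sketch.

```python
# Sketch of step S128: keep a voxel only if it projects inside every camera's silhouette.
import numpy as np

def visual_hull(silhouettes, projections, bbox_min, bbox_max, resolution=64):
    # silhouettes: list of HxW binary masks; projections: list of 3x4 matrices K[R|t]
    axes = [np.linspace(bbox_min[i], bbox_max[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    homog = np.hstack([grid, np.ones((grid.shape[0], 1))])
    occupied = np.ones(grid.shape[0], dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = homog @ P.T
        in_front = uvw[:, 2] > 1e-6
        u = np.zeros(len(homog), dtype=int)
        v = np.zeros(len(homog), dtype=int)
        u[in_front] = np.round(uvw[in_front, 0] / uvw[in_front, 2]).astype(int)
        v[in_front] = np.round(uvw[in_front, 1] / uvw[in_front, 2]).astype(int)
        inside = in_front & (0 <= u) & (u < mask.shape[1]) & (0 <= v) & (v < mask.shape[0])
        hit = np.zeros_like(occupied)
        hit[inside] = mask[v[inside], u[inside]] > 0
        occupied &= hit
    return occupied.reshape(resolution, resolution, resolution)
```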
In step S129, the offline modeling unit 108 converts the shape data representing the 3D shape of the object from voxel data into data in a mesh format called a polygon mesh. For the conversion into the polygon mesh format, which is easy to render on a display device, an algorithm such as the marching cubes method can be used, for example.
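For illustration, the voxel-to-mesh conversion could be done with the marching cubes implementation of scikit-image, used here only as a stand-in for whatever mesher the embodiment actually employs; the voxel size is an assumed value.

```python
# Sketch of step S129: extract a polygon mesh from the voxel occupancy grid.
from skimage import measure

def voxels_to_mesh(occupancy, voxel_size_mm=10.0):
    verts, faces, _normals, _values = measure.marching_cubes(occupancy.astype(float), level=0.5)
    return verts * voxel_size_mm, faces   # vertex coordinates in millimetres, triangle indices
```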
In step S130, the offline modeling unit 108 generates a texture image corresponding to the shape data of the object. In the case where the texture image takes the multi-texture form described with reference to fig. 3, the photographed image photographed by each photographing device 11 is directly taken as the texture image. On the other hand, in the case of adopting the UV mapping format described with reference to fig. 3, a UV mapping image corresponding to the shape data of the object is generated as a texture image.
In step S131, the offline modeling unit 108 determines whether or not the current frame is the final frame of the image stream of each photographing device 11 of the acquired modeling task.
In the case where it is determined in step S131 that the current frame is not the final frame of the image stream of each photographing apparatus 11 of the acquired modeling task, the process advances to step S132. Then, after the variable i is incremented by 1 in step S132, the above steps S126 to S131 are repeated. That is, the shape data and the texture image of the object are generated for the captured image of the next frame in the image stream of each capturing apparatus 11 of the acquired modeling task.
On the other hand, when it is determined in step S131 that the current frame is the final frame of the image stream of each photographing apparatus 11 of the acquired modeling task, the process proceeds to step S133, and the offline modeling unit 108 supplies the 3D model data of the object to the content management unit 109. The content management unit 109 stores and manages the 3D model data of the object supplied from the offline modeling unit 108 as content in the content storage unit 110.
Fig. 17 shows an example of 3D model data of an object stored in the content storage unit 110. The 3D model data of the object has shape data and texture images of the object in units of frames.
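A hypothetical per-frame container mirroring this structure is sketched below; the names are illustrative only.

```python
# Hypothetical per-frame container for the 3D model data of Fig. 17.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class FrameModel:
    vertices: np.ndarray         # (V, 3) polygon-mesh vertices
    faces: np.ndarray            # (F, 3) vertex indices per triangle
    textures: List[np.ndarray]   # one image per camera (multi-texture) or a single UV map

content: List[FrameModel] = []   # the content storage unit keeps one FrameModel per frame
```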
According to the above, the offline modeling process ends.
< 10. Flowchart of content reproduction processing >
Next, the details of the content reproduction process executed as step S8 in fig. 5 will be described with reference to the flowchart in fig. 18.
A user of the reproducing apparatus 13, that is, a viewer performs an operation of designating content to be reproduced. In step S141, the messaging unit 152 of the playback device 13 requests the cloud server 12 for 3D model data of the content that is specified for playback by the user.
In step S142, the cloud server 12 transmits the 3D model data of the specified content to the reproduction device 13. More specifically, the controller 101 of the cloud server 12 receives a message requesting 3D model data of predetermined content via the messaging unit 102. The controller 101 instructs the content management section 109 to transmit the requested content to the reproduction device 13. The content management unit 109 acquires 3D model data of the specified content from the content storage unit 110, and transmits the 3D model data to the playback device 13 via the stream transmission unit 112.
In step S143, the reproducing apparatus 13 acquires the shape data and the texture image of the k-th frame of the content transmitted from the cloud server 12. Here, the variable k indicating the frame number of the 3D model data of the transmitted content has an initial value of "1".
In step S144, the control unit 153 of the playback device 13 determines the virtual viewpoint position, which is the viewing position of the viewer with respect to the object, based on the sensor information detected by the sensor 154. The information of the determined virtual viewpoint position is supplied to the playback unit 157.
In step S145, the playback unit 157 of the playback device 13 performs rendering processing of an image of the object that depicts the 3D model, based on the determined virtual viewpoint position. That is, the playback unit 157 generates an object image obtained by observing the object of the 3D model from the virtual viewpoint position, and outputs the object image to the display 159 via the image output unit 158.
In step S146, the playback unit 157 determines whether or not the current frame is the final frame of the acquired content.
In the case where it is determined in step S146 that the current frame is not the final frame of the acquired content, the process advances to step S147. Then, after the variable k is incremented by 1 in step S147, the above steps S143 to S146 are repeated. That is, an object image observed from the virtual viewpoint position is generated and displayed based on the shape data and texture image of the object in the next frame of the acquired content.
On the other hand, when it is determined in step S146 that the current frame is the final frame of the acquired content, the content reproduction process ends.
< 11. Flow chart of real-time modeling reproduction processing >
Next, details of the real-time modeling reproduction processing executed as step S9 in fig. 5 will be described with reference to the flowchart in fig. 19.
The real-time modeling reproduction process is performed when it is determined in step S6 of fig. 5 that real-time modeling is to be performed; in this state, the captured images of the image streams for volume imaging are being sequentially transmitted from each of the plurality of capturing devices 11 to the cloud server 12.
In step S161, the stream receiving unit 104 of the cloud server 12 receives the captured images of the image streams sequentially transmitted from the respective capturing apparatuses 11 via the communication unit 103, and supplies the captured images to the real-time modeling unit 111.
In step S162, the real-time modeling unit 111 acquires, frame by frame, the captured images of the respective imaging devices 11 supplied from the stream receiving unit 104 and accumulated. More specifically, for each of the plurality of imaging devices 11, the real-time modeling unit 111 acquires the one frame with the earliest time information among the accumulated captured images of that imaging device 11.
In step S163, the real-time modeling unit 111 generates a silhouette image from the acquired captured images of the respective imaging devices 11 and a background image set in advance.
In step S164, the real-time modeling unit 111 generates shape data representing the 3D shape of the object from the N silhouette images corresponding to the respective imaging devices 11, for example, by Visual Hull. Shape data representing a 3D shape of an object is represented by voxel data, for example.
In step S165, the real-time modeling unit 111 converts shape data representing the 3D shape of the object from voxel data to data in a mesh format called a polygonal mesh.
In step S166, the real-time modeling unit 111 generates a texture image corresponding to the shape data of the object.
In step S167, the real-time modeling section 111 transmits the shape data and the texture image of the object to the playback device 13 via the stream transmission section 112.
In step S168, the reproduction section 157 of the reproduction apparatus 13 acquires the shape data and the texture image of the object transmitted from the cloud server 12 via the stream reception section 156.
In step S169, the control unit 153 of the playback device 13 determines the virtual viewpoint position, which is the viewing position of the viewer with respect to the object, based on the sensor information detected by the sensor 154. The information of the determined virtual viewpoint position is supplied to the playback unit 157.
In step S170, the playback unit 157 of the playback device 13 performs rendering processing of an image of the object that depicts the 3D model, based on the determined virtual viewpoint position. That is, the playback unit 157 generates an object image obtained by observing the object of the 3D model from the virtual viewpoint position, and outputs the object image to the display 159 via the image output unit 158.
In step S171, the control section 153 of the playback device 13 determines whether or not the reception of the shape data and the texture image of the object of the 3D model is ended.
If it is determined in step S171 that the reception of the shape data and the texture image of the object of the 3D model has not been completed, the process proceeds to step S162, and the processes of steps S162 to S171 described above are repeated.
On the other hand, when it is determined in step S171 that the reception of the shape data and the texture image of the object of the 3D model is completed, the real-time modeling reproduction process is completed.
Further, in the real-time modeling reproduction processing described above, the processing of the cloud server 12 and the reproduction device 13 is described as being continuously performed, but in practice, it is needless to say that the processing is independently performed in each of the cloud server 12 and the reproduction device 13. The real-time modeling reproduction process is a process as follows: the cloud server 12 sequentially generates and transmits 3D model data in units of frames composed of shape data and texture images shown in fig. 17, and the playback device 13 sequentially receives the 3D model data in units of frames, plays back (draws) and displays it.
< 12. Flowchart of automatic calibration process >
In photographing using a smartphone or a tablet PC as the photographing device 11, it cannot be assumed that the camera is firmly fixed as in a conventional dedicated studio, and the position may shift due to vibrations such as hand shake.
Therefore, in the above-described real-time modeling reproduction processing, the cloud server 12 can acquire sensor information of the gyro sensor 41, the acceleration sensor 42, and the like of the photographing apparatus 11 to determine the positional deviation of the photographing apparatus 11, and perform the automatic calibration processing of updating the camera parameters.
The automatic calibration process will be described with reference to the flowchart of fig. 20. This process is performed, for example, simultaneously with the real-time modeling reproduction process of fig. 19.
First, in step S181, the photographing apparatus 11 acquires sensor information from various sensors such as the gyro sensor 41, the acceleration sensor 42, and the GPS sensor 43, and transmits the acquired sensor information to the cloud server 12 as frame data together with the photographed image. The sensor information is stored, for example, as header information of each frame and is transmitted in units of frames.
In step S182, the stream receiving unit 104 of the cloud server 12 receives the captured images of the image streams sequentially transmitted from the respective capturing apparatuses 11 via the communication unit 103, and supplies the captured images to the real-time modeling unit 111. The real-time modeling unit 111 supplies sensor information of frame data stored in the captured image to the automatic calibration unit 113.
In step S183, the automatic calibration unit 113 determines whether or not the imaging device 11 that has moved by a predetermined value or more is present in the imaging devices 11 that perform volume imaging, based on the sensor information from the real-time modeling unit 111.
If it is determined in step S183 that the photographing apparatus 11 having moved by the predetermined value or more is present, the following processing in steps S184 to S186 is executed, and if it is determined that the photographing apparatus 11 having moved by the predetermined value or more is not present, the processing in steps S184 to S186 is skipped.
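One possible sketch of the movement test is shown below; the gyroscope-based heuristic, thresholds, and field names are assumptions, not the specific criterion of the present embodiment.

```python
# Sketch of step S183: integrate each device's gyroscope readings over the received
# frames and flag the devices whose accumulated rotation exceeds a threshold.
import numpy as np

def devices_to_recalibrate(sensor_frames, rotation_limit_deg=2.0):
    moved = []
    for device_id, frames in sensor_frames.items():
        drift = np.sum([np.asarray(f["gyro_deg_s"]) * f["dt_s"] for f in frames], axis=0)
        if np.linalg.norm(drift) > rotation_limit_deg:
            moved.append(device_id)
    return moved
```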
In step S184, the automatic calibration section 113 extracts feature points from the captured images of the respective imaging devices 11.
In step S185, the automatic calibration unit 113 calculates the camera parameters of the imaging device 11 to be updated, using the feature points extracted from the captured images of the imaging devices 11. The imaging device 11 to be updated is the imaging device 11 determined in step S183 to have moved by the predetermined value or more. Of the internal and external parameters, only the external parameters are calculated here; the internal parameters are not changed.
In step S186, the automatic calibration section 113 supplies the calculated camera parameters of the imaging device 11 to be updated to the real-time modeling section 111. The real-time modeling unit 111 generates 3D model data using the updated camera parameters of the imaging device 11.
In step S187, the cloud server 12 determines whether or not shooting is ended. For example, the cloud server 12 determines that the image capturing has not been completed when the image capturing devices 11 transmit the captured images, and determines that the image capturing has been completed when the image capturing devices 11 do not transmit the captured images.
When it is determined in step S187 that the shooting has not been completed, the process returns to step S181, and the processes in steps S181 to S187 described above are repeated.
On the other hand, when it is determined in step S187 that the photographing is completed, the automatic calibration process is completed.
In the above-described automatic calibration processing, the camera parameters are calculated and updated only for the photographing apparatus 11 determined to have moved by a predetermined value or more, but the camera parameters of all the photographing apparatuses 11 may be calculated when the camera parameters are updated. In addition, not only the external parameter but also the internal parameter may be calculated.
In addition, when calculating the camera parameters of the imaging device 11 to be updated, the external parameters may be calculated with their range limited based on the displacement calculated from the sensor information. Since the calculation of the external parameters is a nonlinear optimization calculation involving the three-dimensional positions of the feature points and the internal and external parameters of the imaging apparatus 11, the amount of calculation can be reduced and the processing speeded up by limiting the possible range of the external parameters or fixing the internal parameters.
< 13. Camera calibration process without calibration plate image >
In the camera calibration process described with reference to fig. 8 to 11, instead of using a calibration board, a calibration board image is displayed at the photographing device 11 and feature points are extracted, thereby calculating camera parameters.
However, considering the display size of a smart phone or a tablet computer, the feature points may not be sufficiently detected.
Therefore, a camera calibration process (hereinafter, referred to as a second camera calibration process) other than the method of displaying the calibration plate image at the photographing apparatus 11 will be described.
In the second camera calibration process described below, the image processing system 1 photographs an arbitrary subject, extracts feature points thereof, and thereby calculates camera parameters. For example, as shown in fig. 21, camera parameters are calculated based on the result of photographing 1 person and a table as the subject.
In addition, in the second camera calibration process, a control device 15 that controls the whole of the camera calibration process is prepared independently of the plurality of (5) photographing devices 11-1 to 11-5 that photograph the subject. The control device 15 may be a device having the same configuration as the reproduction device 13, and the reproduction device 13 may be used.
The second camera calibration process will be described with reference to fig. 22 to 25. Fig. 22 is a flowchart of the camera calibration process performed by the control device 15 in the second camera calibration process, and fig. 24 and 25 are flowcharts of the camera calibration process performed by the cloud server 12.
First, a camera calibration process performed by the control apparatus 15 will be described with reference to a flowchart of fig. 22.
In step S201, the control device 15 transmits a calibration shooting start message for starting calibration shooting to the cloud server 12.
After transmitting the calibration shooting start message, the control device 15 stands by until a predetermined message is transmitted from the cloud server 12. During calculation of the camera parameters of each photographing apparatus 11, feedback information for camera parameter calculation is transmitted from the cloud server 12 to the control apparatus 15. When the calculation of the camera parameters of each photographing device 11 is completed, a calibration completion message is transmitted from the cloud server 12 to the control device 15.
In step S202, the control device 15 receives the message transmitted from the cloud server 12.
In step S203, the control device 15 determines whether the calibration is completed. The control device 15 determines that the calibration is completed when the received message is a calibration completed message, and determines that the calibration is not completed when the received message is feedback information.
In the case where it is determined in step S203 that the calibration has not been completed, the process advances to step S204, and the control device 15 displays a feedback screen on the display based on the feedback information.
The calibration section 105 of the cloud server 12 extracts feature points from each of the captured images captured by the respective capturing devices 11, and associates the extracted feature points with one another across the plurality of captured images corresponding to the respective capturing devices 11. That is, the calibration section 105 acquires, for each extracted feature point, the correspondence between the photographing devices 11. Then, using the plurality of feature points for which correspondence between the imaging devices 11 has been established, the calibration unit 105 calculates the camera parameters by solving a nonlinear optimization problem that minimizes the error with respect to the three-dimensional position of each feature point, the internal parameters, and the external parameters. When there are few feature points extracted from a captured image, or few feature points for which correspondence between the capturing apparatuses 11 has been established, the cloud server 12 transmits feedback information to the control apparatus 15 so as to prompt the user to perform capturing with the capturing apparatuses 11 again.
Fig. 23 shows an example of a feedback screen displayed on the control device 15 based on feedback information from the cloud server 12.
The feedback screen of fig. 23 shows an example of a case where the number of feature points that establish correspondence between the photographing apparatuses 11-2 and 11-4 is small.
The captured image 201 captured by the capturing device 11-2 and the captured image 202 captured by the capturing device 11-4 are displayed on the feedback screen of fig. 23. In each of the captured images 201 and 202, circles (○) are displayed superimposed on feature points for which correspondence has been established, and crosses (×) are displayed superimposed on feature points for which correspondence has not been established.
On the feedback screen, a feedback message 203 for improving the accuracy of calibration when the photographing is performed again is displayed. In the example of fig. 23, a feedback message 203 "please adjust the subject so that both the camera 2 and the camera 4 capture a shared portion" is displayed.
In addition, an object image 204 showing the case where a 3D model of the subject is generated using the calibration result at the current time is displayed on the feedback screen. In the object image 204, the virtual viewpoint position can be changed by a swipe operation on the screen.
Returning to fig. 22, in the case where it is determined in step S203 that the calibration is completed, the process advances to step S205, and the control device 15 transmits a calibration shooting end message for ending the calibration shooting to the cloud server 12, ending the camera calibration process.
Next, a camera calibration process performed by the cloud server 12 will be described with reference to a flowchart of fig. 24.
First, in step S221, the cloud server 12 receives a calibration shooting start message transmitted from the control apparatus 15.
In step S222, the cloud server 12 causes each photographing apparatus 11 to start synchronous photographing. The synchronous shooting process of each shooting device 11 can be performed in the same way as in fig. 14.
In step S223, the cloud server 12 acquires the captured image transmitted from each of the capturing apparatuses 11.
In step S224, the cloud server 12 performs a camera parameter calculation process of calculating camera parameters using the captured images transmitted from the respective capturing apparatuses 11.
Fig. 25 is a flowchart showing details of the camera parameter calculation process of step S224.
In this process, first, in step S241, the calibration section 105 of the cloud server 12 extracts feature points for each of the captured images captured by the respective capturing apparatuses 11.
Next, in step S242, the calibration section 105 determines whether or not the extracted feature points are sufficient. For example, when the number of extracted feature points in 1 captured image is equal to or greater than a predetermined value, it is determined that the extracted feature points are sufficient, and when the number is smaller than the predetermined value, it is determined that the extracted feature points are insufficient.
When it is determined in step S242 that the extracted feature points are insufficient, the process proceeds to step S247, which will be described later.
On the other hand, when it is determined in step S242 that the extracted feature points are sufficient, the process proceeds to step S243, and the calibration unit 105 establishes correspondence between feature points between the imaging devices 11, and detects feature points corresponding between the imaging devices 11.
Next, in step S244, the calibration unit 105 determines whether or not the corresponding feature points are sufficient. For example, when the number of corresponding feature points in 1 captured image is equal to or greater than a predetermined value, it is determined that the corresponding feature points are sufficient, and when the number is smaller than the predetermined value, it is determined that the corresponding feature points are insufficient.
If it is determined in step S244 that the corresponding feature points are insufficient, the process proceeds to step S247, which will be described later.
On the other hand, when it is determined in step S244 that the corresponding feature points are sufficient, the process advances to step S245, where the calibration section 105 calculates the three-dimensional positions of the corresponding feature points, and calculates the camera parameters of the respective photographing devices 11, that is, the internal parameters and the external parameters. The three-dimensional position of each feature point and the camera parameters of each photographing apparatus 11 are calculated by solving a nonlinear optimization problem that minimizes an error.
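As one concrete illustration of the feature extraction and correspondence steps, the following sketch uses ORB features and brute-force Hamming matching from OpenCV; the embodiment does not prescribe a particular detector, so these are stand-ins only.

```python
# Sketch of steps S241/S243: extract feature points from two phones' images and
# establish correspondences between them.
import cv2

def matched_feature_points(image_a, image_b, max_matches=200):
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(image_a, None)
    kp_b, des_b = orb.detectAndCompute(image_b, None)
    if des_a is None or des_b is None:
        return []                         # too few feature points -> feedback to the user
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:max_matches]
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches]
```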
In step S246, the calibration unit 105 determines whether or not the error in the nonlinear optimization operation at the time of calculating the camera parameters is a predetermined value or less and sufficiently small.
When it is determined in step S246 that the error in the nonlinear optimization operation is not equal to or smaller than the predetermined value, the process proceeds to step S247, which will be described later.
In step S247, the calibration section 105 generates feedback information including a captured image with feature points and a feedback message.
When it is determined in step S242 that the extracted feature points are insufficient, the feedback information generated in step S247 produces a feedback screen such as that of fig. 23 in which the detected feature points are displayed superimposed as circles (○) on the captured images 201 and 202. When it is determined in step S244 that the corresponding feature points are insufficient, the feedback screen displays the feature points for which correspondence has been established superimposed as circles (○), and the feature points for which correspondence has not been established superimposed as crosses (×). When it is determined in step S246 that the error is large, the captured image of the imaging device 11 with the large error is displayed on the feedback screen of fig. 23 as the feedback information generated in step S247.
On the other hand, when it is determined in step S246 that the error of the nonlinear optimization operation is equal to or smaller than the predetermined value and sufficiently small, the process proceeds to step S248, and the calibration unit 105 generates a calibration completed message.
The camera parameter calculation process ends by the process of step S247 or S248, and the process returns to fig. 24 to proceed to step S225.
In step S225 of fig. 24, the calibration section 105 determines whether the calibration is completed, that is, whether a calibration completed message is generated.
When it is determined in step S225 that the calibration is not completed, that is, when feedback information is generated, the process proceeds to step S226, and the calibration unit 105 transmits the generated feedback information as a message. After step S226, the process returns to step S224, and the process of step S224 and the following steps are repeated.
On the other hand, when it is determined in step S225 that the calibration is completed, that is, when a calibration completed message is generated, the process advances to step S227, and the cloud server 12 transmits a photographing completion message to each photographing apparatus 11. Each photographing apparatus 11 receives the photographing end message, and ends the synchronous photographing.
In step S228, the cloud server 12 transmits the generated calibration complete message to the control device 15, ending the camera calibration process.
According to the second camera calibration process described above, the calibration of the camera can be performed without the calibration board or the calibration board image.
< 14. Computer configuration example >
The series of processes described above may be executed by hardware or software. When a series of processes are executed by software, a program constituting the software is installed on a computer. Here, the computer includes a microcomputer incorporated in dedicated hardware, a general-purpose personal computer capable of executing various functions by installing various programs, and the like.
Fig. 26 is a block diagram showing a configuration example of hardware of a computer that executes the series of processes described above by a program.
In the computer, a CPU (Central Processing Unit: central processing unit) 301, a ROM (Read Only Memory) 302, and a RAM (Random Access Memory: random access Memory) 303 are connected to each other through a bus 304.
An input/output interface 305 is also connected to the bus 304. An input unit 306, an output unit 307, a storage unit 308, a communication unit 309, and a driver 310 are connected to the input/output interface 305.
The input unit 306 is configured by a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like. The output unit 307 is configured by a display, a speaker, an output terminal, and the like. The storage unit 308 is constituted by a hard disk, a RAM disk, a nonvolatile memory, or the like. The communication unit 309 is constituted by a network interface or the like. The drive 310 drives a removable recording medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In the computer configured as described above, the CPU301 loads and executes the program stored in the storage unit 308 into the RAM303 via the input/output interface 305 and the bus 304, for example, and thus performs the series of processes described above. The RAM303 also appropriately stores data and the like necessary for the CPU301 to execute various processes.
The program executed by the computer (CPU 301) can be provided by being recorded in a removable recording medium 311 as a package medium or the like, for example. In addition, the program can be provided via a wired or wireless transmission medium such as a local area network, the internet, or digital satellite broadcasting.
In the present specification, the steps described in the flowcharts may be performed in time series in the order described, or may be performed in parallel or at a necessary timing such as when a call is made, without necessarily performing processing in time series.
In the present specification, a system refers to a collection of a plurality of structural elements (devices, modules (components), etc.), irrespective of whether all the structural elements are located in the same housing. Therefore, a plurality of devices which are housed in different cases and connected via a network, and one device which houses a plurality of modules in one case are both systems.
The embodiments of the present disclosure are not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present disclosure.
In addition, for example, as long as no contradiction occurs, the plurality of techniques of the present disclosure can be individually and independently implemented. Of course, any of a plurality of the present techniques may be used in combination. For example, a part or all of the present technology described in any of the embodiments may be combined with a part or all of the present technology described in other embodiments. In addition, some or all of the above-described present techniques may be implemented in combination with other techniques not described above.
For example, in the present technology, a configuration of cloud computing that shares one function and processes it together with a plurality of devices via a network can be adopted.
In addition, each step described in the flowcharts described above can be executed by a plurality of apparatuses in a shared manner, in addition to one apparatus.
Further, in the case where a plurality of processes are included in one step, the plurality of processes included in the one step can be performed in addition to being performed by one apparatus, and can be performed in a shared manner by a plurality of apparatuses.
The effects described in the present specification are merely examples, and are not limiting, and effects other than the effects described in the present specification may be used.
In addition, the present technology can take the following configuration.
(1) A photographing apparatus includes a control unit that transmits information on itself concerning photographing to a server device and acquires candidates of a set value concerning volume photographing transmitted from the server device based on a determination result of whether or not volume photographing is possible.
(2) The imaging apparatus according to (1) above, wherein the control unit presents candidates of the set value acquired from the server device to a user for selection.
(3) The imaging apparatus according to the above (1) or (2), wherein the control unit further controls to transmit identification information of a group composed of a plurality of imaging devices that perform volume imaging and position information of the imaging devices and to receive registration of the group.
(4) The imaging apparatus according to any one of (1) to (3), wherein the control unit acquires, as candidates of the set value, a camera synchronization system supported by all imaging devices that perform volume imaging.
(5) The imaging apparatus according to (4) above, wherein the camera synchronization method is a method of generating a synchronization signal based on any one of time information of carrier communication, time information of a GPS signal, and timing detection in wireless communication or multicast communication of carrier communication.
(6) The photographing apparatus according to any one of the above (1) to (5), wherein the control section controls such that, after photographing of volume photographing is started, a flash or a sound is outputted at a prescribed timing to generate an image including the flash or the sound, and photographed images photographed by the respective photographing devices are synchronized by detecting the image including the flash or the sound.
(7) The imaging apparatus according to any one of (1) to (6), wherein the control unit controls the camera calibration processing to capture a predetermined image displayed by another imaging device, detect a feature point of the predetermined image, and transmit the feature point to the server device.
(8) The imaging apparatus according to the above (7), wherein the control unit controls the imaging device to sequentially capture the predetermined images displayed by the plurality of imaging devices that perform volume imaging.
(9) The imaging apparatus according to the above (7), wherein the control unit controls the imaging device to capture the predetermined image simultaneously displayed by the plurality of imaging devices that perform volume imaging.
(10) The imaging apparatus according to (9) above, wherein the predetermined image displayed simultaneously is a different image among the plurality of imaging devices.
(11) The imaging apparatus according to any one of (1) to (10), wherein the control unit transmits, in units of frames, sensor information for determining whether or not to update the camera parameters, together with a captured image obtained by capturing a predetermined subject.
(12) The server apparatus includes a control unit that receives information on photographing from a plurality of photographing devices, and determines whether or not the plurality of photographing devices are capable of performing volume photographing based on the received information.
(13) The server device according to (12) above, wherein the control unit generates candidates for a setting value related to volume imaging of the imaging devices based on the received information, and determines whether volume imaging is possible based on the generated candidates for the setting value.
(14) The server device according to (12) or (13) above, wherein the control unit calculates an imaging target area of each imaging device, and determines whether volume imaging is possible based on whether there is an area shared by a predetermined number or more of the imaging devices when the imaging target areas of the imaging devices are superimposed (a minimal sketch of this check follows the configuration list).
(15) The server device according to any one of (12) to (14) above, wherein the control unit calculates modeling parameters of the 3D model based on the resolution and the imaging target area of each imaging device, and determines whether volume imaging is possible based on whether the calculated modeling parameters are within a predetermined range.
(16) The server device according to any one of (12) to (15) above, wherein the control unit estimates a processing time required for 3D modeling based on the information from each imaging device, and determines whether volume imaging is possible based on whether the estimated 3D modeling processing time is within a predetermined time.
(17) The server device according to any one of (12) to (16), further comprising a calibration unit that acquires predetermined images captured by the plurality of imaging devices and calculates camera parameters.
(18) The server device according to any one of (12) to (17) above, further including a modeling unit that generates data of a 3D model of a subject from the captured images captured by the plurality of imaging devices, and transmits the generated 3D model data to a reproduction device.
(19) The server device according to (18) above, further including a calibration unit that updates the camera parameters when it is determined that an imaging device is moving, based on sensor information received in units of frames together with a captured image obtained by capturing the subject.
(20) A 3D data generation method including: receiving information concerning imaging from a plurality of imaging devices; determining, based on the received information, whether the plurality of imaging devices are capable of volume imaging; and generating data of a 3D model of a subject from captured images captured by the plurality of imaging devices determined to be capable of volume imaging.
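As an illustration of the imaging-target-area check described in configuration (14), the following Python sketch rasterizes each device's reported imaging area onto a coarse ground-plane grid and reports whether any cell is covered by at least a predetermined number of devices. The class name, the axis-aligned footprint approximation, the grid resolution, and the threshold of three cameras are illustrative assumptions and are not taken from the specification.

```python
# Hypothetical sketch of the imaging-target-area check in configuration (14):
# each device reports a ground-plane footprint of its field of view, and volume
# imaging is considered possible only if some grid cell is seen by at least
# `min_cameras` devices. The grid-based approximation is an assumption made for
# this example, not the patented algorithm.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class ImagingArea:
    """Axis-aligned approximation of one device's imaging target area (metres)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def volume_imaging_possible(areas: List[ImagingArea],
                            min_cameras: int = 3,
                            cell: float = 0.1) -> bool:
    """Return True if at least one grid cell lies inside `min_cameras` areas."""
    if len(areas) < min_cameras:
        return False
    # Bounding box of all reported areas, rasterized onto a coarse grid.
    x0 = min(a.x_min for a in areas); x1 = max(a.x_max for a in areas)
    y0 = min(a.y_min for a in areas); y1 = max(a.y_max for a in areas)
    gx, gy = np.meshgrid(np.arange(x0, x1, cell), np.arange(y0, y1, cell))
    coverage = np.zeros_like(gx, dtype=int)
    for a in areas:
        inside = (gx >= a.x_min) & (gx <= a.x_max) & (gy >= a.y_min) & (gy <= a.y_max)
        coverage += inside.astype(int)
    return bool((coverage >= min_cameras).any())


if __name__ == "__main__":
    # Three phones roughly surrounding a small stage share a common region.
    areas = [ImagingArea(0.0, 2.0, 0.0, 1.5),
             ImagingArea(0.5, 2.5, 0.2, 1.8),
             ImagingArea(0.3, 1.8, -0.2, 1.2)]
    print(volume_imaging_possible(areas, min_cameras=3))  # True
```

A server-side check along these lines would run before calibration and modeling, so that a group of devices with no sufficiently shared view can be rejected early.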
Reference numerals: 1 … image processing system; 11 … imaging device; 12 … cloud server; 13 … reproduction device; 13A … smartphone or tablet; 13B … personal computer; 13C … head mounted display (HMD); 15 … control device; 33 … control unit; 45 … synchronization signal generation unit; 51 … camera; 101 … controller; 105 … calibration unit; 106 … modeling task generation unit; 108 … offline modeling unit; 109 … content manager; 111 … real-time modeling unit; 113 … automatic calibration unit; 153 … control unit; 157 … reproduction unit; 301 … CPU; 302 … ROM; 303 … RAM; 306 … input unit; 307 … output unit; 308 … storage unit; 309 … communication unit; 310 … drive.
Claims (20)
1. An imaging apparatus, wherein,
the imaging apparatus comprises a control unit that transmits information concerning imaging of the imaging apparatus itself to a server device, and acquires candidates for a setting value related to volume imaging transmitted from the server device based on a determination result of whether volume imaging is possible.
2. The imaging apparatus of claim 1, wherein,
the control unit presents the candidates for the setting value acquired from the server device to a user for selection.
3. The imaging apparatus of claim 1, wherein,
the control unit further performs control to transmit identification information of a group made up of a plurality of imaging devices that perform volume imaging, together with position information of the imaging apparatus itself, and to receive a result as to whether registration with the group is possible.
4. The imaging apparatus of claim 1, wherein,
the control unit acquires, as a candidate for the setting value, a camera synchronization method supported by all of the imaging devices that perform volume imaging.
5. The imaging apparatus of claim 4, wherein,
the camera synchronization method generates a synchronization signal based on any one of time information of carrier communication, time information of a GPS signal, and timing detection in wireless communication or multicast communication of carrier communication.
6. The imaging apparatus of claim 1, wherein,
the control unit performs control such that, after capturing for volume imaging is started, a flash or a sound is output at a prescribed timing so that an image containing the flash or the sound is generated, and the captured images from the respective imaging devices are synchronized by detecting the image containing the flash or the sound (an illustrative sketch of this synchronization follows the claims).
7. The imaging apparatus of claim 1, wherein,
the control unit performs control so that, in a camera calibration process, a predetermined image displayed by another imaging device is captured, and feature points of the predetermined image are detected and transmitted to the server device.
8. The imaging apparatus of claim 7, wherein,
the control unit performs control so that the predetermined images displayed by the plurality of imaging devices that perform volume imaging are captured sequentially.
9. The imaging apparatus of claim 7, wherein,
the control unit performs control to capture the predetermined images displayed simultaneously by the plurality of imaging devices that perform volume imaging.
10. The imaging apparatus of claim 9, wherein,
the predetermined images displayed simultaneously differ among the plurality of imaging devices.
11. The imaging apparatus of claim 1, wherein,
the control unit transmits sensor information in units of frames together with a captured image obtained by capturing a predetermined subject,
the sensor information being used to determine whether to update the camera parameters.
12. A server device, wherein,
the server device comprises a control unit that receives information concerning imaging from a plurality of imaging devices, and determines whether the plurality of imaging devices are capable of volume imaging based on the received information.
13. The server device according to claim 12, wherein,
the control unit generates candidates for a setting value related to volume imaging of the imaging devices based on the received information, and determines whether volume imaging is possible based on the generated candidates for the setting value.
14. The server device according to claim 12, wherein,
the control unit calculates an imaging target area of each imaging device, and determines whether or not volume imaging is possible based on whether or not there is an area shared by a predetermined number or more of imaging devices when the imaging target areas of the imaging devices are overlapped.
15. The server device according to claim 12, wherein,
the control unit calculates modeling parameters of the 3D model from the resolution of each imaging device and the imaging target region, and determines whether or not volume imaging is possible based on whether or not the calculated modeling parameters are within a predetermined range.
16. The server device according to claim 12, wherein,
the control unit estimates a processing time required for 3D modeling based on the information of each imaging device, and determines whether or not volume imaging is possible based on whether or not the estimated 3D modeling processing time is within a predetermined time.
17. The server device according to claim 12, wherein,
the server device further comprises a calibration unit that acquires predetermined images captured by the plurality of imaging devices and calculates camera parameters.
18. The server device according to claim 12, wherein,
the server device further comprises a modeling unit that generates data of a 3D model of a subject from the captured images captured by the plurality of imaging devices, and transmits the generated data of the 3D model to a reproduction device.
19. The server device according to claim 18, wherein,
the server device further comprises a calibration unit that updates the camera parameters when it is determined that an imaging device is moving, based on sensor information received in units of frames together with a captured image obtained by capturing the subject.
20. A 3D data generation method, wherein,
receiving information concerning imaging from a plurality of imaging devices, and determining, based on the received information, whether the plurality of imaging devices are capable of volume imaging, and
generating data of a 3D model of a subject from captured images captured by the plurality of imaging devices determined to be capable of volume imaging.
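The flash-based synchronization recited in claim 6 (and configuration (6)) can be illustrated as follows: one device emits a flash at a prescribed timing, each device finds the frame in its own captured sequence where the mean luminance jumps sharply, and the streams are aligned by the difference between those frame indices. The Python sketch below is a minimal illustration under that assumption; the threshold, the function names, and the use of mean luminance as the detector are invented for the example and are not specified in the claims.

```python
# Hypothetical sketch of the flash-based synchronization in claim 6: a flash is
# fired at a prescribed timing, each device records the index of the frame in
# which its own captured sequence brightens sharply, and the streams are aligned
# by shifting them so those indices coincide. Thresholding on a jump in mean
# luminance is an illustrative assumption; the claim does not specify a detector.
from typing import Dict, List, Sequence

import numpy as np


def detect_flash_frame(frames: Sequence[np.ndarray], jump: float = 40.0) -> int:
    """Return the index of the first frame whose mean luminance jumps by `jump`."""
    luma = np.array([float(f.mean()) for f in frames])
    diffs = np.diff(luma)
    candidates = np.where(diffs > jump)[0]
    if candidates.size == 0:
        raise ValueError("no flash detected")
    return int(candidates[0]) + 1  # the frame after the jump contains the flash


def align_streams(streams: Dict[str, List[np.ndarray]]) -> Dict[str, int]:
    """Per-device frame offsets that bring the flash frames into coincidence."""
    flash_index = {dev: detect_flash_frame(frames) for dev, frames in streams.items()}
    reference = min(flash_index.values())
    return {dev: idx - reference for dev, idx in flash_index.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def fake_stream(flash_at: int, n: int = 30) -> List[np.ndarray]:
        frames = [rng.integers(0, 60, (4, 4)).astype(np.float32) for _ in range(n)]
        frames[flash_at] = np.full((4, 4), 250.0, dtype=np.float32)  # the flash
        return frames

    streams = {"phone_a": fake_stream(10), "phone_b": fake_stream(13)}
    print(align_streams(streams))  # e.g. {'phone_a': 0, 'phone_b': 3}
```

In practice the same idea works with a sound spike and audio samples instead of frame luminance; only the detector changes, while the alignment by detected event index stays the same.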
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| JP2020-169477 | 2020-10-07 | | |
| JP2020169477 | 2020-10-07 | | |
| PCT/JP2021/034938 (WO2022075073A1) | 2020-10-07 | 2021-09-24 | Image capture device, server device, and 3D data generation method |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| CN116250242A (en) | 2023-06-09 |
Family
ID=81126804
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202180067706.8A (CN116250242A, pending) | Shooting device, server device and 3D data generation method | 2020-10-07 | 2021-09-24 |
Country Status (3)
| Country | Link |
| --- | --- |
| JP (1) | JPWO2022075073A1 (en) |
| CN (1) | CN116250242A (en) |
| WO (1) | WO2022075073A1 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| WO2024135335A1 (en) * | 2022-12-23 | 2024-06-27 | Sony Group Corporation | Information processing device and method |
| WO2024135337A1 (en) * | 2022-12-23 | 2024-06-27 | Sony Group Corporation | Information processing device and method |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| US9600859B2 (en) * | 2012-08-31 | 2017-03-21 | Sony Corporation | Image processing device, image processing method, and information processing device |
| WO2017134770A1 (en) * | 2016-02-03 | 2017-08-10 | Mitsubishi Electric Corporation | Video synchronization device |
| JP7159057B2 (en) * | 2017-02-10 | 2022-10-24 | Panasonic Intellectual Property Corporation of America | Free-viewpoint video generation method and free-viewpoint video generation system |
2021
- 2021-09-24: WO application PCT/JP2021/034938 (WO2022075073A1), active, application filing
- 2021-09-24: JP application JP2022555353A (JPWO2022075073A1), pending
- 2021-09-24: CN application CN202180067706.8A (CN116250242A), pending
Also Published As
| Publication number | Publication date |
| --- | --- |
| WO2022075073A1 (en) | 2022-04-14 |
| JPWO2022075073A1 (en) | 2022-04-14 |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |