WO2021181965A1

WO2021181965A1 - Image processing device, image processing method, and program

Info

Publication number: WO2021181965A1
Application number: PCT/JP2021/004160
Authority: WO
Inventors: 洋司山本; 小曽根　卓義; 隆一唯野
Original assignee: ソニーグループ株式会社 (Sony Group Corporation)
Priority date: 2020-03-09
Filing date: 2021-02-04
Publication date: 2021-09-16

Abstract

The present invention realizes processing that increases the variety of image effects obtained by expressing states of image shake. The image processing device performs region setting processing, in which a plurality of regions are set in an image of input moving image data, and shake change processing, in which the state of shake appearing in output moving image data is made different for each region set by the region setting processing.

Description

Image processing device, image processing method, program

This technology relates to image processing devices, image processing methods, and programs, and particularly to image processing related to image shake.

There is known a technique for performing image processing such as various corrections on a moving image captured by an imaging device.
Patent Document 1 below discloses performing vibration isolation (image stabilization) processing on moving image data of captured images, and then removing the influence of that processing from the stabilized moving image data.

Japanese Unexamined Patent Publication No. 2015-216510

In recent years, users can easily capture and adjust images with mobile terminals such as smartphones and tablets, with cameras themselves, or with personal computers, and posting videos has also become popular.
In such an environment, there is demand for producing higher-quality and more varied images rather than simply outputting the images captured by the user as they are.
It is also desired that broadcasters and the like can produce various images.

For example, adding shaking to an image according to the content of the moving image is one way of producing an effect that broadens image expression. However, requests for image effects vary: there may be subjects in the image that should shake and subjects that should not, or only part of the image may need to shake violently.
Therefore, the present disclosure proposes a technique capable of generating an image in which shaking is changed according to the image content when adding or removing shaking in a moving image.

The image processing device according to the present technology includes an area setting unit that sets a plurality of areas in an image of input moving image data, and a shaking change unit that performs shaking change processing so that the shaking state appearing in output moving image data differs for each area set by the area setting unit.
The shaking change processing changes the shaking state by reducing shaking that occurs in the moving image or by adding shaking. By dividing one screen of the input moving image data into a plurality of areas and changing the shaking state for each area, it is possible, for example, to form an area without shaking and an area with shaking in the screen, or an area with small shaking and an area with large shaking.

In the image processing apparatus according to the present technology described above, it is conceivable that the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and reduces shaking in another area set by the area setting unit.
For example, it is possible to obtain a region that does not shake (or has less shake) and a region that shakes (or has a large shake) in the image.

In the image processing apparatus according to the present technology described above, it is conceivable that the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and adds a smaller amount of shaking to another area set by the area setting unit.
For example, a region with less shaking and a region with large shaking can be obtained in the image.

In the image processing apparatus according to the present technology described above, it is conceivable that the shaking changing unit performs shaking change processing that reduces shaking in one area set by the area setting unit and reduces shaking in another area set by the area setting unit by a larger amount than in the one area.
For example, in the image, a region where the shaking is slightly reduced and a region where the shaking is significantly reduced can be obtained.

In the image processing apparatus according to the present technology described above, it is conceivable that the shaking changing unit performs shaking change processing that adds or reduces shaking in only one of the one area and the other area set by the area setting unit.
For example, it is possible to obtain a region in which shaking is added or reduced and a region in which shaking is not added or reduced in the image.
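As a rough illustration of the combinations above, the shake seen in each region can be modeled as the captured shake scaled by a per-region removal ratio, plus an optionally added synthetic shake. The sketch below assumes a one-dimensional displacement per frame and a sinusoidal added shake; `RegionShakeParams`, `removal_ratio`, `added_amplitude`, and `output_shake` are hypothetical names, not taken from the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass
class RegionShakeParams:
    # Hypothetical per-region parameters (not defined in the disclosure).
    removal_ratio: float      # 0.0 keeps the captured shake, 1.0 removes it entirely
    added_amplitude: float    # amplitude (pixels) of a synthetic shake to add
    added_freq_hz: float = 5.0


def output_shake(captured: float, t: float, p: RegionShakeParams) -> float:
    """Displacement (pixels) that remains visible in one region at time t."""
    remaining = (1.0 - p.removal_ratio) * captured
    added = p.added_amplitude * math.sin(2.0 * math.pi * p.added_freq_hz * t)
    return remaining + added


# Region A: remove the shake; region B: keep the captured shake and add more.
region_a = RegionShakeParams(removal_ratio=1.0, added_amplitude=0.0)
region_b = RegionShakeParams(removal_ratio=0.0, added_amplitude=4.0)

captured_shake = 3.0  # pixels of shake in the input frame (hypothetical)
t = 0.05              # seconds; the added sine is at its peak here
print(output_shake(captured_shake, t, region_a))  # 0.0
print(output_shake(captured_shake, t, region_b))  # approximately 7.0
```

Setting `removal_ratio` between 0 and 1 gives partial removal, and giving both regions different nonzero `added_amplitude` values reproduces the "large shake / small shake" combination.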

The image processing device according to the present technology described above may include a user interface processing unit that detects operation information related to the shaking change, and the area setting unit may set the plurality of areas based on the operation information detected by the user interface processing unit.
For example, the user can specify a boundary or a range for dividing into a plurality of areas in one screen.

In the image processing apparatus according to the present technology described above, it is conceivable that the area setting unit sets a plurality of areas based on the image analysis of the input moving image data.
For example, by subject recognition in an image, an area is set based on a pixel range in which a specific subject is captured.
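Whether it is driven by a user-specified boundary or by subject recognition, region setting ultimately yields a per-pixel assignment of regions. The following minimal sketch assumes that the user boundary is a horizontal line, or that a subject detector (not shown) has already returned a bounding box; `mask_from_boundary` and `mask_from_bbox` are hypothetical helpers.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = "one region", 0 = "other region"


def mask_from_boundary(width: int, height: int, boundary_y: int) -> Mask:
    """Pixels above a user-specified horizontal boundary form region 1."""
    return [[1 if y < boundary_y else 0 for _ in range(width)] for y in range(height)]


def mask_from_bbox(width: int, height: int, bbox: Tuple[int, int, int, int]) -> Mask:
    """Pixels inside a subject bounding box (x0, y0, x1, y1), end-exclusive,
    form region 1; the bounding box is assumed to come from a detector."""
    x0, y0, x1, y1 = bbox
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
            for y in range(height)]


subject_mask = mask_from_bbox(6, 4, (1, 1, 3, 3))
print(sum(map(sum, subject_mask)))  # 4 pixels belong to the subject region
```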

In the image processing device according to the present technology described above, it is conceivable that the shaking changing unit changes the shaking of the entire image by attaching each frame of the input moving image data to a celestial sphere model and rotating each frame using the shaking information corresponding to that frame.
For example, rotation processing is performed on the celestial sphere model based on the shaking information (for example, quaternion) of the imaging device obtained from the information of the angular velocity sensor and the acceleration sensor.
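The rotation of a frame on the celestial sphere model can be illustrated by rotating a point on the unit sphere with a unit quaternion, v' = q v q*. This sketch assumes the (w, x, y, z) Hamilton convention and treats the point as the direction vector of a pixel on the sphere; the coordinate conventions of the actual embodiment are not specified in this section.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), Hamilton convention (assumed)
Vec3 = Tuple[float, float, float]


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_conj(q: Quat) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)


def axis_angle(axis: Vec3, angle_rad: float) -> Quat:
    """Unit quaternion for a rotation of angle_rad about axis."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle_rad / 2.0) / n
    return (math.cos(angle_rad / 2.0), ax * s, ay * s, az * s)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate a point v on the sphere: v' = q * (0, v) * conj(q)."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), quat_conj(q))
    return (x, y, z)


# A 90-degree rotation about the y axis carries the optical axis (0, 0, 1)
# to approximately (1, 0, 0).
q = axis_angle((0.0, 1.0, 0.0), math.pi / 2)
moved = rotate(q, (0.0, 0.0, 1.0))
```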

In the image processing apparatus according to the present technology described above, it is conceivable that the shaking changing unit changes the shaking amount for each region by moving the coordinate points on the celestial sphere model in each frame.
For example, by moving the coordinate points of the image on the celestial sphere model, the enlargement or reduction of the image fluctuates from frame to frame, thereby realizing partial shaking.

In the image processing apparatus according to the present technology described above, it is conceivable that the process of moving the coordinate points moves the coordinate points of the pixels in the boundary region between one area and another area set by the area setting unit.
For example, by expanding and contracting the boundary region of a plurality of regions in the image, the shaking of one region is increased.
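One possible way to move coordinate points so that only one region appears to shake is to scale the angular distance of each point from a region center by a per-frame factor, and to stretch or compress a boundary band so that the mapping stays continuous and points outside the band do not move. This is a hedged sketch of that idea, not the mapping used in the embodiment; `remap_angle`, `move_point`, and the `inner`/`outer` band radii are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def remap_angle(theta: float, inner: float, outer: float, factor: float) -> float:
    """Scale angles within `inner` by `factor`, stretch or compress the boundary
    band [inner, outer] so the mapping stays continuous, and leave the rest."""
    if theta <= inner:
        return theta * factor
    if theta >= outer:
        return theta
    t = (theta - inner) / (outer - inner)
    return inner * factor + t * (outer - inner * factor)


def move_point(p: Vec3, center: Vec3, inner: float, outer: float, factor: float) -> Vec3:
    """Move unit vector p along the great circle through the region center.
    Assumes unit vectors, outer < pi, and outer > inner * factor."""
    cos_t = max(-1.0, min(1.0, _dot(p, center)))
    theta = math.acos(cos_t)
    if theta < 1e-12 or theta >= outer:
        return p  # the center itself, or a point outside the affected area
    # Tangent direction at `center` pointing toward p (length sin(theta)).
    u = [a - c * cos_t for a, c in zip(p, center)]
    n = math.sqrt(sum(x * x for x in u))
    new_theta = remap_angle(theta, inner, outer, factor)
    return tuple(c * math.cos(new_theta) + (d / n) * math.sin(new_theta)
                 for c, d in zip(center, u))


# Oscillating the factor frame by frame makes only the region near `center`
# expand and contract, i.e. appear to shake (hypothetical parameters).
center = (0.0, 0.0, 1.0)
p = (math.sin(math.radians(5)), 0.0, math.cos(math.radians(5)))
trajectory = []
for frame in range(4):
    factor = 1.0 + 0.1 * math.sin(2 * math.pi * frame / 4)
    trajectory.append(move_point(p, center, math.radians(10), math.radians(20), factor))
```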

In the image processing method according to the present technology, an image processing device performs area setting processing that sets a plurality of areas in an image of input moving image data, and shaking change processing that makes the shaking state appearing in output moving image data differ for each area set in the area setting processing.
This makes it possible to add or remove shake in each area of the image.
The program according to the present technology is a program that causes an information processing apparatus to execute a process corresponding to such an image processing method.
As a result, the image processing of the present disclosure can be executed by various information processing devices.

An explanatory drawing of the devices used in the embodiment of the present technology.
An explanatory drawing of the information transmitted between the devices of the embodiment.
A block diagram of the image pickup apparatus of the embodiment.
An explanatory drawing of the image shake removal processing in the image pickup apparatus of the embodiment.
A block diagram of the information processing apparatus of the embodiment.
An explanatory drawing of a functional configuration example of the image processing apparatus of the embodiment.
An explanatory drawing of another functional configuration example of the image processing apparatus of the embodiment.
An explanatory drawing of the shaking change for each area in the embodiment.
An explanatory drawing of the shaking change using the celestial sphere model in the embodiment.
An explanatory drawing of the coordinate change of the celestial sphere model in the shaking change of the embodiment.
An explanatory drawing of the coordinate change of the celestial sphere model in the shaking change of the embodiment.
An explanatory drawing of enlargement/reduction for each area in the embodiment.
An explanatory drawing of the coordinate change for expansion in the embodiment.
An explanatory drawing of the coordinate change for reduction in the embodiment.
An explanatory drawing of the contents of the moving image file and metadata of the embodiment.
An explanatory drawing of metadata related to lens distortion correction.
An explanatory drawing of the image processing of the embodiment.
An explanatory drawing of attachment to the celestial sphere model in the embodiment.
An explanatory drawing of the sample timing of the IMU data in the embodiment.
An explanatory drawing of shaking information adjustment for each frequency band in the embodiment.
An explanatory drawing of shaking information adjustment for each direction in the embodiment.
An explanatory drawing of shaking information adjustment for each frequency band and each direction in the embodiment.
An explanatory drawing of the correspondence between the output image and the celestial sphere model in the embodiment.
An explanatory drawing of rotation and perspective projection of the output coordinate plane in the embodiment.
An explanatory drawing of the cut-out area in the embodiment.

Hereinafter, embodiments will be described in the following order.
<1. Equipment configuration applicable as an image processing device>
<2. Device configuration and processing function>
<3. Video files and metadata>
<4. Image processing of the embodiment>
<5. Summary and modification>

Prior to the description of the embodiment, some terms used in the description will be described.
"Shake" refers to the interframe shake of the images that make up a moving image. It broadly refers to vibration components occurring between frames (image fluctuations between frames), such as shaking caused by camera shake in an image captured by a so-called image pickup device, or shaking intentionally added by image processing.

"Shake change" refers to changing the state of shaking in an image, such as reducing the shaking occurring in the image or adding shaking to the image.
This "shake change" includes the following "shake removal" and "shake addition".

"Shake removal" refers to eliminating (total removal of shaking) or reducing (partial removal of shaking) the shaking that occurs in an image due to camera shake. That is, it means adjusting the image so as to reduce the shaking based on the shaking information at the time of imaging. The so-called image stabilization performed in an image pickup apparatus is a form of shake removal.

"Shake addition" means adding shaking to an image. This includes adding shaking to an image without shaking, and adding shaking to an image that already has shaking so as to increase it further.

The above-mentioned "shake removal (including partial shake removal)" and "shake addition" are processes that change the shaking state or that result in an image with shaking, and can therefore be regarded as processes that add a shaking effect.
As one example of the purpose of a shaking effect, an image may be shaken intentionally to add impact to a moving image scene.

"Shaking information at the time of imaging" is information related to shaking when an image is captured by an image pickup device. Examples include motion detection information of the image pickup device, information that can be calculated from that detection information, posture information indicating the posture of the image pickup device, and information such as shift and rotation representing the movement of the image pickup device.
In the embodiment, the quaternion (QD) and the IMU data are given as specific examples of the "shaking information at the time of imaging", but other examples, such as shift/rotation information, are possible, and the information is not particularly limited.
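Because the shaking information at the time of imaging may be given as a quaternion, total and partial shake removal can be illustrated as computing a correction rotation that leaves only a fraction of the per-frame shake rotation. The sketch below uses spherical linear interpolation (slerp) toward the identity rotation; slerp is a standard technique chosen here for illustration and is not stated in this section to be the method of the embodiment. The function names and the multiplication order are assumptions.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)
IDENTITY: Quat = (1.0, 0.0, 0.0, 0.0)


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def slerp(a: Quat, b: Quat, t: float) -> Quat:
    """Spherical linear interpolation between unit quaternions a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    if dot < 0.0:  # q and -q are the same rotation; take the shorter arc
        b = tuple(-x for x in b)
        dot = -dot
    if dot > 0.9995:  # nearly identical: fall back to normalized lerp
        q = [x + t * (y - x) for x, y in zip(a, b)]
    else:
        theta = math.acos(dot)
        s = math.sin(theta)
        wa = math.sin((1.0 - t) * theta) / s
        wb = math.sin(t * theta) / s
        q = [wa * x + wb * y for x, y in zip(a, b)]
    n = math.sqrt(sum(x * x for x in q))
    return tuple(x / n for x in q)


def correction(shake: Quat, removal_ratio: float) -> Quat:
    """Rotation to apply so that only (1 - removal_ratio) of the shake remains:
    1.0 = total removal, 0.0 = no removal (hypothetical API)."""
    remaining = slerp(IDENTITY, shake, 1.0 - removal_ratio)
    w, x, y, z = shake
    return quat_mul(remaining, (w, -x, -y, -z))  # remaining * shake^-1


# Shake of 10 degrees about the z (roll) axis; remove half of it.
half_angle = math.radians(10) / 2
shake = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
partial = correction(shake, 0.5)  # a 5-degree rotation in the opposite direction
```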

<1. Equipment configuration applicable as an image processing device>
In the following embodiments, an example in which the image processing device according to the present disclosure is realized mainly by an information processing device such as a smartphone or a personal computer will be described, but the image processing device can be realized in various devices. First, a device to which the technology of the present disclosure can be applied will be described.

FIG. 1A shows an example of an image source VS and image processing devices (TDx, TDy) that acquire a moving image file MF from the image source VS. The moving image file MF includes the image data (that is, moving image data) and audio data constituting the moving image. However, the audio may instead be held in a separate audio file that can be synchronized with the video file. The moving image data may also consist of a plurality of continuously shot still images.
The image processing device TDx is a device that performs the first (primary) shaking change processing on the moving image data acquired from the image source VS.
On the other hand, the image processing device TDy is a device that secondarily performs the shaking change processing on the moving image data that has already been subjected to the shaking change processing by another image processing device.

As the image source VS, an image pickup device 1, a server 4, a recording medium 5, and the like are assumed.
As the image processing devices TDx and TDy, a mobile terminal 2 such as a smartphone, a personal computer 3 and the like are assumed. Although not shown, various devices such as an image editing dedicated device, a cloud server, a television device, and a video recording / playback device are assumed as image processing devices TDx and TDy. These devices can function as any of the image processing devices TDx and TDy.

The image pickup device 1 as an image source VS is a digital camera or the like capable of capturing video, and transfers the moving image file MF obtained by video capture to the mobile terminal 2 or the personal computer 3 via wired or wireless communication.
The server 4 may be a local server, a network server, a cloud server, or the like, and refers to a device capable of providing the moving image file MF captured by the image pickup device 1. The server 4 may transfer the moving image file MF to the mobile terminal 2 or the personal computer 3 via some transmission path.

The recording medium 5 may be a solid-state memory such as a memory card, a disk-shaped recording medium such as an optical disk, or a tape-shaped recording medium such as a magnetic tape; in any case, it refers to a removable recording medium on which the moving image file MF captured by the image pickup device 1 is recorded. The moving image file MF recorded on the recording medium 5 may be read by the mobile terminal 2 or the personal computer 3.

The mobile terminal 2 and the personal computer 3 as the image processing devices TDx and TDy are capable of performing image processing on the moving image file MF acquired from the above image source VS. The image processing referred to here includes shaking change processing (shaking addition and shaking removal).
The shaking change processing can be performed, for example, by pasting each frame of the moving image data onto a celestial sphere model, as described later, and then rotating it using the posture information corresponding to that frame, or by moving the coordinate points of the pixels on the celestial sphere model.

Note that a certain mobile terminal 2 or personal computer 3 may serve as an image source VS for another mobile terminal 2 or personal computer 3 that functions as an image processing device TDx or TDy.

FIG. 1B shows an image pickup device 1 and a mobile terminal 2 as one device that can function as both an image source VS and an image processing device TDx.
For example, a microcomputer or the like inside the image pickup apparatus 1 performs the shaking change processing.
That is, the image pickup apparatus 1 can perform the shaking change processing on the moving image file MF generated by imaging and output an image from which shaking has been removed or to which shaking has been added as the image processing result.

The same applies to the mobile terminal 2. Because it has an imaging function, it can serve as an image source VS, perform the above-mentioned shaking change processing on the moving image file MF generated by imaging, and output an image from which shaking has been removed or to which shaking has been added as the image processing result.
Of course, not limited to the image pickup device 1 and the mobile terminal 2, various other devices that can serve as an image source and an image processing device can be considered.

As described above, there are various devices that can function as the image processing devices TDx and TDy of the embodiment and as the image source VS. In the following description, the image source VS, the image processing device TDx, and another image processing device TDy are treated as separate devices.

FIG. 2 shows how information is transmitted among the image source VS, the image processing device TDx, and the image processing device TDy.
The moving image data VD1 and the metadata MTD1 are transmitted from the image source VS to the image processing device TDx via wired communication, wireless communication, or a recording medium.
As will be described later, the moving image data VD1 and the metadata MTD1 are information transmitted as, for example, a moving image file MF.
The metadata MTD1 may include a coordinate conversion parameter HP as information on shaking removal at the time of imaging performed, for example, as image stabilization.

The image processing device TDx can perform various processes by receiving the moving image data VD1, the metadata MTD1, and the coordinate conversion parameter HP.
For example, the image processing device TDx can perform the shaking change processing on the moving image data VD1 by using the shaking information at the time of imaging included in the metadata MTD1.
Further, for example, the image processing device TDx can cancel the shaking removal applied to the moving image data VD1 at the time of imaging by using the coordinate conversion parameter HP included in the metadata MTD1.

When the shaking change processing is performed, the image processing device TDx may perform a process of associating the moving image data with the shaking information at the time of imaging and the shaking change information SMI that can specify the processing amount of the shaking change processing. Shake change information SMI includes information on shake change for each region, which will be described later.
The associated moving image data, shaking information at the time of imaging, and shaking change information SMI can then be transmitted to the image processing device TDy, together or separately, via wired communication, wireless communication, or a recording medium.
Here, the term "associate" means, for example, to make the other information available (linkable) when processing one piece of information (data, commands, programs, etc.). That is, the information associated with each other may be collected as one file or the like, or may be individual information. For example, the information B associated with the information A may be transmitted on a transmission path different from that of the information A. Further, for example, the information B associated with the information A may be recorded on a recording medium (or another recording area of the same recording medium) different from the information A. Note that this "association" may be a part of the information, not the entire information. For example, an image and information corresponding to the image may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part within the frame.
More specifically, the following acts are included in "associate": assigning the same ID (identification information) to a plurality of pieces of information, recording a plurality of pieces of information on the same recording medium, storing a plurality of pieces of information in the same folder, storing a plurality of pieces of information in the same file (attaching one to the other as metadata), and embedding a plurality of pieces of information in the same stream, for example embedding metadata in an image like a digital watermark.
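As an illustration of one of the association forms listed above, assigning the same ID, the following sketch keeps a reference to the moving image data, the per-frame shaking information (quaternions), and the shaking change information SMI in separate containers that are linked only by a shared key. `AssociationStore`, `ShakeChangeInfo`, and the file name are hypothetical stand-ins, not structures defined in the disclosure.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Quat = Tuple[float, float, float, float]


@dataclass
class ShakeChangeInfo:
    """Hypothetical stand-in for SMI: per-region shake change amounts."""
    region_gains: Dict[str, float]


@dataclass
class AssociationStore:
    """Three separate containers linked only by a shared ID (hypothetical)."""
    videos: Dict[str, str] = field(default_factory=dict)             # ID -> video file path
    shake_info: Dict[str, List[Quat]] = field(default_factory=dict)  # ID -> per-frame quaternions
    smi: Dict[str, ShakeChangeInfo] = field(default_factory=dict)    # ID -> shake change info

    def associate(self, video_path: str, quats: List[Quat], info: ShakeChangeInfo) -> str:
        key = str(uuid.uuid4())  # the shared identification information
        self.videos[key] = video_path
        self.shake_info[key] = quats
        self.smi[key] = info
        return key

    def lookup(self, key: str) -> Tuple[Optional[str], Optional[List[Quat]],
                                        Optional[ShakeChangeInfo]]:
        return self.videos.get(key), self.shake_info.get(key), self.smi.get(key)


store = AssociationStore()
key = store.associate("clip.mp4", [(1.0, 0.0, 0.0, 0.0)], ShakeChangeInfo({"subject": 2.0}))
video, quats, info = store.lookup(key)
```

Because the containers only share a key, each could equally live in a different file or on a different recording medium, which matches the description that associated information need not be stored together.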

FIG. 2 shows the moving image data transmitted from the image processing device TDx to the image processing device TDy as moving image data VD2. Various examples of the moving image data VD2 are conceivable: an image in which the shake removal performed by the image pickup device 1 has been canceled, an image on which the image processing device TDx has performed shake removal, an image before the image processing device TDx performs the shaking change processing, or an image that has undergone image processing other than shaking change.
Further, FIG. 2 shows the metadata MTD2 transmitted from the image processing device TDx to the image processing device TDy. The metadata MTD2 may have the same information as the metadata MTD1 or may have some different information.
When the metadata MTD2 includes the shaking information at the time of imaging, the image processing device TDy can acquire at least the moving image data VD2, the shaking information at the time of imaging included in the metadata MTD2, and the shaking change information SMI in an associated state.
It should be noted that a data form in which the shaking change information SMI is also included in the metadata MTD2 can be considered.

In this embodiment, an example of image processing that can be executed by the image processing apparatus TDx (or TDy) will be described in particular.

<2. Device configuration and processing function>
First, a configuration example of the image pickup apparatus 1 serving as the image source VS will be described with reference to FIG. 3.
As described with reference to FIG. 1B, when the moving image file MF captured by the mobile terminal 2 is assumed to be image-processed by the mobile terminal 2 itself, the mobile terminal 2 only needs to have the same imaging-related configuration as the image pickup device 1 described below.

The image pickup apparatus 1 performs so-called image stabilization, a process of reducing image shake caused by movement of the image pickup device at the time of imaging; this is the "shaking removal" performed by the image pickup apparatus. In contrast, the "shaking addition" and "shaking removal" performed by the image processing devices TDx and TDy are separate processes, independent of the "shaking removal" performed by the image pickup device 1 at the time of imaging.

As shown in FIG. 3, the image pickup apparatus 1 includes, for example, a lens system 11, an image sensor unit 12, a camera signal processing unit 13, a recording control unit 14, a display unit 15, an output unit 16, an operation unit 17, a camera control unit 18, a memory unit 19, a driver unit 22, and a sensor unit 23.

The lens system 11 includes a lens such as a cover lens, a zoom lens, and a focus lens, an aperture mechanism, and the like. Light from the subject (incident light) is guided by the lens system 11 and focused on the image sensor unit 12.
Although not shown, the lens system 11 may be provided with an optical image stabilization mechanism that corrects image shake (interframe shake) and blur due to camera shake or the like.

The image sensor unit 12 includes, for example, an image sensor 12a (imaging element) such as a CMOS (Complementary Metal Oxide Semiconductor) type or a CCD (Charge Coupled Device) type.
The image sensor unit 12 performs, for example, CDS (Correlated Double Sampling) processing and AGC (Automatic Gain Control) processing on the electric signal obtained by photoelectric conversion of the light received by the image sensor 12a, and then performs A/D (Analog/Digital) conversion. The resulting image pickup signal, as digital data, is output to the downstream camera signal processing unit 13 and camera control unit 18.
Note that the optical image stabilization mechanism (not shown) may be a mechanism that corrects image shake by moving the image sensor 12a instead of the lens system 11, or a spatial optical image stabilization mechanism using a gimbal (balanced optical image stabilization); any method may be used.
In the optical image stabilization mechanism, in addition to the interframe shake, the blur in the frame is also corrected as described later.
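The CDS, AGC, and A/D steps performed by the image sensor unit 12 can be modeled numerically as follows. This is an idealized sketch under assumed values: real sensors perform CDS and AGC in analog circuitry, and the sign convention (reset level minus signal level), the full-scale voltage, and the bit depth used here are hypothetical.

```python
def cds(reset_level: float, signal_level: float) -> float:
    """Correlated double sampling: difference between the reset sample and the
    signal sample of the same pixel, so an offset common to both cancels out.
    (Assumes the pixel output falls as light increases.)"""
    return reset_level - signal_level


def agc(value: float, gain: float) -> float:
    """Automatic gain control, reduced here to a fixed gain for illustration."""
    return value * gain


def adc(value: float, full_scale: float, bits: int) -> int:
    """Ideal A/D conversion: clip to [0, full_scale] and quantize to `bits` bits."""
    max_code = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * max_code)


# Hypothetical voltages: both samples carry the same 0.05 V offset.
offset = 0.05
signal = cds(1.00 + offset, 0.60 + offset)             # about 0.40 V of photo signal
code = adc(agc(signal, 2.0), full_scale=1.0, bits=10)  # 10-bit digital value
```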

The camera signal processing unit 13 is configured as an image processing processor by, for example, a DSP (Digital Signal Processor). The camera signal processing unit 13 performs various kinds of signal processing on the digital signal (captured image signal) from the image sensor unit 12. For example, as camera processing, the camera signal processing unit 13 performs pre-processing, simultaneous processing, YC generation processing, resolution conversion processing, codec processing, and the like.
The camera signal processing unit 13 also performs various correction processes. However, it is assumed that the image stabilization may or may not be performed in the image pickup apparatus 1.

In the pre-processing, the captured image signal from the image sensor unit 12 undergoes clamping processing that clamps the R, G, and B black levels to a predetermined level, correction processing between the R, G, and B color channels, and the like.
In the simultaneous processing, a color separation processing is performed so that the image data for each pixel has all the color components of R, G, and B. For example, in the case of an image sensor using a Bayer array color filter, demosaic processing is performed as color separation processing.
In the YC generation process, a luminance (Y) signal and a color (C) signal are generated (separated) from the image data of R, G, and B.
In the resolution conversion processing, the resolution of the image data that has undergone the various kinds of signal processing is converted.
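Two of the steps above, black-level clamping in the pre-processing and YC generation, can be sketched as simple per-pixel arithmetic. The sketch assumes 10-bit raw values with a hypothetical black level of 64 and uses the ITU-R BT.601 luma/color-difference coefficients as one common choice; the disclosure does not state which conversion matrix the camera signal processing unit 13 uses. Simultaneous (demosaic) processing is omitted.

```python
from typing import Tuple


def clamp_black(raw: int, black_level: int, white_level: int) -> float:
    """Subtract the black level and normalize to [0, 1], clipping values that
    fall below black (noise) or above white."""
    v = (raw - black_level) / (white_level - black_level)
    return min(max(v, 0.0), 1.0)


def rgb_to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """ITU-R BT.601 luma (Y) and color-difference (Cb, Cr) for RGB in [0, 1]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772  # 1.772 = 2 * (1 - 0.114)
    cr = (r - y) / 1.402  # 1.402 = 2 * (1 - 0.299)
    return y, cb, cr


# Hypothetical 10-bit raw values of one pixel after demosaicing.
BLACK, WHITE = 64, 1023
r, g, b = (clamp_black(v, BLACK, WHITE) for v in (600, 500, 200))
y, cb, cr = rgb_to_ycbcr(r, g, b)
```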

FIG. 4 shows an example of various correction processes (internal correction of the image pickup apparatus 1) performed by the camera signal processing unit 13. In FIG. 4, the optical image stabilization performed by the lens system 11 and the correction processing performed by the camera signal processing unit 13 are illustrated by their execution order.

In the optical image stabilization of processing F1, in-lens image stabilization, which shifts the lens system 11 in the yaw and pitch directions, and in-body image stabilization, which shifts the image sensor 12a in the yaw and pitch directions, are performed so that the image of the subject is formed on the image sensor 12a with the influence of camera shake physically canceled.
The in-lens image stabilization and the in-body image stabilization may be only one, or both may be used. When both in-lens image stabilization and in-body image stabilization are used, it is conceivable that the in-body image stabilization does not shift in the yaw direction or pitch direction.
Alternatively, neither in-lens nor in-body image stabilization may be adopted, in which case only electronic image stabilization is performed; conversely, only optical image stabilization may be performed.

In the camera signal processing unit 13, processing from processing F2 to processing F7 is performed by spatial coordinate transformation for each pixel.
In the process F2, lens distortion correction is performed.
In the process F3, focal plane distortion correction is performed as one element of electronic image stabilization. This corrects the distortion that arises when rolling-shutter readout is performed, for example by the CMOS image sensor 12a.
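With rolling-shutter readout, each row is exposed slightly later than the previous one, so camera rotation during readout skews the frame. The following is a minimal sketch of the idea behind focal plane distortion correction, assuming a constant yaw rate over the frame, a known per-row readout time, and a small-angle pinhole model; the parameter names and values are hypothetical, and the sign of the shift depends on the coordinate convention. A practical correction would more likely use the angular velocity sampled during readout than a single constant rate.

```python
from typing import List


def row_correction_shifts(num_rows: int, line_time_s: float,
                          yaw_rate_rad_s: float, focal_px: float) -> List[float]:
    """Horizontal shift (pixels) to apply to each row so that it lines up with
    row 0, assuming a constant yaw rate and the small-angle approximation."""
    shifts = []
    for row in range(num_rows):
        delay = row * line_time_s                  # readout delay of this row vs. row 0
        drift = focal_px * yaw_rate_rad_s * delay  # apparent image motion since row 0
        shifts.append(-drift)                      # shift the row back
    return shifts


# Hypothetical values: 1080 rows, 15 us per row, 0.5 rad/s yaw, f = 1500 px.
shifts = row_correction_shifts(1080, 15e-6, 0.5, 1500.0)
```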

Roll correction is performed in the process F4. That is, the roll component is corrected as one element of the electronic image stabilization.
In the process F5, trapezoidal distortion correction is performed for the trapezoidal distortion caused by the electronic image stabilization. The keystone distortion caused by electronic image stabilization is perspective distortion caused by cutting out a place away from the center of the image.
In the process F6, the pitch direction and the yaw direction are shifted and cut out as one element of the electronic image stabilization.
For example, camera shake correction, lens distortion correction, and trapezoidal distortion correction are performed by the above procedure.
It is not essential to carry out all of the processes listed here, and the order of the processes may be changed as appropriate.
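The roll correction of processing F4 and the shift-and-cut-out of processing F6 can both be expressed as coordinate transformations. The sketch below assumes a pinhole model with a known focal length in pixels and hypothetical frame and output sizes; it rotates an output coordinate about the image center to find the source coordinate, and places a crop window that follows the yaw/pitch shake while staying inside the frame. The function names and sign conventions are assumptions.

```python
import math
from typing import Tuple


def roll_correct(x: float, y: float, cx: float, cy: float,
                 roll_rad: float) -> Tuple[float, float]:
    """Source coordinate to sample for output pixel (x, y) when the frame was
    captured with a roll of `roll_rad` about the image center (cx, cy)."""
    dx, dy = x - cx, y - cy
    c, s = math.cos(roll_rad), math.sin(roll_rad)
    return cx + c * dx - s * dy, cy + s * dx + c * dy


def crop_window(frame_w: int, frame_h: int, out_w: int, out_h: int,
                yaw_rad: float, pitch_rad: float, focal_px: float) -> Tuple[int, int]:
    """Top-left corner of the output crop, shifted to follow the shake and
    clamped so the window stays inside the sensor frame."""
    x0 = (frame_w - out_w) / 2 + focal_px * math.tan(yaw_rad)
    y0 = (frame_h - out_h) / 2 + focal_px * math.tan(pitch_rad)
    x0 = min(max(x0, 0), frame_w - out_w)  # the stabilization margin is limited
    y0 = min(max(y0, 0), frame_h - out_h)
    return round(x0), round(y0)


# Hypothetical 2000x1200 frame cropped to 1920x1080 with f = 1500 px.
centered = crop_window(2000, 1200, 1920, 1080, 0.0, 0.0, 1500.0)
```

Clamping the window is where electronic stabilization runs out of margin: a shake larger than the border between the frame and the crop cannot be fully compensated.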

In the codec processing in the camera signal processing unit 13 of FIG. 3, the image data that has undergone the various processing above is subjected to, for example, encoding for recording or communication, and file generation. For example, a moving image file MF in the MP4 format, which is used for recording MPEG-4 compliant video and audio, is generated. It is also conceivable to generate still image files in formats such as JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), GIF (Graphics Interchange Format), and HEIF (High Efficiency Image File Format).
The camera signal processing unit 13 also generates metadata to be added to the moving image file MF by using the information from the camera control unit 18 and the like.

Although the audio processing system is not shown in FIG. 3, it actually has an audio recording system and an audio processing system, and the moving image file MF may include the audio data as well as the moving image data.

The recording control unit 14 records and reproduces, for example, a recording medium using a non-volatile memory. The recording control unit 14 performs a process of recording a moving image file MF such as moving image data or still image data, a thumbnail image, or the like on a recording medium, for example.
The actual form of the recording control unit 14 can vary. For example, the recording control unit 14 may be configured as a flash memory built into the image pickup device 1 together with its write/read circuit, or as a card recording/playback unit that performs recording/playback access to a recording medium that can be attached to and detached from the image pickup device 1, such as a memory card (portable flash memory or the like). It may also be realized as an HDD (Hard Disk Drive) or the like built into the image pickup apparatus 1.

The display unit 15 presents various displays to the person capturing images and is, for example, a display such as a liquid crystal panel (LCD: Liquid Crystal Display) or an organic EL (Electro-Luminescence) display arranged in the housing of the image pickup device 1. Depending on the device, it is used as a display panel or a viewfinder.
The display unit 15 executes various displays on its screen based on instructions from the camera control unit 18.
For example, the display unit 15 displays a reproduced image of the image data read from the recording medium by the recording control unit 14.
Further, the display unit 15 is supplied with image data of the captured image whose resolution has been converted for display by the camera signal processing unit 13, and may display an image based on that image data in response to an instruction from the camera control unit 18. As a result, a so-called through image (subject monitoring image), that is, the captured image shown while the composition is being checked, is displayed.
The display unit 15 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), on the screen based on instructions from the camera control unit 18.

The output unit 16 performs data communication and network communication with external devices, by wire or wirelessly.
For example, it transmits and outputs image data (for example, a moving image file MF) to an external display device, recording device, playback device, or the like.
Further, if the output unit 16 serves as a network communication unit, it may communicate over various networks such as the Internet, a home network, and a LAN (Local Area Network), and exchange various data with servers, terminals, and the like on the network.

The operation unit 17 collectively represents the input devices with which the user performs various operation inputs. Specifically, it represents the various controls (keys, dials, touch panel, touch pad, and so on) provided on the housing of the image pickup apparatus 1.
The operation unit 17 detects the user's operation, and the signal corresponding to the input operation is sent to the camera control unit 18.

The camera control unit 18 is composed of a microcomputer (arithmetic processing device) provided with a CPU (Central Processing Unit).
The memory unit 19 stores information and the like used for processing by the camera control unit 18. As the illustrated memory unit 19, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a flash memory, and the like are comprehensively shown.
The memory unit 19 may be a memory area built in the microcomputer chip as the camera control unit 18, or may be configured by a separate memory chip.
The camera control unit 18 controls the entire image pickup apparatus 1 by executing a program stored in the ROM of the memory unit 19, the flash memory, or the like.
For example, the camera control unit 18 controls the operation of each necessary unit: it controls the shutter speed of the image sensor unit 12, instructs various signal processing in the camera signal processing unit 13, performs imaging and recording according to user operations, reproduces recorded moving image files MF and the like, controls the operation of the lens system 11 in the lens barrel, such as zoom, focus, and aperture adjustment, and controls the operation of the user interface.

The RAM in the memory unit 19 is used for temporarily storing data, programs, and the like as a work area for various data processing of the CPU of the camera control unit 18.
The ROM and flash memory (nonvolatile memory) in the memory unit 19 store an OS (Operating System) with which the CPU controls each unit, content files such as the moving image file MF, application programs for various operations, firmware, and the like.

The driver unit 22 is provided with, for example, a motor driver for the zoom lens drive motor, a motor driver for the focus lens drive motor, a motor driver for the diaphragm mechanism motor, and the like.
These motor drivers apply a drive current to the corresponding motor in response to instructions from the camera control unit 18, to move the focus lens and the zoom lens, open and close the diaphragm blades of the diaphragm mechanism, and so on.

The sensor unit 23 comprehensively shows various sensors mounted on the image pickup apparatus.
As the sensor unit 23, for example, an IMU (inertial measurement unit) is mounted. It can detect angular velocity with a three-axis angular velocity (gyro) sensor for pitch, yaw, and roll, and acceleration with an acceleration sensor.
The sensor unit 23 may include a sensor capable of detecting camera shake during imaging, and does not need to include both a gyro sensor and an acceleration sensor.
Further, the sensor unit 23 may be equipped with a position information sensor, an illuminance sensor, or the like.

For example, the moving image file MF captured and generated by the image pickup device 1 described above can be transferred to the image processing devices TDx and TDy, such as the mobile terminal 2, for image processing.
The mobile terminal 2 and the personal computer 3 serving as the image processing devices TDx and TDy can be realized as, for example, an information processing device having the configuration shown in FIG. 5. Similarly, the server 4 can be realized by an information processing device having the configuration shown in FIG. 5.

In FIG. 5, the CPU 71 of the information processing apparatus 70 executes various processes according to a program stored in the ROM 72 or a program loaded from the storage unit 79 into the RAM 73. The RAM 73 also appropriately stores data and the like necessary for the CPU 71 to execute various processes.
The CPU 71, ROM 72, and RAM 73 are connected to each other via a bus 74. An input / output interface 75 is also connected to the bus 74.

An input unit 76 including controls and operation devices is connected to the input/output interface 75.
For example, as the input unit 76, various controls and operation devices such as a keyboard, mouse, keys, dial, touch panel, touch pad, and remote controller are assumed.
The user's operation is detected by the input unit 76, and the signal corresponding to the input operation is interpreted by the CPU 71.

Further, a display unit 77 such as an LCD or organic EL panel and an audio output unit 78 such as a speaker are connected to the input/output interface 75, either integrally or as separate units.
The display unit 77 is a display unit that performs various displays, and is composed of, for example, a display device provided in the housing of the information processing device 70, a separate display device connected to the information processing device 70, and the like.
Based on instructions from the CPU 71, the display unit 77 displays on its screen various images for image processing, moving images to be processed, and the like. The display unit 77 also displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 71.

A storage unit 79 composed of a hard disk, a solid-state memory, or the like, and a communication unit 80 composed of a modem or the like, may also be connected to the input/output interface 75.
The communication unit 80 performs communication processing via a transmission line such as the Internet, wire / wireless communication with various devices, bus communication, and the like.

A drive 82 is also connected to the input / output interface 75, if necessary, and a removable recording medium 81 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is appropriately mounted.
The drive 82 can read data files such as a moving image file MF and various computer programs from the removable recording medium 81. A data file that has been read is stored in the storage unit 79, and the images and sound included in it are output by the display unit 77 and the audio output unit 78. A computer program or the like read from the removable recording medium 81 is installed in the storage unit 79 as needed.

In this information processing device 70, for example, software for image processing as the image processing device of the present disclosure can be installed via network communication by the communication unit 80 or a removable recording medium 81. Alternatively, the software may be stored in the ROM 72, the storage unit 79, or the like in advance.

For example, such software (application program) constructs a functional configuration as shown in FIG. 6 in the CPU 71 of the information processing apparatus 70.
FIG. 6 shows a function provided as an information processing device 70 that functions as an image processing device TDx. That is, the information processing device 70 (CPU 71) has functions as a shake changing unit 101, a parameter setting unit 102, a user interface processing unit 103, and an area setting unit 104.
The "user interface" is also referred to as "UI", and the user interface processing unit 103 is also referred to as "UI processing unit 103" below.

The shaking changing unit 101 is a function that performs shaking change processing, which changes the shaking state appearing in the output moving image data, using the parameters PRM1 and PRM2 (set from the user's input or automatically) and the adjusted shaking information.
The shaking changing unit 101 performs processing for removing shaking and adding shaking to obtain an output image as moving image data in which shaking has been removed or added.
Further, the shaking changing unit 101 performs shaking changing processing for each area in the image based on the area information iAR.

The UI processing unit 103 is a function that presents controls related to the shaking change to the user and acquires operation information from those controls.
For example, the UI processing unit 103 performs a process of displaying an operator, a preview image, information about an image, and the like as a UI image on the display unit 77. Further, the UI processing unit 103 detects the user's operation by the input unit 76. For example, a touch operation on a UI image is detected.

The parameter setting unit 102 is a function that sets the parameters PRM1 and PRM2 for the shaking change processing based on the operation information acquired by the UI processing unit 103. That is, it converts the user operation detected by the UI processing unit 103 into the parameters PRM1 and PRM2 for changing the shaking and supplies them to the shaking changing unit 101, so that the shaking of the moving image data is changed according to the user's operation.

Here, the parameter PRM1 is a parameter for performing shaking removal or shaking addition, and can be regarded as an instruction value for the overall amount of shaking removal or shaking addition.
The parameter PRM2, on the other hand, is an instruction value for the shaking change processing that produces a different shaking state for each area, such as the difference in shaking among the plurality of areas set in the image by the area setting unit 104. Specifically, the parameter PRM2 includes the amount of shaking for each region (which may be expressed as a difference in amount), the period of shaking for each region, a value indicating the direction of shaking for each region, and the like.
In this description, the parameter PRM1, used for changing the shaking by rotating the celestial sphere model (described later), is distinguished from the parameter PRM2, used for changing the shaking of a partial area by enlarging or reducing the image through movement of coordinate points on the celestial sphere. However, this assignment of meanings to PRM1 and PRM2 is only an example. At a minimum, the shaking changing unit 101 need only perform the shaking change processing so that each region in the image has a different shaking state according to the parameter PRM2.
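Since PRM2 is described as carrying an amount, a period, and a direction of shaking for each region, one way to picture it is as a small record per region from which a displacement is derived for every frame. The sketch below assumes a plain sinusoidal shake; the names RegionShake and region_offset and the sinusoid itself are hypothetical illustrations, not the parameter format of the source.

```python
import math
from dataclasses import dataclass


@dataclass
class RegionShake:
    """Hypothetical per-region shake instruction, standing in for one entry of PRM2."""
    amount: float     # peak displacement in pixels
    period: float     # shake period in frames
    direction: float  # shake direction in radians (0 = horizontal, pi/2 = vertical)


def region_offset(p: RegionShake, frame: int) -> tuple:
    """(dx, dy) displacement of a region at a given frame.

    Assumes a simple sinusoidal shake along the given direction; this waveform
    is an illustrative choice only.
    """
    if p.period <= 0:
        return (0.0, 0.0)
    s = p.amount * math.sin(2.0 * math.pi * frame / p.period)
    return (s * math.cos(p.direction), s * math.sin(p.direction))
```

For example, a PRM2-like table {"AR1": RegionShake(8.0, 12.0, math.pi / 2), "AR2": RegionShake(0.0, 12.0, 0.0)} would make AR1 bob vertically by up to 8 pixels every 12 frames while AR2 stays still.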

The parameter setting unit 102 does not necessarily have to set the parameters according to a user operation. For example, the parameters may be set so as to add a fixed shaking pattern. The parameters PRM1 and PRM2 may also be set automatically according to the image content. For example, shaking patterns corresponding to a "slowly walking person", a "walking person", or a "running person", to animals such as a "rabbit", "cat", or "horse", and to vehicles such as automobiles, ships, and airplanes may be made available from a database as archive data. Shaking information may then be selected according to the image content, and the parameters PRM1 and PRM2 set according to that shaking pattern.
Of course, it is also conceivable that the user selects a shaking pattern and sets the parameters PRM1 and PRM2 according to the selection.

The area setting unit 104 sets a plurality of areas in the image of the moving image data according to, for example, an area-designating operation detected by the UI processing unit 103. It then provides area information iAR indicating the set areas to the shaking changing unit 101.
For example, when the user performs a touch operation such as tracing an arbitrary range, or an operation of designating or drawing a line, on the preview image displayed on the display unit 77 by the UI processing unit 103, the area setting unit 104 may determine the boundary indicated by the user and divide the screen into areas accordingly.
The shaking changing unit 101 grasps the area in the image by the area information iAR, and performs the shaking changing process so that the shaking is different for each area.
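As a rough picture of how a traced boundary could become the area information iAR, the sketch below treats the user's trace as a polyline of (x, y) points sorted by x and labels every pixel on or below it as region AR1 and every pixel above it as region AR2. The polyline input format, the labels 1 and 2, and the function names are all hypothetical; the source does not specify how iAR is represented.

```python
from bisect import bisect_right


def boundary_y_at(x, polyline):
    """Boundary row at column x, linearly interpolated from a traced polyline.

    polyline is a hypothetical list [(x0, y0), (x1, y1), ...] sorted by x.
    Columns outside the traced range use the nearest end point.
    """
    xs = [p[0] for p in polyline]
    if x <= xs[0]:
        return polyline[0][1]
    if x >= xs[-1]:
        return polyline[-1][1]
    i = bisect_right(xs, x) - 1
    (x0, y0), (x1, y1) = polyline[i], polyline[i + 1]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def region_labels(width, height, polyline):
    """Per-pixel labels: 1 for region AR1 (on or below the boundary), 2 for AR2 (above)."""
    return [
        [1 if y >= boundary_y_at(x, polyline) else 2 for x in range(width)]
        for y in range(height)
    ]
```

The shaking changing unit could then look up each pixel's label to decide which per-region shake to apply.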

FIG. 7 shows another functional configuration example. FIG. 7 shows a configuration example in which the area setting unit 104 sets the area by image analysis instead of user operation.
The area setting unit 104 analyzes the input moving image data to determine, for example, the subject person, the main subject, the type of subject (moving object, fixed object, background, and so on), and the composition, and sets the area division accordingly. It then provides the area information iAR indicating the set areas to the shaking changing unit 101.

Regarding the area setting unit 104, an example of area setting based on user operation is shown in FIG. 6 and an example of automatic area setting based on image analysis is shown in FIG. 7, but both of them may be used. For example, when a user performs an operation of designating a subject, it is conceivable to identify the range of the subject or the range of a similar subject by image analysis and generate a boundary of the area. Further, for example, a region candidate may be presented to the user based on image analysis, and the user may be able to select the region division pattern to be adopted by an operation.

Although the UI processing unit 103 and the parameter setting unit 102 are shown in the configuration examples of FIGS. 6 and 7, these configurations are not essential when changing the shaking for each area. By providing at least the shaking changing unit 101 and the area setting unit 104, the shaking change for each area is realized.

Hereinafter, the shaking change for each region realized by the above functions will be described.
FIGS. 8A and 8B each show an example of area setting.
For example, FIG. 8A is an example in which the image is divided into a region AR1 on the lower side, showing a person riding a bicycle, and a region AR2 on the upper side, regarded as the background. In the figure, the boundary region, which can be considered to have a slight width at the border between the regions AR1 and AR2, is shown as the region AR3.
The areas AR1 and AR2 are set according to the user performing an operation such as tracing the boundary between the areas AR1 and AR2. Alternatively, by image analysis, for example, the area AR1 is automatically set so as to surround the area in which a person is captured, and the other areas are set as the area AR2.

Here, the area AR1 is an area in which shaking is used to convey power, whereas the area AR2 is background or the like, where it is desirable to eliminate or reduce the shaking.
In the shaking change process according to the area setting in this way, the shaking is relatively large for the area AR1 and the shaking is reduced for the area AR2.

For example, for an image that has already been stabilized, or an image captured by a fixed camera that has no shaking to begin with, the following processes P1 and P2 are assumed as the shaking change processing for each area executed by the shaking changing unit 101.
-Processing P1: Shaking is added only to the area AR1 (the area AR2 remains without shaking).
-Processing P2: A large amount of shaking is added to the area AR1, and a small amount of shaking is added to the area AR2.

Further, for an image that contains shaking, the following processes P3, P4, P5, and P6 are assumed as the shaking change processing for each area executed by the shaking changing unit 101.
-Processing P3: Shaking is added to the area AR1, and shaking is removed (including partial removal) from the area AR2.
-Processing P4: A large amount of shaking is added to the area AR1, and a small amount of shaking is added to the area AR2.
-Processing P5: Shaking is removed (including partial removal) only from the area AR2 (the area AR1 keeps its shaking).
-Processing P6: Partial shaking removal that removes a small amount of shaking is performed in the area AR1, and shaking removal that almost eliminates the shaking is performed in the area AR2.
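Processes P1 to P6 can be summarized as a pair of gains per region: how much of the shaking already in the input is kept, and how much of an added shake waveform is mixed in. The table below is a minimal sketch under that assumption; the gain values (for example 0.3 for "a small amount" or 0.7 for "partial removal") are illustrative numbers chosen here, not values from the source.

```python
# Hypothetical (keep, add) gains per process and region.
# keep: fraction of the shake already present in the input that is retained
#       (values between 0 and 1 correspond to partial removal).
# add:  scale applied to a separately generated shake waveform.
# All numeric values are illustrative only.
PROCESSES = {
    # Input already stabilized or from a fixed camera (original shake ~ 0).
    "P1": {"AR1": (1.0, 1.0), "AR2": (1.0, 0.0)},   # add shake to AR1 only
    "P2": {"AR1": (1.0, 1.0), "AR2": (1.0, 0.3)},   # large add to AR1, small add to AR2
    # Input that contains shake.
    "P3": {"AR1": (1.0, 1.0), "AR2": (0.0, 0.0)},   # add to AR1, remove from AR2
    "P4": {"AR1": (1.0, 1.0), "AR2": (1.0, 0.3)},   # large add to AR1, small add to AR2
    "P5": {"AR1": (1.0, 0.0), "AR2": (0.0, 0.0)},   # remove from AR2 only
    "P6": {"AR1": (0.7, 0.0), "AR2": (0.05, 0.0)},  # slight removal in AR1, near-full in AR2
}


def output_shake(process: str, region: str, original: float, added: float) -> float:
    """Shake amount of one region after the per-region shaking change."""
    keep, add = PROCESSES[process][region]
    return keep * original + add * added
```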

By changing the shaking for each area in this way, for the image of FIG. 8A, it is possible to obtain an image in which the area of the person riding the bicycle shakes while the background sky and buildings do not.
By setting the area to be shaken in the image and the area not to be shaken in this way, it is possible to achieve both the effect of the shaking and the effect of stopping the shaking to make it easier to see.

For the boundary area AR3, it is preferable to set an area in which an intermediate amount of shaking is expressed, so that the amount of shaking does not change abruptly at the boundary between the areas AR1 and AR2. This keeps the boundary portion from looking unnatural in an image in which the shaking of only part of the frame is emphasized.
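One simple way to realize an intermediate amount of shaking in the boundary area AR3 is to ramp the shake amount linearly across the band. The sketch below assumes AR2 lies above the band and AR1 below it, as in FIG. 8A, and uses a linear ramp; both the ramp shape and the function names are assumptions for illustration.

```python
def shake_weight(y, boundary_top, boundary_bottom):
    """Fraction of the AR1 shake applied at row y.

    0 above the boundary band (region AR2), 1 below it (region AR1), and a
    linear ramp inside the band (region AR3). The linear ramp is an assumption.
    """
    if y <= boundary_top:
        return 0.0
    if y >= boundary_bottom:
        return 1.0
    return (y - boundary_top) / (boundary_bottom - boundary_top)


def blended_amount(y, amount_ar2, amount_ar1, top, bottom):
    """Shake amount at row y, blending smoothly from AR2's amount to AR1's."""
    w = shake_weight(y, top, bottom)
    return (1.0 - w) * amount_ar2 + w * amount_ar1
```

With a band from row 40 to row 60, AR2 unshaken and AR1 shaken by 10 pixels, the middle row 50 receives 5 pixels of shaking, so there is no step at the seam.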

FIG. 8B is an example in which the image is divided into areas by course surface (paved road and dirt). In this case as well, performing the above processes P1 to P6 so that the area AR1 on the dirt side shakes more violently than the area AR2 on the paved side can express the difference in road surface.
That is, by making the degree of shaking different according to the difference in the situation in the image, an image effect that emphasizes the situation in the image can be obtained.

The following are specific examples of such effects, taking as an example images from which the shaking present at the time of the original imaging has been removed by shaking removal.

・ Off-road driving: By adding pitching shake to the images of cars and motorcycles running on rough roads and their surroundings, while stationary objects such as distant views and buildings remain still, the moving objects (the cars and motorcycles and their surroundings) can be emphasized.

・ Skiing and snowboarding: When a skier or snowboarder gliding on a snow surface is filmed from behind or in front, adding rotational (roll) shaking to the image of the skier or snowboarder and the surroundings conveys a sense of slalom and gliding.

・ Sumo bouts: At the moment the wrestlers clash, adding roll shaking to the images of the wrestlers, the dohyo (ring), and their surroundings lets the image emphasize the impact of the collision.

Hereinafter, the method of changing the shaking will be described.
First, rotation processing on the celestial sphere is effective for removing the overall shaking of the image and adding the overall shaking.
FIG. 9 schematically shows a processing example of shaking change using the celestial sphere model MT.

It is assumed that FIG. 9A shows one frame of an image that is shaken due to camera shake or the like at the time of imaging.
The frame of this two-dimensional image is converted into a three-dimensional image model attached to the virtual celestial sphere model MT as shown in FIG. 9B. In this case, the corresponding points from two dimensions to three dimensions are obtained from the lens distortion characteristic data.
When this celestial sphere model MT is rotated according to the shaking information at the time of imaging, the three-dimensional image model of FIG. 9C, from which the shaking has been removed, is obtained. The shaking information at the time of imaging is, for example, IMU data (described later) or a quaternion QD based on the IMU data.
Then, when the specific coordinate range AZ shown by the thick line is cut out from the celestial sphere model MT of FIG. 9C, a two-dimensional image in which shaking and lens distortion are removed can be obtained as shown in FIG. 9D.
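The rotation step can be illustrated with plain quaternion arithmetic. The sketch assumes each pixel has already been mapped to a unit direction vector on the sphere (the lens-distortion-based 2-D to 3-D mapping is not shown) and that the camera attitude for the frame is available as a unit quaternion such as QD; rotating by the conjugate quaternion then undoes the camera rotation. The helper names are hypothetical, and the cut-out of the range AZ is omitted.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), x * s, y * s, z * s)


def quat_mul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, v):
    """Rotate the 3-D vector v by the unit quaternion q (q * v * q^-1)."""
    _, x, y, z = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), quat_conj(q))
    return (x, y, z)


def remove_shake(attitude, direction):
    """Undo the camera rotation for one pixel direction on the celestial sphere."""
    return rotate(quat_conj(attitude), direction)
```

Adding shaking would work the same way in this sketch, by rotating with a quaternion derived from the desired shake instead of its inverse.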

The processing using such a celestial sphere model MT will be described in detail later. In addition to that processing, enlarging or reducing part of the image frame by frame by moving coordinate points on the celestial sphere model MT makes it possible to express a different shaking state for each area.

FIGS. 10 and 11 each show a frame attached to the celestial sphere model MT. Here, the radial lines correspond to longitude on the celestial sphere model MT, and the concentric lines correspond to latitude.
The images of FIGS. 10 and 11 are different frames captured at different times; because of the shaking during imaging, the positions at which the images are projected onto the celestial sphere model MT, as in FIGS. 9A and 9B, are displaced between them.

Here, to make the shaking differ between the upper part (region AR2 in FIG. 8) and the lower part (region AR1 in FIG. 8) of the image, shaking is assumed to be added to the lower part of the image.
FIGS. 10 and 11 show frames with shaking added to the lower part. Comparing the two figures, the latitude and longitude lines in the lower part are slightly expanded in FIG. 10 and slightly reduced in FIG. 11. This is because the direction and amount of movement of the coordinate points at each intersection of the lines are changed according to the degree of shaking.
As an image, the lower part is in a state of expansion and contraction. The degree of expansion and contraction differs from frame to frame, so that the shaking at the bottom of the image is expressed on the moving image.

The concept of changing the degree of shaking for each region by moving coordinate points is described below. This movement of coordinate points can be regarded as expansion or contraction of a partial pixel range.
FIG. 12A shows a pixel range of a certain moving image data and a cutout range CR as an image to be output.
FIG. 12A shows a state in which the moving image data is neither enlarged nor reduced, FIG. 12B shows a state in which the shaded area is enlarged, and FIG. 12C shows a state in which the shaded area is reduced. In all of FIGS. 12A, 12B, and 12C, the cutout range CR stays the same.

Here, the shaded area can be considered as the area AR3 which is the boundary area in FIG. 8A and the like.
In FIG. 12B, by enlarging the boundary region AR3, the coordinates of each pixel in the lower portion (region AR1 in FIG. 8A) move downward.
In FIG. 12C, the boundary region AR3 is reduced, so that the coordinates of each pixel in the lower portion (region AR1 in FIG. 8A) move upward.
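In one dimension (rows only), the effect of FIGS. 12B and 12C can be written as a piecewise mapping: rows above the band are unchanged, rows inside the band [top, bottom] are scaled by a factor s, and rows below the band shift by (s - 1) times the band height. This is a minimal sketch assuming a purely vertical, uniform stretch of the band; the function names are hypothetical. The inverse mapping is what a renderer would typically use to fetch a source row for each output row.

```python
def forward_row(y, top, bottom, s):
    """Row where source row y lands after the band [top, bottom] is scaled by s.

    s > 1 enlarges the band and pushes lower rows down (FIG. 12B);
    s < 1 reduces it and pulls lower rows up (FIG. 12C).
    """
    if y <= top:
        return y
    h = bottom - top
    if y <= bottom:
        return top + (y - top) * s
    return y + (s - 1.0) * h


def source_row(y_out, top, bottom, s):
    """Inverse of forward_row: which source row is sampled for output row y_out."""
    if y_out <= top:
        return y_out
    h = bottom - top
    if y_out <= top + h * s:
        return top + (y_out - top) / s
    return y_out - (s - 1.0) * h
```

For a band from row 100 to 120 enlarged by 1.5, a pixel on row 150 in the lower region moves to row 160; with a factor of 0.5 it moves up to row 140.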

In the case of FIG. 12B, the enlargement processing shown in FIG. 13 is performed in the shaded portion, which is the boundary region (AR3). FIG. 13A shows four pixels in the boundary region, where "a", "b", "c", and "d" are pixel values (for example, RGB values). When the enlargement is performed as shown in FIG. 13B, with "la", "lb", "lc", and "ld" denoting the distances from the center of the new pixel "p" to "a", "b", "c", and "d", the value of "p" is calculated from these four pixel values and their distances.
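The exact formula for the enlarged pixel "p" is not reproduced in this text. As a stand-in, the sketch below uses inverse-distance weighting of a, b, c, and d by la, lb, lc, and ld, which is one common way to combine neighboring pixel values by distance; this choice of weighting is an assumption and may differ from the formula in the original document. For RGB pixels it would be applied per channel.

```python
def idw_pixel(values, distances, eps=1e-9):
    """Inverse-distance-weighted value of a new pixel p from its neighbours.

    values:    neighbouring pixel values [a, b, c, d] (one colour channel)
    distances: distances from p to each neighbour [la, lb, lc, ld]
    The weighting scheme is an assumption, not the source's formula.
    """
    for v, d in zip(values, distances):
        if d < eps:
            return v  # p coincides with a source pixel centre
    weights = [1.0 / d for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```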

In the case of FIG. 12C, the reduction process shown in FIG. 14 is performed in the shaded portion which is the boundary region (AR3). FIG. 14A is similar to FIG. 13A.
The reduction is performed as shown in FIGS. 14B and 14C. In FIG. 14B, "la", "lb", "lc", and "ld" are the distances from "p" to "a", "b", "c", and "d". In FIG. 14C, the pixel size is "1", and "p" and "a" indicate x components.
With the reduction ratio denoted "s", and "i" ranging over the pixels {a, b, c, d} that fall within the range of "p", the value of "p" is calculated from these pixels.
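Here too the original formula is not reproduced. A common way to shrink pixels by a ratio s with pixel size 1 is area averaging: each output pixel covers a source span of width 1/s and takes the mean of the source pixels weighted by how much of each lies inside that span. The one-dimensional sketch below (x components only, as in FIG. 14C) uses that technique as an assumed stand-in; it is not necessarily the method of the source.

```python
import math


def reduce_row(row, s):
    """Shrink a row of pixel values by ratio s (0 < s <= 1) using area averaging.

    Each output pixel covers a source span of width 1/s and takes the mean of the
    source pixels weighted by their overlap with that span (assumed technique).
    """
    if not 0.0 < s <= 1.0:
        raise ValueError("s must be in (0, 1]")
    span = 1.0 / s
    n_out = int(math.floor(len(row) * s))
    out = []
    for k in range(n_out):
        start, end = k * span, (k + 1) * span
        total = 0.0
        for i in range(int(math.floor(start)), int(math.ceil(end))):
            if i >= len(row):
                break
            overlap = min(end, i + 1) - max(start, i)  # width of source pixel i inside the span
            if overlap > 0:
                total += overlap * row[i]
        out.append(total / span)
    return out
```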

Performing the above enlargement or reduction frame by frame amounts to moving the intersection coordinates of the mesh of the celestial sphere model MT according to the degree of shaking, so that in the image cut out as the cutout range CR in FIG. 12, shaking is added to the area on one side of the boundary area.
Therefore, for example, the shaking of the region AR2 is removed, partially removed, or added by rotating the celestial sphere model MT, and the coordinate movement caused by enlarging or reducing the boundary region is then applied to the pixels of the region AR1, so that the region AR1 has stronger shaking than the region AR2.
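To turn a desired shake of region AR1 into a per-frame enlargement or reduction of the boundary band, note that scaling a band of height h by s moves everything below it by (s - 1) * h, so a desired vertical displacement d needs s = 1 + d / h. The sketch below applies this per frame; the lower clamp that keeps the band from collapsing and the function name are assumptions.

```python
def band_scales(displacements, band_height, min_scale=0.05):
    """Per-frame scale of boundary band AR3 that shifts region AR1 by the
    desired vertical displacement d (pixels) in each frame.

    The band height h becomes s * h, so AR1 moves by (s - 1) * h, giving
    s = 1 + d / h. Scales are clamped at min_scale (an assumed safeguard) so the
    band never collapses or inverts.
    """
    return [max(min_scale, 1.0 + d / band_height) for d in displacements]
```

Feeding in a periodic displacement sequence, for instance one sample per frame of a sinusoid, would make AR1 oscillate while AR2 keeps the shaking state set by the sphere rotation.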

In the above, the example of shaking the area AR1 with respect to the area AR2 has been described, but of course it is also possible to shake the areas AR1 and AR2 with different shaking amounts, shaking directions, or shaking cycles by changing the coordinates. A boundary area in contact with the area AR1 and a boundary area in contact with the area AR2 may be set, and enlargement / reduction may be performed for each frame according to different shaking amounts / cycles / directions.

Further, the above-mentioned processes P1 to P6 may be realized by combining the sway change due to the rotation of the celestial sphere model MT and the sway change of a specific region due to the coordinate change.

<3. Video files and metadata>
Hereinafter, an example in which the moving image file MF imaged by the image pickup device 1 which is the image source VS and input to the image processing device TDx is subjected to the shaking change processing for each region as described above will be described.
First, the contents of the moving image file MF and the contents of the metadata transmitted from the image source VS of the image pickup device 1 or the like to the image processing device TDx will be described.
FIG. 15A shows the data included in the moving image file MF. As shown in the figure, the moving image file MF includes various data as "header", "sound", "movie", and "metadata".

In the "header", information such as a file name and a file size as well as information indicating the presence or absence of metadata are described.
"Sound" is audio data recorded together with a moving image. For example, 2-channel stereo audio data is stored.
The "movie" is moving image data, and consists of the image data of each frame (#1, #2, #3, ...) constituting the moving image.
As the "metadata", additional information associated with each frame (#1, #2, #3, ...) constituting the moving image is described.

An example of the contents of the metadata is shown in FIG. 15B. For example, IMU data, coordinate conversion parameter HP, timing information TM, and camera parameter CP are described for one frame. It should be noted that these are a part of the metadata contents, and here, only the information related to the image processing described later is shown.
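The layout of FIGS. 15A and 15B can be mirrored in a few plain data classes, which may help when reading the processing that follows. All class and field names below are hypothetical, and the coordinate conversion parameter HP, timing information TM, and camera parameter CP are kept as untyped dictionaries because their internal structure is only partly described.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImuData:
    gyro: list[tuple[float, float, float]]   # n angular-velocity samples for this frame
    accel: list[tuple[float, float, float]]  # m acceleration samples for this frame
    sampling_rate: float                     # IMU sampling rate in Hz


@dataclass
class FrameMetadata:
    imu: ImuData
    coord_conversion: dict = field(default_factory=dict)  # coordinate conversion parameter HP
    timing: dict = field(default_factory=dict)            # timing information TM
    camera: dict = field(default_factory=dict)            # camera parameter CP


@dataclass
class MovieFile:
    header: dict          # file name, size, presence of metadata, ...
    sound: bytes          # e.g. 2-channel stereo audio
    frames: list[bytes]   # image data for frames #1, #2, #3, ...
    metadata: list[FrameMetadata]  # one entry per frame
```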

As the IMU data, a gyro (angular velocity data), an accelerator (acceleration data), and a sampling rate are described.
The IMU mounted on the image pickup apparatus 1 as the sensor unit 23 outputs angular velocity data and acceleration data at a predetermined sampling rate. Generally, this sampling rate is higher than the frame rate of the captured image, so that many IMU data samples can be obtained in one frame period.

Therefore, as the angular velocity data, n samples are associated with each frame, such as gyro sample # 1, gyro sample # 2, ... Gyro sample # n shown in FIG. 15C.
As acceleration data, m samples are associated with each frame, such as accelerator sample # 1, accelerator sample # 2, ... accelerator sample # m.
In some cases, n = m, and in other cases, n ≠ m.
Although the metadata is described here as being associated with each frame, the IMU data may not be completely synchronized with the frames. In such a case, time information relative to the time information of each frame is provided, for example, as an IMU sample timing offset in the timing information TM.
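With a sampling rate higher than the frame rate, each frame period contains roughly sample_rate / frame_rate IMU samples (for example, 10 samples at 300 Hz and 30 fps). The sketch below assigns sample indices to a frame by timestamp, with a timing offset like the one carried in the timing information TM; the timestamp convention (sample i at offset + i / sample_rate, frame k starting at k / frame_rate) is an assumption for illustration.

```python
def samples_for_frame(k, frame_rate, sample_rate, n_samples, offset_s=0.0):
    """Indices of IMU samples whose timestamps fall inside frame k.

    Assumed convention: sample i is taken at offset_s + i / sample_rate seconds,
    and frame k spans [k / frame_rate, (k + 1) / frame_rate).
    """
    start, end = k / frame_rate, (k + 1) / frame_rate
    return [i for i in range(n_samples)
            if start <= offset_s + i / sample_rate < end]
```

A linear scan keeps the sketch short; a real implementation would more likely compute the index range directly.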

The coordinate conversion parameter HP is a general term for parameters used for correction accompanied by coordinate conversion of each pixel in the image. It also includes non-linear coordinate transformations such as lens distortion.
The coordinate conversion parameter HP is a term that can include at least a lens distortion correction parameter, a trapezoidal distortion correction parameter, a focal plane distortion correction parameter, an electronic image stabilization parameter, and an optical image stabilization parameter.

The lens distortion correction parameter is information for directly or indirectly determining how distortion such as barrel distortion and pincushion distortion was corrected, so that the image can be returned to its state before lens distortion correction. The metadata regarding the lens distortion correction parameter, as one item of the metadata, is briefly described below.
FIG. 16A shows the image height Y, the angle α, the entrance pupil position d1, and the exit pupil position d2 in the schematic diagram of the lens system 11 and the image sensor 12a.
The lens distortion correction parameter is used in image processing to know the incident angle of each pixel of the image sensor 12a. Therefore, it is sufficient to know the relationship between the image height Y and the angle α.

FIG. 16B shows the image 110 before the lens distortion correction and the image 111 after the lens distortion correction. The maximum image height H0 is the maximum image height before distortion correction, and is the distance from the center of the optical axis to the farthest point. The maximum image height H1 is the maximum image height after distortion correction.
The metadata required to determine the relationship between the image height Y and the angle α consists of the maximum image height H0 before distortion correction and the incident angle data d0, d1, ..., d(N-1) for each of N image heights. As an example, N is assumed to be about 10.
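Given H0 and the N incident angles, the incident angle for an arbitrary image height Y can be estimated by interpolation. The sketch assumes the N image heights are evenly spaced from 0 to H0 and interpolates linearly between them; the spacing, the interpolation method, and the function name are assumptions, since the source does not state them.

```python
def incident_angle(y, h0, angles):
    """Incident angle for image height y.

    h0:     maximum image height H0 before distortion correction
    angles: incident angle data d0 .. d(N-1), assumed to be sampled at N image
            heights evenly spaced from 0 to H0 (an assumption).
    Heights outside [0, H0] are clamped; values between samples are interpolated linearly.
    """
    n = len(angles)
    if n < 2:
        raise ValueError("need at least two incident-angle samples")
    pos = min(max(y, 0.0), h0) / h0 * (n - 1)
    k = min(int(pos), n - 2)
    t = pos - k
    return angles[k] + t * (angles[k + 1] - angles[k])
```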

Returning to FIG. 15B, the trapezoidal distortion correction parameter is a correction amount when correcting the trapezoidal distortion caused by shifting the cutout area from the center by electronic image stabilization, and is also a value corresponding to the correction amount of electronic image stabilization.

The focal plane distortion correction parameter is a value indicating the amount of correction for each line with respect to the focal plane distortion.

The electronic image stabilization and optical image stabilization parameters indicate the amount of correction in each of the yaw, pitch, and roll axis directions.

The lens distortion correction, trapezoidal distortion correction, focal plane distortion correction, and electronic image stabilization parameters are collectively referred to as coordinate conversion parameters because each is a parameter of a correction process applied to the image formed on the pixels of the image sensor 12a of the image sensor unit 12, and each such correction involves a coordinate conversion of every pixel. Optical image stabilization is also treated as a coordinate conversion parameter, because correcting the inter-frame fluctuation component in optical image stabilization likewise involves a coordinate conversion of each pixel.
That is, if the reverse correction is performed using these parameters, image data that has undergone lens distortion correction, trapezoidal distortion correction, focal plane distortion correction, electronic image stabilization, and optical image stabilization can be returned to its state before each correction, that is, the state at the time the image was formed on the image sensor 12a of the image sensor unit 12.
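The reverse correction can be illustrated with a deliberately simplified model. The sketch below assumes, hypothetically, that one in-camera coordinate conversion is a roll rotation about the image center followed by a cutout shift (as in electronic image stabilization); `forward` and `reverse` are illustrative names and do not describe the actual format of the coordinate conversion parameter HP. Because the parameters are retained, the inverse mapping recovers the original pixel coordinates.

```python
import math


def forward(p, center, angle, shift):
    """Hypothetical in-camera correction: rotate p about center, then shift."""
    x, y = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + center[0] + shift[0],
            s * x + c * y + center[1] + shift[1])


def reverse(p, center, angle, shift):
    """Undo forward(): remove the shift, then rotate back by -angle."""
    x = p[0] - shift[0] - center[0]
    y = p[1] - shift[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)
    return (c * x - s * y + center[0], s * x + c * y + center[1])


# Round trip with made-up values: the reverse correction restores the pixel.
corrected = forward((100.0, 40.0), (960.0, 540.0), 0.02, (12.0, -7.0))
print(reverse(corrected, (960.0, 540.0), 0.02, (12.0, -7.0)))
```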

The lens distortion correction, trapezoidal distortion correction, and focal plane distortion correction parameters are used for distortion correction when the optical image from the subject is itself captured in an optically distorted state. Since each of them is intended to correct optical distortion, they are collectively referred to as optical distortion correction parameters.
That is, if the reverse correction is performed using these parameters, the image data to which the lens distortion correction, the trapezoidal distortion correction, and the focal plane distortion correction have been performed can be returned to the state before the optical distortion correction.

The timing information TM in the metadata includes each information of exposure time (shutter speed), exposure start timing, readout time (curtain speed), number of exposure frames (long exposure information), IMU sample offset, and frame rate.
In the image processing of the present embodiment, these are mainly used to associate the line of each frame with the IMU data.
Even when the image sensor 12a is a CCD or a global-shutter CMOS sensor, if the exposure center of gravity shifts because an electronic or mechanical shutter is used, correction matched to the exposure center of gravity is possible by also using the exposure start timing and the curtain speed.

The camera parameter CP in the metadata describes the angle of view (focal length), the zoom position, and lens distortion information.

<4. Image processing of the embodiment>
A processing example of the information processing device 70, which is the image processing device TDx as an embodiment, will be described.
FIG. 17 shows the procedures of various processes executed in the information processing device 70 as the image processing device TDx, and shows the relationship of the information used in each process.
The processes of steps ST13, ST14, ST15, and ST16, enclosed as step ST30 in FIG. 17, are performed by the function of the shaking changing unit 101 of FIGS. 6 and 7.
The parameter setting process of step ST41 is performed by the function of the parameter setting unit 102.
The UI processing of step ST40 is performed by the function of the UI processing unit 103.
The area setting process of step ST42 is performed by the function of the area setting unit 104.

In FIG. 17, step ST30 enclosed by a broken line is referred to as "sway change", and step ST16 is described as "sway change processing". Step ST16 is the processing that actually changes the state of shaking, such as rotating the celestial sphere model MT or moving coordinate points by enlarging or reducing the boundary area, and is "sway change" in the narrow sense.
On the other hand, step ST30 is a broadly defined "sway change" including the sway change process of step ST16, the celestial sphere model process as a preparation for the process, and parameter setting.

As the processing of FIG. 17, first, steps ST1, ST2, ST3, and ST4 as preprocessing will be described.
The pre-processing is the processing performed when the moving image file MF is imported.
The term "import" as used here means that the information processing device 70 takes as its target a moving image file MF that it can access, for example one loaded into the storage unit 79 or the like, and performs preprocessing on it so that it is ready for image processing. It does not mean, for example, transferring the file from the image pickup device 1 to the mobile terminal 2 or the like.

The CPU 71 imports the moving image file MF designated by the user operation or the like so as to be the image processing target, and also performs processing related to the metadata added to the moving image file MF as preprocessing. For example, a process of extracting and storing metadata corresponding to each frame of a moving image is performed.
Specifically, this preprocessing consists of metadata extraction (step ST1), concatenation of all IMU data (step ST2), metadata retention (step ST3), and conversion to quaternions (posture information of the imaging device 1) and their retention (step ST4).

As the metadata extraction in step ST1, the CPU 71 reads the target moving image file MF and extracts the metadata included in the moving image file MF as described with reference to FIG.
Note that part or all of steps ST1, ST2, ST3, and ST4 may be performed on the image source VS side such as the image pickup apparatus 1. In that case, in the pre-processing, the contents after the processing described below are acquired as metadata.

In step ST2, the CPU 71 performs concatenation processing on the IMU data (angular velocity data (gyro samples) and acceleration data (accelerometer samples)) among the extracted metadata.
This is a process of constructing IMU data corresponding to the entire sequence of moving images by arranging and concatenating all the IMU data associated with all frames in chronological order.
Then, integration processing is performed on the concatenated IMU data to calculate a quaternion QD representing the posture of the imaging device 1 at each point in time in the moving image sequence, and the result is stored and retained. Calculating the quaternion QD in this way is one example.
The quaternion QD can also be calculated using only the angular velocity data.
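A minimal sketch of steps ST2 and ST4, assuming gyro samples of the form (timestamp, (wx, wy, wz)) in rad/s in the body frame and quaternions stored as (w, x, y, z). It concatenates per-frame sample lists into one time-ordered list and integrates angular velocity only (which, as noted above, is possible), holding the rate constant over each sample interval. The helper names and the post-multiplication convention are assumptions, and acceleration data is not used here.

```python
import math


def quat_mul(a, b):
    """Quaternion (Hamilton) product a x b, with quaternions as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def concatenate(per_frame_samples):
    """Join per-frame lists of (timestamp, gyro) samples into one time-ordered list."""
    flat = [s for frame in per_frame_samples for s in frame]
    return sorted(flat, key=lambda s: s[0])


def integrate_gyro(samples, q0=(1.0, 0.0, 0.0, 0.0)):
    """Integrate body-frame angular velocity (rad/s) into posture quaternions.

    The rate of the earlier sample is held constant over each interval.
    Returns one (timestamp, quaternion) pair per sample.
    """
    q = q0
    out = [(samples[0][0], q)]
    for (t0, w), (t1, _) in zip(samples, samples[1:]):
        rate = math.sqrt(sum(c * c for c in w))
        if rate > 0.0:
            half = rate * (t1 - t0) / 2.0
            dq = (math.cos(half), *(c / rate * math.sin(half) for c in w))
        else:
            dq = (1.0, 0.0, 0.0, 0.0)
        q = quat_mul(q, dq)  # body-frame increment: post-multiply
        out.append((t1, q))
    return out
```

For example, samples of 1 rad/s about z spanning 1 s yield a final posture of 1 rad about z, regardless of the order in which the per-frame lists were stored.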

Among the extracted metadata, the CPU 71 performs a process of holding the metadata other than the IMU data, that is, the coordinate conversion parameter HP, the timing information TM, and the camera parameter CP in step ST3. That is, the coordinate conversion parameter HP, the timing information TM, and the camera parameter CP are stored in a state corresponding to each frame.

By performing the above preprocessing, the CPU 71 is ready to perform various image processing including the shaking change of the moving image data received as the moving image file MF.
The routine processing of FIG. 17 shows image processing performed on the moving image data of the moving image file MF that has been preprocessed as described above.

For each frame, the CPU 71 performs extraction of one frame of the moving image (step ST11), cancellation of the internal correction of the image pickup device (step ST12), pasting onto the celestial sphere model (step ST13), synchronization processing (step ST14), shaking information adjustment (step ST15), shaking change (step ST16), output area designation (step ST17), and plane projection and cutout (step ST18).

The CPU 71 performs each of the above steps ST11 to ST20 for each frame when reproducing the image of the moving image file MF.
Along with these, the CPU 71 performs UI processing (step ST40), parameter setting processing (step ST41), and area setting processing (step ST42) at necessary timings.

In step ST11, the CPU 71 decodes one frame of the moving image (moving image file MF) along the frame number FN. Then, one frame of moving image data PD (#FN) is output. Note that "(#FN)" indicates a frame number and indicates that the information corresponds to that frame.
If the moving image is not encoded by compression or the like, the decoding process in step ST11 is not necessary.

In step ST12, the CPU 71 performs a process of canceling the internal correction performed by the image pickup apparatus 1 for the moving image data PD (#FN) of one frame.
For this purpose, the CPU 71 refers to the coordinate conversion parameter HP (#FN) stored for the frame number (#FN) during preprocessing and performs a correction opposite to the correction performed by the image pickup apparatus 1. This yields moving image data iPD (#FN) in which the lens distortion correction, trapezoidal distortion correction, focal plane distortion correction, electronic camera shake correction, and optical camera shake correction performed in the image pickup apparatus 1 have been canceled. In other words, it is moving image data in which the shaking removal and similar processing performed by the image pickup apparatus 1 have been undone, so that the effect of shaking such as camera shake at the time of imaging appears as is. The purpose is to first return the image to its state before the in-camera corrections and then perform more accurate shaking removal and shaking addition using the shaking information at the time of imaging (for example, the quaternion QD).
However, the process of canceling the internal correction of the image pickup apparatus in step ST12 need not be performed. For example, step ST12 may be skipped and the moving image data PD (#FN) output as it is.

In step ST13, the CPU 71 attaches the 1-frame video data iPD (#FN) in a state where various corrections have been canceled to the celestial sphere model MT. At this time, the camera parameter CP (#FN) stored corresponding to the frame number (#FN), that is, the angle of view, the zoom position, and the lens distortion information are referred to.

FIG. 18 shows an outline of attachment to the celestial sphere model MT.
FIG. 18A shows the moving image data iPD. The image height h is the distance from the center of the image. Each circle in the figure indicates a position where the image heights h are equal.
From the angle of view, zoom position, and lens distortion information of this frame of the moving image data iPD, the relationship between the image sensor surface and the incident angle φ in that frame is calculated, and the values at the positions on the image sensor surface are denoted "data0" ... "dataN-1". From "data0" ... "dataN-1", the relationship between the image height h and the incident angle φ is then expressed as the one-dimensional graph shown in FIG. 18B. The incident angle φ is the angle of the light ray as seen from the optical axis.
This one-dimensional graph is rotated a full turn around the center of the captured image, which gives the relationship between each pixel and its incident angle.
Accordingly, each pixel of the moving image data iPD is mapped onto the celestial sphere model MT, for example from the pixel G1 in FIG. 18C to the pixel G2 on the celestial sphere coordinates.
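The per-pixel mapping can be sketched as follows. `pixel_to_sphere` is a hypothetical helper: it takes the one-dimensional relation between image height h and incident angle φ as a callable, uses the pixel's azimuth around the image center, and returns a unit direction with the optical axis along +z. The pinhole relation in the usage lines is only a stand-in for the relation derived from the camera parameter CP.

```python
import math


def pixel_to_sphere(px, py, cx, cy, height_to_angle):
    """Map pixel (px, py) to a unit direction on the celestial sphere.

    height_to_angle: callable returning the incident angle phi (rad) for an
    image height h. The optical axis is +z; the azimuth is the pixel's
    direction around the image center (cx, cy).
    """
    dx, dy = px - cx, py - cy
    h = math.hypot(dx, dy)        # image height of this pixel
    phi = height_to_angle(h)      # incident angle from the 1-D h-phi relation
    psi = math.atan2(dy, dx)      # azimuth around the image center
    s = math.sin(phi)
    return (s * math.cos(psi), s * math.sin(psi), math.cos(phi))


# Usage with an ideal pinhole relation (hypothetical focal length in pixels).
f = 1000.0
pinhole = lambda h: math.atan(h / f)
print(pixel_to_sphere(1960, 540, 960, 540, pinhole))
```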

As described above, an image (data) of the celestial sphere model MT in which the captured image is attached to the ideal celestial sphere with the lens distortion removed can be obtained. In this celestial sphere model MT, the parameters and distortions peculiar to the image pickup device 1 that originally captured the moving image data iPD are removed, and the range that can be seen by an ideal pinhole camera is pasted on the celestial sphere.
Therefore, by rotating the image of the celestial sphere model MT in a predetermined direction in this state, it is possible to realize the overall shaking removal of the image and the shaking change processing as the shaking addition.

Here, the posture information (quaternion QD) of the image pickup apparatus 1 is used for the shaking change processing. For this purpose, the CPU 71 performs the synchronization process in step ST14.
In the synchronization process, a quaternion QD (#LN) suited to each line of the frame with frame number FN is identified and acquired. Note that "(#LN)" indicates a line number in the frame and indicates that the information corresponds to that line.

A quaternion QD (#LN) is used for each line because, when the image sensor 12a is a CMOS type and imaging is performed with the rolling shutter method, the amount of shaking differs from line to line.
When, for example, the image sensor 12a is a CCD type and imaging is performed with the global shutter method, a per-frame quaternion QD (#FN) may be used.
Even when a CCD or a global-shutter CMOS sensor is used as the image sensor 12a, the exposure center of gravity shifts when an electronic shutter (or likewise a mechanical shutter) is used, so the quaternion at the timing of the center of the frame's exposure period (which shifts according to the shutter speed of the electronic shutter) should be used.

Now consider the blur that appears in the image.
Blur is image bleeding due to relative movement between the image pickup device and the subject in the same frame. That is, image bleeding due to shaking within the exposure time. The longer the exposure time, the stronger the effect of blurring.
Electronic image stabilization that controls the cutout range for each frame can reduce or eliminate "shake" occurring between frames, but it cannot reduce relative shake within the exposure time.
When the cutout area is changed by image stabilization, the posture information of each frame is used. If that posture information is taken at a timing away from the center of the exposure period, such as the start or end of the exposure period, the direction of shaking within the exposure time relative to that posture is biased, and bleeding becomes easily noticeable. Furthermore, with a CMOS rolling shutter, the exposure period differs for each line.

Therefore, in the synchronization process of step ST14, the quaternion QD is acquired for each frame of the moving image data based on the timing of the exposure center of gravity of each line.
FIG. 19 shows the synchronization signal cV of the image pickup apparatus 1 in the vertical period, the synchronization signal sV of the image sensor 12a generated from the synchronization signal cV, the sample timing of the IMU data, and the exposure timing range 120.
The exposure timing range 120 is a parallelogram schematically showing the exposure period of each line of one frame when the exposure time is t4 with the rolling shutter method. The figure also shows the time offset t0 between the synchronization signals cV and sV, the IMU sample timing offset t1, the readout start timing t2, the readout time (curtain speed) t3, and the exposure time t4. The readout start timing t2 is the timing at which a predetermined time t2of has elapsed from the synchronization signal sV.
Each item of IMU data obtained at each IMU sample timing is associated with a frame. For example, the IMU data in the period FH1 is metadata associated with the current frame, whose exposure period is shown as a parallelogram, and the IMU data in the period FH2 is metadata associated with the next frame. However, by concatenating all the IMU data in step ST2 of FIG. 17, the association between each frame and the IMU data is released, and the IMU data can be managed in chronological order.
In this case, the IMU data corresponding to the exposure center of gravity (timing of the broken line W) of each line of the current frame is specified. This can be calculated if the temporal relationship between the IMU data and the effective pixel area of the image sensor 12a is known.
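Under a simplified rolling-shutter model, the exposure center of gravity of each line can be computed from the timing values shown in FIG. 19. The sketch below assumes, hypothetically, that line i is read at t2 + t3 × i / (number of lines), that each line is exposed for t4 immediately before it is read, and that all times are measured from the synchronization signal sV; the offsets t0 and t1 would still be needed to convert these times to the IMU time base.

```python
def exposure_centers(num_lines, t2, t3, t4):
    """Exposure center-of-gravity timing of each line (times measured from sV).

    Hypothetical rolling-shutter model:
      t2: readout start timing, t3: readout time (curtain speed) of all lines,
      t4: exposure time. Line i is read at t2 + t3 * i / num_lines and is
      exposed for t4 immediately before its readout.
    """
    return [t2 + t3 * i / num_lines - t4 / 2.0 for i in range(num_lines)]


# Rolling shutter: the centers advance line by line.
print(exposure_centers(4, 10.0, 8.0, 2.0))
# Global shutter (t3 = 0): every line shares the same center.
print(exposure_centers(3, 5.0, 0.0, 2.0))
```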

Therefore, the IMU data corresponding to the exposure center of gravity (timing of the broken line W) of each line is specified by using the information that can be acquired as the timing information TM corresponding to the frame (#FN).
That is, it is information on the exposure time, the exposure start timing, the readout time, the number of exposure frames, the IMU sample offset, and the frame rate.
Then, the quaternion QD calculated from the IMU data of the exposure center of gravity is specified and used as the quaternion QD (#LN) which is the attitude information for each line.
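The text does not state how a quaternion is obtained at an exposure-center timing that falls between IMU samples. One common choice, used here as an assumption, is spherical linear interpolation (slerp) between the two neighboring posture quaternions. `quat_at` is a hypothetical helper that takes ascending sample times and (w, x, y, z) quaternions, such as the output of the integration of the concatenated IMU data.

```python
import bisect
import math


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                        # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:                     # nearly parallel: lerp and renormalize
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    s1 = math.sin(u * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def quat_at(times, quats, t):
    """Posture quaternion at time t from time-ordered samples."""
    if t <= times[0]:
        return quats[0]
    if t >= times[-1]:
        return quats[-1]
    k = bisect.bisect_right(times, t)    # times[k-1] <= t < times[k]
    u = (t - times[k - 1]) / (times[k] - times[k - 1])
    return slerp(quats[k - 1], quats[k], u)
```

Calling `quat_at` once per line with the exposure-center timing of that line gives a per-line posture in the sense of the quaternion QD (#LN).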

This quaternion QD (#LN) is used in the shaking information adjustment process of step ST15.
In the shaking information adjustment, the CPU 71 adjusts the quaternion QD according to the input parameter PRM1 for changing the shaking.
The parameter PRM1 may be a parameter input according to a user operation or a parameter generated by automatic control.

Using the operation method provided by the UI processing of step ST40, the user can set the parameter PRM1 so as to add an arbitrary degree of shaking to the image. The CPU 71 can also generate the parameter PRM1 by automatic control according to image analysis, the image type, the user's selection of a shaking model, or the like.

For example, in the UI processing of step ST40, the user can input an operation instructing a shaking change, such as an operation specifying shaking to be added as a shaking effect or an operation specifying the degree of shaking removal.

Based on the UI processing in step ST40, the CPU 71 sets the parameters in step ST41. That is, the shaking change parameter is set according to the user operation, and is used for the shaking information adjustment process in step ST15.

Further, in the UI processing of step ST40, the user can further perform an operation of dividing the area and an operation input for instructing the amount of shaking to be applied to each area. That is, it is an operation for changing the shaking for each area.
In the parameter setting process of step ST41, the parameter PRM2 for changing the shaking for each area is set according to such processing and provided for the shaking changing process of step ST16.

Further, in the area setting process of step ST42, the CPU 71 sets the area according to the user operation and provides the area information iAR to the shaking change process of step ST16.
As the area setting process in step ST42, automatic area setting based on an image analysis result may be performed instead of area setting according to a user operation. In that case, the area setting process performs image analysis on the moving image data iPD, automatically sets areas according to subject recognition, and provides the area information iAR to the shaking change process in step ST16.
Further, in the area setting process, the area information iAR may be generated according to both the image analysis and the user input. For example, when a user specifies a certain person on an image, the pixel range of the person is specified by image analysis, and the pixel range is set as one area.

In the shaking information adjustment process of step ST15, the CPU 71 generates an adjusted quaternion eQD for adding shaking to the image or increasing or decreasing the amount of shaking, based on the quaternion QD, which is the shaking information at the time of imaging, and the parameter PRM1 set in step ST41.

A specific example of generating the adjusted quaternion eQD will be described with reference to FIGS. 20, 21, and 22.
FIG. 20 shows an example in which the adjusted quaternion eQD is generated according to gains specified for each frequency band by the parameter PRM1.
The frequency band is a band of fluctuation frequencies. For the sake of explanation, it is assumed that the band is divided into three bands: low band, middle band, and high band. Of course, this is only an example, and the number of bands may be 2 or more.
The low-frequency gain LG, the mid-frequency gain MG, and the high-frequency gain HG are given as the parameters PRM1.

The adjustment processing system of FIG. 20 includes a low-pass filter 41, a mid-pass filter 42, a high-pass filter 43, gain calculation units 44, 45, and 46, and a synthesis unit 47.
The "quaternion QDs for shaking" is input to this adjustment processing system. It is the conjugate of the quaternion QD, which is the shake information during imaging.

Each value q of the quaternion QDs for shaking, for the current frame and for a predetermined number of frames before and after it, is input to the low-pass filter 41, and the low-frequency component q_low is obtained:
$$q_{low} = \mathrm{mean}(q, n)$$

The gain calculation unit 44 gives the low-frequency gain LG to the low-frequency component q_low.
In the equation, mean(q, n) denotes the average of q over the n frames before and after it.
It goes without saying that this mean (q, n) equation is just an example of a low-pass filter, and other calculation methods may be used. Each equation described below is also an example.

The value q of the quaternion QDs for shaking is also input to the mid-pass filter 42, and the mid-range component q_mid is obtained:
$$q_{mid} = q^{*}_{low} \times \mathrm{mean}(q, m), \quad m < n$$
Note that q*_low is the conjugate of q_low.
Also, "×" is the quaternion product.
The gain calculation unit 45 gives the mid-range gain MG to the mid-range component q_mid.

Further, the value q of the quaternion QDs for shaking is input to the high-pass filter 43, and the high-frequency component q_high is obtained:
$$q_{high} = q^{*}_{mid} \times q^{*}_{low} \times q$$
Note that q*_mid is the conjugate of q_mid.
The gain calculation unit 46 gives the high-frequency gain HG to the high-frequency component q_high.

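Each of the gain calculation units 44, 45, and 46 receives an input q_in, a rotation by the angle θ about the unit axis a = (a_x, a_y, a_z):
$$q_{in} = \left[\cos\tfrac{\theta}{2},\ a_x\sin\tfrac{\theta}{2},\ a_y\sin\tfrac{\theta}{2},\ a_z\sin\tfrac{\theta}{2}\right]$$
and outputs the following q_out with θ' = θ × gain:
$$q_{out} = \left[\cos\tfrac{\theta'}{2},\ a_x\sin\tfrac{\theta'}{2},\ a_y\sin\tfrac{\theta'}{2},\ a_z\sin\tfrac{\theta'}{2}\right]$$
(where gain is the low-frequency gain LG, the mid-range gain MG, or the high-frequency gain HG).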

In this way, the gain calculation units 44, 45, and 46 apply the low-frequency gain LG, the mid-range gain MG, and the high-frequency gain HG to obtain the low-frequency component q'_low, the mid-range component q'_mid, and the high-frequency component q'_high, respectively. The synthesis unit 47 combines these to obtain the value q_mixed:
$$q_{mixed} = q'_{low} \times q'_{mid} \times q'_{high}$$
Here too, "×" is the quaternion product.
The value q_mixed thus obtained becomes the value of the adjusted quaternion eQD.
Although the above is an example of band division, a method of generating the adjusted quaternion eQD by applying a gain according to the parameter PRM1 without band division is also conceivable.
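The band split and gain adjustment of FIG. 20 can be sketched in Python as below. This is a sketch under stated assumptions, not the device's implementation: quaternions are (w, x, y, z) tuples, mean(q, n) is taken as the normalized arithmetic average over a window of ±n frames clamped at the sequence ends (adequate for small, sign-consistent rotations), and the mid band uses a shorter window m < n. All function names are hypothetical.

```python
import math

# Quaternions are (w, x, y, z) tuples; inputs are assumed to be unit
# quaternions with a consistent sign (no q / -q flips between frames).


def mul(a, b):
    """Quaternion (Hamilton) product a x b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def conj(q):
    """Conjugate q*, which is the inverse of a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])


def mean(seq, i, n):
    """Normalized average of seq[i - n .. i + n] (window clamped at the ends)."""
    window = seq[max(0, i - n): i + n + 1]
    avg = [sum(q[k] for q in window) / len(window) for k in range(4)]
    norm = math.sqrt(sum(c * c for c in avg))
    return tuple(c / norm for c in avg)


def apply_gain(q, gain):
    """Scale the rotation angle theta of q to theta * gain, keeping axis a."""
    w = max(-1.0, min(1.0, q[0]))
    half = math.acos(w)                 # theta / 2
    s = math.sin(half)
    if s < 1e-12:                       # (almost) no rotation: nothing to scale
        return (1.0, 0.0, 0.0, 0.0)
    axis = [c / s for c in q[1:]]
    new_half = half * gain              # theta' / 2 with theta' = theta * gain
    return (math.cos(new_half), *(a * math.sin(new_half) for a in axis))


def band_adjust(seq, i, n, m, lg, mg, hg):
    """Adjusted quaternion for frame i from low/mid/high band gains (m < n)."""
    q = seq[i]
    q_low = mean(seq, i, n)
    q_mid = mul(conj(q_low), mean(seq, i, m))
    q_high = mul(mul(conj(q_mid), conj(q_low)), q)
    return mul(mul(apply_gain(q_low, lg), apply_gain(q_mid, mg)),
               apply_gain(q_high, hg))
```

Because q_low × q_mid × q_high reproduces q, setting all three gains to 1 returns the input unchanged, while setting them all to 0 yields the identity, i.e. no shaking.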

Next, FIG. 21 shows an example in which the adjusted quaternion eQD is generated according to gains specified for each direction by the parameter PRM1.
The direction is the direction of sway, that is, the direction of yaw, pitch, and roll.
Yaw gain YG, pitch gain PG, and roll gain RG are given as parameters PRM1.

The adjustment processing system of FIG. 21 includes a yaw component extraction unit 51, a pitch component extraction unit 52, a roll component extraction unit 53, gain calculation units 54, 55, and 56, and a synthesis unit 57.
Information on the yaw axis, the pitch axis, and the roll axis is provided to the yaw component extraction unit 51, the pitch component extraction unit 52, and the roll component extraction unit 53, respectively.

Each value q of the quaternion QDs for shaking, for the current frame and for a predetermined number of frames before and after it, is input to the yaw component extraction unit 51, the pitch component extraction unit 52, and the roll component extraction unit 53, which obtain the yaw component q_yaw, the pitch component q_pitch, and the roll component q_roll, respectively.
In each of these component extraction processes, the input q_in is expressed as:
$$q_{in} = \left[\cos\tfrac{\theta}{2},\ a_x\sin\tfrac{\theta}{2},\ a_y\sin\tfrac{\theta}{2},\ a_z\sin\tfrac{\theta}{2}\right]$$
Here u is a unit vector representing the direction of an axis such as the yaw axis, the pitch axis, or the roll axis.
The following q_out is output with θ' = θ × (a · u):
$$q_{out} = \left[\cos\tfrac{\theta'}{2},\ u_x\sin\tfrac{\theta'}{2},\ u_y\sin\tfrac{\theta'}{2},\ u_z\sin\tfrac{\theta'}{2}\right]$$

The yaw component q_yaw, the pitch component q_pitch, and the roll component q_roll obtained by this component extraction are given the yaw gain YG, the pitch gain PG, and the roll gain RG by the gain calculation units 54, 55, and 56, respectively.
The gain-applied yaw component q'_yaw, pitch component q'_pitch, and roll component q'_roll are then combined by the synthesis unit 57 into the value q_mixed:
$$q_{mixed} = q'_{yaw} \times q'_{pitch} \times q'_{roll}$$
Here too, "×" is the quaternion product.
The value q_mixed thus obtained becomes the value of the adjusted quaternion eQD.
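The direction-specific extraction of FIG. 21 can be sketched in the same way. The axis assignment below (roll about the optical axis z, pitch about x, yaw about y) is a hypothetical choice, as is folding each gain into the extraction function. Each component is a rotation about u by θ(a · u) scaled by its gain, and the three are combined as q'_yaw × q'_pitch × q'_roll.

```python
import math

# Hypothetical axis assignment (not specified in the document).
YAW_AXIS = (0.0, 1.0, 0.0)
PITCH_AXIS = (1.0, 0.0, 0.0)
ROLL_AXIS = (0.0, 0.0, 1.0)


def mul(a, b):
    """Quaternion (Hamilton) product a x b, with quaternions as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def axis_angle(q):
    """Return (theta, a) such that q = [cos(theta/2), a * sin(theta/2)]."""
    w = max(-1.0, min(1.0, q[0]))
    theta = 2.0 * math.acos(w)
    s = math.sin(theta / 2.0)
    if s < 1e-12:
        return 0.0, (1.0, 0.0, 0.0)     # no rotation: the axis is arbitrary
    return theta, tuple(c / s for c in q[1:])


def component(q, u, gain=1.0):
    """Rotation about unit axis u by gain * theta * (a . u).

    gain = 1 gives the extracted component (q_yaw, q_pitch or q_roll);
    other values give the gain-applied component (q'_yaw, ...).
    """
    theta, a = axis_angle(q)
    half = gain * theta * sum(ai * ui for ai, ui in zip(a, u)) / 2.0
    return (math.cos(half), *(ui * math.sin(half) for ui in u))


def direction_adjust(q, yg, pg, rg):
    """q_mixed = q'_yaw x q'_pitch x q'_roll."""
    q_yaw = component(q, YAW_AXIS, yg)
    q_pitch = component(q, PITCH_AXIS, pg)
    q_roll = component(q, ROLL_AXIS, rg)
    return mul(mul(q_yaw, q_pitch), q_roll)
```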

FIG. 22 is an example in which the above frequency bands and directions are combined.
The adjustment processing system includes a low-pass filter 41, a mid-pass filter 42, a high-pass filter 43, direction-specific processing units 58, 59, and 60, gain calculation units 44, 45, and 46, and a synthesis unit 61.
According to the parameter PRM1, the low-frequency gain LG, the mid-frequency gain MG, and the high-frequency gain HG, as well as the yaw gain YG, the pitch gain PG, and the roll gain RG (not shown), are given.

In this adjustment processing system, each value q of the quaternion QDs for shaking, for the current frame and for a predetermined number of frames before and after it, is supplied to the low-pass filter 41, the mid-pass filter 42, and the high-pass filter 43 to obtain the respective band components. Each band component is input to the corresponding direction-specific processing unit 58, 59, or 60.
Each of the direction-specific processing units 58, 59, and 60 has the yaw component extraction unit 51, the pitch component extraction unit 52, the roll component extraction unit 53, the gain calculation units 54, 55, and 56, and the synthesis unit 57 of FIG. 21.
That is, the direction-specific processing unit 58 divides the low-frequency component of the quaternion QDs for shaking into yaw, roll, and pitch components, performs gain calculation using the yaw gain YG, the pitch gain PG, and the roll gain RG, and then synthesizes the results.
The direction-specific processing unit 59 divides the mid-range components of the quaternion QDs for shaking into the components in the yaw direction, the roll direction, and the pitch direction, performs the same gain calculation, and then synthesizes the components.
The direction-specific processing unit 60 divides the high-frequency components of the quaternion QDs for shaking into components in the yaw direction, roll direction, and pitch direction, performs gain calculation in the same manner, and then synthesizes the components.

The gains used in the direction-specific processing units 58, 59, and 60 may take different values from one another. That is, the direction-specific processing unit 58 uses the low-frequency yaw gain YG, pitch gain PG, and roll gain RG; the direction-specific processing unit 59 uses the mid-range yaw gain YG, pitch gain PG, and roll gain RG; and the direction-specific processing unit 60 uses the high-frequency yaw gain YG, pitch gain PG, and roll gain RG. In other words, the direction-specific processing units 58, 59, and 60 may use nine gains in total.

The outputs of the direction-specific processing units 58, 59, and 60 are supplied to the gain calculation units 44, 45, and 46, respectively, which apply the low-frequency gain LG, the mid-frequency gain MG, and the high-frequency gain HG. The results are then synthesized by the synthesis unit 61 and output as the value of the adjusted quaternion eQD.

In the above example of FIG. 22, after dividing by frequency band first, processing for each direction is applied for each band component, but the reverse is also possible. That is, after dividing by direction first, processing for each frequency band may be applied for each direction component.
In that case, it is conceivable to use nine gains in the processing for each frequency band. For example, in the processing for each frequency band in the yaw direction, the low-frequency gain LG for the yaw direction, the mid-range gain MG for the yaw direction, and the high-frequency gain HG for the yaw direction are used. In the processing for each frequency band in the pitch direction, the low-frequency gain LG for the pitch direction, the mid-range gain MG for the pitch direction, and the high-frequency gain HG for the pitch direction are used. In the processing for each frequency band in the roll direction, the low-frequency gain LG for the roll direction, the mid-range gain MG for the roll direction, and the high-frequency gain HG for the roll direction are used.

In step ST15 of FIG. 17, the adjusted quaternion eQD is generated by, for example, one of the processing examples above.
The generated adjusted quaternion eQD is then provided to the shaking change processing in step ST16.
The shaking change processing in step ST16 can be considered, for example, as applying the adjusted quaternion eQD obtained by the processing of FIGS. 20, 21, and 22 to the image to add shaking.
If the adjusted quaternion eQD is generated as a value for removing shaking, or for partially removing it, rather than for adding shaking, the shaking change processing in step ST16 can be considered as applying the adjusted quaternion eQD to the image to remove or reduce the shaking.

In the shaking change processing of step ST16, the CPU 71 rotates the image of the celestial sphere model MT, onto which the frame image was pasted in step ST13, using the adjusted quaternion eQD (#LN) for each line, thereby changing the shaking of the entire image.
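Rotating the celestial sphere model MT amounts to rotating the direction vector of each pasted point by a quaternion. The sketch below applies the standard rotation (0, v') = q × (0, v) × q* line by line, using each line's adjusted quaternion. Whether q or its conjugate is applied depends on the sign convention of eQD, which is an assumption here, as are the helper names.

```python
import math


def mul(a, b):
    """Quaternion (Hamilton) product a x b, with quaternions as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def conj(q):
    """Conjugate q*, the inverse of a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])


def rotate(q, v):
    """Rotate 3-D vector v by unit quaternion q: (0, v') = q x (0, v) x q*."""
    _, x, y, z = mul(mul(q, (0.0, *v)), conj(q))
    return (x, y, z)


def rotate_frame(points_by_line, quats_by_line):
    """Rotate the sphere points of each line with that line's quaternion."""
    return [[rotate(q, p) for p in points]
            for points, q in zip(points_by_line, quats_by_line)]


# Usage: a 90-degree rotation about z maps the x axis onto the y axis.
q90z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(rotate(q90z, (1.0, 0.0, 0.0)))
```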

Further, in the shaking change processing of step ST16, the CPU 71 changes the shaking for each area according to the area information iAR and the parameter PRM2. First, the boundary area AR3 is specified from the areas AR1 and AR2 obtained from the area information iAR. Alternatively, the boundary area AR3 is specified from information about it that was set together with the areas AR1 and AR2 in the area setting process of step ST42 and included in the area information iAR.
Further, based on the parameter PRM2, the amount of shaking given to the area AR2 (for example, the difference in the amount of shaking from the area AR1), the period of shaking, the direction of shaking, and the like are grasped.
Then, the boundary area AR3 is enlarged or reduced for each frame according to the identified boundary area AR3 and values such as the amount, period, and direction of shaking, so that the coordinate points on the celestial sphere model MT of the pixels within the area AR1 are moved for each frame.
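The enlargement and reduction of the boundary area AR3 can be illustrated along a single axis. In this hypothetical sketch, the area AR1 occupies [0, a), the boundary area AR3 occupies [a, b), and the area AR2 lies at b and beyond. AR1 is displaced by a sinusoidal per-frame offset, AR2 stays fixed, and AR3 is stretched or compressed linearly so that the image remains continuous at both of its edges. The sinusoid and the linear blend are assumptions standing in for the amount, period, and direction given by the parameter PRM2.

```python
import math


def shake_offset(frame, amplitude, period):
    """Hypothetical per-frame displacement of area AR1 (sinusoidal shaking)."""
    return amplitude * math.sin(2.0 * math.pi * frame / period)


def move_coordinate(x, a, b, offset):
    """Move one coordinate: AR1 = [0, a), AR3 = [a, b), AR2 = [b, ...).

    AR1 moves by offset, AR2 stays fixed, and AR3 is stretched or compressed
    linearly so the image stays continuous at both of its edges. The mapping
    stays monotonic as long as offset < b - a.
    """
    if x < a:
        return x + offset
    if x >= b:
        return x
    w = (b - x) / (b - a)    # 1 at the AR1 edge, 0 at the AR2 edge
    return x + w * offset


# Usage: frame 3 of a 12-frame shake period with amplitude 2 pixels.
d = shake_offset(3, 2.0, 12)
print([move_coordinate(x, 10.0, 20.0, d) for x in (0.0, 10.0, 15.0, 20.0)])
```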

In the sway change process of step ST16, the sway changes of the processes P1 to P6 described above are executed through the overall sway change by rotation of the celestial sphere model MT and the sway change by coordinate point movement.
For example, by shaking the region AR1 after removing the shaking (or removing a part of the shaking) by rotating the celestial sphere model MT, the region AR1 is shaken and the region AR2 is not shaken (or slightly shaken). (For example, processing P1, processing P3, processing P5)
Further, by further shaking the region AR1 after adding the shaking by the rotation of the celestial sphere model MT, the region AR1 is shaken greatly and the region AR2 is shaken slightly. (For example, processing P2, processing P4)
Further, by removing the shaking by rotating the celestial sphere model MT and then shaking the areas AR1 and AR2 with different amounts of shaking by moving the coordinate points, the area AR1 can generate a large shaking image and the area AR2 can generate a small shaking image. (For example, processing P2, processing P4)
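The combinations above can be summarized with a simplified additive model. The sketch assumes small per-frame displacements, so that the global part (from rotating the celestial sphere model MT, scaled by a hypothetical `global_gain` where 0 removes, 1 keeps, and values above 1 add shaking) and the local part (from coordinate point movement) simply add per frame. The function name and parameters are illustrative assumptions, not quantities defined in the embodiment.

```python
from typing import Dict, List


def per_area_offsets(original: List[float], global_gain: float,
                     local: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-frame displacement of each area = global part + local part.

    original    -- per-frame shake of the input (small-angle displacement)
    global_gain -- hypothetical factor applied by rotating the sphere model
    local       -- per-area extra displacement from coordinate point movement
    """
    return {
        area: [global_gain * o + extra for o, extra in zip(original, loc)]
        for area, loc in local.items()
    }
```

With `global_gain=0` and local shaking only in AR1, AR1 shakes while AR2 is still (the P1/P3/P5 pattern); with `global_gain=2` and extra shaking in AR1, both areas shake, AR1 more strongly (the P2/P4 pattern).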

The image of the celestial sphere model hMT whose shaking has been changed as described above is sent to the processing of step ST18.
In step ST18, the CPU 71 projects the image of the celestial sphere model hMT whose shaking has been changed onto a plane and cuts it out to obtain an image (output moving image data oPD) whose shaking has been changed.

In this case, the shaking change is realized by rotating the celestial sphere model MT. Because the celestial sphere model MT is used, the cutout does not become trapezoidal wherever it is taken, and as a result trapezoidal distortion is also eliminated. Further, as described above, the celestial sphere model MT has no lens distortion because the range visible to an ideal pinhole camera is pasted onto the celestial sphere. Focal plane distortion is also corrected by rotating the celestial sphere model MT according to the adjusted quaternion eQD (#LN), which is based on the per-line quaternion QD (#LN).
Furthermore, since the quaternion QD (#LN) corresponds to the exposure center of gravity of each line, blur is inconspicuous in the image.
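One way to obtain an orientation at each line's exposure center of gravity is to interpolate between orientation samples. The sketch below is an assumption-laden illustration, not the embodiment's method: it assumes two orientation samples (q0 at time t0, q1 at time t1), a rolling-shutter timing model in which line n starts exposing at `readout_start + n * line_time`, and spherical linear interpolation (slerp) between the samples. All names are hypothetical.

```python
import math
from typing import List, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed convention


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                   # nearly identical: normalized lerp
        q = [a + t * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def line_quaternions(q0: Quat, t0: float, q1: Quat, t1: float,
                     readout_start: float, line_time: float,
                     exposure: float, num_lines: int) -> List[Quat]:
    """Hypothetical per-line orientation: interpolate between two samples
    (q0 at t0, q1 at t1) at the exposure centroid of each line."""
    result = []
    for line in range(num_lines):
        t_center = readout_start + line * line_time + exposure / 2.0
        u = min(max((t_center - t0) / (t1 - t0), 0.0), 1.0)
        result.append(slerp(q0, q1, u))
    return result
```

Sampling at the exposure centroid rather than at the start of exposure is what keeps the per-line correction aligned with where the motion blur of that line is centered.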

The correspondence between the image after the plane projection in step ST18 and the celestial sphere model MT is as follows.
FIG. 23A shows an example of the rectangular coordinate plane 131 onto which the image is projected. Let each coordinate of the plane-projected image be (x, y).
As shown in FIG. 23B, the coordinate plane 131 is arranged (normalized) in three-dimensional space so that its center touches the celestial sphere model MT. That is, the center of the coordinate plane 131 is placed in line with the center of the celestial sphere model MT, at the point where the plane is in contact with the celestial sphere model MT.

In this case, the coordinates are normalized based on the zoom magnification and the size of the cutout area. For example, when the horizontal coordinates of the coordinate plane 131 are 0 to outh and the vertical coordinates are 0 to outv as shown in FIG. 23A, outh and outv are the image sizes. Then, for example, the coordinates are normalized by the following equation.

In the above (Equation 12), min(A, B) is a function that returns the smaller of A and B, and "zoom" is a parameter for controlling enlargement/reduction.
Further, xnorm, ynorm, and znorm are the normalized x, y, and z coordinates.
By (Equation 12), the coordinates of the coordinate plane 131 are normalized to coordinates on the spherical surface of a hemisphere with a radius of 1.0.
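(Equation 12) itself is not reproduced in this text, so the following is only a plausible sketch built from the ingredients named above: the output size outh × outv, min(), and the "zoom" parameter. It assumes r = min(outh, outv) / 2, that the plane center maps to the tangent point (0, 0, 1), and that larger zoom values shrink the normalized extent (magnifying the cutout). These are assumptions, not the patent's exact formula; `normalize` is a hypothetical name.

```python
from typing import Tuple


def normalize(x: float, y: float, outh: float, outv: float,
              zoom: float) -> Tuple[float, float, float]:
    """Assumed form of the normalization: map output-plane pixel (x, y) to a
    plane tangent to the unit sphere at (0, 0, 1).

    r = min(outh, outv) / 2 makes the shorter image side span [-1, 1]
    when zoom == 1; zoom > 1 narrows the range (enlargement)."""
    r = min(outh, outv) / 2.0
    xnorm = (x - outh / 2.0) / (r * zoom)
    ynorm = (y - outv / 2.0) / (r * zoom)
    znorm = 1.0
    return (xnorm, ynorm, znorm)
```

For a 1920 × 1080 output, the pixel (960, 540) maps to (0, 0, 1), and the bottom-center pixel (960, 1080) maps to (0, 1, 1) at zoom 1.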

As shown in FIG. 24A, the coordinate plane 131 is rotated by a rotation-matrix calculation to obtain the orientation of the cutout region. That is, the pan, tilt, and roll angles are applied using the following rotation matrix (Equation 13). Here, the pan angle rotates the coordinates around the z-axis, the tilt angle rotates them around the x-axis, and the roll angle rotates them around the y-axis.

In the above (Equation 13), "Rt" is a tilt angle, "Rr" is a roll angle, and "Rp" is a pan angle. Further, (xrot, yrot, zrot) are the coordinates after rotation.
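Because (Equation 13) is not reproduced here, the sketch below only follows the axis assignments stated in the text (pan about z, tilt about x, roll about y). The sign convention (right-handed, counterclockwise-positive) and the order of application (tilt, then roll, then pan) are assumptions; `rotate_plane_point` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def rot_x(a: float) -> Mat3:
    """Rotation about the x-axis (used here for the tilt angle Rt)."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Mat3:
    """Rotation about the y-axis (roll angle Rr)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Mat3:
    """Rotation about the z-axis (pan angle Rp)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def rotate_plane_point(p: Vec3, rt: float, rr: float, rp: float) -> Vec3:
    """(xnorm, ynorm, znorm) -> (xrot, yrot, zrot): tilt, then roll, then pan.
    The application order is an assumption of this sketch."""
    return mat_vec(rot_z(rp), mat_vec(rot_y(rr), mat_vec(rot_x(rt), p)))
```

Since rotations do not commute, a different order in the actual (Equation 13) would give different results for combined angles; for a single nonzero angle the order does not matter.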

These coordinates (xrot, yrot, zrot) are used to calculate the corresponding points on the celestial sphere by perspective projection.
As shown in FIG. 24B, the coordinate plane 131 is perspectively projected onto the surface of the celestial sphere (region 132). That is, for each coordinate, the point where a straight line drawn toward the center of the celestial sphere intersects the sphere is found. Each coordinate is calculated as follows.

In (Equation 14), xsph, ysph, and zsph are the coordinates obtained by projecting the coordinates on the coordinate plane 131 onto the surface of the celestial sphere model MT.
Plane-projected image data can be obtained through this relationship.
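The perspective projection described above has a simple geometric form when the celestial sphere is taken to have radius 1.0 (consistent with the normalization above): the line from a rotated plane point toward the sphere center crosses the sphere at the point scaled to unit length. The sketch below shows that step; whether (Equation 14) is written exactly this way is an assumption, and `project_to_sphere` is a hypothetical name.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project_to_sphere(p: Vec3) -> Vec3:
    """Scale the rotated plane point (xrot, yrot, zrot) onto the unit sphere.

    The line from p toward the sphere center (the origin) crosses the unit
    sphere at p / |p| on the same side as p, giving (xsph, ysph, zsph)."""
    n = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    if n == 0.0:
        raise ValueError("point coincides with the sphere center")
    return (p[0] / n, p[1] / n, p[2] / n)
```

The tangent point (0, 0, 1) maps to itself, while points farther from the plane center are pulled in more strongly, which is the perspective relationship between the flat output and the sphere.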

For example, the cutout area for the image projected on the plane by the above method is set in step ST17 of FIG.

In step ST17, the cutout area information CRA for the current frame is set based on tracking processing by image analysis (subject recognition) and on the cutout area instruction information CRC given by user operation.
For example, FIGS. 25A and 25B show, as a frame drawn on the image, the cutout area information CRA set for the image of a certain frame.
Such cutout area information CRA is set for each frame.
The cutout area information CRA also reflects an aspect ratio for the image specified by the user or by automatic control.
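The per-frame cutout setting described above can be illustrated with a simplified planar sketch. It assumes the cutout is an axis-aligned rectangle in output-image coordinates with a requested aspect ratio (width / height), centered on a tracked subject position and shifted to stay inside the frame. The embodiment actually applies the cutout on the celestial sphere model MT, so `cutout_rect` and its parameters are hypothetical stand-ins for the cutout area information CRA.

```python
from typing import Tuple


def cutout_rect(cx: float, cy: float, frame_w: float, frame_h: float,
                aspect: float, scale: float = 1.0) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of a cutout with the given aspect
    ratio, centered on the tracked point (cx, cy) and clamped to the frame.

    The largest rectangle of that aspect fitting in the frame is scaled by
    `scale` (0 < scale <= 1) so that the clamping is always possible."""
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    w = min(frame_w, frame_h * aspect) * scale
    h = w / aspect
    left = min(max(cx - w / 2.0, 0.0), frame_w - w)   # keep inside horizontally
    top = min(max(cy - h / 2.0, 0.0), frame_h - h)    # keep inside vertically
    return (left, top, w, h)
```

When the tracked subject approaches the frame edge, the rectangle stops at the edge instead of following the subject out of the image.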

The cutout area information CRA is reflected in the processing of step ST18. That is, as described above, the region of the celestial sphere model MT corresponding to the cutout area information CRA is projected onto a plane, and the output moving image data oPD is obtained.

The output moving image data oPD obtained in this way is moving image data in which the shaking change processing is performed in step ST16 so that the shaking state is different for each area.
By performing the processing of FIG. 17 every frame, when the output moving image data oPD is reproduced and displayed, an image to which shaking is added is displayed as a shaking effect.
Therefore, for example, when the user performs operations to set the parameters PRM1 and PRM2 and the areas, an image is obtained in which a shaking effect matching the user's intention is added to each area. Alternatively, the areas may be set automatically, yielding an image in which a degree of shaking suited to each subject is added to each area.
Such moving image data is displayed or saved as an image with a shaking effect.

<5. Summary and modification>
The following effects can be obtained in the above embodiments.
The image processing device TDx of the embodiment includes the area setting unit 104 (ST42), which sets a plurality of areas in the image of the input moving image data, and the shaking changing unit 101 (ST30), which performs shaking change processing so that the shaking state appearing in the output moving image data differs for each area set by the area setting unit 104.
Therefore, for example, it is possible to form a region without shaking and a region with shaking in the screen, or to form a region with small shaking and a region with large shaking.
This makes it possible to combine a shaking effect with the improved visibility of stopping the shaking, or to enhance the shaking effect by varying the degree of shaking for each area, which broadens the range of image effects.
In other words, rather than simply applying a shaking effect to the whole screen, the device can realize image effects that meet various needs: when the image contains subjects that should shake and subjects that should not, each can be given appropriate shaking, and when only part of the image should shake violently, only that specific subject can be shaken.
Three or more areas, not counting the boundary area, may also be set, with a different shaking change applied to each so that every area shakes to a different degree.

In the embodiment, an example has been described in which the shaking changing unit 101 (ST30) performs shaking change processing that adds shaking to one area set by the area setting unit 104 (ST42) (for example, area AR1) and reduces shaking in another area (for example, area AR2) (for example, process P3).
As a result, it is possible to generate an image in which some parts shake and others do not depending on the subjects in the image, achieving both visibility and a directing effect and realizing new image effects such as emphasizing a specific part by shaking it.

In the embodiment, an example has been described in which the shaking changing unit 101 (ST30) performs shaking change processing that adds shaking to one area set by the area setting unit 104 (ST42) (for example, area AR1) and adds shaking smaller than that of the one area to another area (for example, area AR2) (for example, processes P2 and P4).
As a result, while increasing the shaking of the image as a whole, the magnitude of the shaking can be set according to the subjects in the image, achieving both visibility and a directing effect, and a new image effect based on the difference in the degree of shaking can be realized.

In the embodiment, an example has been described in which the shaking changing unit 101 (ST30) performs shaking change processing that reduces shaking in one area set by the area setting unit 104 (ST42) (for example, area AR1) and reduces shaking in another area (for example, area AR2) by a larger reduction amount than in the one area (for example, process P6).
As a result, while reducing the shaking of the image as a whole, the magnitude of the shaking can be set according to the subjects in the image, achieving both visibility and a directing effect, and a new image effect based on the difference in the degree of shaking can be realized.

In the embodiment, an example has been described in which the shaking changing unit 101 (ST30) performs shaking change processing that adds or reduces shaking in only one of the two areas set by the area setting unit 104 (ST42) (for example, area AR1 and area AR2) (for example, processes P1 and P5).
This also makes it possible to set the magnitude of the shaking in the image according to the subjects, to achieve both visibility and a directing effect, and to realize a new image effect based on the difference in the degree of shaking. It is also possible to use the original shaking in the other area as a shaking effect.

In the embodiment, an example has been described in which the UI processing unit 103 (ST40), which detects operation information related to the shaking change, is provided, and the area setting unit 104 sets a plurality of areas based on the operation information detected by the UI processing unit 103 (see FIG. 6).
As a result, a plurality of areas are set according to the user's intention, and an image effect in which the degree of shaking differs in each area is realized. Therefore, it is possible to produce a shaking effect that reflects the user's image editing intention.

In the embodiment, an example has been described in which the area setting unit 104 sets a plurality of areas based on image analysis of the input moving image data (see FIG. 7).
For example, subject recognition, detection of a specific subject, composition determination, and determination of fixed objects such as the background are performed to separate areas that should shake from areas that should not, and the areas are set automatically. As a result, appropriate areas can be set according to the image content and the shaking changed for each area, so the user can easily obtain a shaking-effect image in which the shaking differs from area to area.

In the embodiment, an example has been given in which the shaking changing unit 101 attaches each frame of the input moving image data to be processed to the celestial sphere model MT and changes the shaking by rotating the model using the shaking information (adjusted quaternion eQD) corresponding to each frame.
By rotating the celestial sphere model MT to increase or decrease the shaking of each frame and thereby change the shaking as a whole, the shaking can be changed (removed or added) without causing trapezoidal distortion. Therefore, a high-quality image with little distortion can be obtained as an image with a shaking effect.

In the embodiment, an example is given in which the amount of shaking for each region is changed by moving the coordinate points on the celestial sphere model MT in each frame of the moving image data iPD (see FIGS. 9 to 14).
This makes it possible to partially increase the shaking after the shaking has been changed as a whole. That is, the shaking state can be changed for each area by enlargement/reduction through coordinate point movement while still using the shaking change produced by rotating the celestial sphere model MT. Further, because the shaking is expressed by enlargement and reduction, the continuity of the image is not broken at area boundaries, so image quality can be kept good without complicated seam processing.

In the embodiment, an example has been described in which the movement of the coordinate points is a process of moving the coordinate points of the pixels in the boundary area between one area and another area set by the area setting unit 104 (see FIGS. 9 to 14).
As a result, the position change (that is, shaking) of part of the image can be realized by the relatively simple process of changing pixel coordinates only in the boundary area, that is, enlarging or reducing the image of the boundary area by moving its coordinate points. In addition, since the areas other than the boundary area are not enlarged or reduced, each shaking can be expressed as a natural image.

It is also possible to change the shaking for each area by rotating celestial sphere models without moving coordinate points.
For example, in the moving image data iPD, the image of the area AR1 is attached to a first celestial sphere model and the image of the area AR2 to a second celestial sphere model, and each model is rotated according to the amount, period, and direction of shaking indicated by the parameter PRM2. The images projected from the two celestial sphere models are then combined. By doing this for each frame, output moving image data oPD with a different shaking state for each area can be obtained.
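The two-model alternative described above can be sketched as layer compositing. In this simplified illustration, a horizontal pixel shift stands in for "rotate the celestial sphere model, then project"; each area's pixels are carried on their own layer, moved independently, and AR1 is drawn over AR2. Pixels not covered by either layer receive a hypothetical `fill` value; how the embodiment handles such gaps is not stated, and this gap handling is an assumption. All names are illustrative.

```python
from typing import List, Optional

Pixel = Optional[int]
Image = List[List[Pixel]]


def extract(img: Image, mask: List[List[bool]], keep: bool) -> Image:
    """Keep pixels where mask == keep; mark the rest as empty (None)."""
    return [[p if m == keep else None for p, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]


def shift(img: Image, dx: int) -> Image:
    """Stand-in for 'rotate the model and project': shift right by dx pixels."""
    w = len(img[0])
    return [[row[x - dx] if 0 <= x - dx < w else None for x in range(w)] for row in img]


def combine(top: Image, bottom: Image, fill: int = 0) -> Image:
    """Overlay `top` on `bottom`; positions empty in both get `fill`."""
    return [[t if t is not None else (b if b is not None else fill)
             for t, b in zip(rt, rb)] for rt, rb in zip(top, bottom)]


def render_frame(img: Image, mask_ar1: List[List[bool]],
                 dx_ar1: int, dx_ar2: int, fill: int = 0) -> Image:
    """Move AR1 and AR2 independently, then composite AR1 over AR2."""
    layer1 = shift(extract(img, mask_ar1, True), dx_ar1)
    layer2 = shift(extract(img, mask_ar1, False), dx_ar2)
    return combine(layer1, layer2, fill)
```

For the one-row image [1, 2, 3, 4] with AR1 covering the first two pixels, shifting AR1 by one pixel and leaving AR2 still gives [0, 1, 2, 4]: a gap opens on the left and AR1 covers one AR2 pixel. Such gaps and overlaps are what the boundary-area coordinate movement avoids.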

The program of the embodiment is a program that causes, for example, a CPU, a DSP, or a device including these to execute the process described with reference to FIG.
That is, the program of the embodiment causes an information processing apparatus to execute area setting processing (ST42) that sets a plurality of areas in the image of the input moving image data, and shaking change processing (ST30) that makes the shaking state appearing in the output moving image data differ for each area set in the area setting processing.

With such a program, the above-mentioned image processing device TDx can be realized in a device such as a mobile terminal 2, a personal computer 3, or an image pickup device 1.

A program that realizes such an image processing device TDx can be recorded in advance in an HDD serving as a recording medium built into a device such as a computer device, or in a ROM in a microcomputer having a CPU.
Alternatively, it can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disc, CD-ROM (Compact Disc Read Only Memory), MO (Magneto Optical) disc, DVD (Digital Versatile Disc), Blu-ray Disc (registered trademark), magnetic disc, semiconductor memory, or memory card. Such a removable recording medium can be provided as so-called package software.
In addition to installing such a program from a removable recording medium on a personal computer or the like, it can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.

Further, such a program is suitable for providing the image processing device TDx of the embodiment widely. For example, by downloading the program to a personal computer, a portable information processing device, a mobile phone, a game device, a video device, a PDA (Personal Digital Assistant), or the like, the personal computer or the like can be made to function as the image processing device of the present disclosure.

Note that the effects described in this specification are merely examples and are not limiting; other effects may also be obtained.

The present technology can also adopt the following configurations.
(1)
An image processing device including:
an area setting unit that sets a plurality of areas in an image of input moving image data; and
a shaking changing unit that performs shaking change processing so that the shaking state appearing in output moving image data differs for each area set by the area setting unit.
(2)
The image processing apparatus according to (1) above, wherein the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and reduces shaking in another area set by the area setting unit.
(3)
The image processing apparatus according to (1) or (2) above, wherein the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and adds shaking smaller than that of the one area to another area set by the area setting unit.
(4)
The image processing apparatus according to any one of (1) to (3) above, wherein the shaking changing unit performs shaking change processing that reduces shaking in one area set by the area setting unit and reduces shaking in another area set by the area setting unit by a larger reduction amount than in the one area.
(5)
The image processing apparatus according to any one of (1) to (4) above, wherein the shaking changing unit performs shaking change processing that adds or reduces shaking in one of one area and another area set by the area setting unit.
(6)
The image processing device according to any one of (1) to (5) above, further including a user interface processing unit that detects operation information related to the shaking change, wherein the area setting unit sets the plurality of areas based on the operation information detected by the user interface processing unit.
(7)
The image processing apparatus according to any one of (1) to (6) above, wherein the area setting unit sets a plurality of areas based on image analysis of the input moving image data.
(8)
The image processing apparatus according to any one of (1) to (7) above, wherein the shaking changing unit changes the shaking of the entire image by pasting each frame of the input moving image data onto a celestial sphere model and rotating each frame using the shaking information corresponding to that frame.
(9)
The image processing apparatus according to (8) above, wherein the shaking changing unit changes the amount of shaking for each region by moving coordinate points on the celestial sphere model in each frame.
(10)
The image processing apparatus according to (9) above, wherein, in the shaking changing unit, the movement of the coordinate points is a process of moving the coordinate points of the pixels in the boundary area between one area and another area set by the area setting unit.
(11)
An image processing method in which an image processing device performs:
area setting processing that sets a plurality of areas in an image of input moving image data; and
shaking change processing that makes the shaking state appearing in output moving image data differ for each area set in the area setting processing.
(12)
A program that causes an information processing device to execute:
area setting processing that sets a plurality of areas in an image of input moving image data; and
shaking change processing that makes the shaking state appearing in output moving image data differ for each area set in the area setting processing.

1 Imaging device 2 Mobile terminal 3 Personal computer 4 Server 5 Recording medium 70 Information processing device,
71 CPU,
101 Shake change unit 102 Parameter setting unit 103 UI processing unit 104 Area setting unit

Claims

An image processing device including:
an area setting unit that sets a plurality of areas in an image of input moving image data; and
a shaking changing unit that performs shaking change processing so that the shaking state appearing in output moving image data differs for each area set by the area setting unit.
The image processing apparatus according to claim 1, wherein the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and reduces shaking in another area set by the area setting unit.
The image processing apparatus according to claim 1, wherein the shaking changing unit performs shaking change processing that adds shaking to one area set by the area setting unit and adds shaking smaller than that of the one area to another area set by the area setting unit.
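The image processing apparatus described above, wherein the shaking changing unit performs shaking change processing that reduces shaking in one area set by the area setting unit and reduces shaking in another area set by the area setting unit by a larger reduction amount than in the one area.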
The image processing apparatus according to claim 1, wherein the shaking changing unit performs shaking change processing that adds or reduces shaking in one of one area and another area set by the area setting unit.
The image processing device according to claim 1, further including a user interface processing unit that detects operation information related to the shaking change, wherein the area setting unit sets the plurality of areas based on the operation information detected by the user interface processing unit.
The image processing device according to claim 1, wherein the area setting unit sets a plurality of areas based on image analysis of the input moving image data.
The image processing apparatus described above, wherein the shaking changing unit changes the shaking of the entire image by attaching each frame of the input moving image data to a celestial sphere model and rotating each frame using the shaking information corresponding to that frame.
The image processing apparatus according to claim 8, wherein the shaking changing unit changes the amount of shaking for each region by moving coordinate points on the celestial sphere model in each frame.
The image processing apparatus according to claim 9, wherein, in the shaking changing unit, the movement of the coordinate points is a process of moving the coordinate points of the pixels in the boundary area between one area and another area set by the area setting unit.
An image processing method in which an image processing device performs:
area setting processing that sets a plurality of areas in an image of input moving image data; and
shaking change processing that makes the shaking state appearing in output moving image data differ for each area set in the area setting processing.
A program that causes an information processing device to execute:
area setting processing that sets a plurality of areas in an image of input moving image data; and
shaking change processing that makes the shaking state appearing in output moving image data differ for each area set in the area setting processing.