WO2015070558A1 - Video shooting control method and device - Google Patents
- Publication number
- WO2015070558A1 (international application PCT/CN2014/074831)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speaker
- video
- picture
- current
- camera
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
Definitions
- the present invention relates to the field of video imaging, and in particular to a method and apparatus for controlling video capture.
- the camera captures a panoramic view of all participants at a fixed size and at a fixed angle.
- the camera may be far away from the speaker.
- it cannot be determined from the captured picture who is speaking, and the speaker's facial expression cannot be seen, so valuable information from the conference is lost.
- embodiments of the present invention provide a method and apparatus for controlling video capture that capture the speaker's face while reducing the number of video switches, so that successive pictures join seamlessly and the output video is smoother.
- a method of controlling video capture including:
- the first camera and the second camera are sequentially controlled to alternately capture the video of the current speaker
- after the video of the current speaker is successfully acquired, the video of the current speaker is output.
- the outputting the video of the current speaker includes: outputting a video of the current speaker in full screen;
- outputting the video of the current speaker in full screen includes:
- before the video of the current speaker is successfully acquired, the video of the previous speaker of the current speaker is output in full screen;
- after the video of the current speaker is successfully acquired, the video of the current speaker is output in full screen.
- outputting the video of the current speaker includes: simultaneously outputting, in picture-in-picture form, the video of the current speaker and the video of the previous speaker of the current speaker;
- the picture-in-picture includes a first picture and a second, smaller picture contained within the first picture; the current speaker is output in the first picture, and the previous speaker of the current speaker is output in the second picture.
- the method further includes:
- simultaneously outputting the videos of the current speaker and the previous speaker of the current speaker in picture-in-picture form includes:
- outputting the video of the current speaker includes: simultaneously outputting, in dual-picture form, the video of the current speaker and the video of the previous speaker of the current speaker;
- the output picture includes two sub-pictures, neither of which contains the other; one sub-picture outputs the current speaker, and the other outputs the previous speaker of the current speaker.
- the method further includes:
- simultaneously outputting the videos of the current speaker and the previous speaker of the current speaker in dual-picture form includes:
- the third speaker is output in one sub-picture, and the second speaker is output in the other sub-picture.
- before the first camera is controlled to capture the video of the first speaker, the method further includes:
- in the initial state, controlling the first camera and the second camera to capture video of the entire venue and outputting the captured video.
- before the first camera is controlled to capture the video of the first speaker, the method further includes:
- initially setting the tracking flag of the first camera device to a first tracking flag;
- initially setting the tracking flag of the second camera device to a second tracking flag;
- controlling the first camera to capture the video of the first speaker when the first speaker speaks includes: when the first speaker speaks, controlling the first camera, which has the first tracking flag, to capture the video of the first speaker; after the video of the first speaker is successfully acquired, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag;
- controlling the second camera to capture the video of the second speaker includes: when the current speaker changes from the first speaker to the second speaker, controlling the second camera, which has the first tracking flag, to capture the video of the second speaker; after the video of the second speaker is successfully acquired, changing the tracking flag of the second camera from the first tracking flag to the second tracking flag and changing the tracking flag of the first camera from the second tracking flag to the first tracking flag.
- sequentially controlling the first camera device and the second camera device to alternately capture the video of the current speaker when subsequent speaker changes occur includes: at every speaker change, controlling the camera device that has the first tracking flag to capture the video of the current speaker and, after that video is successfully acquired, interchanging the tracking flags of the first camera device and the second camera device.
- controlling the camera to capture the video of the speaker includes:
- controlling the camera, using sound source localization technology, to capture the video of the speaker.
- controlling the camera to capture the video of the speaker by using the sound source localization technology comprises: controlling the camera to capture the video of the speaker by using the sound source localization technology and combining preset position or image recognition technology.
- controlling the second camera to capture the video of the second speaker includes:
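- determining whether the second speaker's position is within the output picture of the first speaker; if it is not, controlling the second camera to capture the video of the second speaker; if it is, further determining whether the second speaker's position is within a set area of the output picture of the first speaker;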
- an apparatus for controlling video capture including:
- a control unit, configured to control the first camera to capture a video of the first speaker when the first speaker speaks;
- the control unit is further configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera to capture a video of the second speaker, where the second speaker is the next speaker and is at a position different from that of the first speaker;
- the control unit is further configured to sequentially control the first camera device and the second camera device to alternately capture the video of the current speaker when subsequent speaker changes occur;
- a processing unit coupled to the control unit, configured to output a video of the current speaker after successfully acquiring the video of the current speaker.
- the processing unit is specifically configured to output the video of the current speaker in full screen;
- the processing unit is specifically configured to:
- the video of the previous speaker of the current speaker is output in full screen before the video of the current speaker is successfully acquired; after the video of the current speaker is successfully acquired, the video of the current speaker is output in full screen.
- the processing unit is further configured to:
- the picture-in-picture includes a first picture and a second, smaller picture contained within the first picture; the current speaker is displayed in the first picture, and the previous speaker of the current speaker is displayed in the second picture;
- the control unit is further configured to:
- the processing unit is specifically configured to:
- the processing unit is further configured to:
- the dual picture includes two sub-pictures, neither of which contains the other; one sub-picture displays the current speaker, and the other displays the previous speaker of the current speaker;
- the videos of the current speaker and the previous speaker of the current speaker are simultaneously output in dual-picture form.
- the control unit is further configured to:
- the processing unit is specifically configured to:
- the third speaker is output in one sub-picture, and the second speaker is output in the other sub-picture.
- the control unit is further configured to:
- in the initial state, before controlling the first camera to capture the video of the first speaker, control the first camera and the second camera to capture video of the entire venue;
- the processing unit is further configured to output the captured video.
- the control unit is further configured to:
- initially set the tracking flag of the first camera device to a first tracking flag;
- initially set the tracking flag of the second camera device to a second tracking flag;
- the control unit is specifically configured to: when the first speaker speaks, control the first camera, which has the first tracking flag, to capture the video of the first speaker; after the video of the first speaker is successfully acquired, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag, and change the tracking flag of the second camera device from the second tracking flag to the first tracking flag;
- the control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera, which has the first tracking flag, to capture the video of the second speaker; after the video of the second speaker is successfully acquired, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag, and change the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
- the control unit is specifically configured to: at every speaker change, control the camera that has the first tracking flag to capture the video of the current speaker and, after that video is successfully acquired, interchange the tracking flags of the first camera and the second camera.
- the control unit is specifically configured to:
- control the camera, using sound source localization technology, to capture the video of the speaker.
- the control unit is specifically configured to:
- control the camera to capture the speaker's video using sound source localization technology combined with preset positions or image recognition technology.
- the control unit is specifically configured to: determine whether the second speaker's position is within the output picture of the first speaker; if it is not, control the second camera to capture the video of the second speaker;
- if the second speaker's position is within the output picture of the first speaker, further determine whether it is within a set area of that output picture;
- with the method and device for controlling video shooting according to the present invention, when participants speak in turn, the first camera and the second camera are sequentially controlled to alternately capture the video of the current speaker, and the video of the current speaker is output. Thus, even if several people in the venue speak in quick alternation, the two camera devices can capture the facial images of all of these speakers.
- in addition, in the technical solution provided by the present invention, the video of the current speaker is output only after the camera device successfully acquires it, whereas the prior art must switch to the panorama of the conference site before the camera device successfully acquires the video of the next speaker. The invention therefore reduces the number of video switches, so that successive pictures join seamlessly and the output video is smoother.
- FIG. 1 is a flow chart of an embodiment of a method for controlling video capture according to the present invention
- FIG. 2A is a schematic diagram of capturing the speaker after a change, in the case where the new speaker's position is within the set area of the output picture of the previous speaker;
- FIG. 2B is a schematic diagram of capturing the speaker after a change, in the case where the new speaker's position is within the output picture of the previous speaker but not within its set area;
- FIG. 2C is a schematic diagram of capturing the speaker after a change, in the case where the new speaker's position is not within the output picture of the previous speaker;
- FIG. 3A is a flowchart of a specific embodiment of a method for controlling video capture according to the present invention.
- FIG. 3B is another flowchart of a specific embodiment of a method for controlling video capture according to the present invention
- FIG. 4 is a schematic diagram of a specific embodiment of a method for controlling video capture according to the present invention.
- FIG. 5A is a schematic diagram showing an effect of outputting a camera rotation/push-pull process when displaying in full screen
- FIG. 5B is a schematic diagram showing an effect of not outputting a camera rotation/push-pull process when displaying in full screen
- FIG. 6 is a flowchart of another embodiment of a method for controlling video shooting according to the present invention
- FIG. 7 shows another embodiment of a method for controlling video shooting according to the present invention
- FIG. 8A is a schematic diagram showing the effect of outputting the camera rotation/push-pull process when the picture-in-picture is displayed
- FIG. 8B is a schematic diagram showing the effect of not outputting the camera rotation/push-pull process when the picture-in-picture is displayed
- FIG. 10 is a schematic diagram of still another embodiment of a method for controlling video capture according to the present invention
- FIG. 11A is a schematic diagram showing the effect of outputting the camera rotation/push-pull process when displaying in dual-picture mode
- FIG. 11B is a schematic diagram showing the effect of not outputting the camera rotation/push-pull process when displaying in dual-picture mode
- FIG. 1 is a structural block diagram of an embodiment of the apparatus for controlling video shooting according to the present invention
- FIG. 1 is a schematic structural diagram of another embodiment of an apparatus for controlling video shooting according to the present invention
- FIG. 1B is a schematic structural diagram of still another embodiment of an apparatus for controlling video shooting according to the present invention
- FIG. 1 is a flow chart of an embodiment of a method of controlling video capture of the present invention.
- the method for controlling video capture provided by the embodiment of the present invention may be implemented by a device having a control processing function, which may be, for example, a camera, a video controller, a video terminal, or the like.
- the method for controlling video shooting provided by this embodiment of the present invention includes:
- the first camera device may be a camera module;
- the second camera device may also be a camera module;
- each of the first camera device and the second camera device may also comprise several camera modules; the use of multiple camera modules follows by analogy from the use of a single camera module.
- the first camera device and the second camera device may be connected and fixed together by a connecting device, or may be independent of each other.
- the camera device mentioned in the embodiment of the present invention may be a video camera or other terminal device having a camera function.
- the method for controlling video shooting provided by this embodiment of the present invention can be applied to a video conference, both to capture and output the video of a speaker at the local site and to transmit the video of the local site to a remote site, so that participants at the remote site can watch what is happening at the local site.
- in the initial state, the first camera device and the second camera device can both capture the panoramic view of the local site. If it is predetermined that the first camera will shoot the first speaker in the site, the video captured by the second camera is preferably output to the remote site. At this point no one is speaking, so the participants at the remote site only need to see the panorama of the local site.
- when the first speaker speaks, the first camera device can be immediately controlled to capture the video of the first speaker, while the second camera device is still controlled to capture the panorama of the local site.
- the position of the speaker can be determined using sound source localization. Sound source localization alone may not locate the speaker accurately because of noise interference and similar factors. Therefore, the positions where speakers are likely to sit in the local venue can also be preset; when sound source localization yields a speaker position, combining it with these preset possible positions (preset positions) makes the judgment more accurate. To locate the speaker still more accurately, sound source localization can be combined with image recognition.
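The combination of sound source localization with preset positions can be sketched as a nearest-preset snap. This is a minimal illustration under assumptions not in the source: positions are 2-D coordinates in metres, the presets in `PRESET_POSITIONS` are hypothetical, and `max_distance` is an arbitrary tolerance beyond which the raw estimate is kept.

```python
import math

# Hypothetical preset seat positions in the local venue, as (x, y) in metres.
PRESET_POSITIONS = {
    "seat-1": (0.0, 2.0),
    "seat-2": (1.5, 2.0),
    "seat-3": (3.0, 2.0),
}


def refine_with_presets(estimate, presets, max_distance=0.75):
    """Snap a sound-source estimate to the nearest preset position.

    Returns (name, position) of the closest preset if it lies within
    max_distance metres of the estimate; otherwise returns
    (None, estimate) so the raw localization result is used unchanged.
    """
    best_name, best_pos, best_dist = None, None, float("inf")
    for name, pos in presets.items():
        dist = math.dist(estimate, pos)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_pos, best_dist = name, pos, dist
    if best_dist <= max_distance:
        return best_name, best_pos
    return None, estimate
```

For example, a noisy estimate of `(1.4, 2.3)` snaps to `"seat-2"`, while `(1.4, 5.0)` is too far from every preset and is returned as-is.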
- a plurality of sound pickup microphones may form a microphone array. When the first speaker speaks, the array picks up the sound of the local venue and, after audio front-end preprocessing, passes it to the sound source locator.
- the sound source locator is a module with a sound source localization function in the device having the control processing function, and the microphone array consists of two or more sound pickup microphones distributed at different positions in the local venue. The sound source locator receives the sound picked up by the microphone array and performs localization processing on it to obtain the position information of the first speaker.
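The source does not say how the sound source locator computes a position. One common approach, assumed here purely for illustration, is time difference of arrival (TDOA): estimate the delay between two microphones by cross-correlation, then convert it into a bearing. The sketch below uses brute-force correlation on short sample lists; the speed-of-sound constant, the microphone spacing and sample rate in the example, and both function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly, in room-temperature air


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) that best aligns sig_b with sig_a.

    A positive lag means the sound reached microphone A first and
    microphone B `lag` samples later.
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of a[i] * b[i + lag].
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(n) if 0 <= i + lag < len(sig_b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(lag, sample_rate, mic_spacing):
    """Convert a delay into a direction of arrival.

    0 degrees is broadside (straight ahead of the microphone pair);
    positive angles point toward microphone A's side.
    """
    path_difference = SPEED_OF_SOUND * lag / sample_rate  # metres
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.degrees(math.asin(ratio))
```

A single microphone pair yields only a direction; with more microphones at different positions, as the text describes, several bearings can be intersected to estimate a position.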
- the controller may send a corresponding camera control command to the pan/tilt head according to this position information, and the pan/tilt head rotates the first camera to a suitable shooting angle to roughly obtain the video of the first speaker. The pan/tilt head receives and executes the camera control commands sent by the controller.
- then, by combining the position information from sound source localization with preset position information or image recognition (for example face recognition, face detection, or lip motion detection), more accurate position information of the first speaker is obtained. A new control command is generated and sent to the pan/tilt head to rotate/push-pull (zoom) the first camera and acquire a picture of the first speaker at the required size; for example, the first speaker's face may occupy 1/2, 1/3, or 1/4 of the entire picture.
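The fine adjustment step can be illustrated with a little geometry. The sketch below, under stated assumptions, turns a detected face box into a pan/tilt/zoom correction: it uses a linear pixel-to-angle approximation, reads "the face occupies 1/3 of the picture" as a fraction of the picture height, and assumes zoom scales the image linearly. `PtzCommand` and `frame_speaker` are hypothetical names, not a real pan/tilt protocol.

```python
from dataclasses import dataclass


@dataclass
class PtzCommand:
    pan_deg: float      # positive = turn right (assumed convention)
    tilt_deg: float     # positive = tilt up (assumed convention)
    zoom_factor: float  # >1 zooms in ("push"), <1 zooms out ("pull")


def frame_speaker(face_box, frame_w, frame_h, hfov_deg, vfov_deg,
                  target_fraction=1 / 3):
    """Compute a PTZ correction that centres a face and scales it.

    face_box is (x, y, w, h) in pixels with y growing downwards;
    hfov_deg/vfov_deg are the camera's current fields of view;
    target_fraction is the desired face height as a fraction of the
    frame height (e.g. 1/2, 1/3 or 1/4).
    """
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2
    # Linear pixel-to-angle approximation, adequate for small offsets.
    pan = (cx - frame_w / 2) / frame_w * hfov_deg
    tilt = (frame_h / 2 - cy) / frame_h * vfov_deg
    # Relative zoom needed, assuming zoom scales the image linearly.
    zoom = target_fraction / (h / frame_h)
    return PtzCommand(pan, tilt, zoom)
```

For a 90-pixel-tall face already centred in a 1920x1080 frame, asking for 1/4 of the height gives a zoom factor of about 3 and no pan or tilt.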
- this embodiment of the invention uses sound source localization combined with preset positions or image recognition to determine the speaker's position accurately, and then controls the camera device to shoot. It should be noted that, depending on actual conditions, the present invention may use sound source localization alone, sound source localization combined with preset positions, sound source localization combined with image recognition, or sound source localization combined with both preset positions and image recognition.
- the current speaker is the person currently speaking at the local venue; in steps S11 and S12, the current speaker is the first speaker and the second speaker, respectively. Note that between the moment the speaker changes and the moment the camera device successfully acquires the video of the new speaker, the current speaker is already the new speaker, even though that video has not yet been acquired.
- a change of speaker position can be detected by sound source localization, that is, the speaker changes from the first speaker to a second speaker at a different position. The second camera device is then controlled to rotate/push-pull to a suitable shooting angle and shot size. Then, as in step S11, the second camera device is further controlled, using preset positions or image recognition, to rotate/push-pull as needed and capture the second speaker at the required size.
- if the speaker moves only slightly, for example by one or two seats, the speaker's position can be considered unchanged. As long as the speaker is still within the set area of the shooting picture (for example, the central region covering 80% of the picture), there is no need to switch camera devices, and the camera does not need to rotate/push-pull to track the speaker.
- if the speaker changes to another speaker but the two speakers take turns at the same position, or are so close together that both lie within the set area of one camera's shooting picture, the speaker position can likewise be considered unchanged: there is no need to switch camera devices, and the camera device does not need to rotate/push-pull to track (see FIG. 2A, where the solid line indicates the shooting picture and the dashed line indicates the set area).
- whether for the same speaker or a different one, if the speaker position is within the output picture but not within the set area, there is no need to switch camera devices, but the camera can rotate/push-pull slightly so that the speaker after the change is in the middle of the picture (see FIG. 2B).
- a change of speaker, or a change of speaker position, means that the speaker's position has changed and the distance between the new position and the center of the shooting picture is large enough to require switching camera devices; this threshold can be set according to the actual scene (see FIG. 2C).
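The three cases of FIG. 2A, 2B, and 2C amount to a small decision function. The sketch below is hypothetical: it assumes the new speaker's position has already been projected into normalized coordinates of the current output picture, interprets "the central 80%" as 80% of both width and height, and uses "outside the picture" as the switching threshold, although the text notes that this threshold can be tuned per scene.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep current camera, no movement"          # FIG. 2A case
    NUDGE = "keep current camera, re-centre slightly"  # FIG. 2B case
    SWITCH = "switch to the other camera"              # FIG. 2C case


def classify_position(pos, set_area_fraction=0.8):
    """Decide how to react to a (possibly new) speaker position.

    pos is the speaker position projected into the current output
    picture, normalized so (0, 0) is the top-left and (1, 1) the
    bottom-right corner; None means the speaker is not in the picture.
    The set area is assumed to be a centred rectangle covering
    set_area_fraction of the width and of the height.
    """
    if pos is None:
        return Action.SWITCH
    x, y = pos
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return Action.SWITCH
    margin = (1.0 - set_area_fraction) / 2
    inside = margin <= x <= 1.0 - margin and margin <= y <= 1.0 - margin
    return Action.KEEP if inside else Action.NUDGE
```

A speaker at the centre `(0.5, 0.5)` yields `KEEP`, one near the left edge `(0.05, 0.5)` yields `NUDGE`, and one outside the picture yields `SWITCH`.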
- the first camera and the second camera are sequentially controlled to alternately capture the video of the current speaker.
- if the speaker subsequently changes from the second speaker to the next speaker, a third speaker, the first camera is controlled to capture the video of the third speaker. If a further change occurs, from the third speaker to the next speaker, a fourth speaker, the second camera is controlled to capture the video of the fourth speaker. This repeats, so that the first camera device and the second camera device alternately capture the video of the current speaker.
- for example, A speaks first, so the first camera device is controlled to shoot A; when the speaker changes from A to B, the second camera device is controlled to shoot B; when the speaker changes from B to C, the first camera device is again controlled to shoot C; when the speaker changes from C to D, the second camera device is again controlled to shoot D, and so on.
- in the prior art, the picture taken by the camera that shoots the speaker includes several speakers; if these speakers are far away, their expressions cannot be observed in the captured picture, and valuable information from the meeting is lost.
- the first camera device and the second camera device can both track the speaker, wherein when one camera device tracks the current speaker, the other camera device tracks the changed speaker.
- the first camera device and the second camera device cooperate seamlessly: while the first camera device captures the current speaker, the second camera device is used to capture the next speaker of the current speaker; while the second camera device captures the current speaker, the first camera device is used to capture the next speaker.
- for example, if A and B speak alternately, the first camera can keep tracking A and the second camera can keep tracking B. Because each camera has already adjusted its focal length for its own speaker, neither needs to rotate/push-pull again when the speakers alternate.
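The alternation, and the saving when two speakers take turns, can be simulated in a few lines. The sketch assumes each camera stays framed on the last speaker it shot; `schedule` and the `needs_reaim` flag are illustrative names, not part of the source.

```python
def schedule(speakers):
    """Assign each speaking turn to one of two cameras, alternating.

    Returns a list of (speaker, camera, needs_reaim) tuples. needs_reaim
    is False when the chosen camera is still framed on that speaker from
    an earlier turn, so it does not have to rotate or zoom again.
    """
    framed = {1: None, 2: None}  # which speaker each camera is framed on
    camera = 2                   # so that the first turn goes to camera 1
    previous = None
    plan = []
    for speaker in speakers:
        if speaker == previous:
            continue             # no speaker change, nothing to do
        camera = 1 if camera == 2 else 2
        needs_reaim = framed[camera] != speaker
        framed[camera] = speaker
        plan.append((speaker, camera, needs_reaim))
        previous = speaker
    return plan
```

With `"ABCD"` every turn needs re-aiming, matching the A/B/C/D example above; with `"ABAB"` only the first two turns do, which is the focal-length saving described for two alternating speakers.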
- the two cameras can alternately capture the speaker's face picture, retain more valuable information of the conference, and the efficiency of video tracking is also improved.
- the present invention does not limit how the video captured at the local site is transmitted to the remote site (for example, how it is encoded and decoded).
- the video of the current speaker can be sent to a video signal processor, which encodes and compresses it and then transmits the resulting code stream to the remote site over the network. After receiving the code stream, the remote site decodes it and performs related processing to obtain the decoded video.
- the current speaker's video can then be displayed on the display at the remote site in different ways, so that participants at the remote site can view the local site on the display.
- the method for controlling video capture provided by the embodiment of the present invention may further include: outputting a video of a previous speaker of the current speaker. That is, before successfully acquiring the video of the current speaker, the video of the previous speaker of the current speaker is output; after the video of the current speaker is successfully acquired, the video of the current speaker is output.
- when the picture is output in full screen, the output is not only continuous but also of high image quality, because the picture is never shown while the camera device rotates/push-pulls to acquire the current speaker's video, which would make the output appear blurry or shaky.
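The full-screen rule, keeping the previous speaker on screen until the new speaker's video is acquired, can be modelled as a tiny state machine. This is a hypothetical sketch: the event names and the string `"panorama"` for the initial whole-venue view are assumptions.

```python
def full_screen_output(events):
    """Return the source shown full screen after each event.

    events is a sequence of ("change", speaker) or ("acquired", speaker)
    tuples. The current speaker changes immediately on "change", but the
    screen only switches when that speaker's video is reported acquired,
    so the camera's rotation/push-pull in between is never shown.
    """
    on_screen = "panorama"  # initial state: whole-venue view
    current = None          # current speaker (may not be on screen yet)
    outputs = []
    for kind, speaker in events:
        if kind == "change":
            current = speaker
        elif kind == "acquired" and speaker == current:
            on_screen = speaker
        outputs.append(on_screen)
    return outputs
```

The guard `speaker == current` also ignores a late acquisition for a speaker who has already been superseded.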
- when the picture of the local site is output, it may be output not only in full screen but also as picture-in-picture, dual-picture, or similar forms.
- when outputting in picture-in-picture form, after the video of the current speaker is successfully acquired, the current speaker may be output in the large picture (first picture) and the previous speaker of the current speaker in the small picture (second picture).
- when outputting in dual-picture form, after the video of the current speaker is successfully acquired, the current speaker may be output in one of the two sub-pictures, neither of which contains the other, and the previous speaker of the current speaker in the other sub-picture. Specific implementations of these output forms are described separately in the following specific embodiments.
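To make the picture-in-picture and dual-picture forms concrete, the sketch below computes output rectangles for each form. The small-picture scale, its bottom-right placement, and the left/right split are assumptions chosen for illustration; the source only requires that the second picture lie inside the first (picture-in-picture) or that the two sub-pictures not contain each other (dual-picture).

```python
def layout(mode, width, height, pip_scale=0.25):
    """Return {role: (x, y, w, h)} rectangles for one output frame.

    role is "current" or "previous". pip_scale (an assumed value) is the
    small picture's size relative to the full frame.
    """
    if mode == "full":
        return {"current": (0, 0, width, height)}
    if mode == "pip":
        w, h = int(width * pip_scale), int(height * pip_scale)
        return {
            "current": (0, 0, width, height),           # first (large) picture
            "previous": (width - w, height - h, w, h),  # second picture, inside the first
        }
    if mode == "dual":
        half = width // 2
        return {
            "current": (0, 0, half, height),
            "previous": (half, 0, width - half, height),  # side by side, no overlap
        }
    raise ValueError(f"unknown mode: {mode}")
```

In a 1920x1080 frame, the picture-in-picture form places the previous speaker in a 480x270 window in the bottom-right corner of the current speaker's picture.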
- before shooting starts, a tracking flag may be set for each of the two camera devices; for example, the initial tracking flags of the first camera device and the second camera device are set to a first tracking flag and a second tracking flag, respectively. The tracking flags may be represented by numbers such as 0 or 1.
- the camera device whose tracking flag is the first tracking flag may be dedicated to capturing the video of the current speaker, and the camera device whose tracking flag is the second tracking flag is dedicated to capturing the video of the next speaker (or the previous speaker) of the current speaker.
- each time the video of the current speaker is successfully acquired, the tracking flags of the first camera device and the second camera device are interchanged.
- in step S11, controlling the first camera to capture the video of the first speaker when the first speaker speaks may include: when the first speaker speaks, controlling the first camera, which has the first tracking flag, to capture the video of the first speaker; after the video of the first speaker is successfully acquired, changing the tracking flag of the first camera from the first tracking flag to the second tracking flag and the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
- in step S12, controlling the second camera to capture the video of the second speaker when the current speaker changes from the first speaker to the second speaker may include: controlling the second camera, which has the first tracking flag, to capture the video of the second speaker; after the video of the second speaker is successfully acquired, changing the tracking flag of the second camera from the first tracking flag to the second tracking flag and the tracking flag of the first camera from the second tracking flag to the first tracking flag.
- Step S1 3 when the speaker change is subsequently performed, sequentially controlling the first camera and the second camera to alternately capture the video of the current speaker may include: each time a speaker change occurs, the control has the The camera of the first tracking mark captures the video of the current speaker, and after successfully acquiring the video of the current speaker, the tracking marks of the first camera and the second camera are interchanged. In this way, it is possible to ensure that the two cameras cooperate with each other, seamlessly dock, and alternately capture the video of the current speaker.
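Steps S11 to S13 amount to a small state machine in which the two cameras trade a pair of tracking flags. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: the names `DualCameraTracker`, `on_speaker_change` and `on_acquired` are hypothetical, flag 0 stands for the first tracking flag and flag 1 for the second, and "successful acquisition" is reduced to a single callback.

```python
from dataclasses import dataclass
from typing import List, Optional

FIRST_FLAG = 0   # camera that will be dispatched to the next new speaker
SECOND_FLAG = 1  # camera currently framed on the current speaker


@dataclass
class Camera:
    name: str
    flag: int
    target: Optional[str] = None  # speaker this camera is (or is being) framed on


class DualCameraTracker:
    """Hypothetical controller that alternates two cameras by swapping tracking flags."""

    def __init__(self) -> None:
        self.cameras: List[Camera] = [Camera("first", FIRST_FLAG), Camera("second", SECOND_FLAG)]

    def camera_with(self, flag: int) -> Camera:
        return next(c for c in self.cameras if c.flag == flag)

    def on_speaker_change(self, speaker: str) -> Camera:
        # The camera holding the first tracking flag is sent to the new speaker;
        # the other camera keeps its current shot in the meantime.
        cam = self.camera_with(FIRST_FLAG)
        cam.target = speaker
        return cam

    def on_acquired(self, cam: Camera) -> None:
        # Once the new speaker's video is acquired, the two flags are interchanged.
        other = next(c for c in self.cameras if c is not cam)
        cam.flag, other.flag = SECOND_FLAG, FIRST_FLAG


if __name__ == "__main__":
    tracker = DualCameraTracker()
    for speaker in ["A", "B", "C", "D"]:
        cam = tracker.on_speaker_change(speaker)
        tracker.on_acquired(cam)
        print(speaker, "->", cam.name)
```

With speakers A, B, C and D the dispatched camera alternates first, second, first, second, which is the alternation described in step S13; the camera not being dispatched always still holds the previous speaker.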
- both the first camera device and the second camera device can track the speaker. When the first speaker speaks, the first camera is controlled to capture the first speaker, while the second camera stands by, ready to track the next speaker after the first speaker. When the current speaker changes from the first speaker to the second speaker (i.e., the next speaker, whose position differs from that of the first speaker), the second camera is controlled to capture the second speaker; while the second camera is doing so, the first camera keeps capturing the first speaker and then transitions to a state ready to track the next speaker, whose position differs from that of the second speaker. In this way, the first camera device and the second camera device cooperate and hand over seamlessly. When the speaker changes, a camera needs a certain amount of time to successfully acquire the video of the new speaker.
- the prior art uses one camera device exclusively to capture the panoramic view of the local venue and the other exclusively to track and shoot the speaker.
- therefore, before the camera device dedicated to tracking the speaker successfully acquires the current speaker's video, the screen must be switched to the panorama of the venue, and it is switched to the new speaker only once that camera has acquired the current speaker's video; this makes the video not smooth.
- in the present invention, by contrast, the video of the current speaker is output only after the camera device successfully acquires it; until then, the video of the previous speaker of the current speaker continues to be output.
- the present invention therefore reduces the number of video switches, so that successive pictures connect closely and the output video is smoother.
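To make the switch-count comparison concrete, the hypothetical Python sketch below models the output as a list of sources and counts how often it changes. It is a simplified model, not taken from the source: the function names are invented, the string "panorama" stands for the site panorama, and camera timing is ignored.

```python
from typing import List


def output_timeline(speakers: List[str], hold_previous: bool) -> List[str]:
    """Sequence of sources shown on the output for successive speaker turns.

    hold_previous=True: keep showing the previous speaker until the new speaker's
    video is acquired (the scheme described above).
    hold_previous=False: fall back to the site panorama while the tracking camera
    moves (the prior-art behaviour described above).
    """
    timeline = ["panorama"]
    for speaker in speakers:
        if not hold_previous and timeline[-1] != "panorama":
            timeline.append("panorama")  # prior art: panorama while the camera moves
        timeline.append(speaker)  # shown once the new speaker has been acquired
    return timeline


def switch_count(timeline: List[str]) -> int:
    """Number of times the output changes from one source to another."""
    return sum(1 for a, b in zip(timeline, timeline[1:]) if a != b)


turns = ["A", "B", "C"]
print(switch_count(output_timeline(turns, hold_previous=True)))   # 3
print(switch_count(output_timeline(turns, hold_previous=False)))  # 5
```

In this simplified model, n speaker turns cost n switches when the previous speaker is held, versus 2n - 1 with the panorama fallback.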
- in the prior art, the captured picture contains several speakers, and if these speakers are far apart, their expressions cannot be observed in the captured picture.
- in the present invention, owing to the cooperation of the first camera device and the second camera device, even when several speakers at the local venue speak alternately, the two camera devices can alternately capture the face of each speaker.
- a tracking flag may be assigned to each camera device, and the video captured by the camera device with a specified tracking flag may be output.
- for example, the initial tracking flag of the first camera device may be set to 0 (i.e., the first tracking flag) and the initial tracking flag of the second camera device to 1 (i.e., the second tracking flag), where the camera with tracking flag 0 is used to shoot the video of the current speaker and the camera with tracking flag 1 is used to capture the video of the next speaker of the current speaker.
- this convention is used in the description below for convenience.
- alternatively, the tracking flag of the first camera may be set to 1 and that of the second camera to 0; other ways of setting the tracking flags are also possible and are not limited by the present invention.
- FIG. 3A is a flow chart of a specific embodiment of a method of controlling video capture in accordance with the present invention.
- Figure 3B is another flow diagram of one embodiment of a method of controlling video capture in accordance with the present invention.
- the method for controlling video shooting includes:
- the two cameras (the first camera and the second camera) are turned on; at this point, i.e., at the beginning of the conference, no one at the local site has spoken yet.
- at this time, the two cameras can be controlled to capture the panorama of the local site.
- the shooting angle and picture size of the cameras can be set by the user.
- preferably, the setting is one that covers all participants and the main conference scene.
- the microphone array picks up the sound of the local conference site and sends it to the sound source locator for localization.
- the sound source locator generates speaker position information according to a sound source localization technique.
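The text does not say which sound source localization technique is used. As one hedged illustration, a far-field bearing can be estimated from the time difference of arrival (TDOA) between two microphones by searching for the cross-correlation peak. Everything here is an assumption for illustration: the function names, the two-microphone geometry, and the speed of sound of 343 m/s; practical arrays typically use more microphones and more robust estimators such as GCC-PHAT.

```python
import math
import random
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def best_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Lag in samples by which `right` trails `left`, chosen to maximise cross-correlation."""
    def corr(lag: int) -> float:
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=corr)


def bearing_deg(left: Sequence[float], right: Sequence[float],
                fs: float, mic_spacing: float) -> float:
    """Far-field bearing (0 = broadside; positive = toward the left microphone)."""
    # Physically possible delays are bounded by spacing / speed of sound.
    max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * fs)
    tau = best_lag(left, right, max_lag) / fs
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    rng = random.Random(0)
    left = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    right = [0.0] * 5 + left[:-5]  # right microphone hears the source 5 samples later
    print(round(bearing_deg(left, right, fs=16000, mic_spacing=0.2), 1))
```

The resulting bearing is the kind of "speaker position information" a controller could turn into pan/tilt commands; a two-microphone pair resolves only one angle.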
- the controller controls the camera with tracking flag 0 to capture an appropriately sized video of the first speaker according to the position information. After the camera with tracking flag 0 (i.e., the first camera) has captured an appropriately sized video of the first speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e., the second camera) is changed from 1 to 0.
- the controller may control the camera with tracking flag 0 (i.e., the second camera) to capture the video of the second speaker, in the same way as in step S32.
- after the camera with tracking flag 0 has captured an appropriately sized video of the second speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- the first camera and the second camera are sequentially controlled to alternately capture the video of the current speaker.
- the tracking flag of the second camera becomes 1, and the tracking flag of the first camera becomes 0.
- if the speaker changes from the second speaker to the third speaker, the camera whose tracking flag is 0 (i.e., the first camera) is controlled to shoot the third speaker; after it successfully acquires the video of the third speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- similarly, when the speaker changes from the third speaker to the fourth speaker, the camera whose tracking flag is 0 (i.e., the second camera) is controlled to shoot the fourth speaker; after it succeeds, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- in general, each time the speaker changes, the camera with tracking flag 0 (either the first camera or the second camera) is controlled to track the new speaker; after that camera successfully acquires the speaker's video, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- the video of the current speaker is output in full screen.
- because the tracking flag of the camera with tracking flag 0 is changed from 0 to 1 and that of the other camera from 1 to 0 after each successful acquisition, the video captured after the change by the camera whose tracking flag is 1 is the video of the current speaker.
- outputting the current speaker's video in full screen means that the output video comes from a single camera. The full-screen display may show only one speaker or several speakers; in the latter case, the speakers are relatively close to one another, so that the body language or facial information of each speaker can still be observed in the captured video.
- in step S12, if several speakers are too far apart for each of them to be observed in the video captured by the same camera, the speaker position may be regarded as having changed, and the video of that speaker may be taken by the other camera.
- participants at the remote site can clearly observe a close-up of the current speaker; since the close-up may contain important meeting information, such information is retained as much as possible.
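The "too far apart for one shot" test in step S12 could be approximated by comparing the angular spread of the speakers with the camera's horizontal field of view. This is a hypothetical sketch: bearings in degrees, `usable=0.8` (keeping a margin at the frame edges), and the function names are all assumptions, not values from the source.

```python
from typing import Sequence


def fits_in_one_shot(bearings_deg: Sequence[float], fov_deg: float,
                     usable: float = 0.8) -> bool:
    """True if all speakers fit inside the usable part of one camera's horizontal field of view."""
    return max(bearings_deg) - min(bearings_deg) <= fov_deg * usable


def treat_as_position_change(shown_deg: Sequence[float], new_deg: float,
                             fov_deg: float) -> bool:
    """A new speaker who cannot share the current shot is handed to the other camera."""
    return not fits_in_one_shot([*shown_deg, new_deg], fov_deg)


print(treat_as_position_change([10.0], 25.0, fov_deg=30.0))  # False: 15 deg spread fits in 24 deg
print(treat_as_position_change([10.0], 50.0, fov_deg=30.0))  # True: 40 deg spread does not
```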
- the first picture shows that, at the beginning of the meeting, the display shows the panoramic view of the local site in full screen; the second picture shows that, after the first speaker appears, the display shows the first speaker's video in full screen; the third picture shows that, after the speaker changes from the first speaker to the second speaker, the display shows the second speaker in full screen.
- step S36 is performed before step S35.
- from the moment the speaker changes until the camera successfully acquires the current speaker's video, the camera is rotating and zooming (push/pull), which produces a blurred or unstable picture. In the above process, however, the video of the previous speaker of the current speaker is output during this interval, so the blurred or unstable picture is not output.
- in FIG. 5A, the three figures are referred to, from left to right, as the first picture, the second picture, and the third picture.
- the speaker in the third picture is the next speaker after the speaker in the first picture. If, from the moment the speaker changes until the camera successfully acquires an appropriately sized video of the third picture's speaker, the pictures taken while the camera is rotating or zooming (push/pull) were output directly, a blurred or unstable picture such as the second picture would result.
- the specific embodiment of the present invention instead keeps outputting the video of the first picture's speaker during this interval, and
- outputs the video of the third picture's speaker only after an appropriately sized video of that speaker has been successfully acquired, so that the blurred or unstable picture is avoided (refer to FIG. 5B).
- the output picture is not switched, and the panoramic view of the local site is still output;
- the output picture is a full-screen display of the current speaker;
- the camera does not switch and does not track.
- the output picture is a full-screen display with the current speaker in the central area; if the speaker moves but has not left the current output picture, although he or she may have left or be about to leave the set central area of the picture, the camera does not switch but may track appropriately to keep the speaker in the central area; if the speaker's movement takes the speaker outside the current output picture, the camera is switched to track the speaker;
- the speaker at the local venue changes once, to the person next to the current speaker or to someone else;
- the camera does not switch and does not perform tracking.
- the output picture is a full-screen display with the new speaker in the central area; if the new speaker's position has not left the output picture used before the change, although it may have left or be about to leave the set central area of that picture, the camera does not switch but may track appropriately to keep the new speaker in the central area, so the output picture is a full-screen display with the new speaker in the central area; if the new speaker's position has already left the output picture used before the change, the camera is switched and tracks the new speaker;
- an interjection is usually very short, so the output picture is not switched; (7) several people at the local venue hold a discussion and speak alternately, that is, the speaker position changes multiple times.
- the cameras alternately track the speaker after each position change, and the output picture is a full-screen display of the new speaker.
- each time the speaker position changes, the camera with tracking flag 0 is controlled to track the speaker at the new position; after that camera successfully acquires an appropriate video of the speaker, its tracking flag is changed from 0 to 1 and the tracking flag of the other camera is changed from 1 to 0. This ensures that at any moment one camera is shooting the current speaker while the other camera is available to capture the next speaker, so the two cameras cooperate and hand over seamlessly. When the speaker position changes, a camera needs a certain amount of time to acquire the video of the new speaker.
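The hold/track/switch rules in the cases above can be condensed into a single decision on where the speaker lies relative to the current output picture and its set central area. A minimal sketch, assuming axis-aligned rectangles in output-picture coordinates and a hypothetical `central_fraction` of 0.5; the names `Action`, `Rect` and `decide` are invented, and the "may leave the central area" condition is simplified here to "has left the central area".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    HOLD = "no switch, no tracking"
    TRACK = "same camera re-centres the speaker"
    SWITCH = "other camera takes over"


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def central(self, fraction: float) -> "Rect":
        # Central area covering `fraction` of the width and height, same centre.
        dx = (self.x1 - self.x0) * (1 - fraction) / 2
        dy = (self.y1 - self.y0) * (1 - fraction) / 2
        return Rect(self.x0 + dx, self.y0 + dy, self.x1 - dx, self.y1 - dy)


def decide(output_picture: Rect, speaker_xy: Tuple[float, float],
           central_fraction: float = 0.5) -> Action:
    x, y = speaker_xy
    if output_picture.central(central_fraction).contains(x, y):
        return Action.HOLD    # speaker still well framed
    if output_picture.contains(x, y):
        return Action.TRACK   # inside the picture but off-centre
    return Action.SWITCH      # outside the current output picture


frame = Rect(0, 0, 100, 100)
print(decide(frame, (50, 50)), decide(frame, (80, 50)), decide(frame, (120, 50)))
```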
- during this period, the video of the previous speaker of the current speaker continues to be output, and the video of the current speaker is output only after the camera successfully acquires it. By contrast, the prior art must switch to the panoramic view of the site during this period and switches the screen to the new speaker only when the camera has acquired the new speaker's video.
- the present invention can therefore reduce the number of video switches, so that successive pictures connect closely and the output video is smoother.
- in the prior art, the video captured by a camera includes multiple speakers, and if these speakers are far apart, their expressions cannot be observed in the captured picture.
- the two cameras can alternately photograph the face of the speaker.
- participants at the remote site can more clearly observe the facial features of the current speaker; these facial features may contain important meeting information, so more valuable meeting information is preserved.
- FIG. 6 is a flow chart of another embodiment of a method of controlling video capture of the present invention.
- the method for controlling video shooting provided by the specific embodiment of the present invention includes:
- the two cameras are turned on; that is, at the beginning of the meeting, no one at the local venue has spoken yet.
- to transmit the layout of the local site to the remote site, the two cameras can be controlled to capture the panorama of the local site.
- the shooting angle and picture size of the cameras can be set by the user.
- preferably, the setting includes all participants and the main conference scene.
- the position information of the first speaker is obtained by using sound source localization technology.
- the exact position of the first speaker is then determined by combining this position information with the preset positions at which speakers speak in the local venue. Specifically, the preset position closest to the position obtained by sound source localization may be selected from a plurality of preset positions as the exact position.
- the controller controls the camera with tracking flag 0 to capture the video of the first speaker according to the exact position of the first speaker. After that camera has captured an appropriate video of the first speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- when the current speaker changes from the first speaker to the second speaker, the second camera is controlled to capture a video of the second speaker, where the second speaker is the next speaker, at a position different from that of the first speaker.
- as in step S62, the controller may control the camera with tracking flag 0 (i.e., the second camera) to capture the video of the second speaker.
- after the camera with tracking flag 0 successfully captures an appropriate video of the second speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
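The preset-snapping step is a nearest-neighbour lookup. A sketch assuming presets are stored as hypothetical named (x, y) seat coordinates in metres and that Euclidean distance is the closeness measure (the source says only "closest"); `snap_to_preset` is an invented name.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def snap_to_preset(estimate: Point, presets: Dict[str, Point]) -> str:
    """Name of the preset speaking position closest (Euclidean) to the acoustic estimate."""
    return min(presets, key=lambda name: math.dist(presets[name], estimate))


seats = {"seat_1": (0.0, 2.0), "seat_2": (1.0, 2.0), "seat_3": (2.0, 2.0)}
print(snap_to_preset((1.2, 2.3), seats))  # seat_2
```

Snapping to a preset trades a little flexibility for stable framing: the camera can move to a stored pan/tilt/zoom setting rather than chasing a noisy acoustic estimate.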
- the first camera and the second camera are sequentially controlled to alternately capture the video of the current speaker.
- after the second camera successfully captures the video of the second speaker, the tracking flag of the second camera becomes 1 and the tracking flag of the first camera becomes 0. If the speaker changes from the second speaker to the third speaker, the camera with tracking flag 0 (i.e., the first camera) is controlled to shoot the third speaker; after it successfully acquires an appropriate video of the third speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e., the second camera) is changed from 1 to 0. Similarly, when the speaker changes from the third speaker to the fourth speaker (i.e., the next speaker after the third speaker), the camera with tracking flag 0 (i.e., the second camera) is controlled to shoot the fourth speaker.
- after it successfully acquires the video of the fourth speaker, the tracking flag of that camera is changed from 0 to 1, and the tracking flag of the other camera (i.e., the first camera) is changed from 1 to 0.
- the alternate shooting is performed in the same manner.
- the picture-in-picture includes a first picture and a second picture that is contained in, and smaller than, the first picture; the current speaker is output in the first picture, and the previous speaker of the current speaker is output in the second picture.
- since the tracking flag of the camera with tracking flag 0 is changed from 0 to 1 after each successful acquisition, the camera with tracking flag 1 captures the video of the current speaker and the camera with tracking flag 0 captures the video of the previous speaker of the current speaker.
- outputting the videos of the current speaker and of the previous speaker of the current speaker simultaneously in picture-in-picture form means outputting the current speaker in the first picture and outputting the previous speaker of the current speaker in the second picture, which is contained in and smaller than the first picture.
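One way to compute the picture-in-picture geometry, with the second picture as an inset in the lower right corner of the first picture (the position used in this embodiment's example pictures). The `Box` type, the scale of a quarter of each dimension and the 16-pixel margin are assumptions chosen for illustration.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def pip_layout(width: int, height: int, scale: float = 0.25,
               margin: int = 16) -> Tuple[Box, Box]:
    """First picture fills the output; second picture is a smaller inset in its lower right corner."""
    first = Box(0, 0, width, height)
    w, h = int(width * scale), int(height * scale)
    second = Box(width - w - margin, height - h - margin, w, h)
    return first, second


first, second = pip_layout(1920, 1080)
print(first)   # Box(x=0, y=0, w=1920, h=1080)
print(second)  # Box(x=1424, y=794, w=480, h=270)
```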
- the first picture shows the panorama of the local site in the form of picture-in-picture at the beginning of the meeting;
- the second picture shows that, after the first speaker appears, the first speaker is output in the large picture (i.e., the first picture) and the panoramic view of the local site is output in the lower right corner (i.e., the second picture);
- the third picture shows that, after the speaker changes from the first speaker to the second speaker, the second speaker is output in the large picture and the first speaker in the lower right corner.
- step S66 is performed before step S65.
- from the moment the speaker changes until the camera acquires the current speaker's video, the camera is rotating and zooming (push/pull), which produces a blurred or unstable picture.
- to avoid this, the two speakers preceding the current speaker may be output in the first picture and the second picture, respectively, so that the blurred or unstable picture is not output.
- for ease of understanding, the following description refers to Figs. 8A and 8B.
- in Fig. 8A, the three figures are referred to, from left to right, as the first picture, the second picture, and the third picture.
- the speaker in the lower right corner of the first picture (i.e., its second picture) is the previous speaker of the speaker in the large picture of the first picture (i.e., its first picture);
- the speaker in the large picture of the first picture is the previous speaker of the speaker in the large picture of the third picture;
- that is, the speaker changes from the large-picture speaker of the first picture to the large-picture speaker of the third picture.
- during the transition, the specific embodiment of the present invention outputs the live picture of the first picture's large-picture speaker (the large picture of the second picture) together with a frozen picture of that speaker's previous speaker (the lower right picture of the second picture), which avoids outputting the blurred or unstable picture.
- the output mode shown in the second picture of Fig. 8A may also be used from the moment the speaker changes until the camera successfully acquires the video of the current speaker.
- the current speaker is output in the first picture and the previous speaker of the current speaker in the second picture, and the picture combination mode is unchanged;
- the first picture outputs the presenter, and the second picture either is not switched or outputs the interjector; preferably, the second picture is not switched;
- the camera does not switch and does not track, the first picture outputs the live picture of the current speaker, the second picture is unchanged, and the output picture combination mode is unchanged; if the speaker moves but has not left the first picture currently output, although he or she may have left or be about to leave the set central area of the first picture, the camera does not switch but may track appropriately to keep the speaker in the set central area of the first picture, with the second picture and the output picture combination mode unchanged; if the speaker's movement takes the speaker outside the first picture currently output, the camera is switched and tracks the speaker, and after tracking succeeds the speaker is output in the first picture while the first picture shown before the camera switch is moved to the second picture;
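Both the steady-state picture-in-picture output and the transitional output reduce to choosing the two most recent "settled" sources. A hedged sketch: the function name `pip_sources`, the `history` list (speakers in speaking order), the `acquired` flag, and the string "panorama" for the site panorama are assumptions; in the transitional case, the second picture would in practice show a frozen last frame rather than a live feed.

```python
from typing import Sequence, Tuple

PANORAMA = "panorama"  # placeholder source for the site panorama


def pip_sources(history: Sequence[str], acquired: bool) -> Tuple[str, str]:
    """(first picture, second picture) sources for picture-in-picture output.

    history: speakers in speaking order; the last entry is the current speaker.
    acquired: whether the camera sent to the current speaker has acquired it.
    Until then, the current speaker is left out, so the previous speaker stays in
    the first picture and the speaker before that in the second picture.
    """
    settled = list(history) if acquired or not history else list(history[:-1])
    shown = [PANORAMA, PANORAMA] + settled
    return shown[-1], shown[-2]


print(pip_sources([], acquired=True))                    # ('panorama', 'panorama')
print(pip_sources(["S1"], acquired=True))                # ('S1', 'panorama')
print(pip_sources(["S1", "S2"], acquired=True))          # ('S2', 'S1')
print(pip_sources(["S1", "S2", "S3"], acquired=False))   # ('S2', 'S1')
```

The four calls reproduce the start of the meeting, the first speaker, the change to the second speaker, and the transitional output while the camera is still moving to a third speaker.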
- the speaker at the local venue changes once, to the person next to the current speaker or to someone else;
- the camera does not switch and does not perform tracking.
- the first picture outputs a picture in which the new speaker is in the central area, and the second picture is unchanged; if the new speaker's position has not left the first picture used before the change, although it may have left or be about to leave the set central area of that picture, the camera does not switch but may track appropriately so that the new speaker is in the central area of the first picture, with the second picture unchanged; if the new speaker's position has already left the first picture used before the change, the camera is switched and tracks the new speaker, after which the first picture outputs the new speaker and the second picture outputs the speaker before the change;
- an interjection is usually very short, so the combination of the output pictures is unchanged; (7) several people at the local venue hold a discussion and speak alternately, that is, the speaker position changes multiple times.
- the cameras alternately track the speaker after each position change, and the combination of the output pictures changes accordingly: after each change, the current speaker is output in the first picture and the previous speaker of the current speaker in the second picture.
- each time the speaker position changes, the camera with tracking flag 0 is controlled to track the speaker at the new position; after that camera successfully acquires an appropriately sized video of the speaker, its tracking flag is changed from 0 to 1 and the tracking flag of the other camera is changed from 1 to 0. This ensures that at any moment one camera is shooting the current speaker while the other camera is idle and available to capture the next speaker, so the two cameras cooperate and hand over seamlessly. When the speaker position changes, a camera needs a certain amount of time to acquire the video of the new speaker.
- during this period, the video of the previous speaker of the current speaker continues to be output, and the video of the current speaker is output only after the camera successfully acquires it. By contrast, the prior art must switch to the panoramic view of the site during this period and switches the screen to the new speaker only when the camera has acquired the new speaker's video.
- the present invention can therefore reduce the number of video switches, so that successive pictures connect closely and the output video is smoother.
- in the prior art, the video captured by a camera includes multiple speakers, and if these speakers are far apart, their expressions cannot be observed in the captured picture.
- the two cameras can alternately photograph the face of the speaker.
- the video of the current speaker and the previous speaker of the current speaker is simultaneously output in the form of picture-in-picture, so that the participants of the remote site can clearly observe the facial features of the current speaker.
- FIG. 9 is a flow chart of still another embodiment of a method of controlling video capture of the present invention.
- as shown in FIG. 9, taking a camera as an example of the camera device, the method for controlling video shooting provided by this specific embodiment of the present invention includes:
- the two cameras are turned on; that is, at the beginning of the conference, no one at the local conference site has spoken yet.
- to transmit the layout of the local conference site to the remote conference site, the two cameras can be controlled to capture the panoramic view of the local conference site.
- the angle and size can be set by the user.
- preferably, the setting is one that includes all participants and the main conference scene.
- Control the first camera to capture the video of the first speaker by using sound source localization technology and image recognition technology.
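When image recognition is combined with sound source localization, one plausible approach (an assumption, since the source does not detail it) is to snap the acoustic bearing to the nearest detected face. The function name `refine_with_faces`, the representation of positions as horizontal bearings in degrees, and the 10-degree gate are hypothetical; the face bearings are assumed to come from some face detector not shown here.

```python
from typing import Sequence


def refine_with_faces(sound_deg: float, face_degs: Sequence[float],
                      max_gap_deg: float = 10.0) -> float:
    """Snap the acoustic bearing to the nearest detected face, if one is close enough.

    Falls back to the acoustic estimate when no face lies within max_gap_deg.
    """
    if not face_degs:
        return sound_deg
    best = min(face_degs, key=lambda b: abs(b - sound_deg))
    return best if abs(best - sound_deg) <= max_gap_deg else sound_deg


print(refine_with_faces(12.0, [-30.0, 9.5, 40.0]))  # 9.5
print(refine_with_faces(12.0, [-30.0, 40.0]))       # 12.0
```

The fallback keeps the acoustic estimate when no face is detected nearby, for example when the speaker faces away from the camera.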
- the controller controls the camera with tracking flag 0 to capture the video of the first speaker according to the exact position of the first speaker. After that camera has captured an appropriate video of the first speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- when the current speaker changes from the first speaker to the second speaker, the second camera is controlled to capture a video of the second speaker, where the second speaker is the next speaker, at a position different from that of the first speaker.
- as in step S92, the controller may control the camera with tracking flag 0 (i.e., the second camera) to capture the video of the second speaker.
- after the camera with tracking flag 0 captures an appropriate video of the second speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera is changed from 1 to 0.
- the first camera and the second camera are sequentially controlled to alternately capture the current speaker's video.
- the tracking flag of the second camera becomes 1, and the tracking flag of the first camera becomes 0.
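- if the speaker changes from the second speaker to the third speaker, the camera with tracking flag 0 (i.e., the first camera) is controlled to shoot the third speaker; after it successfully acquires the video of the third speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e., the second camera) is changed from 1 to 0.
- similarly, when the speaker changes from the third speaker to the fourth speaker, the camera with tracking flag 0 (i.e., the second camera) is controlled to shoot the fourth speaker; after it succeeds, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera (i.e., the first camera) is changed from 1 to 0.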
- the alternate shooting is performed in the same manner.
- the dual picture consists of two sub-pictures, neither of which contains the other; one sub-picture outputs the current speaker and the other outputs the previous speaker of the current speaker.
- since the tracking flag of the camera with tracking flag 0 is changed from 0 to 1 after each successful acquisition, the camera with tracking flag 1 captures the video of the current speaker and the camera with tracking flag 0 captures the video of the previous speaker of the current speaker.
- simultaneously outputting the videos of the current speaker and of the previous speaker of the current speaker in dual-picture form means that the current speaker is output in one sub-picture and the previous speaker of the current speaker in the other, and neither sub-picture contains the other.
- participants at the remote site can also observe one party's reaction to the other party's speech.
- these reactions may contain important meeting information, so such information can be retained as far as possible.
- the first picture shows that, at the beginning of the meeting, the panoramic view of the local site is output in dual-picture form;
- the second picture shows that, after the first speaker appears, the first speaker is output in the left picture and the panoramic view of the local site in the right picture;
- the third picture shows that, after the speaker changes from the first speaker to the second speaker, the second speaker is output in the right picture and the first speaker in the left picture.
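The dual-picture sequence just described (first speaker on the left with the panorama on the right, then the second speaker on the right with the first speaker kept on the left) can be reproduced by always writing the new speaker into the half that was not updated most recently. This is a sketch under assumptions: the names `Box`, `dual_layout` and `DualPicture` are hypothetical, and an equal left/right split is assumed.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def dual_layout(width: int, height: int) -> Tuple[Box, Box]:
    """Left and right halves: neither sub-picture contains or overlaps the other."""
    half = width // 2
    return Box(0, 0, half, height), Box(half, 0, width - half, height)


class DualPicture:
    """Hypothetical compositor state: which source is shown in each half."""

    def __init__(self) -> None:
        self.slots: List[str] = ["panorama", "panorama"]  # [left, right]
        self.last_slot = 1  # so the first speaker lands in the left half

    def show_new_speaker(self, speaker: str) -> Tuple[str, str]:
        # Overwrite the half that was NOT updated most recently, so the
        # previous speaker stays visible next to the current one.
        self.last_slot = 1 - self.last_slot
        self.slots[self.last_slot] = speaker
        return self.slots[0], self.slots[1]


dp = DualPicture()
print(dp.show_new_speaker("S1"))  # ('S1', 'panorama')
print(dp.show_new_speaker("S2"))  # ('S1', 'S2')
print(dp.show_new_speaker("S3"))  # ('S3', 'S2')
```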
- step S96 is performed before step S95.
- from the moment the speaker changes until the camera acquires the current speaker's video, the camera is rotating and zooming (push/pull), which produces a blurred or unstable picture. To avoid this, the two speakers preceding the current speaker are output in the two parts of the dual picture, respectively, so that the blurred or unstable picture is not output.
- in Figs. 11A and 11B, the three figures are referred to as the first picture, the second picture, and the third picture.
- the speaker on the right side of the first picture is the previous speaker of the speaker on the left side of the first picture.
- the speaker on the left side of the first picture is the previous speaker of the speaker on the right side of the third picture.
- the speaker is changed from the speaker on the left side of the first picture to the speaker on the right side of the third picture.
- during the transition, the specific embodiment of the present invention outputs the live picture of the first picture's speaker (the picture on the right side of the second picture) and a frozen picture of that speaker's previous speaker, which avoids outputting the blurred or unstable picture.
- the output mode shown in the second figure of Fig. 7A can also be used in the process from the change of the speaker to the successful acquisition of the video of the current speaker by the camera.
- one sub-picture outputs the current speaker and the other outputs the previous speaker of the current speaker, and the picture combination mode is unchanged;
- One part of the picture outputs the presenter, and the other part of the picture does not switch or outputs the interjector, preferably the other part of the picture does not switch;
- the camera does not switch, does not track, and the output picture combination mode does not change;
- if the speaker moves but has not left the current output picture, although he or she may have left or be about to leave the set central area of that picture, the camera does not switch but may track appropriately to keep the speaker in the central area, and the output picture combination mode is unchanged; if the speaker's movement takes the speaker outside the current output picture, the camera is switched to track the speaker;
- the speaker at the local venue changes once, to the person next to the current speaker or to someone else;
- the camera does not switch and does not perform tracking, and the output picture is the picture in which the latter speaker is located in the center area. If the position of the next speaker has not exceeded the output picture of the previous speaker but may or may have exceeded the set center area of the picture, the camera does not switch, but can be properly tracked to keep the next speaker. Located in the central area, the output picture is the picture in which the latter speaker is located in the central area; if the latter speaker position has exceeded the output picture of the previous speaker, the camera is switched to track the next speaker;
- the local venue many people discuss, alternate speech, that is, the speaker position change occurs multiple times.
- the camera alternately tracks the speaker after each position change, and changes the combination mode of the output screen, that is, after each change, part of the screen
- the current speaker is output, and the other part of the picture outputs the previous speaker of the current speaker.
- the camera with the tracking flag of 0 is controlled to track the speaker after the change of the shooting position, and after the camera successfully acquires the video of the appropriate size of the speaker,
- the tracking flag is set to 0 by 0, and the tracking flag of the other camera is set to 0 by 1.
- the two cameras can be mated and seamlessly mated. It takes a certain amount of time for the camera to successfully acquire the video of the changed speaker when the position of the speaker changes. During this period, the video of the previous speaker of the current speaker is kept output, and the video of the current speaker is output only after the camera successfully acquires the video of the current speaker, which is required to be compared with the prior art. Switching to the panoramic view of the site, when the camera successfully acquires the changed speaker's video, the screen is switched to the changed speaker.
- the present invention can therefore reduce the number of video switches, so that picture transitions are tight and the output video is smoother.
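The tracking-flag handover and the rule of keeping the previous speaker on air until the new shot is ready can be sketched as a small state machine. This is a hypothetical illustration (class and method names are invented here): it uses the flag values from the text, where 0 marks the camera that will track the next speaker and 1 marks the camera holding the current shot, and it models "video successfully acquired" as an explicit callback instead of real pan/tilt/zoom control.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Camera:
    name: str
    tracking_flag: int             # 0: standby, tracks the next speaker; 1: holds the current shot
    target: Optional[str] = None   # speaker the camera is framing
    acquired: bool = False         # True once a stable, suitably sized shot exists


@dataclass
class DualCameraController:
    # Initially the first camera has flag 0 and the second camera has flag 1.
    cams: List[Camera] = field(
        default_factory=lambda: [Camera("camera1", 0), Camera("camera2", 1)])
    output: str = "panorama"       # what is currently sent to the remote site

    def standby(self) -> Camera:
        return next(c for c in self.cams if c.tracking_flag == 0)

    def on_speaker_change(self, speaker: str) -> Camera:
        cam = self.standby()
        cam.target, cam.acquired = speaker, False   # start pan/tilt/zoom toward the speaker
        # `output` is deliberately left unchanged: the previous shot stays on air.
        return cam

    def on_acquired(self, cam: Camera) -> None:
        cam.acquired = True
        self.output = cam.target
        for c in self.cams:        # swap flags: this camera holds, the other stands by
            c.tracking_flag = 1 if c is cam else 0


ctl = DualCameraController()
cam = ctl.on_speaker_change("speaker A")   # camera1 starts framing A; output is still the panorama
ctl.on_acquired(cam)                       # A goes on air; camera2 becomes the standby camera
```

Only `on_acquired` changes `output`, which is what removes the intermediate cut to the panorama that the prior art needs.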
- in the prior art, the picture captured by the camera may include multiple speakers; if those speakers are far apart, their facial expressions cannot be clearly observed in the captured picture.
- the two cameras can alternately photograph the face of the speaker.
- the videos of the current speaker and the current speaker's previous speaker are output in dual-picture form, so that participants at the remote site can observe, in addition to a clear view of the current speaker's face, the reaction of one party in the local venue to the other party's speech (suitable for multi-person discussions, especially two-person conversations), thus retaining more valuable meeting information.
- an embodiment of the present invention further provides an apparatus for controlling video capture.
- the apparatus for controlling video shooting provided by the embodiment of the present invention may be implemented by a device having a control processing function, which may be, for example, a camera, a video controller, a video terminal, or the like.
- an apparatus for controlling video shooting according to an embodiment of the present invention includes:
- the control unit 121 is configured to: when the first speaker speaks, control the first camera device to capture a video of the first speaker; when the current speaker changes from the first speaker to the second speaker, control the second camera device to capture a video of the second speaker, wherein the second speaker is the next speaker at a position different from that of the first speaker; and, when subsequent speaker changes occur, sequentially control the first camera device and the second camera device to alternately capture a video of the current speaker.
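Deciding that the current speaker has changed to "the next speaker at a different position" needs some rule for when a localized position counts as different. The text does not specify one; the sketch below assumes an angular threshold plus a short persistence count (both hypothetical parameters) so that a single noisy localization estimate or a brief interjection does not trigger a camera change.

```python
from typing import Optional


class PositionChangeDetector:
    """Reports a speaker change only when the localized azimuth moves more than
    `threshold_deg` away from the current speaker and stays away for
    `hold_frames` consecutive estimates (both parameters are hypothetical)."""

    def __init__(self, threshold_deg: float = 15.0, hold_frames: int = 3):
        self.threshold_deg = threshold_deg
        self.hold_frames = hold_frames
        self.current: Optional[float] = None   # azimuth of the speaker on air
        self.away = 0                          # consecutive estimates far from `current`

    def update(self, azimuth: float) -> bool:
        """Feed one azimuth estimate; return True when a new speaker position is confirmed."""
        if self.current is None:
            self.current = azimuth             # the first speaker
            return True
        if abs(azimuth - self.current) > self.threshold_deg:
            self.away += 1
            if self.away >= self.hold_frames:
                self.current, self.away = azimuth, 0
                return True
        else:
            self.away = 0                      # back near the current speaker: reset
        return False
```

Each `True` after the first would correspond to one speaker change in the sense above, i.e. one handover to the standby camera.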
- the processing unit 122 is connected to the control unit 121 and is configured to output the video of the current speaker after the video of the current speaker is successfully acquired.
- the control unit 121 is further configured to: in an initial state, before the first camera device captures the video of the first speaker, control the first camera device and the second camera device to capture a video of the entire venue; and the processing unit 122 is further configured to output the captured video of the entire venue.
- the control unit 121 is further configured to set a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag, and the tracking flag of the second camera device is initially a second tracking flag.
- the control unit 121 is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to capture the video of the first speaker; and after the video of the first speaker is successfully acquired, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag, and change the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
- the control unit 121 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to capture the video of the second speaker; and after the video of the second speaker is successfully acquired, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag, and simultaneously change the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
- the control unit 121 is specifically configured to: each time a subsequent speaker change occurs, control the camera device having the first tracking flag to capture the video of the current speaker, and after the video of the current speaker is successfully acquired, interchange the tracking flags of the first camera device and the second camera device.
- the control unit 121 is specifically configured to: determine whether the second speaker's position is within the output picture of the first speaker; if it is not, control the second camera device to capture a video of the second speaker;
- if the second speaker's position is within the output picture of the first speaker, further determine whether it is within a set area of that output picture; if it is within the set area, control the first camera device to capture a video of the second speaker; if it is not within the set area, control the first camera device to track the second speaker so that the second speaker is positioned within the set area.
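Tracking the speaker back into the set area means converting the speaker's pixel offset from the picture center into pan and tilt corrections. A minimal sketch, assuming an ideal pinhole camera with a known horizontal and vertical field of view and no lens distortion; the function name and the sign conventions are illustrative, not from the patent.

```python
import math


def recenter_angles(px: float, py: float, width: int, height: int,
                    hfov_deg: float, vfov_deg: float):
    """Pan/tilt (degrees) that would bring pixel (px, py) to the picture center.

    Positive pan turns right, positive tilt turns up. Assumes an ideal pinhole camera.
    """
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)    # focal length in pixels
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    pan = math.degrees(math.atan((px - width / 2) / fx))
    tilt = math.degrees(math.atan((height / 2 - py) / fy))      # image y grows downward
    return pan, tilt
```

Using `atan` rather than a linear pixels-to-degrees ratio keeps the correction exact toward the edges of a wide field of view; for example, a speaker at the right edge of a 60-degree shot needs a 30-degree pan.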
- the control unit 121 may be specifically configured to control the camera device to capture the video of the speaker by using a sound source localization technique.
- the control unit 121 may be specifically configured to control the camera device to capture the video of the speaker by using a sound source localization technique combined with preset positions or an image recognition technique.
- the first camera device and the second camera device may be connected and fixed together by a connecting device, or may be independent of each other.
- the control unit 121 controls one of the camera devices to capture the video of the current speaker, and the processing unit 122 outputs the video after it is successfully acquired. Meanwhile, the other camera device is in a standby state, ready to track the speaker who follows the current speaker. When a subsequent speaker change occurs, the control unit 121 can immediately control the camera device in the standby state to capture the video of the next speaker. Because it takes time, from the change of speaker position, to obtain a suitable video of the new speaker, the picture output to the remote site during this period does not need to switch to the panorama of the site first; instead, the video of the speaker before the change continues to be output. This reduces the number of video switches, so that picture transitions are tight and the output video is smoother.
- because the control unit 121 controls the two camera devices to alternately capture the video of the current speaker, even if speakers in the venue alternate rapidly, the two camera devices can alternately capture face pictures of the speakers, retaining more valuable meeting information.
- the processing unit 122 may output the video of the current speaker in full screen.
- the processing unit 122 is specifically configured to: after the video of the current speaker is successfully acquired, set full-screen display for the video of the current speaker, and after completing the setting, output the video of the current speaker in full screen; and before the video of the current speaker is successfully acquired, output the video of the current speaker's previous speaker in full screen.
- the processing unit 122 may simultaneously output the videos of the current speaker and the current speaker's previous speaker in picture-in-picture form.
- the processing unit 122 is specifically configured to: after the video of the current speaker is successfully acquired, set the video of the current speaker and the video of the current speaker's previous speaker to be displayed in picture-in-picture form, where the picture-in-picture includes a first picture and a second picture that is smaller than the first picture and contained within it, the current speaker is displayed in the first picture, and the current speaker's previous speaker is displayed in the second picture; and after the setting is completed, simultaneously output the videos of the current speaker and the current speaker's previous speaker in picture-in-picture form.
- the control unit 121 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to capture a video of the third speaker, wherein the third speaker is the next speaker at a position different from that of the second speaker.
- the processing unit 122 is specifically configured to: before the video of the third speaker is successfully acquired, output the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker who is being shot but whose video has not yet been successfully acquired; and after the video of the third speaker is successfully acquired, output the third speaker in the first picture and the second speaker in the second picture.
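The picture-in-picture rule can be expressed as a layout plus a source assignment for each phase of a speaker change. This is a sketch under assumptions: the inset size (a quarter of the frame) and its bottom-right placement are arbitrary choices, since the text only requires the second picture to be smaller than and contained in the first; in the example above, `current`, `previous`, and `prev_prev` stand for the third, second, and first speakers.

```python
from dataclasses import dataclass
from typing import Tuple

Source = Tuple[str, str]   # (speaker, state), state is "live", "frozen" or "framing"


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def pip_layout(width: int, height: int, scale: float = 0.25, margin: int = 16):
    """First picture fills the frame; the second is a smaller inset inside it."""
    first = Rect(0, 0, width, height)
    sw, sh = int(width * scale), int(height * scale)
    second = Rect(width - sw - margin, height - sh - margin, sw, sh)  # bottom-right inset
    return first, second


def pip_sources(current: str, previous: str, prev_prev: str,
                acquired: bool, show_in_progress: bool = False) -> Tuple[Source, Source]:
    """(first-picture source, second-picture source) while `current` replaces `previous`."""
    if acquired:
        return (current, "live"), (previous, "live")
    if show_in_progress:
        # Option 2: show the new speaker's shot while the camera is still framing it.
        return (previous, "live"), (current, "framing")
    # Option 1: keep a frozen frame of the speaker before `previous` in the inset.
    return (previous, "live"), (prev_prev, "frozen")
```

In both pre-acquisition options, the speaker who is live in the main picture does not change until the new video is ready.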
- the processing unit 122 may simultaneously output the videos of the current speaker and the current speaker's previous speaker in dual-picture form.
- the processing unit 122 is specifically configured to: after the video of the current speaker is successfully acquired, set the video of the current speaker and the video of the current speaker's previous speaker to be displayed in dual-picture form, where the dual picture includes two partial pictures, neither contained in the other, one partial picture displaying the current speaker and the other displaying the current speaker's previous speaker; and after the setting is completed, simultaneously output the two videos in dual-picture form.
- the control unit 121 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to capture a video of the third speaker, wherein the third speaker is the next speaker at a position different from that of the second speaker.
- the processing unit 122 is specifically configured to: before the video of the third speaker is successfully acquired, output a frozen picture of the first speaker in one partial picture and the second speaker in the other partial picture; or output, in the one partial picture, the third speaker who is being shot but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and after the video of the third speaker is successfully acquired, output the third speaker in the one partial picture and the second speaker in the other partial picture.
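The dual-picture rule differs from picture-in-picture in one useful way: the speaker who stays in the conversation keeps the same partial picture, and only the other slot is replaced. A minimal sketch, assuming two equal side-by-side halves (the text only requires that neither partial picture contain the other); function names and the `(x, y, w, h)` tuple layout are hypothetical.

```python
from typing import Tuple

Source = Tuple[str, str]            # (speaker, state), state is "live", "frozen" or "framing"
Box = Tuple[int, int, int, int]     # (x, y, w, h) in output pixels


def dual_layout(width: int, height: int) -> Tuple[Box, Box]:
    """Two side-by-side partial pictures, neither contained in the other."""
    half = width // 2
    return (0, 0, half, height), (half, 0, width - half, height)


def dual_sources(current: str, previous: str, prev_prev: str,
                 acquired: bool, show_in_progress: bool = False) -> Tuple[Source, Source]:
    """(slot 1, slot 2) while `current` replaces `previous` as the speaker.

    Slot 1 is the slot that held `prev_prev`; `previous` stays in slot 2 throughout,
    so the remaining speaker never jumps to the other side of the screen.
    """
    if acquired:
        return (current, "live"), (previous, "live")
    if show_in_progress:
        return (current, "framing"), (previous, "live")
    return (prev_prev, "frozen"), (previous, "live")
```

Because only slot 1 changes content, the remote viewer sees one partial picture update instead of both halves swapping.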
- the videos of the current speaker and the current speaker's previous speaker are output in dual-picture form, so that, in addition to a close-up of the current speaker's face, participants at the remote site can observe the reaction of one party in the local venue to the other party's speech (suitable for multi-person discussions, especially two-person conversations), thereby retaining more valuable meeting information.
- the included units are divided merely according to functional logic; the division is not limited to the above, as long as the corresponding functions can be implemented.
- in addition, the specific names of the units are merely for ease of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
- the device 13 for controlling video capture provided by the embodiment of the present invention includes: a controller 131, configured to: when the first speaker speaks, control the first camera module 132 to capture a video of the first speaker; when the current speaker changes from the first speaker to the second speaker, control the second camera module 133 to capture a video of the second speaker, wherein the second speaker is the next speaker at a position different from that of the first speaker; and, when subsequent speaker changes occur, sequentially control the first camera module 132 and the second camera module 133 to alternately capture the video of the current speaker.
- the output processor 134 is coupled to the first camera module 132 and the second camera module 133 for outputting the video of the current speaker after successfully acquiring the video of the current speaker.
- the output processor 134 may be integrated in the first camera module 132 or the second camera module 133, or may be separated from the first camera module 132 and the second camera module 133.
- the controller 131 is further configured to: in an initial state, before the first camera module 132 captures the video of the first speaker, control the first camera module 132 and the second camera module 133 to capture the entire conference site.
- the output processor 134 is further configured to output the video of the entire conference site that is captured.
- the first camera module 132 and the second camera module 133 may be independent of each other, or may be connected and fixed together by a connecting device to form a dual camera module.
- the first camera module 132 and the second camera module 133 may be integrated on the device 13 that controls video capture, or may be separate from the device 13 that controls video capture.
- the controller 131 is further configured to set a tracking flag for each of the first camera module 132 and the second camera module 133, where the tracking flag of the first camera module 132 is initially a first tracking flag, and the tracking flag of the second camera module 133 is initially a second tracking flag.
- the controller 131 is specifically configured to: when the first speaker speaks, control the first camera module 132, which has the first tracking flag, to capture the video of the first speaker; and after the video of the first speaker is successfully acquired, change the tracking flag of the first camera module 132 from the first tracking flag to the second tracking flag, and change the tracking flag of the second camera module 133 from the second tracking flag to the first tracking flag.
- the controller 131 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera module 133, which has the first tracking flag, to capture the video of the second speaker; and after the video of the second speaker is successfully acquired, change the tracking flag of the second camera module 133 from the first tracking flag to the second tracking flag, and simultaneously change the tracking flag of the first camera module 132 from the second tracking flag to the first tracking flag.
- the controller 131 is specifically configured to: each time a subsequent speaker change occurs, control the camera module having the first tracking flag to capture the video of the current speaker, and after the video of the current speaker is successfully acquired, interchange the tracking flags of the first camera module 132 and the second camera module 133.
- the apparatus 13 for controlling video shooting provided by the embodiment of the present invention further includes:
- the sound pickup microphone array 135 and the sound source locator 136 are configured to acquire the position of a speaker by using a sound source localization technique, where the sound source locator 136 performs sound source localization based on the sound picked up by the sound pickup microphone array 135.
- the controller 131 controls the camera module to capture the video of the speaker based on the position obtained by the sound source localization.
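The text does not say which sound source localization method the locator 136 uses. As one common possibility, the sketch below estimates a speaker's direction from two microphones of the array by time difference of arrival (TDOA): it finds the cross-correlation lag between the two signals and converts it to an angle with sin θ = c·τ/d. The far-field assumption, the brute-force correlation, and all parameter values are illustrative; a practical array would use more microphones and a more robust estimator such as GCC-PHAT.

```python
import math
from typing import Sequence


def estimate_delay(sig_a: Sequence[float], sig_b: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) maximizing sum(a[i] * b[i + lag]).

    A positive result means the sound reached microphone A before microphone B.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(len(sig_a), len(sig_b) - lag)
        score = sum(sig_a[i] * sig_b[i + lag] for i in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_deg(lag: int, fs: float, mic_spacing_m: float, c: float = 343.0) -> float:
    """Far-field direction from the delay: sin(theta) = c * tau / d.

    0 degrees is broadside to the microphone pair; positive angles point toward
    microphone A. The ratio is clamped because noise can push |c * tau / d| above 1.
    """
    ratio = c * (lag / fs) / mic_spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


a = [0.0] * 64
a[20], a[21] = 1.0, 0.5              # short click at microphone A
b = [0.0] * 64
b[23], b[24] = 1.0, 0.5              # same click 3 samples later at microphone B
lag = estimate_delay(a, b, max_lag=5)
angle = azimuth_deg(lag, fs=16000, mic_spacing_m=0.1)   # roughly 40 degrees toward A
```

With 0.1 m spacing at 16 kHz the physical delay cannot exceed about 4.7 samples, which is why `max_lag=5` suffices; the controller would then turn this angle into a pan command for the camera module.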
- the apparatus 13 for controlling video shooting provided by the embodiment of the present invention further includes: an image locator 137, configured to perform image recognition on a speaker by using image recognition techniques such as face detection, skin color detection, or lip motion detection.
- the controller 131 can be used to control the camera module to capture the video of the speaker according to the position information obtained by the image recognition technology.
- the controller 131 controls the camera module to capture the video of the speaker according to the position obtained by sound source localization and preset position information.
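Combining sound source localization with preset positions can be as simple as snapping the localized direction to the nearest stored camera preset. A hypothetical sketch: the preset table (pan, tilt, zoom per seat) and the tolerance are invented for illustration, and the microphone array and camera are assumed to share the same angular reference. When no preset is close enough the function returns `None`, and a system might then fall back to direct tracking or image recognition.

```python
from typing import Dict, Optional, Tuple

Preset = Tuple[float, float, float]   # (pan_deg, tilt_deg, zoom), hypothetical format


def nearest_preset(azimuth: float, presets: Dict[str, Preset],
                   max_error_deg: float = 10.0) -> Optional[str]:
    """Name of the preset whose pan angle is closest to the localized azimuth,
    or None if even the closest one is more than max_error_deg away."""
    name, (pan, _tilt, _zoom) = min(presets.items(), key=lambda kv: abs(kv[1][0] - azimuth))
    return name if abs(pan - azimuth) <= max_error_deg else None


seats = {"seat 1": (-30.0, -5.0, 2.0), "seat 2": (0.0, -5.0, 2.0), "seat 3": (30.0, -5.0, 2.0)}
```

Snapping to presets trades flexibility for stability: a noisy azimuth estimate still lands on a well-framed shot, as long as speakers sit at the stored positions.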
- the image locator 137 is specifically configured to determine whether the second speaker's position is within the output picture of the first speaker; if the second speaker's position is not within the output picture of the first speaker, the controller 131 controls the second camera module 133 to capture the video of the second speaker;
- if the second speaker's position is within the output picture of the first speaker and within the set area of that picture, the controller 131 controls the first camera module 132 to capture the video of the second speaker; if the second speaker's position is not within the set area, the controller 131 controls the first camera module 132 to track the second speaker so that the second speaker's position is within the set area.
- the controller 131 controls the first camera module 132 to capture the video of the current speaker, and the output processor 134 acquires the video of the current speaker and outputs it.
- the second camera module 133 is in a standby state, ready to track the speaker who follows the current speaker.
- the controller 131 can immediately control the second camera module 133 in the standby state to capture the video of the next speaker. Because it takes time, from the change of speaker position, to obtain a suitable video of the new speaker, the picture output to the remote site during this period does not need to switch to the panorama of the site first; instead, the video of the speaker before the change continues to be output. This reduces the number of video switches, so that picture transitions are tight and the output video is smoother.
- because the controller 131 controls the two camera modules to alternately capture the video of the current speaker, even if speakers in the venue alternate rapidly, the two camera modules can alternately capture face pictures of the speakers, retaining more valuable meeting information.
- the output processor 134 may output the video of the current speaker in full screen.
- the output processor 134 is specifically configured to: after the video of the current speaker is successfully acquired, set full-screen display for the video of the current speaker, and after completing the setting, output the video of the current speaker in full screen; and before the video of the current speaker is successfully acquired, output the video of the current speaker's previous speaker in full screen.
- the output processor 134 may simultaneously output the video of the current speaker and the previous speaker of the current speaker in a picture-in-picture format.
- the output processor 134 is specifically configured to: after the video of the current speaker is successfully acquired, set the video of the current speaker and the video of the current speaker's previous speaker to be displayed in picture-in-picture form, where the picture-in-picture includes a first picture and a second picture that is smaller than the first picture and contained within it, the current speaker is displayed in the first picture, and the current speaker's previous speaker is displayed in the second picture; and after the setting is completed, simultaneously output the videos of the current speaker and the current speaker's previous speaker in picture-in-picture form.
- the controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to capture a video of the third speaker, wherein the third speaker is the next speaker at a position different from that of the second speaker.
- the output processor 134 is specifically configured to: before the video of the third speaker is successfully acquired, output the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker who is being shot but whose video has not yet been successfully acquired; and after the video of the third speaker is successfully acquired, output the third speaker in the first picture and the second speaker in the second picture.
- the output processor 134 may simultaneously output the videos of the current speaker and the current speaker's previous speaker in dual-picture form.
- the output processor 134 is specifically configured to: after the video of the current speaker is successfully acquired, set the video of the current speaker and the video of the current speaker's previous speaker to be displayed in dual-picture form, where the dual picture includes two partial pictures, neither contained in the other, one partial picture displaying the current speaker and the other displaying the current speaker's previous speaker; and after the setting is completed, simultaneously output the two videos in dual-picture form.
- the controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to capture a video of the third speaker, wherein the third speaker is the next speaker at a position different from that of the second speaker.
- the output processor 134 is specifically configured to: before the video of the third speaker is successfully acquired, output a frozen picture of the first speaker in one partial picture and the second speaker in the other partial picture; or output, in the one partial picture, the third speaker who is being shot but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and after the video of the third speaker is successfully acquired, output the third speaker in the one partial picture and the second speaker in the other partial picture.
- the videos of the current speaker and the current speaker's previous speaker are output in dual-picture form, so that, in addition to a close-up of the current speaker's face, participants at the remote site can observe the reaction of one party in the local venue to the other party's speech, thereby retaining more valuable meeting information.
- the apparatus 13 for controlling video capture provided by the embodiment of the present invention includes:
- each of the above parts may be integrated into one complete device or provided as separate parts, working in coordination under the control of the controller 131 and the main control module 138.
- the controller 131 can control the two camera modules to shoot a panoramic view of the conference site. After the camera modules capture the video of the local site, the video captured by the second camera module 133 is encoded by the video signal processor 140 in the video module 139 and, under the control of the main control module 138, transmitted to the remote site through the network.
- the pickup microphone array 135 picks up the sound of the local site and sends it to the sound source locator 136; the sound of the local site may be sent to the sound source locator 136 directly, or after denoising or similar processing by an internal module of the audio module 141 (for example, a module with a preprocessing function).
- the sound source locator 136 generates position information through sound source localization, and the controller 131 controls the first camera module 132 according to this position information.
- the image locator 137 determines the exact position (including the face position) of the first speaker using image recognition technology based on the video of the first speaker acquired by the first camera module 132.
- the first camera module 132, i.e., the camera module with the tracking flag of 0.
- the controller 131 can control the camera module with the tracking flag of 0 (i.e., the second camera module 133) to capture the video of the second speaker; the method of controlling the shooting is the same as above.
- after the second camera module 133 captures a suitable video of the second speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the first camera module 132 is changed from 1 to 0.
- the controller 131 controls the camera module with the tracking flag of 0 (the first camera module 132 or the second camera module 133, as the case may be) to track the speaker after the change. After that camera module successfully captures a suitable video of the speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera module is changed from 1 to 0.
- the output processor 134 retrieves the speaker's video from the camera module. After the video of the speaker is obtained, the output processor 134 can set the output mode of the video, and the obtained video of the speaker can be output in a full screen, picture-in-picture or dual-picture manner.
- the output processor 134 transmits the video of the speaker to the video signal processor 140, which encodes it. Then, under the control of the main control module 138, the video of the speaker is transmitted from the video signal processor 140 to the remote conference site through the network.
- the main control module 138 can control the output processor 134 to output the video of the previous speaker of the current speaker.
- the audio signal processor 142 processes the sound of the local-venue speaker picked up by the pickup microphone 143. It should be noted that the sound picked up by the pickup microphone 143 differs from the sound picked up by the pickup microphone array 135: the former is transmitted to the remote site together with the video captured by the camera modules, while the latter is used for sound source localization.
- the loudspeaker 144 and the display 145 are basic components of the device 13 for controlling video capture, used to output audio and video, respectively, in the local venue.
- the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
- a connection relationship between modules indicates that they have a communication connection, which may specifically be implemented as one or more communication buses or signal lines.
- aspects of the invention, or possible implementations of the aspects, may be embodied as a system, method, or computer program product.
- aspects of the invention, or possible implementations of the aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, etc.), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as "circuits," "modules," or "systems."
- aspects of the invention, or possible implementations of the aspects, may take the form of a computer program product, that is, computer readable program code stored in a computer readable medium.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- the computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, or portable compact disc read-only memory (CD-ROM).
- the processor in the computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step or combination of steps in the flowcharts, and generate an apparatus that implements the functions specified in each block or combination of blocks in the block diagrams.
- the computer readable program code can be executed entirely on the user's computer, partly on the user's computer, as a separate software package, partly on the user's computer and partly on the remote computer, or entirely on the remote computer or server.
- the functions noted in the various steps of the flowcharts or in the blocks of the block diagrams may not occur in the order noted in the drawings. For example, two steps, or two blocks shown in succession, may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Devices (AREA)
Abstract
Provided are a video shooting control method and device, which relate to the field of video images and can reduce the number of video switches while retaining the speaker's facial picture, so that picture transitions are tight and the output video is smoother. The method comprises: when a first speaker speaks, controlling a first shooting device to shoot a video of the first speaker; when the current speaker changes from the first speaker to a second speaker, controlling a second shooting device to shoot a video of the second speaker, wherein the second speaker is the next speaker at a position different from that of the first speaker; when the speaker subsequently changes again, controlling the first shooting device and the second shooting device in sequence to alternately shoot a video of the current speaker; and after the video of the current speaker is successfully acquired, outputting the video of the current speaker. The present invention is used in video conferencing.
Description
Method and device for controlling video shooting
This application claims priority to Chinese Patent Application No. 201310566974.1, filed with the Chinese Patent Office on November 14, 2013 and entitled "Method and Apparatus for Controlling Video Shooting", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to the field of video images, and in particular, to a method and apparatus for controlling video shooting.
Background
In general, in a video conference, the camera captures a panoramic picture of all participants at a fixed size and a fixed angle. When the venue is relatively large, the camera may be far from the speaker, so it cannot be determined from the captured picture who is speaking, and the speaker's facial expression cannot be seen clearly, causing a loss of valuable conference information.
To avoid the loss of valuable conference information caused by capturing only the panoramic picture, two cameras can be used to capture the venue simultaneously: one camera is always used to capture the panoramic picture of the venue, and the other is used to track and capture the speaker.
When people in the venue speak alternately, the camera that tracks the speaker must pan, tilt, and zoom before it successfully acquires the current speaker's picture; the video captured during this process is unstable and uncomfortable to watch, so during this period the picture must first be switched to the panorama of the venue. However, this switching makes the picture transitions loose, and the video transmitted to the remote site is not smooth, which is very uncomfortable for viewers.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for controlling video shooting that reduce the number of video switches while retaining the speaker's face picture, so that pictures transition seamlessly and the output video is smoother.
According to a first aspect, a method for controlling video shooting is provided, including:
when a first speaker speaks, controlling a first camera device to shoot a video of the first speaker;
when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to shoot a video of the second speaker, where the second speaker is the next speaker and is at a position different from that of the first speaker;
when the speaker subsequently changes again, controlling the first camera device and the second camera device in turn to alternately shoot the video of the current speaker; and
after the video of the current speaker is successfully obtained, outputting the video of the current speaker.
With reference to the first aspect, in a first possible implementation, outputting the video of the current speaker includes: outputting the video of the current speaker in full screen.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, outputting the video of the current speaker in full screen includes:
before the video of the current speaker is successfully obtained, outputting the video of the speaker preceding the current speaker in full screen; and
after the video of the current speaker is successfully obtained, outputting the video of the current speaker in full screen.
With reference to the first aspect, in a third possible implementation of the first aspect, outputting the video of the current speaker includes: simultaneously outputting the videos of the current speaker and of the speaker preceding the current speaker in picture-in-picture form,
where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is output in the first picture, and the speaker preceding the current speaker is output in the second picture.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot a video of the third speaker, where the third speaker is the next speaker and is at a position different from that of the second speaker; and
simultaneously outputting the videos of the current speaker and of the speaker preceding the current speaker in picture-in-picture form includes:
before the video of the third speaker is successfully obtained: outputting the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has started but has not yet been successfully obtained; and
after the video of the third speaker is successfully obtained: outputting the third speaker in the first picture and the second speaker in the second picture.
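The picture-in-picture schedule of the fourth implementation can be summarized as a small selection function. The sketch below is illustrative only: the function name `pip_layout`, the string labels, and the `inset_mode` switch that chooses between the two alternatives described above are assumptions, not part of the disclosure.

```python
def pip_layout(prev_speaker, current_speaker, next_speaker, acquired, inset_mode="freeze"):
    """Return (first_picture, second_picture) contents for picture-in-picture output.

    Hypothetical helper. Before the next speaker is acquired, the large first
    picture keeps the current speaker, and the small second picture shows either
    a frozen frame of the previous speaker ("freeze") or the still-moving feed of
    the camera that is framing the next speaker ("live"). After acquisition the
    new speaker takes the first picture and the former current speaker moves to
    the second picture.
    """
    if acquired:
        return (next_speaker, current_speaker)
    if inset_mode == "freeze":
        return (current_speaker, f"frozen:{prev_speaker}")
    if inset_mode == "live":
        return (current_speaker, f"moving:{next_speaker}")
    raise ValueError(f"unknown inset_mode: {inset_mode!r}")
```

A frozen frame suits the inset because, in this scenario, the camera that was showing the first speaker is the one now rotating toward the third speaker, so its live feed would be unstable.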
With reference to the first aspect, in a fifth possible implementation of the first aspect, outputting the video of the current speaker includes: simultaneously outputting the videos of the current speaker and of the speaker preceding the current speaker in dual-picture form,
where the output picture consists of two non-overlapping parts: one part outputs the current speaker, and the other part outputs the speaker preceding the current speaker.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the method further includes:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to shoot a video of the third speaker, where the third speaker is the next speaker and is at a position different from that of the second speaker; and
simultaneously outputting the videos of the current speaker and of the speaker preceding the current speaker in dual-picture form includes:
before the video of the third speaker is successfully obtained: outputting a frozen picture of the first speaker in the one part and the second speaker in the other part; or outputting, in the one part, the third speaker, whose shooting has started but has not yet been successfully obtained, and the second speaker in the other part; and
after the video of the third speaker is successfully obtained: outputting the third speaker in the one part and the second speaker in the other part.
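In the dual-picture variant, the third speaker replaces the first speaker in the same part of the screen. A minimal sketch follows, assuming each half stays bound to one camera device; the `dual_layout` name, the "left"/"right" labels, and the string tags are hypothetical.

```python
def dual_layout(left, right, moving_side, next_speaker, acquired, show_moving=False):
    """Return (left_half, right_half) contents for a two-part, non-overlapping output.

    left, right: speakers currently shown in each half.
    moving_side: "left" or "right", the half whose camera is re-aiming at next_speaker.
    Before acquisition, that half shows either a frozen frame of its old speaker
    or (if show_moving) the camera's still-moving feed; after acquisition it
    shows next_speaker. The other half is left untouched.
    """
    if moving_side not in ("left", "right"):
        raise ValueError(f"moving_side must be 'left' or 'right', got {moving_side!r}")
    halves = {"left": left, "right": right}
    if acquired:
        halves[moving_side] = next_speaker
    elif show_moving:
        halves[moving_side] = f"moving:{next_speaker}"
    else:
        halves[moving_side] = f"frozen:{halves[moving_side]}"
    return halves["left"], halves["right"]
```

In this sketch only the half whose camera is being re-aimed changes, so the other half keeps showing the second speaker without interruption.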
With reference to the first aspect, in a seventh possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further includes:
in an initial state, controlling the first camera device and the second camera device to shoot a video of the entire conference site, and outputting the captured video.
With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, before controlling the first camera device to shoot the video of the first speaker, the method further includes:
setting a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;
controlling the first camera device to shoot the video of the first speaker when the first speaker speaks includes: when the first speaker speaks, controlling the first camera device, which has the first tracking flag, to shoot the video of the first speaker; and after the video of the first speaker is successfully obtained, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag; and
controlling the second camera device to shoot the video of the second speaker when the current speaker changes from the first speaker to the second speaker includes: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which now has the first tracking flag, to shoot the video of the second speaker; and after the video of the second speaker is successfully obtained, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag and, at the same time, changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
With reference to the eighth possible implementation of the first aspect, in a ninth possible implementation of the first aspect, controlling the first camera device and the second camera device in turn to alternately shoot the video of the current speaker when the speaker subsequently changes again includes: each time the speaker subsequently changes, controlling the camera device that has the first tracking flag to shoot the video of the current speaker, and after the video of the current speaker is successfully obtained, swapping the tracking flags of the first camera device and the second camera device.
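The tracking-flag bookkeeping of the eighth and ninth implementations, combined with the rule that the output switches only after successful acquisition (as in the full-screen second implementation) and the initial panorama state, can be sketched as a small state machine. This is a hedged illustration: the class and method names, the string flag values, and the `aim_at` placeholder, which stands in for a real pan/tilt/zoom command, are all assumptions.

```python
FIRST_FLAG, SECOND_FLAG = "first", "second"


class Camera:
    def __init__(self, name):
        self.name = name
        self.flag = None
        self.target = "panorama"  # initial state: every camera shoots the whole site

    def aim_at(self, speaker):
        # Hypothetical placeholder for a real pan/tilt/zoom command.
        self.target = speaker


class VideoShootingController:
    def __init__(self, cam1, cam2):
        cam1.flag, cam2.flag = FIRST_FLAG, SECOND_FLAG
        self.cams = (cam1, cam2)
        self.output = cam2   # far end initially sees cam2's panorama
        self.pending = None  # camera still rotating/zooming, if any

    def tracker(self):
        """The camera holding the first tracking flag frames the next speaker."""
        return next(c for c in self.cams if c.flag == FIRST_FLAG)

    def on_speaker_change(self, speaker):
        cam = self.tracker()
        cam.aim_at(speaker)
        self.pending = cam   # the output is NOT switched yet
        return cam

    def on_acquired(self):
        """Call once the pending camera reports a stable picture of its target."""
        if self.pending is None:
            return
        self.output, self.pending = self.pending, None  # switch output only now
        for c in self.cams:  # swap the two tracking flags
            c.flag = SECOND_FLAG if c.flag == FIRST_FLAG else FIRST_FLAG

    def output_view(self):
        return f"{self.output.name}:{self.output.target}"
```

One consequence of swapping the flags only after acquisition, at least in this reading, is that a speaker change arriving while a camera is still moving simply redirects that same camera, and the camera on air is never the one being moved. The sketch omits the refinement of the twelfth implementation, in which the first camera device is reused when the new speaker is already in its picture.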
With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, controlling a camera device to shoot a video of a speaker includes:
controlling the camera device to shoot the video of the speaker by using a sound source localization technology.
With reference to the tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, controlling the camera device to shoot the video of the speaker by using the sound source localization technology includes:
controlling the camera device to shoot the video of the speaker by using the sound source localization technology in combination with preset positions or an image recognition technology.
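One way to combine sound source localization with preset positions is to snap the noisy bearing estimate to the nearest stored preset. The sketch below assumes bearings in degrees measured from the microphone array, a 10-degree tolerance, and a hypothetical `Preset` record; none of these values come from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Preset:
    name: str
    azimuth_deg: float                # bearing of the seat as seen from the mic array
    ptz: tuple = (0.0, 0.0, 1.0)      # stored pan, tilt, zoom for that seat (illustrative)


def pick_preset(estimated_azimuth: float,
                presets: Sequence[Preset],
                tolerance_deg: float = 10.0) -> Optional[Preset]:
    """Snap a sound-source bearing to the closest preset within tolerance, else None."""
    if not presets:
        return None
    best = min(presets, key=lambda p: abs(p.azimuth_deg - estimated_azimuth))
    if abs(best.azimuth_deg - estimated_azimuth) > tolerance_deg:
        return None  # no preset nearby: a caller could fall back to face detection
    return best
```

Returning `None` leaves room for the image-recognition alternative when no preset matches.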
With reference to the first aspect or any one of the first to eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, controlling the second camera device to shoot the video of the second speaker when the current speaker changes from the first speaker to the second speaker includes:
determining whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, controlling the second camera device to shoot the video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determining whether the position of the second speaker is within a set area of the output picture of the first speaker;
if the position of the second speaker is within the set area, controlling the first camera device to shoot the video of the second speaker; and
if the position of the second speaker is not within the set area, controlling the first camera device to track the second speaker so that the position of the second speaker falls within the set area.
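The decision procedure of the twelfth implementation reduces to two containment tests. In this sketch the new speaker's position is assumed to be given in pixel coordinates of the first speaker's output picture, and the set area is assumed to be a central rectangle; the function name and return labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def plan_camera_for_new_speaker(pos, output_picture: Rect, set_area: Rect) -> str:
    """Choose how to shoot the new speaker, following the three-way rule above."""
    x, y = pos
    if not output_picture.contains(x, y):
        return "second_camera_shoots"     # new speaker is outside the current shot
    if set_area.contains(x, y):
        return "first_camera_keeps_shot"  # already well framed: no switch needed
    return "first_camera_tracks"          # in shot but outside set area: re-aim first camera
```

Only the first outcome involves the second camera device, which is how this implementation avoids needless switches when the new speaker is already on screen.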
According to a second aspect, an apparatus for controlling video shooting is provided, including:
a control unit, configured to: when a first speaker speaks, control a first camera device to shoot a video of the first speaker;
the control unit is further configured to: when the current speaker changes from the first speaker to a second speaker, control a second camera device to shoot a video of the second speaker, where the second speaker is the next speaker and is at a position different from that of the first speaker;
the control unit is further configured to: when the speaker subsequently changes again, control the first camera device and the second camera device in turn to alternately shoot the video of the current speaker; and
a processing unit, connected to the control unit and configured to output the video of the current speaker after the video of the current speaker is successfully obtained.
With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is specifically configured to:
set the video of the current speaker to be displayed in full screen; and
output the video of the current speaker in full screen.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processing unit is specifically configured to:
before the video of the current speaker is successfully obtained, output the video of the speaker preceding the current speaker in full screen; and after the video of the current speaker is successfully obtained, output the video of the current speaker in full screen.
With reference to the second aspect, in a third possible implementation of the second aspect, the processing unit is further specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in picture-in-picture form,
where the picture-in-picture includes a first picture and a second picture that is contained in the first picture and is smaller than the first picture; the current speaker is displayed in the first picture, and the speaker preceding the current speaker is displayed in the second picture; and
simultaneously output the videos of the current speaker and of the speaker preceding the current speaker in picture-in-picture form.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot a video of the third speaker, where the third speaker is the next speaker and is at a position different from that of the second speaker; and
the processing unit is specifically configured to:
before the video of the third speaker is successfully obtained: output the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker, whose shooting has started but has not yet been successfully obtained; and
after the video of the third speaker is successfully obtained: output the third speaker in the first picture and the second speaker in the second picture.
With reference to the second aspect, in a fifth possible implementation of the second aspect, the processing unit is further specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in dual-picture form,
where the dual picture consists of two non-overlapping parts: one part displays the current speaker, and the other part displays the speaker preceding the current speaker; and
simultaneously output the videos of the current speaker and of the speaker preceding the current speaker in dual-picture form.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the control unit is further configured to:
when the current speaker changes from the second speaker to a third speaker, control the first camera device to shoot a video of the third speaker, where the third speaker is the next speaker and is at a position different from that of the second speaker; and
the processing unit is specifically configured to:
before the video of the third speaker is successfully obtained: output a frozen picture of the first speaker in the one part and the second speaker in the other part; or output, in the one part, the third speaker, whose shooting has started but has not yet been successfully obtained, and the second speaker in the other part; and
after the video of the third speaker is successfully obtained: output the third speaker in the one part and the second speaker in the other part.
With reference to the second aspect, in a seventh possible implementation of the second aspect, the control unit is further configured to:
before controlling the first camera device to shoot the video of the first speaker, control, in an initial state, the first camera device and the second camera device to shoot a video of the entire conference site; and
the processing unit is further configured to output the captured video.
With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the control unit is further configured to:
set a tracking flag for each of the first camera device and the second camera device, where the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;
the control unit is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to shoot the video of the first speaker; and after the video of the first speaker is successfully obtained, change the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, change the tracking flag of the second camera device from the second tracking flag to the first tracking flag; and
the control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which now has the first tracking flag, to shoot the video of the second speaker; and after the video of the second speaker is successfully obtained, change the tracking flag of the second camera device from the first tracking flag to the second tracking flag and, at the same time, change the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
With reference to the eighth possible implementation of the second aspect, in a ninth possible implementation of the second aspect, the control unit is specifically configured to: each time the speaker subsequently changes, control the camera device that has the first tracking flag to shoot the video of the current speaker, and after the video of the current speaker is successfully obtained, swap the tracking flags of the first camera device and the second camera device.
With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the control unit is specifically configured to:
control a camera device to shoot a video of a speaker by using a sound source localization technology.
With reference to the tenth possible implementation of the second aspect, in an eleventh possible implementation of the second aspect, the control unit is specifically configured to:
control the camera device to shoot the video of the speaker by using the sound source localization technology in combination with preset positions or an image recognition technology.
With reference to the second aspect or any one of the first to eleventh possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the control unit is specifically configured to:
determine whether the position of the second speaker is within the output picture of the first speaker;
if the position of the second speaker is not within the output picture of the first speaker, control the second camera device to shoot the video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, further determine whether the position of the second speaker is within a set area of the output picture of the first speaker;
if the position of the second speaker is within the set area, control the first camera device to shoot the video of the second speaker; and
if the position of the second speaker is not within the set area, control the first camera device to track the second speaker so that the position of the second speaker falls within the set area.
With the foregoing technical solutions, in the method and apparatus for controlling video shooting provided in the present invention, when people at the conference site speak in turn, the first camera device and the second camera device are controlled in turn to alternately shoot the video of the current speaker, and the video of the current speaker is output. Therefore, even if several people at the site speak in quick alternation, the two camera devices can capture the face pictures of multiple speakers. In addition, in the technical solutions provided in the present invention, the video of the current speaker is output only after a camera device successfully obtains it. In the prior art, the output must first switch to the panoramic picture of the site before the camera successfully obtains the video of the next speaker; compared with that, the present invention reduces the number of video switches, so that pictures transition seamlessly and the output video is smoother.
Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a flowchart of an embodiment of a method for controlling video shooting according to the present invention;
FIG. 2A is a schematic diagram of shooting the new speaker when the position of the new speaker is within the set area of the output picture of the previous speaker;
FIG. 2B is a schematic diagram of shooting the new speaker when the position of the new speaker is within the output picture of the previous speaker but not within the set area of that picture;
FIG. 2C is a schematic diagram of shooting the new speaker when the position of the new speaker is not within the output picture of the previous speaker;
FIG. 3A is a flowchart of a specific embodiment of a method for controlling video shooting according to the present invention;
FIG. 3B is another flowchart of the specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 4 is a schematic diagram of the specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 5A is a schematic diagram of the effect when the camera rotation/zoom process is output in full-screen display;
FIG. 5B is a schematic diagram of the effect when the camera rotation/zoom process is not output in full-screen display;
FIG. 6 is a flowchart of another specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 7 is a schematic diagram of the other specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 8A is a schematic diagram of the effect when the camera rotation/zoom process is output in picture-in-picture display;
FIG. 8B is a schematic diagram of the effect when the camera rotation/zoom process is not output in picture-in-picture display;
FIG. 9 is a flowchart of still another specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 10 is a schematic diagram of the still another specific embodiment of the method for controlling video shooting according to the present invention;
FIG. 11A is a schematic diagram of the effect when the camera rotation/zoom process is output in dual-picture display;
FIG. 11B is a schematic diagram of the effect when the camera rotation/zoom process is not output in dual-picture display;
FIG. 12 is a structural block diagram of an embodiment of an apparatus for controlling video shooting according to the present invention;
FIG. 13A is a schematic structural diagram of another embodiment of the apparatus for controlling video shooting according to the present invention;
FIG. 13B is a schematic structural diagram of yet another embodiment of the apparatus for controlling video shooting according to the present invention; and
FIG. 13C is a schematic structural diagram of still another embodiment of the apparatus for controlling video shooting according to the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of a method for controlling video shooting according to the present invention. The method provided in this embodiment may be performed by an apparatus with a control and processing function, for example, a camera, a video controller, or a video terminal. As shown in FIG. 1, the method for controlling video shooting provided in this embodiment of the present invention includes the following steps.
S11: When a first speaker speaks, control a first camera device to shoot a video of the first speaker.
In this embodiment of the present invention, two groups of camera devices, a first camera device and a second camera device, are provided to shoot the video of the speaker. The first camera device may be one camera module, and so may the second camera device. Certainly, within the scope of the present invention, the first camera device and the second camera device may each include multiple camera modules; applications with multiple camera modules can be derived similarly from the application of one camera module. The first camera device and the second camera device may be connected and fixed together by a connecting apparatus, or may be independent of each other. The camera device mentioned in this embodiment of the present invention may be a video camera or another terminal device with a camera function.
The method can be applied to video conferencing: it captures and outputs video of the speaker at the local site, and it can also transmit the picture of the local site to a remote site so that participants there can follow what is happening at the local site.
After the camera devices are switched on and the video conference starts, if nobody at the local site is speaking yet, both the first and the second camera device can be controlled to capture a panoramic view of the local site. If the first camera device is designated to capture the first speaker who appears at the site, the video captured by the second camera device is preferably the one output to the remote site. Since no speaker has appeared yet, participants at the remote site only need to see the panorama of the local site. As soon as someone at the local site starts speaking, that is, when the first speaker appears, the first camera device can immediately be controlled to capture video of the first speaker, while the second camera device continues to capture the panorama of the local site.
In this embodiment of the present invention, the speaker's position can be determined using sound source localization. Sound source localization alone may fail to locate the speaker accurately, for example because of noise interference. Therefore, the positions at which speakers are likely to be when speaking at the local site may also be set in advance (preset positions); combining the localization result with these preset positions makes the position estimate more accurate. To locate the speaker still more accurately, sound source localization can also be combined with image recognition.
Specifically, when the camera devices (the first and the second camera device) are controlled to capture video of a speaker, multiple pickup microphones may form a microphone array. When the first speaker speaks, the microphone array picks up the sound at the local site, and after audio pre-processing the sound is passed to a sound source locator. The sound source locator is a module with a sound source localization function inside the apparatus with the control and processing function, and the microphone array consists of two or more pickup microphones distributed at different positions at the local site. After receiving the sound picked up by the microphone array, the sound source locator performs localization processing on it to obtain position information of the first speaker. Based on this position information, a controller generates a corresponding camera control command and sends it to the pan/tilt head, which turns the first camera device to a suitable shooting angle to obtain a rough video of the first speaker; the pan/tilt head receives and executes the camera control commands sent by the controller. Then, by combining the position information from sound source localization with preset position information or image recognition (for example face recognition, face detection, or lip-movement detection), more accurate position information of the first speaker is obtained. A new control command is generated and sent to the pan/tilt head to make the first camera device pan/tilt or zoom in/out, so that a picture of the first speaker at the required size is obtained; for example, the first speaker's face may occupy 1/2, 1/3, or 1/4 of the whole picture.
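The description does not say which localization algorithm the sound source locator uses. A common choice for a microphone array is the time difference of arrival (TDOA) between a microphone pair. The minimal sketch below assumes two microphones a known distance apart, a far-field talker, and a speed of sound of 343 m/s; the function names are hypothetical. It finds the delay by brute-force cross-correlation and converts it to a bearing with θ = asin(c·τ / d).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in room-temperature air


def estimate_lag(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    Brute-force cross-correlation over [-max_lag, max_lag]; a production
    locator would more likely use an FFT-based method such as GCC-PHAT.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_deg(sig_a, sig_b, mic_spacing_m, sample_rate_hz):
    """Far-field bearing of the talker relative to the array broadside.

    Positive angles point toward microphone A (the one the sound reaches first).
    """
    # The physical delay can never exceed spacing / speed of sound.
    max_lag = math.ceil(mic_spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz)
    lag = estimate_lag(sig_a, sig_b, max_lag)
    tau = lag / sample_rate_hz  # time difference of arrival, seconds
    ratio = SPEED_OF_SOUND_M_S * tau / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Usage: an impulse reaches mic A at sample 10 and mic B at sample 13.
mic_a = [0.0] * 64
mic_b = [0.0] * 64
mic_a[10] = 1.0
mic_b[13] = 1.0
print(bearing_deg(mic_a, mic_b, mic_spacing_m=0.2, sample_rate_hz=16000))
```

With 0.2 m spacing at 16 kHz, one sample of delay corresponds to roughly 6 degrees near broadside, which illustrates why the description treats the raw localization result only as a coarse first estimate.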
Because sound source localization has limited precision and is easily disturbed by noise, which leads to inaccurate localization, the embodiments of the present invention combine it with preset positions or image recognition to determine the speaker's position accurately, and then control the camera device to shoot. Depending on the actual situation, the present invention may use sound source localization alone, sound source localization combined with preset positions, sound source localization combined with image recognition, or sound source localization combined with both preset positions and image recognition.
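One way to picture the combinations listed above is a small fusion rule. The sketch below is an assumption, not the patented procedure: positions are modelled as pan bearings in degrees, a noisy sound-source bearing is snapped to the nearest preset if it lies within a tolerance, and a face-detection bearing close to that estimate, if available, is taken as the final refinement. The names and the 10-degree tolerance are hypothetical.

```python
def fuse_bearing(ssl_bearing_deg, presets_deg=(), snap_tolerance_deg=10.0,
                 face_bearing_deg=None):
    """Refine a sound-source bearing with optional presets and a face cue.

    Covers the four variants: localization alone, with presets, with image
    recognition, or with both. All angles are pan bearings in degrees.
    """
    estimate = ssl_bearing_deg
    if presets_deg:
        nearest = min(presets_deg, key=lambda p: abs(p - ssl_bearing_deg))
        if abs(nearest - ssl_bearing_deg) <= snap_tolerance_deg:
            estimate = nearest  # snap the noisy estimate to a preset seat
    if face_bearing_deg is not None and abs(face_bearing_deg - estimate) <= snap_tolerance_deg:
        estimate = face_bearing_deg  # a detected face near the estimate wins
    return estimate


print(fuse_bearing(23.0))                                  # localization alone
print(fuse_bearing(23.0, (-30.0, 0.0, 25.0)))              # snapped to the 25-degree preset
print(fuse_bearing(23.0, (-30.0, 0.0, 25.0), face_bearing_deg=24.2))  # refined by a face
```

Gating the face cue on distance from the acoustic estimate keeps a face elsewhere in the picture (a silent participant) from overriding the talker's bearing.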
S12. When the current speaker changes from the first speaker to a second speaker, control the second camera device to capture video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker.
The current speaker is the person currently speaking at the local site; in steps S11 and S12 the current speaker is the first speaker and the second speaker, respectively. Note that once the speaker's position has changed, the current speaker is already the new speaker, even during the interval before the camera device has successfully acquired video of that new speaker.
As with controlling the first camera device to capture the first speaker, a change in the speaker's position can first be detected by sound source localization, that is, the speaker changes from the first speaker to a second speaker at a different position; the second camera device is then controlled to pan/tilt and zoom to a suitable shooting angle and framing size. Then, as in step S11, preset positions or image recognition are used to further pan/tilt and zoom the second camera device as required, so that video of the second speaker at a suitable size is captured.
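The fine adjustment after image recognition can be sketched as converting a detected face box into a relative pan/tilt/zoom move. This is a hypothetical illustration: it assumes the camera's horizontal and vertical fields of view are known, uses a linear pixel-to-angle approximation (adequate for small offsets), and reads "the face occupies 1/3 of the picture" as face height over picture height, which the description does not specify.

```python
from dataclasses import dataclass


@dataclass
class FaceBox:
    """Face bounding box from a detector, in pixels (top-left origin)."""
    x: float
    y: float
    w: float
    h: float


def framing_adjustment(face, frame_w, frame_h, hfov_deg, vfov_deg, target_fraction):
    """Return (pan_deg, tilt_deg, zoom_factor) for a relative PTZ move.

    pan/tilt re-centre the face using a linear pixel-to-angle approximation;
    zoom_factor (>1 means zoom in) scales the face height to
    target_fraction of the picture height.
    """
    face_cx = face.x + face.w / 2
    face_cy = face.y + face.h / 2
    pan_deg = (face_cx - frame_w / 2) / frame_w * hfov_deg   # positive = pan right
    tilt_deg = (frame_h / 2 - face_cy) / frame_h * vfov_deg  # positive = tilt up
    zoom_factor = target_fraction * frame_h / face.h
    return pan_deg, tilt_deg, zoom_factor


# A 120-pixel face right of centre in a 1920x1080 picture, target 1/3 of the height.
print(framing_adjustment(FaceBox(1100, 400, 120, 120), 1920, 1080, 60.0, 33.75, 1 / 3))
```

The pan and tilt are computed at the current zoom, so a controller would typically re-centre before zooming in, which matches the description's order of first turning the camera and then adjusting the framing size.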
Note that if the speaker only moves slightly, for example by one or two body widths, the speaker's position can be regarded as unchanged and there is no need to switch camera devices; moreover, as long as the speaker is still within a set area of the captured picture, for example a central area occupying 80% of the picture, the camera device does not need to pan/tilt or zoom to track the speaker. Likewise, if the speaker walks around but stays within the set area of the captured picture, the speaker's position can be regarded as unchanged: the camera devices are not switched and the camera device does not need to pan/tilt or zoom to track. If the speaker changes to another person, but the two speakers merely take turns at the same position, or are so close together that both lie within the set area of one camera device's picture, the speaker's position can also be regarded as unchanged; no camera switch and no pan/tilt or zoom tracking is needed (see FIG. 2A, where the solid line indicates the captured picture and the dashed line indicates the set area). Whether it is the same speaker or a different one, if the speaker is within the output picture but outside the set area, there is no need to switch camera devices, but the camera device may be panned/tilted or zoomed slightly so that the new speaker is in the middle of the picture (see FIG. 2B). In the following description, unless otherwise specified, a change of speaker or a change of speaker position means that the speaker's position has changed and that the distance between the new position and the centre of the captured picture is large enough to require switching camera devices; this threshold can be set according to the actual scene (see FIG. 2C).
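The three cases of FIG. 2A to FIG. 2C can be sketched as a classification of the speaker's position in the current picture. Two assumptions are made here: "80% of the picture" is read as 80% of the area (so each side of the centred set area is scaled by the square root of 0.8), and the switching threshold of FIG. 2C, which the description leaves scene-dependent, is taken to be "outside the picture". The names are hypothetical.

```python
import math
from enum import Enum


class Action(Enum):
    KEEP = "keep the camera as it is"
    RECENTER = "keep the camera, nudge pan/tilt/zoom to recentre"
    SWITCH = "hand over to the other camera"


def classify_position(x, y, frame_w, frame_h, set_area_fraction=0.8):
    """Classify a speaker's pixel position against a centred set area.

    set_area_fraction is the share of the picture *area* covered by the set
    area, so each side is scaled by its square root. Positions outside the
    picture are treated as needing a camera switch (assumed threshold).
    """
    side = math.sqrt(set_area_fraction)
    margin_x = frame_w * (1 - side) / 2
    margin_y = frame_h * (1 - side) / 2
    if not (0 <= x < frame_w and 0 <= y < frame_h):
        return Action.SWITCH      # FIG. 2C: far from the picture centre
    if margin_x <= x <= frame_w - margin_x and margin_y <= y <= frame_h - margin_y:
        return Action.KEEP        # FIG. 2A: inside the set area
    return Action.RECENTER        # FIG. 2B: in the picture, outside the set area


print(classify_position(960, 540, 1920, 1080))   # centre of a 1080p picture
print(classify_position(50, 540, 1920, 1080))    # near the left edge
print(classify_position(2100, 540, 1920, 1080))  # beyond the right edge
```

In practice the SWITCH threshold would more likely be a distance from the picture centre tuned per room, as FIG. 2C suggests; only the comparison in the first `if` would change.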
S13. When further speaker changes occur, control the first camera device and the second camera device in turn to alternately capture video of the current speaker.
Specifically, when the speaker subsequently changes from the second speaker to the next speaker, a third speaker, the first camera device is controlled to capture video of the third speaker. If the speaker then changes again, from the third speaker to the next speaker, a fourth speaker, the second camera device is controlled to capture video of the fourth speaker. Repeating this ensures that the first and second camera devices alternately capture video of the current speaker.
For example, suppose there are four speakers A, B, C, and D at the local site and A speaks first. The first camera device is first controlled to capture A; when the speaker changes from A to B, the second camera device is controlled to capture B; when the speaker then changes from B to C, the first camera device is again controlled to capture C; when the speaker changes from C to D, the second camera device is again controlled to capture D; and so on.
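The alternation in steps S11 to S13 amounts to toggling between the two camera devices whenever the speaker's position changes. The sketch below is a simplification: each speaking turn is reduced to a position label, and two turns at the same label count as "no position change" in the sense of FIG. 2A. The function name is hypothetical.

```python
def assign_cameras(turn_positions):
    """Camera (1 or 2) filming each speaking turn, given each turn's position."""
    cameras = []
    for i, position in enumerate(turn_positions):
        if i == 0:
            camera = 1  # S11: the first camera takes the first speaker
        elif position != turn_positions[i - 1]:
            camera = 3 - camera  # S12/S13: position changed, hand over
        # otherwise the position is unchanged and the same camera keeps filming
        cameras.append(camera)
    return cameras


print(assign_cameras(["A", "B", "C", "D"]))  # the four-speaker example above
print(assign_cameras(["A", "B", "A", "B"]))  # two speakers alternating
```

In the two-speaker case, camera 1 always lands on A and camera 2 on B, so neither camera has to move between turns, which is the situation the following paragraph describes.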
When several people at a site speak in quick alternation, the picture captured by a prior-art speaker-tracking camera device ends up including several speakers; if those speakers are far apart, their expressions cannot be seen in the captured picture, and valuable information from the meeting is lost. In the present invention, by contrast, both the first and the second camera device can track speakers: while one camera device tracks the current speaker, the other tracks the speaker after the change. This ensures that the two camera devices cooperate and hand over seamlessly: while the first camera device captures the current speaker, the second camera device is used to capture the next speaker, and while the second camera device captures the current speaker, the first camera device is used to capture the next speaker. In particular, when there are only two speakers, A and B, at the local site, the first camera device can keep tracking A and the second camera device can keep tracking B. When the two speak alternately, each camera device has already adjusted its focal length, so no pan/tilt or zoom movement is needed. Thus, even when speakers at the site alternate quickly, the two camera devices can alternately capture the speakers' faces, more of the meeting's valuable information is retained, and video tracking becomes more efficient.
S14. After video of the current speaker has been successfully acquired, output the video of the current speaker.
Specifically, once the camera device capturing the current speaker has successfully acquired video of the current speaker, that video is output. Outputting the video of the current speaker includes displaying it, in different modes (full screen, picture-in-picture, dual picture, and so on), on the display of the camera device or on a display at the local site, and also outputting it, in different modes, to the remote site. The present invention does not limit how video captured at the local site is transmitted to the remote site (for example, how it is encoded and decoded). During transmission, the video of the current speaker may for example be sent to a video signal processor, which encodes and compresses it and transmits the resulting bitstream over the network to the remote site. The remote site receives the bitstream, decodes it to obtain the video of the current speaker, and can then display it in different modes on its display, so that participants at the remote site can watch the picture of the local site.
When the speaker changes, the camera device needs some time to acquire video of the new speaker. During this time, the prior art first switches the picture to the panorama of the site and only switches to the new speaker once the camera device has successfully acquired the new speaker's video, which makes the video less smooth. In the embodiments of the present invention, before the video of the current speaker is successfully acquired in step S14, the method may further include: outputting video of the speaker preceding the current speaker. That is, the previous speaker's video is output until the current speaker's video has been successfully acquired, and the current speaker's video is output afterwards. In full-screen output, this keeps the output picture continuous and of high quality, avoiding the blur and shaking that would otherwise appear in the output while the camera device pans/tilts or zooms to acquire video of the current speaker.
Of course, in the embodiments of the present invention, the picture of the local site may be output not only in full screen but also as picture-in-picture, dual picture, and so on. With picture-in-picture output, after the video of the current speaker is successfully acquired, the current speaker may be output in the large picture (first picture) and the previous speaker in the small picture (second picture). With dual-picture output, after the video of the current speaker is successfully acquired, the current speaker may be output in one of two non-overlapping parts of the picture and the previous speaker in the other part. Specific implementations of these output modes are described in the embodiments below.
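The three output modes can be illustrated as a mapping from feeds to regions of the output picture. This is a hypothetical sketch: the 1920x1080 picture size, the quarter-size inset in the bottom-right corner, the 16-pixel margin, and the side-by-side split for dual picture are assumptions; the description only fixes which speaker goes in which picture.

```python
def compose_layout(mode, current, previous, frame_w=1920, frame_h=1080,
                   inset_scale=0.25, inset_margin=16):
    """Map feeds to (x, y, w, h) regions of the output picture.

    'current' is the feed showing the current speaker once acquired and
    'previous' the feed showing the previous speaker (None if there is none).
    inset_scale and inset_margin are hypothetical layout parameters.
    """
    if mode == "full" or previous is None:
        return [(current, (0, 0, frame_w, frame_h))]
    if mode == "pip":
        w, h = int(frame_w * inset_scale), int(frame_h * inset_scale)
        return [
            (current, (0, 0, frame_w, frame_h)),  # first (large) picture
            (previous, (frame_w - w - inset_margin,  # second (small) picture
                        frame_h - h - inset_margin, w, h)),
        ]
    if mode == "dual":
        half = frame_w // 2  # two side-by-side, non-overlapping halves
        return [(current, (0, 0, half, frame_h)),
                (previous, (half, 0, frame_w - half, frame_h))]
    raise ValueError(f"unknown output mode: {mode!r}")


print(compose_layout("pip", "cam2", "cam1"))
print(compose_layout("dual", "cam2", "cam1"))
```

Falling back to full screen when there is no previous speaker covers the start of the conference, when only one feed has anything meaningful to show.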
Further, in the embodiments of the present invention, to make it easier to control the two camera devices to take turns capturing the current speaker and to output the current speaker's video, a tracking flag can be set for each camera device before shooting starts. For example, the initial tracking flags of the first and second camera devices can be set to a first tracking flag and a second tracking flag, respectively; the flags may be represented by numbers such as 0 and 1. The camera device holding the first tracking flag can be dedicated to capturing video of the current speaker, and the camera device holding the second tracking flag to capturing video of the next (or previous) speaker relative to the current speaker. In addition, after video of the current speaker has been successfully acquired, the tracking flags of the first and second camera devices are swapped.
When tracking flags are set for the first and second camera devices, step S11 (when the first speaker speaks, controlling the first camera device to capture video of the first speaker) may include: when the first speaker speaks, controlling the first camera device, which holds the first tracking flag, to capture video of the first speaker; and after the video of the first speaker is successfully acquired, changing the tracking flag of the first camera device from the first tracking flag to the second tracking flag and, at the same time, changing the tracking flag of the second camera device from the second tracking flag to the first tracking flag.
Step S12 (when the current speaker changes from the first speaker to a second speaker, controlling the second camera device to capture video of the second speaker) may include: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which now holds the first tracking flag, to capture video of the second speaker; and after the video of the second speaker is successfully acquired, changing the tracking flag of the second camera device from the first tracking flag to the second tracking flag and, at the same time, changing the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
Step S13 (when further speaker changes occur, controlling the first and second camera devices in turn to alternately capture video of the current speaker) may include: each time a subsequent speaker change occurs, controlling the camera device that holds the first tracking flag to capture video of the current speaker and, after that video is successfully acquired, swapping the tracking flags of the first and second camera devices. This ensures that the two camera devices cooperate, hand over seamlessly, and alternately capture video of the current speaker.
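The flag handling in steps S11 to S13 can be sketched as a small state machine. In this minimal sketch the camera names `cam1`/`cam2` and the method names are hypothetical, and the flag values follow the 0/1 convention used in the later embodiment. Having the output follow the camera that holds the second flag is our reading of the description: it combines S14 (keep outputting the previous speaker until the new one is acquired) with S31 (preferably output the flag-1 camera first). The description does not state it as a rule.

```python
FIRST_FLAG, SECOND_FLAG = 0, 1  # the 0/1 convention of the later embodiment


class DualCameraController:
    """Tracks which camera holds which tracking flag (steps S11 to S14)."""

    def __init__(self):
        # Initially the first camera holds the first flag; both show the panorama.
        self.flags = {"cam1": FIRST_FLAG, "cam2": SECOND_FLAG}
        self.pending = None  # camera currently moving to frame a new speaker

    def camera_with(self, flag):
        return next(cam for cam, f in self.flags.items() if f == flag)

    def on_speaker_change(self):
        """A new speaker position was detected: dispatch the first-flag camera."""
        self.pending = self.camera_with(FIRST_FLAG)
        return self.pending

    def on_acquired(self):
        """The dispatched camera now frames the new speaker: swap the flags."""
        if self.pending is None:
            return
        for cam in self.flags:
            self.flags[cam] = SECOND_FLAG if self.flags[cam] == FIRST_FLAG else FIRST_FLAG
        self.pending = None

    @property
    def output(self):
        """Feed to output: the second-flag camera, i.e. the last picture that
        was fully acquired (the panorama before anyone has spoken)."""
        return self.camera_with(SECOND_FLAG)


ctrl = DualCameraController()
print(ctrl.output, ctrl.on_speaker_change(), ctrl.output)  # output holds while cam1 moves
ctrl.on_acquired()
print(ctrl.output)  # cam1 now shows the first speaker
```

Because the output only moves to a camera after `on_acquired`, the viewer never sees a camera while it is panning or zooming, and no intermediate cut to the panorama is needed.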
In the embodiments of the present invention, both the first and the second camera device can track speakers. When the first speaker speaks, the first camera device is controlled to capture the first speaker while the second camera device stands by, ready to track the speaker who follows the first speaker. When the current speaker changes from the first speaker to the second speaker (the next speaker, at a different position from the first speaker), the second camera device is controlled to capture the second speaker, while the first camera device keeps capturing the first speaker and becomes ready to track the next speaker at a position different from the second speaker. This ensures that the two camera devices cooperate and hand over seamlessly. When the speaker changes, the camera device needs some time to acquire video of the new speaker. In the prior art, one camera device is dedicated to capturing the panorama of the local site and the other is dedicated to tracking the speaker; so before the tracking camera device has acquired video of the current speaker, the picture must first be switched to the panorama of the site, and only when the camera device has acquired the current speaker's video is the picture switched to the new speaker, which makes the video less smooth. In the technical solution provided by the present invention, the current speaker's video is output only after the camera device has successfully acquired it, and until then the previous speaker's video continues to be output. Compared with the prior art, which must switch to the panorama of the local site before the camera device has acquired the next speaker's video, the present invention reduces the number of video switches, so that pictures follow each other more closely and the output video is smoother. Moreover, when several people at the local site speak in quick alternation, a picture captured according to the prior art includes several speakers, and if they are far apart their expressions cannot be seen in the picture. In the present invention, thanks to the cooperation of the first and second camera devices, the two camera devices can alternately capture each speaker's face even when speakers at the local site alternate quickly.
For a better understanding of the present invention, it is further described below with reference to FIG. 3A to FIG. 10, taking several specific embodiments as examples. Note that the embodiments listed below are only some of the embodiments of the present invention; those skilled in the art can readily conceive of other embodiments from what is described here, and these all fall within the scope of the present invention.
In the following specific embodiments, the camera devices can be marked with tracking flags, and the video captured by the camera device holding a specified tracking flag is output. For example, the initial tracking flag of the first camera device can be set to 0 (the first tracking flag) and that of the second camera device to 1 (the second tracking flag), where the camera device with tracking flag 0 is used to capture video of the current speaker and the camera device with tracking flag 1 is used to capture video of the speaker following the current speaker. For simplicity, this convention is used in the examples below. Of course, setting the first camera device's flag to 1 and the second camera device's flag to 0, or setting the tracking flags in some other way, is also possible; the present invention does not limit this.
FIG. 3A is a flowchart of a specific embodiment of the method for controlling video shooting according to the present invention. FIG. 3B is another flowchart of a specific embodiment of the method for controlling video shooting according to the present invention.
As shown in FIG. 3A, taking video cameras as the camera devices, the method for controlling video shooting provided by this specific embodiment includes:
S31. When the conference starts, control the two cameras to capture a panoramic view of the local site.
After the two cameras (the first camera and the second camera) are switched on, that is, at the start of the conference, nobody at the local site is speaking yet. To convey the layout of the local site to the remote site, both cameras can be controlled to capture a panorama of the local site. The shooting angle and framing size can be set by the user; a preferred setting includes all participants and the main conference scene. Since both cameras are capturing the panorama of the local site at this point, the picture of either camera may be transmitted to the remote site; preferably, the picture captured by the camera with tracking flag 1 (the second camera) is transmitted first.
S32. Using sound source localization, control the first camera to capture video of the first speaker.
After the two cameras have been controlled to capture the panorama of the site, when someone at the site starts speaking, that is, when the first speaker appears, the microphone array picks up the sound at the local site and sends it to the sound source locator, which generates speaker position information by sound source localization. The controller then uses this position information to control the camera with tracking flag 0 to capture video of the first speaker at a suitable size. After the camera with tracking flag 0 (the first camera) has captured video of the first speaker at a suitable size, its tracking flag is set to 1, and the tracking flag of the other camera (the second camera) is changed from 1 to 0.
S33. When the current speaker changes from the first speaker to a second speaker, control the second camera to capture video of the second speaker, where the second speaker is the next speaker and is at a different position from the first speaker.
After the first camera has captured video of the first speaker at a suitable size, the tracking flag of the first camera has become 1 and that of the second camera has become 0. If the speaker's position then changes, that is, the speaker changes from the first speaker to a second speaker at a different position, the controller can control the camera with tracking flag 0 (the second camera) to capture video of the second speaker, in the same way as in S32. After the camera with tracking flag 0 has captured video of the second speaker at a suitable size, its tracking flag is set to 1, and the tracking flag of the other camera is changed from 1 to 0.
S34: When further speaker changes occur, control the first camera and the second camera in turn to capture video of the current speaker alternately.
After the second camera captures an appropriately sized video of the second speaker, the second camera's tracking flag becomes 1 and the first camera's tracking flag becomes 0. The speaker may then change from the second speaker to a third speaker (the speaker after the second speaker). In that case, the camera whose tracking flag is 0 (the first camera) is controlled to shoot the third speaker. Once it has acquired video of the third speaker, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0.

Similarly, the speaker may change from the third speaker to a fourth speaker (the speaker after the third speaker). The camera whose tracking flag is 0 (now the second camera) is then controlled to shoot the fourth speaker. Once it has acquired video of the fourth speaker, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0.

In this way, every speaker change is handled the same way. The camera whose tracking flag is 0 (either the first or the second camera) is controlled to track and shoot the new speaker. Once that camera has acquired video of the speaker, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0.
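Assume every dispatched camera acquires its shot before the next change. Then the alternation in S33 and S34 reduces to a fixed pattern. This hypothetical helper replays a sequence of speaker positions and reports which camera the flag-0 rule assigns to each one. Consecutive identical positions count as no speaker change.

```python
from typing import List, Tuple


def assign_cameras(positions: List[str]) -> List[Tuple[str, str]]:
    """Return (speaker position, camera) pairs for each speaker change."""
    flags = {"first": 0, "second": 1}  # assumed initial state, as in S32
    assignments: List[Tuple[str, str]] = []
    last = None
    for pos in positions:
        if pos == last:
            continue  # same position: no speaker change, no new dispatch
        # Dispatch the camera whose tracking flag is 0 ...
        cam = next(name for name, flag in flags.items() if flag == 0)
        # ... and, once it has acquired the shot, swap the flags.
        flags = {name: int(name == cam) for name in flags}
        assignments.append((pos, cam))
        last = pos
    return assignments


print(assign_cameras(["A", "B", "B", "C", "D"]))
# [('A', 'first'), ('B', 'second'), ('C', 'first'), ('D', 'second')]
```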
S35: After the camera shooting the current speaker successfully acquires the current speaker's video, output the current speaker's video in full screen.
After the camera whose tracking flag is 0 successfully acquires the current speaker's video, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0. Therefore, after the change, the video captured by the camera whose tracking flag is 1 is the current speaker's video.

Outputting the current speaker's video in full screen means that the output video comes from a single camera. The full-screen picture may show one speaker or several speakers. Several speakers can be shown only if they are close enough together that each one's body language or facial information can be observed in the captured video. As in step S12, several speakers may be so far apart that they cannot all be observed in video captured by one camera. In that case, the speaker position can be regarded as having changed, and the other camera can be used to capture the speaker's video.

Once the current speaker's video is transmitted full screen to the remote site, participants there can clearly see a close-up of the current speaker. Such close-ups may carry important meeting information, so as much of that information as possible is retained.
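The rule above allows several speakers in one full-screen shot only if each can still be observed. It could be approximated as a span check. In this sketch, both of the following are assumptions rather than values given in the document:

- the use of horizontal bearings as input;
- the `max_span_deg` threshold.

```python
from typing import Sequence


def fits_one_shot(azimuths_deg: Sequence[float], max_span_deg: float) -> bool:
    """Return True if all active speakers fit in one close-up.

    azimuths_deg: horizontal bearings of the active speakers (hypothetical input,
    e.g. from sound source localization).
    max_span_deg: widest span one close-up can cover while each face stays
    observable (an assumed tuning value).
    """
    if not azimuths_deg:
        return True  # nobody to frame, nothing forces a camera change
    return max(azimuths_deg) - min(azimuths_deg) <= max_span_deg


# If this returns False, the speaker position is treated as changed and the
# other camera is dispatched.
```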
FIG. 4 shows three panels, from left to right:

1. At the start of the meeting, the display shows a full-screen panorama of the local site.
2. After the first speaker appears, the display shows the first speaker's video in full screen.
3. After the speaker changes from the first speaker to the second speaker, the display shows the second speaker in full screen.
S36: Before the camera shooting the current speaker successfully acquires the current speaker's video, output the video of the speaker who preceded the current speaker.
Note that step S36 is performed before step S35.
Between the speaker change and the moment the camera acquires the current speaker's video, the camera rotates or zooms, which produces blurred or unstable pictures. During this interval, the previous speaker's video is output instead, so these pictures are never shown.
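A minimal sketch of S35 and S36 as an output selector, assuming hypothetical event names:

- While the newly dispatched camera moves, the output keeps showing the previous source.
- The output switches only on the acquisition event, with no detour through the panorama.

```python
from typing import Optional


class OutputSelector:
    """Sketch: hold the previous speaker's source until the new camera is ready."""

    def __init__(self, initial_source: str) -> None:
        self.output = initial_source          # e.g. "panorama" at the start of the meeting
        self.pending: Optional[str] = None    # camera currently moving to the new speaker

    def on_speaker_change(self, camera: str) -> None:
        # The camera is rotating/zooming and its frames may be blurred: do not switch yet.
        self.pending = camera

    def on_acquired(self, camera: str) -> None:
        # Switch straight from the previous speaker to the new one.
        if camera == self.pending:
            self.output = camera
            self.pending = None
```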
For ease of understanding, the following description refers to FIGS. 5A and 5B. As shown in FIG. 5A, the three panels, from left to right, are called the first, second and third panels. The speaker in the third panel is the next speaker after the speaker in the first panel.

Consider the interval from the speaker change until the camera acquires an appropriately sized video of the third panel's speaker. If the frames captured while the camera rotates or zooms were output directly, the blurred or unstable picture shown in the second panel would appear. Instead, during this interval, the specific embodiment of the present invention outputs the video of the first panel's speaker. It outputs the third panel's speaker only after an appropriately sized video of that speaker has been acquired. This avoids outputting the blurred or unstable picture (see FIG. 5B).
In addition, depending on conditions at the local site, the following situations may arise when this embodiment is implemented. Each is handled as follows:
(1) No one is speaking at the local site
The output picture is not switched; the panoramic picture of the local site continues to be output.
(2) A single person is speaking at the local site and no one interjects
The output is a full-screen picture of the current speaker.
(3) A single person is speaking at the local site and someone interjects, but only briefly
The output picture is not switched; the full-screen picture of the main speaker continues to be output.
(4) A single person is speaking at the local site and occasionally moves
If the speaker walks about, but the head or body stays within the current output picture and within its set center region, the camera neither switches nor tracks. The output is a full-screen picture with the current speaker in the center region. If the speaker stays within the current output picture but is likely to leave, or has already left, the set center region, the camera does not switch but may track as needed to keep the speaker in the center region. If the speaker moves beyond the current output picture, the cameras are switched and the speaker is tracked.
(5) The speaker at the local site changes once, to the person next to the previous speaker or to someone else
If the new speaker's position lies within the output picture from before the change and within its set center region, the camera neither switches nor tracks. The output is a full-screen picture with the new speaker in the center region. If the new speaker's position lies within the earlier output picture but is likely to be outside, or is already outside, its set center region, the camera does not switch but may track as needed to keep the new speaker in the center region. The output is then a full-screen picture with the new speaker in the center region. If the new speaker's position lies beyond the earlier output picture, the cameras are switched and the new speaker is tracked.
(6) Several people at the local site speak at the same time, that is, they talk over one another
Such overlapping speech is usually brief, so the output picture is not switched.
(7) Several people at the local site hold a discussion and speak in turn, so the speaker position changes many times
The cameras alternately track the speaker after each position change, and the output is a full-screen picture of the new speaker.
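Cases (4) and (5) both come down to comparing the speaker's position with the output picture and its set center region. The sketch below rests on two assumptions:

- The speaker is given as a bounding box in output-picture coordinates, for example from a detector.
- It handles only a speaker who has already left the center region. The text also allows predicting that the speaker is about to leave it, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        # True if `other` lies entirely inside this box.
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def react_to_position(frame: Box, center: Box, speaker: Box) -> str:
    """Return 'hold', 'track' or 'switch' for the speaker's bounding box."""
    if center.contains(speaker):
        return "hold"    # inside the set center region: no switch, no tracking
    if frame.contains(speaker):
        return "track"   # still in the picture: same camera re-centers the speaker
    return "switch"      # outside the output picture: hand over to the other camera
```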
In this embodiment, each time the speaker's position changes, the camera whose tracking flag is 0 is controlled to track the speaker at the new position. After that camera acquires a suitable video of the speaker, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0. This guarantees that at any moment one camera is shooting the current speaker while the other is available to shoot the next speaker. In other words, the two cameras cooperate and hand over seamlessly.

When the speaker's position changes, the camera needs some time to acquire video of the new speaker. During this time, the previous speaker's video continues to be output. The current speaker's video is output only after the camera has acquired it. In the prior art, the picture is first switched to the panorama of the site, and switched to the new speaker only once the camera has acquired that speaker's video. Compared with this, the present invention reduces the number of video switches, so consecutive pictures join up closely and the output video is smoother.

Moreover, when several people at the site speak in quick alternation, the prior-art camera dedicated to shooting the speaker captures a picture that includes all of them. If they are far apart, their expressions cannot be observed in the captured picture. In the present invention, the first and second cameras cooperate, so they can alternately capture each speaker's face even when speakers at the site alternate quickly.

In addition, the current speaker's video is output in full screen. Participants at the remote site can therefore see close-ups of the current speaker's face more clearly. These close-ups may contain important meeting information, so more valuable meeting information is retained.
FIG. 6 is a flowchart of another specific embodiment of the method for controlling video shooting according to the present invention.
As shown in FIG. 6, taking a camera as an example of the imaging device, the method for controlling video shooting provided by this specific embodiment includes the following steps:
S61: At the start of the meeting, control the two cameras to capture a panorama of the local site.
The two cameras are turned on at the start of the meeting, when no one at the local site has spoken yet. To convey the layout of the local site to the remote site, the two cameras can be controlled to capture a panorama of the local site. The user can set the shooting angle and size; a preferred setting includes all participants and the main meeting scene. When the panoramic video of the local site is output, the video captured by the camera whose tracking flag is 1 is preferably output first.
S62: Control the first camera to capture video of the first speaker using sound source localization combined with preset positions.
After the two cameras have captured a panorama of the site, a person at the site may start speaking, which means the first speaker has appeared. The first speaker's position information is then obtained by sound source localization. This estimate is combined with the preset positions, which are predefined places where a speaker is likely to be when speaking at the local site, to determine the first speaker's exact position. Specifically, the preset position closest to the position obtained by sound source localization can be selected as the exact position.

The controller then uses the first speaker's exact position to control the camera whose tracking flag is 0 to capture video of the first speaker. After that camera captures a suitable video of the first speaker, its tracking flag is set to 1 and the other camera's flag is changed from 1 to 0.
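The following sketch shows one way to snap the sound-source estimate to the closest preset position, as described for S62. The 2-D coordinates, the preset names, and the use of Euclidean distance are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) position in site coordinates


def nearest_preset(located: Point, presets: Dict[str, Point]) -> str:
    """Return the name of the preset position closest to the localized speaker."""
    return min(presets, key=lambda name: math.dist(located, presets[name]))


presets = {"seat1": (0.0, 0.0), "seat2": (2.0, 0.0), "seat3": (4.0, 0.0)}
print(nearest_preset((2.4, 0.3), presets))  # seat2
```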
S63: When the current speaker changes from the first speaker to a second speaker, control the second camera to capture video of the second speaker. Here, the second speaker is the next speaker, located at a position different from that of the first speaker.
After the first camera successfully captures video of the first speaker, the first camera's tracking flag becomes 1 and the second camera's tracking flag becomes 0. The speaker may then change from the first speaker to the second speaker, at a different position. As in step S62, the controller can then control the camera whose tracking flag is 0 (the second camera) to capture video of the second speaker. When that camera successfully captures video of the second speaker, its tracking flag is set to 1 and the other camera's flag is changed from 1 to 0.
S64: When further speaker changes occur, control the first camera and the second camera in turn to capture video of the current speaker alternately.
After the second camera successfully captures video of the second speaker, the second camera's tracking flag becomes 1 and the first camera's tracking flag becomes 0.

The speaker may then change from the second speaker to a third speaker. The camera whose tracking flag is 0 (the first camera) is then controlled to shoot the third speaker. Once it acquires a suitable video of the third speaker, its tracking flag is changed from 0 to 1. The other camera's (the second camera's) flag is changed from 1 to 0.

Similarly, the speaker may change from the third speaker to a fourth speaker (the speaker after the third speaker). The camera whose tracking flag is 0 (now the second camera) is then controlled to shoot the fourth speaker. Once it acquires a suitable video of the fourth speaker, its tracking flag is changed from 0 to 1. The other camera's (the first camera's) flag is changed from 1 to 0.

Later speaker changes are handled by alternate shooting in the same way.
S65: After the camera shooting the current speaker successfully acquires the current speaker's video, output the videos of the current speaker and of the previous speaker simultaneously as picture-in-picture. The picture-in-picture includes a first picture and a smaller second picture contained in the first picture. The current speaker is output in the first picture and the previous speaker in the second picture.
After the camera whose tracking flag is 0 successfully acquires the current speaker's video, its tracking flag is changed from 0 to 1. At this point, the camera whose tracking flag is 1 is shooting the current speaker, and the camera whose tracking flag is 0 is shooting the previous speaker.

Outputting both videos as picture-in-picture means the following:

- The current speaker is output in the first picture.
- The previous speaker is output in the second picture, which is contained in the first picture and smaller than it.

Participants at the remote site can then see not only the current speaker's facial expression but also how one party reacts to what the other says. These expressions may contain important meeting information, so as much of it as possible is retained.
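For the picture-in-picture layout of S65, the second picture has to be placed inside the first. This sketch computes an inset rectangle in the lower-right corner, as in FIG. 7. The `scale` and `margin` defaults are assumed values, not taken from the document.

```python
from typing import Tuple


def pip_rect(main_w: int, main_h: int, scale: float = 0.25,
             margin: int = 16) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of the second picture in the first picture's lower-right corner."""
    w, h = int(main_w * scale), int(main_h * scale)  # inset keeps the main aspect ratio
    return main_w - w - margin, main_h - h - margin, w, h


print(pip_rect(1920, 1080))  # (1424, 794, 480, 270)
```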
FIG. 7 shows three panels, from left to right:

1. At the start of the meeting, the panorama of the local site is output as picture-in-picture.
2. After the first speaker appears, the first speaker is output in the large picture (the first picture), and the panorama of the local site in the lower-right corner of the screen (the second picture).
3. After the speaker changes from the first speaker to the second speaker, the second speaker is output in the large picture and the first speaker in the lower-right corner of the screen.
S66: Before the camera shooting the current speaker successfully acquires the current speaker's video, output the two speakers who preceded the current speaker in the first picture and the second picture, respectively.
Note that step S66 is performed before step S65.
Between the speaker change and the moment the camera acquires the current speaker's video, the camera rotates or zooms, producing blurred or unstable pictures. To avoid outputting them, the two speakers who preceded the current speaker can be output in the first picture and the second picture, respectively.
For ease of understanding, the following description refers to FIGS. 8A and 8B. As shown in FIG. 8A, the three panels, from left to right, are called the first, second and third panels. In the first panel, the speaker in the lower-right corner (the second picture) preceded the speaker in the large picture (the first picture). The large-picture speaker of the first panel in turn preceded the large-picture speaker of the third panel. The speaker now changes from the large-picture speaker of the first panel to the large-picture speaker of the third panel.

Consider the interval from the speaker change until the camera acquires video of the third panel's large-picture speaker. If the frames captured while the camera rotates or zooms were output directly, a blurred or unstable picture would appear in the lower-right corner of the second panel. As shown in FIG. 8B, the specific embodiment of the present invention instead outputs two pictures during this interval:

- the live picture of the first panel's speaker (the large picture of the second panel);
- a frozen picture of the speaker before that one (the lower-right picture of the second panel).

This avoids outputting the blurred or unstable picture.
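The source routing in S65 and S66 can be sketched as a composer that tracks which camera feeds each picture. The sketch assumes the steady state described above, in which the free (flag-0) camera feeds the second picture. That camera's last frame is therefore frozen while it moves. The class and method names are hypothetical.

```python
from typing import Optional


class PipComposer:
    """Sketch of S65/S66: which camera feeds each picture, with a frozen inset during a move."""

    def __init__(self, main: str, inset: str) -> None:
        self.main = main        # first picture: current speaker's camera (flag 1)
        self.inset = inset      # second picture: previous speaker's camera (flag 0)
        self.frozen = False     # True while the inset shows a still frame
        self.pending: Optional[str] = None

    def on_speaker_change(self, camera: str) -> None:
        # The dispatched camera starts rotating/zooming. If it feeds the inset,
        # freeze its last frame; the first picture keeps the previous speaker live.
        self.pending = camera
        if camera == self.inset:
            self.frozen = True

    def on_acquired(self, camera: str) -> None:
        # New speaker goes to the first picture; the old first picture moves to the inset.
        if camera != self.pending:
            return
        self.main, self.inset = camera, self.main
        self.frozen = False
        self.pending = None
```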
Of course, depending on actual needs, the output mode shown in the second panel of FIG. 8A may also be used from the speaker change until the camera successfully acquires the current speaker's video.
In addition, depending on conditions at the local site, the following situations may arise when this embodiment is implemented. Each is handled as follows:
(1) No one is speaking at the local site
The combination of output pictures is unchanged; the panoramic picture of the local site continues to be output.
(2) A single person is speaking at the local site and no one interjects
The current speaker is output in the first picture and the previous speaker in the second picture; the picture combination is unchanged.
(3) A single person is speaking at the local site and someone interjects, but only briefly
The main speaker is output in the first picture. The second picture either stays as it is or shows the person interjecting; preferably it stays as it is.
(4) A single person is speaking at the local site and occasionally moves
If the speaker walks about, but the head or body stays within the current first picture and within its set center region, the camera neither switches nor tracks. The first picture shows the current speaker moving, while the second picture and the combination of output pictures are unchanged. If the speaker stays within the current first picture but is likely to leave, or has already left, its set center region, the camera does not switch. It may track as needed to keep the speaker in the set center region of the first picture; the second picture and the combination of output pictures are unchanged. If the speaker moves beyond the current first picture, the cameras are switched and the speaker is tracked. Once tracking succeeds, the speaker is output in the first picture, and the content of the first picture from before the camera switch moves to the second picture.
(5) The speaker at the local site changes once, to the person next to the previous speaker or to someone else
If the new speaker's position lies within the first picture from before the change and within its set center region, the camera neither switches nor tracks. The first picture shows the new speaker in the center region and the second picture is unchanged. If the new speaker's position lies within the earlier first picture but is likely to be outside, or is already outside, its set center region, the camera does not switch. It may track as needed to keep the new speaker in the center region of the first picture; the second picture is unchanged. If the new speaker's position lies beyond the earlier first picture, the cameras are switched and the new speaker is tracked. The first picture then outputs the new speaker and the second picture outputs the speaker from before the change.
(6) Several people at the local site speak at the same time, that is, they talk over one another
Such overlapping speech is usually brief, so the combination of output pictures is unchanged.
(7) Several people at the local site hold a discussion and speak in turn, so the speaker position changes many times
The cameras alternately track the speaker after each position change, and the combination of output pictures changes accordingly. After each change, the current speaker is output in the first picture and the previous speaker in the second picture.
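Cases (1), (3) and (6) in both lists leave the output unchanged. One way to express this is a gate placed in front of the switching logic. The `min_duration_s` threshold is a hypothetical tuning value; the document only says that interjections and overlapping speech are usually brief.

```python
def should_switch(candidate_duration_s: float, concurrent_speakers: int,
                  min_duration_s: float = 2.0) -> bool:
    """Decide whether a detected speaker change should change the output."""
    if concurrent_speakers == 0:
        return False  # case (1): nobody is speaking, keep the current picture
    if concurrent_speakers > 1:
        return False  # case (6): people talking over one another
    # Case (3) (brief interjection) vs. cases (5)/(7) (a real speaker change).
    return candidate_duration_s >= min_duration_s
```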
In this embodiment, each time the speaker's position changes, the camera whose tracking flag is 0 is controlled to track the speaker at the new position. After that camera acquires an appropriately sized video of the speaker, its tracking flag is changed from 0 to 1 and the other camera's flag from 1 to 0. This guarantees that at any moment one camera is shooting the current speaker while the other is idle and available to shoot the next speaker. In other words, the two cameras cooperate and hand over seamlessly.

When the speaker's position changes, the camera needs some time to acquire video of the new speaker. During this time, the previous speaker's video continues to be output. The current speaker's video is output only after the camera has acquired it. In the prior art, the picture is first switched to the panorama of the site, and switched to the new speaker only once the camera has acquired that speaker's video. Compared with this, the present invention reduces the number of video switches, so consecutive pictures join up closely and the output video is smoother.

Moreover, when several people at the site speak in quick alternation, the prior-art camera dedicated to shooting the speaker captures a picture that includes all of them. If they are far apart, their expressions cannot be observed in the captured picture. In the present invention, the first and second cameras cooperate, so they can alternately capture each speaker's face even when speakers at the site alternate quickly.

In addition, the videos of the current speaker and the previous speaker are output simultaneously as picture-in-picture. Participants at the remote site can therefore see a clear close-up of the current speaker's face. They can also see how the speaker changes at the local site and how one party reacts to the other, so more valuable meeting information is retained.
FIG. 9 is a flowchart of yet another specific embodiment of the method for controlling video shooting according to the present invention.
As shown in FIG. 9, taking a video camera as the camera device, the method for controlling video shooting provided by this specific embodiment includes the following steps:
S91. At the start of the conference, control the two cameras to capture a panoramic view of the conference site.
After the two cameras are switched on, that is, at the start of the conference, no one at the local site has spoken yet. To convey the layout of the local site to the remote site, the two cameras can be controlled to capture a panoramic view of the local site. The user can set the shooting angle and framing size; a preferred setting covers all participants and the main conference scene. When the panoramic video of the local site is output, the video from the camera whose tracking flag is 1 is preferably output first.
S92. Use sound source localization and image recognition to control the first camera to capture the first speaker's video.
After the two cameras have been set to capture the panorama, someone at the site starts speaking; that is, a first speaker appears. Sound source localization is used to obtain the first speaker's position, and the camera whose tracking flag is 0 is turned to a suitable angle. Image recognition is then used to determine the first speaker's exact position. The controller then directs the camera whose tracking flag is 0 to capture the first speaker's video according to that exact position. Once this camera has captured a suitable video of the first speaker, its tracking flag is set to 1 and the other camera's tracking flag is set from 1 to 0.
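The patent does not say which localization method S92 uses. One common way to get the coarse position is the time difference of arrival (TDOA) between two microphones, so the sketch below assumes that approach. It first computes a far-field azimuth. It then computes the extra pan needed to center a face that image recognition found at a given horizontal pixel position, using a pinhole-camera model. The microphone spacing, field of view and sign conventions are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C


def coarse_azimuth_deg(tdoa_s: float, mic_spacing_m: float) -> float:
    """Far-field source azimuth, in degrees from the array broadside,
    from the time difference of arrival between two microphones."""
    ratio = SPEED_OF_SOUND_M_S * tdoa_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push |ratio| above 1
    return math.degrees(math.asin(ratio))


def pan_correction_deg(face_center_x: float, image_width: int, hfov_deg: float) -> float:
    """Extra pan that would center a detected face (pinhole camera model)."""
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((face_center_x - image_width / 2) / focal_px))


# Hypothetical numbers: 0.2 m microphone spacing, 1920-px image, 60 degree HFOV.
coarse = coarse_azimuth_deg(tdoa_s=0.1 / 343.0, mic_spacing_m=0.2)  # about 30 degrees
fine = pan_correction_deg(face_center_x=1100, image_width=1920, hfov_deg=60)
print(round(coarse, 3), round(fine, 2))
```

The flag-0 camera would first turn by `coarse` and then by `fine`, measured in its own image after that turn. Microphone and camera positions would need a shared reference frame, which this sketch assumes away.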
S93. When the current speaker changes from the first speaker to a second speaker, control the second camera to capture the second speaker's video. The second speaker is the next speaker and is at a different position from the first speaker.
After the first camera has successfully captured the first speaker's video, the first camera's tracking flag becomes 1 and the second camera's tracking flag becomes 0. Suppose the speaker then changes from the first speaker to a second speaker at a different position. As in step S92, the controller can direct the camera whose tracking flag is 0 (the second camera) to capture the second speaker's video. Once that camera has captured a suitable video of the second speaker, its tracking flag is set to 1 and the other camera's tracking flag is set from 1 to 0.
S94. At each later change of speaker, control the first camera and the second camera in turn to alternately capture the current speaker's video.
After the second camera has successfully captured the second speaker's video, the second camera's tracking flag becomes 1 and the first camera's becomes 0.

If the speaker then changes from the second speaker to a third speaker, the camera whose tracking flag is 0 (the first camera) is directed to shoot the third speaker. Once it has captured a suitable video of the third speaker, its tracking flag is set from 0 to 1 and the other camera's (the second camera's) is set from 1 to 0.

Similarly, when the speaker changes from the third speaker to a fourth speaker, the camera whose tracking flag is 0 (the second camera) is directed to shoot the fourth speaker. Once it has captured a suitable video of the fourth speaker, its tracking flag is set from 0 to 1 and the other camera's (the first camera's) is set from 1 to 0.

Every later change of speaker is handled by alternating the cameras in the same way.
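The flag handover in S91 to S94 can be sketched as a small state machine. This is a minimal illustration with these assumptions:

- A tracking flag of 1 marks the camera on the current speaker, and 0 marks the camera free to chase the next one.
- The class and method names are hypothetical.
- Camera motion and the "suitable video" check are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    name: str
    tracking_flag: int          # 1 = shooting the current speaker, 0 = free for the next one
    target: str = "panorama"    # what the camera is framing (S91 starts on the panorama)


class DualCameraController:
    """Hypothetical two-camera controller following steps S91 to S94."""

    def __init__(self) -> None:
        # S91: both cameras show the panorama; the flag-1 camera's feed is output first.
        self.cameras = [Camera("camera1", 0), Camera("camera2", 1)]

    def idle_camera(self) -> Camera:
        return next(c for c in self.cameras if c.tracking_flag == 0)

    def active_camera(self) -> Camera:
        return next(c for c in self.cameras if c.tracking_flag == 1)

    def on_speaker_change(self, speaker: str) -> Camera:
        """Send the flag-0 camera to the new speaker. The output keeps showing
        the active camera until on_framed() is called."""
        cam = self.idle_camera()
        cam.target = speaker  # pan/tilt/zoom toward the speaker (not modeled)
        return cam

    def on_framed(self, cam: Camera) -> None:
        """Called once cam has a suitably sized shot: it becomes the active
        camera (flag 0 -> 1) and the other camera is released (flag 1 -> 0)."""
        for c in self.cameras:
            c.tracking_flag = 1 if c is cam else 0


controller = DualCameraController()
for speaker in ["A", "B", "C", "D"]:
    cam = controller.on_speaker_change(speaker)
    controller.on_framed(cam)
    print(speaker, cam.name)  # A camera1, B camera2, C camera1, D camera2
```

In a real system `on_framed` would be triggered by a check that the new shot is stable and well framed. Until then, the previous speaker stays on the output.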
S95. After the camera shooting the current speaker has successfully captured the current speaker's video, output the videos of the current speaker and the previous speaker simultaneously as a dual picture. The dual picture consists of two parts, neither contained in the other: one part shows the current speaker and the other part shows the previous speaker.
After the camera whose tracking flag is 0 has successfully captured the current speaker's video, its tracking flag is set from 0 to 1. At this point the camera whose tracking flag is 1 is shooting the current speaker, and the camera whose tracking flag is 0 is shooting the previous speaker. Outputting the two videos as a dual picture here means that the current speaker is shown in one picture and the previous speaker in the other, and neither picture contains the other. Participants at the remote site can then see the current speaker's facial expressions and also how one party reacts to the other's remarks. These expressions may carry important conference information, so as much of that information as possible is retained.
FIG. 10 shows three figures, from left to right:

- The first shows the panoramic view of the local site, output as a dual picture at the start of the conference.
- The second shows the state after the first speaker appears: the first speaker is output in the left picture and the panoramic view of the local site in the right picture.
- The third shows the state after the speaker changes from the first speaker to the second speaker: the second speaker is output in the right picture and the first speaker in the left picture.
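The pane assignment in FIG. 10 can be read as a simple rule. The pane still showing the previous speaker is kept, and the new speaker replaces the other pane, which holds older content. The sketch below is a hypothetical illustration of that reading, not the patent's stated algorithm. At start-up, when neither pane shows a previous speaker, it replaces the left pane, as in the second figure.

```python
from typing import List, Optional


def update_dual_screen(panes: List[str], previous: Optional[str], new_speaker: str) -> List[str]:
    """Return the new [left, right] panes once new_speaker has been framed.

    The pane still showing the previous speaker is kept; the other pane, which
    holds older content (the panorama or the speaker before that), receives the
    new speaker. If neither pane shows the previous speaker, the left pane is
    replaced.
    """
    updated = list(panes)
    idx = 1 if updated[0] == previous else 0
    updated[idx] = new_speaker
    return updated


panes = ["panorama", "panorama"]                 # first figure of FIG. 10
panes = update_dual_screen(panes, None, "S1")     # ['S1', 'panorama']
panes = update_dual_screen(panes, "S1", "S2")     # ['S1', 'S2']
panes = update_dual_screen(panes, "S2", "S3")     # ['S3', 'S2']
print(panes)
```

With a third speaker, this rule puts the new speaker in the pane that held the first speaker. That matches the dual-picture behavior of the processing unit described later.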
S96. Until the camera shooting the current speaker has successfully captured the current speaker's video, output the two speakers preceding the current speaker, one in each part of the dual picture.
Note that step S96 is performed before step S95.
From the moment the speaker changes until the camera has successfully captured the current speaker's video, the camera pans, tilts and zooms, which produces blurred or unstable pictures. Showing the two preceding speakers in the two parts of the dual picture during this interval avoids outputting those pictures.
FIG. 11A and FIG. 11B illustrate this by comparison. In FIG. 11A, the three figures are called, from left to right, the first, second and third figures.

In the first figure, the speaker in the right picture spoke just before the speaker in the left picture. The speaker in the left picture of the first figure in turn spoke just before the speaker in the right picture of the third figure. The speaker now changes from the one in the left picture of the first figure to the one in the right picture of the third figure.

Consider the interval between this change and the moment the camera has captured a suitable video of the new speaker. If the pictures shot while the camera is panning or zooming were output directly during that interval, a blurred or unstable picture would appear, as in the right picture of the second figure.

As shown in FIG. 11B, this specific embodiment instead outputs, during that interval, two pictures:

- the live picture of the speaker from the first figure (right picture of the second figure);
- a frozen picture of the speaker who preceded that speaker (left picture of the second figure).

This avoids outputting the blurred or unstable picture.
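The frozen picture in S96 and FIG. 11B amounts to holding a pane's last good frame while its camera is moving. The sketch below assumes the controller knows, on each tick, whether a camera has settled on its target. The `Pane` type and `pane_output` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Pane:
    camera: str
    last_good_frame: Optional[Any] = None  # latest frame taken while the camera was settled


def pane_output(pane: Pane, live_frame: Any, camera_settled: bool) -> Any:
    """Frame to display in this pane for the current tick.

    While the pane's camera is panning or zooming toward a new speaker, the
    last good frame is repeated (a frozen picture) instead of the blurred live
    feed; once the camera has settled, the live frame is shown and remembered.
    """
    if camera_settled:
        pane.last_good_frame = live_frame
        return live_frame
    return pane.last_good_frame


left = Pane("camera2")
shown = [
    pane_output(left, "W-frame", camera_settled=True),   # earlier speaker, live
    pane_output(left, "blur-1", camera_settled=False),   # camera moving: W frozen
    pane_output(left, "blur-2", camera_settled=False),
    pane_output(left, "Y-frame", camera_settled=True),   # new speaker framed
]
print(shown)  # ['W-frame', 'W-frame', 'W-frame', 'Y-frame']
```

The other pane, whose camera stays on the most recent speaker, keeps settled at all times and therefore stays live.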
Depending on actual needs, the output mode shown in the second figure of FIG. 7A may also be used between the change of speaker and the moment the camera successfully captures the current speaker's video.
In addition, depending on the situation at the local site, the following cases may arise when this specific embodiment is implemented. Each is handled as follows:
(1) No one is speaking at the local site
The combination of output pictures is unchanged, and the panoramic view of the local site is still output;
(2) A single person is speaking at the local site and no one interjects
One part of the picture shows the current speaker and the other part shows the previous speaker; the picture combination is unchanged;
(3) A single person is speaking at the local site and someone interjects briefly
One part of the picture shows the main speaker. The other part either stays unchanged or shows the person interjecting; preferably it stays unchanged;
(4) A single person is speaking at the local site and moves from time to time
If the speaker walks about, or the head or body shifts, but stays inside the current output picture and within its set central area, the camera neither switches nor tracks. The output picture combination is unchanged.

If the speaker stays inside the current output picture but is about to leave, or has left, the set central area, the camera does not switch. It may track as needed to keep the speaker within the central area, and the output picture combination is unchanged.

If the speaker moves outside the current output picture, the camera is switched to track the speaker;
(5) The speaker at the local site changes once, to the person next to them or to someone else
If the new speaker is inside the previous speaker's output picture and within its set central area, the camera neither switches nor tracks. The output picture is one in which the new speaker is in the central area.

If the new speaker is inside the previous speaker's output picture but is about to leave, or has left, its set central area, the camera does not switch. It may track as needed to keep the new speaker within the central area, and the output picture is one in which the new speaker is in the central area.

If the new speaker is outside the previous speaker's output picture, the camera is switched to track the new speaker;
(6) Several people at the local site speak at once, that is, they talk over each other
In this case the overlap usually lasts only a short time, and the combination of output pictures is unchanged;
(7) Several people at the local site hold a discussion and speak alternately, that is, the speaker's position changes many times
The cameras alternately track the speaker after each change of position, and the combination of output pictures changes accordingly. After each change, one part of the picture shows the current speaker and the other part shows the previous speaker.

In this specific embodiment, whenever the speaker's position changes, the camera whose tracking flag is 0 is controlled to track and shoot the speaker at the new position. Once that camera has captured a properly sized video of the speaker, its tracking flag is set from 0 to 1 and the other camera's tracking flag is set from 1 to 0. This guarantees that at any moment one camera is shooting the current speaker while the other camera is available to shoot the next speaker. In other words, the two cameras cooperate and hand over to each other seamlessly.

When the speaker's position changes, the camera needs some time to capture a usable video of the new speaker. During this period the previous speaker's video continues to be output, and the current speaker's video is output only after the camera has captured it. The prior art, by contrast, first switches the picture to a panoramic view of the site and switches to the new speaker only once the camera has captured that speaker's video. The present invention therefore reduces the number of video switches, so the pictures connect tightly and the output video is smoother.

Moreover, when several people at the site speak alternately in quick succession, the picture from the prior-art camera dedicated to the speaker includes several speakers. If they are far apart, their expressions cannot be made out. In the present invention, the first camera and the second camera cooperate, so they can alternately capture close-ups of the speakers' faces even when speakers alternate quickly.

In addition, the videos of the current speaker and the previous speaker are output as a dual picture. Participants at the remote site can see a clear close-up of the current speaker's face and also how one party at the local site reacts to the other's remarks. This suits conversations among several people, particularly between two. More valuable conference information is thus retained.
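Cases (3) and (6) above keep the layout unchanged when speech is brief or overlapping, while cases (5) and (7) react to a sustained new voice. The patent gives no numeric threshold for "brief". The sketch below therefore assumes a hypothetical minimum turn length of 2 seconds before a new voice counts as a speaker change.

```python
MIN_TURN_S = 2.0  # hypothetical threshold; the patent gives no value


def is_speaker_change(new_voice_duration_s: float, simultaneous_talkers: int,
                      min_turn_s: float = MIN_TURN_S) -> bool:
    """Decide whether a newly localized voice should trigger a speaker change.

    Case (6): while several people talk at once, keep the current layout.
    Case (3): a single interjection shorter than min_turn_s keeps the layout.
    Only a sustained single voice counts as a new speaker, as in cases (5) and (7).
    """
    if simultaneous_talkers > 1:
        return False
    return new_voice_duration_s >= min_turn_s


print(is_speaker_change(0.8, 1))  # False: brief interjection
print(is_speaker_change(3.0, 2))  # False: crosstalk
print(is_speaker_change(3.0, 1))  # True: sustained new speaker
```

A practical system would tune the threshold to the venue and might add hysteresis so that rapid back-and-forth does not cause flicker.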
Corresponding to the method for controlling video shooting provided by the embodiments of the present invention, an embodiment of the present invention further provides an apparatus for controlling video shooting. The apparatus may be implemented by any device with control and processing capability, for example a camera, a video controller or a video terminal. As shown in FIG. 12, the apparatus 12 for controlling video shooting provided by this embodiment includes:
A control unit 121, configured to:

- control a first camera device to capture the first speaker's video when the first speaker speaks;
- when the current speaker changes from the first speaker to a second speaker, control a second camera device to capture the second speaker's video, where the second speaker is the next speaker and is at a different position from the first speaker;
- at each later change of speaker, control the first camera device and the second camera device in turn to alternately capture the current speaker's video.
A processing unit 122, connected to the control unit 121 and configured to output the current speaker's video after it has been successfully captured.
Optionally, in one embodiment, the control unit 121 is further configured to control the first camera device and the second camera device to capture video of the entire conference site. This happens in an initial state, before the first camera device is controlled to capture the first speaker's video;
and the processing unit 122 is further configured to output the captured video.
Optionally, in another embodiment, the control unit 121 is further configured to set a tracking flag for each of the first camera device and the second camera device. The first camera device's tracking flag is initially a first tracking flag, and the second camera device's tracking flag is initially a second tracking flag.
The control unit 121 is specifically configured to do the following when the first speaker speaks. It controls the first camera device, which has the first tracking flag, to capture the first speaker's video. After that video has been successfully captured, it changes the first camera device's tracking flag from the first tracking flag to the second tracking flag. At the same time, it changes the second camera device's tracking flag from the second tracking flag to the first tracking flag.
The control unit 121 is specifically configured to do the following when the current speaker changes from the first speaker to the second speaker. It controls the second camera device, which now has the first tracking flag, to capture the second speaker's video. After that video has been successfully captured, it changes the second camera device's tracking flag from the first tracking flag to the second tracking flag. At the same time, it changes the first camera device's tracking flag from the second tracking flag to the first tracking flag.
The control unit 121 is specifically configured to do the following at each later change of speaker. It controls the camera device that has the first tracking flag to capture the current speaker's video. After that video has been successfully captured, it swaps the tracking flags of the first camera device and the second camera device.
Optionally, the control unit 121 is specifically configured to determine whether the second speaker's position is within the first speaker's output picture. If it is not, the control unit 121 controls the second camera device to capture the second speaker's video;
if the second speaker's position is within the first speaker's output picture, the control unit 121 further determines whether it is within a set area of that output picture. If it is, the control unit 121 controls the first camera device to capture the second speaker's video. If it is not, the control unit 121 controls the first camera device to track the second speaker so that the second speaker's position falls within the set area.
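The three-way decision above can be sketched with axis-aligned rectangles. The three outcomes are: outside the first speaker's output picture; inside the picture and inside the set area; inside the picture but outside the set area. Cases (4) and (5) of the method embodiment follow the same logic.

This sketch rests on several assumptions:

- Positions are single points in a common coordinate frame.
- The set area is a fixed rectangle.
- The `Rect` type and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def choose_action(new_pos: Tuple[float, float], output_picture: Rect, set_area: Rect) -> str:
    """Decide how to shoot the second speaker relative to the first speaker's shot."""
    if not output_picture.contains(new_pos):
        return "second camera shoots"     # outside the current picture: hand over
    if set_area.contains(new_pos):
        return "first camera keeps shot"  # already well framed
    return "first camera tracks"          # re-aim until inside the set area


picture = Rect(0, 0, 100, 100)
center = Rect(25, 25, 75, 75)  # hypothetical set area: the central half
print(choose_action((50, 50), picture, center))   # first camera keeps shot
print(choose_action((90, 50), picture, center))   # first camera tracks
print(choose_action((150, 50), picture, center))  # second camera shoots
```

Handing over only when the new speaker leaves the current picture avoids a camera switch for neighbors who are already in frame.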
Optionally, the control unit 121 may be specifically configured to use sound source localization to control a camera device to capture the speaker's video.
Further, the control unit 121 may be specifically configured to control a camera device to capture the speaker's video using sound source localization combined with preset positions or image recognition.
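The patent does not describe how sound source localization is combined with preset positions. One plausible approach, assumed here, is to snap the localized azimuth to the nearest stored preset and fall back to image recognition when no preset is close enough. In this sketch a preset is reduced to a single pan angle, although real presets usually also store tilt and zoom. The names and the 10 degree tolerance are hypothetical.

```python
from typing import Dict, Optional


def nearest_preset(azimuth_deg: float, presets: Dict[str, float],
                   max_error_deg: float = 10.0) -> Optional[str]:
    """Snap a localized azimuth to the closest stored preset.

    Returns None when there are no presets or none is within max_error_deg,
    in which case the controller could fall back to image recognition.
    """
    if not presets:
        return None
    name, angle = min(presets.items(), key=lambda item: abs(item[1] - azimuth_deg))
    return name if abs(angle - azimuth_deg) <= max_error_deg else None


seats = {"seat-1": -30.0, "seat-2": 0.0, "seat-3": 30.0}  # hypothetical pan angles
print(nearest_preset(27.0, seats))   # seat-3
print(nearest_preset(-14.0, seats))  # None (closest is seat-2, 14 degrees away)
```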
Note that the first camera device and the second camera device may be fixed together by a connecting device or may be independent of each other.
In this embodiment, when someone starts speaking, the control unit 121 controls one of the camera devices to capture the current speaker's video. The processing unit 122 outputs that video once it has been successfully captured. Meanwhile, the other camera device is on standby, ready to track and shoot the speaker who follows. When the speaker changes, the control unit 121 can immediately direct the standby camera device to capture the next speaker's video.

Capturing a suitable video of the new speaker after the speaker's position changes takes time. During this interval the picture sent to the remote site does not first switch to the site panorama; instead, the previous speaker's video continues to be output. This reduces the number of video switches, so the pictures connect tightly and the output video is smoother.

Moreover, because the control unit 121 directs the two camera devices to alternately capture the current speaker's video, they can alternately capture close-ups of the speakers' faces even when speakers alternate quickly. More valuable conference information is thus retained.
Optionally, in another embodiment of the present invention, the processing unit 122 may output the current speaker's video in full screen. The processing unit 122 is specifically configured to:

- before the current speaker's video has been successfully captured, output the previous speaker's video in full screen;
- after the current speaker's video has been successfully captured, set it to full-screen display and, once this is set, output it in full screen.
Outputting the current speaker's video in full screen gives participants at the remote site a clearer close-up of the current speaker's face. Such close-ups may carry important conference information, so more valuable conference information is retained.
Optionally, in yet another embodiment of the present invention, the processing unit 122 may output the videos of the current speaker and the previous speaker simultaneously as picture-in-picture.
The processing unit 122 is specifically configured to do the following after the current speaker's video has been successfully captured. It sets the videos of the current speaker and the previous speaker to be displayed as picture-in-picture. The picture-in-picture consists of a first picture and a smaller second picture that lies within the first. The current speaker is displayed in the first picture and the previous speaker in the second. Once this is set, the processing unit 122 outputs the two videos simultaneously as picture-in-picture.
The control unit 121 is further configured to control the first camera device to capture a third speaker's video when the current speaker changes from the second speaker to the third speaker. The third speaker is the next speaker and is at a different position from the second speaker.
The processing unit 122 is specifically configured to do one of the following before the third speaker's video has been successfully captured:

- output the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or
- output the second speaker in the first picture and, in the second picture, the third speaker while the camera is still acquiring the shot.

After the third speaker's video has been successfully captured, the processing unit 122 outputs the third speaker in the first picture and the second speaker in the second picture.
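The picture-in-picture behavior above can be sketched in two parts. The inset geometry keeps the second picture fully inside the first. The content function covers both pre-capture options and the post-capture state. The bottom-right placement, 25% scale and 16-pixel margin are assumptions, not values from the patent, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Source = Tuple[str, str]  # (speaker, mode); mode is "live", "frozen" or "moving"


def pip_inset(main_w: int, main_h: int, scale: float = 0.25,
              margin: int = 16) -> Tuple[int, int, int, int]:
    """(x, y, w, h) of the second picture, anchored bottom-right inside the first."""
    w, h = int(main_w * scale), int(main_h * scale)
    return main_w - w - margin, main_h - h - margin, w, h


def pip_contents(before_prev: str, prev: str, current: str, current_framed: bool,
                 show_moving: bool = False) -> Dict[str, Source]:
    """Sources for the first (main) and second (inset) pictures.

    before_prev, prev, current correspond to the first, second and third
    speakers in the text above.
    """
    if current_framed:
        return {"main": (current, "live"), "inset": (prev, "live")}
    if show_moving:  # second option: show the camera still acquiring the new speaker
        return {"main": (prev, "live"), "inset": (current, "moving")}
    return {"main": (prev, "live"), "inset": (before_prev, "frozen")}


print(pip_inset(1920, 1080))                  # (1424, 794, 480, 270)
print(pip_contents("S1", "S2", "S3", False))  # S2 live in main, S1 frozen in inset
print(pip_contents("S1", "S2", "S3", True))   # S3 live in main, S2 live in inset
```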
Outputting the videos of the current speaker and the previous speaker simultaneously as picture-in-picture gives participants at the remote site a clear close-up of the current speaker's face. They can also see changes of speaker at the local site and how one party reacts to the other's remarks, so more valuable conference information is retained.
Optionally, in still another embodiment of the present invention, the processing unit 122 may output the videos of the current speaker and the previous speaker simultaneously as a dual picture.
The processing unit 122 is specifically configured to do the following after the current speaker's video has been successfully captured. It sets the videos of the current speaker and the previous speaker to be displayed as a dual picture. The dual picture consists of two parts, neither contained in the other: one part displays the current speaker and the other displays the previous speaker. Once this is set, the processing unit 122 outputs the two videos simultaneously as a dual picture.
The control unit 121 is further configured to control the first camera device to capture a third speaker's video when the current speaker changes from the second speaker to the third speaker. The third speaker is the next speaker and is at a different position from the second speaker.
The processing unit 122 is specifically configured to do one of the following before the third speaker's video has been successfully captured:

- output a frozen picture of the first speaker in one part and the second speaker in the other part; or
- output, in the first part, the third speaker while the camera is still acquiring the shot, and the second speaker in the other part.

After the third speaker's video has been successfully captured, the processing unit 122 outputs the third speaker in the first part and the second speaker in the other part.
Outputting the videos of the current speaker and the previous speaker in dual-picture form lets participants at the remote site not only see a clear close-up of the current speaker's face but also observe how one party at the local site reacts to the other party's speech (suited to multi-person conversations, especially two-person conversations), so that more valuable meeting information is retained.
It should be noted that, in the foregoing embodiment of the apparatus for controlling video shooting, the units are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are merely for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Other embodiments of the apparatus for controlling video shooting of the present invention are described below with reference to FIG. 13A to FIG. 13G. As shown in FIG. 13A, the apparatus 13 for controlling video shooting provided by this embodiment of the present invention includes:
a controller 131, configured to: control a first camera module 132 to capture video of a first speaker when the first speaker speaks; when the current speaker changes from the first speaker to a second speaker, control a second camera module 133 to capture video of the second speaker, where the second speaker is the next speaker and is located at a position different from that of the first speaker; and, when further speaker changes occur, control the first camera module 132 and the second camera module 133 in turn to alternately capture video of the current speaker.
an output processor 134, connected to the first camera module 132 and the second camera module 133 and configured to output the video of the current speaker after the video of the current speaker is successfully acquired.
The output processor 134 may be integrated in the first camera module 132 or the second camera module 133, or may be separate from both.
Optionally, the controller 131 may be further configured to: in the initial state, before controlling the first camera module 132 to capture video of the first speaker, control the first camera module 132 and the second camera module 133 to capture video of the entire conference site;
the output processor 134 is further configured to output the captured video of the entire conference site.
The first camera module 132 and the second camera module 133 may be independent of each other, or may be connected and fixed together by a connecting device to form a dual-camera module. They may be integrated in the apparatus 13 for controlling video shooting or may be separate from it.
Optionally, in one embodiment, the controller 131 may be further configured to set a tracking flag for each of the first camera module 132 and the second camera module 133, where the tracking flag of the first camera module 132 is initially a first tracking flag and the tracking flag of the second camera module 133 is initially a second tracking flag.
The controller 131 is specifically configured to: when the first speaker speaks, control the first camera module 132, which holds the first tracking flag, to capture video of the first speaker; after the video of the first speaker is successfully acquired, set the tracking flag of the first camera module 132 from the first tracking flag to the second tracking flag, and at the same time set the tracking flag of the second camera module 133 from the second tracking flag to the first tracking flag.
The controller 131 is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera module 133, which now holds the first tracking flag, to capture video of the second speaker; after the video of the second speaker is successfully acquired, set the tracking flag of the second camera module 133 from the first tracking flag to the second tracking flag, and at the same time set the tracking flag of the first camera module 132 from the second tracking flag to the first tracking flag.
The controller 131 is specifically configured to: at each subsequent speaker change, control the camera module holding the first tracking flag to capture video of the current speaker and, after the video of the current speaker is successfully acquired, swap the tracking flags of the first camera module 132 and the second camera module 133.
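The flag-swapping rule above can be sketched as follows. This is a minimal illustration, not the controller's actual implementation: the class names, the string flag values and the `acquire` callback (standing in for "capture until suitable video is obtained") are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

FIRST_FLAG = "first"    # this module is the one sent to the next speaker
SECOND_FLAG = "second"  # this module is holding the shot currently on air


@dataclass
class CameraModule:
    name: str
    flag: str


class TrackingController:
    """Alternates two camera modules by swapping their tracking flags."""

    def __init__(self) -> None:
        self.cam_a = CameraModule("camera_132", FIRST_FLAG)
        self.cam_b = CameraModule("camera_133", SECOND_FLAG)

    def tracker(self) -> CameraModule:
        # The module holding the first tracking flag is the one to re-aim.
        return self.cam_a if self.cam_a.flag == FIRST_FLAG else self.cam_b

    def on_speaker_change(self, speaker: str,
                          acquire: Callable[[CameraModule, str], bool]) -> str:
        cam = self.tracker()
        if acquire(cam, speaker):
            # Swap flags only after the new speaker's video was acquired.
            self.cam_a.flag, self.cam_b.flag = self.cam_b.flag, self.cam_a.flag
        return cam.name


ctrl = TrackingController()
print([ctrl.on_speaker_change(s, lambda cam, spk: True) for s in ["A", "B", "C"]])
# ['camera_132', 'camera_133', 'camera_132']
```

Swapping only on success means that if acquisition fails, the same module is sent again at the next change, while the other module keeps its shot on air.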
As shown in FIG. 13B, optionally, the apparatus 13 for controlling video shooting provided by this embodiment of the present invention further includes:
a pickup microphone array 135 and a sound source locator 136, configured to obtain the position of a speaker by using sound source localization technology, where the sound source locator 136 performs localization based on the sound picked up by the pickup microphone array 135. The controller 131 controls a camera module to capture video of the speaker according to the position obtained by sound source localization.
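The specification does not fix a particular sound source localization algorithm. As one common possibility, assumed here and not taken from the source, a pair of microphones from the array can estimate a speaker's bearing from the time difference of arrival (TDOA): find the lag that maximizes the cross-correlation of the two signals, then apply the far-field relation sin(theta) = c * tau / d. The speed of sound, sampling rate and microphone spacing below are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum(x[n] * y[n + lag]).

    A positive result means the sound reached microphone x first.
    Brute-force correlation, kept simple for clarity.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_deg(delay_samples, fs, mic_spacing_m):
    """Far-field bearing from broadside: sin(theta) = c * tau / d.

    Positive angles point toward microphone x (the one that hears the sound first).
    """
    s = SPEED_OF_SOUND * (delay_samples / fs) / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp values pushed past +/-1 by noise or rounding
    return math.degrees(math.asin(s))
```

A production locator would more likely use an FFT-based generalized cross-correlation (such as GCC-PHAT) over more than two microphones; the brute-force loop is used here only to keep the idea visible.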
As shown in FIG. 13B, further, the apparatus 13 for controlling video shooting provided by this embodiment of the present invention further includes: an image locator 137, configured to locate a speaker by using image recognition technologies such as face detection, skin color detection, or lip motion detection; the controller 131 may be configured to control a camera module to capture video of the speaker according to the position information obtained by image recognition.
Optionally, the controller 131 controls a camera module to capture video of the speaker according to the position obtained by sound source localization together with preset position information.
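One way to combine a localized bearing with preset position information, offered only as an assumed illustration, is to snap the bearing to the nearest stored camera preset when it falls within a tolerance, and otherwise fall back to free pointing. The preset names, angles and tolerance below are hypothetical.

```python
def snap_to_preset(bearing_deg, presets, tolerance_deg=5.0):
    """Return (preset_name, pan_deg) nearest to the bearing, or None if none is close enough."""
    if not presets:
        return None
    name, pan = min(presets.items(), key=lambda item: abs(item[1] - bearing_deg))
    if abs(pan - bearing_deg) <= tolerance_deg:
        return name, pan
    return None  # caller points the camera at the raw bearing instead


# Hypothetical seat presets, as pan angles in degrees.
SEAT_PRESETS = {"seat_1": -30.0, "seat_2": 0.0, "seat_3": 30.0}
print(snap_to_preset(27.0, SEAT_PRESETS))  # ('seat_3', 30.0)
```

Snapping to a preset trades some precision for a stable, pre-framed shot, which may hide small localization errors.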
Optionally, the image locator 137 is specifically configured to determine whether the position of the second speaker is within the output picture of the first speaker; if it is not, the controller 131 controls the second camera module 133 to capture video of the second speaker;
if the position of the second speaker is within the output picture of the first speaker, the image locator 137 further determines whether it is within a set area of that output picture. If it is within the set area, the controller 131 controls the first camera module 132 to capture video of the second speaker; if it is not within the set area, the controller 131 controls the first camera module 132 to track the second speaker so that the position of the second speaker falls within the set area.
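The three-way decision above can be sketched as a pure function. This assumes, hypothetically, that the speaker position, the first speaker's output picture and the set area are all expressed in one common coordinate system (for example, normalized coordinates of a wide reference view); the rectangle values in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def plan_for_next_speaker(pos, output_picture: Rect, set_area: Rect) -> str:
    """Decide how the second speaker is captured, following the three cases above."""
    if not output_picture.contains(*pos):
        return "second_camera_captures"      # outside the first speaker's picture
    if set_area.contains(*pos):
        return "first_camera_captures"       # already well framed by the first camera
    return "first_camera_tracks_into_area"   # in the picture but off-center: re-frame


picture = Rect(0.0, 0.0, 1.0, 1.0)
area = Rect(0.25, 0.25, 0.5, 0.5)
print(plan_for_next_speaker((0.1, 0.5), picture, area))  # first_camera_tracks_into_area
```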
In this embodiment, when someone starts to speak, the controller 131 controls the first camera module 132 to capture video of the current speaker, and the output processor 134 acquires and outputs that video. Meanwhile, the second camera module 133 is in a standby state, ready to track the next speaker after the current speaker. When the speaker subsequently changes, the controller 131 can immediately direct the standby second camera module 133 to capture video of the next speaker. Because it takes time from the moment the speaker position changes until suitable video of the new speaker is acquired, in this embodiment the picture sent to the remote site during that interval does not first switch to a panorama of the conference site; instead, the video of the previous speaker continues to be output. This reduces the number of video switches, so the pictures connect closely and the output video is smoother. Moreover, because the controller 131 controls the two camera modules to alternately capture video of the current speaker, the two camera modules can alternately capture the speakers' faces even when speakers at the site take turns in quick succession, so more valuable meeting information is retained.
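The reduction in video switches can be checked with a simple count, under the simplifying assumption (not stated in the source) that a panorama-first strategy makes two cuts per speaker change (to the panorama, then to the new speaker) while holding the previous speaker makes one. The speaker sequence is hypothetical.

```python
def count_switches(speakers, via_panorama):
    """Count output cuts for a sequence of active speakers."""
    cuts_per_change = 2 if via_panorama else 1
    changes = sum(1 for prev, cur in zip(speakers, speakers[1:]) if cur != prev)
    return changes * cuts_per_change


sequence = ["A", "B", "A", "B", "C"]  # hypothetical rapid turn-taking
print(count_switches(sequence, via_panorama=True))   # 8
print(count_switches(sequence, via_panorama=False))  # 4
```

Under this assumption the hold-previous approach halves the number of cuts, and the saving grows with the number of speaker changes.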
Optionally, in another embodiment of the present invention, the output processor 134 may output the video of the current speaker in full screen. The output processor 134 is specifically configured to: before the video of the current speaker is successfully acquired, output the video of the previous speaker in full screen; after it is successfully acquired, set the video of the current speaker to full-screen display and, once the setting is complete, output the video of the current speaker in full screen.
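The full-screen rule can be illustrated frame by frame. This sketch assumes, hypothetically, that a new speaker's video counts as acquired a fixed number of frames after the change, and that the panorama is on air before the first speaker is acquired (as in the initial state described elsewhere in this specification).

```python
def full_screen_timeline(changes, settle_frames, total_frames, initial="panorama"):
    """Speaker shown full screen in each frame.

    changes: {frame_index: new_speaker}; a new speaker's video counts as
    acquired `settle_frames` frames after the change (an assumed constant).
    """
    shown = initial
    pending = None  # (speaker, frame at which acquisition completes)
    timeline = []
    for frame in range(total_frames):
        if frame in changes:
            pending = (changes[frame], frame + settle_frames)
        if pending is not None and frame >= pending[1]:
            shown, pending = pending[0], None
        timeline.append(shown)
    return timeline


print(full_screen_timeline({0: "A", 4: "B"}, settle_frames=2, total_frames=8))
# ['panorama', 'panorama', 'A', 'A', 'A', 'A', 'B', 'B']
```

Frames 4 and 5 show that speaker A stays on air while B's video is still being acquired.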
Outputting the video of the current speaker in full screen allows participants at the remote site to see close-ups of the current speaker's face more clearly. These close-ups may carry important meeting information, so more valuable meeting information is retained.
Optionally, in still another embodiment of the present invention, the output processor 134 may simultaneously output the videos of the current speaker and the previous speaker in picture-in-picture form.
The output processor 134 is specifically configured to: after the video of the current speaker is successfully acquired, set the videos of the current speaker and the previous speaker to be displayed as a picture-in-picture, where the picture-in-picture consists of a first picture and a second picture that is contained in, and smaller than, the first picture; the current speaker is displayed in the first picture and the previous speaker in the second picture. Once the setting is complete, the videos of the current speaker and the previous speaker are output simultaneously in picture-in-picture form.
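The geometric requirement that the second picture lie inside, and be smaller than, the first picture can be met by placing a scaled inset in a corner. The bottom-right placement, scale factor and margin below are illustrative assumptions, not values from the source.

```python
def pip_inset(frame_w, frame_h, scale=0.25, margin=16):
    """Rectangle (x, y, w, h) of the second picture, in the bottom-right corner of the first."""
    if not 0 < scale < 1:
        raise ValueError("scale must be between 0 and 1")
    w, h = int(frame_w * scale), int(frame_h * scale)
    x, y = frame_w - w - margin, frame_h - h - margin
    if x < 0 or y < 0:
        raise ValueError("margin too large for this frame")
    return x, y, w, h


print(pip_inset(1920, 1080))  # (1424, 794, 480, 270)
```

Scaling both dimensions by the same factor keeps the inset at the aspect ratio of the first picture.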
The controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to capture video of the third speaker, where the third speaker is the next speaker and is located at a position different from that of the second speaker.
The output processor 134 is specifically configured to: before the video of the third speaker is successfully acquired, either output the second speaker in the first picture and a frozen frame of the first speaker in the second picture, or output the second speaker in the first picture and, in the second picture, the third speaker as captured while shooting has started but acquisition has not yet succeeded; after the video of the third speaker is successfully acquired, output the third speaker in the first picture and the second speaker in the second picture.
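The frozen-frame option can be sketched as a wrapper around the feed of the camera module being re-aimed (here the first camera module 132): while it is moving, the wrapper returns the last frame captured while it was settled, or optionally the live in-progress frame. The class name and the `settled` signal are hypothetical; in the picture-in-picture case this feed goes to the second picture while pending and is moved to the first picture once acquisition succeeds.

```python
class ReaimingFeed:
    """Output of a camera that is being re-aimed at a new speaker.

    While the camera is not settled, it returns either the last frame captured
    while settled (a frozen frame) or, optionally, the live in-progress frame.
    """

    def __init__(self, show_live_while_moving=False):
        self.show_live_while_moving = show_live_while_moving
        self.last_settled_frame = None

    def frame(self, live_frame, settled):
        if settled:
            self.last_settled_frame = live_frame
            return live_frame
        if self.show_live_while_moving:
            return live_frame
        return self.last_settled_frame


feed = ReaimingFeed()
print(feed.frame("first_t0", settled=True))    # first_t0
print(feed.frame("moving_t1", settled=False))  # first_t0 (frozen)
```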
Outputting the videos of the current speaker and the previous speaker simultaneously in picture-in-picture form lets participants at the remote site see a clear close-up of the current speaker's face while also seeing speaker changes at the local site and how one party reacts to the other's speech, so more valuable meeting information is retained.
Optionally, in yet another embodiment of the present invention, the output processor 134 may simultaneously output the videos of the current speaker and the previous speaker in dual-picture form.
The output processor 134 is specifically configured to: after the video of the current speaker is successfully acquired, set the videos of the current speaker and the previous speaker to be displayed as a dual picture, where the dual picture consists of two mutually non-overlapping partial pictures, one showing the current speaker and the other showing the previous speaker. Once the setting is complete, the videos of the current speaker and the previous speaker are output simultaneously in dual-picture form.
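A side-by-side split is one simple way, assumed here rather than specified by the source, to obtain two mutually non-overlapping partial pictures. The optional gap and the overlap check are illustrative helpers.

```python
def dual_picture_regions(frame_w, frame_h, gap=0):
    """Split the output frame into two side-by-side, non-overlapping regions (x, y, w, h)."""
    left_w = (frame_w - gap) // 2
    left = (0, 0, left_w, frame_h)
    right = (left_w + gap, 0, frame_w - left_w - gap, frame_h)
    return left, right


def overlaps(a, b):
    """True if two (x, y, w, h) rectangles share any interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


print(dual_picture_regions(1920, 1080))  # ((0, 0, 960, 1080), (960, 0, 960, 1080))
```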
The controller 131 is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera module 132 to capture video of the third speaker, where the third speaker is the next speaker and is located at a position different from that of the second speaker.
The output processor 134 is specifically configured to: before the video of the third speaker is successfully acquired, either output a frozen frame of the first speaker in the one partial picture and the second speaker in the other partial picture, or output, in the one partial picture, the third speaker as captured while shooting has started but acquisition has not yet succeeded, and the second speaker in the other partial picture; after the video of the third speaker is successfully acquired, output the third speaker in the one partial picture and the second speaker in the other partial picture.
Outputting the videos of the current speaker and the previous speaker in dual-picture form lets participants at the remote site not only see a clear close-up of the current speaker's face but also observe how one party at the local site reacts to the other party's speech, so more valuable meeting information is retained.
The apparatus 13 for controlling video shooting provided by the embodiments of the present invention is described below through a specific, complete embodiment with reference to the accompanying drawings. As shown in FIG. 13G, the apparatus 13 for controlling video shooting includes:
a controller 131; a first camera module 132, whose initial tracking flag is set to 0; a second camera module 133, whose initial tracking flag is set to 1; an output processor 134; a pickup microphone array 135; a sound source locator 136; an image locator 137; a main control module 138; a video module 139; a video signal processor 140; an audio module 141; an audio signal processor 142; a pickup microphone 143; a loudspeaker 144; and a display 145. These components may be integrated into one complete device or may be separate parts that work together under the control of the controller 131 and the main control module 138.
After the apparatus 13 for controlling video shooting is turned on, that is, when the conference starts and nobody at the local site is speaking yet, the controller 131 may control the two camera modules to capture a panorama of the site so that the layout of the local site is conveyed to the remote site. After the camera modules capture video of the local site, preferably the video signal processor 140 in the video module 139 performs processing such as encoding and decoding on the video captured by the second camera module 133, and, under the control of the main control module 138, the video is transmitted over the network to the remote site.
When a person at the local site starts to speak, that is, when the first speaker appears, the pickup microphone array 135 picks up the sound of the local site and sends it to the sound source locator 136. On the way, the sound may first be processed (for example, denoised) by an internal module of the audio module 141, such as a module with a preprocessing function. The sound source locator 136 generates position information through sound source localization; the controller 131 obtains this position information and controls the first camera module 132 (that is, the camera module whose tracking flag is 0) to rotate to a suitable angle and roughly capture video of the first speaker. Then, based on the video of the first speaker acquired by the first camera module 132, the image locator 137 uses image recognition technology to determine the exact position of the first speaker, including the position of the face. Under the control of the controller 131, the first camera module 132 (the camera module whose tracking flag is 0) pans and zooms its lens to capture suitable video of the first speaker. After the first camera module 132 successfully captures video of the first speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the second camera module 133 is changed from 1 to 0.
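The coarse-to-fine pointing step above (a rough angle from sound source localization, then a correction from the face position found by image recognition) might look like the following under a pinhole-camera assumption. The horizontal field of view, the target face size and the sign convention (positive pans right) are hypothetical parameters, not values from the source.

```python
import math


def refine_pan(coarse_pan_deg, face_center_x, image_w, hfov_deg):
    """Correct a coarse pan angle using the detected face's horizontal position.

    Pinhole-camera assumption: a pixel offset from the image center maps to an
    angle atan(offset / focal_length_px). Positive angles pan to the right.
    """
    focal_px = (image_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    offset_px = face_center_x - image_w / 2
    return coarse_pan_deg + math.degrees(math.atan(offset_px / focal_px))


def zoom_factor(face_w_px, image_w, target_fraction=0.3):
    """Zoom needed for the face to span `target_fraction` of the image width."""
    return (target_fraction * image_w) / face_w_px
```

For example, with a 60-degree field of view and a 1920-pixel-wide image, a face centered at the right edge yields a correction of +30 degrees, and a face 192 pixels wide needs a zoom of about 3x to fill 30% of the width.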
After the first camera module 132 has successfully captured video of the first speaker, if the speaker changes, that is, from the first speaker to the second speaker, the controller 131 may control the camera module whose tracking flag is 0 (that is, the second camera module 133) to capture video of the second speaker, using the same control method as above. After the second camera module 133 captures suitable video of the second speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the first camera module 132 is changed from 1 to 0.
In this manner, at every speaker change the controller 131 controls the camera module whose tracking flag is 0 (which may be either the first camera module 132 or the second camera module 133) to track and capture the new speaker; after that camera module successfully captures suitable video of the speaker, its tracking flag is changed from 0 to 1, and the tracking flag of the other camera module is changed from 1 to 0.
After a camera module successfully captures video of a speaker, the output processor 134 obtains that video from the camera module. The output processor 134 may then set the output mode and output the speaker's video in full-screen, picture-in-picture, or dual-picture mode.
After setting the output mode, the output processor 134 sends the speaker's video to the video signal processor 140, which performs processing such as encoding on it. Then, under the control of the main control module 138, the video is transmitted from the video signal processor 140 over the network to the remote site.
Further, before a camera module successfully acquires video of the current speaker, the main control module 138 may control the output processor 134 to output the video of the previous speaker of the current speaker.
In addition, the audio signal processor 142 performs processing such as encoding on the voices of local speakers picked up by the pickup microphone 143. Note that the sound picked up by the pickup microphone 143 serves a different purpose from the sound picked up by the pickup microphone array 135: the former is transmitted to the remote site together with the video captured by the camera modules, while the latter is used for sound source localization. The loudspeaker 144 and the display 145 are basic components of the apparatus 13 and output audio and video, respectively, at the local site.
Each embodiment in this specification is described with a focus on its own particular features; for identical or similar parts, the embodiments may refer to one another, and each embodiment emphasizes its differences from the others. In particular, the apparatus embodiments are basically similar to the method embodiments and are therefore described briefly; for related details, refer to the corresponding descriptions of the method embodiments.
It should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
A person of ordinary skill in the art will understand that aspects of the present invention, or possible implementations of those aspects, may be embodied as a system, a method, or a computer program product. Accordingly, aspects of the present invention, or possible implementations of those aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and the like), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as a "circuit", "module", or "system". In addition, aspects of the present invention, or possible implementations of those aspects, may take the form of a computer program product, that is, computer-readable program code stored in a computer-readable medium.
The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable compact disc read-only memory (CD-ROM).
A processor in a computer reads the computer-readable program code stored in the computer-readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowcharts, and an apparatus is produced that implements the functional actions specified in each block, or combination of blocks, of the block diagrams.
The computer-readable program code may be executed entirely on a user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. It should also be noted that, in some alternative implementations, the functions noted in the steps of the flowcharts or the blocks of the block diagrams may occur out of the order noted in the figures. For example, two steps or two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functions involved.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims
1. A method for controlling video shooting, characterized by comprising:
when a first speaker speaks, controlling a first camera device to capture video of the first speaker;
when the current speaker changes from the first speaker to a second speaker, controlling a second camera device to capture video of the second speaker, wherein the second speaker is the next speaker and is located at a position different from that of the first speaker;
when speaker changes subsequently occur, controlling the first camera device and the second camera device in turn to alternately capture video of the current speaker; and
after the video of the current speaker is successfully acquired, outputting the video of the current speaker.
2. The method according to claim 1, wherein outputting the video of the current speaker comprises: outputting the video of the current speaker in full screen.
3. The method according to claim 2, wherein outputting the video of the current speaker in full screen comprises:
before the video of the current speaker is successfully acquired, outputting the video of the previous speaker of the current speaker in full screen; and
在成功获取所述当前讲话者的视频之后,全屏输出所述当前讲话者的视频。 After successfully acquiring the video of the current speaker, the video of the current speaker is output in full screen.
4. The method according to claim 1, wherein outputting the video of the current speaker comprises: simultaneously outputting, in picture-in-picture form, the videos of the current speaker and of the speaker preceding the current speaker;
wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current speaker being output in the first picture and the speaker preceding the current speaker being output in the second picture.
5. The method according to claim 4, further comprising: when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to capture video of the third speaker, wherein the third speaker is the next speaker, at a position different from that of the second speaker;
wherein simultaneously outputting, in picture-in-picture form, the videos of the current speaker and of the speaker preceding the current speaker comprises:
before the video of the third speaker is successfully acquired: outputting the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or outputting the second speaker in the first picture and, in the second picture, the third speaker whose shooting has started but whose video has not yet been successfully acquired; and
after the video of the third speaker is successfully acquired: outputting the third speaker in the first picture and the second speaker in the second picture.
6. The method according to claim 1, wherein outputting the video of the current speaker comprises: simultaneously outputting, in dual-picture form, the videos of the current speaker and of the speaker preceding the current speaker;
wherein the dual picture comprises two partial pictures, neither of which contains the other, one partial picture outputting the current speaker and the other partial picture outputting the speaker preceding the current speaker.
7. The method according to claim 6, further comprising:
when the current speaker changes from the second speaker to a third speaker, controlling the first camera device to capture video of the third speaker, wherein the third speaker is the next speaker, at a position different from that of the second speaker;
wherein simultaneously outputting, in dual-picture form, the videos of the current speaker and of the speaker preceding the current speaker comprises:
before the video of the third speaker is successfully acquired: outputting a frozen picture of the first speaker in the one partial picture and the second speaker in the other partial picture; or outputting, in the one partial picture, the third speaker whose shooting has started but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully acquired: outputting the third speaker in the one partial picture and the second speaker in the other partial picture.
8. The method according to claim 1, wherein before controlling the first camera device to capture the video of the first speaker, the method further comprises:
in an initial state, controlling the first camera device and the second camera device to capture video of the entire venue, and outputting the captured video.
9. The method according to any one of claims 1 to 8, wherein before controlling the first camera device to capture the video of the first speaker, the method further comprises:
setting a tracking flag for each of the first camera device and the second camera device, wherein the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;
wherein controlling the first camera device to capture the video of the first speaker when the first speaker speaks comprises: when the first speaker speaks, controlling the first camera device, which has the first tracking flag, to capture the video of the first speaker, and after the video of the first speaker is successfully acquired, setting the tracking flag of the first camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the second camera device from the second tracking flag to the first tracking flag; and
wherein controlling the second camera device to capture the video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises: when the current speaker changes from the first speaker to the second speaker, controlling the second camera device, which has the first tracking flag, to capture the video of the second speaker, and after the video of the second speaker is successfully acquired, setting the tracking flag of the second camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
10. The method according to claim 9, wherein
controlling the first camera device and the second camera device in turn to alternately capture the video of the current speaker when the speaker subsequently changes again comprises: each time the speaker subsequently changes, controlling the camera device having the first tracking flag to capture the video of the current speaker, and after the video of the current speaker is successfully acquired, exchanging the tracking flags of the first camera device and the second camera device.
11. The method according to claim 10, wherein controlling a camera device to capture the video of a speaker comprises:
controlling the camera device to capture the video of the speaker by using a sound source localization technology.
12. The method according to claim 11, wherein controlling the camera device to capture the video of the speaker by using the sound source localization technology comprises:
controlling the camera device to capture the video of the speaker by using the sound source localization technology in combination with a preset position or an image recognition technology.
13. The method according to any one of claims 1 to 12, wherein controlling the second camera device to capture the video of the second speaker when the current speaker changes from the first speaker to the second speaker comprises:
determining whether the position of the second speaker is in the output picture of the first speaker;
if the position of the second speaker is not in the output picture of the first speaker, controlling the second camera device to capture the video of the second speaker;
if the position of the second speaker is in the output picture of the first speaker, further determining whether the position of the second speaker is within a set area of the output picture of the first speaker;
if the position of the second speaker is within the set area, controlling the first camera device to capture the video of the second speaker; and
if the position of the second speaker is not within the set area, controlling the first camera device to track and shoot the second speaker so that the position of the second speaker is within the set area.
14. An apparatus for controlling video shooting, comprising:
a control unit, configured to control a first camera device to capture video of a first speaker when the first speaker speaks;
the control unit being further configured to, when the current speaker changes from the first speaker to a second speaker, control a second camera device to capture video of the second speaker, wherein the second speaker is the next speaker, at a position different from that of the first speaker;
the control unit being further configured to, when the speaker subsequently changes again, control the first camera device and the second camera device in turn to alternately capture video of the current speaker; and
a processing unit, connected to the control unit and configured to output the video of the current speaker after the video of the current speaker is successfully acquired.
15. The apparatus according to claim 14, wherein the processing unit is specifically configured to: set the video of the current speaker to be displayed in full screen; and
output the video of the current speaker in full screen.
16. The apparatus according to claim 15, wherein the processing unit is specifically configured to:
before the video of the current speaker is successfully acquired, output, in full screen, the video of the speaker preceding the current speaker; and
after the video of the current speaker is successfully acquired, output the video of the current speaker in full screen.
17. The apparatus according to claim 14, wherein the processing unit is specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in picture-in-picture form,
wherein the picture-in-picture comprises a first picture and a second picture that is contained in the first picture and is smaller than the first picture, the current speaker being displayed in the first picture and the speaker preceding the current speaker being displayed in the second picture; and
simultaneously output, in picture-in-picture form, the videos of the current speaker and of the speaker preceding the current speaker.
18. The apparatus according to claim 15, wherein the control unit is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to capture video of the third speaker, wherein the third speaker is the next speaker, at a position different from that of the second speaker;
and the processing unit is specifically configured to:
before the video of the third speaker is successfully acquired: output the second speaker in the first picture and a frozen picture of the first speaker in the second picture; or output the second speaker in the first picture and, in the second picture, the third speaker whose shooting has started but whose video has not yet been successfully acquired; and
after the video of the third speaker is successfully acquired: output the third speaker in the first picture and the second speaker in the second picture.
19. The apparatus according to claim 14, wherein the processing unit is specifically configured to:
set the video of the current speaker and the video of the speaker preceding the current speaker to be displayed in dual-picture form,
wherein the dual picture comprises two partial pictures, neither of which contains the other, one partial picture displaying the current speaker and the other partial picture displaying the speaker preceding the current speaker; and
simultaneously output, in dual-picture form, the videos of the current speaker and of the speaker preceding the current speaker.
20. The apparatus according to claim 19, wherein the control unit is further configured to: when the current speaker changes from the second speaker to a third speaker, control the first camera device to capture video of the third speaker, wherein the third speaker is the next speaker, at a position different from that of the second speaker;
and the processing unit is specifically configured to:
before the video of the third speaker is successfully acquired: output a frozen picture of the first speaker in the one partial picture and the second speaker in the other partial picture; or output, in the one partial picture, the third speaker whose shooting has started but whose video has not yet been successfully acquired, and the second speaker in the other partial picture; and
after the video of the third speaker is successfully acquired: output the third speaker in the one partial picture and the second speaker in the other partial picture.
21. The apparatus according to claim 14, wherein before the first camera device is controlled to capture the video of the first speaker, the control unit is further configured to:
in an initial state, control the first camera device and the second camera device to capture video of the entire venue;
and the processing unit is further configured to output the video of the entire venue captured under the control of the control unit.
22. The apparatus according to any one of claims 14 to 21, wherein the control unit is further configured to:
set a tracking flag for each of the first camera device and the second camera device, wherein the tracking flag of the first camera device is initially a first tracking flag and the tracking flag of the second camera device is initially a second tracking flag;
the control unit is specifically configured to: when the first speaker speaks, control the first camera device, which has the first tracking flag, to capture the video of the first speaker, and after the video of the first speaker is successfully acquired, set the tracking flag of the first camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the second camera device from the second tracking flag to the first tracking flag; and
the control unit is specifically configured to: when the current speaker changes from the first speaker to the second speaker, control the second camera device, which has the first tracking flag, to capture the video of the second speaker, and after the video of the second speaker is successfully acquired, set the tracking flag of the second camera device from the first tracking flag to the second tracking flag while setting the tracking flag of the first camera device from the second tracking flag to the first tracking flag.
23. The apparatus according to claim 22, wherein the control unit is specifically configured to: each time the speaker subsequently changes, control the camera device having the first tracking flag to capture the video of the current speaker, and after the video of the current speaker is successfully acquired, exchange the tracking flags of the first camera device and the second camera device.
24. The apparatus according to claim 23, wherein the control unit is specifically configured to control a camera device to capture the video of a speaker by using a sound source localization technology.
25. The apparatus according to claim 24, wherein the control unit is specifically configured to control the camera device to capture the video of the speaker by using the sound source localization technology in combination with a preset position or an image recognition technology.
26. The apparatus according to any one of claims 14 to 25, wherein the control unit is specifically configured to:
determine whether the position of the second speaker is in the output picture of the first speaker;
if the position of the second speaker is not in the output picture of the first speaker, control the second camera device to capture the video of the second speaker;
if the position of the second speaker is in the output picture of the first speaker, further determine whether the position of the second speaker is within a set area of the output picture of the first speaker;
if the position of the second speaker is within the set area, control the first camera device to capture the video of the second speaker; and
if the position of the second speaker is not within the set area, control the first camera device to track and shoot the second speaker so that the position of the second speaker is within the set area.
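The camera-switching logic of claims 1, 3, 5, 8, 9, 10 and 13 (and their apparatus counterparts) can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Camera` and `AlternatingController` classes, the square close-up framing, the 10 x 10 whole-venue view and the 25% set-area margin are all hypothetical, and speaker positions are assumed to be supplied by an upstream sound source localization step that is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Frame = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in room coordinates
Point = Tuple[float, float]


@dataclass
class Camera:
    """Hypothetical stand-in for a PTZ camera; `frame` is the region it shows."""
    name: str
    tracking_flag: int  # 1 = "first tracking flag", 2 = "second tracking flag"
    frame: Frame = (0.0, 0.0, 10.0, 10.0)  # assumed whole-venue view (claim 8)

    def aim_at(self, pos: Point, half: float = 1.0) -> None:
        # Assumption: a close-up is a square of side 2*half centred on the speaker.
        x, y = pos
        self.frame = (x - half, y - half, x + half, y + half)


def inside(frame: Frame, pos: Point) -> bool:
    x0, y0, x1, y1 = frame
    return x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1


def set_area(frame: Frame, margin: float = 0.25) -> Frame:
    # Hypothetical "set area": the output picture shrunk by `margin` on each side.
    x0, y0, x1, y1 = frame
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin
    return (x0 + mx, y0 + my, x1 - mx, y1 - my)


class AlternatingController:
    def __init__(self) -> None:
        self.cams = [Camera("A", 1), Camera("B", 2)]
        self.live: Optional[Camera] = None      # camera on screen (None = venue view)
        self.previous: Optional[Camera] = None  # previous speaker's camera (inset)
        self.pending: Optional[Camera] = None   # camera still acquiring the new speaker

    def on_speaker_change(self, pos: Point) -> str:
        """Choose the camera for a new speaker at `pos` (claims 1, 10 and 13)."""
        live = self.live
        if live is not None and inside(live.frame, pos):
            # New speaker is already in the output picture: keep the same camera,
            # re-framing it only if the speaker lies outside the set area.
            if not inside(set_area(live.frame), pos):
                live.aim_at(pos)
            return live.name
        # Otherwise the camera holding the first tracking flag goes to the speaker.
        cam = next(c for c in self.cams if c.tracking_flag == 1)
        cam.aim_at(pos)
        self.pending = cam
        return cam.name

    def on_acquired(self) -> None:
        """The pending camera has a usable picture: switch output, swap flags."""
        if self.pending is None:
            return
        self.previous, self.live, self.pending = self.live, self.pending, None
        for c in self.cams:
            c.tracking_flag = 2 if c.tracking_flag == 1 else 1

    def full_screen(self) -> str:
        # Claim 3: keep showing the previous speaker until the new video is acquired.
        return self.live.name if self.live else "venue"

    def pip(self) -> Tuple[str, Optional[str], bool]:
        # Claim 5, first alternative: (main picture, inset picture, inset frozen?).
        # The inset is frozen while its camera is moving to the next speaker.
        inset = self.previous.name if self.previous else None
        frozen = self.pending is not None and self.pending is self.previous
        return self.full_screen(), inset, frozen
```

For example, after speakers at (2, 2) and (8, 8) have each been acquired, a third speaker back at (2, 2) is assigned to camera A while `pip()` reports `("B", "A", True)`: the second speaker stays in the main picture and the first speaker's picture is frozen in the inset. The claims do not say whether the tracking flags should swap when the live camera merely re-frames (the set-area branch of claim 13); this sketch leaves them unchanged, which is an assumption.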
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310566974.1 | 2013-11-14 | ||
CN201310566974.1A CN103595953B (en) | 2013-11-14 | 2013-11-14 | A kind of method and apparatus for controlling video capture |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015070558A1 true WO2015070558A1 (en) | 2015-05-21 |
Family ID: 50085919
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/074831 WO2015070558A1 (en) | 2013-11-14 | 2014-04-04 | Video shooting control method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103595953B (en) |
WO (1) | WO2015070558A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103595953B (en) * | 2013-11-14 | 2017-06-20 | 华为技术有限公司 | A kind of method and apparatus for controlling video capture |
US9686467B2 (en) * | 2014-08-15 | 2017-06-20 | Sony Corporation | Panoramic video |
CN104486549B (en) * | 2014-12-29 | 2017-07-25 | 中国科学院长春光学精密机械与物理研究所 | A kind of high flux image pickup method for being used to be imaged flow cytometer |
CN105049807B (en) * | 2015-07-31 | 2018-05-18 | 小米科技有限责任公司 | Monitored picture sound collection method and device |
CN105245820B (en) * | 2015-10-12 | 2019-04-02 | 苏州科达科技股份有限公司 | A kind of multiple video strems switching display methods, device and Videoconference Management System |
CN106231234B (en) * | 2016-08-05 | 2019-07-05 | 广州小百合信息技术有限公司 | The image pickup method and system of video conference |
CN107786834A (en) * | 2016-08-31 | 2018-03-09 | 宝利通公司 | For the camera base and its method in video conferencing system |
JP6766086B2 (en) | 2017-09-28 | 2020-10-07 | キヤノン株式会社 | Imaging device and its control method |
CN107820006A (en) * | 2017-11-07 | 2018-03-20 | 北京小米移动软件有限公司 | Control the method and device of camera shooting |
JP7292853B2 (en) | 2017-12-26 | 2023-06-19 | キヤノン株式会社 | IMAGING DEVICE, CONTROL METHOD AND PROGRAM THEREOF |
JP2019117375A (en) * | 2017-12-26 | 2019-07-18 | キヤノン株式会社 | Imaging apparatus, control method of the same, and program |
CN109009170A (en) * | 2018-07-20 | 2018-12-18 | 深圳市沃特沃德股份有限公司 | Detect the method and apparatus of mood |
CN108924469B (en) * | 2018-08-01 | 2020-11-10 | 广州视源电子科技股份有限公司 | Display picture switching transmission system, intelligent interactive panel and method |
CN109151370B (en) * | 2018-09-21 | 2020-10-23 | 上海赛连信息科技有限公司 | Intelligent video system and intelligent control terminal |
TWI678660B (en) * | 2018-10-18 | 2019-12-01 | 宏碁股份有限公司 | Electronic system and image processing method |
CN111212218A (en) * | 2018-11-22 | 2020-05-29 | 阿里巴巴集团控股有限公司 | Shooting control method and device and shooting system |
CN109816722A (en) * | 2019-01-18 | 2019-05-28 | 深圳市沃特沃德股份有限公司 | Position method, apparatus, storage medium and the computer equipment of spokesman position |
CN110072055A (en) * | 2019-05-07 | 2019-07-30 | 中国联合网络通信集团有限公司 | Video creating method and system based on artificial intelligence |
CN110536101A (en) * | 2019-09-29 | 2019-12-03 | 广州视源电子科技股份有限公司 | Electronic holder, video conference system and method |
CN112911256A (en) * | 2020-12-29 | 2021-06-04 | 慧投科技(深圳)有限公司 | Projector system with camera for automatically capturing sound source |
CN113596349B (en) * | 2021-07-26 | 2024-06-04 | 世邦通信股份有限公司 | Conference method, system, device and storage medium for automatic linkage video of speaking position |
CN115550559B (en) * | 2022-04-13 | 2023-07-25 | 荣耀终端有限公司 | Video picture display method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080218582A1 (en) * | 2006-12-28 | 2008-09-11 | Mark Buckler | Video conferencing |
CN102256098A (en) * | 2010-05-18 | 2011-11-23 | 宝利通公司 | Videoconferencing endpoint having multiple voice-tracking cameras |
CN103281508A (en) * | 2013-05-23 | 2013-09-04 | 深圳锐取信息技术股份有限公司 | Video picture switching method, video picture switching system, recording and broadcasting server and video recording and broadcasting system |
CN103281492A (en) * | 2013-05-23 | 2013-09-04 | 深圳锐取信息技术股份有限公司 | Video picture switching method, video picture switching system, recording and broadcasting server and video recording and broadcasting system |
CN103595953A (en) * | 2013-11-14 | 2014-02-19 | 华为技术有限公司 | Method and device for controlling video shooting |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1386371A (en) * | 2000-08-01 | 2002-12-18 | 皇家菲利浦电子有限公司 | Aiming a device at a sound source |
CN102148965B (en) * | 2011-05-09 | 2014-01-15 | 厦门博聪信息技术有限公司 | Video monitoring system for multi-target tracking close-up shooting |
US9030520B2 (en) * | 2011-06-20 | 2015-05-12 | Polycom, Inc. | Automatic camera selection for videoconferencing |
- 2013-11-14: CN201310566974.1A, patent CN103595953B (active)
- 2014-04-04: PCT/CN2014/074831, patent WO2015070558A1 (application filing)
Also Published As
Publication number | Publication date |
---|---|
CN103595953A (en) | 2014-02-19 |
CN103595953B (en) | 2017-06-20 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14861898; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 14861898; Country of ref document: EP; Kind code of ref document: A1 |