WO2008075726A1

WO2008075726A1 - Video conferencing device

Info

Publication number: WO2008075726A1
Application number: PCT/JP2007/074449
Authority: WO
Inventors: Toshiyuki Hata; Takuya Tamaru
Original assignee: Yamaha Corporation
Priority date: 2006-12-19
Filing date: 2007-12-19
Publication date: 2008-06-26
Also published as: CN101518049A; JP2008154055A; JP4862645B2

Abstract

Provided is a video conferencing device capable of accurately and clearly sending a conference participant audio and video and associated materials. The position of a camera (2) is fixed by a stay (3) with respect to a disc-shaped sound emitting/collecting device (1). In this position, the camera (2) is arranged in such a manner that it can rotate and be half-fixed at a horizontal-direction state and a vertical-direction state. Here, the state of the camera (2) is detected by using a switch. When the camera (2) is set in the horizontal direction, the camera (2) images a conference participant and extracts and acquires a video of a conference participant who is talking. On the other hand, when the camera (2) is set in the vertical direction, it images a material which is set in advance. Since the camera (2) includes a fish-eye lens, the acquired video is corrected in accordance with the respective states so as to generate image data to be transmitted.

Description

Specification

Video conferencing equipment

Technical field

[0001] The present invention relates to a video conferencing apparatus that communicates video and images and audio used when a video conference is performed between conference rooms separated from each other.

Background art

Conventionally, when a video conference is performed between a plurality of points separated from each other, a video conference device (video conference device) as shown in Patent Document 1 is arranged at each point so as to surround the video conference device. The conference is attended and a conference is held.

[0003] In the video conference apparatus of Patent Document 1, each conference person is equipped with a microphone with a radio wave generator, and radio waves are radiated from the microphone that picks up the highest level of sound. The person photographing camera detects the direction of the speaker by receiving this radio wave, directs the camera toward the direction of the speaker, and captures an image centered on the speaker. The video data and audio data are encoded and transmitted to the destination video conference apparatus.

Patent Document 1: JP-A-6-276514

Disclosure of the invention

Problems to be solved by the invention

[0004] When a video conference is performed, there is a case where it is desired to refer to materials or the like in common between distant points only by the video of a conference person such as a speaker as described above. With the device of Patent Document 1, the ability to switch and acquire a speaker's video cannot be used to display materials. For this reason, in order to project a document using the configuration of Patent Document 1, it is sufficient for the conference person to manually view the document in front of the camera, but the document cannot be fixed completely, so the image is blurred. Let's do it. In addition, the document cannot be imported as it is (original image) due to the influence of the curvature of the lens. In addition, as an alternative method of referring to materials in common, it is also possible to transmit the materials as data. Intuitive and flexible materials such as writing and explaining during meetings are provided. I can't.

[0005] Therefore, an object of the present invention is to provide a resource that is highly flexible along with audio and video. To provide a video conferencing device that can be transmitted accurately and clearly even if there is a fee.

Means for solving the problem

[0006] The present invention provides an imaging unit that captures a predetermined area, a video data generation unit that generates video data based on video captured by the imaging unit, Generating voice data and generating communication data including a housing including a sound emission and collection unit for emitting sound emission sound data, sound collection sound data and video data, and transmitting the communication data to the outside. The present invention relates to a video conferencing apparatus including a communication unit that acquires sound emission sound data from external communication data and applies the sound emission sound collection unit, and a support unit that supports the imaging unit in a predetermined manner. In this video conferencing apparatus, the support unit uses the first mode in which the imaging unit is directed to the conference person imaging region around the housing, and the second mode in which the imaging unit is directed to a region near the imaging unit in the vicinity of the housing. The imaging unit is supported by any one of the modes. Then, (i) when the selection of the first mode is detected, the video data generation unit of this video conference apparatus cuts out only the azimuth area corresponding to the sound collection direction information of the collected sound data from the video data. Then, the extracted video data is corrected by the first correction processing according to the first mode. In addition, (ii) when the selection of the second mode is detected, the video data generation unit cuts out a predetermined area centered on the front direction of the imaging unit from the video data, and is different from the first correction processing. The video data cut out by the second correction process according to the above is corrected.

[0007] With this configuration, the video conference device of the present invention cuts out only the video data in the sound collection direction and performs the first correction process when the imaging unit is set to the first mode facing the conference person imaging area. Make adjustments so that they are easy to see. Then, the video conference device generates communication data from the video data and the collected sound data, and transmits the communication data to the counterpart device. On the other hand, in the video conference apparatus, the image capturing unit captures a document or the like installed in a close area near the casing.

If it is set to 2 modes, the image taken by the imaging unit from the front is corrected by the second correction process so that it can be easily viewed. Then, the video conference device generates communication data including the video data and transmits it to the counterpart device. At this time, since the area to be photographed may be different between the first mode and the second mode, the video is corrected by the first correction process and the second correction process, which are different correction processes according to the respective modes. . [0008] Thereby, since the conference participant video and the still image such as the document are corrected according to the respective shooting specifications, the conference participant video and the document image appropriately corrected with respect to the destination device. And can send power.

[0009] Further, the support section of the video conference apparatus of the present invention is characterized by including a joint mechanism for switching between the first mode and the second mode, and forming a switch by the joint mechanism. Furthermore, the video data generation unit of this video conference apparatus is characterized by detecting the selection between the first mode and the second mode based on the switch selection status by the joint mechanism.

[0010] In the video conferencing apparatus having this configuration, the first mode and the second mode are selected by switching the switch by operating the joint mechanism of the support unit. Second mode is set.

[0011] The present invention also includes an imaging unit that images a predetermined area, a video data generation unit that generates video data based on the video captured by the imaging unit, and a sound around the device itself. Generates collected sound data, generates communication data including a sound emission / collection unit that emits sound output sound data, sound collection sound data and video data, and transmits the communication data to the outside. The present invention relates to a video conferencing apparatus comprising: a communication unit that obtains sound emission sound data from communication data from and provides the sound emission and collection unit; and a support unit that supports the imaging unit with respect to the housing. is there. In this video conference apparatus, the imaging unit simultaneously captures the conference person imaging area and the area close to the imaging unit in the vicinity of the housing. The video data generation unit cuts out only the azimuth area corresponding to the sound collection direction information of the collected sound data from the first partial video data corresponding to the conference person imaging area, and the third partial video data is extracted from the first partial video data. The second partial video data corresponding to the area close to the imaging unit is corrected by a fourth correction process different from the third correction process.

[0012] In the video conferencing apparatus having this configuration, the first partial video data corresponding to the conference person imaging area and the second partial video data corresponding to the area where the material close to the imaging unit is arranged are provided as one unit. Acquired simultaneously by the imaging unit. In the first partial video data, only the azimuth area corresponding to the collected sound data is cut out and appropriately corrected by the third correction process. The second part video data is adjusted so that it can be easily viewed by the corresponding fourth correction process.

[0013] Thereby, the conference video and the still image such as the document are acquired at the same time, and each It is adjusted according to the shooting specifications. As a result, it is also possible to simultaneously transmit the conference participant video and the material image appropriately corrected to the destination device.

[0014] The video conference apparatus according to the present invention includes a selection unit that selects partial video data used for communication data. The video data generation unit of the video conference apparatus gives the partial video data selected by the selection unit to the communication unit.

[0015] In this configuration, one of the conference video and the still image selected is transmitted. As a result, a still image that hardly changes with time can be transmitted only when necessary, so that no extra load is applied to the communication system.

[0016] Further, in the video conference apparatus of the present invention, the imaging unit has a fisheye lens, the central region of the region imaged by the fisheye lens is set as a region close to the imaging unit, and at least a peripheral region outside the central region is conferenced It is characterized by a person imaging area.

[0017] In the video conferencing apparatus having this configuration, a fisheye lens is used as a specific specification of the imaging unit. Then, an area corresponding to the center of the fisheye lens is set as an area close to the imaging unit, and correction is appropriately performed by correction processing according to this area. As the conference area, the center area may be used when the mode is changed, but the peripheral area is mainly used. Therefore, the video in the conference area is appropriately adjusted by the correction processing according to the selected area in each case. As a result, even if an image (image) in the vicinity area near the imaging unit and an image in the conference area are captured through the fisheye lens, the respective images are appropriately corrected.

[0018] In addition, the video data generation unit of the video conference apparatus of the present invention is integrally formed with the imaging unit. In addition, the communication unit of the video conference apparatus according to the present invention is integrally formed with the housing together with the sound emission and collection unit. In addition, the video data generation unit of the video conference apparatus according to the present invention is integrally formed with the casing together with the sound emission and collection unit. These make the video conferencing equipment compact.

[0019] The video conference apparatus of the present invention further includes a display monitor for reproducing video data. The communication unit of this video conference apparatus acquires video data included in the communication data and supplies it to the display monitor.

[0020] Thereby, the video conference apparatus of the present invention is arranged and connected to each point where the communication conference is performed. It is possible to easily share the conference video and materials between the two parties.

The invention's effect

[0021] According to the present invention, the video of the speaker is corrected by the correction processing according to the video of the speaker, and the image of the material is corrected by the correction processing according to the image of the material by a simple operation of the imaging unit. Since the correction is made, it is possible to transmit both the speaker image and the document image to the other device accurately and clearly. As a result, in the video conference using this apparatus, it is possible to realize the conference more easily and easily!

Brief Description of Drawings

FIG. 1 is an external view of a video conferencing apparatus according to a first embodiment in a conference shooting mode.

FIG. 2 is an external view of the video conference apparatus according to the first embodiment in a document shooting mode.

FIG. 3 is a block diagram illustrating a main configuration of the video conference apparatus according to the first embodiment.

FIG. 4 is a diagram illustrating a situation (conference shooting mode) in which the video conference apparatus according to the first embodiment is arranged and a video conference is performed with another point connected to the network.

FIG. 5 is an explanatory diagram used for explaining video data generation in the conference shooting mode.

FIG. 6 is a diagram showing a situation (material shooting mode) in which the video conference apparatus according to the first embodiment is arranged and a video conference is performed with another point connected to the network.

FIG. 7 is an explanatory diagram used for explaining video data generation in the document photographing mode.

FIG. 8 is an external view of an assembly member including a sound emission and collection device 1, a camera 2, and a support 7 in a video conference device according to a second embodiment.

FIG. 9 is a diagram showing a usage situation of a video conference apparatus using the video conference apparatus of the second embodiment.

FIG. 10 is a diagram for explaining generation of video data by the video conference apparatus according to the second embodiment.

[0023] 1 Sound emission and collection device, 2—Power mela, 3 Steady, 4 Switch, 5 Communication terminal, 6 Display, 7 Support, 8—Mounting table, 11 Housing, 12 Legs, 21—Imaging Part, 22 Video processing part, 31 Main part, 32 Forced support part, 33 Main part support part, 34 Sound emitting / receiving device mounting part, 102 Input / output I / F, 103 Sound emission control part, 105—A / D—AMP 106 Sound pickup system , 107 Echo Cancellation, 110 Concave, 111 Operation, 203 Hinge, 500 — Network, 60 to 605 Conference, 610 All Area Image Data, 611, 615 People Image, 621 — Compensation Image Data, 622 Compensation Image Data, 631, 635 human image, 64 1-644 human image, 650 materials, 654 human image, 660 image, 670 image, 680, 681 image data, 682 peripheral image data, 683 partial image data, 700 tables

BEST MODE FOR CARRYING OUT THE INVENTION

The video conference apparatus according to the first embodiment of the present invention will be described with reference to the drawings. FIGS. 1 and 2 are external views of the video conference apparatus of the present embodiment, and FIG. (B) is a side view. Fig. 1 and Fig. 2 show only the structure of the sound emission and collection device, camera, and stage, which are mechanically characteristic, and the communication terminal, sound emission and collection device, and the cable that electrically connects the camera are not shown. Omitted. Fig. 1 shows the mechanism state in the conference shooting mode, and Fig. 2 shows the mechanism state in the document shooting mode.

FIG. 3 is a block diagram showing the main configuration of the video conference apparatus according to the present embodiment.

[0025] In FIG. 1, FIG. 2, FIG. 3 and the subsequent drawings referred to in this specification, the microphone is represented or generically represented by “MC”, and the speaker is represented or generically represented by “MC”. “SP”. The video conference apparatus according to the present embodiment includes a sound emitting and collecting apparatus 1 having a disk shape in plan view, a camera 2 having an imaging function and a video data generating function, and the camera 2 with respect to the sound emitting and collecting apparatus 1 at a predetermined position. And stay 3 to be installed. Although not shown in FIGS. 1 and 2, sound emitting and collecting apparatus 1 and camera 2 are electrically connected, and the video conferencing apparatus is electrically connected to sound emitting and collecting apparatus 1 and camera 2. A communication terminal to be connected is provided.

[0026] The communication terminal 5 demodulates the communication data received from the communication terminal of the other party's video conferencing apparatus connected via the network 500, and outputs a sound signal for sound emission, the other party's apparatus ID, and the speaker orientation. The data is acquired and given to the sound emitting and collecting device 1 on the own device side connected by the cable. Further, the communication terminal 5 generates communication data based on the collected sound signal and speaker position data received from the sound emitting and collecting device 1 on the own device side and the video data received from the camera 2. Communication terminal 5 transmits the generated communication data to the communication terminal of the destination video conference device. To do. Further, the communication terminal 5 mediates transmission / reception of the speaker position data between the sound emitting and collecting apparatus 1 and the camera 2 depending on the situation.

The sound emission and collection device 1 includes a disk-shaped housing 11. Specifically, the casing 11 has a circular shape in plan view, and the shape in side view in which the area between the top surface and the bottom surface is narrower than the area of the middle part in the vertical direction is from a point in the height direction. It has a shape that narrows toward the surface and narrows from the one point toward the bottom surface. That is, it has a shape having inclined surfaces on the upper side and the lower side from the one point. A concave portion 110 having a predetermined depth narrower than the area of the top surface is formed on the top surface of the casing 11 so that the center of the concave portion 110 and the center of the top surface coincide with each other. Is set to

[0028] 16 microphones MC;! To MC16 are installed inside the top surface of the casing 11 along the side surface of the recess 110, and each microphone MC;! They are arranged at equiangular pitches (in this case, at intervals of about 22.5 °) with the viewed center as the rotation center. At this time, if the microphone MC1 is in the direction of θ = 0 °, the microphones MC ;! to MC16 are arranged along the direction in which the Θ force increases by ½2.5 ° in order. For example, the microphone MC5 is arranged in the Θ = 90 ° direction, the microphone MC9 is arranged in the Θ = 180 ° direction, and the microphone MC13 is arranged in the Θ = 270 ° direction. Moreover, each microphone MC ;! to MC16 has a single directivity, and each microphone MC is arranged so as to have a strong directivity in the central direction as viewed from above. For example, microphone MC1 has Θ = 180 ° as the center of directivity, microphone MC5 has Θ = 270 ° as the center of directivity, and microphone MC9 has Θ = 0 (360). The direction is the center of directivity, and the microphone MC13 has the direction of Θ = 90 ° as the center of directivity. The number of microphones is not limited to this, and may be set as appropriate according to specifications.

[0029] The four speakers SP;! To SP4 are installed so that the inclined surface on the lower side of the casing 11 and the sound emitting surface coincide with each other, and each speaker SP;! To SP4 is a sound emitting and collecting device. They are arranged at equiangular pitches (in this case, about 90 ° intervals) with the center of 1 viewed in plan as the center of rotation. At this time, the angle of the speaker SP1 is arranged in the Θ = 0 ° direction, the speaker SP2 is arranged in the Θ = 90 ° direction with respect to the speaker SP1, and the speaker SP3 is arranged in the Θ = 180 ° direction with respect to the speaker SP1. Speaker SP4 is arranged in the direction of Θ = 270 ° with respect to speaker SP1. Also, each speaker SP;! To SP4 has a strong directivity in the front direction of the sound emitting surface. The speaker SP1 has Θ = 0. The speaker SP2 emits sound around the Θ = 90 ° direction, the speaker SP3 emits sound around the Θ = 180 ° direction, and the speaker SP 4 focuses on the Θ = 270 ° direction. Sounds out.

[0030] In this way, the speakers SP;! To SP4 are arranged on the lower side of the casing 11, and the microphones MC;! To MC16 are arranged on the upper side of the casing 11, and the microphones MC;! To MC16 are accommodated. By setting the sound direction to the central direction of the housing 11, the microphones MC ;! to MC16 are difficult to pick up the wraparound sound from the speakers SP ;! to SP4. For this reason, speaker position detection, which will be described later, is less likely to be affected by wraparound speech, and the speaker position can be detected with higher accuracy.

[0031] The operation unit 111 is installed on an inclined surface on the upper side of the casing 11, and includes various operation buttons and a liquid crystal display panel (not shown).

The input / output I / F102 (not shown in FIGS. 1 and 2) is an inclined surface on the lower side of the casing 11, and is installed at a position where the SP force SP;! To SP4 is not installed. Equipped with a terminal that can communicate various control data. Then, by connecting the terminal of the input / output I / F 102 and the communication terminal with a cable or the like, communication is performed between the sound emission and collection device 1 and the communication terminal.

[0032] The sound emitting and collecting apparatus 1 has a functional configuration as shown in FIG. 3 in addition to such a structural configuration.

The control unit 101 performs general control such as setting, sound collection, and sound emission of the sound emission / collection device 1, and controls each part of the sound emission / collection device 1 based on the operation instruction content input by the operation unit 111. Give

[0033] (1) Sound emission

The input / output I / F 102 outputs sound emission sound signals S 1 to S 3 received from the communication terminal 5 to the channels CH;! To CH 3, respectively. The channel assignment may be set as appropriate according to the number of received sound signals for sound emission. The input / output I / F 102 receives the counterpart device ID from the communication terminal 5 and assigns a channel CH to each counterpart device ID. For example, when there is one connected counterpart device, the audio data from the counterpart device is assigned to channel CH1 as sound output audio signal S1. Also, when there are two connected counterpart devices, the audio data from the two counterpart devices are individually assigned to channels CHI and CH2 as sound emission sound signals SI and S2, respectively. Similarly, connected If there are three counterpart devices, the audio data from the three counterpart devices are individually assigned to channels CHI, CH2, and CH3 as sound output signals SI, S2, and S3, respectively. The channels CH;! To CH3 are connected to the sound emission control unit 103 via the echo cancellation unit 107.

Further, the input / output I / F 102 extracts the speaker orientation data Py at the other party sound emission and collection device from the communication terminal 5 and provides it to the sound emission control unit 103 together with the channel information.

[0034] The sound emission control unit 103 generates speaker output signals SPD;! To SPD4 to be given to the speakers SP;! To SP4 based on the sound signals for sound emission S1 to S3 and the speaker orientation information Py. To do.

[0035] The D / A-AMP 104 converts each speaker output signal SPD ;! to SPD4 from digital to analog, amplifies the signal with a constant amplification factor, and supplies it to the speakers SP ;! to SP4, respectively. Speaker SP;! ~ SP4 converts the given speaker output signal SPD;! ~ SPD4 into sound and emits it

[0036] By performing such sound emission processing, the sound emitted from each speaker SP;! To SP4 has a predetermined delay relationship and amplitude relationship. A sense of sound can be given to the conferees.

[0037] (2) Sound collection

The microphones MC;! To MC16 collect sound from outside, such as the sound generated by the conference, and generate the collected signals MS;! To MS16. Each A / D-AMP 105 amplifies the corresponding collected sound signal MS ;! to MS 16 with a predetermined amplification factor, converts the signal to analog-digital, and outputs it to the sound collection control unit 106.

[0038] The sound collection control unit 106 synthesizes the acquired sound collection signals MS;! To MS 16 with different delay control patterns and amplitude patterns, and sets the respective different directions as the central direction of directivity. A sound beam signal is generated. For example, with the sound emitting and collecting apparatus 1 as the center, eight sound collecting beam signals are generated in which the 360 ° of the entire circumference is divided into eight angles, that is, the central direction of the directivity is shifted every 45 °. The sound collection control unit 106 compares the amplitude levels of these sound collection beam signals, selects the sound collection beam signal MBS having the highest amplitude level, and outputs it to the echo cancellation unit 107. The sound collection control unit 106 acquires the speaker orientation corresponding to the selected sound collection beam signal, generates the speaker orientation information Pm, and provides it to the input / output I / F 102. [0039] The echo cancellation unit 107 includes an adaptive filter that generates pseudo-regression sound signals based on the sound output sound signals S1 to S3 for the input sound pickup beam signal MBS, and a sound pickup beam signal. It consists of a post processor that subtracts the pseudo-regressive sound signal from the MBS. The echo cancellation circuit subtracts the pseudo-regression sound signal from the output sound pickup beam signal MBS while sequentially optimizing the filter coefficients of the adaptive filter, so that the speaker SP;! Included in the output sound pickup beam signal MBS! ~ Remove the wraparound component from SP4 to microphone MC;! ~ MC16. The collected sound beam signal MBS from which the wraparound component has been removed is output to the input / output I / F 102.

[0040] The input / output I / F 102 associates the collected sound beam signal MBS from which the return sound has been removed by the echo canceling unit 107 with the speaker orientation information Pm from the sound collecting control unit 106, and outputs it to the communication terminal 5. To do.

The camera 2 is installed at a position fixed to the sound emitting and collecting apparatus 1 by the stay 3 as shown in FIGS. At this time, the camera 2 is installed by the stay 3 so as to be rotatable between a horizontal direction (direction facing the camera 2 shown in FIG. 1) and a vertical downward direction (direction facing the camera 2 shown in FIG. 2). ing.

The stay 3 includes a main body part 31, a camera support part 32, a main body support part 33, and a sound emitting and collecting device attachment part 34. The main body 31 is formed of a linear member having a predetermined width, and is installed in a shape extending in a direction of a predetermined angle with respect to the vertical direction by the main body support 33. A camera support portion 32 is installed at one end in the extending direction of the main body portion 31 via a hinge 203, and a sound emission and collection device mounting portion 34 is installed at the other end. The sound emitting and collecting device mounting portion 34 is formed of a flat plate having an opening portion into which the leg portion 12 of the housing 11 is fitted, and is integrally formed with the main body portion 31, for example.

[0043] The end portion on the camera support portion 32 side of the main portion 31 has a shape in which only both end walls in the width direction remain and the center portion in the width direction opens. The opening has a shape that does not contact the main body 31 when the camera 2 installed in the camera support 32 rotates between the horizontal direction and the vertical downward direction.

The hinge 203 has a structure in which the camera support portion 32 is rotatably installed with respect to the main body portion 31. Further, the hinge 203 and the camera support portion 32 have a structure that is semi-fixed when the camera 2 and the camera support portion 32 face in the horizontal direction and when they face in the vertical downward direction. For example, the hinge 203 is fixed to the main body 31, and the recesses are formed at the horizontal position and the vertically downward position of the hinge 203, respectively. A protrusion on the hinge side of the camera support 32 is provided with a protrusion that fits into the recess, and the protrusion is biased from within the camera support 32 with a panel or the like. As a result, the camera 2 can rotate between the horizontal direction and the vertically downward direction, and can maintain a mechanical state in the horizontal direction and the vertically downward direction.

The mechanism unit including the hinge 203 and the camera support unit 32 functions as the switch 4.

For example, electrodes are installed in the concave and convex portions, respectively, and electrical conduction and release are detected. At this time, the connection or detection signal is set so that different signals are obtained between the horizontal recess and the vertical downward recess. With such a structure, the switch 4 is formed, and the detection result of the switch 4 is given to the camera 2. As a result, the camera 2 can identify the power of the camera 2 facing in the horizontal direction and the power of capturing the video by identifying whether the camera 2 is facing down in the vertical direction.

The camera 2 includes an imaging unit 21 and a video processing unit 22. The imaging unit 21 includes a fisheye lens, and images an area up to the installation surface of the fisheye lens with an infinite distance force in all directions around the front direction of the camera 2. The imaging data is given to the video processing unit 22.

The image processing unit 22 acquires the direction in which the camera 2 is detected (hereinafter referred to as a shooting direction) from the force of the switch 4 (hinge 203 and camera support unit 32) of the stay 3. Based on the acquired shooting direction and the speaker orientation data P m from the sound emission and collection device 1 via the communication terminal 5, the video processing unit 22 extracts only the necessary part from the imaging data and corrects the image, thereby obtaining video data. Is generated. The generated video data is given to the communication terminal 5.

[0048] Next, a method for using the video conference apparatus and a method for generating video data in the video processing unit 22 will be described more specifically. In the following explanation, the number of power conferencing members shown when there are five conferencing members on the device side is not particularly limited to this.

FIG. 4 is a diagram showing a situation in which the video conference apparatus according to the present embodiment is arranged and a video conference is performed with another point connected to the network, and the camera 2 captures the conference participants 60;!-605. FIG.

Fig. 5 is an explanatory diagram used to explain video data generation. (A) shows the video (image) taken through the fisheye lens, and (B) and (C) are image correction concepts for each conference direction. Indicates. FIG. 6 is a diagram illustrating a situation where the video conference apparatus according to the present embodiment is arranged and a video conference is performed with another point connected to the network, and the case where the camera 2 captures the document 650 is illustrated. FIG.

Fig. 7 is an explanatory diagram used to explain the video data generation. (A) shows the video (image) taken through the fisheye lens, and (B) shows the concept of image correction during image capture.

[0051] When a video conference is performed, the conferees 60;! To 605 are seated on the oval table 700 at positions other than one end in the longitudinal direction. On the table 700, an integrated member of a circular sound emission and collection device 1 and a camera 2 fixed to the same by a stay 3 is installed. At this time, the force lens 2 is installed so that the axis parallel to the longitudinal direction of the table 700 coincides with the central axis of the fish-eye lens in a state of being horizontally oriented. Under the table 700, a communication terminal 5 is installed. The communication terminal 5 is electrically connected to the sound emission and collection device 1 and the camera 2 and is connected to the network 500. The communication terminal 5 is electrically connected to the display 6. The display 6 is composed of, for example, a liquid crystal display or the like, and is installed near the end of the table 700 where the participants 60 ;! to 605 are not seated. At this time, the display 6 is installed such that the display surface faces the direction of the table 700.

[0052] When a video conference is performed in such a state, the video conference device including the sound emission and collection device 1, the camera 2, and the communication terminal 5 transmits the conference video to the destination video conference device in two modes. Send.

[0053] (1) Conference shooting mode

When any of the participants 60;! To 605 is set in the horizontal direction, the video processor 22 of the camera 2 detects that the conference shooting mode has been selected by the detection signal from the switch 4. To do. When the video processing unit 22 detects the conference shooting mode, the video processing unit 22 provides the communication terminal 5 with a selection signal for the mode.

The imaging unit 21 of the camera 2 acquires imaging data obtained by imaging all conference persons 60;! To 605 present on the device side through the fisheye lens, and outputs the acquired imaging data to the video processing unit 22. Here, since the imaging data passes through the fisheye lens, the imaging area becomes circular as shown in Fig. 5 (A). When the conference shooting mode is selected, the video processing unit 22 takes a circular imaging data with a coordinate system in which the horizontal direction of the circular arc is represented by an azimuth angle Φ and the vertical direction is represented by an elevation angle Φ. To get. That is, the same height as the lens axis in the front direction of the fisheye lens is φ = 0. , Φ = 0

Set to °. Furthermore, it is set so that Φ increases in the negative direction in the direction spreading leftward from the coordinates, and Φ increases in the positive direction in the direction spreading rightward. Therefore, the direction perpendicular to the fisheye lens axis is φ = 90 ° from the front of the fisheye lens of the force lens 2 to the shooting direction, and from the cutting edge of the fisheye lens of the camera 2 to the shooting direction. The direction perpendicular to the fisheye lens axis in the right direction is φ = + 90 °. Also, it is set so that φ increases in the positive direction in the direction spreading upward from the coordinates, and φ increases in the negative direction in the direction spreading downward. Therefore, from the leading edge of camera 2's fisheye lens, the direction upward to the shooting direction and perpendicular to the axis of the fisheye lens is Φ = + 90 °, and from the leading edge of camera 2's fisheye lens, it is below the shooting direction. The direction perpendicular to the axis of the fisheye lens is φ = —90 °.

[0055] The sound emission and collection device 1 acquires the voice of the conference participant who is speaking by the above-described processing, detects the conference direction, and transmits the collected sound data and the speaker orientation information Θ to the communication terminal 5. Give to. For example, if the conference person 601 shown in FIG. 4 speaks, the sound emission and collection device 1 detects the direction θ 1 of the conference party 601 and collects the collected sound data and the speaker based on the voice from the direction of the conference party 601. Direction information θ 1 is given to communication terminal 5. Also, if the conference person 605 speaks, the sound emission and collection device 1 detects the orientation Θ 2 of the conference party 605, and collects the collected sound data based on the voice from the conference 605 direction and the speaker orientation information Θ 2. Give to communication terminal 5. The communication terminal 5 gives the speaker orientation information Θ to the video processing unit 22 of the camera 2.

The video processing unit 22 corrects the imaging data based on the speaker orientation information Θ from the communication terminal 5! /. The video processing unit 22 stores in advance the relationship between the speaker orientation information Θ and the orientation angle φ set in the imaging data. When the video processing unit 22 receives the speaker orientation information Θ, the video processing unit 22 reads the corresponding orientation angle φ. For example, when the video processing unit 22 receives the speaker orientation information Θ 1 for the conference 601, the video processing unit 22 reads out the corresponding azimuth angle φ = 0 °. For example, when the video processing unit 22 receives the speaker orientation information Θ 2 for the conference person 605, the video processing unit 22 reads the corresponding orientation angle φ = −90 °.

[0057] The video processing unit 22 sets an image extraction direction angle range having a predetermined azimuth angle width including the read azimuth angle φ. In addition, the video processing unit 22 starts from a predetermined elevation angle width including an elevation angle φ = 0 °. An image extraction elevation range is set. Then, the video processing unit 22 determines an image extraction region based on the set azimuth angle range and elevation angle range, and acquires imaging data corresponding to the region as image data.

[0058] For example, when the image processing unit 22 reads out the azimuth angle φ = 0 °, the azimuth angle φ1 to the azimuth angle φ2 (φ1 <0 ° Set to angular range. In addition, the image processing unit 22 sets the range of elevation angle φ1 to elevation angle φ2 (φ1 to φ2) including φ = 0 ° as the elevation angle range. Then, the video processing unit 22 acquires the image data 621 by setting an image extraction region based on the azimuth range φ1 to φ2 and the elevation range φ1 to φ2. For example, when the image processing unit 22 reads out the azimuth angle φ = −90 °, the azimuth angle Φ3—azimuth angle φ4 to azimuth angle φ4 (φ3 <−90 ° <φ4) is included. Set to range. Further, the video processing unit 22 sets the range of elevation angle φ3 to elevation angle φ4 (φ3 <φ4) including φ = 0 ° as the elevation angle range. Then, the video processing unit 22 sets the image extraction area with the azimuth angle range φ 3 to φ 4 and the elevation angle range φ 3 to φ 4 and acquires the image data 622.

The video processing unit 22 performs image correction conversion for each acquired image extraction area. Specifically, each pixel defined by two angular directions, φ direction and φ direction, is corrected so as to be applied to a pixel in an orthogonal two-dimensional plane coordinate (X—Υ coordinate system). At this time, the video processing unit 22 stores a conversion processing table between the φφ coordinate system and the X−Υ coordinate system in advance, and calculates the X−Υ coordinate based on the obtained φ−φ coordinate of each pixel. And compensate for transformation. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and may perform correction conversion using the coordinate conversion calculation formula.

[0060] For example, as shown in FIG. 5 (B), the video processing unit 22 uses a plane coordinate system to store the image data 621 set in the azimuth range φ1 to φ2 and the elevation range φ1 to φ2. Converted to the corrected image data 621 'set by xl to x2 and yl to y2 with the horizontal direction as the X axis and the vertical direction as the vertical axis. By this conversion, the person image 611 of the conference person 601 obtained in the φφ coordinate system is converted into a corrected person image 631 in the XY coordinate system (planar coordinate system). By converting to the XY coordinate system in this way, the corrected person image 631 becomes close to the natural body image of the conference person 601.

[0061] Further, for example, as shown in FIG. 5C, the video processing unit 22 converts the image data 622 set in the azimuth angle range φ3 to φ4 and the elevation angle range φ3 to φ4 into plane coordinates. System, horizontal direction X axis Is converted to the corrected image data 622 ′ set by x3 to x4 and y3 to y4 with the vertical direction as the Y axis. By this conversion, the person image 615 of the conference person 605 acquired in the φφ coordinate system is converted into a corrected person image 635 in the XY coordinate system (planar coordinate system). By converting to the XY coordinate system in this way, the corrected person image 635 becomes close to the natural body image of the conference participant 601.

The video processing unit 22 attaches time information to the corrected image data including the corrected human image approaching the natural body in this way, and outputs the corrected image data to the communication terminal 5 as video data. Such generation and output of the corrected image data are performed sequentially. If the received speaker orientation information Θ changes, the center direction of the corrected image data is switched according to the change.

[0063] The communication terminal 5 uses the video data from the video processing unit 22, the collected voice data, and the speaker orientation information.

Communication data is generated by associating with Θ, and transmitted to the video conference apparatus of the other party via the network 500. As a result, it is possible to provide a conference person who is present in the vicinity of the other party's video conferencing apparatus with an image close to the natural state of the conference participant who is speaking and the speech of the conference participant.

[0064] ( ² ) Document shooting mode

As shown in FIG. 6, when the power of conference person 60;! To 605 is set in the vertically downward direction as shown in FIG. 6, the video processing unit 22 of the camera 2 causes the document shooting mode to be detected by the detection signal from the switch 4. Detects that a command was selected. When the video processing unit 22 detects the document photographing mode, the video processing unit 22 gives a selection signal for the mode to the communication terminal 5.

[0065] In addition, the force of any of the participants 60;! To 605 places the material 650 around the vertical downward position of the hinge 203 in the table 700. At this time, if the material placement marking is performed on the table 700 in advance, the material 650 can be placed easily and appropriately.

The imaging unit 21 of the camera 2 acquires imaging data obtained by imaging the material 650 placed on the table 700 through the fisheye lens, and outputs it to the video processing unit 22. Here, since the imaging data passes through the fisheye lens, the imaging area becomes circular as shown in FIG.

[0067] When the document shooting mode is selected, the image processing unit 22 sets the center of the imaging data as the origin and the distance r extending in the radial direction from the origin, and a predetermined direction (see FIG. In Fig. 7, it is expressed as an angle 71 with respect to the image data from the origin in the right direction (0 ° direction). Obtained in the coordinate system. The video processing unit 22 cuts out image data 680 in a preset range from the acquired imaging data.

The video processing unit 22 corrects the image data 680 in the r η coordinate system by converting it into corrected image data 680 ′ in the X−Υ plane coordinate system. At this time, the video processing unit 22 stores in advance a coordinate conversion processing table in which the center coordinates of the r η coordinate system and the X Υ coordinate system coincide with each other, and the X— Y coordinate is calculated and corrected. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and V can be used for the coordinate conversion calculation formula and fi correction can be performed.

[0069] By this conversion, the material image 660 of the material 650 acquired in the r-7 coordinate system is converted into a corrected material image 670 in the XY coordinate system (planar coordinate system). By converting to the XY coordinate system in this way, the corrected material image 670 becomes close to the natural body image of the material 650. That is, the image data of the material 650 can be acquired.

The communication terminal 5 generates communication data including the image data of the material 650 acquired from the video processing unit 22 and transmits the communication data to the partner video conference apparatus via the network 500. As a result, it is possible to provide clear and easy-to-see material images to the conference attendees who are present around the video conference device of the other party. At this time, if the collected sound data is acquired from the sound emission and collection device 1, the communication terminal 5 generates and transmits communication data including the collected sound data together with the image data of the material 650. Also good.

[0071] As described above, the configuration and processing of the present embodiment

By using, you can acquire and transmit the video of the conference and the image of the document in a state suitable for each specification. At this time, simply changing the camera in two directions, the horizontal direction and the vertical downward direction, can easily obtain the video corresponding to the specifications of the conference video and the document image.

Next, a video conference apparatus according to the second embodiment will be described with reference to the drawings.

FIG. 8 is an external view of an assembly member including the sound emission and collection device 1, the camera 2, and the support 7 in the video conference device of the present embodiment, (A) is a plan view, and (B) is a side view. It is.

FIG. 9 is a diagram showing a usage situation of the video conference apparatus using the video conference apparatus of the present embodiment, where (A) is a plan view and (B) is a side view. 8 and 9, the sound emission and collection device 1 and cables connected to the camera 2 are not shown.

FIG. 10 is a diagram for explaining generation of video data by the video conference apparatus according to the present embodiment. (A) is a diagram showing imaging data, and (B) is a concept of image correction at the center of the imaging data. FIGS. 2C and 2C are conceptual diagrams of image correction around the image data.

In the video conference apparatus of the present embodiment, the configuration and processing of the sound emitting and collecting apparatus 1 and the communication terminal 5 are the same as those of the video conference apparatus of the first embodiment. On the other hand, the video conferencing apparatus of the present embodiment is different from the first embodiment in that the switch 4 is installed in the structure of the camera 2, that is, the structure of the support 7 and the video processing method in the video processing unit 22 of the camera 2. It is omitted.

As shown in FIG. 8, a support 7 is disposed around the disc-shaped sound emitting and collecting apparatus 1. The support 7 includes four vertical support shafts extending in the vertical direction, two horizontal support shafts disposed at a distance h 1 from the top surface of the sound emitting and collecting device 1, and the top surface of the sound emitting and collecting device 1. It consists of four horizontal spindles arranged at a distance h2 (> hl). The two horizontal support shafts arranged at the distance hi have a structure that intersects at a substantially central position when the sound emitting and collecting apparatus 1 is viewed in plan, and are held at the distance hi by the four vertical support shafts. The horizontal support shafts arranged at the distance h2 are assembled so as to be substantially square in a plan view, and are held at the distance h2 by four vertical support shafts.

[0075] Camera 2 is installed at the intersection of two horizontal spindles at distance hi. Camera 2 is installed so that the shooting direction is vertically upward.

[0076] The mounting table 8 is supported by four horizontal support shafts at a distance h2, and the mounting table 8 is formed of a highly transmissive glass, an acrylic plate, or the like. At this time, the mounting table 8 and the camera 2 are installed so that the center of the mounting table 8 and the axis of the fisheye lens of the camera 2 substantially coincide with each other in a plan view.

On the mounting table 8, the material 650 is placed with the printing surface in a vertically downward direction, that is, in a direction in contact with the mounting tape nozzle 8.

[0078] Here, the height of the camera 2 and the height of the mounting table 8, that is, the distances hi and h2, are as shown in FIG. It should be set so that it can be photographed and is not hidden by the horizontal spindle that supports the mounting table 8.

When the video conferencing apparatus having such a configuration is used, it is acquired by the imaging unit 21 of the camera 2. The imaging data is as shown in Fig. 10 (A). In other words, since the imaging data is taken through a fisheye lens, the entire imaging area is a circular all-area image data 610, and the document image 660 of the document 650 is projected at the center, and each of the surrounding areas is each image data 660. Personnel image 60;! ~ 604 People image 64;! ~ 644 are shown.

[0080] For the circular image data, the image processing unit 22 uses the center of the image data as the origin, the distance r extending in the radial direction from the origin, and a predetermined direction (in FIG. 10, from the origin to the image data). It is obtained in the r 7] coordinate system expressed by the angle η with respect to the right direction (0 ° direction). The video processing unit 22 cuts out a predetermined range of image data 681 from the acquired imaging data.

The video processing unit 22 corrects the image data 681 in the r η coordinate system by converting it into corrected image data 681 ′ in the X−Υ plane coordinate system. At this time, the video processing unit 22 stores in advance a coordinate conversion processing table in which the center coordinates of the r η coordinate system and the X Υ coordinate system coincide with each other, and based on the acquired r 7] coordinates of each pixel, — Calculate Y coordinate and perform correction conversion. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and V can be used for the coordinate conversion calculation formula and fi correction can be performed.

By this conversion, as shown in FIG. 10B, the material image 660 of the material 650 acquired in the r η coordinate system is converted into a corrected material image 670 in the Χ-Χ coordinate system (planar coordinate system). The By transforming into the X-coordinate system in this way, the corrected material image 670 becomes close to the natural body image of the material 650. In other words, it is possible to obtain the image data of the document 650 that is not distorted

In addition, the video processing unit 22 acquires peripheral image data 682 by removing the image data 681 near the center from the entire region image data 610. Based on the speaker position information acquired from the sound emission and collection device 1 via the communication terminal 5, the video processing unit 22 sets an area to be extracted as in the first embodiment. That is, the video processing unit 22 extracts a region including the image of the conference participant who is speaking, and acquires the partial image data 683. At this time, the video processing unit 22 acquires partial image data in the rV coordinate system. Specifically, as shown in FIG. 10 (C), the video processing unit 22 determines the coordinates of the four corners of the fan shape including the image of the corresponding conference (rlO) based on the speaker orientation information. , 7] 10), (rlO, η 20), (r20, η 20), (r20,] 10) ί To obtain.

The video processing unit 22 performs correction conversion on the acquired partial image data 683. Specifically, each pixel defined in the r coordinate system is compensated and transformed so as to be applied to a pixel in the orthogonal two-dimensional plane coordinate (XY coordinate system). At this time, the video processing unit 22 stores in advance a conversion processing table between the rn coordinate system and the XY coordinate system, and calculates the XY coordinate based on the acquired rn coordinate of each pixel. , Make corrections. Note that the video processing unit 22 stores a coordinate conversion calculation formula in advance, and may perform correction conversion using the coordinate conversion calculation formula.

Yes

For example, as shown in FIG. 10 (C), the video processing unit 22 has a planar coordinate system for displaying the partial image data 683 set in the distance range rl0 to r20 and the azimuth angle range η10 to η20. Converted to the corrected image data 683 'set by xl0 to x20 and yl0 to y20 with the horizontal direction as the X axis and the vertical direction as the vertical axis. By this conversion, the person image 644 of the conference person 604 acquired in the rn coordinate system is converted into a corrected person image 654 in the XY coordinate system (planar coordinate system). By converting to the XY coordinate system in this way, the corrected human image 654 becomes close to the natural image of the conference person 604.

The video processing unit 22 attaches time information to the corrected image data including the acquired correction material image 670 and the corrected image data including the corrected human image 654, and outputs it to the communication terminal 5 as video data. Generation and output of such corrected image data are performed sequentially. If the received speaker orientation information Θ changes, only the corrected image data including the corrected human image is switched according to the change. Video data is output.

[0087] The communication terminal 5 uses the video data from the video processing unit 22, the collected voice data, and the speaker orientation information.

Communication data is generated by associating with Θ, and transmitted to the video conference apparatus of the other party via the network 500. As a result, it is possible to simultaneously provide a document image together with a video that is close to the natural state of the speaking conference participant and a speech of the conference conference participant who is present in the vicinity of the other party's video conference device.

[0088] In this way, by using the configuration and processing of the present embodiment, it is possible to realize a video conferencing apparatus that acquires and transmits a conference participant's video and a document image that are currently speaking with a relatively simple structure. Touch with S. [0089] In the present embodiment, an example in which a conference participant image and a document image are acquired and transmitted at the same time has been shown. However, acquisition of a document image is temporarily performed rather than performed on a regular basis. It may be transmitted only at the timing. In this case, since the material image does not change except when the material is replaced, the content of information transmitted to the other party is not reduced compared to the case where the material image is regularly transmitted. On the other hand, while the document image is not transmitted, the processing and network load are reduced by the amount of data of the document image, so that processing and transmission can be performed at higher speed. Note that the document image acquisition timing is different from the previous image by providing an image analysis unit that can input the acquisition operation from the operation unit when a new document is placed. Time may be a new acquisition timing.

Further, in each of the above-described embodiments, the power shown in the example in which the video processing unit is provided in the camera. The video processing unit can be realized by a device independent of the camera, or the sound emitting and collecting device or the communication terminal. You may equip it. As a result, since the camera has a simpler structure, a general-purpose video camera can be used as long as it has a lens capable of shooting the necessary area described above.

Further, in the above description, an example in which the communication terminal is provided independently of the sound emission and collection device is shown, but the function of the communication terminal may be provided in the sound emission and collection device. As a result, the number of components of the video conference apparatus is reduced, so that a simpler and smaller video conference apparatus can be realized.

[0092] The present invention has been described in detail and with reference to particular embodiments. It should be understood that various changes and modifications can be made without departing from the spirit, scope or scope of the invention. It is clear to the contractor.

The present invention is based on a Japanese patent application filed on December 19, 2006 (Japanese Patent Application No. 2006-341175), the contents of which are incorporated herein by reference.

Claims

The scope of the claims

[1] an imaging unit for imaging a predetermined area;

A video data generation unit that generates video data based on the video captured by the imaging unit, and a sound collection / sound collection unit that collects sound around the device to generate sound collection sound data and emits sound release sound data. A housing having a section;

Communication data including the collected sound data and the video data is generated, the communication data is transmitted to the outside, and the emitted sound data is acquired from the communication data from the outside and is given to the sound emitting and collecting unit And

A support unit that supports the imaging unit in a predetermined manner;

A video conferencing apparatus comprising:

The support part is

The first aspect in which the imaging unit is directed to a conference person imaging region around the casing, and the second aspect in which the imaging unit is directed to an area near the imaging unit in the vicinity of the casing. Support the imaging unit,

The video data generation unit

When the selection of the first mode is detected, only the azimuth area corresponding to the sound collection direction information of the collected sound data is cut out from the video data, and the cut out video data is changed to the first type according to the first mode. Compensate by the compensation process,

When the selection of the second mode is detected, a predetermined area centered on the front direction of the imaging unit is cut out from the video data, and the second correction according to the second mode different from the first correction process is performed. Video conferencing equipment that compensates video data cut out by processing.

[2] The support portion includes a joint mechanism that switches between the first mode and the second mode, and forms a switch by the joint mechanism,

2. The video conference apparatus according to claim 1, wherein the video data generation unit detects selection between the first mode and the second mode based on a selection status of the switch by the joint mechanism.

[3] an imaging unit for imaging a predetermined area;

A video data generation unit that generates video data based on the video captured by the imaging unit, and collects sound around the device itself to generate sound collection sound data, and emits sound emission sound data A housing having a sound emission and collection part;

A support unit for supporting the imaging unit with respect to the housing;

A video conferencing apparatus comprising:

The imaging unit simultaneously images a conference person imaging region and a region near the imaging unit in the vicinity of the housing,

The video data generation unit

Only the azimuth area corresponding to the sound collection azimuth information of the collected sound data is cut out from the first partial video data corresponding to the conference person imaging area, and the cut out first partial video data is corrected by the third correction process. And

A video conferencing apparatus that corrects the second partial video data corresponding to an area close to the imaging unit by a fourth correction process different from the third correction process.

[4] a selection unit that selects partial video data used for the communication data;

4. The video conferencing apparatus according to claim 3, wherein the video data generation unit provides the communication unit with partial video data selected by the selection unit.

[5] The imaging unit includes a fisheye lens, a central region of the region imaged by the fisheye lens is a region close to the imaging unit, and at least a peripheral region outside the central region is the conference person imaging region. The video conference apparatus according to any one of Items ;! to 4.

6. The video conference apparatus according to claim 5, wherein the video data generation unit is formed integrally with the imaging unit.

[7] The communication unit is integrally formed with the casing together with the sound emission and collection unit;

6! /, Video conferencing equipment as described in any of the above.

8. The video conference apparatus according to claim 5, wherein the video data generation unit is integrally formed with the housing together with the sound emission and collection unit.

[9] Equipped with a display monitor that plays back video data,

The communication unit acquires video data included in the communication data, and displays the display mode. The video conference apparatus according to claim 1, which is given to Utah.