
CN118975274A - Sound augmented reality object reproduction device and information terminal system - Google Patents

Sound augmented reality object reproduction device and information terminal system

Info

Publication number
CN118975274A
Authority
CN
China
Prior art keywords
sound
information terminal
augmented reality
reality object
user
Prior art date
Legal status
Pending
Application number
CN202280094489.6A
Other languages
Chinese (zh)
Inventor
鹤贺贞雄
桥本康宣
吉泽和彦
泷泽和之
Current Assignee
Maxell Ltd
Original Assignee
Maxell Ltd
Filing date
Publication date
Application filed by Maxell Ltd filed Critical Maxell Ltd
Publication of CN118975274A


Abstract

The sound augmented reality object reproduction apparatus is an apparatus capable of mapping a target to a virtual space. The sound augmented reality object reproduction apparatus includes a processor that performs prescribed processing. The processor can execute, for example, the following process: based on a sound that is output from an information terminal such as a smartphone or a suitable wearable device and input to the apparatus, the information terminal or an application on the information terminal is mapped as a target to a position in the virtual space corresponding to the position of the information terminal.

Description

Sound augmented reality object reproduction device and information terminal system
Technical Field
The present invention relates to a sound augmented reality object reproduction apparatus and an information terminal system using the sound augmented reality object reproduction apparatus.
Background
As an example of a sound augmented reality object reproduction device, there is known an apparatus that can be worn on the head of a user, outputs sound from a sound output device such as a speaker using spatial audio techniques, and displays various information on a display screen in front of the user's eyes.
Here, patent document 1 discloses a technique concerning spatial audio. That is, patent document 1 discloses a spatial audio signal reproduction device that generates and reproduces a spatial audio signal, the device including: a first processing unit that performs a Fourier transform along the azimuth angle on a head-related transfer function (HRTF) measured at a first distance, performs a transform from the first distance to a second distance using a Hankel function, and further performs an inverse Fourier transform with the order of the Hankel function as the variable, thereby generating a head-related transfer function at the second distance; and a second processing unit that applies the head-related transfer function at the second distance as a filter to an input acoustic signal to generate the spatial audio signal.
The technique of patent document 1 is considered to have the effect that, even when a method of synthesizing HRTFs at an arbitrary distance in the horizontal plane is used, quality degradation due to discontinuity is suppressed and high-quality spatial audio reproduction can be realized. Further, patent document 1 discloses a spatial audio signal reproduction device capable of realizing a high sense of presence in the horizontal plane, where human perception accuracy is high.
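The chain of operations described above can be sketched with the standard circular-harmonic formulation of HRTF distance variation; the exact formulation in patent document 1 may differ, so the symbols and form below are illustrative assumptions rather than the patent's own equations.

```latex
% Sketch of a Hankel-function-based HRTF distance transform (assumed form).
% H(r,\varphi): HRTF at distance r and azimuth \varphi, k: wavenumber,
% H_m^{(2)}: Hankel function of the second kind of order m.
\begin{align}
  c_m(r_1) &= \frac{1}{2\pi}\int_0^{2\pi} H(r_1,\varphi)\, e^{-\mathrm{i} m \varphi}\, d\varphi
    && \text{(Fourier transform along azimuth)} \\
  c_m(r_2) &= c_m(r_1)\,\frac{H_m^{(2)}(k r_2)}{H_m^{(2)}(k r_1)}
    && \text{(distance transform via Hankel functions)} \\
  H(r_2,\varphi) &= \sum_{m=-M}^{M} c_m(r_2)\, e^{\mathrm{i} m \varphi}
    && \text{(inverse transform over the order } m\text{)}
\end{align}
```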
On the other hand, patent document 2 discloses a sound processing apparatus. That is, patent document 2 discloses a sound processing apparatus including: a microphone array having microphone elements of at least two channels; a band division unit that divides the signal from the microphone array into a plurality of frequency bands for each channel; a sound source localization unit that estimates a sound source direction from the band-division signals; a sound source separation unit that emphasizes the band-division signals for each estimated sound source direction; a sound source overlap determination unit that determines, using the emphasized band-division signals and the estimated sound source direction information, whether a band-division signal is a signal from a plurality of sound sources or from a single sound source; and a sound source search unit that performs a sound source search using the band-division signals determined to come from a single sound source.
The technique of patent document 2 determines whether or not a plurality of sound sources overlap, and uses only band-division signals from a single sound source for sound source localization, so that frequency-band components in which a plurality of sound sources overlap and the direction information of the sound source is lost are not used. The technique of patent document 2 is thus considered capable of accurately determining the direction from which voice or music is emitted.
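As a rough illustration of the selection step described above (not the actual implementation of patent document 2), per-band direction estimates could be combined only over the bands flagged as containing a single source, for example:

```python
import numpy as np

def localize_from_bands(band_doas, single_source_flags):
    """Combine per-band direction-of-arrival (DOA) estimates, in degrees,
    using only the frequency bands judged to contain a single sound source.
    Bands where several sources overlap are discarded because their
    direction information is unreliable."""
    doas = np.asarray(band_doas, dtype=float)
    flags = np.asarray(single_source_flags, dtype=bool)
    if not flags.any():
        return None  # no reliable band: direction cannot be estimated
    # Average on the unit circle to handle angle wrap-around correctly.
    angles = np.deg2rad(doas[flags])
    return float(np.rad2deg(np.arctan2(np.sin(angles).mean(),
                                       np.cos(angles).mean())))

# Example: bands near 30 degrees are kept, the overlapped band at 120 degrees is dropped.
print(localize_from_bands([29.0, 31.0, 120.0], [True, True, False]))
```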
Prior art literature
Patent literature
Patent document 1: japanese patent laid-open publication No. 2018-64227
Patent document 2: japanese patent laid-open No. 2006-227328
Disclosure of Invention
Technical problem to be solved by the invention
As an example of how a user might use the sound augmented reality object reproduction apparatus, the following method can be considered. That is, the user maps a target (an information terminal or an application) as a virtual object into the virtual space of the sound augmented reality object reproduction apparatus, and operates the mapped target using the apparatus. However, when the user's external field of view is limited, for example, the user cannot easily perform the mapping, and convenience for the user is poor. Furthermore, patent document 1 and patent document 2 described above do not disclose such a mapping technique.
The present invention aims to provide a sound augmented reality object reproducing device which improves user convenience and enables easy mapping, and an information terminal system using the sound augmented reality object reproducing device.
Technical means for solving the problems
According to a first aspect of the present invention, the following sound augmented reality object reproduction apparatus is provided. The sound augmented reality object reproduction apparatus is capable of mapping a target to a virtual space. The sound augmented reality object reproduction apparatus includes a processor. The processor maps the information terminal, or an application on the information terminal, as a target to a position in the virtual space corresponding to the position of the information terminal, based on a sound that is output from the information terminal and input to the apparatus.
According to a second aspect of the present invention, the following information terminal system is provided. The information terminal system includes one or more information terminals and a sound augmented reality object reproduction apparatus capable of mapping a target to a virtual space. The sound augmented reality object reproduction apparatus includes a processor. The processor maps the information terminal, or an application on the information terminal, as a target to a position in the virtual space corresponding to the position of the information terminal, based on a sound that is output from the information terminal and input to the apparatus.
Effects of the invention
According to the present invention, it is possible to provide a sound augmented reality object reproducing apparatus which improves user convenience and can easily perform mapping, and an information terminal system using the sound augmented reality object reproducing apparatus.
Drawings
Fig. 1 is a block diagram for explaining an example of the structure of a head mounted display according to the first embodiment.
Fig. 2 is a diagram for explaining a connection example of communication with an information terminal.
Fig. 3 is a diagram for explaining an example of the structure of the head mounted display.
Fig. 4 is a diagram for explaining an example of the structure of the head mounted display.
Fig. 5 is a diagram for explaining an example of a method for mapping a target by a user.
Fig. 6 is a diagram for explaining an example of a method for mapping a target by a user.
Fig. 7 is a diagram for explaining an example of a method for mapping targets by a user.
Fig. 8 is a diagram for explaining the sound source of the sound heard at the time of mapping.
Fig. 9 is a diagram for explaining the sound source of the sound heard after mapping.
Fig. 10 is a diagram for explaining a relationship between a virtual sound source in a virtual space and spatial audio.
Fig. 11 is a diagram for explaining the position of a virtual sound source in a local coordinate system.
Fig. 12 is a diagram for explaining the position of a virtual sound source in the world coordinate system.
Fig. 13 is a flowchart for explaining an example of the mapping process.
Fig. 14 is a flowchart for explaining an example of the mapping process.
Fig. 15 is a flowchart for explaining an example of the mapping process.
Fig. 16 is a flowchart for explaining an example of the sound operation processing.
Fig. 17 is a flowchart for explaining an example of the sound operation processing.
Fig. 18 is a flowchart for explaining an example of the sound operation processing.
Fig. 19 is a diagram for explaining an example of input and output of data between the head mounted display and the information terminal in the audio operation.
Fig. 20 is a block diagram for explaining an example of the structure of the audio augmented reality object reproducing apparatus according to the second embodiment.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following description presents one embodiment of the present invention, and the invention is not limited to it; other configurations and modes capable of performing the same processing are also possible. The mapping technique of the invention can contribute to "Goal 9: Industry, Innovation and Infrastructure" of the Sustainable Development Goals (SDGs) advocated by the United Nations.
First, as an example of the sound augmented reality object reproduction apparatus, an example of the structure of a head mounted display (sometimes referred to as an HMD) will be described with reference to fig. 1. The sound augmented reality object reproduction apparatus is an apparatus capable of mapping a target and reproducing the sound of the mapped target. Fig. 1 is a block diagram for explaining an example of the structure of the HMD. According to the first embodiment, the HMD101 can map a target into a virtual space and generate an icon of the mapped target. The user can then select the generated icon and operate the mapped target.
As shown in fig. 1, HMD101 includes control unit 10, ROM11, RAM12, storage unit 13, camera 14, display 15, microphone 16, speaker 17, buttons 18, and touch sensor 19.
The control unit 10 (processor) controls the entire HMD101 according to a predetermined operation program. The control unit 10 transmits and receives various commands, data, and the like to and from each constituent module in the HMD101 via a system bus serving as a data communication path. The control unit 10 may be any component capable of executing the prescribed processing; it may be constituted by, for example, a CPU (Central Processing Unit), or by another semiconductor device such as a GPU (Graphics Processing Unit).
The ROM11 is configured by an appropriate storage device such as a flash ROM, and stores data such as programs related to operations and processes performed by the HMD101. The RAM12 is a memory (internal memory) used when the control unit 10 executes predetermined processing. The storage unit 13 may be configured by an appropriate storage device such as a hard disk drive (HDD) and can store data.
The camera 14 is provided at an appropriate position so as to be able to acquire an external image. The camera 14 may be provided so as to be able to acquire information outside the field of view of the user, for example.
A display 15 (display section) is provided on the front side for displaying an image. For example, an image acquired by the camera 14 may be displayed on the display 15, and a user wearing the HMD101 may visually acquire information by viewing the image acquired by the camera 14 displayed on the display 15. As will be described in detail later, the display 15 can display an icon generated by performing the mapping process, but other information (for example, information on the output volume of the HMD101, information acquired from the outside by wireless communication, and the like) may be appropriately displayed on the display 15.
In addition, the display 15 can have any appropriate structure. The display 15 may be, for example, of a non-see-through type or a see-through type. The HMD101 may be configured with a display 15 arranged in front of each of the user's eyes, or with a single display 15 arranged to cover both eyes of the user.
The microphone 16 is a sound input device, and in the present embodiment, is provided at an appropriate position so as to be able to input sound of a user wearing the HMD 101. The microphone 16 may be provided, for example, via a part extending to the mouth.
The speaker 17 is a sound output device, and can output information by sound. The speaker 17 is provided at an appropriate position in such a manner that the user can hear the outputted sound. In addition, a different sound output device from the speaker 17 may be used, and for example, headphones may be provided as the sound output device.
HMD101 may be configured such that a user can perform various operations such as adjustment of volume and image quality, setting of communication, and the like using button 18 or touch sensor 19. By pressing the buttons 18 corresponding to the operation requested by the user, the requested operation content can be realized, and the positions and the number of the buttons 18 can be appropriately set. The touch sensor 19 is appropriately provided to be able to detect an operation by a user such as pressing an icon displayed on the display 15.
HMD101 includes voice recognition unit 20. The voice recognition unit 20 is configured to include a circuit or the like for performing voice recognition processing. Here, the programs and data used for voice recognition are stored in an appropriate storage device such as the ROM11 or the storage unit 13. The processing of the voice recognition unit 20 may be performed by a known method, for example, by analyzing and recognizing an input voice using an acoustic model or a language model.
HMD101 includes sound input unit 21. The audio input unit 21 constitutes an audio input device for inputting audio output from the information terminal 102, for example, in a mapping process described later. The sound input unit 21 may be configured by, for example, an array microphone 22, a directional microphone 23, or the like, as will be described in detail later, using a sound input device capable of acquiring azimuth information to a sound generation source.
HMD101 includes distance measuring unit 24. The distance measuring unit 24 may be configured by a sensor that measures a distance to the information terminal 102, for example, in a mapping process described later. The distance measuring unit 24 may be configured by, for example, a range camera 25 (for example, a stereo camera), a LiDAR26, a distance sensor 27 that is different from the above and that can appropriately measure a distance to the information terminal 102, and the like. The distance measuring unit 24 may be constituted by 1 or more sensors. The distance measuring unit 24 may be composed of 1 or more types of sensors.
HMD101 includes head tracking unit 28. The head tracking unit 28 detects the inclination of the user's head when the HMD101 is worn. The head tracking unit 28 is constituted by, for example, a sensor such as an acceleration sensor 29 or a gyro sensor 30. In addition, the head tracking section 28 may be constituted by 1 or more sensors. In addition, the head tracking unit 28 may be configured by 1 or more types of sensors.
The HMD101 includes an eye-movement tracking unit 31. The eye tracking unit 31 detects a line of sight direction of the user when the HMD101 is worn. The eye movement tracking unit 31 is constituted by a sensor such as a line-of-sight detection sensor 32, for example. In addition, the eye tracking section 31 may be constituted by 1 or more sensors. The eye tracking unit 31 may be composed of 1 or more types of sensors.
HMD101 includes a communication processing unit 33. The communication processing unit 33 includes a circuit or the like that performs communication processing (e.g., signal processing) in wireless communication, and in this embodiment, the HMD101 includes a wireless LAN communication unit 34 that performs communication processing when wireless LAN communication is used, and a short-range wireless communication unit 35 that performs communication processing when short-range wireless communication is used.
Further, HMD101 includes interface 36 used for communication. HMD101 can transmit and receive data to and from the outside by performing wireless communication with the outside via interface 36. Here, HMD101 may be provided with antenna 37 used for wireless communication. In addition, a device used for wireless communication such as a wireless adapter may be provided.
Next, an example of a mode of wireless communication will be described with reference to fig. 2. As shown in fig. 2, HMD101 can communicate with information terminal 102 via network 202, for example. Here, in the present embodiment, the information terminal 102 is a device capable of outputting sound, and examples of the information terminal 102 include a wearable device 200 and a smart phone 201.
Next, an example of the structure of HMD101 in which sound input unit 21 is constituted by array microphone 22 will be described with reference to fig. 3. In the example of fig. 3, the HMD101 has a spectacle shape, but the configuration of the HMD101 is not limited to this example and can be changed as appropriate. Here, description will be made with reference to the front-rear, left-right, and up-down directions shown in fig. 3.
As shown in fig. 3, HMD101 includes front frame portion 51, left frame portion 52, and right frame portion 53 on the front side (front side). On the front frame portion 51, 2-piece displays 15 are mounted so as to be respectively positioned before the left and right eyes of the user when worn.
The left frame portion 52 extends rearward from the left end portion 51a of the front frame portion 51 and is positioned on the left head side of the user when worn. The speaker 17, not shown in fig. 3, is attached to the left frame 52 so as to output sound to the left ear of the user. Similarly, the right frame portion 53 extends rearward from the right end portion 51b of the front frame portion 51 and is positioned on the right head side of the user when worn. The speaker 17, not shown in fig. 3, is attached to the right frame portion 53 so as to output sound to the right ear of the user.
In the HMD101, the microphones constituting the array microphone 22, namely a first microphone 22a, a second microphone 22b, and a third microphone 22c, are provided. In the example of fig. 3, the first microphone 22a and the second microphone 22b are arranged at the end portions (51a, 51b) of the front frame portion 51: the first microphone 22a is disposed at the lower right end portion of the front frame portion 51, and the second microphone 22b is disposed at the upper left end portion of the front frame portion 51. The third microphone 22c is disposed on the outer (right) side of the right frame portion 53. In contrast to the arrangement shown in fig. 3, the first microphone 22a may be arranged at the lower left end portion of the front frame portion 51, the second microphone 22b at the upper right end portion of the front frame portion 51, and the third microphone 22c on the outer (left) side of the left frame portion 52. The first microphone 22a and the second microphone 22b may also be located at the end portions of the front frame portion 51 on the front side of the HMD101 or on its left and right sides.
With the first microphone 22a and the second microphone 22b arranged in this way, when sound is input to them, the direction of the sound source (in the left-right and up-down directions) can be determined from the difference in arrival time between the first microphone 22a and the second microphone 22b. Likewise, when sound is input to the first microphone 22a and the third microphone 22c, the direction of the sound source (in the front-rear direction) can be determined from the difference in arrival time between the first microphone 22a and the third microphone 22c. Thus, with the array microphone 22 configured in this way, the HMD101 can easily determine the direction of the sound source.
Regarding the arrangement described above, it is preferable to arrange the microphones (22a, 22b, 22c) of the array microphone 22 so that the distance between the first microphone 22a and the second microphone 22b is substantially equal to the distance between the first microphone 22a and the third microphone 22c. With such a positional relationship, the accuracy of determining the sound source direction can be improved.
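A minimal sketch of the time-difference-of-arrival idea described above is shown below; the microphone spacing, sampling rate, and function names are illustrative assumptions and not values from the embodiment.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (approximate value at room temperature)

def delay_samples(sig_ref, sig_other):
    """Lag (in samples) of sig_other relative to sig_ref, taken from the peak
    of the cross-correlation; a positive result means sig_other arrives later."""
    corr = np.correlate(sig_other, sig_ref, mode="full")
    return int(np.argmax(corr)) - (len(sig_ref) - 1)

def arrival_angle(delay_s, mic_spacing_m):
    """Far-field angle of arrival (radians) for one microphone pair,
    using sin(theta) = c * delay / d; clipped against measurement noise."""
    sin_theta = np.clip(SPEED_OF_SOUND * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.arcsin(sin_theta))

# Example with synthetic signals sampled at 48 kHz and 15 cm microphone spacing.
fs, d = 48_000, 0.15
sig = np.random.randn(4800)
delayed = np.concatenate([np.zeros(8), sig])[:4800]  # 8-sample delay
tau = delay_samples(sig, delayed) / fs
print(np.degrees(arrival_angle(tau, d)))  # roughly 22 degrees off-axis
```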
Next, an example of the structure of HMD101 in which sound input unit 21 is constituted by directional microphone 23 will be described with reference to fig. 4. As in the case of fig. 3, the example of fig. 4 uses an HMD101 having a spectacle shape, but is not limited to this configuration. Here, description will be made with reference to the front-rear, left-right, and up-down directions shown in fig. 4.
As in the case of the array microphone 22 described above, the HMD101 includes a front frame portion 51 on the front side (front side), a left frame portion 52, and a right frame portion 53, the display 15 is mounted on the front frame portion 51, and speakers 17 not shown in fig. 4 are mounted on the left frame portion 52 and the right frame portion 53.
In the example of fig. 4, the directional microphone 23 is disposed on the upper end side of the center portion 51c of the front frame portion 51. The direction of the sound source can be determined by using the directional microphone 23. In addition, the directivity pattern of the microphone may be appropriately set as long as the direction of the sound source can be determined. In this example, the directional microphone 23 is disposed on the upper end side of the center portion 51c of the front frame portion 51, but the directional microphone 23 may be disposed at another position. The directional microphones 23 may be provided singly or in plural, but the number of microphones can be reduced by appropriately switching the directional modes of the microphones, for example.
Although HMD101 including array microphone 22 and HMD101 including directional microphone 23 have been described above, HMD101 may be configured as described below. For example, both the array microphone 22 and the directional microphone 23 may be provided in the HMD101, and the HMD101 may determine the direction of the sound source based on data of the sound input to both the array microphone 22 and the directional microphone 23. Further, a position adjusting mechanism that adjusts the position of the microphone may be provided in the HMD 101. The position adjustment mechanism may be, for example, a mechanism that enables the position of the microphone to be adjusted by sliding the microphone along the frame. Further, HMD101 may be configured to be foldable or expandable between frames.
Next, an example of a method by which the user performs target mapping will be described with reference to fig. 5 to 7. In the examples of fig. 5 to 7, the target to be mapped is the information terminal 102 (specifically, the wearable device 200 as one example of the information terminal 102). In this example, the information terminal 102 can input and output voice, and can shift to a mode for mapping (mapping mode) by recognizing the input voice.
As shown in fig. 5, the user (the operator 100 in fig. 5) wearing the HMD101 instructs the HMD101 and the wearable device 200 to start mapping by inputting a voice instruction to the microphone 16 of the HMD101 and to the wearable device 200. The user utters a sound such as "start mapping" as the mapping start instruction, whereby the HMD101 and the wearable device 200 shift to the mapping mode based on appropriate voice recognition.
Further, although an example in which the HMD101 and the information terminal 102 are simultaneously shifted to the mapping mode is described here, the respective information devices (101, 102) may be shifted to the mapping mode at different timings. The user may, for example, cause the information terminal 102 to shift to the mapping mode after causing the HMD101 to shift to the mapping mode.
Next, as shown in fig. 6, the user moves the wearable device 200 to the position to be registered and causes the wearable device 200 to output a sound. Here, the user causes the wearable device 200 to output sound by an appropriate method (e.g., operating its keys, touching its screen, or voice input to the wearable device 200).
Then, as shown in fig. 7, when the sound from the information terminal 102 is input to the HMD101 (specifically, to the sound input unit 21 of the HMD101), the HMD101 performs a process of mapping the information terminal 102 into the virtual space based on the input sound. Here, the HMD101 determines the azimuth of the sound source (i.e., the information terminal 102) based on the sound input to the sound input unit 21, and calculates the distance to the sound source. The distance to the sound source may be calculated appropriately using data on the sound input to the sound input unit 21 (for example, data correlating the level of the input sound with the distance to the sound source). When the HMD101 includes the distance measuring unit 24, the distance to the information terminal 102 measured by the distance measuring unit 24 may be used instead. Using the measurement result of the distance measuring unit 24 can improve the mapping accuracy (in particular, the accuracy in the depth direction toward the information terminal 102). Further, the HMD101 may detect the position of the information terminal 102 by wireless communication with the information terminal 102 and map it using that result.
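One simple way to realize the level-versus-distance idea mentioned above is an inverse-distance (1/r) model calibrated against a reference measurement; this is only an illustrative assumption, since the embodiment leaves the concrete data open.

```python
def distance_from_level(measured_db, ref_db, ref_distance_m=1.0):
    """Estimate the distance to a sound source from its received level,
    assuming free-field 1/r attenuation: every 6 dB drop relative to the
    calibrated reference level roughly doubles the estimated distance."""
    return ref_distance_m * 10.0 ** ((ref_db - measured_db) / 20.0)

# Example: a terminal calibrated to 70 dB SPL at 1 m, now measured at 58 dB.
print(distance_from_level(58.0, 70.0))  # ~4.0 m
```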
Then, the HMD101 maps the information terminal 102 (wearable device 200 in this example) as a target to a corresponding position in the virtual space based on the azimuth of the sound source and the distance to the sound source, and configures a virtual sound source 103 as a mapped target. In the description herein, the information terminal 102 is targeted for mapping, but the application of the information terminal 102 may be targeted for mapping. In this case, the mapping process of the application is performed by causing the information terminal 102 having the target application to output a sound at the time of starting or using the target application.
Then, HMD101 generates an icon representing the mapped object, and can display the generated icon on display 15. Here, the HMD101 may display an icon at an appropriate position on the display 15, and may display an icon of a target at a position corresponding to the position mapped to the target in the virtual space, for example. In addition, the HMD101 may attach information about the name representing the target (for example, text information of "wearable device" in the case where the target is the wearable device 200) to the icon and display it.
Here, an example of the audio output of HMD101 at the time of mapping and after mapping will be described with reference to fig. 8 and 9.
At the time of target mapping, as shown in fig. 8, the user can hear both the sound from the information terminal 102 (the wearable device 200 in this example) and the sound from the speaker 17 of the HMD101. Here, the speaker 17 (left and right speakers 17a, 17b in fig. 8) of the HMD101 outputs sound with the virtual sound source 103 placed at a position regarded as the same as that of the information terminal 102 (i.e., the position obtained from the azimuth of the information terminal 102 and the distance to the information terminal 102). Thus, the speaker 17 of the HMD101 outputs a sound that is heard from the same position as the sound from the information terminal 102 (i.e., from the position of the virtual sound source 103). Therefore, the user can easily confirm whether the mapping has been performed properly by comparing the sound actually heard from the information terminal 102 with the sound output from the speaker 17.
In addition, after the mapping, as shown in fig. 9, even if the position of the information terminal 102 (in this example, the wearable device 200) is changed, the HMD101 outputs sound as if it were heard from the position of the virtual sound source 103.
Here, the relationship between the virtual sound source 103 and the spatial audio in the virtual space 300, which is the space into which targets are mapped, will be described with reference to fig. 10. In the present embodiment, the HMD101 calculates how the sound emitted by the virtual sound source 103 arranged in the virtual space 300 would be heard at the user's ears, and thereby represents the spatial audio.
That is, by the user's operation as described above, the HMD101 maps targets into the virtual space 300, which is a coordinate space centered on the position of the user (in the figure, the operator 100 wearing the HMD101), and virtual sound sources (103a, 103b) are arranged at the mapped positions in the virtual space. The HMD101 then performs appropriate sound output based on the direction and distance of the virtual sound sources (103a, 103b), thereby representing spatial audio. Here, the HMD101 can adjust the sound according to the sound output device and output the adjusted sound. For example, when the sound output device is the speaker 17, the HMD101 outputs sound adjusted for the speaker 17, and when the sound output device is a headphone, the HMD101 outputs sound adjusted for the headphone.
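A minimal stereo sketch of "appropriate sound output based on the direction and distance of the virtual sound source" could look like the following; real spatial audio rendering would use HRTF filtering, so the constant-power panning and 1/r attenuation here are simplifying assumptions.

```python
import numpy as np

def render_stereo(mono, azimuth_rad, distance_m, min_distance_m=0.5):
    """Very rough spatial rendering of a mono signal for left/right channels:
    constant-power panning by azimuth (0 = front, +pi/2 = right) and
    1/r gain by distance. A real system would apply HRTFs instead."""
    pan = np.clip(azimuth_rad / (np.pi / 2), -1.0, 1.0)      # -1 left .. +1 right
    theta = (pan + 1.0) * np.pi / 4.0                         # 0 .. pi/2
    gain = min_distance_m / max(distance_m, min_distance_m)   # distance attenuation
    left = np.cos(theta) * gain * mono
    right = np.sin(theta) * gain * mono
    return np.stack([left, right], axis=0)

# Example: a 440 Hz tone placed 30 degrees to the right at 2 m.
fs = 48_000
t = np.arange(fs) / fs
tone = 0.2 * np.sin(2 * np.pi * 440 * t)
stereo = render_stereo(tone, np.radians(30), 2.0)
print(stereo.shape)  # (2, 48000)
```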
In addition, in the present embodiment, the HMD101 can map the target to the virtual space 300 of the coordinate system (local coordinate system or world coordinate system) selected by the user. Here, the position of the virtual sound source in each coordinate system when the user moves or the like will be described with reference to fig. 11 and 12.
First, the case where the virtual space 300 uses the local coordinate system will be described with reference to fig. 11. The local coordinate system is a coordinate system in which the positions of the virtual sound sources (103a, 103b) move together with the user (in the figure, the operator 100); in the local coordinate system, the virtual sound sources (103a, 103b) move in accordance with the movement of the user.
As shown in fig. 11, for example, when the user wearing the HMD101 changes orientation, the positions of the mapped virtual sound sources (103a, 103b) change so as to follow the user's new orientation. In the example of fig. 11, the virtual sound source 103a moves to the position shown as virtual sound source 103c in the virtual space 300, and the virtual sound source 103b moves to the position shown as virtual sound source 103d. In this way, in the local coordinate system, the positions of the virtual sound sources (103a, 103b) move in the virtual space 300 so as to maintain a fixed azimuth and a fixed distance relative to the position and orientation of the user (in other words, the position and orientation of the HMD101). Head tracking, for example, may be used in this processing. In addition, a GPS reception sensor may be provided in the HMD101, and GPS-based data may be used.
Accordingly, when the user wearing the HMD101 changes orientation or moves in the local coordinate system, the azimuth of each virtual sound source relative to the user and the distance to it remain unchanged, and the HMD101 outputs sound from virtual sound sources that keep a fixed azimuth and a fixed distance relative to the user.
In contrast, the world coordinate system is a coordinate system in which the positions of the virtual sound sources (103a, 103b) are fixed; in the world coordinate system, the positions of the virtual sound sources (103a, 103b) do not change even when the user moves. Thus, as shown in fig. 12, for example, when the user (in the figure, the operator 100) changes orientation, the direction of the virtual sound sources (103a, 103b) relative to the user changes accordingly, and the HMD101 outputs sound from the virtual sound sources (103a, 103b) in different directions before and after the change. Therefore, unlike in the local coordinate system, in the world coordinate system the direction from which a sound is heard and its sense of distance change as the user turns or moves.
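The difference between the two coordinate systems can be sketched as follows; this is a simplified 2D model under assumed conventions, not code prescribed by the embodiment.

```python
import numpy as np

def source_relative_to_user(source_xy, user_xy, user_yaw_rad, coordinate_system):
    """Return the virtual sound source position in user-relative coordinates.
    - 'local': the source is stored relative to the user, so it simply moves
      and turns with the user and the returned value never changes.
    - 'world': the source is fixed in the world, so the user's position and
      heading are removed to obtain the direction the sound must come from."""
    source_xy = np.asarray(source_xy, dtype=float)
    if coordinate_system == "local":
        return source_xy                     # fixed azimuth/distance to the user
    offset = source_xy - np.asarray(user_xy, dtype=float)
    c, s = np.cos(-user_yaw_rad), np.sin(-user_yaw_rad)
    return np.array([c * offset[0] - s * offset[1],
                     s * offset[0] + c * offset[1]])

# Example: after the user turns 90 degrees, a world-fixed source that was in
# front of the user ends up at the user's side, while a local source stays put.
print(source_relative_to_user([0.0, 2.0], [0.0, 0.0], np.pi / 2, "world"))
print(source_relative_to_user([0.0, 2.0], [0.0, 0.0], np.pi / 2, "local"))
```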
Next, details of the mapping process will be described with reference to flowcharts shown in fig. 13 to 15. Fig. 13 to 15 are flowcharts for explaining an example of the mapping process.
As shown in fig. 13, the HMD101 waits until the user instructs it to start mapping (S101). The user then utters a sound indicating the start of mapping (for example, the user says "start mapping") (S102), whereupon the control unit 10 performs voice recognition and recognizes the keyword indicating the start of mapping (S103). Having recognized the keyword by voice recognition, the HMD101 (specifically, the control unit 10) starts the mapping mode, which is the mode for performing target mapping (S104). Here, the HMD101 outputs a sound prompting the user to select whether to perform the mapping in the local coordinate system or in the world coordinate system (S105). The user then utters the keyword for the selected coordinate system (for example, the user says "local coordinate system") (S106), and the control unit 10 performs voice recognition and recognizes the keyword indicating which coordinate system to use (S107). The HMD101 then outputs a sound informing the user that the mapping mode in the selected coordinate system has started (S108). Here, the HMD101 outputs, for example, a sound such as "starting the mapping mode in the local coordinate system".
In addition, the data such as the keywords used for the HMD101 to perform voice recognition in S101 to S108 may be stored in advance in an appropriate storage device such as the storage unit 13.
Next, the user sounds an instruction to start mapping to the information terminal 102 (in this example, the wearable device 200) (S109). The user speaks "start registration", for example. Here, as in the case of the HMD101 described above, the wearable device 200 recognizes a keyword by voice recognition (S110), and starts a mapping mode, that is, a device registration mode (S111). Here, the wearable device 200 may also output a sound notifying that the device registration mode has been started (S112). The wearable device 200 may output, for example, a sound of "start device registration mode".
In addition, as in the above case, the data such as the keywords used for voice recognition by the information terminal 102 in S109 to S112 may be stored in advance in an appropriate storage device of the information terminal 102. In this embodiment, although the HMD101 and the information terminal 102 are independently set to the mapping mode, the user may simultaneously shift the HMD101 and the information terminal 102 to the mapping mode by inputting the sound to the HMD101 and the information terminal 102 at the same time.
In this way, preparation for the mapping process is performed in S101 to S112. Then, mapping is performed by the processing described below.
As shown in fig. 14, first, the user moves the wearable device 200 to a location to be mapped (S201). Then, the user presses a button of the wearable device 200 to output the sound (position detection sound) of the target to be mapped (S202).
Here, in the case of taking the information terminal 102 (the wearable device 200 in this example) as a target to be mapped, the user causes it to output, for example, a sound regarding the mapping mode of the information terminal 102. On the other hand, in the case where the application of the information terminal 102 is targeted for mapping, the user operates the information terminal 102 to execute the targeted application, and causes the information terminal 102 to output the sound of the application.
In addition, the method of causing the information terminal 102 (in this example, the wearable device 200) to output sound is not limited to the method of pressing a button, and may be a method of operating a key thereof, a touch screen, or voice input, as long as sound can be appropriately output.
Then, when the sound is output from the information terminal 102 in S202, the HMD101 takes in the sound (position detection sound) via the sound input unit 21 (S203). In this example the array microphone 22 is used as the sound input unit 21, but the directional microphone 23, for example, may be used instead.
Then, the control unit 10 calculates the position (distance and azimuth) of the wearable device 200 from the captured sound (position detection sound) (S204). The control unit 10 stores the calculated position information in memory (in this example, the storage unit 13) (S205). The control unit 10 then maps the target (the wearable device 200 in this example) to the calculated position in the spatial audio space (i.e., in the virtual space 300) (S206). Here, the control unit 10 maps the target into the virtual space 300 based on the coordinate system recognized by voice in S107. The virtual sound source 103 is thereby set in the virtual space 300.
After the mapping into the virtual space 300, the control unit 10 outputs sound from the speaker 17 so that the sound appears to come from the mapped position (i.e., from the virtual sound source 103) (S207). The user can thus confirm whether the target has been mapped properly by comparing the sound output from the wearable device 200 with the sound output from the speaker 17.
The control unit 10 may also determine whether the mapping is appropriate based on whether the position of the virtual sound source 103 arranged in the virtual space 300 by the mapping matches the position of the information terminal 102, and can automatically adjust the mapped position according to the result. That is, the control unit 10 can determine whether the direction of the information terminal 102 matches the direction of the virtual sound source 103 and adjust the position of the virtual sound source 103 according to the result (S208). Specifically, the control unit 10 judges that the directions coincide when the deviation between them is within a predetermined threshold. When it is determined that the directions do not coincide, the control unit 10 adjusts the position information of the wearable device 200, stores the adjusted position information in memory (S205), and performs the mapping again based on that position information (S206), thereby adjusting the position of the virtual sound source 103.
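Assuming the consistency check in S208 is a simple angular-deviation threshold, it can be sketched as follows; the threshold value is illustrative, not taken from the patent.

```python
def needs_adjustment(terminal_azimuth_deg, virtual_source_azimuth_deg,
                     threshold_deg=5.0):
    """Return True when the direction of the real terminal and the direction
    of the mapped virtual sound source deviate by more than the threshold,
    i.e. when the stored position should be adjusted and re-mapped."""
    diff = (terminal_azimuth_deg - virtual_source_azimuth_deg + 180.0) % 360.0 - 180.0
    return abs(diff) > threshold_deg

print(needs_adjustment(10.0, 12.0))   # False: within threshold, mapping kept
print(needs_adjustment(10.0, 350.0))  # True: 20-degree deviation, re-map
```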
Then, the user confirms the coincidence of the directions of the sounds of the information terminal 102 and the virtual sound source 103, presses a button of the wearable device 200, and stops the sound output (S209). In addition, as in the case of S202 described above, the user may stop the sound output of the wearable device 200 by an appropriate method other than pressing the button.
In this way, in the processing of S201 to S206, the target is mapped into the virtual space 300, and in the processing of S207 to S209, confirmation is made as to whether the mapping is appropriate. Then, the mapping process ends through the process described below.
As shown in fig. 15, the user confirms whether there are other targets to be mapped, and if so, maps them by the method described above (S301). When the user confirms that there are no more targets to be mapped, the user utters a sound indicating the end of the mapping (S302); for example, the user says "end mapping". The control unit 10 then performs voice recognition, recognizes the keyword indicating the end of the mapping (S303), and ends the mapping mode (S304). The HMD101 then outputs a sound notifying the user that the mapping mode has ended (S305); for example, the HMD101 outputs a sound such as "ending the mapping mode".
In this way, the mapping process is ended through S301 to S305 (S306). In S301 to S305, data such as keywords used for voice recognition by the HMD101 may be stored in advance in an appropriate storage device such as the storage unit 13.
In addition, when a target is about to be mapped to a position in the virtual space 300 that is already mapped, the HMD101 may output an audible warning. In that case, the HMD101 may also output a sound suggesting in which direction the position of the target to be mapped should be shifted. The HMD101 can then recognize a keyword from the voice input by the user using voice recognition and shift the position of the target to be mapped in the specified direction. The keywords (e.g., "left", "right", etc.) are stored in an appropriate storage device. The offset amount can be set appropriately, for example, to the minimum amount that avoids the overlap. The control unit 10 may then determine the coincidence of the sound directions in S208 while taking the offset into account.
Further, HMD101 can generate an icon representing the mapped object. Next, an example of a method for generating an icon in the HMD101 (specifically, the control unit 10) will be described.
The HMD101 can use the sound output from the information terminal 102 when generating the icon of a target. That is, data such as keywords indicating a target and the sound output when the target starts up are stored in the storage device in advance as data necessary for voice recognition. The HMD101 performs voice recognition on the sound input from the information terminal 102 in S202 and the like, and determines the target for which to generate an icon.
Here, for example, in the case where the target is the wearable device 200 as the information terminal 102, the HMD101 may determine that the target to generate the icon is the wearable device 200 by recognizing a sound or the like output when the wearable device 200 is started up as a keyword in the mapping mode.
Then, the HMD101 generates an icon of the determined target. Here, the data such as the pattern of the icon and the name of the icon may be stored in the storage device, and the control unit 10 may generate an icon corresponding to the determined target based on the data. In addition, as will be described in detail later, the control section 10 can display the generated icon on the display 15. At this time, a name indicating the target may be displayed in addition.
An example of icon generation in the case where the object is an application will be described. In the case where the target is an application, data representing keywords or the like of the application is stored in the storage device, as in the case of the information terminal 102.
Here, for example, in the case where the target is an application concerning weather forecast, sounds concerning keywords (for example, "weather, sunny, cloudy, rainy", etc.) of weather forecast, sounds output when the application is started, and the like may also be stored in the storage device. Then, the HMD101 performs voice recognition based on the voice of the application input from the information terminal 102 in S202 or the like, and determines a target to generate an icon.
Although an example in which the target for icon generation is determined based on voice recognition has been described here, the HMD101 may also acquire information for determining the target by communication. The HMD101 may, for example, acquire data for determining the target (e.g., information on the name of the target) by communicating with the information terminal 102, and determine the target using the acquired information. Here, information that associates the data acquired by communication with targets (for example, a table recording the acquired information and the corresponding target names) may be stored in the storage device, and the HMD101 may determine the target from the information acquired by communication by referring to the stored information.
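A minimal sketch of determining the icon target from recognized keywords (or from information obtained over communication) might use a lookup table like the one below; the table contents and names are illustrative assumptions.

```python
# Illustrative keyword table; real entries would be stored in the storage unit 13.
KEYWORD_TO_TARGET = {
    "wearable device": "wearable_device_icon",
    "weather": "weather_app_icon",
    "sunny": "weather_app_icon",
    "rainy": "weather_app_icon",
}

def determine_icon_target(recognized_words):
    """Return the first icon target whose keyword appears in the words
    recognized from the sound output by the information terminal, or None
    when no registered keyword matches."""
    for word in recognized_words:
        target = KEYWORD_TO_TARGET.get(word.lower())
        if target is not None:
            return target
    return None

print(determine_icon_target(["Today", "is", "sunny"]))  # weather_app_icon
```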
Then, the HMD101 can display the generated icon of the target on the display 15. Here, the control unit 10 may display an icon at a position corresponding to the mapped position in the virtual space 300, for example, with reference to the user wearing the device. In addition, the display position of the icon can be moved appropriately by an operation or the like of the user. The HMD101 may be configured to be able to move the icon by an operation (by drag and drop) of a user that selects and moves the displayed icon, for example. On the other hand, as will be described in detail later, the movement of the icon may also be performed by sound input.
HMD101 is configured such that an icon of a target displayed on display 15 can be selected by a user. The user can appropriately select the icon of the target, and operate on the mapped target. Next, with reference to flowcharts shown in fig. 16 to 18, a sound operation process using icons will be described. Fig. 16 to 18 are flowcharts for explaining an example of the sound operation processing.
As shown in fig. 16, the HMD101 waits until the user instructs to start the voice operation mode (the mode in which voice operation is enabled) (S401). Then, the user utters a sound indicating the start of the sound operation mode (e.g., the user speaks "start operation") (S402), and the control section 10 performs sound recognition to recognize a keyword indicating the start of the sound operation mode (S403). Then, the HMD101 (specifically, the control unit 10) recognizes a keyword by voice recognition, and starts a voice operation mode (S404). Here, the HMD101 outputs a sound notifying that the sound operation mode has been started (S405). HMD101 performs notification such as "start operation", for example.
Thus, in S401 to S405, the voice operation mode is started and preparation for performing voice operations is completed. Then, as described in the example below, the user can operate targets by voice. In the following description, an icon generated by the mapping is sometimes referred to as a map icon.
First, the user speaks the name of the map icon of the target to be operated by voice (S406). For example, to select the mapped smartphone 201 as the information terminal 102, the user says "cell phone" ("cell phone" being a shorthand referring to the smartphone 201). The control unit 10 then recognizes by voice recognition the map icon named by the user (S407); that is, the control unit 10 selects the map icon of the target corresponding to the voice input by the user.
Then, the HMD101 audibly notifies the user of the selected map icon (S408); here, for example, the HMD101 notifies the user that the cell phone has been selected. The user confirms from the notified content whether the selected map icon is correct, and if so, says that it is correct (e.g., says "OK") (S409). The HMD101 recognizes this keyword by voice recognition and can then execute the processing of S501 described below. On the other hand, if the wrong map icon has been selected, the user indicates that it is incorrect (e.g., says "NO"), then speaks the name of the map icon to be operated again, causing the HMD101 to repeat the map icon recognition processing.
Thus, in S406 to S409, the user selects a map icon to be subjected to a sound operation. In addition, when the map icon is selected, a sound indicating that the map icon is selected may be output. The sound may be, for example, a simple sound such as "pop", or may be the name of the target shown by the map icon. Thus, the user can understand that the map icon has been selected.
The sound indicating the selection of the map icon may be output from the speaker 17 so as to be heard from the direction in which the selected map icon is displayed. For example, with the center portion on the front side of the HMD101 as a reference, when the selected map icon is displayed on the front of the right eye of the user wearing the device, it is possible to output a sound as if it were heard from the right side. In addition, in the case where the map icon is displayed on the center side of the HMD101, a sound as if heard from the front side can be output.
In addition, the HMD101 may use an appropriate tracking technique for selecting a map icon. For example, in addition to the user's voice input to the microphone 16, the HMD101 may detect the orientation of the user's head with the head tracking unit 28 and select the map icon, among those matching the voice input to the microphone 16, that is displayed in that orientation. In this case, the user can select the desired map icon by turning the head toward the map icon to be selected and speaking.
Further, for example, in addition to the user's voice input to the microphone 16, the HMD101 may detect the direction of the user's line of sight with the eye tracking unit 31 and select the map icon, among those matching the voice input to the microphone 16, that is displayed in that direction. In this case, the user can select the desired map icon by directing the line of sight toward the map icon to be selected and speaking.
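Combining voice recognition with head or gaze tracking as described above can be sketched roughly as follows; the angular tolerance and data shapes are assumptions for illustration.

```python
def select_icon(spoken_name, gaze_azimuth_deg, icons, tolerance_deg=20.0):
    """Pick the map icon whose registered name matches the recognized speech
    and whose displayed direction is closest to the head/gaze direction.
    `icons` is a list of (name, displayed_azimuth_deg) pairs."""
    candidates = [(name, az) for name, az in icons if name == spoken_name]
    if not candidates:
        return None
    name, az = min(candidates, key=lambda item: abs(item[1] - gaze_azimuth_deg))
    return name if abs(az - gaze_azimuth_deg) <= tolerance_deg else None

icons = [("cell phone", -30.0), ("wearable device", 25.0)]
print(select_icon("cell phone", -20.0, icons))  # "cell phone"
```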
In this way, by using tracking techniques, map icon selection can be realized based not only on voice but also on the user's movement and line of sight. In S401 to S409, data such as the keywords used for voice recognition may be stored in advance in an appropriate storage device such as the storage unit 13. Next, the processing of a voice operation will be described. In a voice operation, the information terminal 102, which executes the processing, is operated via wireless communication based on the voice input on the HMD101 side.
As shown in fig. 17, the user speaks the operation content of the map icon of the selected target (S501).
Here, various operations can be considered as the operation contents. The operation contents include, for example, an operation on display (display of a menu, selection of a menu item, etc.), display and movement of a cursor, volume adjustment, an operation on call and answer in a case where the target smart phone 201 or the like has a call function (a function of processing a sound in a call), movement (remapping) of a displayed icon position, an operation of the target information terminal 102, execution of a target application (start of an application), and the like. In addition, HMD101 can output sound from a target via speaker 17 based on a virtual sound source in virtual space 300. In the case where the information terminal 102 has a call function, the information terminal 102 may process the call sound, and input and output the call sound by the microphone 16 and the speaker 17 of the HMD 101.
Then, the control unit 10 recognizes the operation content by voice recognition (S502), and the HMD101 notifies the recognized operation content by voice (S503).
For example, when the user wants to move the map icon of the selected smartphone 201 to the left, the user says "move to the left". The HMD101 then recognizes by voice recognition that the user wants to move the map icon to the left, and notifies this by voice, for example "moving the cell phone to the left". In this way, in S501 to S503 the operation content is input to the HMD101 and the HMD101 recognizes it.
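A rough sketch of how recognized operation keywords could be dispatched to concrete handlers in the subsequent step is shown below; the keywords and handler functions are illustrative assumptions and are not defined in the embodiment.

```python
def move_icon_left(target):
    return f"moved {target} to the left"

def adjust_volume(target):
    return f"adjusted volume of {target}"

def launch_application(target):
    return f"launched {target}"

# Mapping from recognized keywords to operations; in practice this would be
# stored alongside the other voice-recognition data in the storage unit.
OPERATIONS = {
    "move to the left": move_icon_left,
    "volume": adjust_volume,
    "launch": launch_application,
}

def dispatch(recognized_text, target):
    """Run the first operation whose keyword appears in the recognized text
    and return a confirmation message to be read back to the user."""
    for keyword, handler in OPERATIONS.items():
        if keyword in recognized_text:
            return handler(target)
    return "operation not recognized"

print(dispatch("move to the left", "cell phone"))
```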
Then, the control unit 10 performs the operation corresponding to the input operation content (S504) and notifies the performed operation content by voice (S505). When the control unit 10 has, for example, moved the map icon of the smartphone 201 to the left, it notifies "the smartphone has been moved to the left" by voice. The operation performed by the control unit 10 at this point is still provisional, and the user judges whether the operation content is correct (S506). If the user judges that the operation content is correct, the processing described below is executed to finalize it. If the user judges that the operation content is incorrect, the operation content is input again, and the operation content judged to be incorrect is reset. Thus, in S504 to S506, the input operation content is executed by the control unit 10. Next, the processing for finalizing the operation content is described.
When the user judges that the operation content is correct, the user utters a keyword to that effect (S507); for example, the user says "OK". The control unit 10 then recognizes the keyword by voice recognition (S508) and finalizes the operation content (S509). The control unit 10 then audibly notifies the user that the operation content has been finalized (S510). In the case described above, where the map icon of the smartphone 201 has been moved to the left, the control unit 10 may, for example, notify "the move to the left has been finalized" by voice.
Thus, in S507 to S510, the voice operation is finalized. In the voice operation processing of S501 to S510, data such as the keywords for voice recognition may be stored appropriately in a storage device such as the storage unit 13, and the control unit 10 can use those data for voice recognition.
In addition, in the sound operation, in the case where the mapped icon is moved so as to overlap with another mapped icon, the HMD101 may output a warning with sound. Then, the HMD101 may output a sound suggesting in which direction the mapping icon to be moved is shifted in order to make the mapping icons non-overlapping. Then, the HMD101 can shift the position of the target to be mapped in a predetermined direction by recognizing a keyword from the voice input by the user using voice recognition. Here, the keywords (e.g., "left", "right", etc.) are stored in an appropriate storage device. The offset amount can be set appropriately, for example, to a minimum amount to avoid overlapping.
Next, an example of the processing for ending a voice operation (i.e., ending the voice operation mode) will be described. As shown in fig. 18, the user confirms whether there is another map icon to be operated by voice (S601), and if there is none, utters a keyword indicating that the voice operation is to be ended (S602); for example, the user says "end operation". The control unit 10 then recognizes the keyword by voice recognition (S603), and the HMD101 ends the voice operation mode (S604). The HMD101 then notifies the user by voice that the voice operation mode has ended (S605); for example, the HMD101 outputs a sound such as "ending operation".
Thus, through S601 to S605, the voice operation mode is ended (S606). In S601 to S605 as well, data such as the keywords used by the HMD101 for voice recognition may be stored in advance in an appropriate storage device such as the storage section 13.
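The end-of-mode check of S601 to S605 reduces to matching an end keyword, as the short sketch below illustrates; the keyword set and function name are assumed for illustration, the actual keywords being data held in a storage device such as the storage section 13.

    # Hypothetical sketch of ending the voice operation mode (S601-S605).
    # The keyword set is an assumed example of data held in the storage section 13.
    END_KEYWORDS = {"end operation", "finish"}

    def maybe_end_voice_mode(utterance, notify):
        # S603-S605: end the mode and announce it when an end keyword is recognized
        if utterance.strip().lower() in END_KEYWORDS:
            notify("end operation")        # spoken notification to the user
            return True                    # voice operation mode ended (S606)
        return False

    # Usage:
    ended = maybe_end_voice_mode("End operation", print)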
As described above, the user can operate the information terminal 102 by voice from the HMD101 side. The input and output of data between the HMD101 and the information terminal 102 during a voice operation will now be described with reference to fig. 19.
First, the HMD101 waits until the user inputs a voice concerning the operation content, and when such a voice is input, it starts the operation mode for the information terminal 102 (the wearable device operation mode in fig. 19) (S701). Then, when the information terminal 102 is to be operated by voice (that is, when the operation content for the information terminal 102 is recognized in the processing of S502), the control section 10 activates the communication components (the communication processing unit 33 and the interface 36) and starts communication with the information terminal 102 (the wearable device 200 in this example) (S702).
The control section 10 then transmits the operation content to the wearable device 200 via the network 202 (S703) and receives the operation result from the wearable device 200 (S704). The user checks the received operation result and confirms whether the operation was performed correctly (S705); that is, S705 corresponds to the confirmation of S506 described above. When the user confirms that the operation was performed correctly, the user inputs a keyword to that effect by voice. The control section 10 then determines the operation content and ends the operation mode for the information terminal 102 (S706).
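The exchange of S701 to S706 can be thought of as a simple request/response between the HMD and the terminal, as in the sketch below; the JSON message format and the InformationTerminalStub class are assumptions made for illustration and merely stand in for the wearable device 200 reached over the network 202.

    # Hypothetical sketch of the operation exchange (S701-S706) between the HMD
    # and an information terminal; the message format is an illustrative assumption.
    import json

    class InformationTerminalStub:
        # Stands in for the wearable device 200 reached over the network 202.
        def handle(self, request):
            op = json.loads(request)
            # The terminal would perform the operation and report the outcome.
            return json.dumps({"op": op, "status": "done"})

    def operate_terminal(terminal, operation):
        # S702-S703: start communication and transmit the recognized operation content
        request = json.dumps(operation)
        # S704: receive the operation result from the terminal
        result = json.loads(terminal.handle(request))
        # S705: the result is presented so the user can confirm it by voice
        return result["status"] == "done"

    # Usage: ask the wearable device to start music playback, then await the user's "OK".
    ok = operate_terminal(InformationTerminalStub(), {"action": "play", "app": "music"})
    print("awaiting user confirmation" if ok else "operation failed")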
According to the present embodiment, the user can perform the mapping process for a target, generate an icon for the target, and operate the target by the simple method of inputting voice. Thus, for example, even when the user's view of the surroundings is limited, the user can conveniently use a portable electronic device. Furthermore, according to the present embodiment, an information terminal system can be realized that includes the HMD101, which is one example of a sound augmented reality object reproduction apparatus, and 1 or more information terminals 102. Although the wearable device 200 and the smartphone 201 have been described above as examples of the information terminal 102, the information terminal 102 may be a different type of terminal. The information terminal 102 may also be a terminal that accepts ordinary operations other than voice; in this case, the input operation that instructs the information terminal 102 to start the mapping may be performed by a method other than voice.
Next, a second embodiment will be described with reference to fig. 20. Components having the same functions as in the other embodiments are given the same reference numerals, and their description may be omitted. The second embodiment is an example of a sound augmented reality object reproduction apparatus 1001 in which the display 15 is omitted from the HMD101 described in the first embodiment; the processing concerning display is accordingly omitted in the sound augmented reality object reproduction apparatus 1001.
The sound augmented reality object reproduction apparatus 1001 can be a device worn on the head, such as a headphone. As described above, the sound augmented reality object reproduction apparatus 1001 connects to the information terminal 102 and maps the target to the virtual space 300 in accordance with the input sound. In addition, the sound augmented reality object reproduction apparatus 1001 performs processing corresponding to the operation requested by the user's input. Here, as described above, the user can perform various operations, such as an operation of reproducing (e.g., playing) the mapped target. When reproducing the target, the sound augmented reality object reproduction apparatus 1001 can produce output as if it were heard from the position of the virtual sound source 103 in the virtual space 300.
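As one way to picture output "as if heard from the position of the virtual sound source 103", the sketch below uses simple constant-power stereo panning based on the azimuth of the virtual source relative to the listener. This is only an approximation under assumed conventions (x forward, y to the listener's right); a real implementation would typically use head-related transfer functions as in known spatial audio techniques, and all names here are assumptions.

    # Hypothetical sketch: approximate playback from the virtual sound source 103
    # with constant-power stereo panning; real spatial audio would use HRTFs.
    # Assumed coordinate convention: x is forward, y is to the listener's right.
    import math

    def pan_gains(listener_pos, listener_yaw, source_pos):
        # Azimuth of the virtual source relative to the direction the listener faces
        dx = source_pos[0] - listener_pos[0]
        dy = source_pos[1] - listener_pos[1]
        azimuth = math.atan2(dy, dx) - listener_yaw
        # Map azimuth to a pan value in [-1 (left), +1 (right)] and use
        # constant-power gains so loudness stays roughly even while panning.
        pan = max(-1.0, min(1.0, math.sin(azimuth)))
        theta = (pan + 1.0) * math.pi / 4.0
        return math.cos(theta), math.sin(theta)       # (left gain, right gain)

    # Usage: a source mapped directly to the listener's right yields a strong right gain.
    left_gain, right_gain = pan_gains((0.0, 0.0), 0.0, (0.0, 1.0))
    print(round(left_gain, 3), round(right_gain, 3))  # -> 0.0 1.0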
The first and second embodiments have been described above. The HMD101 and the sound augmented reality object reproduction apparatus 1001 described in the embodiments may also be used alone, without being connected to the information terminal 102. In this case, the HMD101 and the sound augmented reality object reproduction apparatus 1001 perform the mapping based on the sound from the information terminal 102 and the processing corresponding to the user's operations as described above, but the processing that relies on communication with the information terminal 102 is omitted.
When the mapped target is to be reproduced, the data to be reproduced for that target is stored in advance in the HMD101 or the sound augmented reality object reproduction apparatus 1001, which then produces output, based on the stored data, as if it were heard from the corresponding position in the virtual space 300.
When the sound augmented reality object reproduction apparatuses (101, 1001) are configured to be used alone, the configuration for communicating with the information terminal 102 may be omitted. Likewise, the information terminal 102 may be a terminal in which the configuration for communication is omitted.
While embodiments of the present invention have been described above, the configurations for realizing the technique of the present invention are not limited to the above embodiments, and various modifications are conceivable. For example, the above embodiments are described in detail in order to explain the present invention in an easily understandable way, and the present invention is not necessarily limited to embodiments including all of the described configurations. A part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. All such variations are within the scope of the present invention. Numerical values, messages, and the like appearing in the text and drawings are merely examples, and the effects of the present invention are not impaired even if different ones are used.
The programs described in the respective processing examples may be independent programs, or a plurality of programs may constitute one application program. The processes may also be executed with their order rearranged.
For example, the functions and the like of the present invention described above may be partly or wholly realized in hardware, for example by designing them as an integrated circuit. They may also be realized in software by having a microprocessor unit, a CPU, or the like interpret and execute operation programs that realize the respective functions. The scope of the software implementation is not limited, and hardware and software may be used together. A part or all of the functions may also be realized by a server; the server may be of any form as long as it can execute the functions in cooperation with the other components via communication, and may be, for example, a local server, a cloud server, an edge server, or a web service. Information such as the programs, tables, and files that realize the respective functions may be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), in a recording medium such as an IC card, an SD card, or a DVD, or in a device on a communication network.
The control lines and information lines shown in the drawings are those considered necessary for the explanation, and not all control lines and information lines of an actual product are necessarily shown. In practice, almost all components may be considered to be interconnected.
Description of the reference numerals
10. Control unit (processor)
11 ROM
12 RAM
13. Storage unit
14. Video camera
15. Display (display part)
16. Microphone
17. Loudspeaker
18. Push button
19. Touch sensor
20. Voice recognition unit
21. Sound input unit
22. Array microphone
23. Directional microphone
24. Distance measuring unit
25. Distance measuring camera
26 LiDAR
27. Distance sensor
28. Head tracking unit
29. Acceleration sensor
30. Gyroscope sensor
31. Eye movement tracking unit
32. Sight line detection sensor
33. Communication processing unit
34 Wireless LAN communication unit
35. Short-distance wireless communication unit
36. Interface
37. Wireless antenna
100. Operator (user)
101 HMD (head mounted display)
102. Information terminal
103. Virtual sound source
200. Wearable device
201. Smart phone
202. Network system
300. Virtual space
1001. Sound augmented reality object reproduction device

Claims (20)

1. A sound augmented reality object reproduction apparatus capable of mapping a target to a virtual space, characterized by:
comprising a processor and a memory, wherein the memory is used for storing data,
The processor maps, based on a sound that is output from an information terminal and input to the apparatus, the information terminal or an application on the information terminal, as a target, to a position in the virtual space corresponding to a position of the information terminal.
2. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
Comprising an array microphone for inputting sound from the information terminal,
The array microphone is
(1) composed of microphones arranged at the upper left end portion and the lower right end portion of the front side of the sound augmented reality object reproduction apparatus and on the right side of the apparatus, or (2) composed of microphones arranged at the upper right end portion and the lower left end portion of the front side of the apparatus and on the left side of the apparatus.
3. The sound augmented reality object reproducing apparatus according to claim 2, wherein:
In the array microphone,
in the case of the configuration of (1), the microphones are arranged such that, when the sound augmented reality object reproduction apparatus is worn, the distance between the microphones on the front side is substantially equal to the distance between the microphone at the lower right end portion of the front side and the microphone on the right side, and
in the case of the configuration of (2), the microphones are arranged such that, when the sound augmented reality object reproduction apparatus is worn, the distance between the microphones on the front side is substantially equal to the distance between the microphone at the lower left end portion of the front side and the microphone on the left side.
4. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
including 1 or more directional microphones for inputting sound from the information terminal.
5. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
the processor determines whether the mapping is appropriate based on whether or not the position of the virtual sound source set by the mapping coincides with the position of the information terminal that output the sound.
6. The sound augmented reality object reproducing apparatus according to claim 5, wherein:
when it is determined that the mapping is not appropriate, the processor adjusts the position of the virtual sound source so that it coincides with the position of the information terminal.
7. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
The processor outputs a warning with sound in a case where the position of the object to be mapped overlaps the positions of other objects mapped in the virtual space.
8. The sound augmented reality object reproducing apparatus according to claim 7, wherein:
the processor outputs a sound suggesting in which direction the position of the object to be mapped should be shifted.
9. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
the processor outputs a sound that allows the user to select which of a local coordinate system and a world coordinate system is to be used for the mapping, and
performs the mapping in the coordinate system corresponding to the voice input by the user.
10. The sound augmented reality object reproducing apparatus according to claim 1, wherein:
Comprising a display section,
The processor displays an icon for operating the mapped object on the display section.
11. The sound augmented reality object reproducing apparatus according to claim 10, wherein:
The processor selects an icon of a target corresponding to a sound input by a user.
12. The sound augmented reality object reproducing apparatus according to claim 11, wherein:
Comprising a head tracking section that detects movement of the head of the user,
The processor selects an icon in a direction detected by the head tracking section.
13. The sound augmented reality object reproducing apparatus according to claim 11, wherein:
Comprising an eye tracking section that detects the direction of the user's line of sight,
The processor selects an icon in a direction detected by the eye tracking section.
14. The sound augmented reality object reproducing apparatus according to claim 10, wherein:
The processor displays the name of the target acquired based on the sound from the information terminal on the display unit together with the icon of the target.
15. The sound augmented reality object reproducing apparatus according to claim 10, wherein:
Comprising an interface for communication,
The processor displays a name of the object acquired based on communication with the information terminal on the display unit together with an icon of the object.
16. An information terminal system, characterized by comprising:
1 or more information terminals; and
A sound augmented reality object reproducing apparatus capable of mapping a target to a virtual space, wherein,
The sound augmented reality object reproduction apparatus includes a processor,
The processor maps, based on a sound that is output from the information terminal and input to the sound augmented reality object reproduction apparatus, the information terminal or an application on the information terminal, as a target, to a position in the virtual space corresponding to a position of the information terminal.
17. The information terminal system according to claim 16, wherein:
the sound augmented reality object reproducing apparatus includes a display section,
The processor displays an icon for operating the mapped object on the display section.
18. The information terminal system according to claim 17, wherein:
The processor selects an icon of a target corresponding to a sound input by a user.
19. The information terminal system according to claim 17, wherein:
The processor displays the name of the target acquired based on the sound from the information terminal on the display unit together with the icon of the target.
20. The information terminal system according to claim 17, wherein:
The sound augmented reality object reproducing apparatus includes an interface for communication,
The processor displays a name of the object acquired based on communication with the information terminal on the display unit together with an icon of the object.
CN202280094489.6A 2022-04-04 Sound augmented reality object reproduction device and information terminal system Pending CN118975274A (en)

Publications (1)

Publication Number Publication Date
CN118975274A (en) 2024-11-15



Legal Events

Date Code Title Description
PB01 Publication