
WO2024190127A1 - Object detection system and object selection method - Google Patents

Object detection system and object selection method

Info

Publication number
WO2024190127A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
unit
object selection
voice command
Prior art date
Application number
PCT/JP2024/003097
Other languages
English (en)
Japanese (ja)
Inventor
亜旗 米田
愼一 式井
未佳 砂川
弘毅 高橋
隆雅 吉田
Original Assignee
パナソニックIpマネジメント株式会社
Priority date
Filing date
Publication date
Application filed by パナソニックIpマネジメント株式会社
Publication of WO2024190127A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842Selection of displayed objects or displayed text elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Definitions

  • the present disclosure relates to an object selection system and an object selection method.
  • Patent document 1 discloses a technology for detecting a user's gaze and defining a region of interest (ROI).
  • the present disclosure therefore provides an object selection system that can determine which object within a region of interest a user is interested in.
  • the object selection system includes an image acquisition unit that acquires an image corresponding to a user's field of view, a gaze detection unit that detects the user's gaze, a setting unit that sets an attention area on the image that the user is focusing on based on the detected gaze of the user, an object detection unit that detects one or more objects included in the attention area, a voice acquisition unit that acquires the voice of the user, a voice command analysis unit that analyzes the user's voice to extract a voice command indicating a position included in the user's voice, and an object selection unit that selects an object corresponding to the extracted voice command from among the one or more objects.
  • the object selection system includes an image acquisition unit that acquires an image corresponding to a user's field of view, a gaze detection unit that detects the user's gaze, a setting unit that sets an attention area on the image that the user is focusing on based on the detected gaze of the user, an object detection unit that detects one or more objects included in the attention area, a category determination unit that determines a category of the one or more objects, a voice acquisition unit that acquires the voice of the user, a voice command analysis unit that analyzes the user's voice to extract a voice command indicating a category included in the user's voice, and an object selection unit that selects an object corresponding to the extracted voice command from the one or more objects.
  • FIG. 1 is a block diagram showing an example of an object selection system according to a first embodiment.
  • FIG. 2 is a diagram for explaining a method for selecting an object in the first embodiment.
  • FIG. 3 is a block diagram showing an example of an object selection system according to a second embodiment.
  • FIG. 4 is a diagram for explaining a method for selecting an object in the second embodiment.
  • FIG. 5 is a diagram for explaining a process in which an object is selected in the second embodiment.
  • FIG. 6 is a block diagram showing an example of an object selection system according to a third embodiment.
  • FIG. 7 is a diagram for explaining a method for selecting an object in the third embodiment.
  • FIG. 8 is a diagram illustrating an example of object information.
  • FIG. 9 is a block diagram showing an example of an object selection system according to a fourth embodiment.
  • FIG. 10 is a diagram for explaining a method for selecting an object in the fourth embodiment.
  • FIG. 11 is a diagram for explaining a method for selecting an object in the fourth embodiment.
  • FIG. 12 is a diagram showing an example of a user's web browsing history.
  • FIG. 13 is a block diagram showing an example of an object selection system according to a fifth embodiment.
  • FIG. 14 is a diagram illustrating an example of a virtual object to be superimposed.
  • FIG. 15 is a diagram illustrating an example of a virtual object to be superimposed.
  • FIG. 16 is a diagram illustrating an example of a virtual object to be superimposed.
  • FIG. 17A is a diagram illustrating an example of a virtual object to be superimposed.
  • FIG. 17B is a diagram illustrating an example of a virtual object to be superimposed.
  • FIG. 18 is a block diagram showing an example of an object selection system according to a sixth embodiment.
  • FIG. 19 is a flowchart illustrating an example of an object selection method according to another embodiment.
  • FIG. 20 is a flowchart illustrating an example of an object selection method according to another embodiment.
  • FIG. 1 is a block diagram showing an example of an object selection system 1 according to the first embodiment.
  • the object selection system 1 is a system for allowing a user to select an object in which the user is interested from among multiple objects that appear in the user's field of vision using AR (Augmented Reality) or VR (Virtual Reality), etc.
  • the object selection system 1 comprises an image acquisition unit 101, a gaze detection unit 102, a setting unit 103, an object detection unit 104, a voice acquisition unit 105, a voice command analysis unit 106, and an object selection unit 107.
  • the object selection system 1 is a computer including a processor (microprocessor) and a memory.
  • the memory is a ROM (Read Only Memory) and a RAM (Random Access Memory), etc., and can store a program executed by the processor.
  • the image acquisition unit 101, the gaze detection unit 102, the setting unit 103, the object detection unit 104, the voice acquisition unit 105, the voice command analysis unit 106, and the object selection unit 107 are realized by a processor or the like that executes a program stored in the memory.
  • the category determination unit 108, the virtual object superimposition unit 111, the virtual object generation unit 112, the user state estimation unit 113, the user situation management unit 114, and the gesture detection unit 115 are also realized by a processor that executes a program stored in a memory.
  • the object selection system 1 may also include a communication interface for wired or wireless communication with AR glasses, a head mounted display (HMD), or the like.
  • the object selection system 1 may be a computer (device) in a single housing, or may be a system made up of multiple computers. Also, for example, the object selection system 1 may be a server. Note that the components of the object selection system 1 may be located in a single server, or may be distributed across multiple servers.
  • the image acquisition unit 101 acquires an image corresponding to the user's field of view.
  • the image acquisition unit 101 acquires the image by wireless communication from a camera mounted on the AR glasses or HMD worn by the user.
  • the camera may be a stereo camera capable of acquiring image information and parallax information (distance information), and the image acquisition unit 101 may acquire an image including distance information.
  • the image corresponding to the user's field of view is an image that shows approximately the same area as the field of view seen by the user through the AR glasses.
  • the image corresponding to the user's field of view is an image displayed to the user by the HMD.
  • the image may be an image showing the user's surroundings, or may be an image of virtual reality.
  • the object selection system 1 may be mounted on the AR glasses or HMD, and in this case, the image acquisition unit 101 may acquire an image by wired communication from a camera mounted on the AR glasses or HMD.
  • the object selection system 1 may also be equipped with such a camera.
  • the gaze detection unit 102 detects the user's gaze. There are no particular limitations on the method of detecting the user's gaze, but for example, the gaze detection unit 102 detects the user's gaze by acquiring gaze information indicating the location and movement of the user's gaze from an eye tracker or the like.
  • the eye tracker may use infrared rays or may measure the user's electromyogram.
  • the object selection system 1 may also be equipped with an eye tracker or the like.
  • the setting unit 103 sets a region of interest (ROI) on the acquired image that the user is focusing on based on the detected line of sight of the user.
  • the setting unit 103 sets the region on the image that the user is focusing on in the user's field of vision as the region of interest.
  • the region of interest that can be set based on the user's line of sight may include multiple objects.
  • the object detection unit 104 detects one or more objects included in the region of interest. If the region of interest includes multiple objects, the object detection unit 104 detects the multiple objects included in the region of interest. There are no particular limitations on the method of detecting objects, but for example, because an image includes distance information, a group of pixels that are approximately the same distance apart can be detected as a single object. Also, for example, objects can be detected by inputting the image into a trained model for detecting objects.
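  • As an illustrative, non-limiting sketch of the depth-based grouping described above, neighboring pixels whose distances are approximately equal can be flood-filled into candidate objects. The depth-map input, distance tolerance, and minimum group size below are assumptions chosen for illustration, not values defined by this disclosure.

```python
import numpy as np
from collections import deque

def detect_objects_by_depth(depth_map: np.ndarray, tol: float = 0.1, min_pixels: int = 50):
    """Group neighboring pixels at approximately the same distance into candidate objects.

    depth_map : HxW array of per-pixel distances (e.g., from a stereo camera).
    tol       : maximum allowed distance difference between neighboring pixels.
    min_pixels: discard tiny groups that are unlikely to be real objects.
    Returns a label map (-1 unassigned, -2 background) and the number of objects found.
    """
    h, w = depth_map.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            # breadth-first flood fill over pixels at roughly the same distance
            queue = deque([(sy, sx)])
            labels[sy, sx] = next_label
            members = [(sy, sx)]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1 \
                            and abs(depth_map[ny, nx] - depth_map[y, x]) <= tol:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
                        members.append((ny, nx))
            if len(members) < min_pixels:
                for y, x in members:        # too small: treat as background
                    labels[y, x] = -2
            else:
                next_label += 1
    return labels, next_label
```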
  • the voice acquisition unit 105 acquires the user's voice.
  • the voice acquisition unit 105 acquires the user's voice (voice signal) from a microphone worn by the user.
  • the microphone may be mounted on AR glasses or an HMD.
  • the voice command analysis unit 106 analyzes the user's voice (voice signal) to extract a voice command indicating a position contained in the user's voice. Details of the voice command analysis unit 106 will be described later.
  • the object selection unit 107 selects an object corresponding to the extracted voice command from among one or more objects (e.g., multiple objects) included in the area of interest. Details of the object selection unit 107 will be described later. Note that if the area of interest includes only one object, extraction of the voice command does not need to be performed, and the object selection unit 107 may select that one object.
  • the object selection unit 107 outputs information about the selected object.
  • the object selection unit 107 outputs information about the selected object to a processing unit that generates a response to a voice command.
  • FIG. 2 is a diagram for explaining a method for selecting an object in the first embodiment.
  • FIG. 2 shows an image corresponding to the user's field of view and a region of interest (ROI) set on the image.
  • the region of interest in the image shown in FIG. 2 includes a flag in the foreground and a window in the background.
  • the voice including the position of the object is, for example, a voice including a demonstrative pronoun.
  • the voice including the position of the object is a voice including "this,” “that,” “in front,” “over there,” “on the right,” “on the left,” “above,” “below,” and the like.
  • the voice command analysis unit 106 analyzes the user's voice to extract the voice command "this" indicating the position included in the user's voice.
  • the voice command analysis unit 106 also extracts the voice command of the question "What?".
  • the object selection unit 107 selects the flag in the foreground as the object corresponding to the extracted voice command "this" from among the flags and windows included in the attention area. For example, the object selection unit 107 outputs an image (feature) of the object in the foreground (flag) corresponding to the voice command "this” and the voice command of the question "What?" for this object to a processing unit that generates a response to the voice command. This generates answers to questions about the flag and presents them to the user.
  • the voice command analysis unit 106 extracts the voice command "that,” and the object selection unit 107 selects the window at the back as the object corresponding to the voice command "that.” Then, an answer to the question about the window is generated and presented to the user.
  • the voice command analysis unit 106 may extract a voice command that indicates both top/bottom/left/right and depth positions.
  • the image does not have to include distance information. In this case, it is difficult to distinguish objects that exist in the depth direction, but it is possible to distinguish objects that exist in the up, down, left, and right directions.
  • by using a voice command extracted from a voice including the position of an object in which the user is interested (e.g., a demonstrative pronoun), in particular a voice command indicating the depth position of the object, it is also possible to select an object in which the user is interested within the area of interest from among objects that exist in the depth direction.
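  • A minimal sketch of this kind of positional selection is shown below. The mapping from demonstrative pronouns to foreground/background rules and the `Obj` record are illustrative assumptions, not the claimed implementation.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    distance_m: float   # mean distance of the object's pixel group
    cx: float           # horizontal center, 0.0 (left) .. 1.0 (right)

# assumed mapping from positional voice commands to selection rules
POSITION_RULES = {
    "this":  lambda objs: min(objs, key=lambda o: o.distance_m),   # foreground
    "that":  lambda objs: max(objs, key=lambda o: o.distance_m),   # background
    "left":  lambda objs: min(objs, key=lambda o: o.cx),
    "right": lambda objs: max(objs, key=lambda o: o.cx),
}

def select_by_position(command: str, objects: list[Obj]) -> Obj:
    rule = POSITION_RULES.get(command)
    if rule is None or not objects:
        raise ValueError(f"unsupported position command: {command}")
    return rule(objects)

roi = [Obj("flag", 2.0, 0.45), Obj("window", 8.5, 0.55)]
print(select_by_position("this", roi).name)   # -> flag
print(select_by_position("that", roi).name)   # -> window
```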
  • FIG. 3 is a block diagram showing an example of an object selection system 2 according to embodiment 2.
  • the object selection system 2 according to the second embodiment differs from the object selection system 1 according to the first embodiment in that it includes a category determination unit 108.
  • the following describes the object selection system 2 according to the second embodiment, focusing on the differences from the object selection system 1 according to the first embodiment, and omitting a description of the similarities.
  • the category determination unit 108 determines the category of one or more objects contained in the attention area.
  • the method of determining the category is not particularly limited, but for example, the category of the object can be determined by using image recognition AI or the like. For example, if the object is a woman wearing a dress, categories such as “person”, “ woman”, and “dress” can be determined as the category of this object. Also, for example, if the object is a poodle, categories such as “animal”, “dog”, and “poodle” can be determined as the category of this object.
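  • As a rough, non-limiting sketch, the category determination unit can be viewed as a thin wrapper around some image-recognition model that expands a raw label into coarse-to-fine categories. The `recognition_model` interface and the hierarchy table are placeholders for illustration only.

```python
from typing import Callable

# hypothetical coarse-to-fine category hierarchy used to expand a raw label
CATEGORY_HIERARCHY = {
    "poodle":        ["animal", "dog", "poodle"],
    "woman_in_dress": ["person", "woman", "dress"],
}

def determine_categories(object_image,
                         recognition_model: Callable[[object], str]) -> list:
    """Return a list of categories (general to specific) for one detected object."""
    raw_label = recognition_model(object_image)          # e.g. "poodle"
    return CATEGORY_HIERARCHY.get(raw_label, [raw_label])
```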
  • the voice command analysis unit 106 analyzes the user's voice to extract voice commands indicating a location and a category contained in the user's voice. Details of the voice command analysis unit 106 will be described later.
  • FIG. 4 is a diagram for explaining a method of selecting an object in embodiment 2.
  • FIG. 4 shows an image corresponding to the user's field of view and a region of interest (ROI) set on the image.
  • the region of interest in the image shown in FIG. 4 includes a woman and a dog in the foreground, and a man and two women in the background.
  • the voice command analysis unit 106 analyzes the user's voice to extract the voice command "this dog" indicating the position and category included in the user's voice. The voice command analysis unit 106 also extracts the voice command of the question "What?".
  • the object selection unit 107 selects the dog in the foreground as the object corresponding to the extracted voice command "this dog" from among the three women, the man, and the dog included in the attention area. For example, the object selection unit 107 outputs the image (features) of the object in the foreground (the dog) corresponding to the voice command "this dog" and the voice command of the question "What?" for this object to a processing unit that generates an answer to the voice command. As a result, an answer to the question about the dog (e.g., that it is a poodle) is generated and presented to the user.
  • FIG. 5 is a diagram for explaining the process by which an object is selected in embodiment 2.
  • the voice command analysis unit 106 does not have to extract a voice command indicating a position contained in the user's voice.
  • the voice command analysis unit 106 may extract a voice command indicating a category contained in the user's voice by analyzing the user's voice.
  • when a user focuses on an area of interest such as that shown in FIG. 4, the user utters a voice including an object category.
  • the voice including an object category is a voice including "woman," "man," "dog," and the like. Since the area of interest shown in FIG. 4 includes multiple women, it is difficult to distinguish the women in the area of interest if a voice command indicating a position is not extracted. However, since there is only one man and only one dog, it is possible to distinguish the man and the dog in the area of interest.
  • Although FIG. 5 has been described as a diagram for explaining the process of selecting an object in the second embodiment, the intention of this diagram is not limited to this, and the object selection process of FIG. 5 may be displayed to the user. That is, a user who is about to utter a voice command can confirm that there are multiple objects in the current ROI because each object is highlighted (in FIG. 5, the contours of the objects are emphasized; the method of highlighting is not limited to this, and the background other than the objects may instead be displayed with low brightness). Here, when "this" is uttered, the highlighting of the man and the women in the back is canceled.
  • the user can then confirm that the "woman" and "dog" in the foreground are candidates for object selection. Furthermore, when "dog" is uttered, only the "dog" is highlighted; that is, the user can confirm that only the "dog" has been selected as an object.
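  • A minimal sketch of this progressive narrowing of highlighted candidates is given below. The candidate records and the foreground distance threshold are assumptions for illustration.

```python
def narrow_candidates(candidates, token, foreground_threshold_m=3.0):
    """Return the candidates that remain highlighted after one spoken token."""
    if token == "this":
        return [c for c in candidates if c["distance_m"] <= foreground_threshold_m]
    if token == "that":
        return [c for c in candidates if c["distance_m"] > foreground_threshold_m]
    # otherwise treat the token as a category word
    return [c for c in candidates if token in c["categories"]]

roi = [
    {"name": "woman_front", "distance_m": 2.0, "categories": ["person", "woman"]},
    {"name": "dog_front",   "distance_m": 2.2, "categories": ["animal", "dog"]},
    {"name": "man_back",    "distance_m": 9.0, "categories": ["person", "man"]},
    {"name": "woman_back",  "distance_m": 9.5, "categories": ["person", "woman"]},
]
for token in ["this", "dog"]:
    roi = narrow_candidates(roi, token)
    print(token, "->", [c["name"] for c in roi])
# this -> ['woman_front', 'dog_front']
# dog  -> ['dog_front']
```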
  • FIG. 6 is a block diagram showing an example of an object selection system 3 according to embodiment 3.
  • the object selection system 3 according to the third embodiment differs from the object selection system 2 according to the second embodiment in that it includes an object information storage unit 109.
  • the following describes the object selection system 3 according to the third embodiment, focusing on the differences from the object selection system 2 according to the second embodiment, and omitting a description of the similarities.
  • the object information storage unit 109 accumulates object information about one or more detected objects. Details of the object information will be described later.
  • the voice command analysis unit 106 analyzes the user's voice to extract voice commands indicating categories and past times contained in the user's voice. Details of the voice command analysis unit 106 will be described later.
  • the object selection unit 107 selects an object corresponding to the extracted voice command from among one or more previously detected objects indicated by the object information stored in the object information storage unit 109. Details of the object selection unit 107 will be described later.
  • FIG. 7 is a diagram for explaining a method for selecting an object in embodiment 3.
  • FIG. 7 shows images corresponding to the user's field of view 20 seconds ago, 10 seconds ago, and at present, together with the region of interest (ROI) set on each image.
  • the object information storage unit 109 accumulates object information about objects included in the user's focus area up to the present.
  • for example, the focus area 20 seconds ago included a signboard with "Club XXX" written on it and a plant, the focus area 10 seconds ago included a person (male), and the current focus area includes a signboard with "Snack Bar YYY" written on it. In this case, object information such as that shown in FIG. 8 is accumulated.
  • FIG. 8 shows an example of object information.
  • images of objects in the "sign, store" category and images of objects in the "plant" category are stored as object information related to objects that were included in the area of interest from 25 seconds ago to 20 seconds ago (14:15:00 to 14:15:05).
  • images of objects in the "person, man” category are stored as object information related to objects that were included in the area of interest from 15 seconds ago to 10 seconds ago (14:15:10 to 14:15:15).
  • images of objects in the "sign, store” category are stored as object information related to objects that were included in the area of interest from 5 seconds ago to the present (14:15:20 to 14:15:25).
  • the user utters a voice including an object category and a past time point.
  • the voice including an object category and a past time point is a voice including "the signboard just now,” “the person just now,” “the plant just now,” and the like.
  • the voice command analysis unit 106 analyzes the user's voice to extract the voice command "the signboard just now,” which indicates the category and the past time point included in the user's voice.
  • the voice command analysis unit 106 also extracts the voice command of the question "What?". Then, the object selection unit 107 selects the signboard of the club "XXX," which was included in the attention area from 25 seconds ago to 20 seconds ago, as the object corresponding to the extracted voice command "the signboard just now," from among the people, plants, and signs detected in the past, which are indicated by the object information accumulated in the object information storage unit 109. For example, the object selection unit 107 outputs an image (features) of the past object (the signboard) corresponding to the voice command "the signboard just now" and the voice command of the question "What?" for this object to a processing unit that generates an answer to the voice command. As a result, an answer to the question about the club "XXX" (for example, the store's URL or image) is generated and presented to the user.
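  • A rough sketch of an object information store that can be queried by category and a past time point ("the signboard just now") follows. The field names and the interpretation of "just now" as a look-back window that excludes objects still in view are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class ObjectRecord:
    categories: list
    image_ref: str           # reference to the stored object image (features)
    seen_from: datetime
    seen_until: datetime

@dataclass
class ObjectInfoStore:
    records: list = field(default_factory=list)

    def add(self, record: ObjectRecord):
        self.records.append(record)

    def find_past(self, category: str, now: datetime,
                  lookback_s: int = 60, exclude_recent_s: int = 5):
        """Most recent past object of the category: within the look-back window,
        but no longer in the current attention area (older than exclude_recent_s)."""
        window_start = now - timedelta(seconds=lookback_s)
        cutoff = now - timedelta(seconds=exclude_recent_s)
        hits = [r for r in self.records
                if category in r.categories
                and window_start <= r.seen_until <= cutoff]
        return max(hits, key=lambda r: r.seen_until) if hits else None

store = ObjectInfoStore()
t = datetime(2022, 12, 10, 14, 15, 25)
store.add(ObjectRecord(["sign", "store"], "club_xxx.png",
                       t - timedelta(seconds=25), t - timedelta(seconds=20)))
store.add(ObjectRecord(["person", "man"], "man.png",
                       t - timedelta(seconds=15), t - timedelta(seconds=10)))
store.add(ObjectRecord(["sign", "store"], "snack_yyy.png",
                       t - timedelta(seconds=5), t))
print(store.find_past("sign", now=t).image_ref)   # -> club_xxx.png
```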
  • FIG. 9 is a block diagram showing an example of an object selection system 4 according to embodiment 4.
  • the object selection system 4 according to the fourth embodiment differs from the object selection system 2 according to the second embodiment in that it includes a user information storage unit 110.
  • the following describes the object selection system 4 according to the fourth embodiment, focusing on the differences from the object selection system 2 according to the second embodiment, and omitting a description of the similarities.
  • the object selection unit 107 compares the extracted voice command with user information to select an object corresponding to the extracted voice command from among one or more objects included in the area of interest. Details of the object selection unit 107 will be described later.
  • FIGS. 10 and 11 are diagrams for explaining a method of selecting an object in embodiment 4.
  • FIG. 10 and FIG. 11 show an image corresponding to a user's field of view and a region of interest (ROI) set on the image.
  • the region of interest in the image shown in FIG. 10 includes multiple people.
  • the region of interest in the image shown in FIG. 11 includes signs in the foreground and in the background.
  • the object selection unit 107 outputs an image (features) of the object (son) corresponding to the voice command "son” and the voice command of the question "Where?" for this object to a processing unit that generates an answer to the voice command.
  • the object selection unit 107 compares the extracted voice command "that store" with the user information. For example, the object selection unit 107 assumes that "that store" refers to a store on a web page recently viewed by the user, and acquires an image from a web page in the store category indicated by the user's most recent web browsing history (e.g., the web browsing history for 2022/12/10 shown in FIG. 12). The object selection unit 107 then compares the acquired image with the multiple objects (e.g., the signs "XXX", "YYY" and "ZZZ") included in the image shown in FIG. 11.
  • the object selection unit 107 determines that "that store” is a "ZZZ” store, and selects the object "ZZZ” from among the multiple objects included in the attention area. For example, the object selection unit 107 outputs an image (characteristics) of the object “ZZZ” corresponding to the voice command "that store” and a voice command for the question "Where?" for this object to a processing unit that generates an answer to the voice command. As a result, an answer to the question about "ZZZ” (for example, a line surrounding a group of pixels in which "ZZZ” appears on the image) is generated and presented to the user.
  • the object selection unit 107 may also output the user's current location to the processing unit, and may present the store that is closest to the user's current location among the stores included in the web browsing history. For example, if the image in the web page indicated by the web browsing history on 2022/9/15 includes a snack bar "YYY", and the snack bar "YYY" is closer to the current location than the "ZZZ" store, the object selection unit 107 may select the snack bar "YYY" object from among the multiple objects included in the attention area.
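  • As a non-limiting sketch, resolving "that store" against the browsing history can be thought of as an image-matching step between recently viewed store pages and the objects in the attention area. The similarity function and record fields are illustrative assumptions; proximity to the current location could be added as a tie-breaker as described above.

```python
def resolve_that_store(roi_objects, browsing_history, similarity):
    """Pick the ROI object that best matches a store image from recent history.

    roi_objects     : [{"name": ..., "image": ...}, ...]
    browsing_history: newest-first [{"category": "store", "image": ...}, ...]
    similarity      : callable(image_a, image_b) -> float in [0, 1]
    """
    store_pages = [h for h in browsing_history if h.get("category") == "store"]
    if not store_pages or not roi_objects:
        return None
    reference = store_pages[0]["image"]            # most recently viewed store page
    return max(roi_objects, key=lambda o: similarity(reference, o["image"]))

# toy usage with a stand-in similarity function
dummy = lambda a, b: 1.0 if a == b else 0.0
roi = [{"name": "XXX", "image": "img_xxx"}, {"name": "ZZZ", "image": "img_zzz"}]
history = [{"category": "store", "image": "img_zzz"}]
print(resolve_that_store(roi, history, dummy)["name"])   # -> ZZZ
```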
  • the object selection system 3 may include a user information storage unit 110, and in the third embodiment, the object selection unit 107 may select an object corresponding to the extracted voice command from among one or more objects by comparing the extracted voice command with user information.
  • FIG. 13 is a block diagram showing an example of an object selection system 5 according to embodiment 5.
  • the object selection system 5 according to the fifth embodiment differs from the object selection system 1 according to the first embodiment in that it includes a virtual object superimposition unit 111, a virtual object generation unit 112, a user state estimation unit 113, and a user situation management unit 114.
  • the following describes the object selection system 5 according to the fifth embodiment, focusing on the differences from the object selection system 1 according to the first embodiment, and omits a description of the similarities.
  • the user state estimation unit 113 estimates the state of the user.
  • the user state estimation unit 113 estimates the state of the user by acquiring biometric information from a sensor for acquiring biometric information of the user.
  • the sensor may be a sensor (electrode) mounted on AR glasses or an HMD, and may acquire biometric information such as the user's brain waves, heart rate, facial expression (electromyography of facial muscles), body temperature (lower body temperature during fasting), or skin resistance (sweating).
  • the sensor may be a breath sensor and may acquire biometric information such as acetone during fat burning.
  • the sensor may be a microphone attached to the user's belt, and may acquire biometric information such as visceral sounds.
  • the sensor may be a blood glucose sensor consisting of a needle or an infrared sensor and may acquire biometric information such as blood glucose levels.
  • the sensor may be an odor sensor and may acquire biometric information such as changes in biometric signals when the user smells an odor.
  • the sensor may be an acceleration sensor and may acquire biometric information such as the user's body movements or gait.
  • the user state estimation unit 113 may estimate the user's current location as the user's state by acquiring location information from a GPS sensor or the like.
  • the user status management unit 114 manages the statuses of multiple users. For example, when multiple users are using the object selection system 5, it manages the priority of each user's status.
  • the user status management unit 114 will be described in detail later.
  • the virtual object generation unit 112 generates a virtual object to be superimposed on an image corresponding to the user's field of view. For example, the virtual object generation unit 112 generates a virtual object based on an estimated state of the user. Also, for example, the virtual object generation unit 112 generates a virtual object based on the situations of multiple users. Details of the virtual object generation unit 112 will be described later.
  • the virtual object superimposition unit 111 superimposes a virtual object on an image corresponding to the user's field of view.
  • one or more objects included in the area of interest may include a virtual object.
  • not only real objects but also virtual objects may appear in the user's field of view, and the user may be interested in a virtual object.
  • the virtual object superimposition unit 111 determines the position in the image where the virtual object is to be superimposed based on the state of the user, and superimposes the virtual object at the determined position. Details of the virtual object superimposition unit 111 will be described later.
  • FIGS. 14 to 17B show examples of superimposed virtual objects.
  • a sensor mounted on the AR glasses or an HMD acquires biometric information of the user, and the user state estimation unit 113 estimates that the user is hungry from the biometric information.
  • the virtual object generation unit 112 searches the Internet or the like for restaurants near the current location, and generates virtual objects (icons 201, 202, and 203) indicating restaurants near the current location as shown in FIG. 14, and the virtual object superimposition unit 111 superimposes the icons 201, 202, and 203 on an image corresponding to the user's field of vision.
  • the virtual object superimposition unit 111 superimposes the icon 201 of restaurant AAA, which is close to the current location, on the foreground of the image, and superimposes the icon 203 of restaurant CCC, which is far from the current location, on the background of the image. Then, an area of interest is set on the image based on the user's line of sight, the user's voice is analyzed, and if a voice command to select a virtual object (e.g., icon 201) is extracted, navigation is initiated to guide the user to the store "AAA" of icon 201.
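  • A minimal sketch of this step is shown below: once the user is estimated to be hungry, nearby restaurants are turned into icons and assigned a depth layer by distance, with the nearest in the foreground. The restaurant search results and the icon structure are assumptions for illustration.

```python
def build_restaurant_icons(restaurants, max_icons=3):
    """restaurants: [{"name": ..., "distance_m": ...}, ...] -> ordered icon list."""
    nearest = sorted(restaurants, key=lambda r: r["distance_m"])[:max_icons]
    return [
        {"label": r["name"], "depth_layer": layer}   # layer 0 = foreground
        for layer, r in enumerate(nearest)
    ]

icons = build_restaurant_icons([
    {"name": "AAA", "distance_m": 80},
    {"name": "CCC", "distance_m": 900},
    {"name": "BBB", "distance_m": 300},
])
print(icons)
# [{'label': 'AAA', 'depth_layer': 0}, {'label': 'BBB', 'depth_layer': 1},
#  {'label': 'CCC', 'depth_layer': 2}]
```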
  • a virtual object appropriate to the user's state can be superimposed in the user's field of vision at a position appropriate to the user's state.
  • a machine learning network may be used to estimate the user's specific state (e.g., hunger) from the biometric information.
  • an estimated accuracy of the user's state indicating the accuracy of the estimation may be calculated.
  • the accuracy of the estimation may be calculated as a statistically significant difference between the normal state and the specific state.
  • a predetermined estimated accuracy threshold for the user's state may be set, and when the estimated accuracy of the user's state becomes greater than this threshold, an action based on the estimation (e.g., overlaying an icon of a candidate restaurant) may be performed.
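  • A sketch of gating an action on the estimation accuracy, as described above, is shown below; the threshold value and state names are illustrative assumptions.

```python
def maybe_act(estimated_state: str, confidence: float, threshold: float = 0.8) -> bool:
    """Trigger an action only when the estimation confidence clears the threshold."""
    if confidence <= threshold:
        return False               # do nothing rather than risk disturbing the user
    if estimated_state == "hungry":
        print("superimpose candidate restaurant icons")
    elif estimated_state == "anxious":
        print("superimpose icons of apps that may resolve the anxiety")
    return True
```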
  • the icon may be superimposed when triggered by a voice command such as, for example, "Are there any good restaurants?".
  • the restaurants indicated by the icons may be filtered by comparing the suitability calculated based on the user's properties (e.g., annual income, gender, or age), the user's schedule (e.g., the amount of time available to eat), the user's preferences (e.g., vegetarian, likes spicy food, or likes Japanese food), the current time (e.g., whether it is lunchtime or dinnertime), the user's behavioral history (e.g., what the user usually eats when eating out), etc., with a preset threshold.
  • the icon does not have to show the restaurant's name, but may show a picture of the restaurant's representative menu item, a picture of the restaurant's exterior, or the restaurant's logo.
  • the order of the icons on the image does not have to be based on the distance from the current location, but may be based on the number of past visits by the user, the longest or shortest time difference between the past visit and the present, or the degree of suitability.
  • the order of the icons is not limited to depth, but may be vertical or horizontal.
  • the arrangement may also be multidimensional.
  • the order of the icons in the depth may be adjusted according to the distance from the current location
  • the vertical order may be adjusted based on the degree of suitability
  • the horizontal order may be adjusted based on the number of past visits
  • the icons may be arranged in three dimensions.
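  • A small sketch of such a multidimensional arrangement follows: depth from distance, vertical position from suitability, horizontal position from past visit count. The scale factors and field names are assumptions for illustration.

```python
def place_icon(restaurant, max_distance_m=1000.0, max_visits=20):
    """Map one restaurant to a 3D icon position in normalized coordinates."""
    return {
        "label": restaurant["name"],
        "depth": min(restaurant["distance_m"] / max_distance_m, 1.0),  # 0 = nearest
        "y": 1.0 - restaurant["suitability"],                          # higher = better fit
        "x": min(restaurant["visits"] / max_visits, 1.0),              # right = visited often
    }
```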
  • the system may automatically make a reservation at the restaurant.
  • Food may also be automatically ordered based on the user's preferences. For example, the food may be filtered based on the user's schedule and, if there is only a short mealtime, only dishes that can be quickly prepared and eaten may be selected. Food icons may also be overlaid on top of the restaurant icon.
  • navigation may be initiated to guide the user to that restaurant without superimposing an icon to allow the user to select.
  • a sensor mounted on the AR glasses or HMD acquires biometric information of the user, and the user state estimation unit 113 estimates from the biometric information that the user is feeling anxious.
  • the virtual object generation unit 112 generates virtual objects (icons 204, 205, and 206) of apps that may resolve the anxiety, as shown in FIG. 15, and the virtual object superimposition unit 111 superimposes the icons 204, 205, and 206 on an image corresponding to the user's field of vision.
  • the virtual object superimposition unit 111 superimposes the icon 204 of an FX app that is likely to resolve the anxiety on the foreground side of the image, and the icon 206 of an email app that is unlikely to resolve the anxiety on the background side of the image. Then, an attention area is set on the image based on the user's line of sight, and the user's voice is analyzed. When a voice command for selecting a virtual object (e.g., the icon 205 of a schedule app) is extracted, the schedule app is launched.
  • the icon may be superimposed when triggered by a voice command such as, for example, "What's going on?”.
  • the apps indicated by the icons may be filtered by comparing the anxiety resolution level calculated based on the user's properties (annual income, gender, or age), the user's schedule (future plans), the user's behavioral history (app launch history, usage history, or search history), the user's current location, etc., with a preset threshold. For example, if an FX app is frequently launched and there is a usage history of huge investments, the anxiety resolution level of the FX app will be relatively high. For example, biometric information may be obtained before and after the app is launched, and the reduction in anxiety feelings may be measured, and machine learning may be used to determine which apps were effective in resolving anxiety.
  • the icon does not have to show a pictogram representing the type of app, but may show information such as the rate of the currency the user is investing in, the time until their next appointment, or a digest of email responses.
  • the order of the icons may be determined according to not only the degree of anxiety resolution, but also one or a combination of the time elapsed since a change (e.g., a change in FX rates), the magnitude of the change (e.g., the fluctuation value of FX rates), and the degree of urgency (e.g., the time remaining until the next appointment).
  • the weight of the combination may be determined by parameters such as the user's properties, the user's schedule, the user's behavioral history, or the user's current location.
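  • A sketch of ordering the app icons by such a weighted combination is given below. The factor names, normalized ranges, and weight values are illustrative assumptions.

```python
def app_score(app, weights):
    """app: dict with normalized factors in [0, 1]; higher score = nearer the foreground."""
    return (weights["resolution"] * app["anxiety_resolution"]
            + weights["recency"] * app["change_recency"]
            + weights["magnitude"] * app["change_magnitude"]
            + weights["urgency"] * app["urgency"])

weights = {"resolution": 0.5, "recency": 0.2, "magnitude": 0.2, "urgency": 0.1}
apps = [
    {"name": "fx",       "anxiety_resolution": 0.9, "change_recency": 0.8,
     "change_magnitude": 0.7, "urgency": 0.2},
    {"name": "schedule", "anxiety_resolution": 0.6, "change_recency": 0.1,
     "change_magnitude": 0.0, "urgency": 0.9},
    {"name": "mail",     "anxiety_resolution": 0.3, "change_recency": 0.4,
     "change_magnitude": 0.2, "urgency": 0.3},
]
for app in sorted(apps, key=lambda a: app_score(a, weights), reverse=True):
    print(app["name"], round(app_score(app, weights), 2))
```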
  • the order of the icons is not limited to being arranged in depth, but may also be arranged vertically or horizontally. The arrangement may also be multidimensional. For example, the order of the icons in depth may be adjusted according to the degree of anxiety resolution, and the order of the icons vertically may be adjusted according to the degree of urgency.
  • search terms related to the subject of the user's interest may be automatically generated and used to search the Internet.
  • the user's specific state may be a state other than hunger as described in FIG. 14 or anxiety as described in FIG. 15.
  • when the user is bored, for example, icons of apps that can kill time (e.g., games, SNS, videos, or e-books) may be superimposed.
  • the order of the icons may be determined based on the degree to which boredom is relieved. For example, the free time until the next appointment is obtained from the user's schedule, and an entertainment app that can kill the free time can be selected. For example, if there is 10 minutes of free time, a game that can be completed within 10 minutes or a video that is 10 minutes long or less may be recommended.
  • an icon of an entertainment app for breaks may be superimposed to encourage the user to take a voluntary break.
  • apps that can be finished within a predetermined time (for example, about 10 minutes) are superimposed.
  • games or videos with little stimulation are selected to encourage the user to take a break.
  • a message saying "Let's get back to work" may be output. If the message is output as audio, it may be effective to use a sampled voice of the user's boss or teacher.
  • a reply message to the effect that "I'm busy right now" may be automatically sent back to the sender of the message.
  • the reply message may have a different writing style depending on the message sender.
  • the writing style may be generated using a trained network that uses machine learning to learn about the user's history of messaging with various senders. For example, the writing style may be changed to a boss-friendly style when speaking to a boss, and to a colleague-friendly style when speaking to a colleague.
  • the user may manually launch a recreational app.
  • a message such as "Are you sure?" or "You'll be in big trouble if you get caught" may be displayed. If such a message is displayed but the recreational app is launched anyway, the recreational app may be made to run slowly to make the user feel uncomfortable and encourage them to get back to work.
  • the icon display may be superimposed based on the user's application launch history (e.g., launching a news application when waking up in the morning, or a video application when sitting on the sofa) rather than on biometric information or voice commands.
  • a sensor mounted on the AR glasses or HMD acquires biometric information of the user, and the user state estimation unit 113 estimates that the user is hungry from the biometric information.
  • the virtual object generation unit 112 generates a virtual object (digital assistant 207) representing an assistant and a message of the digital assistant 207 (for example, "You're hungry!") as shown in FIG. 16, and the virtual object superimposition unit 111 superimposes the digital assistant 207 and its message on an image corresponding to the user's field of vision.
  • restaurant icons 201, 202, and 203 are generated and superimposed on the image.
  • if the estimation is incorrect, the digital assistant 207 may output a message such as "Sorry, I made a mistake," and erase the icons 201, 202, and 203.
  • the digital assistant 207 may be a character (such as a child or an animal) that the user feels is understandable even if it makes a mistake or fails.
  • the message of the digital assistant 207 may be output as a voice, and the voice may be the voice of the character. In this case, it is more effective if a voice appropriate to the character is output.
  • the estimation of the user's state using biometric information may be performed by machine learning using the user's response (whether the estimation was successful or unsuccessful). As learning progresses, the accuracy of the estimation of the user's state can be improved. The estimation accuracy may be calculated from the average value of the estimation accuracy of the user's state over a certain period of time. As the learning progresses and the estimation accuracy improves, the character of the digital assistant 207 may be grown. For example, the digital assistant 207 may be a child character in the early stages when the estimation accuracy is poor, and may grow into a young adult character and an adult character as the estimation accuracy improves.
  • the character of the digital assistant 207 may have a confident expression, and when the estimation accuracy of the user's state is low, the character of the digital assistant 207 may have an unconfident expression.
  • since the estimation accuracy of the user's state increases or decreases with each estimation, the adult character and the child character may be swapped accordingly.
  • the preset threshold for the estimation accuracy of the user's state may be lowered in the order of working, studying, and free time. Since actions are then unlikely to occur while the user is working unless the estimation accuracy of the user's state is high, this reduces the probability that the user will be disturbed by an erroneous estimation.
  • the character of the digital assistant 207 may be changed depending on whether the person is working, studying, or free. For example, the character may be changed to a reliable secretary-like character when working, and a cute animal character when free.
  • the digital assistant 207 may ask the user confirmation such as "You're working right now, aren't you?", "You're studying right now, aren't you?", or "You're free right now, aren't you?”.
  • the app may be launched without superimposing an icon to allow the user to select.
  • the digital assistant 207 may ask the user for confirmation before launching, such as "Do you want to launch the app?" This confirmation may be switched between being performed or omitted depending on the app and the situation. For example, if the situation is free and the app is a game or other app with little social impact, it may be launched without confirmation. On the other hand, if the calling app is for "making a call to a customer," the digital assistant 207 may always confirm, regardless of the situation.
  • a sensor mounted on the AR glasses or HMD acquires biometric information of user A, and the user state estimation unit 113 estimates from the biometric information that user A is in a state where he or she feels the urge to defecate.
  • a sensor mounted on the AR glasses or HMD acquires biometric information of user B who is near user A, and the user state estimation unit 113 estimates from the biometric information that user B is in a state where he or she feels the urge to defecate.
  • the virtual object generation unit 112 searches for toilets near user A and user B, and finds toilets 10 m away, 100 m away, and 500 m away, but there is only one vacant toilet at the 10 m away location. The virtual object generation unit 112 then compares the urges of user A and user B, and determines that user B feels the urge to defecate more urgently.
  • the virtual object generation unit 112 generates a toilet icon 208 for 100 m ahead and a toilet icon 209 for 500 m ahead, as shown in FIG. 17A, and the virtual object superimposition unit 111 superimposes the icons 208 and 209 on the image corresponding to the field of view of user A.
  • the virtual object generation unit 112 also generates a toilet icon 210 for 10 m ahead and a toilet icon 211 for 100 m ahead, as shown in FIG. 17B, and the virtual object superimposition unit 111 superimposes the icons 210 and 211 on the image corresponding to the field of view of user B.
  • an attention area is set on the image based on the user's line of sight, the user's voice is analyzed, and when a voice command for selecting a virtual object (e.g., icon 208 for user A and icon 210 for user B) is extracted, navigation to the toilet selected for each of user A and user B is initiated.
  • navigation to the restroom may be initiated without superimposing an icon for the user to select.
  • the priority does not have to be determined solely by the urgency of the defecation urge, but may also be determined based on the user's constitution (e.g., a tendency toward diarrhea or constipation) determined from the user's behavioral history, or based on the user's future schedule. For example, if User A has an appointment in 15 minutes and traveling to a distant toilet would make him or her late, User A's priority may be raised.
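  • A minimal sketch of allocating a scarce resource (e.g., the one vacant nearby toilet) to the user with the higher overall priority is shown below; the scoring weights and fields are assumptions that combine urgency, constitution, and schedule pressure as suggested above.

```python
def priority(user):
    """Combine urgency, constitution, and schedule pressure into a single score."""
    score = user["urgency"]                              # estimated urge, 0..1
    if user.get("prone_to_diarrhea"):
        score += 0.2
    score += max(0.0, (15 - user["minutes_to_next_appointment"]) / 15) * 0.3
    return score

def allocate_toilets(users, toilets_by_distance):
    """Assign nearer toilets to higher-priority users; returns {user name: toilet}."""
    ranked = sorted(users, key=priority, reverse=True)
    return {u["name"]: t for u, t in zip(ranked, toilets_by_distance)}

users = [
    {"name": "A", "urgency": 0.6, "prone_to_diarrhea": False,
     "minutes_to_next_appointment": 15},
    {"name": "B", "urgency": 0.9, "prone_to_diarrhea": False,
     "minutes_to_next_appointment": 60},
]
print(allocate_toilets(users, ["toilet_10m", "toilet_100m", "toilet_500m"]))
# {'B': 'toilet_10m', 'A': 'toilet_100m'}
```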
  • the object selection system 5 can be applied not only to toilet resources, but also to medical services.
  • an abnormality in health is estimated based on biometric information
  • users in a more critical condition may be guided to the nearest hospital. If hospitals are saturated and there is no capacity, users who are not in a critical condition may not be able to receive medical treatment immediately; after being guided to a place where they can rest, a message such as "Please wait n minutes until you can be examined at the hospital" may be displayed. Also, even when examinations are performed remotely rather than at an actual hospital, doctors may be assigned preferentially to users with more severe symptoms.
  • the object selection system 5 can be applied to a taxi dispatch service. Upcoming events for a user can be determined from the user's schedule, and the priority of a user who has an important event that absolutely cannot be delayed, such as an exam or attending a birth, can be increased, and a taxi can be dispatched to this user with priority.
  • if the taxis are EVs, the priority of battery replacement may be optimized according to the priority of the user riding in each taxi. Travel time is shortened for vehicles carrying high-priority users by replacing the battery with a charged one, while for vehicles carrying low-priority users, if no charged battery is available, the battery may be charged over time. Although charging takes time, this saves the lives of high-priority users, improving the happiness of society as a whole.
  • instead of EV taxis, one-way rental EV bikes may be used. High-priority users may be guided to rental EV bikes that have a large charge and are parked nearby, while low-priority users may be guided to rental EV bikes that are running low on charge or are parked far away.
  • the priority of the user riding in a taxi may be reflected in the driving priority. For example, a taxi carrying a user who has a fatal illness may be allowed to drive continuously in the passing lane, while a taxi carrying a low-priority user may not be allowed to use the passing lane.
  • the object selection system 5 can be applied to various vehicles other than taxis, including ambulances and transport trucks.
  • the social importance of the vehicle may also be taken into consideration to calculate an overall priority, which may be reflected in the driving priority.
  • the priority may be increased by a user's payment.
  • a user who really wants to raise the priority can do so by paying a fee.
  • although this is a negative factor in improving the happiness of society as a whole, the funds collected through the payment can be used to improve public welfare, making up for the negative impact on the happiness of society as a whole.
  • FIG. 18 is a block diagram showing an example of an object selection system 6 according to embodiment 6.
  • the object selection system 6 according to the sixth embodiment differs from the object selection system 1 according to the first embodiment in that it includes a gesture detection unit 115.
  • the following describes the object selection system 6 according to the sixth embodiment, focusing on the differences from the object selection system 1 according to the first embodiment, and omitting a description of the similarities.
  • the gesture detection unit 115 detects a user's gesture.
  • the gesture detection unit 115 may detect a user's hand gesture using an image of the user's hand captured by a camera mounted on the bottom of the AR glasses or HMD worn by the user.
  • the gesture detection unit 115 may detect a user's head gesture using sensing data of an acceleration sensor mounted on the AR glasses or HMD worn by the user, or an image of the user's head captured by a front camera provided in front of the user.
  • the gesture detection unit 115 may detect a user's facial gesture (such as opening the mouth) using an image of the user's mouth captured by a camera mounted on the bottom of the AR glasses or HMD worn by the user, or the results of electromyography of facial muscles by electrodes provided on the AR glasses or HMD worn by the user. Also, for example, the gesture detection unit 115 may detect a user's hand gesture using sensing data from an acceleration sensor of, for example, a watch-type wearable device worn on the user's wrist.
  • the voice acquisition unit 105 acquires the user's voice when a specific gesture by the user is detected. For example, the voice acquisition unit 105 may enter a speech acceptance state when the user covers his or her mouth with a hand. This makes it difficult for others to see that the user is speaking alone. Also, voice is reflected by the hand, making it easier to extract voice commands.
  • a particular gesture may be moving a hand to an ear as if making a phone call.
  • a particular gesture may also be a gesture that does not involve using the hands, such as tilting the head while opening the mouth.
  • the voice acquisition unit 105 may also acquire the user's voice (enter a speech acceptance state) when the user's gaze is fixed for a certain period of time. In this case, if there is no need to speak, the user can cancel the speech acceptance state by simply looking away.
  • the voice acquisition unit 105 may also acquire the user's voice according to a combination of the user's gaze and the user's gestures.
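  • A sketch of a speech-acceptance gate combining gestures and gaze dwell, as described above, is shown below; the gesture event names and the dwell time are illustrative assumptions.

```python
import time
from typing import Optional

class SpeechGate:
    """Tracks whether the voice acquisition unit should accept speech."""

    def __init__(self, gaze_dwell_s: float = 2.0):
        self.accepting = False
        self.gaze_dwell_s = gaze_dwell_s
        self._gaze_fixed_since: Optional[float] = None

    def on_gesture(self, gesture: str):
        # e.g. covering the mouth with a hand or a phone-call hand pose
        if gesture in ("hand_over_mouth", "hand_to_ear", "head_tilt_mouth_open"):
            self.accepting = True

    def on_gaze(self, fixed: bool, now: Optional[float] = None):
        now = time.monotonic() if now is None else now
        if not fixed:
            self._gaze_fixed_since = None
            self.accepting = False          # looking away cancels the acceptance state
            return
        if self._gaze_fixed_since is None:
            self._gaze_fixed_since = now
        elif now - self._gaze_fixed_since >= self.gaze_dwell_s:
            self.accepting = True           # gaze fixed long enough: accept speech
```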
  • the gesture detection unit 115 can be applied to the object selection systems according to embodiments 1 to 5.
  • the voice command "What?" expressing a question is not limited to a question about an object.
  • the AR glasses or HMD may be equipped with a microphone, an odor sensor, or a temperature and humidity sensor, and questions may be asked about physical phenomena detected by these sensors, such as "What's this song?”, “What's that smell?", or "What's this heat?”.
  • AR glasses or HMDs were used as examples of application of the object selection system, but the object selection system can also be applied to a head-up display in the driver's seat of an automobile.
  • the present disclosure can be realized not only as an object selection system, but also as an object selection method that includes steps (processing) performed by components that make up the object selection system.
  • FIGS. 19 and 20 are flowcharts showing an example of an object selection method according to another embodiment.
  • the object selection method is a method executed by an object selection system, and includes, as shown in FIG. 19, an image acquisition step (step S11) of acquiring an image corresponding to the user's field of view and including distance information, a gaze detection step (step S12) of detecting the user's gaze, a setting step (step S13) of setting an attention area on the image where the user is focusing based on the detected user's gaze, an object detection step (step S14) of detecting one or more objects included in the attention area, a voice acquisition step (step S15) of acquiring the user's voice, a voice command analysis step (step S16) of analyzing the user's voice to extract a voice command indicating a position included in the user's voice, and an object selection step (step S17) of selecting an object corresponding to the extracted voice command from among one or more objects.
  • the object selection method is a method executed by the object selection system, and includes, as shown in FIG. 20, an image acquisition step (step S21) for acquiring an image corresponding to the user's field of view, a gaze detection step (step S22) for detecting the user's gaze, a setting step (step S23) for setting an attention area on the image where the user is focusing based on the detected gaze of the user, an object detection step (step S24) for detecting one or more objects included in the attention area, a category determination step (step S25) for determining a category of the one or more objects, a voice acquisition step (step S26) for acquiring the user's voice, a voice command analysis step (step S27) for analyzing the user's voice to extract a voice command indicating a category included in the user's voice, and an object selection step (step S28) for selecting an object corresponding to the extracted voice command from among one or more objects.
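  • An end-to-end sketch of the object selection method of FIG. 19 (steps S11 to S17) follows, with each step delegated to a component supplied by the caller. The component interfaces are assumptions for illustration, not the claimed implementation.

```python
def select_object(camera, eye_tracker, microphone,
                  set_attention_area, detect_objects,
                  extract_position_command, pick_by_position):
    image = camera.capture()                              # S11: image with distance info
    gaze = eye_tracker.read()                             # S12: detect the user's gaze
    roi = set_attention_area(image, gaze)                 # S13: set the attention area
    objects = detect_objects(image, roi)                  # S14: detect objects in the ROI
    if len(objects) == 1:                                 # single candidate: no command needed
        return objects[0]
    voice = microphone.record()                           # S15: acquire the user's voice
    command = extract_position_command(voice)             # S16: extract the positional command
    return pick_by_position(objects, command)             # S17: select the matching object
```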
  • the present disclosure can be realized as a program for causing a computer (processor) to execute the steps included in the object selection method.
  • the present disclosure can be realized as a non-transitory computer-readable recording medium, such as a CD-ROM, on which the program is recorded.
  • each step is performed by running the program using hardware resources such as a computer's CPU, memory, and input/output circuits.
  • each step is performed by the CPU obtaining data from memory or input/output circuits, etc., performing calculations, and outputting the results of the calculations to memory or input/output circuits, etc.
  • each component included in the object selection system may be configured with dedicated hardware, or may be realized by executing a software program suitable for each component.
  • Each component may be realized by a program execution unit such as a CPU or processor reading and executing a software program recorded on a recording medium such as a hard disk or semiconductor memory.
  • each component may also be realized as an LSI, which is an integrated circuit. These may be individually implemented as single chips, or may be implemented as a single chip that includes some or all of the functions. Furthermore, the implementation of the integrated circuit is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor that can reconfigure the connections and settings of circuit cells inside the LSI, may also be used.
  • this disclosure also includes forms obtained by applying various modifications that a person skilled in the art may conceive of to the embodiments, and forms realized by arbitrarily combining the components and functions of the embodiments, without departing from the spirit of this disclosure.
  • An object selection system comprising: an image acquisition unit that acquires an image corresponding to a user's field of vision; a gaze detection unit that detects the user's gaze; a setting unit that sets an attention area on the image that the user is focusing on based on the detected gaze of the user; an object detection unit that detects one or more objects included in the attention area; a voice acquisition unit that acquires the user's voice; a voice command analysis unit that analyzes the user's voice to extract a voice command indicating a position included in the user's voice; and an object selection unit that selects an object corresponding to the extracted voice command from the one or more objects.
  • according to this, by using a voice command extracted from a voice that includes the position (e.g., a demonstrative pronoun) of the object in which the user is interested, the object in which the user is interested within the attention area can be selected from among objects that exist in the depth direction.
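  • The mapping from a demonstrative pronoun to a coarse depth hint could, for example, look like the sketch below; the pronoun table and the function name are assumptions, not something specified in the publication.

```python
# Assumed mapping from demonstrative pronouns to a coarse depth hint; in Japanese the
# ko-/so-/a- series ("kono", "sono", "ano") could be mapped in the same way.
DEMONSTRATIVE_TO_POSITION = {
    "this": "near", "these": "near",
    "that": "far",  "those": "far",
}

def extract_position_command(transcript: str) -> dict:
    """Return a position hint if the transcribed utterance contains a demonstrative pronoun."""
    for token in transcript.lower().split():
        token = token.strip(".,!?")
        if token in DEMONSTRATIVE_TO_POSITION:
            return {"position": DEMONSTRATIVE_TO_POSITION[token]}
    return {}

print(extract_position_command("Tell me more about that cup"))  # {'position': 'far'}
```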
  • the object selection system according to Technology 1 or 2, further including a category determination unit that determines a category of the one or more objects, wherein the voice command analysis unit extracts the voice command indicating the position and category contained in the user's voice by analyzing the user's voice.
  • the object selection system according to any one of Technologies 1 to 3, further including a virtual object superimposition unit that superimposes a virtual object on the image, wherein the one or more objects include the virtual object.
  • in AR or VR, not only real objects but also virtual objects may appear in the user's field of vision, and the user may be interested in the virtual objects.
  • the object selection system further includes a user state estimation unit that estimates the state of the user, and a virtual object generation unit that generates the virtual object based on the state of the user, and the virtual object superimposition unit determines a position in the image at which to superimpose the virtual object based on the state of the user, and superimposes the virtual object at the determined position.
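  • A minimal sketch of how the superimposition position could follow the estimated user state is shown below; the state labels and the chosen screen positions are assumptions, not values taken from the publication.

```python
from typing import Tuple

def place_virtual_object(user_state: str, image_size: Tuple[int, int]) -> Tuple[int, int]:
    """Decide where in the image to superimpose the virtual object based on the user state."""
    w, h = image_size
    if user_state == "busy":
        return int(w * 0.9), int(h * 0.85)   # keep the object near the edge of the view
    if user_state == "idle":
        return int(w * 0.5), int(h * 0.5)    # bring it toward the centre of the view
    return int(w * 0.8), int(h * 0.8)        # default placement for unrecognized states

print(place_virtual_object("busy", (1280, 720)))  # (1152, 612)
```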
  • the object selection system according to Technology 5, further including a user situation management unit that manages the situations of multiple users, wherein the virtual object generation unit further generates the virtual object based on the situations of the multiple users.
  • this is useful because, for example, the service resources that can be provided to the users may be limited.
  • An object selection system comprising: an image acquisition unit that acquires an image corresponding to a user's field of vision; a gaze detection unit that detects the user's gaze; a setting unit that sets an attention area on the image that the user is focusing on based on the detected gaze of the user; an object detection unit that detects one or more objects included in the attention area; a category determination unit that determines a category of the one or more objects; a voice acquisition unit that acquires the voice of the user; a voice command analysis unit that analyzes the user's voice to extract a voice command indicating a category included in the user's voice; and an object selection unit that selects an object corresponding to the extracted voice command from the one or more objects.
  • the object selection system further includes an object information storage unit that accumulates object information related to the one or more detected objects, the voice command analysis unit analyzes the user's voice to extract the voice command indicating a category and a past point in time contained in the user's voice, and the object selection unit selects an object corresponding to the extracted voice command from among the one or more previously detected objects indicated by the object information accumulated in the object information storage unit, the object selection system described in Technology 7.
  • according to this, by using a voice command extracted from a voice that includes not only the category of the object in which the user is interested but also the time at which the user focused on that object, it is possible to select an object that the user was interested in within the attention area in the past. For example, it is possible to display objects that the user was interested in in the past.
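  • How the accumulated object information might be queried with a category and a past point in time is sketched below; the record fields, the class name, and the one-hour example are assumptions introduced only for illustration.

```python
import time
from dataclasses import dataclass
from typing import List

@dataclass
class ObjectRecord:
    label: str        # category of the detected object
    timestamp: float  # when the object was detected inside the attention area

class ObjectInformationStore:
    """Accumulates object information about objects detected in the attention area."""
    def __init__(self) -> None:
        self._records: List[ObjectRecord] = []

    def add(self, record: ObjectRecord) -> None:
        self._records.append(record)

    def select(self, category: str, newer_than: float, older_than: float) -> List[ObjectRecord]:
        """Return records of the given category whose timestamps fall inside the window."""
        return [r for r in self._records
                if r.label == category and newer_than <= r.timestamp <= older_than]

# "the book I was looking at about an hour ago" -> category "book",
# window roughly between two hours ago and half an hour ago.
now = time.time()
store = ObjectInformationStore()
store.add(ObjectRecord(label="book", timestamp=now - 3500))
store.add(ObjectRecord(label="cup", timestamp=now - 600))
print(store.select("book", newer_than=now - 2 * 3600, older_than=now - 30 * 60))
```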
  • the object selection system further includes a user information storage unit that stores user information related to the user, and the object selection unit selects an object corresponding to the extracted voice command from among the one or more objects by comparing the extracted voice command with the user information.
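  • One possible way to compare the extracted voice command with stored user information is sketched below; the owner attribute, the possessive hint, and the matching rule are assumptions used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OwnedObject:
    label: str   # category of the detected object, e.g. "cup"
    owner: str   # assumed attribute linking the object to a registered user

USER_INFO = {"user_id": "u001", "name": "Alice"}  # assumed stored user information

def select_with_user_info(objects: List[OwnedObject], command: dict,
                          user_info: dict) -> Optional[OwnedObject]:
    """Pick the object matching the command; e.g. "my cup" keeps only the cup
    whose registered owner is the speaking user."""
    candidates = [o for o in objects if o.label == command.get("category")]
    if command.get("possessive") == "mine":
        candidates = [o for o in candidates if o.owner == user_info["user_id"]]
    return candidates[0] if candidates else None

objs = [OwnedObject("cup", "u002"), OwnedObject("cup", "u001")]
print(select_with_user_info(objs, {"category": "cup", "possessive": "mine"}, USER_INFO))
# OwnedObject(label='cup', owner='u001')
```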
  • the object selection system according to any one of Technologies 1 to 9, further including a gesture detection unit that detects a gesture of the user, wherein the voice acquisition unit acquires the voice of the user when a specific gesture of the user is detected.
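  • A sketch of gesture-gated voice acquisition follows; the trigger gesture name and the class interface are assumptions, shown only to illustrate starting audio capture once a specific gesture is detected.

```python
from typing import Optional

class VoiceAcquisitionGate:
    """Forward audio to the voice command analysis only after a trigger gesture is seen."""

    TRIGGER_GESTURE = "raise_hand"  # assumed trigger gesture; not specified in the publication

    def __init__(self) -> None:
        self.listening = False

    def on_gesture(self, gesture: str) -> None:
        if gesture == self.TRIGGER_GESTURE:
            self.listening = True

    def on_audio_frame(self, frame: bytes) -> Optional[bytes]:
        # Audio frames are passed on only while the gate is open.
        return frame if self.listening else None

gate = VoiceAcquisitionGate()
print(gate.on_audio_frame(b"..."))   # None: no gesture detected yet
gate.on_gesture("raise_hand")
print(gate.on_audio_frame(b"..."))   # b'...': frames now reach the analysis
```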
  • An object selection method executed by an object selection system comprising: an image acquisition step of acquiring an image corresponding to a user's field of view; a gaze detection step of detecting the gaze of the user; a setting step of setting an attention area on the image in which the user is focusing based on the detected gaze of the user; an object detection step of detecting one or more objects included in the attention area; a voice acquisition step of acquiring the voice of the user; a voice command analysis step of analyzing the voice of the user to extract a voice command indicating a position included in the voice of the user; and an object selection step of selecting an object corresponding to the extracted voice command from among the one or more objects.
  • This provides an object selection method that can determine which object within the area of interest the user is interested in.
  • An object selection method executed by an object selection system comprising: an image acquisition step of acquiring an image corresponding to a user's field of view; a gaze detection step of detecting the gaze of the user; a setting step of setting an attention area on the image where the user is focusing based on the detected gaze of the user; an object detection step of detecting one or more objects included in the attention area; a category determination step of determining a category of the one or more objects; a voice acquisition step of acquiring the voice of the user; a voice command analysis step of analyzing the voice of the user to extract a voice command indicating a category included in the voice of the user; and an object selection step of selecting an object corresponding to the extracted voice command from among the one or more objects.
  • This provides an object selection method that can determine which object within the area of interest the user is interested in.
  • This disclosure can be applied to computer user interfaces, AR, VR, digital assistants, information terminals, wearable devices, etc.
  • Reference signs list: 1 Object selection system; 101 Image acquisition unit; 102 Gaze detection unit; 103 Setting unit; 104 Object detection unit; 105 Voice acquisition unit; 106 Voice command analysis unit; 107 Object selection unit; 108 Category determination unit; 109 Object information storage unit; 110 User information storage unit; 111 Virtual object superimposition unit; 112 Virtual object generation unit; 113 User state estimation unit; 114 User situation management unit; 115 Gesture detection unit; 201, 202, 203, 204, 205, 206, 208, 209, 210, 211 Icons; 207 Digital assistant

Abstract

An object selection system (1) comprises: an image acquisition unit (101) that acquires an image corresponding to a user's field of view; a gaze detection unit (102) that detects the user's gaze; a setting unit (103) that sets, on the image, an attention area on which the user is focusing, based on the detected gaze of the user; an object detection unit (104) that detects one or more objects included in the attention area; a voice acquisition unit (105) that acquires the user's voice; a voice command analysis unit (106) that analyzes the user's voice to extract a voice command that is included in the user's voice and indicates a position; and an object selection unit (107) that selects an object corresponding to the extracted voice command from among the one or more objects.
PCT/JP2024/003097 2023-03-10 2024-01-31 Object detection system and object selection method WO2024190127A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-038000 2023-03-10
JP2023038000 2023-03-10

Publications (1)

Publication Number Publication Date
WO2024190127A1 true WO2024190127A1 (fr) 2024-09-19

Family

ID=92754828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/003097 WO2024190127A1 (fr) Object detection system and object selection method

Country Status (1)

Country Link
WO (1) WO2024190127A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014174589A (ja) * 2013-03-06 2014-09-22 Mega Chips Corp 拡張現実システム、プログラムおよび拡張現実提供方法
US20200074993A1 (en) * 2016-12-20 2020-03-05 Samsung Electronics Co., Ltd. Electronic device, method for determining utterance intention of user thereof, and non-transitory computer-readable recording medium
WO2020110270A1 (fr) * 2018-11-29 2020-06-04 マクセル株式会社 Dispositif et procédé d'affichage de vidéo
WO2020152537A1 (fr) * 2019-01-27 2020-07-30 Gaurav Dubey Systèmes et procédés de détection d'intentions
US20220244835A1 (en) * 2019-09-27 2022-08-04 Apple Inc. Devices, Methods, and Graphical User Interfaces for Interacting with Three-Dimensional Environments
JP2022113713A (ja) * 2014-06-14 2022-08-04 マジック リープ, インコーポレイテッド 仮想および拡張現実を作成する方法およびシステム
JP2022121592A (ja) * 2017-04-19 2022-08-19 マジック リープ, インコーポレイテッド ウェアラブルシステムのためのマルチモード実行およびテキスト編集

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24770247

Country of ref document: EP

Kind code of ref document: A1