
CN118197315A - Cabin voice interaction method, system and computer readable medium - Google Patents


Info

Publication number
CN118197315A
Authority
CN
China
Prior art keywords
cabin
interaction
voice
information
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410609091.2A
Other languages
Chinese (zh)
Inventor
蒋磊
蔡勇
刘新
陆晨昱
蔡超
葛德发
方露雨
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hozon New Energy Automobile Co Ltd
Original Assignee
Hozon New Energy Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hozon New Energy Automobile Co Ltd
Priority to CN202410609091.2A
Publication of CN118197315A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Remote Sensing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a cabin voice interaction method, a cabin voice interaction system, and a computer readable medium. The cabin voice interaction method is suitable for a vehicle cabin and comprises the following steps: acquiring a voice instruction of a user, determining the user's position from the voice instruction, and converting the voice instruction into text; acquiring visual orientation information of the user, determining the interaction object of the voice instruction from the visual orientation information, and acquiring interaction information, wherein the interaction object comprises multimedia content, multimedia devices, and vehicle cabin hardware; and inputting the text and the interaction information to a cloud control center, which derives an interaction instruction from the text and the interaction information and outputs it to the vehicle head unit for execution.

Description

Cabin voice interaction method, system and computer readable medium
Technical Field
The invention relates to the technical field of vehicle-mounted voice interaction, and in particular to a cabin voice interaction method, a cabin voice interaction system, and a computer readable medium.
Background
With the rapid development of the vehicle industry and of vehicle intelligence, vehicles with a voice interaction function are increasingly popular. The functional areas of a traditional automobile cabin, however, are laid out in a fragmented way, and information overload hinders human-vehicle interaction, so the value of the automobile as an interaction entry point has been underestimated. As voice technology is applied ever more widely in automobiles, the modes of human-vehicle interaction have been enriched and the riding experience of users improved. A vehicle interior may be provided with several intelligent terminal display devices, such as a large central control screen in the front row and display devices mounted on seat backs, all with a voice interaction function. As interactive devices and voice instructions in the cabin multiply, current voice interaction methods place high demands on the precision of the user's voice instructions and understand those instructions poorly, which can make the interaction process halting and degrade the user experience.
Disclosure of Invention
The invention aims to provide a cabin voice interaction method, a cabin voice interaction system, and a computer readable medium that recognize voice instructions more accurately and make the interaction smoother.
In order to solve the above technical problems, the invention provides a cabin voice interaction method suitable for a vehicle cabin, comprising the following steps: acquiring a voice instruction of a user, determining the user's position from the voice instruction, and converting the voice instruction into text; acquiring visual orientation information of the user, determining the interaction object of the voice instruction from the visual orientation information, and acquiring interaction information, wherein the interaction object comprises multimedia content, multimedia devices, and vehicle cabin hardware; and inputting the text and the interaction information to a cloud control center, which derives an interaction instruction from the text and the interaction information and outputs it to the vehicle head unit for execution.
In one embodiment of the present invention, determining the user's position through the voice instruction includes: collecting sound signals in the vehicle cabin through a microphone array, preprocessing the sound signals, and estimating the time delay; and determining the user's position with a beamforming algorithm according to the time delay estimation result, wherein the microphone array comprises a plurality of microphones.
In one embodiment of the invention, the voice instruction is converted into text using automatic speech recognition (ASR) technology.
In an embodiment of the present invention, acquiring the visual orientation information of the user includes: performing face tracking and eye tracking of the user via visual signals from the OMS/DMS (occupant/driver monitoring system) devices within the vehicle cabin.
In an embodiment of the present invention, when the interaction object is multimedia content, the cabin voice interaction method further includes: determining at least one first display device corresponding to the visual orientation information among at least one display device within the vehicle cabin; determining an interactable area in the at least one first display device; and, when the interaction information in the interactable area can be acquired through the vehicle head unit, inputting the interaction information to the cloud control center.
In an embodiment of the present invention, when the interaction information in the interactable area cannot be obtained through the vehicle head unit, the step of the cloud control center obtaining the interaction information includes: acquiring an image of the interactable area and performing entity recognition on the image to obtain the interaction information.
In an embodiment of the present invention, when there are a plurality of first display devices, the cloud control center obtaining an interaction instruction according to the text and the interaction information includes: comparing the similarity of the text and the interaction information to obtain a plurality of candidate operation items; and ranking the candidate operation items by similarity, with the candidate operation item of highest similarity serving as the interaction instruction.
In an embodiment of the present invention, when the interaction object is a multimedia device, the cabin voice interaction method further includes: determining a first display device corresponding to the visual orientation information among at least one display device within the vehicle cabin; and acquiring interaction information of the first display device, wherein the interaction information comprises opening, closing, and brightness adjustment.
In an embodiment of the present invention, when the interaction object is cabin hardware, the cabin voice interaction method further includes: determining first cabin hardware corresponding to the visual orientation information among at least one piece of cabin hardware in the vehicle cabin and acquiring interaction information of the first cabin hardware, wherein the first cabin hardware includes windows, seats, rearview mirrors, and sound and lighting systems.
The invention also provides a cabin voice interaction system, comprising: a memory for storing instructions executable by a processor; and a processor for executing the instructions to implement the method of any of the previous embodiments.
The invention also provides a computer readable medium storing computer program code which, when executed by a processor, implements the cabin voice interaction method of any of the previous embodiments.
Compared with the prior art, the invention has the following advantages: for the control of multimedia content, multimedia devices, and vehicle cabin hardware, the user's voice instruction is combined with visual orientation information, which improves the understanding of the voice instruction, the accuracy of the interaction, and the user experience; and when the multimedia content cannot be retrieved directly, an image of the interactable area can be obtained by means such as a camera or a screen capture, and entity recognition is performed on the image to obtain the interaction information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the accompanying drawings:
Fig. 1 is a flow chart of a cabin voice interaction method according to an embodiment of the invention.
Fig. 2 is a partial flow chart of a cabin voice interaction method in accordance with another embodiment of the invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and it is apparent to those of ordinary skill in the art that the present application may be applied to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
As used in the specification and in the claims, the terms "a," "an," and "the" do not denote the singular only but may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
The relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that, for convenience of description, the sizes of the parts shown in the drawings are not drawn to actual scale. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values. It should be noted that like reference numerals and letters denote like items in the figures, so once an item is defined in one figure, it need not be discussed further in subsequent figures.
In the description of the present application, it should be understood that the orientations or positional relationships indicated by terms such as "front, rear, upper, lower, left, right," "lateral, vertical, horizontal," and "top, bottom" are generally based on the orientations or positional relationships shown in the drawings, merely to facilitate and simplify the description of the present application; these terms do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the scope of protection of the present application. The terms "inner" and "outer" refer to inner and outer relative to the contour of the respective component itself.
Spatially relative terms, such as "above," "over," "on the upper surface of," and the like, may be used herein for ease of description to describe the spatial position of one device or feature relative to another device or feature as illustrated in the figures. It will be understood that these spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" them; the exemplary term "above" may thus encompass both an orientation of "above" and of "below." The device may also be positioned in other ways (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein are to be interpreted accordingly.
In addition, the terms "first," "second," etc. are used only to distinguish between corresponding components; unless otherwise stated they carry no special meaning and should not be construed as limiting the scope of the present application. Furthermore, although the terms used in the present application are selected from publicly known and commonly used terms, some of them may have been chosen by the applicant at his or her discretion, and their detailed meanings are given in the relevant parts of the description herein. The present application should therefore be understood not simply through the actual terms used but through the meaning each term carries.
Flowcharts are used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in that order; the various steps may instead be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
Fig. 1 is a flow chart of a cabin voice interaction method according to an embodiment of the invention. Referring to fig. 1, the present invention provides a cabin voice interaction method 10, suitable for a vehicle cabin, comprising the following steps:
S11: acquiring a voice instruction of a user, determining the user's position from the voice instruction, and converting the voice instruction into text;
S12: acquiring visual orientation information of the user, determining the interaction object of the voice instruction from the visual orientation information, and acquiring interaction information;
S13: inputting the text and the interaction information to a cloud control center, which derives an interaction instruction from the text and the interaction information and outputs it to the vehicle head unit for execution.
In this embodiment, determining the user's position through the voice instruction in step S11 employs beamforming, specifically including: collecting sound signals in the vehicle cabin through a microphone array, preprocessing the sound signals, estimating the time delay, and determining the user's position with a beamforming algorithm according to the time delay estimation result, wherein the microphone array comprises a plurality of microphones.
Specifically, the microphone array disposed in the vehicle cabin includes a plurality of microphones, preferably two or four: the two-microphone layout typically places the microphones between the driver's seat and the front passenger seat, while the four-microphone layout adds two more microphones, at the left rear and right rear of the cabin. The microphones separately collect the sound signals in the vehicle cabin, which are then synchronized and preprocessed (e.g., filtered and denoised) to reduce the influence of environmental noise and interference on positioning accuracy.
It will be appreciated that, because the microphones sit at different locations, they receive the sound waves of even the same speech signal from the same user at different times, and the position of the sound source can be estimated from the time difference of arrival (TDOA) of the sound at the different microphones. Further, a beamforming algorithm such as delay-and-sum, driven by the delay estimates, adjusts the signal of each microphone in the array so that the array forms a main beam in the direction of the sound source (i.e., the direction of the user issuing the voice instruction); the peak direction of the beam is taken as the bearing of the sound source.
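As an illustration only (not part of the patent), the following Python sketch estimates the inter-microphone delay with GCC-PHAT and aligns the channels with a delay-and-sum beam; the sampling rate, the two-channel toy signal, and the function names are assumptions made for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds, via GCC-PHAT."""
    n = 2 * max(len(sig), len(ref))            # zero-pad to avoid circular wrap
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                     # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))   # lags -n/2 .. n/2-1
    return (np.argmax(cc) - n // 2) / fs

def delay_and_sum(channels, delays, fs):
    """Advance each channel by its estimated delay and average (main beam)."""
    out = np.zeros_like(channels[0], dtype=float)
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -int(round(d * fs)))
    return out / len(channels)

# Toy usage: a noise burst arriving 0.5 ms later at the second microphone.
fs = 16000
rng = np.random.default_rng(0)
mic1 = rng.standard_normal(fs)
mic2 = np.roll(mic1, int(0.0005 * fs))
tau = gcc_phat(mic2, mic1, fs)                 # ~= 0.0005 s
beam = delay_and_sum([mic1, mic2], [0.0, tau], fs)
```

In a real cabin, the pairwise delays from several microphones would be combined over the known array geometry to resolve which seat the speaker occupies.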
In this embodiment, the voice instruction is converted into text using ASR technology, which preprocesses the voice instruction, feeds the processed audio features sequentially into an acoustic model and a language model, and then decodes and post-processes the result to output a continuous text stream. Specifically, the preprocessing includes: sampling and quantization, in which the audio signal is first sampled and quantized into a digital signal; noise elimination, in which a noise suppression technique reduces the influence of background noise; and feature extraction, in which features such as mel-frequency cepstral coefficients (MFCCs), filter-bank energies (FBANK), or spectrograms are extracted from the audio signal.
The acoustic model and the language model must be trained in advance before being put into use. Training the acoustic model requires a large amount of labeled speech data with corresponding text; the model is trained to recognize the relations between different sound features and language units (such as phonemes), so that when the preprocessed audio features are fed in, it outputs, for each time frame, the probability distribution over the corresponding phonemes or states. The language model is trained to predict the probability distribution of the next word given the preceding words, which helps recognize whole sentences rather than isolated words; combined with the output of the acoustic model, it helps identify the most likely word sequence.
Further, the decoder uses a search algorithm (such as the Viterbi algorithm) to find and output, under the guidance of the acoustic model and the language model, the most probable word sequence, i.e., the sequence that most likely matches what the speaker said. Post-processing then adds punctuation and capitalization to the converted text according to grammar rules and context, and uses grammar rules or additional models to correct errors that may have occurred during recognition.
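Purely to illustrate the front end described above, this sketch computes the MFCC features that would feed the acoustic model, using librosa; the file name and frame parameters are assumptions, and the acoustic model, language model, and decoder are left abstract.

```python
import librosa

# Load an utterance (hypothetical file) and resample to 16 kHz.
audio, sr = librosa.load("command.wav", sr=16000)

# 13 MFCCs per 25 ms frame with a 10 ms hop, a conventional ASR front end.
mfcc = librosa.feature.mfcc(
    y=audio, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)

# Per-coefficient mean/variance normalization, a common robustness step
# before the features are fed frame by frame into the acoustic model.
mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
print(mfcc.shape)   # (13, number_of_frames)
```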
In one embodiment of the present invention, acquiring the user's visual orientation information in step S12 includes performing face tracking and eye tracking of the user via visual signals from the OMS/DMS devices within the vehicle cabin. Specifically, face orientation tracking includes the following steps (a minimal pose-estimation sketch follows the list):
(1) Face detection: the on-board OMS/DMS camera captures face images, typically using a face detection algorithm such as Haar cascades or a deep learning model (e.g., a convolutional neural network, CNN);
(2) Feature point localization: once a face is detected, the next step is to locate its key feature points, such as the eyes, nose, mouth, and cheekbones, to describe the geometry of the face;
(3) Face model fitting: a face model is fitted to the detected face using a set of predefined facial feature points;
(4) Pose estimation: the pose of the head, including its pitch, yaw, and roll angles, is estimated by analyzing the relative positions of the feature points and the parameters of the face model;
(5) Three-dimensional reconstruction: if a depth camera or a stereo camera is used, the three-dimensional shape of the face may also be reconstructed by triangulation or other stereo vision techniques;
(6) Data smoothing and filtering: to reduce noise and tracking instability, the tracking data usually needs to be smoothed with a Kalman filter or another smoothing algorithm;
(7) Tracking and updating: the appearance and pose of the face change over time, so the tracking system continually updates the face model and the feature point locations to reflect these changes.
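A minimal sketch of step (4), head pose estimation, assuming the 2D landmarks have already been located by a detector; the 3D reference coordinates, pixel values, and camera intrinsics below are generic placeholders rather than values from the patent.

```python
import cv2
import numpy as np

# Generic 3D face-model reference points (nose tip, chin, eye corners,
# mouth corners) in millimetres; placeholder values for illustration.
model_points = np.array([
    (0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0),
])

# The matching 2D landmarks from the face detector (hypothetical pixels).
image_points = np.array([
    (359.0, 391.0), (399.0, 561.0), (337.0, 297.0),
    (513.0, 301.0), (345.0, 465.0), (453.0, 469.0),
])

# Approximate pinhole intrinsics for a 640x480 camera, no lens distortion.
w, h = 640, 480
camera_matrix = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=float)
dist_coeffs = np.zeros((4, 1))

ok, rvec, _ = cv2.solvePnP(model_points, image_points,
                           camera_matrix, dist_coeffs,
                           flags=cv2.SOLVEPNP_ITERATIVE)

# Rotation vector -> rotation matrix -> pitch / yaw / roll in degrees.
R, _ = cv2.Rodrigues(rvec)
pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
yaw = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
print(f"pitch={pitch:.1f} yaw={yaw:.1f} roll={roll:.1f}")
```

The yaw angle, in particular, indicates which display the occupant's head is turned toward.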
Eye tracking comprises the following steps (a pupil-detection sketch follows the list):
Image capture: an image of the eye is captured using the on-board OMS/DMS camera, which may be an infrared camera, a visible-light camera, or a combination of both;
Pupil detection: image processing algorithms detect the position of the pupil, typically by exploiting the contrast between the pupil and the iris, sclera, and eyelid;
Feature point localization: once the pupil is detected, the system further locates other key feature points of the eye, such as the pupil edge and the corneal reflection points;
Eye model fitting: an eye model is fitted to the captured image using the located feature points, which helps explain the geometric and optical characteristics of the eye;
Gaze estimation: the gaze direction is estimated from the positions of the pupil and the corneal reflection points together with the eye model, typically via geometric or optical calculations, to determine the three-dimensional point the eye is fixating;
Head pose correction: if necessary, the system corrects for changes in head pose to keep the eye tracking accurate; this step can be realized with an additional head tracking technique;
Data smoothing and filtering: to reduce noise and tracking instability, the tracking data usually needs to be smoothed with a Kalman filter or another smoothing algorithm;
Tracking and updating: the eye position and the gaze point change over time, so the tracking system continually updates the eye model and the feature point locations to reflect these changes.
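As a sketch of the pupil-detection step alone, assuming a cropped grayscale eye image is already available; the threshold value and the synthetic test image are illustrative, not from the patent.

```python
import cv2
import numpy as np

def detect_pupil(eye_gray):
    """Return the (x, y) centre and radius of the darkest round blob,
    taken here to be the pupil, or None if nothing plausible is found."""
    blur = cv2.GaussianBlur(eye_gray, (7, 7), 0)
    # The pupil is the darkest region: keep pixels below an illustrative threshold.
    _, mask = cv2.threshold(blur, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (x, y), r = cv2.minEnclosingCircle(largest)
    return (int(x), int(y)), int(r)

# Hypothetical usage: a synthetic 100x100 eye patch with a dark disc as pupil.
eye = np.full((100, 100), 180, dtype=np.uint8)
cv2.circle(eye, (48, 52), 12, 20, -1)
print(detect_pupil(eye))   # -> approximately ((48, 52), 12)
```

Corneal-reflection localisation and the eye-model fit would follow the same pattern, but on the bright specular points rather than the dark pupil.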
Further, the interaction object in step S12 includes multimedia content, multimedia devices, and vehicle cabin hardware. Fig. 2 is a partial flow chart of a cabin voice interaction method according to another embodiment of the invention. Referring to figs. 1-2 in combination, when the interaction object is multimedia content, the cabin voice interaction method further includes:
S21: determining at least one first display device corresponding to the visual orientation information among at least one display device within the vehicle cabin;
S22: determining an interactable area in the at least one first display device;
S23: judging whether the interaction information in the interactable area can be acquired through the vehicle head unit; if so, executing step S25, otherwise executing step S24;
S24: acquiring an image of the interactable area, performing entity recognition on the image to obtain the interaction information, and then executing step S25;
S25: inputting the interaction information to the cloud control center.
Specifically, multimedia content can be understood as the content displayed on a display device in the vehicle cabin (including the main display screen, secondary display screens, etc.), which contains some interactable objects. For example, when the user looks at the main display screen, the main display screen is determined to be the first display device, and the interactable areas in it, such as the UIs of various applications, are determined. In some cases the interaction information in the interactable area can be acquired directly through the system, or through an interface provided by a third-party application; in other cases it cannot be acquired directly, and an image of the interactable area is then obtained by a camera, a screen capture, or similar means, and entity recognition is performed on the image to obtain the interaction information. For example, if different video applications are present on several screens, the user can start the desired one simply by looking at the target screen and saying "video".
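One way to realise the screen-capture fallback is sketched below, with Pillow and Tesseract OCR standing in for the unspecified capture and entity-recognition components; the patent does not name these tools, and the capture coordinates are invented.

```python
from PIL import ImageGrab      # screen capture (Windows/macOS; X11 on Linux)
import pytesseract             # requires a local Tesseract OCR installation

# Capture the interactable area (illustrative left, top, right, bottom pixels).
region = ImageGrab.grab(bbox=(0, 0, 1280, 400))

# Entity-recognition stand-in: OCR the capture and keep distinct words as
# candidate interaction entities (app names, button labels, media titles).
text = pytesseract.image_to_string(region, lang="eng")
entities = sorted({w.strip(".,:;") for w in text.split() if len(w) > 1})
print(entities)
```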
In an embodiment of the present invention, when there are a plurality of first display devices, the cloud control center obtaining an interaction instruction according to the text and the interaction information includes: comparing the similarity between the text and the interaction information to obtain a plurality of candidate operation items, and ranking the candidate operation items by similarity, with only the candidate operation item of highest similarity output as the interaction instruction. For example, if the interactable area at which the user's gaze points spans several adjacent screens, all of them are first display devices; when the user issues the voice instruction "open video", the method acquires the interaction information (i.e., the operable video applications) on these screens, compares the similarity between the text and each piece of interaction information, and finally outputs only the candidate operation item with the highest similarity as the interaction instruction.
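A minimal sketch of that ranking step, using the standard library's difflib as a stand-in for the unspecified similarity measure; the command string and candidate labels are invented for illustration.

```python
from difflib import SequenceMatcher

def rank_candidates(command_text, candidates):
    """Rank candidate operation items by string similarity to the command."""
    scored = [(SequenceMatcher(None, command_text, c).ratio(), c) for c in candidates]
    scored.sort(reverse=True)
    return scored

# Hypothetical interaction information gathered from two adjacent screens.
candidates = ["open video player", "open music player", "open navigation map"]
ranked = rank_candidates("open video", candidates)
print(ranked[0][1])   # only the highest-similarity item becomes the
                      # interaction instruction sent back to the head unit
```

In practice the cloud control center might well use an embedding-based semantic similarity rather than raw string matching, but the ranking logic is the same.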
Further, in an embodiment of the present invention, when the interaction object is a multimedia device, the cabin voice interaction method further includes: determining a first display device corresponding to the visual orientation information among at least one display device in the vehicle cabin, and acquiring interaction information of the first display device, wherein the interaction information comprises opening, closing, and brightness adjustment.
Still further, in an embodiment of the present invention, when the interaction object is cabin hardware, the cabin voice interaction method further includes: determining first cabin hardware corresponding to the visual orientation information among at least one piece of cabin hardware within the vehicle cabin and acquiring interaction information of the first cabin hardware, wherein the first cabin hardware includes windows, seats, rearview mirrors, and sound and lighting systems.
The invention also provides a cabin voice interaction system, comprising: a memory for storing instructions executable by a processor; and a processor for executing the instructions to implement the method of any of the previous embodiments.
Some aspects of the application may be performed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." The processor may be one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or a combination thereof. Furthermore, aspects of the application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media. For example, computer-readable media can include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., compact discs (CD), digital versatile discs (DVD)), smart cards, and flash memory devices (e.g., cards, sticks, key drives).
The invention also provides a computer readable medium storing computer program code which, when executed by a processor, implements the cabin voice interaction method of any of the previous embodiments.
The computer readable medium may comprise a propagated data signal with the computer program code embodied therein, for example, in baseband or as part of a carrier wave. The propagated signal may take a variety of forms, including electromagnetic or optical forms, or any suitable combination thereof. A computer readable medium can be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer readable medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, radio frequency signals, or the like, or a combination of any of the foregoing.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the application may occur to those skilled in the art; such modifications, improvements, and adaptations are suggested by this disclosure and are intended to remain within the spirit and scope of its exemplary embodiments.
Meanwhile, the present application uses specific words to describe embodiments of the present application. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as suitable.
Similarly, it should be noted that, in order to simplify the description of the present disclosure and thereby aid the understanding of one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure does not imply that the claimed subject matter requires more features than are recited in the claims; indeed, claimed subject matter may lie in less than all features of a single disclosed embodiment.
In some embodiments, numbers are used to describe quantities of components and attributes; it should be understood that such numbers are, in some examples, qualified by the modifiers "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending on the desired properties sought by the individual embodiment. In some embodiments, the numerical parameters should take into account the specified significant digits and employ ordinary rounding. Although the numerical ranges and parameters set forth herein are approximations in some embodiments, in particular embodiments the numerical values are set forth as precisely as practicable.
While the application has been described with reference to specific embodiments, it will be appreciated by those skilled in the art that the foregoing embodiments are merely illustrative; various equivalent changes and substitutions may be made without departing from the spirit of the application, and all such changes and modifications to the embodiments are intended to fall within the scope of the appended claims.

Claims (9)

1. A cabin voice interaction method, suitable for a vehicle cabin, characterized by comprising the following steps:
acquiring a voice instruction of a user, determining the user's position through the voice instruction, and converting the voice instruction into text;
acquiring visual orientation information of the user, determining the interaction object of the voice instruction through the visual orientation information, and acquiring interaction information, wherein the interaction object comprises multimedia content, multimedia devices, and vehicle cabin hardware; and
inputting the text and the interaction information to a cloud control center, the cloud control center obtaining an interaction instruction according to the text and the interaction information and outputting the interaction instruction to a vehicle head unit for execution,
wherein the cabin voice interaction method further comprises:
determining at least one first display device corresponding to the visual orientation information among at least one display device within the vehicle cabin;
determining an interactable area in the at least one first display device; and
when the interaction information in the interactable area can be obtained through the vehicle head unit, inputting the interaction information to the cloud control center; and
when the interaction information in the interactable area cannot be obtained through the vehicle head unit, the step of the cloud control center obtaining the interaction information further comprising: acquiring an image of the interactable area and performing entity recognition on the image to obtain the interaction information.
2. The cabin voice interaction method of claim 1, wherein determining the user's position through the voice instruction comprises:
collecting sound signals in the vehicle cabin through a microphone array, preprocessing the sound signals, and estimating time delay; and
determining the user's position through a beamforming algorithm according to the time delay estimation result; wherein
the microphone array comprises a plurality of microphones.
3. The cabin voice interaction method of claim 1, wherein the voice instruction is converted into text using ASR technology.
4. The cabin voice interaction method of claim 1, wherein acquiring the visual orientation information of the user comprises: performing face tracking and eye tracking of the user via visual signals from the OMS/DMS devices within the vehicle cabin.
5. The cabin voice interaction method of claim 1, wherein, when there are a plurality of first display devices, the cloud control center obtaining an interaction instruction according to the text and the interaction information comprises:
performing a similarity comparison between the text and the interaction information to obtain a plurality of candidate operation items; and
ranking the plurality of candidate operation items by similarity, and taking the candidate operation item with the highest similarity as the interaction instruction.
6. The cabin voice interaction method of any one of claims 1-4, wherein when the interaction object is a multimedia device, the cabin voice interaction method further comprises:
determining a first display device corresponding to the visual orientation information among at least one display device within the vehicle cabin; and
acquiring the interaction information of the first display device, wherein the interaction information comprises opening, closing, and brightness adjustment.
7. The cabin voice interaction method of any one of claims 1-4, wherein when the interaction object is cabin hardware, the cabin voice interaction method further comprises:
determining first cabin hardware corresponding to the visual orientation information among at least one piece of cabin hardware within the vehicle cabin and acquiring the interaction information of the first cabin hardware; wherein
the first cabin hardware includes windows, seats, rearview mirrors, and sound and lighting systems.
8. A cabin voice interaction system, comprising:
a memory for storing instructions executable by a processor; and
a processor for executing the instructions to implement the method of any one of claims 1-7.
9. A computer readable medium storing computer program code which, when executed by a processor, implements the cabin voice interaction method of any one of claims 1-7.
CN202410609091.2A 2024-05-16 2024-05-16 Cabin voice interaction method, system and computer readable medium Pending CN118197315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410609091.2A CN118197315A (en) 2024-05-16 2024-05-16 Cabin voice interaction method, system and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410609091.2A CN118197315A (en) 2024-05-16 2024-05-16 Cabin voice interaction method, system and computer readable medium

Publications (1)

Publication Number Publication Date
CN118197315A true CN118197315A (en) 2024-06-14

Family

ID=91412618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410609091.2A Pending CN118197315A (en) 2024-05-16 2024-05-16 Cabin voice interaction method, system and computer readable medium

Country Status (1)

Country Link
CN (1) CN118197315A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170235361A1 (en) * 2016-01-20 2017-08-17 Panasonic Automotive Systems Company Of America, Division Of Panasonic Corporation Of North America Interaction based on capturing user intent via eye gaze
CN107123421A (en) * 2017-04-11 2017-09-01 广东美的制冷设备有限公司 Sound control method, device and home appliance
US20190139547A1 (en) * 2017-11-08 2019-05-09 Alibaba Group Holding Limited Interactive Method and Device
CN109941231A (en) * 2019-02-21 2019-06-28 初速度(苏州)科技有限公司 Vehicle-mounted terminal equipment, vehicle-mounted interactive system and exchange method
CN111816189A (en) * 2020-07-03 2020-10-23 斑马网络技术有限公司 Multi-tone-zone voice interaction method for vehicle and electronic equipment
CN112073639A (en) * 2020-09-11 2020-12-11 Oppo(重庆)智能科技有限公司 Shooting control method and device, computer readable medium and electronic equipment
CN112951216A (en) * 2021-05-11 2021-06-11 宁波均联智行科技股份有限公司 Vehicle-mounted voice processing method and vehicle-mounted information entertainment system
CN113352986A (en) * 2021-05-20 2021-09-07 浙江吉利控股集团有限公司 Vehicle voice atmosphere lamp partition interaction control method and system
CN116978372A (en) * 2022-04-22 2023-10-31 华为技术有限公司 Voice interaction method, electronic equipment and storage medium
CN117995184A (en) * 2023-12-29 2024-05-07 华人运通(上海)云计算科技有限公司 Man-machine interaction method, device and equipment under low attention and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118398011A (en) * 2024-06-26 2024-07-26 广州小鹏汽车科技有限公司 Voice request processing method, server device and storage medium

Similar Documents

Publication Publication Date Title
US20210035586A1 (en) System and method of correlating mouth images to input commands
US10468032B2 (en) Method and system of speaker recognition using context aware confidence modeling
US11854550B2 (en) Determining input for speech processing engine
Sahoo et al. Emotion recognition from audio-visual data using rule based decision level fusion
CN118197315A (en) Cabin voice interaction method, system and computer readable medium
CN106157956A (en) The method and device of speech recognition
WO2014062521A1 (en) Emotion recognition using auditory attention cues extracted from users voice
US11508374B2 (en) Voice commands recognition method and system based on visual and audio cues
KR102290186B1 (en) Method of processing video for determining emotion of a person
CN111341350A (en) Man-machine interaction control method and system, intelligent robot and storage medium
CN115169507A (en) Brain-like multi-mode emotion recognition network, recognition method and emotion robot
Ivanko et al. Multimodal speech recognition: increasing accuracy using high speed video data
CN112992191B (en) Voice endpoint detection method and device, electronic equipment and readable storage medium
US20240135956A1 (en) Method and apparatus for measuring speech-image synchronicity, and method and apparatus for training model
Navarathna et al. Multiple cameras for audio-visual speech recognition in an automotive environment
CN114466179B (en) Method and device for measuring synchronism of voice and image
Loh et al. Speech recognition interactive system for vehicle
Saudi et al. Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework
CN114466178A (en) Method and device for measuring synchronism of voice and image
CN114494930A (en) Training method and device for voice and image synchronism measurement model
Ibrahim A novel lip geometry approach for audio-visual speech recognition
Yasui et al. Multimodal speech recognition using mouth images from depth camera
CN110338747A (en) Householder method, storage medium, intelligent terminal and the auxiliary device of eye test
Murai et al. Face-to-talk: audio-visual speech detection for robust speech recognition in noisy environment
Pao et al. A motion feature approach for audio-visual recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination