
CN114499712A - Gesture recognition method, device and storage medium - Google Patents

Gesture recognition method, device and storage medium

Info

Publication number
CN114499712A
CN114499712A (application number CN202111578495.2A)
Authority
CN
China
Prior art keywords
gesture
csi
data
model
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111578495.2A
Other languages
Chinese (zh)
Other versions
CN114499712B (en)
Inventor
单元元
周济
王小乾
李伟泽
赵素雅
朱帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202111578495.2A
Publication of CN114499712A
Application granted
Publication of CN114499712B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/309Measuring or estimating channel quality parameters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/08Testing, supervising or monitoring using real traffic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Electromagnetism (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a gesture recognition method, a device, and a storage medium. The application discloses a gesture recognition method comprising the following steps: obtaining first CSI gesture data from a first environment; training an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as a feature extractor, the CNN-BiLSTM network model being a network model combining CNN and BiLSTM based on an attention mechanism; the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.

Description

Gesture recognition method, device and storage medium
Technical Field
The invention relates to the technical field of communication, in particular to a method for gesture recognition based on Channel State Information (CSI) of a WIFI system.
Background
With the continuous development of Internet of Things technology, human-computer interaction in smart-home and healthcare scenarios is becoming increasingly important. Gestures have become an important mode of human-computer interaction because they are convenient, easy to understand, and rich in meaning. Meanwhile, wireless sensing technology keeps advancing, and WIFI devices are widely deployed in living environments, so gesture recognition based on the channel state information (CSI) of a WIFI system has become a hot research direction. Using the CSI of a WIFI system for gesture recognition has the advantages of being unaffected by illumination and requiring no special wearable equipment.
However, current methods for gesture recognition based on the CSI of a WIFI system mainly face the following two problems:
1) Susceptibility to environmental interference
Although existing WIFI-based gesture recognition techniques achieve a high recognition rate in a single environment, changes in the WIFI environment, such as different positions of indoor objects or even a change of experimenter, affect the multipath effect of the WIFI channel; the feature distribution of the collected gesture data then differs, and the accuracy of gesture recognition drops sharply. That is, if a model is trained with gesture data from one environment and applied directly to other environments, the accuracy of gesture recognition is greatly reduced.
2) Lack of diversity of gesture actions in the gesture data
When existing WIFI-based gesture recognition methods collect WIFI CSI activity data, each CSI sample of human activity is a single, idealized action, and the time-series characteristics of the WIFI CSI data are not considered, so the experimental results exhibit overfitting.
Existing technical solutions for gesture recognition based on the CSI of a WIFI system mainly address the classification and recognition of gesture features. Patent CN109766951A provides a WIFI gesture recognition method based on time-frequency statistical characteristics. The method extracts CSI amplitude data from the gesture data received by a network card, applies low-pass filtering to reduce environmental noise, reduces the dimensionality of the CSI amplitude data with a Singular Value Decomposition (SVD) algorithm, extracts the time-frequency characteristics of the signal with the Short-Time Fourier Transform (STFT), performs statistical feature extraction and feature standardization on the time-frequency characteristics to obtain statistical features usable for classification, and finally performs gesture classification with a k-Nearest Neighbor (kNN) classification algorithm. This method can effectively classify and recognize gesture features and solves gesture recognition in a single complex indoor environment, but it does not solve the problem of poor generality.
With the development of Internet of Things technology, WIFI gesture recognition is widely applied in smart-home and healthcare scenarios, but the susceptibility to environmental interference and poor generality of existing WIFI gesture recognition methods hinder its large-scale application.
Disclosure of Invention
In view of this, embodiments of the present invention provide a domain-adaptive WIFI gesture recognition method and apparatus to solve the problems that existing WIFI gesture recognition methods are easily disturbed by the environment and generalize poorly.
According to a first aspect, an embodiment of the present invention provides a gesture recognition method, including: obtaining first CSI gesture data from a first environment; training an improved adversarial discriminative domain adaptation (ADDA, Adversarial Discriminative Domain Adaptation) model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as its feature extractor, the CNN-BiLSTM network model combining CNN and BiLSTM based on an attention mechanism; the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
Existing WIFI CSI gesture recognition techniques train models in a single environment; applied directly to other environments, their recognition accuracy drops, and labeled data must be collected for retraining.
The application applies an adversarial domain adaptation model to the field of CSI gesture recognition and modifies the feature extraction network in the adversarial discriminative domain adaptation (ADDA) model to a CNN-BiLSTM, which removes the need to re-collect training data in each new environment. In addition, the improved ADDA model improves the generality of the gesture recognition model and reduces the influence of environmental factors.
With reference to the first aspect, in a first embodiment of the first aspect, the method further includes: obtaining the second CSI gesture data; classifying the second CSI gesture data using the trained improved ADDA model to obtain a corresponding gesture class.
The ADDA model may be trained using only first CSI gesture data from a first environment, and the trained ADDA model may then classify gestures in CSI gesture data from another, different environment without collecting labeled data from that environment for retraining.
With reference to the first aspect, in a second implementation of the first aspect, the ADDA model is further improved by using a loss function based on the Wasserstein distance as the adversarial loss function.
The application thus further improves the ADDA model: the adversarial loss function is replaced with a loss function based on the Wasserstein distance. Using a Wasserstein-distance-based loss in the ADDA model allows a model trained on data from one environment to retain high accuracy when applied to other environments, further addressing the poor generality of gesture recognition methods.
With reference to the first aspect, in a third implementation of the first aspect, the CNN-BiLSTM network model is trained using third CSI gesture data from the first environment or the second environment.
That is, the CNN-BiLSTM network model may be trained using gesture data from a single environment. The third CSI gesture data may be the same as or different from the first or second CSI gesture data. In particular, the CNN-BiLSTM network model may be trained on data from different environments, and the environment whose data yields the best model, judged by some metric, may be selected so as to optimize the model's performance.
With reference to the first aspect, in a fourth implementation of the first aspect, the training of the improved adversarial discriminative domain adaptation ADDA model using the first CSI gesture data includes: dividing the first CSI gesture data into source domain data and target domain data; performing feature extraction and gesture classification on the source domain data using the CNN-BiLSTM network model to obtain gesture categories and extracted features; and training the improved ADDA model using the gesture categories, the extracted features, and the target domain data.
The CNN-BiLSTM network model performs feature extraction and gesture classification on the source domain data in order to improve the quality of the data.
With reference to the first aspect, in a fifth implementation of the first aspect, the first CSI gesture data and/or the second CSI gesture data comprise CSI gesture data associated with gesture actions performed in a plurality of different directions.
Existing gesture recognition technologies generally adopt a recognition algorithm for a single environment. Such algorithms place many restrictions on the experimenters' gesture actions and equipment, for example constraining the experimenters' orientation and placement, which does not match real-life scenarios.
The CSI gesture data acquired by the application involve not only several different environments but also gesture actions performed in several different directions, which increases the diversity of gestures and enriches the gesture actions in the gesture database.
With reference to the first aspect, in a sixth implementation of the first aspect, acquiring the channel state information CSI gesture data includes: collecting raw CSI gesture data, where the raw CSI gesture data are CSI data packets with three data dimensions; denoising the raw CSI gesture data; and extracting a gesture segment from the denoised raw CSI gesture data.
With reference to the first aspect, in a seventh implementation manner of the first aspect, the extracting the gesture segment from the denoised raw CSI gesture data includes: determining the size of a sliding window; calculating a variance of the denoised raw CSI gesture data in each window; and extracting the gesture segment according to the variance and the size of the buffer area.
Gesture segments are intercepted from the gesture CSI data with a buffer-based gesture segment interception algorithm. This algorithm improves on sliding-window-based gesture segment interception by adding a buffer to hold candidate gesture segments while the start and end stages of the gesture data are being decided.
According to a second aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the gesture recognition method according to the first aspect or any one of the embodiments of the first aspect.
According to a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the gesture recognition method described in the first aspect or any one of the implementation manners of the first aspect.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 shows a flow diagram of a gesture recognition method according to an embodiment of the present application;
fig. 2 shows a flow diagram of a method for obtaining CSI gesture data according to an embodiment of the application;
FIG. 3 illustrates a gesture capture device layout;
FIG. 4 shows a flow diagram of a method for collecting raw CSI gesture data according to an embodiment of the present application;
FIG. 5 illustrates a flow diagram of a method for denoising raw CSI gesture data according to an embodiment of the application;
FIG. 6 illustrates a flow diagram of a method for extracting gesture fragments from denoised raw CSI gesture data according to an embodiment of the present application;
FIG. 7 illustrates a flow diagram for determining a start of a gesture segment according to an embodiment of the present application;
FIG. 8 illustrates a gesture recognition apparatus according to an embodiment of the present application;
FIG. 9 shows an electronic device according to an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of a domain-adaptive WIFI gesture recognition method according to an embodiment of the present application. In this embodiment, the method applies a domain adaptation model to the field of WIFI CSI gesture recognition and predicts labels for unlabeled data in one environment using labeled data from another environment, thereby reducing the cost of data collection, improving the generality of the gesture recognition method, and reducing the influence of environmental factors. Specifically, the domain adaptation model is an improved adversarial discriminative domain adaptation (ADDA) model whose feature extraction network is modified to a CNN-BiLSTM; this removes the need to re-collect training data in different environments, and the improved ADDA also improves the generality of the gesture recognition model and reduces the influence of environmental factors.
As shown in fig. 1, the gesture recognition method may include:
s11: first CSI gesture data from a first environment is obtained.
S12: training an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as a feature extractor, the CNN-BiLSTM network model combining CNN and BiLSTM based on an attention mechanism; the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
The feature extractor in this improved ADDA model uses the CNN-BiLSTM network model, where the BiLSTM is equipped with an attention mechanism. That is, the present application extracts gesture features with a combination of a CNN and an attention-based BiLSTM network structure. The CNN-BiLSTM network model may be trained using gesture data from a single environment (e.g., the first environment in S11), specifically as follows:
For the preprocessed data fed into the network, the CNN first performs feature extraction. After high-dimensional features are extracted, they are mapped into sequence features, and an attention-based BiLSTM learns feature representations related to the global context (for example, a feature representation table set before learning, containing feature values such as gender). The features are then passed into a fully connected layer, and finally gesture classification is performed. In the initial stage of model training, labeled training data are input into the CNN-BiLSTM network to obtain predicted labels, a loss function value is computed from the predicted values and the true label values, and the model parameters are updated and optimized through backpropagation with a gradient-based optimization method.
The loss function for gesture feature extraction may be the cross-entropy loss, computed as:

L = -Σ_i y_i · log(p_i)

where p_i represents the actual output of the neuron and y_i the desired output (1 for the positive class, 0 for the negative class).
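As an illustrative sketch only, not the patented implementation, the CNN + attention-based BiLSTM feature extractor and its cross-entropy pre-training step might be assembled as follows in PyTorch; the layer sizes, the 90-subcarrier input shape, and the four gesture classes are assumptions made for the example:

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    """Sketch of the CNN + attention-based BiLSTM feature extractor.
    Input: CSI segments shaped (batch, 1, time, subcarriers)."""
    def __init__(self, n_subcarriers=90, hidden=128, n_classes=4):
        super().__init__()
        # CNN stage: extract local high-dimensional features from CSI
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # BiLSTM stage: model the temporal sequence of CNN feature maps
        self.bilstm = nn.LSTM(64 * (n_subcarriers // 4), hidden,
                              batch_first=True, bidirectional=True)
        # Attention: weight each time step by its global relevance
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        f = self.cnn(x)                                   # (B, 64, T', S')
        B, C, T, S = f.shape
        seq = f.permute(0, 2, 1, 3).reshape(B, T, C * S)  # map to sequence features
        out, _ = self.bilstm(seq)                         # (B, T', 2*hidden)
        w = torch.softmax(self.attn(out), dim=1)          # attention weights over time
        feat = (w * out).sum(dim=1)                       # attended global feature
        return feat, self.fc(feat)                        # features and class logits

# Pre-training on labeled source-environment data with cross-entropy
model = CNNBiLSTMAttention()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 1, 200, 90)       # dummy batch: 8 segments of 200 packets
y = torch.randint(0, 4, (8,))        # dummy gesture labels
_, logits = model(x)
loss = nn.functional.cross_entropy(logits, y)  # the loss described above
opt.zero_grad(); loss.backward(); opt.step()   # gradient-based backpropagation
```

The attention weights form a softmax over time steps, so the attended feature is a weighted sum of the BiLSTM outputs, corresponding to the globally relevant feature representation described above.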
In one embodiment, the gesture recognition method further includes:
obtaining the second CSI gesture data;
classifying the second CSI gesture data using the trained improved ADDA model to obtain a corresponding gesture class.
The present application may train the ADDA model using only first CSI gesture data from a first environment; the trained ADDA model can then classify gestures in second CSI gesture data from another, different second environment without collecting labeled data from the second environment for retraining.
In a preferred embodiment, the ADDA model is further improved by using a Wasserstein-distance-based loss function as the adversarial loss. Specifically, the adversarial loss function in the ADDA model is replaced with a loss function based on the Wasserstein distance. That is, in the preferred embodiment, the application further improves the ADDA model described above: the feature extractor uses the CNN-BiLSTM network model, and the adversarial loss is converted into a Wasserstein-distance-based loss.
As described above, using a Wasserstein-distance-based loss function in the ADDA model lets a model trained on data from one environment retain high accuracy when applied to other environments, further addressing the poor generality of gesture recognition methods.
More specifically, the application replaces the original GAN objective function with a loss based on the Wasserstein distance. The ADDA model may include a source domain feature extractor, a target domain feature extractor, a domain discriminator, a classifier, and a generator. The improvement removes the sigmoid from the last layer of the discriminator, and the loss functions of the generator and the discriminator no longer take the log. The discriminator loss is:

L_D = E_{x_t}[ D(M_t(x_t)) ] - E_{x_s}[ D(M_s(x_s)) ]

where E denotes the expectation, M_s the source domain feature mapping, M_t the target domain feature mapping, and x_s and x_t points (samples) from the source and target domains.
Here D is the discriminator, which must satisfy the following 1-Lipschitz constraint:

|D(x_1) - D(x_2)| ≤ |x_1 - x_2|

where x_1 and x_2 are any two points.
To satisfy the 1-Lipschitz constraint, a gradient penalty term is added to the discriminator loss:

L_gp = E_{x̂}[ ( ||∇_{x̂} D(x̂)||_2 - 1 )^2 ]

where x̂ is sampled by interpolating between source and target feature points.
finally, the optimization objective formula of the domain discriminator is as follows:
Figure BDA0003426193510000083
where λ is a constant between 0 and 1.
The final optimization objective of the feature extractor is:

min_{M_t}  -E_{x_t}[ D(M_t(x_t)) ]
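A minimal sketch of these Wasserstein-based objectives, assuming PyTorch, a critic D with no final sigmoid, and feature batches f_s and f_t already produced by the source and target extractors:

```python
import torch

def critic_loss(D, f_s, f_t):
    """WGAN critic loss: no sigmoid, no log (the formula for L_D above)."""
    return D(f_t).mean() - D(f_s).mean()

def gradient_penalty(D, f_s, f_t):
    """Gradient penalty enforcing the 1-Lipschitz constraint on D."""
    eps = torch.rand(f_s.size(0), 1, device=f_s.device)
    x_hat = (eps * f_s + (1 - eps) * f_t).requires_grad_(True)  # interpolate
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

def feature_extractor_loss(D, f_t):
    """Target extractor objective: make target features score like source ones."""
    return -D(f_t).mean()
```

The domain discriminator then minimizes critic_loss plus λ times gradient_penalty, and the target feature extractor minimizes feature_extractor_loss, matching the two optimization objectives above.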
further, the process of training the improved confrontation discriminant domain adaptive ADDA model with the first CSI gesture data may include:
the first CSI gesture data is divided into source domain data and target domain data. As an example, the division may be randomly performed at a preset scale.
Feature extraction and gesture classification are performed on the source domain data using a CNN-BiLSTM network model to obtain a gesture class (e.g., gesture label) and extracted features.
The improved ADDA model is trained using the gesture categories, the extracted features, and the target domain data. Specifically, the preprocessed source domain data (i.e., the gesture categories and extracted features) and target domain data are fed into the source domain feature extractor and the target domain feature extractor, respectively, and a domain discriminator determines which domain the features come from. In addition, during model training the parameters of the source domain feature extractor are fixed, and the weights of the target domain feature extractor are initialized from the parameters of the source domain feature extractor.
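A hedged sketch of this alternating procedure; src_extractor, tgt_extractor, critic, and loader are assumed names, and the loss helpers are the ones sketched after the formulas above:

```python
import copy
import torch

# The source extractor is pre-trained and frozen; the target extractor
# is initialized from the source extractor's parameters, as described above.
tgt_extractor = copy.deepcopy(src_extractor)
for p in src_extractor.parameters():
    p.requires_grad_(False)

opt_d = torch.optim.Adam(critic.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(tgt_extractor.parameters(), lr=1e-4)
lam = 0.5  # gradient-penalty weight, a constant between 0 and 1

for x_s, x_t in loader:                      # paired source / target CSI batches
    f_s, _ = src_extractor(x_s)
    f_t, _ = tgt_extractor(x_t)
    # 1) update the domain discriminator (critic)
    d_loss = (critic_loss(critic, f_s.detach(), f_t.detach())
              + lam * gradient_penalty(critic, f_s.detach(), f_t.detach()))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) update the target feature extractor to fool the critic
    g_loss = feature_extractor_loss(critic, tgt_extractor(x_t)[0])
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

At inference time, the trained target extractor is combined with the source-trained classifier to label target-domain gestures (step S13 below).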
S13: classifying the second CSI gesture data by using the trained improved ADDA model to obtain a corresponding gesture category. Specifically, the trained target domain feature extractor and classifier are used for classifying the second CSI gesture data to obtain a target domain gesture label.
Fig. 2 shows a flowchart of a method for obtaining CSI gesture data (the first CSI gesture data and/or the second CSI gesture data described above) according to an embodiment of the present application. The method comprises the following steps:
s111: raw CSI gesture data is collected. Specific embodiments thereof will be described below with reference to fig. 3 and 4, wherein fig. 3 shows a gesture capture device layout, and fig. 4 shows a method flowchart of one specific embodiment for collecting raw CSI gesture data.
S112: denoising the original CSI gesture data. A specific embodiment thereof will be described below with reference to fig. 5.
S113: extracting a gesture segment from the denoised raw CSI gesture data. A specific embodiment thereof will be described below with reference to fig. 6.
FIG. 3 illustrates a gesture capture device layout. In fig. 3, RX denotes a receiving end and TX denotes a transmitting end. The placement of RX and TX may be the same in different environments; only the surrounding environment differs. The two receivers RX may be spaced two meters apart, the transmitter may be 1.5 m from the midpoint between the two receivers, and the experimenter stands at point O, 0.75 m from the transmitter. To match real conditions as closely as possible, the experimenter, facing TX, makes gesture motions toward each of the five arrow directions in the figure, and the receiving end acquires the gesture data with the CSI-Tools tool. The five arrow directions are merely examples, and the present application is not limited in this respect.
Fig. 4 shows a flow diagram of a method for collecting raw CSI gesture data according to an embodiment of the application. The method comprises the following steps:
s1111: the receiving and transmitting ends are connected. Specifically, the process configures WIFI through terminal commands using a CSI-Tool software package.
S1112: and setting a mode. Specifically, a WIFI communication channel and a data sampling frequency are selected.
Existing WIFI communication channels generally include 2.4 GHz and 5 GHz. Since many devices use the 2.4 GHz band and its interference is high, the present application preferably operates on the less-interfered 5 GHz band.
In an example, the experimenter's gesture actions may be four commonly used gestures: "swing up", "swing down", "swing left", and "swing right"; gesture recognition is performed after these four gestures are collected. Each gesture action takes about 1.5 s, and collecting the whole action takes about 2 s. In this example, because the collected gesture actions are simple and complete in about 1.5 s, too low a sampling frequency would fail to capture precise gesture actions and could also cause packet drops, so the sampling frequency may be set to 1000 Hz.
Further, in this example, to improve the accuracy of gesture recognition, the present application may assume that the experimenter's initial state is with the hands perpendicular to the torso; the arm bends up and down perpendicular to the ground when making the "swing up" and "swing down" gestures, respectively, and bends left and right parallel to the ground when making the "swing left" and "swing right" gestures. Meanwhile, when the four actions are performed in the five different directions, the motion is kept at constant speed as far as possible, and a short pause is held before and after each action.
S1113: and (4) data transceiving. As a specific embodiment, the present application enables four people to perform four different gesture actions between a transmitting end (TX) and a receiving end (RX), respectively, and collects multipath effects generated by refraction, reflection, and other phenomena of signals on a propagation path.
S1114: and (4) storing data. Specifically, collected CSI gesture data may be analyzed by matlab to obtain a CSI data packet with a three-dimensional data dimension.
Fig. 5 shows a flow diagram of a method for denoising raw CSI gesture data according to an embodiment of the application. The method applies the Discrete Wavelet Transform (DWT) to denoise the gesture CSI data as preprocessing. The method comprises the following steps:
s1121: a wavelet function is selected. When discrete wavelet transform is performed, a proper wavelet function is selected first, and then multi-scale decomposition is performed on signals. The wavelet basis functions are not unique, and the denoising effects based on different wavelet bases are different, and need to be selected according to specific situations. As a preferred embodiment, the present application selects the Symlets wavelet basis function, which can obtain CSI with finer granularity.
S1122: and (5) performing wavelet transform multi-scale decomposition. Specifically, in this stage, the signal is divided into an approximation coefficient vector and a detail coefficient vector, an appropriate number of layers is selected for decomposition to obtain an approximation coefficient and a detail coefficient, and then the two coefficients are used to reconstruct the data. Assuming that the discrete signal is denoted as H (t), the decomposition can be expressed as:
H(t) = A_n + D_n + D_{n-1} + ... + D_1

where n denotes the number of decomposition layers, A the low-frequency approximation part, and D the high-frequency detail part. The coefficients of each decomposition layer are given by the following two equations:

a_n(k) = <x_n, φ_{n,k}>

d_n(k) = <x_n, ψ_{n,k}>

where a_n(k) represents the approximation coefficients, d_n(k) the detail coefficients, x_n the input of the n-th layer, <·,·> the dot product, and φ_{n,k} and ψ_{n,k} two sets of orthogonal wavelet basis functions, with k referring to a point in the band.
S1123: and performing threshold quantization processing on the wavelet coefficients on each scale. In particular, the present application may select a dynamic threshold to remove the noise component from the detail coefficient. And comparing the decomposed coefficients after the wavelet transformation with a set threshold, if the absolute value of the decomposed coefficients exceeds the threshold, not processing the decomposed coefficients, and otherwise, setting the value of the wavelet coefficients to be 0.
S1124: the inverse wavelet transform reconstructs the signal. The present application may employ inverse wavelet transform to reconstruct the signal. The inverse of the discrete wavelet transform is represented by:
Figure BDA0003426193510000125
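As a sketch of steps S1121 to S1124 under stated assumptions (PyWavelets as the DWT library, a sym8 Symlets basis, four decomposition levels, and a common median-based noise estimate for the dynamic threshold):

```python
import numpy as np
import pywt

def dwt_denoise(signal, wavelet="sym8", level=4):
    """Denoise one CSI subcarrier amplitude stream via the DWT."""
    # S1121/S1122: choose the wavelet and decompose at multiple scales
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    # S1123: threshold the detail coefficients on each scale; coefficients
    # whose absolute value is below the threshold are set to 0
    sigma = np.median(np.abs(details[-1])) / 0.6745      # noise level estimate
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    details = [pywt.threshold(d, thresh, mode="hard") for d in details]
    # S1124: reconstruct the signal with the inverse wavelet transform
    return pywt.waverec([approx] + details, wavelet)[: len(signal)]
```

The hard-thresholding mode matches the rule above: coefficients above the threshold are kept unchanged, and the rest are zeroed.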
Fig. 6 illustrates a flow diagram of a method for extracting gesture segments from the denoised raw CSI gesture data according to an embodiment of the application. This embodiment intercepts gesture segments from the gesture CSI data with a buffer-based gesture segment interception algorithm, which improves on sliding-window-based interception by adding a buffer to hold candidate gesture segments while the start and end stages of the gesture data are being decided. The method comprises the following steps:
s1131: the size of the sliding window is determined. In particular, the size of the sliding window may be determined in dependence on the frequency of the actual data samples.
The size of the sliding window affects the extraction of the gesture segment: if the window is too large, redundant data are included at both ends of the detected gesture segment; if it is too small, the flatter part in the middle of the gesture segment can be misdetected as the end frame. The buffer mechanism avoids the drawback of a too-small window. With the experimental sampling rate of 1000 Hz, the window size may be chosen as 100 data packets, i.e., the sliding window corresponds to a time length of 0.1 s.
S1132: the variance in each window is calculated. Specifically, the mean and variance in each window are calculated as follows:
mean(i)_n = (1/K) Σ_{k=1}^{K} S_n(i, k)

Var(i) = (1/N) Σ_{n=1}^{N} (1/K) Σ_{k=1}^{K} ( S_n(i, k) - mean(i)_n )^2

where S(i) represents the amplitude values of the i-th window, mean(i)_n the average amplitude of the n-th subcarrier in the i-th window, K the window size, N the number of subcarriers, and Var(i) the window variance.
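A sketch of these per-window statistics, assuming amp is a (packets × subcarriers) amplitude matrix sampled at 1000 Hz and K = 100 as chosen above:

```python
import numpy as np

def window_variances(amp, K=100):
    """Mean and variance per non-overlapping window of K packets (0.1 s at 1000 Hz)."""
    n_windows = amp.shape[0] // K
    variances = np.empty(n_windows)
    for i in range(n_windows):
        w = amp[i * K:(i + 1) * K]                   # window i, all subcarriers
        mean_i = w.mean(axis=0)                      # mean(i)_n per subcarrier
        variances[i] = ((w - mean_i) ** 2).mean()    # Var(i), averaged over subcarriers
    return variances
```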
S1133: and extracting a gesture fragment from the de-noised raw CSI gesture data according to the variance and the size of the buffer area. In particular, window numbers corresponding to the start and end positions of the gesture segment may be determined by a variance threshold, a size of the buffer, and a corresponding threshold. The specific determination manner will be described below with reference to fig. 7, where fig. 7 shows a flowchart for determining the start of a gesture segment according to an embodiment of the present application.
As shown in fig. 7, the parameters are initialized: the variance threshold thresh and the sizes of the two buffers with their corresponding thresholds (buf1, buf2, θ1, θ2) are set, after which the window numbers corresponding to the start and end positions of the gesture segment are determined.
The windows are traversed in order; the segment corresponding to a window is denoted S(i) and its average variance Var(i). When Var(i) first exceeds the threshold thresh, the window is stored in buffer 1; if the count in buffer 1 exceeds the threshold θ1, the window number corresponding to the gesture start is the current window number minus the counts in buffer 1 and buffer 2. Otherwise, the window is stored in buffer 2; if the count in buffer 2 exceeds the threshold θ2, both buffers are emptied and the previous steps are repeated on the next window. The window number t1 corresponding to the gesture start is determined through these operations.
After the gesture start position is determined, the windows are traversed from there. If a window's average variance Var(i) exceeds the threshold thresh, the buffer is first checked: if it is not empty, the window values in the buffer are first appended to the gesture segment and the buffer is then emptied; otherwise the window sequence is stored in the buffer, and if the buffer is full, the decision ends. These steps are repeated until all windows have been traversed or the algorithm terminates. The window numbers corresponding to the start and end of the gesture segment are combined to obtain the complete gesture segment.
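A simplified sketch of the start-detection logic of fig. 7; the buffer bookkeeping approximates the description above, and the theta1 and theta2 values are placeholders:

```python
def find_gesture_start(variances, thresh, theta1=3, theta2=5):
    """Return the window number where the gesture starts, or None."""
    buf1 = buf2 = 0           # counters for windows above / below the threshold
    for i, var in enumerate(variances):
        if var > thresh:
            buf1 += 1
            if buf1 > theta1:
                # start = current window number minus the buffered window counts
                return i - buf1 - buf2 + 1
        else:
            buf2 += 1
            if buf2 > theta2:
                buf1 = buf2 = 0   # too many quiet windows: empty both buffers
    return None
```

End detection proceeds symmetrically from the detected start, and the start and end window numbers together delimit the complete gesture segment.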
Accordingly, referring to fig. 8, an embodiment of the present application provides a gesture recognition apparatus, which includes: a data obtaining unit 801, configured to obtain first CSI gesture data from a first environment; and a model training unit 802, configured to train an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as a feature extractor, the CNN-BiLSTM network model combining CNN and BiLSTM based on an attention mechanism, and the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, as shown in fig. 9, where the electronic device may include a processor and a memory, where the processor and the memory may be connected by a bus or in another manner, and fig. 9 takes the connection by the bus as an example.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the gesture recognition method in the embodiments of the present invention. The processor executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory, that is, the gesture recognition method in the above method embodiment is implemented.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the processor, perform the gesture recognition method as described above.
The details of the electronic device are the same as those of the corresponding embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A gesture recognition method, comprising:
obtaining first CSI gesture data from a first environment;
training an improved adversarial discriminative domain adaptation ADDA model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as a feature extractor, the CNN-BiLSTM network model being a network model combining CNN and BiLSTM based on an attention mechanism;
the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
2. The gesture recognition method of claim 1, wherein the method further comprises: obtaining the second CSI gesture data;
classifying the second CSI gesture data using the trained improved ADDA model to obtain a corresponding gesture class.
3. The gesture recognition method of claim 1 or 2, wherein the ADDA model is further improved by using a Wasserstein-distance-based loss function as the adversarial loss function.
4. The gesture recognition method of claim 1, wherein the CNN-BiLSTM network model is trained with third CSI gesture data from the first environment or the second environment.
5. The gesture recognition method of claim 1, wherein the training of the improved adversarial discriminative domain adaptation ADDA model using the first CSI gesture data comprises:
dividing the first CSI gesture data into source domain data and target domain data;
performing feature extraction and gesture classification on the source domain data using the CNN-BiLSTM network model to obtain gesture categories and extracted features; and
training the improved ADDA model using the gesture classes and extracted features and the target domain data.
6. The gesture recognition method of claim 1, wherein the first CSI gesture data and/or the second CSI gesture data comprise CSI gesture data associated with gesture actions performed in a plurality of different directions.
7. The gesture recognition method according to claim 2, wherein the obtaining of the first CSI gesture data from the first environment and/or the obtaining of the second CSI gesture data comprises:
collecting original CSI gesture data, wherein the original CSI gesture data are CSI data packets with three-dimensional data dimensions;
denoising the original CSI gesture data; and
extracting a gesture segment from the denoised raw CSI gesture data.
8. The gesture recognition method of claim 7, wherein the extracting gesture segments from the denoised raw CSI gesture data comprises:
determining the size of a sliding window;
calculating a variance of the de-noised raw CSI gesture data in each window; and
extracting a gesture segment according to the variance and the size of the buffer.
9. An electronic device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the gesture recognition method of any of claims 1-8.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the gesture recognition method of any one of claims 1-8.
CN202111578495.2A 2021-12-22 2021-12-22 Gesture recognition method, device and storage medium Active CN114499712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111578495.2A CN114499712B (en) 2021-12-22 2021-12-22 Gesture recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111578495.2A CN114499712B (en) 2021-12-22 2021-12-22 Gesture recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN114499712A (en) 2022-05-13
CN114499712B (en) 2024-01-05

Family

ID=81493510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111578495.2A Active CN114499712B (en) 2021-12-22 2021-12-22 Gesture recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114499712B (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190087654A1 (en) * 2017-09-15 2019-03-21 Huazhong University Of Science And Technology Method and system for csi-based fine-grained gesture recognition
CN108805194A (en) * 2018-06-04 2018-11-13 上海交通大学 A kind of hand-written recognition method and system based on WIFI channel state informations
WO2020215915A1 (en) * 2019-04-24 2020-10-29 腾讯科技(深圳)有限公司 Identity verification method and apparatus, computer device and storage medium
CN110287863A (en) * 2019-06-24 2019-09-27 桂林电子科技大学 A kind of gesture identification method based on WiFi signal
KR20210030063A (en) * 2019-09-09 2021-03-17 서강대학교산학협력단 System and method for constructing a generative adversarial network model for image classification based on semi-supervised learning
CN111209861A (en) * 2020-01-06 2020-05-29 浙江工业大学 Dynamic gesture action recognition method based on deep learning
WO2021143353A1 (en) * 2020-01-13 2021-07-22 腾讯科技(深圳)有限公司 Gesture information processing method and apparatus, electronic device, and storage medium
CN111651980A (en) * 2020-05-27 2020-09-11 河南科技学院 Wheat cold resistance identification method with hybrid neural network fused with Attention mechanism
CN111898634A (en) * 2020-06-22 2020-11-06 西安交通大学 Intelligent fault diagnosis method based on depth-to-reactance-domain self-adaption
CN112215868A (en) * 2020-09-10 2021-01-12 湖北医药学院 Method for removing gesture image background based on generation countermeasure network
CN112733609A (en) * 2020-12-14 2021-04-30 中山大学 Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform
US11194972B1 (en) * 2021-02-19 2021-12-07 Institute Of Automation, Chinese Academy Of Sciences Semantic sentiment analysis method fusing in-depth features and time sequence models
CN112990026A (en) * 2021-03-19 2021-06-18 西北大学 Wireless signal perception model construction and perception method and system based on countermeasure training
CN113449587A (en) * 2021-04-30 2021-09-28 北京邮电大学 Human behavior recognition and identity authentication method and device and electronic equipment
CN113609976A (en) * 2021-08-04 2021-11-05 燕山大学 Direction-sensitive multi-gesture recognition system and method based on WiFi (Wireless Fidelity) equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAN ZOU等: "Joint Adversarial Domain Adaptation for Resilient WiFi-Enabled Device-Free Gesture Recognition", 《2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA)》 *
张朝柱 et al., "Gesture Action Recognition Based on Deep Convolutional Neural Networks", Radio Engineering (《无线电工程》), vol. 49, no. 7
明星霞, "Research on Gait Recognition Algorithms Based on Wi-Fi CSI", China Master's Theses Full-text Database (Information Science and Technology)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117311513A (en) * 2023-10-26 2023-12-29 昆明理工大学 Low sampling rate myoelectric gesture recognition method combining convolutional neural network with subdomain adaptation
CN117311513B (en) * 2023-10-26 2024-03-08 昆明理工大学 Low sampling rate myoelectric gesture recognition method combining convolutional neural network with subdomain adaptation

Also Published As

Publication number Publication date
CN114499712B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
US10061389B2 (en) Gesture recognition system and gesture recognition method
CN111103976B (en) Gesture recognition method and device and electronic equipment
Mekala et al. Real-time sign language recognition based on neural network architecture
US20230334632A1 (en) Image recognition method and device, and computer-readable storage medium
AU2018363299A1 (en) Time invariant classification
CN110399846A (en) A Gesture Recognition Method Based on Correlation of Multi-channel EMG Signals
CN109670548A (en) HAR algorithm is inputted based on the more sizes for improving LSTM-CNN
CN107688790B (en) Human behavior recognition method and device, storage medium and electronic equipment
CN111898526B (en) Myoelectric gesture recognition method based on multi-stream convolution neural network
CN113051972A (en) Gesture recognition system based on WiFi
CN113609976A (en) Direction-sensitive multi-gesture recognition system and method based on WiFi (Wireless Fidelity) equipment
CN104361345A (en) Electroencephalogram signal classification method based on constrained extreme learning machine
CN114781463A (en) Cross-scene robust indoor tumble wireless detection method and related equipment
CN115813408A (en) Self-supervision learning method of transform encoder for electroencephalogram signal classification task
KR20210095671A (en) Image processing method and related device
CN113466852A (en) Millimeter wave radar dynamic gesture recognition method applied to random interference scene
CN112784892A (en) Electroencephalogram movement intention identification method and system
CN114499712B (en) Gesture recognition method, device and storage medium
CN113143295A (en) Equipment control method and terminal based on motor imagery electroencephalogram signals
CN114384999B (en) User-independent myoelectric gesture recognition system based on self-adaptive learning
CN113505711A (en) Real-time gesture recognition system based on electromyographic signals and Leap Motion
Zhang et al. WiNum: A WiFi finger gesture recognition system based on CSI
CN113936336A (en) Motor car driver fatigue driving detection method based on machine vision
Jia et al. Deep learning with convolutional neural networks for sleep arousal detection
CN115438691B (en) Small sample gesture recognition method based on wireless signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant