CN107798349A

CN107798349A - A transfer learning method based on a deep sparse autoencoder

Info

Publication number: CN107798349A
Application number: CN201711069171.XA
Authority: CN
Inventors: 胡学钢; 张玉红; 朱毅; 李培培; 周鹏
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2017-11-03
Filing date: 2017-11-03
Publication date: 2018-03-13
Anticipated expiration: 2037-11-03
Also published as: CN107798349B

Abstract

The invention discloses a transfer learning method based on a deep sparse autoencoder, comprising: (1) preprocessing, in which the data set is vectorized; (2) model design and implementation; (3) semi-supervised learning on the features extracted by the Stacked RICA algorithm; (4) after feature extraction is complete, training a classifier on the training set with a logistic regression (LR) model; (5) using the classifier trained on the training set to perform classification prediction on the test set; and (6) completing the classification of the test set to obtain the final transfer learning result. The invention improves feature extraction and the precision of transfer learning, and offers greater robustness and practicality.

Description

Transfer learning method based on a deep sparse autoencoder

Technical Field

The invention relates to the field of feature extraction and transfer learning methods, and in particular to a transfer learning method based on a deep sparse autoencoder.

Background

Traditional machine learning has achieved significant success in many areas. However, many machine learning algorithms rest on the assumption that the training set and the test set are independent and identically distributed. When the distribution of the data changes, most such algorithms must be retrained, which requires collecting a large amount of training data again. In real-world applications the environment changes constantly, and re-collecting data and retraining the model for every new scenario the learning system encounters is costly and impractical. It is desirable for the learning system to adapt automatically to changes in the environment with little new training data and little retraining time. Under these conditions, knowledge obtained in an earlier scenario that can be applied to the new scenario speeds up learning and reduces the cost of collecting new training data; this is the goal of transfer learning. Transfer learning emphasizes transferring knowledge across domains, tasks, and distributions that are similar but not identical. For example, learning to recognize apples may help in learning to recognize pears, and learning to play the electronic organ may help in learning the piano. Research on transfer learning is motivated by the observation that people routinely apply existing knowledge to solve new problems more quickly.

In recent years, deep learning has been used to extract features from images, text, audio, and other data, with considerable progress and good results. From the standpoint of human perception, the information processing of the human visual system is hierarchical: edge features are extracted in the low-level V1 region, shapes or parts of objects in the V2 region, and whole objects and their behavior at higher levels. That is, higher-level features are combinations of lower-level features, and the feature representation becomes increasingly abstract from the lower levels to the higher levels, expressing semantics or intent ever more directly. The higher the level of abstraction, the fewer the possible interpretations, and the easier classification becomes. Deep learning was proposed to mimic this process. The essence of deep learning is therefore to learn more useful features by building machine learning models with many hidden layers and training them on data, ultimately improving the accuracy of classification or prediction. Deep learning differs from traditional shallow learning in that it: 1) emphasizes the depth of the model structure; and 2) transforms the feature representation of a sample from the original space into a new feature space through layer-by-layer feature transformation, which makes classification or prediction easier.

A sparse autoencoder is a method for extracting data features. Its advantage is that it can extract a set of linearly independent over-complete bases with which to reconstruct the samples. A general model for extracting feature basis vectors can only guarantee that the basis vectors are linearly uncorrelated, which is insufficient in some applications. For example, given a recording that contains the voices of several people, which are mutually independent, we may want to separate each person's voice; such a model fails in this case. The invention therefore uses the Reconstruction Independent Component Analysis (RICA) algorithm, whose goal is to learn a set of mutually independent over-complete bases.

The deep sparse autoencoder follows the idea of deep learning: the sparse autoencoder is used as one layer of the model and stacked, i.e. the output of the sparse autoencoder at one layer is used as the input of the next layer, forming a multi-layer deep learning structure that extracts more useful features. Semi-supervised learning is then performed on the extracted features, improving the precision and accuracy of transfer learning.

In research on feature extraction and transfer learning, existing methods are all built on autoencoder models, and very little work uses sparse coding models. Sparse coding is an effective means of dimensionality reduction for images, text, and similar data, but applying it to domain adaptation raises several common problems: (a) the feature basis vectors are not linearly independent; (b) how to use the labels of the source domain; and (c) the bias term of the objective function after stacking. If these problems are not solved well, the accuracy of feature extraction and transfer learning inevitably suffers. The invention provides a solution to these problems.

Disclosure of Invention

The invention aims to provide a transfer learning method based on a deep sparse autoencoder, in order to solve the problems of the prior art in feature extraction and transfer learning.

To achieve this aim, the invention adopts the following technical scheme:

A transfer learning method based on a deep sparse autoencoder, characterized in that the method comprises the following steps in sequence:

(1) performing whitening preprocessing on all images in the image database, as follows:

(1.1) representing the input data set as $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$, calculating the covariance matrix $\Sigma$ of $x$, then calculating the eigenvectors of the covariance matrix and arranging them as the columns of a matrix $U$, as shown in the following formula:

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$$

in the matrix $U$, $u_1$ is the principal eigenvector, which corresponds to the largest eigenvalue, $u_2$ is the second eigenvector, and so on; $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues corresponding to the eigenvectors in $U$;

(1.2) representing the input data in the basis given by the calculated matrix $U$, as shown in the following formula:

$$x_{rot} = U^T x$$

where the subscript rot stands for rotation, meaning that $x_{rot}$ is the result of rotating the original data; in order to give each input feature unit variance, $1/\sqrt{\lambda_i}$ is used as the scaling factor for each feature $x_{rot,i}$, and the resulting PCA-whitened data is given by the following formula:

$$x_{PCAwhite,i} = \frac{x_{rot,i}}{\sqrt{\lambda_i}};$$

(1.3) letting $R$ be any orthogonal matrix, i.e. one satisfying $RR^T = R^TR = I$, $Rx_{PCAwhite}$ still has unit covariance; so that, among all possible $R$, the whitened data is as close as possible to the original input data, $R = U$ is chosen, giving formula (1):

$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1),$$

where $x_{ZCAwhite}$ is the data obtained from the original input data after ZCA whitening;

(2) constructing a deep sparse autoencoder model to extract high-level abstract features of the images, as follows:

(2.1) constructing a sparse autoencoder model, comprising the following steps:

(2.1.1) the sparse autoencoder model uses the Reconstruction Independent Component Analysis (RICA) algorithm; $x_{ZCAwhite}$ obtained from formula (1) is the input data of the RICA algorithm and is substituted into the cost function, formula (2):

$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2 \quad (2),$$

in the cost function formula (2), $x$ is the input data, i.e. $x_{ZCAwhite}$, and $W$ is the weight matrix;

(2.1.2) taking the partial derivative of the cost function formula (2) with respect to $W$, where the first term of formula (2) is replaced by the smooth approximation $\sqrt{(Wx)^2 + \varepsilon}$ before differentiation; the resulting gradient is shown in formula (3):

$$\nabla_W F = \lambda \left( \frac{Wx}{\sqrt{(Wx)^2 + \varepsilon}} \right) x^T + W\,(W^TWx - x)\,x^T + (Wx)\,(W^TWx - x)^T \quad (3),$$

(2.1.3) iteratively calculating the weight matrix $W$ with the L-BFGS algorithm to obtain the trained sparse autoencoder model.

(2.2) constructing a deep sparse autoencoder model:

substituting the weight matrix $W$ obtained in step (2.1) into the cost function formula (2) and recording the resulting output, which is the output data obtained once training of the single-layer RICA model is finished; taking this output as input data and repeating step (2.1) to obtain $W^{(i)}$, the weight matrix obtained after training the stacked sparse autoencoder, where $i$ is the number of times step (2.1) has been iterated;

(2.3) extracting features with the trained deep sparse autoencoder model:

applying square-root pooling to the model and substituting the weight matrix $W^{(i)}$ obtained in step (2.1) into formula (4) for convolutional feature extraction, where formula (4) is as follows:

$$\nabla_W J(W; x) = \sum_{i=1}^{m} \left(a_i^{(l)}\right) * \mathrm{rot90}\!\left(\delta_k^{(l+1)}, 2\right) \quad (4),$$

in formula (4), $a_i^{(l)}$ represents the input of the $l$-th layer in the convolutional network and $\delta_k^{(l+1)}$ represents the error term of layer $l+1$ for the $k$-th feature in the convolutional network; the output of formula (4) is denoted $x_{fea}$, the abstract features extracted from the raw input data;

(3) optimizing the features by semi-supervised learning:

performing semi-supervised learning with $x_{fea}$ obtained in step (2) as input to obtain formula (5), which adds the KL distance between the source-domain and target-domain distributions and a multi-class regression bias term based on the source-domain class labels; $\hat{x}$ denotes the output obtained after semi-supervised learning, $W_{SSL}$ denotes the weight matrix in semi-supervised learning, $\xi^{(s)}$ denotes the output of the hidden layer in the source domain, and $\xi^{(t)}$ denotes the output of the hidden layer in the target domain; formula (5) is as follows:

$$J = J_1(x_{fea}, \hat{x}) + \alpha \cdot J_2(\xi^{(s)}, \xi^{(t)}) + \beta \cdot J_3(W_{SSL}, \xi^{(s)}) + \gamma \cdot J_4(W_{SSL}) \quad (5),$$

in formula (5), $J_1(x_{fea}, \hat{x})$ represents the reconstruction error between the original data and the data re-represented after feature extraction;

$$J_2(\xi^{(s)}, \xi^{(t)}) = D_{KL}(P_s \,\|\, P_t) + D_{KL}(P_t \,\|\, P_s) = \sum_{i=1}^{n} P_s(i) \ln\!\left(\frac{P_s(i)}{P_t(i)}\right) + \sum_{i=1}^{n} P_t(i) \ln\!\left(\frac{P_t(i)}{P_s(i)}\right)$$

represents the KL distance between the source-domain and target-domain distributions;

$J_3(W_{SSL}, \xi^{(s)})$ represents the multi-class regression bias term based on the source-domain class labels;

$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix $W_{SSL}$;

(4) training a classifier and classifying the test image data set, as follows:

(4.1) training an LR classifier with the training image data set; in LR classification, let:

$$P(y=1 \mid x) = h_\theta(x) = \frac{1}{1+\exp(-\theta^T x)} = \sigma(\theta^T x), \qquad P(y=0 \mid x) = 1 - P(y=1 \mid x) = 1 - h_\theta(x) \quad (6),$$

in formula (6), $\sigma$ is the sigmoid function; the output $\hat{x}$ of step (3) for the training image data set and the labels $y$ of the training image data set, which are known, are substituted into formula (6) to train the classifier;

(4.2) classifying the test image data set with the trained classifier: the output $\hat{x}$ of step (3) for the test data set is substituted into the LR classifier trained with formula (6), and the classification result $T_{test}$ of the test image data set is obtained as shown in formula (7):

$$T_{test} = \arg\max P(x) \quad (7).$$
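The classifier of formulas (6) and (7) can be sketched as follows. This is a minimal illustration, not the patented implementation: the fitting procedure (batch gradient descent on the logistic log-loss), the learning rate, the number of steps, the added bias column, and the toy "source" and "target" data are all assumptions introduced here, and `train_lr` and `predict_lr` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function sigma(z) from formula (6)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(x, y, lr=0.5, steps=2000):
    """Fit theta by batch gradient descent (assumed optimizer).

    x: (m, n) feature rows, y: (m,) labels in {0, 1}.
    A constant bias column is appended; this is an assumption.
    """
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    theta = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = sigmoid(xb @ theta)                  # P(y=1|x), formula (6)
        theta -= lr * xb.T @ (p - y) / len(y)    # gradient of the mean log-loss
    return theta

def predict_lr(theta, x):
    """Pick the class with the larger probability, as in formula (7)."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    p1 = sigmoid(xb @ theta)
    probs = np.column_stack([1.0 - p1, p1])      # [P(y=0|x), P(y=1|x)]
    return probs.argmax(axis=1)

# Toy data standing in for extracted features: two well-separated classes.
rng = np.random.default_rng(0)
x_src = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y_src = np.concatenate([np.zeros(50), np.ones(50)])
x_tgt = x_src + 0.3                              # slightly shifted "target" data

theta = train_lr(x_src, y_src)
src_acc = (predict_lr(theta, x_src) == y_src).mean()
tgt_acc = (predict_lr(theta, x_tgt) == y_src).mean()
```

The classifier is fitted only on labeled source rows and then applied unchanged to the shifted target rows, which mirrors steps (4.1) and (4.2).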

The invention provides a transfer learning method based on a deep sparse autoencoder. From the perspective of deep learning, the method applies a sparse autoencoder model that uses the RICA algorithm to feature extraction from the data set; following the multi-layer stacking idea of deep learning, a deep sparse autoencoder is built with the Stacked RICA algorithm and trained to extract linearly independent over-complete feature basis vectors. On the basis of these feature basis vectors, a semi-supervised learning method adds the source-domain class labels and a multi-class regression bias term, further optimizing the extracted features. Finally, a classifier is trained on the extracted features using a support vector machine model to perform classification prediction in the target domain, achieving the goal of transfer learning. The method extracts more useful features from the data set, improves classification precision in the target domain, and markedly improves the accuracy and precision of transfer learning.

The invention addresses an important practical problem in feature extraction and transfer learning. Its results can be applied directly to image classification, text classification, sentiment transfer, and similar applications, and can be extended to fields such as audio, web pages, and video. The invention therefore has significant application value and, once put into use, can generate substantial social and economic benefits.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention derives the feature representation of the data from an independent component analysis model, which improves the robustness of the resulting features compared with traditional feature extraction algorithms (sparse coding or autoencoding).

2. Using the hierarchical structure of deep learning and an analysis of the data in the data set, the invention proposes the Stacked RICA algorithm, which incorporates the source-domain labels and a multi-class regression objective into the multi-layer structure and applies the source-domain label information to the optimization of the objective function. This extracts more useful features from the data set, improves classification precision in the target domain, and improves the accuracy of transfer learning.

3. The invention can be applied in many fields, such as images, text, audio, and video, and has significant application value. Moreover, results based on Stacked RICA can also be applied to many pattern classification tasks related to transfer learning, such as image recognition, sentiment classification, topic classification, speech recognition, and robotic systems.

Drawings

FIG. 1 is a flowchart of the feature extraction and transfer learning scheme of the invention.

FIG. 2 is a schematic diagram of the hierarchy of the RICA model.

FIG. 3 is a schematic diagram of the Stacked RICA model of the invention.

Detailed Description

FIG. 1 is a flowchart of the method of the invention. The steps shown in FIG. 1 are implemented as follows:

(1) To learn better features, the training data set and the test data set are concatenated and vectorized to obtain a vectorized data set.
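The concatenation and vectorization of step (1) can be sketched as follows. The example assumes that each example is a small 2-D array that is flattened into one row; the function name `vectorize_and_concat` and the random toy arrays are hypothetical, and the source does not specify the vectorization scheme.

```python
import numpy as np

def vectorize_and_concat(train_examples, test_examples):
    """Flatten every example to a row vector and stack training rows above test rows.

    Returns the stacked matrix and the number of training rows, so the two
    parts can be separated again after feature extraction.
    """
    train = np.asarray(train_examples, dtype=float)
    test = np.asarray(test_examples, dtype=float)
    train_vec = train.reshape(train.shape[0], -1)   # (n_train, n_features)
    test_vec = test.reshape(test.shape[0], -1)      # (n_test, n_features)
    return np.vstack([train_vec, test_vec]), train_vec.shape[0]

# Toy data: 5 training and 3 test examples, each a 4x4 array.
rng = np.random.default_rng(0)
train_examples = rng.random((5, 4, 4))
test_examples = rng.random((3, 4, 4))
data, n_train = vectorize_and_concat(train_examples, test_examples)
```

Keeping `n_train` alongside the stacked matrix lets later steps recover the training rows as `data[:n_train]` and the test rows as `data[n_train:]`.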

(2) For the vectorized text data set, a Stacked Reconstruction Independent Component Analysis (Stacked RICA) model is used for feature extraction, as follows:

1) Whitening the data with the ZCA method:

ZCA whitening is a data preprocessing method that maps data from $x$ to $x_{ZCAwhite}$. It has also been shown to be a rough model of how the biological eye (the retina) processes images. When the eye perceives an image, most adjacent "pixels" are perceived as similar values, because adjacent parts of an image are highly correlated in brightness. It would therefore be very wasteful for the eye to transmit each pixel value separately (via the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to ZCA, obtaining a less redundant representation of the input image, and transmits that representation to the brain. In feature extraction, likewise, the input is redundant for training purposes because of the strong correlation between adjacent instances or expressions in the data set. The purpose of whitening is to reduce this redundancy; after whitening, the input to the learning algorithm has the following properties: (i) the correlation between features is low; (ii) all features have the same variance. The result of ZCA whitening can be expressed as:

$$x_{ZCAwhite} = U x_{PCAwhite}$$
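The ZCA whitening described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: examples are stored as columns, the data is centered first, and a small regularizer `eps` is added to the eigenvalues for numerical stability; neither the centering nor `eps` is stated in the source.

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten x of shape (n_features, n_samples); columns are examples."""
    x = x - x.mean(axis=1, keepdims=True)         # center each feature (assumption)
    sigma = x @ x.T / x.shape[1]                  # covariance matrix
    eigvals, u = np.linalg.eigh(sigma)            # columns of u are eigenvectors
    x_rot = u.T @ x                               # rotate into the eigenbasis
    x_pca_white = x_rot / np.sqrt(eigvals + eps)[:, None]   # unit variance per feature
    return u @ x_pca_white                        # rotate back: x_ZCAwhite = U x_PCAwhite

# Toy data: three correlated features, 500 examples.
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])
x = mix @ rng.normal(size=(3, 500))
x_white = zca_whiten(x)
cov_white = x_white @ x_white.T / x_white.shape[1]
```

After whitening, `cov_white` is close to the identity matrix, which corresponds to properties (i) and (ii): features are decorrelated and share the same variance.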

2) Feature extraction based on Stacked RICA

The specific steps are as follows:

① Feature extraction with single-layer RICA

A Reconstruction Independent Component Analysis (RICA) algorithm is designed to extract features following the structure in FIG. 2. Given an input $x$, the invention seeks a linearly independent set of bases (denoted $W$), for which the objective function can be expressed as:

$$J(W) = \|Wx\|_1$$

where $Wx$ is the feature representation of the input $x$. In RICA, to ensure that mutually linearly independent over-complete bases are obtained, the invention solves the following objective function:

$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2$$

where $\lambda$ is the weight decay coefficient, $W$ is the weight matrix, and $x$ is the input data. The objective function is solved in two steps.

The first step is to obtain the gradient by differentiation. The part that needs care is the reconstruction term $F = \frac{1}{2}\|W^TWx - x\|_2^2$, i.e. the task is to find $\nabla_W F$.

The weights and activation functions of the model are those shown in FIG. 2.

Let $J(z^{(4)}) = F(x)$; then $J(z^{(4)}) = \sum_k J(z_k^{(4)})$.

With $F$ expressed as this network, the problem becomes that of computing $\nabla_W F$. Although $W$ appears twice in the model, it can be shown that when $W$ appears multiple times in a neural network, the partial derivative with respect to $W$ is the sum of the partial derivatives with respect to each instance of $W$ in the network, as follows:

$$\nabla_W F\big|_{\text{total}} = \nabla_W F\big|_{W \text{ instance}} + \left(\nabla_{W^T} F\big|_{W^T \text{ instance}}\right)^T$$

Accordingly, the invention first takes the partial derivative with respect to each instance of $W$.

With respect to $W^T$:

$$\nabla_{W^T} F = (W^TWx - x)(Wx)^T$$

With respect to $W$:

$$\nabla_W F = W(W^TWx - x)x^T$$

The final partial derivative with respect to $W$ is:

$$\nabla_W F = W(W^TWx - x)x^T + (Wx)(W^TWx - x)^T$$

The second step is to iterate with the L-BFGS method on the following cost function:

$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2$$

The $W$ obtained after many iterations is a set of linearly independent over-complete bases for the original input $x$. From this set of bases, a more useful feature representation $Wx$ of the original input data $x$ is obtained.
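The single-layer RICA cost and its gradient can be checked numerically with a short sketch. It assumes that the L1 term is smoothed as $\sqrt{(Wx)^2 + \varepsilon}$, that examples are stored as columns and summed without averaging, and that `lam` and `eps` take arbitrary illustrative values; `rica_cost_grad` and `numeric_grad` are hypothetical helper names.

```python
import numpy as np

def rica_cost_grad(w, x, lam=0.1, eps=1e-2):
    """Smoothed RICA cost and its gradient with respect to w.

    w: (k, n) weight matrix, x: (n, m) data with examples as columns.
    """
    wx = w @ x                                   # features Wx, shape (k, m)
    r = w.T @ wx - x                             # reconstruction residual W^T W x - x
    smooth = np.sqrt(wx ** 2 + eps)              # smooth stand-in for |Wx|
    cost = lam * smooth.sum() + 0.5 * (r ** 2).sum()
    # Sparsity term, plus the two contributions from the two instances of W.
    grad = lam * (wx / smooth) @ x.T + w @ r @ x.T + wx @ r.T
    return cost, grad

def numeric_grad(w, x, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += h
        w_minus[idx] -= h
        g[idx] = (rica_cost_grad(w_plus, x)[0] - rica_cost_grad(w_minus, x)[0]) / (2 * h)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))
w = rng.normal(size=(5, 3))                      # 5 bases for 3 inputs: over-complete
cost, grad = rica_cost_grad(w, x)
max_gap = np.abs(grad - numeric_grad(w, x)).max()
```

A small `max_gap` indicates that summing the contributions of the two instances of $W$ reproduces the true gradient. A function that returns `(cost, grad)` in this way could be handed to an L-BFGS routine, for example `scipy.optimize.minimize` with `jac=True` and `method="L-BFGS-B"` after flattening `w`; that choice of library is an assumption, not something the source specifies.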

② Feature representation with stacked RICA (Stacked RICA)

FIG. 3 is a diagram of the Stacked RICA model of the invention, which consists of an input layer, two hidden layers, and an output layer. Following the idea of deep learning, the Stacked RICA model stacks RICA structures: the stronger feature representation $z$ obtained when a single-layer RICA finishes is used as the input of the next RICA layer, and the parameters of each layer are then optimized iteratively against the objective function. The final feature representation of the original input data is obtained through this multi-layer stacking.
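The layer-wise stacking can be sketched as a loop in which each layer is trained on the output of the previous one. The sketch below shows only the data flow: `placeholder_trainer` is a hypothetical stand-in that returns a random matrix and does not perform RICA training, and the layer sizes are arbitrary.

```python
import numpy as np

def stack_layers(x, layer_sizes, train_layer):
    """Greedy layer-wise stacking: the features of one layer feed the next.

    x: (n, m) data with examples as columns.
    train_layer(z, k) must return a (k, z.shape[0]) weight matrix.
    """
    weights, z = [], x
    for k in layer_sizes:
        w = train_layer(z, k)        # learn W^(i) on the current representation
        weights.append(w)
        z = w @ z                    # this layer's features become the next input
    return weights, z

def placeholder_trainer(z, k):
    """Stand-in for single-layer training (NOT RICA): a scaled random matrix."""
    rng = np.random.default_rng(k)
    return rng.normal(size=(k, z.shape[0])) / np.sqrt(z.shape[0])

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))
weights, z = stack_layers(x, [12, 6], placeholder_trainer)
```

In a real pipeline `train_layer` would run the single-layer RICA optimization on `z`; passing it in as an argument keeps the stacking logic independent of how each layer is trained.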

(3) After feature extraction by Stacked RICA, the resulting feature representation is used in place of the original input data $x$. Semi-supervised learning is performed on it with additional bias terms: the KL distance between the source-domain and target-domain distributions, and a multi-class regression bias term based on the source-domain class labels, so that the label information of the source domain contributes to optimizing the feature representation. Optimizing the objective function yields the feature representations of the source domain and the target domain that are used for classification.

The objective function can be expressed as:

$$J = J_1(x_{fea}, \hat{x}) + \alpha \cdot J_2(\xi^{(s)}, \xi^{(t)}) + \beta \cdot J_3(W_{SSL}, \xi^{(s)}) + \gamma \cdot J_4(W_{SSL})$$

where $J_1(x_{fea}, \hat{x})$ represents the reconstruction error between the original data and the data re-represented after feature extraction.

$J_2(\xi^{(s)}, \xi^{(t)})$ represents the KL distance between the source-domain and target-domain distributions.

$J_3(W_{SSL}, \xi^{(s)})$ represents the multi-class regression bias term based on the source-domain class labels.

$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix $W_{SSL}$.
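The KL-distance term between the source-domain and target-domain distributions can be illustrated with a symmetric KL divergence over two discrete distributions. The sketch assumes that $P_s$ and $P_t$ are already available as non-negative vectors; how they are derived from the hidden-layer outputs is not covered here, and the clipping constant `eps` and the name `symmetric_kl` are assumptions.

```python
import numpy as np

def symmetric_kl(p_s, p_t, eps=1e-12):
    """D_KL(P_s || P_t) + D_KL(P_t || P_s) for two discrete distributions."""
    p_s = np.clip(np.asarray(p_s, dtype=float), eps, None)   # avoid log(0)
    p_t = np.clip(np.asarray(p_t, dtype=float), eps, None)
    p_s = p_s / p_s.sum()                                    # renormalize to sum to 1
    p_t = p_t / p_t.sum()
    kl_st = np.sum(p_s * np.log(p_s / p_t))
    kl_ts = np.sum(p_t * np.log(p_t / p_s))
    return kl_st + kl_ts

same = symmetric_kl([0.5, 0.5], [0.5, 0.5])
forward = symmetric_kl([0.9, 0.1], [0.5, 0.5])
backward = symmetric_kl([0.5, 0.5], [0.9, 0.1])
```

The value is zero for identical distributions and grows as they diverge, and because both directions are summed, swapping the two arguments does not change the result.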

(4) After feature extraction and selection are complete, the resulting feature representation of the source domain is used to train a classifier in the source domain; the classifier may be a support vector machine (SVM), a logistic regression (LR) model, or another classifier module.

(5) The classifier trained on the source domain is used for classification prediction in the target domain, thereby applying the source-domain classifier to the target domain.

(6) The final transfer learning result is obtained.

Claims

1. A transfer learning method based on a deep sparse autoencoder, characterized in that the method comprises the following steps in sequence:

(1.1) representing the input data set as $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$, calculating the covariance matrix $\Sigma$ of $x$, then calculating the eigenvectors of the covariance matrix and arranging them as the columns of a matrix $U$, as shown in the following formula:

$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$$

in the matrix $U$, $u_1$ is the principal eigenvector, which corresponds to the largest eigenvalue, $u_2$ is the second eigenvector, and so on; $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues corresponding to the eigenvectors in $U$;

$$x_{PCAwhite,i} = \frac{x_{rot,i}}{\sqrt{\lambda_i}};$$

$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1),$$

(2.1.1) the sparse autoencoder model uses the Reconstruction Independent Component Analysis (RICA) algorithm; $x_{ZCAwhite}$ obtained from formula (1) is the input data of the RICA algorithm and is substituted into the cost function, formula (2):

$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2 \quad (2),$$

$$\nabla_W F = \lambda \left( \frac{Wx}{\sqrt{(Wx)^2 + \varepsilon}} \right) x^T + W\,(W^TWx - x)\,x^T + (Wx)\,(W^TWx - x)^T \quad (3),$$

(2.2) constructing a deep sparse autoencoder model:

$$\nabla_W J(W; x) = \sum_{i=1}^{m} \left(a_i^{(l)}\right) * \mathrm{rot90}\!\left(\delta_k^{(l+1)}, 2\right) \quad (4),$$

(3) optimizing the features by semi-supervised learning:

$$J = J_1(x_{fea}, \hat{x}) + \alpha \cdot J_2(\xi^{(s)}, \xi^{(t)}) + \beta \cdot J_3(W_{SSL}, \xi^{(s)}) + \gamma \cdot J_4(W_{SSL}) \quad (5),$$

in formula (5), $J_1(x_{fea}, \hat{x})$ represents the reconstruction error between the original data and the data re-represented after feature extraction;

$$J_2(\xi^{(s)}, \xi^{(t)}) = D_{KL}(P_s \,\|\, P_t) + D_{KL}(P_t \,\|\, P_s) = \sum_{i=1}^{n} P_s(i) \ln\!\left(\frac{P_s(i)}{P_t(i)}\right) + \sum_{i=1}^{n} P_t(i) \ln\!\left(\frac{P_t(i)}{P_s(i)}\right)$$

represents the KL distance between the source-domain and target-domain distributions;

$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix $W_{SSL}$;

$$P(y=1 \mid x) = h_\theta(x) = \frac{1}{1+\exp(-\theta^T x)} = \sigma(\theta^T x), \qquad P(y=0 \mid x) = 1 - P(y=1 \mid x) = 1 - h_\theta(x) \quad (6),$$

$$T_{test} = \arg\max P(x) \quad (7).$$