CN107798349A - A transfer learning method based on a deep sparse self-coding machine - Google Patents

A transfer learning method based on a deep sparse self-coding machine

Info

Publication number
CN107798349A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711069171.XA
Other languages
Chinese (zh)
Other versions
CN107798349B (en)
Inventor
胡学钢
张玉红
朱毅
李培培
周鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201711069171.XA priority Critical patent/CN107798349B/en
Publication of CN107798349A publication Critical patent/CN107798349A/en
Application granted granted Critical
Publication of CN107798349B publication Critical patent/CN107798349B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a transfer learning method based on a deep sparse self-coding machine, comprising: (1) preprocessing, in which the data set is vectorized; (2) model design and implementation; (3) semi-supervised learning on the features extracted by the Stacked RICA algorithm; (4) after feature extraction is complete, training a logistic regression (LR) classifier on the training set; (5) using the classifier trained on the training set to make classification predictions on the test set; (6) completing classification on the test set to obtain the final transfer learning result. The invention improves feature extraction and the precision of transfer learning, and has high robustness and practicality.

Description

Transfer learning method based on a deep sparse self-coding machine
Technical Field
The invention relates to the field of feature extraction and transfer learning methods, in particular to a transfer learning method based on a deep sparse self-coding machine.
Background
Traditional machine learning has achieved significant success in many areas. However, many machine learning algorithms assume that the training set and the test set are independent and identically distributed, and most of them must be retrained when the data distribution changes, which requires a large amount of training data to be collected again. In real-world applications the environment changes constantly, and re-collecting data and retraining the model for every new scenario the learning system encounters is costly and impractical. It is desirable for the learning system to adapt automatically to changes in the environment with little new training data and retraining time. In this situation, knowledge obtained in an earlier scenario and transferred to the new one can speed up learning and reduce the cost of collecting new training data; this is the goal of transfer learning. Transfer learning emphasizes transferring knowledge across domains, tasks, and distributions that are similar but not identical. For example, learning to recognize apples may help in learning to recognize pears, and learning to play the electronic organ may help in learning the piano. Research on transfer learning is motivated by the observation that people routinely apply existing knowledge to solve new problems more quickly.
In recent years, deep learning has been used to extract features from images, text, audio and other data, with considerable progress and good results. In human perception, the information processing of the visual system is hierarchical: edge features are extracted in the low-level V1 region, shapes or parts of objects in the V2 region, and at higher levels the entire object, the behavior of the object, and so on. That is, a feature at an upper layer is a combination of features at the lower layer, and the feature representation becomes more abstract from lower to upper layers and expresses semantics or intent increasingly well. The higher the level of abstraction, the fewer possible guesses remain, which makes classification easier. Deep learning was proposed to mimic this process. The essence of deep learning is therefore to learn more useful features by building machine learning models with many hidden layers and training them on data, ultimately improving the accuracy of classification or prediction. Deep learning differs from traditional shallow learning in that: 1) it emphasizes the depth of the model structure; 2) through layer-by-layer feature transformation, the feature representation of a sample in the original space is transformed into a new feature space, so that classification or prediction becomes easier.
A sparse self-coding machine (sparse autoencoder) is a method for extracting data features. Its advantage is that a set of linearly independent over-complete bases can be extracted to reconstruct the samples. A general model for extracting feature basis vectors only ensures that the basis vectors are linearly uncorrelated, which is not sufficient in some applications. For example, suppose audio is recorded in which several people are speaking, their voices being independent of each other, and we want to separate the audio of each person; such a model fails in this case. We therefore use the RICA (Reconstruction Independent Component Analysis) algorithm, whose goal is to learn a set of mutually independent over-complete bases.
The deep sparse self-coding machine is based on the idea of deep learning: the sparse self-coding machine is used as one layer of the model and stacked, i.e. the output of the sparse self-coding machine in the previous layer is used as the input of the next layer, forming a multi-layer deep learning structure that extracts more useful features. Semi-supervised learning is then performed on the extracted features, improving the precision and accuracy of transfer learning.
In research on feature extraction and transfer learning, existing methods all use self-coding models, and very little work uses sparse coding models. Sparse coding is an effective means of dimensionality reduction for images, text and similar data, but applying it to domain adaptation raises some common problems: (a) the feature basis vectors are not linearly independent; (b) how to use the labels in the source domain; (c) the bias terms of the objective function after stacking. If these problems are not solved well, the accuracy of feature extraction and transfer learning inevitably suffers; the invention provides a solution to them.
Disclosure of Invention
The invention aims to provide a transfer learning method based on a deep sparse self-coding machine, in order to solve the problems of the prior art in feature extraction and transfer learning.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A transfer learning method based on a deep sparse self-coding machine, characterized in that the method comprises the following steps in sequence:
(1) whitening preprocessing is carried out on all images in the image database, as follows:
(1.1) the input dataset is represented as $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$; the covariance matrix of $x$ is computed, then the eigenvectors of the covariance matrix are computed and arranged as the columns of a matrix $U$, as shown in the following formula:
$$U = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix},$$
in the matrix $U$, $u_1$ is the principal eigenvector, corresponding to the largest eigenvalue, $u_2$ is the second eigenvector, and so on; $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues corresponding to the eigenvectors in $U$;
(1.2) the input data are represented using the computed matrix $U$, as shown in the following formula:
$$x_{rot} = U^T x = \begin{bmatrix} u_1^T x \\ u_2^T x \\ \vdots \\ u_n^T x \end{bmatrix},$$
where the subscript rot stands for rotation, meaning the result of rotating the original data; so that each input feature has unit variance, $1/\sqrt{\lambda_i}$ is used as the scaling factor for each feature $x_{rot,i}$, and the resulting PCA-whitened data are given by the following formula:
$$x_{PCAwhite,i} = \frac{x_{rot,i}}{\sqrt{\lambda_i}};$$
(1.3) let $R$ be any orthogonal matrix, i.e. $RR^T = R^TR = I$; then $Rx_{PCAwhite}$ still has unit covariance; so that, among all possible $R$, the whitened input data are as close as possible to the original input data, let $R = U$, giving formula (1):
$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1),$$
$x_{ZCAwhite}$ is the data obtained from the original input data after ZCA whitening;
(2) constructing a deep sparse self-coding machine model to extract high-level abstract features of the image, as follows:
(2.1) constructing a sparse self-coding machine model, comprising the following steps:
(2.1.1) the sparse self-coding model uses the Reconstruction Independent Component Analysis (RICA) algorithm; $x_{ZCAwhite}$ obtained from formula (1) is the input data of the RICA algorithm and is substituted into the cost function, formula (2):
$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2 \quad (2),$$
in the cost function formula (2), $x$ is the input data, i.e. $x_{ZCAwhite}$, and $W$ is the weight matrix;
(2.1.2) taking the partial derivative of the cost function formula (2) with respect to $W$, where the first term of formula (2) is smoothed as $\sqrt{(Wx)^2 + \varepsilon}$ before differentiation; the resulting partial derivative is shown in formula (3):
$$\nabla_W F = \lambda \left( Wx / \sqrt{(Wx)^2 + \varepsilon} \right) x^T + (W)\left(2(W^TWx - x)\right)x^T + 2(Wx)(W^TWx - x)^T \quad (3),$$
(2.1.3) iteratively computing the weight matrix $W$ with the L-BFGS algorithm to obtain a trained sparse self-coding model;
(2.2) constructing a deep sparse self-coding machine model:
substituting the weight matrix $W$ obtained in step (2.1) into the cost function formula (2); the resulting output is the output data of the trained single-layer RICA model; this output is then used as input data and step (2.1) is repeated, yielding $W^{(i)}$, the weight matrices obtained after training the stacked sparse self-coding machine, where $i$ is the number of iterations of step (2.1);
(2.3) extracting features with the trained deep sparse self-coding machine model:
using square-root pooling and the weight matrices $W^{(i)}$ obtained in step (2.1), substituting into formula (4) for convolutional feature extraction, where formula (4) is:
$$\nabla_W J(W; x) = \sum_{i=1}^{m} \left(a_i^{(l)}\right) * \mathrm{rot90}\left(\delta_k^{(l+1)}, 2\right) \quad (4),$$
in formula (4), $a_i^{(l)}$ represents the input of the $l$-th layer in the convolutional network and $\delta_k^{(l+1)}$ represents the error term of layer $l+1$ for the $k$-th feature in the convolutional network; the output of formula (4) is denoted $x_{fea}$, the abstract features extracted from the raw input data;
(3) optimizing the features by semi-supervised learning:
using $x_{fea}$ obtained in step (2) as input, semi-supervised learning is performed according to formula (5), which adds the KL distance between the source domain distribution and the target domain distribution and a multi-class regression bias term built from the source domain class labels; $\hat{x}$ denotes the output obtained after semi-supervised learning, $W_{SSL}$ the weight matrix in semi-supervised learning, $\xi^{(s)}$ the output of the hidden layer in the source domain, and $\xi^{(t)}$ the output of the hidden layer in the target domain; formula (5) is:
$$J = J_1(x_{fea}, \hat{x}) + \alpha \cdot J_2(\xi^{(s)}, \xi^{(t)}) + \beta \cdot J_3(W_{SSL}, \xi^{(s)}) + \gamma \cdot J_4(W_{SSL}) \quad (5),$$
in formula (5), $J_1(x_{fea}, \hat{x})$ represents the reconstruction error between the original data and the data re-represented after feature extraction;
$J_2(\xi^{(s)}, \xi^{(t)})$ represents the KL distance between the source domain and target domain distributions;
$J_3(W_{SSL}, \xi^{(s)})$ represents the multi-class regression bias term built from the source domain class labels;
$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix $W_{SSL}$;
(4) training a classifier and classifying the test image dataset, as follows:
(4.1) training an LR classifier with the training image dataset; the LR classifier is given by formula (6), which uses the sigmoid function; the output of step (3) for the training image dataset and its known labels $y$ are substituted into formula (6) to train the classifier;
(4.2) classifying the test image dataset with the trained classifier; the output of step (3) for the test dataset is substituted into the LR classifier trained with formula (6), giving the classification result $T_{test}$ of the test image dataset, as shown in formula (7):
$$T_{test} = \arg\max P(x) \quad (7).$$
The invention provides a transfer learning method based on a deep sparse self-coding machine. From the perspective of deep learning, a sparse self-coding machine model using the RICA algorithm is applied to feature extraction from the data set; following the multi-layer stacking idea of deep learning, a deep sparse self-coding machine is constructed with the Stacked RICA algorithm, and linearly independent over-complete feature basis vectors are trained and extracted. On the basis of these feature basis vectors, a semi-supervised learning method adds the source domain class labels and the multi-class regression bias term, further optimizing the extracted features. Finally, a classifier is trained on the extracted features using a support vector machine model to perform classification prediction in the target domain, completing the transfer learning task. The method extracts more useful features from the data set, improves classification precision in the target domain, and markedly improves the accuracy and precision of transfer learning.
The invention addresses the practical problem of feature extraction and transfer learning. The results can be applied directly to image classification, text classification, sentiment transfer and similar applications, and can be extended to fields such as audio, web pages and video; they therefore have significant application value and, once put into application, can produce substantial social and economic benefits.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention studies the feature representation of the extracted data at the level of the independent component analysis model, and improves the robustness of the resulting features compared with traditional feature extraction algorithms (sparse coding or self-coding).
2. Using the hierarchical structure of deep learning and an analysis of the data set, the invention proposes the Stacked RICA algorithm, which takes the source domain labels and a multi-class regression objective into account within the multi-layer structure and applies the source domain label information to the optimization of the objective function. It can thereby extract more useful features from the data set, improve classification precision in the target domain, and improve the accuracy of transfer learning.
3. The invention can be applied to many fields such as images, text, audio and video, and has significant application value. Moreover, results based on Stacked RICA can also be applied to many pattern classification fields related to transfer learning, such as image recognition, sentiment classification, topic classification, speech recognition and robot systems.
Drawings
Fig. 1 is a flowchart of a specific study scheme of feature extraction and transfer learning according to the present invention.
FIG. 2 is a schematic diagram of the hierarchy of the RICA model.
FIG. 3 is a schematic diagram of an analysis of a Stacked RICA model according to the present invention.
Detailed Description
Fig. 1 is a flow chart of the method of the present invention; the specific implementation shown in Fig. 1 is as follows:
(1) To train better features, the training data set and the test data set are concatenated and vectorized to obtain a vectorized data set.
(2) For the vectorized text data set, a Stacked Reconstruction independent component Analysis (Stacked RICA) model is used for feature extraction, and the specific process is as follows:
1) whitening data with the ZCA method:
ZCA whitening is a data preprocessing method that maps data from $x$ to $x_{ZCAwhite}$. It has also been shown to be a rough model of how the biological eye (retina) processes images. For example, when the eye perceives an image, most adjacent "pixels" are perceived as similar values, because adjacent parts of an image are highly correlated in brightness. It would therefore be very inefficient for the eye to transmit each pixel value separately (via the optic nerve) to the brain. Instead, the retina performs a decorrelation operation similar to ZCA, obtaining a less redundant representation of the input image, and transmits that to the brain. In feature extraction, the input is likewise redundant for training purposes because of the strong correlation between adjacent instances or expressions in the data set. The purpose of whitening is to reduce this redundancy; after whitening, the input of the learning algorithm has the following properties: (i) the correlation between features is low; (ii) all features have the same variance. The result of ZCA whitening can be expressed as:
$$x_{ZCAwhite} = U x_{PCAwhite}$$
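As a minimal sketch of ZCA whitening as described here (rotate into the eigenbasis of the covariance matrix, rescale each component by the inverse square root of its eigenvalue, rotate back with U), the following NumPy code may help. The function name `zca_whiten`, the regularizer `eps`, the centering step and the toy data are assumptions made for illustration and do not come from the patent.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA-whiten X of shape (n_features, n_samples).

    eps is a small regularizer (an assumption, not in the source) that keeps
    the scaling finite when an eigenvalue is close to zero.
    """
    X = X - X.mean(axis=1, keepdims=True)              # center each feature
    sigma = X @ X.T / X.shape[1]                       # covariance matrix of x
    lam, U = np.linalg.eigh(sigma)                     # eigenvalues (ascending) and eigenvectors (columns of U)
    x_rot = U.T @ X                                    # x_rot = U^T x
    x_pca_white = x_rot / np.sqrt(lam + eps)[:, None]  # unit variance per component
    return U @ x_pca_white                             # x_ZCAwhite = U x_PCAwhite

# Toy data: three correlated features built from a fixed mixing matrix.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 0.5]])
X = A @ rng.standard_normal((3, 2000))
Xw = zca_whiten(X)
cov = Xw @ Xw.T / Xw.shape[1]                          # close to the identity matrix
```

`np.linalg.eigh` returns eigenvalues in ascending order, so the principal eigenvector is the last column of `U` rather than the first; the ZCA result does not depend on that ordering.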
2) feature extraction based on Stacked RICA
The method comprises the following specific steps:
① Single layer RICA extraction features
A Reconstruction Independent Component Analysis (RICA) algorithm is designed to extract features according to the idea of Fig. 2. Given an input $x$, the invention aims to obtain a set of linearly independent bases (denoted by $W$); the objective function can be expressed as:
$$J(W) = \|Wx\|_1$$
where $Wx$ is the feature representation of the input $x$. In RICA, to ensure that mutually linearly independent over-complete bases are obtained, the invention solves the following objective function:
$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2$$
where $\lambda$ is the weight decay coefficient, $W$ is the weight matrix, and $x$ is the input data. To solve the objective function:
The first step is to compute the gradient of the objective with respect to $W$, i.e. to solve for $\nabla_W F$.
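As shown in Fig. 2, the objective is treated as a neural network with its own weights and activation functions.
Let $J(z^{(4)}) = F(x)$; then $J(z^{(4)}) = \sum_k J(z_k^{(4)})$.
Once the output of the model is set to $F$, the problem becomes solving for $\nabla_W F$. Although $W$ appears twice in the model, it can be shown that when $W$ appears multiple times in a neural network, the partial derivative with respect to $W$ is the sum of the partial derivatives with respect to each instance of $W$ in the network.
As described above, the invention first derives the partial derivative for each instance of $W$ in the reconstruction term.
With respect to $W^T$:
$$\nabla_{W^T} F = 2(W^TWx - x)(Wx)^T$$
With respect to $W$:
$$\nabla_W F = (W)\left(2(W^TWx - x)\right)x^T$$
The final partial derivative with respect to $W$, including the smoothed sparsity term, is:
$$\nabla_W F = \lambda \left( Wx / \sqrt{(Wx)^2 + \varepsilon} \right) x^T + (W)\left(2(W^TWx - x)\right)x^T + 2(Wx)(W^TWx - x)^T$$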
The second step is to iterate with the L-BFGS method, minimizing the following cost function:
$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2$$
The $W$ finally obtained after multiple iterations is a set of linearly independent over-complete bases of the original input $x$. From this set of bases a more useful feature representation $Wx$ of the original input data $x$ is obtained.
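The single-layer RICA cost and its gradient can be sketched and checked numerically as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the L1 term is replaced by the smooth surrogate sqrt((Wx)^2 + eps), the cost is summed over the columns of a batch `X`, and the names `rica_cost_grad` and `numerical_grad` and the values of `lam` and `eps` are hypothetical. Because the sketch keeps the 1/2 factor of the cost function, its reconstruction gradient carries no factor of 2.

```python
import numpy as np

def rica_cost_grad(W, X, lam=0.1, eps=1e-2):
    """RICA cost and gradient for W of shape (k, n) and data X of shape (n, m)."""
    H = W @ X                        # features Wx, shape (k, m)
    R = W.T @ H - X                  # reconstruction residual W^T W x - x
    S = np.sqrt(H ** 2 + eps)        # smooth stand-in for |Wx| (assumption)
    cost = lam * S.sum() + 0.5 * (R ** 2).sum()
    # Sparsity term + gradient through the encoder W + gradient through the decoder W^T.
    grad = lam * (H / S) @ X.T + W @ R @ X.T + H @ R.T
    return cost, grad

def numerical_grad(W, X, h=1e-6):
    """Central-difference gradient, used only to check the analytic one."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += h
        Wm[idx] -= h
        G[idx] = (rica_cost_grad(Wp, X)[0] - rica_cost_grad(Wm, X)[0]) / (2 * h)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10))         # 4 input dimensions, 10 samples
W = 0.5 * rng.standard_normal((6, 4))    # 6 basis vectors (over-complete)
cost, grad = rica_cost_grad(W, X)
max_err = np.abs(grad - numerical_grad(W, X)).max()
```

A function returning `(cost, grad)` in this form is the usual shape expected by quasi-Newton optimizers such as L-BFGS, after flattening `W` to a vector.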
② Stacked RICA computation of the feature representation
Fig. 3 is a diagram of the Stacked RICA model of the present invention, which consists of an input layer, two hidden layers, and an output layer. Following the idea of deep learning, the Stacked RICA model stacks RICA structures: the stronger feature representation $z$ obtained from one single-layer RICA is used as the input of the next RICA layer, and the parameters of each layer are then optimized iteratively against the objective function. The feature representation of the original input data is finally obtained through this multi-layer stacking.
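The stacking idea (the features of one RICA layer become the input of the next) can be sketched as follows. Everything here is an illustrative assumption rather than the patent's procedure: plain gradient descent stands in for L-BFGS to keep the example dependency-free, the cost is averaged over samples, and the names `train_layer` and `stacked_rica`, the layer sizes, the step size and the iteration count are hypothetical.

```python
import numpy as np

def rica_cost_grad(W, X, lam=0.1, eps=1e-2):
    """Sample-averaged RICA cost and gradient with a smoothed L1 term (assumption)."""
    m = X.shape[1]
    H = W @ X
    R = W.T @ H - X
    S = np.sqrt(H ** 2 + eps)
    cost = (lam * S.sum() + 0.5 * (R ** 2).sum()) / m
    grad = (lam * (H / S) @ X.T + W @ R @ X.T + H @ R.T) / m
    return cost, grad

def train_layer(X, n_features, steps=300, lr=0.02, seed=0):
    """Fit one RICA layer; gradient descent is a stand-in for L-BFGS."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_features, X.shape[0]))
    for _ in range(steps):
        _, grad = rica_cost_grad(W, X)
        W -= lr * grad
    return W

def stacked_rica(X, layer_sizes):
    """Train layers one after another; each layer sees the previous layer's features."""
    weights, Z = [], X
    for k in layer_sizes:
        W = train_layer(Z, k)
        weights.append(W)
        Z = W @ Z                    # feature representation z = Wx for the next layer
    return weights, Z

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 200))    # toy stand-in for whitened input data
weights, features = stacked_rica(X, layer_sizes=(8, 12))
```

The two entries of `layer_sizes` mirror the two hidden layers mentioned for the Stacked RICA model; the sizes themselves are arbitrary.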
(3) After feature extraction by Stacked RICA, the resulting feature representation is used in place of the original input data $x$, and semi-supervised learning is performed on it with additional bias terms: the KL distance between the source domain and target domain distributions, and a multi-class regression bias term built from the source domain class labels, so that the label information of the source domain is used to optimize the feature representation. By optimizing the objective function, the feature representations of the source domain and the target domain used for classification are obtained.
The objective function can be expressed as:
$$J = J_1(x_{fea}, \hat{x}) + \alpha \cdot J_2(\xi^{(s)}, \xi^{(t)}) + \beta \cdot J_3(W_{SSL}, \xi^{(s)}) + \gamma \cdot J_4(W_{SSL})$$
where $J_1(x_{fea}, \hat{x})$ represents the reconstruction error between the original data and the data re-represented after feature extraction.
$J_2(\xi^{(s)}, \xi^{(t)})$ represents the KL distance between the source domain and target domain distributions.
$J_3(W_{SSL}, \xi^{(s)})$ represents the multi-class regression bias term built from the source domain class labels.
$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix.
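A rough sketch of how the four terms of such an objective might be combined is given below. The source does not spell out the individual terms, so every concrete choice here is an assumption for illustration: squared error for the reconstruction term, a symmetric KL divergence between normalized mean hidden activations for the domain distance, a softmax cross-entropy on source labels for the multi-class regression term, and a squared Frobenius norm for the weight constraint. All function names are hypothetical.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions with positive entries."""
    return float(np.sum(p * np.log(p / q)))

def domain_kl(xi_s, xi_t, eps=1e-12):
    """Symmetric KL between normalized mean activations (assumed form of J2).

    xi_s, xi_t: non-negative hidden outputs of shape (hidden, n_samples).
    """
    p = xi_s.mean(axis=1) + eps
    q = xi_t.mean(axis=1) + eps
    p, q = p / p.sum(), q / q.sum()
    return kl(p, q) + kl(q, p)

def softmax_loss(W, xi_s, y):
    """Mean cross-entropy of a softmax regression on source features (assumed J3)."""
    scores = W @ xi_s                                   # (classes, n_samples)
    scores = scores - scores.max(axis=0, keepdims=True)
    logp = scores - np.log(np.exp(scores).sum(axis=0, keepdims=True))
    return float(-logp[y, np.arange(y.size)].mean())

def total_objective(x_fea, x_hat, xi_s, xi_t, W, y_s, alpha, beta, gamma):
    j1 = float(np.sum((x_fea - x_hat) ** 2))            # reconstruction error
    j2 = domain_kl(xi_s, xi_t)                          # source/target distance
    j3 = softmax_loss(W, xi_s, y_s)                     # uses source labels only
    j4 = float(np.sum(W ** 2))                          # constraint on the weights
    return j1 + alpha * j2 + beta * j3 + gamma * j4

xi = np.array([[0.2, 0.4], [0.6, 0.8]])                 # toy hidden outputs
y = np.array([0, 2])                                    # toy source labels, 3 classes
value = total_objective(np.ones((2, 2)), np.ones((2, 2)), xi, xi,
                        np.zeros((3, 2)), y, alpha=1.0, beta=1.0, gamma=1.0)
```

In this toy call the reconstruction, domain and weight terms vanish, so the value reduces to the cross-entropy of a uniform prediction over three classes.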
(4) After feature extraction and selection are complete, a classifier is trained in the source domain on the resulting source domain feature representation; the classifier can be a Support Vector Machine (SVM), a logistic regression (LR) model, or a module classifier.
(5) The classifier trained in the source domain is used for classification prediction in the target domain, thereby applying the source domain classifier to the target domain.
(6) The final transfer learning result is obtained.
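Steps (4) to (6) can be sketched as training a logistic-regression-style classifier on source domain features and predicting target domain labels with an argmax over class probabilities. This is a minimal sketch under assumptions: a softmax regression without a bias term, trained by plain gradient descent, on hand-made two-dimensional features. The names `train_lr` and `predict`, the learning rate and the toy data are hypothetical.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=0, keepdims=True)

def train_lr(X, y, n_classes, lr=0.5, steps=200):
    """Softmax regression on features X (n_features, n_samples) with labels y."""
    W = np.zeros((n_classes, X.shape[0]))                  # no bias term (assumption)
    Y = np.eye(n_classes)[:, y]                            # one-hot labels, (n_classes, n_samples)
    for _ in range(steps):
        P = softmax(W @ X)
        W -= lr * (P - Y) @ X.T / X.shape[1]               # cross-entropy gradient step
    return W

def predict(W, X):
    """Pick the most probable class for each column of X."""
    return np.argmax(softmax(W @ X), axis=0)

# Source domain: labelled features. Target domain: unlabelled features.
X_src = np.array([[-2.0, -1.5, 2.0, 1.5],
                  [-2.0, -2.5, 2.0, 2.5]])
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[-1.0, 1.0],
                  [-1.2, 0.8]])

W = train_lr(X_src, y_src, n_classes=2)
T_test = predict(W, X_tgt)                                 # one predicted label per target sample
```

The classifier never sees target labels; it is fitted on the source features only and then applied unchanged to the target features.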

Claims (1)

1. A transfer learning method based on a deep sparse self-coding machine, characterized in that the method comprises the following steps in sequence:
(1) whitening preprocessing is carried out on all images in the image database, as follows:
(1.1) the input dataset is represented as $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$; the covariance matrix of $x$ is computed, then the eigenvectors of the covariance matrix are computed and arranged as the columns of a matrix $U$, as shown in the following formula:
$$U = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix},$$
in the matrix $U$, $u_1$ is the principal eigenvector, corresponding to the largest eigenvalue, $u_2$ is the second eigenvector, and so on; $\lambda_1, \lambda_2, \ldots, \lambda_n$ denote the eigenvalues corresponding to the eigenvectors in $U$;
(1.2) the input data are represented using the computed matrix $U$, as shown in the following formula:
$$x_{rot} = U^T x = \begin{bmatrix} u_1^T x \\ u_2^T x \\ \vdots \\ u_n^T x \end{bmatrix},$$
where the subscript rot stands for rotation, meaning the result of rotating the original data; so that each input feature has unit variance, $1/\sqrt{\lambda_i}$ is used as the scaling factor for each feature $x_{rot,i}$, and the resulting PCA-whitened data are given by the following formula:
$$x_{PCAwhite,i} = \frac{x_{rot,i}}{\sqrt{\lambda_i}};$$
(1.3) let $R$ be any orthogonal matrix, i.e. $RR^T = R^TR = I$; then $Rx_{PCAwhite}$ still has unit covariance; so that, among all possible $R$, the whitened input data are as close as possible to the original input data, let $R = U$, giving formula (1):
$$x_{ZCAwhite} = U x_{PCAwhite} \quad (1),$$
$x_{ZCAwhite}$ is the data obtained from the original input data after ZCA whitening;
(2) constructing a deep sparse self-coding machine model to extract high-level abstract features of the image, as follows:
(2.1) constructing a sparse self-coding machine model, comprising the following steps:
(2.1.1) the sparse self-coding model uses the Reconstruction Independent Component Analysis (RICA) algorithm; $x_{ZCAwhite}$ obtained from formula (1) is the input data of the RICA algorithm and is substituted into the cost function, formula (2):
$$\min_W \; \lambda \|Wx\|_1 + \frac{1}{2}\|W^TWx - x\|_2^2 \quad (2),$$
in the cost function formula (2), $x$ is the input data, i.e. $x_{ZCAwhite}$, and $W$ is the weight matrix;
(2.1.2) taking the partial derivative of the cost function formula (2) with respect to $W$, where the first term of formula (2) is smoothed as $\sqrt{(Wx)^2 + \varepsilon}$ before differentiation; the resulting partial derivative is shown in formula (3):
$$\nabla_W F = \lambda \left( Wx / \sqrt{(Wx)^2 + \varepsilon} \right) x^T + (W)\left(2(W^TWx - x)\right)x^T + 2(Wx)(W^TWx - x)^T \quad (3),$$
(2.1.3) iteratively computing the weight matrix $W$ with the L-BFGS algorithm to obtain a trained sparse self-coding model;
(2.2) constructing a deep sparse self-coding machine model:
substituting the weighting matrix W obtained in the step (2.1) into the cost function formula (2), and recording the obtained output asIs output data obtained after the training of the single-layer RICA model is finished, and the output data is obtainedRepeating step (2.1) as input data to obtain W(i)Training a weighting matrix obtained after stacking the sparse self-coding machine, wherein i is the number of times of the iteration step (2.1);
(2.3) extracting features according to the trained deep sparse self-coding machine model;
pooling the square root of the model square root with the weighting matrix W obtained in step (2.1)(i)Substituting the formula (4) for convolution feature extraction, wherein the formula (4) is as follows:
<mrow> <msub> <mo>&amp;dtri;</mo> <mi>W</mi> </msub> <mi>J</mi> <mrow> <mo>(</mo> <mi>W</mi> <mo>;</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <munderover> <mo>&amp;Sigma;</mo> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>m</mi> </munderover> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>i</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>)</mo> </mrow> </msubsup> <mo>)</mo> </mrow> <mo>*</mo> <mi>r</mi> <mi>o</mi> <mi>t</mi> <mn>90</mn> <mrow> <mo>(</mo> <msubsup> <mi>&amp;delta;</mi> <mi>k</mi> <mrow> <mo>(</mo> <mi>l</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> </msubsup> <mo>,</mo> <mn>2</mn> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> <mo>,</mo> </mrow>
in formula (4), $a_i^{(l)}$ represents the input of the l-th layer in the convolutional network, and $\delta_k^{(l+1)}$ represents the error term of the (l+1)-th layer for the k-th feature in the convolutional network; the output of formula (4) is denoted $x_{fea}$, the abstract features extracted from the raw input data;
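The operation in formula (4) can be illustrated with a small numerical sketch. It assumes that `*` denotes a two-dimensional "valid" convolution (as with MATLAB's `conv2(a, rot90(delta, 2), 'valid')`) and that the forward pass slides a 3×3 filter over a 6×6 input without flipping it. Those sizes, the helper names, and the random data are hypothetical. Under these assumptions the rotated-delta convolution equals the direct sum of input patches weighted by the error term.

```python
import numpy as np

def conv2_valid(a, k):
    """True 2-D convolution (kernel flipped by 180 degrees), 'valid' region only."""
    kf = np.rot90(k, 2)
    kh, kw = k.shape
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            out[u, v] = np.sum(a[u:u + kh, v:v + kw] * kf)
    return out

def filter_gradient(activations, deltas):
    """Formula (4): sum over samples of a_i^(l) * rot90(delta^(l+1), 2)."""
    return sum(conv2_valid(a, np.rot90(d, 2)) for a, d in zip(activations, deltas))

rng = np.random.default_rng(1)
m = 3                                                            # number of samples
activations = [rng.standard_normal((6, 6)) for _ in range(m)]   # a_i^(l)
deltas = [rng.standard_normal((4, 4)) for _ in range(m)]        # error terms, layer l+1

grad = filter_gradient(activations, deltas)

# Direct definition for comparison: each filter entry (p, q) accumulates the
# input patch it touched, weighted by the corresponding error term.
direct = np.zeros((3, 3))
for a, d in zip(activations, deltas):
    for p in range(3):
        for q in range(3):
            direct[p, q] += np.sum(a[p:p + 4, q:q + 4] * d)

print(np.allclose(grad, direct))
```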
(3) optimizing the features by semi-supervised learning:
using $x_{fea}$ obtained in step (2) as input, performing semi-supervised learning to obtain formula (5), which adds the KL distance between the source domain distribution and the target domain distribution and a multi-class regression bias term based on the source domain class labels; $\hat{x}$ denotes the output obtained after semi-supervised learning, $W_{SSL}$ represents the weight matrix in semi-supervised learning, $\xi^{(s)}$ represents the output of the hidden layer in the source domain, and $\xi^{(t)}$ represents the output of the hidden layer in the target domain; formula (5) is as follows:
$$J = J_1\left(x_{fea}, \hat{x}\right) + \alpha \cdot J_2\left(\xi^{(s)}, \xi^{(t)}\right) + \beta \cdot J_3\left(W_{SSL}, \xi^{(s)}\right) + \gamma \cdot J_4\left(W_{SSL}\right) \tag{5}$$,
in formula (5), $J_1(x_{fea}, \hat{x})$ represents the reconstruction error from the original data to the data re-represented after feature extraction;
$$J_2\left(\xi^{(s)}, \xi^{(t)}\right) = D_{KL}\left(P_s \,\|\, P_t\right) + D_{KL}\left(P_t \,\|\, P_s\right) = \sum_{i=1}^{n} P_s(i)\ln\left(\frac{P_s(i)}{P_t(i)}\right) + \sum_{i=1}^{n} P_t(i)\ln\left(\frac{P_t(i)}{P_s(i)}\right)$$
where $J_2$ represents the KL distance between the source domain distribution and the target domain distribution;
$J_3(W_{SSL}, \xi^{(s)})$ represents the multi-class regression bias term constructed from the source domain class labels;
$J_4(W_{SSL})$ represents the constraint term on the feature parameter matrix $W_{SSL}$;
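The symmetric KL term J₂ and the weighted sum in formula (5) can be sketched as follows. This is an illustrative sketch only: the text does not say how P_s and P_t are derived from the hidden-layer outputs, so the `normalize` helper (which turns non-negative mean activations into a distribution) is an assumption, as are all numeric values and the function names.

```python
import math

def normalize(values, eps=1e-12):
    """Hypothetical helper: turn non-negative values into a probability distribution."""
    total = sum(values) + eps * len(values)
    return [(v + eps) / total for v in values]

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(i) * ln(p(i) / q(i))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def j2(p_s, p_t):
    """Symmetric KL distance between source and target distributions."""
    return kl_divergence(p_s, p_t) + kl_divergence(p_t, p_s)

def total_objective(j1, j2_value, j3, j4, alpha, beta, gamma):
    """Weighted sum of the four terms, as in formula (5)."""
    return j1 + alpha * j2_value + beta * j3 + gamma * j4

p_s = normalize([2.0, 2.0])   # assumed mean source-domain hidden activations
p_t = normalize([1.0, 3.0])   # assumed mean target-domain hidden activations
distance = j2(p_s, p_t)
print(distance)

# Placeholder values for the other terms, only to show how the weights combine.
print(total_objective(j1=1.0, j2_value=distance, j3=0.5, j4=0.1,
                      alpha=0.5, beta=1.0, gamma=0.01))
```

The distance is zero when the two distributions coincide and grows as they diverge, which is why minimizing it pulls the source and target hidden representations together.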
(4) training a classifier and classifying the test image data set, wherein the process is as follows:
(4.1) training an LR classifier with the training image dataset; in LR classification, let:
$$\begin{aligned} P(y=1 \mid x) &= h_\theta(x) = \frac{1}{1+\exp\left(-\theta^T x\right)} = \sigma\left(\theta^T x\right), \\ P(y=0 \mid x) &= 1 - P(y=1 \mid x) = 1 - h_\theta(x) \end{aligned} \tag{6}$$,
in formula (6), $\sigma(z) = 1/(1+\exp(-z))$ is called the sigmoid function; the output obtained for the training image dataset after step (3) and the known labels y of the training image dataset are substituted into formula (6) to train the classifier;
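Training the LR classifier of formula (6) can be sketched with batch gradient ascent on the log-likelihood. This is a minimal sketch, not the patent's training procedure: the optimizer, the appended bias term, the learning rate, the epoch count, and the toy two-dimensional features standing in for the step (3) output are all assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid, sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """P(y = 1 | x) = h_theta(x); the last entry of theta is a bias."""
    xb = list(x) + [1.0]
    return sigmoid(sum(t * v for t, v in zip(theta, xb)))

def train_lr(features, labels, lr=0.5, epochs=2000):
    """Batch gradient ascent on the mean log-likelihood of formula (6)."""
    theta = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(features, labels):
            err = y - predict_proba(theta, x)        # y - h_theta(x)
            for j, v in enumerate(list(x) + [1.0]):
                grad[j] += err * v
        theta = [t + lr * g / len(features) for t, g in zip(theta, grad)]
    return theta

# Toy stand-in for the features produced by step (3), with known labels y.
features = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4],
            [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

theta = train_lr(features, labels)
probs = [predict_proba(theta, x) for x in features]
print([round(p, 3) for p in probs])
```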
(4.2) classifying the test image dataset with the trained classifier: the output obtained for the test dataset after step (3) is substituted into the LR classifier trained with formula (6), giving the classification result $T_{test}$ of the test image dataset, as shown in formula (7):
$$T_{test} = \arg\max P(x) \tag{7}$$.
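The decision rule in formula (7) picks, for each test sample, the class with the largest probability under formula (6). The sketch below assumes a binary problem and an already trained parameter vector; the values of `theta` and the test features are hypothetical placeholders rather than results from the method.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def class_probabilities(theta, x):
    """Return P(y = 1 | x) and P(y = 0 | x) from formula (6)."""
    xb = list(x) + [1.0]                         # last entry of theta is a bias
    p1 = sigmoid(sum(t * v for t, v in zip(theta, xb)))
    return {1: p1, 0: 1.0 - p1}

def classify(theta, x):
    """Formula (7): choose the class with the largest probability."""
    probs = class_probabilities(theta, x)
    return max(probs, key=probs.get)

theta = [2.0, -1.0, -0.5]                        # hypothetical trained parameters
test_features = [[1.0, 0.2], [0.1, 1.5]]         # hypothetical step (3) outputs
t_test = [classify(theta, x) for x in test_features]
print(t_test)
```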
CN201711069171.XA 2017-11-03 2017-11-03 Transfer learning method based on depth sparse self-coding machine Expired - Fee Related CN107798349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711069171.XA CN107798349B (en) 2017-11-03 2017-11-03 Transfer learning method based on depth sparse self-coding machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711069171.XA CN107798349B (en) 2017-11-03 2017-11-03 Transfer learning method based on depth sparse self-coding machine

Publications (2)

Publication Number Publication Date
CN107798349A (en) 2018-03-13
CN107798349B (en) 2020-07-14

Family

ID=61549046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711069171.XA Expired - Fee Related CN107798349B (en) 2017-11-03 2017-11-03 Transfer learning method based on depth sparse self-coding machine

Country Status (1)

Country Link
CN (1) CN107798349B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN104408469A (en) * 2014-11-28 2015-03-11 武汉大学 Firework identification method and firework identification system based on deep learning of image
CN105844331A (en) * 2015-01-15 2016-08-10 富士通株式会社 Neural network system and training method thereof
CN106096652A (en) * 2016-06-12 2016-11-09 西安电子科技大学 Based on sparse coding and the Classification of Polarimetric SAR Image method of small echo own coding device
CN106203506A (en) * 2016-07-11 2016-12-07 上海凌科智能科技有限公司 A kind of pedestrian detection method based on degree of depth learning art
CN106529428A (en) * 2016-10-31 2017-03-22 西北工业大学 Underwater target recognition method based on deep learning
CN106599863A (en) * 2016-12-21 2017-04-26 中国科学院光电技术研究所 Deep face recognition method based on transfer learning technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIALE CUI ET AL.: "Text Classification Based on ReLU Activation Function of SAE Algorithm", 《INTERNATIONAL SYMPOSIUM ON NEURAL NETWORK》 *
梅灿华 等: "一种基于最大熵模型的加权归纳迁移学习方法", 《计算机研究与发展》 *
谢李鹏: "基于局部不变特征融合的图像检索技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564121B (en) * 2018-04-09 2022-05-03 南京邮电大学 Unknown class image label prediction method based on self-encoder
CN108564121A (en) * 2018-04-09 2018-09-21 南京邮电大学 A kind of unknown classification image tag prediction technique based on self-encoding encoder
CN108595568A (en) * 2018-04-13 2018-09-28 重庆邮电大学 A kind of text sentiment classification method based on very big unrelated multivariate logistic regression
CN108595568B (en) * 2018-04-13 2022-05-17 重庆邮电大学 Text emotion classification method based on great irrelevant multiple logistic regression
CN108805160A (en) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 Transfer learning method, apparatus, computer equipment and storage medium
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly
CN109117793B (en) * 2018-08-16 2021-10-29 厦门大学 Direct-push type radar high-resolution range profile identification method based on deep migration learning
CN109117793A (en) * 2018-08-16 2019-01-01 厦门大学 Direct-push high Resolution Range Profile Identification of Radar method based on depth migration study
CN109359557B (en) * 2018-09-25 2021-11-09 东北大学 SAR remote sensing image ship detection method based on transfer learning
CN109359557A (en) * 2018-09-25 2019-02-19 东北大学 A kind of SAR remote sensing images Ship Detection based on transfer learning
CN109726742A (en) * 2018-12-11 2019-05-07 中科恒运股份有限公司 The quick training method of disaggregated model and terminal device
CN109816002A (en) * 2019-01-11 2019-05-28 广东工业大学 The single sparse self-encoding encoder detection method of small target migrated certainly based on feature
CN109816002B (en) * 2019-01-11 2022-09-06 广东工业大学 Single sparse self-encoder weak and small target detection method based on feature self-migration
CN109902861A (en) * 2019-01-31 2019-06-18 南京航空航天大学 A kind of order manufacturing schedule real-time predicting method based on the double-deck transfer learning
CN111046824B (en) * 2019-12-19 2023-04-28 上海交通大学 Efficient denoising and high-precision reconstruction modeling method and system for time series signals
CN111046824A (en) * 2019-12-19 2020-04-21 上海交通大学 Time series signal efficient denoising and high-precision reconstruction modeling method and system
CN111753898A (en) * 2020-06-23 2020-10-09 扬州大学 Representation learning method based on superposition convolution sparse self-encoding machine
CN111753899A (en) * 2020-06-23 2020-10-09 扬州大学 Adaptive unbalanced data field adaptation method
CN111753899B (en) * 2020-06-23 2023-10-17 扬州大学 Self-adaptive unbalanced data field adaptation method
CN111753898B (en) * 2020-06-23 2023-09-22 扬州大学 Representation learning method based on superposition convolution sparse self-encoder
CN111985161A (en) * 2020-08-21 2020-11-24 广东电网有限责任公司清远供电局 Transformer substation three-dimensional model reconstruction method
CN112070236B (en) * 2020-09-11 2022-08-16 福州大学 Sparse feature learning method for solving online complex optimization calculation based on transfer learning
CN112070236A (en) * 2020-09-11 2020-12-11 福州大学 Sparse feature learning method for solving online complex optimization calculation based on transfer learning

Also Published As

Publication number Publication date
CN107798349B (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200714