Abstract
The supervised Isomap (S-Isomap), a supervised version of Isomap, can achieve better recognition performance by considering class label information. However, S-Isomap usually suffers from a heavy computation burden, especially when new data arrive and the distance matrix must be recomputed. To address this limitation, an incremental strategy is applied to update the geodesic distances using previous results. Meanwhile, informative samples are selected through active learning to exploit the information in the added data with less human labeling effort during the incremental process. On this basis, an incremental active learning S-Isomap (IALSI) is proposed. Image retrieval and classification experiments are conducted to evaluate the performance of IALSI. The results show that the proposed method improves accuracy while greatly reducing running time.
1 Introduction
With the rapid growth of information technology, massive data sets, such as human gene distributions, health reports, climate patterns and commercial records, have been created. Dimension reduction (DR), a key technique for extracting the important information from high-dimensional data, has attracted increasing attention and study.
The major task of DR is to find a projection that maps high-dimensional data into a lower-dimensional space according to some criterion. Many DR methods have been proposed so far, and they can be divided into linear and nonlinear groups. PCA and LDA [1] are among the most popular linear methods; they seek linear projection matrices. In contrast to linear DR methods, nonlinear approaches are able to discover the nonlinear structure of real-world data. Many nonlinear DR methods are based on kernel functions. For example, kernel principal component analysis (KPCA) [2] has been widely used to obtain an implicit nonlinear mapping, but its result may degrade if an inappropriate kernel function is chosen. Recently, methods based on manifold learning, such as Laplacian eigenmaps (LE) [3], locally linear embedding (LLE) [4] and Isomap [5], have drawn much attention. In particular, Isomap, which reveals the intrinsic geometric structure of a manifold by preserving the geodesic distances of all point pairs, has produced encouraging results. Consequently, a series of Isomap variants have emerged, such as supervised Isomap (S-Isomap) [6], orthogonal constrained marginal Isomap (M-Isomap) [7] and multi-manifold discriminant Isomap (MMD-Isomap) [8]. Among them, S-Isomap has shown good performance in visualization, image retrieval and classification by modifying the distances between pairwise points with the same and different class labels. Despite the good performance achieved by the Isomap family, these methods suffer from a heavy computation burden caused by calculating the geodesic distances of all point pairs. In particular, the distances have to be recalculated when new data arrive [9], which adds further computational load.
Incremental learning, a family of efficient and scalable machine learning methods, has been proposed to operate on a sequence of data instances. At each step, an incremental learning algorithm processes an incoming example and updates the learner with that instance. An incremental version of Isomap has been proposed that efficiently updates the geodesic distances and re-estimates the eigenvectors using previous computation results [10]. On the other hand, for supervised learning, the number of labeled samples is limited in practice due to cost and time. This issue has motivated the recent study of active learning [11], which queries only a subset of informative incoming instances to update the learner and aims to maximize learner performance with minimal human labeling effort at each iteration.
In order to cut down the computation of S-Isomap and use the most informative labeled samples, we present an incremental active learning S-Isomap, called IALSI. The proposed method converts the original S-Isomap into an incremental version, greatly reducing the computational cost. Furthermore, active learning is applied for high efficiency and scalability toward large-scale learning tasks.
The remainder of this paper is organized as follows: Sect. 2 describes our proposed incremental active learning S-Isomap. The experimental results are shown in Sect. 3. Finally, Sect. 4 gives the conclusion and future work.
2 Incremental Active Learning S-Isomap
In this section, we elaborate our method. We first give a brief review of S-Isomap and then introduce the details of IALSI.
2.1 S-Isomap
Given a data set \( \mathbf{X} = \left[ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \right] \in \mathbf{R}^{r \times N} \) with N points, DR aims to find a mapping that transforms these points into a new data set \( \mathbf{Y} = \left[ \mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N \right] \in \mathbf{R}^{r' \times N} \) in a lower-dimensional space \( \left( r' \ll r \right) \). \( d\left( \mathbf{x}_i, \mathbf{x}_j \right) \) denotes the Euclidean distance between points \( \mathbf{x}_i \) and \( \mathbf{x}_j \).
S-Isomap is a supervised version of Isomap. Considering the class label information of the samples, it defines the dissimilarity as follows [6]:

$$ \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) = \left\{ \begin{array}{ll} \sqrt{1 - e^{-d^2\left( \mathbf{x}_i, \mathbf{x}_j \right)/\beta}}, & l\left( \mathbf{x}_i \right) = l\left( \mathbf{x}_j \right) \\ \sqrt{e^{d^2\left( \mathbf{x}_i, \mathbf{x}_j \right)/\beta}} - \alpha, & l\left( \mathbf{x}_i \right) \ne l\left( \mathbf{x}_j \right) \end{array} \right. $$(1)

where the parameters \( \alpha \) and \( \beta \) are used to control the range of \( \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) \). In practice, \( \alpha \) is set to a small positive value, and \( \beta \) is set to the average Euclidean distance between all pairs of data points. \( l\left( \mathbf{x}_i \right) \) is the class label of \( \mathbf{x}_i \). After this modification of the point distances, the same steps as in Isomap are used to obtain the lower-dimensional representations of the samples. They can be summarized as follows (a code sketch of the whole pipeline follows the steps):
1. Construct the neighborhood graph \( \mathbf{D}_G \). For every pair of data points, if \( \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) \) is smaller than a fixed radius \( \epsilon \), or \( \mathbf{x}_j \in \text{KNN}\left( \mathbf{x}_i \right) \) (i.e., \( \mathbf{x}_j \) is among the K nearest neighbors of \( \mathbf{x}_i \)), connect them with an edge of weight \( \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) \).
2. Compute the shortest geodesic distances. Initialize the distance \( d_G\left( \mathbf{x}_i, \mathbf{x}_j \right) = \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) \) if \( \mathbf{x}_j \) and \( \mathbf{x}_i \) are neighbors; otherwise, set \( d_G\left( \mathbf{x}_i, \mathbf{x}_j \right) = \infty \). Estimate the geodesic distances between all pairs of points by computing their shortest path distances in \( \mathbf{D}_G \) using Dijkstra's or Floyd's algorithm.
3. Construct the \( r' \)-dimensional embedding. Define the optimization problem as follows:

$$ \mathop{\min}\limits_{\mathbf{Y}} \sum\limits_{\mathbf{x}_i, \mathbf{x}_j} \left( d(\mathbf{y}_i, \mathbf{y}_j) - d_G(\mathbf{x}_i, \mathbf{x}_j) \right)^2 $$(2)
Let \( \mathbf{H} = \mathbf{I} - \left( 1/N \right)\mathbf{e}\mathbf{e}^T \), where \( \mathbf{I} \) is an \( N \times N \) identity matrix and \( \mathbf{e} \) is the vector of all ones. Let \( \mathbf{Q} \) be an \( N \times N \) matrix with elements \( \mathbf{Q}_{ij} = d_G^2\left( \mathbf{x}_i, \mathbf{x}_j \right) \). The lower-dimensional embedding \( \mathbf{Y} \) is obtained as \( \left[ \sqrt{\lambda_1}\,\mathbf{v}_1, \sqrt{\lambda_2}\,\mathbf{v}_2, \ldots, \sqrt{\lambda_{r'}}\,\mathbf{v}_{r'} \right]^T \), where \( \left\{ \mathbf{v}_i \right\}_{i=1}^{r'} \) are the eigenvectors corresponding to the first \( r' \) leading eigenvalues \( \left\{ \lambda_i \right\}_{i=1}^{r'} \) of \( \mathbf{R} = -\mathbf{H}\mathbf{Q}\mathbf{H}/2 \).
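To make the pipeline concrete, the following is a minimal NumPy sketch of the three steps above, assuming the S-Isomap dissimilarity of Eq. (1). The function name `s_isomap` and the parameter defaults are illustrative, not taken from the authors' implementation (which was written in MATLAB):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def s_isomap(X, labels, k=10, alpha=0.5, r_prime=2):
    """Minimal sketch of steps 1-3. X: (N, r) samples; labels: (N,) class
    ids. beta follows the paper: the average Euclidean distance over all
    point pairs. Assumes the neighborhood graph comes out connected."""
    N = X.shape[0]
    # Pairwise Euclidean distances d(x_i, x_j).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    beta = d[np.triu_indices(N, 1)].mean()
    # Label-aware dissimilarity of Eq. (1): shrink same-class distances,
    # stretch different-class ones.
    same = labels[:, None] == labels[None, :]
    d_tilde = np.where(same,
                       np.sqrt(1.0 - np.exp(-d ** 2 / beta)),
                       np.sqrt(np.exp(d ** 2 / beta)) - alpha)
    np.fill_diagonal(d_tilde, 0.0)
    # Step 1: k-NN neighborhood graph; non-edges are +inf.
    nn = np.argsort(d_tilde, axis=1)[:, 1:k + 1]
    G = np.full((N, N), np.inf)
    rows = np.repeat(np.arange(N), k)
    G[rows, nn.ravel()] = d_tilde[rows, nn.ravel()]
    G = np.minimum(G, G.T)  # keep an edge if either endpoint chose it
    # Step 2: geodesic distances as shortest paths (Dijkstra).
    D_G = shortest_path(G, method='D', directed=False)
    # Step 3: classical MDS on the squared geodesic distances.
    H = np.eye(N) - np.ones((N, N)) / N
    R = -H @ (D_G ** 2) @ H / 2.0
    vals, vecs = np.linalg.eigh(R)       # ascending eigenvalues
    top = np.argsort(vals)[::-1][:r_prime]
    Y = (np.sqrt(np.maximum(vals[top], 0.0)) * vecs[:, top]).T  # (r', N)
    return Y
```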
2.2 Proposed IALSI
In IALSI, incremental learning is applied to update the shortest-distance matrix based on Floyd's algorithm when new training data arrive in each iteration. The new incoming samples are chosen by active learning according to a given criterion.
The main task of active learning is to select informative and representative samples for classification. According to the criterion used for sample selection, active learning methods can be roughly divided into two groups: committee-based approaches and uncertainty-based approaches. Committee-based approaches, built on a committee of classifiers, select the data on whose classification the committee members disagree most. In uncertainty-based approaches, the most uncertain samples, determined by some uncertainty scheme, are chosen and labeled.
In the proposed IALSI, the uncertainty of sample \( \mathbf{x}_i \) is measured as follows:
where \( \overline{\mathbf{x}}_c \) is the mean of the samples belonging to class \( c \), and \( C \) is the number of classes. A smaller \( w\left( \mathbf{x}_i \right) \) means that the distances between sample \( \mathbf{x}_i \) and the class centers are comparable, so there is ambiguity in labeling \( \mathbf{x}_i \) from these distances. Such an \( \mathbf{x}_i \) is therefore more informative and should be labeled and added to the training set.
Suppose the available training set is \( \left\{ \mathbf{x}_i, l\left( \mathbf{x}_i \right) \right\}_{i=1}^N \) and the unlabeled set is \( \left\{ \mathbf{x}_j \right\}_{j=1}^M \). We select m samples from the unlabeled set through active learning. The details of the active learning algorithm are summarized in Table 1; a hedged code sketch of the selection step is given below.
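The following sketch illustrates the selection step. Since the paper's exact \( w\left( \mathbf{x}_i \right) \) is defined by its own uncertainty criterion, the margin between the two smallest center distances is used here only as an assumed stand-in with the described behaviour (small score means comparable center distances, i.e., an ambiguous sample); the function name and signature are likewise illustrative:

```python
import numpy as np

def select_informative(X_cand, X_train, y_train, m):
    """Pick the m most ambiguous candidates (assumes >= 2 classes)."""
    classes = np.unique(y_train)
    centers = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from every candidate to every class center.
    d = np.linalg.norm(X_cand[:, None, :] - centers[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    # Stand-in uncertainty: margin between the two closest centers.
    w = d_sorted[:, 1] - d_sorted[:, 0]
    return np.argsort(w)[:m]  # indices of the m most ambiguous candidates
```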
At the beginning of IALSI, the distances between the points in \( \left\{ \mathbf{x}_i, l\left( \mathbf{x}_i \right) \right\}_{i=1}^N \) are calculated using Eq. (1), and the shortest-distance matrix \( \mathbf{D}_0 \) is obtained. In the \( t \)-th iteration \( \left( 1 \le t \le T \right) \), the m samples selected by the active learning algorithm are added to the training set as \( \left\{ \mathbf{x}_j, l\left( \mathbf{x}_j \right) \right\}_{j=1}^m \) and used to update the geodesic distances between points. The extended distance matrix is given as:

$$ \mathbf{D} = \left[ \begin{array}{cc} \mathbf{D}_{t-1} & \mathbf{D}_E \\ \mathbf{D}_E^T & \mathbf{D}_A \end{array} \right] $$
where \( \mathbf{D}_A \) denotes the distance matrix among the newly arriving data \( \left\{ \mathbf{x}_j, l\left( \mathbf{x}_j \right) \right\}_{j=1}^m \), and \( \mathbf{D}_E \) denotes the distances between \( \left\{ \mathbf{x}_i, l\left( \mathbf{x}_i \right) \right\}_{i=1}^N \) and \( \left\{ \mathbf{x}_j, l\left( \mathbf{x}_j \right) \right\}_{j=1}^m \). Instead of re-computing the shortest distances for all points, only the new data points are considered. For a new point \( \mathbf{x}_i \), the distance between it and a point \( \mathbf{x}_j \in \left\{ \mathbf{x}_i, l\left( \mathbf{x}_i \right) \right\}_{i=1}^{N+m} \) is initialized as \( d_G\left( \mathbf{x}_i, \mathbf{x}_j \right) = \tilde{d}\left( \mathbf{x}_i, \mathbf{x}_j \right) \) if \( \mathbf{x}_j \) and \( \mathbf{x}_i \) are neighbors; otherwise, \( d_G\left( \mathbf{x}_i, \mathbf{x}_j \right) = \infty \). Floyd's algorithm is then adopted to dynamically update the all-pairs shortest-distance matrix: for each \( p = N+1, \ldots, N+m \) in turn, replace every entry \( d_G\left( \mathbf{x}_i, \mathbf{x}_j \right) \) by \( \min\left\{ d_G\left( \mathbf{x}_i, \mathbf{x}_j \right), d_G\left( \mathbf{x}_i, \mathbf{x}_p \right) + d_G\left( \mathbf{x}_p, \mathbf{x}_j \right) \right\} \). After the iteration, the final matrix is denoted \( \mathbf{D}_t \). Finally, the classic MDS algorithm is used to construct the \( r' \)-dimensional embedding. During the iterations, the k value of the neighborhood graph changes as follows:
where \( k_0 \) is the initial value and \( k_1 \) is a factor controlling the range of change. Table 2 lists the details of IALSI; a sketch of the incremental distance update appears below.
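The following NumPy sketch renders the incremental update described above. The helper name is an assumption, as is the explicit min-plus pass that first routes each new point through the previous geodesics; that pass guarantees that the subsequent Floyd pivots over only the m new points suffice, so this should be read as one correct-by-construction rendering of the procedure rather than the authors' exact code:

```python
import numpy as np

def extend_geodesics(D_prev, E):
    """D_prev: (N, N) geodesic matrix from the previous iteration.
    E: (N+m, N+m) symmetric direct edge weights of the updated
    neighborhood graph (label-aware dissimilarities for neighbors,
    np.inf for non-edges, 0 on the diagonal)."""
    N, Nm = D_prev.shape[0], E.shape[0]
    D = np.full((Nm, Nm), np.inf)
    D[:N, :N] = D_prev                 # previous geodesics are reused as-is
    D[N:, :] = E[N:, :]                # new rows/columns start from raw edges
    D[:N, N:] = E[:N, N:]
    np.fill_diagonal(D, 0.0)
    for q in range(N, Nm):
        # Route new point q through the old geodesics once (min-plus
        # product with D_prev), then close new-new pairs through old
        # interior points.
        D[q, :N] = (D[q, :N, None] + D_prev).min(axis=0)
        D[:N, q] = D[q, :N]
        D[q, N:] = np.minimum(D[q, N:],
                              (D[q, :N, None] + E[:N, N:]).min(axis=0))
        D[N:, q] = D[q, N:]
    # Floyd pivots restricted to the m new points: shortcuts through the
    # new samples are captured without a full O((N+m)^3) recomputation.
    for p in range(N, Nm):
        D = np.minimum(D, D[:, p:p + 1] + D[p:p + 1, :])
    return D
```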
Remark on computational complexity. We focus on the complexity of constructing the distance matrix and the neighborhood graph, as these steps are the major computing load in practice. The space complexity of the original S-Isomap algorithm is \( O\left( N^2 \right) \) [8]. When new data points are added, the space complexity of the batch algorithm reaches \( O\left( \left( N+m \right)^2 \right) \). In IALSI, only the shortest distances involving the new data need to be computed, so the space complexity of IALSI is only \( O\left( 2Nm + m^2 \right) \). For example, with \( N = 500 \) training samples and \( m = 300 \) added samples (a setting used in our experiments), the batch algorithm touches \( 800^2 = 640{,}000 \) entries, while IALSI touches only \( 2 \times 500 \times 300 + 300^2 = 390{,}000 \).
3 Experiments and Discussions
In this section, experiments are carried out to evaluate the performance of IALSI. Three publicly available image data sets, Corel [12], UC Merced LULC [13] and MNIST [14], are used for image retrieval and classification. Four methods are compared: two batch methods, the original Isomap and S-Isomap, and two incremental methods, an incremental S-Isomap (ISI) and the proposed IALSI. Note that in ISI the samples added to the training set are selected randomly. All algorithms are implemented in MATLAB and run on an Intel(R) Core(TM) i5-3470 PC at 3.2 GHz with 12 GB RAM.
3.1 Data Sets and Experimental Setting
Two data sets, UC Merced LULC and Corel, are used for image retrieval. The LULC data set, obtained from aerial imagery, consists of images from 20 classes with a pixel resolution of 30 cm; each class contains 100 images of size 256 × 256 pixels. The Corel data set consists of real-world photos from the COREL Photo Gallery; it has 20 categories with 500 images per category. We extract basic color features and wavelet texture features to describe the images [15]: color histogram (32 dimensions), color moments (64 dimensions), color auto-correlogram (6 dimensions), wavelet moments (40 dimensions) and Gabor transform with 4 scales and 6 orientations (48 dimensions). All features are concatenated into a single vector, so each image is represented by a 190-dimensional vector, which is normalized to zero mean and unit variance. The MNIST handwritten digit database is used for classification. It contains 70,000 handwritten digit images of size \( 28 \times 28 \) pixels; we randomly choose 4000 images of the digits 0–9 for evaluation. Figure 1 gives some example images from the three data sets.
In our experiments, each data set is randomly divided into three partitions: a training set, a candidate set and a test set. The candidate set serves as the pool for sample selection: in each iteration, informative samples are selected from it and added to the training set. For the LULC, Corel and MNIST data sets, the numbers of training and test samples are 200 and 300, 500 and 500, and 500 and 500, respectively; the remaining samples act as candidates. For the supervised DR methods, a BP neural network is constructed to approximate the mapping from the high-dimensional space to the lower-dimensional space. In the reduced space, the \( L_2 \) distance metric is employed for image retrieval to measure the similarities between a query image from the test set and the training images. For the classification task, a KNN \( \left( K = 1 \right) \) classifier is used to predict the class labels of the test samples.
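As an illustration of this evaluation protocol, the sketch below ranks training embeddings by \( L_2 \) distance for retrieval and performs 1-NN classification in the reduced space; it assumes the embeddings have already been produced (e.g., by the BP network), and the function names are ours:

```python
import numpy as np

def retrieve(query_y, train_Y, top_k=5):
    """Return indices of the top_k training embeddings closest to the
    query embedding under the L2 metric."""
    d = np.linalg.norm(train_Y - query_y, axis=1)
    return np.argsort(d)[:top_k]

def nn_classify(query_y, train_Y, train_labels):
    """1-NN prediction in the reduced space."""
    return train_labels[np.argmin(np.linalg.norm(train_Y - query_y, axis=1))]
```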
In our experiments, the dimension of the data is reduced to the number of classes of each data set. The parameters of each method are carefully tuned for the best performance. In particular, for Isomap and S-Isomap, the k value of KNN is set to 25 for LULC, 80 for Corel and 60 for MNIST. For the incremental learning methods, \( k_0 \) is set to 15 for LULC and 60 for Corel and MNIST, and \( k_1 \) is 30 for all three data sets. Besides, we set \( m = 300 \) for all data sets, with \( T = 3 \) for LULC and \( T = 6 \) for the others.
3.2 Experimental Results and Analysis
Table 3 summarizes the retrieval precisions of the four methods when the top 5 to 25 images (in steps of 5) are retrieved; the classification accuracies on MNIST are also listed. It can be seen that: (1) the supervised methods perform better than unsupervised Isomap; (2) more training samples help to improve performance; (3) IALSI, with active learning, outperforms the other methods, as it chooses more informative samples.
The running times of the different methods are shown in Fig. 2. Compared with the batch methods, the incremental learning methods are more efficient. Taking the Corel data set as an example, Fig. 3 further depicts the running time over the iterations. For the batch methods, the distance matrix must be recomputed whenever new samples are added, so the computation rises dramatically, whereas only a smooth increase is observed for the incremental methods.
4 Conclusions
An incremental active learning method for S-Isomap, IALSI, is proposed in this paper. IALSI converts the original S-Isomap into an incremental learning version while selecting informative samples through active learning in each iteration. The core idea of IALSI is to update the geodesic distance matrix based on existing computation results and to select informative samples according to an uncertainty criterion. The encouraging experimental results show that IALSI not only achieves better performance on image retrieval and classification tasks but also greatly reduces the running time.
References
Cai, D., He, X., Han, J.: Training linear discriminant analysis in linear time. In: 24th IEEE International Conference on Data Engineering, Cancun, pp. 209–217 (2008)
Zhang, D., Zhou, Z.H., Chen, S.: Adaptive kernel principal component analysis with unsupervised learning of kernels. In: 6th International Conference on Data Mining, Hong Kong, pp. 1178–1182 (2006)
Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 14(6), 585–591 (2002)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Geng, X., Zhan, D.C., Zhou, Z.H.: Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans. Syst. Man Cybern. B Cybern. 35(6), 1098–1107 (2005)
Zhang, Z., Chow, T.W., Zhao, M.: M-Isomap: orthogonal constrained marginal Isomap for nonlinear dimensionality reduction. IEEE Trans. Syst. Man Cybern. B Cybern. 43(1), 180–191 (2012)
Yang, B., Xiang, M., Zhang, Y.: Multi-manifold discriminant Isomap for visualization and classification. Pattern Recogn. 55, 215–230 (2016)
Hoi, S.C.H., Wang, J., Zhao, P.: LIBOL: a library for online learning algorithms. J. Mach. Learn. Res. 15(1), 495–499 (2014)
Law, M.H.C., Jain, A.K.: Incremental nonlinear dimensionality reduction by manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 377–391 (2006)
Huang, S.J., Jin, R., Zhou, Z.H.: Active learning by querying informative and representative examples. In: Advance in Neural Information Processing System, pp. 892–900 (2010)
Hoi, S.C.H., Liu, W., Lyu, M.R., Ma, W.Y.: Learning distance metrics with contextual constraints for image retrieval. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2072–2078 (2006)
Yang, Y., Newsam, S.: Bag-of-visual-words and spatial extensions for land-use classification. In: ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, USA, pp. 270–279 (2010)
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Xia, H., Hoi, S.C., Jin, R.: Online multiple kernel similarity learning for visual search. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 536–549 (2013)