Open AccessArticle

Tensor-Based Uniform and Discrete Multi-View Projection Clustering

Linlin Ma

^1,†,

Haomin Li

^2,†

Wenke Zang

^1,*

Xincheng Liu

¹ and

Minghe Sun

School of Business, Shandong Normal University, Jinan 250014, China

Haide College, Ocean University of China, Qingdao 266100, China

Carlos Alvarez College of Business, The University of Texas at San Antonio, San Antonio, TX 78249, USA

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Electronics 2025, 14(4), 817; https://doi.org/10.3390/electronics14040817

Submission received: 31 December 2024 / Revised: 12 February 2025 / Accepted: 17 February 2025 / Published: 19 February 2025

(This article belongs to the Special Issue Emerging Distributed/Parallel Computing Systems)

Download

Browse Figures

Figure 1
Flowchart of the TUDMPC algorithm. "> Figure 2
Visualization of the affinity matrices of the MSRC_v1 dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC. "> Figure 3
Visualization of the affinity matrices of the NGs dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC. "> Figure 4
Visualization of the affinity matrices of the 100leaves dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC. "> Figure 5
Visualization of the HW2sources dataset. (a) SC. (b) Co-regMSC. (c) AWP. (d) MCGC. (e) MVGL. (f) TUDMPC. "> Figure 6
Visualization of the MSRC_v1 dataset. (a) SC. (b) Co-regMSC. (c) AWP. (d) MCGC. (e) MVGL. (f) TUDMPC. "> Figure 7
Some face images from the ORL dataset (10 × 10). "> Figure 8
Some handwritten digital images from the HW dataset (10 × 50). "> Figure 9
Some face image recognition results of TUDMPC on the ORL dataset (10 × 10). "> Figure 10
Some handwritten digital image recognition results of TUDMPC on the HW dataset (10 × 50). "> Figure 11
Results of the ablation experiments on the 100leaves dataset. "> Figure 12
Convergence on some datasets. (a) MRSC_v1. (b) HW2source. (c) ORL. (d) 100leaves. "> Figure 13
Sensitivity analysis on different datasets as parameters <math display="inline"><semantics> <mi>α</mi> </semantics></math> and k change. (a) MSRC_v1. (b) HW2sources. (c) 100leaves. (d) NGs. (e) ORL. "> Figure 14
Results of the Nemenyi Test. (a) ACC. (b) NMI. (c) Purity. ">

Versions Notes

Abstract

Multi-view graph clustering (MVGC) utilizes affinity graphs to efficiently obtain information between views. Although various excellent MVGC methods have been proposed, they still have many limitations. To surmount these limitations, this work develops a novel tensor-based unified and discrete multi-view projection clustering (TUDMPC) approach. Specifically, TUDMPC uses projection and the

L_{2, 1}

-norm for feature selection to reduce the effects of redundancy and noise. Meanwhile, the differences among similar graphs are minimized through the tensor kernel norm to better leverage information across views and capture high-order correlations. In addition, the rank constraint is applied to keep the affinity graphs with a discrete cluster structure, and the clustering results are obtained directly in a unified joint framework. Finally, an efficient optimization algorithm is proposed to obtain the clustering results. Experiments are conducted to compare the clustering results of TUDMPC with seven baseline methods. The results show that TUDMPC outperforms the existing methods.

Keywords:

multi-view clustering; tensor kernel norm; projection learning; graph learning

1. Introduction

For the majority of applications, the collected data are from various sources with different formats. For example, a website can include images, audio, and other types of data in various ways, and may be considered multi-view data. Consensus cluster structures among different view data can be found by multi-view clustering (MVC) methods. Different views of data may have consistent information in a specific underlying data structure. Capturing information on inter-view consistency and complementarity is one of the hotspots in multi-view clustering studies.

Graphs are a crucial data structure for visually displaying the connections of data. Multi-view graph clustering (MVGC), with outstanding performance and widespread applications in numerous industries [1], can use affinity graphs to efficiently obtain consistent and complementary information [2].

Numerous MVGC approaches have demonstrated promising performance. Kumar et al. [3], by developing Co-regMSC, used feature factorization to obtain latent embedding, learn affinity graphs by reducing the differences in low-dimensional latent embedding of various views, and achieve decent performance. However, Co-regMSC does not distinguish the weights of various views. Nie et al. [4] proposed to distribute weights to various views in an adaptable manner. However, this approach uses preset similarity graphs, and, hence, its clustering performance is heavily influenced by the initial graphs. Jing et al. [5] used hypergraph embedding to select important features while fusing all views in the Grassmann space. However, these approaches still have the following two limitations. (1) Learning affinity graphs in high-dimensional spaces with redundant and high-dimensional features can lead to the curse of dimensionality and feature spoofing problems. (2) Most affinity graphs can only consider the shared information but cannot capture the higher-order correlations, making the generated affinity matrix poor-quality.

To address the high dimensionality issue, a large number of dimensionality reduction techniques [6,7,8] have been applied to MVC. Yuan et al. [9] presented an unsupervised feature selection method that simultaneously removes irrelevant features. Gao et al. [8] transformed high-dimensional data into a subspace using a projection matrix but did not care about the local structure and complementarity. To use higher-order and complementary information, researchers presented the tensor kernel norm methods [10,11,12,13,14]. Wu et al. [15] learned uniform graph and low-rank tensor, which reduces the dimensionality of data and retains higher-order information, but does not perform feature selection for dimensionality reduction, with performance susceptible to noise. Liu et al. [14] used tensor constraints to capture the global structure of views under deep learning.

High-order information among different views can be captured by tensors. The existing tensor-based methods do not consider redundant information, and most of them do not directly obtain clustering results during optimization but use other clustering algorithms subsequently, such as K-means or spectral clustering, to obtain the results. In order to perform feature selection and structure preservation in a unified framework and capture both complementary and higher-order information without subsequent steps to obtain clustering results, this study proposes a brand-new MVC approach called tensor-based unified and discrete multi-view projection clustering (TUDMPC). As depicted in Figure 1, the approach uses projection to learn low-dimensional spatial features and uses the

L_{2, 1}

-norm for feature selection to eliminate the effects of redundancy and noise. Meanwhile, the divergence between similar graphs is minimized by a tensor kernel norm to better utilize the high-order correlations and complementary information of related graphs with different views. In addition, the Laplace rank constraints are imposed to ensure that the affinity graphs have discrete cluster structures and that the clustering results are obtained directly in a unified framework. This work makes the following major contributions:

The high-dimensional data in the view are mapped to a low-dimensional potential space by projection learning to reduce the complexity of the method and to avoid dimensional catastrophes. Meanwhile, the $L_{2, 1}$ -norm is utilized for feature selection to remove the problem of outliers and redundant data on the clustering. The use of post-fusion in the low-dimensional space can adaptively and more accurately learn affinity graphs while maintaining the popular structure.
The proposed TUDMPC minimizes the differences between views by using the tensor kernel norm, which makes excellent use of complementarity and captures the high-order correlations.
The affinity graph generated from TUDMPC representing the clustering structure and the clustering results can be obtained in a unified framework without subsequent processing.
An efficient iterative algorithm is developed to implement TUDMPC. The experimental results show that TUDMPC outperforms some of the baseline methods using datasets from the literature and from online websites.

The rest of this paper is structured as follows. In Section 2, related work is briefly reviewed. The details of TUDMPC, along with its background knowledge, are described in Section 3. The optimized iterative algorithm is described in Section 4. Experimental experiments are reported in Section 5. Conclusions are provided in Section 6.

2. Related Works

Numerous MVC methods have been developed, which can be generally classified into graph-based [1,3,16,17,18,19,20,21,22,23,24], subspace-based [4,25,26,27,28,29,30], and matrix decomposition- based [6,14,31,32,33].

The graph-based methods use graph topology to construct consistent similarity matrices. Wang et al. [20] advocated the learning of each view-specific and consistent affinity matrices in a mutually reinforcing approach. Xia et al. [21] exploited an MVC network that facilitates joint self-supervised learning and blocks the diagonal representation of data. Ren et al. [18] introduced the low-dimensional space with energy-conserving properties into clustering, using energy-preserving feature projection-based techniques to solve high-dimensional and corrupted data problems. Sang et al. [22] suggested an automatic weighted multi-view projection clustering approach that allows simultaneous manifold learning, dimensionality reduction, and consistent graph learning. Zhao et al. [23] extracted local information of binary codes to obtain the best clustering results by orthogonalizing the mapping matrix to remove redundant data and embedding bipartite graphs into a unified clustering framework.

Multi-view clustering based on matrix factorization tries to find the subspace where most of the data points are located by factorizing the original data as a product of two non-negative, including the base and the coefficient, matrices and utilizing the coefficient matrix to acquire the clustering results. Deng et al. [31] presented unconstrained non-negative matrix factorization guided multi-view clustering without non-negative restrictions by employing mapping functions that meet the non-negativity requirement, which allows extending the optimization method and employing learning rates to guide the optimization. Liu et al. [32] linked matrix factorization with probabilistic latent semantic analysis to improve clustering performance. Shi et al. [33] reconstructed non-negative and orthogonal graphs and updated the soft label matrix to create a superior similarity matrix.

Multi-view subspace clustering approaches consider that the data are not distributed uniformly but are distributed in the underlying subspace, and thus the clustering structure can be found in the potential low-dimensional subspace. Lv et al. [28] proposed partition fusion, where multiple partitions are generated and all partitions are merged into one common partition. Chen et al. [34] presented the diversity-embedded depth matrix decomposition for multi-view clustering, which uses diversity loss for the depth matrix decomposition and reduces redundant features. Fu et al. [35] used tensor to obtain high-order information and fuse multiple subspaces over the Grassmann manifold and additionally applied the rank constraint on the consensus affinity matrix to gain clustering results directly in a unified subspace.

3. TUDMPC and Its Preparation

The details of TUDMPC are described in this section. The main notations used in the methods are introduced, the motivation of the developed method is briefly presented, and the method is finally described in detail.

3.1. Notations

In the following, italic letters represent scalars, boldface lowercase letters represent vectors, and boldface capital letters represent matrices. Furthermore, boldface script letters represent third-order tensors,

{∥A∥}_{2.1} = \sum_{j} {∥A (:, j)∥}_{2}

denotes the

L_{2, 1}

-norm of the matrix

A

and

{∥A∥}_{F}

represents the Frobenius norm of the matrix

A

. Additional notations are provided in Table 1.

3.2. Motivation

Most of the collected data in real life are high-dimensional and involve plenty of noise and redundant information due to the acquisition equipment and other factors. Therefore, poor clustering results may be obtained if the affinity map is generated directly from these raw data [22]. Although the various views provide consistent and complementary information [2], many multi-view clustering methods only generate consistent structural graphs by considering only consistent information but ignoring the complementary information. Additionally, multi-view clustering using the projection matrix for dimensionality reduction ignores the high-order correlation, thus making the generated similarity matrix less accurate and affecting the clustering performance. Therefore, the proposed TUDMPC takes the information complementarity and high-order correlation between views into full consideration in dimensionality reduction and noise removal and ensures that the affinity graph has a discrete clustering structure by introducing the Laplacian matrix rank constraint, and directly obtaining the clustering results in a unified framework without subsequent processing.

3.3. The TUDMPC Method

3.3.1. MVGC

Let

X^{(1)}, \dots, X^{(V)}

denote the multi-view data containing V views and

X^{(v)} = [x_{1}^{(v)}, \dots, x_{n}^{(v)}]

\in R^{d_{v} \times n}

x_{i}^{(v)} \in R^{d_{v} \times 1}

represent data point i in view

v s .

, and

d_{v}

represent the number of features. MVGC usually first constructs a similarity graph based the on original data, followed by k-means clustering or spectral clustering [22]. Therefore, the goodness of the affinity matrix constructed for use as an input to subsequent spectral clustering influences the final performance. To resolve this issue, Wang et al. [36] preserved the local manifold structure in the following way:

\begin{matrix} \min_{S^{(v)}} \sum_{v = 1}^{V} \sum_{i, j = 1}^{n} {∥x_{i}^{(v)} - x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + γ {∥S^{(v)}∥}_{F}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix},

(1)

where

S^{(v)} \in R^{n \times n}

is the affinity matrix, and

s_{i j}^{(v)}

represents the element of its row i column j.

3.3.2. Projection Learning

The purpose of using the projection matrix is to reduce the data dimension and to effectively improve the computational time complexity [37]. Based on the projection matrix, a

L_{2, 1}

-norm constraint is used for feature selection and noise removal [22] to maintain compactness. The following model is used to construct the projection matrix:

\begin{matrix} \min_{S^{(v)}, Z^{(v)}} \sum_{v = 1}^{V} \sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α {∥Z^{(v)}∥}_{2, 1} + γ {∥S^{(v)}∥}_{F}^{2} \\ s . t . S^{(v)} 1 = 1, s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I \end{matrix},

(2)

where

α

and

γ

are the parameters, and

Z^{(v)} \in R^{d_{v} \times m_{v}}

with

m_{v} \leq d_{v}

3.3.3. Tensor Kernel Norm

Given a third order tensor

C \in R^{n 1 \times n 2 \times n 3}

, let

C^{(i)}

denote the frontal slice i of

C

, and

\bar{C}

denote the discrete fast Fourier transform (FFT) of

C

along the three dimensions, i.e.,

\bar{C} = fft (C, [], 3)

. Thus,

C = fft (\bar{C}, [], 3)

Definition 1

(tensor Kernel Norm [38]). Given

C \in R^{n 1 \times n 2 \times n 3}

, its nuclear norm is given by

{∥C∥}_{*} = \sum_{i = 1}^{n 3} {∥\bar{C^{(i)}}∥}_{*} = \sum_{i = 1}^{n 3} \sum_{j = 1}^{\min (n 1, n 2)} σ_{j} (\bar{C^{(i)}}) .

(3)

The tensor kernel norm enables efficient acquisition of higher-order correlation information.

3.3.4. Rank Constraint

Multi-view clustering methods that use Laplace rank constraints to directly obtain clustering results can avoid the problems of being trapped in locally optimal solutions associated with the subsequent use of clustering methods such as K-means or spectral clustering. The direct implementation of multi-view clustering through the Laplace rank constraints essentially embeds the clustering step into the optimization process of representation learning, rather than treating it as an independent post-processing step. This approach outperforms the step-by-step strategy in both theoretical consistency and stability of results and is especially suitable for clustering complex multi-view data.

Lemma 1

([39]). The connective part of a matrix

S

has the same number of zero eigenvalues as its Laplacian matrix

L_{s}

, where

L_{s} = D - (S + S^{T}) / 2

D

is a diagonal matrix whose elements

D_{i i}

are given by

\sum_{j = 1}^{n} (s_{i j} + s_{i j}^{T}) / 2

, and

s_{i j}

is the element of

S

in row i and column j.

Lemma 1 shows that the Laplacian matrix

L_{s}

with zero eigenvalues has c weights, and has

rank (L_{s}) = n - c

. Because

rank (L_{s})

is difficult to compute, the condition is relaxed based on the work of Hao et al. [36] to obtain the following:

rank (L_{s}) = n - c = \sum_{i = 1}^{c} σ_{i} (L_{s}),

(4)

where

σ_{i} (L_{s})

is the ith smallest eigenvalue of matrix

L_{s}

Theorem 1

([40]). If

H \in R^{n \times n}

is a real symmetric matrix, then:

\begin{matrix} min_{f_{i}^{T} f_{j} = {_{0, i \neq j}^{1, i = j}} \sum_{i = 1}^{c} f_{l}^{T} H f_{l} = min_{F^{T} F = I} tr (F^{T} HF) \\ = λ_{1} + \dots + λ_{c} \\ = \sum_{l = 1}^{c} a_{l}^{T} H a_{l} \\ = tr (Θ^{T} H Θ) \end{matrix},

(5)

where

Θ = [a_{1}, a_{2} \dots, a_{c}]

and

λ_{1} \leq \dots \leq λ_{n}

are the n orthonormal eigenvectors and n eigenvalues of

H

, respectively.

According to Theorem 1, the following can be obtained:

\sum_{j = 1}^{c} σ_{j} (L_{s}) = \min_{F^{T} F = I} tr (F^{T} L_{s} F),

(6)

where

F = [f_{1}, \dots, f_{c}] \in R^{n \times c}

Therefore, combining the tensor kernel norm, and (1), (2), (3) and (6), the TUDMPC model is stated as follows:

\begin{matrix} \min_{S^{(v)}, Z^{(v)}, F} \sum_{v = 1}^{V} \sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α {∥Z^{(v)}∥}_{2, 1} + {∥S∥}_{*} \\ + γ {∥S^{(v)}∥}_{F}^{2} + β tr (F^{T} L_{s} F) \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1, {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I, F^{T} F = I \end{matrix},

(7)

with

S (:, v, :) = S^{(v)}

. The TUDMPC model allows direct access to the clustering results.

4. Optimization and Complexity Analysis

4.1. Optimization

To obtain the optimal solution of (7), instead of using heuristic optimization procedures, an iterative alternating method is used to transform the constrained problem into non-constrained sub-problems. An iterative solution algorithm is developed to find exact solutions because heuristic procedures can only find quick and approximate solutions. Model (7) is modified by adding the auxiliary variable

G

to become:

\begin{matrix} L = \sum_{v = 1}^{V} (\sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + γ {∥S^{(v)}∥}_{F}^{2} + {∥G∥}_{*} + α {∥Z^{(v)}∥}_{2, 1}) \\ + β tr (F^{T} L_{s} F) + 〈Y, S - G〉 + \frac{μ}{2} {∥S - G∥}_{F}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1, {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I, F^{T} F = I \end{matrix},

(8)

where

Y

is the tensor of Lagrange multipliers, and

μ > 0

is a penalty coefficient.

Thus, the optimization algorithm is composed of the following four modules.

4.1.1. Updating $Z^{(v)}$ by Fixing $S^{(v)}$ , $F$ and $G$

When

Z^{(v)}

is updated, the other variables are fixed:

\begin{matrix} min \sum_{v = 1}^{V} (\sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α {∥Z^{(v)}∥}_{2, 1}) \\ s . t . {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I \end{matrix} .

(9)

This process is discussed for only one of the views as each is independent:

\begin{matrix} min \sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α {∥Z^{(v)}∥}_{2, 1} \\ s . t . {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I \end{matrix},

(10)

where

{∥Z^{(v)}∥}_{2, 1} = 2 Tr ((Z^{(v)})^{T} W^{(v)} Z^{(v)})

, and

w_{i i}^{(v)}

is the value of the diagonal element of

W^{(v)}

with

w_{i i}^{(v)} = {1 / 2 ∥Z^{(v)} (i, :)∥}_{2}

. Thus, (10) can be written as the following:

\begin{matrix} min \sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α Tr ((Z^{(v)})^{T} W^{(v)} Z^{(v)}) \\ s . t . {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I \end{matrix} .

(11)

Model (11) is solved by the Lagrange multiplier method with the Lagrangian as shown in (12) as follows:

L = Tr ((Z^{(v)})^{T} X^{(v)} L_{s}^{(v)} {(X^{(v)})}^{T} Z^{(v)}) + α Tr ((Z^{(v)})^{T} W^{(v)} Z^{(v)}) - Tr (ψ ((Z^{(v)})^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} - I)),

(12)

where

ψ

is a diagonal matrix of the Lagrange multipliers.

The solution approach of Sang et al. [22] can be used to find a solution of (12).

4.1.2. Updating $S^{(v)}$ by Fixing $Z^{(v)}$ , $F$ and $G$

The sub-problem with

S^{(v)}

as the variables can be written as:

\begin{matrix} \min_{S^{(v)}} \sum_{v = 1}^{V} (\sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + γ {∥S^{(v)}∥}_{F}^{2}) \\ + β tr (F^{T} L_{s} F) + 〈Y, S - G〉 + \frac{μ}{2} {∥S - G∥}_{F}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix} .

(13)

The problem (13) can be rewritten as:

\begin{matrix} \min_{S^{(v)}} \sum_{v = 1}^{V} (\sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + γ {∥S^{(v)}∥}_{F}^{2}) \\ + β tr (F^{T} L_{s} F) + \frac{μ}{2} {∥S^{(v)} - G^{(v)} + \frac{Y^{(v)}}{μ}∥}_{F}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix} .

(14)

With

d_{i j}^{v} = {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2}

f_{i j}^{v} = \frac{β}{V} {∥f_{i} - f_{j}∥}_{2}^{2}

and

E^{(v)} = G^{(v)} - \frac{1}{μ} Y^{(v)}

, and

e_{i}^{(v)} \in R^{1 \times n}

as the row vector of

E^{(v)}

, the model in (14) for view

v s .

is transformed into (15)

\begin{matrix} min \sum_{j = 1}^{n} d_{i j}^{v} s_{i j}^{(v)} + γ {∥s_{i}^{(v)}∥}_{2}^{2} + f_{i j}^{v} s_{i j}^{(v)} + \frac{μ}{2} {∥s_{i}^{(v)} - e_{i}^{(v)}∥}_{2}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix},

(15)

where

s_{i}^{(v)} \in R^{1 \times n}

is the row vector of

S^{(v)}

Let

o_{i j}^{v} = d_{i j}^{v} + f_{i j}^{v}

, then (15) is rewritten as

\begin{matrix} min \sum_{j = 1}^{n} o_{i j}^{v} s_{i j}^{(v)} + γ {∥s_{i}^{(v)}∥}_{2}^{2} + \frac{μ}{2} {∥s_{i}^{(v)} - e_{i}^{(v)}∥}_{2}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix} .

(16)

The model in (16) can be rewritten as:

\begin{matrix} min {∥s_{i}^{(v)} + \frac{o_{i}^{v}}{2 γ}∥}_{2}^{2} + \frac{μ}{2 γ} {∥s_{i}^{(v)} - e_{i}^{(v)}∥}_{2}^{2} \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1 \end{matrix},

(17)

where

o_{i}^{v} \in R^{1 \times n}

is the row vector composed of

o_{i j}^{v}

The Lagrangian corresponding to (17) is given as follows

L (s_{i}^{(v)}, η, δ) = {∥s_{i}^{(v)} + \frac{o_{i}^{v}}{2 γ}∥}_{2}^{2} + \frac{μ}{2 γ} {∥s_{i}^{(v)} - e_{i}^{(v)}∥}_{2}^{2} - η (s_{i}^{(v)} 1 - 1) - s_{i}^{(v)} δ,

(18)

where

η \geq 0

δ \in R^{n \times 1}

and

δ \geq 0

are the Lagrange multipliers.

After deriving the partial derivatives of

L (s_{i}^{(v)}, η, δ)

in (18) with respect to

s_{i}^{(v)}

and setting them to 0, the following is obtained

2 (s_{i}^{(v)} + \frac{o_{i}^{v}}{2 γ}) + \frac{μ}{γ} (s_{i}^{(v)} - e_{i}^{(v)}) - η 1^{T} - δ^{T} = 0 .

(19)

By the KKT condition,

s_{i}^{(v)} δ_{j} = 0

holds. Therefore, the following updated solution for

s_{i j}^{(v)}

is obtained:

s_{i j}^{(v)} = \{\begin{matrix} {(\frac{μ e_{i j}^{(v)} - o_{i j}^{v} + η γ}{2 γ + μ})}_{+}, & i \neq j \\ 0, & i = j \end{matrix},

(20)

where

e_{i j}^{(v)}

is element j of

e_{i}^{(v)}

To simplify the calculation, the k-nearest neighbor approach [22] is employed. Sorting from the smallest to the largest gives

s_{i k}^{(v)} > 0

, and

s_{i k + 1}^{(v)} = 0

, and the results in (20) become:

- o_{i k}^{v} + μ e_{i k}^{(v)} + γ η > 0, - o_{i k + 1}^{v} + μ e_{i k + 1}^{(v)} + γ η = 0 .

(21)

Given

s_{i}^{(v)} 1 = 1

, the following can be obtained from (20):

η = \frac{1}{k} (2 + \frac{μ}{γ} + \sum_{m = 1}^{k} \frac{o_{i m}^{v}}{γ} - \sum_{m = 1}^{k} \frac{μ}{γ} e_{i m}^{(v)}) .

(22)

Combining (21) and (22), the following holds:

γ = \frac{1}{2} (k o_{i k + 1}^{v} - k μ e_{i k + 1}^{(v)} - μ - \sum_{m = 1}^{k} (o_{i m}^{v} - μ e_{i m}^{(v)})) .

(23)

By (20) and (23), the updated solution of

s_{i j}^{(v)}

can be rewritten as:

s_{i j}^{(v)} = \{\begin{matrix} \frac{μ e_{i j}^{(v)} - o_{i j}^{v} + o_{i k + 1}^{v} - μ e_{i k + 1}^{(v)}}{k (o_{i k + 1}^{v} - μ e_{i k + 1}^{(v)}) + \sum_{t = 1}^{k} (μ e_{i t}^{(v)} - o_{i t}^{v})}, & i \leq k \\ 0, & i > k \end{matrix} .

(24)

4.1.3. Updating $G$ by Fixing $S^{(v)}$ , $F$ and $Z^{(v)}$

The update of

G

can be written as:

\begin{matrix} min {∥G∥}_{*} + 〈Y, S - G〉 + \frac{μ}{2} {∥S - G∥}_{F}^{2} \\ = min \frac{1}{μ} {∥G∥}_{*} + \frac{1}{2} {∥S + \frac{Y}{μ} - G∥}_{F}^{2} \end{matrix} .

(25)

Using the results of Hu et al. [41], the solution to problem (25) is

G^{*} = U Γ_{e} [Σ] V^{T},

(26)

where

U Σ V^{T}

is the singular value decomposition of the matrix

G

4.1.4. Updating $F$ by Fixing $S^{(v)}$ , $Z^{(v)}$ and $G$

The update of

F

can be written as below:

\begin{matrix} min β tr (F^{T} L_{s} F) \\ s . t . F^{T} F = I \end{matrix} .

(27)

The matrix

F

consists of c eigenvectors corresponding to the c smallest eigenvalues of

L_{s}

. Then, c clusters can be identified depending on the connection of graph

S

with

S = \sum (S^{(v)} + {(S^{(v)})}^{T})

. Let

ς

denote the number of eigenvalues of the Laplacian matrix

L_{s}

that equals 0 in each iteration. In particular, update

β

β = β \times 2

when

ς < c

or to

β = β / 2

when

ς > c + 1

. Except for the above two conditions, the search process ends.

The steps of the iterative alternating method, i.e., the TUDMPC algorithm, is outlined in Algorithm 1.

Algorithm 1 Steps in the TUDMPC algorithm

Require: Multi-view datasets

X^{(v)}

for

v = 1, 2 \dots, V

, the projection dimension

m_{v}

for

v = 1, 2 \dots, V

, and the parameters

α

β

and k.

Ensure: Graph

S

with c-connected components.

1:: Initialize $S^{(v)}$ via (1)
2:: Initialize $Z^{(v)}$ using the approach proposed by Sang et al. [22];
3:: Initialize $G$ via (26);
4:: while the convergence criteria are not satisfied do
5:: Update $S^{(v)}$ using (24);
6:: Update $Z^{(v)}$ by (11);
7:: Update $G$ via (26);
8:: Update $F$ by (27);
9:: $Y = Y + μ (S - J)$ ;
10:: $μ = \min (ρ μ, μ_{\max})$ ;
11:: end while
12:: Obtain the c clusters directly from graph $S = \sum_{v = 1}^{V} (S^{(v)} + {(S^{(v)})}^{T})$ .

4.2. Complexity Analysis

Four modules, i.e., the updating of

S^{(v)}

G

Z^{(v)}

and

F

, respectively, are included in the TUDMPC algorithm to solve (7). Specifically, the complexities of updating

S^{(v)}

Z^{(v)}

F

and are

O (V n k)

O (V m_{v} {d_{v}}^{2})

O (c n^{2})

and

O (n^{2} V l o g n + n^{2} {m_{v}}^{2})

, respectively. Overall, the complexity of the TUDMPC algorithm is

O (t (V n k + V m_{v} {d_{v}}^{2} + c n^{2} + n^{2} V l o g n + n^{2} {m_{v}}^{2})) \approx O (t n^{2} (c + V l o g n + {m_{v}}^{2}))

, where t is the number of iterations.

5. Numerical Experiments and Analysis of Results

Six datasets are used in the experiments to test the performance of TUDMPC, and seven baseline algorithms are used for comparison. Meanwhile, five evaluation metrics are selected to assess the performance of TUDMPC and other methods. Furthermore, the convergence of the TUDMPC algorithm is examined. Finally, the sensitivity of the performance of the TUDMPC algorithm is analyzed as the parameter values change.

5.1. Experimental Setup

Datasets. The effectiveness of TUDMPC is tested on six datasets. Basic details about the datasets used are given in Table 2.

MSRC-v1: This dataset contains 20 groups of images of objects, each with about 100 images. It is mainly used for object recognition and image segmentation tasks. The dataset is suitable for multi-view clustering and image classification, where each image symbolizes a different “perspective”.
HW2sources: This is a handwritten digital recognition dataset, which contains handwritten digital samples from multiple perspectives and comes from different writings. This dataset provides diversity and different writing styles and is suitable for testing the performance of clustering algorithms on non-uniform data.
100leaves: This dataset contains 100 different types of leaf images, and each leaf has multiple images, which are mainly used for plant classification. It can help evaluate the effectiveness of multi-view learning in natural image data.
NGs: This dataset contains different levels of detail of natural images and is suitable for exploring fine-grained object classification. Multi-view provides more information and is suitable for testing the performance of multi-view clustering algorithms in different details.
Hdigit: Similar to HW2sources, this dataset is also about handwritten numbers but comes from different recording methods or devices. It provides more sample variability for handwritten digital recognition and can test the stability of clustering algorithms.
ORL: The ORL dataset contains 400 face images of 40 different people, which includes relevant information in expressions, facial ornaments and minute gestures, and is often used in face recognition studies. The different features of the face images in the ORL dataset are extracted by using four feature extraction methods including Generalized Search Tree (GIST), Local Binary Pattern (LBP), Histogram of Orientation Gradients (HOG) and Gradient Energy Norm Tensor (CENT), which are considered as four views. The obtained feature dimensions of the four views are 512, 59, 864, and 254, respectively.

Baseline methods. The performance of the TUDMPC is compared with those of the following seven baseline methods: SC [42], Co-regMSC [3], MVGL [43], MCGC [44], AWP [45], GMC [20] and SFMC [46].

Evaluation metrics. The evaluation metrics used to measure the performance of the clustering methods include accuracy (ACC), purity, normalized mutual information (NMI), recall and F-score. The value of each of these five evaluation metrics is in the interval of [0, 1], and the larger the value of each metric, the better the clustering results. To reduce the effect of randomness, each method is run on each dataset 20 times independently, and the mean and standard deviation are calculated and reported for each method. The results are reported in the following tables and are also visualized in the following figures. The best mean value for each assessment metric is highlighted, and the next best mean value is italicized for each dataset.

Parameter setting. Some parameters must be specified beforehand. Specifically,

k = 10

is used for the baseline methods using the KNN method. In addition, the affinity graph of SFMC is a bipartite graph, so the anchor point scale is set to

0.5

as described in Li et al. [46] where SFMC was originally proposed. For all baseline methods used in the experiments, the parameter settings proposed in the original articles publishing the corresponding methods are used.

Table 2. Basic information of the different databases.

Datasets	Instances (n)	Views (V)	Clusters (c)	Dimensions
MSRC-v1 [47]	210	5	7	24/576/512/256/254
HW2sources [36]	2000	2	10	784/256
100leaves ¹	1600	3	100	64/64/64
NGs ²	500	3	5	2000/2000/2000
Hdigit ³	10,000	2	10	784/256
ORL [48]	400	4	40	512/59/864/254

¹ https://archive.ics.uci.edu/dataset/241/one+hundred+plant+species+leaves+data+set, accessed on 12 May 2023; ² http://lig-membres.imag.fr/grimal/data.html, accessed on 12 May 2023; ³ https://cs.nyu.edu/home/people/in_memoriam/roweis/data.html, accessed on 12 May 2023.

5.2. Analyses of Experimental Results

The clustering results measured in the five evaluation metrics for all methods are reported in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8. The findings below are drawn from these results.

For the MRSC_v1 dataset, as shown in Table 3, TUDMPC has the best performance of all the methods. For example, the mean ACC of TUDMPC is

28.09 %

12.07 %

17.62 %

17.14 %

18.09 %

17.62 %

, and

14.28 %

higher, and the mean NMI of TUDMPC is

28.82 %

12.78 %

9.45 %

12.49 %

18.24 %

7.68 %

and

5.94 %

higher, than those of SC, Co-regMSC, MVGL, AWP, MCGC, GMC and SFMC, respectively. The results of the HW2sources, 100leaves and Hdigit datasets are presented in Table 4, Table 7 and Table 8 and have very similar patterns.

The results for the high-dimensional NGs dataset, as presented in Table 5, illustrate that TUDMPC also obtained better clustering results than the baseline methods. The main reason is that the proposed TUDMPC reduced the dimensions by using the projection matrix and using the

L_{2, 1}

-norm for feature learning to remove noise and redundant information when learning the similarity matrix on clean and low-dimensional data, while the baseline methods learned the initial affinity map directly from the original data.

As the results of the ORL dataset in Table 6 illustrate, TUDMPC achieved the optimal result for the ACC, NMI, F-score and recall indicators, although it did not achieve the best results in purity. Overall, TUDMPC has better performance than the baseline method.

One of the main reasons for the proposed TUDMPC to outperform the SC method on all the datasets is that the proposed TUDMPC uses tensor to take into account the complementary information and preserves the local manifold structure. This result shows the success of the proposed approach in enhancing the functionality of MVC.

These results provide evidence to validate the performance and effectiveness of TUDMPC. Two main reasons contribute to the good performance of TUDMPC. (1) Projection learning is utilized to map high-dimensional data to a space of lower dimension to reduce method complexity, prevent dimensional curses, and lessen the effects of noise and redundancy. (2) The high-order correlations and complementarities included in the various views are effectively utilized by the application of the tensor kernel norm.

To visualize this effect, the affinity matrices are visualized for the datasets MRSC_v1, NGs and 100leaves in Figure 2, Figure 3 and Figure 4, respectively. The number of diagonal blocks in the figure is the number of clusters obtained by running the corresponding algorithm, and the rest of the surrounding blue highlights are unclassified data points. As shown in these figures, MCGL can obtain the block diagonal structure, which is not clear and the number of obtained diagonal blocks is not equal to the total number of clusters. Although MVGL, GMC and SFMC can obtain the right number of diagonal blocks, like MCGC, many data points are clustered around these blocks resulting from noisy and redundant information. By contrast, TUDMPC not only obtained the correct diagonal blocks but also fully captured the complementary information between views using the tensor kernel norm, thereby making the clustering structure clearer.

The t-distribution random neighborhood embedding (t-SNE) [49] is also used to visualize the diagonal block distribution of affinity matrices to assess the clustering performance. As examples, the affinity matrices of the HW2sources and MSRCv1 datasets, with 10 and 7 clusters, respectively, are shown in Figure 5 and Figure 6, where each number in the legend represents a cluster. Both SC and Co-regMSC use Gaussian functions to construct view-specific similarity maps [22]. As shown in these figures, SC cannot obtain a clear cluster structure because it does not leverage the complementary information. Compared to the baseline methods, the similarity matrices obtained by TUDMPC can reveal clearer cluster structures with fewer incorrectly clustered data points.

5.3. More Applications

Section 5.2 demonstrated the clustering performance of TUDMPC using five metrics as compared with seven baseline methods using six real datasets. To prove the usefulness of TUDMPC, the clustering results of TUDMPC on two more real datasets are reported in this subsection. The ORL face image dataset, as used above, and the HW handwritten digital image dataset are used to verify the effectiveness of TUDMPC on real image data.

5.3.1. Datasets

The HW dataset contains handwritten digital images of the numbers 0 to 9, which are taken from a map of Dutch utilities available in the UCI Repository [50]. There are 200 samples for each digit for a total of 2000 handwritten patterns. In this work, the original image features are presented in six feature views, including 76-dimensional character shapes, 216-dimensional contour-dependent Fourier coefficients, 64-dimensional Karhunen-Love coefficients, 47-dimensional Zernike moments, 240-dimensional pixel averages in a 2 × 3 window, and 6-dimensional morphology features, i.e., feature dimensions for the six views are 76, 216, 64, 47, 240 and 6, respectively.

The ORL dataset is described in Section 5.1. Some of the actual images of the above two datasets are shown in Figure 7 and Figure 8.

5.3.2. Analysis of the Application Results

The visual recognition results of TUDMPC for the first 100 face images of the ORL dataset are presented in Figure 9, where images in the same cluster are identified by the same color. As Figure 9 shows, TUDMPC misclassified only two groups of face images, i.e., images 12 and 20 of the first row and images 1, 4, 5, 7 and 9 of the fourth row, out of the 10 groups of face images presented, and classified the rest of the groups correctly. Thus, the face image recognition performance of TUDMPC is satisfactory.

The visual recognition results of TUDMPC for the 500 handwritten digital images from the HW dataset are presented in Figure 10, where the images in the same cluster are identified by the same color. From Figure 10, TUDMPC misclassified 28 out of the presented 500 handwritten digital images and classified the remaining 472 handwritten digital images correctly. Thus, TUDMPC has excellent performance in digital image recognition.

5.4. Ablation Experiments

The ablation experiments are used to verify the validity of the projection matrix and tensor. Specifically, the projection matrix

Z^{(v)}

is removed from the original model and obtained as TUDMPC1, and the tensor is removed from the original model and obtained method is designed as TUDMPC2. TUDMPC1 and TUDMPC2 are shown in (28) and (29) below:

\begin{matrix} \min_{S^{(v)}, F} \sum_{v = 1}^{V} \sum_{i, j = 1}^{n} {∥x_{i}^{(v)} - x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + {∥S∥}_{*} + γ {∥S^{(v)}∥}_{F}^{2} + β tr (F^{T} L_{s} F) \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1, F^{T} F = I \end{matrix} .

(28)

\begin{matrix} \min_{S^{(v)}, Z^{(v)}, F} \sum_{v = 1}^{V} \sum_{i, j = 1}^{n} {∥{(Z^{(v)})}^{T} x_{i}^{(v)} - {(Z^{(v)})}^{T} x_{j}^{(v)}∥}_{2}^{2} s_{i j}^{(v)} + α {∥Z^{(v)}∥}_{2, 1} + γ {∥S^{(v)}∥}_{F}^{2} + β tr (F^{T} L_{s} F) \\ s . t . s_{i i}^{(v)} = 0, 0_{n} \leq S^{(v)} \leq 1_{n}, S^{(v)} 1 = 1, {(Z^{(v)})}^{T} X^{(v)} {(X^{(v)})}^{T} Z^{(v)} = I, F^{T} F = I \end{matrix} .

(29)

The performances measured in ACC and NMI of the TUDMPC, TUDMPC1 and TUDMPC2 on the 100leaves dataset are given in Figure 11. From the figure, the projection learning and the tensor learning can be seen to facilitate MVC. Especially, projection learning can significantly enhance the clustering performance by capturing the structure of the data in a low-dimensional space free of noise and redundant data.

5.5. Convergence Analysis

To show the convergence more clearly and to verify the theoretical results, the convergences of TUDMPC on the MRSC_v1, HW2sources, ORL and 100leaves datasets are depicted in Figure 12. The objective function value gradually declines with the increase in the number of iterations and can stabilize after about 20 iterations. Therefore, TUDMPC has excellent convergence.

5.6. Sensitivity Analyses

The objective function (7) of the TUDMPC has three parameters, i.e.,

α

γ

and

m_{v}

, that need to be determined. According to (23), the search for an optimal value of

γ

is transformed to the search for the optimal value of k. Therefore, the sensitivity of the clustering performance of TUDMPC when

γ

changes can be analyzed when k changes. The value of k is empirically chosen from the interval [5,30]. It is challenging for the clustering method to find the best values for

α

for all datasets because different datasets have distinct properties. Therefore, this study uses grid search to find the best values of the regularization parameter

α

from the set

\{10^{- 2}, 10^{- 1}, 1, 10, 10^{2}, 10^{3}, 10^{4}, 10^{5}\}

. Simultaneously, according to Sang et al. [22], the value of

m_{v}

is set according to (30) in the following,

m_{v} = \{\begin{matrix} \frac{c}{2} - 1, & if d_{v} < n \\ c - 1 + d_{v} - n, & otherwise \end{matrix} .

(30)

The obtained clustering results with different parameter values are shown in Figure 13 for the evaluation metric ACC. As shown in the figure, ACC is relatively stable as

α

and k change and is insensitive to the values of

α

and k on the ORL, NGs, and HW2sources datasets. On the MSRC_v1 and 100leaves datasets, clustering performance remains stable as the value of

α

changes and varies with the value of k, and as can be seen in Figure 13, the best clustering performance is achieved on the MSRC_v1 dataset when

k = 25

, and on the 100 datasets when

k = 15

, when the clustering performance is optimal.

5.7. Statistical Tests

The Friedman test is a non-parametric statistical test that is mainly used to determine if there are statistically significant differences among multiple related populations. In this section, the Friedman test is used to compare the clustering effects of the SC, Co-regMSC, MVGL, MCGC, AWP, GMC, SFMC and TUDMPC algorithms to determine if there are any significant differences among them to further demonstrate in more depth that TUDMPC outperforms these baseline methods. Statistical tests were carried out using the ACC, NMI and purity values. The Friedman test showed significant differences in the performance among these methods at a significance level of 0.05, and the results are reported in Table 9.

As the results in the table show, the p-values are all much smaller than the significance level of 0.05. Therefore, it is proved that there are significant differences in the performance among these clustering algorithms. The Nemenyi test is used to further determine the significant differences in the performance of each pair of algorithms. The test statistic CD is calculated using (31) as follows:

\begin{matrix} C D = q_{a} \sqrt{\frac{M (M + 1)}{6 N}}, \end{matrix}

(31)

where

q_{a}

, N and M are the number of datasets, number of algorithms, and the threshold of the Tukey distribution, respectively. Since eight algorithms are used in this work, the conventional settings are M = 8, N = 5 and

q_{a}

= 2.359. Hence,

C D

= 3.6545 is obtained by using (31). Figure 14 illustrates the results of the Nemenyi test for the eight algorithms at a significance level of 0.05.

From the results in Figure 14, the performance of TUDMPC is statistically different from those of SC, MVGL, MCGC and AWP in ACC and NMI, and statistically different from those of SC, SFMC and AWP in purity. The above statistical test results show that TUDMPC outperforms the baseline methods.

6. Conclusions

A novel tensor-based unified and discrete multi-view projection clustering method (TUDMPC) is proposed. This approach uses the projection matrix to extend the high-dimensional information and store the information in a low-dimensional space, thus reducing the time processing complexity. Additionally, TUDMPC uses feature selection to lessen the impact of noise and redundancy. More accurate affinity matrices can be learned adaptively in the low-dimensional space. Meanwhile, the tensor kernel norm is used to better exploit the complementarity and the high-order correlation of the views. In addition, the rank constraint is applied to keep the affinity matrices with a discrete cluster structure, and the clustering results are directly obtained in a unified framework. According to numerical experimental results, TUDMPC is more effective than the other cutting-edge methods.

However, as described in Section 5.6, TUDMPC is a little sensitive to the parameter settings. Subsequent studies will focus on parameter-free clustering approaches. Meanwhile, deep learning can further capture the underlying data structure of the original data. Therefore, in the future, deep learning will be introduced into TUDMPC for further research.

Author Contributions

Conceptualization, L.M.; methodology, L.M.; validation, X.L. and H.L.; formal analysis, H.L.; investigation, W.Z.; writing—original draft preparation, L.M.; writing—review and editing, L.M., X.L., M.S. and W.Z.; visualization, H.L.; supervision, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jiang, T.; Gao, Q. Fast multiple graphs learning for multi-view clustering. Neural Netw. 2022, 155, 348–359. [Google Scholar] [CrossRef] [PubMed]
Si, X.; Yin, Q.; Zhao, X.; Yao, L. Consistent and diverse multi-View subspace clustering with structure constraint. Pattern Recognit. 2022, 121, 108196. [Google Scholar] [CrossRef]
Kumar, A.; Rai, P.; Daumé, H. Co-regularized multi-view spectral clustering. In Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, 12–14 December 2011; pp. 1413–1421. [Google Scholar]
Nie, F.; Li, J.; Li, X. Parameter-Free Auto-Weighted Multiple Graph Learning: A Framework for Multiview Clustering and Semi-Supervised Classification. Int. Jt. Conf. Artif. Intell. 2016, 9, 1881–1887. [Google Scholar]
Jing, P.; Su, Y.; Li, Z.; Nie, L. Learning robust affinity graph representation for multi-view clustering. Inf. Sci. 2021, 544, 155–167. [Google Scholar] [CrossRef]
Liao, S.; Gao, Q.; Yang, Z.; Chen, F.; Nie, F.; Han, J. Discriminant Analysis via Joint Euler Transform and L_2,1-norm. IEEE Trans Image Process. 2018, 27, 5668–5682. [Google Scholar] [CrossRef]
Xie, D.; Zhang, X.; Gao, Q.; Han, J.; Xiao, S.; Gao, X. Multiview Clustering by Joint Latent Representation and Similarity Learning. IEEE Trans. Cybern. 2020, 50, 4848–4854. [Google Scholar] [CrossRef]
Gao, Q.; Wan, Z.; Liang, Y.; Wang, Q.; Liu, Y.; Shao, L. Multi-view projected clustering with graph learning. Neural Netw. 2020, 126, 335–346. [Google Scholar] [CrossRef]
Yuan, H.; Li, J.; Liang, Y.; Tang, Y. Multi-view unsupervised feature selection with tensor low-rank minimization. Neurocomputing 2022, 487, 75–85. [Google Scholar] [CrossRef]
Fu, L.; Yang, J.; Chen, C.; Zhang, C. Low-rank tensor approximation with local structure for multi-view intrinsic subspace clustering. Inf. Sci. 2022, 606, 877–891. [Google Scholar] [CrossRef]
Ma, S.; Liu, Y.; Liu, G.; Zheng, Q.; Zhang, C. Orthogonal multi-view tensor-based learning for clustering. Neurocomputing 2022, 500, 592–603. [Google Scholar] [CrossRef]
Li, Z.; Tang, C.; Liu, X.; Zheng, X.; Zhang, W.; Zhu, E. Tensor-Based Multi-View Block-Diagonal Structure Diffusion for Clustering Incomplete Multi-View Data. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; pp. 1–6. [Google Scholar] [CrossRef]
Liu, Z.; Song, P. Deep low-rank tensor embedding for multi-view subspace clustering. Expert Syst. Appl. 2024, 237, 121518. [Google Scholar] [CrossRef]
Liu, Z.; Chen, Z.; Li, Y.; Zhao, L.; Yang, T.; Farahbakhsh, R.; Crespi, N.; Huang, X. IMC-NLT: Incomplete multi-view clustering by NMF and low-rank tensor. Expert Syst. Appl. 2023, 221, 119742. [Google Scholar] [CrossRef]
Wu, J.; Xie, X.; Nie, L.; Lin, Z.; Zha, H. Unified Graph and Low-Rank Tensor Learning for Multi-View Clustering. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6388–6395. [Google Scholar] [CrossRef]
Dong, X.; Wu, D.; Nie, F.; Wang, R.; Li, X. Multi-view clustering with adaptive procrustes on Grassmann manifold. Inf. Sci. 2022, 609, 855–875. [Google Scholar] [CrossRef]
Yao, J.; Lin, R.; Lin, Z.; Wang, S. Multi-view clustering with graph regularized optimal transport. Inf. Sci. 2022, 612, 563–575. [Google Scholar] [CrossRef]
Ren, Z.; Li, X.; Mukherjee, M.; Huang, Y.; Sun, Q.; Huang, Z. Robust multi-view graph clustering in latent energy-preserving embedding space. Inf. Sci. 2021, 569, 582–595. [Google Scholar] [CrossRef]
Li, L.; He, H. Bipartite Graph based Multi-view Clustering. IEEE Trans. Knowl. Data Eng. 2020, 34, 3111–3125. [Google Scholar] [CrossRef]
Wang, H.; Yang, Y.; Liu, B. GMC: Graph-Based Multi-View Clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1116–1129. [Google Scholar] [CrossRef]
Wei, X.; Sen, W.; Ming, Y.; Quan, X.; Jun, G.; Xin, B. Multi-view graph embedding clustering network: Joint self-supervision and block diagonal representation. Neural Netw. 2022, 145, 1–9. [Google Scholar] [CrossRef]
Sang, X.; Lu, J.; Lu, H. Consensus graph learning for auto-weighted multi-view projection clustering. Inf. Sci. 2022, 609, 816–837. [Google Scholar] [CrossRef]
Zhao, J.; Kang, F.; Zou, Q.; Wang, X. Multi-view clustering with orthogonal mapping and binary graph. Expert Syst. Appl. 2023, 213, 118911. [Google Scholar] [CrossRef]
Du, Y.; Lu, G.; Ji, G. Robust and optimal neighborhood graph learning for multi-view clustering. Inf. Sci. 2023, 631, 429–448. [Google Scholar] [CrossRef]
Nie, F.; Li, J.; Li, X. Self-weighted Multiview Clustering with Multiple Graphs. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017. [Google Scholar] [CrossRef]
Jiao, W.; Bin, W.; Zhen, W.; Hong, Y.; Yun, H. Multi-scale deep multi-view subspace clustering with self-weighting fusion and structure preserving. Expert Syst. Appl. 2023, 213, 119031. [Google Scholar] [CrossRef]
Huang, S.; Tsang, I.; Xu, Z.; Lv, J. Measuring Diversity in Graph Learning: A Unified Framework for Structured Multi-View Clustering. IEEE Trans. Knowl. Data Eng. 2022, 34, 5869–5883. [Google Scholar] [CrossRef]
Jun, C.; Zhao, K.; Bo, Y.; Lu, P.; Zeng, L. Multi-view subspace clustering via partition fusion. Inf. Sci. 2021, 560, 410–423. [Google Scholar] [CrossRef]
Jin, H.; Jian, Y. Robust subspace segmentation via low-rank representation. IEEE Trans. Cybern. 2014, 44, 1432–1445. [Google Scholar] [CrossRef]
Yang, B.; Wu, J.; Zhang, X.; Zheng, X.; Nie, F.; Chen, B. Discrete correntropy-based multi-view anchor-graph clustering. Inf. Fusion 2022, 103, 102097. [Google Scholar] [CrossRef]
Deng, P.; Li, T.; Wang, D.; Wang, H.; Peng, H.; Horng, S.-J. Multi-view clustering guided by unconstrained non-negative matrix factorization. Knowl.-Based Syst. 2023, 266, 110425. [Google Scholar] [CrossRef]
Liu, J.; Wang, C.; Gao, J.; Han, J. Multi-View Clustering via Joint Nonnegative Matrix Factorization. In Proceedings of the SIAM International Conference on DATA MININGs, Austin, TX, USA, 2–4 May 2013; pp. 252–260. [Google Scholar] [CrossRef]
Shi, S.; Nie, F.; Wang, R.; Li, X. Multi-View Clustering via Nonnegative and Orthogonal Graph Reconstruction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 201–214. [Google Scholar] [CrossRef]
Chen, Z.; Lin, P.; Chen, Z.; Ye, D.; Wang, S. Diversity embedding deep matrix factorization for multi-view clustering. Inf. Sci. 2022, 610, 114–125. [Google Scholar] [CrossRef]
Fu, L.; Li, J.; Chen, C. Consistent affinity representation learning with dual low-rank constraints for multi-view subspace clustering. Neurocomputing 2022, 514, 113–126. [Google Scholar] [CrossRef]
Wang, H.; Yang, Y.; Liu, B.; Hamido, F. A study of graph-based system for multi-view clustering. Knowl.-Based Syst. 2019, 163, 1009–1019. [Google Scholar] [CrossRef]
Wang, B.; Xiao, Y.; Li, Z.; Wang, X.; Chen, X.; Fang, D. Robust Self-Weighted Multi-View Projection Clustering. Proc. AAAI Conf. Artif. Intell. 2020, 34, 6110–6117. [Google Scholar] [CrossRef]
Misha, E.K.; Carla, D.M. Factorization strategies for third-order tensors. Linear Algebra Its Appl. 2011, 435, 641–658. [Google Scholar] [CrossRef]
Fan, K. On a theorem of weyl concerning eigenvalues of linear transformation. Proc. Natl. Acad. Sci. USA 1949, 35, 652–655. [Google Scholar] [CrossRef]
Jeribi, A. Spectral Graph Theory. In Spectral Theory and Applications of Linear Operators and Block Operator Matrices; Springer: Cham, Switzerland, 2015. [Google Scholar] [CrossRef]
Hu, W.; Tao, D.; Zhang, W.; Xie, Y.; Yang, Y. The Twist Tensor Nuclear Norm for Video Completion. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2961–2973. [Google Scholar] [CrossRef]
Ng, A.; Jordan, M.; Weiss, Y. On Spectral Clustering: Analysis and an algorithm. Neural Inf. Process. Syst. 2001, 14, 849–856. [Google Scholar]
Zhan, K.; Zhang, C.; Guan, J.; Wang, J. Graph Learning for Multiview Clustering. IEEE Trans. Cybern. 2018, 48, 2887–2895. [Google Scholar] [CrossRef]
Zhan, K.; Nie, F.; Wang, J.; Yang, Y. Multiview Consensus Graph Clustering. IEEE Trans. Image Process. 2019, 28, 1261–1270. [Google Scholar] [CrossRef]
Nie, F.; Tian, L.; Li, X. Multiview Clustering via Adaptively Weighted Procrustes. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018. [Google Scholar] [CrossRef]
Li, X.; Zhang, H.; Wang, R.; Nie, F. Multiview Clustering: A Scalable and Parameter-Free Bipartite Graph Fusion Method. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 330–344. [Google Scholar] [CrossRef]
Winn, J.; Jojic, N. LOCUS: Learning object classes with unsupervised segmentation. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Washington, DC, USA, 17–21 October 2005; pp. 756–763. [Google Scholar] [CrossRef]
Chen, M.; Huang, L.; Wang, C.; Huang, D.; Lai, J. Relaxed multi-view clustering in latent embedding space. Inf. Fusion 2021, 68, 8–21. [Google Scholar] [CrossRef]
Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 15 November 2024).

Figure 1. Flowchart of the TUDMPC algorithm.

Figure 2. Visualization of the affinity matrices of the MSRC_v1 dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC.

Figure 3. Visualization of the affinity matrices of the NGs dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC.

Figure 4. Visualization of the affinity matrices of the 100leaves dataset. (a) SC. (b) MCGC. (c) MVGL. (d) GMC. (e) SFMC. (f) TUDMPC.

Figure 5. Visualization of the HW2sources dataset. (a) SC. (b) Co-regMSC. (c) AWP. (d) MCGC. (e) MVGL. (f) TUDMPC.

Figure 6. Visualization of the MSRC_v1 dataset. (a) SC. (b) Co-regMSC. (c) AWP. (d) MCGC. (e) MVGL. (f) TUDMPC.

Figure 7. Some face images from the ORL dataset (10 × 10).

Figure 8. Some handwritten digital images from the HW dataset (10 × 50).

Figure 9. Some face image recognition results of TUDMPC on the ORL dataset (10 × 10).

Figure 10. Some handwritten digital image recognition results of TUDMPC on the HW dataset (10 × 50).

Figure 11. Results of the ablation experiments on the 100leaves dataset.

Figure 12. Convergence on some datasets. (a) MRSC_v1. (b) HW2source. (c) ORL. (d) 100leaves.

Figure 13. Sensitivity analysis on different datasets as parameters

α

and k change. (a) MSRC_v1. (b) HW2sources. (c) 100leaves. (d) NGs. (e) ORL.

Figure 13. Sensitivity analysis on different datasets as parameters

α

and k change. (a) MSRC_v1. (b) HW2sources. (c) 100leaves. (d) NGs. (e) ORL.

Figure 14. Results of the Nemenyi Test. (a) ACC. (b) NMI. (c) Purity.

Table 1. Additional notations.

Notations	Descriptions
$c, n$ and V	The number of clusters, samples and views, respectively,
$m_{v}$	Projection dimension in view v
$d_{v}$	The number of features in view v
$X^{(v)} \in R^{d_{v} \times n}$	Data matrix of view v
$Z^{(v)} \in R^{d_{v} \times m_{v}}$	Projection matrix of view v
$S^{(v)} \in R^{n \times n}$	Similarity matrix in view v
$F \in R^{n \times c}$	Clustering indicator matrix
$1$ , $I$ and $0$	Matrix of all 1 s, the identity matrix and matrix of all 0 s

Table 3. Clustering results for the MRSC_v1 dataset.

Methods	ACC	NMI	Purity	R	F
SC	0.6429 ± 0.0000	0.5595 ± 0.0003	0.6905 ± 0.0000	0.5141 ± 0.0009	0.5306 ± 0.0005
Co-regMSC	0.8031 ± 0.0066	0.7199 ± 0.0095	0.8031 ± 0.0066	0.6821 ± 0.0069	0.6931 ± 0.0081
MVGL	0.7476 ± 0.0000	0.7532 ± 0.0000	0.8810 ± 0.0000	0.5860 ± 0.0000	0.6736 ± 0.0000
AWP	0.7524 ± 0.0000	0.7228 ± 0.0000	0.8810 ± 0.0000	0.6029 ± 0.0000	0.6867 ± 0.0000
MCGC	0.7429 ± 0.0000	0.6653 ± 0.0000	0.8286 ± 0.0000	0.5453 ± 0.0000	0.6191 ± 0.0000
GMC	0.7476 ± 0.0000	0.7709 ± 0.0000	0.7905 ± 0.0000	0.8089 ± 0.0000	0.6968 ± 0.0000
SFMC	0.7810 ± 0.0000	0.7883 ± 0.0000	0.8095 ± 0.0000	0.8138 ± 0.0000	0.7404 ± 0.0000
TUDMPC	0.9238 ± 0.0000	0.8477 ± 0.0000	0.9238 ± 0.0000	0.8548 ± 0.0000	0.8505 ± 0.0000