
Leveraging Quantum Machine Learning to Address Class Imbalance: A Novel Approach for Enhanced Predictive Accuracy

Seongjun Kwon, Jihye Huh, Sang Ji Kwon ², Sang-ho Choi ³ and Ohbyung Kwon ⁴,*

Department of Big Data Analytics, Kyung Hee University, Seoul 02447, Republic of Korea

Department of Business Administration, Kyung Hee University, Seoul 02447, Republic of Korea

Department of AI-Based Management Information Systems, Kyung Hee University, Seoul 02447, Republic of Korea

⁴ School of Management, Kyung Hee University, Seoul 02447, Republic of Korea

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(2), 186; https://doi.org/10.3390/sym17020186

Submission received: 10 January 2025 / Revised: 22 January 2025 / Accepted: 23 January 2025 / Published: 25 January 2025

(This article belongs to the Section Computer)

Figures:

Figure 1. Comparison between conventional machine learning and QML.
Figure 2. Example of data transformation through quantum gate.
Figure 3. AI GPU server (Nvidia Tesla A100).
Figure 4. Illustration of VQC algorithm.
Figure 5. Illustration of Variational Quantum Circuit.
Figure 6. Example of measurement using qubits.
Figure 7. Example of loss function.
Figure 8. Optimization condition exploration of the VQC algorithm.
Figure 9. Illustration of feature space: 4 × 4 matrix.
Figure 10. Illustrations of Bloch sphere visualizations before and after rotation through encoding.

Abstract

The class imbalance problem presents a critical challenge in real-world applications, particularly in high-stakes domains such as healthcare, finance, disaster management, and fault diagnosis, where accurate anomaly detection is paramount. Class imbalance often disrupts the inherent symmetry of data distributions, resulting in suboptimal performance of traditional machine learning models. Conventional approaches such as undersampling and oversampling are commonly employed to address this issue; however, these methods can introduce additional asymmetries, including information loss and overfitting, which ultimately compromise model efficacy. This study introduces an innovative approach leveraging quantum machine learning (QML), specifically the Variational Quantum Classifier (VQC), to restore and capitalize on the symmetrical properties of data distributions without relying on resampling techniques. By employing quantum circuits optimized to mitigate the asymmetries inherent in imbalanced datasets, the proposed method demonstrates consistently superior performance across diverse datasets, with notable improvements in Recall for minority classes. These findings underscore the potential of quantum machine learning as a robust alternative to classical methods, offering a symmetry-aware solution to class imbalance and advancing QML-driven technologies in fields where equitable representation and symmetry are of critical importance.

Keywords: class imbalance problem; quantum machine learning; variational quantum classifier; resampling method; robustness test

1. Introduction

The class imbalance problem, coupled with high dimensionality, remains a critical challenge in applying machine learning across various real-world domains, including medical diagnosis [1], finance [2], disaster management [3], telecommunications network management [4], and medical diagnosis [5]. Class imbalance arises when the target class is significantly underrepresented compared to other classes, which introduces bias in model training and results in degraded predictive performance, particularly through an increase in false negatives (FNs). Although data resampling techniques, such as undersampling and oversampling, are widely used to mitigate this issue, they come with trade-offs. Undersampling leads to information loss, while oversampling, including advanced methods like SMOTE [6], can introduce overfitting and still suffers from some degree of information loss.

Algorithmic techniques such as cost-sensitive learning and thresholding adjustments [7,8] offer alternative approaches but are limited by arbitrary thresholding and the inherent challenges in balancing error types. More recently, quantum machine learning (QML) has shown potential in addressing complex data patterns, yet its application to class imbalance remains underexplored. While some studies report promising outcomes, such as higher accuracy in Quantum Support Vector Machines (QSVMs), the performance improvement for imbalanced datasets remains inconclusive [9].

In this study, we propose a novel approach to address class imbalance through quantum machine learning, specifically utilizing Variational Quantum Classifiers (VQCs) optimized with quantum encoding and decoding techniques. Our experimental results, conducted on real-world imbalanced datasets, consistently demonstrate that quantum models can outperform traditional machine learning methods without the need for resampling techniques. Through repeated tests on multiple benchmark datasets, we highlight the robustness and potential of QML in resolving class imbalance, offering new directions for improving prediction performance in domains where high precision is critical.

1.1. Class Imbalance Problem

The class imbalance problem refers to the situation where instances of a particular class are significantly fewer than those of other classes, which can negatively affect the performance of classification algorithms [10]. Addressing class imbalance is crucial in tasks requiring accurate and reliable predictions, such as fraud detection and medical diagnosis. Although many methods have been devised to tackle the issue, achieving stable performance on high-dimensional, imbalanced datasets remains challenging. The approaches introduced so far fall into three groups: data-level solutions, algorithm-level solutions, and ensemble learning techniques.

Among data-level methods, random sampling is one of the most widely used techniques for handling imbalanced data due to its ease of implementation and potential to offer acceptable performance. In addition to random sampling, the SMOTE (Synthetic Minority Oversampling Technique) generates synthetic data samples for the minority class to balance the class distribution [11]. However, data-level approaches are not limited to SMOTE. The study in [12] experimentally evaluated various techniques for addressing the class imbalance problem, highlighting the limitations of data-level approaches and emphasizing the importance of selecting context-appropriate methods for performance improvement. That study compared several methods for augmenting minority class data and showed that traditional methods like SMOTE do not always yield optimal performance, which can vary depending on the context of the data.

Furthermore, in Zhang’s [13] study, a GAN-based data augmentation method using the WGAN-GP algorithm was employed to generate synthetic data samples with distributions similar to real data in complex medical datasets, improving the model’s generalization. Additionally, Xu [14] proposed the KNSMOTE algorithm, a refined version of the SMOTE, which classifies the dataset using k-means clustering, removes easily misclassified boundary samples and noisy samples, and generates more stable samples from imbalanced datasets compared to other oversampling algorithms.

Algorithm-level approaches to handling data imbalance include techniques such as assigning different costs to emphasize the minority class and adjusting prediction thresholds to more easily classify samples that are likely to belong to the minority class. For example, cost-sensitive learning assigns a higher cost to the minority class, training the model to improve accuracy for that class [15]. In Esposito’s [16] study, the GHOST (Generalized Threshold Shifting) approach optimized thresholds, resulting in better performance than random undersampling of the dataset. Ensemble methods, which combine multiple classifiers, can mitigate prediction errors and bias. Notable examples include SMOTEBagging [17] and SMOTEBoost [18]. Recently, there have been attempts to integrate imbalance handling into ensemble methods like XGBoost to improve performance [19].

Advanced pre-processing methods, such as the “Negative Branch” approach, have also improved model performance in speech recognition tasks [20]. Despite these advancements in handling data imbalance, traditional methods still face challenges such as computational inefficiency and limited scalability. Quantum computing has been proposed as a solution to overcome these limitations, sparking growing interest in whether quantum machine learning can address the data imbalance problem and provide effective solutions.

1.2. Quantum Machine Learning

Quantum machine learning (QML) aims to leverage the unique characteristics of quantum systems, specifically their ability to process information in atypical ways that classical systems struggle to handle efficiently, with the goal of outperforming traditional machine learning in terms of accuracy and inference time [21]. As a starting point to achieve this goal, QML utilizes qubits, which scale data representation, storage, and processing more effectively compared to classical bits. By encoding information into qubits and employing quantum entanglement, QML forms quantum circuits and optimizes them to achieve superior performance. In other words, QML maps classical data to quantum mechanical states, detects patterns within the data, and processes these states through quantum manipulation.

QML encompasses various tasks, including processing classical data using quantum machine learning algorithms, processing quantum data using classical machine learning algorithms, and processing quantum data with quantum machine learning algorithms [22]. However, this study focuses on processing imbalanced classical datasets using QML to explore how it can provide superior solutions. Additionally, QML can be categorized into four types of tasks: quantum simulation to improve existing simulations, enhanced quantum computing to improve quantum computing itself, quantum machine perception for sensing, and classical data analysis to enhance supervised and unsupervised learning. Given the context of this study, we will focus on QML for classical data analysis.

The core of QML lies in converting classical data into quantum states for processing. By utilizing quantum states, QML can learn patterns and make predictions more efficiently. As shown in Figure 1, QML encodes the data into quantum states to take advantage of quantum properties such as parallel computation and entanglement to enhance performance, while conventional machine learning processes data directly through algorithms. Moreover, quantum gates can accelerate matrix operations and reduce computational complexity [23,24]. Therefore, QML requires an additional initial step of quantizing the data, which is part of the state preparation process needed for analyzing data through quantum circuits. Then, QML produces quantum data as output, which will be converted back into classical data.

To utilize the benefits of quantum computation in QML, classical data must first be encoded into quantum states. Then, quantum circuits composed of quantum gates manipulate and transform the quantum states, exploiting quantum phenomena such as superposition and entanglement. Figure 2 illustrates an example of transforming data through a quantum gate. Quantum gates like Hadamard, CNOT, and Pauli gates are fundamental components for manipulating qubits in quantum computing. These gates apply unitary transformations, preserving quantum information by keeping the total probability constant. A single-qubit state can be represented as a vector ∣ψ⟩ = α∣0⟩ + β∣1⟩, where the coefficients α and β are probability amplitudes whose squared magnitudes ∣α∣² and ∣β∣² give the probabilities of measuring the qubit in the ∣0⟩ or ∣1⟩ state, respectively. For instance, a Pauli X gate rotates the qubit 180 degrees about the x-axis on the Bloch sphere, which swaps α and β. By combining such transformations, more complex probability distributions can be represented, ultimately enabling more efficient problem-solving in quantum algorithms.
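The gate action described above can be illustrated with plain linear algebra. The following is a minimal sketch, not part of the study's implementation: it represents a qubit as a two-element amplitude vector, applies the standard Pauli X and Hadamard matrices, and checks that both are unitary. The helper name `measurement_probabilities` is introduced here purely for illustration.

```python
import numpy as np

# Computational basis state |0> as a vector of amplitudes (alpha, beta).
ket0 = np.array([1.0, 0.0], dtype=complex)

# Standard single-qubit gates as 2x2 unitary matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)                # Pauli X (bit flip)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard

def measurement_probabilities(state):
    """Return (P(0), P(1)) = (|alpha|^2, |beta|^2) for a single-qubit state."""
    return np.abs(state) ** 2

flipped = X @ ket0      # X maps |0> to |1>
superposed = H @ ket0   # H maps |0> to (|0> + |1>) / sqrt(2)

print(measurement_probabilities(flipped))
print(measurement_probabilities(superposed))

# Unitarity: U^dagger U = I, so the total probability stays equal to 1.
for U in (X, H):
    assert np.allclose(U.conj().T @ U, np.eye(2))
```

Applying X to ∣0⟩ moves all probability to ∣1⟩, while the Hadamard gate splits it evenly between the two outcomes.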

As a result, quantum circuits play a pivotal role in quantum machine learning (QML) algorithms. They are responsible for the fundamental transformations and computations that enable QML to operate, and the algorithms built on them aim to perform complex calculations more efficiently than their classical counterparts by leveraging the unique properties of quantum circuits. Numerous such algorithms have been developed and proposed to date; a summary of the representative ones is provided in Table 1.

There is still no consensus on whether quantum machine learning (QML) algorithms outperform traditional machine learning algorithms. For example, existing experiments indicate that Quantum SVM exhibits approximately 5% higher accuracy than traditional SVMs when evaluated on benchmark datasets such as Iris, Rain, Custom, and Adhoc [39]. However, in datasets with significant class imbalance, Quantum SVMs have been reported to show little performance difference compared to traditional methods like XGBoost or Random Forest in terms of accuracy or AUC [9]. Notably, that study focused solely on Quantum SVMs and did not explore a broader range of quantum machine learning algorithms.

Furthermore, the performance of quantum algorithms can be optimized based on the quantization methods and the techniques used for entangling qubits, yet the specific effects of these variations on performance have not been clearly articulated. Additionally, the number of qubits used in quantum algorithms influences their performance [40] and concurrently affects the computation time. Therefore, optimizing performance requires consideration of both predictive accuracy and the number of qubits.

Moreover, the integration of quantum computing with multi-strategy fusion has enhanced the performance of the dung beetle optimization algorithm (DBO), contributing to improvements in global exploration and optimization capabilities in complex engineering problems [23,24]. However, there have been few proposals for QML optimized specifically for addressing class imbalance issues.

2. Materials and Methods

2.1. Materials

2.1.1. Implementation Environment

For the experiments, an AI GPU server, specifically the Nvidia Tesla A100 (Seoul, Korea), was utilized. This server is equipped with a 32-core CPU, a single Nvidia Tesla A100 GPU, 256 GB of RAM, and 5.76 TB of storage (see Figure 3). Additionally, various backends available on the IBM Quantum Platform (including actual quantum computers and quantum simulators) were employed to execute the algorithms. In particular, IBM Qiskit was used for simulation, and the performance was compared with classical models. The results confirmed that the quantum approach could enhance computational speed while achieving comparable accuracy in handling complex probabilistic calculations [23].

2.1.2. Input Dataset

To apply QML, an imbalanced dataset consisting of imported food inspection data was utilized. This dataset is primarily composed of documentation related to imported foods, which is used to determine whether the products pose a risk. The number of identified risk cases is considerably low compared to the total number of cases, indicating a significant data imbalance. The analysis focused particularly on the dataset concerning seasoning products, which are of high consumer interest within the Food and Drug Administration (FDA) datasets.

Before comparing Classical Support Vector Machine (SVM) and Quantum Support Vector Machine (QSVM), additional preprocessing steps were undertaken. Initially, the seasoning product dataset was composed of 45 features; however, dimensionality reduction techniques such as Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA) were applied to reduce the features to four. This reduction was made with consideration for quantum speed. Additionally, for the training and evaluation of the Classical SVM, a test dataset comprising 5881 instances and a sampled subset of 150 instances was used. This approach aimed to compare performance based on dataset size. The properties of the dataset are summarized in Table 2.
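The reduction from 45 features to four can be sketched as follows. This is an illustrative PCA-only example on random stand-in data, not the study's preprocessing pipeline: the array `X_demo`, the helper `pca_reduce`, and the use of an SVD-based projection are assumptions made for the example, and the MCA step applied to categorical variables is not shown.

```python
import numpy as np

def pca_reduce(X, n_components=4):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 45))   # hypothetical stand-in: 150 rows, 45 features
X_reduced = pca_reduce(X_demo, n_components=4)
print(X_reduced.shape)
```

Each retained component later maps to one qubit, which is why the target dimension matters for circuit size.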

2.2. Method

This study proposes a quantum machine learning (QML) approach suitable for addressing the class imbalance problem. Unlike classical machine learning methods, quantum machine learning utilizes unitary operations (Operation U) to transform a classical state vector $\vec{x}$ into a quantum state $|\phi(\vec{x})\rangle$, enabling operations in quantum space [40]. Given these unique operations, particularly the quantum properties of superposition and entanglement, it is hypothesized that QML can effectively capture the characteristics of imbalanced datasets. Furthermore, to finely analyze the features of imbalanced data, it is deemed necessary to have algorithms that can adjust quantum circuits according to specific problems, rather than relying on pre-designed quantum gate algorithms (e.g., Shor’s algorithm, Grover’s algorithm) [41].

2.2.1. Model Selection

Quantum Support Vector Machines (QSVMs) and Variational Quantum Classifiers (VQCs) are among the most widely used supervised learning-based quantum machine learning algorithms [42]. The reason for selecting the Variational Quantum Classifier (VQC) algorithm is as follows. First, by utilizing Parameterized Quantum Circuits, it can be specifically designed for particular problems [41], making it a flexible algorithm. This type of algorithm possesses a flexible structure that allows for the adjustment of parameters determined through learning [43]. Mathematically, the Variational Quantum Classifier (VQC) can be represented as

$$f(\vec{x}; \vec{\theta}) = \left( \langle \hat{Z}_{1} \rangle, \dots, \langle \hat{Z}_{m} \rangle \right).$$

In this framework, the input data $\vec{x}$ is utilized to prepare the initial state, and measurements along the Z-axis of the m qubits are performed to optimize the parameters $\vec{\theta}$ during the training process. These measurement values are denoted as $\langle \hat{Z}_{k} \rangle$, representing the expectation value of the Z-axis for the k-th qubit. Consequently, the vector of expectation values $\langle \hat{Z}_{1} \rangle, \dots, \langle \hat{Z}_{m} \rangle$ serves as the foundation for the model’s predictions.
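To make the expectation values concrete, the sketch below computes the Z expectation of each qubit from a statevector. It is a simulation-style illustration, not the study's code; the function name `z_expectations` and the bit-ordering convention (qubit 0 as the most significant bit) are assumptions of this example.

```python
import numpy as np

def z_expectations(state, n_qubits):
    """Return the vector of per-qubit Z expectation values for a statevector.

    Convention (an assumption of this sketch): qubit 0 is the most
    significant bit of the basis-state index.
    """
    probs = np.abs(state) ** 2
    expectations = []
    for k in range(n_qubits):
        # Bit of qubit k in each basis state; Z has eigenvalue +1 for 0, -1 for 1.
        bits = (np.arange(2 ** n_qubits) >> (n_qubits - 1 - k)) & 1
        expectations.append(float(np.sum(probs * (1 - 2 * bits))))
    return np.array(expectations)

# Two-qubit example: qubit 0 in |0>, qubit 1 in (|0> + |1>) / sqrt(2).
state = np.kron(np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2))
print(z_expectations(state, 2))
```

A qubit that is certainly in ∣0⟩ yields an expectation of +1, while an even superposition yields 0.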

This transformation process, which involves finely tuning the parameters of quantum gates to convert the state of qubits to specific angles or values, is believed to yield optimal results. In contrast, the Quantum Support Vector Machine (QSVM) applies the principles of classical SVMs within the framework of quantum computing, focusing more on establishing decision boundaries in quantum space rather than on parameter optimization.

Secondly, at the current technology level of Noisy Intermediate-Scale Quantum (NISQ) computers, it is essential to apply algorithms that can produce results without additional error correction. The VQC is known to reduce errors by calculating a cost function through repeated measurements, integrating noise data into the optimization calculations [42]. In contrast, the QSVM may be more susceptible to errors than the VQC: as the number of qubits and operations increases, the required resources grow, and without error correction matched to the circuit complexity, it may not achieve proper performance. Therefore, this study selected the VQC.

2.2.2. Variational Quantum Classifier

The Variational Quantum Classifier (VQC) is a type of quantum machine learning (QML) algorithm that is used to distinguish significant events of interest from background events in physics [42]. For example, in a physics experiment where one aims to identify specific traces of a particle, the VQC can effectively solve classification problems by accurately identifying the desired event (particle detection) from the background events.

Technically, the VQC evaluates the objective function of a quantum circuit through quantum computing while leveraging classical computers to compute the circuit parameters. This parameter calculation enables the identification of the minimum or maximum values of the objective function. Mathematically, the objective is expressed as $M(y_{i}, \hat{y}_{i})$, where M represents the objective function, $y_{i}$ is the actual value, and $\hat{y}_{i}$ is the value predicted by the quantum machine learning model; the parameters are adjusted to minimize or maximize this objective function.

In this context, the predicted value $\hat{y}_{i}$ is derived from the previously described function $f(\vec{x}; \vec{\theta})$. The post-processing stage involves transforming the computational results of the quantum circuit into interpretable probability values or prediction outcomes (i.e., the final class of the target variable in classification tasks). This can be expressed as $\hat{y}_{i} = g(f(\vec{x}; \vec{\theta}))$, where the function g is applied to the function f during the post-processing. Ultimately, the objective function incorporates the natural logarithm to compute probabilities, enabling stable learning for the algorithm. This process can be mathematically represented as

$$M(y_{i}, \hat{y}_{i}) = -\left( y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right).$$

Additionally, to convert the output into values between 0 and 1 during the post-processing stage, the function

$$g(f(\vec{x}; \vec{\theta})) = \frac{1 + f(\vec{x}; \vec{\theta})}{2}$$

is applied.
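The post-processing map g and the objective M can be written out directly. The snippet below is a minimal sketch under stated assumptions: the circuit outputs in `f_values` are made-up numbers, the loss is averaged over samples, and the clipping constant `eps` is added only to keep the logarithm finite; none of these details are specified in the text.

```python
import numpy as np

def g(f_value):
    """Map an expectation value in [-1, 1] to a probability in [0, 1]."""
    return (1.0 + f_value) / 2.0

def objective(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy M(y, y_hat), averaged over samples.

    eps is a small clipping constant (an assumption of this sketch) that
    keeps log() finite when a prediction is exactly 0 or 1.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

f_values = np.array([0.8, -0.6, 0.0])  # hypothetical circuit outputs
y_true = np.array([1, 0, 1])
y_hat = g(f_values)                     # probabilities 0.9, 0.2, 0.5
print(y_hat, objective(y_true, y_hat))
```

Confident, correct predictions (0.9 for a positive, 0.2 for a negative) contribute little to the loss, while the undecided prediction of 0.5 contributes the most.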

The structure of the VQC model, as illustrated in Figure 4, generally consists of three main components: data encoding, the Variational Quantum Circuit, and measurement.

2.2.3. Feature Map Modeling

Quantum circuits are divided into two major stages: the Feature Map and the Variational Circuit [44]. The core concept of the quantum Feature Map is derived from traditional machine learning kernel techniques [45]. By applying a kernel-based Feature Map, special operations are applied to the initial state, significantly expanding the dimensionality of the computational space, which allows for the identification of hyperplanes that can separate data in the new space [42].

The primary goal of the Feature Map is to encode classical features present in the dataset into the Hilbert space where the quantum system operates [39]. We apply the Feature Map $f(\vec{x}; \vec{\theta})$ to the initial state ∣0⟩, which can be expressed mathematically as follows, illustrating how the Feature Map operates:

$$|\psi(\vec{x}; \vec{\theta})\rangle := f(\vec{x}; \vec{\theta})\,|0\rangle$$

Various embedding techniques exist for the transformation from classical to quantum states [46], and the Feature Maps available for use in the Variational Quantum Circuit (VQC) algorithm include ZFeatureMap, ZZFeatureMap, and PauliFeatureMap. The ZFeatureMap encodes data through the Z-axis rotation gate $R_{z}(\theta)$, setting the state of each qubit according to the input data. This method is primarily used for encoding simple linear relationships in the data and consists of single-qubit rotation gates, which do not require entanglement. The ZZFeatureMap rotates around the Z-axis while also adding entanglement (ZZ interaction) between the qubits. This interaction can be expressed in the form $R_{ZZ}(\theta_{i} \theta_{j})$ for qubits i and j. As it reflects interactions between features, it is suitable for problems with nonlinear characteristics. The PauliFeatureMap employs the rotation gates $R_{X}$, $R_{Y}$, and $R_{Z}$ for encoding. It can create highly complex quantum states by including various rotations and entanglements between qubits. Because it leverages multiple Pauli operators, it offers greater flexibility than the Z and ZZ Feature Maps for encoding complex data correlations, making it well suited for non-trivial datasets.
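The difference between a Z-only encoding and one with a ZZ interaction can be checked numerically on two qubits. The following sketch is written in plain NumPy rather than with the Qiskit classes named above; the function names, the phase convention (a phase of 2x on each qubit after a Hadamard), and the pairwise angle (π − x₀)(π − x₁) are illustrative assumptions, not a statement of how the study configured its Feature Maps.

```python
import numpy as np

def z_feature_state(x):
    """Two-qubit Z-type encoding: Hadamard, then a phase 2*x_i on each qubit."""
    qubits = [np.array([1.0, np.exp(2j * xi)]) / np.sqrt(2) for xi in x]
    return np.kron(qubits[0], qubits[1])

def zz_feature_state(x):
    """Z-type encoding followed by a ZZ-style interaction phase.

    The pairwise angle (pi - x0) * (pi - x1) is an illustrative choice;
    the phase is applied to the odd-parity basis states |01> and |10>.
    """
    state = z_feature_state(x)
    phi = (np.pi - x[0]) * (np.pi - x[1])
    parity_phase = np.array([1.0, np.exp(2j * phi), np.exp(2j * phi), 1.0])
    return parity_phase * state

def is_entangled(state, tol=1e-9):
    """A two-qubit pure state is a product state iff its 2x2 amplitude matrix has zero determinant."""
    return bool(abs(np.linalg.det(state.reshape(2, 2))) > tol)

x = np.array([0.3, 1.1])  # hypothetical feature values
print(is_entangled(z_feature_state(x)))
print(is_entangled(zz_feature_state(x)))
```

The Z-only state factorizes into independent single-qubit states, whereas the interaction phase couples the two features and produces an entangled state for these inputs.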

2.2.4. Variational Circuit Modeling

Once embedded into quantum states from classical states, these quantum states can undergo further transformations within a Variational Circuit, as expressed by the following equation [39]. The equation shows the learnable circuit $V(\theta)$ acting on the state prepared by the set of encoding gates $U_{\Phi}$:

$$|\psi(\vec{x}, \theta)\rangle := V(\theta)\, U_{\Phi}(x)\, |0\rangle$$

Defining the quantum gates that compose the quantum circuit, as exemplified in Figure 5, is equivalent to specifying the set of permitted operations. Four qubits ($q_{0}, q_{1}, q_{2}, q_{3}$) are used because the number of qubits corresponds to the number of input variables X. The operations of rotating the i-th qubit around the X-, Y-, and Z-axes are denoted by $R_{X_{i}}$, $R_{Y_{i}}$, and $R_{Z_{i}}$, respectively. Furthermore, the CNOT gate operates between the i-th qubit and the subsequent qubit, which is $(i + 1) \bmod n$. Consequently, using the circuit constructed from the set of permitted gates U, we can formulate the learnable circuit $V(\theta)$ within the aforementioned Parameterized Quantum Circuit.

$$U = \bigcup_{i = 1}^{n} \left\{ R_{X_{i}}, R_{Y_{i}}, R_{Z_{i}}, \mathrm{CNOT}_{i,\,(i + 1) \bmod n} \right\}$$

Among various quantum circuits, four learnable circuits provided by the Qiskit library include PauliTwoDesign, RealAmplitudes, EfficientSU2, and TwoLocal. For this study, we selected RealAmplitudes, as it is primarily used for developing classification circuits and is more efficient than the others [47]. To facilitate data encoding, we employ the Pauli Feature Map, enabling the input data to be operable on the quantum circuit. Operations are performed through qubit rotations using Pauli operators (e.g., the $R_{Z}$ and $R_{Y}$ operators). An example of a circuit designed using RealAmplitudes is shown in Figure 5.

It is noteworthy that RealAmplitudes applies $R_{Y}$ rotations to all qubits by default and establishes entanglement through the “CNOT” gate. In the circuit shown in Figure 5, we selected the Linear entanglement pattern from three options (Linear, Circular, Full) to determine the level of entanglement. The VQC algorithm follows the steps of data encoding, Variational Quantum Circuit, and measurement. During the data encoding step, Hadamard gates are applied to create superposition states, initializing the quantum state appropriately. Subsequently, the Pauli Feature Map is used to map the data into quantum states. The quantum circuit then employs learnable RY gates, and CNOT gates are introduced to establish entanglement between qubits, allowing the circuit to capture and analyze more complex features of the data.
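The layered structure described above (RY rotations on every qubit, then a linear chain of CNOT gates, then RY rotations again) can be simulated in a few lines. This is a hand-written NumPy sketch loosely modeled on that description, not the Qiskit RealAmplitudes class itself; the function names, the qubit ordering (qubit 0 leftmost), and the parameter layout are assumptions made for illustration.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y-axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit statevector (qubit 0 = leftmost)."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit on the part of the state where the control is 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    # Selecting control = 1 removes that axis, so later axes shift down by one.
    t_axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)

def real_amplitudes_state(thetas, n, reps=1):
    """RY layer, then (linear CNOT chain + RY layer) repeated `reps` times."""
    thetas = np.asarray(thetas, dtype=float).reshape(reps + 1, n)
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |0...0>
    for q in range(n):
        state = apply_single(state, ry(thetas[0, q]), q, n)
    for r in range(1, reps + 1):
        for q in range(n - 1):  # linear entanglement: qubit q controls qubit q + 1
            state = apply_cnot(state, q, q + 1, n)
        for q in range(n):
            state = apply_single(state, ry(thetas[r, q]), q, n)
    return state

rng = np.random.default_rng(1)
params = rng.uniform(0, np.pi, size=8)  # (reps + 1) * n = 8 angles for 4 qubits
psi = real_amplitudes_state(params, n=4, reps=1)
print(np.linalg.norm(psi), psi.dtype)
```

Because RY matrices and CNOT are real-valued, every amplitude stays real, which is where the name of the ansatz comes from; the number of trainable angles grows as (reps + 1) × n.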

2.2.5. Measurement

Next, the measurement phase evaluates the likelihood of classifying the Y feature as either True or False. This process involves extracting numerous samples from various potential scenarios and calculating their average values. The measurement is conducted through the preparation of the initial state, the training of the parameters $\vec{\theta}$, and the subsequent output of the Y variable from the trained circuit. To accomplish this, it is necessary to incorporate a measurement equation into the design of the Variational Circuit, which can be mathematically expressed as follows [39]. It is noteworthy that, since the Y variable in the food safety dataset pertains to a binary classification problem, a Boolean function is employed in the equation to reflect this characteristic.

$$f : \{0, 1\}^{q} \to \{-1, +1\}, \qquad z \mapsto \tilde{y}$$

This equation signifies the application of a function f to determine which class a given measured state z belongs to. By utilizing the function f, the output is returned as either −1 or +1, allowing us to ascertain the class membership based on the returned value. Figure 6 shows an example of measuring the probability using four qubits (i.e., variables). It visualizes the results obtained from measuring the first row of data from the actual test dataset using a quantum simulator. The probability of each basis state represents the likelihood of that particular state being measured. For instance, if a specific classical binary state, such as 0001 in Figure 6, has the highest probability, it implies that this state significantly contributes to the classification outcome. This measurement process reflects the probability distribution over different quantum states, indicating the likelihood that the output data, derived from the quantum circuit, belong to a particular class. Consequently, classification decisions can be made based on the measured probabilities. To ensure the reliability of these decisions, the resilience level was set to 2, the maximum setting for error mitigation. This helps minimize the impact of quantum noise, enhancing the model’s robustness when deployed on real quantum hardware.
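A small example may help show how measured bitstrings turn into a class decision. In the sketch below, the Boolean function f is taken to be the bit parity, which is one common choice and an assumption here since the text does not name the function; the measurement counts are invented for illustration.

```python
def parity_label(bitstring):
    """Boolean function f: {0,1}^q -> {-1,+1}; here f is the bit parity (an assumption)."""
    return +1 if bitstring.count("1") % 2 == 0 else -1

def class_probabilities(counts):
    """Aggregate measurement counts into an estimated probability for each label."""
    shots = sum(counts.values())
    probs = {+1: 0.0, -1: 0.0}
    for bitstring, n in counts.items():
        probs[parity_label(bitstring)] += n / shots
    return probs

# Hypothetical counts from 1000 shots on a four-qubit circuit.
counts = {"0001": 620, "0000": 250, "0110": 90, "1011": 40}
probs = class_probabilities(counts)
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

Here the dominant outcome 0001 has odd parity, so most of the probability mass lands on the label −1 and that class is predicted.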

2.2.6. Optimization

After the measurement phase, the parameters of the Variational Quantum Algorithm (VQA) circuit are optimized using classical optimization algorithms [42]. This process involves calculating the values of the cost function (or loss function) based on the measured results to determine the minimum. For this purpose, the Constrained Optimization by Linear Approximations (COBYLA) method was utilized to linearly approximate the cost function and constraints [42,48]. COBYLA allows for handling the problem’s constraints in two different ways, ensuring a harmonious optimization process.

It is worth noting that in COBYLA, the parameters that can be specified include the maximum number of iterations (maxiter) and the tolerance level. The maximum number of iterations specifies the highest allowable iterations for optimization, while the tolerance represents the convergence criterion for the optimization process.

In this experiment, a value of 100 was set for the maximum number of iterations to avoid insufficient optimization due to too few iterations and the risk of overfitting or excessive execution time from overly extensive optimization. The tolerance was maintained at its default value of tol = 1 × 10⁻⁶. Below, Figure 7 illustrates an actual example of the decreasing loss values during the optimization of the VQC algorithm through 100 iterations. This indicates that, using the COBYLA optimizer, the rotation angles, which are the parameters θ of the RY gate in the RealAmplitudes ansatz, are optimized through the learning process. In a quantum setting, cross-entropy loss serves as a probabilistic framework for optimizing classification performance by minimizing the difference between predicted and actual probability distributions. By weighting the loss function to assign higher penalties to the minority class, it is possible to balance the contributions of minority and majority classes, thereby addressing class imbalance and improving overall model performance.

$$R_{Y}(\theta) = \begin{bmatrix} \cos\left(\frac{\theta}{2}\right) & -\sin\left(\frac{\theta}{2}\right) \\ \sin\left(\frac{\theta}{2}\right) & \cos\left(\frac{\theta}{2}\right) \end{bmatrix}$$
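The optimization loop can be illustrated on a deliberately tiny model. The sketch below trains a one-qubit RY classifier with SciPy's COBYLA optimizer, using the same iteration cap (100) and tolerance (1 × 10⁻⁶) mentioned above and a class-weighted cross-entropy. Everything else is an assumption for illustration: the toy data, the linear angle `w * x + b`, the inverse-frequency weight, and the clipping constant are not taken from the study.

```python
import numpy as np
from scipy.optimize import minimize

def ry(theta):
    """RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def predict_proba(params, x):
    """One-qubit toy model: RY(w*x + b)|0>, then y_hat = (1 + <Z>) / 2."""
    w, b = params
    probs = []
    for xi in x:
        amp = ry(w * xi + b) @ np.array([1.0, 0.0])
        z_expect = amp[0] ** 2 - amp[1] ** 2
        probs.append((1.0 + z_expect) / 2.0)
    return np.array(probs)

def weighted_loss(params, x, y, minority_weight, eps=1e-9):
    """Cross-entropy in which minority-class (y = 1) samples carry a larger weight."""
    y_hat = np.clip(predict_proba(params, x), eps, 1 - eps)
    weights = np.where(y == 1, minority_weight, 1.0)
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return float(np.sum(weights * losses) / np.sum(weights))

# Hypothetical imbalanced toy data: 18 majority (y=0) and 2 minority (y=1) samples.
x = np.concatenate([np.linspace(2.0, 3.0, 18), np.array([0.2, 0.5])])
y = np.concatenate([np.zeros(18), np.ones(2)])
minority_weight = 18 / 2  # inverse class frequency (one common choice)

theta0 = np.array([0.5, 0.5])
initial_loss = weighted_loss(theta0, x, y, minority_weight)
result = minimize(weighted_loss, theta0, args=(x, y, minority_weight),
                  method="COBYLA", tol=1e-6, options={"maxiter": 100})
print(initial_loss, result.fun)
```

COBYLA is derivative-free, which suits objectives estimated from repeated circuit measurements; with the weighting, the two minority samples contribute as much to the loss as the eighteen majority samples.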

2.2.7. Evaluation Metrics

To measure the performance of the machine learning model, Recall was selected as the evaluation metric. In the context of imbalanced datasets, Recall is an appropriate metric for assessing how well the model identifies the relevant cases of interest, particularly the minority class. Recall has been particularly useful in scenarios where the consequences of misclassification or erroneous predictions pose significant risks [12,49]. Additionally, to compute performance metrics for imbalanced datasets, the Macro Average approach was adopted [50]. The Macro Average method calculates the arithmetic mean of the per-class metric for the True and False classes, so the overall score is high only when both classes are predicted well, regardless of how many instances each class contains.
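The effect of macro averaging is easiest to see on a degenerate classifier. The following sketch uses invented labels (95 negatives, 5 positives) and hypothetical helper names; it is meant only to illustrate the metric, not to reproduce the study's evaluation code.

```python
import numpy as np

def recall_per_class(y_true, y_pred, label):
    """Recall = TP / (TP + FN) for one class."""
    mask = (y_true == label)
    return float(np.mean(y_pred[mask] == label))

def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    labels = np.unique(y_true)
    return float(np.mean([recall_per_class(y_true, y_pred, c) for c in labels]))

# Hypothetical case: 95 negatives, 5 positives; the model predicts everything negative.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)
accuracy = float(np.mean(y_true == y_pred))
print(accuracy, macro_recall(y_true, y_pred))
```

Accuracy looks strong at 0.95 even though no positive case is found, while macro-averaged Recall drops to 0.5 because the minority class counts as much as the majority class.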

3. Results

3.1. Model Formulation

First, Quantum Encoding was performed, for which a Feature Map was specified. The encoding was executed for different types of Feature Maps (Z, ZZ, and Pauli Feature Maps) while varying the Feature Dimension (number of features), the number of repetitions (the depth of the quantum circuit), and the operators (entanglement, circuit transformations) to explore the optimal model.
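The search space described above can be pictured as a grid of candidate configurations. The sketch below only enumerates that grid; the specific values for the feature dimension and the number of repetitions are hypothetical placeholders, since the ranges actually explored are not listed here.

```python
from itertools import product

FEATURE_MAPS = ("Z", "ZZ", "Pauli")
FEATURE_DIMENSIONS = (2, 3, 4)                   # hypothetical values
REPETITIONS = (1, 2, 3)                          # hypothetical values
ENTANGLEMENTS = ("linear", "circular", "full")

def candidate_configs():
    """Yield one dictionary per feature-map configuration to evaluate."""
    for fmap, dim, reps, ent in product(
        FEATURE_MAPS, FEATURE_DIMENSIONS, REPETITIONS, ENTANGLEMENTS
    ):
        # A first-order Z feature map has no entangling gates, so the
        # entanglement setting is irrelevant; keep a single entry for it.
        if fmap == "Z" and ent != "linear":
            continue
        yield {
            "feature_map": fmap,
            "feature_dimension": dim,
            "reps": reps,
            "entanglement": None if fmap == "Z" else ent,
        }

configs = list(candidate_configs())
```

Each configuration would then be trained and scored with the macro-averaged Recall, and the best-scoring Feature Map carried forward to hyperparameter tuning.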

Subsequently, the performance was evaluated. This involved first conducting a Quantum Decoding operation, in which the qubits are measured and the results obtained from the quantum state are converted into classical data. After this, the Model Optimization phase was undertaken, which involved selecting the Feature Map that yielded the best performance. The Pauli Feature Map, identified as having superior performance through these steps, then underwent hyperparameter tuning to optimize its effectiveness.
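One common way to convert measurement results into class probabilities is a parity mapping, sketched below. Whether the study used this particular mapping is not stated, so the mapping, the function names, and the measurement counts are all assumptions made for illustration.

```python
def parity_label(bitstring):
    """Map a measured bitstring to class 0 or 1 by the parity of its 1-bits."""
    return bitstring.count("1") % 2

def class_probabilities(counts):
    """Aggregate measurement counts into estimated class probabilities."""
    shots = sum(counts.values())
    probs = {0: 0.0, 1: 0.0}
    for bits, n in counts.items():
        probs[parity_label(bits)] += n / shots
    return probs

# Hypothetical counts from 1000 shots on a 4-qubit circuit.
counts = {"0000": 600, "0001": 250, "0011": 100, "0111": 50}
probs = class_probabilities(counts)
predicted_class = max(probs, key=probs.get)
```

The resulting probabilities are the classical quantities that enter the loss function and the evaluation metrics.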

3.2. Performance Evaluation

To evaluate the performance of the proposed quantum algorithm, a comparison was made with traditional machine learning algorithms and the most commonly used Quantum Variational Methods (QVMs) in Quantum Machine Learning (QML). The traditional machine learning algorithms used for comparison included Random Forest, XGBoost, Extra Trees, Gradient Boosted Trees, AdaBoost, Support Vector Machine, Ensemble Model, and Stacking Model.

The results, as shown in Table 3, indicate that the quantum algorithm significantly improved the Recall metric compared to traditional algorithms. Additionally, among the Feature Maps used in QML, the Pauli Feature Map demonstrated the highest performance. This suggests that the Pauli Feature Map exhibits greater flexibility compared to other Feature Maps and is capable of representing complex patterns and interactions effectively.

Moreover, as indicated in Table 4, performance can remain consistent when different Pauli Gates are applied using the QSVM algorithm, owing to the extensive learning achieved from the data: YZ, ZY, and XYZ all reach a Macro Average Recall of 0.87. Training time tends to lengthen as the complexity of the Pauli Gates increases, from 383.89 s for X to 1126.72 s for XYZ. Poor gate selection, however, can decrease performance, as demonstrated by the Pauli X Gate, whose Recall of 0.50 is the lowest in the table.
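A single-qubit sketch offers one possible reading of why an X-only encoding can carry little information. It assumes that a Hadamard gate precedes the data-dependent Pauli rotation, as in common Pauli feature map constructions; the circuit is not specified at this level of detail in the text, the angle scaling is simplified, and this is not an explanation given by the authors. Under that assumption, the state after the Hadamard is an eigenstate of X, so an X rotation only adds a global phase.

```python
import math

PAULI = {
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def pauli_rotation(name, angle):
    """exp(-i * angle * P) = cos(angle) * I - i * sin(angle) * P for a Pauli matrix P."""
    c, s = math.cos(angle), math.sin(angle)
    p = PAULI[name]
    identity = [[1, 0], [0, 1]]
    return [[c * identity[r][k] - 1j * s * p[r][k] for k in range(2)] for r in range(2)]

def encode(name, x):
    """Apply H to |0> (giving |+>), then the data-dependent Pauli rotation."""
    h = 1 / math.sqrt(2)
    plus = [h, h]
    u = pauli_rotation(name, x)
    return [u[r][0] * plus[0] + u[r][1] * plus[1] for r in range(2)]

def overlap(a, b):
    """Squared overlap |<a|b>|^2; a value of 1 means the states are indistinguishable."""
    inner = sum(complex(ai).conjugate() * bi for ai, bi in zip(a, b))
    return abs(inner) ** 2

# Two different (hypothetical) feature values encoded with each Pauli rotation.
x1, x2 = 0.3, 1.2
overlaps = {name: overlap(encode(name, x1), encode(name, x2)) for name in PAULI}
```

In this toy setting the X encoding maps both inputs to the same physical state (overlap 1), whereas the Y and Z encodings separate them (overlap cos²(x1 − x2)). Multi-qubit Pauli strings such as YX or ZX behave differently and are not covered by this sketch.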

This study also explored the impact of traditional imbalance mitigation methods on model performance by conducting additional tests with different RandomState values when sampling the Food and Drug Administration (FDA) dataset. Our results showed that samples that were difficult to learn consistently yielded low Recall scores across both classical and quantum algorithms (e.g., RandomState-30, RandomState-70). However, the Variational Quantum Classifier (VQC) demonstrated a distinct advantage in handling imbalanced datasets by manipulating and transforming quantum states.

In one instance, optimization of the VQC algorithm, as shown in Figure 8, led to strong predictive performance despite significant class imbalance: the VQC identified all potentially hazardous food items, achieving a perfect Recall score for the hazardous class. By systematically experimenting with a variety of hyperparameter configurations, including the number of Pauli Feature Map repetitions, the degree of qubit entanglement, and the number of layers in the quantum Variational Circuit, we were able to pinpoint conditions that maximized performance. In a test set of 150 samples, of which 148 were non-hazardous seasonings and only 2 were hazardous items, the VQC model achieved a Recall of 0.75 for the non-hazardous class and 1.00 for the hazardous class, resulting in a high Macro Average Recall.
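The figures in this example can be worked through explicitly. The arithmetic below assumes that the reported per-class Recall values are exact; the accuracy and hazardous-class precision are derived from those values and are not reported in the text.

```latex
\begin{align*}
\text{Recall}_{\text{macro}} &= \frac{\text{Recall}_{\text{non-hazardous}} + \text{Recall}_{\text{hazardous}}}{2}
  = \frac{0.75 + 1.00}{2} = 0.875 \\
\text{Correct non-hazardous} &= 0.75 \times 148 = 111, \qquad \text{false alarms} = 148 - 111 = 37 \\
\text{Accuracy} &= \frac{111 + 2}{150} \approx 0.753 \\
\text{Precision}_{\text{hazardous}} &= \frac{2}{2 + 37} \approx 0.051
\end{align*}
```

Under these assumptions, every hazardous item is caught at the cost of 37 false alarms, which is consistent with the pattern in Table 3 of a high macro-averaged Recall alongside a modest F1 Score.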

These findings underscore the VQC’s capability to maintain high performance on highly imbalanced datasets without relying on conventional resampling techniques, offering a compelling alternative to traditional machine learning approaches for anomaly detection.

As visualized in Figure 9, the prediction results of the test dataset were represented in a 4 × 4 matrix format using four variables. Despite the apparent difficulty in determining the decision boundary for the minority class of hazardous food items (represented by red dots), the VQC algorithm, under optimized conditions, was able to interpret the characteristics of the dataset and make accurate predictions.

Table 5 demonstrates the impact of SMOTE on model performance in imbalanced datasets. Notably, in terms of Recall, the VQC model matches or outperforms both the classical SVM and the QSVM under every RandomState, achieving a Recall of up to 0.98 without SMOTE, which indicates its robustness in handling imbalanced data without the need for resampling. While the QSVM shows substantial improvement with SMOTE, particularly under RandomState 50, its Recall falls to 0.50 without SMOTE, highlighting its dependency on oversampling techniques. The classical SVM, though benefiting from SMOTE in some cases, does not exceed the Recall of the quantum models under any configuration. These findings underscore the capability of quantum models, particularly the VQC, in addressing data imbalance, even in the absence of resampling techniques like SMOTE.
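For readers unfamiliar with the resampling baseline, the core of SMOTE is linear interpolation between a minority sample and one of its minority-class neighbors [6]. The sketch below is a simplified version that always uses the single nearest neighbor (the original method samples among the k nearest); the function names and the toy points are illustrative assumptions.

```python
import random

def nearest_neighbor(x, others):
    """Return the point in `others` closest to `x` (squared Euclidean distance)."""
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(x, o)))

def smote_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and its neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

def oversample(minority, n_new, seed=0):
    """Generate n_new synthetic minority samples (simplified SMOTE, k = 1)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        synthetic.append(smote_sample(x, nearest_neighbor(x, others), rng))
    return synthetic

# Toy minority class with three two-dimensional points (hypothetical data).
minority = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_points = oversample(minority, n_new=5)
```

Because every synthetic point lies between existing minority samples, the procedure alters the training distribution, which is the kind of modification the Non-SMOTE rows of Table 5 avoid.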

Ultimately, the key to optimizing the performance of the VQC algorithm, as observed up to this point, lies in the data encoding method (preparation for using quantum states), the number of repetitions, the degree of entanglement (Linear, Circular, Full), and the number of repetitions of these entanglements. Furthermore, the rotation of qubits through Pauli Gates plays a critical role in learning important features and patterns in the data.
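The three entanglement patterns named above differ in which qubit pairs receive an entangling gate. The sketch below lists those pairs under the usual convention for these pattern names; the function name is hypothetical, and the ordering of the pairs is illustrative rather than taken from the study.

```python
from itertools import combinations

def entangling_pairs(num_qubits, pattern):
    """Return the (control, target) qubit pairs for a given entanglement pattern."""
    chain = [(i, i + 1) for i in range(num_qubits - 1)]
    if pattern == "linear":
        return chain                      # nearest neighbours along a line
    if pattern == "circular":
        if num_qubits > 2:
            return chain + [(num_qubits - 1, 0)]  # close the line into a ring
        return chain
    if pattern == "full":
        return list(combinations(range(num_qubits), 2))  # every pair of qubits
    raise ValueError(f"unknown entanglement pattern: {pattern!r}")

# Four qubits, matching the four features retained after PCA (Table 2).
pairs = {p: entangling_pairs(4, p) for p in ("linear", "circular", "full")}
```

For four qubits this gives 3, 4, and 6 entangling gates per repetition, respectively, so the choice of pattern and the number of repetitions together determine circuit depth and simulation cost.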

3.3. Robustness Test

Through the analysis of the Food and Drug Administration (FDA) dataset, it was determined that the quantum algorithm exhibits superior performance by optimizing the Feature Map without relying on resampling methods to address data imbalance. To evaluate the robustness of these results, the same experiments were repeated on several other datasets. First, the Bank Customers dataset, which is related to a marketing campaign (phone calls) by a Portuguese bank, was analyzed. The task is to predict whether customers subscribe to term deposits, and the imbalance ratio is 85.4% True to 14.6% False. The dataset was sourced from Kaggle and originates from a data-driven study on predicting bank telemarketing success [51]. As shown in Table 6, with SMOTE applied, the classical algorithm outperformed the quantum algorithms in terms of the Recall metric, while the quantum algorithm demonstrated higher performance in the Non-SMOTE scenario. Furthermore, regarding the F1 Score metric, classical algorithms also performed better with SMOTE applied, whereas the quantum algorithm outperformed classical methods in only one of the datasets in the Non-SMOTE condition.

Next, experiments were conducted using the Credit Card Fraud dataset. This dataset concerns the detection of fraudulent credit card transactions by a credit card company, that is, charges for items that the customer did not purchase. The dataset exhibits a severe imbalance, with True values at 99.8% and False values at 0.2%. The dataset was sourced from Kaggle and is based on the study "BankSim: A bank payments simulator for fraud detection research" [52]. As shown in Table 7, even without SMOTE, the quantum algorithm outperformed the models trained on the SMOTE-applied dataset in terms of both Recall and F1 Score. This further confirms the potential effectiveness of the quantum algorithm in scenarios involving highly imbalanced datasets.

4. Discussion

The issue of data imbalance is a longstanding challenge that undermines the performance of machine learning models, driving the development of both data-driven and algorithm-driven solutions. While data-driven approaches such as resampling are straightforward, they often introduce discrepancies, leading to potential information loss or distortion from the original dataset, which in turn can negatively affect the sustainability of model performance. As a result, there is a growing need for methods that address data imbalance without altering the dataset itself.

This study presents a novel algorithmic solution to the data imbalance problem using optimized quantum machine learning (QML). By encoding multiple bits of information into qubits and leveraging quantum entanglement via quantum circuits, QML effectively processes and optimizes imbalanced datasets to produce accurate predictions. The flexibility of QML, specifically in terms of the quantization and entanglement methods employed, allows for fine-tuned optimizations that outperform conventional techniques.

Specifically, the Pauli decomposition expresses a Hamiltonian as a linear combination of Pauli matrices (e.g., X, Y, Z), which form a basis for Hermitian matrices, and is fundamental to quantum computing tasks [53], such as enhancing classification performance during the encoding process. Additionally, the alignment of rotation axes with the data distribution plays a pivotal role in improving classification performance [54]. For instance, when data points are predominantly distributed along the x-axis, X gates are better suited as they align with the intrinsic structure of the data, whereas Y gates are more effective for data centered around the y-axis. Therefore, the ability to amplify class differences through axis rotation offers a critical design principle for VQCs, especially in tasks where separability is a key determinant of performance. Visualizations of the Bloch sphere before and after rotation during encoding procedures using the Pauli Feature Map, as shown in Figure 10, illustrate the effects of these rotations, offering a clear understanding of how various rotation strategies impact the distribution of quantum states and their separability.
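The effect of a rotation on a single-qubit state can be made concrete by computing its Bloch vector, which is what Bloch sphere visualizations plot. The sketch below is a generic illustration with hypothetical helper names; it does not reproduce the encoding used to generate Figure 10.

```python
import cmath
import math

def rotate(axis, theta, state):
    """Apply RX, RY, or RZ by angle theta to a single-qubit state (a, b)."""
    a, b = state
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    if axis == "X":
        return (c * a - 1j * s * b, -1j * s * a + c * b)
    if axis == "Y":
        return (c * a - s * b, s * a + c * b)
    if axis == "Z":
        return (cmath.exp(-1j * theta / 2) * a, cmath.exp(1j * theta / 2) * b)
    raise ValueError(f"unknown axis: {axis!r}")

def bloch_vector(state):
    """Return (x, y, z) = (<X>, <Y>, <Z>) for a normalised state (a, b)."""
    a, b = state
    ab = a.conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)

zero = (1 + 0j, 0j)  # the state |0>, at the north pole of the Bloch sphere
after_ry = bloch_vector(rotate("Y", math.pi / 2, zero))
after_rx = bloch_vector(rotate("X", math.pi / 2, zero))
after_rz = bloch_vector(rotate("Z", math.pi / 2, zero))
```

Starting from |0>, a quarter-turn about the y-axis moves the state to the +x direction, a quarter-turn about the x-axis moves it to the −y direction, and a rotation about the z-axis leaves it at the pole. The choice of rotation axis therefore determines whether, and in which direction, the data move the state.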

Through extensive experimentation, the Variational Quantum Classifier (VQC) was identified as the most effective QML approach for addressing class imbalance. Remarkably, the optimized QML consistently outperformed classical machine learning models that relied on data-driven methods like the widely used SMOTE (Synthetic Minority Oversampling Technique). Notably, QML achieved superior predictive performance without any need for resampling techniques, maintaining the integrity of the original datasets. This finding, demonstrated across both real-world and benchmark datasets, suggests that quantum machine learning can offer a more robust and efficient solution to the class imbalance problem than traditional machine learning approaches.

5. Conclusions

The QML community has been actively exploring potential applications across real-world problems, including diverse engineering challenges [55] and healthcare [56]. In line with these efforts, this study underscores the utility and potential of QML by demonstrating its ability to effectively address data imbalance using real-world datasets rather than datasets augmented with synthetic samples. This contribution is significant for the applied machine learning research community, which has a pressing need for successful, practical case studies.

Although various studies have reported inconsistent results regarding the advantages of QML over classical machine learning methods [57,58,59]—largely due to the current pre-development phase of quantum computing—the potential for QML to deliver superior performance through exponentially faster computations remains an unproven yet anticipated breakthrough [60]. Our findings demonstrate that QML can outperform classical methods in addressing class imbalance without the need for resampling techniques, thus expanding the practical applicability of QML in handling real-world datasets characterized by significant imbalance.

While this research has undergone extensive validation using both real and benchmark datasets, further validation across a wider variety of real-world datasets is essential to fully confirm the utility of QML in such scenarios. Moreover, the optimization process in QML lacks a mathematically rich explanation for why certain quantum circuit states outperform others. Addressing this gap by enhancing the explainability of QML optimizations will further solidify the credibility and applicability of this promising approach.

Author Contributions

Conceptualization and Methodology, S.K. and J.H.; Investigation, S.J.K.; Validation, S.-h.C.; Supervision and Validation, O.K.; Writing—original draft, S.K. and J.H.; Writing—review and editing, O.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant (21163MFDS516) from the Ministry of Food and Drug Safety in 2025.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

QML	Quantum Machine Learning
VQC	Variational Quantum Classifier
QSVM	Quantum Support Vector Machine
SMOTE	Synthetic Minority Oversampling Technique
NISQ	Noisy Intermediate-Scale Quantum

References

1. Chen, C.; Wu, X.; Zuo, E.; Chen, C.; Lv, X.; Wu, L. R-GDORUS technology: Effectively solving the Raman spectral data imbalance in medical diagnosis. Chemom. Intell. Lab. Syst. 2023, 235, 104762.
2. Ryu, H.S.; Kim, S.H.; Oh, K.J. A study on the performance improvement of machine learning classification model by resolving imbalance in financial data: Focusing on credit card accidents. Quant. Bio-Sci. 2023, 42, 47–56.
3. Esparza, M.; Farahmand, H.; Brody, S.; Mostafavi, A. Examining data imbalance in crowdsourced reports for improving flash flood situational awareness. Int. J. Disaster Risk Reduct. 2023, 95, 103825.
4. Saha, S.; Saha, C.; Haque, M.M.; Alam, M.G.R.; Talukder, A. ChurnNet: Deep learning enhanced customer churn prediction in telecommunication industry. IEEE Access 2024, 12, 4471–4484.
5. Kok, C.L.; Ho, C.K.; Aung, T.H.; Koh, Y.Y.; Teo, T.H. Transfer Learning and Deep Neural Networks for Robust Intersubject Hand Movement Detection from EEG Signals. Appl. Sci. 2024, 14, 8091.
6. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
7. Vanderschueren, T.; Verdonck, T.; Baesens, B.; Verbeke, W. Predict-then-optimize or predict-and-optimize? An empirical evaluation of cost-sensitive learning strategies. Inf. Sci. 2022, 594, 400–415.
8. Radwan, A.M. Enhancing prediction on imbalance data by thresholding technique with noise filtering. In Proceedings of the 2017 8th International Conference on Information Technology (ICIT), Amman, Jordan, 17–18 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 399–404.
9. Grossi, M.; Ibrahim, N.; Radescu, V.; Loredo, R.; Voigt, K.; Von Altrock, C.; Rudnik, A. Mixed quantum–classical method for fraud detection with quantum feature selection. IEEE Trans. Quantum Eng. 2022, 3, 1–12.
10. Weiss, G.M. Foundations of Imbalanced Learning. In Imbalanced Learning: Foundations, Algorithms, and Applications; He, H., Ma, Y., Eds.; John Wiley & Sons: Hoboken, NJ, USA, 2013.
11. Liu, C.-L.; Chang, Y.-H. Learning from imbalanced data with deep density hybrid sampling. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 7065–7077.
12. Thabtah, F.; Hammoud, S.; Kamalov, F.; Gonsalves, A. Data imbalance in classification: Experimental evaluation. Inf. Sci. 2020, 513, 429–441.
13. Zhang, Y.; Wang, Z.; Zhang, Z.; Liu, J.; Feng, Y.; Wee, L.; Traverso, A. GAN-based one dimensional medical data augmentation. Soft Comput. 2023, 27, 10481–10491.
14. Xu, Z.; Shen, D.; Nie, T.; Kou, Y.; Yin, N.; Han, X. A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data. Inf. Sci. 2021, 572, 574–589.
15. Liu, X.Y.; Zhou, Z.H. The influence of class imbalance on cost-sensitive learning: An empirical study. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 18–22 December 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 970–974.
16. Esposito, C.; Landrum, G.A.; Schneider, N.; Stiefl, N.; Riniker, S. GHOST: Adjusting the decision threshold to handle imbalanced data in machine learning. J. Chem. Inf. Model. 2021, 61, 2623–2640.
17. Wang, S.; Yao, X. Diversity analysis on imbalanced data sets by using ensemble models. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; IEEE: Piscataway, NJ, USA, 2009.
18. Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the Knowledge Discovery in Databases: PKDD 2003: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22–26 September 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 107–119.
19. Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sensor Netw. 2022, 18, 15501329221106935.
20. Chen, J.; Teo, T.H.; Kok, C.L.; Koh, Y.Y. A Novel Single-Word Speech Recognition on Embedded Systems Using a Convolution Neuron Network with Improved Out-of-Distribution Detection. Electronics 2024, 13, 530.
21. Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; Lloyd, S. Quantum machine learning. Nature 2017, 549, 195–202.
22. Cerezo, M.; Verdon, G.; Huang, H.Y.; Cincio, L.; Coles, P.J. Challenges and opportunities in quantum machine learning. Nat. Comput. Sci. 2022, 2, 567–576.
23. Zhou, Q.; Tian, G.; Deng, Y. Forecasting Bike Sharing Demand Using Quantum Bayesian Network. arXiv 2022, arXiv:2207.02142.
24. Zhou, Q.; Tian, G.; Deng, Y. BF-QC: Belief functions on quantum circuits. Expert Syst. Appl. 2023, 223, 119885.
25. Rebentrost, P.; Mohseni, M.; Lloyd, S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 2014, 113, 130503.
26. Bishwas, A.K.; Mani, A.; Palade, V. An all-pair quantum SVM approach for big data multiclass classification. Quantum Inf. Process. 2018, 17, 1–16.
27. Schuld, M.; Sinayskiy, I.; Petruccione, F. Prediction by linear regression on a quantum computer. Phys. Rev. A 2016, 94, 022342.
28. Wiebe, N.; Braun, D.; Lloyd, S. Quantum algorithm for data fitting. Phys. Rev. Lett. 2012, 109, 050505.
29. Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum principal component analysis. Nat. Phys. 2014, 10, 631–633.
30. Lloyd, S.; Mohseni, M.; Rebentrost, P. Quantum algorithms for supervised and unsupervised machine learning. arXiv 2013, arXiv:1307.0411.
31. Aïmeur, E.; Brassard, G.; Gambs, S. Quantum speed-up for unsupervised learning. Mach. Learn. 2013, 90, 261–287.
32. Wiebe, N.; Kapoor, A.; Svore, K. Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. arXiv 2014, arXiv:1401.2142.
33. Kapoor, A.; Wiebe, N.; Svore, K. Quantum perceptron models. arXiv 2016, arXiv:1602.04799.
34. Menneer, T.; Narayanan, A. Quantum-inspired neural networks. In Proceedings of the Neural Information Processing Systems 95, Denver, CO, USA, 27 November–2 December 1995; MIT Press: Cambridge, MA, USA, 1995; pp. 927–934.
35. Lu, S.; Braunstein, S.L. Quantum decision tree classifier. Quantum Inf. Process. 2014, 13, 757–770.
36. Harikrishnakumar, R.; Nannapaneni, S. Forecasting bike sharing demand using quantum Bayesian network. Expert Syst. Appl. 2023, 221, 119749.
37. Schuld, M.; Bocharov, A.; Svore, K.M.; Wiebe, N. Circuit-centric quantum classifiers. Phys. Rev. A 2020, 101, 032308.
38. Chen, S.Y.-C.; Yang, C.-H.H.; Qi, J.; Chen, P.-Y.; Ma, X.; Goan, H.-S. Variational quantum circuits for deep reinforcement learning. IEEE Access 2020, 8, 141007–141024.
39. Simões, R.D.M.; Huber, P.; Meier, N.; Smailov, N.; Füchslin, R.M.; Stockinger, K. Experimental evaluation of quantum machine learning algorithms. IEEE Access 2023, 11, 6197–6208.
40. de Oliveira, N.M.; Park, D.K.; Araujo, I.F.; da Silva, A.J. Quantum variational distance-based centroid classifier. Neurocomputing 2024, 576, 127356.
41. Consul-Pacareu, S.; Montaño, R.; Rodriguez-Fernandez, K.; Corretgé, À.; Vilella-Moreno, E.; Casado-Faulí, D.; Atchade-Adelomou, P. Quantum Machine Learning hyperparameter search. arXiv 2023, arXiv:2302.10298.
42. Maheshwari, D.; Sierra-Sosa, D.; Garcia-Zapirain, B. Variational quantum classifier for binary classification: Real vs synthetic dataset. IEEE Access 2021, 10, 3705–3715.
43. Sim, S.; Johnson, P.D.; Aspuru-Guzik, A. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. Adv. Quantum Technol. 2019, 2, 1900070.
44. Cerezo, M.; Arrasmith, A.; Babbush, R.; Benjamin, S.C.; Endo, S.; Fujii, K.; McClean, J.R.; Mitarai, K.; Yuan, X.; Cincio, L.; et al. Variational quantum algorithms. Nat. Rev. Phys. 2021, 3, 625–644.
45. Schuld, M.; Killoran, N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 2019, 122, 040504.
46. Schuld, M. Supervised quantum machine learning models are kernel methods. arXiv 2023, arXiv:2101.11020.
47. Herbst, S.; De Maio, V.; Brandic, I. On optimizing hyperparameters for quantum neural networks. arXiv 2024, arXiv:2403.18579.
48. Powell, M.J. A view of algorithms for optimization without derivatives. Math. Today-Bull. Inst. Math. Its Appl. 2007, 43, 170–174.
49. Wu, C.; Wang, N.; Wang, Y. Increasing minority recall support vector machine model for imbalanced data classification. Discret. Dyn. Nat. Soc. 2021, 2021, 6647557.
50. Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and interpreting multiclass loss and accuracy assessment metrics for classifications with class imbalance: Guidance and best practices. Remote Sens. 2024, 16, 533.
51. Moro, S.; Cortez, P.; Rita, P. A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 2014, 62, 22–31.
52. Lopez-Rojas, E.A.; Axelsson, S. BankSim: A bank payments simulator for fraud detection research. In Proceedings of the European Modeling and Simulation Symposium, Bordeaux, France, 10–12 September 2014; CAL-TEK SRL: Cosenza, Italy, 2014; pp. 144–152.
53. Gil Fuster, E.M. Variational Quantum Classifier; Facultat de Física, Universitat de Barcelona: Barcelona, Spain, 2019.
54. Kanazawa, N.; Funaki, I.; Ohira, Y.; Ouchi, K. Variational quantum shot-based simulations for waveguide modes. arXiv 2023, arXiv:2301.12345.
55. Zhu, F.; Li, G.; Tang, H.; Li, Y.; Lv, X.; Wang, X. Dung beetle optimization algorithm based on quantum computing and multi-strategy fusion for solving engineering problems. Expert Syst. Appl. 2024, 236, 121219.
56. Baygin, N.; Aydemir, E.; Barua, P.D.; Baygin, M.; Dogan, S.; Tuncer, T.; Tan, R.-S.; Acharya, U.R. Automated mental arithmetic performance detection using quantum pattern-and triangle pooling techniques with EEG signals. Expert Syst. Appl. 2023, 227, 120306.
57. Abdel-Khalek, S.; Algarni, M.; Mansour, R.F.; Gupta, D.; Ilayaraja, M. Quantum neural network-based multilabel image classification in high-resolution unmanned aerial vehicle imagery. Soft Comput. 2023, 27, 13027–13038.
58. Zeguendry, A.; Jarir, Z.; Quafafou, M. Quantum machine learning: A review and case studies. Entropy 2023, 25, 287.
59. Jerbi, S.; Gyurik, C.; Marshall, S.C.; Molteni, R.; Dunjko, V. Shadows of quantum machine learning. Nat. Commun. 2024, 15, 5676.
60. Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172–185.

Figure 1. Comparison between conventional machine learning and QML.

Figure 2. Example of data transformation through quantum gate.

Figure 3. AI GPU server (Nvidia Tesla A100).

Figure 4. Illustration of VQC algorithm.

Figure 5. Illustration of Variational Quantum Circuit.

Figure 6. Example of measurement using qubits.

Figure 7. Example of loss function.

Figure 8. Optimization condition exploration of the VQC algorithm.

Figure 9. Illustration of feature space: 4 × 4 matrix.

Figure 10. Illustrations of Bloch sphere visualizations before and after rotation through encoding.

Table 1. QML algorithms.

QML Algorithms	References	Description
QSVM (Quantum Support Vector Machine)	[25,26]	By utilizing Grover’s algorithm for optimization, it is possible to reduce time complexity and find optimal solutions at a faster pace in large datasets.
Q Linear Regression	[27]	Quantum linear regression is the quantum version of classical linear regression algorithms, modeling relationships between data points to perform predictions.
Q Least Squares	[28]	The Harrow–Hassidim–Lloyd (HHL) algorithm enables the rapid solution of linear equations, offering an exponential speed advantage over classical methods. By employing the HHL algorithm, we aim to efficiently solve linear systems, significantly improving computation speed while maintaining high accuracy in regression analysis.
QPCA (Quantum Principal Component Analysis)	[29]	Quantum Principal Component Analysis (QPCA) is the quantum version of Principal Component Analysis (PCA), a technique for identifying the principal components of data to reduce dimensions. QPCA can identify key patterns in large datasets more rapidly than classical PCA.
Q k-Means	[30]	The Q k-means algorithm is an unsupervised learning algorithm that groups data points into clusters, enhancing efficiency through Grover’s algorithm, allowing for fast clustering in large datasets.
Q K-Median	[31]	This algorithm finds groups of data points centered around the centroid of each cluster, quickly determining cluster medians using Grover’s search algorithm.
QKNN (Quantum k-Nearest Neighbors)	[32]	The k-Nearest Neighbors (k-NN) algorithm finds the nearest k neighbors to determine the class of a data point. Quantum k-NN (QKNN) maximizes efficiency by leveraging the parallel processing capabilities of quantum computing.
Q Perceptron Models	[33]	The perceptron, the basic unit of neural networks, addresses binary classification problems. The quantum perceptron uses quantum states and gates to perform learning and classification.
Q Neural Networks	[34]	Various neural network architectures, including multilayer perceptrons, are implemented on quantum computers to enhance learning and prediction performance.
Q Decision Tree	[35]	This approach leverages the strengths of quantum computing to provide faster learning rates and greater data processing capabilities. Quantum decision trees generate classification rules based on data attributes, utilizing quantum states and gates for faster classification.
Q Bayesian Network	[36]	Q-CBM gives computational benefits for solving complex probabilistic problems such as bike demand forecasting.
Circuit-centric Quantum Classifiers	[37]	Circuit-centric quantum classifiers are classification algorithms based on quantum circuits. They employ Parameterized Quantum Circuits (PQCs) to learn patterns in data and perform classifications. This approach encompasses Variational Quantum Algorithms (VQAs) and can be applied to various machine learning problems.
Deep Reinforcement Learning	[38]	Quantum deep reinforcement learning, the quantum version of reinforcement learning, allows agents to learn through interaction with their environment. Quantum reinforcement learning utilizes quantum parallelism and is stated to enable faster and more efficient learning.

Table 2. Information of food dataset.

Dataset	Number of Classes	Number of Features (Before PCA)	Number of Features (After PCA)	Train Size	Test Size	Total Size	True Ratio	False Ratio
Seasoning products	2	45	4	43,047	5881	48,928	98.5%	1.5%

Table 3. Performance comparison (food dataset).

Classical/Quantum	Model	Precision (Macro Avg.)	Recall (Macro Avg.)	F1 Score (Macro Avg.)	Time (min)
Classical	(1) Random Forest	0.51	0.6	0.44	0
Classical	(2) XGBoost	0.49	0.44	0.46	4
Classical	(3) Extra Tree	0.51	0.62	0.45	0
Classical	(4) GBT	0.49	0.4	0.44	0
Classical	(5) AdaBoost	0.49	0.4	0.44	0
Classical	Ensemble (1), (2), (3), (4), (5)	0.5	0.54	0.38	0
Classical	Stacking (1), (2), (3), (4)	0.52	0.68	0.51	0
Classical	SVM	0.49	0.48	0.49	0
Quantum Simulator	QSVM	0.53	0.87	0.47	134
Quantum Computer	VQC	0.52	0.87	0.47	7

Table 4. Performance results of quantum circuits based on Pauli Gate combinations.

Pauli Gate	Precision (Macro Avg.)	Recall (Macro Avg.)	F1 Score (Macro Avg.)	Runtime (s)
X	0.49	0.50	0.50	383.89
Y	0.52	0.85	0.46	681.67
Z	0.52	0.85	0.46	585.57
XY	0.52	0.85	0.46	1012.55
YX	0.51	0.61	0.44	1013.30
YY	0.51	0.60	0.43	1043.99
YZ	0.53	0.87	0.47	879.13
ZY	0.53	0.87	0.47	882.51
XZ	0.52	0.85	0.46	831.37
ZX	0.51	0.61	0.44	831.06
XYZ	0.53	0.87	0.47	1126.72

Table 5. Performance comparison with and without SMOTE.

Category	Type	Model	Metric (Macro Avg.)	RS *-30	RS *-40	RS *-50	RS *-60	RS *-70
SMOTE	Classical	SVM	Recall	0.60	0.49	0.72	0.47	0.48
SMOTE	Quantum	QSVM	Recall	0.72	0.86	0.90	0.88	0.54
SMOTE	Quantum	VQC	Recall	0.78	0.91	0.90	0.90	0.77
SMOTE	Classical	SVM	F1 Score	0.57	0.49	0.58	0.48	0.48
SMOTE	Quantum	QSVM	F1 Score	0.47	0.44	0.50	0.45	0.45
SMOTE	Quantum	VQC	F1 Score	0.54	0.49	0.49	0.48	0.39
Non-SMOTE	Classical	SVM	Recall	0.50	0.50	0.50	0.50	0.50
Non-SMOTE	Quantum	QSVM	Recall	0.50	0.50	0.50	0.50	0.50
Non-SMOTE	Quantum	VQC	Recall	0.71	0.98	0.89	0.96	0.73
Non-SMOTE	Classical	SVM	F1 Score	0.49	0.50	0.49	0.50	0.49
Non-SMOTE	Quantum	QSVM	F1 Score	0.49	0.50	0.50	0.50	0.49
Non-SMOTE	Quantum	VQC	F1 Score	0.45	0.60	0.49	0.55	0.50

* RS: RandomState.

Table 6. Performance comparison (Bank Customers dataset).

Category	Type	Model	Recall (Macro Avg.), RS *-100	Recall (Macro Avg.), RS *-300	F1 Score (Macro Avg.), RS *-100	F1 Score (Macro Avg.), RS *-300
SMOTE	Classical	SVM	0.89	0.89	0.75	0.74
	Quantum	QSVM	0.84	0.86	0.74	0.74
	Quantum	VQC	0.84	0.79	0.78	0.65
Non-SMOTE	Classical	SVM	0.50	0.60	0.47	0.62
	Quantum	QSVM	0.58	0.55	0.60	0.56
	Quantum	VQC	0.77	0.64	0.79	0.65

* RS: RandomState.

Table 7. Performance comparison (Credit Card Fraud dataset).

Category	Type	Model	Recall (Macro Avg.), RS *-10	Recall (Macro Avg.), RS *-20	F1 Score (Macro Avg.), RS *-10	F1 Score (Macro Avg.), RS *-20
SMOTE	Classical	SVM	0.50	0.84	0.50	0.90
	Quantum	QSVM	0.95	0.91	0.53	0.52
	Quantum	VQC	0.90	0.88	0.49	0.58
Non-SMOTE	Classical	SVM	0.50	0.50	0.50	0.50
	Quantum	QSVM	0.50	0.50	0.50	0.50
	Quantum	VQC	0.99	0.95	0.69	0.58

* RS: RandomState.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kwon, S.; Huh, J.; Kwon, S.J.; Choi, S.-h.; Kwon, O. Leveraging Quantum Machine Learning to Address Class Imbalance: A Novel Approach for Enhanced Predictive Accuracy. Symmetry 2025, 17, 186. https://doi.org/10.3390/sym17020186
