Abstract
Over the past decades, density functional theory (DFT) calculations have been utilized in various fields such as materials science and semiconductor devices. However, because DFT calculations rigorously treat the interactions between atoms, they require significant computational cost. To address this, extensive research has recently focused on training neural networks to replace DFT calculations. However, previous methods for training neural networks necessitated an extensive number of DFT simulations to acquire the ground truth (Hamiltonians). Conversely, when dealing with a limited amount of training data, deep learning models often display increased errors in predicting Hamiltonians and band structures for testing data. This phenomenon poses the potential risk of generating inaccurate physical interpretations, including the emergence of unphysical branches within band structures. To tackle this challenge, we propose a novel deep learning-based method for calculating DFT Hamiltonians, specifically tailored to produce accurate results with limited training data. Our framework not only employs supervised learning with the calculated Hamiltonian but also generates pseudo Hamiltonians (targets for unlabeled data) and trains the neural networks on unlabeled data. In particular, our approach, which leverages unlabeled data, is noteworthy as it marks the first attempt in the field of neural network Hamiltonians. Our framework demonstrates superior performance compared to the state-of-the-art approach across various datasets, such as MoS2, Bi2Te3, HfO2, and InGaAs. Moreover, our framework shows enhanced generalization by effectively utilizing unlabeled data, achieving noteworthy results when evaluated on data more complex than the training set, such as configurations with more atoms and temperature ranges outside the training data.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Recent density functional theory (DFT) simulation [1] research has attracted considerable attention, finding applications in diverse fields such as materials science [2, 3], semiconductor devices [4], drug discovery [5], and ecology [6], showcasing its versatility in analyzing the physical properties of various materials. However, despite the high demand for DFT simulations, efforts to replace them with neural networks have been increasing due to the significant computational costs involved [7–13]. Previous research has predominantly focused on individual physical properties [14, 15] such as charge density and band structures [16–18]. However, predicting the DFT Hamiltonian, which expresses various physical properties in matrix form, using a neural network is considerably challenging. Consequently, only a relatively limited body of research has been conducted to address this issue. When attempting to use neural networks to model the relationship between material structure and the DFT Hamiltonian for large-scale material systems, challenges arise due to the exponential growth in the number of independent variables and the dimensions of the Hamiltonian matrix. To address this issue, a comprehensive deep learning approach for DFT Hamiltonians was introduced under the name DeepH [7]. This framework was specifically designed to investigate crystalline materials by employing a message-passing neural network (MPNN). The complex challenges associated with the inherently large dimensions and covariance concerns of the DFT Hamiltonian matrix were effectively resolved by incorporating locality principles, including the utilization of local coordinates, localized basis transformations, and localized orbitals as basis functions. The DeepH framework consistently exhibited exceptional accuracy, not only in constructing the DFT Hamiltonian but also in computing various physical properties related to band structures and wavefunctions.
Despite these efforts in previous studies [19–22], it is often overlooked that hundreds of DFT simulation results are still required to train the neural networks. When the model learns from an insufficient quantity of training data, it exhibits elevated errors in Hamiltonian and band structure predictions. In such instances, there exists a potential hazard of yielding erroneous physical interpretations, such as the formation of unphysical branches. This phenomenon carries notable implications, not only for the accuracy of DFT predictions but also for their application in electron transport simulations. In this work, we introduce a novel framework, named SemiH, designed for training a neural network with a limited amount of data (DFT results) using semi-supervised learning (SSL). To achieve greater precision with a smaller training dataset, we introduce a method that incorporates unlabeled training data into the learning process. Our approach entails the generation of pseudo Hamiltonians for various atomic structures, employing them as targets for unlabeled data during training. This method offers the prospect of significantly reducing the expenses associated with obtaining training data.
2. Preliminaries
2.1. Neural network Hamiltonian (NNH)
In DFT, the Hamiltonian is a fundamental physical quantity that encapsulates the total energy of a system based on electron density, a concept rooted in the Hohenberg–Kohn theorem [23]. This Hamiltonian, denoted by $\hat{H}$, comprises several integral terms:

$$\hat{H} = \hat{T} + \hat{V}_{\mathrm{ext}} + \hat{V}_{\mathrm{H}} + \hat{V}_{\mathrm{xc}}, \qquad (1)$$

where $\hat{T}$ represents the kinetic energy, reflecting the motion of electrons; $\hat{V}_{\mathrm{ext}}$ accounts for the energy due to electron–nuclear attraction; $\hat{V}_{\mathrm{H}}$ corresponds to the Hartree energy, indicative of the repulsive forces between electrons; and $\hat{V}_{\mathrm{xc}}$ embodies the exchange–correlation energy, addressing the complex exchange and correlation effects amongst electrons [24].
Due to the computational limitations of DFT in handling complex structures, recent advancements have led to the development of Hamiltonian approaches utilizing neural networks. In early research on NNH, the focus was on deriving tight-binding Hamiltonians using machine learning techniques, particularly linear regression. However, these approaches were limited to specific structures and generally exhibited lower accuracy compared to DFT Hamiltonians [25–28].
More recent developments have seen a shift towards the use of MPNNs [29–33]. These methods conceptualize molecules as graphs, with atoms and chemical bonds represented as vertices and edges, respectively, thus significantly advancing the prediction of molecular properties. A significant advancement in this domain is DeepH, which integrates MPNNs with local coordinate transformations to tackle the challenges associated with large dimensionality and rotation covariance. This integration facilitates the derivation of DFT Hamiltonians for crystalline materials. Furthering this progress, DeepH-E3 [34] employs equivariant neural networks, which effectively reduce the computational load associated with local coordinate information, thereby enabling faster computations compared to previous models. Recently, the graph LC-Net [10] has improved data and model efficiency compared to DeepH by defining a complete local coordinate system, considering Hermitian symmetry, and encoding local information with a convolutional neural network.
Despite these advancements, a notable challenge persists: these models, while capable of streamlining the complex mathematical processes involved in traditional DFT calculations, such as Kohn–Sham equations, still demand extensive data for precise Hamiltonian predictions. Our research aims to mitigate this limitation by adopting SSL techniques, thus enhancing the efficiency and accuracy of Hamiltonian predictions in the field of computational material science.
2.2. Semi-supervised Learning
Deep neural networks often excel through supervised learning, relying on labeled datasets. However, the advantages of using larger datasets come with significant costs due to the human effort required for labeling. SSL addresses this challenge by reducing reliance on labeled data and leveraging unlabeled data. Obtaining unlabeled data is typically less labor-intensive, making SSL a cost-effective solution. Various SSL techniques tailored for deep neural networks have been proposed, such as FixMatch [35] and MixMatch [36].
Among various SSL techniques, we employ the Pseudo-label method [37], which incorporates pseudo labels for unlabeled data during training. This inclusion of unlabeled data with pseudo labels enables the model to generalize better across diverse samples. Exposure to a broader range of data allows the model to capture underlying patterns and structures, leading to improved generalization performance on unseen or novel data. The Pseudo-labeling method introduces a form of self-supervision by treating model predictions on unlabeled data as ground truth labels. This self-supervised aspect encourages the model to learn meaningful representations and features without solely relying on externally provided labels.
Motivated by these previous efforts, we integrate SSL into neural network training by generating pseudo Hamiltonians to leverage unlabeled data, i.e. input data for which we have not performed DFT simulations. While there have been attempts to apply SSL in the field of molecular dynamics (MD) [38, 39], to our knowledge, there are no reported cases of applying SSL to the training of neural networks that predict the DFT Hamiltonian, which encodes information about various physical properties. Therefore, our novel approach provides a unique perspective in the field of DFT Hamiltonian neural network training.
3. Methods
3.1. MPNN
To ensure spatial invariance before neural network training, it is imperative to transform the DFT Hamiltonian, represented in terms of position and a localized pseudo-atomic orbital basis [40, 41]. As defined in equation (2), the transformation of the Hamiltonian matrix incorporates rotations or translations between coordinate systems. This calculation is performed for each orbital state, particularly considering the orbital's azimuthal quantum number l, magnetic quantum number m, and the multiplicity index p of the orbital state. For the transformed Hamiltonian element $H'_{i\alpha,j\beta}$, which accounts for the interaction between orbital α of atom i and orbital β of atom j, α and β denote the sets of quantum numbers $(p, l, m)$ of their respective orbitals. The transformation is achieved by summing the products of the original DFT Hamiltonian elements $H_{i\alpha',j\beta'}$ and the Wigner D-matrices $D^{l_{\alpha}}(T)$ and $D^{l_{\beta}}(T)$. Here, T is a $3\times3$ orthogonal matrix that performs the rotation transformation between the two coordinate systems s and s':

$$H'^{\,s'}_{i\alpha,j\beta} = \sum_{m'_{\alpha},\, m'_{\beta}} D^{l_{\alpha}}_{m_{\alpha} m'_{\alpha}}(T)\, D^{l_{\beta}\,*}_{m_{\beta} m'_{\beta}}(T)\, H^{\,s}_{i\alpha',j\beta'}, \qquad (2)$$

where $\alpha' = (p_{\alpha}, l_{\alpha}, m'_{\alpha})$ and $\beta' = (p_{\beta}, l_{\beta}, m'_{\beta})$.
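To make the block structure of equation (2) concrete, the following minimal sketch (our illustration, not code from the paper) applies the transformation to a single orbital sub-block; random orthogonal matrices stand in for the Wigner D-matrices, which in practice would be computed from the rotation T, for example with an equivariant-features library.

```python
import numpy as np

def rotate_hamiltonian_block(H_block, D_alpha, D_beta):
    """Transform one (orbital alpha x orbital beta) sub-block of the DFT Hamiltonian
    between coordinate systems s and s', following the structure of equation (2):
    H'_block = D_alpha @ H_block @ D_beta^dagger. In practice the Wigner D-matrices
    would be computed from the rotation T; here they are simply passed in."""
    return D_alpha @ H_block @ D_beta.conj().T

# Toy check with random orthogonal matrices standing in for D^{l=1}(T) and D^{l=2}(T):
# a unitary change of basis preserves the Frobenius norm of the block.
rng = np.random.default_rng(0)
D_a, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # (2l+1) = 3 for a p orbital
D_b, _ = np.linalg.qr(rng.normal(size=(5, 5)))      # (2l+1) = 5 for a d orbital
H_block = rng.normal(size=(3, 5))                   # H_{i alpha, j beta} sub-block
H_rot = rotate_hamiltonian_block(H_block, D_a, D_b)
print(np.isclose(np.linalg.norm(H_block), np.linalg.norm(H_rot)))   # True
```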
In the MPNN framework, the vertex $v_i$ represents the features of atom i, while the edge $e_{ij}$ signifies the interaction between the pair of atoms, as illustrated in figure 1(b). The initial values assigned to these vertices and edges are based on the atomic species $A_i$ and the interatomic distances $|\mathbf{r}_{ij}|$, respectively. This process of initialization is formalized in equations (3) and (4), where $c_n$ and δ serve as the parameters that define the Gaussian basis [42]. 'Vertex embedding' in this context refers to one-hot encoding of the atomic species:

$$v_i^{0} = \text{one-hot}(A_i), \qquad (3)$$

$$e_{ij,n}^{0} = \exp\!\left(-\frac{\left(|\mathbf{r}_{ij}| - c_n\right)^{2}}{2\delta^{2}}\right). \qquad (4)$$
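A short sketch of this initialization is given below, assuming a uniform grid of Gaussian centers; the number of basis functions, cutoff, and width are illustrative placeholders rather than the settings used in the paper.

```python
import numpy as np

def init_vertex(atomic_number, species):
    """One-hot 'vertex embedding' of the atomic species A_i, cf. equation (3)."""
    v = np.zeros(len(species))
    v[species.index(atomic_number)] = 1.0
    return v

def init_edge(r_ij, n_basis=128, r_max=8.0, delta=0.1):
    """Gaussian-basis expansion of the interatomic distance |r_ij|, cf. equation (4).
    The centers c_n are spread uniformly up to a cutoff r_max; n_basis, r_max and
    delta are illustrative values, not the settings used in the paper."""
    centers = np.linspace(0.0, r_max, n_basis)
    return np.exp(-((r_ij - centers) ** 2) / (2.0 * delta ** 2))

v0 = init_vertex(42, species=[42, 16])   # Mo in an MoS2 graph (Mo: Z=42, S: Z=16)
e0 = init_edge(2.4)                      # roughly a Mo-S bond length in angstrom
print(v0, e0.shape)
```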
Figure 1. The overall architecture of our framework.

Following initialization, both the vertices and edges are subject to an update process. In each update iteration t, the tensor products of the vertices $v_i^{t}$ and $v_j^{t}$, along with the edges $e_{ij}^{t}$, are combined with spherical harmonics $Y(\hat{\mathbf{r}}_{ij})$. This combination creates a message $m_{ij}^{t+1}$ through the message function $M_t$, as outlined in equation (5). The vertex $v_i^{t+1}$ is then updated using the vertex update function $U_t^{v}$, which incorporates the aggregated messages $m_i^{t+1}$ received from neighboring vertices and the vertex's existing state, as detailed in equations (6) and (7). Additionally, the edge $e_{ij}^{t+1}$ is updated according to equation (8), using the edge update function $U_t^{e}$ and the messages $m_{ij}^{t+1}$. Here, the message function $M_t$ corresponds to the 'Message block' in figure 1(c). This function concatenates vertices $v_i^{t}$ and $v_j^{t}$, along with edge $e_{ij}^{t}$, then performs a tensor product with spherical harmonics $Y(\hat{\mathbf{r}}_{ij})$ and channel-wise multiplication with the initial edge $e_{ij}^{0}$. The vertex update function $U_t^{v}$ and the edge update function $U_t^{e}$ are located within the 'Update block' as shown in figure 1(c). In this block, messages that have passed through a linear layer are added to the previous vertex $v_i^{t}$ or edge $e_{ij}^{t}$. The resulting values then pass through the E3LayerNorm layer implemented in DeepH-E3, and are subsequently added back to the previous values. Finally, symmetry-aligned features are effectively managed through the utilization of an equivariant neural network, leading to their transformation into vector elements. The corresponding message and update steps are:

$$m_{ij}^{t+1} = M_t\!\left(v_i^{t}, v_j^{t}, e_{ij}^{t}, Y(\hat{\mathbf{r}}_{ij})\right), \qquad (5)$$

$$m_{i}^{t+1} = \sum_{j \in \mathcal{N}(i)} m_{ij}^{t+1}, \qquad (6)$$

$$v_i^{t+1} = U_t^{v}\!\left(v_i^{t}, m_i^{t+1}\right), \qquad (7)$$

$$e_{ij}^{t+1} = U_t^{e}\!\left(e_{ij}^{t}, m_{ij}^{t+1}\right). \qquad (8)$$
After undergoing multiple updates across a series of layers, the Hamiltonian elements are computed using the Wigner–Eckart layer. In this process, we employed either SiLU (sigmoid-weighted linear unit) [43] or sigmoid as the activation function. The optimization process was carried out using the Adam optimizer [44], commencing with an initial learning rate of 0.005.
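As an illustration of equations (5)–(8), the simplified sketch below performs one message-passing update with plain feature vectors; it omits the tensor products with spherical harmonics and the equivariance machinery of the actual model, and all dimensions and weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
F = 16                                          # feature width (illustrative)
W_msg = 0.1 * rng.normal(size=(3 * F, F))       # stands in for the message function M_t
W_vtx = 0.1 * rng.normal(size=(2 * F, F))       # stands in for the vertex update U_t^v
W_edge = 0.1 * rng.normal(size=(2 * F, F))      # stands in for the edge update U_t^e

def message(v_i, v_j, e_ij):
    """Equation (5), simplified: concatenate v_i, v_j, e_ij and apply a nonlinearity.
    The real model additionally takes a tensor product with spherical harmonics of the
    bond direction and multiplies channel-wise with the initial edge features."""
    return np.tanh(np.concatenate([v_i, v_j, e_ij]) @ W_msg)

def update_vertex(v_i, m_i):
    """Equations (6)-(7): aggregate neighbour messages, then update the vertex (residual)."""
    return v_i + np.tanh(np.concatenate([v_i, m_i]) @ W_vtx)

def update_edge(e_ij, m_ij):
    """Equation (8): update the edge with its own message (residual)."""
    return e_ij + np.tanh(np.concatenate([e_ij, m_ij]) @ W_edge)

# One update step on a toy two-atom graph with directed edges (0, 1) and (1, 0).
v = {0: rng.normal(size=F), 1: rng.normal(size=F)}
e = {(0, 1): rng.normal(size=F), (1, 0): rng.normal(size=F)}
m = {(i, j): message(v[i], v[j], e[(i, j)]) for (i, j) in e}
m_agg = {i: sum(m[(i, j)] for j in v if (i, j) in m) for i in v}     # equation (6)
v = {i: update_vertex(v[i], m_agg[i]) for i in v}
e = {(i, j): update_edge(e[(i, j)], m[(i, j)]) for (i, j) in e}
```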
3.2. SemiH
We define our proposed framework as SemiH, which utilizes unlabeled data when training the NNH. The detailed procedure is provided in algorithm 1.
Algorithm 1. Overall process of SemiH.

Input: labeled data x and their target Hamiltonians H, unlabeled data u, model F, initial step for semi-supervised learning I, pseudo Hamiltonian generation step S, training weight λ.

for each epoch e do
    h ← F(x)
    L_sup ← MSE(h, H)
    if e < I then
        update F with L_sup (supervised learning only)
        continue
    end if
    if e = I or S epochs have passed since the last generation then
        H′ ← F(u) (generate or update pseudo Hamiltonians)
    end if
    h′ ← F(u)
    L_unsup ← MSE(h′, H′)
    L ← L_sup + λ · L_unsup
    update F with L
end for
3.2.1. Training with labeled data
Initially, the model undergoes training only with labeled data x, where the targets H are obtained from DFT simulation results corresponding to the provided input atomic structures. Utilizing the neural network F, we predict Hamiltonians

$$h = F(x), \qquad (9)$$

and the difference between h and H is calculated as the loss. This initial phase aligns with the conventional approach of supervised learning. The loss function for supervised learning, denoted as $\mathcal{L}_{\mathrm{sup}}$, can be expressed as follows:

$$\mathcal{L}_{\mathrm{sup}} = \frac{1}{n}\sum_{i}^{N(i)}\sum_{j}^{N(j)}\sum_{\alpha}^{O(i)}\sum_{\beta}^{O(j)}\left(h_{i\alpha,j\beta} - H_{i\alpha,j\beta}\right)^{2}, \qquad (10)$$

where n is the total number of elements in the Hamiltonian matrix. Here, N(i) is the number of i atoms, N(j) is the number of j atoms, O(i) is the number of orbitals of the i atoms, and O(j) is the number of orbitals of the j atoms. We train the model with this loss until the initial step for SSL, I. This process establishes the initial learning weights, enabling the model to generate a pseudo Hamiltonian.
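A minimal sketch of this element-wise loss, assuming the Hamiltonian is stored as per-atom-pair orbital blocks (an illustrative data layout of our own, not the paper's implementation):

```python
import numpy as np

def hamiltonian_mse(h_pred, H_target):
    """Element-wise MSE between predicted and DFT Hamiltonian blocks, cf. equation (10).
    Both arguments map an atom pair (i, j) to its (O(i) x O(j)) orbital block;
    n counts all matrix elements over all pairs."""
    sq_err, n = 0.0, 0
    for pair, H_block in H_target.items():
        sq_err += float(np.sum((h_pred[pair] - H_block) ** 2))
        n += H_block.size
    return sq_err / n

# Toy usage: two atom pairs with 3x3 and 3x5 orbital blocks.
rng = np.random.default_rng(0)
H = {(0, 0): rng.normal(size=(3, 3)), (0, 1): rng.normal(size=(3, 5))}
h = {k: v + 0.01 * rng.normal(size=v.shape) for k, v in H.items()}
print(hamiltonian_mse(h, H))   # small value close to 1e-4
```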
3.2.2. Pseudo Hamiltonian generation
Building upon numerous existing NNH approaches that exclusively relied on supervised learning loss, we enhance the training paradigm by incorporating unlabeled data. This involves creating a pseudo Hamiltonian corresponding to the input atomic configuration, thereby extending the model's learning capacity beyond the confines of traditional supervised learning approaches.
At the initial step I for SSL, we initiate the prediction of pseudo Hamiltonians H' for unlabeled data u, specifically those instances where only atomic configurations are given as inputs:

$$H' = F(u). \qquad (11)$$

Here, recognizing that the quality of the pseudo Hamiltonian plays a pivotal role in determining the final accuracy of the neural network, we prioritize obtaining accurate pseudo Hamiltonians. Consequently, we update the pseudo Hamiltonians at regular intervals to refine and enhance their accuracy as the training progresses. After the pseudo Hamiltonian generation step, we predict Hamiltonians h' for the unlabeled data,

$$h' = F(u), \qquad (12)$$

and then calculate the unsupervised loss between H' and h' as follows:

$$\mathcal{L}_{\mathrm{unsup}} = \frac{1}{n}\sum_{i}^{N(i)}\sum_{j}^{N(j)}\sum_{\alpha}^{O(i)}\sum_{\beta}^{O(j)}\left(h'_{i\alpha,j\beta} - H'_{i\alpha,j\beta}\right)^{2}. \qquad (13)$$
Operating under the assumption that the pseudo Hamiltonian converges to the actual DFT result (ground truth), i.e. H' → H, the unsupervised loss function tends to approach that of supervised learning. As a result, we employ a learning strategy that involves updating the pseudo Hamiltonian during the training process. This strategy aims to provide a more accurate pseudo Hamiltonian as a target for unlabeled data, enhancing the model's ability to learn meaningful representations.
3.2.3. Re-training with pseudo Hamiltonian
In traditional pseudo-labeling methods, the learning objective aims to place the decision boundary in low-density regions to achieve clear class differentiation. However, in the context of the Hamiltonian prediction task, predicting each element of the Hamiltonian, which represents continuous physical quantities, with minimal error is crucial. In other words, ensuring minimal error for each element is of paramount importance. Even if the overall average error of the Hamiltonian is low, critical errors in specific elements can lead to erroneous physical interpretations.
However, when training data is limited in quantity, the neural network's knowledge of the Hamiltonian element distribution it has learned becomes restricted. Consequently, this limitation increases the susceptibility to overfitting issues. Hence, the objective of SSL in SemiH is to enhance generalization performance by incorporating unlabeled data. This incorporation mitigates the overfitting problem, significantly reducing its impact and enhancing the network's robustness on unseen data. To achieve this, we initiate the process by integrating unlabeled data with pseudo Hamiltonians and labeled data. Subsequently, we reorganize the training dataset by determining the learning order randomly. Finally, we adopt the integrated final loss function by combining the supervised and unsupervised losses as follows:

$$\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \lambda\, \mathcal{L}_{\mathrm{unsup}}. \qquad (14)$$

In this context, λ serves as a hyperparameter employed to balance the influence of $\mathcal{L}_{\mathrm{sup}}$ and $\mathcal{L}_{\mathrm{unsup}}$. The optimal adjustment of λ plays a crucial role in determining network performance. Setting λ too high may compromise prediction accuracy for labeled data. Conversely, if λ is excessively small, it diminishes the utility of SemiH in leveraging unlabeled data. Therefore, we adopt an approach where the impact of the pseudo Hamiltonian is initially set low during the early stages of learning and progressively increased as the training proceeds. This aligns with prior research [37, 45], suggesting that gradually increasing the influence of the unsupervised loss can reduce the risk of the neural network becoming trapped in local minima. Specifically, we implement a scheduling scheme for the hyperparameter λ, initializing it at 0 (supervised learning only) and subsequently adjusting it to higher values at specific epochs e, where e denotes the epoch at which λ is updated.
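To make algorithm 1 and the λ schedule concrete, the sketch below runs SemiH-style training on a toy linear model; the model, dataset sizes, learning rate, the steps I and S, and the λ breakpoints are placeholder values of our own, not the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the neural network F: a linear map from a structure descriptor
# to a flattened "Hamiltonian". The point here is the SemiH schedule, not the model.
d_in, d_out = 8, 6
W_true = rng.normal(size=(d_in, d_out))
x = rng.normal(size=(10, d_in)); H = x @ W_true       # labeled data and DFT targets
u = rng.normal(size=(100, d_in))                      # unlabeled data (no DFT targets)

W = np.zeros((d_in, d_out))                           # model parameters of F
predict = lambda a: a @ W
grad_mse = lambda a, t: 2.0 * a.T @ (a @ W - t) / (a.shape[0] * d_out)

I, S, lr = 50, 20, 0.05                               # initial SSL step, refresh period, learning rate
lam_schedule = {0: 0.0, 100: 0.1, 200: 0.5, 300: 1.0} # hypothetical lambda breakpoints
lam, H_pseudo = 0.0, None

for epoch in range(400):
    lam = lam_schedule.get(epoch, lam)
    if epoch >= I and (epoch - I) % S == 0:
        H_pseudo = predict(u)                         # (re)generate pseudo Hamiltonians, eq. (11)
    g = grad_mse(x, H)                                # gradient of the supervised loss, eq. (10)
    if H_pseudo is not None:
        g = g + lam * grad_mse(u, H_pseudo)           # gradient of lambda * L_unsup, eqs. (13)-(14)
    W -= lr * g

print(float(np.mean((predict(x) - H) ** 2)))          # final supervised MSE of the toy model
```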
3.3. Discussion on the reliability of pseudo Hamiltonian
To the best of our knowledge, our approach is the first attempt to generate pseudo Hamiltonians using a deep learning model to leverage unlabeled data in this field. Therefore, we considered how to effectively reflect the pseudo Hamiltonian in learning in the following four ways:
3.3.1. Unlabeled data selection
As explained in the Datasets section (section 4.1), we obtain input data (atomic structures) by varying the temperature within a specific range using MD simulations. Consequently, at temperatures near 300 K, the atomic structure exhibits more crystalline characteristics, while at higher temperatures (e.g. 2000 K), the structure becomes more amorphous. From the hundreds of continuous input data points obtained through MD simulations, we randomly select the training, validation, and test datasets, ensuring that all datasets belong to the same domain. From the training dataset, we randomly extract both labeled and unlabeled data, ensuring that they originate from the same domain. When the neural network is sufficiently trained, the decrease in validation loss on in-domain data indicates that the model has effectively learned the relevant domain-specific features. Therefore, the pseudo Hamiltonian generated for the unlabeled data is considered relatively reliable, as the network's performance on similar in-domain validation data suggests its competence. This approach provides a reasonable level of confidence in the pseudo Hamiltonian for the unlabeled data. This approach is grounded in the principle that models perform better when the labeled and unlabeled data distributions are consistent, thereby enhancing model performance by leveraging domain-specific knowledge effectively [46–49].
3.3.2. Initial pseudo Hamiltonian generation step I
Even if the unlabeled data are in-domain data, the accuracy of the pseudo Hamiltonians generated during learning is not guaranteed. Therefore, we use only labeled data in the early epochs (before epoch I) so that the model first learns the relationship between the atomic structure (input) and the DFT Hamiltonian (ground truth) within the domain. We empirically choose I such that the accuracy of the pseudo Hamiltonian is guaranteed to some extent; specifically, we determine the initial step I based on the validation loss (MSE) falling within the range of $10^{-3}$ eV$^2$ to $10^{-2}$ eV$^2$. Since the validation data and the unlabeled training data belong to the same domain, this criterion provides a reasonable level of confidence in the pseudo Hamiltonians for the unlabeled data.
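A small sketch of this criterion is shown below; the validation history and the way I is chosen from it are illustrative only.

```python
def ready_for_ssl(val_mse, lo=1e-3, hi=1e-2):
    """Empirical criterion of section 3.3.2: start generating pseudo Hamiltonians once
    the validation MSE (in eV^2) has dropped into the [lo, hi] window; the thresholds
    mirror the range quoted in the text."""
    return lo <= val_mse <= hi

# Example: begin the SSL phase at the first epoch satisfying the criterion.
val_history = [0.3, 0.08, 0.02, 0.008, 0.004]
I = next(epoch for epoch, v in enumerate(val_history) if ready_for_ssl(v))  # -> 3
```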
3.3.3. Training weight λ
As mentioned above, there is still a possibility that the pseudo Hamiltonians of the early epochs have low reliability (even for epochs after I). Therefore, we assign a small λ at the beginning of learning to reduce the impact of the loss on unlabeled data, which uses the pseudo Hamiltonian as a target, and assign a higher λ as learning progresses. Furthermore, λ also serves a crucial role in regularization during the learning process by incorporating the loss on unlabeled data, thereby potentially mitigating overfitting and enhancing the model's generalization ability [50, 51].
3.3.4. Pseudo Hamiltonian update
Even if the pseudo Hamiltonian generation above is performed in epoch I, the model is still under-trained on the pseudo labeled data and therefore continuous pseudo Hamiltonian update is essential. Therefore, we iterate the pseudo Hamiltonian generation several times as training progresses. This iterative refinement helps the model incrementally improve its predictions.
The pseudo Hamiltonians were generated following the considerations discussed above, and the resulting analysis is provided in figure 3(b). This analysis supports our claim: at the final pseudo Hamiltonian update, the difference from the ground truth (the actual DFT result) is significantly smaller than in the early epochs.
4. Experiments
We generate stable atomic structures in response to temperature variations using ab initio MD calculations facilitated by the Vienna ab initio simulation package [52]. For each atomic configuration, we obtain the DFT Hamiltonian based on pseudo-atomic localized basis functions from the OpenMX software package [40]. The DFT calculations employ the PBE exchange-correlation functional [53] and norm-conserving pseudopotentials [54]. We utilize sets of 10, 20, and 30 labeled data for each experiment, together with ten times as many unlabeled data, except for Bi2Te3, where 180 unlabeled data are used with 30 labeled samples. Our framework has been applied to the state-of-the-art model, DeepH-E3, which incorporates the methodologies previously discussed.
4.1. Datasets
Labeled data: For the labeled data, we randomly select a subset of diverse atomic configurations obtained from MD simulations. DFT simulations are then performed on these selected configurations to extract the corresponding DFT Hamiltonians, which serve as the targets for training the neural network.
Unlabeled data: To address the constraint of learning from a limited set of labeled data and to enhance generalization performance, the judicious selection of unlabeled data is crucial. We employ in-domain data for effective selection of unlabeled data. Specifically, the unlabeled data comprises the remaining atomic configurations from the MD simulations, excluding those used as labeled data. These configurations do not undergo DFT calculations. This approach reduces the computational costs associated with DFT calculations while still leveraging the benefits of using in-domain data by incorporating relaxed atomic configurations obtained within the same MD experimental environment.
We present table 1 to outline the composition of the dataset, detailing the number of labeled, unlabeled, and testing data samples for each material used in our study, thereby emphasizing our methodology of selecting diverse atomic configurations and efficiently choosing unlabeled data to enhance training.
Table 1. Composition of the datasets for MoS2, Bi2Te3, HfO2, and InGaAs. Labeled training data and testing data refer to cases where both MD and DFT were performed, while unlabeled training data refers to cases where only MD was performed.
| Material | Atoms | Temperature (K) | Orbital basis | Labeled training data | Unlabeled training data | Testing data |
|---|---|---|---|---|---|---|
| MoS2 [7] | 75 | 300 | s3p2d2 (Mo), s2p2d1 (S) | 30 | 300 | 50 |
| Bi2Te3 [34] | 90 | 300 | s3p2d2 (Bi, Te) | 30 | 180 | 40 |
| HfO2 (P21/c) | 96 | 1500-300 | s2p2d1 (Hf), s2p2 (O) | 30 | 300 | 50 |
| InGaAs (R3m) | 108 | 600-300 | s2p2d2 (In, Ga, As) | 30 | 300 | 50 |
4.2. Main results
For each system, we evaluated Hamiltonian and band structure errors by comparing our method to the state-of-the-art technique, DeepH-E3. Table 2 illustrates that our framework surpasses the baseline method across various material datasets, particularly when using a smaller number of labeled data for training. While the average Hamiltonian error increases as the amount of training data decreases, SemiH exhibits significantly lower errors than the baseline model. Notably, when utilizing only 10 labeled data, the average Hamiltonian error decreases from 1.11 to 0.46 compared to the baseline (table 2).
Table 2. Average Hamiltonian error (MSE). We compared our method with DeepH-E3. The best results are indicated in bold for each case. For neural network training, the number of unlabeled data was 10 times the number of labeled data, except in the case of Bi2Te3, where 180 unlabeled data were used with 30 labeled data. The baseline represents DeepH-E3.
| # of labeled data | 10 | 10 | 20 | 20 | 30 | 30 |
|---|---|---|---|---|---|---|
| Method | baseline | SemiH | baseline | SemiH | baseline | SemiH |
| MoS2 | 1.42 | **0.39** | 0.54 | **0.24** | 0.30 | **0.16** |
| Bi2Te3 | 0.94 | **0.32** | 0.40 | **0.20** | 0.25 | **0.15** |
| HfO2 | 0.98 | **0.48** | 0.40 | **0.29** | 0.25 | **0.24** |
| InGaAs | 1.10 | **0.65** | 0.65 | **0.53** | 0.55 | **0.50** |
| Average | 1.11 | **0.46** | 0.50 | **0.31** | 0.34 | **0.26** |
| Δ (average, baseline - SemiH) | 0.65 | | 0.19 | | 0.08 | |
The issue of high Hamiltonian error is visually evident through the band structure in figure 2. Using Hamiltonians generated by the baseline model and SemiH, trained with only 10 labeled data, we conducted band structure calculations as follows:
Figure 2. Band structures of MoS2, Bi2Te3, HfO2, and InGaAs (from left to right) calculated by the baseline (blue dashed lines), SemiH (red dotted lines), and DFT (black solid lines).

$$H\psi = ES\psi, \qquad (15)$$

$$\psi_{\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\, u_{\mathbf{k}}(\mathbf{r}). \qquad (16)$$

The time-independent Schrödinger equation (15) and Bloch's theorem (16) are critical in deriving the band structure of crystalline solids. In the Schrödinger equation, the wave function ψ represents the quantum state of an electron in the system, S is the overlap matrix, and the energy eigenvalues E denote the allowed energy levels for electrons in a crystal. Bloch's theorem extends this concept to electrons in a crystalline lattice, describing the electron wave function $\psi_{\mathbf{k}}(\mathbf{r})$ as a product of a periodic function $u_{\mathbf{k}}(\mathbf{r})$ and a plane wave $e^{i\mathbf{k}\cdot\mathbf{r}}$. Here, k is the wave vector within the Brillouin zone, indicating the electron's momentum in the periodic lattice, and r is the position vector, denoting the electron's location in space. Utilizing these equations, we can calculate the band structure, which is pivotal for analyzing the physical properties of atomic structures.
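A minimal sketch of such a band-structure calculation, assuming the real-space Hamiltonian and overlap blocks are stored as dictionaries keyed by lattice vector (an illustrative layout, not the paper's code):

```python
import numpy as np
from scipy.linalg import eigh

def band_structure(H_R, S_R, k_path):
    """Sketch of a band-structure calculation from real-space Hamiltonian and overlap
    blocks, following equations (15)-(16): for each k (in fractional coordinates),
    build H(k) and S(k) by a Bloch sum over lattice vectors R and solve the
    generalized eigenvalue problem H(k) c = E S(k) c."""
    bands = []
    for k in k_path:
        Hk = sum(np.exp(2j * np.pi * np.dot(k, R)) * H_R[R] for R in H_R)
        Sk = sum(np.exp(2j * np.pi * np.dot(k, R)) * S_R[R] for R in S_R)
        bands.append(eigh(Hk, Sk, eigvals_only=True))   # allowed energies E_n(k)
    return np.array(bands)

# Toy usage: a single on-site block (orthogonal basis), two k-points.
H_R = {(0, 0, 0): np.array([[0.0, 1.0], [1.0, 0.0]])}
S_R = {(0, 0, 0): np.eye(2)}
print(band_structure(H_R, S_R, [np.zeros(3), np.array([0.5, 0.0, 0.0])]))
```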
As a result, when a restricted amount of labeled data is utilized, the baseline method may exhibit inaccurate band structures or even generate unphysical branches, potentially leading to erroneous physical interpretations. In contrast, our framework, SemiH, adeptly integrates unlabeled data into neural network training, effectively mitigating this issue. The quantitative results for the band structure error are presented in table 3. In all cases, SemiH demonstrates superior performance. Particularly noteworthy is the case with 10 labeled data, where the average band structure error decreases from 4.72 to 2.73 compared to the baseline.
Table 3. Average band structure error. We compared our method with DeepH-E3. The best results are indicated in bold for each case. We computed the band structure error by evaluating the disparity between the band structure calculated with the actual DFT Hamiltonian and the predicted Hamiltonian from both DeepH-E3 and SemiH. The baseline represents DeepH-E3.
| # of labeled data | 10 | 10 | 20 | 20 | 30 | 30 |
|---|---|---|---|---|---|---|
| Method | baseline | SemiH | baseline | SemiH | baseline | SemiH |
| MoS2 | 2.52 | **1.39** | 1.89 | **1.19** | 1.49 | **1.20** |
| Bi2Te3 | 4.21 | **2.00** | 2.17 | **1.61** | 1.68 | **1.27** |
| HfO2 | 8.06 | **5.79** | 3.97 | **3.80** | 2.03 | **2.02** |
| InGaAs | 4.10 | **1.73** | 1.26 | **1.12** | 1.36 | **1.16** |
| Average | 4.72 | **2.73** | 2.32 | **1.93** | 1.64 | **1.41** |
| Δ (average, baseline - SemiH) | 1.99 | | 0.39 | | 0.23 | |
4.3. Analysis
4.3.1. Accuracy of pseudo Hamiltonian
As discussed in section 3.2.2, the accuracy of the pseudo Hamiltonian is crucial for maximizing the performance of the SemiH framework, as it serves as the target for unlabeled data. In our experiments, detailed in figure 3, we update the pseudo Hamiltonian every 20 epochs and calculate the difference between the pseudo Hamiltonian and the actual DFT Hamiltonian at each update. In figure 3(a), we observe a convergence of errors to lower values with each update of the pseudo Hamiltonian. This trend is further evident in figure 3(b), where the loss confirms the decreasing error. Compared to the initial pseudo Hamiltonian error (left), we notice a significant reduction in overall error as the pseudo Hamiltonian is updated in later epochs. We performed the same experiment on the MoS2, Bi2Te3, HfO2, and InGaAs datasets; the average MSE is shown in table 4.
Figure 3. (a) Accuracy of the pseudo Hamiltonian calculated every 20 epochs during the training process. We utilized 10 labeled data and 100 unlabeled data as our training dataset. We calculated the difference between the actual DFT Hamiltonian and the predicted Hamiltonian by SemiH (MSE). (b) MSE of the pseudo Hamiltonian is displayed. From left to right, it represents the error between the DFT Hamiltonian H and the updated pseudo Hamiltonian H' obtained as learning progresses for unlabeled data of Bi2Te3. The first, second, and third instances correspond to 60 epochs, 180 epochs, and 480 epochs, respectively.
Table 4. The MSE of the pseudo Hamiltonian calculated as training progresses. The experiments are performed on the four datasets MoS2, Bi2Te3, HfO2, and InGaAs. This illustrates the error in the early (epoch 60), middle (epoch 180), and later (epoch 480) epochs, respectively (please refer to figure 3). The error represents the MSE between the pseudo Hamiltonians and the actual Hamiltonians from DFT simulations.
| Hamiltonian error (MSE) | MoS2 | Bi2Te3 | HfO2 | InGaAs |
|---|---|---|---|---|
| epoch 60 | 33.10 | 209.0 | 9.29 | 10.80 |
| epoch 180 | 2.09 | 2.43 | 4.28 | 1.99 |
| epoch 480 | 0.40 | 0.36 | 0.45 | 0.94 |
4.3.2. Extension to more challenging dataset
As DFT simulations assume a periodic unit cell, predicting the Hamiltonian for a larger cell with more, randomly arranged atoms becomes a more challenging task when using a neural network trained on a smaller unit cell. In figure 4, we trained a neural network on a (2 × 2 × 2) supercell and predicted the Hamiltonian for a (3 × 3 × 3) supercell, calculating the error against the actual DFT Hamiltonian. Consequently, SemiH exhibits superior generalization performance by reducing errors across the Hamiltonian elements even as the randomness of atomic structures increases (refer to table A2). This observation aligns with the fundamental goal of DFT simulations to implement larger systems using information learned from smaller systems. It underscores SemiH's capability to fulfill the intrinsic purpose of DFT simulations in extending from small to larger systems, even as atomic structures become more random (refer to figure A1).
Figure 4. The average MSE for 20 data is measured between the Hamiltonian predicted by a neural network and the DFT Hamiltonian. The network, initially trained on a (2 × 2 × 2) HfO2 supercell, predicts the Hamiltonian for a larger (3 × 3 × 3) supercell. Results from DeepH-E3 are shown on the left, and from SemiH on the right.
Additionally, the difficulty of Hamiltonian prediction may increase with changes in MD simulation conditions. In our initial experiments, we trained the neural network using atomic structures obtained by raising the temperature up to 1500 K and then lowering it to 300 K. To further investigate, we conducted additional experiments by raising the temperature up to 2000 K and lowering it to 300 K, using a newly extracted testing dataset. When altering MD simulation conditions, materials exposed to higher temperatures tend to adopt more amorphous atomic structures. Consequently, the neural network needs to predict atomic structures with increased randomness compared to the initially trained input data, posing a more challenging task. As depicted in figure 5, SemiH not only exhibits an overall reduction in errors, surpassing the baseline, but also demonstrates significantly decreased errors for each Hamiltonian element (refer to table A3). This validates its ability to mitigate the risk associated with critical error elements by learning atomic structures more generally without the need for extra training, even when exposed to materials with higher temperatures and increased structural randomness (refer to figure A2).
Figure 5. The average MSE between the DFT Hamiltonian and the predicted Hamiltonian. The neural network is trained using training data extracted from MD simulations conducted under conditions up to 1500 K. Testing data is obtained under conditions up to 2000 K. Results from DeepH-E3 and SemiH are on the left and right, respectively.
4.3.3. Training data reduction ratio
To assess the efficacy of SemiH in reducing training data, we conducted a comparative analysis with DeepH across a spectrum of labeled-data budgets, ranging from 50 down to 10 labeled data (refer to figure 6). The experiments encompassed the MoS2 and Bi2Te3 materials. DeepH was able to stay below the error threshold crucial for preserving band structure integrity for these materials only when trained on 50 labeled data. Conversely, SemiH achieved a comparable error even with a reduction to 30 labeled data. This represents a data reduction of approximately 40% relative to DeepH's requirement of 50 labeled data.
Figure 6. Hamiltonian error (MSE) of DeepH-E3 and SemiH according to the number of labeled data used. This experiment was performed on MoS2 and Bi2Te3, respectively, and in the case of SemiH, 6 to 10 times the number of unlabeled data as labeled data was used for training. The dashed line is the error of DeepH when using 50 labeled data.
4.3.4. Ablation study
We conducted ablation studies, varying the hyperparameters used for neural network training. For the ablation studies, 10 labeled data and 30 unlabeled data are utilized. Figure 7(a) shows that the Hamiltonian error decreases as more unlabeled data are used. This supports our argument that unlabeled data should be used to generate more accurate Hamiltonians. Figure 7(b) illustrates that lowering the epoch I at which the pseudo Hamiltonian is first generated reduces errors, because the influence of unlabeled data is incorporated into the learning process earlier. Additionally, we investigated the optimal frequency of updating the pseudo Hamiltonian, as depicted in figure 7(c). The results indicate that more frequent updates of the pseudo Hamiltonian lead to a reduction in errors across all four datasets tested. This observation aligns with the previous experiments emphasizing the necessity of achieving high accuracy in the pseudo Hamiltonian within our framework. Therefore, considering the trade-off between training time and accuracy, we opted to update the pseudo Hamiltonian every 20 epochs. Considering that an excessive influence of unlabeled data from the early epochs may potentially disrupt the learning of labeled data, we investigated the impact of the training weight λ in figure 7(d). The findings suggest that in the early epochs, when the neural network is not sufficiently trained, there is a higher likelihood of generating a less accurate pseudo Hamiltonian. Therefore, it is advisable to commence with a small λ and gradually increase it. We verified that a gradual increment of λ results in improved performance compared to a constant λ across various datasets.
Figure 7. Hamiltonian error (MSE) measured while varying (a) the number of unlabeled data used in neural network training, (b) the initial pseudo Hamiltonian generation step I, (c) the re-generation step size, and (d) various scenarios of the training weight λ, updated every 100 epochs.
4.3.5. Effects of the parameter size
To verify the effect of the parameter size on model performance, we measure the training loss as learning progresses. Figure 8 shows the experimental results for the four materials. DeepH-E3 (95%) and DeepH-E3 (85%) denote models in which the number of parameters is reduced to about 95% and 85% of the original DeepH-E3, respectively. The performance of DeepH-E3 decreases as the number of parameters decreases. These results indicate that the problems arising from insufficient data cannot be solved simply by reducing the parameter size, whereas SemiH overcomes this problem, as shown by its lower training loss.
Figure 8. Training loss (MSE) as learning progresses. The experiment is performed on four materials: MoS2, Bi2Te3, HfO2, and InGaAs. DeepH-E3 (95%) (green dashed line) and DeepH-E3 (85%) (blue dashed line) denote models whose number of parameters is reduced to about 95% and 85% of the original, respectively, by reducing the number of edge and vertex features. The original DeepH-E3 and SemiH are shown as the black and red solid lines.
4.3.6. Computational cost
Table 5 shows the DFT simulation time required to obtain the DFT Hamiltonian for one data sample. The numbers of atoms used in the experiments above are 75 for MoS2, 90 for Bi2Te3, 96 for HfO2, and 108 for InGaAs, respectively. Accordingly, the actual time required to obtain one data sample by DFT simulation is 2.56, 1.54, 1.02, and 9.33 hours, respectively. As mentioned in section 4.3.3, using only 30 labeled data yields results comparable to using 50 labeled data, so the time saved by omitting 20 labeled data amounts to 51.2, 30.8, 20.4, and 186.6 hours, respectively. As shown in table 5, the time needed to obtain data grows rapidly with the number of atoms. Therefore, the usability of our framework increases in cases where large structures must be analyzed or when only little labeled data is available.
Table 5. Time cost of performing DFT simulations. OpenMX is used as the DFT simulation tool and the simulations are performed on an Intel Xeon CPU E5-2660 v2 @ 2.20 GHz using 4 cores. The number of atoms in the training data used for SemiH learning and the corresponding DFT simulation time are indicated in bold for each material.

| MoS2 | | Bi2Te3 | | HfO2 | | InGaAs | |
|---|---|---|---|---|---|---|---|
| # atoms | Time (h) | # atoms | Time (h) | # atoms | Time (h) | # atoms | Time (h) |
| 48 | 0.75 | **90** | **1.54** | 12 | 0.03 | 32 | 0.34 |
| **75** | **2.56** | 180 | 13.61 | **96** | **1.02** | **108** | **9.33** |
| 192 | 41.75 | 360 | 54.77 | 324 | 15.73 | 256 | 41.78 |
5. Conclusion
Current approaches to training neural networks as replacements for DFT calculations typically assume the need for a substantial number of DFT simulation results for effective learning. However, when confronted with a limited amount of training data, there arises a risk of obtaining distorted results in subsequent physical analyses. To tackle this challenge, we introduce a framework designed to alleviate the limitations stemming from insufficient training data. This is accomplished by integrating SSL techniques into the training process through the utilization of unlabeled data via pseudo Hamiltonian generation. Consequently, our proposed framework, SemiH, not only outperforms existing NNH methods but also adeptly extends its capabilities to more challenging datasets characterized by increased randomness. In this way, our framework offers a novel perspective for exploring diverse applications using DFT Hamiltonians.
6. Broader impact
The exploration of neural networks for predicting physics-based simulations or experimental results remains a subject of ongoing research. However, there are scenarios where access to extensive simulations or conducting experiments can be limited due to various constraints, including resource limitations or high costs. In such situations, it becomes imperative to achieve reliable and meaningful results using a limited amount of training data. We believe our framework offers a versatile solution that can be applied effectively to a wide range of examples in these circumstances.
Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).
Some of the datasets we used were provided by DeepH-E3 [7], and we can provide the HfO2 and InGaAs datasets generated by us upon request to the authors.
Appendix:
In this section, we discuss the experimental environment and additional analyses and further experiments that were not covered in the main manuscript.
A.1. Generalization performance of SemiH
In this section, we discuss the generalization performance of SemiH. Here, generalization performance refers to the ability of SemiH not only to accurately predict specific Hamiltonian elements but also to train in a way that enhances the overall prediction accuracy for each element.
Firstly, table A1 presents the accuracy of pseudo Hamiltonians generated by the SemiH framework for unlabeled data. It is observed that the average error decreases, and, moreover, the overall error decreases with later epochs of the generated pseudo Hamiltonians, accompanied by reduced variance. Consequently, as the training progresses, the generalization performance improves, leading to an overall reduction in error for Hamiltonian elements. This addresses the limitation of having a potentially incorrect physical interpretation for elements with maximum Hamiltonian error, even when the average Hamiltonian error is small. Thus, utilizing the extracted pseudo Hamiltonian as a target demonstrates the strength of SemiH in leveraging unlabeled data.
Table A1. The maximum, minimum, average, median, and variance for the MSE between the pseudo Hamiltonian and the actual DFT Hamiltonian of Bi2Te3 are computed for unlabeled data generated by SemiH. The calculations are performed using the pseudo Hamiltonian generated in the early epoch (60), middle epoch (180), and later epoch (480), respectively. (refer to figure 3(b).)
| Hamiltonian error (MSE) | Max | Min | Average | Median | Standard deviation |
|---|---|---|---|---|---|
| epoch 60 | 24 431.91 | 0.24 | 221.38 | 23.91 | 12.44 |
| epoch 180 | 160.77 | 0.00 | 2.52 | 0.96 | 0.09 |
| epoch 480 | 11.93 | 0.01 | 0.37 | 0.20 | 0.01 |
Table A2. The maximum, minimum, average, median, and standard deviation of the MSE between the predicted Hamiltonians on the HfO2 test dataset from DeepH-E3 and SemiH and the actual DFT Hamiltonians. The network, initially trained on a (2 × 2 × 2) HfO2 supercell (10 labeled data and 100 unlabeled data as the training dataset), predicts the Hamiltonian for a larger (3 × 3 × 3) supercell (refer to figure 4).
| Hamiltonian error (MSE) | Max | Min | Average | Median | Standard deviation |
|---|---|---|---|---|---|
| DeepH-E3 | 16.47 | 0.12 | 1.03 | 0.58 | 1.47 |
| SemiH | 6.26 | 0.06 | 0.52 | 0.26 | 0.79 |
| Δ (DeepH-E3 - SemiH) | 10.21 | 0.06 | 0.51 | 0.32 | 0.68 |
Additionally, we verified the high generalization performance of SemiH compared to DeepH-E3 on more challenging test datasets in tables A2 and A3. In both tables, the maximum error and the standard deviation of the Hamiltonian error decrease substantially (for example, the maximum error drops from 16.47 to 6.26 and from 13.70 to 4.93, respectively). This confirms a higher accuracy for the overall Hamiltonian elements. Consequently, even in slightly modified environmental conditions, SemiH trained in specific conditions demonstrates high accuracy when inferring test data created in those altered conditions. This underscores the excellent scalability of SemiH, even when simulation environments change.
Table A3. The maximum, minimum, average, median, and variance for the MSE between the predicted Hamiltonians on HfO2 test dataset from DeepH-E3 and SemiH, and the actual DFT Hamiltonians are computed. The neural network is trained using training data extracted from MD simulations conducted under conditions up to 1500 K. Testing data is obtained under conditions up to 2000 K. (refer to figure 5)
| Hamiltonian error (MSE) | Max | Min | Average | Median | Standard deviation |
|---|---|---|---|---|---|
| DeepH-E3 | 13.70 | 0.16 | 1.06 | 0.67 | 1.24 |
| SemiH | 4.93 | 0.07 | 0.49 | 0.29 | 0.54 |
| Δ (DeepH-E3 - SemiH) | 8.77 | 0.09 | 0.57 | 0.38 | 0.70 |
A.2. Generalization performance with multiple materials
To investigate the generalization performance of the proposed framework and its learning capabilities for different crystalline materials, we conducted experiments to evaluate the Hamiltonian error when learning different materials simultaneously. Specifically, we evaluated the performance of the model using MoS2 and Bi2Te3 simultaneously as training data, as detailed in table A4. Simultaneous training on MoS2 and Bi2Te3 (denoted DeepH-E3* and SemiH*) results in higher Hamiltonian errors compared to the models trained independently for each material (DeepH-E3 and SemiH). However, SemiH* performs better than DeepH-E3* in terms of Hamiltonian error for both materials. This suggests that the SemiH framework achieves enhanced generalization performance for simultaneous learning compared to the DeepH-E3 framework by leveraging unlabeled data.
Table A4. The MSE of the Hamiltonian when DeepH-E3 and SemiH are trained simultaneously on MoS2 and Bi2Te3. DeepH-E3* and SemiH* denote the models trained concurrently on MoS2 and Bi2Te3. For both materials, the training data comprises 10 labeled and 30 unlabeled samples. The validation datasets for MoS2 and Bi2Te3 consist of 50 and 40 samples, respectively.
| Hamiltonian error (MSE) | DeepH-E3 | DeepH-E3* | SemiH | SemiH* |
|---|---|---|---|---|
| MoS2 | 1.42 | 1.81 | 0.80 | 0.82 |
| Bi2Te3 | 0.94 | 2.18 | 0.52 | 0.99 |
This observation aligns with the main focus of SemiH, demonstrating that our model can exhibit superior generalization performance. This indicates the potential for training robust integrated models on diverse datasets.
A.3. Radial distribution function (RDF)
The RDF is a crucial physical quantity employed in MD and solid-state materials to describe the distribution of particles. This function provides a probability distribution of particle positions, offering insights into particle interactions and organization. The RDF illustrates how frequently other particles appear when a specific particle is located at the center. It is primarily utilized in the analysis of distance distributions between particles in dense substances such as liquids or solids. The function is extensively applied across various fields, including chemistry, physics, and biology, to comprehend particle interactions and elucidate material properties. The RDF is defined as follows: The particle distance function, g(r), in the system is expressed as:

$$g(r) = \frac{1}{N}\sum_{i=1}^{N}\frac{N_{i}(r)}{4\pi r^{2}\rho\,\mathrm{d}r},$$

where r, ρ, N, and $N_{i}(r)$ represent the distance from the central particle, the particle density, the total number of particles in the system, and the number of particles within a shell of thickness dr at distance r from central particle i, respectively. The RDF characterizes the probability density of particle positions at specific distances, offering crucial information about the system's structure and interactions. This function is instrumental in studying changes in material states, structural transformations, and the nature of interactions within a substance.
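An illustrative histogram-based estimate of g(r) for a cubic periodic box is sketched below; it is not the analysis code used in the paper, and the bin width and system size are arbitrary.

```python
import numpy as np

def radial_distribution(positions, box_length, dr=0.1, r_max=None):
    """Histogram estimate of g(r) for a cubic periodic box (minimum-image convention),
    following the definition above."""
    n = len(positions)
    rho = n / box_length ** 3
    r_max = r_max if r_max is not None else box_length / 2
    bins = np.arange(0.0, r_max + dr, dr)
    counts = np.zeros(len(bins) - 1)
    for i in range(n):
        d = positions - positions[i]
        d -= box_length * np.round(d / box_length)        # minimum-image displacement
        r = np.linalg.norm(d, axis=1)
        r = r[(r > 1e-9) & (r < r_max)]                   # exclude the self-distance
        counts += np.histogram(r, bins=bins)[0]
    shell_vol = 4.0 * np.pi * bins[:-1] ** 2 * dr         # 4*pi*r^2*dr per shell
    shell_vol[0] = 4.0 / 3.0 * np.pi * dr ** 3            # avoid division by zero at r = 0
    return bins[:-1], counts / (n * rho * shell_vol)

# Usage: g(r) of an ideal gas of random points should fluctuate around 1.
rng = np.random.default_rng(0)
r_axis, g = radial_distribution(rng.uniform(0, 20.0, size=(500, 3)), box_length=20.0)
```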
Figure A1 presents RDFs categorized by types of atomic interactions in the context of figure 4. In larger systems, the artificially induced correlations due to periodic boundaries diminish compared to smaller systems. As a result, particles interact in a more diverse range of environments, leading to a more random behavior, as depicted in figure A1. Consequently, in larger systems, RDF peaks tend to become broader and lower.
Figure A1. The radial distribution function is analyzed for a (2 × 2 × 2) supercell and a (3 × 3 × 3) supercell in MD simulations of HfO2 at a temperature of 1000 K. This analysis focuses on three types of atomic interactions: (a) Hf–Hf, (b) Hf–O, and (c) O–O interactions.
As the temperature increases, the thermal motion of atoms intensifies, leading to a greater separation between particles and a partial relaxation of their alignment. Consequently, as the temperature rises, one can observe lower and broader peaks in the RDF. In other words, as illustrated in figures A2(a), (b), and (c), corresponding to the situation in figure 5, at 2000 K the structure becomes more amorphous, with lower and more gradual peaks.
Figure A2. The radial distribution function is analyzed at temperatures of 300 K, 1500 K, and 2000 K in MD simulations of HfO2, focusing on (a) Hf–Hf, (b) Hf–O, and (c) O–O interactions.
From the perspective of neural network predictions, the very limited training dataset assumed in this paper may lead to unreliable Hamiltonian predictions for changed atomic structures if the model overfits to the training data. Therefore, the extension of our testing to a more challenging dataset demonstrates that SemiH achieves better generalization performance.