Abstract
Hardware testing has always been at the core of hardware development, and improving its performance and efficiency is very important for hardware development. Because hardware quality management was insufficient, many large hardware tools were developed in the past with workshop-style manual techniques and could hardly be maintained; this can lead to the cancellation of projects and major personnel and property losses. Improving hardware quality and ensuring security are very complex problems. Hardware testing is usually conducted through manual and automatic testing, and the limitations of manual testing have become increasingly obvious, so hardware automatic testing technology has attracted attention in recent years. It has become an important research direction in the field of hardware testing and can overcome many problems of traditional testing methods: strict test rules, based on standards and scores, provide a fully automated test process. With the continuous improvement of network technology, the functions and scope of hardware are constantly enriched and expanded, and the acceleration of hardware updates and development has placed a heavy burden on previous hardware testing work. The purpose of this article was to study the application of machine learning technology in the field of hardware automatic testing and provide an appropriate theoretical basis for optimizing testing methods. This article introduced the research directions of hardware automatic testing technology, presented three automatic testing framework models, and summarized the application of machine learning in hardware testing, including hardware security and reliability analysis, hardware defect prediction, and source-code-based research. It then studied defect prediction models and machine learning algorithms and, on this theoretical basis, constructed a hardware defect prediction model based on machine learning. First, the data were preprocessed; then, the Stacking method was used to build a comprehensive prediction model, and four evaluation indicators for the prediction results were established. In the experimental part, the defect prediction results of the hardware automatic test model were studied. The results showed that the hardware defect prediction model based on machine learning had higher accuracy, recall rate, F_measure, and area under curve (AUC). Compared with other models, the average accuracy of the hardware defect prediction model in this article was 0.092 higher, making it more suitable for automatic hardware testing and analysis.
1 Introduction
1.1 Background
Early research on automated testing technology mainly focused on using automated methods to perform and replace some of the tedious, repetitive mechanical work in manual testing. As automated test methods and technologies were applied to various test activities, people began to pay attention to the effectiveness of automated test implementation. Automated testing not only saves human and material resources; whether it can reach the level of manual testing in terms of test efficiency also determines its value. Automatic test research introduces test standards and strategies to evaluate test results, and these standards and strategies are meaningful only when the reliability of automatic test results is guaranteed.
With the development of automatic testing technology, the original statistical methods of measurement have become limited [1]. Hardware testing is not mechanical random error detection, but a process and behavior with strong pertinence and relevance. Therefore, in the field of automatic testing, the focus of research has shifted to evolutionary computing and artificial intelligence [2]. Introducing various high-performance algorithms into automatic test methods further reduces the dependence on human intervention and improves the analysis and processing of test data.
1.2 Status
At present, there are many research studies on hardware testing. Motahhir developed an open hardware test platform for a sun tracker; the platform also provided a virtual instrument based on Excel that can record and display the data of the sun tracker [3]. Balera and de Santiago Júnior investigated hardware testing using hyper-heuristic methods [4]. Alghamdi and Fathy surveyed existing testing tools for hardware testing of parallel systems to detect runtime errors [5]. Marculescu discussed the development and deployment of search-based hardware testing tools for industry and considered the transfer of these technologies to industry [6]. Melo collected and synthesized empirical research on concurrent hardware testing to describe the characteristics of this field and the types of empirical studies conducted [7]. Gómez et al. pointed out that a DSL is a programming language designed to express solutions to problems in specific fields, which helps improve productivity and quality. They also stated that SpringBoot, an open-source Java-based framework used to implement the REST architectural style, is a powerful tool for developers and can significantly reduce the cost of building stand-alone hardware applications and providing ready-made products for them [8]. Quirós et al. introduced dispersed automation, a new method of reprogramming ubiquitous industrial IoT devices in critical infrastructure to jointly execute common computing workloads in the best and most reliable way. They believe that dispersed automation brings new life to industrial IoT devices, some of which are as powerful as general-purpose computers, and enables them to be used for the collaborative execution of various computing workloads while dynamically adapting to different applications and operating environments; two key contributions are the use of domain-specific languages and in-transit computation [9]. Although there were many research studies on hardware testing, hardware automatic testing schemes and intelligent analysis needed to be studied further. These scholars have to some extent enriched the theoretical content of hardware testing and provided more ideas for hardware automatic testing, but there are also shortcomings. Because scholars focused on hardware testing results and methods, there is little research on hardware automatic testing solutions, which leads to poor hardware testing efficiency and a low automation level, and hinders the overall improvement of hardware quality.
Machine learning has also been used in hardware research. Tran compared machine learning techniques for evaluating the severity and priority of hardware defect reports and then selected the better of decision trees and random forests for further research [10]. Esteves used machine learning methods to predict hardware defects [11]. Tucker explored, for machine learning healthcare hardware, the integration of resampling, probabilistic graphical modeling, latent variable identification, and outlier analysis to generate realistic synthetic data based on primary care patient data [12]. Gerke studied the regulation of machine learning-based hardware as a medical device [13]. Goyal and Bhatia compared 30 hardware quality prediction models based on five machine learning techniques [14]. Jaiswal and Malhotra proposed using maximum likelihood techniques from machine learning to predict hardware reliability and evaluated them according to selected performance criteria [15]. Sandhu and Batth introduced an integrated random forest and gradient boosting machine learning algorithm to test the reusability of a given hardware code [16]. Although there were many studies on machine learning in hardware, the application of machine learning in hardware automatic test schemes and intelligent analysis needed further study. Applying machine learning to hardware research is beneficial for improving hardware reliability, but because these scholars placed little emphasis on hardware automatic testing under machine learning and on analyzing hardware defect prediction, hardware security still needs improvement. There are still significant issues with defect detection in hardware, which hinders the efficiency of hardware automatic testing and does little to improve the performance of hardware automatic testing and analysis. From this, it can be seen that previous research on machine learning and hardware is still at an initial stage, hardware automatic testing and intelligent analysis remain largely unexplored, and there is still room for improvement in the future.
1.3 Significance and content
In order to improve the accuracy of hardware automatic testing and to improve the capability and efficiency of hardware automatic testing and analysis, this article used machine learning technology to study hardware automatic testing and intelligent analysis. This article first introduced some theoretical background, including the research directions of hardware automatic testing technology, automatic testing framework models, and the application of machine learning in the field of hardware testing. It then took hardware defect prediction as an example, built a hardware automatic testing and intelligent analysis model based on machine learning theory, and studied the defect prediction performance of the model in the experimental part.
2 Development of hardware automatic testing and intelligent analysis
2.1 Direction of hardware automatic testing technology
2.1.1 Test automation framework
In today’s hardware development projects, it is always difficult to fully coordinate automated test methods with the entire project. Especially for projects involving automated test activities or tests that are difficult to fully automate, more flexible automated test models need to be developed. Such a model can be described in detail in terms of the test life cycle and the hardware development life cycle, and adapts to automated testing with minimal changes and configuration. Some automated test models have also been developed from different perspectives [17].
2.1.2 Test automation script technology
Many test processes can be automated, and testers want simple scripts to manage test automation. For example, in network application testing, scripts can automatically simulate manual testing, saving a lot of human resources. However, many auxiliary automated testing tools contain developer-specific scripting languages [18]. If testers want to use such automated testing tools, they must learn a different scripting language for each tool. If a standardized and easy-to-use test script language could be developed, automated testing would be simplified and the workload of testers reduced. Therefore, test script technology is also a core component of automatic testing technology [19].
2.1.3 Automatic test case generation
Automatic generation of test cases is a typical data-driven testing method, which exposes errors in the program under test through automatically generated test data. Early automatic test case generation focused on generating randomized test cases [20]. Although this method is simple and easy to implement, it is inefficient, produces unevenly distributed data, and generates a large amount of repeated test data in the test space. Automatic test case generation based on heuristic learning can significantly improve test efficiency, for example, by using scalable computing and intelligent computing technology to control the automatic generation of test cases. Heuristic learning can independently learn how to create and screen test scripts, accelerating the generation of effective test scripts and eliminating or reducing unnecessary data generation [21].
2.2 Automatic test framework model
In the test framework model, testing is mainly based on the rules set during program development. Test data are generated in the program's input variable domain to drive the execution of the program under test, and the actual output is produced during program operation. The actual output is then compared with the expected output defined during test development to form the final test result [20].
2.2.1 Program structure automatic test framework model
The automatic test framework model based on hardware structure receives more attention than the function-based automatic test model and has more implementation systems. Usually, compilers are used to analyze the structure of programs and integrate test information into that structure, realizing test automation by running multiple dynamic programs. Obviously, structural analysis of the program can obtain more test information than a functional description of the program, at a higher testing cost [22]. The generality of this implementation process is limited by the implementation characteristics of the program.
2.2.2 Function automatic test framework model
Although functional testing is mainly based on the relationship between hardware input/output representations, research focuses on automatic testing frameworks based on rule descriptions, which describe the framework model in detail with relatively simple test objectives. The test objectives realized in this automated test framework model are relatively simple, mainly dealing with the relationship between test coverage and test repeatability. Essentially, test coverage here means that the generator compiles and receives the control flow information of the program and integrates this information, as test information, into code that the executor can process. The disadvantage of this model is the lack of continuity and repeatability.
2.2.3 Central air traffic flow management (C-ATFM) model
The C-ATFM model is implemented as an integrated, self-organized environment that supports automated testing. It includes unit testing, automatic test case creation, functional testing, and continuous repeated testing of embedded test case source code. The model is mainly composed of five modules: a syntax analyzer, a policy generator, a command generator, a test script generator, and a command execution simulator.
2.3 Application of machine learning in hardware testing
Machine learning is widely used in hardware, and this article introduces several program representations suitable for machine learning technology: symbolic representation, graphical representation, algorithm representation, tensor representation, and code representation. Symbolic representation expresses a model with mathematical symbols and equations, such as y = mx + b in a linear regression model, where m and b are the parameters of the model. Graphical representation uses graphical models to represent programs; common graphical models include Bayesian networks, decision trees, neural networks, etc. Algorithm representation uses algorithm descriptions to represent programs, such as logistic regression, support vector machines, random forests, etc. Tensor representation uses tensors (multidimensional arrays) to represent data and models and learns model parameters through operations and optimization; common tensor libraries include TensorFlow and PyTorch. Code representation uses a programming language, such as Python or C++, to represent machine learning models and algorithms. These representation methods have applications in different machine learning tasks and fields, and choosing the appropriate representation depends on the specific problem and requirements. The application of machine learning in hardware testing is shown in Figure 1.
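As a minimal illustrative sketch (all sample data invented for this example), the following Python fragment contrasts the symbolic representation y = mx + b with an equivalent tensor representation built from NumPy arrays:

```python
import numpy as np

# Symbolic representation: a linear model y = m*x + b whose parameters
# m and b are estimated by ordinary least squares.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
m, b = np.polyfit(x, y, deg=1)
print(f"symbolic model: y = {m:.2f}x + {b:.2f}")

# Tensor representation: the same data held as multidimensional arrays,
# with the slope learned through array operations and a pseudo-inverse.
X = x.reshape(-1, 1)                 # rank-2 tensor (design matrix)
theta = np.linalg.pinv(X) @ (y - b)  # least-squares weight
print("tensor-estimated slope:", float(theta[0]))
```

Both fragments encode the same model; the difference lies only in whether the program manipulates an explicit equation or arrays of numbers.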
2.3.1 Hardware safety and reliability analysis
Although much research has been done on hardware security, the focus has been on the security and reliability of mobile applications, and little attention has been paid to the analysis of Windows and Linux applications [23]. Verifying the security and reliability of Windows and Linux hardware usually involves analyzing the application program interface classification and call procedures. The information resources contained in the source code can be extracted, the behavior of the hardware can be tracked and recorded with appropriate tools, and the obtained information can be fed into the representation model as vectors. This supports random forest and other machine learning models for experiments and research on virtual devices.
2.3.2 Hardware defects
Hardware defects mainly refer to problems or functional errors that may hinder the work of hardware and programs.
2.3.2.1 Hardware defect prediction
Hardware defect prediction mainly means predicting defects in hardware from existing data.
The application of machine learning in hardware testing is mainly analyzed by combining dynamic and static methods with machine learning computing methods to build an assessment model [24]. The main features of static analysis include the target principle, inheritance principle, code principle, etc. Static analysis can obtain many features, and different features matter differently for error prediction. Their classification is not uniform, so when applying these features to machine learning, the collected data must be adapted to learning to avoid large deviations. As hardware error testing matures and data accumulate, the accumulated error data can effectively reduce the work intensity of hardware developers and testers, thus improving the efficiency of hardware testing [25].
2.3.2.2 Locating hardware defects
In the field of hardware testing, locating errors is a relatively complex process, and at present, most errors require manual detection. Code review has always been one of the most important error detection methods, but it requires substantial human and material resources. When applying machine learning technology, there are two main methods to detect hardware errors. The first is based on fuzzy localization theory, which summarizes hardware errors from historical data. The second, similar to hardware defect prediction and related techniques, mainly builds and evaluates a model through static analysis.
2.3.2.3 Hardware defect classification
Generally, when classifying hardware errors, it must be determined whether the defects found are genuine. The characteristics of hardware errors differ. For example, researchers involved in crowdsourced hardware validation regard historical data from multiple fields as defects, while violations of static analysis principles are considered signs of defects in hardware development. Accurate error prediction can effectively reduce the work intensity of the personnel managing hardware errors. Given the increasing pressure on open-source tools for hardware crowdsourced testing and in-depth analysis, fruitful research is needed on finding errors.
2.3.3 Source code
Source code research mainly involves identifying defects in source code through static analysis [26]. Application research in this field mainly includes statically analyzing source code in various ways to obtain appropriate and effective features, then modeling and evaluating. Analysis based on source code also includes code reuse, which can be carried out through code reuse and similarity inspection: searching for similar source code in source code and source code packages reduces the labor intensity of research and development personnel and the investment in maintenance.
3 Hardware automatic testing and intelligent analysis model based on machine learning
This article takes hardware defect prediction as the research object to explore the application of machine learning in hardware automatic testing.
3.1 Model related concepts
Defect prediction needs to extract relevant measurement metadata from hardware modules. It identifies the relationship between errors and measurement elements through classification, regression, clustering, and other machine learning methods, and creates an appropriate predictive model [27]. It then predicts the existence and distribution of errors in a new hardware module. The hardware defect prediction model is shown in Figure 2.
According to the purpose of prediction, error prediction models can be divided into two types. The first category is hardware error prediction for problem classification, that is, predicting whether there are errors in a hardware module. The second category is hardware error prediction for problem ranking.
3.1.1 Defect prediction model
The purpose of hardware defect prediction based on problem classification is to predict whether a hardware module contains defects, so that hardware developers can focus their testing resources on the modules predicted to be defective. Such error prediction models have been extensively researched [28]. Although their prediction effect is generally good, they are not accurate enough at the module level. If a module without errors is wrongly regarded as defective, test resources are wasted; if a module with errors is predicted to have none, the errors may not be detected and corrected immediately. The costs of these two types of misclassification differ, and the purpose of the study is to find a more suitable hardware error prediction model by balancing the two.
A ranking-based error prediction model is applied when the available test resources are uncertain. Hardware modules are ranked according to the predicted number of errors, and priority is given to the modules with the most errors; if test resources remain, modules with fewer errors can also be tested. With this error prediction model, it is difficult to accurately predict the number of errors in hardware modules, so it is important to maintain high-quality ranking results [29].
3.1.2 Machine learning algorithm
The machine learning algorithms are shown in Figure 3.
The decision tree algorithm is a common classification modeling technique. It searches for classification rules in data by creating a decision tree. When a decision tree is used to create an error prediction model, the values of the prediction variables are processed and truncated. To improve the performance of the prediction model, it is also important to avoid duplication and minimize the path length.
Linear regression finds the best combination of several independent variables to predict or estimate the dependent variable, which is more effective and practical than estimating from a single independent variable because it can capture more complex relationships, improve prediction accuracy, and remain easy to interpret. Linear regression is a basic regression algorithm: it can estimate the number of possible errors in unknown units by establishing a linear equation between hardware metrics and errors. Logistic regression is an improved version of linear regression: the result of the linear regression is mapped into the interval between 0 and 1 through a logistic (sigmoid) transformation, so a binary classification problem, such as whether the predicted hardware is faulty or not, can easily be modeled.
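As a minimal sketch of this logistic mapping, with illustrative (invented) metric values and coefficients:

```python
import numpy as np

def sigmoid(z):
    """Map the output of a linear model into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical hardware metrics (e.g., complexity, size) and learned weights.
metrics = np.array([12.0, 340.0])    # feature vector of one module
weights = np.array([0.08, 0.002])    # illustrative coefficients
bias = -2.0

linear_score = weights @ metrics + bias        # plain linear-regression output
defect_probability = sigmoid(linear_score)     # logistic-regression mapping
print(f"predicted defect probability: {defect_probability:.3f}")
```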
The Bayesian classifier is a simple statistical learning classifier based on Bayes' theorem that assumes attribute independence. It considers each feature of a variable to be independent of the others, and it hardly needs model training to estimate the parameters of the prediction model. It works well when the attributes are completely independent or when the correlation between attributes is unclear. Bayesian networks effectively express and aggregate information and reduce misjudgments; they are uncertain causal models with a strong ability to handle uncertainty and are often used for category prediction.
An artificial neural network is a computational model that simulates the behavioral characteristics of animal neural networks and processes information in a distributed, parallel manner. In most cases it is an adaptive system that can change its internal structure according to external information. This is a relatively simple and effective way to deal with very complex problems.
There are also support vector machines, clustering analysis, and other hardware defect prediction methods. All of these require historical data to create and validate the prediction model. The advantages and disadvantages of each method depend on comparison and analysis on the data itself, including accuracy, reliability, and generality.
Currently, these machine learning algorithms have been put to good use in hardware testing. In hardware testing, decision trees can be used to build test cases: for a given function, a decision tree can be constructed over the different inputs and conditions to generate the corresponding test cases, so that each possible input and condition is tested along a different branch path to ensure the correctness of the hardware. Linear regression can be used to predict hardware performance metrics: for example, by collecting data on the response time of hardware under different loads, a linear regression model can predict the response time under a new load, allowing the performance of the hardware to be evaluated against requirements and possible performance problems to be identified. Bayesian classifiers can be used to classify and identify defects in hardware: by learning from historical defect data, a Bayesian classifier can classify and identify new defects so that they can be quickly located and repaired, improving the quality and reliability of the hardware. Artificial neural networks can be used to identify and predict abnormal behavior in hardware: by analyzing system call sequence data collected at runtime, a neural network model can identify and predict abnormal behavior, so potential security vulnerabilities and errors can be found and fixed in time. In conclusion, algorithms such as decision trees, linear regression, Bayesian classifiers, and artificial neural networks are widely used in hardware testing and can help design and evaluate test cases, predict hardware performance, classify and identify defects, and detect abnormal behavior.
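As one hedged example of the performance-prediction use just described, the following sketch fits a scikit-learn linear regression on fabricated load/response-time pairs; the data and query point are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical measurements: load level (requests/s) vs. response time (ms).
load = np.array([[10], [50], [100], [200], [400]])
response_ms = np.array([12.0, 15.5, 21.0, 33.0, 58.0])

# Fit response time as a linear function of load, then query a new load.
model = LinearRegression().fit(load, response_ms)
predicted = model.predict([[300]])
print(f"predicted response time at load 300: {predicted[0]:.1f} ms")
```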
3.2 Hardware defect prediction model based on machine learning
3.2.1 Data preprocessing
When data collected from multiple sources are consolidated into one record, duplicate data, missing data, and possible data errors occur. Such flawed data may lead to an inaccurate error prediction model, so the data set must be cleaned. In addition, for the classification problem, the number of errors must be converted into a binary with/without-errors label: a module whose number of errors is greater than 0 belongs to the error class and is set to 1, and a module with 0 defects is considered error-free and set to 0. The cleaned data set can be used to train the model and can be divided into a training set and a test set. Hardware errors follow the “2–8” principle: 80% of hardware errors are concentrated in 20% of hardware modules. The number of modules containing hardware errors in the collected data is therefore far smaller than the number of modules without errors, making the data set imbalanced. Consequently, a strategy of synthesizing new samples and re-sampling is used to optimize the imbalanced data set so that the defective and defect-free data reach the same order of magnitude, and the evenly distributed sample set is then studied.
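A minimal sketch of this preprocessing step on invented module records follows; SMOTE from the imbalanced-learn package is used here as one concrete realization of the "synthesize new samples and re-sample" strategy, although the article does not prescribe a specific technique:

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

# Hypothetical raw records: module metrics plus a defect count per module.
df = pd.DataFrame({
    "loc":     [120, 450, 80, 300, 95, 610, 70, 210],
    "cyclo":   [4, 18, 3, 11, 5, 25, 2, 9],
    "defects": [0, 3, 0, 1, 0, 5, 0, 0],
}).drop_duplicates().dropna()            # basic cleaning

# Convert the defect count into a binary label: >0 -> 1, otherwise 0.
df["label"] = (df["defects"] > 0).astype(int)
X, y = df[["loc", "cyclo"]], df["label"]

# Synthesize new minority-class samples so both classes are balanced.
X_bal, y_bal = SMOTE(k_neighbors=2, random_state=0).fit_resample(X, y)
print(pd.Series(y_bal).value_counts())
```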
3.2.2 Comprehensive prediction model
In machine learning, an effective technique for improving classifier performance is ensemble learning, which combines multiple identical or different algorithms. Hardware defect models created with ensemble techniques are still at an early stage. Ensemble methods such as Bagging and Boosting can improve prediction accuracy and model capability in several ways [30,31] and are widely used to combine weak predictors into new, powerful predictors.
This article uses another well-known ensemble learning technique, Stacking, to build the model. The Stacking method mainly combines a higher-level learner with base learners to improve prediction accuracy.
The main difference from the Bagging and Boosting methods is that the Stacking method combines different machine learning algorithms, whereas Bagging and Boosting use the same type of base learner [32,33]. The Stacking system contains two levels of classifiers: level 0 holds the base classifiers, and level 1 holds the meta classifier. The base classifiers are created from samples automatically selected from the training set, and their outputs are passed to the meta classifier. The task of the meta classifier is to intelligently combine the output set, correct the classification errors of the base classifiers, and accurately classify the target. Therefore, the first step of combined prediction is to use the base classifiers to predict the data set and use their output as the input of the meta classifier. The predictions are combined with the classification results of the actual training data into a new data set, which is used as training data for the subsequent learning algorithm.
This article applies the Stacking concept to hardware error prediction. Simple machine learning algorithms are used to quickly classify and rank the data in the training set: the classification problem can be solved with logistic regression and decision tree algorithms, and the ranking problem with linear regression and evolutionary algorithms. The prediction results are included in the original records as new measures. Because this first-level prediction usually works well, its result is closely related to the actual number of errors in the hardware module.
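The two-level structure described above can be sketched with scikit-learn's StackingClassifier; the generated data, the choice of a naive Bayes meta classifier, and all hyperparameters are illustrative assumptions rather than the article's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in defect data; in the article this would be the preprocessed metrics.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 0: the simple base classifiers named in the text.
base = [("logreg", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5))]

# Level 1: a more complex meta classifier combines the base outputs.
stack = StackingClassifier(estimators=base, final_estimator=GaussianNB(), cv=5)
stack.fit(X_tr, y_tr)
print(f"stacked accuracy: {stack.score(X_te, y_te):.3f}")
```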
Algorithms with more complex structures (such as Bayesian networks or neural networks) can then be used on the new second-level data set. Because of the complexity and time of this modeling process, the amount of usable data is reduced, so the best attribute subset should be selected for training to improve the performance of the model while minimizing information loss. It is suggested that an attribute selection method be used to reduce the cost. Feature selection can also address the problems that the measurement data contain errors and that some metrics are unrelated to the error prediction model. Information gain can be used to select attributes: the model selects metrics by calculating their gain.
The information gain rate of attribute $A$ is defined as follows:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{I(A)}.$$

Here, $\mathrm{Gain}(A)$ is the information gain of $A$ and $I(A)$ is the internal (split) information of $A$.

Suppose the set $S$ contains $s$ samples belonging to $m$ classes, with $s_i$ samples in class $C_i$. The expected information of $S$ is

$$\mathrm{Info}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i, \quad p_i = \frac{s_i}{s}.$$

Each attribute can have multiple values. Suppose attribute $A$ has $v$ distinct values, dividing $S$ into subsets $S_1, \ldots, S_v$; the expected information after splitting on $A$ is

$$\mathrm{Info}_A(S) = \sum_{j=1}^{v} \frac{|S_j|}{|S|}\,\mathrm{Info}(S_j),$$

where $|S_j|$ is the number of samples in subset $S_j$, and the information gain is $\mathrm{Gain}(A) = \mathrm{Info}(S) - \mathrm{Info}_A(S)$.

It is necessary to consider that the information gain used for feature selection may be over-adjusted toward attributes with many values. Therefore, the internal information $I$ is introduced. Dividing the training set $S$ by $A$ into the subsets $S_1, \ldots, S_v$ gives

$$I(A) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}.$$
The information gain rate thus uses the internal information as a compensation measure, solving the problem that plain information gain favors many-valued attributes. The size of the original record decreases as the best attribute subset is selected. If the attribute produced by the first-level prediction is not selected into the subset, it is added manually and the model is reapplied. Compared with simple algorithms, the complexity of the second-level learning algorithm is higher, but it has a strong macroscopic view and excellent global optimization ability: it considers the various factors of the problem comprehensively and adopts appropriate optimization strategies to find the global optimum, which to a certain extent eliminates the overfitting of the simple algorithms on some data sets and improves the prediction and generalization ability of the predictive model.
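A minimal sketch of the gain-ratio computation defined above, written as a plain Python function over a discrete attribute and binary defect labels (the toy data are invented for illustration):

```python
import numpy as np

def entropy(labels):
    """Expected information Info(S) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain rate of a discrete attribute, as defined above."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Info_A(S): weighted entropy after splitting on the attribute.
    cond = sum(w * entropy(labels[feature == v])
               for v, w in zip(values, weights))
    gain = entropy(labels) - cond
    split_info = -np.sum(weights * np.log2(weights))  # internal information I
    return gain / split_info if split_info > 0 else 0.0

# Toy example: a binary attribute against defect labels.
attr = np.array(["high", "high", "low", "low", "low", "high"])
defect = np.array([1, 1, 0, 0, 1, 0])
print(f"gain ratio: {gain_ratio(attr, defect):.3f}")
```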
The trained model takes the test set as input to obtain predicted values, and the model is verified by comparing the predicted values with the actual errors. If the training and test data come from the same data set, 10-fold cross-validation is used: the data set is divided into 10 parts, 9 parts for training and 1 part for testing. The experiment is repeated 10 times, and the average of the 10 runs is taken as the final result.
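In scikit-learn terms, this validation scheme can be sketched as follows; the Gaussian naive Bayes model and the generated data are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data

# 10-fold cross-validation: 9 folds train the model, 1 fold tests it,
# rotating over all folds; the mean score is the final result.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```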
3.3 Evaluation indicators
For the classification model, this study adopts traditional classification performance indicators: accuracy, recall rate, F_measure, and area under curve (AUC).
Accuracy: $P = \dfrac{TP}{TP + FP}$

Recall rate: $R = \dfrac{TP}{TP + FN}$

F_measure: $F = \dfrac{2 \times P \times R}{P + R}$

In the formulas, TP represents true positive, FP represents false positive, and FN represents false negative.
The receiver operating characteristic (ROC) curve describes the balance between benefit and cost. In this article, the Y-axis represents the true positive rate and the X-axis represents the false positive rate. AUC lies within the range [0, 1]; the larger the area, the better the model.
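The four indicators can be computed with scikit-learn's metrics module; the labels, predictions, and scores below are invented for illustration (note that the "accuracy" defined above, TP/(TP + FP), corresponds to precision_score):

```python
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical ground-truth labels and model outputs for eight modules.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]    # predicted probabilities

print("accuracy (precision):", precision_score(y_true, y_pred))
print("recall rate:         ", recall_score(y_true, y_pred))
print("F_measure:           ", f1_score(y_true, y_pred))
print("AUC:                 ", roc_auc_score(y_true, y_score))
```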
Overall, the development of machine learning techniques has opened up new possibilities for automated testing. With machine learning models, test cases can be generated and optimized automatically to improve the efficiency and accuracy of testing, and machine learning can analyze test results to help us better understand the behavior and quality of the hardware. Automated testing based on machine learning mainly includes four steps: data collection and preprocessing, model training, test execution, and defect detection and reporting. First, relevant data such as hardware versions, test cases, and defect reports are collected and preprocessed through data cleaning, feature extraction, and similar operations. Then, a machine learning model is trained on the preprocessed data and used to generate test cases. After that, automated testing is performed according to the generated test cases, and the test results are recorded. Finally, the machine learning model analyzes the test results, detects potential defects, and generates corresponding reports. Intelligent analysis applications mainly use machine learning models to analyze test results and provide insights into hardware quality and performance: by analyzing the execution time and pass rate of test cases, the performance and stability of the hardware can be assessed; by analyzing the defect reports, the main problem areas of the hardware and possible directions for improvement can be understood.
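The four-step pipeline can be sketched schematically as follows; every helper, record, and rule here is a hypothetical stand-in rather than the article's actual tooling:

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    case: str
    passed: bool
    time_ms: float

def collect_and_preprocess():
    # Step 1: gather versions/cases/defect reports and clean them (stubbed).
    return [("case_login", 1), ("case_upload", 0), ("case_sync", 1)]

def train_model(records):
    # Step 2: a trivial "model" that flags cases with historical defects.
    return {case for case, had_defect in records if had_defect}

def execute_tests(cases):
    # Step 3: run the suite and record outcomes (stubbed results).
    return [TestResult(c, passed=False, time_ms=42.0) for c in cases]

def analyze(results):
    # Step 4: detect potential defects and report them.
    for r in results:
        if not r.passed:
            print(f"potential defect in {r.case} ({r.time_ms} ms)")

records = collect_and_preprocess()
risky_cases = train_model(records)
analyze(execute_tests(risky_cases))
```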
4 Hardware automatic test and intelligent analysis experiment
To verify the effectiveness of hardware automatic testing and intelligent analysis for different models, this article takes the manufacturing industry as the evaluation object, selects data on product quality, equipment failures, production efficiency, and customer needs in manufacturing as the standard data sets, uses the Python programming language as the hardware development tool, uses PowerBI as the data visualization tool, and uses artificial neural networks to predict known data. The four data sets were collected from the enterprise database: product quality data from defect reports, defect rates, and other information recorded by the quality inspection department; equipment failure data from equipment maintenance records, alarm systems, and sensors; production efficiency data from output, working hours, and resource utilization in the production management system; and customer demand data from market research, customer feedback, and order information.
Manufacturing refers to the industry that converts raw materials into saleable products or components through processing, assembly, and similar operations; it covers a wide range of fields, such as automobiles, electronic devices, machinery, furniture, and chemicals. Manufacturing data refer to various information related to manufacturing activities, such as product quality data, equipment failure records, production efficiency data, and customer demand data. These data can help enterprises understand problems in the production process, optimize production processes, improve product quality and production efficiency, and better meet customer needs. To obtain these data, this article collaborated with a local manufacturing enterprise to collect product quality data, equipment failure records, production efficiency data, and customer demand data from the enterprise's actual production process from 2018 to the present.
In this article, the standard data sets are used as experimental test cases, and four data sets are extracted from them for data defect prediction. To test the performance of the machine learning-based defect prediction model, this article compares it with a defect prediction model based on code features; the code feature-based model is denoted Model A, and the machine learning-based model is denoted Model B. The defect prediction model based on code features predicts potential defects by analyzing code features [34,35] and mainly includes three steps: code review, code feature extraction, and model training. In the code review phase, static analysis of the code is conducted to identify potential defects. In the code feature extraction phase, features related to defects, such as code complexity and repetition rate, are extracted from the code. In the model training phase, statistical and machine learning methods are used to establish predictive models, and historical data are used for training and optimization. The accuracy, efficiency, maintainability, and predictive ability of code feature-based defect prediction models are good, but their performance depends on many factors, including data quality, feature selection, model type, training and optimization methods, evaluation indicators, and data set size and diversity. To improve the performance of the model, these factors can be adjusted, and appropriate evaluation methods can be used to assess the model objectively. This article conducted 30 experiments on the data sets and calculated the average prediction accuracy, recall rate, F-value, and AUC of the 30 experimental results.
During data preprocessing, this article extracted physical property, chemical composition, and appearance features from the product quality data set; fault type, frequency, and duration features from the equipment fault data set; output per unit time, working hours, and resource utilization features from the production efficiency data set; and satisfaction and completeness features from the customer demand data set.
When evaluating machine learning-based defect prediction models and code feature-based defect prediction models, common applications/domains include software development, website development, mobile application development, etc. Projects in these fields typically have a large amount of code and may have various defects. By applying predictive models in these fields, the accuracy and practicality of the models can be evaluated, providing assurance for software quality and security.
Among the two defect prediction models, machine learning models mainly consider historical defect data, code structure, complexity metrics, etc. The model based on code features mainly focuses on code quality, style, syntax structure, etc. These aspects provide the model with the key information needed to predict defects.
In machine learning-based defect prediction models, commonly used software tools include scikit-learn, TensorFlow, and PyTorch. These tools provide rich machine learning algorithms and deep learning frameworks for constructing, training, and evaluating predictive models. They support key steps such as data preprocessing, feature extraction, model selection, and optimization to ensure the effectiveness and accuracy of the model.
4.1 Accuracy
The prediction accuracy of the data defect prediction model is shown in Figure 4.
Figure 4(a) shows the prediction accuracy of model A data defect prediction model, and Figure 4(b) shows the prediction accuracy of model B data defect prediction model. The average prediction accuracy of defect prediction model A for four data sets is 0.792, and the average prediction accuracy of defect prediction model B for four data sets is 0.884. It can be seen from the comparison data that defect prediction model B has higher accuracy in defect prediction, and the accuracy is 0.092 higher than that of model A. This indicates that the hardware automated testing system based on machine learning in this article is able to predict hardware defects better than the traditional defect prediction model based on code features.
4.2 Recall rate
The predicted recall rate of the data defect prediction model is shown in Figure 5.
Figure 5(a) shows the predicted recall rate of model A data defect prediction model, and Figure 5(b) shows the predicted recall rate of model B data defect prediction model. The average predicted recall rate of defect prediction model A for the four data sets is 0.849, and the average predicted recall rate of defect prediction model B for the four data sets is 0.891. It can be seen from the data that the data defect prediction recall rate of Model B is higher, and the recall rate is significantly higher than that of Model A. This indicates that the hardware automated testing system based on machine learning in this article is able to predict hardware defects better than the traditional defect prediction model based on code features.
4.3 F-Measurement
The F-metric of the data defect prediction model is shown in Figure 6.
Figure 6(a) shows the F_measure of the model A data defect prediction model, and Figure 6(b) shows the F_measure of the model B data defect prediction model. The average F_measure of the predicted values of defect prediction model A for the four data sets is 0.749, and that of defect prediction model B is 0.853. The F_measure of the predicted values of Model A is mostly below 0.8, while that of Model B is mostly above 0.8. It can be seen from the data that Model B has a better F_measure in data defect prediction. This indicates that the hardware automated testing system based on machine learning in this article is able to predict hardware defects better than the traditional defect prediction model based on code features.
4.4 AUC
The AUC of the data defect prediction model is shown in Figure 7.
Figure 7(a) shows the AUC of model A data defect prediction model, and Figure 7(b) shows the AUC of model B data defect prediction model. The AUC average of the predicted values of defect prediction model A for the four data sets is 0.731, and the AUC average of the predicted values of defect prediction model B for the four data sets is 0.817. It can be seen from the comparative data that the AUC predicted by the data defect of model B is higher. This indicates that the hardware automated testing system based on machine learning in this article is able to predict hardware defects better than the traditional defect prediction model based on code features.
5 Conclusions
At present, hardware automatic testing has gradually replaced manual testing as the main method of hardware testing, but the test performance and accuracy of hardware automatic testing models need further study. In order to improve the performance of hardware automatic testing, this article used machine learning algorithms to study hardware automatic testing and intelligent analysis. Taking data defect prediction as an example, it constructed a hardware data defect prediction model based on machine learning and verified its prediction performance through experiments. The experiments showed that the model has good accuracy, recall, F_measure, and AUC and can be applied to the field of hardware automatic testing.
-
Funding information: This work was supported by the Education Department of Hainan Province (project number Hnjg2024ZC-141) and by the Scientific Research Funding Project of Hainan Vocational University of Science and Technology (project number HKKY2022ZD-03).
-
Author contributions: Ru Jing, Yajuan Zhang, and Shulong Zhuo together collected and organized the data and wrote and reviewed the manuscript. All authors contributed equally.
-
Conflict of interest: The authors declare that there is no conflict of interest with any financial organizations regarding the material reported in this manuscript.
-
Data availability statement: No data were used to support this study.
References
[1] P. Zhang, “Measurement of hallux valgus related indicators using Mimics hardware based on foot weight bearing CT imaging,” Chin. J. Anat. Clin. Sci., vol. 23, no. 1, pp. 7–13, 2018.
[2] W. Chen, C. Yu, and L. Xiao, “Research on the application of cloud based automatic testing for relay protection,” Electr. Technol., vol. 19, pp. 7–10, 2021.
[3] S. Motahhir, “Open hardware/hardware test bench for solar tracker with virtual instrumentation,” Sustain. Energy Technol. Assess., vol. 31, no. 2, pp. 9–16, 2019. 10.1016/j.seta.2018.11.003.
[4] J. M. Balera and V. A. de Santiago Júnior, “A systematic mapping addressing hyper-heuristics within search-based hardware testing,” Inf. Hardw. Technol., vol. 114, no. 10, pp. 176–189, 2019. 10.1016/j.infsof.2019.06.012.
[5] A. M. Alghamdi and E. E. Fathy, “Hardware testing techniques for parallel systems: A survey,” Int. J. Comput. Sci. Netw. Secur., vol. 19, no. 4, pp. 176–186, 2019.
[6] B. Marculescu, “Transferring interactive search-based hardware testing to industry,” J. Syst. Hardw., vol. 142, no. 8, pp. 156–170, 2018. 10.1016/j.jss.2018.04.061.
[7] S. M. Melo, “Empirical research on concurrent hardware testing: A systematic mapping study,” Inf. Hardw. Technol., vol. 105, no. 1, pp. 226–251, 2019. 10.1016/j.infsof.2018.08.017.
[8] O. S. Gómez, R. H. Rosero, and K. Cortés-Verdín, “CRUDyLeaf: A DSL for generating Spring Boot REST APIs from entity CRUD operations,” Cybern. Inf. Technol., vol. 20, no. 3, pp. 3–14, 2020. 10.2478/cait-2020-0024.
[9] G. Quirós, D. Cao, and A. Canedo, “Dispersed automation for industrial internet of things,” IEEE Trans. Autom. Sci. Eng., vol. 17, no. 3, pp. 1176–1181, 2020. 10.1109/TASE.2020.2978527.
[10] H. M. Tran, “An analysis of hardware bug reports using machine learning techniques,” SN Comput. Sci., vol. 1, no. 1, pp. 1–11, 2020. 10.1007/s42979-019-0004-1.
[11] G. Esteves, “Understanding machine learning hardware defect predictions,” Autom. Hardw. Eng., vol. 27, no. 3, pp. 369–392, 2020. 10.1007/s10515-020-00277-4.
[12] A. Tucker, “Generating high-fidelity synthetic patient data for assessing machine learning healthcare hardware,” NPJ Digital Med., vol. 3, no. 1, pp. 1–13, 2020. 10.1038/s41746-020-00353-9.
[13] S. Gerke, “The need for a system view to regulate artificial intelligence/machine learning-based hardware as medical device,” NPJ Digital Med., vol. 3, no. 1, pp. 1–4, 2020. 10.1038/s41746-020-0262-2.
[14] S. Goyal and P. K. Bhatia, “Comparison of machine learning techniques for hardware quality prediction,” Int. J. Knowl. Syst. Sci. (IJKSS), vol. 11, no. 2, pp. 20–40, 2020. 10.4018/IJKSS.2020040102.
[15] A. Jaiswal and R. Malhotra, “Hardware reliability prediction using machine learning techniques,” Int. J. Syst. Assur. Eng. Manag., vol. 9, no. 1, pp. 230–244, 2018. 10.1007/s13198-016-0543-y.
[16] A. K. Sandhu and R. S. Batth, “Hardware reuse analytics using integrated random forest and gradient boosting machine learning algorithm,” Hardw.: Pract. Exp., vol. 51, no. 4, pp. 735–747, 2021. 10.1002/spe.2921.
[17] A. Dwarakanath, D. Era, A. Priyadarshi, N. Dubash, and S. Podder, “Accelerating test automation through a domain specific language,” 2017 IEEE International Conference on Hardware Testing, Verification and Validation (ICST), Tokyo, Japan, 2017, pp. 460–467. 10.1109/ICST.2017.52.
[18] X. Long, “A script language for automatic testing of embedded hardware,” Control. Inf. Technol., vol. 3, pp. 48–51, 2019.
[19] A. Khalilian, A. Baraani-Dastjerdi, and B. Zamani, “APRSuite: A suite of components and use cases based on categorical decomposition of automatic program repair techniques and tools,” J. Comput. Lang., vol. 57, p. 100927, 2020. 10.1016/j.cola.2019.100927.
[20] A. S. Dimovski, “A binary decision diagram lifted domain for analyzing program families,” J. Comput. Lang., vol. 63, p. 101032, 2021. 10.1016/j.cola.2021.101032.
[21] Y. Tsutano, S. Bachala, W. Srisa-an, G. Rothermel, and J. Dinh, “Jitana: A modern hybrid program analysis framework for android platforms,” J. Comput. Lang., vol. 52, pp. 55–71, 2019. 10.1016/j.cola.2018.12.004.
[22] Peiling, “Error prone analysis and solution strategies for ‘program structure’ in VB design,” Inf. Comput., vol. 8, pp. 236–237, 2019.
[23] A. Balapour, H. R. Nikkhah, and R. Sabherwal, “Mobile application security: Role of perceived privacy as the predictor of security perceptions,” Int. J. Inf. Manag., vol. 52, p. 102063, 2020. 10.1016/j.ijinfomgt.2019.102063.
[24] V. H. Durelli, R. S. Durelli, S. S. Borges, A. T. Endo, M. M. Eler, D. R. Dias, et al., “Machine learning applied to hardware testing: A systematic mapping study,” IEEE Trans. Reliab., vol. 68, no. 3, pp. 1189–1212, 2019. 10.1109/TR.2019.2892517.
[25] K. Shi, Y. Lu, J. Chang, and Z. Wei, “PathPair2Vec: An AST path pair-based code representation method for defect prediction,” J. Comput. Lang., vol. 59, p. 100979, 2020. 10.1016/j.cola.2020.100979.
[26] J. Yang, “Evaluating and securing text-based java code through static code analysis,” J. Cybersecur. Educ. Res. Pract., vol. 2020, no. 1, p. 3, 2020. 10.62915/2472-2707.1063.
[27] L. Kumar, S. Tummalapalli, S. C. Rathi, L. B. Murthy, A. Krishna, and S. Misra, “Machine learning with word embedding for detecting web-services anti-patterns,” J. Comput. Lang., vol. 75, p. 101207, 2023. 10.1016/j.cola.2023.101207.
[28] Z. Liao, “A prediction model of the project life-span in open source hardware ecosystem,” Mob. Netw. Appl., vol. 24, pp. 1382–1391, 2019. 10.1007/s11036-018-0993-3.
[29] A. F. da Silva, E. Borin, F. M. Pereira, N. L. Junior, and O. O. Napoli, “Program representations for predictive compilation: State of affairs in the early 20’s,” J. Comput. Lang., vol. 73, p. 101171, 2022. 10.1016/j.cola.2022.101171.
[30] S. González, “A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities,” Inf. Fusion, vol. 64, pp. 205–237, 2020. 10.1016/j.inffus.2020.07.007.
[31] A. Plaia, “Comparing boosting and bagging for decision trees of rankings,” J. Classif., vol. 39, no. 1, pp. 78–99, 2022. 10.1007/s00357-021-09397-2.
[32] C. F. Kurz, W. Maier, and C. Rink, “A greedy stacking algorithm for model ensembling and domain weighting,” BMC Res. Notes, vol. 13, pp. 1–6, 2020. 10.1186/s13104-020-4931-7.
[33] N. L. Tsakiridis, “A genetic algorithm-based stacking algorithm for predicting soil organic matter from vis-NIR spectral data,” Eur. J. Soil Sci., vol. 70, no. 3, pp. 578–590, 2019. 10.1111/ejss.12760.
[34] T. Shippey, D. Bowes, and T. Hall, “Automatically identifying code features for software defect prediction: Using AST N-grams,” Inf. Softw. Technol., vol. 106, pp. 142–160, 2019. 10.1016/j.infsof.2018.10.001.
[35] A. Majd, M. Vahidi-Asl, A. Khalilian, P. Poorsarvi-Tehrani, and H. Haghighi, “SLDeep: Statement-level software defect prediction using deep-learning model on static code features,” Expert Syst. Appl., vol. 147, p. 113156, 2020. 10.1016/j.eswa.2019.113156.
© 2024 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.