Automated Anomaly Detection and Causal Analysis for Civil Aviation Using QAR Data
Figure 1. Down-sampling results of pitch angle parameters.
Figure 2. Distribution of different anomalies in QAR data during approach phase.
Figure 3. Distribution of samples using data balance techniques.
Figure 4. Overview of MAD-XFP.
Figure 5. Distribution of feature importance of MAD-XFP (top 20).
Figure 6. Confusion matrix of the model: (a) unbalanced; (b) balanced.
Figure 7. Overall performance evaluation.
Figure 8. Results of sensitivity analysis.
Figure 9. SHAP Interpretation Chart. (a) SHAP interpretation chart of MAD-XFP; (b) SHAP interpretation chart of anomaly label 1 (removed IVV, RALTC, and their combinations).
Figure 10. Example of anomaly detection and causal analysis. (a) high speed during approach phase; (b) causal analysis of high speed during approach.
Figure 11. Multi-type anomaly detection.
Figure 12. Features ranking for detected anomaly events. (a) ILS Heading Deviation; (b) Large Decline Rate; (c) ILS Glide Slope Deviation.
Abstract
1. Introduction
- The traditional FOQA method can detect only one anomaly per operation routine, so it is ineffective at identifying composite abnormal events that involve multiple anomalies.
- It does not support automated causal analysis of abnormal events and relies heavily on expert experience to identify abnormal events and generate causal analysis reports.
- An automated anomaly detection method is proposed, tailored to the unique characteristics of QAR data and the requirements of FOQA. This model can detect various abnormal events in one operation routine.
- Advanced feature engineering and automated hyper-parameter tuning techniques are used to speed up convergence and lower the risk of over-fitting.
- An automated and intelligent causal analysis technology for abnormal events is proposed to address the limitations of the traditional FOQA working method.
- Method evaluation experiments are conducted on real QAR datasets. The evaluation and comparison results are presented with in-depth analysis.
2. Related Works
3. Preliminary Preparation for Data and Causal Analysis
3.1. Problem Statement
3.2. Data Preprocessing
3.2.1. Data Cleaning
3.2.2. Data Balancing
3.3. Preparations for Causal Analysis
4. Multi-Type Anomaly Detection in Civil Aviation
4.1. Model Design
4.2. Anomaly Detection
5. Optimization Techniques
5.1. Automated Feature Generation
- Interaction between time and geographic features: the parameters involved are TIME_R, LONPC, LATPC, FLIGHT_PHASE.
- Interaction between meteorological conditions and flight conditions: parameters involved are RUDD, WIN_SPD, TAT, ALT_STD.
- Flight dynamic feature interaction: parameters involved are FF1, GSC.
- Interaction between flight parameters and flight status: parameters involved are N11, VB11, TLA1.
5.2. Automated Parameters Optimization
6. Experimental Evaluations
6.1. Evaluation Metrics
6.2. Results and Analysis
6.2.1. Overall Performance Evaluation
6.2.2. Ablation Study
6.2.3. Sensitivity Analysis
6.3. Automated Causal Analysis for Anomaly Event
6.4. Case Study
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Monitoring Items | Monitoring Parameters | Monitoring Point | Mild Deviation Limit | Severe Deviation Limit |
---|---|---|---|---|
Large Descent Rate | IVV | 610 m (2000 ft)∼305 m (1000 ft) | >457 m/min (1500 ft/min) | >549 m/min (1800 ft/min) |
Large Descent Rate | IVV | 305 m (1000 ft)∼152 m (500 ft) | >396 m/min (1300 ft/min) | >457 m/min (1500 ft/min) |
Large Descent Rate | IVV | 152 m (500 ft)∼15 m (50 ft) | >335 m/min (1100 ft/min) | >396 m/min (1300 ft/min) |
Large Approach Roll Angle | ROLL | 457 m (1500 ft)∼152 m (500 ft) | >30° | >35° |
Large Approach Roll Angle | ROLL | 152 m (500 ft)∼61 m (200 ft) | >15° | >20° |
Large Approach Roll Angle | ROLL | 61 m (200 ft)∼15 m (50 ft) | >8° | >10° |
High Approach Speed | IAS, VAPP | 152 m (500 ft)∼15 m (50 ft) | >(VAPP + 15) kn | >(VAPP + 20) kn |
Low Approach Speed | IAS, VAPP | 305 m (1000 ft)∼15 m (50 ft) | <(VAPP − 5) kn | <(VAPP − 10) kn |
ILS Glide Slope Deviation | GLIDE_DEVC | below 305 m (1000 ft) | >1.0 point | >1.5 point |
ILS Heading Deviation | LOC_DEVC | below 305 m (1000 ft) | >1.0 point | >1.5 point |
Low-Altitude High Speed | IAS | below 762 m (2500 ft) | >230 kn | >250 kn |
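
The altitude-banded limits in the table above lend themselves to a simple rule check. The following is a minimal sketch for the descent-rate rows only, not the authors' implementation. The function name, the convention that the descent rate is passed as a positive magnitude, and the choice to include each band's upper bound but not its lower bound are all assumptions made for illustration.

```python
from typing import Optional

# (upper_ft, lower_ft, mild_limit_ft_min, severe_limit_ft_min), copied from the table above.
DESCENT_RATE_BANDS = [
    (2000, 1000, 1500, 1800),
    (1000, 500, 1300, 1500),
    (500, 50, 1100, 1300),
]


def classify_descent_rate(height_ft: float, descent_rate_ft_min: float) -> Optional[str]:
    """Classify one sample as 'normal', 'mild', or 'severe'.

    Returns None when the height lies outside every monitored band.
    Assumptions (not from the source): descent_rate_ft_min is a positive
    magnitude, and each band includes its upper bound but not its lower bound.
    """
    for upper, lower, mild, severe in DESCENT_RATE_BANDS:
        if lower < height_ft <= upper:
            if descent_rate_ft_min > severe:
                return "severe"
            if descent_rate_ft_min > mild:
                return "mild"
            return "normal"
    return None


print(classify_descent_rate(1500, 1600))  # mild
print(classify_descent_rate(800, 1600))   # severe
```

The same descent rate of 1600 ft/min is a mild deviation above 1000 ft but a severe one between 1000 ft and 500 ft, which is why the band lookup has to come before the comparison.
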
Categories | Parameters | Units | Description |
---|---|---|---|
Monitoring Parameters | IAS | KTS | Corrected Airspeed |
Monitoring Parameters | VAPP | KTS | Approach Reference Speed |
Monitoring Parameters | GLIDE_DEVC | DDM | ILS Glideslope Deviation |
Monitoring Parameters | LOC_DEVC | DDM | ILS Heading Deviation |
Monitoring Parameters | ROLL | DEG | Roll Angle |
Monitoring Parameters | IVV | FT/MIN | Descent Rate |
Pilot Operating Parameters | TLA | DEG | Throttle Resolver Angle |
Pilot Operating Parameters | SEL_COURSE | — | Selected Course |
Pilot Operating Parameters | FF1 | PPH | Fuel Mass Flow Rate ENG.1 |
Pilot Operating Parameters | VB11 | % | N1 VIB ENG.1 BRG1 or TRF |
Flight State Parameters | HEIGHT | FT | Corrected Height |
Flight State Parameters | RALTC | FT | Radio Height |
Flight State Parameters | ALT_STDC | FT | Altitude Standard Recorded |
Flight State Parameters | VRTG | G | Normal Acceleration |
Flight State Parameters | LATG | G | Lateral Acceleration |
Flight State Parameters | LONG | G | Longitudinal Acceleration |
Flight State Parameters | SPOIL_POS | — | Spoiler Position |
Flight State Parameters | VOR_FRQ | HZ | VOR Frequency |
Flight State Parameters | N11 | % | Low Rotor Speed ENG.1 |
Flight State Parameters | FLAPC | DEG | Flap Actual Position |
Flight State Parameters | HEAD | DEG | Magnetic Heading |
Flight State Parameters | PITCH | DEG | Pitch Angle |
Flight State Parameters | RUDD | DEG | Rudder Position |
Flight State Parameters | FPA | DEG | Flight Path Angle |
Flight State Parameters | VREF | KTS | Landing Reference Speed |
Environmental Parameters | WIN_DIR | DEG | Wind Direction |
Environmental Parameters | WIN_SPD | KN | Wind Speed |
Environmental Parameters | LONPC | DEG | Present Position Longitude |
Environmental Parameters | LATPC | DEG | Present Position Latitude |
Indicating Parameter | TIME_R | — | Time |
Indicating Parameter | FLIGHT_PHASE | — | Flight Phase |
Abnormal Event | Anomaly Label |
---|---|
No Abnormal Event Occurred | 0 |
Large Descent Rate (2000 ft∼1000 ft) | 1 |
Large Descent Rate (1000 ft∼500 ft) | 2 |
Large Descent Rate (500 ft∼50 ft) | 3 |
Large Approach Roll Angle | 4 |
High Approach Speed | 5 |
Low Approach Speed | 6 |
ILS Glide Slope Deviation | 7 |
ILS Heading Deviation | 8 |
Low-Altitude High Speed | 9 |
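
To illustrate how several anomaly labels can apply to one sample point, the sketch below checks a subset of the monitoring rules (labels 5 to 9) at their mild deviation limits and returns every label that matches. This is an illustrative assumption rather than the paper's labelling procedure: the function name, the use of absolute values for the ILS deviations, the inclusive height bounds, and the decision to return all matching labels instead of a single one are not taken from the source.

```python
from typing import List


def label_sample(height_ft: float, ias_kn: float, vapp_kn: float,
                 glide_dev: float, loc_dev: float) -> List[int]:
    """Return the anomaly labels (mild limits) that one sample point triggers.

    Label numbers follow the anomaly label table above; [0] means no abnormal event.
    Height bounds are treated as inclusive, which is an assumption.
    """
    labels = []
    if 50 <= height_ft <= 500 and ias_kn > vapp_kn + 15:
        labels.append(5)  # High Approach Speed
    if 50 <= height_ft <= 1000 and ias_kn < vapp_kn - 5:
        labels.append(6)  # Low Approach Speed
    if height_ft < 1000 and abs(glide_dev) > 1.0:
        labels.append(7)  # ILS Glide Slope Deviation
    if height_ft < 1000 and abs(loc_dev) > 1.0:
        labels.append(8)  # ILS Heading Deviation
    if height_ft < 2500 and ias_kn > 230:
        labels.append(9)  # Low-Altitude High Speed
    return labels or [0]


# A sample at 400 ft that is both fast and off the localizer triggers two labels.
print(label_sample(400, 160, 140, 0.2, 1.2))  # [5, 8]
```
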
Number | Candidate Features |
---|---|
0 | (LONPC*LATPC) |
1 | max(LONPC,LATPC) |
2 | (LONPC+LATPC) |
3 | residual(LATPC) |
4 | log(ALT_STD) |
5 | freq(RUDD) |
6 | max(TAT,ALT_STD) |
7 | log(GSC) |
8 | (FF1*GSC) |
9 | freq(FF1) |
10 | (N11*TLA1) |
… | … |
25 | (N11/VB11) |
… | … |
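
A few of the candidate features listed above can be written out by hand to show what the generated columns look like. This is a minimal sketch in plain Python, not OpenFE's API or the authors' pipeline. The exact operator definitions are assumptions: `freq` is taken here to mean the relative frequency of a value within its column, `log` the natural logarithm of a positive value, and undefined results are returned as NaN.

```python
import math
from collections import Counter
from typing import Dict, List


def candidate_features(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Compute a handful of the candidate features for each row.

    Each row is a dict with the keys LONPC, LATPC, ALT_STD, FF1, GSC, N11, VB11.
    """
    nan = float("nan")
    ff1_counts = Counter(r["FF1"] for r in rows)  # needed for freq(FF1)
    out = []
    for r in rows:
        out.append({
            "(LONPC*LATPC)": r["LONPC"] * r["LATPC"],
            "max(LONPC,LATPC)": max(r["LONPC"], r["LATPC"]),
            "(LONPC+LATPC)": r["LONPC"] + r["LATPC"],
            "log(ALT_STD)": math.log(r["ALT_STD"]) if r["ALT_STD"] > 0 else nan,
            "(FF1*GSC)": r["FF1"] * r["GSC"],
            "freq(FF1)": ff1_counts[r["FF1"]] / len(rows),
            "(N11/VB11)": r["N11"] / r["VB11"] if r["VB11"] != 0 else nan,
        })
    return out


# Hypothetical sample values, for illustration only.
rows = [
    {"LONPC": 117.0, "LATPC": 39.0, "ALT_STD": 1000.0, "FF1": 2000.0,
     "GSC": 140.0, "N11": 50.0, "VB11": 0.5},
    {"LONPC": 117.1, "LATPC": 39.1, "ALT_STD": 900.0, "FF1": 2000.0,
     "GSC": 138.0, "N11": 48.0, "VB11": 0.0},
]
features = candidate_features(rows)
print(features[0]["(N11/VB11)"])  # 100.0
```
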
Parameter | Default | Optimum | Description | Function |
---|---|---|---|---|
learning_rate | 0.1 | 0.02 | Learning rate | Helps training converge stably |
max_depth | None | 13 | Maximum depth of a tree | Prevents over-fitting |
min_child_weight | 1 | 4 | Minimum sum of sample weights in a child node | Controls when splitting stops |
subsample | 1 | 0.8 | Subsampling rate | Balances variance and bias |
colsample_bytree | 1 | 0.7 | Column sampling rate when building each tree | Improves generalization ability |
alpha | 0 | 0.080 | L1 regularization coefficient | Makes the model more conservative |
lambda | 0 | 0.086 | L2 regularization coefficient | Makes the model more conservative |
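
The tuned values above can be collected into a configuration dictionary. The sketch below assumes the base learner is XGBoost used through its scikit-learn style wrapper, which the table itself does not state. In that wrapper the regularization terms are exposed as `reg_alpha` and `reg_lambda`, partly because `lambda` is a reserved word in Python and cannot be written as a keyword argument. The helper name `to_keyword_args` is hypothetical.

```python
# Optimum values copied from the table above.
TUNED_PARAMS = {
    "learning_rate": 0.02,
    "max_depth": 13,
    "min_child_weight": 4,
    "subsample": 0.8,
    "colsample_bytree": 0.7,
    "alpha": 0.080,
    "lambda": 0.086,
}

# Keyword-style names for the two regularization terms (assumes an
# XGBoost scikit-learn style estimator).
ALIASES = {"alpha": "reg_alpha", "lambda": "reg_lambda"}


def to_keyword_args(params: dict) -> dict:
    """Rename table parameters to names usable as Python keyword arguments."""
    return {ALIASES.get(name, name): value for name, value in params.items()}


kwargs = to_keyword_args(TUNED_PARAMS)
print(sorted(kwargs))
# With xgboost installed, these could be passed as xgboost.XGBClassifier(**kwargs);
# the library is deliberately not imported in this sketch.
```
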
Dataset | Unbalanced | Balanced |
---|---|---|
Training sample points | 48,858 | 50,000 |
Testing sample points | 10,251 | 10,251 |
Dataset | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
Unbalanced | 0.955 | 0.687 | 0.799 | 0.992 |
Balanced | 0.844 | 0.781 | 0.812 | 0.993 |
Dataset | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
Unbalanced | 0.965 | 0.663 | 0.786 | 0.914 |
Balanced | 0.991 | 0.812 | 0.893 | 0.990 |
Model | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
MAD-XFP | 0.999 | 0.939 | 0.968 | 0.998 |
w/o OpenFE | 0.963 | 0.763 | 0.851 | 0.994 |
w/o Optuna | 0.868 | 0.939 | 0.902 | 0.997 |
w/o OpenFE and Optuna | 0.844 | 0.781 | 0.812 | 0.993 |
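
The F1 column in the ablation table can be cross-checked against the precision and recall columns using the standard definition F1 = 2PR / (P + R). The short sketch below recomputes it for each row. The row keys are descriptive labels chosen here, and the 0.002 tolerance is an assumption that allows for the inputs having been rounded to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the ablation table above.
ABLATION_ROWS = {
    "full model (first row)": (0.999, 0.939, 0.968),
    "w/o OpenFE": (0.963, 0.763, 0.851),
    "w/o Optuna": (0.868, 0.939, 0.902),
    "w/o OpenFE and Optuna": (0.844, 0.781, 0.812),
}

for name, (p, r, reported) in ABLATION_ROWS.items():
    recomputed = f1_score(p, r)
    agrees = abs(recomputed - reported) <= 0.002
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {reported:.3f}, agrees = {agrees}")
```

Removing OpenFE mainly lowers recall (0.939 to 0.763), while removing Optuna mainly lowers precision (0.999 to 0.868), and F1 reflects both effects.
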
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dang, X.; Hua, C.; Rong, C. Automated Anomaly Detection and Causal Analysis for Civil Aviation Using QAR Data. Appl. Sci. 2025, 15, 2250. https://doi.org/10.3390/app15052250