research-article

Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems

Published: 01 October 2006

Abstract

As Internet services become more prevalent and more complex, there is a growing need to improve their operational reliability and availability. Although a large amount of monitoring data can be collected from systems for fault analysis, it is hard to correlate this data effectively across distributed systems and observation time. In this paper, we analyze the mass characteristics of user requests and propose a novel approach to model and track transaction flow dynamics for fault detection in complex information systems. We measure the flow intensity at multiple checkpoints inside the system and apply system identification methods to model the transaction flow dynamics between these measurements. With the learned analytical models, we apply a model-based fault detection and isolation (FDI) method to track the flow dynamics in real time. We also propose an algorithm that automatically searches for and validates dynamic relationships between randomly selected monitoring points, which gives systems a self-cognition capability for system management. We test our approach on a real system with a set of injected faults. Experimental results demonstrate the effectiveness of our approach and algorithms.
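The modeling and tracking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the first-order regression model, the synthetic data, the function names (`simulate`, `fit_arx1`, `residuals`, `first_alarm`), and the threshold rule (three times the largest training residual) are all assumptions made for illustration. The sketch fits a model between the flow intensities at two checkpoints using fault-free data, then raises an alarm when the one-step prediction residual exceeds the threshold.

```python
import random


def simulate(n, seed, fault_at=None):
    """Generate synthetic flow intensities at two hypothetical checkpoints.

    x is the request rate upstream, y the rate downstream. From index
    fault_at onward the downstream gain drops, mimicking an injected fault.
    """
    rng = random.Random(seed)
    x = [rng.uniform(50.0, 150.0) for _ in range(n)]
    y = [100.0]
    for t in range(1, n):
        gain = 0.1 if fault_at is not None and t >= fault_at else 0.4
        y.append(0.6 * y[t - 1] + gain * x[t] + rng.uniform(-0.5, 0.5))
    return x, y


def fit_arx1(x, y):
    """Least-squares fit of the assumed model y[t] = a*y[t-1] + b*x[t]."""
    s_pp = s_pu = s_uu = s_py = s_uy = 0.0
    for t in range(1, len(y)):
        p, u, target = y[t - 1], x[t], y[t]
        s_pp += p * p
        s_pu += p * u
        s_uu += u * u
        s_py += p * target
        s_uy += u * target
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_pp * s_uu - s_pu * s_pu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; model cannot be identified")
    a = (s_py * s_uu - s_pu * s_uy) / det
    b = (s_pp * s_uy - s_pu * s_py) / det
    return a, b


def residuals(x, y, a, b):
    """One-step prediction residuals; index 0 has no prediction and is 0.0."""
    return [0.0] + [y[t] - (a * y[t - 1] + b * x[t]) for t in range(1, len(y))]


def first_alarm(res, threshold):
    """Index of the first residual whose magnitude exceeds the threshold."""
    for t, r in enumerate(res):
        if abs(r) > threshold:
            return t
    return None


# Learn the model and the threshold from fault-free data.
x_train, y_train = simulate(500, seed=1)
a, b = fit_arx1(x_train, y_train)
threshold = 3.0 * max(abs(r) for r in residuals(x_train, y_train, a, b))

# Track a second run in which a fault is injected at t = 300.
x_mon, y_mon = simulate(500, seed=2, fault_at=300)
alarm = first_alarm(residuals(x_mon, y_mon, a, b), threshold)
print(f"a={a:.3f} b={b:.3f} threshold={threshold:.3f} first alarm at t={alarm}")
```

A residual threshold derived from training data is only one possible choice. The point of the sketch is that a relationship learned between two measurement points stays valid under normal operation and breaks when a fault changes the flow dynamics.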

Published In

IEEE Transactions on Dependable and Secure Computing, Volume 3, Issue 4
October 2006
133 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Fault detection
  2. dynamic relationship
  3. flow intensity and dynamics
  4. information systems
  5. model validation
  6. model-based FDI
  7. regression model
  8. system management
