Research article
DOI: 10.1145/3366424.3385774

Challenges in Forecasting Malicious Events from Incomplete Data

Published: 20 April 2020

Abstract

The ability to accurately predict cyber-attacks would enable organizations to mitigate their growing threat and avert the financial losses and disruptions they cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data, ranging from vulnerability disclosures to discussions on Twitter and the darkweb, with machine learning algorithms to learn indicators of impending cyber-attacks. However, successful cyber-attacks represent a tiny fraction of all attempted attacks: the vast majority are stopped or filtered by the security appliances deployed at the target. As we show in this paper, this filtering reduces the predictability of cyber-attacks. The small number of attacks that do penetrate the target’s defenses follow a different generative process from that of the full set of attempted attacks, one that is much harder for predictive models to learn. A likely cause is that the filtered time series depends on the filtering process in addition to all the factors that shaped the original time series. We empirically quantify the loss of predictability due to filtering using real-world data from two organizations. Our work identifies the limits to forecasting cyber-attacks from highly filtered data.
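
The abstract does not say how predictability is measured, but the author tags list permutation entropy, so the effect of filtering can be illustrated with a minimal sketch under stated assumptions. The code below computes a normalized permutation entropy (in the style of Bandt and Pompe, 2002) for a synthetic series of attempted attacks and for a thinned copy of it. The synthetic data, the `thin` helper, the independent keep-probability of 0.01, and the embedding order of 3 are all hypothetical choices made for illustration; they are not the paper's data, filter model, or settings.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, order=3):
    """Normalized permutation entropy in [0, 1]; higher means less regular."""
    if len(series) < order:
        raise ValueError("series is shorter than the embedding order")
    patterns = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Ordinal pattern: window indices sorted by value, ties broken by position.
        pattern = tuple(sorted(range(order), key=lambda k: (window[k], k)))
        patterns[pattern] += 1
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log(c / total) for c in patterns.values())
    # Divide by the maximum entropy, log(order!), to normalize.
    return entropy / math.log(math.factorial(order))


def thin(counts, keep_prob, rng):
    """Hypothetical filter: each event survives independently with keep_prob."""
    return [sum(1 for _ in range(c) if rng.random() < keep_prob) for c in counts]


rng = random.Random(0)
# Synthetic daily counts with a weekly cycle (assumption, not data from the paper).
attempted = [100 + 50 * (day % 7) for day in range(700)]
# Only about 1% of attempts get past the (assumed) defenses.
observed = thin(attempted, keep_prob=0.01, rng=rng)

pe_attempted = permutation_entropy(attempted)
pe_observed = permutation_entropy(observed)
print(f"attempted: {pe_attempted:.3f}, after filtering: {pe_observed:.3f}")
```

In this toy setting the thinned series has a higher permutation entropy than the full series, because sampling noise obscures the weekly pattern once only a few events per day survive. Real security appliances do not filter events independently at random, so this shows only the direction of the effect, not its size.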

Cited By

  • (2022) Modelling and predicting enterprise-level cyber risks in the context of sparse data availability. The Geneva Papers on Risk and Insurance - Issues and Practice 48:2 (434-462). DOI: 10.1057/s41288-022-00282-6. Online publication date: 10-Dec-2022
  • (2020) Predictability limit of partially observed systems. Scientific Reports 10:1. DOI: 10.1038/s41598-020-77091-1. Online publication date: 24-Nov-2020
  • (2020) Forecasting Network Intrusions from Security Logs Using LSTMs. Deployable Machine Learning for Security Defense (122-137). DOI: 10.1007/978-3-030-59621-7_7. Online publication date: 18-Oct-2020

    Published In

    WWW '20: Companion Proceedings of the Web Conference 2020
    April 2020
    854 pages
    ISBN:9781450370240
    DOI:10.1145/3366424

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cyber-attack
    2. forecasting
    3. permutation entropy
    4. predictability
    5. time-series

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '20: The Web Conference 2020
    April 20 - 24, 2020
    Taipei, Taiwan

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
