research-article

Challenges in Forecasting Malicious Events from Incomplete Data

Authors:

Andres Abeliuk,

Negar Mokhberian,

Jeremy Abramson,

Kristina Lerman

WWW '20: Companion Proceedings of the Web Conference 2020

Pages 603 - 610

https://doi.org/10.1145/3366424.3385774

Published: 20 April 2020

Abstract

The ability to accurately predict cyber-attacks would enable organizations to mitigate their growing threat and avert the financial losses and disruptions they cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data, ranging from vulnerability disclosures to discussions on Twitter and the darkweb, with machine learning algorithms to learn indicators of impending cyber-attacks. However, successful cyber-attacks represent a tiny fraction of all attempted attacks: the vast majority are stopped or filtered by the security appliances deployed at the target. As we show in this paper, this filtering reduces the predictability of cyber-attacks. The small number of attacks that do penetrate the target’s defenses follows a different generative process from that of the full set of attempted attacks, and this process is much harder for predictive models to learn. One possible explanation is that the resulting time series depends on the filtering process in addition to all the factors that shaped the original time series. We empirically quantify the loss of predictability due to filtering using real-world data from two organizations. Our work identifies the limits to forecasting cyber-attacks from highly filtered data.
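The effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or data: it assumes a synthetic series of attempted attacks with a weekly cycle, models filtering as independent binomial thinning (each attempt gets through with a fixed probability), and uses lag-7 autocorrelation as a crude stand-in for predictability. All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_attempts(n_days, base=200.0, amplitude=0.5, period=7,
                      noise_sd=10.0, rng=None):
    """Synthetic daily counts of attempted attacks: weekly cycle plus noise.

    Assumption: the shape and parameters are illustrative, not from the paper.
    """
    rng = rng or random.Random(0)
    counts = []
    for t in range(n_days):
        rate = base * (1.0 + amplitude * math.sin(2.0 * math.pi * t / period))
        counts.append(max(0, round(rng.gauss(rate, noise_sd))))
    return counts


def filter_attacks(attempts, pass_prob, rng):
    """Binomial thinning: each attempt independently penetrates with pass_prob.

    Assumption: a deliberately simple model of the target's security filtering.
    """
    return [sum(rng.random() < pass_prob for _ in range(a)) for a in attempts]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (a rough predictability proxy)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


rng = random.Random(42)
attempts = simulate_attempts(2000, rng=rng)
observed = filter_attacks(attempts, pass_prob=0.01, rng=rng)

acf_full = autocorrelation(attempts, lag=7)
acf_filtered = autocorrelation(observed, lag=7)
print(f"lag-7 autocorrelation, all attempts: {acf_full:.2f}")
print(f"lag-7 autocorrelation, filtered:     {acf_filtered:.2f}")
```

In this toy setting the weekly structure is still present in the filtered series, but the counts are so small that sampling noise from the thinning step dominates, so the autocorrelation at the weekly lag drops sharply. Real filtering is unlikely to be independent of attack type or timing, which this sketch does not attempt to capture.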

Cited By

Zängerle D, Schiereck D (2022). Modelling and predicting enterprise-level cyber risks in the context of sparse data availability. The Geneva Papers on Risk and Insurance - Issues and Practice 48:2 (434-462). Online publication date: 10-Dec-2022.
https://doi.org/10.1057/s41288-022-00282-6
Abeliuk A, Huang Z, Ferrara E, Lerman K (2020). Predictability limit of partially observed systems. Scientific Reports 10:1. Online publication date: 24-Nov-2020.
https://doi.org/10.1038/s41598-020-77091-1
Mueller W, Memory A, Bartrem K (2020). Forecasting Network Intrusions from Security Logs Using LSTMs. Deployable Machine Learning for Security Defense (122-137). Online publication date: 18-Oct-2020.
https://doi.org/10.1007/978-3-030-59621-7_7

Index Terms

Challenges in Forecasting Malicious Events from Incomplete Data
1. Social and professional topics

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

WWW '20: Companion Proceedings of the Web Conference 2020

April 2020

854 pages

ISBN:9781450370240

DOI:10.1145/3366424

Editors:
Amal El Fallah Seghrouchni (Sorbonne University, France)
Gita Sukthankar (University of Central Florida, United States)
Tie-Yan Liu (Microsoft Research Asia, China)
Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
