Some Remarks on Malicious and Negligent Data Breach Distribution Estimates
Figure 1. Empirical distribution of the (logarithm of the) number of breached records, assumed to be a measure of the breach severity, for (a) MED type organizations and (b) all but MED type organizations. Data reported in the PRC dataset in the time period 2010–2019.
Figure 2. Empirical (light gray) vs. estimated (dark gray) distributions of the daily frequency of breaches for (a) malicious and (b) negligent breach types, respectively. Data for all but MED type companies as reported in the PRC dataset in the time period 2010–2019.
Figure 3. Empirical (histograms) vs. estimated (red line) distributions of the severity of breaches, measured by the number of breached records, for (a) malicious and (b) negligent breach types, respectively. Data for all but MED type companies as reported in the PRC dataset in the time period 2010–2019.
Figure 4. The empirical distribution function of the (logarithm of the) number of breached records (circles) compared to the estimated distribution function (black line) for (a) malicious and (b) negligent breach types, respectively. Data for all but MED type companies as reported in the PRC dataset in the time period 2010–2019.
Abstract
1. Introduction
2. Data Breach Risk
3. Materials and Methods
3.1. Data Description
3.2. Modeling Methodology
3.3. Cyber Risk Measures
4. Results
- Because negligent and malicious breaches differ in their statistical nature (in terms of distribution parameters), we separated the breach data into these two categories.
- For both categories, the daily frequency was best fitted by a negative binomial distribution, and a Kolmogorov–Smirnov (KS) test supported this fit.
- Regarding severity, among all the distributions proposed in the literature, the skew-normal distribution gave the best fit for both malicious and negligent breaches. The fits were less accurate than those for frequency, but the KS test still indicated a reasonably good fit.
- In particular, the very few large breaches were not fitted well by the skew-normal distribution: the severity of negligent breaches was slightly underestimated, while that of malicious ones was overestimated. These conclusions are consistent with the VaR estimates in Table 5.
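
The fitting steps summarized above can be sketched in Python with SciPy. This is only an illustrative sketch, not the authors' code: the data are synthetic stand-ins rather than the PRC records, the helper names `fit_negative_binomial` and `ks_distance_discrete` are hypothetical, the negative binomial is fitted by the method of moments (an assumption; the estimation method used in the paper is not stated here), and severities are assumed to be on a log10 scale.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins, NOT the PRC data: daily breach counts and log10(records).
daily_counts = rng.negative_binomial(n=2, p=0.6, size=3650)
log_sizes = stats.skewnorm.rvs(a=3.0, loc=2.0, scale=1.5, size=1500, random_state=rng)


def fit_negative_binomial(counts):
    """Method-of-moments estimate of (n, p) in SciPy's nbinom parametrization."""
    mean, var = counts.mean(), counts.var(ddof=1)
    if var <= mean:
        raise ValueError("counts are not overdispersed; moment fit is undefined")
    p = mean / var
    n = mean**2 / (var - mean)
    return n, p


def ks_distance_discrete(counts, n, p):
    """Largest gap between the empirical and fitted CDFs over the integer support."""
    support = np.arange(counts.max() + 1)
    ecdf = np.searchsorted(np.sort(counts), support, side="right") / counts.size
    return np.max(np.abs(ecdf - stats.nbinom.cdf(support, n, p)))


# Frequency: negative binomial fit and KS-type distance.
n_hat, p_hat = fit_negative_binomial(daily_counts)
freq_distance = ks_distance_discrete(daily_counts, n_hat, p_hat)

# Severity: maximum-likelihood skew-normal fit, then a one-sample KS test.
a_hat, loc_hat, scale_hat = stats.skewnorm.fit(log_sizes)
sev_test = stats.kstest(log_sizes, "skewnorm", args=(a_hat, loc_hat, scale_hat))

print(f"frequency: n={n_hat:.2f}, p={p_hat:.2f}, KS distance={freq_distance:.4f}")
print(f"severity:  a={a_hat:.2f}, loc={loc_hat:.2f}, scale={scale_hat:.2f}, "
      f"KS statistic={sev_test.statistic:.4f}, p-value={sev_test.pvalue:.3f}")
```

Two caveats apply to this sketch. For count data the KS distance is computed directly on the integer support, because the standard continuous KS p-value is not reliable when the sample has many ties; a p-value would need, for example, a parametric bootstrap. For the severity fit, the KS p-value is optimistic because the parameters were estimated from the same sample.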
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Acronym | Description |
---|---|
BSF | Businesses: Financial and Insurance Services |
BSO | Businesses: Other |
BSR | Businesses: Retail or Merchant, Including Online Retail |
EDU | Educational Institutions |
GOV | Government and Military |
MED | Healthcare, Medical Providers, and Medical Insurance Services |
NGO | Nonprofits |
Acronym | Description |
---|---|
CARD | Payment card fraud (fraud involving debit and credit cards that is not accomplished via hacking) |
HACK | Hacking or malicious software |
INSD | Insider (someone with legitimate access, such as an employee, contractor, or customer, who intentionally breaches information) |
PHYS | Physical loss (includes paper documents that are lost, discarded, or stolen (non-electronic)) |
PORT | Portable loss (lost, discarded, or stolen laptops, PDAs, smartphones, memory sticks, CDs, hard drives, data tapes, etc.) |
STAT | Stationary loss (lost, inappropriately accessed, discarded, or stolen computers or servers not designed for mobility) |
DISC | Unintended disclosure (not involving hacking, intentional breach, or physical loss): sensitive information posted publicly, mishandled, or sent to the wrong party via publishing online, sending in an email, etc. |
UNKN | Unknown loss |
Breach type | BSF | BSO | BSR | EDU | GOV | MED | NGO | UNKN | TOTAL |
---|---|---|---|---|---|---|---|---|---|
#N/A | 0 | 0 | 0 | 0 | 0 | 87 | 0 | 0 | 87 |
CARD | 8 | 2 | 15 | 1 | 0 | 0 | 0 | 0 | 26 |
DISC | 39 | 33 | 34 | 83 | 85 | 991 | 5 | 0 | 1270 |
HACK | 78 | 188 | 113 | 102 | 68 | 798 | 20 | 0 | 1367 |
INSD | 29 | 15 | 32 | 8 | 36 | 160 | 5 | 0 | 285 |
PHYS | 11 | 14 | 5 | 15 | 31 | 1281 | 5 | 0 | 1362 |
PORT | 16 | 23 | 10 | 31 | 46 | 289 | 10 | 0 | 425 |
STAT | 3 | 3 | 2 | 6 | 2 | 72 | 0 | 0 | 88 |
UNKN | 41 | 11 | 7 | 34 | 14 | 32 | 2 | 465 | 606 |
TOTAL | 225 | 289 | 218 | 280 | 282 | 3710 | 47 | 465 | 5516 |
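
As a quick illustration of how strongly one organization type dominates these counts, the short Python check below recomputes the grand total from the TOTAL row of the table above and derives the MED share. The variable names are illustrative only; the numbers are copied from the table.

```python
# Column totals copied from the TOTAL row of the breach-count table above.
totals_by_org = {
    "BSF": 225, "BSO": 289, "BSR": 218, "EDU": 280,
    "GOV": 282, "MED": 3710, "NGO": 47, "UNKN": 465,
}
grand_total = 5516

# Consistency check: the column totals should add up to the reported grand total.
assert sum(totals_by_org.values()) == grand_total

med_share = totals_by_org["MED"] / grand_total
non_med_count = grand_total - totals_by_org["MED"]
print(f"MED share: {med_share:.1%}, non-MED breaches: {non_med_count}")
```

MED organizations account for roughly two thirds of all reported breaches, which is relevant when reading results that are restricted to all but MED type companies.
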
Breach type | BSF | BSO | BSR | EDU | GOV | MED | NGO | UNKN |
---|---|---|---|---|---|---|---|---|
#N/A | 0 | 0 | 0 | 0 | 0 | 3,079,889 | 0 | 0 |
CARD | 7,035,066 | 310 | 2,124,575 | 16 | 0 | 0 | 0 | 0 |
DISC | 1,550,375 | 2,105,006,706 | 385,194,087 | 1,576,141 | 21,094,488 | 12,979,387 | 3,501,561 | 0 |
HACK | 348,057,288 | 5,494,774,684 | 791,295,680 | 45,231,810 | 40,900,705 | 159,979,906 | 3,350,944 | 0 |
INSD | 2,407,569 | 3,508,456 | 35,671 | 40,379 | 28,506,293 | 1,059,014 | 317 | 0 |
PHYS | 58,909 | 64,007 | 4071 | 1,023,422 | 209,616 | 35,715,718 | 24,157 | 0 |
PORT | 5,852,045 | 5,836,258 | 30,244 | 238,778 | 7,683,283 | 12,645,645 | 72,176 | 0 |
STAT | 100,348 | 80,108 | 9189 | 78,177 | 3650 | 9,604,567 | 0 | 0 |
UNKN | 421,366 | 100,155,387 | 68,000,391 | 10,352,675 | 849,587 | 109,731 | 2501 | 10,657,026 |
Type | 90% es | 90% em | 90% pv | 95% es | 95% em | 95% pv | 99% es | 99% em | 99% pv | 99.5% es | 99.5% em | 99.5% pv |
---|---|---|---|---|---|---|---|---|---|---|---|---|
mal | 6.10 | 6.04 | 0.71 | 7.02 | 7.00 | 0.94 | 8.81 | 8.17 | 0.02 | 9.49 | 8.53 | 0.02 |
negl | 5.41 | 5.14 | 0.02 | 6.05 | 6.30 | 0.19 | 7.30 | 7.95 | 0.16 | 7.78 | 8.18 | 0.45 |
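
A quantile-based comparison of this kind can be sketched as follows, on the assumption that the "es" and "em" columns are model-estimated and empirical VaR values on a logarithmic scale. VaR at level q is taken as the q-quantile of the severity distribution. The data are synthetic stand-ins, not the PRC records, and the test behind the "pv" columns is not reproduced, since it is not specified here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic log-scale severities standing in for the real data (an assumption).
log_sizes = stats.skewnorm.rvs(a=3.0, loc=2.0, scale=1.5, size=2000, random_state=rng)

# Fit the skew-normal model by maximum likelihood.
a_hat, loc_hat, scale_hat = stats.skewnorm.fit(log_sizes)

levels = np.array([0.90, 0.95, 0.99, 0.995])

# Estimated VaR: quantiles of the fitted model. Empirical VaR: sample quantiles.
var_estimated = stats.skewnorm.ppf(levels, a_hat, loc=loc_hat, scale=scale_hat)
var_empirical = np.quantile(log_sizes, levels)

for level, est, emp in zip(levels, var_estimated, var_empirical):
    print(f"VaR {level:.1%}: estimated={est:.2f}, empirical={emp:.2f}")
```

When the estimated quantile exceeds the empirical one, the model overestimates the tail at that level, and the reverse indicates underestimation.
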
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Carfora, M.F.; Orlando, A. Some Remarks on Malicious and Negligent Data Breach Distribution Estimates. Computation 2022, 10, 208. https://doi.org/10.3390/computation10120208