DOI: 10.1145/1978672.1978681
research-article

An experimental study on the measurement of data sensitivity

Published: 10 April 2011 Publication History

Abstract

Data-centric security proposes to leverage the business value of data to determine the level of overall IT security. It has generated much enthusiasm in the security community, but has not materialized into a practical security system. In this paper, we introduce our recent work towards fine-grained data-centric security, which estimates the sensitivity of enterprise data semi-automatically. Specifically, the categories of sensitive data and their relative sensitivities are initially determined by subject matter experts (SMEs). We then apply a suite of text analytics and classification tools to automatically discover sensitive information in enterprise data, such as personally identifiable information (PII) and confidential documents, and estimate the sensitivity of individual data items.
To validate the idea, we developed a proof-of-concept system that crawls all the files on a personal computer and estimates the sensitivity of individual files and the overall sensitivity level of the computer. We conducted a pilot test at a large IT company on its employees' laptops. The pilot scanned 28 laptops and analyzed 2.2 million files stored in various file formats. Specifically, the files were analyzed to determine whether they contain any of the predefined sensitive information, comprising 11 PII types and 11 sensitive topics. In addition to the sensitivity estimation, we conducted a risk survey to estimate the risk level of the laptops.
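The crawl-and-score idea described above can be illustrated with a minimal sketch. It is not the authors' system: the abstract names neither the 11 PII types nor the SME-assigned sensitivities nor the scoring rule, so the two regular expressions, the weights, the weighted-count score, and the plain-sum aggregation below are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical PII patterns and weights. The paper's actual PII types and
# SME-assigned relative sensitivities are not given in the abstract.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
PII_WEIGHTS = {"email": 1.0, "us_ssn": 5.0}


def file_sensitivity(text):
    """Score a text as the sum of weight * match count (an assumed rule)."""
    return sum(
        PII_WEIGHTS[name] * len(pattern.findall(text))
        for name, pattern in PII_PATTERNS.items()
    )


def scan_directory(root):
    """Crawl every file under root and return a {path: score} mapping."""
    scores = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                # Treat every file as text; undecodable bytes are skipped.
                text = path.read_text(errors="ignore")
            except OSError:
                continue  # unreadable file: leave it out of the report
            scores[str(path)] = file_sensitivity(text)
    return scores


def machine_sensitivity(scores):
    """Aggregate per-file scores into one machine-level score (plain sum)."""
    return sum(scores.values())
```

Calling `scan_directory` on a folder and passing the result to `machine_sensitivity` gives one number per machine, which is the kind of quantity that would let laptops be compared. A real deployment would also need format-specific text extraction for the various file formats the pilot encountered, which this sketch omits.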
We found that, surprisingly, 7% of the analyzed files belong to one of the eleven sensitive data categories defined by the SMEs of the company, and 37% of the files contain at least one piece of sensitive information such as an address or a person's name. The analysis also discovered that most laptops have similar overall sensitivity levels, but a few machines have exceptionally high sensitivity. Interestingly, those few highly sensitive laptops were also most at risk of data loss and of malware infection, according to user survey responses. Furthermore, the tool reports the evidence for each piece of discovered sensitive information, including its surrounding context in the document, so users can easily redact the sensitive information or move it to a more secure location. This system can therefore be used as a privacy-enhancing tool as well as a security tool.
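The evidence reporting mentioned above (each finding shown with its surrounding context so that it can be redacted) might look roughly like the following sketch. The single pattern, the context window size, the record layout, and the mask string are hypothetical choices; the abstract does not describe how the tool represents or redacts evidence.

```python
import re

# Hypothetical single PII pattern used only for illustration.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_evidence(text, pattern=SSN, window=20):
    """Return each match with up to `window` characters of context per side."""
    evidence = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        evidence.append(
            {"match": m.group(), "span": m.span(), "context": text[start:end]}
        )
    return evidence


def redact(text, pattern=SSN, mask="[REDACTED]"):
    """Replace every match of the pattern with a fixed mask string."""
    return pattern.sub(mask, text)
```

For the input `"Employee SSN is 123-45-6789 on file."`, `find_evidence` returns one record whose span is `(16, 27)`, and `redact` yields `"Employee SSN is [REDACTED] on file."`.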



Published In

BADGERS '11: Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security
April 2011
111 pages
ISBN:9781450307680
DOI:10.1145/1978672

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EuroSys '11
EuroSys '11: Sixth EuroSys Conference 2011
April 10, 2011
Salzburg, Austria

Acceptance Rates

Overall Acceptance Rate 4 of 7 submissions, 57%
