research-article

Is data clustering in adversarial settings secure?

Authors: Battista Biggio, Ignazio Pillai, Samuel Rota Bulò, Marcello Pelillo, Fabio Roli

AISec '13: Proceedings of the 2013 ACM workshop on Artificial intelligence and security

Pages 87 - 98

https://doi.org/10.1145/2517312.2517321

Published: 04 November 2013

Abstract

Clustering algorithms are increasingly adopted in security applications to spot dangerous or illicit activities. However, they were not originally devised to cope with deliberate attacks that aim to subvert the clustering process itself. Whether clustering can be safely adopted in such settings thus remains questionable. In this work, we propose a general framework for identifying potential attacks against clustering algorithms and evaluating their impact, based on specific assumptions about the adversary's goal, knowledge of the attacked system, and capability to manipulate the input data. We show that an attacker may significantly poison the whole clustering process by adding a relatively small percentage of attack samples to the input data, and that some attack samples may be obfuscated so that they are hidden within existing clusters. We present a case study on single-linkage hierarchical clustering and report experiments on clustering malware samples and handwritten digits.
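To make the poisoning claim concrete, the following is a minimal sketch, not the attack algorithm from the paper. It assumes a flat clustering obtained by cutting a single-linkage dendrogram at a fixed height, and it uses a simple "bridge" of injected points to merge two well-separated clusters. The data, the names (`single_linkage_clusters`, `bridge`, `THRESHOLD`), and the cut height are all hypothetical choices made for illustration.

```python
import math


def single_linkage_clusters(points, threshold):
    """Flat clusters from cutting a single-linkage dendrogram at `threshold`.

    Cutting single linkage at height t gives the connected components of the
    graph that joins every pair of points at Euclidean distance <= t, so a
    small union-find is enough for this toy example.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


# Hypothetical toy data: two 5x5 grids of "legitimate" samples, 4 units apart.
grid = [0.25 * k for k in range(5)]
cluster_a = [(x, y) for x in grid for y in grid]
cluster_b = [(x + 5.0, y) for x in grid for y in grid]
clean = cluster_a + cluster_b

# Hypothetical attack samples: a chain whose steps are all below the threshold.
bridge = [(2.0, 0.5), (3.0, 0.5), (4.0, 0.5)]
poisoned = clean + bridge

THRESHOLD = 1.1  # assumed dendrogram cut height
n_clean = len(set(single_linkage_clusters(clean, THRESHOLD)))
n_poisoned = len(set(single_linkage_clusters(poisoned, THRESHOLD)))
attack_fraction = len(bridge) / len(poisoned)

print(f"clusters without attack: {n_clean}")
print(f"clusters with attack:    {n_poisoned}")
print(f"attack samples: {attack_fraction:.1%} of the input")
```

In this toy setting, three injected points out of 53 are enough to collapse the two clusters into one, because single linkage merges clusters based on their closest pair of points. The paper's actual attacks depend on its assumptions about the adversary's goal, knowledge, and capabilities, which this sketch does not model.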

Index Terms

Is data clustering in adversarial settings secure?
1. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Statistical graphics
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
  2. Systems security
    1. Operating systems security

Published In

AISec '13: Proceedings of the 2013 ACM workshop on Artificial intelligence and security

November 2013

116 pages

ISBN: 9781450324885

DOI: 10.1145/2517312

General Chair: Ahmad-Reza Sadeghi (TU Darmstadt, CASED, Intel ICRI-SC, Germany)

Program Chairs: Blaine Nelson (University of Potsdam, Germany); Christos Dimitrakakis (Chalmers University of Technology, Sweden); Elaine Shi (University of Maryland, College Park, MD, USA)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CCS'13: 2013 ACM SIGSAC Conference on Computer and Communications Security

Sponsor: SIGSAC

November 4, 2013

Berlin, Germany

Acceptance Rates

AISec '13 Paper Acceptance Rate 10 of 17 submissions, 59%;

Overall Acceptance Rate 94 of 231 submissions, 41%
