DOI: 10.1145/2506583.2506636

A Privacy Preserving Markov Model for Sequence Classification

Published: 22 September 2013

Abstract

Sequence classification has attracted much interest in recent years because it differs from traditional classification tasks and has wide applications in many fields, such as bioinformatics. Because it is not easy to define specific "features" for sequence data as in traditional feature-based classification, many methods have been developed that exploit the particular characteristics of sequences. One common way to classify sequence data is to use a probabilistic generative model, such as the Markov model, to learn the probability distribution of the sequences in each class.
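To make the generative approach concrete, here is a minimal sketch of a Markov-model sequence classifier in Python. It is an illustration only, not the paper's implementation: the function names, the toy DNA alphabet, the add-one smoothing, and the choice to ignore initial-state probabilities and class priors are all assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed alphabet for the toy DNA data below


def train_markov(sequences, order=1, pseudocount=1.0):
    """Estimate log P(symbol | previous `order` symbols) for one class."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0
    model = {}
    for context, nxt in counts.items():
        # Add-one style smoothing (an assumption, not taken from the paper).
        total = sum(nxt.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            s: math.log((nxt.get(s, 0.0) + pseudocount) / total)
            for s in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=1):
    """Sum the log transition probabilities along the sequence."""
    uniform = math.log(1.0 / len(ALPHABET))  # fallback for unseen contexts
    total = 0.0
    for i in range(order, len(seq)):
        total += model.get(seq[i - order:i], {}).get(seq[i], uniform)
    return total


def classify(seq, models, order=1):
    """Return the class whose Markov model gives the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(seq, models[label], order))


# Hypothetical toy training data: one Markov model is learned per class.
training = {
    "at_rich": ["ATATATTA", "TTATAATA"],
    "gc_rich": ["GCGCCGGC", "CGGCGCCG"],
}
models = {label: train_markov(seqs) for label, seqs in training.items()}
```

With these toy models, `classify("ATTATA", models)` selects the class whose transition probabilities best explain the query sequence. Raising `order` conditions each symbol on a longer context, which corresponds to a higher-order Markov model.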
One issue that research on sequence classification should consider is privacy. In many cases, especially in bioinformatics, sequence data contains sensitive information, which obstructs data mining. For example, the DNA and protein sequences of individuals are highly sensitive and should not be released without protection. In the real world, however, data is usually distributed among different parties, and a party that trains only on its own data may not obtain a strong enough model. A problem therefore arises when several parties, each holding a set of sequences, want to learn Markov models on the union of their data but do not want to reveal their data to the others because of privacy concerns. In this paper, we address this problem and propose a method to train Markov models, from first order up to order k where k > 1, on sequence data distributed among parties without revealing each party's private sequences to the others. We apply homomorphic encryption to protect the sensitive information.
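The abstract does not spell out the protocol, so the following Python sketch shows only the general idea of combining per-party transition counts under additively homomorphic encryption. Everything in it is an assumption made for illustration: the choice of a Paillier-style scheme, the tiny insecure primes, the function names, and the simplified setting in which an aggregator multiplies ciphertexts and a separate key holder decrypts only the totals. It should not be read as the authors' method.

```python
import itertools
import math
import random
from collections import Counter


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. These primes are far too small to be secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def transition_counts(sequences, order=1):
    """Count (context, next symbol) pairs in one party's private sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[(seq[i - order:i], seq[i])] += 1
    return counts


def encrypt_counts(n, sequences, keys, order=1):
    """Encrypt a count for every key, zeros included, in a fixed shared order."""
    counts = transition_counts(sequences, order)
    return [encrypt(n, counts[k]) for k in keys]


def aggregate(n, vectors):
    """Combine the parties' encrypted count vectors without decrypting them."""
    total = list(vectors[0])
    for vec in vectors[1:]:
        total = [add_encrypted(n, a, b) for a, b in zip(total, vec)]
    return total


# Hypothetical two-party example over a DNA alphabet, first-order model.
alphabet, order = "ACGT", 1
contexts = ["".join(t) for t in itertools.product(alphabet, repeat=order)]
keys = [(c, s) for c in contexts for s in alphabet]
parties = [["ACGTAC", "AACG"], ["ACAC", "GTGT"]]

n, priv = keygen()
encrypted = [encrypt_counts(n, seqs, keys, order) for seqs in parties]
joint = dict(zip(keys, (decrypt(n, priv, c) for c in aggregate(n, encrypted))))
```

Here `joint` holds the transition counts of the union of both parties' data, from which the transition probabilities can be estimated, while each party's individual counts were only ever shared in encrypted form. The sketch relies on the aggregator and the key holder being different entities and on every summed count staying below `n`.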

Cited By

  • (2022) Privacy Preserving Systems Adoptable in Cloud Image Retrieval. 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), pages 01-07. DOI: 10.1109/ICAISS55157.2022.10010857. Online publication date: 24-Nov-2022
  • (2019) HCFContext: Smartphone Context Inference via Sequential History-based Collaborative Filtering. 2019 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 1-10. DOI: 10.1109/PERCOM.2019.8767396. Online publication date: Mar-2019
  • (2018) A systematic review on intrusion detection based on the Hidden Markov Model. Statistical Analysis and Data Mining: The ASA Data Science Journal, 11:3, pages 111-134. DOI: 10.1002/sam.11377. Online publication date: 27-Apr-2018

Published In

BCB'13: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
September 2013
987 pages
ISBN: 9781450324342
DOI: 10.1145/2506583
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Security
  2. Markov Model
  3. Sequence Classification

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

BCB'13
Sponsor:
BCB'13: ACM-BCB2013
September 22 - 25, 2013
Washington DC, USA

Acceptance Rates

BCB'13 Paper Acceptance Rate 43 of 148 submissions, 29%;
Overall Acceptance Rate 254 of 885 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Nov 2024
