Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

ρ-uncertainty: inference-proof transaction anonymization

Published: 01 September 2010 Publication History

Abstract

The publication of transaction data, such as market basket data, medical records, and query logs, serves the public benefit. Mining such data allows for the derivation of association rules that connect certain items to others with measurable confidence. Still, this type of data analysis poses a privacy threat; an adversary having partial information on a person's behavior may confidently associate that person to an item deemed to be sensitive. Ideally, an anonymization of such data should lead to an inference-proof version that prevents the association of individuals to sensitive items, while otherwise allowing for truthful associations to be derived. Original approaches to this problem were based on value perturbation, damaging data integrity. Recently, value generalization has been proposed as an alternative; still, approaches based on it have assumed either that all items are equally sensitive, or that some are sensitive and can be known to an adversary only by association, while others are non-sensitive and can be known directly. Yet in reality there is a distinction between sensitive and non-sensitive items, but an adversary may possess information on any of them. Most critically, no antecedent method aims at a clear inference-proof privacy guarantee. In this paper, we propose ρ-uncertainty, the first, to our knowledge, privacy concept that inherently safeguards against sensitive associations without constraining the nature of an adversary's knowledge and without falsifying data. The problem of achieving ρ-uncertainty with low information loss is challenging because it is natural. A trivial solution is to suppress all sensitive items. We develop more sophisticated schemes. In a broad experimental study, we show that the problem is solved non-trivially by a technique that combines generalization and suppression, which also achieves favorable results compared to a baseline perturbation-based scheme.

References

[1]
E. Adar, D. S. Weld, B. N. Bershad, and S. S. Gribble. Why we search: visualizing and predicting user behavior. In WWW, 2007.
[2]
R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.
[3]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, 1994.
[4]
S. Agrawal, J. R. Haritsa, and B. A. Prakash. FRAPP: A framework for high-accuracy privacy-preserving mining. Data Min. Knowl. Discov., 18(1):101--139, 2009.
[5]
A. Amir, R. Feldman, and R. Kashi. A new and versatile method for association generation. Information Systems, 22(6--7):333--347, 1997.
[6]
R. J. Bayardo, Jr. Efficiently mining long patterns from databases. In SIGMOD, 1998.
[7]
H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Probabilistic query expansion using query logs. In WWW, 2002.
[8]
E. Dasseni, V. S. Verykios, A. K. Elmagarmid, and E. Bertino. Hiding association rules by using confidence and support. In IHW, 2001.
[9]
A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, 2003.
[10]
A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In KDD, 2002.
[11]
G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis. A framework for efficient data anonymization under privacy and accuracy constraints. ACM TODS, 34(2):1--47, 2009.
[12]
G. Ghinita, Y. Tao, and P. Kalnis. On the anonymization of sparse high-dimensional data. In ICDE, 2008.
[13]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
[14]
Y. He and J. F. Naughton. Anonymization of set-valued data via top-down, local generalization. PVLDB, 2(1):934--945, 2009.
[15]
V. S. Iyengar. Transforming data to satisfy privacy constraints. In KDD, 2002.
[16]
D. Kifer. Attacks on privacy and deFinetti's theorem. In SIGMOD, 2009.
[17]
A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM TKDD, 1(1):3, 2007.
[18]
S. J. Rizvi and J. R. Haritsa. Maintaining data privacy in association rule mining. In VLDB, 2002.
[19]
P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 13(6):1010--1027, 2001.
[20]
A. Savasere, E. Omiecinski, and S. B. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB, 1995.
[21]
Y. Saygin, V. S. Verykios, and C. Clifton. Using unknowns to prevent discovery of association rules. SIGMOD Rec., 30(4):45--54, 2001.
[22]
R. Srikant and R. Agrawal. Mining generalized association rules. In VLDB, 1995.
[23]
M. Terrovitis, N. Mamoulis, and P. Kalnis. Privacy-preserving anonymization of set-valued data. PVLDB, 1(1):115--125, 2008.
[24]
V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni. Association rule hiding. IEEE TKDE, 16(4):434--447, 2004.
[25]
K. Wang, Y. Xu, A. W. C. Fu, and R. C. W. Wong. FF-anonymity: When quasi-identifiers are missing. In ICDE, 2009.
[26]
R. C.-W. Wong, A. W.-C. Fu, K. Wang, and J. Pei. Minimality attack in privacy preserving data publishing. In VLDB, 2007.
[27]
Y.-H. Wu, C.-M. Chiang, and A. L. Chen. Hiding sensitive association rules with limited side effects. IEEE TKDE, 19(1):29--42, 2007.
[28]
X. Xiao and Y. Tao. Anatomy: simple and effective privacy preservation. In VLDB, 2006.
[29]
X. Xiao and Y. Tao. Personalized privacy preservation. In SIGMOD, 2006.
[30]
Y. Xu, K. Wang, A. W.-C. Fu, and P. S. Yu. Anonymizing transaction databases for publication. In KDD, 2008.
[31]
G. Yang. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD, 2004.
[32]
M. J. Zaki. Scalable algorithms for association mining. IEEE TKDE, 12(3):372--390, 2000.
[33]
Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms. In KDD, 2001.

Cited By

View all
  • (2022)Example-based spatial pattern matchingProceedings of the VLDB Endowment10.14778/3551793.355181515:11(2572-2584)Online publication date: 1-Jul-2022
  • (2022)Learning citywide patterns of life from trajectory monitoringProceedings of the 30th International Conference on Advances in Geographic Information Systems10.1145/3557915.3560978(1-12)Online publication date: 1-Nov-2022
  • (2022)TrajFormer: Efficient Trajectory Classification with TransformersProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557481(1229-1237)Online publication date: 17-Oct-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment

Publication History

Published: 01 September 2010
Published in PVLDB Volume 3, Issue 1-2

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 26 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Example-based spatial pattern matchingProceedings of the VLDB Endowment10.14778/3551793.355181515:11(2572-2584)Online publication date: 1-Jul-2022
  • (2022)Learning citywide patterns of life from trajectory monitoringProceedings of the 30th International Conference on Advances in Geographic Information Systems10.1145/3557915.3560978(1-12)Online publication date: 1-Nov-2022
  • (2022)TrajFormer: Efficient Trajectory Classification with TransformersProceedings of the 31st ACM International Conference on Information & Knowledge Management10.1145/3511808.3557481(1229-1237)Online publication date: 17-Oct-2022
  • (2021)Clustering-based Location Authority Deep Model in the Next Point-of-Interest RecommendationIEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology10.1145/3486622.3493943(335-342)Online publication date: 14-Dec-2021
  • (2020)Privacy- and Context-aware Release of Trajectory DataACM Transactions on Spatial Algorithms and Systems10.1145/33634496:1(1-25)Online publication date: 30-Jan-2020
  • (2019)Hiding Personal Tendency in Set-valued Database PublicationProceedings of the 2nd International Conference on Information Science and Systems10.1145/3322645.3322673(284-289)Online publication date: 16-Mar-2019
  • (2019)Two privacy-preserving approaches for data publishing with identity reservationKnowledge and Information Systems10.1007/s10115-018-1237-360:2(1039-1080)Online publication date: 1-Aug-2019
  • (2018)T-Closeness SlicingINFORMS Journal on Computing10.1287/ijoc.2017.079130:3(438-453)Online publication date: 1-Aug-2018
  • (2018)Data sanitization in association rule miningExpert Systems with Applications: An International Journal10.1016/j.eswa.2017.10.04896:C(406-426)Online publication date: 15-Apr-2018
  • (2018)A graph-based multifold model for anonymizing data with attributes of multiple typesComputers and Security10.1016/j.cose.2017.09.00372:C(122-135)Online publication date: 1-Jan-2018
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media