Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

A new method for similarity indexing of market basket data

Published: 01 June 1999 Publication History

Abstract

In recent years, many data mining methods have been proposed for finding useful and structured information from market basket data. The association rule model was recently proposed in order to discover useful patterns and dependencies in such data. This paper discusses a method for indexing market basket data efficiently for similarity search. The technique is likely to be very useful in applications which utilize the similarity in customer buying behavior in order to make peer recommendations. We propose an index called the signature table, which is very flexible in supporting a wide range of similarity functions. The construction of the index structure is independent of the similarity function, which can be specified at query time. The resulting similarity search algorithm shows excellent scalability with increasing memory availability and database size.

References

[1]
C. C. Aggarwal, J. L. Wolf, P. S. Yu, M. Epelman. The S-Tree: An,efficient index for multi-dimensional objects. Proceedings of the International Symposium on Spatial Databases. pages 350-370, Berlin, Germany, July }.997.
[2]
R. Agrawal, T. Imielinski, A. Swami. Mining Association Ru:{es between sets of items in very large databases. Proceedings of the A CM SIGMOD Conference, pages 207-216, 1993.
[3]
R. Agrawal, R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th VI, D B Conference, pages 478-499, 1994.
[4]
S. Arya. Nearest Neighbor Searching and Applications. Ph.D. Thesis, University of Maryland, College Park, MD, 1995.
[5]
N. Beckman, H.-P. Kriegel, R. Schneider, B. Seeger. The R*-Tree: An Efficient and Robust Method for Points and Rectangles. Proceedings of the A CM SIGMOD Conference. 322-331, 1990.
[6]
S. Berchtold, C. B5hm, H.-P. Kriegel. The Pyramid Technique: Towards Breaking the Curse of Dimensionality. Proceedings of the A CM SIGMOD Conference, pages 1.42-153, June 1998.
[7]
S. Berchtold, B. Ertl, D. Keim, H.-P. Kriegel., T. Seidl. Fast Nearest Neighbor Search in High Dimensional Spaces. Proceedings of the I4th International Conference on Data Engineering, Orlando, 1998.
[8]
C. Faloutsos. Signature Files. Information Retrieval: Data Structures and Algorithms, W. B. Frakes, R. Baez.~-Yates (Ed.), pages 44-65, 1992.
[9]
C. Faloutsos, R. Chan. Fast Text Access Methods for Optical and Large Magnetic Disks" Designs and Performance Comparison. Proceedings of the 1jth VI, DB Conference, pages 280-293, 1988.
[10]
C. Faloutsos, S. Christodoulakis. Description and Performance Analysis of Signature File Methods. A CM TOOIS, 5 (3), pages 237-257, 1987.
[11]
W. S. Frakes, R. Saeza-Yates (Editors). Information Retrieval" Data Structures and Algorithms. Prentice Hall PTR, New Jersey, 1992.
[12]
A. Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. Proceedings of the A CM SIGMOD Con/erence, 47-57, 1984.
[13]
K. Hinrichs, :l. Nievergelt. The Grid File" A r Data Structure to Support Proximity Queries on Spa.tiM Objects. Proceedings of the WG'83, 100-113, 1!}83.
[14]
R. Jain, D. A. White. Similarity Indexing: Algorithms and Performance. Proceedings o/SPIE Storage and Retrieval for Image and Video Databases IV, Vol. 2670, San 3ose, CA, pages 62-75, 1996.
[15]
N. Katayama, S. Satoh. The SR-Tree: An Index Structure for High Dimensional Nearest Neighbor Queries. Proceedings of the A CM SIGMOD Confer. ence. pages 369-380, 1997.
[16]
K.-I. Lin, It. V. Jagadish, C. Faloutsos. The TV- tree: An Index Structure for High Dimensional. Data. VLDB Journal, 3 (4), pages 517-542, 1992.
[17]
N. Roussopoulos, S. Kelley, F. Vincent. Nearest Neighbor Queries. Proceedings of the A CM SIG- MOD Conference, pages 71-79, 1995.
[18]
G. Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Informatior.L by Computer. Addison-Wesley Publishing.
[19]
R. Sibson. SLINK: An optimally efficient algorithn: for the single link cluster method. Computer Journal, Volume 16, pages 30-34, 1973.
[20]
T. Seidl, H.-P. Krieget. Optimal Multi-Step k- Nearest Neighbor Search. Proceedings of the A.CM SIGMOD Conference, pages 154-165, 1998.
[21]
T. Seidl, H.-P. Kriegel. Efficient User-Adaptable Similarity Search in Large Multimedia Databases. Proceedings of the 23rd VLDB Conference, 1997.
[22]
T. Sellis, N. Roussopoulos, C. Faloutsos. The R-{- Tree: A Dynamic Index for Multidimensional Objects. Proceedings of the 13th VLDB Conference, pages 507-518, 1987.
[23]
D. A. White, R. J ain. Similarity Indexing with the SS-Tree. Proceedings of the l~h International Conference on Data Engineering, New Orleans, USA, pages 516-523, February 1996.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 28, Issue 2
June 1999
599 pages
ISSN:0163-5808
DOI:10.1145/304181
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
    June 1999
    604 pages
    ISBN:1581130848
    DOI:10.1145/304182
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1999
Published in SIGMOD Volume 28, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)91
  • Downloads (Last 6 weeks)12
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Clustering and Bayesian NetworksHandbook of Research on Big Data Clustering and Machine Learning10.4018/978-1-7998-0106-1.ch004(50-73)Online publication date: 2020
  • (2012)Similarity search in sensor networks using semantic-based cachingJournal of Network and Computer Applications10.1016/j.jnca.2011.05.00835:2(577-583)Online publication date: 1-Mar-2012
  • (2012)Efficient processing of probabilistic set-containment queries on uncertain set-valued dataInformation Sciences: an International Journal10.1016/j.ins.2012.02.004196(97-117)Online publication date: 1-Aug-2012
  • (2012)Efficient bitmap-based indexing of time-based interval sequencesInformation Sciences: an International Journal10.1016/j.ins.2011.08.013194(38-56)Online publication date: 1-Jul-2012
  • (2012)A strategy-oriented operation module for recommender systems in E-commerceComputers and Operations Research10.1016/j.cor.2010.03.01139:8(1837-1849)Online publication date: 1-Aug-2012
  • (2010)On indexing error-tolerant set containmentProceedings of the 2010 ACM SIGMOD International Conference on Management of data10.1145/1807167.1807267(927-938)Online publication date: 6-Jun-2010
  • (2009)Relevant estimation among fields using field association wordsInternational Journal of Computer Applications in Technology10.1504/IJCAT.2009.02660535:2/3/4(296-306)Online publication date: 1-Jun-2009
  • (2002)Soft Computing in E-CommerceAdvances in Soft Computing — AFSS 200210.1007/3-540-45631-7_61(450-458)Online publication date: 29-Jan-2002
  • (2019)Product Prediction and Recommendation in Sustainable E-Commerce Using Association Rule Mining and K-Means ClusteringAdvanced Multi-Criteria Decision Making for Addressing Complex Sustainability Issues10.4018/978-1-5225-8579-4.ch012(254-266)Online publication date: 2019
  • (2014)A novel approach for sequential pattern mining by using genetic algorithm2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)10.1109/ICCICCT.2014.6992971(284-288)Online publication date: Jul-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media