DOI: 10.1145/1321440.1321473
research-article

Discovering interesting usage patterns in text collections: integrating text mining with visualization

Published: 06 November 2007

Abstract

This paper addresses the problem of making text mining results more comprehensible to humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections. Our system, FeatureLens, visualizes a text collection at several levels of granularity and enables users to explore interesting text patterns. The current implementation focuses on frequent itemsets of n-grams, as they capture the repetition of exact or similar expressions in the collection. Users can find meaningful co-occurrences of text patterns by visualizing them within and across documents in the collection. This also lets users identify temporal trends in usage, such as the increase, decrease, or sudden appearance of text patterns. The interface could be used to explore other text features as well. Initial studies suggest that FeatureLens helped a literary scholar and eight users generate new hypotheses and interesting insights using two text collections.
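
To make the idea of frequent itemsets of n-grams concrete, here is a minimal Python sketch. It is an illustration only, not the mining step used by FeatureLens: it treats each document as the set of word n-grams it contains and counts single n-grams and co-occurring pairs that appear in at least `min_support` documents. The function names, the whitespace tokenizer, the pair-only restriction, and the toy documents are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return the set of word n-grams that occur in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_ngram_pairs(docs, n=3, min_support=2):
    """Find frequent n-grams and frequent co-occurring n-gram pairs.

    Support is the number of documents that contain the n-gram (or
    both n-grams of a pair). Only itemsets of size 1 and 2 are
    considered; this is a simplification for illustration.
    """
    # Each document becomes one "transaction": the set of its n-grams.
    transactions = [ngrams(doc.lower().split(), n) for doc in docs]

    # Count in how many documents each n-gram appears.
    single = Counter(g for t in transactions for g in t)
    frequent = {g for g, c in single.items() if c >= min_support}

    # Count co-occurrences among frequent n-grams only, since a pair
    # cannot be frequent unless both of its members are.
    pairs = Counter()
    for t in transactions:
        kept = sorted(t & frequent)
        pairs.update(combinations(kept, 2))

    frequent_singles = {g: c for g, c in single.items() if c >= min_support}
    frequent_pairs = {p: c for p, c in pairs.items() if c >= min_support}
    return frequent_singles, frequent_pairs


# Hypothetical toy collection, invented for this example.
docs = [
    "we will not fail we will not falter",
    "we will not fail the people",
    "the people will not falter",
]
singles, pairs = frequent_ngram_pairs(docs, n=3, min_support=2)
for pair, support in pairs.items():
    print(pair, support)
```

On this toy input, the trigrams "we will not" and "will not fail" co-occur in two of the three documents, which is the kind of repeated, co-occurring expression the abstract describes. A real system would extend this to larger itemsets and keep document positions so that patterns can be located and displayed.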

      Published In

      CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
      November 2007
      1048 pages
      ISBN:9781595938039
      DOI:10.1145/1321440
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. digital humanities
      2. frequent closed itemsets
      3. n-grams
      4. text mining
      5. user interface

      Qualifiers

      • Research-article

      Conference

      CIKM07

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2024) Supporting Exploration of Women’s Print History Project Data via Interactively Constructing Networks of Interest. Proceedings of the 2024 International Conference on Advanced Visual Interfaces, 1-9. DOI: 10.1145/3656650.3656697. Online publication date: 3-Jun-2024.
      • (2024) Interactive Visualization on Large High‐Resolution Displays: A Survey. Computer Graphics Forum 43:6. DOI: 10.1111/cgf.15001. Online publication date: 30-Apr-2024.
      • (2023) Machine Learning-Enhanced Text Mining as a Support Tool for Research on Climate Change. 5G, Artificial Intelligence, and Next Generation Internet of Things, 86-122. DOI: 10.4018/978-1-6684-8634-4.ch004. Online publication date: 30-Jun-2023.
      • (2022) DeHumor: Visual Analytics for Decomposing Humor. IEEE Transactions on Visualization and Computer Graphics 28:12, 4609-4623. DOI: 10.1109/TVCG.2021.3097709. Online publication date: 1-Dec-2022.
      • (2021) On Building and Evaluating a Medical Records Exploration Interface Using Text Mining Techniques. Entropy 23:10, 1275. DOI: 10.3390/e23101275. Online publication date: 29-Sep-2021.
      • (2021) Automating Key Phrase Extraction from Fault Logs to Support Post-Inspection Repair of Software Requirements. Proceedings of the 14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference), 1-12. DOI: 10.1145/3452383.3452386. Online publication date: 25-Feb-2021.
      • (2021) SumRe: Design and Evaluation of a Gist‐based Summary Visualization for Incident Reports Triage. Computer Graphics Forum 40:3, 263-274. DOI: 10.1111/cgf.14305. Online publication date: 29-Jun-2021.
      • (2020) Visualization of repeated patterns in multivariate discrete sequences. Proceedings of the 12th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 862-869. DOI: 10.1109/ASONAM49781.2020.9381316. Online publication date: 7-Dec-2020.
      • (2018) Improving Search and Navigation User Experience by Making Use of Social Data. Information Retrieval and Management, 2132-2156. DOI: 10.4018/978-1-5225-5191-1.ch095. Online publication date: 2018.
      • (2018) A Novel Interface for the Graphical Analysis of Music Practice Behaviors. Frontiers in Psychology 9. DOI: 10.3389/fpsyg.2018.02292. Online publication date: 26-Nov-2018.
