Seeing beyond reading: a survey on visual text analytics

Published: 01 November 2012

Abstract

We review recent visualization techniques that support tasks requiring the analysis of text documents, ranging from approaches that visually summarize the relevant content of a single document to those that assist exploratory investigation of whole document collections. Techniques are organized by their target input material (either single texts or collections of texts) and by their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users handle the results of a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models. We also discuss how they extract the information required to produce representative visualizations, the tasks they intend to support, the interaction issues involved, and their strengths and limitations. Finally, we present a summary of the techniques, highlighting their goals and distinguishing characteristics, and briefly discuss some open problems and research directions in visual text mining and text analytics. © 2012 Wiley Periodicals, Inc.

    Published In

    Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Volume 2, Issue 6
    November 2012
    71 pages

    Publisher

    John Wiley & Sons, Inc.
    United States

    Author Tags

    1. Text Mining
    2. Visualization
