Abstract
Given a tabular dataset that should be represented graphically, how could the current complex visualization pipeline be improved? Could we produce a more visually enriched final representation while minimizing user intervention? Most existing approaches lack the capacity to provide a simplified end-to-end solution and leave the intricate process of setting up the data connections to the user. Their results depend mainly on user actions at every step of the visualization pipeline and fail to consider the structural properties of the data and the constantly rising volume of open and linked data. This work is motivated by the need for a flexible framework that improves user experience and interaction by simplifying the process and enhancing the result, capitalizing on the enrichment of the final visualization through the semantic analysis of linked data. We propose Lumina, a visualization framework, which: (a) builds on structural data analytics and semantic analysis principles, (b) increases the explainability and expressiveness of the visualization by leveraging open data and semantic enrichment, (c) minimizes user interventions at every step of the visualization pipeline and (d) fulfills the growing need for open-source, modular and self-hosted solutions. Using publicly available real-world datasets, we validate the adaptability of Lumina and demonstrate the effectiveness and practicality of our method in comparison to other open-source solutions.
Acknowledgements
This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH CREATE INNOVATE (Project Code: T1EDK-03052), as well as by the H2020 Research and Innovation Programme under Grant Agreement No. 780121.
Appendices
Appendix A Semantic concepts
RDF, the Resource Description Framework, is a W3C specification that is heavily used for knowledge management in web applications; it provides models to describe knowledge as web resources and the relationships between those resources. It is designed to be read primarily by computers connected to the world wide web and is commonly serialized in XML. RDF is used to attach semantic meaning to the information available on the web. It describes resources (web resources identified by URIs) using properties (resources also identified by URIs) and property values (other resources or literal values). RDF statements take the form of triples: subject, predicate, object. Such triples are stored in special datastores called RDF stores or triple stores and form the basis of larger knowledge graphs of semantically connected resources. Knowledge graphs of this kind are maintained by online projects, such as DBpedia and Wikidata, that contain structured, semantically meaningful information retrieved mainly from Wikipedia articles. DBpedia and Wikidata maintain open knowledge graphs of semantic information extracted from various Wikimedia projects, primarily Wikipedia; they provide public API endpoints for issuing semantic queries against the knowledge graph and also present structured information about concepts in the form of HTML pages. Knowledge graphs based on RDF triples can be queried with SPARQL, a query language designed for the semantic web. Public knowledge graph projects, such as the ones mentioned above, expose HTTP REST APIs that accept SPARQL queries. SPARQL queries can retrieve values from structured or semi-structured data and assist exploration by querying the relationships between resources.
For example, in DBpedia concepts are entities of a specific ontology type: Woman is an entity of type Person, Person is a subclass of Agent, and Agent is a subclass of the ontology root owl:Thing. By issuing a specific SPARQL query to the DBpedia API we can retrieve all top-level classes that are direct subclasses of owl:Thing, thus compiling a list of DBpedia's core concepts, see Fig. 15.
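As an illustration only, a minimal Python sketch of such a query against the public DBpedia SPARQL endpoint could look as follows; the endpoint URL and the restriction to the DBpedia ontology namespace are assumptions for this example, not part of Lumina's implementation:

import requests

# Public DBpedia SPARQL endpoint (assumed for illustration).
ENDPOINT = "https://dbpedia.org/sparql"

# Direct subclasses of owl:Thing, restricted to the DBpedia ontology namespace.
QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cls WHERE {
  ?cls rdfs:subClassOf owl:Thing .
  FILTER(STRSTARTS(STR(?cls), "http://dbpedia.org/ontology/"))
}
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for binding in resp.json()["results"]["bindings"]:
    print(binding["cls"]["value"])   # e.g. http://dbpedia.org/ontology/Person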
In Wikidata, each item represents a semantic topic directly identifiable by an ID prefixed with Q. For example, the item with ID Q1 maps to the semantic concept of Universe, as shown in Fig. 16.
The root class of Wikidata items is a special Entity item identified by the ID Q35120. By issuing a specific SPARQL query we can retrieve all direct subclasses of the Entity item.
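A comparable sketch against the public Wikidata Query Service, again assuming the endpoint and the "subclass of" property wdt:P279 for illustration, could be:

import requests

# Public Wikidata Query Service endpoint (assumed for illustration).
ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are direct subclasses (wdt:P279) of the root item Q35120 ("entity").
QUERY = """
SELECT ?cls ?clsLabel WHERE {
  ?cls wdt:P279 wd:Q35120 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json",
                             "User-Agent": "lumina-appendix-example/0.1"})
for binding in resp.json()["results"]["bindings"]:
    print(binding["cls"]["value"], binding.get("clsLabel", {}).get("value", ""))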
Semantic analysis also involves a large amount of text and name analysis on raw data, and especially of the meanings behind words, for instance through the Wordnet system. Python includes a powerful framework, widely used in industry and academia, for natural language processing: NLTK, the Natural Language Toolkit. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as Wordnet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing and semantic reasoning, as well as wrappers for industrial-strength NLP libraries (https://www.nltk.org/). Wordnet is a lexical database for the English language that groups English words into synsets (sets of synonyms) and provides part-of-speech information for each word (noun, verb, adjective, adverb). Words are linked by semantic relationships, such as hypernyms and hyponyms, and form semantic hierarchies. A hypernym of a word is a more general term to which the word belongs: for example, worker is a hypernym of skilled worker, and person is a hypernym of worker. A hyponym of a word is a more specific term whose semantic meaning is included in the meaning of the original word; for example, soccer ball is a hyponym of ball. The Wordnet corpus is included in the Natural Language Toolkit library for Python. By programmatically exploring the hypernym closures of arbitrary words, one finds that the root hypernym of all nouns is the synset entity.n.01. Wordnet synsets are described by the string pattern <word>.<part-of-speech letter>.<sense number>; the synset entity.n.01 refers to the first sense of the word entity as a noun.
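A minimal NLTK sketch of this hypernym exploration, assuming the Wordnet corpus has already been fetched with nltk.download('wordnet'), could look as follows:

from nltk.corpus import wordnet as wn

# First noun sense of "hospital", e.g. Synset('hospital.n.01').
syn = wn.synsets("hospital", pos=wn.NOUN)[0]

# Walk the transitive closure of the hypernym relation up towards the root.
for hypernym in syn.closure(lambda s: s.hypernyms()):
    print(hypernym.name())

# The root hypernym of every noun synset is entity.n.01.
print(syn.root_hypernyms())   # -> [Synset('entity.n.01')]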
Appendix B Semantic enrichment, symbol and color retrieval
The Semantic Analysis phase produces a data model that maintains a dictionary of all the recognized semantic concepts and named entities. The named entities recognized during semantic analysis are forwarded to the Named Entity Analysis process, which further determines the individual named entity types and tries to gather linked data from external data sources such as DBpedia, Wikidata, the Twitter API and geolocation services, see Fig. 17.
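For illustration, a hedged sketch of such an external lookup, here resolving a recognized named entity against Wikidata's public wbsearchentities API (the helper name and the choice of endpoint are assumptions for this example, not Lumina's actual code):

import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def lookup_named_entity(label, language="en"):
    """Resolve an entity label to candidate Wikidata items (hypothetical helper)."""
    resp = requests.get(WIKIDATA_API, params={
        "action": "wbsearchentities",
        "search": label,
        "language": language,
        "format": "json",
    })
    return [(hit["id"], hit.get("description", "")) for hit in resp.json()["search"]]

print(lookup_named_entity("Thessaloniki"))
# e.g. [('Q17151', 'city in Macedonia, Greece'), ...]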
The next step is to feed all recognized semantic terms to the back-end library and retrieve relevant symbol assets (icons) and colors, which are stored in reference objects in the internal data model. These symbol and color references are then ready to be used in the visualization process.
A great source of generic vector representations of everyday concepts is the highly popular FontAwesome library, which includes 1500 icons in SVG format with complete search metadata. By analyzing the FontAwesome search metadata files, a list of semantic topics was extracted and used to compute the most frequent hypernyms in Wordnet, sorted by number of occurrences (top 100). The top elements were used as a set covering the basic semantic concepts to which all words can be reduced through hypernym closures. This ensures that we can retrieve relevant visualizations for most terms encountered in data. The visualization service maintains a semantic library of symbols represented by vector images. Each symbol is mapped to a core semantic concept and maintains metadata with other related semantic terms. For example, the specific semantic concept of hospital can be represented by a vector symbol stored in a hospital.svg file that also carries references to the semantic terms building, medical, health and service, see Fig. 18.
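To make the mapping concrete, the following is a minimal sketch of how a term could be reduced to a core concept via Wordnet hypernyms and matched against such a symbol library; the SYMBOL_LIBRARY contents and the find_symbol helper are hypothetical and only illustrate the idea:

from nltk.corpus import wordnet as wn

# Hypothetical fragment of the symbol library: core concept -> asset and related terms.
SYMBOL_LIBRARY = {
    "hospital": {"asset": "hospital.svg", "related": ["building", "medical", "health", "service"]},
    "building": {"asset": "building.svg", "related": ["structure", "construction"]},
}

def find_symbol(term):
    """Return the first symbol whose key matches the term or one of its hypernyms."""
    if term in SYMBOL_LIBRARY:
        return SYMBOL_LIBRARY[term]
    for syn in wn.synsets(term, pos=wn.NOUN):
        for hyper in syn.closure(lambda s: s.hypernyms()):
            name = hyper.name().split(".")[0]   # e.g. 'building' from 'building.n.01'
            if name in SYMBOL_LIBRARY:
                return SYMBOL_LIBRARY[name]
    return None

print(find_symbol("clinic"))   # may fall back to a broader concept such as 'building'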
Another set of useful vector representations added to the vector library consists of the flags of all countries, along with semantic metadata for country codes, telephone prefixes and coordinates. Representations of the ISO 7001 public information symbols were also added, as well as a specialized list of emoji representations (following the Unicode standard) that were linked semantically with sentiment concepts. Lumina includes the Visual service API, which handles color management. It supports a wide list of known color names mapped to HEX, RGB and HSL values; the list contains a couple of thousand colors retrieved from specific domains such as X11 and the HTML4 specification. A second list of semantic mappings to color names or color values is maintained. The Visual service API provides calls for generating color palettes based on arrays of semantic terms, see Fig. 19.
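A minimal sketch of such a palette-generation call, with a hypothetical fragment of the semantic-to-color mapping standing in for the service's internal lists, could be:

# Hypothetical fragment of the semantic-to-color mapping (illustration only).
SEMANTIC_COLORS = {
    "health": "#2e8b57",   # sea green
    "danger": "#b22222",   # firebrick
    "water":  "#1e90ff",   # dodger blue
}
DEFAULT_COLOR = "#808080"  # neutral grey fallback

def palette_for_terms(terms):
    """Return one HEX color per semantic term, falling back to a neutral grey."""
    return [SEMANTIC_COLORS.get(t.lower(), DEFAULT_COLOR) for t in terms]

print(palette_for_terms(["health", "water", "finance"]))
# -> ['#2e8b57', '#1e90ff', '#808080']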