Abstract
One characteristic of big data is its internal complexity and variety, manifested in the many types of datasets that must be managed, searched, or analyzed. In their natural forms, some data entities are unstructured, such as texts or multimedia objects, while others are structured but too complex. In this paper, we investigate how visualizations of various complex datasets perform as universal data representations for both human users and deep learning models. In a user study, we evaluated several visualizations of complex relational data; some proved superior with respect to the precision and speed of classification by human users. Moreover, the same visualizations also led to effective classification performance when used with deep learning models.
Notes
1. To ensure comparability of attributes, their values were transformed using the empirical cumulative distribution function.
2. The study is accessible from http://hmon.ms.mff.cuni.cz:5002/visualrepresentation/join?guid=JvmOL0_d8aldQyy7wRTzPLJeStoHv5Cc.
3. In all cases, we used Euclidean distance. Each feature was converted to a numeric value and normalized linearly between 0 and 1.
4. Note that in the case of the Nutrients dataset, we also selected a fixed sub-sample of displayed classes: the correct one plus three additional ones.
5. Overall, the typical time to complete the whole study was 15–25 min.
6. Note that because four classes were shown to each user in the case of the Nutrients dataset, the accuracy of random guessing would be 0.25.
7. Pearson’s correlation of −0.57 and −0.94 if the results of the Tabular baseline are removed.
8. Note that we only evaluated visualizations that were readily available in PNG format, so we discarded the parallel coordinates and raw tabular data.
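The preprocessing mentioned in Notes 1 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the handling of ties in the empirical CDF (each value maps to the fraction of samples less than or equal to it) and of constant columns in the linear normalization are assumptions made here for illustration.

```python
from bisect import bisect_right
import math


def ecdf_transform(values):
    """Map each value to its empirical CDF value (assumed tie rule: share of samples <= value)."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts how many samples are <= v
    return [bisect_right(ordered, v) / n for v in values]


def min_max_normalize(rows):
    """Linearly rescale every feature (column) of a row-major table to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # Assumption: a constant column gets span 1.0, so all its values map to 0
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(columns, lows)]
    return [
        [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def euclidean_distance(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.dist(a, b)
```

For example, `ecdf_transform([10, 20, 30, 40])` spreads the four values evenly over (0, 1], and after `min_max_normalize` every feature contributes to `euclidean_distance` on the same 0 to 1 scale, so no single attribute dominates merely because of its units.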
Acknowledgements
This paper has been supported by the Czech Science Foundation (GAČR) project 22-21696S and by Charles University grant SVV-260698/2023.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peska, L., Sixtova, I., Hoksza, D., Bernhauer, D., Skopal, T. (2023). Visual Representations for Data Analytics: User Study. In: da Silva, H.P., Cipresso, P. (eds) Computer-Human Interaction Research and Applications. CHIRA 2023. Communications in Computer and Information Science, vol 1997. Springer, Cham. https://doi.org/10.1007/978-3-031-49368-3_14
DOI: https://doi.org/10.1007/978-3-031-49368-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49367-6
Online ISBN: 978-3-031-49368-3
eBook Packages: Computer Science, Computer Science (R0)