Visual Representations for Data Analytics: User Study

  • Conference paper
Computer-Human Interaction Research and Applications (CHIRA 2023)

Abstract

One of the characteristics of big data is its internal complexity and variety, manifested in the many types of datasets that must be managed, searched, or analyzed. In their natural forms, some data entities are unstructured, such as texts or multimedia objects, while others are structured but too complex. In this paper, we investigate how visualizations of various complex datasets perform as universal data representations for both human users and deep learning models. In a user study, we evaluated several visualizations of complex relational data, some of which proved superior with respect to the precision and speed of classification by human users. Moreover, the same visualizations also led to effective classification performance when used with deep learning models.

Notes

  1. In order to ensure comparability of attributes, their values were transformed with respect to the empirical cumulative distribution function.

  2. The study is accessible from http://hmon.ms.mff.cuni.cz:5002/visualrepresentation/join?guid=JvmOL0_d8aldQyy7wRTzPLJeStoHv5Cc.

  3. In all cases, we used Euclidean distance. Each feature was converted to a numeric value and normalized linearly between 0 and 1.

  4. Note that in the case of the Nutrients dataset, we also selected a fixed sub-sample of displayed classes: the correct one plus three additional ones.

  5. Overall, the typical time to complete the whole study was 15–25 min.

  6. Note that because four classes were shown to each user in the case of the Nutrients dataset, the accuracy of random guessing would be 0.25.

  7. Pearson’s correlation of −0.57, and −0.94 if the results of the Tabular baseline are removed.

  8. Note that we only evaluated visualizations that were readily available in PNG format, so we discarded the parallel coordinates and raw tabular data.
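
The two preprocessing steps mentioned in the notes (the empirical cumulative distribution function transform, and the linear normalization to [0, 1] followed by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data, and the handling of constant features are assumptions made purely for illustration, and the notes do not state over which subset of the data the transforms were fitted.

```python
from bisect import bisect_right
from math import sqrt


def ecdf_transform(values):
    """Map each value to its empirical CDF value: the share of observations <= it."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def min_max_normalize(values):
    """Linearly rescale values to [0, 1].

    Assumption: a constant feature is mapped to 0.0 to avoid division by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data invented for illustration (rows = items, columns = features).
rows = [
    [10.0, 200.0],
    [20.0, 400.0],
    [40.0, 300.0],
]

# Work feature by feature: transpose rows into columns.
columns = list(zip(*rows))

# ECDF transform of the first feature.
ecdf_first_feature = ecdf_transform(columns[0])

# Min-max normalize every feature, then transpose back into rows.
normalized_rows = [list(r) for r in zip(*(min_max_normalize(c) for c in columns))]

# Distance between the first two items in the normalized feature space.
distance_0_1 = euclidean_distance(normalized_rows[0], normalized_rows[1])
```

Both transforms put attributes on a common scale. The ECDF transform depends only on the rank of each value, so it is insensitive to outliers, whereas the linear normalization preserves the relative spacing of values.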

Acknowledgements

This paper has been supported by the Czech Science Foundation (GAČR) project 22-21696S and by Charles University grant SVV-260698/2023.

Author information

Corresponding author

Correspondence to Ladislav Peska.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Peska, L., Sixtova, I., Hoksza, D., Bernhauer, D., Skopal, T. (2023). Visual Representations for Data Analytics: User Study. In: da Silva, H.P., Cipresso, P. (eds) Computer-Human Interaction Research and Applications. CHIRA 2023. Communications in Computer and Information Science, vol 1997. Springer, Cham. https://doi.org/10.1007/978-3-031-49368-3_14

  • DOI: https://doi.org/10.1007/978-3-031-49368-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49367-6

  • Online ISBN: 978-3-031-49368-3

  • eBook Packages: Computer Science, Computer Science (R0)
