1 Introduction
Data visualization, analysis, and interpretation are areas of great interest to the scientific community [
12,
32]. Data visualization provides users with intuitive means to interactively explore and analyze datasets, which can be dynamic, noisy, and heterogeneous, enabling them to effectively identify interesting patterns, infer correlations and causalities, and support sense-making activities [
5], making it possible to amplify human cognition [
11,
39]. Graphic displays allow us not only to visualize and analyze the message contained in the data, but also to remember it, since for most people visual memory is more persistent than verbal or auditory memory [
56].
However, finding out which visualization is optimal for a given dataset is difficult. The process of transforming data into visual representations involves a set of human decisions that take time and require expertise, hindering quick and massive visual access to data. Currently, it is hard for designers to anticipate and test all possible combinations of interactive inputs that a visualization might receive [
49]. Hissitt [
24] points out that there is an increased need for technical skills to first understand and translate the data and then create visualizations around the results.
Among the many kinds of data available, spatiotemporal data are among the most frequently analyzed. For instance, geoportals such as the INSPIRE Geoportal [
10], which are relevant in many fields, give access through the Internet to a vast amount of geospatial information and associated geographic services. Public agencies and administrations also provide open data, including geolocation and temporal data, to meet citizens' demands for agile and flexible services, promoting transparency and citizen participation, optimizing their resources, and improving their efficiency [
57].
One of the fields where the visualization of spatiotemporal data can be of great interest is cultural heritage (CH), where the increase in digitized information makes datasets with visual, textual, spatial, and temporal information linked to heritage assets available, while the visualization of spatiotemporal data can contribute to their preservation and dissemination. More and more museums and CH organizations are exploiting modern visualization systems to disseminate this content in an attractive, usable, and interactive way [
6,
14,
53].
In the context of spatiotemporal data visualization applied to CH, the research project SeMap [
35] has been developed, which aims to provide an innovative way of disseminating cultural assets through a spatiotemporal map [
36]. The datasets are obtained from more than 400,000 cultural objects catalogued in a network of Spanish museums created by the Ministry of Culture and Education, most of which are currently available in the CER.ES repository, accessed through a web portal [33] (these objects, more than 200,000, are the ones considered for visualization in SeMap). SeMap provides a new way of visualizing and exploring this information using a spatiotemporal map for visualization and a knowledge graph to retrieve information on the objects. The main target audience is the general public, as one of the objectives of the project is to disseminate and keep our heritage alive through the web. To that end, SeMap combines different visualization and filtering strategies to offer an intuitive way of navigating the map, combining GIS tools, graphic representations, and knowledge-assisted visualization.
Indeed, CH institutions are a great source of high-quality data, which scientists have so far barely tapped into. However, it is important to highlight that cultural data are different from those provided by the natural sciences. They are seldom discrete and univocal. Finding ways to deal with their specificities is a big challenge. In this regard, some of the problems encountered in the development of the SeMap project have been the diversity of the textual information associated with the objects and the consequent need to group textual information by categories for subsequent data filtering. In particular, the data are heterogeneous and come with a certain granularity and uncertainty. Therefore, such data must be pre-processed before visual solutions can be provided.
Although some results of the project have been previously published [
18,
37,
38,
43], these have not been analyzed in the context of data visualization, which is the focus of this article. Additionally, some early publications showed preliminary results [
18,
37,
43] rather than the finalized tool [
38]. In the current article, we provide a detailed description of the strategies used to visualize data, which include text analytics to pre-process the dataset and design decisions to map data into aesthetics. We also provide an evaluation of the tool by second-year university students on a Data Science graduate programme, as part of the subject of Data Visualization, which includes the realization of specific tasks on the map, a usability evaluation, and qualitative questions.
By involving students in the evaluation, we intend to reinforce the links between research and teaching. In this way, students benefit from being one of the first audiences to interact with the developed application and from the generated knowledge, as researchers explain to them the details of how this application was built; thus, students learn more about data visualization. The feedback of students is also relevant for the researchers, as it can be used to improve the application and/or for further developments. The results of the evaluation show that the tool was mostly liked and well accepted by the students, although they rated the content as the most technically complicated aspect of the tool, as it belongs to a knowledge area they are not familiar with (i.e., CH). This study has not been reported before. Our findings contribute to the field of data visualization and geomatics, as a variety of data is processed with the aim of providing intuitive and interactive spatiotemporal maps enriched with additional information, which are able to show the CH objects in a single frame (the map itself), while still allowing users to retrieve the original information as provided by CER.ES.
This article is structured as follows. Section
2 reviews previous related works. Section
3 explains the data processing followed in SeMap, focusing on spatiotemporal information and other properties, and describing their semantic relationships. Section
4 describes strategies for visualizing data in SeMap. Section
5 shows the methodology and analysis of the results of the evaluation of the tool. Section
6 discusses the results, and finally, Section
7 offers some conclusions and outlines future work in this area.
2 Related Work
Spatiotemporal visualization refers to a specialized form of data visualization that combines the dimensions of time and space to depict the variations of one or more additional variables over space and time. While extensive research has been conducted in this field, visualizing such data becomes challenging when multiple dimensions, including spacetime, need to be considered simultaneously [
3,
55]. Consequently, ad hoc strategies are often required to effectively visualize spatiotemporal data with multiple dimensions.
One of the challenges of this type of data is to visualize a large amount of spatially close data. For this purpose, clustering methods are commonly used, where algorithms automatically cluster the data [
2,
44], which can be graphically represented by different solutions, such as points [
26], trajectories [
20], or regions [
28]. PixPlot [
58] is an example of a library that visualizes tens of thousands of images in a two-dimensional projection where similar images are clustered together.
Representing relationships and their spatial and temporal fluctuations poses an additional challenge, which is addressed using different approaches, including using lines with different thicknesses or colours [
42], using some graphic features such as colour, thickness, or graphic objects [
54], or combining these approaches with clustering techniques [
22].
In terms of representing the temporal variation of the data being visualized, the most commonly employed solutions revolve around the use of timelines. They allow users to control the timeframe of the spatial visualization [
13,
27,
30,
41,
59–
61], and there are specific web-based libraries, such as Knight Lab's Timeline.js, for building custom visualizations of data distributed over time [
62]. Typically, users can view a broad temporal scale, zoom in to specific time periods [
51], or even focus on precise months or days [
1].
CH applications often employ timelines to observe different time periods and enable interactive switching between them. Examples of this approach can be found in the Prado Museum, where a 2D timeline displays paintings from its collection [
63], in The British Museum [
64], which features a 3D timeline enabling navigation through time, space (continents), and art categories, or in Mapping Titian [
65] where CH objects’ spatiotemporal trajectories can be visualized in a geographic context. The Photogrammar project [
66] offers an interactive map that allows users to visualize the vast FSA-OWI photographic archive, a well-classified collection comprising over 170,000 historic photographs, together with a timeline split by photographer; users can filter information by zooming in on the map or selecting specific photographs on the timeline.
In addition to timelines, the use of moving elements and animations provides an effective means of representing the flow of time. The scientific community has proposed various solutions in this regard, including spiral diagrams [
23,
52], river-flow diagrams [
7,
21], or those that use three-dimensional environments to show this fluctuation, like the so-called space-time cubes [
1,
3,
7,
17], and other less intuitive approaches [
16].
Compared to other spatiotemporal data, CH collection objects often have poorly or loosely defined spatiotemporal properties. Geographic and temporal data may be missing, fuzzy, incomplete, duplicated, or even incorrect, making visualization challenging. Additionally, dealing with large databases of objects poses another challenge, requiring specific techniques for filtering and displaying information, such as the clustering techniques mentioned earlier.
Peripleo [
45], a search and visualization tool developed under the Pelagios initiative [
48], addresses the challenges of visualizing information from large databases. It enables users to explore the geographic, temporal, and thematic composition of distributed digital collections, progressively filtering and drilling down to explore individual records. The interface presents search results as dots on a map, and it also displays the temporal distribution of results in a bar graph. Users can filter results by zooming in and out, panning, selecting locations on the map, or choosing intervals on the timeline graph. Initially designed to explore ancient world data, a pilot version of the Peripleo software was proposed by Simon [
46] to integrate Europeana Data [
67], which is a large-scale search engine for digitized CH material from several European institutions referencing more than 53 million objects [
34].
Recently, a new browser-based version of Peripleo [
68] has been developed and can be applied to different collections [
69]. The new tool enables the visualization of information that has a geographical component, allowing the search and filtering of categorized elements, and providing three display options: map data as points, clusters, or heatmaps. Nevertheless, in this newer version, there does not appear to be a temporal visualization and filtering feature as in the previous one.
Compared to previous works visualizing large collections of CH objects, SeMap presents a web search tool that combines many of these visualization techniques. It incorporates clustering methods for a clear display, a timeline spanning centuries to filter and display information, various categories and properties for filtering objects, and options to filter by location or provenance. Additionally, it offers pop-up visual aids for the icons and symbols. The combination of these visualization techniques aims to provide a clearer view of the CH collections, enabling researchers and users to explore, analyze, and make meaningful interpretations. It is also relevant to highlight that, unlike other works that depict specific collections (usually related to a single museum, as in [
63,
64]), SeMap considers collections of more than 100 Spanish museums, i.e., the ones integrated in CER.ES. Integrating collections from different museums in a single frame adds a difficulty regarding the data processing, as data is heterogeneous but needs to be homogeneously represented on the map. The related details are further explained in the next section.
3 Data Processing
In SeMap, it has been necessary to carry out text analysis processes to present the information in a manageable and accessible way for the user, and then to optimize the search process.
The information on CH objects has been provided by the Digital Network of Spanish Museum Collections through its online catalogue called CER.ES [
33]. The objects have several attributes that have been used to represent the information. On one hand, there were the physical characteristics of the object: materials, techniques used for its elaboration, type of object. Although approximately 60% of the objects had values in these attributes linked to three CER.ES thesauri [
70], the terms were entered manually with their textual description (e.g., “Canary Island pine wood”) rather than with their identifier and/or related URL. For the rest (approx. 40%), the values were entered according to the interpretation and knowledge of the cataloguers, but without using the thesaurus. Other data, such as dimensions or dating, were not related to any thesaurus, and the information was entered following different criteria.
Given the volume of data (for example, 378,453 objects had values in the category attribute, 48% of which were not linked to the corresponding thesaurus), it was unfeasible to carry out this classification work manually, at least with the resources of the project. For this reason, it was decided to tackle this task using Natural Language Processing (NLP) techniques. CER.ES thesauri are highly specialized, with thousands of terms; for instance, the thesaurus used to link the category of the objects has 8,716 terms, so it would also be very costly to link each object with the exact thesaurus term. Fortunately, a simpler classification was considered sufficient to provide a user-friendly interface. This simplification was decided by the researchers of the current work in collaboration with the Area of Collections of the Spanish Ministry of Culture and Education, which supported the project. Thus, the task of linking the unclassified objects was reduced, as each object only needed to be linked to one of the groups generated for the graphical user interface. For example, if the CER.ES object had the value “Tenerife Pine” in the material attribute, it was not necessary to link it to the thesaurus term “Canary Island pine wood”, nor to its hierarchical superiors “Pine Wood” and “Conifer Wood”, but only to the fourth hierarchical level, “Wood”.
Figure
1 is a diagram showing the distribution of data in both the CER.ES objects and the thesaurus, and the percentage of objects that had to be linked in an automated way, with the simplification of each thesaurus. The process of linking to higher terms in the thesaurus has been carried out through the following work:
—
Application of a set of rules using the spaCy library [71], to discard values that did not provide information and to detect values with greater validity.
—
Generation of synonyms of the values of the original attributes.
—
Search of the corresponding thesaurus and detection of the terms with the best correspondence.
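As an illustration of the three steps listed above, the following minimal sketch links a free-text attribute value to its top-level group. It is not the SeMap implementation: it uses plain Python instead of spaCy, and the names `PARENT`, `SYNONYMS`, `UNINFORMATIVE`, and `top_level_group` are hypothetical. Only the “Tenerife Pine” to “Wood” chain is taken from the example in the text; the list of uninformative values is an assumption.

```python
import unicodedata

# Hypothetical excerpt of a thesaurus: each term points to its parent term.
# The chain below mirrors the "Canary Island pine wood" example in the text.
PARENT = {
    "canary island pine wood": "pine wood",
    "pine wood": "conifer wood",
    "conifer wood": "wood",
    "wood": None,  # top-level group shown in the user interface
}

# Hypothetical synonym table mapping cataloguers' wording to thesaurus terms.
SYNONYMS = {"tenerife pine": "canary island pine wood"}

# Assumed examples of values that carry no usable information.
UNINFORMATIVE = {"", "unknown", "n/a", "-"}


def normalize(value):
    """Lower-case, strip accents, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())


def top_level_group(value):
    """Return the top-level group for a raw value, or None if it cannot be linked."""
    term = normalize(value)
    if term in UNINFORMATIVE:          # step 1: discard uninformative values
        return None
    term = SYNONYMS.get(term, term)    # step 2: replace by a known synonym
    if term not in PARENT:             # step 3: look the term up in the thesaurus
        return None
    while PARENT[term] is not None:    # climb to the highest hierarchical level
        term = PARENT[term]
    return term
```

Under these assumptions, `top_level_group("Tenerife Pine")` climbs the hierarchy and returns `"wood"`, so the object can be filtered by that group without being linked to the exact thesaurus term.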
On the other hand, there were the spatiotemporal location data of its provenance. As stated above, the dataset used in SeMap is heterogeneous and comes with a certain granularity and uncertainty, affecting both the spatial and the temporal dimensions. These issues are summarized in Table
1, where specific examples are mentioned.
Temporal data were entered by cataloguers at their own discretion. However, the references were years or centuries, represented numerically or textually. Because the range of possibilities is limited, the information can be extracted by defining a set of rules and applying them to the data with the spaCy library. In this way, heterogeneity was virtually eliminated: 96% of the data are represented by years and also by centuries, which facilitates user searches, as these are limited to centuries. Although ambiguity cannot be eliminated, as it is inherent to this type of data, temporal granularity was not a problem, due to the simplification created by the user interface.
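The rule-based normalization of datings described above can be sketched as follows. This is a simplified illustration with regular expressions rather than the spaCy rules used in the project; the patterns (Spanish “siglo”/“s.” followed by a Roman numeral, and three- or four-digit years) and the function names are assumptions, and only dates of the common era are handled.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}

# Assumed patterns: "siglo XVIII" / "s. XVIII" for centuries, 3-4 digit years.
CENTURY_RE = re.compile(r"\b(?:siglo|s\.)\s*([IVXLC]+)\b", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(\d{3,4})\b")


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XVIII' to an integer."""
    values = [ROMAN[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. IX = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def century_of(year):
    """Century of a common-era year (1701-1800 is the 18th century)."""
    return (year - 1) // 100 + 1


def parse_dating(text):
    """Return (start_year, end_year, centuries), or None if no rule applies."""
    match = CENTURY_RE.search(text)
    if match:
        century = roman_to_int(match.group(1))
        return (century - 1) * 100 + 1, century * 100, [century]
    years = [int(y) for y in YEAR_RE.findall(text)]
    if years:
        start, end = min(years), max(years)
        return start, end, list(range(century_of(start), century_of(end) + 1))
    return None
```

Storing both the year range and the list of centuries reflects the dual representation mentioned in the text: years keep the available precision, while centuries are what the user interface searches on.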
The spatial information in the CER.ES objects was complex to deal with, since at best there were administrative toponymic references, such as country, province, or city, and the name of a place (a church, a paleontological site). The name of the place was present in most of the data, but toponymic data were often missing, usually only one or two of them being present. Less than 10% of the objects had three toponymic attribute values.
The Geonames [
72] and Google Maps repositories have been used to geolocate the provenance of the objects. Objects with only the name of the place have many possible locations, with the exception of paleontological sites or natural features (e.g., capes), where the name match is more complex, or of places whose name is long (more than four words). Reliability is very low when there is only one toponymic value and no province or similar administrative division is given.
To increase the reliability of the results, several techniques are being explored, such as searching the repositories for the name of the place combined with the province or administrative division of the museum where the object is exhibited, which is known. Generally, many of the objects in CER.ES museums come from places in the same administrative division. This approach is especially valid for small museums, but its impact is still being estimated, as a considerable amount of data must be checked manually. CER.ES objects also have a very long textual description, to which NLP techniques could be applied to try to extract information on the provenance. One of the biggest challenges in this task is to differentiate the provenance from the place of origin (where the object was found or created) and to isolate this information from other aspects of the object.
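The disambiguation heuristic under exploration can be sketched as follows. The sketch does not call GeoNames or Google Maps: it assumes that candidate locations for a place name have already been retrieved, and the `Candidate` structure, the `resolve` function, and the sample entries (including their coordinates) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """One gazetteer hit for a place name (hypothetical structure)."""
    name: str
    province: str
    lat: float
    lon: float


def resolve(place: str, candidates: List[Candidate],
            museum_province: Optional[str] = None
            ) -> Tuple[Optional[Candidate], str]:
    """Pick a location for `place`, returning it with the rule that decided."""
    matches = [c for c in candidates if c.name.casefold() == place.casefold()]
    if not matches:
        return None, "not_found"
    if len(matches) == 1:
        return matches[0], "unique_name"
    # Several places share the name: prefer the one in the museum's province.
    if museum_province is not None:
        same = [c for c in matches
                if c.province.casefold() == museum_province.casefold()]
        if len(same) == 1:
            return same[0], "museum_province"
    return None, "ambiguous"


# Made-up gazetteer entries; coordinates are placeholders, not real locations.
GAZETTEER = [
    Candidate("Ermita de San Roque", "Valencia", 39.0, -0.5),
    Candidate("Ermita de San Roque", "Burgos", 42.0, -3.5),
    Candidate("Cabo Ejemplo", "Alicante", 38.0, 0.0),
]
```

Returning the rule that produced each match makes it possible to separate results that need manual checking (those resolved through the museum's province) from those resolved by a unique name.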
4 Visualizing Data
As previously mentioned, the objective of the SeMap project is the creation of a map where spatiotemporal CH information is visualized. For this purpose, a tool has been created that allows us both to visualize the map, and to search for and filter the information.
This section focuses on describing the visualization on the map and the information of each object within the tool. The strategies and decisions employed to visualize information on the map tool will be analyzed in relation to maintaining coherence with purely visual information.
The tool uses a spatiotemporal map to visually represent the geographic location and historical period of artifacts in Spanish museums. To keep the display clear and uncluttered, the map includes minimal text labels, displaying only the number of clusters and objects. This avoids saturating the map with excess information, as it already includes a base cartography, which can even be customized (Figure
2).
In addition to the visual representation of data on the map, the tool also includes a search bar with property filters on the sidebar. These filters have been created by field experts who have reviewed all the fields and values on the database and clustered them into main terms in order to maintain a usable user interface and reduce the amount of information for the user. Filters available in the sidebar include (Figure
3):
—
Classification (clasificación): allows users to search for artifacts based on their classification, such as “painting” or “sculpture”.
—
Material: allows users to search for artifacts made of specific materials, such as “wood” or “bronze”.
—
Category (categoría): allows users to search for artifacts based on their category, such as “religious” or “industrial”.
—
Museum (museo): allows users to search for artifacts in a specific museum.
—
Country (país): allows users to search for artifacts from a specific country.
—
Century (siglo): allows users to search for artifacts from a specific century.
The search bar also includes a checkbox to perform a “deep search”, which will search for the submitted terms against all fields of the objects on the database and apply semantic variations of the entered term. The “deep search” does not only look for the exact occurrence of the text in the object data. It eliminates less important words (articles, etc.), associates terms (e.g., nouns with adjectives), and searches for the occurrence of each combination found in the text by considering synonyms and eliminating gender or number suffixes. It also contemplates higher hierarchical thesaurus terms with respect to the filter groups of materials, techniques, and classification.
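The behaviour of the “deep search” can be approximated by the sketch below, which removes stop words, strips number and gender endings, and expands the query with synonyms before matching against all fields of each object. It is only an approximation under stated assumptions: the stop-word list, the synonym table, and the crude `stem` rule are hypothetical, and the association of terms and the thesaurus hierarchy mentioned in the text are not modelled.

```python
import re

# Assumed Spanish stop words and synonyms, for illustration only.
STOPWORDS = {"el", "la", "los", "las", "de", "del", "un", "una", "y", "en"}
SYNONYMS = {"cuadro": {"pintura"}}

WORD_RE = re.compile(r"[a-záéíóúüñ]+")


def stem(word):
    """Crudely strip a plural 's' and a final gender vowel."""
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    if word[-1] in "aoe" and len(word) > 3:
        word = word[:-1]
    return word


def words(text):
    """Lower-cased words of a text, without stop words."""
    return [w for w in WORD_RE.findall(text.lower()) if w not in STOPWORDS]


def expand(term):
    """Stems of a query term and of its synonyms."""
    return {stem(v) for v in {term} | SYNONYMS.get(term, set())}


def deep_search(query, objects):
    """Return the objects whose fields contain every query term or a variant."""
    terms = words(query)
    if not terms:
        return []
    results = []
    for obj in objects:
        bag = set()
        for value in obj.values():      # search across all fields of the object
            bag.update(stem(w) for w in words(str(value)))
        if all(expand(term) & bag for term in terms):
            results.append(obj)
    return results
```

With these assumptions, a query for “cuadro” also retrieves an object whose description mentions “Pinturas”, because both reduce to stems linked through the synonym table.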
To further assist users in comprehending the information displayed on the map and on the sidebar, the tool includes pop-up visual aids for the icons and symbols used throughout the user interface, as well as for the field related to the deep search, as in the example shown in Figure
3. These visual aids are shown as white text inside a grey box, providing a brief explanation of the meaning and significance of each icon, allowing users to easily understand the information being conveyed.
Upon selection of a specific object on the map, an image and the main properties of the object are displayed on the sidebar alongside the corresponding icons. These properties include the author, current museum, provenance location, technique, classification, material, and time (start and end date). The technique, classification, and material properties are presented as “chips”, a common user interface element defined as “compact elements that represent an input, an attribute, or an action” [
73]. A truncated version of the object's description is also presented, with a link to the full information on the CER.ES website available for those seeking further details.
On the sidebar of our tool, we have also implemented a similarity metric for the selected object. This interface presents a table that categorizes the similarity of the chosen object based on the properties of category, classification, and material. To evaluate the degree of similarity, we utilize basic rules based on percentages of coincidence. Artifacts with over 80% of the same values for a single property are considered highly similar, those with between 30% and 80% are considered similar, those with between 0% and 30% are considered poorly similar, while those with 0% are deemed non-similar [
38].
We then compute the total number of objects that are highly similar, similar, poorly similar, and non-similar, and provide the percentage in brackets for each of the properties. This allows users to readily discern how the selected object compares to other artifacts in terms of these properties and to understand the level of similarity between them. In Figure
4, an object is depicted after filtering by “pictorial” in the field “classification”. For this result, Figure
5 shows the similarity table.
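The similarity table can be illustrated with the following sketch. The thresholds are those stated above; how the percentage of coincidence is computed is not detailed here, so the sketch assumes it is the share of the selected object's values for a property that also appear in the other object. The treatment of the exact boundaries (30% and 80%) and all function names are likewise assumptions.

```python
from collections import Counter

LEVELS = ("highly similar", "similar", "poorly similar", "non-similar")


def coincidence(selected_values, other_values):
    """Percentage of the selected object's values shared by the other object."""
    selected = set(selected_values)
    if not selected:
        return 0.0
    return 100.0 * len(selected & set(other_values)) / len(selected)


def similarity_level(pct):
    """Map a percentage of coincidence to one of the four levels."""
    if pct > 80:
        return "highly similar"
    if pct >= 30:
        return "similar"
    if pct > 0:
        return "poorly similar"
    return "non-similar"


def similarity_row(selected_values, others):
    """Count and percentage of objects per level, for a single property."""
    counts = Counter(
        similarity_level(coincidence(selected_values, values))
        for values in others
    )
    total = len(others)
    return {
        level: (counts[level],
                round(100.0 * counts[level] / total, 1) if total else 0.0)
        for level in LEVELS
    }
```

Calling `similarity_row` once per property (category, classification, and material) would yield the three rows of the table, each with the count of objects and the percentage in brackets.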
Overall, the similarity metric provides a valuable and intuitive method for users to comprehend the characteristics of the selected object and how it compares to other artifacts on the database.
Our tool includes a hidden sliding sidebar located on the left side of the website, providing users with additional customization options. The sidebar is comprised of two tabs, the first of which allows users to alter the base cartography of the map and to change the way the objects are displayed on the map. These options include the ability to choose from various map styles and to display the objects based on their provenance (
procedencia) location, which shows where the object was previously located. Alternatively, they can choose to display the objects based on their current location (
ubicación), which shows the location of the museum where the object is currently housed or displayed (Figure
6).
The second tab on the sidebar contains a legend for the symbology used on the map, enabling users to easily comprehend the meaning of the various icons and symbols displayed on the map and to interpret the presented information more effectively (Figure
7). Additionally, this includes disclaimer information at the bottom.
Finally, the system also includes a timeline feature, consisting of an interactive slider with labels for all the centuries represented in the records on our database (Figure
8). The slider is displayed vertically instead of horizontally (as is commonly done in other works), as a strategy to allow correct visualization of the text. This slider allows for real-time changes to the information displayed on the map, showing only artifacts matching the selected century.
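A minimal sketch of the century filter behind the slider is given below. It assumes that each object stores a start and an end year (as in the time property shown on the sidebar) and that an object matches a century when its date range overlaps it; the overlap criterion, the field names `start` and `end`, and the function names are assumptions.

```python
def century_bounds(century):
    """First and last year of a common-era century (18 -> 1701, 1800)."""
    return (century - 1) * 100 + 1, century * 100


def visible_in_century(obj, century):
    """True if the object's [start, end] range overlaps the century."""
    first, last = century_bounds(century)
    return obj["start"] <= last and obj["end"] >= first


def filter_by_century(objects, century):
    """Objects to draw on the map for the century selected on the slider."""
    return [obj for obj in objects if visible_in_century(obj, century)]
```

Under this criterion, an object dated 1650 to 1720 would be shown for both the 17th and the 18th centuries.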
5 Evaluation of the Tool
In order to evaluate the tool developed and the interface design decisions made, an evaluation was carried out with users focusing on two aspects: task completion and the usability of the tool. In addition, a qualitative study with users has also been carried out.
5.1 Participants
The spatiotemporal map was shown to and evaluated by second-year university students on a Data Science graduate programme, as part of a subject on Data Visualization. To that end, a dedicated workshop was organized (refer to Section 5.2). A total of 53 students participated in the workshop, aged between 19 and 21 years old. Among these, 32 identified as male, 18 as female, and 3 preferred not to state their gender.
5.2 Procedure
The workshop took place in May 2022, in a two-hour session, and consisted of the following activities: (a) first, the SeMap project and the developed tool were explained to the students (ca. 40 min); (b) then, the tasks, the online form they had to fill in, and the purpose of the evaluation were explained to them (ca. 10 min); (c) finally, they had to evaluate the tool (max. of 70 min).
The presentation of the project and the developed tool was carried out by the subject lecturer, who additionally is the principal researcher on the project. While explaining the tool, she placed special emphasis on the visual strategies that were implemented to visualize the data, considering their spatiotemporal dimensions, as well as pointing out the similarities among the represented objects.
During the second part, the lecturer also explained the tasks that the students had to carry out, the form that they had to fill in, and the purpose of the evaluation. Students were informed that they had to carry out some tasks, described in an online form, using the SeMap spatiotemporal map, and then respond to the questions related to each task (refer to Section
5.3.1). They were also informed that, after finishing the tasks and questions, they had to fill in two questionnaires, one for the usability testing of the tool (Section
5.3.2) and another to evaluate their overall satisfaction (Section
5.3.3). These questionnaires were placed at the end of the form, after the tasks and questions, and the meaning of the questions was explained there.
During the third part of the workshop, students had to carry out the tasks and fill in the online form.
5.3 Data analysis
5.3.1 Task Completion.
In this section, the processes carried out by the students to perform a set of tasks satisfactorily are analyzed. First, students were asked to access the map [
36] with their own laptops and then to follow the proposed tasks (T#), as given in Table
2. After completing each task, they had to answer one question in the provided online form. The table also shows the success rate for each task, computed as the percentage of correct answers given by the students. The average success rate across all tasks is 75.74%, meaning that about three quarters of the tasks were completed correctly.
Figure
9 shows the number of correct, empty, and wrong answers given by the students for each of the questions, in the form of stacked bars. The percentages are also depicted on the horizontal axis, where the values for the leftmost bars (in grey) correspond to the success rate, as indicated in Table
2. As can be seen, the low success rate for T4 arises because a large number of students answered the question incorrectly, rather than leaving it empty. Some failed this question because they forgot to apply some of the filters; note that to answer this question correctly, they had to apply a third filter (country: Spain), which was not explicitly indicated in the description of the task.
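The success rate and the breakdown into correct, empty, and wrong answers can be computed as in the following sketch. The data layout (one list of answer statuses per task) and the function names are assumptions made for illustration; no actual responses from the study are reproduced.

```python
from collections import Counter

STATUSES = ("correct", "empty", "wrong")


def task_breakdown(answers):
    """Percentage of correct, empty, and wrong answers for one task."""
    counts = Counter(answers)
    total = len(answers)
    return {status: 100.0 * counts[status] / total for status in STATUSES}


def average_success_rate(answers_by_task):
    """Mean over tasks of the percentage of correct answers."""
    rates = [task_breakdown(a)["correct"] for a in answers_by_task.values()]
    return sum(rates) / len(rates)
```

The `"correct"` entry of `task_breakdown` corresponds to the per-task success rate, and the three entries together correspond to one stacked bar.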
5.3.2 Usability Evaluation.
To assess the usability of the tool, the SUS questionnaire has been used [
8,
9]. The questions and the resulting scores for each of them are listed in Table
3 and summarized in Figures
10,
11, and
12.
In questions Q1 to Q10 the range 1–5 means: 1: strongly disagree, 5: strongly agree, where odd questions are formulated in a positive way (thus, the best possible score is 5) and the even questions are formulated in a negative way (the best possible score is 1).
From the odd questions (Q1, Q3, Q5, Q7, and Q9; Figure
10), where the best possible score is 5 points, Q3 and Q7 reach the highest mean scores (4.11 and 4.13, respectively) and the greatest number of answers with a score of 5 points (n = 22 in both cases); the second-best score, 4 points, is also frequently selected (n = 21 and n = 20, respectively). This means that most students agree that the tool is easy to use and think that most people would learn to use it very quickly. On the other hand, the poorest results are obtained for Q1, meaning that students will not necessarily use the tool frequently, as few students (n = 2 and n = 16) give a positive score (5 and 4 points, respectively). A likely cause is that the knowledge area (CH) might not be of great interest to the students (this will be further discussed in the discussion, Section 6).
On the other hand, from the even questions (Q2, Q4, Q6, Q8, and Q10; Figure
11) where the best possible score would be 1 point, the best result is achieved for Q10, with a mean value of 1.62, with 60.38% (n = 32) of students giving 1 point and 26.42% (n = 14) giving 2 points. This means that most users do not agree that they needed to learn a lot of things before they could use the tool. Conversely, the poorest result is for Q6, with a mean value of 2.11, 30.19% (n = 16) of students giving 1 point, and 35.85% (n = 19) giving 2 points. This means that more than half of the students agreed that there was too much inconsistency in the tool. This can be due to the loading time of the objects on the map (this will be further discussed in the qualitative analysis, Section
5.3.3). However, it is worth mentioning that, in contrast to the rest of even questions, none of the students rated Q6 with the worse possible score (5 points). Overall, it is satisfactory that the mean scores of all the even questions are below 3 points, which is the neutral value.
The SUS score ranges from 0 to 100, with 100 being the best imaginable result. In our case, this score reaches 73.49 points (Table
3), which can be considered good on the scale of scores provided by the questionnaire and considering the fact that a minimum score of 70 would be deemed acceptable for a tool [
4,
9], as shown in Figure
12. As can be seen, most of the students (62.26%, n
= 33) rate the tool as acceptable (individual SUS score above 70 points), while only a few (11.32%, n = 6) rate it as not acceptable (individual SUS score below 50 points). Therefore, the perceived ease of use of our technology is good, indicating that the SeMap interactive map is usable.
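For reference, the standard SUS scoring procedure behind these figures can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function names `sus_score` and `acceptability` and the example answers are hypothetical, and the label "marginal" for the band between 50 and 70 points is an assumption, since the text only names the bands above 70 and below 50.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` holds the ten answers Q1..Q10, each an integer from 1 to 5.
    Odd questions are positively worded (best answer 5) and even questions
    are negatively worded (best answer 1).
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten answers, each between 1 and 5")
    total = 0
    for number, answer in enumerate(responses, start=1):
        if number % 2 == 1:
            total += answer - 1   # odd question: contribution 0..4
        else:
            total += 5 - answer   # even question: contribution 0..4
    return total * 2.5            # rescale the 0..40 sum to 0..100


def acceptability(score):
    """Map an individual SUS score to the bands used in the text."""
    if score > 70:
        return "acceptable"
    if score < 50:
        return "not acceptable"
    return "marginal"  # assumed label for the intermediate band


# Hypothetical respondent, not taken from the study data.
example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example), acceptability(sus_score(example)))
```

Averaging the individual scores over all respondents would then give an overall figure comparable to the one reported in the table.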
5.3.3 Qualitative Study.
With the aim of identifying what could be improved in the tool on a technical level, six open questions were included in the user test. These questions also help to identify gaps in terms of CH dissemination. The answers were analyzed by coding and grouping them inductively according to their similar contents [
29]. The results are presented in Table
4 and in the form of a treemap (Figure
13).
First, filtering was what students liked the most (n = 24), with answers such as “the great quantity of filters that can be applied to the searches”, “Allows efficient filtering”, and “the diversity of filters”, followed by the content (n = 20), which includes responses such as “the great diversity of museums”, “the possibility of exploring CH artifacts worldwide”, and “information access”. In contrast, the response time was what they liked least (n = 21); here we have considered all the responses that mentioned words such as “blocked”, “error”, “failure”, or “slow”. These responses are consistent with the SUS result for Q6 (inconsistency), which is the weakest of the even questions.
Regarding the most technically complicated aspect of the tool, the largest group of students referred to understanding the information given, that is, the content (n = 23), followed by the filtering (n = 22). For content, we considered responses such as “I did not understand the geography”, “object data”, and “understand what you are looking for”.
Q4 is aligned with the responses to Q5 regarding what they would change about SeMap, with the content (n = 23) and filtering (n = 22) being the two things they would change. Surprisingly, only three students would improve the response time, even though it was the most disliked aspect of the tool. Nonetheless, the results of Q4 and Q5 are consistent with the SUS score for Q1, which indicates that students will not necessarily use the tool frequently, as they did not understand the content. This is further detailed in the Discussion section.
Regarding the question of where they would use the tool, the answers are quite balanced, with tourism being the best rated, followed by the cultural, educational, and professional fields. We have distinguished between tourism and culture, understanding tourism as the act of consciously visiting a place [
74] versus culture [
75], in the sense of learning more about an object, tradition, or artistic and literary values. This means that students, even if they do not understand the content (Q4, Q5), would still use the tool for leisure (counting tourism and culture, n
= 35) and for educational purposes (counting education and professional, n
= 25).
Finally, regarding their general opinion, the most frequent answer was that SeMap is a good tool (n = 20), with comments including “good application”, “I find it a very powerful tool and a huge project”, “I find it a very good tool with limited use but very well built”, and “Quite well done and a good idea”, followed by useful (n = 15) and easy (n = 11).
6 Discussion
Regarding the decisions made to display information in the map tool, several strategies were chosen to achieve a clearer visual interface, such as clustering, minimal text labels on the map, a search bar with filters, a hidden sliding sidebar, and a vertical timeline.
As reviewed in the state of the art, clustering methods are a common approach for visualizing large amounts of spatially close data [
2,
44]. Nevertheless, these methods are not employed in other tools designed for CH data, such as Peripleo [
45,
68,
69], where each item is represented as an independent circle, leading to a map with numerous overlapping circles until zoom is applied. In comparison to these interfaces, the incorporation of clustering methods in SeMap results in a clearer display. Here, the considerable volume of closely located data is grouped within a single circle featuring a numerical indicator. These big circles offer an initial glimpse of the visualized information and then subdivide into several smaller circles upon zooming.
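The behavior described above, in which nearby objects collapse into one counted circle that splits on zooming, can be illustrated with a simple grid-based clustering. This is only a sketch under stated assumptions: the text does not specify which clustering algorithm SeMap uses, and the function name `cluster_points`, the parameter `base_cell_deg`, and the sample coordinates are hypothetical.

```python
from collections import defaultdict


def cluster_points(points, zoom, base_cell_deg=45.0):
    """Group (lat, lon) points into square grid cells.

    The cell size halves with every zoom level, so a cluster shown at a
    low zoom level breaks up into smaller clusters as the user zooms in.
    Returns one dict per cluster with its centroid and member count.
    """
    cell = base_cell_deg / (2 ** zoom)
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division assigns each point to the grid cell containing it.
        key = (int(lat // cell), int(lon // cell))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        count = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / count,
            "lon": sum(p[1] for p in members) / count,
            "count": count,  # the numerical indicator drawn in the circle
        })
    return clusters


# Hypothetical object locations (roughly Madrid, Toledo, and Valencia).
objects = [(40.4168, -3.7038), (39.8628, -4.0273), (39.4699, -0.3763)]
for zoom in (0, 5, 8):
    print(zoom, sorted(c["count"] for c in cluster_points(objects, zoom)))
```

A production map would more likely rely on the clustering facilities of its mapping library and cluster in projected pixel space rather than in raw degrees; the grid above only serves to show why the counts decrease as the zoom level increases.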
CH applications often incorporate timelines [
61,
63,
64] to display the date of objects. SeMap includes this functionality through an interactive century slider, enabling users to focus on elements from their selected century. Compared to other works, SeMap introduces a slight variation by displaying the timeline vertically, which optimizes the use of available space for text descriptions.
Regarding task completion, the results show that the task success rate is good in all the proposed tasks except in T4. Most of the failures that occurred in T4 were due to students not applying the filtering, probably because this was not explicitly stated in the task description. For the rest of the tasks, the success rate is good, with an average value of over 75%, indicating that the functionalities of the tool's interface are understandable.
As for the usability evaluation and the qualitative analysis, our findings on what was liked the most versus what was disliked the most suggest that, in general, given its acceptable visualization (n = 16, qualitative analysis) and its ease of use (62.26%, n = 33, SUS score), the tool was mostly liked and well accepted by the users. While we did not evaluate the user interface's distinguishing aspects, it is worth mentioning that our results indicate a high level of usability and positive feedback from the students. In forthcoming research, we acknowledge the importance of isolating the user interface functionalities from the map visualization and conducting a dedicated evaluation of their impact on the user experience. This strategy would allow for a more detailed evaluation of the impact of distinct user interface features, including the timeline, map navigation, and filtering options, on user preferences and actions. We can also indicate that the low scores given for Q1 on the SUS, a question specifically related to the application domain, have a non-negligible impact on the SUS score; computing an equivalent SUS score without Q1 would yield 76.36 points, almost three points above the obtained score of 73.49 points.
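One possible way to compute such an equivalent score is to drop the excluded question and rescale the remaining contributions so that the result still ranges from 0 to 100. The sketch below rests on that assumption: the text does not state how the 76.36 figure was obtained, and the function name `sus_score_excluding` and the example answers are hypothetical.

```python
def sus_score_excluding(responses, excluded=(1,)):
    """SUS-like score (0-100) for one respondent, ignoring some questions.

    `responses` holds the ten answers Q1..Q10 (integers from 1 to 5) and
    `excluded` lists the question numbers to leave out. Each remaining
    question contributes 0..4 points as in the standard SUS (odd questions:
    answer - 1; even questions: 5 - answer); the sum is then rescaled by
    the new maximum, which is an assumed normalization.
    """
    contributions = [
        (answer - 1) if number % 2 == 1 else (5 - answer)
        for number, answer in enumerate(responses, start=1)
        if number not in excluded
    ]
    return sum(contributions) * 100 / (4 * len(contributions))


# Hypothetical respondent who likes the tool but would not use it often (Q1 = 2).
answers = [2, 1, 5, 1, 5, 1, 5, 1, 5, 1]
print(sus_score_excluding(answers, excluded=()))  # all ten questions
print(sus_score_excluding(answers))               # without Q1
```

With no question excluded, the function reduces to the standard SUS formula, since multiplying by 100 / 40 is the same as multiplying by 2.5.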
Nevertheless, we must bear in mind that the poorest results are obtained for Q1, meaning that students will not necessarily use the tool frequently, which is not surprising, as students reported that the content was the most technically complicated aspect of this tool. In fact, some students said that they did not understand the difference between provenance and current location. This is significant, as this distinction is basic in Art History; it was probably the first time that data visualization students had heard about it. The distinction is indeed described in the tool, which means it may be necessary to explain it differently for people not familiar with CH. Other respondents said that SeMap's use was quite limited (only for CH professionals) or that they would change the museums' information, as it was not clear enough.
These results are aligned with the fact that the CH sector does not always succeed in reaching other sectors of the public. In fact, as the literature shows, several projects use technology to better explain CH, but without involving STEM students, and when STEM students are involved, it is through specific programs [
40]. CH professionals, including GLAMs, might therefore require a better approach to reach university students in STEM degree programs. The authors of this article are already researching whether this situation is usual [
15,
25,
47]. Nonetheless, future research should further develop and confirm these initial findings by repeating this experiment with more STEM students, especially since the answers on where students would use the tool focus on leisure, including tourism and acquiring cultural knowledge. In this regard, some of the students answered that what they liked the most was data access and the content (n
= 21). Broadly speaking, our findings indicate that STEM students appreciate access to CH but do not understand it; hence, they find applications like this one useless unless they are used for their studies (“it can be used in a data visualization course”).
7 Conclusions
In this article, we have presented the results of the SeMap project, which aims to provide a spatiotemporal map for visualizing CH objects on the Spanish Digital Network of Museum Collections. First, the process of obtaining the data to be visualized from a catalogue, together with its processing and classification using NLP techniques, has been described. In particular, problems related to the granularity and uncertainty of the available information have been solved.
Then, the tool developed to visualize the previously classified information has been described. This tool includes a map that allows the visualization of CH objects with the possibility of filtering information and deep search options. With regard to text information, decisions were made during the design of the tool on how to avoid saturation of textual information, which involves displaying objects’ information on a sidebar rather than on top of the map canvas. Additionally, the link to the original CER.ES catalogue is provided, which contains the full information. One of the lessons learnt is that using clustering methods within SeMap results in a clear display. The choice of employing differently sized circles based on the number of clustered elements not only enhances the clarity of the interface but also facilitates the visualization of substantial amounts of spatially close data on a map. In addition, by using concise text labels on the map and showing solely the number of clusters and objects, we prevent overwhelming the map with excessive information. Much in the same way, the vertical timeline optimizes the use of available space for text descriptions and results in a clearer display.
To validate the tool, an evaluation with users has been performed. This evaluation has included three aspects: task completion, as a way of measuring the comprehensibility of the interface and the tasks that can be performed with the tool; a usability study to find out user perception; and a qualitative study to identify improvements that can be made. This evaluation has been very useful and has shown us the strengths of the tool and the issues that can be improved. The fact that most of the students find the tool usable, although they do not fully understand and/or are not especially interested in the culture-related content, makes us believe that the decisions taken to visualize data (e.g., clustering, side windows) are effective and could be extrapolated to other areas of knowledge. In this regard, the differences that may exist among different categories of users when using this tool should be noted [
31], and regarding their UX interaction [
50] and UI [
19]. These differences range from motivation, where STEM students might use the tool only for data visualization whereas scholars in the humanities want to better understand data and improve their research, to digital literacy and interaction style, where people with a STEM background would be able to interact better with the tool whereas CH professionals might not be tech-savvy. We have already explored this tool with other audiences, such as secondary students [
18], but future research with a comparative analysis across several audiences is needed.
As further work to increase the functionalities of SeMap, we think it would be interesting to explore a recommendation system that aids users based on their previous interactions, similar to the one available in Europeana [
76]. Additionally, the similarity metric that we have implemented in SeMap could be complemented with an intelligent text-and-image-based system that retrieves the most similar objects for a given object, as implemented in other projects, such as SILKNOW [
77]. However, those considerations would require exploring how these intelligent systems could be integrated into the graphical user interface of SeMap. Also, as further work, we intend to evaluate the precision and recall of the NLP-based object classification, to quantify how well the automatic data pre-processing performs.