
1 Introduction

The ImageCLEF evaluation campaign was started as part of the CLEF (Cross Language Evaluation Forum) in 2003 [7, 8]. It has been held every year since then and delivered many results in the analysis and retrieval of images [20, 21]. Medical tasks started in 2004 and have in some years been the majority of the tasks in ImageCLEF [18, 19].

The objective of ImageCLEF has always been the multilingual or language-independent analysis of visual content. A focus has often been on multimodal data sets, i.e., combining images with structured information, free text or other information that helps in the decision making.

Fig. 1. Sample images from (left to right, top to bottom): ImageCLEFcoral with coral image segmentation and labeling, ImageCLEFmedical, e.g., tuberculosis prediction and Visual Question Answering, and ImageCLEFdrawnUI with recognition of hand drawn website UIs.

Since 2018, ImageCLEF has used the crowdAI platform (migrated to AIcrowd (Footnote 1) in 2020) to distribute the data and receive the submitted results. The system provides an online leader board and keeps the data sets accessible beyond the competition, allowing continuous submission of runs and additions to the leader board.

Over the years, ImageCLEF and CLEF as a whole have shown a strong scholarly impact, as captured in [27, 28]. This underlines the importance of evaluation campaigns for disseminating best scientific practices.

In the following, we introduce the four tasks that will run in the 2020 edition (Footnote 2), namely: ImageCLEFlifelog, ImageCLEFmedical, ImageCLEFcoral, and the new ImageCLEFdrawnUI. Figure 1 illustrates the specificity of the tasks with a few sample images.

2 ImageCLEFlifelog

The main goal of the Lifelog task since its first edition [9] has been to advance state-of-the-art research in lifelogging as an application of information retrieval. A wide range of personal devices, such as smartphones, video cameras and wearable devices, allow the collection of data about many aspects of our daily life. These devices create large amounts of data containing videos, images, audio and sensor readings. To organize such a vast amount of data, systems that can do this automatically are clearly needed.

As in the previous three editions, the task focuses mainly on images. The 2020 task will again be split into two subtasks: the lifelog moment retrieval and a new sports performance lifelog task. The first subtask includes new and enriched data, focusing on daily living activities and the chronological order of the moments. The second subtask provides a completely new dataset for assessing sports performance.

For the Lifelog Core Task: Lifelog Moment Retrieval, participants are required to retrieve several predefined activities in a lifelogger’s life. For example, they are asked to return the relevant moments for the query “Find the moment(s) when the lifelogger was having a beer on the beach with his/her friends”. Particular attention will be paid to the diversification of the selected moments with respect to the target scenario. To make the task feasible and interesting, a rich multimodal dataset will be used. The data are completely new and cover about 4.5 months of the lives of three lifeloggers, with 1,500–2,500 images per day, visual concepts, semantic content, biometric information, music listening history and computer usage.

The other task, Lifelog Task: Sports Performance Lifelog (SPLL, 1st edition), is completely new in terms of data and topic. Teams are required to predict the expected performance (e.g., estimated finishing time, average heart rate and other performance measurements) for non-professional athletes who trained for a sports event. For the task, a new dataset is provided containing information collected from 20–24 people training for a 5 km run. Objective sensor data are collected using the FitBit Versa 2 sport watch (Footnote 3); subjective wellness, training load and injury data are collected using the PMSYS system (Footnote 4); and information about meals, drinks, medication, etc. is collected using Google Forms. The data contain information about daily sleeping patterns, daily heart rate, sport activities, logs of food consumed during the training period (from at least 2 participants) and self-reported data such as mood, stress, fatigue, readiness to train and other measurements also used for professional soccer teams. The data are collected over a period of four to five months. The copyright and ethical approval to release the data have been obtained by the task organizers; for the sports task data, approval was obtained from the Norwegian Center for Research Data (Footnote 5). For assessing the performance of the approaches, classic metrics will be used, e.g., precision and cluster recall (to account for the diversification). For the sports task, we will also use metrics such as the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE).
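To make the two regression measures concrete, a minimal Python sketch follows; the helper names and the example finishing times are purely illustrative and are not part of the task data or the official evaluation code.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predictions and targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical example: predicted vs. actual 5 km finishing times in seconds.
actual = [1500, 1620, 1710]
predicted = [1480, 1650, 1700]
print(mean_absolute_error(actual, predicted))      # 20.0
print(root_mean_squared_error(actual, predicted))  # approx. 21.6
```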

3 ImageCLEFmedical

The ImageCLEF medical task has been running every year since 2004 [22]. In 2020, it will follow a similar format as in the previous edition [18] containing the same three subtasks with some modifications. The three tasks will be: tuberculosis analysis [10,11,12], figure caption analysis [13, 17, 23], and Visual Question Answering [2, 16].

The tuberculosis task will use, as in previous editions, Computed Tomography (CT) scans of patients with tuberculosis together with additional clinical data. In this edition, the task will concentrate only on generating an automatic report based on the CT and no longer on assessing a TB severity score. The new report will be more detailed than in the previous edition, containing more specific information, such as the region in which each TB-related finding is located.

The caption analysis task will include more data compared to 2019. In 2020, an extension of the Radiology Objects in Context (ROCO) [24] data set is used, manually curated to reduce the data variability. The collection contains images from the medical literature together with caption information, concepts and 7 sub-classes denoting the radiology modality of the image. The task concentrates on extracting Unified Medical Language System (UMLS®) Concept Unique Identifiers (CUIs) and can also be seen as a first step towards the Medical Visual Question Answering (VQA-Med) task.

The medical Visual Question Answering (VQA-Med) task poses a challenging problem that involves both natural language processing and computer vision. In continuation of the two previous editions, the task consists of answering a natural language question from the visual content of an associated radiology image. VQA-Med 2020 will focus further on questions about abnormalities and will include a new subtask on visual question generation from radiology images.

4 ImageCLEFcoral

Coral reefs are important ecosystems because they are the most biodiverse parts of the oceans. However, corals thrive only in narrow temperature ranges, and ocean warming trends, among other factors, indicate that many of them will be lost within the next 30 years [3]. This would be a catastrophe, not only because of the extinction of many of the marine species they host but also because they provide an income and an essential food source to the people who live nearby [4, 25]. Monitoring changes in reef composition and structural complexity on a large scale is crucial to understanding and prioritizing conservation efforts.

Key to conservation work is knowledge of the state of the reefs. Autonomous underwater vehicles are able to collect large amounts of data, more than can be annotated by a human. Although there have been promising attempts at automatically annotating imagery of reefs for complexity and benthic composition [15, 26], it is fair to say that the problem is far from being solved. The aim of this competition is to encourage researchers to improve techniques for automatically identifying areas of interest and labelling them in a way that helps marine biologists and ecologists.

Following the success of the first edition of the ImageCLEFcoral task [5], in 2020, participants are required to devise and implement algorithms for automatically annotating regions in a collection of images with types of benthic substrate, such as hard coral or sponge. The dataset comprises 440 human-annotated training images and a further 200 unseen test images of a region of coral reef in Indonesia. The images were captured in high-quality JPEG format using an innovative underwater image capture system developed at the Marine Technology Research Unit at the University of Essex, UK.

The ground truth annotations of the training and test sets were made largely by marine biology MSc students at Essex and checked by an experienced coral reef researcher. The annotations were performed using a web-based tool developed in a collaborative project with the London-based company Filament Ltd. The tool allowed many people to work concurrently and was carefully designed to be simple and quick to use; this proved so effective that we are exploring whether it can be made publicly available for other tasks in the future.

As in the first edition, algorithmic performance will be evaluated on the unseen test data using the intersection over union (IoU) metric popularized by the PASCAL VOC (Footnote 6) exercise. This computes the area of intersection between the output of an algorithm and the corresponding ground truth, normalized by the area of their union so that the score is bounded between 0 and 1.
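As an illustration of the measure, the following is a minimal Python sketch of IoU computed on binary masks (one mask per substrate class); the mask representation is an assumption made here for illustration and this is not the official evaluation code.

```python
import numpy as np

def intersection_over_union(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU of two boolean masks: |A ∩ B| / |A ∪ B|, bounded in [0, 1]."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treat as a perfect match by convention
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection) / float(union)
```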

5 ImageCLEFdrawnUI

The user interface (UI) is the space where interactions between humans and computers occur. The increasing dependence on web and mobile applications has led many enterprises to give higher priority to developing user interfaces, in an effort to improve the overall user experience. Currently, the performance of any modern digital product is strongly correlated with the quality and usability of its user interface. However, building a user interface for digital applications is a complex process involving the interaction between multiple specialists, each with specific domain knowledge.

Generally, a business owner sets a business goal. A project or product owner then refines the requirements and builds a prototype of the application using wireframes. Once validated, the designer transforms the wireframes into designs, which are in turn transformed into code by the developer. This process is time consuming and expensive. Moreover, as more people get involved, the process becomes increasingly error prone. In addition, user interface experts are in limited supply. Globally, there are about 22 million developers (Footnote 7), of whom only about 10 million are estimated to also be JavaScript UI developers (Footnote 8).

Recently, the use of machine learning to facilitate this process has been demonstrated as a viable solution. In 2018, pix2code, a machine learning based approach to generate low-fidelity domain-specific languages from screenshots, was published and open sourced [1]. Also in 2018, Chen Chunyang et al. created their own dataset from Android apps with 185,277 pairs of UI images and GUI skeletons; the dataset and code were also open sourced [6].

In this context, in the 2020 ImageCLEFdrawnUI task, given a set of images of hand drawn UIs, participants are required to develop machine learning techniques that are able to predict the exact position and type of UI elements. The provided dataset consists of 3,000 hand drawn images inspired by mobile application screenshots and actual web pages, containing 1,000 different templates. Each image was manually labelled with the positions of the bounding boxes corresponding to each UI element and its type. To avoid any ambiguity, a predefined shape dictionary with 21 classes is used (e.g., paragraph, label, header). The performance of the algorithms will be evaluated using the standard mean Average Precision at an IoU threshold of 0.5, commonly used in object detection [14].
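As an illustration of how predicted boxes are matched against the ground truth at an IoU threshold of 0.5, a minimal Python sketch follows; the (x1, y1, x2, y2) box format and the greedy matching helper are assumptions made here for illustration only and do not reproduce the full mAP protocol of [14].

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) tuples."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_at_iou50(predictions, ground_truth):
    """Greedy matching for one image: each prediction (class, box), ideally sorted
    by confidence, may claim at most one unused ground-truth box of the same class
    with IoU >= 0.5; returns true positives / number of predictions."""
    used, tp = set(), 0
    for cls, box in predictions:
        for i, (gt_cls, gt_box) in enumerate(ground_truth):
            if i in used or gt_cls != cls:
                continue
            if box_iou(box, gt_box) >= 0.5:
                used.add(i)
                tp += 1
                break
    return tp / len(predictions) if predictions else 0.0
```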

6 Conclusions

In this paper, we present an overview of the upcoming ImageCLEF 2020 campaign. ImageCLEF has organized many tasks in a variety of domains over the past 18 years, from general stock photography, medical and biodiversity data to multimodal lifelogging. The focus has always been on language-independent or multilingual approaches and most often on multimodal data analysis. 2020 has a set of interesting tasks that are expected to again draw a large number of participants. As in 2019, the focus for 2020 has been on the diversity of applications and on creating clean data sets to provide a solid basis for the evaluations.