1 Introduction

Lifelogging has recently found its way into public consciousness. More and more devices, sensors and applications are becoming available to end users. In this paper, we discuss the implications, challenges and opportunities that arise for research in the area of information retrieval (IR). Lifelogging is the process of automatically capturing and storing every possible piece of information about a person’s life. A good definition comes from Dodge and Kitchin [1]. They define lifelogging as “…a form of pervasive computing consisting of a unified digital record of the totality of an individual’s experiences, captured multimodally through digital sensors and stored permanently as a personal multimedia archive”. Furthermore, they define the goal of lifelogging as having “…a record of the past that includes every action, every event, every conversation, every material expression of an individual’s life; all events will be accessible at a future date because a life-log will be a searchable and recallable archive”. The amount of collected data depends on the sensors used. An Autographer camera takes up to 1 million photos every day, which sums up to 480 GB per year per person. This is only one example of data collection; data is collected from many different sensors. Activity tracking, for instance, is possible using the Fitbit One, the Withings Pulse or various other tracking devices. There are numerous apps, such as Moves or the Sony LifeLog app, that track what you did and where you have been. The wealth of data is overwhelming. Suddenly, we know more about a single user than ever before. Instead of only getting information such as ratings for items, we know the context in which a user made a decision. This data lets us approach new use cases and makes applications more useful to a user than previously possible.

The paper is structured as follows. We first outline the general application areas and use cases of lifelogging. Based on this, we discuss lifelogging in depth and how information retrieval can support it. To this end, we present a web application, DailyMe, developed at our university, that allows users to connect different tracking applications and get a personal diary for every day. We describe the data collected and outline what can be done and what is still a problem for an IR system. Finally, we discuss what needs to be done to enable IR systems to make full use of lifelogging data and to enable users to benefit from it.

2 Use Cases for Lifelogging

While many current discussions center on what is technically possible, we want to examine feasible use cases for lifelogging. In their book Total Recall, Bell and Gemmell identify four main areas for lifelogging use cases:

  • Work

  • Health

  • Learning

  • Everyday life (social)

Correspondingly, Sellen and Whittaker describe five use cases where lifelogs can be beneficial, the so-called 5 Rs:

  • Recollecting: Recalling a specific moment in life (episodic memory).

  • Reminiscing: Recalling a specific moment for emotional or sentimental reasons. This can be seen as a special case of Recollecting.

  • Retrieving: Retrieve a previously encountered digital item or information, such as documents, email, or Web pages.

  • Reflecting: A more abstract representation of personal data to facilitate reflection on, and reviewing of, past experience.

  • Remembering intentions: Remembering to do something, e.g. remembering to show up for appointments.

These use cases come along with some challenges for IR, which are described in the book LifeLogging: Personal Big Data by Gurrin et al.:

  • Data gathering: Data collection is time consuming and requires different sensors and manual effort. Also, the data is private, so only the user’s own data can be used.

  • Data analysis: Understanding data from heterogeneous sources, e.g., multimedia, text and sensors, and extracting meaning from it (semantic extraction/semantic organization).

  • Search & retrieval: The heterogeneous data makes searching for information more complicated. The retrieval requirements and use cases that come with lifelogging are not yet well understood. Instead of using text queries to find documents, we can now, for instance, find events based on context information we remember.

  • Evaluation: Datasets seem to be a problem. As the data is private by nature, public datasets will be hard to get.

  • Summarization and data mining: A preliminary step towards a good and helpful presentation that allows the user to take advantage of the collected data, supporting quantified-self style analysis and narrative/story-telling presentation.

  • User interaction and presentation: Lifelogging will produce a large amount of data. We need to define likely usage scenarios, including potentially omnipresent access, and even how to support query formulation for many of the use cases. This is currently poorly understood.

All of these use cases and challenges will remain open problems for some time to come. A lot of effort is already being put into some of these challenges for certain use cases, but we are just at the beginning, and it will be exciting to see what comes. Admittedly, one major real-world challenge is filtering out noisy or meaningless data. So far, the pictures collected with an Autographer camera (those that are not blurred) mostly show a person during daily activities such as driving, drinking coffee and sitting in front of a computer.

What also needs to be remembered is that the data belongs to the user. This implies that services in the context of lifelogging should leave the user in full control over the data. The user must be able to decide what the data is used for. The full data must be accessible, e.g., by an API. And of course, the user must be allowed to delete the data.

3 Lifelogging in the Field

In this section, we discuss lifelogging applications and their usage. We concentrate on the use cases presented by Sellen and Whittaker, the 5 Rs. We argue that the motivation for lifelogging, and especially for the 5 Rs, is mainly driven by the individual. While there are approaches by companies to equip their customers, e.g. car insurers tracking the driving behavior of their customers, these approaches are restricted to a limited number of people. The main driver for lifelogging today is still the idea of gaining information about oneself and learning from this data. We also argue that the use cases presented by Bell and Gemmell are covered by the 5 Rs, as the approach from an IR view is similar and only the context differs. ‘Recollecting’, for example, is relevant for work, learning and everyday life. A user wants to revisit a certain moment to remember the outcome of a work discussion, an example from a school lesson or an event with friends. From an IR view, the task is similar: based on the available information, such as the people who were also there, an IR system allows the user to search for this moment. In the next section, we present a web application, DailyMe, developed at our university, that allows users to connect different tracking applications, collect data and get a personal diary for every day.
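
As a rough illustration of Recollecting as a retrieval task, the following minimal sketch filters logged events by the context cues a user remembers (people present, place, tags). The `Event` class, its field names and the sample data are assumptions made for this example only; they are not taken from DailyMe.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class Event:
    """One logged moment. All field names are illustrative assumptions."""
    day: date
    place: str
    people: Set[str] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


def recollect(events: Iterable[Event],
              people: Iterable[str] = (),
              place: Optional[str] = None,
              tags: Iterable[str] = ()) -> List[Event]:
    """Return the events that match every remembered context cue."""
    wanted_people = set(people)
    wanted_tags = set(tags)
    return [
        e for e in events
        if wanted_people <= e.people          # all remembered people were there
        and wanted_tags <= e.tags             # all remembered tags are attached
        and (place is None or e.place == place)
    ]


# Hypothetical sample log
events = [
    Event(date(2015, 3, 2), "office", {"Anna", "Ben"}, {"meeting"}),
    Event(date(2015, 3, 7), "park", {"Anna"}, {"picnic"}),
    Event(date(2015, 3, 9), "office", {"Ben"}, {"meeting"}),
]

# "The discussion at the office where Anna was present"
hits = recollect(events, people=["Anna"], place="office")
```

A real system would rank partial matches instead of requiring every cue to match exactly, since remembered context is often incomplete or wrong.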

3.1 DailyMe – A Daily Diary

DailyMe is a web-based application, which allows connecting external applications from Fitbit, Foursquare, Flickr and Moves and also allows making notes and tags. By using the external applications, we come close to the general lifelogging goal that the daily behavior is tracked without much manual assistance. It allows us to track what we did, where we have been and collects photos taken. Figure 1 shows the start page of DailyMe where the user gets an overview of the activities of the last 7 days compared to the overall statistics.

Fig. 1. DailyMe start page showing the statistics for the last 7 days and the overall statistics.

Figure 2 shows the overview of a single day of a typical DailyMe user. In the upper left, it shows the types of activities and their proportional distribution for the day. Next to it, a map view visualizes the places the user has visited and the routes the user has taken that day. Below, the photos taken on that day are shown, followed by an overview of the places. This information describes the day of the user in the form of behavioral and location-based patterns. DailyMe also collects data about sleep duration and efficiency and the user’s weight. This information is collected automatically using external apps. To better describe the day, or to allow users to make notes about memorable events not captured otherwise, DailyMe also allows tagging the day or writing down notes.

Fig. 2. DailyMe view for a single day, showing the type of activities, places visited, photos, sleep and weight information, tags and text.

This describes the data foundation of DailyMe and is similar to other applications such as Day One. In the next section, we revisit the 5 Rs and discuss how they can be approached with the currently available data based on the DailyMe scenario.
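
To make this data foundation more concrete, the sketch below models one diary day as a small record type. The class name `DayRecord`, all field names and the sample values are hypothetical; the source does not describe DailyMe’s internal schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DayRecord:
    """One diary day; every field name here is an assumption for illustration."""
    day: date
    activity_minutes: Dict[str, int] = field(default_factory=dict)
    places: List[str] = field(default_factory=list)
    photo_urls: List[str] = field(default_factory=list)
    steps: Optional[int] = None
    sleep_minutes: Optional[int] = None
    sleep_efficiency: Optional[float] = None  # fraction between 0 and 1
    weight_kg: Optional[float] = None
    tags: List[str] = field(default_factory=list)  # entered manually
    note: str = ""                                 # entered manually

    def activity_shares(self) -> Dict[str, float]:
        """Proportional distribution of activity types for the day."""
        total = sum(self.activity_minutes.values())
        if total == 0:
            return {}
        return {name: minutes / total
                for name, minutes in self.activity_minutes.items()}


# Hypothetical sample day
record = DayRecord(
    day=date(2015, 3, 2),
    activity_minutes={"walking": 60, "transport": 30, "cycling": 30},
    places=["home", "office"],
    steps=9000,
    sleep_minutes=420,
    tags=["project kickoff"],
)
shares = record.activity_shares()
```

Optional fields reflect that not every user connects every external application, so any of the automatically collected values may be missing for a given day.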

3.2 The 5 Rs and DailyMe

  • Recollecting - Recalling a specific moment in life: With the data available, we can search for places, and with the right image processing capacity, e.g. detecting persons in a picture, we can also search for multimedia information. Adding context information such as weather data provides a basis for searching for and retrieving a specific moment in life.

  • Reminiscing - Recalling a specific moment for emotional or sentimental reasons: This use case is currently only possible when the user enters emotions or sentiments by hand. As stated in the beginning, the goal of lifelogging is the automatic capturing of data; thus, collecting and using emotions is currently not feasible. Future developments, such as sensors integrated in watches or the detection of mood based on other behavioral data [7], may give us the possibility to collect such data.

  • Retrieving - Retrieve a previously encountered digital item or information: This is currently not possible with data collected by DailyMe. Capturing such information requires a deeper intrusion into the user’s hardware. Tools like RescueTime allow tracking every action on a personal computer, but as we today use a set of different devices, such as smartphones, tablets and several computers, we face a heterogeneous data collection problem.

  • Reflecting - A more abstract representation of personal data to facilitate reflection on, and reviewing of, past experience: This is currently limited to a Quantified Self scenario. We can learn behavioral patterns and remind users to, e.g., walk more or stand up from time to time, which is helpful for office workers.

  • Remembering intentions - Remembering to do something, e.g. remembering to show up for appointments: A main benefit for users would be an IR system that is capable of learning from past behavior and alerting users to do something, e.g. reminding the user to pick up the kid after soccer. The problem with the currently collectable data is that it does not allow us to make sophisticated forecasts. What is achievable is predicting normal paths for users based on daily routines. This means that we can forecast a user’s way to work based on typical patterns on weekdays. This is used, for instance, by Google Now and on Apple iOS to give users information about their way to work. But if the users have other destinations, we cannot predict or forecast this.

In this section, we discussed the 5 Rs and their feasibility based on the presented DailyMe scenario. While some of the use cases can be accomplished to some extent with the given sensors, most of them are not realizable given the existing data. In the next section, we will discuss the data itself and present the “Data Problem” with lifelogs.
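
The routine-based forecasting mentioned under Remembering intentions can be sketched as a simple frequency count: for each weekday and hour, predict the place visited most often in the past. This is a minimal sketch under that assumption, not the method used by any of the named products; the function names and the sample visits are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def learn_routines(visits):
    """Count visited places per (weekday, hour) slot.

    visits: iterable of (date, hour, place) tuples.
    """
    counts = defaultdict(Counter)
    for day, hour, place in visits:
        counts[(day.weekday(), hour)][place] += 1
    return counts


def predict_place(counts, day, hour):
    """Return the most frequent past place for this slot, or None if unseen."""
    slot = counts.get((day.weekday(), hour))
    if not slot:
        return None
    return slot.most_common(1)[0][0]


# Hypothetical history: three Mondays at 8 o'clock
visits = [
    (date(2015, 3, 2), 8, "office"),
    (date(2015, 3, 9), 8, "office"),
    (date(2015, 3, 16), 8, "dentist"),
]
routines = learn_routines(visits)

next_monday = predict_place(routines, date(2015, 3, 23), 8)
a_sunday = predict_place(routines, date(2015, 3, 22), 8)
```

The sketch also shows the limitation described above: the one-off dentist visit is outvoted by the routine, and a slot without history yields no prediction at all.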

3.3 The Data Problem

With the currently collectable data using external applications and devices, we still get a limited amount of information about a user and his intentions.

Figure 3 shows the normal step distribution per weekday for a typical user working in an office at TU Berlin. The distribution is computed using 600 days of data. We see that there are only small differences between most days of the week. Only Sunday seems to vary a bit, with fewer steps than usual.

Fig. 3. Mean step distribution per weekday. Sunday is the first day of the week.
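
A per-weekday mean such as the one plotted in Fig. 3 could be computed as in the following sketch. The input format (a mapping from date to step count) and the sample values are assumptions; the source does not describe how the figure was produced.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Weekday names with Sunday first, matching the figure's ordering
SUNDAY_FIRST = ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"]


def mean_steps_per_weekday(daily_steps):
    """Average step counts per weekday.

    daily_steps: mapping of date -> step count.
    Returns a dict ordered Sunday first; weekdays without data are omitted.
    """
    buckets = defaultdict(list)
    for day, steps in daily_steps.items():
        # date.weekday() is 0 for Monday; shift so that Sunday becomes 0
        index = (day.weekday() + 1) % 7
        buckets[index].append(steps)
    return {SUNDAY_FIRST[i]: mean(buckets[i]) for i in sorted(buckets)}


# Hypothetical sample data: two Sundays and one Monday
sample = {
    date(2015, 3, 1): 4000,
    date(2015, 3, 8): 6000,
    date(2015, 3, 2): 9000,
}
averages = mean_steps_per_weekday(sample)
```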

We observe the same pattern when we look at the sleep data of the same user, see Fig. 4. Sleep duration is almost uniformly distributed across the weekdays, with a bit more sleep on Saturdays. The efficiency, which takes into account how often the user’s sleep is interrupted, is also uniformly distributed.

Fig. 4. Mean distribution of the sleep efficiency and the minutes of sleep per weekday. Sunday is the first day of the week.

From an information retrieval point of view, this data offers few opportunities to detect and learn different intentions of the user. We can learn typical patterns like the user’s way to work, but not much more. Especially when we take into account that the 5 Rs are closely related to sentiments and emotions, we observe a lack of data and information.

This becomes even more visible when we visualize the data as points whose distances describe the dissimilarities between them, see Cox et al. and Gower [5, 6]. We see that most points are similar to each other, see Fig. 5. Only a few outliers exist: days with very little sleep (points above the point cloud) and days with very few steps (left of the cloud).

Fig. 5. Data points of the example user. Each point describes a day.
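
The dissimilarity view behind Fig. 5 can be approximated without the two-dimensional embedding. The sketch below computes pairwise dissimilarities between days and flags days that are, on average, far from all others. The range-normalised mean absolute difference used here is an assumption (a Gower-style measure for numeric features); the source does not state which measure or threshold was used, and the sample days are invented.

```python
def gower_style_dissimilarity(days):
    """Pairwise dissimilarity matrix with entries in [0, 1].

    days: list of equal-length numeric tuples, e.g. (steps, sleep_minutes).
    Each feature difference is divided by that feature's observed range,
    then averaged over the features.
    """
    n_features = len(days[0])
    ranges = []
    for k in range(n_features):
        column = [d[k] for d in days]
        ranges.append(max(column) - min(column))

    matrix = [[0.0] * len(days) for _ in days]
    for i, a in enumerate(days):
        for j, b in enumerate(days):
            parts = [abs(a[k] - b[k]) / ranges[k]
                     for k in range(n_features) if ranges[k] > 0]
            matrix[i][j] = sum(parts) / len(parts) if parts else 0.0
    return matrix


def flag_outliers(matrix, threshold=0.4):
    """Indices of days whose mean dissimilarity to the other days exceeds threshold.

    Requires at least two days. The threshold is an arbitrary assumption.
    """
    n = len(matrix)
    return [i for i, row in enumerate(matrix)
            if sum(row) / (n - 1) > threshold]


# Hypothetical days as (steps, sleep_minutes)
days = [
    (9000, 420), (9500, 430), (8800, 410), (9200, 425),
    (1000, 415),   # very few steps
    (9100, 120),   # very little sleep
]
matrix = gower_style_dissimilarity(days)
outliers = flag_outliers(matrix)
```

As noted in the text, such a procedure identifies which days are unusual, but not why; the reason still has to come from the user.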

We can detect these outliers and understand, data-wise, why they are outliers. But we cannot detect or understand the reason from the data. We still need information from the user, e.g. tagging or describing the reason manually. Another real-world problem we face is tracking users with different devices and comparing their behavior. If we, for instance, compare results from users tracking with a Fitbit One to those from users with a Jawbone Up bracelet, the results for steps differ by around 10–15 percent.
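
A device comparison of this kind comes down to a relative difference between two step counts. The sketch below uses a symmetric definition (difference divided by the mean of both readings); this definition is an assumption, since the source does not say how the percentage was computed, and the sample readings are invented.

```python
def relative_difference_percent(steps_a, steps_b):
    """Symmetric relative difference between two readings, in percent."""
    midpoint = (steps_a + steps_b) / 2
    if midpoint == 0:
        return 0.0
    return abs(steps_a - steps_b) / midpoint * 100


# Hypothetical readings for the same day from two different trackers
difference = relative_difference_percent(10000, 8800)
```

Dividing by the mean rather than by one device’s reading avoids treating either tracker as the ground truth.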

4 Conclusion

In this paper, we described lifelogging use cases that go beyond the ideas of Quantified Self and connected these use cases to a real-world application, DailyMe, which collects and aggregates different types of data from users. What we see is that with today’s devices, we can hardly fulfill any of the use cases described by Sellen and Whittaker, the 5 Rs. While the Recollecting use case is achievable with the given data, IR and image processing technologies, the other four use cases need considerably more data to be tracked. From the IR point of view, data about the user’s feelings and emotions is needed to support users in finding moments or episodes of their lives. Future tracking devices have to go beyond merely counting activities and start collecting data such as heartbeat or skin resistance to draw conclusions about the user’s emotions. More location and context data also has to be added, for example weather conditions such as humidity, to distinguish whether a user feels comfortable because of the current activity or because of the weather.

Besides the missing data, the problem of heterogeneous data will become more urgent in the near future. As more and more devices become available, lifelogging applications need to aggregate and homogenize the collected data to make it comparable. Based on the currently available data, the use of lifelogging data is still mainly focused on health-related use cases. By extending the tracked data to emotions and other features that give information about the user’s intention, we will be able to achieve more of the presented lifelogging use cases.