Comparison of De-Identification Techniques for Privacy
Preserving Data Analysis in Vehicular Data Sharing
Sascha Löbner
Frédéric Tronnier
Sebastian Pape
Kai Rannenberg
sascha.loebner@m-chair.de
frederic.tronnier@m-chair.de
sebastian.pape@m-chair.de
kai.rannenberg@m-chair.de
Chair of Mobile Business & Multilateral Security, Goethe University Frankfurt
Frankfurt am Main, 60323
Figure 1: Data flow chart and de-identification techniques, including the Vehicle/Driver, Business Intelligence Provider (B-IP)
and Energy Grid Operator (EGO).
ABSTRACT
Vehicles are becoming interconnected and autonomous while collecting, sharing and processing large amounts of personal and
private data. When developing a service that relies on such data,
ensuring privacy preserving data sharing and processing is one of
the main challenges. Often several entities are involved in these
steps and the interested parties are manifold. To ensure data privacy, a variety of de-identification techniques exist, each exhibiting unique peculiarities that must be considered. In this paper, we show, using the example of a location-based weather prediction service for an energy grid operator, how the different de-identification techniques can be evaluated. With this, we aim to provide a better understanding of state-of-the-art de-identification techniques and
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
CSCS ’21, November 30, 2021, Ingolstadt, Germany
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9139-9/21/11. . . $15.00
https://doi.org/10.1145/3488904.3493380
the pitfalls to consider during implementation. Finally, we find that the optimal technique for a specific service depends highly on the scenario's specifications and requirements.
CCS CONCEPTS
· Security and privacy → Data anonymization and sanitization; Security requirements.
KEYWORDS
privacy, de-identification, anonymization, autonomous vehicles,
automotives, privacy preserving data analysis, data sharing
ACM Reference Format:
Sascha Löbner, Frédéric Tronnier, Sebastian Pape, and Kai Rannenberg. 2021.
Comparison of De-Identification Techniques for Privacy Preserving Data
Analysis in Vehicular Data Sharing. In Computer Science in Cars Symposium
(CSCS ’21), November 30, 2021, Ingolstadt, Germany. ACM, New York, NY,
USA, 11 pages. https://doi.org/10.1145/3488904.3493380
1 INTRODUCTION
The ongoing digitalization of automotive vehicles is partially driven by the desire for autonomous driving. However, already in their current state, vehicles form a swarm of moving sensors, permanently recording various kinds of data. Thus, an obvious idea is to make use of this data for purposes other than (autonomous) driving. For example, the data could be integrated into concepts for smart cities. A report from McKinsey discussed the monetization of car data as early as 2016 [Bertoncello et al. 2016], and recently Inrix announced their data marketplace IQ [1] for anonymized location-based data [Yvkoff 2020].
However, anonymizing location-based data is not an easy task, and failures can easily lead to privacy or compliance violations. If the related data can be used to identify the vehicle, the driver or the passengers, it has to be considered personally identifiable information (PII).
For example, Roth et al. [2020] showed, in the context of individualized insurance models, that a driver could be identified within the group of all users of a vehicle with more than 90% accuracy. Due to the
General Data Protection Regulation (GDPR) [European Parliament
and Council of The European Union 2016], companies need to
provide documentation for the explicit consent of EU citizens if
they want to collect and process PII. This is also in line with a set
of privacy principles by the Alliance of Automobile Manufacturers
(AAM) from 2014 [aam 2014] which encourages affirmative consent
for the collection of sensitive data such as the driver’s biometrics,
geolocation or driver behavior data. Pesé and Shin [2019] provide
an overview of relevant automotive privacy regulations. A report from 2017 [gao 2017] seems to confirm that car manufacturers aim to respect these guidelines.
Getting the driver’s consent may not always be easy for the
car manufacturer since the driver does not need to be the owner
of the car. Thus, de-identification, preventing the identification of
persons from the collected data, avoids the need to ask for consent. While consumers' intention to use a car is still mostly influenced by the perceived benefits, and privacy risks only have a minor influence [Buck and Reith 2020], legal and compliance issues, as sketched above, provide a strong incentive to get the de-identification right.
On the other hand, even though several anonymisation techniques are available and the European Data Protection Board (EDPB)
issued guidelines on the processing of personal data in the context of
connected vehicles including guidance on data anonymisation [edb
2020], a public consultation on the content of these guidelines reiterated the need for clearer guidance with good practices of data
anonymisation [Campmas et al. 2021]. The anonymization process raises several challenges: i) proving that the collected data is properly de-identified, which involves proving that any re-identification is impossible; ii) the average on-the-road lifespan of a vehicle is about 11 years, with roughly 5 additional years needed beforehand to design the vehicle [Gardiner et al. 2021], which requires foresight of almost two decades or a process for regular updates; iii) with several de-identification methods in place [2] and a multitude of possible combinations, it is difficult to select the best or even a proper approach.
Certainly, there is no one-size-fits-all approach; the approach used needs to be aligned with the relevant scenario. In this paper we
aim to identify and evaluate suitable de-identification approaches
for a scenario in which weather data is collected by cars and shared
[1] https://inrix.com/products/inrix-iq/
[2] ISO/IEC 20889:2018 [2018] lists more than 20 different approaches.
with the operator of an energy grid to allow the operator more reliable forecasts for the production of renewable energy (cf. Sect. 4).
The scenario is meant to be a realistic example but the lessons
learned in the analysis are not limited to the specific scenario and
can be transferred to related scenarios as well.
Our contribution is the presentation of an elaborated scenario
description (in Sect. 4) in which collected data is shared with a third
party without the need to ask the driver for consent. Furthermore,
we elicit requirements for the suitable de-identification approaches
based on our threat model (Sect. 5) and then discuss (dis-)advantages
of the considered de-identification approaches (Sect. 7).
2 BACKGROUND AND RELATED WORK
In this section we provide a brief overview of existing de-identification
techniques and related literature.
2.1 De-identification
This section aims to create a comprehensive overview of the current status of academic literature on de-identification methods and techniques. In the following, we provide an overview of the models available in ISO/IEC 20889:2018 and the privacy-preserving machine learning literature.
In general, the de-identification techniques from the ISO/IEC 20889:2018 [2018] are separated into eight major classes. First, statistical tools such as sampling and aggregation. Second, cryptographic tools including deterministic, order-preserving, and Homomorphic Encryption (HE), as well as secret sharing. Third, suppression, including masking, local suppression, record suppression and sampling. Fourth, pseudonymization, aiming to replace original identifying attributes with independent pseudonyms, e.g. using randomization. Fifth, granularization, reducing the granularity of information in attributes with techniques such as rounding or top/bottom coding. Sixth, randomization, modifying attributes randomly utilizing e.g. noise addition, permutation, or micro-aggregation. Seventh, differential privacy, a system to share information from a dataset while withholding information about any single record in that dataset [Dwork et al. 2006]. Eighth, k-anonymity, defining a state where a person cannot be distinguished from k − 1 other persons in a dataset [Samarati and Sweeney 1998].
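As a minimal illustration of the differential privacy idea (the seventh class), the following sketch releases a noisy mean via the Laplace mechanism. The function names, parameter values and clamping bounds are our own illustrative choices, not part of the standard:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_mean(values: list, lower: float, upper: float, epsilon: float) -> float:
    """Release an epsilon-differentially-private mean of bounded values.

    Clamping each value to [lower, upper] bounds the sensitivity of the
    mean at (upper - lower) / n, which calibrates the Laplace noise scale.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    sensitivity = (upper - lower) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon)

# Example: a noisy average of per-vehicle temperature readings.
readings = [18.2, 18.5, 17.9, 18.1, 18.4]
noisy_avg = dp_mean(readings, lower=-30.0, upper=50.0, epsilon=1.0)
```

A smaller epsilon yields stronger privacy but noisier averages; note that the clamping bounds must be chosen independently of the data.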
Especially for machine learning scenarios, Al-Rubaie and Chang [2019] expand that list with secure processors, also known as Trusted Execution Environments (TEEs), which ensure the confidentiality and integrity of the source code. Moreover, in the domain of privacy-preserving machine learning, Secure Multiparty Computation (MPC) is a well-known approach to ensuring data privacy when computing over data from multiple sources [Chen et al. 2019; Li et al. 2020; Reich et al. 2019]. Another technique for privacy-preserving machine learning is Federated Learning (FL), which enables the training of a model on a local device without sharing all data with a central entity [Li et al. 2020; Yang et al. 2019].
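The aggregation step at the heart of FL can be sketched in a few lines (a toy federated-averaging sketch of our own, not the API of any particular FL framework): each vehicle trains locally and shares only model parameters, which the central entity averages.

```python
from statistics import fmean

def federated_average(client_weights: list) -> list:
    """Coordinate-wise average of per-vehicle model parameter vectors.

    Only these parameter vectors leave the vehicles; the raw sensor
    data each local model was trained on never does.
    """
    return [fmean(coord) for coord in zip(*client_weights)]

# Three vehicles each share only the weights of a tiny local model.
updates = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
global_model = federated_average(updates)  # close to [1.0, 1.0]
```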
2.2 Related Work
In the following, we present prior scientific research on de-identification techniques that is of importance to this work.
Gruteser and Grunwald [2003] analyse the anonymous usage of vehicular location-based services, such as fleet management, traffic monitoring or consumption-based car insurance based on telematics. They conclude that the risk of re-identification and location tracking can be reduced using k-anonymous data. A significant amount of research has also been conducted on mobility-related use cases such as VANETs, using k-anonymity [Tomandl et al. 2012; Wang et al. 2019] or HE [Farouk et al. 2020; Raja et al. 2020] to ensure the privacy of vehicles. Pape and Rannenberg [2019] demonstrate how the application of privacy patterns in fog computing environments can improve users' privacy in a smart vehicle use case.
Frank and Kuijper [2020] investigate vehicle users' privacy concerns by evaluating the use of cameras and capacitive proximity sensing in driver assistance systems. As a result of their survey, they find evidence that anonymization by capacitive proximity sensing is preferred. Thereby, they underline the importance of addressing the privacy concerns of vehicle users with sufficient technical solutions. Krontiris et al. [2020] also find that consumers' acceptance of autonomous vehicles depends on a privacy-preserving design that protects against tracking.
Krumm [2007] compares different de-identification techniques for inference attacks on location tracks utilizing an experimental assessment. He claims that the degree of corruption required for noise or rounding is very likely to make location-based services unusable. In his test environment, spatial cloaking based on k-anonymity was only effective within a 2 km radius. Ribaric et al. [2016] review techniques for de-identification of personal identifiers in another context: multimedia contents. They classify personal identifiers into non-biometric, biometric and soft biometric identifiers.
Kumar et al. [2018] review and compare existing de-identification techniques with the aim of protecting personal privacy. In their conclusion they point out that k-anonymity, l-diversity and t-closeness can reduce the risk of personal data disclosure, although they are vulnerable to some privacy attacks. Murthy et al. [2019] also analyze and compare perturbation, anonymization and cryptographic approaches. They conclude that, among the compared techniques, suppression stands out while swapping lags behind due to massive resource consumption. Al-Rubaie and Chang [2019] elaborate on techniques to protect the privacy of users in certain machine learning tasks. Majeed and Lee [2020] provide an overview of de-identification techniques ranging from relational tables to complex social graphs. They classify the techniques into graph modification, generalization/clustering, privacy-aware graph computation, DP approaches and hybrid graph anonymity methods. Moreover, they come to the conclusion that traditional anonymization techniques do not perform well without further improvements. Rao et al. [2018] compare de-identification techniques for large-scale data in third-party data sharing. They come to the conclusion that there is no concrete solution yet. Nevertheless, they see future potential in machine learning-based techniques. Wernke et al. [2014] compare different approaches to protect location privacy. They conclude that the combination of different attacks remains a challenge for the de-identification approaches they analyzed.
Rinaldo and Horeis [2020] present a model to achieve a realistic assessment of autonomous structures considering the relation
between safety and security. However, their approach does not
consider privacy requirements at all.
3 METHODOLOGY
In the following section we explain how we selected and evaluated de-identification techniques for our use case.
3.1 Scenario Development
The scenario was developed in multiple video calls with experts from the Research Association for Automotive Technology (FAT), a department of the German Association of the Automotive Industry (VDA [3]). In total, three scenarios were developed. The iterative procedure consisted of a presentation of the current version of the scenario by the authors of this paper; each presentation was accompanied and followed by feedback from the experts, after which the scenario was revised for the next presentation. Altogether, there were five feedback loops until the scenarios were considered to be mature.
3.2 Requirement Elicitation
To elicit the requirements in section 5, we started by building a threat model for the presented use case (cf. section 4) in collaboration with the experts from FAT. From the related literature we identified potential risks and mapped them to the presented use case. We especially focus on risks for location-based services as, e.g., introduced by Wernke et al. [2014]. A deeper discussion would exceed the scope of this paper, as possible extensions and mathematical definitions would have to be introduced. Again, with this paper we aim to provide a starting point for choosing a de-identification technique when developing a location-based service in vehicular networks. Nevertheless, potential attacks and drawbacks of each technique are elaborated in the results (section 7).
After identifying the potential risks for the vehicle/driver, we elicited the most important requirements the de-identification techniques have to fulfill. From the scenario and the requirements we then derive attributes with which to evaluate the de-identification techniques that meet the requirements.
3.3 Selection of De-identification Techniques
To identify possible techniques we use the ISO/IEC 20889:2018 standard and de-identification literature in the vehicular domain. To select suitable de-identification techniques we apply the scenario definition and requirements; techniques that cannot provide a sufficient level of privacy are excluded.
3.4 Analysis of De-identification Techniques
We present possible implementations of the remaining de-identification techniques and evaluate them using the attributes derived earlier. Finally, we map the results in an overview table.
4 SCENARIO DESCRIPTION
The aim of this use case is to provide a third party, the Energy Grid
Operator (EGO), with accurate and current weather data. This data
is gathered by vehicles on the road within a specific area for which
the EGO needs more, or more accurate, information. Figure 2 depicts
[3] German: Verband der Automobilindustrie e. V., a German interest group of the German automobile industry consisting of automotive manufacturers as well as automobile component suppliers.
Table 1: Communication Channel A: From vehicle to B-IP

Data                  | Privacy sensitivity | Data truthfulness at record level | Frequency
Brightness            | Low                 | No                                | 1/min
Rain                  | Low                 | No                                | 1/min
Temperature           | Low                 | No                                | 1/min
Atmospheric pressure  | Low                 | No                                | 1/min
Humidity              | Low                 | No                                | 1/min
GPS                   | High                | No                                | 1/min
VIN                   | High                | Yes                               | 1/min
Table 2: Communication Channel B: From B-IP to EGO
Figure 2: High-level data flow chart
a high-level overview of the use case while the entities in the use
case are described below in more detail.
4.1 Entities
The entities in this use case are defined as follows:
• E1 Vehicle The vehicle, driving within a certain geographical area, uses multiple sensors to collect live weather information such as brightness, rain and humidity. This information, together with the current vehicle location, is used to provide real-time weather insights as a service to the B-IP. The driver's concern is that no personal data about the driver of a vehicle be shared with the B-IP. The vehicle's sensors create data at a frequency of 60 data points per minute. There are multiple vehicles on the road.
• E2 Vehicle drivers Vehicle drivers can be the owner of a vehicle as well as other individuals such as friends and family members of the vehicle owner. For the technical evaluation of this use case, a differentiation is not necessary. For the sake of simplicity, we will also not differentiate between driver and vehicle.
• E3 Business Intelligence Provider (B-IP) The B-IP is
responsible for analyzing and preparing the data and follows
the need-to-know principle. Thus, the B-IP only receives
data that is mandatory to meet the EGO’s requirements.
• E4 Energy Grid Operator (EGO) The EGO uses the data from the B-IP for energy demand predictions. For this, the EGO requires aggregated data at a frequency of one data point per minute, and the data quality must contain enough information to make reliable energy demand predictions.
• E5 Manufacturer The vehicle manufacturer is the initiator of the use case and receives information on the correct
functioning of the service itself. This may include information on the total amount of data that has been processed and
aggregated statistics on the provided service. The vehicle
manufacturer is not directly involved in the processing and
providing of data and is therefore not further considered in
this scenario. No sensitive data is exchanged between B-IP
and the manufacturer.
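The data-rate reduction implied by E1 and E4 (60 raw readings per minute per vehicle, one aggregated data point per minute for the EGO) can be sketched as a simple windowed aggregation; the record structure and function name here are illustrative assumptions of ours, not a prescribed interface:

```python
from statistics import mean

def aggregate_minute(readings: list) -> dict:
    """Collapse one minute of raw sensor readings into a single record."""
    return {
        "temperature": mean(r["temperature"] for r in readings),
        "humidity": mean(r["humidity"] for r in readings),
        "n_readings": len(readings),
    }

# 60 raw readings per minute (E1) collapse to the one data point
# per minute that the EGO requires (E4).
raw = [{"temperature": 18.0 + i * 0.01, "humidity": 60.0} for i in range(60)]
point = aggregate_minute(raw)
```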
Data                  | Privacy sensitivity | Data truthfulness at record level | Frequency
Brightness            | Low                 | No                                | 1/min
Rain                  | Low                 | No                                | 1/min
Temperature           | Low                 | No                                | 1/min
Atmospheric pressure  | Low                 | No                                | 1/min
Humidity              | Low                 | No                                | 1/min
GPS                   | High                | No                                | 1/min
4.2 Data Flow
In this section we establish the privacy sensitivity and data gathering frequency for the different types of data that are to be used in the use case. The different communication channels are derived from Figure 2. Table 1 shows the data gathered by the vehicle and sent to the B-IP via channel A. Both the EGO and the B-IP place requirements on the data quality and the frequency with which the data is to be provided to them. Table 2 shows the data that arrives at the B-IP and is sent on to the EGO.
4.3 Assumptions
This use case comes with several assumptions:
• A1 Personal data sharing No direct personal data is shared
about the owners and drivers of a vehicle.
• A2 De-Identification An optimal solution provides data
privacy for all data types that occur in this use case.
• A3 Data frequency For its purposes, the EGO requires
the aggregated data once per minute.
• A4 Data quality The data quality after the data collection
is sufficient for the EGO’s purpose.
5 REQUIREMENT ELICITATION
Wernke et al. [2014] claim that one approach to de-identification in location privacy scenarios is hiding the users' identity while only revealing the position of anonymous objects. One of the major threats they identified for this approach is the linking of context information with the anonymized location. Furthermore, according to them, another approach is to only provide location data to customers with a limited accuracy. Moreover, temporal information strongly influences the threat of context information linking [Wernke et al. 2014].
5.1 Threat Model
In this section we present possible attack scenarios for the EGO use case. These form the basis for finding adequate solutions that can withstand such or similar attack scenarios. The threat model is derived from the protection goals of Wernke et al. [2014], which cover: user identity, user position, and temporal information in combination with identity or position.
5.1.1 Exact location determination. This attack tries to reveal the exact location of a vehicle from a perturbed GPS location. This is done by combining the perturbed GPS location with brightness or rain data and an external database containing tunnel data. If all cars in a certain area report and one car does not, one can assume that this car was driving through a tunnel at the time of reporting. Because the number of tunnels in a certain area is limited, the location can be guessed precisely and the perturbation is thereby annulled. Wernke et al. [2014] describe this attack as map matching, in which irrelevant areas are removed until a certain user can be identified.
5.1.2 Vehicle Tracking and Track Localization. Even with perturbed GPS signals, a malicious B-IP can easily track certain vehicles if the speed limits on certain roads are known. This information can be easily accessed via an external database. Although the B-IP does not get the true location, the average speed can be calculated over time and, based on this, possible roads or highways can be identified. A database with traffic information containing traffic jams and accidents can further leverage this attack. Gruteser and Grunwald [2003] claim that privacy problems in vehicular environments are magnified if a service requires continuous recording and sharing of location data.
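The averaging step of this attack can be illustrated in a few lines of code (the coordinates below are fabricated for illustration and the distance function is a standard haversine approximation): even noisy positions reported once per minute yield a stable speed estimate that can be matched against known road speed limits.

```python
import math

def haversine_km(p: tuple, q: tuple) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def average_speed_kmh(track: list, interval_min: float = 1.0) -> float:
    """Average speed over consecutive GPS reports taken interval_min apart."""
    dist = sum(haversine_km(track[i], track[i + 1]) for i in range(len(track) - 1))
    return dist / ((len(track) - 1) * interval_min / 60.0)

# Perturbed positions reported once per minute; the noise largely
# cancels in the average, and a speed near 120 km/h points to a motorway.
track = [(50.000, 8.000), (50.018, 8.001), (50.036, 7.999), (50.054, 8.000)]
estimated_speed = average_speed_kmh(track)
```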
5.1.3 Linkability and Profiling. If a VIN can be unambiguously mapped to a certain vehicle, a malicious B-IP can easily profile that vehicle over time. Although data is sent perturbed and anonymized, the B-IP in this attack tries to identify certain vehicles and create profiles over time.
5.2 Requirements for De-Identification
From the presented threats we derive the following requirements:
• Unlinkability: The B-IP should not be able to identify a certain vehicle, to lower the risk of profiling. This also holds for the EGO, who should likewise not be able to single out a certain vehicle from the crowd.
• Location perturbation: No real GPS data is sent, to decrease the risk of identification. This requirement becomes more difficult to satisfy over time and is closely related to linkability.
• Data quality: The quality of the data should still be high enough to add value to the EGO's energy demand prediction model.
5.3 Attributes for De-identification Techniques
Using the threat model derived in section 5, we identify suitable de-identification techniques that are able to provide an acceptable level of data privacy. For each technique, the level of effort and the quality of results are determined. We focus on a qualitative evaluation based on academic literature, without the use of real data. The proposed de-identification techniques are evaluated on the following aspects, which we derived from the requirements and the scenario:
• Protective effect The overall level of privacy that can be
achieved through the de-identification technique in the particular use case. An optimal solution is able to protect personal information against any attack scenario outlined in
this work.
• Complexity Complexity describes the overall complexity
to develop, implement and maintain a particular solution.
Oftentimes, a de-identification technique cannot simply be
put to work but requires careful fine-tuning towards the
specific type and frequency of gathered data as well as the
desired output. Additionally, techniques and their algorithms
need to interact with the environment in which they are
implemented.
• Runtime Runtime describes the time that the overall solution for a use case needs to perform all necessary tasks that
lead to the de-identification of data. This includes the actual
runtime of algorithms, the execution of code, and the gathering and distributing of data and results between different
entities.
• Degree of maturity The degree of maturity describes the scientific and commercial advancement of a de-identification technique. While some techniques are already used regularly, others may not yet be suitable for commercial use.
• Implementation effort The overall effort that needs to be
taken to implement the solution for a specific use case. This
includes the provision, installation and fine-tuning of hardware and software for the specific entities as well as the time
and human resources that are needed for its implementation.
• Monetary cost Monetary cost includes the cost of development and procurement of all necessary hard- and software for each use case. Possible interfering factors are use-case-specific circumstances and factors that might hinder the performance, effectiveness and efficiency of the de-identification technique.
The quality of the data after applying the de-identification techniques is evaluated on the following aspects:
• Time blur Time blur depicts the degree to which data loses information related to a specific point in time. For example, data gathered over a period of time might be aggregated to a single data point; the time-related information gets lost, resulting in time blur.
• Time delay Time delay depicts the delay with which data
is reported and can be acted upon. That is, data might be
collected continuously but loses its value as the computation
of results takes significant time, resulting in a time delay
that decreases the value of created insights.
• Location obfuscation Location obfuscation depicts the degree of obfuscation applied in a specific scenario. Location
data might for instance be aggregated on a street, city or
kilometer basis.
• Processing speed describes the execution time of the de-identification technique itself.
• Aggregated data Aggregated data describes a state in which data gathered during a use case is aggregated, so that a loss of information occurs. While most scenarios allow for some aggregation, as the amount of data produced is high, more aggregation is likely to decrease the usability of a de-identification technique.
• Truthfulness Truthfulness describes whether input data and output data are equal when using a de-identification technique. Techniques may report non-truthful data when data is perturbed, noise is added or the sequence of data is changed. Less truthful output can decrease the validity of insights generated in a use case.

The combined evaluation of the different aspects described above enables us to make a statement on the overall suitability and usability of a de-identification technique for a particular use case. For each use case, a table is provided comparing all suitable de-identification techniques against each other. Factors are ranked as Low, Medium and High, whereby a color code using red, yellow and green indicates the positive or negative effect. For instance, a technique may score High on complexity, resulting in a red color code, as a high degree of complexity is not seen as favorable.
6 SELECTION OF DE-IDENTIFICATION TECHNIQUES
We aim to evaluate the de-identification techniques identified in section 2.1 against the EGO use case. Thus, this section includes the technical evaluation of suitable de-identification techniques.

Upon assessing the assumptions and requirements of the EGO use case, all de-identification methods introduced in section 2.1 were evaluated for their fit for the use case. Only de-identification methods that could initially demonstrate a sufficient level of privacy are discussed in detail below. In general, Wernke et al. [2014] focus on three dimensions to evaluate de-identification techniques: user identity, user position, and identity/position in combination with time. They claim that a common approach for de-identification in location privacy scenarios is hiding the users' identity while only revealing the position of anonymous objects. One major threat they identified for this approach is the linking of context information with the anonymized location. Furthermore, they claim that to keep a user's position secret, location data should only be provided to customers with a limited accuracy. From their point of view, temporal information strongly influences the threat of context information linking [Wernke et al. 2014]. In the following, we list the methods that have been excluded, together with a brief statement as to why they are deemed not suitable for the EGO use case.
Sampling does not provide privacy protection for the sampled subset of data and relies on a very large sample size, which is unlikely to be available for vehicles driving in rural areas.
Considering the cryptographic approaches, deterministic encryption is not suitable for weather data, as only a very limited set of values occurs, which makes re-identification possible. Order-preserving encryption is also not suitable for our use case because the order of the data is not important. In contrast, HE can provide the B-IP with information about the weather in a certain area while the data of each individual vehicle remains encrypted. Thus, we include HE for further consideration. MPC, which can also be treated as a cryptographic approach, likewise fulfills the requirements of section 5.
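As an illustration of the MPC idea, additive secret sharing lets aggregators combine shares into a sum (e.g. for an area average) without any single party seeing an individual vehicle's reading. This is a toy sketch over integers modulo a public prime, with names of our own choosing, not a production MPC protocol:

```python
import random

PRIME = 2_147_483_647  # public modulus, larger than any possible sum

def share(secret: int, n_parties: int) -> list:
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % PRIME

# Each vehicle encodes its temperature in tenths of a degree Celsius
# and distributes one share to each of three aggregators.
readings = [182, 185, 179]
all_shares = [share(r, 3) for r in readings]
# Each aggregator only ever sees one random-looking share per vehicle;
# combining the aggregators' partial sums reveals only the total.
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
total = reconstruct(partial_sums)  # 546, i.e. an average of 18.2 °C
```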
In general, suppression does not fulfill the requirements because removing certain values will significantly decrease the data quality and therefore violate the requirements [Krumm 2007]. Nevertheless, suppression can very well be used in combination with other techniques. For example, removing direct identifiers of vehicles and removing unique values are essential for some of the de-identification approaches mentioned later.
Similar to suppression, pseudonymization would only work for identifiers such as a vehicle ID in our dataset. Pseudonymization of, e.g., location or temperature would have a strong negative impact on the data quality.
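Pseudonymizing an identifier such as the VIN can be sketched with a keyed hash (HMAC); the key value and truncation length below are illustrative assumptions, not a recommendation from this paper.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # held only by the data controller

def pseudonymize_vin(vin: str) -> str:
    """Deterministic keyed pseudonym for a vehicle identifier."""
    return hmac.new(SECRET_KEY, vin.encode(), hashlib.sha256).hexdigest()[:16]

# The same vehicle always maps to the same pseudonym, so records stay
# linkable for the service while the real VIN is hidden.
p1 = pseudonymize_vin("WVWZZZ1JZXW000001")
p2 = pseudonymize_vin("WVWZZZ1JZXW000002")
assert p1 == pseudonymize_vin("WVWZZZ1JZXW000001") and p1 != p2
```

Note that a deterministic pseudonym still allows a malicious B-IP to profile a vehicle over time (cf. section 5.1.3); rotating the key periodically breaks this linkability across rotation periods.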
Generalization is suitable for sensor data of type float, e.g., brightness, rain or temperature data. While rounding is very easy to implement, top/bottom coding lacks a useful definition of a threshold for the weather data. Moreover, rounding and top/bottom coding are too weak for state-of-the-art GPS de-identification or make the data unusable for location-based services [Krumm 2007].
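Rounding of float-valued sensor attributes can be sketched as follows; the precision choices are our own illustrative assumptions, and the GPS line demonstrates the utility problem noted by Krumm [2007]:

```python
def generalize(record: dict) -> dict:
    """Reduce granularity: round weather floats, coarsen GPS heavily."""
    out = dict(record)
    out["temperature"] = round(record["temperature"])    # whole degrees
    out["humidity"] = round(record["humidity"] / 5) * 5  # 5-percent bins
    # One decimal degree is roughly an 11 km grid: strong enough to hide
    # a position, but likely too coarse for a location-based service.
    out["lat"] = round(record["lat"], 1)
    out["lon"] = round(record["lon"], 1)
    return out

sample = {"temperature": 18.4, "humidity": 63.0, "lat": 50.1109, "lon": 8.6821}
coarse = generalize(sample)  # {'temperature': 18, 'humidity': 65, 'lat': 50.1, 'lon': 8.7}
```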
Similar to the techniques above, randomization alone either does not protect against location tracking or corrupts the data in a way that makes it useless for location-based services [Krumm 2007]. Also, the permutation of data does not work for a trajectory: the route of a vehicle could still be identified.
TEEs require a trusted third party for their setup. Since
that body would be considered to be the data controller, this results
in several problems. First, it is unclear how assurances that the
TEE fulfills its purpose could be conveyed to the driver. Second,
the legal classification of responsibility for the TEE is not yet fully
clarified, and thus it remains unclear if data processing within the
TEE can contribute to the de-identification of data. TEEs seem to
be more appropriate to guarantee the correctness and freshness of
the data. However, a full assessment of TEEs is beyond the scope
of this paper.
DP must be considered in several dimensions. While central
DP allows the B-IP to see the data before the de-identification,
local DP might exhibit problems in the frequency of data sharing.
Nevertheless, the Encode, Shuffle, Analyze (ESA) architecture proposed by Bittau et al. [2017] is able to overcome these issues. Also, location perturbation with local DP as presented by Andrés et al. [2013] is useful for the use case.
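A stdlib-only sketch of the planar Laplace mechanism behind geo-indistinguishability is shown below. It assumes locations are given in planar metre coordinates (Andrés et al. [2013] additionally discretize the reported location), and the parameter values are illustrative:

```python
import math
import random

def planar_laplace_noise(eps: float) -> tuple[float, float]:
    """Sample 2D noise with density proportional to exp(-eps * ||z||).

    The radius of the planar Laplace distribution follows a
    Gamma(2, 1/eps) law, i.e. the sum of two exponential variables.
    """
    r = random.expovariate(eps) + random.expovariate(eps)
    theta = random.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)

def perturb_location(x_m: float, y_m: float, eps: float) -> tuple[float, float]:
    """Report a perturbed location achieving eps-geo-indistinguishability."""
    dx, dy = planar_laplace_noise(eps)
    return x_m + dx, y_m + dy
```

With, e.g., eps = 0.01 per metre, the expected displacement is 2/eps = 200 m, which still allows coarse weather mapping while hiding the exact position.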
K-anonymity is a comparatively simple concept that is easy to implement, utilizing techniques such as data suppression or generalization to create a k-anonymous dataset. Moreover, k-anonymity provides a trade-off between usability and privacy, so the level of de-identification and the data quality have to be evaluated independently for each scenario [Andrew et al. 2019; Wang et al. 2020].
In FL, the data of each vehicle stays local and only the local
models’ gradients are shared with the B-IP. In theory it is possible
to design a model that fulfills the requirements.
To put it in a nutshell, the de-identification techniques evaluated as initially suitable are HE, MPC, distributed DP, FL and k-anonymity.
7 ANALYSIS OF DE-IDENTIFICATION TECHNIQUES
In this section we analyze and compare the different privacy preserving data analysis approaches identified as suitable in the previous section.
7.1 Homomorphic Encryption
As explained previously, the advantage of HE is that data can be processed while it is encrypted, guaranteeing that computations on the encrypted data lead to the same results as the same computations on the decrypted data.
In the HE de-identification approach, the EGO distributes a secret
key to the vehicles on the road (see Figure 3). The vehicles that
collect the weather data then use their key to homomorphically
encrypt their location and weather data. The data is then distributed
to the B-IP. The B-IP is now able to process the data as determined
by the EGO. Meta data is deleted and average weather and location
data is sent to the EGO. All these operations are performed on encrypted data; the B-IP is therefore unable to gain insights into the vehicles' actual locations and other provided information. At the same time, operations on the encrypted data correspond to the same operations on the underlying plaintext data. The EGO is now able to use its secret key to decrypt the data and use the results for the intended purpose.
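The additive aggregation step can be illustrated with a toy Paillier cryptosystem. This is only a sketch: the primes are far too small for any real security, and a deployment would use a hardened HE library with proper key sizes:

```python
import math
import random

# Toy Paillier keypair with tiny primes (illustrative only).
p, q = 1789, 1861                      # distinct primes
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                   # valid because we use g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Vehicles encrypt their readings (temperature in tenths of a degree);
# the B-IP multiplies the ciphertexts, which adds the plaintexts.
readings = [42, 39, 45, 40]            # 4.2 C, 3.9 C, ...
cipher_sum = 1
for m in readings:
    cipher_sum = (cipher_sum * encrypt(m)) % n2
# Only the key holder (here: the EGO) can decrypt the aggregate.
assert decrypt(cipher_sum) == sum(readings)  # 166, i.e. an average of 4.15 C
```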
Figure 3: HE data flow chart

Nonetheless, HE has several drawbacks. Although the technique itself has been available for some time, its actual usefulness is still hindered by the loss of performance and computational speed. Only a limited number of different operations, e.g., addition and subtraction, can be computed, while runtime increases greatly with the number of computations. However, research on HE algorithms continues to improve runtime, making HE a suitable solution for mobility-related use cases in the near future. Additionally, the technique does not rely on the number of vehicles on the road and does not decrease the actual usability and truthfulness of the data.

7.2 Secure Multiparty Computation
MPC can be realized using a map that is separated into different clusters, e.g., with a grid (see Figure 4). Vehicles in each of these clusters calculate the average energy demand of a certain cluster with secure multiparty computation. One vehicle of each cluster is then chosen as the cluster leader that sends the computed result to the B-IP. To avoid an identification of certain vehicles by the B-IP, shuffling between the cluster leaders is also possible. In general, this can also have a positive effect on the minimum cluster size because it can be larger if the vehicles cannot be linked to a certain cluster. Nevertheless, the more vehicles in a cluster, the better the accuracy of the weather information and the better the de-identification, because a single vehicle can be hidden more easily in the crowd. Finally, the B-IP receives only the average cluster data. Based on this data, e.g., a heatmap with the weather information can be derived that is then shared with the EGO.

Figure 4: MPC data flow chart

One approach for vehicular MPC communication is provided by Li et al. [42], who propose a cooperative control strategy incorporating efficient MPC, reducing latency and integrating a function secret sharing scheme.
First, one interfering factor for this de-identification technique is the vehicle density required per cluster. In case not enough vehicles are located in a certain cluster, no information can be calculated and sent to the B-IP. Second, a stable connection between the cars is required to use the MPC protocol. Third, the communication between the vehicles is likely to produce a huge overhead, so that besides good network coverage, a minimum bandwidth is mandatory.

7.3 K-anonymity

Figure 5: K-anonymity data flow chart

K-anonymity in itself is not a de-identification technique but a property with which data privacy in a database might be measured. ISO/IEC 20889:2018 defines k-anonymity as a formal privacy measurement model ensuring that an equivalence class in a database contains at least k records that are similar for each identifier.
In the EGO use case, the objective of vehicles is to obfuscate their exact location and ensure that weather information cannot be used for location inference. For k-anonymity, a map is clustered into various mix points whereby each mix point fulfills the k-anonymity requirement (see Figure 5). In the EGO use case, the map represents the area in which vehicles are to gather weather information. This
area is divided into mix points to increase the accuracy of information. The work of Corser et al. [2016a] introduces multiple different
protocols to create mix points such as stationary mix points, mix
points occurring at irregular time intervals or randomly chosen
mix points that may occur regularly or irregularly. An additional
option would be that vehicles themselves create mix points and act
as group leaders of other vehicles, thereby managing the fulfillment
of k-anonymity and the data distribution behavior of a group of
vehicles. Within such a mix point, whose center may for instance
be an intersection, vehicles switch pseudo IDs with other vehicles
and/or are added to an anonymity set and do not communicate
information for a specific time period. Essentially, the work uses the
de-identification techniques of suppression and pseudonymization
to achieve k-anonymity. Additionally, such a model could be enhanced by adding further simple de-identification techniques such
as aggregation, noise addition or permutation to it. Such options
would enhance privacy at the cost of a loss of quality of service
as the usefulness of the data decreases. In [Corser et al. 2016b], the authors decided against the use of such options because anonymizing, for instance through spatial cloaking, cannot effectively protect against tracking over time and leads to less precise results. Dummifying was not used because false location data might lead to accidents, as the authors' use case was to provide relevant safety traffic data
to other vehicles through a central service. However, in our use
case, exact location data is not as important as in other use cases as
the weather might not differ strongly in a 500m radius. Time delay
might also be acceptable to an extent, as weather will not change
significantly within 5 minutes.
Therefore, a combination of simple de-identification techniques that fulfills k-anonymity is seen as a suitable alternative for the EGO use case. In any case, the protective effect of this solution will not be as high as that of more advanced methods such as MPC.
Multiple factors affect the level of privacy that can be obtained:
A lower vehicle density results in a lower K-value and a lower
level of privacy. The topology, e.g., the number of roads, and the speed of travel influence privacy, as fewer roads lead to less privacy.
Similarly, the choice and design of mixing points, depending on the
chosen protocol, need to be matched with such factors.
Complexity of the model is low while the runtime again depends
on the choice of techniques and protocols used. Such protocols
however already exist, creating mature solutions that could be
implemented quickly and at low monetary cost. As elaborated, data
may be sent from each vehicle or aggregated between vehicles.
Data could include dummy variables, resulting in non-truthful data.
Depending on the number of vehicles in a mix point and on the road, the usefulness of the data might change. Fewer vehicles require larger mixing points, an increase in location obfuscation, and possibly a time delay in order to ensure privacy.
Overall, while k-anonymity-based solutions might provide a
cheap solution that can be implemented easily, data quality and
the achievable level of privacy greatly depend on topology and the
number of vehicles within an area.
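Whether a set of generalized reports actually satisfies k-anonymity can be checked directly. The attribute names and values below are illustrative assumptions:

```python
from collections import Counter

def k_anonymity(records: list[dict], quasi_ids: list[str]) -> int:
    """Return the k of a dataset: the size of the smallest equivalence
    class over the quasi-identifiers (in the sense of ISO/IEC 20889)."""
    classes = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return min(classes.values()) if records else 0

# Generalized reports: location coarsened to a mix-point ID and time
# coarsened to 5-minute windows (hypothetical attributes).
reports = [
    {"mix_point": "MP-7", "window": "08:00", "temp_c": 4.0},
    {"mix_point": "MP-7", "window": "08:00", "temp_c": 4.5},
    {"mix_point": "MP-7", "window": "08:00", "temp_c": 3.5},
    {"mix_point": "MP-9", "window": "08:00", "temp_c": 5.0},
    {"mix_point": "MP-9", "window": "08:00", "temp_c": 5.5},
]
print(k_anonymity(reports, ["mix_point", "window"]))  # 2
```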
7.4 Distributed Differential Privacy
For this de-identification technique we utilize the system architecture Encode, Shuffle, Analyze (ESA) proposed by Bittau et al. [2017] to implement distributed DP (see Figure 6). In general, the
architecture consists of three entities, an encoder, a shuffler and an
analyzer, as seen above. In the following we will have a detailed
look at the tasks of each entity in our concrete scenario with the
EGO.
Encoder: The encoder is responsible for ensuring the fulfillment
of the user’s trust assumptions by locally transforming and conditioning the user’s private data [Bittau et al. 2017]. In our EGO
use case one of these transformations is the location perturbation
providing local DP as proposed by Andrés et al. [2013] by fulfilling
the requirement of geo-indistinguishability. Moreover, the encoder
is responsible for the encryption of the data with an inner and
outer encryption, and the transmission over a secure channel to the shuffler. As explained above, the encoder entity is placed on the user's device; in the EGO use case, we place the encoder in the car.

Figure 6: DP data flow chart
Shuffler: The shuffler acts as an additional privacy layer between the user's encoder and the analyzer and should be run by a trusted third party. The shuffler is responsible for the anonymization, shuffling, thresholding, and batching of the data received from
the encoder. By decrypting the outer encryption, the shuffler can
access the metadata of a user, e.g., timestamps, source IP addresses,
routing paths. The main task of the shuffler is to remove all this data
before forwarding it to the analyzer. To prevent the reassignment
of the data by the analyzer to a certain user, the data are reordered
randomly and forwarded infrequently and only in batches. Moreover, the shuffler can also set thresholds and reject data items to
ensure that each item can be hidden in a sufficient crowd.
Analyzer: The analyzer is responsible for the innermost decryption, storing and aggregation of the data received from the shuffler.
The analyzer utilizes techniques such as DP to make the data available for other groups of interest without revealing private user
information. In the EGO use case this role is taken by the B-IP. The
B-IP uses the data received from the shuffler to create a weather
map that is sent to the EGO.
The biggest issue of this approach is car density; it appears if only few cars are in a certain location. As a result, a single car cannot be hidden sufficiently in the crowd and the shuffler has to delay or withdraw the forwarding of certain batches. Therefore, a minimum number of cars per region is required. Moreover, the number of cars is influenced by the area topology and the time of day. In a
scenario where the EGO wants to make assumptions on the required
network load, e.g. for vehicular charging, the absence of data in a
certain region would point to a very low electricity demand. The average demand for an area could then be approximated based on historic results or derived as a function of the minimum number of cars.
7.5 Federated Learning
Similar to the MPC framework, FL as a de-identification technique can be realized by dividing the map into grid-based clusters. The vehicles in the clusters then share data with each other (see Figure 7). To ensure de-identification in vehicle-to-vehicle communication, an MPC protocol can be utilized. Besides the grid approach, the clusters can also be determined utilizing the cars' communication radius, similar to the approach of Yin et al. [2020]. In both cases, the cars will have to communicate with each other to determine a leader of each cluster. In case the weather parameters in a certain cluster did not change, the leader will not participate in the current
round of training to keep the traffic as low as possible. To fulfill the requirement of unlinkability, a distributed shuffling protocol as proposed by Cheu et al. [2019] between all leaders of the clusters can be used to delete metadata and shuffle the data between the leaders. Location perturbation of the leaders within the clusters could also be helpful. The leaders participating in a training round are then responsible for sending the locally derived model to the B-IP. The B-IP cannot link the received data to the sender because the data was shuffled before and metadata was deleted. In each training round, e.g., every minute, the B-IP receives updated models from the leaders. These models are then used to develop a new central model. This model is then sent to the EGO and also distributed to all cars. A possible extension to keep the traffic low is to determine the new leader for a certain round in advance and only use the leader's data in that round. The leader could still exchange data with other cars in the cluster, but the model resides only with the leader. In the future, further experiments are required to build the most efficient model.

Figure 7: FL data flow chart

In the literature, some similar approaches have already been proposed. Saputra et al. [2019] propose a FL model for energy demand prediction for electric vehicle networks, but compared to our approach they utilize the information gathered from the charging stations. On the one hand, they use a FL model with the aim to reduce the communication overhead between the charging stations and the central server. On the other hand, they protect the data of the vehicle users by only transmitting relevant information in the form of parameter updates to the central server rather than sending whole data sets. Liu et al. [2020] present a traffic flow prediction scheme using location-based clustering in combination with a FL approach. In their approach they collect the information from organizations (e.g., a bus stop or station) while randomly selecting only a defined ratio of organizations from a larger group in each round of training. Yin et al. [2020] propose a Federated Localization (FedLoc) framework with the aim to build accurate location services without revealing sensitive user information. They propose a cloud-based network infrastructure that is based on many non-overlapping clusters. These clusters are defined by the mobile communication range of a mobile terminal, e.g., 5G macro and micro base stations and WiFi 6 networks that can enable a high communication rate.
The biggest pitfalls for the FL approach are vehicle density and network coverage. A minimum number of vehicles is required to form a cluster; otherwise no information can be sent to the B-IP. Also, a lot of communication is required for distributed learning approaches, thus sufficient network coverage is mandatory.
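The aggregation of the leaders' models at the B-IP can be sketched as a weighted parameter average (FedAvg-style); flat parameter vectors and weighting by cluster size are our simplifying assumptions:

```python
def federated_average(leader_models: list[list[float]],
                      weights: list[int]) -> list[float]:
    """Aggregate cluster leaders' model parameters into a new central
    model via a weighted average (FedAvg-style aggregation step)."""
    total = sum(weights)
    dim = len(leader_models[0])
    return [sum(w * m[i] for m, w in zip(leader_models, weights)) / total
            for i in range(dim)]

# Three cluster leaders report local model parameters, weighted by the
# number of vehicles that contributed in each cluster.
models = [[0.2, 1.0], [0.4, 0.8], [0.3, 0.9]]
counts = [10, 30, 20]
central = federated_average(models, counts)
```

The central model would then be redistributed to all cars for the next training round, as described above.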
7.6 Comparison of De-Identification Technologies
In Table 3 we provide an overview of the attribute evaluation for all de-identification techniques. These results are derived from the technique-specific analyses. In Table 4 we summarize all interfering factors. It is important to mention that an "x" only indicates that the de-identification technique is sensitive to small occurrences of this interfering factor. For larger occurrences, all de-identification techniques are affected. For example, if there is only one vehicle, all de-identification techniques will struggle.
Table 3: Aggregated results of de-identification techniques

Attribute          | HE       | SMC      | Distr. DP | FL       | K-anon.
Protective effect  | ⊕ High   | ⊕ High   | ⊕ High    | ⊕ High   | ⊙ Medium
Complexity         | ⊖ High   | ⊖ High   | ⊖ High    | ⊖ High   | ⊕ Low
Runtime            | ⊖ High   | ⊖ High   | ⊖ High    | ⊖ High   | ⊙ Medium
Degree of maturity | ⊙ Medium | ⊙ Medium | ⊕ High    | ⊙ Medium | ⊕ High
Implement. effort  | ⊖ High   | ⊖ High   | ⊖ High    | ⊖ High   | ⊕ Low
Monetary cost      | ⊙ Medium | ⊖ High   | ⊖ High    | ⊖ High   | ⊕ Low
Time blur          | ⊖ High   | ⊙ Medium | ⊕ Low     | ⊙ Medium | ⊙ Medium
Location obfus.    | ⊕ Low    | ⊖ High   | ⊕ Low     | ⊕ Low    | ⊖ High
Processing speed   | ⊖ Low    | ⊖ Low    | ⊕ High    | ⊕ High   | ⊙ Medium
Time delay         | ⊙ Medium | ⊖ High   | ⊙ Medium  | ⊙ Medium | ⊙ Medium
Aggregated data    | Yes      | Yes      | Yes       | Yes      | Yes
Truthfulness       | Yes      | No       | Yes¹      | Yes      | No

¹ No for GPS
Table 4: Possible interfering factors

Interfering factor | HE | SMC | Distr. DP | FL | K-anon.
Communic. overhead | x  | x   |           | x  |
Network coverage   | x  | x   | x         | x  | x
Area topology      |    | x   | x         | x  | x
Car density        |    | x   | x         | x  | x
Car speed          |    | x   |           |    | x

8 DISCUSSION
In this section we provide a better understanding of the results,
including impact, limitations and future work.
8.1 Impact
When comparing the different solutions for the energy grid operator use case, we find that all advanced de-identification techniques are able to provide a high level of privacy for individuals and vehicles. However, all solutions are relatively complex and most of
them require further research or an extension to mitigate some of
the drawbacks, such as communication overhead or computational
costs. Although the de-identification techniques are very different,
they all exhibit the trade-off between usability of the data provided
to the EGO and the de-identification of the vehicle/driver. In practice, this trade-off will be complicated by specific project restrictions
such as costs, project duration or expected service lifetime.
For example, while a solution based on k-anonymity offers the least amount of privacy protection, it is easily implementable and cheap, with an acceptable data output for the EGO. On the one hand, distributed DP and FL are both more complex solutions, but on the
other hand, they provide more fine-grained insights as the data
quality remains higher. Very accurate results can also be achieved
with HE, but the calculation on encrypted data might be slow and
require much more resources. Another drawback occurs if every vehicle communicates directly with the B-IP. As with FL, this can be compensated, e.g., by implementing data processing at the edge that aggregates results before they are sent to the B-IP for further processing. Nevertheless, this also has drawbacks because the complexity of the network topology will further increase and reveal more targets for attacks. In general, distributed de-identification techniques like FL have the advantage that data is processed directly on the device. This decreases the communication overhead and the computational effort at the B-IP. The B-IP can also not be affected by hacking attacks in which large amounts of data are stolen, because such data simply does not exist. Nevertheless, FL is a relatively new technology, so the absence of know-how might highly influence the decision which technology to choose.
The external factors such as vehicle density, traveling speed and
network coverage, identified for each de-identification technique
are likely to significantly influence the stable execution of each use
case. Therefore, these problems should be considered as a systematic risk to the scenario that requires some effort to compensate. For
example, traffic flow simulations could be used to verify solutions
by combining simulated traffic scenarios with actual vehicle data.
In the scenario description we defined the manufacturer as a passive entity that monitors the scenario. In practice, the manufacturer
will initiate more than one service, and some of these services will
also require the transmission of sensitive data. This is the reason
why we have not excluded the manufacturer from the beginning.
8.2 Limitations
One major pitfall of the proposed de-identification techniques is
the theoretical approach that was used to evaluate the techniques.
Although advantages and disadvantages of each technique were identified, they should be understood more as a general guideline. E.g., the real performance of a technique can only be tested in practice using real vehicular data and including all interfering factors that have an impact on data quality, delay of service, effort and costs.
Although not considered in the analysis so far, the knowledge
and past experience with a certain de-identification technique in
the implementing body can have a huge impact on the cost decision
and implementation effort.
Another pitfall to consider is the legal assessment of each evaluated technique. For example, HE is a key-based approach: although the keys are kept secret, there is the chance that a vehicle's key is stolen. Also, the privacy guarantee of DP is only a mathematical construct and not a standardized method. Including different entities, such as the shuffler, and minimum batch sizes, the underlying mathematical construct changes or can, in the worst case, only be approximated.
8.3 Future Work
As mentioned in the limitations, the analysis of de-identification
techniques is missing a technical approach to identify possible
pitfalls during the implementation. In the future, we will work on
a technical comparison, e.g., using simulation or in-depth technical
evaluation. Moreover, while machine learning is becoming more
relevant and the computation on client devices solves the problem
of communication overhead and data leakage, more work with
the focus on de-identification approaches for distributed learning
techniques should be carried out.
9 SUMMARY AND CONCLUSION
In this paper we provide clarity on the relevant techniques for the
de-identification of location-based services in the automotive area.
Focusing on the demand of developers with a similar scenario in
particular, we aim to provide decision support for the selection
of a suitable de-identification technique. To achieve this, we have
analyzed third-party vehicular data sharing using the example of the EGO scenario. For this scenario we have identified the following privacy threats: exact location determination, vehicle tracking and track localization, and linkability and profiling. Based on
these threats, we elicited requirements that helped us to select
de-identification techniques from ISO/IEC 20889:2018 and further
literature on vehicular de-identification techniques. We identified
the five techniques homomorphic encryption, secure multiparty computation, distributed differential privacy, federated learning and k-anonymity, which we compared on the basis of a number of relevant attributes. We find that no strategy dominates the others, because all techniques provide increased privacy but differ strongly in the other attributes. Our contribution is the elaboration of these attributes per technique. Based on our scenario, we
provided possible topologies of the de-identification techniques
and explained the relation and communication between the related
entities. We also analyzed the potential computational effort of each
entity and possible pitfalls for the de-identification techniques.
Our evaluation of de-identification techniques has shown that within each de-identification technique, different approaches to calculate the privacy gain exist. E.g., for differential privacy, standardized methods would make the comparison of techniques much easier.
We also find that most de-identification techniques highly depend
on the network coverage. With a high bandwidth and stable connections, the bottleneck of communication overhead can also be
reduced. Moreover, we conclude that to keep costs low, privacy
has to be considered from the beginning to ensure that the offered
service is at the same time efficient and privacy preserving.
Finally, our results build a starting point for choosing a suitable de-identification technique for a vehicular location data sharing scenario. As our evaluation of the attributes for the de-identification techniques is only based on the current literature, we plan a technical evaluation as a next step.
ACKNOWLEDGMENTS
The authors are grateful to the Forschungsvereinigung Automobiltechnik e.V. (FAT e.V.), who funded this research. We are in particular grateful to FAT's working group "AK 31 Elektronik und Software", who not only initiated this research but also provided input and feedback on the underlying report [Rannenberg et al. 2021] in various meetings.
REFERENCES
2014. Consumer Privacy Protection Principles ś PRIVACY PRINCIPLES FOR VEHICLE
TECHNOLOGIES AND SERVICES. https://cryptome.org/2014/11/auto-privacyprinciples.pdf
2017. Vehicle Data Privacy ś Industry and Federal Efforts Under Way, but NHTSA
Needs to Define Its Role. https://www.gao.gov/assets/gao-17-656.pdf
2020. Guidelines 1/2020 on processing personal data in the context of connected
vehicles and mobility related applications. https://edpb.europa.eu/sites/default/
files/consultation/edpb_guidelines_202001_connectedvehicles.pdf
Mohammad Al-Rubaie and J Morris Chang. 2019. Privacy-preserving machine learning:
Threats and solutions. IEEE Security & Privacy 17, 2 (2019), 49ś58.
Miguel E. Andrés, Nicolás E. Bordenabe, Konstantinos Chatzikokolakis, and Catuscia
Palamidessi. 2013. Geo-indistinguishability: Differential privacy for location-based
systems. In Proceedings of the ACM Conference on Computer and Communications
Security. https://doi.org/10.1145/2508859.2516735 arXiv:1212.1984
J. Andrew, J. Karthikeyan, and Jeffy Jebastin. 2019. Privacy Preserving Big Data Publication On Cloud Using Mondrian Anonymization Techniques and Deep Neural
Networks. In 2019 5th International Conference on Advanced Computing Communication Systems (ICACCS). 722ś727. https://doi.org/10.1109/ICACCS.2019.8728384
Michele Bertoncello, Gianluca Camplone, Paul Gao, Hans-Werner Kaas, Detlev Mohr,
Timo Möller, and Dominik Wee. 2016. Monetizing car dataÐnew service business
opportunities to create new customer benefits. McKinsey & Company (2016).
Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard
Seefeld. 2017. PROCHLO: Strong Privacy for Analytics in the Crowd. In SOSP
2017 - Proceedings of the 26th ACM Symposium on Operating Systems Principles.
https://doi.org/10.1145/3132747.3132769 arXiv:1710.00901
Christoph Buck and Riccardo Reith. 2020. Privacy on the road? Evaluating German
consumers’ intention to use connected cars. International Journal of Automotive
Technology and Management 20, 3 (2020), 297ś318.
Alexandra Campmas, Nadina Iacob, Felice Simonelli, and Hien Vu. 2021. Big Data and
B2B platforms: the next big opportunity for Europe ś Report on market deficiencies
and regulatory barriers affecting cooperative, connected and automated mobility.
Valerie Chen, Valerio Pastro, and Mariana Raykova. 2019. Secure computation for
machine learning with SPDZ. arXiv preprint arXiv:1901.00329 (2019).
Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. 2019.
Distributed differential privacy via shuffling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics). https://doi.org/10.1007/978-3-030-17653-2_13 arXiv:1808.01394
George P Corser, Huirong Fu, and Abdelnasser Banihani. 2016a. Evaluating location privacy in vehicular communications and applications. IEEE transactions on
intelligent transportation systems 17, 9 (2016), 2658ś2667.
George P. Corser, Huirong Fu, and Abdelnasser Banihani. 2016b. Evaluating Location
Privacy in Vehicular Communications and Applications. IEEE Transactions on
Intelligent Transportation Systems 17, 9 (2016), 2658ś2667. https://doi.org/10.1109/
TITS.2015.2506579
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating
noise to sensitivity in private data analysis. In Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/11681878_14
European Parliament and Council of The European Union. 2016. REGULATION (EU)
2016/679 General Data Protection Regulation (GDPR). http://eur-lex.europa.eu/
legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=DE
Fifi Farouk, Yasmin Alkady, and Rawya Rizk. 2020. Efficient privacy-preserving scheme
for location based services in vanet system. IEEE Access 8 (2020), 60101ś60116.
Sebastian Frank and Arjan Kuijper. 2020. Privacy by Design: Survey on Capacitive
Proximity Sensing as System of Choice for Driver Vehicle Interfaces. In Computer
Science in Cars Symposium. 1ś9.
Michael Gardiner, Alexander Truskovsky, George Neville-Neil, and Atefeh Mashatan.
2021. Quantum-safe Trust for Vehicles: The race is already on. Queue 19, 2 (2021),
93ś115.
Marco Gruteser and Dirk Grunwald. 2003. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international
conference on Mobile systems, applications and services. 31ś42.
ISO/IEC 20889:2018. 2018. Privacy enhancing data de- identification terminology and
classification of techniques. INTERNATIONAL STANDARD (2018).
Ioannis Krontiris, Kalliroi Grammenou, Kalliopi Terzidou, Marina Zacharopoulou,
Marina Tsikintikou, Foteini Baladima, Chrysi Sakellari, and Konstantinos Kaouras.
2020. Autonomous Vehicles: Data Protection and Ethical Considerations. In Computer Science in Cars Symposium. 1ś10.
John Krumm. 2007. Inference attacks on location tracks. In International Conference on
Pervasive Computing. Springer, 127ś143.
Atul Kumar, Manasi Gyanchandani, and Priyank Jain. 2018. A comparative review
of privacy preservation techniques in data publishing. In 2018 2nd International
Conference on Inventive Systems and Control (ICISC). IEEE, 1027ś1032.
Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. 2020. Federated
Learning: Challenges, Methods, and Future Directions. IEEE Signal Processing
CSCS ’21, November 30, 2021, Ingolstadt, Germany
Magazine (2020). https://doi.org/10.1109/MSP.2020.2975749 arXiv:1908.07873
Yi Liu, James J.Q. Yu, Jiawen Kang, Dusit Niyato, and Shuyu Zhang. 2020. PrivacyPreserving Traffic Flow Prediction: A Federated Learning Approach. IEEE Internet of
Things Journal (2020). https://doi.org/10.1109/JIOT.2020.2991401 arXiv:2003.08725
Abdul Majeed and Sungchang Lee. 2020. Anonymization techniques for privacy
preserving data publishing: A comprehensive survey. IEEE Access (2020).
Suntherasvaran Murthy, Asmidar Abu Bakar, Fiza Abdul Rahim, and Ramona Ramli.
2019. A comparative study of data anonymization techniques. In 2019 IEEE 5th Intl
Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference
on High Performance and Smart Computing (HPSC) and IEEE Intl Conference on
Intelligent Data and Security (IDS). IEEE, 306–309.
Sebastian Pape and Kai Rannenberg. 2019. Applying Privacy Patterns to the Internet
of Things’ (IoT) Architecture. Mobile Networks and Applications (MONET) – The
Journal of SPECIAL ISSUES on Mobility of Systems, Users, Data and Computing 24, 3
(06 2019), 925–933. https://doi.org/10.1007/s11036-018-1148-2
Mert D Pesé and Kang G Shin. 2019. Survey of Automotive Privacy Regulations and
Privacy-Related Attacks. (2019).
Gunasekaran Raja, Sudha Anbalagan, Geetha Vijayaraghavan, Priyanka Dhanasekaran,
Yasser D. Al-Otaibi, and Ali Kashif Bashir. 2020. Energy-Efficient End-to-End
Security for Software Defined Vehicular Networks. IEEE Transactions on Industrial
Informatics 3203, c (2020), 1–1. https://doi.org/10.1109/tii.2020.3012166
Kai Rannenberg, Sebastian Pape, Frederic Tronnier, and Sascha Löbner. 2021. Study
on the Technical Evaluation of De-Identification Procedures for Personal Data in
the Automotive Sector. Technical Report. Goethe University Frankfurt. https://doi.org/10.21248/gups.63413
P Ram Mohan Rao, S Murali Krishna, and AP Siva Kumar. 2018. Privacy preservation
techniques in big data analytics: a survey. Journal of Big Data 5, 1 (2018), 1–12.
Devin Reich, Ariel Todoki, Rafael Dowsley, Martine De Cock, and Anderson CA
Nascimento. 2019. Privacy-preserving classification of personal text messages with
secure multi-party computation: An application to hate-speech detection. arXiv
preprint arXiv:1906.02325 (2019).
Slobodan Ribaric, Aladdin Ariyaeeinia, and Nikola Pavesic. 2016. De-identification
for privacy protection in multimedia content: A survey. Signal Processing: Image
Communication 47 (2016), 131–151.
Rhea C Rinaldo and Timo F Horeis. 2020. A Hybrid Model for Safety and Security
Assessment of Autonomous Vehicles. In Computer Science in Cars Symposium. 1–10.
Christian Roth, Sebastian Aringer, Johannes Petersen, and Mirja Nitschke. 2020. Are
sensor-based business models a threat to privacy? the case of pay-how-you-drive
insurance models. In International Conference on Trust and Privacy in Digital Business.
Springer, 75ś85.
P Samarati and L Sweeney. 1998. Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement Through Generalization and Suppression. Proceedings of the IEEE Symposium on Research in Security and Privacy (1998).
Yuris Mulya Saputra, Dinh Thai Hoang, Diep N. Nguyen, Eryk Dutkiewicz, Markus Dominik Mueck, and Srikathyayani Srikanteswara. 2019. Energy demand prediction with federated learning for electric vehicle networks. In 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Proceedings. https://doi.org/10.1109/GLOBECOM38437.2019.9013587
Andreas Tomandl, Florian Scheuer, and Hannes Federrath. 2012. Simulation-based evaluation of techniques for privacy protection in VANETs. In 2012 IEEE 8th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). IEEE, 165–172.
Jinbao Wang, Zhipeng Cai, and Jiguo Yu. 2020. Achieving Personalized k-Anonymity-Based Content Privacy for Autonomous Vehicles in CPS. IEEE Transactions on
Industrial Informatics 16, 6 (2020), 4242ś4251. https://doi.org/10.1109/TII.2019.
2950057
Marius Wernke, Pavel Skvortsov, Frank Dürr, and Kurt Rothermel. 2014. A classification
of location privacy attacks and approaches. Personal and Ubiquitous Computing 18, 1 (2014), 163–175.
Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine
learning: Concept and applications. ACM Transactions on Intelligent Systems and
Technology (2019). https://doi.org/10.1145/3298981
Feng Yin, Zhidi Lin, Yue Xu, Qinglei Kong, Deshi Li, Sergios Theodoridis, and Shuguang Cui. 2020. FedLoc: Federated learning framework for data-driven cooperative localization and location data processing. IEEE Open Journal of Signal Processing (2020). https://doi.org/10.1109/ojsp.2020.3036276 arXiv:2003.03697
Liane Yvkoff. 2020. The Success Of Autonomous Vehicles Hinges On Smart Cities. Inrix Is Making It Easier To Build Them. Forbes. https://www.forbes.com/sites/lianeyvkoff/2020/10/28/the-success-of-autonomous-vehicles-hinges-on-smart-cities-inrix-is-making-it-easier-to-build-them/