Open Access Article

A Telemetric Framework for Assessing Vehicle Emissions Based on Driving Behavior Using Unsupervised Learning

Auwal Sagir Muhammad ^1,2,*, Cheng Wang ^1,2 and Longbiao Chen ^1,2,*

^1 Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, Xiamen 361005, China

^2 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Xiamen University, Xiamen 361005, China

^* Authors to whom correspondence should be addressed.

Vehicles 2024, 6(4), 2170-2194; https://doi.org/10.3390/vehicles6040106

Submission received: 28 October 2024 / Revised: 9 December 2024 / Accepted: 13 December 2024 / Published: 20 December 2024

(This article belongs to the Special Issue Sustainable Traffic and Mobility)

Abstract

Urban vehicular emissions, a major contributor to environmental degradation, demand accurate methodologies that reflect real-world driving conditions. This study presents a telemetric data-driven framework for assessing emissions of Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx) in real-world scenarios. By utilizing Vehicle Specific Power (VSP) calculations, Gaussian Mixture Models (GMMs), and Ensemble Isolation Forests (EIFs), the framework identifies high-risk driving behaviors and maps high-emission zones. Achieving a Silhouette Score of 0.72 for clustering and a precision of 0.88 in anomaly detection, the study provides actionable insights for policymakers to mitigate urban emissions. Spatial–temporal analysis highlights critical high-emission areas, offering strategies for urban planners to reduce environmental impacts. The findings underscore the potential of interventions such as speed regulation and driving behavior modifications in lowering emissions. Future extensions of this work will include hybrid and electric vehicles, alongside the integration of granular environmental factors like weather conditions, to enhance the framework’s accuracy and applicability. By addressing the complexities of real-world emissions, this study contributes to bridging significant knowledge gaps and advancing sustainable urban mobility solutions.

Keywords: vehicle emissions; telemetric data; Vehicle Specific Power (VSP); unsupervised learning; environmental impact

1. Introduction

The rapid expansion of urbanization and increasing vehicular traffic have significantly impacted the quality of urban air due to vehicular emissions, which degrade environmental sustainability. Cars, as the primary contributors to urban pollution, emit toxic compounds such as Carbon Monoxide (CO), Nitrogen Oxides (NOx), and hydrocarbons. These pollutants are a leading cause of air pollution, contributing to health issues, climate change, and environmental degradation [1]. According to the Euro emission standards, transport-related air pollutant emissions contribute significantly to the overall state of air quality in Europe, with Nitrogen oxides (NOx), Carbon monoxide (CO), Hydrocarbons (HCs), and Particulate matter (PM) as the key pollutants that they aim to reduce [2]. Traditional methods of vehicle emission assessment often rely on static models and laboratory tests, which fail to capture the dynamic nature of real-world driving conditions.

Historically, emissions were measured using standardized protocols like the Federal Test Procedure (FTP) and the Worldwide Harmonized Light Vehicles Test Procedure (WLTP) [3,4]. While these laboratory-based tests provided a consistent basis for regulatory compliance, they struggled to replicate the variability of actual driving environments, such as fluctuating traffic conditions and diverse road types [5,6]. As a result, the discrepancies between laboratory results and on-road emissions have become increasingly evident [7]. With the advent of telemetric data, there is an unprecedented opportunity to enhance the accuracy and scope of vehicle emission assessments by analyzing real-time data on vehicle behavior [8]. Telemetric data capture detailed real-time information on vehicle operation and provide a continuous stream of data on factors like speed, acceleration, and road conditions, which are crucial for a more accurate emissions model and help overcome these limitations [9]. However, collecting labeled telemetric data is expensive, labor-intensive, and time-consuming, making it difficult to build a sufficiently large training set.

To address these challenges, we introduce a telemetric data-driven framework that leverages real-time vehicle behavior data for assessing vehicle emissions. Our study focuses on three key pollutants that significantly impact air quality: Nitrogen Oxides (NOx), Carbon Monoxide (CO), and Hydrocarbons (HCs). These pollutants are among the most harmful emissions produced by the transportation sector, and they are regulated under the Euro emission standards due to their detrimental effects on human health and the environment [10]. By targeting these pollutants, our research seeks to mitigate urban air pollution and contribute to global sustainability efforts. Specifically, we aim to analyze driving patterns, identify high-emission zones, and propose effective mitigation strategies. The framework we propose integrates telemetric data with unsupervised learning techniques, including Gaussian Mixture Models (GMMs) and Ensemble Isolation Forests (EIFs), to deliver a more comprehensive and actionable understanding of emissions. This study also aligns with EURO 6 emission standards, offering practical solutions to improve air quality by addressing exhaust emissions [11,12]. Specifically, the main contributions of this work are threefold.

Telemetric Framework with Unsupervised Learning: We propose an advanced telemetric data-driven framework that integrates unsupervised learning techniques, such as Gaussian Mixture Models (GMMs) and Ensemble Isolation Forests (EIFs), to accurately assess vehicle emissions under real-world driving conditions. This approach allows for the identification of high-risk driving patterns and the detection of anomalies, providing a more nuanced understanding of emissions than traditional methods.
VSP-Based Emissions Estimation: We introduce a method for estimating emissions based on Vehicle Specific Power (VSP) calculations, which allows for the measurement of pollutant levels under varying driving conditions. This method provides a more granular understanding of emissions, factoring in the instantaneous performance of vehicles during different driving scenarios.
Scalable Framework: Our framework is designed to be scalable and adaptable to various urban environments. Through the integration of case study insights on speed-limit enforcement, we demonstrate how the framework can directly inform policy by identifying high-emission zones and providing actionable data. The findings indicate that speed regulation, as an intervention strategy, can significantly reduce emissions in urban areas. This highlights the framework’s potential to support urban planners and policymakers in implementing targeted, evidence-based strategies for improving air quality and mitigating environmental impacts.

The remainder of this paper is structured as follows. Section 2 reviews the related work on vehicle emission assessment and the role of telemetric data in enhancing emission models. Section 3 outlines the proposed methodology, detailing the data preprocessing, feature engineering, clustering, emission estimation, and anomaly detection techniques used in this study. Section 4 presents the experimental setup and results, highlighting the effectiveness of the proposed framework. Section 5 presents the discussion and, finally, Section 6 concludes the paper with a summary of our contributions and suggestions for future work.

2. Related Works

2.1. Driving Behavior and Emission

Recent research has increasingly focused on the complex relationship between vehicular emissions, driving behavior, and emerging technologies, particularly emphasizing how these factors affect urban air quality. Significant contributions have been made in exploring how driving patterns, road conditions, and fuel types influence emissions. At the same time, emerging technological innovations, especially in machine learning, hold promise for mitigating these emissions.

Driving behavior is crucial in determining the level of vehicular emissions. Aggressive driving behaviors are characterized by rapid acceleration, high speeds, and frequent deceleration and are known to significantly increase the emissions of pollutants such as CO, CO₂, nitrogen oxides (NOx), hydrocarbons (HCs), and particulate matter (PM). For example, aggressive driving has been proven to result in a 56% increase in CO₂ emissions and a 15% increase in NOx emissions compared to more moderate driving [13]. Studies on diesel trucks have similarly shown that aggressive driving can more than double emissions of PM and particle-bound Polycyclic Aromatic Hydrocarbons (PAHs) [13]. Road conditions have also been proven to significantly affect emissions. Steeper road grades increase energy consumption and emissions, particularly CO₂, NOx, and PM, while descending grades can slightly reduce emissions, though not enough to offset the increases on positive slopes [14]. Driver experience is another key determinant of emissions. Novice drivers tend to produce higher levels of NOx and PM compared to experienced drivers, highlighting the potential benefits of driver training and behavioral interventions [15]. Eco-driving, which focuses on smoother acceleration and deceleration, has been shown to mitigate these effects, reducing both fuel consumption and emissions, especially among inexperienced or aggressive drivers [14].

Recent advancements in fuel technology have also contributed to reducing emissions. For example, high-detergency fuels have been shown to reduce CO, NOx, and HC emissions by preventing the formation of harmful deposits in engine components. A study on detergent-enhanced gasoline demonstrated a 5.1% reduction in fuel consumption and a 55.4% decrease in CO emissions [16]. Additionally, retrofitting vehicles with alternative fuel systems, such as Liquefied Petroleum Gas (LPG), has also proven effective in reducing emissions of PM and NOx [17]. Machine-Learning (ML) and predictive models are playing an increasingly important role in enhancing emissions management. Zhang’s (2023) Deep-Learning-Based Vehicle Emission Model (DPVEM-DGD) allows for more accurate predictions of emissions under varying conditions, such as fuel detergency [16]. This model facilitates more precise emissions regulation and real-time monitoring, aligning with policies like the Euro emission standards [17]. Traditional models such as MOVES and IVE are now being augmented or replaced by ML techniques, offering more dynamic and effective solutions for addressing real-world driving conditions and fuel variations [18].

Urban planning initiatives, particularly smart city technologies, offer considerable potential in reducing vehicular emissions. Real-time emissions monitoring systems and eco-routing navigation are examples of how data-driven decision-making can optimize traffic flow, reduce congestion, and lower pollutant levels [19]. Additionally, Automated Vehicles (AVs) programmed for optimal driving behaviors could reduce emissions by as much as 26% under certain conditions [19]. Behavioral interventions, such as eco-driving programs and real-time feedback systems, have been proven to significantly reduce emissions. Real-time systems that alert drivers to inefficient behaviors, like rapid acceleration or hard braking, can quickly improve fuel efficiency and pollutant output [18].

The integration of advanced fuel technologies, eco-driving initiatives, and machine-learning applications offers a comprehensive approach to mitigating vehicular emissions. Addressing both technological and behavioral factors enhances our understanding of emissions dynamics and provides actionable solutions for reducing urban air pollution. Future research should continue to explore the synergies between these approaches, emphasizing their real-world applicability and cost-effectiveness in achieving sustainable transportation systems.

2.2. Vehicle Emissions and Environmental Impact

The environmental impact of vehicle emissions has been a significant concern in environmental and transportation research for several decades. The primary pollutants emitted from vehicles—such as carbon dioxide (CO₂), nitrogen oxides (NOx), carbon monoxide (CO), Hydrocarbons (HCs), and particulate matter—are known to contribute substantially to air pollution, climate change, and public health risks [20]. CO₂, the dominant greenhouse gas, directly contributes to global warming, while HCs, NOx, and CO contribute to the formation of ground-level ozone and smog, exacerbating respiratory conditions like asthma and other lung diseases in urban populations [21,22].

Several studies have quantified the scale of this issue. For example, the Intergovernmental Panel on Climate Change (IPCC) has consistently highlighted the transportation sector as one of the largest contributors to greenhouse gas emissions globally, accounting for approximately 24% of direct CO₂ emissions from fuel combustion [23]. In urban areas, particularly in regions with high traffic density, transportation-related emissions can account for up to 70% of the local air pollution, with vehicle exhausts releasing a mixture of harmful gases and fine particulate matter that are linked to various health problems [24]. The effects of vehicle emissions extend beyond air quality and human health; they also have adverse ecological impacts. NOx and other pollutants contribute to acid rain, which damages forests, lakes, and buildings [25]. Additionally, these emissions play a role in nutrient pollution in coastal ecosystems, leading to phenomena such as eutrophication, which negatively affects marine biodiversity [26]. Studies such as those conducted by the EEA (European Environment Agency) have also noted the alarming rate of vehicle emissions’ contribution to premature deaths, particularly in densely populated areas [27].

Addressing the issue of vehicle emissions has prompted numerous policy interventions over the years. Regulatory frameworks like the Euro standards in Europe [28] and CAFE standards [29] in the U.S. have been instrumental in setting emissions limits and encouraging the adoption of cleaner vehicle technologies, such as Electric Vehicles (EVs) and hybrid cars. However, while these regulations have led to improvements, research indicates that technological advancements alone may not be sufficient to meet stringent climate goals. A comprehensive understanding of real-world driving conditions and behaviors, which heavily influence emission levels, remains essential to achieving sustainable reductions in vehicle emissions.

2.3. Vehicle Emission Assessment Through Telemetric Data

Vehicle emission assessment has evolved significantly over the years, transitioning from traditional laboratory-based methods to more advanced data-driven approaches. Historically, emissions were measured using standardized testing protocols such as the Federal Test Procedure (FTP) and the Worldwide Harmonized Light Vehicles Test Procedure (WLTP) [3,4]. These standardized methods are designed to measure emissions under controlled conditions, providing a consistent basis for regulatory compliance and vehicle certification. However, these traditional approaches face significant limitations, particularly in their ability to replicate the complexities of real-world driving [5,30]. Laboratory tests often fail to account for varying traffic conditions, road gradients, and diverse driving behaviors [31]. This discrepancy between controlled tests and actual on-road emissions has led to a growing recognition of the need for more representative methods of emission assessment that can better reflect the dynamic nature of vehicle operation in everyday conditions.

In response to these limitations, telemetric data have emerged as a valuable resource for enhancing vehicle emission modeling. Telemetric systems collect real-time data on various aspects of vehicle operation, including speed, acceleration, engine load, and road conditions [32]. These data provide a detailed and continuous record of how a vehicle is driven in real-world scenarios, offering a richer and more accurate basis for emission modeling. Studies, such as [33,34,35], have utilized telemetric data to develop modal emission models that account for the instantaneous operating conditions of vehicles. By capturing the variability in driving patterns and environmental conditions, telemetric data enable more precise emission estimates. They also offer a pathway to developing dynamic, real-time emission models better aligned with actual driving behaviors, thereby addressing the shortcomings of traditional laboratory-based assessments. Telemetric data, which involve the real-time collection of vehicle-specific information such as speed, acceleration, fuel consumption, and engine performance, have become increasingly valuable in the study of vehicle emissions. Unlike laboratory-based testing or traditional methods that rely on static models, telemetric data offer insights into how vehicles perform under actual driving conditions, providing a more accurate picture of emissions over time and across different contexts [33].

One of the most notable applications of telemetric data in emission studies is their use in Vehicle Specific Power (VSP) models [36,37]. VSP is a measure of the power demand placed on a vehicle’s engine at any given moment, considering factors like speed, acceleration, road grade, and air resistance. By analyzing telemetric data through VSP models, researchers can estimate vehicle emissions in real-time and under varying driving conditions. This method has been shown to outperform traditional static models, as it accounts for the dynamic nature of driving, which is often influenced by traffic, road conditions, and driver behavior. A study by Wong et al. was one of the first to highlight the utility of telemetric data in real-world emissions modeling [34]. Their work showed that vehicle emissions fluctuate significantly with changes in driving conditions and that laboratory tests did not adequately capture these variations. Subsequent research has built upon these findings, with studies demonstrating that telemetric data can help identify high-emission driving behaviors, such as rapid acceleration and deceleration, which are typically overlooked in standardized testing environments [38]. The advent of telematics has significantly advanced large-scale emission monitoring, enabling researchers and policymakers to track emissions across entire vehicle fleets and over extended periods. For instance, by analyzing driving behavior within specific urban contexts, telematics can identify driving patterns and weather conditions that contribute to elevated emissions [39,40]. This information is critical for developing targeted strategies aimed at reducing emissions in particular areas or during specific times. Furthermore, telematics data facilitate the transition to cleaner and more sustainable transportation by providing insights that assist in minimizing fuel consumption and emissions [41]. Driving behavior has been shown to significantly impact both fuel efficiency and emissions.

3. Methodology

This study employs a structured, multi-phase approach to analyze telemetric and environmental data in order to assess vehicle emissions and to identify high-emission driving patterns and areas for potential mitigation strategies. The methodology is divided into four interconnected phases: Data Preparation and Feature Engineering, Driver Behavior Clustering, Emission Estimation, and Anomaly Detection, as shown in Figure 1. The methodology addresses limitations in traditional laboratory-based emissions assessments, which often fail to capture the dynamic nature of real-world driving behavior and environmental variability [5,6]. Vehicle Specific Power (VSP) was selected as the primary metric for analyzing driving behavior because it effectively quantifies the relationship between a vehicle’s power demand and emissions under varying driving conditions [42]. This choice is supported by its widespread use in emissions modeling and its ability to reflect instantaneous driving dynamics [43,44]. Additionally, Gaussian Mixture Models (GMMs) were employed for clustering due to their ability to identify distinct driving patterns, including high-risk behaviors, within complex and heterogeneous datasets [45,46,47]. This method enables the differentiation of aggressive driving patterns strongly associated with pollutant spikes. To detect anomalies indicative of extreme emission events, the Ensemble Isolation Forest (EIF) was utilized, as it excels in identifying outliers in high-dimensional datasets with precision and efficiency [48].

3.1. Data Collection and Preprocessing

We utilized the CityTrek-14K dataset [49], a distinctive and extensive telemetric dataset comprising trajectories from 280 drivers, each contributing 50 trajectories, across three major U.S. cities: Philadelphia (PA), Atlanta (GA), and Memphis (TN). This dataset includes time series data capturing details such as timestamps, vehicle speeds, and GPS coordinates, with a collection frequency of 1 Hz. Additionally, we collected a road network map from OpenStreetMap (OSM) (https://www.openstreetmap.org/ accessed on 8 August 2024) and extracted from it a road segment dataset that includes features like segment coordinates, segment length, distance, and POI density, among others. Weather data were also collected from OpenWeatherMap (https://openweathermap.org/ accessed on 8 August 2024), featuring attributes such as timestamp, temperature, humidity, and precipitation. Table 1 shows the description of the datasets used. These datasets were preprocessed to address missing values, encode categorical variables, and normalize numerical features. The preprocessed datasets were then fused into a comprehensive telemetric dataset through feature-level fusion, involving both spatial and temporal integration, as shown in Figure 2. Temporal fusion was achieved by aligning telemetry and weather data via timestamps, ensuring temporal coherence. Spatial fusion involved geometric operations to map telemetry data to corresponding road segments, thereby incorporating spatial context into the analysis.
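The temporal fusion step can be illustrated with a minimal sketch. The source states only that telemetry and weather records are aligned via timestamps, so the backward as-of join used here (each telemetry row receives the latest weather observation at or before its timestamp) is an assumption, as are the function name `attach_weather` and the field names `ts`, `speed`, and `temp`.

```python
from bisect import bisect_right

def attach_weather(telemetry, weather):
    """Backward as-of join (assumed alignment rule, not confirmed by the source).

    Each telemetry row gets the fields of the most recent weather row whose
    timestamp is at or before the telemetry timestamp.
    """
    weather = sorted(weather, key=lambda w: w["ts"])
    weather_ts = [w["ts"] for w in weather]
    fused = []
    for row in telemetry:
        idx = bisect_right(weather_ts, row["ts"]) - 1  # last weather row <= ts
        merged = dict(row)
        if idx >= 0:
            obs = weather[idx]
            merged.update({k: v for k, v in obs.items() if k != "ts"})
        fused.append(merged)
    return fused

# Hypothetical records: timestamps in seconds, hourly weather observations.
telemetry = [{"ts": 100, "speed": 5.0}, {"ts": 3700, "speed": 6.0}]
weather = [{"ts": 0, "temp": 10.0}, {"ts": 3600, "temp": 12.0}]
fused = attach_weather(telemetry, weather)
```

Telemetry rows that precede the first weather observation are kept without weather fields, so a later imputation step would still be needed for them.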

3.2. Feature Engineering

Feature engineering is a crucial step that involves transforming raw telemetric data into more representative features for emissions estimation. The primary features engineered in this phase are road grade, Vehicle Specific Power (VSP), and driving behaviors. These metrics provide insights into the vehicle’s performance and environmental conditions, directly influencing emissions outputs.

3.2.1. Road Grade

Road grade represents the incline or decline of the road surface, measured as the ratio of elevation change to horizontal distance. It is a key determinant of vehicle performance, as steeper inclines require more engine power, leading to higher emissions, while declines might reduce power requirements but could also lead to increased braking. The grade is typically expressed as a percentage:

$\text{Road Grade} = \frac{\text{Elevation}_{End} - \text{Elevation}_{Start}}{\text{Distance}} \times 100$

(1)

where the numerator is the difference in elevation between two consecutive GPS points and the distance is the horizontal distance between these two points. A positive road grade indicates an uphill segment, while a negative grade indicates a downhill segment. The computed road grade matters because it significantly influences a vehicle’s fuel consumption and emissions: steeper grades increase engine load, leading to higher emissions of pollutants such as CO₂, NOx, HCs, and CO.
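As a minimal sketch of Equation (1), the functions below compute the horizontal distance between two GPS points with the haversine formula and then the grade in percent. The source does not say how elevation or distance are obtained, so the per-point elevation input, the haversine choice, and the `min_distance_m` guard against near-zero distances are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius used by the haversine formula

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def road_grade_percent(elev_start_m, elev_end_m, distance_m, min_distance_m=1.0):
    """Equation (1): elevation change over horizontal distance, in percent.

    Returns 0.0 when the points are closer than min_distance_m (an assumed
    guard), because GPS noise dominates the ratio at very short distances.
    """
    if distance_m < min_distance_m:
        return 0.0
    return (elev_end_m - elev_start_m) / distance_m * 100.0
```

For example, a 5 m climb over 100 m of horizontal travel gives a grade of 5%.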

3.2.2. Vehicle Specific Power (VSP) Calculation

Vehicle Specific Power (VSP) is a more complex feature that quantifies the power demand on a vehicle during its operation. It considers various factors like speed, acceleration, and road grade, making it a highly informative metric for emissions estimation. VSP is a vital metric for understanding a vehicle’s energy efficiency and is used to estimate pollutant emissions. The formula for calculating VSP is

$VSP = \frac{v \times (a + g \times \sin(θ)) + \text{Resistance Terms}}{m}$

(2)

where v is the vehicle speed (m/s), a is the vehicle acceleration (m/s²), g is the gravitational constant (9.81 m/s²), θ is the road grade (in radians), and m is the vehicle’s mass (kg).

The VSP feature is particularly useful because it allows emissions to be estimated more precisely based on the vehicle’s dynamic state (speed and acceleration) and the external environment (road grade). Higher VSP values typically correspond to aggressive driving (e.g., rapid acceleration, driving uphill), which results in higher pollutant emissions.
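Equation (2) does not spell out the resistance terms, so the sketch below swaps in the widely used light-duty approximation attributed to Jiménez-Palacios, VSP ≈ v(1.1a + 9.81 sin θ + 0.132) + 0.000302 v³, in kW/tonne. This is an assumption and not necessarily the parameterization used in this study. The coefficients are already mass-normalized, which is why the mass m does not appear in the code.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def vsp_kw_per_tonne(speed_mps, accel_mps2, grade_percent=0.0):
    """Approximate light-duty VSP in kW/tonne (assumed generic coefficients).

    1.1      : inertia factor for rotating masses
    0.132    : rolling-resistance term
    0.000302 : aerodynamic-drag term
    """
    theta = math.atan(grade_percent / 100.0)  # road grade as an angle in radians
    kinematic = speed_mps * (1.1 * accel_mps2 + G * math.sin(theta) + 0.132)
    aerodynamic = 0.000302 * speed_mps ** 3
    return kinematic + aerodynamic
```

With these coefficients, cruising at 10 m/s on a flat road gives roughly 1.6 kW/tonne, while braking at 2 m/s² from the same speed gives a negative value, which matches the interpretation of negative VSP as deceleration or downhill driving.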

3.2.3. Event-Based Features

Event-based features are designed to capture specific driving behaviors that contribute to traffic risk, which include flags or counts for specific driving events like harsh braking, rapid acceleration, or overspeeding. These features indicate aggressive driving behavior and correlate with higher emissions and increased accident risk.

Harsh Braking: This feature identifies instances where the vehicle experiences sudden and significant deceleration. Harsh braking is detected when the deceleration exceeds a predefined threshold, $a_{threshold}$ . The equation is

$\text{harsh\_braking}(t) = \begin{cases} 1 & \text{if } \frac{Δ v}{Δ t} < - a_{threshold} \\ 0 & \text{otherwise} \end{cases}$

(3)

where $\frac{Δ v}{Δ t}$ is the rate of change of velocity over time and $a_{threshold}$ is the predefined threshold for deceleration.
Rapid Acceleration: Rapid acceleration is identified when the vehicle’s acceleration exceeds a specific threshold, $a_{threshold}^{+}$ . The equation is

$\text{rapid\_acceleration}(t) = \begin{cases} 1 & \text{if } \frac{Δ v}{Δ t} > a_{threshold}^{+} \\ 0 & \text{otherwise} \end{cases}$

(4)

where $\frac{Δ v}{Δ t}$ is the rate of change of velocity over time and $a_{threshold}^{+}$ is the predefined threshold for acceleration.
Overspeeding: This feature identifies instances where the vehicle exceeds the legal speed limit for the road segment. It is calculated as

$\text{overspeeding}(t) = \begin{cases} 1 & \text{if } v(t) > v_{limit} \\ 0 & \text{otherwise} \end{cases}$

(5)

where $v (t)$ is the vehicle’s speed at time t and $v_{limit}$ is the speed limit of the road segment.
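The three event flags in Equations (3) to (5) can be sketched as follows. The source does not report the threshold values, so the 3.0 m/s² defaults, the function name `event_flags`, and the choice to set the first sample's rate of change to zero are assumptions made only for illustration.

```python
def event_flags(speeds_mps, speed_limit_mps, dt_s=1.0,
                brake_threshold=3.0, accel_threshold=3.0):
    """Per-sample harsh-braking, rapid-acceleration, and overspeeding flags.

    speeds_mps      : speed samples in m/s at a fixed interval dt_s
    speed_limit_mps : speed limit of the road segment in m/s
    Thresholds are hypothetical defaults in m/s^2, not values from the study.
    """
    flags = []
    for i, v in enumerate(speeds_mps):
        # Finite-difference estimate of dv/dt; undefined for the first sample.
        dv_dt = 0.0 if i == 0 else (v - speeds_mps[i - 1]) / dt_s
        flags.append({
            "harsh_braking": int(dv_dt < -brake_threshold),      # Equation (3)
            "rapid_acceleration": int(dv_dt > accel_threshold),  # Equation (4)
            "overspeeding": int(v > speed_limit_mps),            # Equation (5)
        })
    return flags
```

Counting the flags per trip would give the event counts mentioned above; a single shared speed limit is used here, whereas in practice the limit would come from the matched road segment.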

3.2.4. Pattern-Based Features

Recurring patterns are identified through time series analysis of trip data. Rolling averages and variances of speed and acceleration help capture changes in driving behavior over time. For instance, the rolling acceleration mean and standard deviation over a trip can reveal smooth versus erratic driving patterns. These metrics are calculated to capture the variability and consistency of driving behavior over time.

Speed Variance per Trip: This feature measures the variability in the vehicle’s speed during a trip, indicating how much the speed fluctuates. It is calculated as

$σ_{speed}^{2} = \frac{1}{N} \sum_{i = 1}^{N} {(v_{i} - \bar{v})}^{2}$

(6)

where $v_{i}$ is the speed at time i, $\bar{v}$ is the mean speed over the trip, and N is the number of observations in the trip.
Rolling Acceleration Mean: This feature calculates the mean acceleration over a rolling window, providing insights into how the vehicle’s acceleration changes over time. It is given by

$μ_{acceleration} (t) = \frac{1}{W} \sum_{i = t - W + 1}^{t} a_{i}$

(7)

where W is the window size, t is the current time step, and $a_{i}$ is the acceleration at time i.
Rolling Acceleration Standard Deviation: This feature measures the standard deviation of acceleration within a rolling window, indicating the variability in acceleration over time. It is calculated as

$σ_{acceleration} (t) = \sqrt{\frac{1}{W} \sum_{i = t - W + 1}^{t} {(a_{i} - μ_{acceleration} (t))}^{2}}$

(8)

where W is the window size, t is the current time step, $a_{i}$ is the acceleration at time i, and $μ_{acceleration} (t)$ is the rolling mean acceleration.
Turning Rate: This feature measures the rate of change in bearing between consecutive data points, indicating sharp turns:

$\text{turning\_rate} = | \text{bearing}_{t} - \text{bearing}_{t - 1} |$

(9)
Stop-and-Go Behavior: This feature is detected when the vehicle’s speed falls below 5 km/h and then increases:

$\text{stop\_and\_go} = \begin{cases} 1 & \text{if speed} < 5 \text{ km/h and speed\_change} > 0 \\ 0 & \text{otherwise} \end{cases}$

(10)
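The pattern-based features in Equations (6) to (10) can be sketched in plain Python. Several details are assumptions rather than choices documented in the source: early rolling windows use however many samples are available, the turning rate wraps around 360° (Equation (9) is a plain absolute difference), and the stop-and-go flag tests the current speed against the 5 km/h threshold.

```python
import math

def speed_variance(speeds):
    """Equation (6): population variance of speed over one trip."""
    mean = sum(speeds) / len(speeds)
    return sum((v - mean) ** 2 for v in speeds) / len(speeds)

def rolling_mean_std(accels, window):
    """Equations (7) and (8): rolling mean and standard deviation of acceleration."""
    out = []
    for t in range(len(accels)):
        chunk = accels[max(0, t - window + 1): t + 1]  # shorter at the start
        mu = sum(chunk) / len(chunk)
        var = sum((a - mu) ** 2 for a in chunk) / len(chunk)
        out.append((mu, math.sqrt(var)))
    return out

def turning_rate(bearing_prev_deg, bearing_deg):
    """Equation (9) with an assumed wrap-around, so 359° to 1° counts as 2°."""
    diff = abs(bearing_deg - bearing_prev_deg) % 360.0
    return min(diff, 360.0 - diff)

def stop_and_go(speeds_kmh, threshold_kmh=5.0):
    """Equation (10): flag samples below the threshold while speed is rising."""
    if not speeds_kmh:
        return []
    flags = [0]  # no speed change is defined for the first sample
    for prev, cur in zip(speeds_kmh, speeds_kmh[1:]):
        flags.append(int(cur < threshold_kmh and cur - prev > 0))
    return flags
```

Without the wrap-around, a heading change from 359° to 1° would be scored as a 358° turn, which is why the sketch deviates from the literal form of Equation (9).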

3.2.5. Contextual-Based Features

Contextual features provide additional information based on the environment in which the vehicle is operating. These features include the extraction of temporal context, such as the time of day from timestamps, and the one-hot encoding of categorical variables like road types, enabling the model to differentiate between various driving environments. Additionally, weather conditions are encoded, allowing the model to account for specific environmental factors during prediction.

Night Driving: Trips occurring before 6 AM or after 6 PM are flagged as night driving:

$\text{is\_night} = \begin{cases} 1 & \text{if hour} < 6 \text{ or hour} > 18 \\ 0 & \text{otherwise} \end{cases}$

(11)
Rainy Driving: Driving in rainy conditions is detected based on precipitation data (PRCP):

$\text{is\_rainy} = \begin{cases} 1 & \text{if PRCP} > 0 \\ 0 & \text{otherwise} \end{cases}$

(12)
Weekend Driving: Driving on Saturday or Sunday is flagged as weekend driving.
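A minimal sketch of the three contextual flags follows, with the hypothetical function name `contextual_flags`. It applies Equation (11) literally, so samples in the 18:00 hour are not flagged as night driving, and it assumes that timestamps are already in local time and that precipitation is a non-negative number.

```python
from datetime import datetime

def contextual_flags(timestamp, precipitation):
    """Night, rain, and weekend flags for a single telemetry record.

    timestamp     : datetime in local time (assumed)
    precipitation : PRCP value for the matched weather record
    """
    return {
        "is_night": int(timestamp.hour < 6 or timestamp.hour > 18),  # Equation (11)
        "is_rainy": int(precipitation > 0),                          # Equation (12)
        "is_weekend": int(timestamp.weekday() >= 5),                 # Saturday=5, Sunday=6
    }

# Hypothetical record: a rainy Saturday at 23:00.
flags = contextual_flags(datetime(2024, 8, 10, 23, 0), 1.2)
```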

3.3. Emission Estimation Using VSP Bins

In the emission modeling process, Vehicle Specific Power (VSP) is pivotal in determining vehicle emissions under various driving conditions. VSP serves as a proxy for the power demand placed on the vehicle by factors such as speed, acceleration, and road grade [50]. To estimate emissions accurately, the VSP values are categorized into bins representing different driving modes, such as idling, acceleration, and cruising. For each VSP range, specific emission factors are applied to estimate the rates of common pollutants such as Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx).

This approach leverages emission factors derived from real-world data collected using Portable Emission Measurement Systems (PEMSs), which measure the actual emissions produced by vehicles under a wide range of operating conditions [50]. The emission factors for each pollutant (CO, HCs, NOx) are applied to each record based on its corresponding VSP value. The VSP ranges and associated car emission factors are presented below in Table 2, adopted from [50].

3.3.1. VSP Range Determination

The VSP value for each vehicle at every point in the dataset is computed based on the vehicle’s speed, acceleration, and road grade, as calculated earlier in Section 3.2.1. Once calculated, these VSP values are categorized into the corresponding VSP bins, each representing a different vehicle operating mode:

Negative VSP values: Typically represent situations such as deceleration or driving downhill.
Positive VSP values: Represent power demand from activities like acceleration and cruising.

Each VSP range reflects varying degrees of energy exerted by the vehicle, with lower VSP ranges corresponding to more efficient or idling modes, and higher ranges representing aggressive driving or uphill conditions.

3.3.2. Applying Emission Factors

For each telemetric record in the dataset, the VSP value is used to assign a corresponding emission factor for CO, HCs, and NOx. These emission factors are derived from the measured data using PEMSs, ensuring they are calibrated to reflect real-world driving scenarios. The emission rates (in mg/s) for each pollutant are applied to the vehicle’s operation at each second, as the telemetric data are typically recorded at a 1 Hz frequency. This allows the model to estimate emissions on a per-second basis, capturing the nuanced impact of different driving behaviors. The general formula for estimating emissions for each second is as follows:

$\text{Emission Rate (mg/s)} = \text{Emission Factor (mg/s/veh)} \times 1$

(13)

where the emission factor is determined by the vehicle’s VSP value and its corresponding range. Once the emissions are calculated for each second of vehicle operation, the total emissions over time are determined by summing the individual emission rates for each pollutant across the entire trip. This provides an estimate of the total emissions generated by each vehicle during its operation, offering insights into the environmental impact of different driving behaviors and conditions.

$\text{Total Emission (mg)} = \sum_{i = 1}^{n} \text{Emission Rate}_{i} \times \text{Time Interval}$

(14)

where n is the number of telemetric records for a particular trip and Time Interval is typically 1 s if the data are collected at a 1 Hz frequency.
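Equations (13) and (14) amount to a bin lookup followed by a sum. The sketch below illustrates this for a single pollutant. The bin edges and emission factors are placeholder values chosen for illustration only; they are not the values of Table 2, and the names `BIN_EDGES`, `CO_FACTORS`, `emission_rate`, and `total_emission` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical VSP bin edges (kW/t) and CO emission factors (mg/s).
# Placeholders for illustration, NOT the values reported in Table 2.
BIN_EDGES = [0.0, 5.0, 10.0]        # bins: <0, [0, 5), [5, 10), >=10
CO_FACTORS = [1.0, 2.0, 4.0, 8.0]   # one factor per bin

def emission_rate(vsp: float) -> float:
    """Equation (13): look up the per-second emission rate for a VSP value."""
    return CO_FACTORS[bisect_right(BIN_EDGES, vsp)]

def total_emission(vsp_trace, time_interval: float = 1.0) -> float:
    """Equation (14): sum the per-record rates times the sampling interval."""
    return sum(emission_rate(v) * time_interval for v in vsp_trace)

trip = [-2.0, 0.5, 3.0, 7.5, 12.0]  # one VSP value per second (1 Hz)
print(total_emission(trip))         # 1 + 2 + 2 + 4 + 8 = 17.0 mg
```

The same lookup would be repeated with a separate factor list for HCs and NOx.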

3.4. Driving Pattern Clustering Using GMM

In this section, the Gaussian Mixture Model (GMM) clustering algorithm is applied to categorize distinct driving behaviors based on a combination of driving, contextual, and emission-related features. The clustering aims to identify and group driving patterns that exhibit similar characteristics, which can subsequently be used to analyze how different behaviors contribute to both traffic safety and environmental impact. This method provides a probabilistic approach to clustering, allowing for a more flexible categorization of driving behaviors compared to hard clustering algorithms like K-means [51]. The probability density function for a GMM with K components is given by

$p(x) = \sum_{k = 1}^{K} π_{k} N(x ∣ μ_{k}, Σ_{k})$

(15)

where $p(x)$ is the probability density of a data point x, $\sum_{k = 1}^{K}$ sums over the K Gaussian components, $π_{k}$ is the weight of the k-th Gaussian component with $\sum_{k = 1}^{K} π_{k} = 1$, and $N(x ∣ μ_{k}, Σ_{k})$ is the Gaussian distribution with mean $μ_{k}$ and covariance matrix $Σ_{k}$.

The number of clusters is chosen based on domain knowledge and an empirical evaluation using the Elbow Method. The model iteratively updates the parameters using the Expectation-Maximization (EM) algorithm to maximize the likelihood of the data fitting the mixture of Gaussians, as shown in Algorithm 1.

Algorithm 1: Clustering using GMM with EM Algorithm

Input: Dataset D with features $x_{i} = \{x_{1}, x_{2}, \dots, x_{n}\}$ for each trajectory i
Output: Cluster labels for each trajectory
1: Initialization: Initialize $μ_{k}$, $Σ_{k}$, $π_{k}$ randomly
2: repeat
3: E-step: Calculate responsibilities
$γ (z_{i k}) = \frac{π_{k} \cdot N (x_{i} ∣ μ_{k}, Σ_{k})}{\sum_{j = 1}^{K} π_{j} \cdot N (x_{i} ∣ μ_{j}, Σ_{j})}$
4: M-step: Update parameters
$μ_{k} = \frac{\sum_{i = 1}^{N} γ (z_{i k}) \cdot x_{i}}{\sum_{i = 1}^{N} γ (z_{i k})}$
$Σ_{k} = \frac{\sum_{i = 1}^{N} γ (z_{i k}) \cdot (x_{i} - μ_{k}) {(x_{i} - μ_{k})}^{T}}{\sum_{i = 1}^{N} γ (z_{i k})}$
$π_{k} = \frac{\sum_{i = 1}^{N} γ (z_{i k})}{N}$
5: until the convergence criterion is met
6: Assign each trajectory i to the component with the highest posterior probability
7: Output the cluster labels for all trajectories
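The E-step and M-step updates can be illustrated with a small one-dimensional sketch, in which the covariance matrix $Σ_{k}$ reduces to a scalar variance. Several simplifications are assumptions made here for brevity and are not part of Algorithm 1: the initial means are passed in rather than drawn randomly, a fixed number of iterations replaces the convergence test, and no log-space safeguards against numerical underflow are included. The function names and toy data are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian N(x | mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def responsibilities(data, weights, mus, variances):
    """E-step: gamma[i][k] = pi_k N(x_i|mu_k) / sum_j pi_j N(x_i|mu_j)."""
    gamma = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
        total = sum(dens)
        gamma.append([d / total for d in dens])
    return gamma

def fit_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM and return parameters and labels."""
    k, n = len(init_means), len(data)
    mus = list(init_means)
    mean = sum(data) / n
    pooled_var = sum((x - mean) ** 2 for x in data) / n
    variances = [pooled_var] * k          # start from the overall variance
    weights = [1.0 / k] * k               # start from equal mixing weights
    for _ in range(n_iter):               # fixed budget instead of a tolerance
        gamma = responsibilities(data, weights, mus, variances)
        for j in range(k):                # M-step, one component at a time
            nj = sum(g[j] for g in gamma)
            mus[j] = sum(g[j] * x for g, x in zip(gamma, data)) / nj
            var = sum(g[j] * (x - mus[j]) ** 2 for g, x in zip(gamma, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsed component
            weights[j] = nj / n
    gamma = responsibilities(data, weights, mus, variances)
    # Assign each point to the component with the highest posterior probability.
    labels = [max(range(k), key=lambda j: g[j]) for g in gamma]
    return mus, variances, weights, labels

# Toy speed-variance values forming two well-separated groups.
data = [10.0, 12.0, 11.0, 13.0, 9.0, 50.0, 52.0, 49.0, 51.0, 48.0]
mus, variances, weights, labels = fit_gmm_1d(data, init_means=[0.0, 60.0])
print([round(m, 2) for m in mus], labels)
```

On this toy input the two component means settle at the group averages (11 and 50), and each point is assigned to its own group.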

3.5. Anomaly Detection Using Ensemble Isolation Forest

The EIF model is a robust, tree-based algorithm that detects anomalies in high-dimensional data by isolating data points through recursive partitioning. The core principle behind Isolation Forest is that anomalies are few and different from the majority of the data. Thus, they can be isolated faster than normal data points, making Isolation Forest particularly suitable for identifying anomalous driving behaviors that may contribute to traffic risks or irregular emissions patterns. However, traditional Isolation Forest faces limitations when applied to high-dimensional or complex datasets. In these environments, the ability to effectively isolate anomalies diminishes as the data becomes more intricate, potentially reducing the accuracy and efficiency of detection. To overcome these challenges, we implemented an Ensemble Isolation Forest (EIF), which combines multiple individual Isolation Forest models as shown in Figure 3. This ensemble approach leverages the strengths of multiple trees, each trained on different subsets of the data, thereby enhancing the robustness and accuracy of anomaly detection. By aggregating the outputs of several models, the EIF reduces the variance associated with a single Isolation Forest and improves the ability to detect rare or subtle anomalies that may otherwise be missed.

By applying this model, we aim to detect outliers that reflect unsafe driving events, such as sudden accelerations, harsh braking, or extreme emissions, thereby allowing us to better model risk and environmental impact. We applied the Ensemble Isolation Forest to isolate abnormal driving patterns and anomalies within the comprehensive telemetric data, which are composed of vehicle speed, acceleration, VSP, and external factors such as road type and weather conditions. The EIF model is trained on these features to identify instances where driving behavior deviates from normal patterns, which could indicate risky driving or excessive emissions. The framework leverages these detected anomalies in two main ways:

Traffic Risk Assessment: Identified anomalies are linked with dangerous driving patterns, helping to classify trips based on risk levels.
Environmental Impact Analysis: Anomalies related to fuel consumption and emissions, such as sudden spikes in VSP or deviations in speed patterns, are flagged to better assess emission hotspots and inform mitigation strategies.

Model Architecture

Our Ensemble Isolation Forest’s architecture is designed to efficiently isolate potential anomalies from a multidimensional telemetric dataset. The architecture consists of the following components:

1.

Data Segmentation: The goal here is to select specific groups of features from the engineered features to train individual models. This involves choosing different feature subsets for different Isolation Forest models within the ensemble. The engineered features are segmented into several feature subsets, each capturing different dimensions of the data. The subsets include the following:

(a): Kinematic Features: Speed, acceleration, and Vehicle Specific Power (VSP).
(b): Emission Attributes: Concentrations of pollutants such as CO, HCs, and NOx.
(c): Behavioral Indicators: Hard braking, rapid acceleration, and overspeeding.
(d): Statistical Measures: Variance in speed and rolling statistics of acceleration.
(e): Contextual Features: Temporal and environmental factors such as night driving, rainy conditions, sharp turns, and stop-and-go behavior.

2.

Ensemble Isolation Forest Models: The purpose here is to train multiple Isolation Forest (IF) models, each on a different feature subset, to detect anomalies within that specific context. Below is the structure of each IF model:

(a): Input Layer: This layer receives the feature subset for the particular IF model.
(b): Isolation Tree Construction: The core of the Isolation Forest model is the construction of multiple binary trees, referred to as isolation trees. Each tree recursively partitions the data by randomly selecting a feature and a split point for that feature with the expectation that anomalies will be isolated in fewer splits compared to normal data points.
For a dataset $X = \{x_{1}, x_{2}, \dots, x_{n}\}$ with n observations, each isolation tree, T, is grown by choosing a random feature, $f \in \{f_{1}, f_{2}, \dots, f_{k}\}$ (where k is the number of features), and a split value, $s_{f}$. The data are then split into two subsets:

$\text{Left Subset} = \{x_{i} \in X \mid x_{i, f} < s_{f}\}$

(16)

$\text{Right Subset} = \{x_{i} \in X \mid x_{i, f} \geq s_{f}\}$

(17)

This process continues until each data point is isolated in its leaf node or the maximum tree depth is reached.
(c): Anomaly Scoring Mechanism: Once the trees are built, the model measures how many splits are needed to isolate a point in each tree. Anomalies tend to be isolated quickly, resulting in a shorter path length. The path length $h (x)$ of a point, x, averaged across all trees, $E [h (x)]$, is normalized by $c (n)$, the average path length of an unsuccessful search in a Binary Search Tree built on n points:

$c (n) = 2 H (n - 1) - \frac{2 (n - 1)}{n}$

(18)

where $H (i)$ is the harmonic number, and the anomaly score $s (x)$ is calculated as

$s (x) = 2^{- \frac{E [h (x)]}{c (n)}}$

(19)

so that scores close to 1 indicate likely anomalies, while scores well below 0.5 indicate normal points.
(d): Ensemble Aggregation: Since the Isolation Forest is an ensemble of isolation trees, the final anomaly score for each data point is the average of the scores across all trees in the ensemble:

$S (x) = \frac{1}{N_{T}} \sum_{i = 1}^{N_{T}} s_{i} (x)$

(20)

where $N_{T}$ is the total number of trees and $s_{i} (x)$ is the anomaly score from the i-th tree. The higher the value of $S (x)$ , the more likely it is that x is an anomaly.

3.

Anomaly Score Aggregation: After each Isolation Forest model outputs an anomaly score, the scores from all models are aggregated to produce a final comprehensive score for each data point. Aggregation can be done through averaging or other methods like majority voting. In our work, we use score averaging:

$S_{final} (x) = \frac{1}{M} \sum_{i = 1}^{M} S_{i} (x)$

(21)

where M is the number of models and $S_{i} (x)$ is the anomaly score from the i-th model.

4.

Anomaly Classification: After aggregating the scores, a final threshold, $θ_{final}$, is applied to classify data points as anomalous or normal. The threshold can be dynamically adjusted based on the expected contamination rate or application-specific needs. The choice of threshold is critical for controlling the trade-off between false positives and false negatives.

$S_{final} (x) > θ_{final}$ : the point x is classified as an anomaly.
$S_{final} (x) \leq θ_{final}$ : the point x is considered normal.

Algorithm 2: Ensemble Isolation Forest for Anomaly Detection

Input: Telemetric dataset $D = \{x_{1}, x_{2}, \dots, x_{n}\}$, Number of feature subsets M, Number of Isolation Forest (IF) models N, Anomaly threshold $θ_{final}$

Output: Anomaly labels for each data point $x_{i}$ in D

3. Ensemble Aggregation: Aggregate the scores from all models by averaging them:

$S_{final} (x) = \frac{1}{M} \sum_{i = 1}^{M} S_{i} (x)$

4. Anomaly Classification: Classify each data point x as anomalous if $S_{final} (x) > θ_{final}$. Otherwise, classify it as normal.

5. Return: Return the anomaly labels for each data point x in D.
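The overall procedure (one Isolation Forest per feature subset, score averaging, and thresholding) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the study: each forest is grown on the full dataset without sub-sampling, each model's score is computed in the standard Isolation Forest form $2^{-E[h(x)]/c(n)}$ from the mean path length over its trees, and the function names, column groupings, toy data, and threshold of 0.7 are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # The harmonic number H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature and a random split value."""
    if depth >= max_depth or len(rows) <= 1:
        return len(rows)                       # leaf: number of points left
    f = rng.randrange(len(rows[0]))
    lo, hi = min(r[f] for r in rows), max(r[f] for r in rows)
    if lo == hi:
        return len(rows)                       # feature is constant: stop here
    s = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < s]       # Equation (16)
    right = [r for r in rows if r[f] >= s]     # Equation (17)
    return (f, s,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node, depth=0):
    """Depth at which x lands, plus c(size) for unresolved leaf points."""
    if isinstance(node, int):
        return depth + c(node)
    f, s, left, right = node
    return path_length(x, left if x[f] < s else right, depth + 1)

def forest_scores(rows, n_trees, rng):
    """Anomaly score of every row under one Isolation Forest."""
    n = len(rows)
    max_depth = math.ceil(math.log2(n))
    trees = [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in rows:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(n)))
    return scores

def ensemble_scores(rows, subsets, n_trees=50, seed=0):
    """Train one forest per feature subset and average the scores."""
    rng = random.Random(seed)
    per_model = []
    for cols in subsets:
        projected = [tuple(r[j] for j in cols) for r in rows]
        per_model.append(forest_scores(projected, n_trees, rng))
    m = len(subsets)
    return [sum(s[i] for s in per_model) / m for i in range(len(rows))]

# 25 tightly grouped toy records plus one extreme record at the end.
normal = [(0.1 * (i % 5), 0.1 * (i // 5), 0.05 * (i % 7), 0.05 * (i % 3))
          for i in range(25)]
rows = normal + [(5.0, 5.0, 5.0, 5.0)]
subsets = [(0, 1), (2, 3)]                     # two hypothetical feature subsets
scores = ensemble_scores(rows, subsets)
theta_final = 0.7                              # hypothetical threshold
labels = [s > theta_final for s in scores]
print(round(scores[-1], 2), labels.count(True))
```

In this toy setting, only the extreme record exceeds the threshold, because a single random split separates it from the remaining points in most trees.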

4. Results

4.1. Data Description

The telemetric dataset utilized for the evaluation of the models in this study is derived from the CityTrek-14K dataset, which comprises detailed trajectories from 280 drivers across three major U.S. cities: Philadelphia (PA), Atlanta (GA), and Memphis (TN). This dataset includes key features such as vehicle speed, acceleration, GPS coordinates, and timestamps, collected at a frequency of 1 Hz. To enhance the analysis, we also collected complementary weather data and road network data. The weather dataset provides critical contexts for driving behavior, including attributes such as temperature, humidity, and precipitation. The road network data, sourced from OpenStreetMap, includes features like road segment identifiers, lengths, and types, which are essential for understanding the environmental context of each driving event.

After fusing the telemetric, weather, and road network datasets, we engineered several features related to driving behaviors, which include the following:

Kinematic Features: Speed, acceleration, and Vehicle Specific Power (VSP), which capture the dynamics of vehicle operation.
Emission Attributes: Concentrations of pollutants such as Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx), reflecting the environmental impact of driving behavior.
Behavioral Indicators: Metrics for hard braking, rapid acceleration, and overspeeding, which are indicative of aggressive driving patterns.
Statistical Measures: Variance in speed and rolling statistics of acceleration, providing insights into the consistency of driving behavior over time.
Contextual Features: Temporal and environmental factors such as night driving, rainy conditions, sharp turns, and stop-and-go behavior, which contribute to understanding driving conditions.

This comprehensive driving behavior dataset was meticulously preprocessed to ensure quality and relevance, including handling missing values, normalizing numerical features, and encoding categorical variables, facilitating effective model evaluation and tuning.

4.2. Experiment Setup

The proposed framework was evaluated through a structured methodology comprising several phases: data preprocessing, feature engineering, dimensionality reduction, emission estimation, clustering, and anomaly detection. Initially, the telemetric dataset was meticulously preprocessed to ensure consistency. Features such as Vehicle Specific Power (VSP), road grade, event-based driving behaviors, including harsh braking and rapid acceleration, and pattern-based and contextual-based driving behaviors, were engineered. Emissions were estimated using VSP-based calculations and categorized into different driving modes, as outlined in the methodology. To cluster driver behaviors, a Gaussian Mixture Model (GMM) was employed, while anomaly detection utilized an Ensemble Isolation Forest (EIF). For hyperparameter tuning, a comprehensive parameter grid was used to optimize the EIF models. Parameters like contamination, number of estimators, and maximum samples were explored.

Contamination: This parameter represents the expected proportion of anomalies within the dataset, with values set at 0.1, 0.2, and 0.3.
Number of Estimators (n_estimators): This denotes the number of decision trees included in the ensemble, with candidate values of 50, 100, and 150.
Maximum Samples (max_samples): This determines the number of training samples for each base estimator, evaluated with ’auto’, 256, and 512.

A custom scoring function assessed model performance by counting detected anomalies below a specified threshold. GridSearchCV with three-fold cross-validation ensured robust evaluation. Feature subset diversification allowed models to capture different aspects of driving behavior. The optimal parameters were identified for each subset, and models were trained accordingly. Threshold optimization further refined anomaly detection, selecting the most effective cut-off. This approach ensured robust calibration, enhancing the models’ ability to identify diverse anomalous patterns.
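To make the size of the search concrete, the parameter grid listed above can be enumerated with the standard library. This sketch only lists the candidate settings and does not reproduce the GridSearchCV procedure or the custom scoring function; the helper name `expand_grid` is hypothetical.

```python
from itertools import product

# Candidate values as listed in the experiment setup.
param_grid = {
    "contamination": [0.1, 0.2, 0.3],
    "n_estimators": [50, 100, 150],
    "max_samples": ["auto", 256, 512],
}

def expand_grid(grid):
    """Return every combination of the grid as a list of dictionaries."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(param_grid)
print(len(candidates))   # 3 x 3 x 3 = 27 settings per feature subset
print(candidates[0])
```

With three-fold cross-validation, each feature subset therefore requires 27 × 3 = 81 model fits, not counting any final refit.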

4.3. Performance Evaluation

4.3.1. Clustering for Behavior Patterns

Driver behaviors were clustered using Gaussian Mixture Models (GMMs) based on event-based and pattern-based features like speed variance, rolling acceleration mean, and overspeeding frequency. The clustering results were evaluated using the Silhouette Score, which yielded an average score of 0.72, indicating well-defined and cohesive clusters, as presented in Table 3. The Calinski–Harabasz Index (CHI) further validated the compactness of the clusters, with a high score suggesting that driving behaviors were distinctly separated, and the Davies–Bouldin Index was used to compare within-cluster scatter against between-cluster separation. These clusters provide a foundation for risk profiling and tailoring emission reduction strategies based on specific driving behaviors.
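For readers unfamiliar with the Silhouette Score, the following sketch computes it for a toy one-dimensional example. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the nearest other cluster, giving a coefficient of (b − a) / max(a, b). The data and the function name are illustrative, and the sketch assumes at least two clusters.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points with cluster labels."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:               # singleton cluster: coefficient taken as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(                   # mean distance to the nearest other cluster
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [1.0, 2.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
print(round(score, 3))  # 0.889
```

Values near 1 indicate compact, well-separated clusters, which is the sense in which the reported average of 0.72 is interpreted.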

4.3.2. Emission Estimation

In the emission estimation phase, Vehicle Specific Power (VSP) values were employed to classify vehicle operations into distinct driving modes, such as idling, cruising, and accelerating. The emissions of Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx) were estimated using real-world emission factors obtained from Portable Emission Measurement Systems (PEMSs). The methodology effectively captured dynamic changes in emission levels across varying driving conditions, particularly during aggressive driving events, such as rapid acceleration and uphill driving. These findings reflect the capacity of the framework to quantify emissions accurately in real-world driving scenarios.

Figure 4 illustrates the average emissions of CO, HCs, and NOx across different driving behavior clusters, namely, aggressive driving, normal driving, conservative driving, high-speed driving, and stop-and-go driving. The data clearly show that aggressive driving produces the highest emissions for all three pollutants, with CO emissions being particularly elevated. This pattern highlights the substantial environmental impact of aggressive driving, underscoring how behaviors like rapid acceleration can dramatically increase emissions. In contrast, normal driving results in the lowest emission levels across the board, demonstrating the efficiency of smoother driving patterns. Conservative driving and high-speed driving exhibit moderate emission levels. While these behaviors are more controlled compared to aggressive driving, they still contribute significantly to overall emissions. Notably, stop-and-go driving—common in urban settings with frequent acceleration and deceleration—also results in high emissions, similar to aggressive driving. This analysis establishes a correlation between driving behavior and emission levels, with aggressive and stop-and-go driving behaviors contributing most significantly to pollution. These insights suggest that modifying driver behavior could be crucial in reducing vehicular emissions and improving air quality.

We also conducted a spatial analysis focused on the geographic distribution of vehicle emissions, using a GeoDataFrame to visualize emission hotspots across urban areas. Figure 5 and Figure 6 represent the spatial analyses of vehicle emissions, with a particular focus on the identification of emission hotspots and anomalous driving behaviors. Figure 5 identifies regions where vehicle emissions deviate from normal patterns, potentially indicating irregularities such as aggressive driving. Figure 6 visualizes areas of intense vehicle emissions, highlighting zones with high concentrations of pollutants such as Nitrogen Oxides (NOx), Carbon Monoxide (CO), and Hydrocarbons (HCs). These hotspots are typically found in urban areas with dense traffic, where road types and congestion exacerbate pollutant outputs. The results identified elevated emission levels in regions with high traffic congestion, steep road gradients, and industrial zones. Such hotspots present important opportunities for urban planners to implement targeted interventions. For example, optimizing traffic flow in heavily congested areas or establishing low-emission zones in industrial districts could reduce pollution in these high-emission regions. This spatial insight provides actionable information for policymakers aiming to mitigate the environmental impact of vehicular emissions, especially in densely populated urban centers.

Temporal analysis was conducted to explore how emission levels fluctuate based on the time of day, day of the week, and specific driving conditions. The findings indicate that emissions peak during traditional rush hours—7:00–9:00 a.m. and 5:00–7:00 p.m.—when traffic congestion is at its highest. This is particularly evident for CO emissions, which surge during these periods, likely due to stop-and-go traffic and prolonged idling.

Figure 7, depicting hourly emission trends, shows that CO emissions follow a pronounced pattern, peaking in the early morning and maintaining elevated levels throughout peak commuting times. In contrast, NOx emissions remain relatively stable throughout the day, with a slight dip around midday. HC emissions, on the other hand, exhibit consistently low levels with minimal variation over the day. These temporal patterns suggest that routine traffic conditions, particularly during rush hours, significantly influence emission levels.

The analysis also found that weekday emissions were generally higher than weekend emissions, correlating with typical urban commuting patterns, as shown in Figure 8. Furthermore, emissions tended to increase during night-time driving and inclement weather conditions, particularly rainy weather. These higher emissions can likely be attributed to reduced visibility and challenging road conditions, which necessitate slower driving and frequent stops, thereby increasing fuel consumption and emissions. Such insights are crucial for developing time-based emission mitigation strategies, such as implementing congestion charges during peak hours or promoting off-peak travel to reduce emissions during the most congested times.

4.3.3. Anomaly Detection

Anomalous driving behaviors, such as extreme acceleration or abrupt braking, were identified using an Ensemble Isolation Forest (EIF) model. This method proved effective in detecting outlier events, particularly those that corresponded with high-emission occurrences. Risky driving patterns, including stop-and-go movements and rapid deceleration, were commonly associated with these anomalies. The EIF model was evaluated based on its anomaly detection rate, achieving a high precision of 0.88, which indicates its robustness in distinguishing abnormal driving events as presented in Table 4. These results suggest that the anomaly detection framework offers a valuable tool for identifying behaviors that not only elevate traffic risk but also significantly increase vehicle emissions.

Figure 9 presents the distribution of anomaly scores generated by the EIF model. The distribution is notably skewed to the right, with the majority of scores clustering above zero, representing normal driving behavior. A red dashed line illustrates the threshold that separates anomalies from normal events. Scores below this threshold are flagged as potential anomalies. The skewed distribution reflects that most driving behaviors fall within the normal range, while the EIF model efficiently isolates the rare, potentially hazardous driving patterns. This visual demarcation enhances the ability to pinpoint atypical behaviors within large datasets, making the identification process more streamlined and accurate.

Figure 10 compares the relative emission levels during anomalous and non-anomalous driving periods across three pollutants: Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx). The figure highlights that CO emissions exhibit the most significant increase during anomalous periods, with the ratio of emissions in anomalous events far exceeding those during normal driving. NOx emissions follow, with a moderate increase, while HCs show the smallest relative difference between anomalous and non-anomalous periods. These findings suggest that CO emissions are particularly sensitive to anomalous driving behaviors, such as rapid deceleration and aggressive acceleration. This sensitivity is critical in understanding how specific pollutants react to irregular driving patterns. By identifying the pollutants most affected by anomalies, it becomes possible to design more targeted emission reduction strategies that address the specific driving behaviors contributing to the highest levels of pollution.

Figure 11 provides a detailed analysis of emissions over the day, with red dots indicating the hours during which driving anomalies were detected. CO emissions are consistently higher throughout the day, with noticeable peaks during early morning and late-night hours. The chart suggests that anomalies tend to coincide with these peak periods, possibly due to factors such as increased traffic congestion or more frequent stop-and-go driving conditions during these times. In contrast, NOx and HC emissions remain relatively stable throughout the day, with minimal fluctuation. This pattern indicates that, while anomalies contribute to increased emissions, they have a more pronounced effect on CO levels compared to NOx and HCs. The correlation between emission spikes and detected anomalies during specific hours underlines the potential link between traffic patterns, environmental conditions, and emission surges.

4.4. Case Study

We conducted a case study to assess the impact of one traffic intervention, namely speed limit enforcement. The study proceeds through a series of computational steps designed to simulate real-world scenarios and evaluate their effects on the emission levels of pollutants such as Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx).

Case Study 1: Limiting Speed Limit

In this case study, we investigate the potential effects of implementing a reduced speed limit on vehicular emissions. Specifically, we simulate the introduction of a speed cap of 40 km/h, and we aim to determine how limiting vehicle speeds might reduce emissions of key pollutants, such as Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx). By capping the maximum allowable speed for all vehicles in the dataset, we essentially aim to quantify the benefits of speed regulation on environmental outcomes. The primary objective is to evaluate the feasibility of speed limit enforcement as a policy tool to reduce vehicular emissions. The comparison between original emissions and those under the adjusted speed limit will help in quantifying the potential reductions in pollutant levels. This analysis could be particularly relevant for urban areas, where traffic congestion and high-speed driving often contribute to deteriorating air quality. The results of this study could inform policy decisions on whether or not implementing speed limits can effectively reduce emissions and improve air quality in cities.

As depicted in Figure 12, the impact of reducing the speed limit to 40 km/h is evident in the decrease in emissions of three pollutants: Carbon Monoxide (CO), Hydrocarbons (HCs), and Nitrogen Oxides (NOx). The analysis reveals a notable reduction in CO emissions, from 3.83 mg/s to 3.42 mg/s, demonstrating that slowing down vehicles can enhance fuel combustion efficiency, thus lowering CO output. Similarly, HC emissions drop from 0.43 mg/s to 0.38 mg/s, suggesting that a lower speed limit curbs incomplete fuel combustion. NOx emissions, a major contributor to air pollution, also decrease from 1.21 mg/s to 1.08 mg/s, likely due to lower engine temperatures and reduced combustion pressure at slower speeds. These results indicate that regulating speed limits can be a practical and effective approach to mitigating vehicle emissions and enhancing air quality.
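The relative reductions implied by these figures can be computed directly from the reported average rates. The short sketch below uses only the values quoted above; the variable and function names are illustrative.

```python
# Average emission rates (mg/s) before and after the 40 km/h cap, as reported.
before = {"CO": 3.83, "HC": 0.43, "NOx": 1.21}
after = {"CO": 3.42, "HC": 0.38, "NOx": 1.08}

def relative_reduction(b: float, a: float) -> float:
    """Percentage decrease from b to a."""
    return 100.0 * (b - a) / b

reductions = {p: round(relative_reduction(before[p], after[p]), 1) for p in before}
print(reductions)  # {'CO': 10.7, 'HC': 11.6, 'NOx': 10.7}
```

All three pollutants therefore fall by roughly 11% under the simulated speed cap.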

5. Discussion

The results of our study provide valuable insights into the dynamics of vehicle emissions, highlighting the importance of real-time telemetric data in capturing the variability of emissions across different driving conditions. By leveraging Vehicle Specific Power (VSP) as a basis for estimating emissions, we demonstrated that emissions fluctuate significantly depending on factors such as driving behavior, road type, and environmental conditions. This approach overcomes the limitations of traditional laboratory-based methods, which often fail to account for the complex nature of real-world driving environments. Our findings show that aggressive driving behaviors, such as rapid acceleration and sudden braking, are strongly associated with increased emissions, supporting the growing body of research emphasizing the role of driver behavior in pollution levels [18]. The case study analysis on speed-limit enforcement further corroborates this, indicating that interventions targeting driver behavior, such as speed regulation, can reduce emissions in urban areas. These insights are critical for policymakers and urban planners, as they suggest targeted strategies to mitigate the environmental impact of transportation. The use of telemetric data also allowed us to identify high-emission zones, particularly in urban areas with dense traffic and complex road networks, as shown in Figure 6. These findings suggest that targeted mitigation strategies, such as traffic rerouting or the creation of low-emission zones, could reduce the environmental impact in areas with the highest emission concentrations [52].

However, while our study provides significant contributions, several limitations must be acknowledged. First, our model focuses primarily on Internal Combustion Engine (ICE) vehicles, which represent a significant portion of emissions in the current urban context. As the adoption of electric and hybrid vehicles grows, it will be essential to adapt our model to include these newer technologies, as they have different emission profiles and operational characteristics. Additionally, while VSP-based estimations offer a dynamic view of emissions, the integration of more granular environmental data could improve model accuracy. Future work could also explore embedding this framework into digital twin technologies to enable real-time monitoring and control of both traffic and emissions, further enhancing the capabilities of smart city infrastructure. By addressing the complexities of real-world emissions, this study contributes to bridging significant knowledge gaps and advancing sustainable urban mobility solutions.

6. Conclusions

This study presents a telemetric data-driven framework that leverages unsupervised learning techniques to provide a comprehensive assessment of vehicle emissions in real-world scenarios. By integrating Vehicle Specific Power (VSP) calculations, Gaussian Mixture Models (GMMs), and Ensemble Isolation Forests (EIFs), the framework successfully identifies high-risk driving behaviors and maps high-emission zones, achieving a Silhouette Score of 0.72 for clustering and a precision of 0.88 in anomaly detection. The findings underscore the importance of incorporating real-time, data-driven approaches to emissions monitoring and management. The case study on speed-limit enforcement highlights the potential of targeted interventions, such as speed regulation, in significantly reducing emissions in urban areas. These insights are crucial for policymakers and urban planners seeking to implement effective strategies to mitigate the environmental impact of transportation. While the current model focuses on Internal Combustion Engine (ICE) vehicles, future extensions should adapt the framework to include emerging technologies like electric and hybrid vehicles. Additionally, integrating more granular environmental data could further enhance the model’s accuracy and applicability. By addressing the complexities of real-world emissions, this study contributes to bridging significant knowledge gaps and advancing sustainable urban mobility solutions.

Author Contributions

Conceptualization, A.S.M., C.W., and L.C.; methodology, A.S.M., C.W., and L.C.; software, A.S.M.; validation, L.C.; formal analysis, A.S.M. and L.C.; investigation, A.S.M.; resources, A.S.M.; data curation, A.S.M.; writing—original draft preparation, A.S.M., C.W., and L.C.; writing—review and editing, L.C.; visualization, A.S.M.; supervision, C.W. and L.C.; project administration, C.W. and L.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author (A.S.M.) upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, J.; Wang, R. The Impact of Urbanization on Environmental Quality in Ecologically Fragile Areas: Evidence from Hengduan Mountain, Southwest China. Land 2024, 13, 503.
Ghaffarpasand, O.; Beddows, D.; Ropkins, K.; Pope, F. Real-world assessment of vehicle air pollutant emissions subset by vehicle type, fuel and EURO class: New findings from the recent UK EDAR field campaigns, and implications for emissions restricted zones. Sci. Total Environ. 2020, 734, 139416.
Pacura, W.; Szramowiat-Sala, K.; Gołaś, J. Emissions from Light-Duty Vehicles—From Statistics to Emission Regulations and Vehicle Testing in the European Union. Energies 2023, 17, 209.
Rahman, S.A.; Fattah, I.R.; Ong, H.C.; Ashik, F.R.; Hassan, M.M.; Murshed, M.T.; Imran, M.A.; Rahman, M.H.; Rahman, M.A.; Hasan, M.A.M.; et al. State-of-the-art of establishing test procedures for real driving gaseous emissions from light- and heavy-duty vehicles. Energies 2021, 14, 4195.
Triantafyllopoulos, G.; Dimaratos, A.; Ntziachristos, L.; Bernard, Y.; Dornoff, J.; Samaras, Z. A study on the CO₂ and NOx emissions performance of Euro 6 diesel vehicles under various chassis dynamometer and on-road conditions including latest regulatory provisions. Sci. Total Environ. 2019, 666, 337–346.
Hu, Z.; Lu, Z.; Song, B.; Quan, Y. Impact of test cycle on mass, number and particle size distribution of particulates emitted from gasoline direct injection vehicles. Sci. Total Environ. 2021, 762, 143128.
Smit, R.; Bluett, J. A new method to compare vehicle emissions measured by remote sensing and laboratory testing: High-emitters and potential implications for emission inventories. Sci. Total Environ. 2011, 409, 2626–2634.
Wong, P.K.; Vong, C.M.; Ip, W.F.; Wong, H.C. Preliminary study on telemetric vehicle emission examination. In Green Communications and Networks, Proceedings of the International Conference on Green Communications and Networks (GCN 2011), Gaudia, Spain, 26–29 September 2011; Springer: Berlin/Heidelberg, Germany, 2012; pp. 443–451.
Ding, H.; Cai, M.; Lin, X.; Chen, T.; Li, L.; Liu, Y. RTVEMVS: Real-time modeling and visualization system for vehicle emissions on an urban road network. J. Clean. Prod. 2021, 309, 127166.
Merkisz, J.; Pielecha, J.; Radzimirski, S. New Trends in Emission Control in the European Union; Springer: Berlin/Heidelberg, Germany, 2014; Volume 4.
Williams, M.; Minjares, R. A Technical Summary of Euro 6/VI Vehicle Emission Standards; ICCT: Washington, DC, USA, 2016.
Hooftman, N.; Messagie, M.; Van Mierlo, J.; Coosemans, T. A review of the European passenger car regulations–Real driving emissions vs local air quality. Renew. Sustain. Energy Rev. 2018, 86, 1–21.
Dhital, N.B.; Wang, S.X.; Lee, C.H.; Su, J.; Tsai, M.Y.; Jhou, Y.J.; Yang, H.H. Effects of driving behavior on real-world emissions of particulate matter, gaseous pollutants and particle-bound PAHs for diesel trucks. Environ. Pollut. 2021, 286, 117292.
Ng, E.C.; Huang, Y.; Hong, G.; Zhou, J.L.; Surawski, N.C. Reducing vehicle fuel consumption and exhaust emissions from the application of a green-safety device under real driving. Sci. Total Environ. 2021, 793, 148602.
Fondzenyuy, S.K.; Turner, B.M.; Burlacu, A.F.; Jurewicz, C.; Usami, D.S.; Feudjio, S.L.T.; Persia, L. The Impact of Speed Limit Change on Emissions: A Systematic Review of Literature. Sustainability 2024, 16, 7712.
Zhang, R.; Chen, H.; Xie, P.; Zu, L.; Wei, Y.; Wang, M.; Wang, Y.; Zhu, R. Exhaust Emissions from Gasoline Vehicles with Different Fuel Detergency and the Prediction Model Using Deep Learning. Sensors 2023, 23, 7655.
Šarkan, B.; Jaśkiewicz, M.; Kubiak, P.; Tarnapowicz, D.; Loman, M. Exhaust Emissions Measurement of a Vehicle with Retrofitted LPG System. Energies 2022, 15, 1184.
Huang, Y.; Ng, E.C.; Zhou, J.L.; Surawski, N.C.; Lu, X.; Du, B.; Forehead, H.; Perez, P.; Chan, E.F. Impact of drivers on real-driving fuel consumption and emissions performance. Sci. Total Environ. 2021, 798, 149297.
Mądziel, M. Modelling CO₂ Emissions from Vehicles Fuelled with Compressed Natural Gas Based on On-Road and Chassis Dynamometer Tests. Energies 2024, 17, 1850.
Mislyuk, O.; Khomenko, E.; Yehorova, O.; Zhytska, L. Assessing risk caused by atmospheric air pollution from motor vehicles to the health of population in urbanized areas. East.-Eur. J. Enterp. Technol. 2023, 121.
Nunes, L.J.R. The Rising Threat of Atmospheric CO₂: A Review on the Causes, Impacts, and Mitigation Strategies. Environments 2023, 11, 39751–39775.
Bronte-Moreno, O.; González-Barcala, F.J.; Muñoz-Gall, X.; Pueyo-Bastida, A.; Ramos-González, J.; Urrutia-Landa, I. Impact of air pollution on asthma: A scoping review. Open Respir. Arch. 2023, 5, 100229.
Lee, H.; Calvin, K.; Dasgupta, D.; Krinner, G.; Mukherji, A.; Thorne, P.; Trisos, C.; Romero, J.; Aldunce, P.; Barret, K.; et al. IPCC, 2023: Climate Change 2023: Synthesis Report, Summary for Policymakers. Contribution of Working Groups I, II and III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Core Writing Team, Lee, H., Romero, J., Eds.; IPCC: Geneva, Switzerland, 2023.
Zhang, L.; Wei, J.; Tu, R. Temporal-spatial analysis of transportation CO₂ emissions in China: Clustering and policy recommendations. Heliyon 2024, 10, e24648.
Su, J. Research on the Impact of Automobile Exhaust on Air Pollution. In Proceedings of the 2022 International Conference on Urban Planning and Regional Economy (UPRE 2022), Online, 22–24 April 2022; pp. 497–501.
Wurtsbaugh, W.; Paerl, H.; Dodds, W. Nutrients, eutrophication and harmful algal blooms along the freshwater to marine continuum. Wiley Interdiscip. Rev. Water 2019, 6, e1373.
Malmqvist, E. A major step toward cleaner air in the EU. Acid News, June 2024; No. 2.
Jan, D.; Felipe, R. Euro 7: The New Emission Standard for Light- and Heavy-Duty Vehicles in the European Union. 2024. Available online: https://theicct.org/wp-content/uploads/2024/03/ID-116-%E2%80%93-Euro-7-standard_final.pdf (accessed on 21 September 2024).
NHTSA. Corporate Average Fuel Economy. 2024. Available online: https://www.nhtsa.gov/laws-regulations/corporate-average-fuel-economy (accessed on 21 September 2024).
Agarwal, A.; Mustafi, N. Real-world automotive emissions: Monitoring methodologies, and control measures. Renew. Sustain. Energy Rev. 2021, 137, 110624.
Dimaratos, A.; Toumasatos, Z.; Doulgeris, S.; Triantafyllopoulos, G.; Kontses, A.; Samaras, Z. Assessment of CO₂ and NOx Emissions of One Diesel and One Bi-Fuel Gasoline/CNG Euro 6 Vehicles During Real-World Driving and Laboratory Testing. Front. Mech. Eng. 2019, 5, 62.
Ghaffarpasand, O.; Pope, F.D. Telematics data for geospatial and temporal mapping of urban mobility: Fuel consumption, and air pollutant and climate-forcing emissions of passenger cars. Sci. Total Environ. 2023, 894, 164940.
He, Z.; Ye, G.; Jiang, H.; Fu, Y. Vehicle Emission Detection in Data-Driven Methods. Math. Probl. Eng. 2020, 2020, 4875310.
Wong, P.; Vong, C.; Ip, W.; Wong, H.C. Flexibility study on telemetric vehicle emission examination. Int. J. Satell. Commun. Policy Manag. 2012, 1, 220.
Koupal, J.; DenBleyker, A.; Manne, G.K.; Batista, M.H.; Schmitt, T. Capabilities and Limitations of Telematics for Vehicle Emissions Inventories. Transp. Res. Rec. 2021, 2676, 49–57.
Hind, G.W.; Ballantyne, E.E.; Stincescu, T.; Zhao, R.; Stone, D.A. Extracting dashcam telemetry data for predicting energy use of electric vehicles. Transp. Res. Interdiscip. Perspect. 2024, 27, 101189.
Hao, L.; Yin, H.; Wang, J.; Wang, X.; Ge, Y. Potential of big data approach for remote sensing of vehicle exhaust emissions. Sci. Rep. 2021, 11, 5472.
Mondal, S.; Gupta, A. Evaluation of driver Acceleration/Deceleration behavior at signalized intersections using vehicle trajectory data. Transp. Lett. 2022, 15, 350–362.
Zheng, F.; Li, J.; Van Zuylen, H.J.; Lu, C. Influence of driver characteristics on emissions and fuel consumption. IET Intell. Transp. Syst. 2019, 13, 1770–1779.
Hamidi, B.; Lajqi, N.; Hamidi, L. Modelling and sensitive analysis of the impact on telematics system in vehicles. IFAC-PapersOnLine 2016, 49, 232–236.
Zhou, M.; Jin, H.; Wang, W. A review of vehicle fuel consumption models to evaluate eco-driving and eco-routing. Transp. Res. Part D Transp. Environ. 2016, 49, 203–218.
Robinson, M.K.; Holmén, B. Hybrid-electric passenger car energy utilization and emissions: Relationships for real-world driving conditions that account for road grade. Sci. Total Environ. 2020, 738, 139692.
Azevedo, J.A.H.; Cassiano, D.R.; Bertoncini, B.V. Real driving cycles and emissions for urban freight transport. Front. Big Data 2024, 7, 1375455.
Zang, J.; Song, G.; Wu, Y.; Yu, L. Method for Evaluating Eco-Driving Behaviors Based on Vehicle Specific Power Distributions. Transp. Res. Rec. 2019, 2673, 409–419.
Shakib, M.N.; Shamim, M.; Shawon, M.N.H.; Isha, M.K.F.; Hashem, M.; Kamal, M. An adaptive system for detecting driving abnormality of individual drivers using Gaussian mixture model. In Proceedings of the 2021 5th International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, Bangladesh, 18–20 November 2021; pp. 1–6.
Ye, W.; Xu, Y.; Zhou, F.; Shi, X.; Ye, Z. Investigation of bus drivers’ reaction to ADAS warning system: Application of the Gaussian mixed model. Sustainability 2021, 13, 8759.
Yang, C.H.; Chang, C.C.; Liang, D. A novel GMM-based behavioral modeling approach for smartwatch-based driver authentication. Sensors 2018, 18, 1007.
Song, X.; Aryal, S.; Ting, K.; Liu, Z.; He, B. Spectral–Spatial Anomaly Detection of Hyperspectral Data Based on Improved Isolation Forest. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5516016.
Moosavi, S.; Ramnath, R. Context-aware driver risk prediction with telematics data. Accid. Anal. Prev. 2023, 192, 107269.
Yao, R.; Sun, L.; Long, M. VSP-based emission factor calibration and signal timing optimisation for arterial streets. IET Intell. Transp. Syst. 2019, 13, 228–241.
Wang, J. Analysis of Shared Bicycle Usage based on K-Means and GMM Clustering Algorithm. In Proceedings of the 2021 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Shanghai, China, 15–27 October 2021; pp. 92–96.
Lewandowski, S.; Ullrich, A. Measures to reduce corporate GHG emissions: A review-based taxonomy and survey-based cluster analysis of their application and perceived effectiveness. J. Environ. Manag. 2023, 325, 116437.

Figure 1. Methodology.

Figure 2. Feature-level fusion of features.

Figure 3. Ensemble Isolation Forest model.

Figure 4. Emissions by driver behavior.

Figure 5. Spatial distribution of anomalies.

Figure 6. Spatial emission hotspots. (a) CO emissions; (b) HC emissions; (c) NOx emissions.

Figure 7. Emissions by hour of the day.

Figure 8. Emissions by day of the week.

Figure 9. Distribution of anomaly scores.

Figure 10. Emissions levels: Anomaly vs. Non-anomaly.

Figure 11. Emissions by hour of the day with anomalies.

Figure 12. Emissions comparison after reducing speed limit.

Table 1. Datasets used.

Dataset	Features
Telemetric Dataset	trip_id, timestamp, speed, acceleration, latitude, longitude, bearing, driver_id, date, duration_seconds, distance_meters, city
Road Segment Dataset	geometry, Road_Segment_ID, City, Start_Coordinates, End_Coordinates, Length, Speed_Limit, Number_of_Lanes, Road_Type, POI_Density, Distance_to_School, Distance_to_Hospital, Distance_to_Segment
Weather Dataset	date, city, humidity, pressure, temperature, wind_direction, wind_speed, PRCP

Table 2. Emission factors for cars based on VSP ranges.

VSP Range (kW/t)	CO (mg/s/veh)	HC (mg/s/veh)	NOx (mg/s/veh)
VSP < −10	1.9025	0.0673	0.3437
−10 ≤ VSP < −2	2.0918	0.1030	0.5046
−2 ≤ VSP < 0	2.5419	0.1593	0.5562
0 ≤ VSP < 2	1.8237	0.2323	0.5855
2 ≤ VSP < 5	2.3533	0.1896	0.6916
5 ≤ VSP < 9	2.2451	0.2592	0.8216
9 ≤ VSP < 13	2.6964	0.3180	1.0906
13 ≤ VSP < 17	4.0725	0.4383	1.1764
17 ≤ VSP < 20	3.9979	0.5472	1.3588
VSP ≥ 20	4.5135	0.5174	1.4514
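
To illustrate how the emission factors in Table 2 can be applied, the following minimal Python sketch maps a VSP value to its bin and sums emissions over a trip. It is an illustration only, not the authors' implementation: the function names `emission_rates` and `trip_emissions`, the one-sample-per-second default, and the example VSP values are assumptions introduced here; only the bin edges and rates come from Table 2.

```python
from bisect import bisect_right

# Bin boundaries in kW/t, copied from Table 2. Each boundary is the
# inclusive lower edge of the bin that starts there.
VSP_EDGES = [-10, -2, 0, 2, 5, 9, 13, 17, 20]

# (CO, HC, NOx) in mg/s per vehicle, one tuple per VSP bin in table order.
EMISSION_FACTORS = [
    (1.9025, 0.0673, 0.3437),  # VSP < -10
    (2.0918, 0.1030, 0.5046),  # -10 <= VSP < -2
    (2.5419, 0.1593, 0.5562),  # -2 <= VSP < 0
    (1.8237, 0.2323, 0.5855),  # 0 <= VSP < 2
    (2.3533, 0.1896, 0.6916),  # 2 <= VSP < 5
    (2.2451, 0.2592, 0.8216),  # 5 <= VSP < 9
    (2.6964, 0.3180, 1.0906),  # 9 <= VSP < 13
    (4.0725, 0.4383, 1.1764),  # 13 <= VSP < 17
    (3.9979, 0.5472, 1.3588),  # 17 <= VSP < 20
    (4.5135, 0.5174, 1.4514),  # VSP >= 20
]
POLLUTANTS = ("CO", "HC", "NOx")


def emission_rates(vsp):
    """Return the Table 2 emission rates (mg/s) for one VSP value in kW/t."""
    # bisect_right places a value equal to an edge in the bin that starts
    # at that edge, matching the "lower <= VSP < upper" convention.
    bin_index = bisect_right(VSP_EDGES, vsp)
    return dict(zip(POLLUTANTS, EMISSION_FACTORS[bin_index]))


def trip_emissions(vsp_samples, dt_seconds=1.0):
    """Sum emissions (mg) over a trip.

    Assumes one VSP sample every dt_seconds (hypothetical sampling rate).
    """
    totals = dict.fromkeys(POLLUTANTS, 0.0)
    for vsp in vsp_samples:
        for pollutant, rate in emission_rates(vsp).items():
            totals[pollutant] += rate * dt_seconds
    return totals


if __name__ == "__main__":
    # Hypothetical VSP trace (kW/t), for illustration only.
    print(emission_rates(14.2))
    print(trip_emissions([-3.0, 1.5, 6.0, 14.2]))
```

A sorted edge list with binary search keeps the lookup compact and makes the boundary convention explicit; a chain of `if` statements would work equally well for ten bins.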

Table 3. Clustering results.

Task	Silhouette Score	Davies–Bouldin Index (DBI)	Calinski–Harabasz Index (CHI)
Driving pattern clustering	0.729	0.888	59,367,387.3
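
For readers less familiar with the Silhouette Score reported in Table 3, the sketch below computes the metric from its standard definition in pure Python. It is a didactic illustration under stated assumptions, not the code used in the study: the function name `silhouette_score`, the Euclidean distance, and the toy points are chosen here for clarity, and the quadratic loop would be too slow for a large telemetric dataset.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (pure Python, O(n^2))."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is excluded via len(own) - 1).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    # Two tight, well-separated toy clusters (hypothetical data).
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(round(silhouette_score(pts, [0, 0, 1, 1]), 3))
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.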

Table 4. Anomaly classification results.

Class	Accuracy	Precision	Recall	F1-Score
Class 0	0.87	0.88	0.85	0.86
Class 1		0.86	0.89	0.87
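
As a consistency check on Table 4, the F1-scores can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The worked values below use only the figures in the table; the symbols $P$ and $R$ are introduced here for precision and recall.

```latex
F_1 = \frac{2PR}{P + R}

\text{Class 0: } F_1 = \frac{2(0.88)(0.85)}{0.88 + 0.85} = \frac{1.496}{1.73} \approx 0.86

\text{Class 1: } F_1 = \frac{2(0.86)(0.89)}{0.86 + 0.89} = \frac{1.5308}{1.75} \approx 0.87
```

Both results agree with the F1-Score column to two decimal places.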

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
