1 Introduction

Pedestrians are the most commonly seen vulnerable road users, and their safety has always been a major concern in transportation research. With the progress of automated driving systems (ADS), fitting autonomous vehicles into mixed traffic with human-driven cars and pedestrians has become one of the most critical challenges [1]. Without road designs dedicated to autonomous cars, these ADS need to behave similarly to human drivers: understanding road scenes, estimating encountering risks, communicating with other road users, and responding in a human-like manner in different situations. Thus, there is an urgent research need to model the encountering scenarios between human drivers and pedestrians to guide the design and testing of ADS.

A number of studies have been published on pedestrian-vehicle interaction scenarios and processes [2,3,4]. Most of this work relies on crash databases, videos recorded from fixed cameras, or naturalistic driving studies. In these studies, the scenarios are usually modeled either from actual crashes [3] or from unclassified encountering cases [2, 4], with some limitations:

  • Although crash data can reflect actually dangerous situations, they are usually recorded retrospectively after the events and are therefore not only biased but also missing many details, as summarized in [5]. This makes it difficult to construct comprehensive and complete scenarios.

  • In other studies, unclassified encountering cases are randomly selected and analyzed from recorded road videos. Because these data are not stratified by encountering risk level, the modeled scenarios mainly represent safe cases, in which most crashes would not happen. The results therefore have limited value for training AI for emergency situations.

One option is to investigate vehicle-pedestrian encountering scenarios through potential conflict cases [5], which can represent actual crashes to some extent [6] while also providing details recorded by on-board or roadside cameras and other vehicle sensors. A number of representative pedestrian-vehicle encountering scenarios have been published using these types of data.

Although these potential conflict cases reflect actual dangers between a particular driver and pedestrian(s), they are not necessarily the dangerous cases for ADS. Most applied AI systems use machine-learning algorithms to understand and predict new events based on experience (training data), so an ADS effectively accumulates the experiences of many drivers across different scenarios. It is therefore interesting to investigate how well AI algorithms can predict encountering risks after being trained with the potential and actual dangerous scenarios collected from naturalistic data. The idea is that some of the more dangerous cases may become less risky if the AI algorithms can predict the danger from the scenario conditions, while some cases that are less dangerous for human drivers may be riskier for ADS if the conditions are hard for the AI algorithms to predict.

To the best of our knowledge, most naturalistic driving studies investigate pedestrian-vehicle interactions based on all potential conflicts or near-miss events, without further classifying them to identify the cases that are riskier for ADS. In this study, we open the discussion on how the potential conflict cases collected between human drivers and pedestrians differ in their risk levels for ADS.

2 Research Objectives and Scope

ADS can predict encountering risks with pedestrians at the micro and macro levels. At the micro level, the ADS analyzes instantaneous sensing results of surrounding road objects with short-range movement prediction to detect and mitigate imminent conflicts. In this study, we mainly focus on prediction at the macro level, which relies on descriptive scenario variables to classify the encountering cases into different risk levels. The main idea is that, without consuming a large amount of computational resources, the ADS should be able to read the encountering cases based on scenario variables that are easier to acquire and prepare the necessary follow-up steps.

Based on a pre-labeled naturalistic driving data set focusing on pedestrian behaviors [5], this study applies different machine-learning algorithms, including Naive Bayes, Logistic Regression, Random Forests, Support Vector Machines, and Deep Neural Networks, to classify vehicle-pedestrian encountering risks in natural road environments. There are two research questions:

  • First research question: based on a list of descriptive variables without detailed movement and posture data, to what extent can machine learning algorithms identify potential conflict cases among all vehicle-pedestrian encounters?

  • Second research question: for the potential vehicle-pedestrian conflict cases, to what extent can machine learning algorithms classify the passing priority between pedestrians and cars?

To address these research questions, a multi-step process is adopted. The first step is to classify safe cases versus potential conflict cases based on a list of environment, traffic, and behavior variables describing the encountering scenario. A potential conflict is defined as “that the contact between the vehicle and pedestrian(s) will occur if neither the driver nor the pedestrian(s) changes the moving speed/moving direction, or the trajectories of the vehicle and the pedestrian are adjacent to each other during the time period which results in the movement responses from the driver and/or the pedestrian(s) to avoid the contact, although the responses may not be necessary” [7]. As illustrated in Fig. 1, vehicle-pedestrian encounters can be classified into three categories: safe cases, cases with predictable risks, and cases with unpredictable risks, which correspond to true negative, true positive, and false negative classification results for potential conflict events. Neither true negative nor false positive classifications pose additional risks. True positive cases represent potential conflict dangers that are predictable by the AI system and may therefore not cause the same level of risk as they do for human drivers. False negative cases, however, represent potential dangers that are unpredictable for the AI system and shall be studied further.

Fig. 1. Vehicle-pedestrian encountering risk model
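
To make the risk categories in Fig. 1 concrete, the following minimal sketch (a hypothetical helper, not part of the original study) maps the ground-truth label of an encounter and the classifier's potential-conflict prediction to one of the three categories; the function and variable names are illustrative.

```python
# Hypothetical sketch of the Fig. 1 risk model: map (ground truth, prediction)
# about a potential conflict to one of the three risk categories.

def risk_category(is_conflict: bool, predicted_conflict: bool) -> str:
    if not is_conflict:
        # True negatives and (harmless) false positives pose no additional risk.
        return "safe case"
    if predicted_conflict:
        # True positive: the danger is predictable by the AI system.
        return "predictable risk"
    # False negative: a danger the AI system fails to predict.
    return "unpredictable risk"

print(risk_category(is_conflict=True, predicted_conflict=False))  # unpredictable risk
```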

The second step is to further classify the potential conflict cases into vehicle-pass-first and pedestrian-pass-first cases, following the same logic of predictable and unpredictable potential conflicts. A series of variables describing the encounter in more detail is used at this step. Again, the goal is to study the capability of machine learning algorithms to classify vehicle-pedestrian encountering risks based on easy-to-acquire descriptive scenario variables, so that the system can be prepared with a more challenging training data set. At this level of classification, special interest falls on the false negative classification results for pedestrian-pass-first cases, which represent the risk that the AI algorithms wrongly underestimate the pedestrian's crossing priority and suggest that the vehicle cross first. This type of wrong decision may cause increased danger for pedestrians.

3 Methodology

3.1 Data Set Description

Based on the TASI 110-car naturalistic driving study described in [5, 7], this work uses the full data set of more than 50,000 pedestrian-vehicle encounters, containing both safe cases (N = 49,631) and potential conflict cases (N = 1,472), following the definition described above. For all of these cases, descriptive scenario variables were manually labeled as shown in Table 1.

Table 1. Descriptive scenario variables used for classifying potential conflict cases

In the first step of classification, all 18 descriptive scenario variables listed in Table 1 are used to classify safe versus potential-conflict encountering cases. In the second step of data analysis, all potential conflict cases are first labeled with ground-truth data about the passing priority between vehicles and pedestrians. Then, another list of variables with more encounter details is used to train the machine learning algorithms to classify the passing priorities (a sketch of how such variables can be encoded follows the list), including:

  • Time-to-potential-collision: the time duration from the pedestrian appearance moment or the first potential conflict point to the moment at which a collision would occur if no avoidance behavior were performed.

  • Pedestrian and Vehicle Relative Moving Direction

  • Vehicle Speed: actual vehicle speed

  • Pedestrian Moving Action: walking, running, biking, standing

  • Pedestrian Moving Speed Category: fast, slow, normal

  • Pedestrian Moving Direction

  • Pedestrian Appearance Location

  • Driver Avoidance Behavior

  • Pedestrian Avoidance Behavior

  • Driving Environment

  • Road Type

  • Light Condition

  • Traffic Control Device

  • Weather

  • Traffic Density Level

  • Road Infrastructure: divided traffic, non-divided traffic, one-way traffic

  • Car Direction: straight, turning left, turning right
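
All of the variables above are categorical or binned descriptors rather than continuous trajectories, so they must be turned into numeric feature vectors before training. The following is a minimal encoding sketch assuming one-hot encoding with pandas; the column names and values are illustrative placeholders, not the labels used in the study.

```python
# Hypothetical sketch: one-hot encode categorical scenario variables into
# numeric feature vectors (column names/values are illustrative only).
import pandas as pd

encounters = pd.DataFrame({
    "pedestrian_action": ["walking", "running", "standing"],
    "light_condition": ["daylight", "night", "daylight"],
    "road_type": ["local", "arterial", "local"],
})

# Each categorical column is expanded into binary indicator columns.
features = pd.get_dummies(encounters)
print(features.columns.tolist())
```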

3.2 Machine-Learning Algorithms Implemented

A total of five machine learning algorithms are applied to the data set described above in the two-step classification process.

Naive Bayes. The Naive Bayes classifier is a probabilistic classifier [8]. Known for its simplicity, it can often outperform more sophisticated classification methods. It works by computing the probability that a data instance with n features \(d = (f_1, f_2, \dots , f_n)\) belongs to category C. The probability is computed by applying Bayes' theorem, given by Eq. 1:

$$\begin{aligned} P(C | d) = \frac{P(C)P(d|C)}{P(d)} \end{aligned}$$
(1)

P(C|d) is thus the probability that the data instance d belongs to category C. P(d) is the probability that a randomly picked data instance has the vector \(d = (f_1, f_2, \dots , f_n)\) as its representation, and P(C) is the probability that a randomly picked data instance belongs to C. P(d|C) is the probability that a data instance belonging to category C has the representation d. Since estimating P(d|C) directly for every possible combination of feature values is intractable, it is common to assume that \(f_1, f_2, \dots , f_n\) are conditionally independent given the category C. As a result, P(d|C) is computed as in Eq. 2. \(P(f_i|C)\) and P(C) are estimated from the training set.

$$\begin{aligned} P(d|C)= \prod _{i=1}^{n}P(f_i|C) \end{aligned}$$
(2)
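
As a concrete illustration of Eqs. 1 and 2, the following minimal sketch estimates P(C) and \(P(f_i|C)\) from a toy training set of categorical features and scores a new instance under the conditional-independence assumption; the data and the additive smoothing are illustrative assumptions, not the study's implementation (a library version such as scikit-learn's CategoricalNB could be used instead).

```python
# Minimal Naive Bayes sketch of Eqs. 1-2 with toy, hypothetical data.
from collections import Counter, defaultdict

def train(X, y):
    prior = Counter(y)                      # counts for P(C)
    cond = defaultdict(Counter)             # counts for P(f_i = value | C)
    for features, label in zip(X, y):
        for i, value in enumerate(features):
            cond[(label, i)][value] += 1
    return prior, cond, len(y)

def predict(prior, cond, n, features):
    scores = {}
    for label, count in prior.items():
        p = count / n                       # P(C)
        for i, value in enumerate(features):
            counts = cond[(label, i)]
            # additive smoothing to avoid zero probabilities
            p *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        scores[label] = p                   # proportional to P(C) * prod_i P(f_i|C)
    return max(scores, key=scores.get)

X = [("walking", "night"), ("running", "day"), ("standing", "day")]
y = ["conflict", "safe", "safe"]
print(predict(*train(X, y), ("walking", "day")))
```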

Logistic Regression. Logistic Regression [9] is an efficient method for binary classification and uses predictive analysis to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression transforms its output with the sigmoid function to return a probability value, which can then be mapped to two or more discrete classes. Given a data instance with n features \(f_1, f_2, \dots , f_n\), the probability that it belongs to category C is calculated as in Eq. 3, where \(\alpha _0, \dots , \alpha _n\) are estimated from the training set.

$$\begin{aligned} \frac{1}{1 + \exp (\alpha _0+\alpha _1 f_1+\dots +\alpha _n f_n)} \end{aligned}$$
(3)
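
The following minimal sketch evaluates Eq. 3 for a single instance with hypothetical coefficients \(\alpha _0, \dots , \alpha _n\); in practice the coefficients are fitted from the training set (e.g., with scikit-learn's LogisticRegression).

```python
# Minimal sketch of Eq. 3 with hypothetical, hand-picked coefficients.
import math

def logistic_probability(features, alphas):
    # alphas[0] is the intercept; alphas[1:] multiply the features.
    z = alphas[0] + sum(a * f for a, f in zip(alphas[1:], features))
    return 1.0 / (1.0 + math.exp(z))

print(logistic_probability([0.4, 1.0, 0.0], [-0.5, 1.2, -0.7, 0.3]))
```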

Random Forests. The random forests algorithm is very similar to decision tree algorithms; the main difference is the number of trees used. Research has shown that when decision trees grow very deep to learn irregular patterns, they can overfit the training set [10]. Random forests set aside roughly one third of the sampled data for each tree and use it to calculate an unbiased (out-of-bag) estimate of the error, which informs the construction of subsequent trees. They provide a way of averaging deep decision trees with the goal of reducing variance [10]. The combination of multiple trees creates a “forest” that gives a much more accurate approximation of the data, while still offering tree-like models for visualizing the results.
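
The following minimal training sketch assumes scikit-learn's RandomForestClassifier; the out-of-bag estimate mentioned above corresponds to the oob_score option. The random feature matrix, labels, and hyperparameters are placeholders, not the study's settings.

```python
# Hypothetical sketch: random forest with an out-of-bag (OOB) error estimate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 18))   # placeholder for 18 encoded variables
y = rng.integers(0, 2, size=200)         # 0 = safe, 1 = potential conflict

forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)
```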

Support Vector Machines. Support Vector Machines (SVMs) [11] are another of the classification algorithms evaluated. An SVM classifier aims to separate the input data using hyperplanes; to obtain a less complex hyperplane for classification, the margin between the hyperplane and the support vectors is maximized. When the samples are not linearly separable, the SVM non-linearly transforms the training features from a low-dimensional space ‘d’ to a higher-dimensional feature space ‘\(\phi (d)\)’ using a mapping \(\varphi : d \rightarrow \phi (d)\) and a kernel function, i.e., an inner product of two examples in the feature space. The ability to learn from large feature spaces and the independence from their dimensionality make SVMs a universal learner for data classification. Another characteristic of SVMs is the use of soft margins to protect against overfitting caused by the misclassification of noisy data in such large feature spaces [12].
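
Below is a minimal sketch of a soft-margin SVM with a non-linear (RBF) kernel, assuming scikit-learn's SVC; the kernel choice, the C soft-margin parameter, and the synthetic data are illustrative assumptions rather than the study's configuration.

```python
# Hypothetical sketch: soft-margin SVM with an RBF kernel on synthetic data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 18))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # C controls the soft margin
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```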

Deep Neural Networks. A deep neural network (DNN) [13] is trained in a similar way to other machine learning algorithms. During the training phase, the network accepts inputs and feeds them forward through mathematical operations that ultimately trigger an activation function at each node of the network. The activation of a node depends on its associated weights; normally, a higher weight results in greater activation. The activation of the final layer is then compared to the expected outputs, and a cost (loss) is computed. The weights of the final layer and the preceding layers are adjusted based on the cost function in order to minimize the cost of the next prediction. Once the training phase is done, the trained model can be loaded into a system for testing. A deep neural network extracts useful information from the input data elements automatically, so no additional data pre-processing is involved. Figure 2 provides an example of a DNN classifier. M is the number of neurons in the input layer, normally equal to the number of features \(f_1, f_2, \dots , f_n\) in the data set; N, the number of components at the output layer, corresponds to the number of classes to predict. Given an input, each class C has an associated probability between 0 and 1 at the output layer. In this research, we use three hidden layers, and each layer contains 500 neurons.

Fig. 2. Deep neural networks
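
The following is a minimal sketch of the architecture described above, i.e., three hidden layers with 500 neurons each; the paper does not state which framework was used, so scikit-learn's MLPClassifier and the synthetic data serve only as an illustrative stand-in.

```python
# Hypothetical sketch of a DNN with three hidden layers of 500 neurons each.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 18))               # placeholder feature vectors
y = (X.sum(axis=1) > 0).astype(int)          # placeholder binary labels

dnn = MLPClassifier(hidden_layer_sizes=(500, 500, 500), activation="relu",
                    max_iter=200)
dnn.fit(X, y)
print("class probabilities:", dnn.predict_proba(X[:1]))
```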

3.3 Classification Process

In this research, we explore two levels of classification: whether there is a potential conflict and, when there is one, whether the pedestrian or the vehicle passes first. The first level classifies whether there is a potential conflict, whereas the second level classifies whether the pedestrian or the vehicle passes first. The classifier that performs best at the first level is then used for the second level; since the deep neural network performed best at the first level in this research, it is used for the second-level classification.

Fig. 3. Classification process

Since there are many more instances without potential conflict, the data set for the first-level classification is extremely unbalanced. In order to balance the data set for training the learning models, we duplicate the potential conflict instances. Afterwards, the data set is randomly split into training and test data: 60% of the data is used to train the learning models and 40% is used for testing. The five learning algorithms described in Sect. 3.2 are used to classify potential conflict versus no potential conflict instances. Figure 3 illustrates the classification process.
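
The following minimal sketch reproduces the balancing and splitting step described above under illustrative assumptions: the minority (potential conflict) class is oversampled by duplication and the balanced data are split 60/40 into training and test sets; variable names and the synthetic data are placeholders.

```python
# Hypothetical sketch: oversample the minority class by duplication, then split 60/40.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(3)
X = rng.integers(0, 4, size=(1000, 18))
y = np.array([1] * 30 + [0] * 970)            # 1 = potential conflict (minority)

X_min, y_min = X[y == 1], y[y == 1]
X_maj, y_maj = X[y == 0], y[y == 0]
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([y_maj, y_min_up])

X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.4, random_state=0)   # 60% train / 40% test
print(X_train.shape, X_test.shape)
```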

4 Machine-Learning Classification Results

In this research, we use true positives (TP), true negatives (TN), false negatives (FN), false positives (FP), accuracy (\(\frac{TN + TP}{TN+TP+FN+FP}\)), precision (\(\frac{TP}{TP+FP}\)), recall (\(\frac{TP}{TP+FN}\)), and F1 (\(\frac{2TP}{2TP+FN+FP}\)) to evaluate the performance of all five learning algorithms. Since both levels are binary classifications, TP, TN, FN, and FP are calculated with respect to the positive classes potential conflict (first level) and pedestrian passes first (second level).
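
These metrics can be computed directly from the confusion counts; the sketch below uses illustrative counts only.

```python
# Hypothetical sketch: evaluation metrics from confusion-matrix counts.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(evaluate(tp=90, tn=80, fp=10, fn=20))
```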

4.1 Classification Results for Potential Conflict Cases

Tables 2 and 3 show the results of the first-level classification of potential conflict cases. For the four commonly used machine learning algorithms, the classification accuracy is in the range of 0.75 to 0.81; the deep neural network, however, achieves an accuracy of more than 93% on the test data set. This suggests that, using only the descriptive scenario variables listed in Table 1, the AI system can recognize potential conflicts at a very high accuracy level. Since the precision, recall, and F1 scores for the DNN are also very high, the high accuracy is achieved together with a low false positive rate.

Table 2. Classification results of training data
Table 3. Classification results of test data

Based on the results, we can answer the first research question: both traditional machine learning algorithms and the deep neural network can classify potential conflicts between vehicles and pedestrians based on easy-to-acquire descriptive scenario variables, with the deep learning algorithm showing better performance than the four traditional algorithms. Thus, on the one hand, the cases that are successfully classified by the deep neural network are used for the second-level data analysis; on the other hand, the false negative cases are collected for more detailed investigation.

4.2 Classification Results for Passing Priority

Table 4 reports the classification results for the passing priority between pedestrians and vehicles in the potential conflict cases, using the deep neural network. With intensive training, the constructed network achieves an accuracy level of 96% on the test data set. This suggests that it is possible for an AI system to classify passing priority with very high accuracy when enough detail is provided, even when the actual pedestrian moving trajectories and/or body postures are not available. Note that the precision in this case is also very high, indicating that the high accuracy is not achieved at the cost of a high false positive rate.

Table 4. Classification results of training and test data

5 Conclusions and Future Work

Based on a large-scale naturalistic driving data set focusing on vehicle-pedestrian encountering cases, this study investigates the possibility of detecting potential conflicts and prioritizing passing sequences using only descriptive scenario variables, without involving quantitative data such as pedestrian moving trajectories or body postures, which consume a large amount of computing resources and are difficult to acquire at all times. The main goal is to investigate whether these relatively easy-to-acquire scenario variables provide enough information to help an ADS decide when to prepare for potential conflicts or yield the right-of-way. In addition, some wrongly classified cases may reveal important features representing unpredictable risks.

Five different machine learning algorithms, including Naive Bayes, Logistic Regression, Random Forests, Support Vector Machines, and Deep Neural Networks, are applied to the pre-labeled data set. The results show that the tested algorithms, especially the constructed deep neural network, can distinguish potential conflict cases from safe cases at a very high accuracy level (93%) with low false positive rates (93% precision). The deep neural network can also prioritize the passing sequences correctly at an accuracy level of 96% (with 96% precision). These results demonstrate that descriptive scenario variables can be used efficiently to classify vehicle-pedestrian encountering risks.

As a preliminary study, this work has some limitations that require additional investigation. Some cases are wrongly classified, especially the FN cases, which can represent additional risk for ADS because the corresponding risk is not predicted by the AI algorithms; the features of these cases will be studied further in comparison with other cases. In addition, a feature reduction process can be applied to all the scenario variables to further improve the performance of the machine learning algorithms.