Robust and Efficient Multi-Object Detection and Tracking for Vehicle Perception Systems using Radar and Camera Sensor Fusion

J. Burlet*, M. Dalla Fontana*

*TRW Conekt, UK; Julien.Burlet@trw.com; Mario.DallaFontana@trw.com

Keywords: Vehicle Detection; Multi-Object Tracking; Sensor Data Fusion.

Abstract

This paper describes a frontal vehicle perception system developed in the framework of the European project interactIVe. The approach relies on two-stage (radar and camera) sensor fusion and robust multi-object tracking. Experimental results obtained using a test car show the robustness and efficiency of the method developed.

1 Introduction

Perceiving and understanding the environment surrounding a vehicle is a very important step in driving assistance systems and autonomous vehicles. In recent years, considerable research effort has been focused on specific aspects of this problem [1], [2] within several European projects (http://www.intersafe-2.eu, http://www.prevent-ip.org). Nevertheless, perception systems are still seen as a key challenge in addressing autonomous driving assistance systems.

In the frame of the European project interactIVe (http://www.interactive-ip.eu), the global aim is to develop safety systems that support the driver, intervene when dangerous situations occur and help to mitigate the impact of unavoidable collisions.

Within this project we focused on the perception system and developed an approach to address frontal vehicle perception. The aim of our approach is to use both radar and video outputs in a two-stage fusion to perform robust vehicle detection and tracking.

Radar sensors have good range resolution but crude azimuth estimation, whereas video sensors give a precise lateral estimate while having an uncertain range estimate. A two-stage fusion process capitalises on each sensor's strengths and compensates for its weaknesses. Radar output is used for gating purposes to confirm the detections at each stage and to assign an accurate range and velocity to the detected object. The camera output is then used to confirm the initial detection and refine the object's lateral position.

Furthermore, to improve the efficiency of the perception system, detections are integrated through time using a multi-object tracking (MOT) method. Multi-object tracking is a necessary process to ensure output consistency, associate detections with real-world objects, limit or suppress false detections and cope with missed detections. To perform filtering, the state estimation and prediction of each MOT track is performed using an Extended Kalman Filter.

Experimentation and validation of our method is performed using a test vehicle equipped with a sensor array composed of the TRW AC100 medium-range radar and the TRW T-Cam camera, which provides lane detection and a raw image for video processing. Tests are carried out in highway, rural, and urban scenarios and show a very good detection rate while keeping the number of false positives very low.

The first section of this paper presents the test vehicle platform and sensors. In the second section, we describe the software architecture and the work flow of the frontal object perception module. In the third part, the method developed is presented in detail. Experimental results are the object of the fourth section. The final section presents conclusions and considers further work.

Figure 1 - TRW AC100 radar

2 Testing platform and sensors

A test vehicle from the interactIVe project is used in order to obtain data sets from different situations. The process of data acquisition focuses on three scenarios: highway, rural, and urban areas. The TRW Conekt test car is a Fiat Stilo previously used in the PReVENT-SASPENCE project.

Figure 2 - T-Cam Camera with Lane Detection and Horizon estimation

It is equipped with a sensor array composed of the TRW AC100 medium-range radar, mounted below the registration plate, and the TRW T-Cam camera, positioned below the rear-view mirror, providing lane detection and a raw image for video processing. Also, vehicle ego motion is filtered and provided through the CAN bus. Figure 3 shows images of the interactIVe test vehicle used to perform the experiments. The radar sensor, shown in Figure 1, is a medium-range radar with a detection range of up to 150 m, a field of view of ±8 degrees and an angular accuracy of 0.5 degrees. The camera, depicted in Figure 2, has embedded on-board processing and image recognition routines which can detect lane markings and provide a horizon estimate; its frame rate is 30 Hz.

Figure 3 - TRW demonstrator car with the fitted camera and radar sensors

This test vehicle allows us to log data for development purposes as well as validate the approaches developed on a real application platform.

3 Architecture & Process flow

Sensor data are processed partly inside the sensors and partly in a central intelligence unit that takes care of synchronising signals and performs object detection and tracking. The main software is written in C/C++ using OpenCV.

Data enter the processing flow from the radar, the camera and the vehicle itself (yaw rate and wheel speed sensors). Radar data are pre-processed inside the AC100 unit itself and are available through CAN as a list of detected objects. There are two kinds of objects available in the interface: one-spot detections called targets and internally tracked objects called tracks. Each of these objects is characterised by its fundamental parameters (a data-structure sketch follows the list):
1) Range – that is, distance from the radar
2) Range Rate – that is, the first derivative of the above
3) Azimuth – angular distance from the median plane of the radar
4) Level of detection
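As an illustration only, this per-object radar interface might be captured by a structure such as the following sketch. The field names, types and units are our assumptions, not the actual AC100 CAN message layout:

// Sketch of one radar object as described in the list above.
// Names and units are illustrative assumptions, not the real
// AC100 interface definition.
struct RadarObject {
    double range;      // distance from the radar [m]
    double rangeRate;  // first derivative of range [m/s]
    double azimuth;    // angle from the radar's median plane [deg]
    double level;      // level (strength) of the detection
    bool   isTrack;    // true for an internally tracked object ("track"),
                       // false for a one-spot detection ("target")
};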
The TRW camera has internal processing capabilities, allowing it to output lane markings, object-in-lane information and an indirect pitch estimate (expressed as the horizon position in the image). In addition to these data, the raw image as well as the edges detected below the horizon are made available for further analysis. The object-in-lane information is, however, not reliable enough, and it is discarded in this application in favour of a deeper analysis of the raw image in the central processor.

The seed for a track comes from radar data: once an object is detected, the track is initiated and updated through the use of measurements coming from radar and/or camera. The radar field of view is quite narrow, approximately ±8 degrees both horizontally and vertically, so tracks can only be initialised when the object is actually inside the radar detection zone. Vehicle tracking is maintained outside the radar detection zone using camera data.

The tracks are carried to the next frame by predicting their state and expected measurements using ego-vehicle motion information and the dynamic evolution of each track. Each track at this point triggers a raw image search to look for a vehicle in the area where it is predicted to be. Noise in the measurements and prediction uncertainties cause the area searched to be bigger than the detected area. The likelihood of the object being a vehicle is calculated using histogram search techniques and by evaluating the symmetry of the region.

If the prediction is confirmed, the corrector part of the Extended Kalman Filter is performed and the track is kept for the next stage. Alternatively, if no evidence of a vehicle is present, the track is marked for possible deletion but retained. The most likely state for the next frame is calculated via the Extended Kalman Filter prediction phase. Tracks that are not confirmed for several frames are deleted. The outline scheme for this algorithm can be seen in Figure 5.
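To summarise the process flow just described, one iteration of the perception loop could be organised as in the following C++ sketch. All names are illustrative placeholders standing in for the stages described above, not the actual project code:

#include <vector>
#include <opencv2/opencv.hpp>

// Illustrative placeholder types; the real state and covariance
// layout is described in Section 4.
struct EgoMotion   { double yawRate, speed; };
struct RadarObject { double range, rangeRate, azimuth, level; };
struct Track       { int missedFrames = 0; /* EKF state, covariance, ... */ };

// Stages of the loop, assumed implemented elsewhere.
void predict(Track&, const EgoMotion&);          // EKF prediction phase
void associate(std::vector<Track>&, const std::vector<RadarObject>&);
cv::Rect searchRegion(const Track&);             // enlarged by prediction noise
double vehicleLikelihood(const cv::Mat&, const cv::Rect&);
void correct(Track&, const cv::Mat&, const cv::Rect&);  // EKF corrector phase
void manageTracks(std::vector<Track>&);          // delete stale tracks
void seedNewTracks(std::vector<Track>&, const std::vector<RadarObject>&);

const double kConfirmThreshold = 0.5;            // assumed value

void processFrame(std::vector<Track>& tracks,
                  const std::vector<RadarObject>& radar,
                  const cv::Mat& image, const EgoMotion& ego) {
    for (Track& t : tracks) predict(t, ego);     // carry tracks to this frame
    associate(tracks, radar);                    // gate radar data to tracks
    for (Track& t : tracks) {
        cv::Rect roi = searchRegion(t);          // raw image search area
        if (vehicleLikelihood(image, roi) > kConfirmThreshold)
            correct(t, image, roi);              // confirmed: EKF correction
        else
            t.missedFrames++;                    // retained, marked for deletion
    }
    manageTracks(tracks);                        // drop long-unconfirmed tracks
    seedNewTracks(tracks, radar);                // unassociated radar detections
}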

4 Method

Multi-object tracking is a key part of the process. Indeed, tracking multiple objects is necessary in order to overcome possible issues arising from missed detections, ghost detections and inaccurate detections, particularly when their occurrence is limited in time. It is also an important element in handling proper track initialisation. Furthermore, tracking allows retention of the history of detected objects and prediction of where these objects are likely to be in the next iteration. The prediction of the states is made using non-linear differential equations, although these are locally linearised for use inside the Extended Kalman Filter.

Along with the issues it helps to solve, tracking introduces new possible issues that have to be considered when implementing the algorithm. These issues will be discussed later.

The association, detection and tracking stages of our two-stage method are depicted in Figure 5.

Figure 5 - System architecture

In the following sections, we describe each part of the flowchart in Figure 4 in detail.

Figure 4 - Logic flowchart

4.1 Track association

At iteration k we consider a number of tracks already existing and carried over from iteration k-1. At the same iteration, we have some measurements available from the radar, so the first step is to associate the established tracks with the new measurements. This is done by means of a likelihood function based on the distance between the measurement derived from the predicted state of the track and the actual measurement.

After the calculation of this likelihood function, its thresholding and its storage in a matrix, a simple search algorithm [9] allows association of each track with the corresponding measurement. At this point, for every track there are three options (a sketch of this step follows the list):
1) The track has a radar measurement associated: the measurement is then used in the correction phase of the algorithm.
2) The track does not have a radar measurement associated: the track is nevertheless carried over, and the missed detection is recorded in the track status. Track management will take care of this, and will discriminate between tracks that are confirmed by video only and tracks that are not confirmed by any measurement.
3) A radar measurement is not associated with any track: a new track is initiated from radar data only, for the image search algorithm to confirm the detection.

Figure 6 - Example on a non-structured road

4.2 Track confirmation through Video

At this stage, the raw video image is used as input to verify the presence of an object in it. Tracking allows the search to be reduced to certain regions of the image, lowering the load on the processor. Confirmation by video of tracks that are not associated with radar allows tracks to be kept alive even when the object falls outside the detection angle of the radar; this happens quite often in bends. So, even though the track has to be initialised in the detection area of the radar, it can go on without radar confirmation and be picked up again when the object comes back into the radar detection zone.

The analysis of the video is based on edges detected by the internal software of the camera, which are available only up to the internal horizon line. The fact that edges above the internal horizon are not available can cause some issues where road gradients change markedly. Nevertheless, this has not proved to be a major drawback in the diverse scenarios studied, and it has enabled us to save a significant amount of processing power. Once the region of interest is known, either from radar detections or from the predicted position of the track, it is scanned through, looking for edges that indicate the presence of a vehicle. By arranging edges along the vertical and horizontal directions, it is possible to identify the likely regions of the lateral borders of the object. A similar operation is performed looking for horizontal lines, a characteristic which usually helps to differentiate between vehicles and the road surface.

Once the search zone has been refined, another step is to look at the symmetry of the region, since vehicles are usually symmetric when viewed from the rear, although this assumption is not necessarily true in non-uniform lighting conditions. Another search consideration is that a vehicle image is usually darker in its lower part and has vertical limits in this region. By combining the probabilities of finding all these characteristics in the analysed region, it is possible to calculate the likelihood that the region is a vehicle, as well as its position in the image. When two cars of similar colours are queuing in front of the ego-vehicle, it can be hard to separate the two laterally, since there is no strong edge to be detected and the lateral limits of the two cars can merge together, giving the false impression of a wider vehicle.

A simple thresholding operation on the likelihood just calculated allows us to distinguish vehicles from non-vehicles and so confirm tracks. A sketch of such a scoring function is given below.
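The following OpenCV sketch illustrates one plausible way to combine the edge and symmetry cues described above into a single score. The particular cues, weights and normalisations are our assumptions rather than the project's actual implementation:

#include <opencv2/opencv.hpp>

// Score a region of an edge image as "vehicle-like" by combining
// symmetry, lateral-border evidence and overall edge density.
// Input: 8-bit single-channel edge map (e.g. from the camera's
// internal edge detection) and a region of interest.
double vehicleLikelihood(const cv::Mat& edges, const cv::Rect& roi) {
    const cv::Mat region = edges(roi);

    // Symmetry cue: a rear view of a vehicle is roughly left-right
    // symmetric, so compare the region with its mirror image.
    cv::Mat mirrored, diff;
    cv::flip(region, mirrored, 1);
    cv::absdiff(region, mirrored, diff);
    const double symmetry =
        1.0 - cv::sum(diff)[0] / (2.0 * cv::sum(region)[0] + 1e-9);

    // Column-wise edge histogram: vertical structure at the lateral
    // borders of a vehicle shows up as a peak in each half.
    cv::Mat colSum;
    cv::reduce(region, colSum, 0, cv::REDUCE_SUM, CV_64F);
    double leftPeak = 0.0, rightPeak = 0.0;
    cv::minMaxLoc(colSum.colRange(0, colSum.cols / 2), nullptr, &leftPeak);
    cv::minMaxLoc(colSum.colRange(colSum.cols / 2, colSum.cols),
                  nullptr, &rightPeak);
    const double borders = (leftPeak + rightPeak) / (2.0 * 255.0 * region.rows);

    // Overall edge density: a weak cue that structured content is present.
    const double density = cv::sum(region)[0] / (255.0 * region.total());

    // Illustrative weighting of the cues into one likelihood in [0, 1].
    return 0.5 * symmetry + 0.3 * borders + 0.2 * density;
}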

4.3 Track Management

Once the tracks are confirmed by video, track management takes care of the confirmation or deletion of the tracks themselves. If a track has not been confirmed for a number of frames, it is deleted and not tracked any more. The same happens if the track falls outside the detection zone of the radar. For example, passing objects are tracked for a few frames even after they fall out of the field of view. Since the relative speed is quite large, the relative longitudinal position of the track could become close to zero, if not negative, potentially creating numerical issues in the algorithm; track management deletes such tracks should this case occur.

4.4 Multi-object Tracking

After track management, if the measurement is available, it is possible to apply the Extended Kalman Filter corrector phase. Every object is tracked by means of a 5-state estimator, using the corrector phase only when measurements are available. The 5 states for every track are:
Range – distance from the sensing centre to the object being tracked, measured along the centre line of the ego-vehicle
Range Rate – first derivative of the above state; the relative velocity of the tracked object with respect to the ego-vehicle
Lateral Position – lateral distance of the tracked object, measured from the projection of the centre line of the ego-vehicle
Object Width – tracked object width in real-world coordinates
Horizon – an important parameter for the projection from real-world coordinates to image coordinates. It is geometrically linked with pitch but, for calculation reasons, is used in this form.
The measurements are all in the sensor's own coordinate frame, and are:
Radar Range – as measured by the radar
Radar Range Rate – as measured by the radar
Video Left Border – left column of the detection box
Video Right Border – right column of the detection box
Video Bottom Line – bottom line of the detection box
A sketch of these state and measurement vectors is given below.
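In code, the state and measurement vectors listed above could be laid out as follows. The ordering, names and units are our assumptions; only the quantities themselves come from the paper:

// Sketch of the 5-state track estimator described above.
// Ordering and units are illustrative assumptions.
struct TrackState {
    double range;       // [m]   along the ego-vehicle centre line
    double rangeRate;   // [m/s] relative velocity of the tracked object
    double lateralPos;  // [m]   offset from the projected ego centre line
    double width;       // [m]   object width in real-world coordinates
    double horizon;     // [px]  horizon row in the image, standing in for pitch
};

// Measurements, each in its own sensor's coordinate frame:
// radar quantities in world coordinates, video quantities in pixels.
struct TrackMeasurement {
    double radarRange;      // [m]
    double radarRangeRate;  // [m/s]
    double videoLeft;       // [px] left column of the detection box
    double videoRight;      // [px] right column of the detection box
    double videoBottom;     // [px] bottom line of the detection box
};

Figure 7 - Example of a reconstruction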

It is worth noting that the perspective transform to and from image coordinates takes place inside the Extended Kalman Filter itself: all states are in world coordinates, while the measurements are partly in world coordinates (from the radar) and partly in image coordinates (from the camera). These transformations are highly non-linear, requiring the Extended version of the Kalman Filter to be used instead of the standard one. It is worth noting as well that in this way the estimator becomes sub-optimal, but it remains fit for the purpose of this project.
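A minimal sketch of such a non-linear measurement model h(x) is given below, assuming a pinhole camera, a flat road and a known camera height. The focal length, principal point and camera height are placeholder values, and the paper does not detail the project's real model:

#include <array>

// Repeated here so the sketch is self-contained.
struct TrackState { double range, rangeRate, lateralPos, width, horizon; };

// Assumed camera parameters (placeholders, not calibration data).
const double kFocalPx   = 800.0;  // focal length [px]
const double kCentreCol = 320.0;  // principal point column [px]
const double kCamHeight = 1.2;    // camera height above ground [m]

// Non-linear measurement model h(x): predicts the five measurements
// (radar range, radar range rate, video left/right/bottom) from the
// track state under pinhole and flat-road assumptions. Inside an EKF
// this function is linearised (its Jacobian evaluated) at the current
// state estimate.
std::array<double, 5> predictMeasurement(const TrackState& x) {
    // Radar measurements are already in world coordinates.
    const double radarRange     = x.range;
    const double radarRangeRate = x.rangeRate;

    // Perspective projection of the object's borders into the image.
    const double centreCol  = kCentreCol + kFocalPx * x.lateralPos / x.range;
    const double halfWidth  = kFocalPx * (0.5 * x.width) / x.range;
    const double videoLeft  = centreCol - halfWidth;
    const double videoRight = centreCol + halfWidth;

    // Ground contact row: below the horizon by f * camera height / range.
    const double videoBottom = x.horizon + kFocalPx * kCamHeight / x.range;

    return {radarRange, radarRangeRate, videoLeft, videoRight, videoBottom};
}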
Accurate pitch estimation is needed to fuse the information in the image plane correctly with that in real-world coordinates. As well as the internal horizon/pitch estimate in every track, a global pitch estimator is used that takes into account the single-track horizon estimates, the T-Cam internal horizon estimate and other parameters such as car acceleration. This pitch estimator is not described in detail in this paper.

The prediction phase for all tracks, including the ones that could not be associated with a measurement, is performed at the end of the current iteration. In general this is not ideal, since the time between frames is not known exactly beforehand; in this particular case, however, the time interval is constant and so known beforehand. The reason why the prediction phase is brought forward to the previous frame is code clarity.

4.5 Track Selection

When two or more tracks are available, the main one is selected based on the criteria of distance and of belonging to the same lane (where applicable). This is useful for applications like platooning, where the vehicle in front is followed with the aim of mimicking its behaviour. The main track is identified in orange in Figure 7.
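A minimal sketch of this selection rule follows; the lane test and the choice of the nearest in-lane track are our reading of the criteria above, and all names are placeholders:

#include <vector>

struct CandidateTrack {
    double range;      // [m] longitudinal distance
    bool   inEgoLane;  // set from lane detection, where available
};

// Select the main track: prefer the nearest track in the ego lane;
// when lane information is not applicable, fall back to the nearest
// track overall. Returns -1 when no suitable track is available.
int selectMainTrack(const std::vector<CandidateTrack>& tracks,
                    bool laneInfoValid) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(tracks.size()); ++i) {
        if (laneInfoValid && !tracks[i].inEgoLane) continue;
        if (best < 0 || tracks[i].range < tracks[best].range) best = i;
    }
    return best;
}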
5 Experimental results

The algorithm proved effective in fusing information from both sensors in real time, enhancing the strong characteristics of both and mitigating their shortcomings. In particular, as can be seen in Figure 7, the lateral position of radar targets is corrected by video analysis, and radar reflections not originating from a vehicle are discarded.

Using a monocular camera alone, the distance estimation would have been problematic due to the pitch dynamics of the car. The radar data allow a more accurate measurement of the distance to the target, producing at the same time a better estimate of the pitch.

The difference in field of view between the two sensors allows continued tracking of objects even when they move out of the field of view of the radar alone. Radar data can even be used to track an object that is shadowed by a closer one; the overall algorithm, however, only tracks objects in direct line of sight, since it needs the video confirmation.

The technology used also allows tracking of vehicles and other objects in a non-structured environment (Figure 6): it has been tested on unsurfaced rural roads without road markings, and adequate performance has been observed even in high-dynamic pitch situations.

Tests have been carried out in highway, rural, and urban scenarios and show a very good detection rate while keeping the number of false positives low.

Possible applications of this algorithm include tracking the front object and mimicking its behaviour for applications like convoying on motorways (platooning). It is worth noting that, on top of the usual platooning capability, there is the possibility to watermark the vehicle with visual techniques in order to re-identify it should the track be lost after it goes out of the field of view.

6 Conclusion & Perspectives

In this paper, we have presented work carried out in the framework of the European project interactIVe. We proposed a vehicle perception method based on a multi-object tracking approach and fusion between radar and video sensors. Experimental results obtained using a test car have been presented.

The next steps will begin with generating ground truth data to obtain quantitative results. Then, the introduction of a Histogram of Oriented Gradients (HOG) classifier will further increase the robustness of the likelihood function. Finally, pedestrian classification will be added to address pedestrian detection and tracking.

Acknowledgements

This work was supported by the European Commission under interactIVe, a large-scale integrated project, part of the FP7-ICT programme for Safety and Energy Efficiency in Mobility. The authors would like to thank all partners within interactIVe for their cooperation and valuable contribution.

References

[1] S. Pietzsch, T. Vu, J. Burlet, O. Aycard, T. Hackbarth, N. Appenrodt, J. Dickmann, and B. Radig, "Results of a precrash application based on laser scanner and short range radars", IEEE Transactions on Intelligent Transport Systems, vol. 10, no. 4, pp. 584-593, 2009.
[2] C. Shooter and J. Reeve, "Intersafe-2 architecture and specification", in IEEE International Conference on Intelligent Computer Communication and Processing, 2009.
[3] S. Thrun, "Robotic mapping: A survey", in Exploring Artificial Intelligence in the New Millennium, Morgan Kaufmann, 2002.
[4] R.E. Kalman, "A new approach to linear filtering and prediction problems", Journal of Basic Engineering, 35, March 1960.
[5] G. Welch and G. Bishop, "An introduction to the Kalman Filter", available at http://www.cs.unc.edu/~welch/kalman/index.html
[6] G. Welch and G. Bishop, "An introduction to the Kalman Filter", Technical Report, 2004.
[7] S. Thrun, W. Burgard and D. Fox, "Probabilistic Robotics (Intelligent Robotics and Autonomous Agents)", The MIT Press, September 2005.
[8] D.B. Reid, "An algorithm for tracking multiple targets", IEEE Transactions on Automatic Control, 24(6), 1979.
[9] B. Castello, "The Hungarian Algorithm", available at http://www.ams.jhu.edu/~castello/362/Handouts/hungarian.pdf
