Abstract
Person and vehicle re-identification has been a popular subject in computer vision. Existing closed-set re-identification surpasses human-level accuracy on commonly used benchmarks, and the research focus is shifting to the open-world setting. The latter setting is more suitable for practical applications but remains less developed because of its challenges. At the same time, existing research focuses more on person re-identification, even though both persons and vehicles are important components of smart city applications. This review attempts to combine, for the first time, the problem of person and vehicle re-identification under closed and open settings, its challenges, and the existing research. Specifically, we start from the origin of the re-identification task and then summarize state-of-the-art research based on deep learning in different scenarios: person, vehicle, or unified re-identification in closed- and open-world settings. Additionally, we analyse a new approach to the re-identification task using the Transformer, a model architecture that relies entirely on an attention mechanism and shows promising results. This survey facilitates future research by summarizing past and present trends, and helps improve the usability of re-ID techniques.
Introduction
With the increasing number of camera networks deployed in public spaces such as hospitals, parks, colleges, and roads, an ever-larger volume of video data is being produced. It is essential to automatically identify and re-identify traffic components such as people and vehicles across non-overlapping cameras at different physical locations. This task is known as object re-identification (re-ID).
More specifically, the main objective of re-ID is to locate instances of a query object (probe) from a group of candidates (gallery) captured from different non-overlapping camera views. It is an extremely challenging task because the appearance variations of the same object (e.g. person or vehicle) might be more significant than those of different objects. Such variances are often caused by significant differences in viewpoint, illumination, resolution, occlusion, or color across various camera views.
In the earlier years, the task of re-ID was studied as a sub-task of multi-camera tracking (MCT) [138], which aims to determine the cross-camera trajectories of certain pedestrians or vehicles captured from multiple cameras. In 2006, person re-identification was defined for the first time by Gheissari et al. [35] as an independent research topic that aims to detect and match pedestrians captured from non-overlapping cameras using their visual appearance. Nowadays, with the development of large-scale datasets, re-ID has become a popular research topic in the computer vision community.
Most existing re-identification methods assume (1) a so-called closed-set setting (i.e. all the probe objects already exist in the gallery set, hence a given query is assumed to always have a correct match in the gallery), and (2) that the gallery only contains a limited number of people/vehicles (i.e. a small search scale). The latest re-ID methods have surpassed human-level performance on several commonly used benchmarks collected from closed-world scenarios [67], such as Market1501 [184] and CUHK03 [75]. However, re-ID frameworks designed for a closed environment limit scalability and usefulness in real-world applications, in particular with large-scale camera networks, where an unknown number of pedestrians and vehicles can appear. Relaxing these assumptions, open-set re-ID is better suited to real-world applications but is less developed and more challenging; it has, however, received increasing attention in the re-ID community [67].
This review is a first attempt to cover in parallel existing methods for person and vehicle re-ID in open- and closed-set environments. Section “Introduction” goes through the different re-ID scenarios. Section “Re-identification Methods” discusses state-of-the-art methods for re-ID under various scenarios: person, vehicle, and unified person-and-vehicle re-ID in closed and open environments. Unifying person and vehicle re-ID into a single model instead of maintaining separate ones can reduce memory consumption and inference time, which is useful when deploying models on edge devices. Existing datasets and evaluation protocols are summarized in Section “Datasets and Evaluation”. Results for a selection of the state-of-the-art methods from Section “Re-identification Methods” are presented in Section “Results”. Finally, Section “Conclusion” concludes the paper.
Steps of a Re-ID Framework
Building a re-ID system for a specific scenario requires five main steps [169], summarized and simplified in Fig. 1:
1. Raw data collection: the first step is to acquire raw video data from cameras. These data contain complex and noisy background clutter. The cameras are usually located in different places under varying environmental conditions.
2. Bounding box generation: the second step is to extract the bounding boxes containing the object images from the raw video data. Bounding boxes can be obtained by either detection or tracking algorithms.
3. Training data annotation: data annotation is the process of labeling the videos or images of a dataset. Training data annotation is usually indispensable for learning a discriminative re-ID model because of the large cross-camera variations.
4. Model training: the annotated data (images or videos) are then used to train a discriminative and robust re-ID model. Numerous models have been developed to handle the various challenges, concentrating on feature representation learning or distance metric learning [49].
5. Object retrieval: after an object is detected within the field of view (FOV) of the camera, initial segmentation, classification and identification need to be done. Given an object of interest (query) and a gallery set, the feature representations are extracted using the re-ID model learned in the previous stage. A retrieved ranking list is obtained by sorting the calculated query-to-gallery similarity; a minimal sketch of this ranking step is given below.
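To make the retrieval step concrete, the following sketch ranks a gallery by cosine similarity to a query embedding. It is illustrative only: the function names, tensor shapes and the choice of cosine similarity are assumptions, not taken from any specific re-ID work.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """Return gallery indices sorted by descending cosine similarity to the query.

    query_feat:    (D,)   embedding of the probe image produced by a re-ID model
    gallery_feats: (N, D) embeddings of the N gallery images
    """
    q = F.normalize(query_feat.unsqueeze(0), dim=1)   # (1, D), L2-normalised
    g = F.normalize(gallery_feats, dim=1)              # (N, D), L2-normalised
    sims = (q @ g.t()).squeeze(0)                      # (N,) query-to-gallery similarities
    return torch.argsort(sims, descending=True)        # retrieved ranking list
```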
Person Re-ID vs Vehicle Re-ID
Both person and vehicle re-identification are core technologies of intelligent transportation systems, and are important for the construction of smart cities. Over the past few years, the field of person and vehicle re-identification has made significant progress, thanks to advances in deep learning.
Existing re-identification research focuses mostly on the person re-ID problem; less attention has been paid to vehicle re-identification. This is not consistent with other computer vision tasks such as object detection, which have obtained increased attention in recent years. Although both pedestrians and vehicles are common objects in smart city applications, this lack of popularity can be explained by the scarcity of large-scale, well-annotated datasets for vehicle re-identification.
Furthermore, compared to person re-identification, vehicle re-identification poses greater challenges due to the presence of small inter-class differences and large intra-class differences. The small inter-class difference comes from the fact that different vehicles can look very similar. In fact, vehicles produced by the same or different manufacturers can have similar colors and shapes. As a result, subtle visual differences between two vehicle images make it difficult to determine whether they belong to the same vehicle. By contrast, people are easier to differentiate because they have more distinctive features, such as faces and clothes. The large intra-class difference is reflected in images of the same car looking different due to various factors, including diversity of resolutions, viewpoints, and illumination. Visual patterns of vehicles across different viewpoints change more than those of people, whereby images of the same person usually have a consistent appearance even if there is a significant change of viewpoint [134]. See Fig. 2 for a visual comparison.
With the release of two large-scale datasets VeRi [85] and VehicleID [81] in 2016, vehicle re-identification has attracted more research attention [105].
Applications
Object re-ID technology has a significant role in fields such as multi-object tracking or intelligent monitoring.
In video surveillance, person re-identification is used to determine whether a person-of-interest (query) has been captured by another camera at a different location and time. The increasing demand for public safety and the widespread deployment of large camera networks in theme parks, university campuses and streets have contributed to the emergence of this task. In both use cases, searching for a person-of-interest and tracking a person across cameras, relying solely on brute-force human labor is extremely expensive. This makes person re-identification an accurate and efficient alternative.
Vehicle re-ID plays an important role in intelligent transportation [177], urban computing [191], and intelligent surveillance [83], enabling target vehicles to be quickly discovered, located, and tracked [85]. Some practical applications for vehicle re-ID include vehicle search, cross-camera vehicle tracking, automatic toll collection (as an alternative to expensive satellite-based tracking or Electronic Road Pricing (ERP) systems), parking lot access, traffic behavior analysis, vehicle counting, speed restriction management, and travel time estimation [29].
Challenges
The main challenges in the task of re-ID stem from small visual similarities among different objects, and large appearance variations of the same objects across cameras caused by the cross-view large disparity in viewpoint, illumination, occlusion and background clutter [133].
Some challenges are mentioned below [29]:
- Small inter-class difference: different automobile manufacturers produce vehicles with similar visual appearances;
- Large intra-class difference: the same vehicle looks drastically different from different viewpoints;
- Pose and viewpoint variations: depending on camera calibration, viewing angle, and location on the roadside, the same object can change appearance;
- Occlusions: vehicles or people can be hidden by other objects; as a result, some discriminative parts are not visible and the matching can fail;
- Illumination changes: depending on the camera settings or the time of day, illumination can change, and the same object is observed in different colors. Furthermore, vehicle headlights can shine into cameras, which affects the illumination of the image;
- Variation in resolutions: due to camera calibration or the age of cameras;
- Background clutter: an object’s color can be the same as the image background;
- Long-term re-ID: if the same object is captured after a long time, or captured at a different location, its shape or appearance may have changed. Vehicles, for instance, can be repainted, retrofitted or modified.
A Timeline of Object Re-ID
Research on person re-ID started with multi-camera tracking [138]. In the following paragraphs we introduce some of the milestones in re-ID history. The different directions are also summarized in Table 1.
Multi-camera tracking. Object re-ID was tightly intertwined with multi-camera tracking in the early years. Initially, in order to track objects across disjoint camera views, appearance cues were integrated with spatio-temporal reasoning [138]. Among the various frameworks that have been proposed, the Bayesian formulation is one natural way to integrate multiple types of features. Given the evidence observed in different camera views, the Bayesian formulation computes the posterior of object matching. Huang et al. [54] are amongst the first researchers to propose a Bayesian approach. They integrate the colors and sizes of objects with velocities, arrival times and lane positions to track vehicles between two camera views. Their approach models the probabilities of predicting the appearance or spatio-temporal features of objects observed in one camera view conditioned on their observations in the other camera view.
Multi-camera tracking with explicit re-identification. Zajdel et al. [175] presented the first work on multi-camera tracking in which the term person re-identification was introduced. The authors developed a dynamic Bayesian network for tracking humans while they are within the FOV, and when they leave and re-enter the FOV.
Image-based re-ID as an independent task. Gheissari et al. [35] were the first to define person re-identification as an independent computer vision task that aims to match persons captured from non-overlapping cameras with their visual appearances [67, 186]. Their approach was evaluated on a dataset with 44 persons captured across 3 disparate views.
Multi-shot re-ID. Re-identification techniques based on an object’s appearances can be organized in two main groups: (1) single-shot and (2) multi-shot approaches. While the methods of the first task associate pairs of images of the same object, where each contains one instance, the second task employs multiple images of the same object as training data, considering still images or short sequences as testing observations [34]. Most prior re-ID works focus on image matching. Bazzani et al. [8] and Farenzena et al. [34] conducted the initial multi-shot re-ID works, in which frames are randomly selected. Color acts as a common feature in both works, and both calculate the minimum distance among bounding boxes in two image sets. The methods are evaluated on the VIPeR [40], iLIDS [148] and ETHZ [112] in [34], and on iLIDS [148] and ETHZ [112] in [8]. Both works [8, 34] conclude that multiple frames per person provide more information on appearance than the single-shot setting. Therefore, multi-shot re-ID has received increasing attention for improving re-ID accuracy.
Open-set re-ID. Open-set re-ID was introduced by Gong and Xiang [37], and it focuses on verifying whether the probe is in a gallery or not. Zheng et al. [190] are the first to conduct research on open-set re-ID based on a transfer ranking framework for set-based verification.
Deep learning for re-ID. In the earlier stages of re-ID research, low-level features, high-level semantic attributes, global representations, and local descriptions were obtained through complex and time-consuming hand-crafted techniques. The performance of these methods relied heavily on human expertise. After the success of deep learning in image classification [63], Yi et al. [170] and Li et al. [75] attempted to introduce CNNs to re-ID, and the methodology of feature extraction changed completely. Both works employed a Siamese neural network [11] to determine whether a pair of input images belongs to the same ID or not. They concluded that the deeply-learned paradigm almost dominates the re-ID feature learning procedure due to its advantages in end-to-end learning. Since then, deep learning methods have become a popular option in re-ID.
End-to-end re-ID. Leng et al. [67] mention that end-to-end re-ID has two definitions: (1) from detection to matching, and (2) a combined learning architecture from feature extraction to distance measurement. Xu et al. [159] present a sliding window searching strategy to jointly model person detection and identity matching. This represents the first step of end-to-end re-ID research, and their experiments indicate that the re-ID search results obtained by combining detection and matching could be better than those of the two stages conducted separately.
Closed-Set vs Open-Set Re-ID
Re-ID systems can be categorized into closed-set and open-set settings. While a closed-set scenario assumes that the query object exists in the gallery set [9], an open-set scenario focuses on verifying whether a probe object is in a gallery or not [67].
Re-ID in a closed-set setting is a problem of identification, as the goal is to seek the object in the gallery that is most similar to the probe. The focus is to determine which gallery image belongs to the query object by returning the ID of the query object. Closed-set re-ID can be formulated as follows: let \({\mathscr {G}}\) be a gallery (database) composed of N images, denoted as \(\{g_i\}^N_{i=1}\) [186]. They belong to N different identities 1, 2, ..., N. Given a probe (query) image q, its identity is determined by

\(i^* = {\arg \max }_{i \in \{1, 2, \ldots , N\}} \; \text {sim}(q, g_i), \qquad (1)\)

where \(i^*\) is the identity of probe q, and sim\((\cdot ,\cdot )\) is a similarity function. Most existing re-ID works can be viewed as an identification task. However, this assumption does not always hold in real life, i.e. the probe may not appear in the gallery set captured from other camera views.
By contrast, open-set re-ID [37] can be considered as a verification problem, as the aim is to verify whether the probe appears in a certain gallery, e.g. whether the observed person is a new visitor or a returning one. That is, on top of the identification task, the open-world problem adds another condition to Eq. 1,

\(\text {sim}(q, g_{i^*}) > h,\)

where h is the threshold above which we can assert that query q belongs to identity \(i^*\); otherwise, q is determined to be an outlier identity that is not contained in the gallery, even though \(i^*\) is the first-ranked identity in the identification process.
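The difference between the two decision rules can be summarised in a short sketch (illustrative only; sims is assumed to hold the query-to-gallery similarities of Eq. 1):

```python
import torch

def closed_set_identity(sims: torch.Tensor) -> int:
    """Closed-set identification: the probe is assumed to exist in the gallery,
    so the identity with the highest similarity is always returned (Eq. 1)."""
    return int(torch.argmax(sims))

def open_set_identity(sims: torch.Tensor, h: float):
    """Open-set verification: accept the top-ranked identity only if its
    similarity exceeds the threshold h, otherwise flag the probe as an outlier."""
    i_star = int(torch.argmax(sims))
    return i_star if float(sims[i_star]) > h else None
```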
This scenario is much closer to practical video applications than the closed-set setting [139]. However, the low recognition rates of existing methods under low false acceptance rates show that this setting is very challenging [79, 186, 190].
Most existing re-ID methods, while surpassing human-level accuracies on several commonly used benchmarks, are not suitable for solving the open-world one-shot group-based verification problem for the following reasons [190]:
- in a closed-set environment, the probe set comprises exactly the same people contained in the gallery set;
- a probe image is matched against every individual in the gallery set to find a match. This, however, is not robust in an open-world one-shot re-ID problem, because the probe image may not belong to anyone on the watch-list, resulting in a forced mismatch; and
- most existing learning models for re-identification require multiple images (multi-shot) of each target in the gallery in order to model appearance variations under viewpoint changes, to infer invariant features, or to learn a matching distance metric. As the number of samples decreases, the performance of existing methods decreases.
The scientific community has been devoted mainly to closed-set re-ID, a mature area that is convenient for conducting research given its many baselines, datasets and evaluation protocols. Table 2 summarizes the differences between re-identification in open-world and closed-world settings. The metrics CMC, mAP, TTR, FTR, FAR and DIR are detailed in Section “Evaluation”.
Open-World vs Open-Set Re-ID
The majority of re-ID benchmarks are constructed from a limited number of objects and selective image data due to reasons such as privacy, workload or cost [9, 67]. These, however, do not reflect the requirements of practical applications. This has driven many researchers to focus on methodology-driven approaches that incrementally improve previous methods rather than on application-driven techniques. A distinction is made between open-world and open-set re-ID.
Ye et al. [169] and Leng et al. [67] consider the following among open-world re-ID methods:
- Heterogeneous re-ID. Studies focused on heterogeneous re-ID aim to mine the characteristics of raw multi-modality and low-resolution data in addition to general visible images. Specific data-driven re-ID studies include depth-based [45, 57], text-to-image [17, 180], visible-infrared [150, 168], and cross-resolution [78, 142] re-ID.
- End-to-end re-ID. These tasks require the model to jointly perform the detection and re-ID steps in a single framework. End-to-end re-ID involves person re-ID from raw images [157, 187] or videos [160], and multi-camera tracking [110].
- Semi-/weakly supervised and unsupervised re-ID. Closed-world re-ID assumes enough training data for supervised model training. In open-world scenarios, however, only limited labels [4, 84, 155] or even no label information [88, 167, 183] is available. One approach for unsupervised re-ID without target-dataset labels is to transfer the knowledge from a labeled source dataset to the unlabeled target dataset using unsupervised domain adaptation (UDA) [95, 167]. One popular approach for UDA is GAN-based generation [5, 30, 146].
- Noise-robust re-ID. Due to data collection and annotation difficulty, re-ID usually suffers from unavoidable noise. Three kinds of noise are reviewed in [169]: partial re-ID with heavy occlusion [47, 122], re-ID with sample noise caused by detection or tracking errors [18, 52], and re-ID with label noise caused by annotation errors [43, 192].
- Open-set re-ID. The focus of this review. Leng et al. [67] call this task the narrow open-world perspective.
Re-identification Methods
The following paragraphs summarize methods for solving the vehicle and person re-ID task in closed and open environments. In the closed-set setting, popular approaches include local features, metric learning, attention mechanisms and unsupervised learning. Open-set re-ID, on the other hand, is still less developed. Finally, we introduce a newer way of tackling the re-ID challenge using Transformer models.
Reviews
Bedagkar et al. [9] present a survey of approaches and trends in person re-identification, with a focus on traditional methods rather than deep learning. Zheng et al. [186] present a survey where they classify most current re-ID methods into two classes, i.e., image-based and video-based; for both tasks, hand-crafted and deep learning systems are reviewed. Wu et al. [152] split their review on person re-identification into six deep learning-based methods, i.e. identification deep model, verification deep model, distance metric-based deep model, part-based deep model, video-based deep model and data augmentation-based deep model. Leng et al. [67] provide a survey on open-world re-identification only; it is the first attempt to analyze the trends of open-world re-ID and summarizes them from both narrow (open-set) and generalized (open-world) perspectives. Ye et al. [169] conduct an overview of closed-world person re-ID from three perspectives, deep feature representation learning, deep metric learning and ranking optimization, and of open-world person re-ID from different perspectives including heterogeneous re-ID, end-to-end re-ID, semi-supervised and unsupervised re-ID, noise-robust re-ID, and open-set re-ID.
In the field of vehicle re-identification, Khan et al. [60] investigated methods of vehicle re-identification for the first time by providing an insightful review of methods categorized into sensor-based methods, hybrid methods, and vision-based methods (hand-crafted feature-based and deep feature-based). Wang et al. [134], on the other hand, published a review of vehicle re-identification technologies purely based on deep learning. They separate the methods into five categories: methods based on local features, representation learning, metric learning, unsupervised learning, and attention mechanisms. Deng et al. [29] are the first to publish a review of trends in vehicle re-ID that covers computer vision-based methods as well as approaches with different technological backgrounds, including global positioning systems (GPS), inductive loops and magnetic sensors.
We observe the following: (1) fewer surveys on vehicle re-identification have been published, (2) to our knowledge, no review of vehicle re-identification in an open-world setting exists, and (3) no review treating vehicle and person re-identification simultaneously has been found so far (see Table 3).
Methods for Closed-Set Re-ID
Traditionally, methods of re-identification require hand-crafted features. However, as the parameters need to be adjusted manually, only a few parameters can be used in the design of such features.
Traditional machine learning uses feature engineering to artificially refine and clean the data. Generally, it includes three steps: feature extraction, feature coding, and feature classification. Methods for feature extraction include the scale-invariant feature transform (SIFT) [91], the Local Binary Pattern (LBP) [104], the Histogram of Oriented Gradients (HOG) [28], and Speeded-Up Robust Features (SURF) [7]. The generalization ability of these methods is, however, poor.
With the rise of large-scale datasets and deep learning systems, hand-crafted features are no longer needed. Instead, features are learnt automatically from large amounts of training data by models with many thousands of parameters, such that better features can be extracted. In the context of closed-world re-identification, most existing works adopt network architectures designed for image classification as the backbone [64, 118, 123]. A CNN for re-ID problems must satisfy two basic needs: a feature extraction method for obtaining a useful feature from an image [66], and a distance metric for comparing the similarity of two images [1, 105].
Compared to handcrafted features, CNN feature extractors usually perform better because they find features robustly through supervised learning, which jointly extracts the discriminative features and estimates the classification/regression model.
Many researchers work on person re-ID using appearance features, human poses and temporal constraints. Deploying existing person re-ID frameworks on vehicle re-ID tasks, however, may not perform well. Vehicles have fewer discriminative features than humans, due to viewpoint orientation, changes in lighting conditions and inter/intra-class similarity. The variations of vehicle orientation still make the re-identification task difficult. Traditionally, vehicle re-identification problems are solved by combining sensor data with other information, such as the passing time of a vehicle [80] and wireless magnetic sensors [65]. However, these methods require additional hardware costs and are very sensitive to environmental changes. License plate recognition is widely used in vehicle re-ID [99, 143]. Other works on vehicle re-ID technologies are based on vehicle attributes and appearance characteristics, such as shape, color, and texture [98].
In this section we focus on methods based on deep learning, where we summarize methods based on local features, distance metric learning, attention mechanisms, unsupervised learning and Transformer models. All of the mentioned works solve the person or vehicle re-identification problem in a closed-world setting. Additionally, we add a few person re-ID methods based on pose transfer. The results are summarized in Tables 5, 6, 7 and 8.
Local Features
Using the whole image to obtain a feature vector for image retrieval can lead to accuracy bottlenecks, hence some researchers have started paying attention to local features [134]. For each object, part-level local features are aggregated in order to formulate a combined representation for each object image, making it robust against misalignment variations [121, 169].
Methods based on local features can capture unique visual clues, which helps distinguish between different people or vehicles and improves the accuracy of their re-identification. Furthermore, researchers also combine local features with global features in order to improve the accuracy.
a. Person In the context of person re-identification, the body parts are generated by a human pose estimator or by roughly horizontal division. One of the main trends is to combine the full-body representation with local part features [23, 68, 182]. However, such methods require an additional pose detector and are prone to noisy pose detections due to the large gap between person re-ID and human pose estimation datasets [121]. Other works study the robustness of part-level feature learning against background clutter [127]. Others extract horizontal region features without pose estimation, using a Siamese Long Short-Term Memory (LSTM) model [129].
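As a minimal illustration of the horizontal-division idea, the sketch below pools a backbone feature map into a few stripe-level embeddings. The layer sizes and class name are illustrative assumptions, not those of any published part-based model.

```python
import torch
import torch.nn as nn

class HorizontalStripeHead(nn.Module):
    """Split a CNN feature map into horizontal stripes and pool each stripe into
    its own part-level embedding, a common person re-ID design."""

    def __init__(self, in_channels: int = 2048, num_parts: int = 6, part_dim: int = 256):
        super().__init__()
        self.num_parts = num_parts
        self.pool = nn.AdaptiveAvgPool2d((num_parts, 1))          # one cell per stripe
        self.reduce = nn.ModuleList(
            nn.Conv2d(in_channels, part_dim, kernel_size=1) for _ in range(num_parts)
        )

    def forward(self, feat_map: torch.Tensor):
        # feat_map: (B, C, H, W), e.g. the output of a ResNet-50 backbone
        stripes = self.pool(feat_map)                              # (B, C, num_parts, 1)
        parts = []
        for p in range(self.num_parts):
            stripe = stripes[:, :, p : p + 1, :]                   # (B, C, 1, 1)
            parts.append(self.reduce[p](stripe).flatten(1))        # (B, part_dim)
        return parts                                               # list of part-level embeddings
```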
b. Vehicle In the area of vehicle re-ID, some works take their inspiration from person re-ID. Commonly used methods for extracting local features are key point location and region segmentation [124, 141]. Object key point localization finds its applications in face alignment [109] and human pose estimation [102]. The key points can be used to align the learned features. Wang et al. [141] extract local region features of different orientations based on the location of 20 key points, e.g. left-front wheel, right fog lamp, left headlight, left rear lamp, rear license plate, etc. Some researchers also combine local features with global features in order to improve accuracy. Liu et al. [87] propose the Region-Aware Deep Model (RAM), which, in addition to extracting global features, also extracts features from a series of local regions. Following a similar approach, He et al. [46] developed a novel framework that was trained end-to-end with combined local and global constraints by introducing a detection branch.
Strengths and Limitations The advantage of methods based on local features is their ability to capture unique visual cues and nuances that are present in local areas. This aids in distinguishing between different objects, ultimately improving the accuracy of object re-ID models. However, a drawback of local feature-based methods is the significant increase in computational load associated with the extraction of such features. Another limitation is that dependencies are learned only among immediate neighbourhoods, so distant positions in the feature maps are ignored.
Distance Metric Learning
Distance metric learning [158], or deep metric learning, is a method that uses feature transformations to map samples into a feature space and form clusters there, where the distances between features are compared.
In the context of re-identification, a popular pipeline is to design suitable loss functions to train a CNN backbone (e.g. ResNet), which is used to extract features of images. Metric learning aims to learn the similarity between two images through the gallery, i.e. it aims to decrease the distance between similar targets, and to increase the distance between different targets. Therefore, metric learning requires individualized key features that help the model to distinguish between different objects [134].
Commonly used metric learning losses include the contrastive loss, the triplet loss and the quadruplet loss, where the most frequently used approach is the triplet-based deep architecture [26, 136, 152].
Contrastive Loss The contrastive loss takes the network output for a sample, calculates its distance to an example of the same class and contrasts it with the distance to negative examples. Said another way, the loss is low if positive samples are encoded to similar (closer) representations and negative examples are encoded to different (farther) representations.
The contrastive loss function is defined as [169]

\({\mathscr {L}}_{\mathrm {con}} = \delta _{ij}\, d_{ij}^2 + (1 - \delta _{ij})\, \big (\max (0, \rho - d_{ij})\big )^2,\)

where \(d_{ij}\) represents the Euclidean distance between the embedding features of two input samples \(x_i\) and \(x_j\), and \(\delta _{ij}\) is a binary label indicator, i.e. \(\delta _{ij} = 1\) if \(x_i\) and \(x_j\) belong to the same identity, and \(\delta _{ij} = 0\) otherwise. The margin parameter is denoted by \(\rho\) and specifies the distance by which the features of dissimilar pairs should be separated.
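A minimal sketch of this loss, following the notation above (batching and pair-sampling details are simplified and the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feat_i: torch.Tensor, feat_j: torch.Tensor,
                     same_id: torch.Tensor, rho: float = 1.0) -> torch.Tensor:
    """Pairwise contrastive loss.

    feat_i, feat_j: (B, D) embeddings of the two images in each pair
    same_id:        (B,) indicator, 1.0 if the pair shares an identity, else 0.0
    rho:            margin by which dissimilar pairs should be separated
    """
    d = F.pairwise_distance(feat_i, feat_j)           # Euclidean distance d_ij
    pos = same_id * d.pow(2)                          # pull same-identity pairs together
    neg = (1.0 - same_id) * F.relu(rho - d).pow(2)    # push different-identity pairs beyond rho
    return (pos + neg).mean()
```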
Triplet Loss The triplet loss treats the re-identification model training process as a retrieval ranking problem. The basic idea is that the distance between the positive pair should be smaller than the negative pair by a pre-defined margin [49, 169].
Models based on the triplet loss take three images as input: one query/anchor image \(x_i\), one image with the same ID as the query (positive) \(x_j\) and one image with a different ID to the query (negative) \(x_k\). The margin \(\alpha\) is enforced to ensure a minimum distance between positive and negative pairs. The triplet is denoted as \(t=(x_i, x_j, x_k)\), and the triplet loss function is formulated as

\({\mathscr {L}}_{\mathrm {tri}}(x_i, x_j, x_k) = \max \big (0,\; d(x_i, x_j) - d(x_i, x_k) + \alpha \big ),\)

where \(d(\cdot ,\cdot )\) measures the Euclidean distance between two samples, and \(\alpha\) is the margin threshold that is enforced between positive and negative pairs. The accuracy of the model depends heavily on the selection of samples for the triplet loss function. When training the model, there should be both easy and hard pairs. An easy pair has a small distance or only minor changes between the two images, such as image rotation or slight modifications. In contrast, a hard pair exhibits more significant changes in clothing, surroundings, lighting, or other drastic transformations. Selecting such pairs improves the effectiveness of training with the triplet loss.
Based on the definition of the loss, we can define three categories of triplets:
- Easy triplets: the loss is 0 because the negative sample is sufficiently distant from the anchor with respect to the positive sample in the embedding space: \(d_{ik} > d_{ij} + \alpha\).
- Hard triplets: the negative sample is closer to the anchor than the positive sample: \(d_{ik} < d_{ij}\).
- Semi-hard triplets: the loss is positive and smaller than \(\alpha\); the negative sample is more distant from the anchor than the positive sample, but lies within the margin: \(d_{ij}< d_{ik} < d_{ij}+ \alpha\).
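The sketch below implements the triplet loss with batch-hard mining in the spirit of [49]: for each anchor, the hardest positive and hardest negative within the mini-batch are selected. It is illustrative only; details such as distance squaring, soft margins and sampling strategies vary across works.

```python
import torch

def batch_hard_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                            alpha: float = 0.3) -> torch.Tensor:
    """Triplet loss with batch-hard mining.

    feats:  (B, D) embeddings of a mini-batch
    labels: (B,)   identity labels of the batch
    alpha:  margin enforced between positive and negative pairs
    """
    dist = torch.cdist(feats, feats)                               # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)              # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)

    hardest_pos = dist.masked_fill(~same | eye, 0.0).max(dim=1).values    # furthest positive
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values  # closest negative
    return torch.relu(hardest_pos - hardest_neg + alpha).mean()
```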
Quadruplet Loss The triplet loss mainly focuses on obtaining correct orderings on the training set. It still suffers from a weaker generalization capability from the training set to the testing set, resulting in inferior performance. Chen et al. [22] came up with the quadruplet loss, which takes four images as input: the same three as for the triplet loss model, i.e. \(t=(x_i, x_j, x_k)\), plus a second negative sample \(x_l\), which belongs to a class different from the anchor and also different from the first negative. The quadruplet is then denoted as \(q=(x_i, x_j, x_k, x_l)\), and the quadruplet loss function can be defined as

\({\mathscr {L}}_{\mathrm {quad}} = \max (0,\; \alpha _1 + d_{ij} - d_{ik}) + \max (0,\; \alpha _2 + d_{ij} - d_{lk}),\)

where d denotes the Euclidean distance, and \(\alpha _1\) and \(\alpha _2\) are the margin hyper-parameters. The first term \(\alpha _1 + d_{ij} - d_{ik}\) with \(\alpha _1 = 1\) is named the strong push and the second term \(\alpha _2 + d_{ij} - d_{lk}\) with \(\alpha _2\) lower than 0.5 is named the weak push [22]. According to [22], this improvement yields a larger inter-class distance, resulting in better generalization of the system to unseen, real-world data. An illustration of the difference between the triplet loss and the quadruplet loss is given in Fig. 3.
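Following the same notation, a minimal sketch of the quadruplet loss; the mining of the extra negative pair is left out for brevity, and the function name and default margins are illustrative.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, neg1, neg2,
                    alpha1: float = 1.0, alpha2: float = 0.5) -> torch.Tensor:
    """Quadruplet loss: a strong push on the anchor-negative distance and a weak
    push on the distance between an unrelated negative pair.

    anchor, positive, neg1, neg2: (B, D) embeddings; neg1 and neg2 belong to two
    classes that differ from the anchor's class and from each other.
    """
    d_ij = F.pairwise_distance(anchor, positive)   # anchor-positive distance
    d_ik = F.pairwise_distance(anchor, neg1)       # anchor-negative distance (strong push)
    d_lk = F.pairwise_distance(neg2, neg1)         # negative-negative distance (weak push)
    strong = F.relu(alpha1 + d_ij - d_ik)
    weak = F.relu(alpha2 + d_ij - d_lk)
    return (strong + weak).mean()
```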
a. Person
Contrastive Loss The first Siamese CNN (S-CNN) architecture for human re-identification was proposed by Yi et al. [170]. The proposed method can jointly learn the color feature, texture feature and metric in a unified framework. Other works using the S-CNN include [75, 129]. Zheng et al. [188] present the first attention-driven Siamese learning architecture called the Consistent Attentive Siamese Network. This mechanism jointly models consistent attention across similar images.
Triplet Loss In the field of person re-identification, triplets have been used extensively. Wang et al. [136] employ the triplet loss function to characterize the similarity between fine-grained images. Cheng et al. [23] propose a model that jointly learns the global full-body and local body-part features of the input persons. Wu et al. [151] propose a model that employs the triplet loss, identification loss and center loss to simultaneously train a carefully designed network. Hermans et al. [49] modify the classic way of using the triplet loss by proposing a model that uses the triplet loss with batch-hard negative and positive mining to map images into a space where images with the same identity are closer than those of different identities, in order to find harder triplets and improve the efficacy of training. In general, the triplet loss is not used alone; it is often combined with the cross-entropy or softmax loss. RPTM [36] uses this combination and tests it on the person re-ID dataset DukeMTMC as well as on vehicle re-ID datasets. Auto-ReID+ [41] is another work training on both losses. Sometimes, the center loss [149] is added to the softmax and triplet loss in one branch. Ms-Mb [56] combines the three losses, which increases the inter-class variance and decreases the intra-class variance.
Quadruplet Loss Chen et al. [22] design a quadruplet loss, leading to the model output with a larger inter-class variation and a smaller intra-class variation compared to the triplet loss.
b. Vehicle
Contrastive Loss. As more viewpoint variance exists in the field of vehicle re-identification, these methods tend to be more creative: Liu et al. [86] combine hand-crafted features and high-level attributes learned by a CNN with license-plate recognition (using a Siamese neural network) and spatio-temporal information. Other works using a Siamese neural network include [27, 117].
Triplet Loss. The triplet loss has also been used in the field of vehicle re-identification. Features such as paint, stickers, scratch marks, the position of the annual inspection sticker on the front windshield, decoration, and tissue boxes are used to distinguish the characteristics of two cars. Zhang et al. [181] combine the triplet loss with a classification loss in order to learn the representations of images from the sampled triplets. Katsaros et al. [58] propose a novel, triplet-learnt coarse-to-fine reranking scheme (C2F-TriRe) to address vehicle re-identification. They rely on the assumption that if a vehicle is captured twice from the same view, the windshields should look similar. For the AI City Challenge 2019 [100] and AI City Challenge 2020 [101], several researchers have employed the triplet loss in their frameworks [53, 55, 197]. In the field of unmanned aerial vehicles (UAVs), Yao et al. [164] introduce the weighted triplet loss (WTL) to train their model. This new loss deals with the large similarities between classes and focuses on the negative pairs to facilitate the re-identification capabilities of their network. Combining the softmax and triplet loss is a common approach, as done by works such as TransReID [48], V2ReID [108], RPTM [36] and GiT [115].
Quadruplet Loss Inspired by the quadruplet loss, Hou et al. [51] introduce a new method called Deep Quadruplet Appearance Learning (DQAL), where a quadruplet consisting of anchor, positive, highly similar, and negative samples is specially designed and formed as the input.
Strengths and Limitations Methods based on metric learning produce overall accurate results. While the contrastive loss minimizes the distance between images of the same class and maximizes the distance between pairs from different classes, it does not define the relation between similar and dissimilar pairs. On the other hand, the triplet loss, which is one of the most popular losses, requires a large batch of images (triplets of samples) to achieve promising performance. Finally, the quadruplet loss equips a quadruplet deep network with quadruplet inputs, making it limited to specific network structures.
Attention Mechanism
When perceiving a scene, humans focus on multiple fixation points of different locations and scales. Inspired by the human vision system, researchers have proposed deep neural network architectures that imitate our attention mechanism.
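As a minimal illustration of the idea, the sketch below re-weights a backbone feature map with a learned spatial attention mask. It is only a toy example: published re-ID attention modules (hard attention, part masks, multi-scale attention) are considerably more elaborate.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predict a soft attention map over spatial positions and use it to re-weight
    the feature map, so salient regions contribute more to the final embedding."""

    def __init__(self, in_channels: int = 2048):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)     # one attention score per location

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W)
        attn = torch.sigmoid(self.score(feat_map))                 # (B, 1, H, W) soft mask in [0, 1]
        return feat_map * attn                                      # attended feature map
```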
a. Person A trend in the task of person re-identification is to design attention modules that can extract information such as the shape of people or the colour of clothing [162]. The most popular attention mechanisms for person re-ID are part-based systems [165], i.e. the image is split into several parts: head, torso, legs and feet. A harmonious attention CNN (HA-CNN) model is introduced to combine both soft and hard attention mechanisms [76]. Others propose a multi-directional attention model, which can extract attentive features by masking different levels of features with attention maps [132]. Cai et al. [12] propose a multi-scale body-part mask guided attention network (MMGA), which jointly learns whole-body and part-body attention to help extract global and local features simultaneously. Background information can easily be treated as a saliency feature by re-ID models, which can influence their performance. The joint weak saliency and attention-aware (JWSAA) method is proposed by [103] to tackle background interference and to extract better and more diverse features of a person. The adaptive weak saliency mechanism can change the importance of main features, such that the model attends to other valuable features. Another branch in person re-ID is infrared-visible person re-ID (IV-ReID), the task of matching pedestrian images captured by visible and thermal cameras. In order to distinguish people easily, features such as bags and shoes should be considered. DBA-Net [32] uses a dual-branch network that is based on a multi-scale attention mechanism. The proposed network captures apparent global features, but also key fine-grained information. In the context of occluded person re-ID, BPBreID [119] learns body part representations using an attention mechanism that is trained with dual supervision from identity and human parsing labels.
b. Vehicle In the field of vehicle re-identification, the regions on attention maps correspond to subtle and discriminative image regions, e.g. windshield stickers. Wei et al. [147] follow the coarse-to-fine hierarchical process humans use to identify objects: from the car type down to details such as customized paintings or windshield stickers. Labor-intensive label annotations are required to capture local discriminative features, as in [141]. Li et al. [73] propose a self-supervised learning method that encodes local geometric features using an interpretable attention module. Chen et al. [21] propose a dedicated Semantics-guided Part Attention Network (SPAN) to robustly predict part attention masks for different views of vehicles given only image-level semantic labels during training. Other methods are based on hard attention [61] or on soft attention [126]. In order to enhance the spatial awareness of the re-ID model, DSN (Dual Self-attention Network) [195] uses a dual self-attention mechanism built on static and dynamic (or cross-region) attention. While the static attention captures long-range dependencies globally, the dynamic attention focuses on capturing local, position-related dependencies.
Strengths and Limitations The attention mechanism imitates the human process of recognizing and focusing (i.e., attending) on specific objects. Attention mechanisms can extract distinctive region features, which enhances the capabilities of object re-identification. However, it is worth noting that most attention-based models prioritize attention on larger regions and give less attention to finer pixel-level details. As a result, when datasets have fewer labels or backgrounds are more complex, the attention mechanism becomes less effective.
Unsupervised Learning
Most existing re-identification methods rely on supervised model learning with large-scale labelled datasets and deep layered models. However, the annotation difficulty leads to poor scalability in practical re-ID deployments, and supervised models still suffer from generalization issues in the presence of a domain shift between the training and test data distributions [113]; hence researchers have started investigating unsupervised solutions for the re-identification task.
a. Person Fan et al. [33] propose a progressive unsupervised learning (PUL) method to transfer pre-trained deep representations to unseen domains. In their model, they indicate that the loss function for classification can be replaced with the contrastive or the triplet loss. Li et al. [72] formulate an Unsupervised Tracklet Association Learning (UTAL) framework capable of incrementally discovering and exploiting the underlying discriminative re-ID information from automatically generated person tracklet data end-to-end. They adopt the softmax cross-entropy (CE) loss function to optimise the classification task.
Because GANs have achieved great success in many tasks, such as image generation [38] and translation [30], recent re-ID methods also explore GANs in both the vehicle and person re-ID fields [90, 192]. Zheng et al. [192] make the first attempt to apply the GAN technique to person re-ID, improving supervised feature representation learning with generated person images. UDA approaches typically include three stages: a supervised pre-training stage in the source domain, followed by clustering-based pseudo-label prediction for the target-domain data, and then an unsupervised fine-tuning stage in the target domain with the pseudo-labels. Following this structure, MDJL [19] extracts semantic features using data augmentation and feature decoupling. Different clustering methods are leveraged to obtain multiple groups of clustering results, which are further distilled by calculating the correlation between the cluster groups. However, the distribution of samples within clusters is not fully considered in the process of data distilling. Often, unsupervised methods need external annotation support or partial labeling of training images, which makes them not purely unsupervised. In order to render unsupervised methods more suitable for real-world surveillance settings, Prasad et al. [106] propose a framework named Spatio-Temporal Association Rule based Deep Annotation-free Clustering (STAR-DAC), which clusters the unlabeled person re-identification images based on visual features and then performs cluster fine-tuning using spatio-temporal association rules. While clustering to generate pseudo-labels is a common approach, it can mix different true identities together or split the same identity into several clusters due to missing information in the embedding space. These noisy clusters affect the re-ID accuracy. Zhang et al. [179] propose an Implicit Sample Extension (ISE) method to help find more accurate clusters. ISE uses a progressive linear interpolation (PLI) strategy to guide the generation of support samples that improve the context representation of each cluster. However, the lack of hard negative samples is ignored.
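The clustering-based pseudo-label stage shared by many of these UDA pipelines can be sketched as follows. DBSCAN is used here only as an illustrative clustering choice, and the hyper-parameters are placeholders rather than values reported by any of the cited works.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def assign_pseudo_labels(target_feats: np.ndarray,
                         eps: float = 0.6, min_samples: int = 4) -> np.ndarray:
    """Cluster unlabeled target-domain embeddings and treat cluster indices as
    pseudo identity labels for the subsequent fine-tuning stage.

    target_feats: (N, D) embeddings extracted by the source-pretrained model.
    Returns an array of pseudo labels; -1 marks noisy samples that DBSCAN could
    not cluster, which are usually excluded from fine-tuning.
    """
    clusterer = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    return clusterer.fit_predict(target_feats)
```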
b. Vehicle In the field of vehicle re-ID, Bashir et al. [6] present an approach that formulates the whole vehicle re-ID problem in an unsupervised learning paradigm, combining a CNN architecture for feature extraction with an unsupervised technique that enables self-paced progressive learning. Unsupervised k-means clustering is adopted to infer the IDs in a semi-unsupervised manner. Zhou et al. [193] designed a GAN model called the Cross-View Generative Adversarial Network (XVGAN), which learns the features of vehicle images captured by cameras with disjoint views and takes these features as conditional variables to effectively infer cross-view images, thereby improving cross-view re-ID. Wu et al. [153] used a GAN to synthesize vehicle images with diverse orientation and appearance variations to obtain more vehicle images and augment the training set. Lou et al. [90] proposed to generate desired vehicle images from the same view and across views to facilitate re-ID model training. Wu et al. [154] adopt a GAN to generate unlabeled samples and enlarge the training set. PLM [140] uses a domain adaptation module and a multi-scale attention network. The domain adaptation module employs CycleGAN and generates "pseudo target samples"; using CycleGAN decreases the domain bias between the source and target datasets. The generated samples and unlabeled samples are used to train the multi-scale attention network for feature learning, which is trained with the weighted label smoothing (WLS) loss. Further unsupervised methods include memory bank-based methods. The Triplet Contrastive Learning (TCL) framework [114] establishes a connection between the local and global features using a proxy of a cluster memory bank. TCL has a module that extracts part and global features, a module that generates cluster pseudo-labels, and three memory banks that store the updated features of the dataset. The model is trained with the proxy contrastive loss (PCL).
Strengths and Limitations Unsupervised learning uses unlabeled data to improve generalization ability. GAN-based methods are commonly employed. While GANs facilitate image-to-image translation to overcome inconsistencies across different data domains, they require balancing two models during training, which can lead to unstable training.
Pose Guided Person Re-ID
One of the main challenges in building a robust person re-ID model is the varying human poses. In addition, commonly used benchmarks such as Market-1501 or CUHK03 contain only a limited amount of pose variation, which increases the likelihood that models overfit to certain poses. One option is to enhance existing datasets with real images of different poses and use any of the above-mentioned methods based on local features, distance metric learning or attention mechanisms. Another option is to explore unsupervised learning and use image generation methods directly to generate new poses. However, given the complexity of the human shape, image generators are more likely to create distorted human samples.
To tackle these issues, [82] propose a pose-transferrable framework. Unlike Market-1501 or CUHK03 (i.e., target sets), MARS dataset [185] (i.e, source set) contains various pose variations. The method named Pose-Transfer extracts the skeletons from the source set and transfers them to the target using a skeleton-to-image generation algorithm. This creates the generation set, which is combined with the target set to enhance the Re-ID model. Following a similar approach to extend non-rich pose datasets, [178] introduce the Pose vAriation Aware dAta Augmentation (PA4) method that is built using a pose transfer generative adversarial network (PTGAN) that translates poses to images to generate pose-rich samples for re-ID training.
Transformer
Deep learning models are introduced at an increasing rate, and it is often hard to keep track. One particular family of neural network models that has demonstrated exemplary performance in Natural Language Processing (NLP) is the Transformer. "Attention Is All You Need" [130] introduces the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer is an architecture for transforming one sequence into another with the help of two parts (encoder and decoder), but it differs from previously existing sequence-to-sequence models because it does not employ any recurrent networks (GRU [25], LSTM, etc.).
The breakthroughs of Transformer networks in the NLP domain have sparked great interest in the computer vision community in adapting these models to vision and multi-modal learning tasks. Han et al. [44] and Khan et al. [59] provide surveys on transformers in computer vision. Transformer models have been successfully used for image recognition [31, 128], object detection [14], segmentation [166], video understanding [120], and re-identification [48].
In image recognition, pure transformers, such as Vision Transformer (ViT) [31] and Data-efficient image Transformers (DeiT) [128], have shown that they can be as effective as CNN-based methods for feature extraction. ViT, however, requires a large-scale dataset to pretrain the model, hence DeiT was proposed to overcome this shortcoming by introducing a teacher-student strategy to speed up the training for ViT.
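To ground the idea, the sketch below shows the patch-embedding step that ViT-style backbones use in place of convolutional feature extraction. The input resolution, patch size and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Cut an image into fixed-size patches and linearly project each patch into a
    token; the resulting sequence (plus a class token) is fed to the transformer."""

    def __init__(self, img_size=(256, 128), patch=16, in_ch=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)   # non-overlapping patches
        num_patches = (img_size[0] // patch) * (img_size[1] // patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))                      # class token, used as the re-ID feature
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, dim))        # learnable position embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 3, H, W) pedestrian or vehicle crop
        tokens = self.proj(x).flatten(2).transpose(1, 2)    # (B, num_patches, dim)
        cls = self.cls.expand(x.size(0), -1, -1)            # prepend the class token
        return torch.cat([cls, tokens], dim=1) + self.pos   # (B, num_patches + 1, dim)
```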
He et al. [48] are the first to introduce pure transformers in object re-ID. Their motivation comes from the advantages of pure transformer-based models over CNN-based re-ID models for the following reasons:
- They are based on multi-head attention modules, which capture long-range dependencies and push the models to attend to more discriminative parts compared to CNN-based methods;
- They do not use convolution and down-sampling operators, hence transformers are able to preserve detailed and discriminative information.
The proposed models achieve state-of-the-art performance on object re-ID, including person (e.g. Market1501 [184], DukeMTMC [111]) and vehicle (e.g. VeRi-776 [85], VehicleID [81]) re-ID. Furthermore, the authors design two modules to enhance robust feature learning: a Jigsaw Patch Module (JPM) and a Side Information Embedding (SIE). In re-identification, an object might be partly occluded, leaving only a fragment visible, while the transformer uses the information from the entire image. Hence the authors propose the JPM to address this issue. The JPM shuffles the patch embeddings and regroups them into different parts, which helps improve the robustness of the re-ID model. Additionally, the SIE is proposed to incorporate non-visual information, e.g. camera IDs or viewpoints, to tackle issues due to scene bias.
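A simplified reading of the side information idea is sketched below: a learnable embedding indexed by the camera ID is added to the token sequence so the model can compensate for camera-specific bias. This is only an approximation; the actual SIE formulation in TransReID differs in its details.

```python
import torch
import torch.nn as nn

class SideInformationEmbedding(nn.Module):
    """Add a learnable camera embedding to every token of an image so the
    transformer can factor out camera-specific appearance bias."""

    def __init__(self, num_cameras: int, dim: int = 768, weight: float = 1.0):
        super().__init__()
        self.cam_embed = nn.Parameter(torch.zeros(num_cameras, 1, dim))
        self.weight = weight                                       # strength of the side information

    def forward(self, tokens: torch.Tensor, cam_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (B, L, dim) patch tokens; cam_ids: (B,) camera index per image
        return tokens + self.weight * self.cam_embed[cam_ids]      # broadcast over the L tokens
```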
With the introduction of the Transformer model in re-ID, an increasing number of researchers have started employing Transformer-based models.
a. Person DCAL [194] employs a global–local cross-attention to improve the interactions between global images and local regions. The Global–Local Cross-Attention (GLCA) module encodes location information and image pairs for Pair-Wise Cross-Attention (PWCA). By mining discriminative local information, GLCA can facilitate the learning of subtle features, which aids in recognizing fine-grained objects. This emphasizes the interaction between global images and local high-response regions. DCAL uses ViT as backbone. However, conventional Transformer networks have one major drawback, which is that the modelling of relations between representations can be computationally expensive.
Generally, person re-ID models learn from a single image instead of gathering information from multiple images of the same person. Potential interactions between images are ignored. Neighbor Transformer Network (NFormer) [135] models representations between persons to yield more robust and discriminative representations. NFormer is built by stacking four Landmark Agent Attention (LAA) modules where each module is adopted from the Transformer architecture and maps the relation between images. A Reciprocal Neighbor Softmax (RNS) is added to achieve sparse attention to neighbours. This helps to reduce the interference of irrelevant representations and additionally makes the computational workload lighter. A limitation of NFormer according to the authors is that it requires a large enough number of images of the same identity.
Following the same idea, X-ReID [116] employs cross-attention among multiple images of the same identity to obtain more unified and discriminative pedestrian information. Rather than using Instance-Level features from single images, the X-ReID method exploits Identity-Level features that are shared across different images for each identity. The proposed model contains a Cross Intra-Identity Instances module and a Cross Inter-Identity Instances module and uses ViT-Base pretrained on ImageNet as backbone.
b. Vehicle The complexity of background information in images can make it difficult to match identities in vehicle re-ID. To address this issue, MART [94] unifies three modules to extract background-unrelated global features and perspective-invariant local features, using ViT as backbone. The modules are the Foreground Global Features Extraction (FGFE) module, the Mask-guided Local Features Extraction (MLFE) module, and the Cross-images Local Features Reasoning (CLFR) module.
Qian et al. [107] propose the Unstructured Feature Decoupling Network (UFDN) as a solution to address feature misalignment by breaking down vehicle features into unstructured components and aligning them. All this is implemented without requiring any additional annotation. UFDN leverages a Transformer-based feature decomposing head (TDH) to segment the feature map into multiple grid stripes. A cluster-based decoupling constraint (CDC) is added to facilitate the alignment of the decomposed features across different images.
The Transformer focuses primarily on global information within images and can potentially overlook other discriminative attributes of vehicles. To address this, the Vehicle Attribute Transformer (VAT) [172] embeds easily distinguishable attributes such as color and car model, as well as viewpoint, as additional attributes. To optimize VAT, a newly designed loss function, the multi-sample dispersion triplet (MDT) loss, is used to train the ViT backbone. V2ReID [108] uses the Vision Outlooker (VOLO) [173] as backbone to extract features from vehicle images. VOLO is an outlook-attention-based network that attends to neighbouring elements to focus on finer-level features. Its strong performance suggests that locality is indispensable for Transformer-based methods.
Methods for Unified Vehicle and Person Re-ID
In the real world, people and vehicles are often recorded together, e.g. a person getting into a car, hence person and vehicle re-ID often need to be used in conjunction. One idea would be to integrate a person and a vehicle re-ID architecture. Wei et al. [145] present a vehicle and person re-identification system called VP-ReID, which identifies query vehicles and persons efficiently and accurately from a large gallery set. The system is built with two different models developed by the same authors: a person re-ID model, GLAD [144], and a vehicle re-ID model, RAM [87]. Its accuracy, however, depends on the sub-systems, and it needs an additional component to classify between people and vehicles, which in turn introduces inaccuracies. Training person and vehicle re-ID in a unified manner can avoid this.
The only work that truly unifies person and vehicle re-identification into one framework has been published by Organisciak et al. [105]. Owing to the success of the triplet loss across both tasks, they choose it as the basis of their implementation of a unified framework for simultaneous person and vehicle re-ID. They exploit the generalisation ability of metric learning and design a network called MidTriNet, which uses the information generated by mid-level layers to develop better representations for the re-ID tasks. In fact, some works argue that mid-level layers are of similar importance to higher-level layers for constructing effective feature embeddings for re-ID [171]. Organisciak et al. elaborate that mid-level information, including colours and textures, which is robust to viewpoint changes, is highly informative in determining whether an individual in the gallery set is the same as that in the query image. The highly abstract features in the final layers are therefore not necessarily optimal for comparison, especially within a triplet loss framework, which attempts to differentiate between identities by comparing the feature representations of each image directly.
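A minimal sketch of this idea, combining pooled mid-level and final-level features of a ResNet-50 under a standard triplet loss, is given below; the backbone, the choice of layers and the margin are illustrative assumptions rather than the exact MidTriNet design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MidLevelEmbedder(nn.Module):
    """Concatenates pooled mid-level and high-level ResNet-50 features (MidTriNet-style sketch)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1, backbone.layer2)
        self.mid = backbone.layer3          # mid-level cues: colours, textures
        self.high = backbone.layer4         # high-level, more abstract identity cues
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.stem(x)
        mid_map = self.mid(x)
        m = self.pool(mid_map).flatten(1)               # mid-level embedding
        h = self.pool(self.high(mid_map)).flatten(1)    # high-level embedding
        return nn.functional.normalize(torch.cat([m, h], dim=1), dim=1)

# A single triplet loss over the concatenated embedding, shared by person and vehicle batches.
triplet = nn.TripletMarginLoss(margin=0.3)
model = MidLevelEmbedder()
anchor, positive, negative = (torch.randn(4, 3, 224, 224) for _ in range(3))
loss = triplet(model(anchor), model(positive), model(negative))
```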
There is a clear lack of research on simultaneously re-identifying persons and vehicles, which is a much more challenging task. Recognizing a person is very different from recognizing a car. From different perspectives and angles, a person looks fairly alike: the shape and colour information follow similar patterns. A vehicle, on the other hand, does not satisfy these conditions; colours can be distorted under different lighting due to the reflectiveness of the car body, and shapes can look very different depending on the angle from which the vehicle is viewed. From a long-term perspective, a person is more likely to undergo significant changes, e.g. putting on a coat or a hat, whereas a vehicle's changes are predictable. Hence, unifying person and vehicle re-ID allows these underlying principles of re-ID to be studied [105].
For future research, Organisciak et al. [105] suggest exploring transfer learning (a model is trained on one dataset and tested on another). In fact, they hypothesise that training state-of-the-art methods on their unified dataset and testing on a dataset that is not a component of the unified one, such as DukeMTMC-reID, will yield a model that is more robust, less likely to overfit, and thus performs better.
Methods for Open-Set Re-ID
Open-set re-ID methods focus on verifying whether a probe object is in the gallery or not, so the task is closer to a verification task [133, 169]; see Eq. 2. In the following paragraphs we summarize existing works on open-set re-ID; the results are given in Table 9.
a. Person Zheng et al. [189] conducted the first open-set re-ID work based on a transfer ranking framework for set-based verification. They introduce two set-based person verification problems: one-shot verification and multi-shot verification. In one-shot verification, the aim is to verify whether a query image is associated with a target image. In multi-shot verification, the goal is to verify whether a query is within the watch list, i.e. a joint one-shot verification is performed over all target people in the watch list. The authors design a watch list (gallery) of several known identities and a number of probes including target and non-target ones. The goal is to achieve a high true target recognition (TTR) rate and a low false target recognition (FTR) rate. They report results on the ETHZ and iLIDS datasets, which include 119 and 146 identities, respectively.
Cancela et al. [13] propose a novel framework based on online Conditional Random Field (CRF) inference, using appearance, temporal and spatial information. The authors propose a complete multi-camera tracking approach with both inter-camera and intra-camera tracking. For person re-ID, based on the first and last detection of each identity from each camera, the authors form a matrix where each entry is the pairwise similarity score of the first and last appearance of two pedestrians. The results are reported on SAIVT-Softbio dataset [10].
Liao et al. [79] formulate the re-identification task as two sub-tasks, detection and identification, where the detection part decides whether a probe identity is in the gallery and an ID is assigned if the probe is accepted. The authors also contribute a database, OPeRID v1.0, collected from a setting of 6 cameras, with 200 persons and 7,413 segmented images. Besides, two evaluation metrics, the detection and identification rate (DIR) and the false accept rate (FAR), are proposed, from which a receiver operating characteristic (ROC) curve can be drawn.
Chan et al. [16] propose a metric learning method in which they modify a contrastive loss such that the distance between features of the same person lies below a threshold and that of distinct people lies above that same threshold. The training aims at learning a transformation on embeddings that improves the open-set performance. The authors report results on the iLIDS-VID [137] and PRID2011 [50] datasets, discarding some gallery identities, which effectively reduces the gallery sets to 100 and 60 identities, respectively.
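A minimal sketch of such a thresholded contrastive loss is shown below, assuming pairs of embeddings and binary same/different labels; the squared-hinge form and the threshold value are our own assumptions, not the authors' exact formulation.

```python
import torch

def thresholded_contrastive_loss(f1, f2, same, tau: float = 1.0):
    """Push same-identity distances below tau and different-identity distances above it.

    f1, f2: (batch, dim) embeddings of an image pair; same: (batch,) 1.0 if same person else 0.0.
    """
    d = torch.norm(f1 - f2, dim=1)
    pos = same * torch.clamp(d - tau, min=0) ** 2          # same person but distance above tau
    neg = (1 - same) * torch.clamp(tau - d, min=0) ** 2    # different people but distance below tau
    return (pos + neg).mean()

# At test time the same tau can act as the accept/reject threshold:
# a probe with no gallery match closer than tau is rejected as a non-target identity.
f1, f2 = torch.randn(8, 128), torch.randn(8, 128)
labels = torch.randint(0, 2, (8,)).float()
print(thresholded_contrastive_loss(f1, f2, labels))
```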
Zheng et al. [190] consider a group-based person verification problem, i.e. they assume that only one image is available for each person on a small watch list. A transfer local relative distance comparison (t-LRDC) model is proposed to learn and transfer useful information from a labelled non-target dataset captured in the same environment, in order to achieve more robust group-based verification under an open-set scenario. The authors report on the iLIDS, ETHZ, CAVIAR4REID [24] and VIPeR [40] datasets.
Zhu et al. [196] introduce a person re-ID setting called Large Scale Open-World (LSOW) re-ID, characterized by a vast probe search population with a large number of imposters. A hashing approach called Cross-view Identity Correlation and vErification (X-ICE) hashing is adopted to learn cross-view identity representation binarisation and discrimination in a joint manner.
Wang et al. [133] present a new and more realistic person re-ID setting called OneShot-OpenSet-ReID (\(\hbox{OS}^2\)-ReID), which differs from existing closed-set re-ID settings and supervised open-set re-ID settings. To solve the problem, they propose an unsupervised subspace learning model named Regularised Kernel Subspace Learning (RKSL). The model can learn cross-view identity discriminative information from unlabelled data and is flexible enough to accommodate pairwise labels if available. They report on VIPeR [40] and CUHK01 [74].
Using deep learning, Li et al. [77] employ an Adversarial PersonNet (APN), which jointly learns a generator, a person discriminator, a target discriminator and a feature extractor. The main idea of this GAN is to generate very target-like images (imposters), which forces the feature extractor to be robust to the generated image attack.
Other methods address closed-set person re-ID and additionally report results in the open-set setting. Chen et al. [20] remove most identities from the gallery of originally closed-set datasets such as VIPeR and CUHK01. Similarly, Ma et al. [96] report results for the open-set person re-ID case by modifying the iLIDS-VID and PRID2011 datasets. Vidanapathirana et al. [131] propose a technique called Closed set to Open set Person Re-Id Framework (C2OPR) that can extend a closed-set person re-identification system to support the identification of new persons. Most recently, Alkanat et al. [2] address open-set person re-ID by conducting tests on modified versions of popular closed-set datasets, converting two of the most commonly used person re-ID datasets, Market1501 and DukeMTMC-reID, into open-set datasets.
In the context of retail, person re-ID can be used to understand how customers schedule their shopping. Martini et al. [97] present a first attempt, TL-DCNN, to solve Top-View Open-World (TVOW) person re-ID. Their model is trained with the triplet loss on a self-collected dataset named TVPR2 (Top-View Person Re-identification 2).
b. Vehicle To the best of our knowledge, no works on vehicle re-ID in an open-set scenario have been published so far.
Datasets and Evaluation
This section summarizes some commonly used datasets and evaluation metrics for person and vehicle re-ID under open and closed-set.
Datasets for Closed-world Re-ID
Several commonly-used datasets in the top venues of computer vision in recent years are VIPeR [40], GRID [92], PRID2011 [50], CAVIAR4ReID [24], CUHK01 [74], CUHK03 [75], Market-1501 [184], MARS [185], iLIDS-VID [137], and DukeMTMC-reID [111].
A summary of the person re-ID datasets has been provided by Gou et al. [39] and on their website (see Notes), and a summary of vehicle re-ID datasets can also be found online (see Notes). A summary of the most used datasets in person and vehicle re-identification is provided in Table 4.
The following observations can be made: (1) the number of public datasets for vehicle re-identification has increased in the past few years; (2) the dataset scale (both #image and #ID) has grown rapidly; deep learning approaches generally benefit from more training samples, but this also increases the annotation effort needed in closed-world re-ID; (3) the number of cameras is also increasing to approximate the large-scale camera networks of practical scenarios, which introduces additional challenges for model generalizability in a dynamically updated network; (4) bounding boxes are usually generated by automatic detection/tracking rather than manual cropping, which simulates the real-world scenario with tracking/detection errors [169].
Datasets for Unified Re-ID
Currently, there exists only one publicly available data set for re-identification [105] that includes both person and vehicle classes. Given that re-ID frameworks are mostly applicable to data consisting of pedestrians and vehicles, it is crucial for re-ID systems to simultaneously handle both types of data in order to be applicable to real-world scenarios.
Datasets for Open-world Re-ID
To our knowledge, the SAIVT-Softbio dataset [10] is the only dataset that simultaneously meets all the requirements for a full open-world task: multi-shot data and multiple cameras with camera-transition uncertainty. It includes 152 people recorded using 8 camera views.
One strategy to construct an open-set re-ID dataset is to reuse closed-set datasets. Images are randomly divided into two parts, one for training and one for testing, with the condition that only some identities appear in both the probe and the gallery [133]. The authors did the following: the VIPeR [40] and CUHK01 [74] datasets were used for the evaluation. The VIPeR dataset contains a total of 632 people with one image per person per view, whilst the CUHK01 dataset has 971 people with two images per person per view.
For both datasets, they created a gallery set G by randomly selecting 120 people from one camera view, and the probe set P by selecting half of the whole population (316 on VIPeR and 486 on CUHK01) from the other view, with the condition that 100 people exist in both G and P. For the gallery set G, only one image per person (one-shot) is included.
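The following sketch illustrates how such a gallery/probe split could be constructed from a closed-set dataset, assuming a dictionary per camera view mapping each identity to its images; the sampling sizes follow the description above, while the data structures are hypothetical.

```python
import random

def build_open_set_split(ids_view_a, ids_view_b, gallery_size=120, probe_size=316, overlap=100):
    """Build an open-set gallery/probe split from a closed-set dataset (sketch).

    ids_view_a / ids_view_b: {identity: [image paths]} for the two camera views.
    Only `overlap` identities appear in both gallery and probe; the remaining probes
    are imposters with no correct match in the gallery.
    """
    ids = list(ids_view_a.keys())
    random.shuffle(ids)
    gallery_ids = ids[:gallery_size]
    shared = random.sample(gallery_ids, overlap)                        # identities in both sets
    imposters = [i for i in ids if i not in gallery_ids][:probe_size - overlap]
    gallery = {i: [random.choice(ids_view_a[i])] for i in gallery_ids}  # one-shot gallery
    probe = {i: ids_view_b[i] for i in shared + imposters}
    return gallery, probe
```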
All the above-mentioned methods result in galleries that are static and do not adapt to new knowledge from the scenes. Recently, Casao et al. [15] proposed a self-adaptive gallery construction method for open-world person re-ID. Their approach automatically identifies new identities and incrementally adds the new unsupervised data to the gallery.
Evaluation
The performance of re-ID models is evaluated based on their ability to find the correct matches in the test set for each of the query objects [100]. Depending on the re-identification setting, different evaluation metrics are used.
Closed-world Re-ID
The most common metrics used in the literature to compare models are the CMC curve, the mean average precision (mAP), and the Rank-1, Rank-5, and Rank-10 accuracies.
CMC Curve. The Cumulative Matching Characteristics (CMC) curve is the most widely used evaluation methodology for re-ID [9, 67, 134]. It measures the system's ranking ability from rank 1 to rank m and treats re-identification as a ranking task, in which the gallery's candidate images are ranked according to their similarity to the query image or set of images. In other words, the CMC depicts the ranking score of the best match [67].
mAP. The Average Precision (AP) metric assesses the outcome of the model for a single query, while the mean Average Precision (mAP) assesses how well the model performs across all query images; mAP is the average of the APs. AP and mAP can be calculated as follows [6]:

\[ AP = \frac{1}{N_g}\sum_{k=1}^{n} p(k)\,g(k), \]

where \(n\) is the number of test images, \(N_g\) is the number of ground-truth images, \(p(k)\) is the precision at the \(k\)-th position, and \(g(k)\) is the indicator function whose value is 1 if a match is found at the \(k\)-th position and 0 otherwise. The mean average precision is then calculated as

\[ mAP = \frac{1}{Q}\sum_{q=1}^{Q} AP(q), \]

where \(Q\) is the total number of queried images.
In the AI City Challenge 2019 [100], the authors use the rank-K mAP measure. The latter computes the mean of the average precision, i.e. the area under the Precision-Recall curve, over all the queries when considering only the top-K results for each query.
Rank. The similarity of a test sample to its own class can be measured by the rank. The rank-m value represents the probability that a correct result appears among the m highest-confidence results of the search, so a higher rank-m accuracy indicates better performance of the model. For instance, if a car labeled c1 is searched through 100 samples and the result is c1, c2, c3, c4, c5..., the rank-1 accuracy is 100% because c1 ranks first in the result sequence; similarly, the rank-2 and rank-5 accuracies are 100% because c1 is among the first two (and five) positions of the result sequence. If the identification result is instead c2, c1, c3, c4, c5..., the rank-1 accuracy is 0%, while rank-2 and rank-5 both have 100% accuracy. When there are multiple vehicles to be queried, rank-m is averaged over the queries [134].
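To make these definitions concrete, the following is a minimal NumPy sketch that computes rank-k (CMC) accuracies and mAP from a query-gallery distance matrix, directly following the formulas above; the array names and the single-query-loop simplifications are our own assumptions, not code from any of the cited works.

```python
import numpy as np

def evaluate(dist, query_ids, gallery_ids, ks=(1, 5, 10)):
    """dist: (num_queries, num_gallery) distances, smaller means more similar.
    query_ids / gallery_ids: NumPy arrays of identity labels."""
    aps, hits = [], {k: 0 for k in ks}
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])                       # gallery ranked by similarity
        matches = gallery_ids[order] == query_ids[q]      # g(k): 1 if a match sits at position k
        for k in ks:                                      # rank-k: a correct match in the top k
            hits[k] += matches[:k].any()
        if matches.any():                                 # AP = (1/N_g) * sum_k p(k) g(k)
            precision = np.cumsum(matches) / (np.arange(len(matches)) + 1)
            aps.append((precision * matches).sum() / matches.sum())
    cmc = {f"rank-{k}": hits[k] / dist.shape[0] for k in ks}
    return cmc, float(np.mean(aps))                       # mAP = mean of AP over all queries

# Toy usage with random distances and labels.
dist = np.random.rand(5, 20)
cmc, mAP = evaluate(dist, np.arange(5), np.random.randint(0, 5, 20))
```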
Open-set Re-ID
In open-set re-ID, the CMC rate is often calculated at a fixed False Accept Rate (FAR), which accounts for the likelihood of accepting a wrong identity [79, 133].
TTR vs FTR. Open-set evaluation metrics should be independent of one-to-one identity correspondence. Two metrics have been defined by Zheng et al. [190], the true target recognition (TTR) rate and the false target recognition (FTR) rate, which measure the proportion of query target and non-target images, respectively, that are verified as target identities; a high TTR and a low FTR are desired.
Since many images of non-target people are mixed with the target ones as query images, it is necessary to quantify how well true targets are verified, how often false targets pass the verification, and the relation between the two.
They introduce the true target rate (TTR) and false target rate (FTR) as follows:

\[ TTR = \frac{N_{TTQ}}{N_{TQ}}, \qquad FTR = \frac{N_{FNTQ}}{N_{NTQ}}, \]

where \(N_{TQ}\) is the number of probe images from target people, \(N_{NTQ}\) is the number of probe images from non-target people, \(N_{TTQ}\) is the number of correct verifications in which target probe images are matched in the gallery, and \(N_{FNTQ}\) is the number of false verifications in which non-target probe images are matched as a target person.
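As a concrete illustration, the sketch below computes TTR and FTR at a given acceptance threshold, assuming each probe is summarized by its distance to the closest watch-list identity; the variable names and the simple thresholding rule are our own simplifying assumptions.

```python
import numpy as np

def ttr_ftr(min_dist, is_target, tau):
    """Compute TTR and FTR at an acceptance threshold tau (sketch).

    min_dist: (num_probes,) distance of each probe to its closest watch-list identity.
    is_target: (num_probes,) boolean, True if the probe truly belongs to a watch-list identity.
    """
    accepted = min_dist < tau                       # probes verified as target identities
    n_tq = is_target.sum()                          # N_TQ: target probes
    n_ntq = (~is_target).sum()                      # N_NTQ: non-target probes
    ttr = (accepted & is_target).sum() / n_tq       # N_TTQ / N_TQ
    ftr = (accepted & ~is_target).sum() / n_ntq     # N_FNTQ / N_NTQ
    return float(ttr), float(ftr)

# Sweeping tau traces a TTR/FTR curve; TTR is typically reported at fixed FTR values.
min_dist = np.random.rand(200)
is_target = np.random.rand(200) < 0.3
print(ttr_ftr(min_dist, is_target, tau=0.5))
```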
DIR vs FAR. Open-set experiments can also be evaluated using Detection and Identification Rate (DIR) versus False Acceptance Rate (FAR) curve [16, 79].
For this performance evaluation, three sets of images are involved. We give the explanation in terms of person re-identification. Given \(P\), the set of probe identities, and \(G\), the set of gallery identities, we define \(P\cap G\) as the common identities and \(P \setminus G\) as the probe imposter identities. The distance between the set of images of person j in the gallery and the set of images of person i in the probe set is given by \(dist(j^g, i^p)\).
Then, the DIR and FAR measures are formulated as

\[ DIR(\tau, k) = \frac{\left|\{\, i \in P \cap G : \mathrm{rank}(i) \le k \ \text{and}\ dist(i^g, i^p) < \tau \,\}\right|}{\left|P \cap G\right|}, \]

\[ FAR(\tau) = \frac{\left|\{\, i \in P \setminus G : \min_{j \in G} dist(j^g, i^p) < \tau \,\}\right|}{\left|P \setminus G\right|}, \]

where \(DIR(\tau, k)\) represents the proportion of common identities that are re-identified before rank \(k\) with a distance smaller than \(\tau\), and \(FAR(\tau)\) represents the proportion of imposter identities whose distance to their closest gallery identity is smaller than \(\tau\).
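A hedged sketch of how DIR and FAR can be computed from a probe-gallery distance matrix is given below; the function signature and label conventions are illustrative assumptions that follow the definitions above.

```python
import numpy as np

def dir_far(dist, probe_ids, gallery_ids, tau, k=1):
    """DIR(tau, k) and FAR(tau) for open-set re-ID (sketch following the definitions above).

    dist: (num_probes, num_gallery) distances; probe_ids / gallery_ids: identity labels,
    where a probe id absent from gallery_ids is an imposter.
    """
    common_hits, n_common, false_accepts, n_imposter = 0, 0, 0, 0
    for i, pid in enumerate(probe_ids):
        order = np.argsort(dist[i])
        if pid in gallery_ids:                                  # common identity (P ∩ G)
            n_common += 1
            topk = order[:k]
            hit = any(gallery_ids[j] == pid and dist[i, j] < tau for j in topk)
            common_hits += hit
        else:                                                   # imposter identity (P \ G)
            n_imposter += 1
            false_accepts += dist[i, order[0]] < tau            # closest gallery match below tau
    return common_hits / n_common, false_accepts / n_imposter
```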
Results
After introducing the existing research, datasets and evaluation metrics, we can finally summarize the accuracy of the above-mentioned research on person and/or vehicle re-ID in a closed-set environment (Tables 5, 6, 7, 8), and on person re-ID in an open-set environment (Table 9).
The abbreviations used are the following:
- LF = local features,
- DML = deep metric learning,
- CL = contrastive loss,
- TL = triplet loss,
- QL = quadruplet loss,
- AM = attention mechanism,
- UL = unsupervised learning.
Conclusion
In this review we have combined for the first time the problem of person and vehicle re-identification under closed and open settings, its challenges and the existing research.
While existing closed-set re-ID surpasses human-level accuracies on commonly used benchmarks, the research focus for re-ID is shifting to the open-world setting, as this setting is more suitable for practical applications. Re-ID under the open-world setting is, however, an extremely challenging task for several reasons: the person or vehicle in question might not be part of the existing gallery captured by the camera network, and the number of pedestrians or vehicles appearing across multiple cameras is unknown or uncertain. From a dataset availability point of view, only one dataset simultaneously meets all the requirements of a full open-world task.
For each setting, we have presented past and present research on person, vehicle, and unified person and vehicle re-ID. Existing re-identification research focuses mostly on the person re-ID problem; less attention has been paid to vehicle re-ID, and no existing research has been found on vehicle re-ID in an open setting. Research on unifying person and vehicle re-ID is also still lacking. Combining both allows the underlying principles of re-ID to be studied, avoids inaccuracies that would otherwise depend on the sub-systems (if two architectures were combined and integrated into one), and can be advantageous in terms of memory consumption and inference speed when deploying the model on edge devices. Finally, we have reviewed the Transformer as a means of solving the re-ID problem, which achieves state-of-the-art performance and should be considered in future work.
In summary, there is still a lack of research on closed-set vehicle re-ID, open-set person and vehicle re-ID, and unified person and vehicle re-ID. A promising direction is to continue exploring the Transformer model, and we encourage researchers to investigate this field in more depth.
Data availability
Not applicable.
Code availability
Not applicable.
Notes
https://github.com/NEU-Gou/awesome-re-ID-dataset.
https://github.com/knwng/awesome-vehicle-re-identification.
References
Ahmed E, Jones M, Marks TK. An improved deep learning architecture for person re-identification. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, USA; 2015. p. 3908–16. https://doi.org/10.1109/CVPR.2015.7299016.
Alkanat T, Bondarev E, et al. Improving open-set person re-identification by statistics-driven gallery refinement. In: Twelfth International Conference on Machine Vision (ICMV 2019), International Society for Optics and Photonics. 2020;11433:114330V.
Bai Y, Lou Y, Gao F, Wang S, Wu Y, Duan LY. Group-sensitive triplet embedding for vehicle reidentification. IEEE Trans Multimed. 2018;20(9):2385–99.
Bak S, Carr P. One-shot metric learning for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR) 2017.
Bak S, Carr P, Lalonde JF. Domain adaptation through synthesis for unsupervised person re-identification. In: Proceedings of the European Conference on computer vision (ECCV), 2018; p. 189–205.
Bashir RMS, Shahzad M, Fraz M. Vr-proud: Vehicle re-identification using progressive unsupervised deep architecture. Pattern Recogn. 2019;90:52–65.
Bay H, Tuytelaars T, Van Gool L. Surf: Speeded up robust features. In: European Conference on computer vision, Springer, 2006; p. 404–417.
Bazzani L, Cristani M, Perina A, Farenzena M, Murino V. Multiple-shot person re-identification by hpe signature. In: 2010 20th International Conference on Pattern Recognition, IEEE, 2010; p. 1413–1416.
Bedagkar-Gala A, Shah SK. A survey of approaches and trends in person re-identification. Image Vis Comput. 2014;32(4):270–86.
Bialkowski A, Denman S, Sridharan S, Fookes C, Lucey P. A database for person re-identification in multi-camera surveillance networks. In: 2012 International Conference on digital image computing techniques and applications (DICTA), IEEE 2012; p. 1–8.
Bromley J, Guyon I, LeCun Y, Säckinger E, Shah R. Signature verification using a “Siamese” time delay neural network. In: Proceedings of the 6th international conference on neural information processing systems (NIPS’93). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1993. p. 737–44.
Cai H, Wang Z, Cheng J. Multi-scale body-part mask guided attention for person re-identification. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2019.
Cancela B, Hospedales TM, Gong S. Open-world person re-identification by multi-label assignment inference 2014.
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: European Conference on computer vision, Springer, 2020; p. 213–229.
Casao S, Azagra P, Murillo AC, Montijano E. A self-adaptive gallery construction method for open-world person re-identification. Sensors. 2023;23(5):2662.
Chan-Lang S, Pham QC, Achard C. Closed and open-world person re-identification and verification. In: 2017 International Conference on digital image computing: techniques and applications (DICTA), IEEE, 2017; p. 1–8.
Chen D, Li H, Liu X, Shen Y, Shao J, Yuan Z, Wang X. Improving deep visual representation for person re-identification by global and local image-language association. In: Proceedings of the European Conference on computer vision (ECCV), 2018; p. 54–70.
Chen D, Li H, Xiao T, Yi S, Wang X. Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; p. 1169–1178.
Chen F, Wang N, Tang J, Yan P, Yu J. Unsupervised person re-identification via multi-domain joint learning. Pattern Recogn. 2023;138: 109369.
Chen SZ, Guo CC, Lai JH. Deep ranking for person re-identification via joint representation learning. IEEE Trans Image Process. 2016;25(5):2353–67.
Chen TS, Liu CT, Wu CW, Chien SY. Orientation-aware vehicle re-identification with semantics-guided part attention network. arXiv preprint 2020. arXiv:2008.11423.
Chen W, Chen X, Zhang J, Huang K. Beyond triplet loss: a deep quadruplet network for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 403–412.
Cheng D, Gong Y, Zhou S, Wang J, Zheng N. Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 1335–1344.
Cheng DS, Cristani M, Stoppa M, Bazzani L, Murino V. Custom pictorial structures for re-identification. In: Bmvc, Citeseer, 2011;1.
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint 2014. arXiv:1406.1078.
Chopra S, Hadsell R, LeCun Y. Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer Society Conference on computer vision and pattern recognition (CVPR’05), IEEE. 2005;1:539–46.
Cui C, Sang N, Gao C, Zou L. Vehicle re-identification by fusing multiple deep neural networks. In: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), IEEE; 2017. p. 1–6.
Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on computer vision and pattern recognition (CVPR’05), IEEE,. 2005;1:886–93.
Deng J, Khokhar MS, Aftab MU, Cai J, Kumar R, Kumar J, et al. Trends in vehicle re-identification past, present, and future: a comprehensive review. arXiv preprint 2021. arXiv:2102.09744.
Deng W, Zheng L, Ye Q, Kang G, Yang Y, Jiao J. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2018; p. 994–1003.
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint 2020. arXiv:2010.11929.
Fan D, Wang L, Cheng S, Li Y. Dual branch attention network for person re-identification. Sensors. 2021;21(17):5839.
Fan H, Zheng L, Yan C, Yang Y. Unsupervised person re-identification: Clustering and fine-tuning. ACM Trans Multimed Comput Commun Appl (TOMM). 2018;14(4):1–18.
Farenzena M, Bazzani L, Perina A, Murino V, Cristani M. Person re-identification by symmetry-driven accumulation of local features. In: 2010 IEEE Computer Society Conference on computer vision and pattern recognition, IEEE, 2010; p. 2360–2367.
Gheissari N, Sebastian TB, Hartley R. Person reidentification using spatiotemporal appearance. In: 2006 IEEE Computer Society Conference on computer vision and pattern recognition (CVPR’06), IEEE,. 2006;2:1528–35.
Ghosh A, Shanmugalingam K, Lin WY. Relation preserving triplet mining for stabilising the triplet loss in re-identification systems. In: Proceedings of the IEEE/CVF Winter Conference on applications of computer vision, 2023; p. 4840–4849.
Gong S, Xiang T. Person re-identification. In: Visual analysis of behaviour. London: Springer; 2011. p. 301–13.
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial networks. Commun ACM. 2020;63(11):139–44. https://doi.org/10.1145/3422622.
Gou M, Wu Z, Rates-Borras A, Camps O, Radke RJ, et al. A systematic evaluation and benchmark for person re-identification: features, metrics, and datasets. IEEE Trans Pattern Anal Mach Intell. 2018;41(3):523–36.
Gray D, Tao H. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: European Conference on computer vision, Springer, 2008; p. 262–275.
Gu H, Fu G, Li J, Zhu J. Auto-reid+: Searching for a multi-branch convnet for person re-identification. Neurocomputing. 2021;435:53–66.
Guo H, Zhao C, Liu Z, Wang J, Lu H. Learning coarse-to-fine structured feature embedding for vehicle re-identification. In: AAAI 2018.
Han B, Yao Q, Yu X, Niu G, Xu M, Hu W, Tsang I, Sugiyama M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. arXiv preprint 2018. arXiv:1804.06872.
Han K, Wang Y, Chen H, Chen X, Guo J, Liu Z, Tang Y, Xiao A, Xu C, Xu Y, et al. A survey on visual transformer. arXiv preprint 2020. arXiv:2012.12556.
Haque A, Alahi A, Fei-Fei L. Recurrent attention models for depth-based person identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 1229–1238.
He B, Li J, Zhao Y, Tian Y. Part-regularized near-duplicate vehicle re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2019; p. 3997–4005.
He L, Liang J, Li H, Sun Z. Deep spatial feature reconstruction for partial person re-identification: alignment-free approach. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2018; p. 7073–7082.
He S, Luo H, Wang P, Wang F, Li H, Jiang W. Transreid: transformer-based object re-identification. arXiv preprint 2021. arXiv:2102.04378.
Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification. arXiv preprint 2017. arXiv:1703.07737.
Hirzer M, Beleznai C, Roth PM, Bischof H. Person re-identification by descriptive and discriminative classification. In: Scandinavian Conference on Image analysis, Springer, 2011; p. 91–102.
Hou J, Zeng H, Zhu J, Hou J, Chen J, Ma KK. Deep quadruplet appearance learning for vehicle re-identification. IEEE Trans Veh Technol. 2019;68(9):8512–22.
Hou R, Ma B, Chang H, Gu X, Shan S, Chen X. Vrstc: Occlusion-free video person re-identification. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, 2019; p. 7183–7192.
Hsu HM, Huang TW, Wang G, Cai J, Lei Z, Hwang JN. Multi-camera tracking of vehicles based on deep features re-id and trajectory-based camera link models. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2019.
Huang T, Russell S. Object identification in a Bayesian context. In: IJCAI, Citeseer. 1997;97:1276–82.
Huang TW, Cai J, Yang H, Hsu HM, Hwang JN. Multi-view vehicle re-identification using temporal attention model and metadata re-ranking. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2019.
Jiao S, Pan Z, Hu G, Shen Q, Du L, Chen Y, Wang J. Multi-scale and multi-branch feature representation for person re-identification. Neurocomputing. 2020;414:120–30.
Karianakis N, Liu Z, Chen Y, Soatto S. Reinforced temporal attention and split-rate transfer for depth-based person re-identification. In: Proceedings of the European Conference on computer vision (ECCV), 2018; p. 715–733.
Katsaros E, Bouma H, van Rooijen A, Dusseldorp E. A triplet-learnt coarse-to-fine reranking for vehicle re-identification. In: ICPRAM, 2020; p. 518–525.
Khan S, Naseer M, Hayat M, Zamir SW, Khan FS, Shah M. Transformers in vision: A survey. arXiv preprint 2021. arXiv:2101.01169.
Khan SD, Ullah H. A survey of advances in vision-based vehicle re-identification. Comput Vis Image Underst. 2019;182:50–63.
Khorramshahi P, Kumar A, Peri N, Rambhatla SS, Chen JC, Chellappa R. A dual-path model with adaptive attention for vehicle re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, 2019; p. 6132–6141.
Krause J, Stark M, Deng J, Fei-Fei L. 3d object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on computer vision workshops, 2013; p. 554–561.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.
Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90. https://doi.org/10.1145/3065386.
Kwong K, Kavaler R, Rajagopal R, Varaiya P. Arterial travel time estimation based on vehicle re-identification using wireless magnetic sensors. Transport Res Part C Emerg Technol. 2009;17(6):586–606.
Lecun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. https://doi.org/10.1038/nature14539.
Leng Q, Ye M, Tian Q. A survey of open-world person re-identification. IEEE Trans Circuits Syst Video Technol. 2019;30(4):1092–108.
Li D, Chen X, Zhang Z, Huang K. Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 384–393.
Li J, Wang J, Tian Q, Gao W, Zhang S. Global-local temporal representations for video person re-identification. In: Proceedings of the IEEE International Conference on computer vision, 2019; p. 3958–3967.
Li J, Yu C, Shi J, Zhang C, Ke T. Vehicle re-identification method based on Swin-Transformer network. Array. 2022;16.
Li M, Zhu X, Gong S. Unsupervised person re-identification by deep learning tracklet association. In: Proceedings of the European Conference on computer vision (ECCV), 2018; p. 737–753.
Li M, Zhu X, Gong S. Unsupervised tracklet person re-identification. IEEE Trans Pattern Anal Mach Intell. 2019;42(7):1770–82
Li M, Huang X, Zhang Z. Self-supervised geometric features discovery via interpretable attention for vehicle re-identification and beyond. In: Proceedings of the IEEE/CVF International Conference on computer vision 2021; p. 194–204.
Li W, Wang X. Locally aligned feature transforms across views. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2013; p. 3594–3601.
Li W, Zhao R, Xiao T, Wang X. Deepreid: deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2014; p. 152–159.
Li W, Zhu X, Gong S. Harmonious attention network for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2018; p. 2285–2294.
Li X, Wu A, Zheng WS. Adversarial open-world person re-identification. In: Proceedings of the European Conference on computer vision (ECCV) 2018; p. 280–296.
Li YJ, Chen YC, Lin YY, Du X, Wang YCF. Recover and identify: a generative dual model for cross-resolution person re-identification. In: Proceedings of the IEEE/CVF International Conference on computer vision, 2019; p. 8090–8099.
Liao S, Mo Z, Zhu J, Hu Y, Li SZ. Open-set person re-identification. arXiv preprint 2014. arXiv:1408.0872.
Lin WH, Tong D. Vehicle re-identification with dynamic time windows for vehicle passage time estimation. IEEE Trans Intell Transp Syst. 2011;12(4):1057–63.
Liu H, Tian Y, Yang Y, Pang L, Huang T. Deep relative distance learning: Tell the difference between similar vehicles. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2016; p. 2167–2175.
Liu J, Ni B, Yan Y, Zhou P, Cheng S, Hu J. Pose transferrable person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2018; p. 4099–4108.
Liu W, Zhang Y, Tang S, Tang J, Hong R, Li J. Accurate estimation of human body orientation from rgb-d sensors. IEEE Trans Cybern. 2013;43(5):1442–52.
Liu X, Song M, Tao D, Zhou X, Chen C, Bu J. Semi-supervised coupled dictionary learning for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2014; p. 3550–3557.
Liu X, Liu W, Ma H, Fu H. Large-scale vehicle re-identification in urban surveillance videos. In: 2016 IEEE International Conference on multimedia and expo (ICME), IEEE 2016; p. 1–6.
Liu X, Liu W, Mei T, Ma H. Provid: progressive and multimodal vehicle reidentification for large-scale urban surveillance. IEEE Trans Multimed. 2017;20(3):645–58.
Liu X, Zhang S, Huang Q, Gao W. Ram: a region-aware deep model for vehicle re-identification. In: 2018 IEEE International Conference on multimedia and expo (ICME), IEEE 2018; p. 1–6.
Liu Z, Wang D, Lu H. Stepwise metric promotion for unsupervised video person re-identification. In: Proceedings of the IEEE International Conference on computer vision (ICCV) 2017.
Lou Y, Bai Y, Liu J, Wang S, Duan L. Veri-wild: a large dataset and a new method for vehicle re-identification in the wild. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2019; p. 3235–3243.
Lou Y, Bai Y, Liu J, Wang S, Duan LY. Embedding adversarial learning for vehicle re-identification. IEEE Trans Image Process. 2019;28(8):3794–807.
Lowe DG. Object recognition from local scale-invariant features. In: Proceedings of the seventh IEEE International Conference on computer vision, Ieee. 1999;2:1150–7.
Loy CC, Xiang T, Gong S. Multi-camera activity correlation analysis. In: 2009 IEEE Conference on computer vision and pattern recognition, IEEE 2009; p. 1988–1995.
Loy CC, Liu C, Gong S. Person re-identification by manifold ranking. In: 2013 IEEE International Conference on image processing, IEEE 2013; p. 3567–3571.
Lu Z, Lin R, Hu H. Mart: mask-aware reasoning transformer for vehicle re-identification. IEEE Trans Intell Transp Syst. 2023;24(2):1994–2009.
Ma AJ, Yuen PC, Li J. Domain transfer support vector ranking for person re-identification without target camera label information. In: Proceedings of the IEEE International Conference on computer vision (ICCV) 2013.
Ma X, Zhu X, Gong S, Xie X, Hu J, Lam KM, Zhong Y. Person re-identification by unsupervised video matching. Pattern Recogn. 2017;65:197–210.
Martini M, Paolanti M, Frontoni E. Open-world person re-identification with rgbd camera in top-view configuration for retail applications. IEEE Access. 2020;8:67756–65.
Matei BC, Sawhney HS, Samarasekera S. Vehicle tracking across nonoverlapping cameras using joint kinematic and appearance features. In: CVPR 2011, IEEE 2011; p. 3465–3472.
Montazzolli Silva S, Rosito Jung C. License plate detection and recognition in unconstrained scenarios. In: Proceedings of the European Conference on computer vision (ECCV) 2018; p. 580–596.
Naphade M, Tang Z, Chang MC, Anastasiu DC, Sharma A, Chellappa R, Wang S, Chakraborty P, Huang T, Hwang JN, Lyu S. The 2019 ai city challenge. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2019.
Naphade M, Wang S, Anastasiu DC, Tang Z, Chang MC, Yang X, Zheng L, Sharma A, Chellappa R, Chakraborty P. The 4th ai city challenge. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2020.
Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. In: European Conference on computer vision, Springer 2016; p. 483–499.
Ning X, Gong K, Li W, Zhang L. Jwsaa: joint weak saliency and attention aware for person re-identification. Neurocomputing. 2021;453:801–11.
Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell. 2002;24(7):971–87.
Organisciak D, Sakkos D, Ho ES, Aslam N, Shum HP. Unifying person and vehicle re-identification. IEEE Access. 2020;8:115673–84.
Prasad MV, Balakrishnan R, et al. Spatio-temporal association rule based deep annotation-free clustering (star-dac) for unsupervised person re-identification. Pattern Recogn. 2022;122: 108287.
Qian W, Luo H, Peng S, Wang F, Chen C, Li H. Unstructured feature decoupling for vehicle re-identification. In: Computer Vision-ECCV 2022. Cham: Springer Nature Switzerland; 2022. p. 336–53.
Qian Y, Barthelemy J, Iqbal U, Perez P. V2reid: vision-outlooker-based vehicle re-identification. Sensors. 2022;22(22):8651.
Ren S, Cao X, Wei Y, Sun J. Face alignment at 3000 fps via regressing local binary features. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2014; p. 1685–1692.
Ristani E, Tomasi C. Features for multi-target multi-camera tracking and re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2018; p. 6036–6046.
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C. Performance measures and a data set for multi-target, multi-camera tracking. In: European Conference on computer vision, Springer 2016; p. 17–35.
Schwartz WR, Davis LS. Learning discriminative appearance-based models using partial least squares. In: 2009 XXII Brazilian Symposium on computer graphics and image processing, IEEE 2009; p. 322–329.
Sener O, Song HO, Saxena A, Savarese S. Learning transferrable representations for unsupervised domain adaptation. In: Proceedings of the 30th international conference on neural information processing systems (NIPS’16). Red Hook, NY, USA: Curran Associates Inc.; 2016. p. 2118–26.
Shen F, Du X, Zhang L, Tang J. Triplet contrastive learning for unsupervised vehicle re-identification. arXiv preprint 2023. arXiv:2301.09498.
Shen F, Xie Y, Zhu J, Zhu X, Zeng H. Git: graph interactive transformer for vehicle re-identification. IEEE Trans Image Process. 2023;32:1039–51.
Shen L, He T, Guo Y, Ding G. X-reid: Cross-instance transformer for identity-level person re-identification. arXiv preprint 2023. arXiv:2302.02075.
Shen Y, Xiao T, Li H, Yi S, Wang X. Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals. In: Proceedings of the IEEE International Conference on computer vision 2017: 1900–1909.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint 2014. arXiv:1409.1556.
Somers V, De Vleeschouwer C, Alahi A. Body part-based representation learning for occluded person re-identification. In: Proceedings of the IEEE/CVF Winter Conference on applications of computer vision 2023; p. 1613–1623.
Sun C, Myers A, Vondrick C, Murphy K, Schmid C. Videobert: A joint model for video and language representation learning. In: Proceedings of the IEEE/CVF International Conference on computer vision 2019; p. 7464–7473.
Sun Y, Zheng L, Yang Y, Tian Q, Wang S. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the European Conference on computer vision (ECCV) 2018; p. 480–496.
Sun Y, Xu Q, Li Y, Zhang C, Li Y, Wang S, Sun J. Perceive where to focus: learning visibility-aware part-level features for partial person re-identification. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition 2019; p. 393–402.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2015; p. 1–9.
Tan X, Wang Z, Jiang M, Yang X, Wang J, Gao Y, Su X, Ye X, Yuan Y, He D, Wen S, Ding E. Multi-camera vehicle tracking and re-identification based on visual and spatial-temporal features. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR) Workshops 2019.
Tang Z, Naphade M, Liu MY, Yang X, Birchfield S, Wang S, Kumar R, Anastasiu D, Hwang JN. Cityflow: A city-scale benchmark for multi-target multi-camera vehicle tracking and re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2019; p. 8797–8806.
Teng S, Liu X, Zhang S, Huang Q. Scan: spatial and channel attention network for vehicle re-identification. In: Pacific Rim Conference on Multimedia, Springer 2018; p. 350–361.
Tian M, Yi S, Li H, Li S, Zhang X, Shi J, Yan J, Wang X. Eliminating background-bias for robust person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2018; p. 5794–5803.
Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H. Training data-efficient image transformers & distillation through attention. arXiv preprint 2020. arXiv:2012.12877.
Varior RR, Shuai B, Lu J, Xu D, Wang G. A Siamese long short-term memory architecture for human re-identification. In: European Conference on computer vision, Springer 2016; p. 135–153.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. arXiv preprint 2017. arXiv:1706.03762.
Vidanapathirana M, Sudasingha I, Kanchana P, Vidanapathirana J, Perera I. Open set person re-identification framework on closed set re-id systems. In: 2017 IEEE 2nd International Conference on signal and image processing (ICSIP), IEEE 2017; p. 66–71.
Wang C, Zhang Q, Huang C, Liu W, Wang X. Mancs: A multi-task attentional network with curriculum sampling for person re-identification. In: Proceedings of the European Conference on computer vision (ECCV) 2018; p. 365–381.
Wang H, Zhu X, Xiang T, Gong S. Towards unsupervised open-set person re-identification. In: 2016 IEEE International Conference on image processing (ICIP), IEEE 2016; p. 769–773.
Wang H, Hou J, Chen N. A survey of vehicle re-identification based on deep learning. IEEE Access. 2019;7:172443–69.
Wang H, Shen J, Liu Y, Gao Y, Gavves E. Nformer: Robust person re-identification with neighbor transformer. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition 2022; p. 7297–7307.
Wang J, Song Y, Leung T, Rosenberg C, Wang J, Philbin J, Chen B, Wu Y. Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2014; p. 1386–1393.
Wang T, Gong S, Zhu X, Wang S. Person re-identification by video ranking. In: European Conference on computer vision, Springer 2014; p. 688–703.
Wang X. Intelligent multi-camera video surveillance: a review. Pattern Recogn Lett. 2013;34(1):3–19.
Wang Y, Wang L, You Y, Zou X, Chen V, Li S, Huang G, Hariharan B, Weinberger KQ. Resource aware person re-identification across multiple resolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2018; p. 8042–8051.
Wang Y, Peng J, Wang H, Wang M. Progressive learning with multi-scale attention network for cross-domain vehicle re-identification. SCIENCE CHINA Inf Sci. 2022;65(6): 160103.
Wang Z, Tang L, Liu X, Yao Z, Yi S, Shao J, Yan J, Wang S, Li H, Wang X. Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 379–387.
Wang Z, Ye M, Yang F, Bai X, Satoh S. Cascaded sr-gan for scale-adaptive low resolution person re-identification. IJCAI. 2018;1:4.
Watcharapinchai N, Rujikietgumjorn S. Approximate license plate string matching for vehicle re-identification. In: 2017 14th IEEE International Conference on advanced video and signal based surveillance (AVSS), IEEE 2017; p. 1–6.
Wei L, Zhang S, Yao H, Gao W, Tian Q. Glad: global-local-alignment descriptor for pedestrian retrieval. In: Proceedings of the 25th ACM International Conference on multimedia 2017; p. 420–428.
Wei L, Liu X, Li J, Zhang S. Vp-reid: Vehicle and person re-identification system. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval 2018; p. 501–504. https://doi.org/10.1145/3206025.3206086.
Wei L, Zhang S, Gao W, Tian Q. Person transfer gan to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2018; p. 79–88.
Wei XS, Zhang CL, Liu L, Shen C, Wu J. Coarse-to-fine: a rnn-based hierarchical attention model for vehicle re-identification. In: Asian Conference on computer vision, Springer 2018; p. 575–591.
Wei-Shi Z, Shaogang G, Tao X. Associating groups of people. In: Proceedings of the British Machine Vision Conference 2009; p. 1–23.
Wen Y, Zhang K, Li Z, Qiao Y. A discriminative feature learning approach for deep face recognition. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14, Springer 2016; p. 499–515.
Wu A, Zheng WS, Yu HX, Gong S, Lai J. Rgb-infrared cross-modality person re-identification. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 5380–5389.
Wu D, Zheng SJ, Bao WZ, Zhang XP, Yuan CA, Huang DS. A novel deep model with multi-loss and efficient training for person re-identification. Neurocomputing. 2019;324:69–75.
Wu D, Zheng SJ, Zhang XP, Yuan CA, Cheng F, Zhao Y, Lin YJ, Zhao ZQ, Jiang YL, Huang DS. Deep learning-based methods for person re-identification: a comprehensive review. Neurocomputing. 2019;337:354–71.
Wu F, Yan S, Smith JS, Zhang B. Joint semi-supervised learning and re-ranking for vehicle re-identification. In: 2018 24th International Conference on pattern recognition (ICPR), IEEE 2018; p. 278–283.
Wu F, Yan S, Smith JS, Zhang B. Vehicle re-identification in still images: application of semi-supervised learning and re-ranking. Signal Process Image Commun. 2019;76:261–71.
Wu Y, Lin Y, Dong X, Yan Y, Ouyang W, Yang Y. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR) 2018.
Wu Y, Lin Y, Dong X, Yan Y, Ouyang W, Yang Y. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2018; p. 5177–5186.
Xiao T, Li S, Wang B, Lin L, Wang X. Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2017; p. 3415–3424.
Xing EP, Ng AY, Jordan MI, Russell S. Distance metric learning, with application to clustering with side-information. In: Proceedings of the 15th international conference on neural information processing systems (NIPS'02). Cambridge, MA, USA: MIT Press; 2002. p. 521–8.
Xu Y, Ma B, Huang R, Lin L. Person search in a scene by jointly modeling people commonness and person uniqueness. In: Proceedings of the 22nd ACM International Conference on multimedia 2014; p. 937–940.
Yamaguchi M, Saito K, Ushiku Y, Harada T. Spatio-temporal person retrieval via natural language queries. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 1453–1462.
Yan K, Tian Y, Wang Y, Zeng W, Huang T. Exploiting multi-grain ranking constraints for precisely searching visually-similar vehicles. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 562–570.
Yang F, Yan K, Lu S, Jia H, Xie X, Gao W. Attention driven person re-identification. Pattern Recogn. 2019;86:143–55.
Yang L, Luo P, Change Loy C, Tang X. A large-scale car dataset for fine-grained categorization and verification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2015; p. 3973–3981.
Yao A, Huang M, Qi J, Zhong P. Attention mask-based network with simple color annotation for uav vehicle re-identification. IEEE Geosci Remote Sens Lett. 2021;19:1–5.
Yao H, Zhang S, Hong R, Zhang Y, Xu C, Tian Q. Deep representation learning with part loss for person re-identification. IEEE Trans Image Process. 2019;28(6):2860–71.
Ye L, Rochan M, Liu Z, Wang Y. Cross-modal self-attention network for referring image segmentation. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition 2019; p. 10502–10511.
Ye M, Lan X, Yuen PC. Robust anchor embedding for unsupervised video person re-identification in the wild. In: Proceedings of the European Conference on computer vision (ECCV) 2018.
Ye M, Lan X, Wang Z, Yuen PC. Bi-directional center-constrained top-ranking for visible thermal person re-identification. IEEE Trans Inf Forensics Secur. 2019;15:407–19.
Ye M, Shen J, Lin G, Xiang T, Shao L, Hoi SC. Deep learning for person re-identification: a survey and outlook. IEEE Trans Pattern Anal Mach Intell. 2022;44(6):2872–93.
Yi D, Lei Z, Liao S, Li SZ. Deep metric learning for person re-identification. In: 2014 22nd International Conference on pattern recognition, IEEE 2014; p. 34–39.
Yu Q, Chang X, Song YZ, Xiang T, Hospedales TM. The devil is in the middle: Exploiting mid-level representations for cross-domain instance matching. arXiv preprint 2017. arXiv:1711.08106.
Yu Z, Pei J, Zhu M, Zhang J, Li J. Multi-attribute adaptive aggregation transformer for vehicle re-identification. Inform Process Manag. 2022;59(2): 102868.
Yuan L, Hou Q, Jiang Z, Feng J, Yan S. Volo: vision outlooker for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2023;45(5):6575–86.
Zahra A, Perwaiz N, Shahzad M, Fraz MM. Person re-identification: a retrospective on domain specific open challenges and future trends. arXiv preprint 2022. arXiv:2202.13121
Zajdel W, Zivkovic Z, Krose BJ. Keeping track of humans: have i seen this person before? In: Proceedings of the 2005 IEEE International Conference on robotics and automation, IEEE 2005; p. 2081–2086.
Zapletal D, Herout A. Vehicle re-identification for automatic video traffic surveillance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops 2016; p. 25–31.
Zhang J, Wang FY, Wang K, Lin WH, Xu X, Chen C. Data-driven intelligent transportation systems: a survey. IEEE Trans Intell Transp Syst. 2011;12(4):1624–39.
Zhang L, Jiang N, Diao Q, Zhou Z, Wu W. Person re-identification with pose variation aware data augmentation. Neural Comput Appl. 2022;34(14):11817–30.
Zhang X, Li D, Wang Z, Wang J, Ding E, Shi JQ, Zhang Z, Wang J. Implicit sample extension for unsupervised person re-identification. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition 2022; p. 7369–7378.
Zhang Y, Lu H. Deep cross-modal projection learning for image-text matching. In: Proceedings of the European Conference on computer vision (ECCV) 2018; p. 686–701.
Zhang Y, Liu D, Zha ZJ. Improving triplet-wise training of convolutional neural network for vehicle re-identification. In: 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE 2017; p. 1386–1391.
Zhao L, Li X, Zhuang Y, Wang J. Deeply-learned part-aligned representations for person re-identification. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 3219–3228.
Zhao R, Ouyang W, Wang X. Unsupervised salience learning for person re-identification. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2013; p. 3586–3593.
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q. Scalable person re-identification: a benchmark. In: Proceedings of the IEEE International Conference on computer vision 2015; p. 1116–1124.
Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q. Mars: A video benchmark for large-scale person re-identification. In: European Conference on computer vision, Springer 2016; p. 868–884.
Zheng L, Yang Y, Hauptmann AG. Person re-identification: past, present and future. arXiv preprint 2016. arXiv:1610.02984.
Zheng L, Zhang H, Sun S, Chandraker M, Yang Y, Tian Q. Person re-identification in the wild. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2017; p. 1367–1376.
Zheng M, Karanam S, Wu Z, Radke RJ. Re-identification with consistent attentive Siamese networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition 2019; p. 5735–5744.
Zheng WS, Gong S, Xiang T. Transfer re-identification: from person to set-based verification. In: 2012 IEEE Conference on computer vision and pattern recognition, IEEE 2012; p. 2650–2657.
Zheng WS, Gong S, Xiang T. Towards open-world person re-identification by one-shot group-based verification. IEEE Trans Pattern Anal Mach Intell. 2015;38(3):591–606.
Zheng Y, Capra L, Wolfson O, Yang H. Urban computing: concepts, methodologies, and applications. ACM Trans Intell Syst Technol (TIST). 2014;5(3):1–55.
Zheng Z, Zheng L, Yang Y. Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on computer vision 2017; p. 3754–3762.
Zhou Y, Shao L. Cross-view gan based vehicle generation for re-identification. BMVC. 2017;1:1–12.
Zhu H, Ke W, Li D, Liu J, Tian L, Shan Y. Dual cross-attention learning for fine-grained visual categorization and object re-identification. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR), IEEE, New Orleans, LA, USA 2022; p. 4692–4702.
Zhu W, Wang Z, Wang X, Hu R, Liu H, Liu C, Wang C, Li D. A dual self-attention mechanism for vehicle re-identification. Pattern Recogn. 2023;137: 109258.
Zhu X, Wu B, Huang D, Zheng WS. Fast open-world person re-identification. IEEE Trans Image Process. 2017;27(5):2286–300.
Zhu X, Luo Z, Fu P, Ji X. Voc-reid: Vehicle re-identification based on vehicle-orientation-camera. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops 2020; p. 602–603.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Ethics declarations
Conflict of Interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Yes.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.