1 - Design of GPU-Based Aerial Monitoring Signal Processing Software
DOI: 10.1049/rsn2.12244
ORIGINAL RESEARCH
Revised: 18 January 2022 Accepted: 18 February 2022
1 Department of Security and Crime Science, University College London, London, UK
2 Department of Electronic and Electrical Engineering, University College London, London, UK

Correspondence
Wenda Li, Department of Security and Crime Science, University College London, 35 Tavistock Square, Bloomsbury, London, WC1H 9EZ, UK.
Email: wenda.li@ucl.ac.uk

Funding information
Engineering and Physical Sciences Research Council, Grant/Award Number: EP/R018677/1

Abstract
Software defined radar (SDRadar) systems have become an important area for future radar development and are based on similar concepts to software defined radio (SDR). Most of the processing, such as filtering, frequency conversion and signal generation, is implemented in software. Currently, radar systems tend to have complex signal processing and operate at wider bandwidths, which means that limits on the available computational power must be considered when designing an SDRadar system. This paper presents a feasible solution to this potential limitation by accelerating the signal processing using a GPU to enable the development of a high speed SDRadar system. The developed system overcomes the processing speed limitation of a CPU‐only design, and has been tested on three different SDR devices. Results show that, with the GPU accelerator, the processing rate can reach up to 80 MHz compared to 20 MHz with the CPU only. The high speed processing makes it possible to run in real‐time and process the full bandwidth of the WiFi signal acquired by multiple channels. The gains made through porting the processing to the GPU move the technology towards real‐world application in various scenarios ranging from healthcare to IoT, and other applications that require significant computational processing.
KEYWORDS
GPU accelerator, signal processing, software defined radar
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
© 2022 The Authors. IET Radar, Sonar & Navigation published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
consumption and latency etc. [6, 7]. The major concern in this work is the amount of onboard memory. For example, the commercialised Xilinx FPGA UltraScale product line has a maximum of 133 megabytes of block memory [8]. In comparison, gaming GPUs such as the NVIDIA GeForce GTX 2060 can offer 6 gigabytes of onboard memory. In this work, we focus on high sampling rate radar signals, which require extensive data transfer via peripheral component interconnect (PCI) Express. The GPU's larger onboard memory allows data to be transferred and processed simultaneously, making it well‐suited for this SDRadar work. In addition, a GPU‐based accelerator means the system can be easily adapted to other Software Defined Radio (SDR) devices without the heavy development effort required by an FPGA‐based accelerator.

With the introduction of the NVIDIA Compute Unified Device Architecture (CUDA), the parallel processing capabilities of GPUs have become accessible beyond graphics, both financially and technologically. Another advantage of a GPU‐based accelerator is that it frees the CPU from heavy parallel computing so that it can focus on Data Acquisition (DAQ) and synchronisation control. The use of a GPU enables real‐time operation without degrading the sampling rate.

Considering these advantages, many researchers have been working on GPU‐based accelerators in various systems to deal with computationally heavy processes such as the fast Fourier transform (FFT) [9] and correlation [10]. An early work [11] presents an analysis of correlation with GPU acceleration and demonstrates a speed‐up factor of 15 compared to CPU‐only processing. Work [10] demonstrates GPU‐accelerated back‐projection in the reconstruction of Synthetic Aperture Radar images. It also shows an impressive runtime with a speed‐up factor between 50 and 60. However, this system does not provide a practical solution for GPU‐based accelerators in real‐time processing. A Raspberry Pi based SDRadar system described in [9] was built for passive radar, including reference signal reconstruction and a two‐dimensional FFT (size 2048 × 512), with on‐board CPU and GPU. In this case, the GPU only shows a slight improvement of 10% because the sequential framework design leads to overhead for a single FFT. The limitation of this system is that it operates at a relatively low sampling rate of 240 kHz and is not fully compatible with a parallel framework that would take full advantage of GPU acceleration. GPUs have also been used to accelerate radar image processing [12] with real‐time capability. This system uses a hybrid CPU‐GPU scheme to process data at a 40 MHz sampling rate, and can generate images composed of 6000 pixels at 8 fps. In SDRadar systems, using a GPU as an accelerator has been studied broadly in recent years. One of the most representative works is [13], which implemented a real‐time Global Positioning System (GPS) receiver with adaptive beam‐steering capability using a software‐defined approach. This SDRadar offers sufficient computational capability to support a 4‐element antenna array at up to 40 Msps (mega samples per second). After that, a testbed [14] for an anti‐jamming receiver was developed with embedded Space‐Time Adaptive Processing and Space‐Frequency Adaptive Processing. It further deploys a batch mode to take full advantage of the parallel resources provided by the GPU. However, both of these SDRadars were developed specifically for beamforming and are limited in flexibility and configurability. A comparison between these works and our system is summarised in Table 1.

No. channel | 4 | Up to 8 | Up to 16

In this work, we use the LabVIEW GPU analysis toolkit to interface with the CUDA functions and embed them into our previous SDRadar system [15] with a new architecture. This is achieved by switching to a parallel framework where DAQ (by the CPU) and signal processing (partly by the GPU) are processed separately. The processing speed is significantly improved with the GPU accelerator when compared to the CPU‐only system. To demonstrate the effectiveness of the GPU accelerator in real‐time, three SDR devices have been tested to quantify the advantage of the proposed concepts in deployment flexibility and processing speed. The hardware specification of these SDR devices is presented in Table 2 and the devices are shown in Figure 1.

Device | USRP 2920 [3] | USRP 2945 [16] | DigitizerNetbox DN2.593‐16 [4]
No. of channels (devices) | 1 (2) | 4 (1) | 16 (1)
Theoretical sampling rate (as an SDR device) | 40 MHz | 100 MHz | 640 MHz

FIGURE 1 Three Software Defined Radio (SDR) devices: (a) USRP 2920, (b) USRP 2945, and (c) DigitizerNetbox (DN)

Compared to previous works [12, 17, 18], the following contributions are made by this paper:

• This paper presents a robust high‐speed SDRadar system that is capable of processing sampling rates of up to 80 MHz without dropping samples. The system runs typical passive radar processing including the Cross‐Ambiguity Function (CAF) and Direct Signal Interference cancellation [19].
• The proposed system can be easily tuned to run with various SDR devices. We have modified it to run with three different devices including the USRP 2920, USRP 2945 and DigitizerNetbox.
• The new system shows a marked improvement in processing speed when compared to our previous CPU‐only system [15]. Three SDRadar features (full bandwidth processing, multi‐channel and phased‐array system) have been implemented to demonstrate the feasibility of the GPU accelerator.

The rest of this article is organised as follows. Section 2 outlines the concepts of our SDRadar system and the associated signal processing for WiFi based passive sensing; Section 3 presents the design and implementation of GPU‐based parallel processing; measured performance and experimental results are shown in Section 4; finally, conclusions from this study are in Section 5.

2 | SIGNAL PROCESSING IN SDR

2.1 | Cross‐correlation

Cross‐correlation evaluates the level of similarity between two functions or signals [20], and is widely used to detect the time delay and Doppler shift between a known transmitted signal and the signal reflected from objects. More specifically, according to our previous works [15, 21], cross‐correlation consumes the major share of the computational power in signal processing. In such scenarios, the speed of the cross‐correlation is crucial to the overall performance of the system. In the time and discrete domains, the definition of cross‐correlation between reference (transmitted) signals $s_r$ and surveillance (received) signals $s_s$ can be written as Equations (1) and (2):

$$(s_r \cdot s_s)(\tau) = \int_{-\infty}^{\infty} s_r^{*}(t)\, s_s(t+\tau)\, dt \tag{1}$$

$$(s_r \cdot s_s)[n] = \sum_{m=-\infty}^{\infty} s_r^{*}[m]\, s_s[m+n] \tag{2}$$

where * represents the complex conjugate. The cross‐correlation theorem suggested in [22] in discrete form is presented in Equation (3):

$$(s_r \cdot s_s)[n] = \sum_{m=-\infty}^{\infty} \mathrm{IFFT}\left(\mathrm{FFT}\left(s_r^{*}[m]\right)\, \mathrm{FFT}\left(s_s[m+n]\right)\right) \tag{3}$$

where FFT is the fast Fourier transform and IFFT is the inverse fast Fourier transform.

One of the limitations in applying Equation (3) to an SDRadar system is the size of $s_r$ and $s_s$. Consider a sampling rate of 20 MHz (a typical bandwidth of a WiFi signal at 2.4 GHz): 20 M data points need to be processed every second. In particular, the FFT of a long sequence is very slow. This makes the cross‐correlation almost impossible to process in real‐time. For this reason, a batch process has been applied which segments a long sequence $s$ into $L$ short and equal length sequences $s = [s_1, s_2, \ldots, s_L]$. Cross‐correlation with the batch process can be expressed as:

$$s_r \cdot s_s = \sum_{i=0}^{L-1} \mathrm{IFFT}\left(\mathrm{FFT}\left(s_r^{i*}\right)\, \mathrm{FFT}\left(s_s^{i}\right)\right) \tag{4}$$
17518792, 2022, 7, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/rsn2.12244 by Cochrane Mexico, Wiley Online Library on [18/07/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1086
- LI ET AL.
When comparing the hardware architectures of GPUs and CPUs, GPUs are specifically designed for computationally intensive calculations that have a high degree of parallelism. Consequently, GPUs are designed with more transistors for data processing than for data caching and flow control. These differences in hardware architecture determine the different processing speeds of signal processing. Processing data in GPU memory also does not affect the operating system, which gives better stability. As a result, accelerating the signal processing with a GPU is considered a powerful and fast development option for SDRadar systems [10, 17].

3.1 | Mapping the data points to GPU grid

Figure 5 presents the structure of mapping the data points to the GPU grid. An SDRadar system with inputs from $N$ channels at sampling rate $S_r$ over integration time $T_i$ has a data size of $N \times S_r \times T_i$. Within each channel, data points are segmented equally into $L$ batches with a batch length of $L_B$. Batches from each channel are combined together and downloaded into GPU memory for cross‐correlation. In the CUDA framework, a given sequence of instructions is called a kernel. Each kernel controls a group of blocks which process the data in parallel.
The block index (red context in Figure 5) indicates the number of parallel blocks $N_B$, where $N_B = N \times L$. For simplicity, we use the processing rate $P_r$ to measure the actual amount of data processed by the system. The number of blocks, the overall data size downloaded by the GPU and the processing/sampling rate can be expressed as:

$$N \times S_r \times T_i = N_B \times L_B = P_r \tag{5}$$

where the left part represents the processed data points in CPU memory and the middle part represents the data points in GPU memory. The processing rate can also be calculated as the product $N_B \times L_B$. Note that the values of the parameters in Equation (5) are not constant. They vary depending on the SDR device and the maximum throughput of the DAQ.

3.2 | System integration

Figure 6 schematically displays the system integration of the GPU‐accelerated cross‐correlation processing for our SDRadar system. There are three major threads: raw data DAQ (Thread1, T1), multi‐channel cross‐correlation by GPU (Thread2, T2) and spectrogram generation (Thread3, T3). The solid arrows describe the main data stream, and the red arrows indicate the data flow inside the GPU.

In T1, the raw RF signal is acquired by the SDR device and transferred into the host computer's memory through the Ethernet/PCIe port (10 GHz). Afterwards, there is an initial preparation step to sort the data into a 2D matrix. Then, the data can be saved to the hard drive for off‐line processing or downloaded into the GPU memory through the PCIe 3.0 x16 interface for subsequent online processing. In T2, the whole process is operated for each cross‐correlation in parallel by the GPU accelerator, and the obtained data are uploaded from GPU memory back to host memory at the end. In T3, the spectrogram is generated by the CPU from the data loaded in the host memory, either for graphical user display or for storage on the hard drive for further processing, for example, activity recognition, identification, etc.

Recalling Equation (5) and taking an integration time $T_i$ of 1 second as an example, the total number of received data points is $N \times S_r$. The cross‐correlation processing on the GPU proceeds as follows:

1. Reshape the received vector data (complex values) into parallel blocks $N_B \times L_B$. This process is done by the CPU.
2. Download the batch data from CPU to GPU memory.
3. The complex conjugate is implemented as a GPU function and applied to the received signal $(N_B - L) \times L_B$ in the time domain, by changing the sign of the imaginary part of each complex number. The conjugate is not applied to the transmitted signal.
4. FFT: the forward FFT is performed on all the batches $N_B \times L_B$.
5. The transmitted signal $L \times L_B$ and every $L \times L_B$ received signal, both complex in the Fourier domain, are multiplied. This gives a size of $(N_B - L) \times L_B$.
6. IFFT: the inverse fast Fourier transform is performed on all the batches $(N_B - L) \times L_B$.
7. Upload the processed data from GPU to CPU memory.
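The bookkeeping behind Equation (5) and the block layout used in steps 1 and 5 can be sketched in plain Python as follows. This is a hedged illustration only: the helper names `reshape_to_blocks` and `pair_blocks` are hypothetical, and the sketch assumes that channel 0 carries the transmitted (reference) signal and that blocks are stored channel by channel, neither of which is stated in the text.

```python
def reshape_to_blocks(channels, num_batches):
    """Step 1: split each of the N channels into L equal batches.

    channels: list of N equal-length sample lists (channel 0 is assumed
    to be the transmitted/reference channel; this ordering is an assumption).
    Returns N_B = N * L blocks, each of length L_B.
    """
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all channels must have the same length")
    if n_samples % num_batches != 0:
        raise ValueError("samples per channel must divide evenly into batches")
    lb = n_samples // num_batches  # batch length L_B
    blocks = []
    for ch in channels:
        for i in range(num_batches):
            blocks.append(ch[i * lb:(i + 1) * lb])
    return blocks


def pair_blocks(num_channels, num_batches):
    """Step 5 bookkeeping: index pairs (reference block, received block).

    Each received channel contributes L blocks, and batch i of a received
    channel is paired with batch i of the reference channel, giving
    (N_B - L) pairs in total.
    """
    pairs = []
    for ch in range(1, num_channels):
        for i in range(num_batches):
            pairs.append((i, ch * num_batches + i))
    return pairs


# Usage with toy numbers: N = 3 channels, S_r * T_i = 12 samples, L = 4.
N, samples_per_channel, L = 3, 12, 4
data = [[complex(ch, t) for t in range(samples_per_channel)] for ch in range(N)]
blocks = reshape_to_blocks(data, L)
NB, LB = len(blocks), len(blocks[0])
print(N * samples_per_channel == NB * LB)  # both sides of Eq. (5)
print(len(pair_blocks(N, L)) == NB - L)    # blocks that get multiplied
```

Laying the data out as N_B independent blocks of equal length is what allows every FFT, multiplication and inverse FFT in steps 4 to 6 to be issued as one batched operation rather than a loop over channels.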
FIGURE 6 Flow chart of GPU‐accelerated raw Radio Frequency (RF) data processing
FIGURE 9 Performance comparisons between CPU and GPU based cross‐correlation processing on various sizes of batch length and block number. (a) Processing time comparison on 100 blocks with varying batch length. (b) Processing time comparison on varying block numbers with a fixed batch length of 20 k. (c) and (d) GPU–CPU processing acceleration ratio and processing rate corresponding to (a) and (b), respectively
demonstrate that this acceleration ratio increases with the size of batch length and block number. The comparison between Figures 9c and d demonstrates that the GPU accelerates the cross‐correlation based on two factors: 1. Faster FFT processing over longer data in each block; 2. More efficient parallel processing due to the highly parallel core hierarchy in the GPU.

The simulation is based on 1 second of data, which means latency will be induced when the processing time is longer than 1 second, making such a configuration unsuitable for real‐time processing. Besides, the CPU also has another heavy task: DAQ from the SDR devices. From Figure 9b, it can be seen that the CPU needs more than 1 second of processing time in the tests with 200 k and 500 k batch lengths and with 800 and 1600 blocks, while the GPU takes less than 1 second in all tests. In addition, the overall acceleration ratio in this work is not as significant as in work [18], which ranged from 10 to 30 times faster. The reason is that we also simulated the process of downloading and uploading data to and from the GPU memory, which took longer than the cross‐correlation itself. This has a particular effect on processing time, especially with a large data size.

It is worth noting from Figure 9a that the processing of 10k(LB) × 100(NB) is accelerated to 2 times faster than CPU processing. This is the minimum sampling rate (1 MHz) used in this work; the GPU performs 2 times faster than the CPU even though the processing time is short. In comparison, at 500k(LB) × 100(NB) (the maximum sampling rate), the GPU contributes an acceleration ratio of more than 4 times. From Figure 9b, GPU processing also has a much shorter processing time when dealing with multiple channels. The processing of 20k(LB) × 50(NB) is accelerated up to 3 times faster than CPU processing. At 20k(LB) × 1600(NB) the acceleration is only slightly higher, at 3.7 times, but the processing time still stays below 1 second. These results indicate an attractive performance delivered by the GPU accelerator, which makes high sampling rate processing feasible.

4 | SDRADAR APPLICATIONS

A typical radar system requires at least two channels: one transmitter and one receiver. Two USRP 2920s have been used since each of them only has 1 channel. More details about the two USRP 2920 setup can be found in our previous paper [24]. In comparison, the USRP 2945 (4 channels) and the DigitizerNetbox (16 channels) can each fully function as an SDRadar system. Additional channels can be employed for distributed sensing and angle‐of‐arrival detection.

4.1 | Advantages in sensitivity (USRP 2920)

One of the important measures for an SDRadar system is its ability to detect targets against the background clutter and noise. This is determined by multiple factors, for example, the signal strength, antenna beam pattern, etc. In terms of the range‐Doppler surface, the sensitivity can be measured as the Peak Signal‐to‐Noise Ratio (PSNR), where the peak represents the pulse relating to the object and the noise represents the sidelobes. A high PSNR means the object can be easily identified.

Here we provide examples of range‐Doppler surfaces based on WiFi signals. In this measurement, two USRP 2920s were used as receivers: one channel was connected directly to the WiFi router to measure a 'Reference' signal while the other channel was connected to an antenna to record corresponding
signal reflections from the environment. The antenna was pointed in the same direction as the WiFi router in a stationary environment without any Doppler shifts. Both USRP 2920s were operated at 20 MHz to cover the full bandwidth of the WiFi signal in the 2.4 GHz channel. To demonstrate the sensitivity performance of the SDRadar system in different signal scenarios, three different WiFi frame rates were used: 20 Hz, 100 Hz and 1 kHz. The integration time Ti was set at 1 s and the block number was constant at 100, which gives a maximum batch length of 20 M/100 = 200 k. Thus, the processing rate is given as (2 × 100) × 200k = 40 MHz, where 2 represents the two channels. The SDRadar system performed full CAF processing at various batch lengths.

Figure 10 presents the range‐Doppler surfaces for the three frame rates at different batch lengths. As expected, the frame rate of 1 kHz has the best performance compared to frame rates of 20 and 100 Hz. This is because the effective signal depends on the WiFi frame rate, as discussed in our previous paper [15]. However, the effective signal may not be captured if the batch length is not sufficient, even under a high frame rate. Consequently, improvements can be observed in the range‐Doppler surfaces with increasing batch length and frame rate. In the worst case, a 20 Hz frame rate with 10 k batch length results in significant difficulties in identifying the preferred peak. This situation improves when the batch length is increased to 200 k. The range‐Doppler surface at 100 Hz frame rate with 200 k batch length has almost the same performance as the same batch length at 1 kHz frame rate. In comparison, all peaks at 1 kHz frame rate can easily be distinguished.

Figure 11 presents the PSNR versus batch length from 1 k to 200 k for the three frame rates. The red dashed line indicates the maximum CPU processing ability at 100 k, whereas the GPU accelerator can easily process more than 200 k. The highest PSNR at 20 Hz frame rate reaches up to 17 dB, whereas the 100 Hz and 1 kHz frame rates both reach more than 27 dB. The PSNR values of the 100 Hz and 1 kHz frame rates get closer beyond a batch length of 160 k. This indicates that a WiFi signal at 100 Hz can deliver high performance when sufficient data points have been processed, which can also be observed in Figure 10. In addition, at 1 kHz frame rate, there is only a little improvement beyond a batch length of 80 k. Processing additional data points brings no extra benefit in PSNR. Thus, the trade‐off between the amount of processing and the PSNR threshold needs to be identified depending on the frame rate.

4.2 | Distributed channels (USRP 2945)

Distributed channels, which measure the object from different aspect angles, can generate multiple range‐Doppler surfaces simultaneously. This can bring spatial diversity in Doppler information and deliver higher recognition accuracy [25]. However, there are many challenges for such systems, for example, the clock/time synchronisation among different channels and the much higher sampling rate compared with a single channel SDR.

In this measurement, a USRP 2945 was used as the receiver with a total of 4 channels. Among them, one channel was used to recreate the transmitted signal, and the other three channels
FIGURE 10 Range‐Doppler surface for a WiFi signal: rows 1, 2 and 3 illustrate frame rates of 20 Hz, 100 Hz and 1 kHz, respectively. Columns 1–7 are batch lengths of 10 k, 20 k, 30 k, 50 k, 100 k, 150 k and 200 k, respectively. The x‐axis plots range and the y‐axis plots Doppler
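The sensitivity figures quoted in Section 4.1 can be made concrete with a short Python sketch. The paper does not give an explicit PSNR formula, so the definition below is an assumption: the peak is the strongest cell of the range‐Doppler surface and the noise is the mean power of all remaining cells (the sidelobes). The function names `processing_rate` and `psnr_db` are hypothetical.

```python
import math


def processing_rate(num_channels, blocks_per_channel, batch_length,
                    integration_time_s=1.0):
    """Samples processed per second: channels x blocks x batch length / T_i."""
    return num_channels * blocks_per_channel * batch_length / integration_time_s


def psnr_db(surface):
    """Peak-to-sidelobe ratio of a range-Doppler surface in dB.

    surface: 2D list (Doppler x range) of real or complex CAF values.
    Assumed definition: peak power divided by the mean power of every
    other cell; the paper itself does not spell out the formula.
    """
    power = [abs(v) ** 2 for row in surface for v in row]
    if len(power) < 2:
        raise ValueError("surface needs at least two cells")
    peak_index = max(range(len(power)), key=power.__getitem__)
    peak = power[peak_index]
    rest = power[:peak_index] + power[peak_index + 1:]
    noise = sum(rest) / len(rest)
    if noise == 0:
        raise ValueError("sidelobe power is zero; PSNR is undefined")
    return 10.0 * math.log10(peak / noise)


# Usage: the Section 4.1 configuration (2 channels, 100 blocks, T_i = 1 s).
max_batch_length = 20_000_000 // 100          # 20 M samples / 100 blocks
rate = processing_rate(2, 100, max_batch_length)
print(max_batch_length, rate)

toy_surface = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]  # one clear peak
print(round(psnr_db(toy_surface), 1))
```

Under this assumed definition, a longer batch length raises the PSNR only while the additional samples add coherent energy to the peak faster than they add sidelobe energy, which is consistent with the saturation reported for the 1 kHz frame rate.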
FIGURE 13 Doppler spectrogram captured by 3 distributed receivers at different angle (random direction walking)
Abbreviations: FFT, fast Fourier transform; IFFT, inverse fast Fourier transform.
4.3 | Angle finding (DigitizerNetbox)

The 16‐channel DigitizerNetbox was originally designed as a waveform spectrum analyser, whereas we use it as a MIMO‐SDR. We integrated a phased array antenna with the DigitizerNetbox and performed an Angle‐of‐Arrival (AoA) analysis over the measurements from all 16 channels. Let the RF signal received at the ith channel be $s_i(t)$; the sampled data from the DigitizerNetbox can be written as $S(t) = (s_1[t]\ s_2[t]\ \ldots\ s_i[t])$, $i \le 16$. For a sampling period $T$, the AoA is calculated as the sum of the signal amplitude from the signal source as $\theta(t) = \mathrm{FFT}\left(\sum_{t=0}^{T-1} S(t)\right)$. Since there is only a single FFT process, we slightly modified the architecture shown in Figure 7 by removing the multiplication and the second FFT process. Here, the AoA analysis aims to find the direction of the incoming signals from the signal source.

Because the Ethernet cable creates a data flow bottleneck, we reduced the DAQ rate to 3 iterations per second (96 M data points received every second). The sampling rate was set at 80 MHz for all 16 channels. We ran the system with and without the GPU accelerator to show the difference in angular detection. To ensure the system remained in real‐time processing, the CPU‐only system was run at a block number of 0.4 k and a batch length of 20 k (8 M data points processed every iteration), while the GPU accelerator was run at a block number of 1.6 k and a batch length of 20 k (32 M data points processed every iteration). According to the angle resolution $\Delta\theta = \lambda / D$, where $D$ is the size of the antenna, the 16‐channel system is expected to provide 4 times higher resolution than the 4‐channel system.

In this measurement, a WiFi access point (AP) was located at 50° to the phased array antenna at a distance of 3 m. Figure 14 presents an AoA plot for a signal source using data from 4 channels processed by the CPU (as the baseline) and data from 16 channels processed by the GPU. As expected, the angular resolution is greatly improved by the 16‐channel system, with a clear peak at 50°, whereas the 4‐channel system gives a much coarser estimate. This indicates that the performance gain from the GPU accelerator can substantially improve the angular resolution at no additional cost on the computing unit.

5 | CONCLUSIONS

This paper presents a high‐speed design for an SDRadar system that uses a GPU accelerator to speed up the cross‐correlation process. The idea is that the CPU handles raw data DAQ and post‐processing, which are non‐parallel threads, while the GPU handles the parallel threads involving the FFT process. The proposed GPU accelerator demonstrates high flexibility and extensibility, and is able to work with three different SDR devices. Experimental results show that the proposed GPU accelerator can speed up the system by up to four times compared with the CPU‐only system. There are significant improvements in PSNR (Figure 10) and angular resolution (Figure 14) from using the GPU accelerator to process more samples.

Future work will focus on a joint FPGA and GPU implementation for ultra‐speed SDRadar systems. This could save more computational power on the CPU and reduce the sampling rate on the computer side, as some processing can be performed by the FPGA. It is envisioned that such systems could be a solution for many industry‐based radar applications and can be compatible with AI systems for mission‐critical applications that require very low latency, such as autonomous vehicles and manufacturing operations.
FIGURE 14 Performance gain: angle‐of‐arrival plot for a signal source measured at 50° to the antenna, (a) data from 4‐channel and processed by CPU and (b) data from 16‐channel and processed by GPU
ACKNOWLEDGEMENT
This work is part of the OPERA project funded by the UK Engineering and Physical Sciences Research Council (EPSRC), Grant No: EP/R018677/1.

CONFLICT OF INTEREST
All authors listed in this paper declare that they have no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ORCID
Wenda Li https://orcid.org/0000-0001-6617-9136
Shelly Vishwakarma https://orcid.org/0000-0003-1035-3259

REFERENCES
1. Debatty, T.: Software defined radar a state of the art. In: 2010 2nd International Workshop on Cognitive Information Processing, pp. 253–257. IEEE (2010)
2. Jondral, F.K.: Software‐defined radio—basics and evolution to cognitive radio. EURASIP J. Wirel. Commun. Netw. 3, 652784 (2005)
3. Ni usrp 2920. [Online]. https://www.ni.com/en-gb/shop/hardware/products/usrp-software-defined-radio-device.html
4. Digitizernetbox. [Online]. https://spectrum-instrumentation.com/en/digitizernetbox
5. Anderson, C.R., et al.: Analysis and implementation of a time‐interleaved adc array for a software‐defined uwb receiver. IEEE Trans. Veh. Technol. 58(8), 4046–4063 (2009)
6. Nurvitadhi, E., et al.: Accelerating binarized neural networks: comparison of fpga, cpu, gpu, and asic. In: 2016 International Conference on Field‐Programmable Technology (FPT), pp. 77–84. IEEE (2016)
7. Fowers, J., et al.: A performance and energy comparison of convolution on gpus, fpgas, and multicore processors. ACM Trans. Archit. Code Optim. 9(4), 1–21 (2013)
8. Xilinx: [Online]. UltraScale Architecture and Product Data Sheet: Overview
9. Moser, D., et al.: Design and evaluation of a low‐cost passive radar receiver based on iot hardware. In: 2019 IEEE Radar Conference (RadarConf), pp. 1–6. IEEE (2019)
10. Fasih, A., Hartley, T.: Gpu‐accelerated synthetic aperture radar backprojection in cuda. In: 2010 IEEE Radar Conference, pp. 1408–1413. IEEE (2010)
11. Gembris, D., et al.: Correlation analysis on gpu systems using nvidia's cuda. J. Real. Time. Image. Process. 6(4), 275–280 (2011)
12. Garcia‐Rial, F., Ubeda‐Medina, L., Grajal, J.: Real‐time gpu‐based image processing for a 3‐d thz radar. IEEE Trans. Parallel Distr. Syst. 28(10), 2953–2964 (2017)
13. Seo, J., et al.: A real‐time capable software‐defined receiver using gpu for adaptive anti‐jam gps sensors. Sensors. 11(9), 8966–8991 (2011)
14. Xu, H., Cui, X., Lu, M.: An sdr‐based real‐time testbed for gnss adaptive array anti‐jamming algorithms accelerated by gpu. Sensors. 16(3), 356 (2016)
15. Li, W., et al.: Passive wifi radar for human sensing using a stand‐alone access point. IEEE Trans. Geosci. Rem. Sens. (2020)
16. Ni usrp 2945. [Online]. https://www.ni.com/en-gb/support/model.usrp-2945.html
17. Zhang, C., Yang, Q., Deng, W.: High frequency radar signal processing based on the parallel technique, 2015
18. Li, J., Xiao, Y.: Gpu accelerated parallel fft processing for Fourier transform hyperspectral imaging. Appl. Opt. 54(13), D91–D98 (2015)
19. Chetty, K., Smith, G.E., Woodbridge, K.: Through‐the‐wall sensing of personnel using passive bistatic wifi radar at standoff distances. IEEE Trans. Geosci. Rem. Sens. 50(4), 1218–1226 (2011)
20. Yoo, J.‐C., Han, T.H.: Fast normalized cross‐correlation. Circ. Syst. Signal Process. 28(6), 819–843 (2009)
21. Li, W., et al.: Physical activity sensing via stand‐alone wifi device. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE (2019)
22. Cross‐correlation theorem. [Online]. https://mathworld.wolfram.com/Cross-CorrelationTheorem.html
23. Cuda. [Online]. NVIDIA CUDA C Programming Guide
24. Li, W., Tan, B., Piechocki, R.J.: Wifi‐based passive sensing system for human presence and activity event classification. IET Wirel. Sens. Syst. 8(6), 276–283 (2018)
25. Fioranelli, F., et al.: Feature diversity for optimized human micro‐Doppler classification using multistatic radar. IEEE Trans. Aero. Electron. Syst. 53(2), 640–654 (2017)

How to cite this article: Li, W., et al.: Design of high‐speed software defined radar with GPU accelerator. IET Radar Sonar Navig. 16(7), 1083–1094 (2022). https://doi.org/10.1049/rsn2.12244