An Encoder–Decoder Architecture within a Classical Signal-Processing Framework for Real-Time Barcode Segmentation
"> Figure 1
<p>Topology of the VGG16 model. ReLU stands for the type of activation function used: rectified linear unit.</p> "> Figure 2
<p>SegNet topology. It uses VGG16 as the encoder and adds a specularly symmetric decoder. Note that the indices of the max pooling operations are used when upsampling.</p> "> Figure 3
<p>A series of rotated image tiles containing bars, surrounded by plots of their pixel intensity sums per row (red) and per column (blue). The variance difference among them is maximal when the bars are fully vertically or horizontally oriented.</p> "> Figure 4
<p>Illustration of how adjacent pixels and subsequently line segments are combined in the partial stages of DRT. In this example, <math display="inline"><semantics><mrow><mn>16</mn><mo>×</mo><mn>16</mn></mrow></semantics></math> pixels (represented by circles) are combined initially, to the left, two by two; in the next stage, four by four; and finally, eight by eight. The final tile spacing between solutions (arrows, at the bottom) is eight. In each partial-stage solution, pixels are combined to consider each displacement and slope, but in non-overlapping vertical and horizontal bands, separated by dashed lines. Eminently horizontal lines are rendered in red, whilst vertical ones are rendered in blue.</p> "> Figure 5
<p>Dilemma of how to choose the right tile size to study a barcode region (in blue). From left to right, the tile subdivisions are shown for three different tile sizes, each of which quadruples the previous one in area. The tiles are depicted as rectangles with a circle at their centre and define the spacing of the output grid. The tiles whose centres are within the barcode region are depicted in a coarser line.</p> "> Figure 6
<p>From left to right, from top to bottom: the output of the method by Oliva-García et al. [<a href="#B33-sensors-23-06109" class="html-bibr">33</a>] given the zoomed example image in the top-left corner at scales ranging from <tt>tile_size = 2</tt> to <tt>tile_size = 32</tt>. Note that as the <tt>tile_size</tt> is doubled, the areas are quadrupled in size, but their number is reduced to a quarter. Each result was upsampled with nearest-neighbour interpolation so that all are shown to be the same size. The hue encodes the angle of the bars, and the intensity encodes the intensity of detection.</p> "> Figure 7
<p>On the left, large tile size with smaller tile spacing, implying tile overlapping. On the right, the merging of tiles at various scales to optimise barcode coverage.</p> "> Figure 8
<p>A depiction of how non-overlapping bands fuse at scales two and four to obtain a final output spacing of eight pixels per dimension. The depiction combines, in one picture, the evolution of vertical bands as they are shown at the bottom of <a href="#sensors-23-06109-f004" class="html-fig">Figure 4</a>.</p> "> Figure 9
<p>Illustration of how the final tile spacing determines the bands of computations in partial stages. Both represent <math display="inline"><semantics><mrow><mi>N</mi><mo>=</mo><mn>16</mn></mrow></semantics></math> and <tt>tile_size</tt><math display="inline"><semantics><mrow><mo>=</mo><mn>8</mn></mrow></semantics></math>. They differ in the <tt>tile_spacing</tt>, which is 4 on the left and 2 on the right.</p> "> Figure 10
<p>Architecture of the detector based on the partial DRT, as described in [<a href="#B33-sensors-23-06109" class="html-bibr">33</a>] but taking advantage of all the calculated stages up to <math display="inline"><semantics><mrow><mi mathvariant="monospace">tile</mi><mo>_</mo><mi mathvariant="monospace">size</mi><mo>=</mo><mn>32</mn></mrow></semantics></math>.</p> "> Figure 11
<p>New encoder topology, based on the work by Oliva-García et al. [<a href="#B33-sensors-23-06109" class="html-bibr">33</a>] and modified as described in this section. Note that for simplicity, the partial DRT data structure was made smaller in scale for display purposes.</p> "> Figure 12
<p>From <b>left</b> to <b>right</b>: input image and activation maps at scale = 2, slicing for slopes 0 to 12, which correspond to angles in the range <math display="inline"><semantics><mrow><mo>[</mo><mo>−</mo><mi>π</mi><mo>/</mo><mn>4</mn><mo>,</mo><mn>3</mn><mi>π</mi><mo>/</mo><mn>4</mn><mo>]</mo></mrow></semantics></math>. A colormap is used to represent intensity, ranging from a darker blue to a lighter blue. The upper part of the image contains lines at an angle of <math display="inline"><semantics><mrow><mi>π</mi><mo>/</mo><mn>6</mn></mrow></semantics></math>, and the lower part of the image contains lines at an angle of <math display="inline"><semantics><mrow><mi>π</mi><mo>/</mo><mn>3</mn></mrow></semantics></math>. Note that the activation map correctly activates more in the corresponding slice of each half.</p> "> Figure 13
<p>From <b>left</b> to <b>right</b>: input image and activation maps for the central angle of each scale, from finer (first) to coarser (last) scales. A colormap is used to represent intensity, ranging from a darker blue to a lighter blue. It is clear that finer lines (lower part) produce more intensity at the second and third scales, while coarser lines (upper part) are better detected at the last scale.</p> "> Figure 14
<p>Example of unpooling operation. The pooling operation reduces dimensionality, and the unpooling operation increases it. Note that there are “gaps” where no indices refer to the array on the right.</p> "> Figure 15
<p>From <b>left</b> to <b>right</b>: cropped part of input image; coarse activation map for horizontal angles; fine activation map for horizontal angles; the result of performing the unpooling described in Algorithm 3. Note that there are “gaps” and that each activation function that got propagated used the coarse angle information, and the correct spot was calculated by finding the argument of the maxima within the corresponding area at a finer scale.</p> "> Figure 16
<p>Full encoder–decoder architecture, where the DRT-based encoder is matched with the newly designed decoder integrating the operations described in this section.</p> "> Figure 17
<p>Average execution times of the Halide implementation of the four algorithms. These measurements were performed on an i9 9900 desktop CPU and on the CPU of the Qualcomm Snapdragon 888 mobile SoC. The red dashed line emphasises the 1/30 s time limit.</p> "> Figure 18
<p><b>Left</b>: unprocessed output of the <tt>MDD DRT</tt> segmenter. <b>Right</b>: superimposed on the input image, connected regions whose aspect ratio between the major and minor oriented axes is consistent with a barcode and whose area exceeds a minimum threshold.</p> "> Figure 19
<p>The <tt>MDD DRT</tt> algorithm was executed for thresholds ranging from 10 to 38 on both the WWU Münster dataset and the Arte-lab Rotated dataset. The selected threshold value is marked with a dashed red line.</p> "> Figure 20
<p>The IoU of a barcode that was scaled to different sizes.</p> "> Figure 21
<p>The IoU of a barcode that was rotated to angles ranging from 0 to 180 degrees.</p> "> Figure 22
<p>Columns, from left to right: input image and output of algorithms <tt>PDRT 2</tt>, <tt>PDRT 32</tt>, <tt>PS DRT</tt>, and <tt>MDD DRT</tt>. Rows, challenging scenarios, from top to bottom: elongated, cluttered, mild lens blur, strong lens blur, motion blur, low contrast, and shear.</p> "> Figure 22 Cont.
<p>Columns, from left to right: input image and output of algorithms <tt>PDRT 2</tt>, <tt>PDRT 32</tt>, <tt>PS DRT</tt>, and <tt>MDD DRT</tt>. Rows, challenging scenarios, from top to bottom: elongated, cluttered, mild lens blur, strong lens blur, motion blur, low contrast, and shear.</p> "> Figure 23
<p>Frame of the video provided in the repository documentation. Available at <a href="https://raw.githubusercontent.com/DoMondo/an_encoder_decoder_architecture/master/readme_data/cover.gif" target="_blank">https://raw.githubusercontent.com/DoMondo/an_encoder_decoder_architecture/master/readme_data/cover.gif</a>, accessed on 25 June 2023.</p> ">
Abstract
1. Introduction
1.1. Barcode Detection Summary
1.2. Semantic Segmentation
1.3. Review of Selected Works
- Robust angle-invariant, 1D barcode detection [16]
- Real-time barcode detection in the wild [10]
- Low-computation egocentric barcode detector for the blind [11]
- Real-time barcode detection and classification using deep learning [18]
- Universal barcode detector using semantic segmentation [17]
- One-dimensional barcode detection: novel benchmark datasets and comprehensive comparison of deep convolutional neural network approaches [24]
1.4. Remaining Challenges
1.5. Purpose of This Work
1. To propose an adapted version of the above method that allows overlapping among local detection zones, so that tile sizes and the spacing between detection processes are decoupled.
2. To propose a second adapted version that works by merging the multiple scales of the original method, i.e., the different tile sizes, in an automatic and optimal way.
3. To compare the original method with the two cited innovations, and to compare these with existing methods for barcode segmentation.
2. Discrete Radon Transform as a Bar Detector
2.1. Discrete Radon Transform
1. A method to better expose the parallelism of the DRT [39], which obtains, in one pass, the eminently horizontal lines of the form y = sx + d, with slope |s| ≤ 1, and, in another pass, the eminently vertical ones of the form x = sy + d, thus changing from four quadrants to two groups of 90 degrees (around the horizontal axis and the vertical axis). Additionally, this reformulation eliminates the need for the prior zero padding that the conventional DRT requires.
2. A bar detector method, built upon the aforementioned modified DRT, that acts locally [33], giving an estimate of the presence of bars and their inclination for each block of size tile_size × tile_size into which an image can be subdivided without overlapping; this local size is denoted by tile_size (see the sketch below).
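As a rough illustration of the pair-combining recursion behind these two building blocks, the following Python sketch computes the eminently horizontal pass of a partial DRT up to a given tile_size. It is a didactic approximation under our own naming and border-handling assumptions, showing only the non-negative discrete slopes of one pass; the actual implementation differs, notably in how it organises bands and avoids padding.

```python
import numpy as np

def partial_drt_horizontal(img: np.ndarray, tile_size: int) -> np.ndarray:
    """Eminently horizontal pass of a partial DRT, stopped at segments
    spanning `tile_size` columns (a power of two).

    data[y, x, s] holds the sum along a discrete segment that starts at
    pixel (x, y), spans the current width in columns, and rises s rows in
    total. Each stage doubles the width by combining two half-segments,
    so only log2(tile_size) stages are needed.
    """
    n_rows, n_cols = img.shape
    data = img.astype(np.float64)[:, :, np.newaxis]  # width 1, slope 0
    width = 1
    while width < tile_size:
        width *= 2
        prev = data
        data = np.zeros((n_rows, n_cols - width + 1, width))
        for s in range(width):              # total rise over `width` columns
            for x in range(n_cols - width + 1):
                for y in range(n_rows):
                    # The right half-segment starts width/2 columns later
                    # and ceil(s/2) rows further down; both halves reuse
                    # the rise-(s // 2) solutions of the previous stage.
                    y_right = y + (s + 1) // 2
                    left = prev[y, x, s // 2]
                    right = (prev[y_right, x + width // 2, s // 2]
                             if y_right < n_rows else 0.0)
                    data[y, x, s] = left + right
    # Pure-Python loops for clarity; real implementations vectorise or
    # compile the recursion (e.g., with Halide), as this work does.
    return data
```

The full method mirrors this pass for the remaining 45-degree ranges and for the eminently vertical lines, which is what allows the two-group formulation described in item 1.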
2.2. From a Line Detector to a Local Bar Detector
2.2.1. Disadvantages of Working at a Single Scale
1. At smaller scales, the angle cannot be determined with high precision. However, at the smallest scale, the method still discriminates between vertical bars (colours in the range of the blues) and horizontal bars (colours in the range of the oranges).
2. If the scale is so small that a tile observes only a uniform area (e.g., the interiors of bars and spaces at scales of two and four), no activation is generated. Although correct, this represents a problem when grouping all the bars and spaces of varying widths belonging to the same barcode.
3. On the other hand, at larger scales, there may be tiles that simultaneously observe part of the barcode and the background. This poses a problem in accurately determining the boundaries of the code.
4. In the previous case, i.e., tiles containing both bars and background, but also for tiles covering a zone with only a few thick bars compared with a tile covering a zone with thin bars, the activation intensity decreases. This explains the variability in detection intensity while the hue, where the angle is encoded, remains uniform in areas with barcodes at the greatest scale.
5. At large scales, a drop in detection intensity can also occur when there are differently angled structures within a single tile, as in zones with alphanumeric characters. In these cases, unlike the previous two, smaller-scale sub-tiles may disagree in angle, and there may even be greater detection intensity in the parts than in the whole.
2.3. Initial Ideas for the New Methods
3. Overlapping Tiles for Increased Spatial Resolution
3.1. Computational Complexity without Overlapping
Algorithm 1 Computing the partial DRT of an image with tile overlapping.
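Conceptually, decoupling tile_size from tile_spacing amounts to evaluating the local detector on a denser grid of overlapping tiles. The Python sketch below shows this naive equivalent under our own naming (tile_detector is a hypothetical placeholder for any per-tile measure); Algorithm 1 achieves the same output far more cheaply by restructuring the non-overlapping bands of the partial DRT itself, so that intermediate sums are shared between overlapping tiles, as Figure 9 illustrates.

```python
import numpy as np

def detect_with_overlap(img, tile_size, tile_spacing, tile_detector):
    """Run a local bar detector over a grid of (possibly) overlapping tiles.

    With tile_spacing == tile_size this reproduces the non-overlapping
    scheme of [33]; with tile_spacing < tile_size the output grid becomes
    denser and tiles overlap. `tile_detector` stands in for any per-tile
    measure (e.g., dominant slope and its detection intensity).
    """
    n_rows, n_cols = img.shape
    grid = [
        [tile_detector(img[y:y + tile_size, x:x + tile_size])
         for x in range(0, n_cols - tile_size + 1, tile_spacing)]
        for y in range(0, n_rows - tile_size + 1, tile_spacing)
    ]
    return np.array(grid)
```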
3.2. Computational Complexity with Overlapping
4. Multiscale Domain-Based Segmentation
4.1. Modifying DRT Output for Multiscale Analysis
4.2. Finding Similarities with Machine Learning-Based Algorithms
4.3. Adapting the Existing Algorithm to Resemble an Encoder
Algorithm 2 Creating the activation maps for the CNN-like encoder from partial DRTs.
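To give a flavour of the reshaping involved, the following sketch assembles an activation volume (rows × columns × angles) from the two partial-DRT passes at one scale, in the spirit of the slope-sliced maps of Figure 12. The layout, the pass-flipping, and the normalisation shown here are our own illustrative assumptions, not the exact steps of Algorithm 2.

```python
import numpy as np

def activation_maps(drt_h: np.ndarray, drt_v: np.ndarray) -> np.ndarray:
    """Build a CNN-like activation volume from the two partial-DRT passes.

    `drt_h` and `drt_v` are assumed to hold, per output tile, the response
    for each discrete slope of the eminently horizontal and eminently
    vertical passes (shape: rows x cols x slopes). Concatenating the two
    90-degree groups along the slope axis yields one channel per discrete
    angle covering [-pi/4, 3*pi/4), so each depth slice of the result acts
    as an activation map for one angle (cf. Figure 12).
    """
    # Flip one pass so the angle axis increases monotonically when the
    # two 90-degree groups are concatenated.
    maps = np.concatenate([drt_h, drt_v[:, :, ::-1]], axis=2)
    # Remove the per-tile mean so uniform regions produce no activation.
    maps = maps - maps.mean(axis=2, keepdims=True)
    return np.maximum(maps, 0.0)  # ReLU-like: keep positive evidence only
```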
4.4. Testing the Encoder
4.5. Designing a Decoder
An Upsampling Operation for a DRT-Based Encoder
Algorithm 3 Unpooling operation for a DRT-based encoder.
Input: fine activations; coarse activations.
Output: new activations, at the size of the fine scale.
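A minimal sketch of this unpooling, reconstructed from the behaviour described in Figures 14 and 15: each coarse activation is propagated to the finer scale at the position of the maximum of the corresponding 2 × 2 fine-scale area, leaving “gaps” (zeros) elsewhere. The shapes and the simplified handling of the angle (depth) axis are our assumptions.

```python
import numpy as np

def drt_unpool(fine: np.ndarray, coarse: np.ndarray) -> np.ndarray:
    """Unpooling for the DRT-based decoder.

    fine:   (2H, 2W, A) activations at the finer scale
    coarse: (H, W, A) activations at the coarser scale
    returns (2H, 2W, A) activations carrying the coarse values
    """
    H, W, A = coarse.shape
    out = np.zeros_like(fine)
    for i in range(H):
        for j in range(W):
            for a in range(A):
                block = fine[2 * i:2 * i + 2, 2 * j:2 * j + 2, a]
                # The argmax within the corresponding fine-scale area
                # decides where the coarse activation lands (cf. Figure 15).
                di, dj = np.unravel_index(np.argmax(block), block.shape)
                out[2 * i + di, 2 * j + dj, a] = coarse[i, j, a]
    return out
```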
1. Propagating features that are prominent at a coarser scale into a finer scale. This is achieved with the unpooling in Algorithm 3.
2. Propagating features to the neighbouring vicinity. In this problem, vicinity is defined in the x- and y-dimensions but also in the angle (depth) dimension.
3. Allowing features that are prominent at any scale to be considered. As Figure 13 suggests, there can be bars that are only detected at finer scales, so coarser detection should not overthrow these if they are more prominent (see the sketch after this list).
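Requirement 3 can be met by fusing the scales with an element-wise maximum once the unpooling steps have brought every scale to a common resolution. A minimal sketch of that fusion; the actual decoder interleaves it with the unpooling and convolution-like smoothing stages:

```python
import numpy as np

def fuse_scales(activations: list[np.ndarray]) -> np.ndarray:
    """Keep the most prominent evidence across scales.

    `activations` holds one (rows x cols x angles) volume per scale, all
    already upsampled to the same shape. A finer-scale detection survives
    whenever it is stronger than the coarser one at the same position.
    """
    fused = activations[0]
    for act in activations[1:]:
        fused = np.maximum(fused, act)
    return fused
```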
5. Implementation
5.1. Pruning and Optimisation
- PDRT 2: the original algorithm of the partial discrete Radon transform as described in [33], executed with tile_size = 2. This means that a single step of the DRT was calculated.
- PDRT 32: the original algorithm of the partial discrete Radon transform as described in [33], executed with tile_size = 32. This means that five steps of the DRT were calculated.
- PS DRT: the partially strided DRT-based detector as described in Section 3, executed with the tile_size and tile_spacing values chosen there.
- MDD DRT: the multiscale domain detector based on the DRT, as described in Section 4.
5.2. Choosing a Fast Approximation of a Gaussian Filter
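A standard way to approximate a Gaussian cheaply, in line with the cascaded uniform filters of Wells [52] and the real-time practice of [53,54], is to apply a small box filter several times: by the central limit theorem, the cascade converges quickly to a Gaussian. A minimal one-dimensional sketch; the radius and the number of passes are illustrative, not the values used in our implementation:

```python
import numpy as np

def box_blur_1d(signal: np.ndarray, radius: int) -> np.ndarray:
    """Uniform (box) filter via a running sum: O(n) regardless of radius."""
    window = 2 * radius + 1
    padded = np.pad(signal, radius, mode="edge")
    csum = np.cumsum(np.insert(padded, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window

def approx_gaussian_1d(signal: np.ndarray, radius: int, passes: int = 3) -> np.ndarray:
    """Approximate Gaussian smoothing by cascading `passes` box filters.

    Three passes are usually a good speed/quality trade-off; a 2D blur is
    obtained by filtering rows and columns separately (separability).
    """
    for _ in range(passes):
        signal = box_blur_1d(signal, radius)
    return signal
```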
6. Results and Conclusions
6.1. Time Results
6.2. Runtime Comparison
6.3. Accuracy Comparison
Accuracy Metric and Benchmarking Datasets
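The accuracy metric used throughout is the intersection over union (IoU, or Jaccard index) between the predicted and ground-truth masks. For reference, a minimal implementation; treating two empty masks as a perfect match is a common convention, not a detail taken from the benchmarks:

```python
import numpy as np

def intersection_over_union(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between two boolean segmentation masks: |A intersect B| / |A union B|."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection) / float(union) if union > 0 else 1.0
```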
6.4. Synthetic Parametric Tests
6.5. Qualitative Analysis
- The first row shows a barcode that could be considered unproblematic. Although it contains some contrast variation due to paper bending and quite separated bars, its main difficulty, and only for some neural network-based methods, is that its aspect ratio is approximately 1:8. This was not a problem for the PS DRT algorithm, nor for MDD DRT. The fixed-scale algorithms, on the other hand, had problems in properly contouring the code boundaries. This was consistently observed in the rest of the examples; therefore, these algorithms, whose outputs are of interest mainly because they are the base and apex of the data pyramid on which the PS DRT and MDD DRT methods operate, are not discussed further.
- The second row shows an image containing a multitude of codes. Some are out of focus; some are rotated; and several dividing lines and alphabetic characters make it difficult to separate them correctly. In addition, a couple of codes are incomplete at the lower end. The MDD DRT method contoured the codes better than the PS DRT method. It was also more sensitive to the angle of the lines and tended to make them more uniform, although this does not translate into definite advantages in a real-world application. As seen in the previous subsection, the focus from now on is exclusively on the MDD DRT method, because the PS DRT method simply does not fit the time constraints. The segmentation was accurate, and even the small, trimmed code on the border was detected. The main disadvantage of the MDD DRT method is that it joins characters close to a barcode when they are made up of strokes with the same slope, as is the case for characters such as 0, 1, 7, O, I, etc. Fortunately, these “overflows” can be counteracted in the post-processing phase, which is not covered in this article (a minimal sketch of the region-filtering criterion is given after this list). Bars or outlines close and parallel to the ends of the codes are also problematic, since the measures taken to preserve the internal low-texture areas belonging to the codes force their undesired inclusion. This image is the same as the one used in Figure 18 to illustrate how to get rid of false positives with a simple criterion.
- The third and fourth rows illustrate barcodes affected by strong and extreme lens blur, respectively. The proposed methods still managed to adequately contour the input of the third row, but not that of the fourth row. The final scale of the algorithm, with analysis zones of 32 × 32 pixels, allowed bars to be detected even when, at a smaller scale, the blur merged them with other bars. Depending also on the loss of contrast, at some point the detector stops triggering. In an example such as this, the detector shows its performance limits; but regardless of whether it can be further improved as a future line of work, it is already a major advance over previous methods, which simply do not work at all when facing heavily defocused barcodes.
- The next row presents a case of severe motion blur in the direction that affects the code bars the most. Both types of blur, lens blur and motion blur, were treated in the same way, and in cases such as this, the detector behaved as expected.
- The image in the sixth row combines very low contrast with glare and smudges on the codes. As can be seen, the detection is satisfactory, but in this extreme case, the combined effect of a wide bar, low contrast, and glare oriented against the bars broke the code into two disconnected regions. This is another limitation of our method. It does not occur with PS DRT, since it is the voting scheme of MDD DRT, which penalises regions with sharp slope changes nearby (typical of characters), that works against correct detection here. Fortunately, on the rare occasions when this happens, it can be solved in post-processing, since angle labels per region are available.
- The last example is a case of extreme perspective that makes the observed slope of the bars of the same code vary. The MDD DRT method solved it very well: the bars change smoothly enough to be grouped into a single region. The problem of joining nearby characters is once again evident. The lack of precision of the boundaries in the upper-right corner of the code is due to another factor. Unlike the other sample images in this figure, which were taken by directly accessing the camera, i.e., without compression and with edge enhancements disabled, this one is a JPEG image containing compression artefacts. Those artefacts, at a low scale, make the MDD DRT method choose not to trigger. This is not exactly a disadvantage, but it has to be noted that the method works best on raw camera frames rather than compressed images or video.
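As mentioned above, many false positives can be removed in post-processing with the simple criterion of Figure 18: keep only connected regions whose oriented aspect ratio and area are plausible for a barcode. A minimal sketch follows; the thresholds, the use of SciPy's connected-component labelling, and the PCA estimate of the oriented axes are our own illustrative choices:

```python
import numpy as np
from scipy import ndimage

def filter_barcode_regions(mask: np.ndarray, min_area: int = 500,
                           min_aspect: float = 2.0) -> np.ndarray:
    """Keep connected regions of a boolean mask that look like barcodes."""
    labels, n = ndimage.label(mask)
    keep = np.zeros_like(mask, dtype=bool)
    for region in range(1, n + 1):
        ys, xs = np.nonzero(labels == region)
        if ys.size < min_area:
            continue  # too small to be a barcode
        # Oriented major/minor axes via PCA of the pixel coordinates.
        coords = np.stack([ys, xs], axis=1).astype(np.float64)
        coords -= coords.mean(axis=0)
        eigvals = np.linalg.eigvalsh(np.cov(coords.T))
        minor, major = np.sqrt(np.maximum(eigvals, 1e-12))
        if major / minor >= min_aspect:
            keep |= labels == region
    return keep
```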
6.6. Disadvantages
- Certain alphanumeric characters above or below barcodes, and lines and outlines to the left and right, tend to merge with the barcodes, producing false positives.
- When several adverse circumstances concur, a single instance of a barcode can be split into two connected regions.
6.7. Conclusions
6.8. Future Lines of Research
- First, the scheduling of the algorithm in Halide can be improved: the reported measurements used an auto-scheduler, which produces a schedule that manual tuning can improve upon. In addition, the GPU target of the SD 888 can also be used.
- Second, more pruning can be applied to MDD DRT by modifying two constants: the input size and the halting stage. For example, starting from a smaller input size instead of the full 1024 × 1024 and stopping at tile_size = 16 would reduce the execution time to 40% of the original time. How much pruning can be applied without losing too much quality is an assessment left for follow-up research.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AI: Artificial intelligence
AR: Augmented reality
CNN: Convolutional neural network
CPU: Central processing unit
DRT: Discrete Radon transform
FCN: Fully convolutional network
GPU: Graphics processing unit
SoC: System on a chip
SD: Snapdragon
IoU: Intersection over union
References
- GS 22423-2023; GS1 General Specifications. GS1: London, UK, 2022; pp. 5.1–5.5. Available online: https://www.gs1.org/docs/barcodes/GS1_General_Specifications.pdf (accessed on 5 June 2023).
- Azuma, R.; Baillot, Y.; Behringer, R.; Feiner, S.; Julier, S.; MacIntyre, B. Recent advances in augmented reality. IEEE Comput. Graph. Appl. 2001, 21, 34–47.
- Muniz, R.; Junco, L.; Otero, A. A robust software barcode reader using the Hough transform. In Proceedings of the 1999 International Conference on Information Intelligence and Systems, Bethesda, MD, USA, 31 October–3 November 1999; pp. 313–319.
- Wachenfeld, S.; Terlunen, S.; Jiang, X. Robust 1-D barcode recognition on camera phones and mobile product information display. In Mobile Multimedia Processing: Fundamentals, Methods, and Applications; Springer: Berlin, Germany, 2010; pp. 53–69.
- Gallo, O.; Manduchi, R. Reading 1D Barcodes with Mobile Phones Using Deformable Templates. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1834–1843.
- Lin, D.T.; Lin, M.C.; Huang, K.Y. Real-time automatic recognition of omnidirectional multiple barcodes and DSP implementation. Mach. Vis. Appl. 2011, 22, 409–419.
- Katona, M.; Nyúl, L.G. A Novel Method for Accurate and Efficient Barcode Detection with Morphological Operations. In Proceedings of the 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, Naples, Italy, 25–29 November 2012; pp. 307–314.
- Bodnár, P.; Nyúl, L.G. Improving Barcode Detection with Combination of Simple Detectors. In Proceedings of the 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, Naples, Italy, 25–29 November 2012; pp. 300–306.
- Sörös, G.; Flörkemeier, C. Blur-resistant joint 1D and 2D barcode localization for smartphones. In Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, Luleå, Sweden, 2–5 December 2013; pp. 1–8.
- Creusot, C.; Munawar, A. Real-Time Barcode Detection in the Wild. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 5–9 January 2015; pp. 239–245.
- Creusot, C.; Munawar, A. Low-computation egocentric barcode detector for the blind. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 2856–2860.
- Namane, A.; Arezki, M. Fast real time 1D barcode detection from webcam images using the bars detection method. In Proceedings of the World Congress on Engineering (WCE), London, UK, 5–7 July 2017; Volume 1.
- Chen, C.; He, B.; Zhang, L.; Yan, P.Q. Autonomous Recognition System for Barcode Detection in Complex Scenes. In Proceedings of the 4th Annual International Conference on Information Technology and Applications (ITA 2017), Guangzhou, China, 26–28 May 2017; p. 04016.
- Fernandez, W.P.; Xian, Y.; Tian, Y. Image-Based Barcode Detection and Recognition to Assist Visually Impaired Persons. In Proceedings of the 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Honolulu, HI, USA, 31 July–4 August 2017; pp. 1241–1245.
- Xiao, Y.; Ming, Z. 1D Barcode Detection via Integrated Deep-Learning and Geometric Approach. Appl. Sci. 2019, 9, 3268.
- Zamberletti, A.; Gallo, I.; Albertini, S. Robust Angle Invariant 1D Barcode Detection. In Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, Okinawa, Japan, 5–8 November 2013; pp. 160–164.
- Zharkov, A.; Zagaynov, I. Universal Barcode Detector via Semantic Segmentation. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 837–843.
- Hansen, D.K.; Nasrollahi, K.; Rasmusen, C.B.; Moeslund, T.B. Real-Time Barcode Detection and Classification using Deep Learning. In Proceedings of the 9th International Joint Conference on Computational Intelligence (IJCCI), INSTICC, Madeira, Portugal, 1–3 November 2017; pp. 321–327.
- Zhang, L.; Sui, Y.; Zhu, F.; Zhu, M.; He, B.; Deng, Z. Fast Barcode Detection Method Based on ThinYOLOv4. In Proceedings of the International Conference on Cognitive Systems and Signal Processing, Zhuhai, China, 25–27 December 2020.
- Wudhikarn, R.; Charoenkwan, P.; Malang, K. Deep Learning in Barcode Recognition: A Systematic Literature Review. IEEE Access 2022, 10, 8049–8072.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 1–74.
- Hao, S.; Zhou, Y.; Guo, Y. A Brief Survey on Semantic Segmentation with Deep Learning. Neurocomputing 2020, 406, 302–321.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder–Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Kamnardsiri, T.; Charoenkwan, P.; Malang, K.C.; Wudhikarn, R. 1D Barcode Detection: Novel Benchmark Datasets and Comprehensive Comparison of Deep Convolutional Neural Network Approaches. Sensors 2022, 22, 8788.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS ’15), Cambridge, MA, USA, 7–12 December 2015; pp. 91–99.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 14–19 June 2020; pp. 10778–10787.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; NanoCode012; Kwon, Y.; TaoXie; Fang, J.; imyhxy; Michael, K.; et al. ultralytics/yolov5: v6.1—TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference. Available online: https://zenodo.org/record/6222936 (accessed on 25 June 2023).
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
- Zhao, Q.; Ni, F.; Song, Y.; Wang, Y.; Tang, Z. Deep dual pyramid network for barcode segmentation using Barcode-30k database. arXiv 2018, arXiv:1807.11886.
- Zharkov, A.; Vavilin, A.; Zagaynov, I. New Benchmarks for Barcode Detection Using Both Synthetic and Real Data. In Proceedings of the International Workshop on Document Analysis Systems, Wuhan, China, 26–29 July 2020.
- Kaggle. Kaggle Competitions. Available online: https://www.kaggle.com/docs/competitions (accessed on 26 April 2023).
- Oliva-García, R.; Gómez-Cárdenes, Ó.; Marichal-Hernández, J.G.; Martín-Hernández, J.; Phillip-Lüke, J.; Rodríguez-Ramos, J.M. A local real-time bar detector based on the multiscale Radon transform. In Proceedings of Real-Time Image Processing and Deep Learning, Orlando, FL, USA, 6–12 June 2022; Kehtarnavaz, N., Carlsohn, M.F., Eds.; SPIE: Bellingham, WA, USA, 2022; Volume 12102, p. 121020A.
- Lindeberg, T. Scale-Space Theory in Computer Vision; Springer Science & Business Media: Berlin, Germany, 2013; Volume 256.
- Radon, J. Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Akad. Wiss. 1917, 69, 262–277.
- Götz, W.; Druckmüller, H. A fast digital Radon transform—An efficient means for evaluating the Hough transform. Pattern Recognit. 1996, 29, 711–718.
- Brady, M.L. A fast discrete approximation algorithm for the Radon transform. SIAM J. Comput. 1998, 27, 107–119.
- Brandt, A.; Dym, J. Fast calculation of multiple line integrals. SIAM J. Sci. Comput. 1999, 20, 1417–1429.
- Gómez-Cárdenes, O.; Oliva-García, R.; Rodríguez-Abreu, G.A.; Marichal-Hernández, J.G. Exposing Parallelism of Discrete Radon Transform. In Proceedings of the 3rd International Conference on Telecommunications and Communication Engineering (ICTCE ’19), New York, NY, USA, 19 February 2019; pp. 136–140.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2014, arXiv:1411.4038.
- Van Rossum, G. Python Tutorial; Technical Report CS-R9526; CWI: Amsterdam, The Netherlands, 1995. Available online: https://ir.cwi.nl/pub/5007/05007D.pdf (accessed on 25 June 2023).
- Lam, S.K.; Pitrou, A.; Seibert, S. Numba: A LLVM-Based Python JIT Compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC (LLVM ’15), New York, NY, USA, 15 November 2015.
- ISO. ISO/IEC 14882:2017 Information Technology—Programming Languages—C++, 5th ed.; ISO: Geneva, Switzerland, 2017; p. 1605.
- Ragan-Kelley, J.; Barnes, C.; Adams, A.; Paris, S.; Durand, F.; Amarasinghe, S. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Not. 2013, 48, 519–530.
- Adams, A.; Ma, K.; Anderson, L.; Baghdadi, R.; Li, T.M.; Gharbi, M.; Steiner, B.; Johnson, S.; Fatahalian, K.; Durand, F.; et al. Learning to optimize Halide with tree search and random programs. ACM Trans. Graph. 2019, 38, 1–12.
- Mullapudi, R.T.; Adams, A.; Sharlet, D.; Ragan-Kelley, J.; Fatahalian, K. Automatically scheduling Halide image processing pipelines. ACM Trans. Graph. 2016, 35, 1–11.
- Li, T.M.; Gharbi, M.; Adams, A.; Durand, F.; Ragan-Kelley, J. Differentiable programming for image processing and deep learning in Halide. ACM Trans. Graph. 2018, 37, 1–13.
- Source Code. Available online: https://github.com/DoMondo/an_encoder_decoder_architecture (accessed on 28 April 2023).
- Wells, W.M. Efficient Synthesis of Gaussian Filters by Cascaded Uniform Filters. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 234–239.
- Kawase, M. Frame buffer postprocessing effects in Double-S.T.E.A.L (Wreckless). In Proceedings of the Game Developers Conference 2003, San Jose, CA, USA, 4–8 March 2003.
- Martin, S.; Garrard, A.; Gruber, A.; Bjørge, M.; Zioma, R.; Benge, S.; Nummelin, N. Moving mobile graphics. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH ’15), Los Angeles, CA, USA, 9–13 August 2015.
- Nayar, S.; Nakagawa, Y. Shape from focus. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 824–831.
- Gómez-Cárdenes, Ó.; Marichal-Hernández, J.G.; Trujillo-Sevilla, J.M.; Carmona-Ballester, D.; Rodríguez-Ramos, J.M. Focus measurement in 3D focal stack using direct and inverse discrete Radon transform. In Proceedings of Three-Dimensional Imaging, Visualization, and Display, Anaheim, CA, USA, 10–11 April 2017; Javidi, B., Son, J.Y., Matoba, O., Eds.; SPIE: Bellingham, WA, USA, 2017; Volume 10219, p. 102190R.
| tile_size | Partial DRT | Bar Detector |
|---|---|---|
| 1 | - | - |
| 2 |  |  |
| 4 |  |  |
| 8 |  |  |
| 16 |  |  |
| 32 |  |  |
| Block | Operation | Input Size | Output Size |
|---|---|---|---|
| Encoder | mdd_drt_v |  |  |
| Encoder | mdd_drt_h |  |  |
| Encoder | mdd_bar_detector_0 |  |  |
| Encoder | mdd_bar_detector_1 |  |  |
| Encoder | mdd_bar_detector_2 |  |  |
| Encoder | mdd_bar_detector_3 |  |  |
| Encoder | mdd_bar_detector_4 |  |  |
| Decoder | unpool_3 |  |  |
| Decoder | convolutions_3 |  |  |
| Decoder | unpool_2 |  |  |
| Decoder | convolutions_2 |  |  |
| Decoder | unpool_1 |  |  |
| Decoder | convolutions_1 |  |  |
| Decoder | unpool_0 |  |  |
| Decoder | convolutions_0 |  |  |
| Decoder | argmatxth |  |  |
| Method | Resolution | Time (ms) |
|---|---|---|
| YOLO v5 [28] | 1024 × 1024 | 70 |
|  | 512 × 512 | 61 |
|  | 640 × 480 | 45 |
| Zamberletti et al. (2013) [16] | 640 × 480 | 130 |
| Creusot and Munawar (2015) [10] | 640 × 480 | 115 |
| Creusot and Munawar (2016) [11] | 640 × 480 | 42 |
|  | 1080 × 960 | 116 |
| Zharkov and Zagaynov (2019) [17] | 512 × 512 | 44 |
| MDD DRT | 1024 × 1024 | 21 |
| PS DRT | 1024 × 1024 | 107 |
| Method | Arte-Lab Rotated (Detection) | Arte-Lab Rotated (IoU) | WWU Münster (Detection) | WWU Münster (IoU) |
|---|---|---|---|---|
| EfficientDet [26] | 1.000 | 0.855 | 0.999 | 0.782 |
| Faster R-CNN [25] | 1.000 | 0.859 | 1.000 | 0.792 |
| RetinaNet [27] | 1.000 | 0.876 | 1.000 | 0.809 |
| YOLO v5 [28] | 0.996 | 0.935 | 0.998 | 0.896 |
| YOLOX [29] | 0.970 | 0.848 | 1.000 | 0.813 |
| Zamberletti et al. (2013) [16] | 0.805 | - | 0.829 | - |
| Creusot and Munawar (2015) [10] | 0.893 | - | 0.963 | - |
| Creusot and Munawar (2016) [11] | 0.989 | - | 0.982 | - |
| Hansen et al. (2017) [18] | 0.926 | - | 0.991 | - |
| Zharkov and Zagaynov (2019) [17] | 0.989 | - | 0.980 | - |
| PS DRT | 0.886 | 0.700 | 0.944 | 0.732 |
| MDD DRT | 0.901 | 0.783 | 0.958 | 0.827 |