
Next Issue: Volume 9, August
Previous Issue: Volume 9, June

J. Imaging, Volume 9, Issue 7 (July 2023) – 24 articles

Cover Story (view full-size image): In this paper, we propose strategies to improve the stability of deep-learning-based image deblurring without losing too much accuracy. First, we suggest a very small neural architecture, which reduces training time, in line with green AI needs, and does not strongly amplify noise in the computed image. Second, we introduce a unified framework in which a pre-processing step balances the lack of stability of the subsequent neural-network-based step. Two different pre-processors are presented: the former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
18 pages, 4509 KiB  
Article
Automatic Localization of Five Relevant Dermoscopic Structures Based on YOLOv8 for Diagnosis Improvement
by Esther Chabi Adjobo, Amadou Tidjani Sanda Mahama, Pierre Gouton and Joël Tossa
J. Imaging 2023, 9(7), 148; https://doi.org/10.3390/jimaging9070148 - 21 Jul 2023
Cited by 10 | Viewed by 2598
Abstract
The automatic detection of dermoscopic features is a task that provides specialists with an image with indications about the different patterns present in it. This information can help them fully understand the image and improve their decisions. However, the automatic analysis of dermoscopic features can be a difficult task because of their small size. Some work has been performed in this area, but the results can be improved. The objective of this work is to improve the precision of the automatic detection of dermoscopic features. To achieve this goal, an algorithm named yolo-dermoscopic-features is proposed. The algorithm consists of four steps: (i) generate annotations in the JSON format for supervised learning of the model; (ii) propose a model based on the latest version of Yolo; (iii) pre-train the model for the segmentation of skin lesions; (iv) train five models for the five dermoscopic features. The experiments are performed on the ISIC 2018 task2 dataset. After training, the model is evaluated and compared to the performance of two methods. The proposed method reaches average performances of 0.9758, 0.954, 0.9724, 0.938, and 0.9692, respectively, for the Dice similarity coefficient, Jaccard similarity coefficient, precision, recall, and average precision. Furthermore, compared to other methods, the proposed method reaches a better Jaccard similarity coefficient of 0.954 and, thus, presents the best similarity with the annotations made by specialists. This method can also be used to automatically annotate images and, therefore, can be a solution to the lack of feature annotations in the dataset. Full article
(This article belongs to the Special Issue Imaging Informatics: Computer-Aided Diagnosis)
Figures:
Figure 1: (a) Atypical pigment network on a dermoscopic image of nevus, (b) irregular dots on a dermoscopic image of nevus, (c) irregular globules on a dermoscopic image of melanoma, (d) streaks on a dermoscopic image of melanoma, (e) area without structure on a dermoscopic image of nevus [3,4,5].
Figure 2: Classification of object detection methods.
Figure 3: Features structures and dataset images [3,4,5]. (a) Dermoscopic image of melanocytic nevi, (b) dermoscopic image of melanoma, (c) dermoscopic image of nevus, (d) dermoscopic image of melanoma, (e) dermoscopic image of nevus.
Figure 4: Pipeline of the YDL approach. The green modules represent parts on which modifications were made.
Figure 5: Hierarchical structure of the generated JSON file.
Figure 6: Timeline of different versions of Yolo.
Figure 7: Yolo architecture.
Figure 8: Loss charts of the models during training and validation. (a)–(e), respectively, represent charts of the globule localization model, the milia-like cyst (MLC) localization model, the pigment network (PN) localization model, the negative network (NN) localization model, and the streak localization model. The epoch number is on the x-axis, and loss is on the y-axis.
Figure 9: Performance measures of the models on the training step. (a)–(e), respectively, represent the performance metrics of the globule localization model, the milia-like cyst (MLC) localization model, the pigment network (PN) localization model, the negative network (NN) localization model, and the streak localization model. The epoch number is on the x-axis, and the metric value (precision, recall, or mAP) is on the y-axis.
Figure 10: (a) Dermoscopic images. (b) Experts' annotation masks of globules. (c) Predictions of globule positions.
Figure 11: (a) Dermoscopic images. (b) Experts' annotation masks of milia-like cysts (MLC). (c) Predictions of milia-like cyst (MLC) positions.
Figure 12: (a) Dermoscopic images. (b) Experts' annotation masks of the pigment network (PN). (c) Predictions of pigment network (PN) positions.
Figure 13: (a) Dermoscopic images. (b) Experts' annotation masks of the negative network (NN). (c) Predictions of negative network (NN) positions.
Figure 14: (a) Dermoscopic images. (b) Experts' annotation masks of streaks. (c) Predictions of streak positions.
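As an illustrative aside, the per-feature training strategy described in this abstract can be sketched with the Ultralytics YOLOv8 API; this is not the authors' code, and the dataset YAML names and hyperparameters below are placeholders.

```python
# Minimal sketch (not the authors' code): fine-tune one YOLOv8 segmentation model per
# dermoscopic feature, echoing the "five models for the five features" strategy above.
from ultralytics import YOLO

FEATURES = ["globules", "milia_like_cyst", "pigment_network", "negative_network", "streaks"]

for feature in FEATURES:
    model = YOLO("yolov8n-seg.pt")  # pretrained segmentation weights as the starting point
    model.train(
        data=f"isic2018_task2_{feature}.yaml",  # hypothetical per-feature dataset config
        epochs=100,
        imgsz=640,
    )
    metrics = model.val()  # built-in precision/recall/mAP; Dice and Jaccard would be computed separately
    print(feature, metrics.results_dict)
```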
4 pages, 523 KiB  
Editorial
Deep Learning and Vision Transformer for Medical Image Analysis
by Yudong Zhang, Jiaji Wang, Juan Manuel Gorriz and Shuihua Wang
J. Imaging 2023, 9(7), 147; https://doi.org/10.3390/jimaging9070147 - 21 Jul 2023
Cited by 9 | Viewed by 3899
Abstract
Artificial intelligence (AI) refers to the field of computer science theory and technology [...] Full article
(This article belongs to the Section Medical Imaging)
Figures:
Figure 1: Relationship between AI, ML, DL, and Transformers.
Figure 2: Eight common procedures in medical image analysis.
27 pages, 6394 KiB  
Article
Algebraic Multi-Layer Network: Key Concepts
by Igor Khanykov, Vadim Nenashev and Mikhail Kharinov
J. Imaging 2023, 9(7), 146; https://doi.org/10.3390/jimaging9070146 - 18 Jul 2023
Cited by 3 | Viewed by 1317
Abstract
The paper refers to interdisciplinary research in the areas of hierarchical cluster analysis of big data and ordering of primary data to detect objects in a color or grayscale image. To perform this on a limited domain of multidimensional data, an NP-hard problem of calculating close-to-optimal piecewise constant data approximations with the smallest possible standard deviations or total squared errors (approximation errors) is solved. The solution is achieved by revisiting, modernizing, and combining classical Ward's clustering, split/merge, and K-means methods. The concepts of objects, images, and their elements (superpixels) are formalized as structures that are distinguishable from each other. The results of structuring and ordering the image data are presented to the user in two ways, as tabulated approximations of the image showing the available object hierarchies. Not only for theoretical reasoning but also for practical implementation, reversible calculations with pixel sets are performed as easily as with individual pixels, in terms of Sleator–Tarjan Dynamic trees and cyclic graphs forming an Algebraic Multi-Layer Network (AMN). The detailing of the latter significantly distinguishes this paper from our prior works. The establishment of the invariance of detected objects with respect to changing the context of the image and its transformation into grayscale is also new. Full article
(This article belongs to the Special Issue Image Segmentation Techniques: Current Status and Future Directions)
Figures:
Figure 1: The existing multi-valued solution of the problem of image hierarchical approximation, achievable by Ward's pixel clustering. The lower gray convex curve describes the E_g sequence of optimal image approximations. The upper non-convex black curve describes errors E_g of image approximations by superpixels constituting some irregular hierarchical sequence. The remaining red convex curves describe E_g sequences of hierarchical image approximations, each containing at least one optimal approximation in a corresponding incrementing number of colors.
Figure 2: Scheme of the Algebraic Multi-Layer Network.
Figure 3: Tree merge (left) and cyclic graph merge (right).
Figure 4: Reversible coding of the partition hierarchy by means of the network core. On the left, the segmentation sequence is shown for an image of four pixels encoded by a pair of graphs presented on the right. In the upper right corner, the appropriate Sleator–Tarjan Dynamic tree is shown. In the lower right corner, the bold red arrow indicates the pointer to an edge of a tree which was the first to be set. The dashed arrow is an external pointer to this arrow. The next black arrow shows a pointer to the edge which was set second, etc.
Figure 5: Encoding the hierarchy of pixel clusters with the Sleator–Tarjan Dynamic trees (black lines) and cyclic graphs (red lines) through the example of an image containing 25 pixels.
Figure 6: Reversible merge operation through the example of an image containing 25 pixels. The convergent edges of Sleator–Tarjan Dynamic trees (black lines) are interconnected by the edges of the cyclic graphs (dashed lines) in the order in which they were established. The pointers to the edges of the trees, established first, are indicated by dotted lines.
Figure 7: Three sample images for experiments. Left: "Girl" image (top) of 321 × 481 pixels and the corresponding grayscale representation of this image (bottom). Right: composite "5images" picture, consisting of five image components merged into a single entity of 814 × 978 pixels.
Figure 8: Comparison of optimal approximations for color and grayscale image representations. In the upper row, the optimal color image approximations in g = 2, 3, 4, 5, and 6 colors are presented from left to right, and, below them, similar approximations of the grayscale image are shown. Image approximations at the top are labeled with g cluster numbers and standard deviation values σ.
Figure 9: Comparison of optimal approximations for color and grayscale image representations. Minimal standard deviations for cluster numbers ranging from 1 to 100. The top red graph describes the color image and the bottom black graph describes the grayscale representation.
Figure 10: Optimal approximations of the "5images" picture in two to seven colors arranged in two rows in lexicographic order. Image approximations at the top are labeled with g color numbers and standard deviation values σ.
Figure 11: Dynamic Table of approximations for the "Girl" color image (321 × 481 pixels). The first row and first column of the table were cropped. The columns of the Dynamic Table containing the optimal image approximations in 2–6 colors are shown. Each column contains a binary hierarchical sequence of image approximations with incrementally added colors: g = 2, 3, and 4. On the main diagonal of the Dynamic Table are the optimal image approximations in g_0 = 2, 3, and 4 colors. Image approximations at the top are labeled with g color numbers and corresponding standard deviation values σ.
Figure 12: Improving object detection using superpixel approximations. Top: sample images from the SIPI Image Database combined into a single 1024 × 512-pixel "Tanks" image. Bottom: optimal image approximation in four intensity levels (left) and improved image approximation in four intensity levels (right). The segments containing the object of interest are filled with white.
Figure 13: Dynamic Superpixel Table for the grayscale "Tanks" image (1024 × 512 pixels). The columns contain binary hierarchical sequences of approximations in the incremental number of intensity levels from g = 1 to g = 11, indicated on the left. The lowest approximations in each column are the superpixel approximations obtained as intersections of the 1, 2, … optimal image approximations. The other approximations are generated from the latter using Ward's pixel clustering. A problematic five-pointed star (filled with white) appears at approximations in intensity levels 4–11.
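For orientation only, a toy piecewise-constant approximation in the spirit of the Ward clustering discussed above can be written with scikit-learn. This is not the authors' AMN implementation; the image path and the number of colors g are placeholders, and plain agglomerative clustering is quadratic in the number of pixels, which is precisely the cost the paper's dynamic-tree machinery avoids, so the sketch is only practical on small or downsampled images.

```python
# Toy sketch (not the authors' AMN): piecewise-constant approximation of an image in g
# colors via Ward's clustering, plus its approximation error (standard deviation).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from skimage import io

img = io.imread("girl.png").astype(float)[::4, ::4]   # placeholder path; downsampled, since
pixels = img.reshape(-1, img.shape[-1])               # plain Ward clustering is O(N^2) in pixels

g = 4                                                 # number of colors in the approximation
labels = AgglomerativeClustering(n_clusters=g, linkage="ward").fit_predict(pixels)

approx = np.empty_like(pixels)
for k in range(g):
    approx[labels == k] = pixels[labels == k].mean(axis=0)   # each cluster gets its mean color

sigma = np.sqrt(((pixels - approx) ** 2).mean())             # approximation error
print(f"g = {g}, standard deviation = {sigma:.2f}")
```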
24 pages, 8111 KiB  
Article
Semi-Automatic GUI Platform to Characterize Brain Development in Preterm Children Using Ultrasound Images
by David Rabanaque, Maria Regalado, Raul Benítez, Sonia Rabanaque, Thais Agut, Nuria Carreras and Christian Mata
J. Imaging 2023, 9(7), 145; https://doi.org/10.3390/jimaging9070145 - 18 Jul 2023
Viewed by 1614
Abstract
The third trimester of pregnancy is the most critical period for human brain development, during which significant changes occur in the morphology of the brain. The development of sulci and gyri allows for a considerable increase in the brain surface. In preterm newborns, these changes occur in an extrauterine environment that may cause a disruption of the normal brain maturation process. We hypothesize that a normalized atlas of brain maturation with cerebral ultrasound images from birth to term-equivalent age will help clinicians assess these changes. This work proposes a semi-automatic Graphical User Interface (GUI) platform for segmenting the main cerebral sulci from ultrasound images in the clinical setting. The platform was developed from images of a neonatal cerebral ultrasound database provided by two clinical researchers from the Hospital Sant Joan de Déu in Barcelona, Spain. The primary objective is to provide clinicians with a user-friendly platform for running and visualizing an atlas of images validated by medical experts. The GUI offers different segmentation approaches and pre-processing tools and is designed for running, visualizing images, and segmenting the principal sulci. The presented results are discussed in detail in this paper, providing an exhaustive analysis of the effectiveness of the proposed approach. Full article
(This article belongs to the Special Issue Imaging Informatics: Computer-Aided Diagnosis)
Figures:
Figure 1: The spatio-temporal fetal brain magnetic resonance atlas (CRL fetal brain atlas) at six representative gestational ages: 22, 25, 28, 31, 34, and 37 weeks. Axial, coronal, and sagittal views of the atlas are shown for each age point [7].
Figure 2: Planes used in the study of a premature baby. For the coronal plane, following alphabetical order from (a) to (f), we have the planes c1, c2, c3, c4, c5, and c6; following the same order but starting with (g) and ending with (m), the sagittal planes s1, s2l, s2r, s3l, s3r, s4l, and s4r.
Figure 3: Semiautomatic groove detection platform.
Figure 4: Main software components of the proposed tool.
Figure 5: Preprocessing of an image with the objective of scaling the values between the maximum and minimum values of the image.
Figure 6: Comparison of the original image with that obtained after applying the Sigmoid function with a cutoff value of 0.5 and a gain value of 10.
Figure 7: A local filter applied to an ultrasound image (Original) with different surface sizes for analysis. The resulting images obtained using mean filters of 3 × 3, 9 × 9, 27 × 27, 55 × 55, and 81 × 81 pixels are presented from left to right as an example.
Figure 8: Comparison of the original image with that obtained after applying a threshold and finally the closure function.
Figure 9: Definition, by means of the histogram, of the number of structures that have a given area and the elimination of those with an area lower than the reference value (red line), in order to obtain a new image (third column) with those structures that meet the condition.
Figure 10: Mask obtained from the morphological_chan_vese function for a groove to be segmented and its corresponding inversion, because the number of pixels with value 1 was greater than 50%.
Figure 11: Steps to be followed once the zone has been defined: defining the mask, the number of segments and, finally, the contour using the maskSLIC function.
Figure 12: Result of the manual segmentation of each groove defined in the upper cards and carried out by the platform algorithms, in this case Threshold.
Figure 13: Semiautomatic groove detection platform.
Figure 14: Movement of the vertices of the segment defined by the algorithm and change of the coordinates of the corresponding groove in the table.
Figure 15: Segmentation examples for the Sylvian sulcus (manual segmentation), carried out between weeks 24 and 32 of gestation of the same baby, applying the different segmentation methods (Threshold, Sigmoid + Threshold, and Snakes).
Figure 16: Examples of segmentation of different grooves in an ultrasound scan of a baby at week 29 of gestation, c4 coronal section.
Figure 17: Segmentation of the Sylvian sulcus applying the three defined segmentation methods (Threshold, Sigmoid + Threshold, and Snakes) for different babies and weeks.
Figure 18: Example of how the segmentation results vary with different methods depending on the accuracy of the manual segmentation, with the first row showing more precise manual segmentation and the second row showing less precise manual segmentation.
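As a hedged illustration of the two segmentation helpers named in the figure captions (morphological_chan_vese and maskSLIC, both available in scikit-image), a minimal sketch follows; the file name, iteration count, and segment count are assumptions, not the platform's actual settings.

```python
# Minimal sketch (assumptions, not the platform's code): level-set masking followed by
# mask-restricted SLIC superpixels on a cropped sulcus region of an ultrasound image.
import numpy as np
from skimage import io, img_as_float
from skimage.segmentation import morphological_chan_vese, slic

us = img_as_float(io.imread("coronal_c4.png", as_gray=True))   # placeholder file name

# Chan-Vese level set on the region of interest; 50 iterations is an arbitrary choice.
mask = morphological_chan_vese(us, 50, init_level_set="checkerboard")
if mask.mean() > 0.5:          # invert when more than 50% of pixels are foreground,
    mask = 1 - mask            # as described for Figure 10

# maskSLIC: superpixels restricted to the masked zone (Figure 11).
segments = slic(us, n_segments=20, mask=mask.astype(bool), channel_axis=None)
print("superpixel labels:", np.unique(segments))
```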
16 pages, 3403 KiB  
Article
Varroa Destructor Classification Using Legendre–Fourier Moments with Different Color Spaces
by Alicia Noriega-Escamilla, César J. Camacho-Bello, Rosa M. Ortega-Mendoza, José H. Arroyo-Núñez and Lucia Gutiérrez-Lazcano
J. Imaging 2023, 9(7), 144; https://doi.org/10.3390/jimaging9070144 - 14 Jul 2023
Cited by 4 | Viewed by 2052
Abstract
Bees play a critical role in pollination and food production, so their preservation is essential, which particularly highlights the importance of detecting bee diseases early. The Varroa destructor mite is the primary factor contributing to increased viral infections that can lead to hive mortality. This study presents an innovative method for identifying Varroa destructor mites on honey bees using multichannel Legendre–Fourier moments. The descriptors derived from this approach possess distinctive characteristics, such as rotation and scale invariance and noise resistance, allowing the representation of digital images with minimal descriptors. This characteristic is advantageous when analyzing images of living organisms that are not in a static posture. The proposal evaluates the algorithm's efficiency using different color models, and to enhance its capacity, a subdivision of the VarroaDataset is used. This enhancement allows the algorithm to process additional information about the color and shape of the bee's legs, wings, eyes, and mouth. To demonstrate the advantages of our approach, we compare it with deep learning methods for semantic segmentation, such as DeepLabV3, and for object detection, such as YOLOv5. The results suggest that our proposal offers a promising means for the early detection of the Varroa destructor mite, which could be an essential pillar in the preservation of bees and, therefore, in food production. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
Figures:
Figure 1: Some images from the VarroaDataset: (a) healthy bees, (b) bees with Varroa mites. (c) The Varroa mite has a reddish-brown coloration and usually hides under the tergite of the adult bee.
Figure 2: Varroa classification process.
Figure 3: Classification with Legendre–Fourier moments using different color models.
Figure 4: Diagram of the proposed classification process: classification with the YCbCr color space.
Figure 5: Scatter diagram of the classification: bees with healthy dorsal side and bees with healthy ventral side in the color models (a) RGB, (b) HSV, and (c) YCbCr.
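To make the descriptor-plus-color-model pipeline concrete, here is a small sketch in which Hu moment invariants (via OpenCV), a related family of rotation- and scale-invariant moments, stand in for the paper's multichannel Legendre–Fourier moments; the SVM classifier and the file handling are illustrative assumptions, not the authors' setup.

```python
# Sketch of the color-model comparison; Hu moment invariants stand in for the paper's
# multichannel Legendre-Fourier moments, which share rotation/scale invariance.
import cv2
import numpy as np
from sklearn.svm import SVC

def channel_invariants(channel: np.ndarray) -> np.ndarray:
    # Seven Hu moment invariants of one color channel (log-scaled for numerical range).
    hu = cv2.HuMoments(cv2.moments(channel.astype(np.float32))).ravel()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def describe(image_bgr: np.ndarray, color_model: str = "ycbcr") -> np.ndarray:
    # Note: OpenCV uses the YCrCb channel order for its YCbCr-style conversion.
    conversion = {"rgb": cv2.COLOR_BGR2RGB, "hsv": cv2.COLOR_BGR2HSV, "ycbcr": cv2.COLOR_BGR2YCrCb}
    converted = cv2.cvtColor(image_bgr, conversion[color_model])
    return np.concatenate([channel_invariants(converted[..., c]) for c in range(3)])

# X = np.stack([describe(cv2.imread(p)) for p in bee_image_paths])   # hypothetical file list
# y = labels                                                         # 0 = healthy, 1 = Varroa-infested
# clf = SVC(kernel="rbf").fit(X, y)
```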
26 pages, 28184 KiB  
Article
The Dangers of Analyzing Thermographic Radiometric Data as Images
by Časlav Livada, Hrvoje Glavaš, Alfonzo Baumgartner and Dina Jukić
J. Imaging 2023, 9(7), 143; https://doi.org/10.3390/jimaging9070143 - 12 Jul 2023
Cited by 1 | Viewed by 1746
Abstract
Thermography is probably the most widely used method of measuring surface temperature by analyzing radiation in the infrared part of the spectrum, and its accuracy depends on factors such as emissivity and reflected radiation. Contrary to the popular belief that thermographic images represent temperature maps, they are actually thermal radiation converted into an image, and if not properly calibrated, they show incorrect temperatures. The objective of this study is to analyze commonly used image processing techniques and their impact on radiometric data in thermography, in particular the extent to which a thermograph can be considered an image and how image processing affects the radiometric data. Three analyses are presented in the paper. The first examines how image processing techniques, such as contrast and brightness adjustment, affect physical reality and its representation in thermographic imaging. The second analysis examines the effects of JPEG compression on radiometric data and how degradation of the data varies with the compression parameters. The third analysis aims to determine the optimal resolution increase required to minimize the effects of compression on the radiometric data. The output from an IR camera in CSV format was used for these analyses and compared to images from the manufacturer's software. An IR camera providing data in JPEG format was used, and the data included thermographic images, visible images, and a matrix of thermal radiation data. The study was verified with a reference blackbody radiation source set at 60 °C. The results highlight the dangers of interpreting thermographic images as temperature maps without considering the underlying radiometric data, which can be affected by image processing and compression. The paper concludes with the importance of accurate and precise thermographic analysis for reliable temperature measurement. Full article
(This article belongs to the Special Issue Data Processing with Artificial Intelligence in Thermal Imagery)
Figures:
Figure 1: Comparison of the "amount" of radiation in the visible part of the spectrum and the detection range of the IR camera.
Figure 2: Classification of thermography in the field of electromagnetic radiation.
Figure 3: LWIR radiation detected by the IR camera.
Figure 4: Comparison of thermography and photography.
Figure 5: Conversion and display of the digital thermographic radiometric recording in image form.
Figure 6: Blackbody thermographs displayed in three different palettes.
Figure 7: Dialog box of the infrared camera software support and the functions that correspond to the brightness and contrast adjustment operations in photography.
Figure 8: The temperature range of useful information and thermograms of the lower and upper limit.
Figure 9: Display of increasing the temperature range by one degree Celsius (°C) from the lowest temperature.
Figure 10: Display of increasing the temperature range by ten degrees Celsius (°C) from the lowest temperature to the maximum registered value.
Figure 11: Display of increasing the temperature range by ten degrees Celsius (°C) from the highest temperature to the minimum registered value.
Figure 12: Thermographs in different palettes (Iron, Arctic, Rainbow HC, Lava and Medical) and contrast comparison in the range of 255 shades of gray.
Figure 13: Digitization of a continuous image.
Figure 14: Quantized image with corresponding bit representation.
Figure 15: Contrast and brightness difference.
Figure 16: Algorithm and procedure of JPEG image compression.
Figure 17: Image after downscaling with different interpolation methods.
Figure 18: Image after upscaling with different interpolation methods.
Figure 19: Thermograph of a blackbody.
Figure 20: Thermograph of the active part of the emitting surface.
Figure 21: Comparison of raw and normalized data.
Figure 22: Results of brightness and contrast manipulation.
Figure 23: Comparison of the raw data, PNG, and JPEG compression.
Figure 24: Results of JPEG compression with various compression ratios.
Figure 25: Objective evaluation of image quality in JPEG compression.
Figure 26: Defining the region of interest of the blackbody.
Figure 27: Linear temperature pattern and corresponding zones.
Figure 28: Results of compression with highlighted ROIs.
Figure 29: Differences in the mean values of the three observed zones.
Figure 30: Enlarged portion of the image showing the blackbody at different compression ratios.
Figure 31: 3D representation of the blackbody at different compression ratios.
Figure 32: Results of image interpolation with different methods.
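The second analysis in the abstract (JPEG compression versus radiometric data) can be reproduced in spirit with a few lines; this sketch assumes a radiometric matrix exported to CSV (file name and delimiter are placeholders) and uses Pillow for the re-compression.

```python
# Minimal sketch (assumptions, not the authors' code): quantify how JPEG re-compression of a
# normalized thermogram distorts the temperatures recovered from the 8-bit pixel values.
import io
import numpy as np
from PIL import Image

temps = np.loadtxt("blackbody_60C.csv", delimiter=";")        # radiometric matrix in degrees C (placeholder)
t_min, t_max = temps.min(), temps.max()

# Normalize to 8-bit grayscale, as display software does when rendering a thermograph.
img8 = np.round(255 * (temps - t_min) / (t_max - t_min)).astype(np.uint8)

for quality in (95, 75, 50, 10):
    buf = io.BytesIO()
    Image.fromarray(img8).save(buf, format="JPEG", quality=quality)
    decoded = np.asarray(Image.open(buf), dtype=float)
    recovered = t_min + decoded / 255 * (t_max - t_min)        # map pixels back to temperature
    err = np.abs(recovered - temps)
    print(f"quality={quality:3d}  max error = {err.max():.3f} C  mean error = {err.mean():.3f} C")
```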
20 pages, 9805 KiB  
Article
Augmented Reality in Maintenance—History and Perspectives
by Ana Malta, Torres Farinha and Mateus Mendes
J. Imaging 2023, 9(7), 142; https://doi.org/10.3390/jimaging9070142 - 10 Jul 2023
Cited by 5 | Viewed by 3236
Abstract
Augmented Reality (AR) is a technology that allows virtual elements to be superimposed over images of real contexts, whether these are text elements, graphics, or other types of objects. Smart AR glasses are increasingly optimized, and modern ones have features such as Global Positioning System (GPS), a microphone, and gesture recognition, among others. These devices allow users to keep their hands free to perform tasks while receiving instructions in real time through the glasses. This allows maintenance professionals to carry out interventions more efficiently and in a shorter time than would be necessary without the support of this technology. In the present work, a timeline of important achievements is established, including important findings in object recognition, real-time operation, and integration of technologies for shop floor use. Perspectives on future research and related recommendations are proposed as well. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
Figures:
Figure 1: Basic techniques commonly used in AR.
Figure 2: Timeline of developments in Augmented Reality, highlighting the most important advances for industrial applications.
Figure 3: Pipeline of an AR application with a neural network used for object detection.
19 pages, 7045 KiB  
Article
An Effective Hyperspectral Image Classification Network Based on Multi-Head Self-Attention and Spectral-Coordinate Attention
by Minghua Zhang, Yuxia Duan, Wei Song, Haibin Mei and Qi He
J. Imaging 2023, 9(7), 141; https://doi.org/10.3390/jimaging9070141 - 10 Jul 2023
Cited by 1 | Viewed by 1766
Abstract
In hyperspectral image (HSI) classification, convolutional neural networks (CNNs) have been widely employed and achieved promising performance. However, CNN-based methods face difficulties in achieving both accurate and efficient HSI classification due to their limited receptive fields and deep architectures. To alleviate these limitations, we propose an effective HSI classification network based on multi-head self-attention and spectral-coordinate attention (MSSCA). Specifically, we first reduce the redundant spectral information of HSI by using a point-wise convolution network (PCN) to enhance discriminability and robustness of the network. Then, we capture long-range dependencies among HSI pixels by introducing a modified multi-head self-attention (M-MHSA) model, which applies a down-sampling operation to alleviate the computing burden caused by the dot-product operation of MHSA. Furthermore, to enhance the performance of the proposed method, we introduce a lightweight spectral-coordinate attention fusion module. This module combines spectral attention (SA) and coordinate attention (CA) to enable the network to better weight the importance of useful bands and more accurately localize target objects. Importantly, our method achieves these improvements without increasing the complexity or computational cost of the network. To demonstrate the effectiveness of our proposed method, experiments were conducted on three classic HSI datasets: Indian Pines (IP), Pavia University (PU), and Salinas. The results show that our proposed method is highly competitive in terms of both efficiency and accuracy when compared to existing methods. Full article
Figures:
Figure 1: The overall network architecture of the proposed MSSCA.
Figure 2: The architecture of the modified multi-head self-attention.
Figure 3: The architecture of the spectral attention module.
Figure 4: The architecture of the coordinate attention.
Figure 5: Indian Pines images: (a) false-color image; (b) ground truth.
Figure 6: Pavia University images: (a) false-color image; (b) ground truth.
Figure 7: Salinas images: (a) false-color image; (b) ground truth.
Figure 8: Classification maps of different methods for the Indian Pines dataset.
Figure 9: Classification maps of different methods for the Pavia University dataset.
Figure 10: Classification maps of different methods for the Salinas dataset.
Figure 11: The OA of different methods with varying ratios of training samples. (a) Indian Pines. (b) Pavia University. (c) Salinas.
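As one possible reading of the M-MHSA idea (down-sampling the keys and values to reduce the cost of the dot product), a minimal PyTorch sketch is given below; the module name, pooling choice, and tensor sizes are assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch (an interpretation of the abstract, not the authors' code): multi-head
# self-attention whose keys/values are spatially down-sampled to cut the dot-product cost.
import torch
import torch.nn as nn

class DownsampledMHSA(nn.Module):
    def __init__(self, dim: int, heads: int = 4, reduction: int = 2):
        super().__init__()
        self.pool = nn.AvgPool2d(reduction)                  # down-sampling of keys/values
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)                      # (B, H*W, C) queries at full resolution
        kv = self.pool(x).flatten(2).transpose(1, 2)          # (B, H*W/r^2, C) reduced keys/values
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 64, 15, 15)                                # e.g., a 15x15 patch with 64 spectral features
print(DownsampledMHSA(64)(x).shape)
```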
20 pages, 3285 KiB  
Article
Conv-ViT: A Convolution and Vision Transformer-Based Hybrid Feature Extraction Method for Retinal Disease Detection
by Pramit Dutta, Khaleda Akther Sathi, Md. Azad Hossain and M. Ali Akber Dewan
J. Imaging 2023, 9(7), 140; https://doi.org/10.3390/jimaging9070140 - 10 Jul 2023
Cited by 23 | Viewed by 5854
Abstract
The current advancement towards retinal disease detection has mainly focused on distinct feature extraction using either a convolutional neural network (CNN) or a transformer-based end-to-end deep learning (DL) model. The individual end-to-end DL models are capable of processing only texture- or shape-based information for performing detection tasks. However, extraction of only texture- or shape-based features does not provide the robustness needed to classify different types of retinal diseases. Therefore, concerning these two features, this paper develops a fusion model called 'Conv-ViT' to detect retinal diseases from foveal cut optical coherence tomography (OCT) images. Transfer learning-based CNN models, such as Inception-V3 and ResNet-50, are utilized to process texture information by calculating the correlation of nearby pixels. Additionally, the vision transformer model is fused to process shape-based features by determining the correlation between long-distance pixels. The hybridization of these three models results in shape-based texture feature learning during the classification of retinal diseases into four classes: choroidal neovascularization (CNV), diabetic macular edema (DME), DRUSEN, and NORMAL. The weighted average classification accuracy, precision, recall, and F1 score of the model are found to be approximately 94%. The results indicate that the fusion of both texture and shape features assisted the proposed Conv-ViT model to outperform state-of-the-art retinal disease classification models. Full article
Figures:
Figure 1: Hybrid feature extractor Conv-ViT framework, combining the feature extraction capabilities of Inception-V3, ResNet-50, and the Vision Transformer.
Figure 2: Factorization in the Inception architecture, showing how the computation cost is reduced without degrading performance by using multiple small filters instead of a single large one.
Figure 3: Three-layer residual function block, the basic building block of ResNet-50. These residual function blocks are stacked on top of each other in the ResNet-50 architecture.
Figure 4: Functional structure of a deep neural network classifier. This DNN classifier predicts the class from the extracted features.
Figure 5: Loss and accuracy graphs for the training and validation sets. The curves show how loss and accuracy on the training and validation sets change with respect to the epoch.
Figure 6: The AUC curves evaluate the model's ability to classify between the four classes. Each curve represents a separate class, and the area under the curve represents how well a class is differentiated from the other three.
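A hedged sketch of the texture/shape feature fusion described above, using off-the-shelf torchvision backbones; the resizing strategy, feature dimensions, and classifier head are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal torchvision sketch (an interpretation, not the authors' pipeline): concatenate texture
# features from Inception-V3 and ResNet-50 with shape features from a ViT, then classify.
import torch
import torch.nn.functional as F
from torchvision import models

inception = models.inception_v3(weights="DEFAULT"); inception.fc = torch.nn.Identity(); inception.eval()
resnet = models.resnet50(weights="DEFAULT"); resnet.fc = torch.nn.Identity(); resnet.eval()
vit = models.vit_b_16(weights="DEFAULT"); vit.heads = torch.nn.Identity(); vit.eval()

def extract(oct_batch: torch.Tensor) -> torch.Tensor:
    """oct_batch: (B, 3, H, W) OCT images replicated to three channels."""
    with torch.no_grad():
        f_inc = inception(F.interpolate(oct_batch, size=(299, 299), mode="bilinear"))  # 2048-d texture
        f_res = resnet(F.interpolate(oct_batch, size=(224, 224), mode="bilinear"))     # 2048-d texture
        f_vit = vit(F.interpolate(oct_batch, size=(224, 224), mode="bilinear"))        # 768-d shape
    return torch.cat([f_inc, f_res, f_vit], dim=1)                                     # fused descriptor

features = extract(torch.randn(2, 3, 496, 512))                 # e.g., foveal-cut OCT crops
classifier = torch.nn.Linear(features.shape[1], 4)               # CNV, DME, DRUSEN, NORMAL
print(classifier(features).shape)
```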
19 pages, 17842 KiB  
Article
Order Space-Based Morphology for Color Image Processing
by Shanqian Sun, Yunjia Huang, Kohei Inoue and Kenji Hara
J. Imaging 2023, 9(7), 139; https://doi.org/10.3390/jimaging9070139 - 7 Jul 2023
Cited by 4 | Viewed by 1880
Abstract
Mathematical morphology is a fundamental tool based on order statistics for image processing, such as noise reduction, image enhancement and feature extraction, and is well-established for binary and grayscale images, whose pixels can be sorted by their pixel values, i.e., each pixel has [...] Read more.
Mathematical morphology is a fundamental tool based on order statistics for image processing, such as noise reduction, image enhancement and feature extraction, and is well-established for binary and grayscale images, whose pixels can be sorted by their pixel values, i.e., each pixel has a single number. On the other hand, each pixel in a color image has three numbers corresponding to three color channels, e.g., red (R), green (G) and blue (B) channels in an RGB color image. Therefore, it is difficult to sort color pixels uniquely. In this paper, we propose a method for unifying the orders of pixels sorted in each color channel separately, where we consider that a pixel exists in a three-dimensional space called order space, and derive a single order by a monotonically nondecreasing function defined on the order space. We also fuzzify the proposed order space-based morphological operations, and demonstrate the effectiveness of the proposed method by comparing with a state-of-the-art method based on hypergraph theory. The proposed method treats three orders of pixels sorted in respective color channels equally. Therefore, the proposed method is consistent with the conventional morphological operations for binary and grayscale images. Full article
(This article belongs to the Topic Color Image Processing: Models and Methods (CIP: MM))
Figures:
Figure 1: The 3 × 3 structuring elements: (a) the 3 × 3 'Cross' structuring element is defined by a set of relative coordinates as S = {(−1,0), (0,−1), (0,0), (0,1), (1,0)}; (b) the 3 × 3 'Square' structuring element is defined by a set of relative coordinates as S = {(−1,−1), (−1,0), (−1,1), (0,−1), (0,0), (0,1), (1,−1), (1,0), (1,1)}.
Figure 2: Illustrations of 1D, 2D and 3D order spaces for RGB color images: (a) five pixels are arranged in a line (a one-dimensional order space) according to the ascending order of their R components; for binary and grayscale images, we can consider the similar 1D order space. (b) If we take into account R and G components simultaneously, then we have a two-dimensional order space onto which five pixels are located according to their orders. (c) If we take into account R, G and B components simultaneously, then we have a three-dimensional order space onto which five pixels are located according to their orders.
Figure 3: Contour lines in three order spaces corresponding to Equations (7)–(9): (a–e) o^S_ξ given by (7) with fixed o^B = 1, 2, …, 5, respectively; (f–j) o^P_ξ given by (8) with fixed o^B = 1, 2, …, 5, respectively; (k–o) o^M_ξ given by (9) with fixed o^B = 1, 2, …, 5, respectively. In (l–n), the contour lines for o^M = 2, 3 and 4 are spread over the planes o^B = 2, 3 and 4, as illustrated by the light blue, light orange and light green areas, respectively.
Figure 4: Original color image selected from SIDBA [22].
Figure 5: Order space-based morphological operations for the color image in Figure 4: the top, middle and bottom rows show the results with Equations (7)–(9), respectively. The six columns from left to right show the results of dilation (Dilat.), erosion (Eros.), opening (Open.), closing (Clos.), open–closing (O.-c.) and close–opening (C.-o.) operations, respectively.
Figure 6: Noisy input image including 10% impulse noise, where 6553 out of 256 × 256 = 65,536 pixels are corrupted with impulse noise.
Figure 7: Noise removal results using Wang's method (top row) and the proposed order space-based morphological operations (middle and bottom rows) with o^S in Equation (7) for the noisy image in Figure 6, where the subcaptions denote the 'operation/structuring element' used for computing the respective images. The top row shows the results of Wang's method, where 'HG' indicates that the structuring elements are given by hypergraphs. The middle and bottom rows show the results of the proposed method with the 'Cross' and 'Square' structuring elements shown in Figure 1. The six columns from left to right show the results of dilation (Dilat.), erosion (Eros.), opening (Open.), closing (Clos.), open–closing (O.-c.) and close–opening (C.-o.) operations, respectively.
Figure 8: Noise removal results for 12 color images in the SIDBA dataset [22]: the original images in the top row receive impulse noise with densities varying from 10% (the leftmost two columns) to 60% (the rightmost two columns), as shown in the second row. The third and subsequent rows show the corresponding output images from the open–closing and close–opening operations in Wang's method and the proposed method with o^S, o^P and o^M in Equations (7)–(9), respectively.
Figure 9: Mean PSNR vs. noise density on the SIDBA dataset [22]: (a) open–closing operations of Wang's and the proposed methods are compared, where the proposed method with o^S (orange line) outperforms Wang's method (blue line); (b) similarly, close–opening operations are compared, where the proposed method with o^S (orange line) also outperforms Wang's method (blue line).
Figure 10: Comparison of grayscale and color morphological operations with Gaussian and impulse noise. In this flow chart, grayscale conversions are followed by grayscale morphological operations (outer routes), and color morphological operations are followed by grayscale conversions (inner routes). Finally, the corresponding grayscale images are compared by computing PSNR or SSIM.
Figure 11: Results of Gaussian noise removal, where Gaussian noise with mean 0 and standard deviation 30 is added to the original images in Figure 8: (a) PSNR values are compared for grayscale, Wang's and our open–closing operations; (b) SSIM values are compared for grayscale, Wang's and our open–closing operations.
Figure 12: Results of impulse noise removal, where 50% impulse noise is added to the original images in Figure 8: (a) PSNR values are compared for grayscale, Wang's and our open–closing operations; (b) SSIM values are compared for grayscale, Wang's and our open–closing operations.
Figure 13: Comparison of marginal open–closing, Max-tree-based area open–closing and the proposed order space-based open–closing for impulse noise removal: (a) PSNR values are compared for marginal, Max-tree-based area and the proposed open–closing operations; (b) SSIM values are compared for marginal, Max-tree-based area and the proposed open–closing operations.
Figure 14: Output images obtained by (a) marginal open–closing, (b) Max-tree-based area open–closing and (c) the proposed order space-based open–closing.
Figure 15: PSNR vs. α for controlling the fuzziness of the proposed fuzzy morphological operations. This figure shows that α = 0.5 gives the highest PSNR value in the proposed fuzzy open–closing operation applied to the noisy image in Figure 6.
Figure 16: Noise removal results by fuzzy morphological operations with α = 0.5 applied to the noisy image in Figure 6. The order of the output images (a–f) by the six morphological operations (dilation, erosion, opening, closing, open–closing and close–opening) is the same as that of Figure 7.
Figure 17: Comparison of mean PSNR between crisp and fuzzy morphological operations: (a) crisp (orange line) and fuzzy (purple line) open–closing operations are compared, where the former is the same as the orange line in Figure 9a; (b) similarly, crisp (orange line) and fuzzy (purple line) close–opening operations are compared, where the former is the same as the orange line in Figure 9b.
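To illustrate the core idea of a single order derived from the three per-channel orders, here is a small NumPy sketch; the choice of the sum as the monotonically non-decreasing combining function and the square window are assumptions for illustration, not necessarily the paper's definitions (7)–(9).

```python
# Minimal sketch of the order-space idea (an interpretation, not the authors' code): each pixel
# gets one rank per channel, the three ranks are reduced to a single order by a monotone
# function (here, their sum), and dilation takes the pixel with the largest order in the window.
import numpy as np
from scipy.stats import rankdata

def single_order(image: np.ndarray) -> np.ndarray:
    # Rank pixels separately in R, G and B, then combine the three orders symmetrically.
    flat = image.reshape(-1, 3)
    orders = np.stack([rankdata(flat[:, c], method="dense") for c in range(3)], axis=1)
    return orders.sum(axis=1).reshape(image.shape[:2])          # one monotone scalar per pixel

def dilate(image: np.ndarray, size: int = 3) -> np.ndarray:
    order = single_order(image)
    pad = size // 2
    padded_img = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    padded_ord = np.pad(order, pad, mode="edge")
    out = np.empty_like(image)
    h, w = order.shape
    for i in range(h):
        for j in range(w):
            win = padded_ord[i:i + size, j:j + size]
            di, dj = np.unravel_index(win.argmax(), win.shape)   # pixel with the largest order
            out[i, j] = padded_img[i + di, j + dj]
    return out

# Erosion would instead take argmin; opening/closing compose the two as usual.
```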
22 pages, 1615 KiB  
Article
VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction
by Wilson Bakasa and Serestina Viriri
J. Imaging 2023, 9(7), 138; https://doi.org/10.3390/jimaging9070138 - 7 Jul 2023
Cited by 14 | Viewed by 2633
Abstract
The prognosis of patients with pancreatic ductal adenocarcinoma (PDAC) is greatly improved by an early and accurate diagnosis. Several studies have created automated methods to forecast PDAC development utilising various medical imaging modalities. These papers give a general overview of the classification, segmentation, or grading of many cancer types, including pancreatic cancer, utilising conventional machine learning techniques and hand-engineered characteristics. This study uses cutting-edge deep learning techniques to identify PDAC from computerised tomography (CT) medical imaging modalities. This work proposes the hybrid model VGG16–XGBoost (VGG16 as the backbone feature extractor and Extreme Gradient Boosting as the classifier) for PDAC images. According to the experiments, the proposed hybrid model performs better, obtaining an accuracy of 0.97 and a weighted F1 score of 0.97 for the dataset under study. The experimental validation of the VGG16–XGBoost model uses the Cancer Imaging Archive (TCIA) public access dataset, which contains pancreas CT images. The results of this study can be extremely helpful for PDAC diagnosis from CT pancreas images, categorising them into five tumour (T), node (N), and metastasis (M) (TNM) staging system class labels: T0, T1, T2, T3, and T4. Full article
Show Figures

Figure 1

Figure 1
<p>Different steps to follow when using deep learning algorithms in medical image processing.</p>
Full article ">Figure 2
<p>(<b>A</b>) The classification process using traditional techniques, (<b>B</b>) the classification process using DL techniques, and (<b>C</b>) the classification process using the proposed model.</p>
Full article ">Figure 3
<p>Convolutional Neural Networks’ architecture blocks.</p>
Full article ">Figure 4
<p>Leaf-wise tree growth of LGBM.</p>
Full article ">Figure 5
<p>VGG16 architecture.</p>
Full article ">Figure 6
<p>Inception V3 architecture.</p>
Full article ">Figure 7
<p>XGBoost architecture.</p>
Full article ">Figure 8
<p>(<b>A</b>) A normal pancreas CT image; (<b>B</b>) PDAC at an advanced stage, with spreading to other organs; (<b>C</b>) Stages of PDAC development, which are represented as T0 to T4. The other stage, metastasis, is not represented because cancer at that point has spread to other organs surrounding the pancreas. PanIN (pancreatic intraepithelial neoplasia) with low-grade pancreatic intraepithelial neoplasia encompasses three older terms—PanIN-1A, PanIN-1B, and PanIN-2. These include Pancreatic Intraepithelial Neoplasia 1-A, Pancreatic Intraepithelial Neoplasia 1-B, and Pancreatic Intraepithelial Neoplasia 2. Pancreatic Intraepithelial Neoplasia 3 is high-grade pancreatic intraepithelial neoplasia, which then advances to PDAC.</p>
Full article ">Figure 9
<p>Proposed model, illustrating VGG16 as the feature extractor and XGBoost as the classifier.</p>
Full article ">Figure 10
<p>Confusion Matrix for using VGG16–XGBoost, LGBM–XGBoost, Inception V3–XGBoost, VGG16–RF, LGBM–RF, Inception V3–RF, VGG16–SVM, LGBM–SVM, and Inception V3–SVM combinations. The different classification (CL) models with the used feature extraction (FE) models are shown.</p>
Full article ">Figure 11
<p>ROC curves for SVM, RF, and XGBoost, showing True Positive against False Positive rates.</p>
Full article ">
14 pages, 3330 KiB  
Article
Improving Visual Defect Detection and Localization in Industrial Thermal Images Using Autoencoders
by Sasha Behrouzi, Marcel Dix, Fatemeh Karampanah, Omer Ates, Nissy Sasidharan, Swati Chandna and Binh Vu
J. Imaging 2023, 9(7), 137; https://doi.org/10.3390/jimaging9070137 - 7 Jul 2023
Cited by 2 | Viewed by 2144
Abstract
Reliable functionality in anomaly detection in thermal image datasets is crucial for defect detection of industrial products. Nevertheless, achieving reliable functionality is challenging, especially when datasets are image sequences captured during equipment runtime with a smooth transition from healthy to defective images. This [...] Read more.
Reliable functionality in anomaly detection in thermal image datasets is crucial for defect detection of industrial products. Nevertheless, achieving reliable functionality is challenging, especially when datasets are image sequences captured during equipment runtime with a smooth transition from healthy to defective images. This causes contamination of healthy training data with defective samples. Anomaly detection methods based on autoencoders are susceptible to a slight violation of a clean training dataset and lead to challenging threshold determination for sample classification. This paper indicates that combining anomaly scores leads to better threshold determination that effectively separates healthy and defective data. Our research results show that our approach helps to overcome these challenges. The autoencoder models in our research are trained with healthy images optimizing two loss functions: mean squared error (MSE) and structural similarity index measure (SSIM). Anomaly score outputs are used for classification. Three anomaly scores are applied: MSE, SSIM, and kernel density estimation (KDE). The proposed method is trained and tested on the 32 × 32-sized thermal images, including one contaminated dataset. The model achieved the following average accuracies across the datasets: MSE, 95.33%; SSIM, 88.37%; and KDE, 92.81%. Using a combination of anomaly scores could assist in solving a low classification accuracy. The use of KDE improves performance when healthy training data are contaminated. The MSE+ and SSIM+ methods, as well as two parameters to control quantitative anomaly localization using SSIM, are introduced. Full article
(This article belongs to the Special Issue Data Processing with Artificial Intelligence in Thermal Imagery)
Show Figures

Figure 1

Figure 1
<p>Position of four cameras monitoring switchgear equipment [<a href="#B4-jimaging-09-00137" class="html-bibr">4</a>].</p>
Full article ">Figure 2
<p>(<b>a</b>) Healthy and defective samples with relative pixel values. Defective pixel values exceeded a threshold highlighted (<b>b</b>). Thermal image transition from healthy to defective within 4 h experiment.</p>
Full article ">Figure 3
<p>Data pipeline of the anomaly detection network.</p>
Full article ">Figure 4
<p>(<b>a</b>) Comparison between input and output of autoencoder with MSE loss function. Healthy image anomaly score based on MSE is higher for defect images. (<b>b</b>) Comparison between input and output of autoencoder with <span class="html-italic">L</span> = 1−SSIM loss function. Healthy image anomaly score based on SSIM is lower for defect images.</p>
Full article ">Figure 5
<p>(<b>a</b>) MSE of defect and healthy samples (camera 94706). (<b>b</b>) Scatter plot of MSE anomaly scores (camera 94706). (<b>c</b>) SSIM of defect and healthy samples (camera 94706). (<b>d</b>) Scatter plot of SSIM anomaly scores (camera 94706). (<b>e</b>) Kernel density estimation of defect and healthy samples (camera 94706). (<b>f</b>) Scatter plot of Kernel density estimation anomaly scores (camera 94706).</p>
Full article ">Figure 6
<p>ROC curves for classification based on MSE scores. Poor performance of camera 94693 is visible.</p>
Full article ">Figure 7
<p>(<b>a</b>) Impact of combining KDE and MSE anomaly scores. (<b>b</b>) Impact of combining KDE and SSIM anomaly scores.</p>
Full article ">Figure 8
<p>Localization of defects on the defect images using per-pixel MSE anomaly score. Defective pixels appear gradually in the defect images taken within a 20 min defect duration. The presence of yellow pixels within the residual maps signifies the specific location of defects.</p>
Full article ">Figure 9
<p>Localization of defects on the defect images using per-pixel SSIM anomaly score <span class="html-italic">k</span> = 5, <span class="html-italic">k</span> = 30, and patch size = 3, patch size = 5.</p>
Full article ">
13 pages, 3947 KiB  
Article
VeerNet: Using Deep Neural Networks for Curve Classification and Digitization of Raster Well-Log Images
by M. Quamer Nasim, Narendra Patwardhan, Tannistha Maiti, Stefano Marrone and Tarry Singh
J. Imaging 2023, 9(7), 136; https://doi.org/10.3390/jimaging9070136 - 6 Jul 2023
Cited by 2 | Viewed by 2396
Abstract
Raster logs are scanned representations of the analog data recorded in subsurface drilling. Geologists rely on these images to interpret well-log curves and deduce the physical properties of geological formations. Scanned images contain various artifacts, including hand-written texts, brightness variability, scan defects, etc. [...] Read more.
Raster logs are scanned representations of the analog data recorded in subsurface drilling. Geologists rely on these images to interpret well-log curves and deduce the physical properties of geological formations. Scanned images contain various artifacts, including hand-written texts, brightness variability, scan defects, etc. The manual effort involved in reading the data is substantial. To mitigate this, unsupervised computer vision techniques are employed to extract and interpret the curves digitally. Existing algorithms predominantly require manual intervention, resulting in slow processing times, and are erroneous. This research aims to address these challenges by proposing VeerNet, a deep neural network architecture designed to semantically segment the raster images from the background grid to classify and digitize (i.e., extracting the analytic formulation of the written curve) the well-log data. The proposed approach is based on a modified UNet-inspired architecture leveraging an attention-augmented read–process–write strategy to balance retaining key signals while dealing with the different input–output sizes. The reported results show that the proposed architecture efficiently classifies and digitizes the curves with an overall F1 score of 35% and Intersection over Union of 30%, achieving 97% recall and 0.11 Mean Absolute Error when compared with real data on binary segmentation of multiple curves. Finally, we analyzed VeerNet’s ability in predicting Gamma-ray values, achieving a Pearson coefficient score of 0.62 when compared to measured data. Full article
Show Figures

Figure 1

Figure 1
<p>(<b>a</b>) 1,2,3 logs corresponding to GR, Caliper, and tension. (<b>b</b>) Header section showing the scales, of curves (<b>c</b>) depth line coding for 5′ log corresponding to 1000 m. Bottom: Flow chart showing the proposed approach.</p>
Full article ">Figure 2
<p>The proposed transformer-augmented U-Net.</p>
Full article ">Figure 3
<p>Comparison between the ground truth and prediction from the model trained using Lovaz Loss. (<b>a</b>) Rsater Log-Depth 4700 to 5000 m; (<b>c</b>) Rsater Log-Depth 4990 to 5650 m; (<b>e</b>) Rsater Log-Depth 7500 to 7950 m. (<b>b</b>,<b>d</b>,<b>f</b>) represent the fit between Ground Truth (GT) and predicted GR values for (<b>a</b>–<b>c</b>), respectively.</p>
Full article ">Figure 4
<p>Comparison between GT and prediction from model trained using SCE Loss. (<b>a</b>) Rsater Log-Depth 4700 to 5000 m; (<b>c</b>) Rsater Log-Depth 4990 to 5650 m; (<b>e</b>) Rsater Log-Depth 7500 to 7950 m. (<b>b</b>,<b>d</b>,<b>f</b>) represent the fit between GT and predicted GR values for (<b>a</b>–<b>c</b>), respectively.</p>
Full article ">Figure 5
<p>Comparison between GT and Prediction from model trained using SCE Loss. (<b>a</b>) Rsater Log-Depth 4700 to 5000 m; (<b>c</b>) Rsater Log-Depth 4990 to 5650 m; (<b>e</b>) Rsater Log-Depth 7500 to 7950 m. (<b>b</b>,<b>d</b>,<b>f</b>) represent the fit between GT and predicted CALI values for (<b>a</b>–<b>c</b>), respectively.</p>
Full article ">Figure 6
<p>Comparison between GT and prediction from model trained using Lovaz Loss. (<b>a</b>) Rsater Log-Depth 4700 to 5000 m; (<b>c</b>) Rsater Log-Depth 4990 to 5650 m; (<b>e</b>) Rsater Log-Depth 7500 to 7950 m. (<b>b</b>,<b>d</b>,<b>f</b>) represent the fit between GT and predicted CALI values for (<b>a</b>–<b>c</b>), respectively.</p>
Full article ">Figure 7
<p>Comparison between native LAS Gamma Ray and the graph-based method with corresponding depth sections of the raster log image. (<b>a</b>) Depth 4700 to 5000 m; (<b>b</b>) Depth 4990 to 5650 m; (<b>c</b>) 7500 to 7950 m.</p>
Full article ">
11 pages, 2637 KiB  
Article
Fast and Efficient Evaluation of the Mass Composition of Shredded Electrodes from Lithium-Ion Batteries Using 2D Imaging
by Peter Bischoff, Alexandra Kaas, Christiane Schuster, Thomas Härtling and Urs Peuker
J. Imaging 2023, 9(7), 135; https://doi.org/10.3390/jimaging9070135 - 5 Jul 2023
Cited by 4 | Viewed by 2004
Abstract
With the increasing number of electrical devices, especially electric vehicles, the need for efficient recycling processes of electric components is on the rise. Mechanical recycling of lithium-ion batteries includes the comminution of the electrodes and sorting the particle mixtures to achieve the highest [...] Read more.
With the increasing number of electrical devices, especially electric vehicles, the need for efficient recycling processes of electric components is on the rise. Mechanical recycling of lithium-ion batteries includes the comminution of the electrodes and sorting the particle mixtures to achieve the highest possible purities of the individual material components (e.g., copper and aluminum). An important part of recycling is the quantitative determination of the yield and recovery rate, which is required to adapt the processes to different feed materials. Since this is usually done by sorting individual particles manually before determining the mass of each material, we developed a novel method for automating this evaluation process. The method is based on detecting the different material particles in images based on simple thresholding techniques and analyzing the correlation of the area of each material in the field of view to the mass in the previously prepared samples. This can then be applied to further samples to determine their mass composition. Using this automated method, the process is accelerated, the accuracy is improved compared to a human operator, and the cost of the evaluation process is reduced. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
Show Figures

Figure 1

Figure 1
<p>Aluminum and copper particles from a battery cell after processing.</p>
Full article ">Figure 2
<p>Schematic drawing of our experimental setup, displaying the Raspberry Pi (a), the camera module (b), and the particles placed on the background (c).</p>
Full article ">Figure 3
<p>(<b>a</b>) Image of the mixed-material samples. (<b>b</b>) Resulting segmentation mask for the copper particles in the sample after color thresholding in the HSV color space. The segmentation mask clearly shows the copper particles. Due to irregular shapes and light reflections, the segmentation is superimposed with noise.</p>
Full article ">Figure 4
<p>Correlation between the relative amount of foreground pixels in the segmented images and the mass of the respective particles.</p>
Full article ">Figure 5
<p>Weighed mass of both materials in the sample versus the approximated mass determined by weighing the cumulative mass and subtracting the mass of aluminum approximated using our method.</p>
Full article ">
23 pages, 2614 KiB  
Review
Diagnostic Applications of Intraoral Scanners: A Systematic Review
by Francesca Angelone, Alfonso Maria Ponsiglione, Carlo Ricciardi, Giuseppe Cesarelli, Mario Sansone and Francesco Amato
J. Imaging 2023, 9(7), 134; https://doi.org/10.3390/jimaging9070134 - 3 Jul 2023
Cited by 15 | Viewed by 7391
Abstract
In addition to their recognized value for obtaining 3D digital dental models, intraoral scanners (IOSs) have recently been proven to be promising tools for oral health diagnostics. In this work, the most recent literature on IOSs was reviewed with a focus on their [...] Read more.
In addition to their recognized value for obtaining 3D digital dental models, intraoral scanners (IOSs) have recently been proven to be promising tools for oral health diagnostics. In this work, the most recent literature on IOSs was reviewed with a focus on their applications as detection systems of oral cavity pathologies. Those applications of IOSs falling in the general area of detection systems for oral health diagnostics (e.g., caries, dental wear, periodontal diseases, oral cancer) were included, while excluding those works mainly focused on 3D dental model reconstruction for implantology, orthodontics, or prosthodontics. Three major scientific databases, namely Scopus, PubMed, and Web of Science, were searched and explored by three independent reviewers. The synthesis and analysis of the studies was carried out by considering the type and technical features of the IOS, the study objectives, and the specific diagnostic applications. From the synthesis of the twenty-five included studies, the main diagnostic fields where IOS technology applies were highlighted, ranging from the detection of tooth wear and caries to the diagnosis of plaques, periodontal defects, and other complications. This shows how additional diagnostic information can be obtained by combining the IOS technology with other radiographic techniques. Despite some promising results, the clinical evidence regarding the use of IOSs as oral health probes is still limited, and further efforts are needed to validate the diagnostic potential of IOSs over conventional tools. Full article
(This article belongs to the Topic Digital Dentistry)
Show Figures

Figure 1

Figure 1
<p>PRISMA review workflow.</p>
Full article ">Figure 2
<p>Distribution of the included studies across the years. The number of studies is reported on the y-axis.</p>
Full article ">Figure 3
<p>Distribution of the main oral health pathologies/anomalies addressed in the included studies. The number of studies focusing on each pathology/anomaly is reported on the y-axis. The red circle highlights the two oral health topics occurring the most in the included studies (indeed, dental wear and dental caries diagnostics are addressed in almost 70% of the included studies).</p>
Full article ">Figure 4
<p>Main intraoral scanners used for dental caries detection. The number of studies adopting each specific scanner for caries evaluation is reported on the y-axis.</p>
Full article ">Figure 5
<p>The main intraoral scanners used for dental wear evaluation. The number of studies adopting each specific scanner for dental wear evaluation is reported on the y-axis.</p>
Full article ">Figure 6
<p>Main software used for the assessment of 3D dental models. The number of studies adopting each specific software tool is reported on the x-axis.</p>
Full article ">Figure 7
<p>The main software and tools used for dental wear evaluation.</p>
Full article ">
15 pages, 8102 KiB  
Article
Ambiguity in Solving Imaging Inverse Problems with Deep-Learning-Based Operators
by Davide Evangelista, Elena Morotti, Elena Loli Piccolomini and James Nagy
J. Imaging 2023, 9(7), 133; https://doi.org/10.3390/jimaging9070133 - 30 Jun 2023
Cited by 1 | Viewed by 1853
Abstract
In recent years, large convolutional neural networks have been widely used as tools for image deblurring, because of their ability in restoring images very precisely. It is well known that image deblurring is mathematically modeled as an ill-posed inverse problem and its solution [...] Read more.
In recent years, large convolutional neural networks have been widely used as tools for image deblurring, because of their ability in restoring images very precisely. It is well known that image deblurring is mathematically modeled as an ill-posed inverse problem and its solution is difficult to approximate when noise affects the data. Really, one limitation of neural networks for deblurring is their sensitivity to noise and other perturbations, which can lead to instability and produce poor reconstructions. In addition, networks do not necessarily take into account the numerical formulation of the underlying imaging problem when trained end-to-end. In this paper, we propose some strategies to improve stability without losing too much accuracy to deblur images with deep-learning-based methods. First, we suggest a very small neural architecture, which reduces the execution time for training, satisfying a green AI need, and does not extremely amplify noise in the computed image. Second, we introduce a unified framework where a pre-processing step balances the lack of stability of the following neural-network-based step. Two different pre-processors are presented. The former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. This framework is also formally characterized by mathematical analysis. Numerical experiments are performed to verify the accuracy and stability of the proposed approaches for image deblurring when unknown or not-quantified noise is present; the results confirm that they improve the network stability with respect to noise. In particular, the model-based framework represents the most reliable trade-off between visual precision and robustness. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
Show Figures

Figure 1

Figure 1
<p>A graphical draft highlighting the introduction of pre-processing steps Fi and St defining the proposed frameworks FiNN and StNN, respectively.</p>
Full article ">Figure 2
<p>A diagram representing the UNet and NAFNet architectures.</p>
Full article ">Figure 3
<p>From left to right: ground truth clean image, blurring kernel, blurred corrupted image.</p>
Full article ">Figure 4
<p>Results from experiment A with the three considered neural networks. <b>Upper row</b>: reconstruction from non-noisy data. <b>Lower row</b>: reconstruction from noisy data (<math display="inline"><semantics><mrow><mi>δ</mi><mo>=</mo><mn>0.025</mn></mrow></semantics></math>).</p>
Full article ">Figure 5
<p>Results from experiment A with UNet and 3L-SSNet.</p>
Full article ">Figure 6
<p>Results from experiment A. Plot of <math display="inline"><semantics><mrow><msub><mi mathvariant="script">E</mi><mi>ψ</mi></msub><mrow><mo>(</mo><msup><mi mathvariant="bold">x</mi><mrow><mi>g</mi><mi>t</mi></mrow></msup><mo>,</mo><msup><mi mathvariant="bold">y</mi><mi>δ</mi></msup><mo>)</mo></mrow><mo>−</mo><mi>η</mi></mrow></semantics></math> vs. <math display="inline"><semantics><mrow><mo>∥</mo><mi>e</mi><mo>∥</mo></mrow></semantics></math> for all the test images. The blue dashed line represents the bisect.</p>
Full article ">Figure 7
<p>Results from experiment B. On the <b>left</b>, tests on images with the same noise as in the training (<math display="inline"><semantics><mrow><mi>δ</mi><mo>=</mo><mn>0.025</mn></mrow></semantics></math>). On the <b>right</b>, tests on images with higher noise than in the training (<math display="inline"><semantics><mrow><mi>δ</mi><mo>=</mo><mn>0.075</mn></mrow></semantics></math>).</p>
Full article ">Figure 8
<p>Boxplots for the SSIM values in experiment B. The light blue, orange, and green boxplots represent the results computed by NN, FiNN, and StNN, respectively.</p>
Full article ">Figure 9
<p>Plots of the absolute error vs. the variance σ of the noise for one image in the test set. <b>Upper row</b>: experiment A. <b>Lower row</b>: experiment B.</p>
Full article ">
17 pages, 5941 KiB  
Article
Motion Vector Extrapolation for Video Object Detection
by Julian True and Naimul Khan
J. Imaging 2023, 9(7), 132; https://doi.org/10.3390/jimaging9070132 - 29 Jun 2023
Cited by 1 | Viewed by 2295
Abstract
Despite the continued successes of computationally efficient deep neural network architectures for video object detection, performance continually arrives at the great trilemma of speed versus accuracy versus computational resources (pick two). Current attempts to exploit temporal information in video data to overcome this [...] Read more.
Despite the continued successes of computationally efficient deep neural network architectures for video object detection, performance continually arrives at the great trilemma of speed versus accuracy versus computational resources (pick two). Current attempts to exploit temporal information in video data to overcome this trilemma are bottlenecked by the state of the art in object detection models. This work presents motion vector extrapolation (MOVEX), a technique which performs video object detection through the use of off-the-shelf object detectors alongside existing optical flow-based motion estimation techniques in parallel. This work demonstrates that this approach significantly reduces the baseline latency of any given object detector without sacrificing accuracy performance. Further latency reductions up to 24 times lower than the original latency can be achieved with minimal accuracy loss. MOVEX enables low-latency video object detection on common CPU-based systems, thus allowing for high-performance video object detection beyond the domain of GPU computing. Full article
(This article belongs to the Topic Visual Object Tracking: Challenges and Applications)
Show Figures

Figure 1

Figure 1
<p>Visualization of utilizing motion vectors to propagate a prior set of detections from a source frame to a consecutive frame (Motion vectors are denoted with green arrows. Red squares denote the detection bounding boxes). The only motion vectors considered in the source frame <span class="html-italic">i</span> are those which fall within the area of the bounding box. The median perturbation of those motion vectors is computed and applied to the source bounding box in order to predict the bounding box in frame <math display="inline"><semantics><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></semantics></math>.</p>
Full article ">Figure 2
<p>MOVEX implementation diagram.</p>
Full article ">Figure 3
<p>Simulated object detector latency versus AP (<b>a</b>) and average computation time for the proposed Movex approach on the MOT20 dataset (<b>b</b>). As the simulated object detector latency increased, the AP decreased, but the decrease only became apparent after increasing the latency to 200 ms. For reference, a data point with 0 ms of latency was included to depict the original performance of the object detector detections provided with the MOT20 dataset. The average computation time per frame rose as the object detector latency increased, and the prior update step became incredibly time consuming due to the size of the buffer accumulated during the object detector inference time.</p>
Full article ">Figure 4
<p>Latency versus frame number for a sample trace taken from the MOT20 dataset and the total computation time for first 100 frames of the sequence MOT20-02 using an object detector with 200 ms latency. The dashed line depicts the threshold considered real time for this particular sequence with a frame rate of 25 fps.</p>
Full article ">Figure 5
<p>Sample frames from MOT16 dataset (<b>left</b>) and MOT20 dataset (<b>right</b>). These samples represent the different types of challenges inherent to the datasets. MOT16 is composed of moving cameras with a moderate number of people, while MOT20 is composed of largely static cameras focusing on large clusters of people.</p>
Full article ">
19 pages, 2421 KiB  
Article
Automated Vehicle Counting from Pre-Recorded Video Using You Only Look Once (YOLO) Object Detection Model
by Mishuk Majumder and Chester Wilmot
J. Imaging 2023, 9(7), 131; https://doi.org/10.3390/jimaging9070131 - 27 Jun 2023
Cited by 16 | Viewed by 6822
Abstract
Different techniques are being applied for automated vehicle counting from video footage, which is a significant subject of interest to many researchers. In this context, the You Only Look Once (YOLO) object detection model, which has been developed recently, has emerged as a [...] Read more.
Different techniques are being applied for automated vehicle counting from video footage, which is a significant subject of interest to many researchers. In this context, the You Only Look Once (YOLO) object detection model, which has been developed recently, has emerged as a promising tool. In terms of accuracy and flexible interval counting, the adequacy of existing research on employing the model for vehicle counting from video footage is unlikely sufficient. The present study endeavors to develop computer algorithms for automated traffic counting from pre-recorded videos using the YOLO model with flexible interval counting. The study involves the development of algorithms aimed at detecting, tracking, and counting vehicles from pre-recorded videos. The YOLO model was applied in TensorFlow API with the assistance of OpenCV. The developed algorithms implement the YOLO model for counting vehicles in two-way directions in an efficient way. The accuracy of the automated counting was evaluated compared to the manual counts, and was found to be about 90 percent. The accuracy comparison also shows that the error of automated counting consistently occurs due to undercounting from unsuitable videos. In addition, a benefit–cost (B/C) analysis shows that implementing the automated counting method returns 1.76 times the investment. Full article
(This article belongs to the Special Issue Visual Localization—Volume II)
Show Figures

Figure 1

Figure 1
<p>The flow chart diagram of a successful vehicle detection by YOLO.</p>
Full article ">Figure 2
<p>The visualization of the first frame, mid-line, and parallel lines by the program. (<b>a</b>) The first frame of a video file displayed by the program. (<b>b</b>) A mid-line as displayed in yellow color and two parallel lines in white color, drawn by the user.</p>
Full article ">Figure 3
<p>A typical dotted centerline as displayed as a red dot and shown by a red arrow, and rectangular bounding boxes as displayed in purple color.</p>
Full article ">Figure 4
<p>Vehicle counts displayed on screen as shown as R2L and L2R.</p>
Full article ">Figure 5
<p>Error due to arrival and departure speed.</p>
Full article ">Figure 6
<p>Unsuitable camera angle and frame.</p>
Full article ">
30 pages, 10861 KiB  
Article
Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework
by Hayat Ullah and Arslan Munir
J. Imaging 2023, 9(7), 130; https://doi.org/10.3390/jimaging9070130 - 26 Jun 2023
Cited by 14 | Viewed by 3221
Abstract
Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown [...] Read more.
Vision-based human activity recognition (HAR) has emerged as one of the essential research areas in video analytics. Over the last decade, numerous advanced deep learning algorithms have been introduced to recognize complex human actions from video streams. These deep learning algorithms have shown impressive performance for the video analytics task. However, these newly introduced methods either exclusively focus on model performance or the effectiveness of these models in terms of computational efficiency, resulting in a biased trade-off between robustness and computational efficiency in their proposed methods to deal with challenging HAR problem. To enhance both the accuracy and computational efficiency, this paper presents a computationally efficient yet generic spatial–temporal cascaded framework that exploits the deep discriminative spatial and temporal features for HAR. For efficient representation of human actions, we propose an efficient dual attentional convolutional neural network (DA-CNN) architecture that leverages a unified channel–spatial attention mechanism to extract human-centric salient features in video frames. The dual channel–spatial attention layers together with the convolutional layers learn to be more selective in the spatial receptive fields having objects within the feature maps. The extracted discriminative salient features are then forwarded to a stacked bi-directional gated recurrent unit (Bi-GRU) for long-term temporal modeling and recognition of human actions using both forward and backward pass gradient learning. Extensive experiments are conducted on three publicly available human action datasets, where the obtained results verify the effectiveness of our proposed framework (DA-CNN+Bi-GRU) over the state-of-the-art methods in terms of model accuracy and inference runtime across each dataset. Experimental results show that the DA-CNN+Bi-GRU framework attains an improvement in execution time up to 167× in terms of frames per second as compared to most of the contemporary action-recognition methods. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
Show Figures

Figure 1

Figure 1
<p>A graphical overview of our proposed activity-recognition framework. The proposed framework consists of three main modules: CNN architecture, dual channel and spatial attention module, and bi-directional GRU network. The CNN module utilizes a dual-attention mechanism to effectively extract salient CNN features from video frames, whereas the bi-directional GRU network is used to learn the activity representation for hidden sequential patterns.</p>
Full article ">Figure 2
<p>The building blocks of dual attention blocks containing channel and spatial attention mechanisms in detail.</p>
Full article ">Figure 3
<p>Visual representation of the salient object-specific regions computed with our dual attention mechanism.</p>
Full article ">Figure 4
<p>The building block of bi-directional single GRU layer.</p>
Full article ">Figure 5
<p>Validation history of our proposed DA-CNN+Bi-GRU framework along with other tested baseline methods for 300 epochs over three benchmark action datasets: (<b>a</b>) Validation history for YouTube action dataset, (<b>b</b>) Validation history for UCF50 dataset, (<b>c</b>) Validation history for HMDB51 dataset, (<b>d</b>) Validation history for UCF101 dataset, and (<b>e</b>) Validation history for Kinetics-600 dataset.</p>
Full article ">Figure 6
<p>Confusion matrices computed for the proposed DA-CNN+Bi-GRU for the test sets of five tested datasets: (<b>a</b>) YouTube Action dataset, (<b>b</b>) HMDB51 dataset, (<b>c</b>) UCF50 dataset, (<b>d</b>) UCF101 dataset, and (<b>e</b>) Kinetics-600 dataset.</p>
Full article ">Figure 7
<p>Category-wise accuracy of the proposed DA-CNN+Bi-GRU on the test sets of five tested datasets: (<b>a</b>) YouTube Action dataset, (<b>b</b>) HMDB51 dataset, (<b>c</b>) UCF50 dataset, (<b>d</b>) UCF101 dataset, and (<b>e</b>) Kinetics-600 dataset.</p>
Full article ">Figure 8
<p>The visual recognition results of our proposed DA-CNN+Bi-GRU framework with predicted classes and their confidence scores for the test videos taken from the YouTube action, UCF50, and HMDB51 datasets.</p>
Full article ">
18 pages, 14748 KiB  
Article
A Joint De-Rain and De-Mist Network Based on the Atmospheric Scattering Model
by Linyun Gu, Huahu Xu and Xiaojin Ma
J. Imaging 2023, 9(7), 129; https://doi.org/10.3390/jimaging9070129 - 26 Jun 2023
Viewed by 1406
Abstract
Rain can have a detrimental effect on optical components, leading to the appearance of streaks and halos in images captured during rainy conditions. These visual distortions caused by rain and mist contribute significant noise information that can compromise image quality. In this paper, [...] Read more.
Rain can have a detrimental effect on optical components, leading to the appearance of streaks and halos in images captured during rainy conditions. These visual distortions caused by rain and mist contribute significant noise information that can compromise image quality. In this paper, we propose a novel approach for simultaneously removing both streaks and halos from the image to produce clear results. First, based on the principle of atmospheric scattering, a rain and mist model is proposed to initially remove the streaks and halos from the image by reconstructing the image. The Deep Memory Block (DMB) selectively extracts the rain layer transfer spectrum and the mist layer transfer spectrum from the rainy image to separate these layers. Then, the Multi-scale Convolution Block (MCB) receives the reconstructed images and extracts both structural and detailed features to enhance the overall accuracy and robustness of the model. Ultimately, extensive results demonstrate that our proposed model JDDN (Joint De-rain and De-mist Network) outperforms current state-of-the-art deep learning methods on synthetic datasets as well as real-world datasets, with an average improvement of 0.29 dB on the heavy-rainy-image dataset. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
Show Figures

Figure 1

Figure 1
<p>The general design network structure of Joint De-rain and De-mist Network (JDDN).</p>
Full article ">Figure 2
<p>Schematic diagram of the Deep Memory Block (DMB).</p>
Full article ">Figure 3
<p>Structure of Multi-scale Convolution Block (MCB).</p>
Full article ">Figure 4
<p>Comparison of the visual quality of the JDDN model on synthetic rainy-image dataset Rain200H.</p>
Full article ">Figure 5
<p>Comparison of visual quality of JDDN model on synthetic rainy-image dataset Rain200L.</p>
Full article ">Figure 6
<p>The results of JDDN on Rain1400 dataset.</p>
Full article ">Figure 7
<p>Enlarged details of JDDN’s test results on the Rain1400 dataset.</p>
Full article ">Figure 8
<p>Visual comparison of processing results for real-rain images on the SPA-data dataset.</p>
Full article ">Figure 9
<p>Visual comparison of the effect of removing the different modules.</p>
Full article ">
20 pages, 4934 KiB  
Article
Hybrid Classical–Quantum Transfer Learning for Cardiomegaly Detection in Chest X-rays
by Pierre Decoodt, Tan Jun Liang, Soham Bopardikar, Hemavathi Santhanam, Alfaxad Eyembe, Begonya Garcia-Zapirain and Daniel Sierra-Sosa
J. Imaging 2023, 9(7), 128; https://doi.org/10.3390/jimaging9070128 - 25 Jun 2023
Cited by 5 | Viewed by 4901
Abstract
Cardiovascular diseases are among the major health problems that are likely to benefit from promising developments in quantum machine learning for medical imaging. The chest X-ray (CXR), a widely used modality, can reveal cardiomegaly, even when performed primarily for a non-cardiological indication. Based [...] Read more.
Cardiovascular diseases are among the major health problems that are likely to benefit from promising developments in quantum machine learning for medical imaging. The chest X-ray (CXR), a widely used modality, can reveal cardiomegaly, even when performed primarily for a non-cardiological indication. Based on pre-trained DenseNet-121, we designed hybrid classical–quantum (CQ) transfer learning models to detect cardiomegaly in CXRs. Using Qiskit and PennyLane, we integrated a parameterized quantum circuit into a classic network implemented in PyTorch. We mined the CheXpert public repository to create a balanced dataset with 2436 posteroanterior CXRs from different patients distributed between cardiomegaly and the control. Using k-fold cross-validation, the CQ models were trained using a state vector simulator. The normalized global effective dimension allowed us to compare the trainability in the CQ models run on Qiskit. For prediction, ROC AUC scores up to 0.93 and accuracies up to 0.87 were achieved for several CQ models, rivaling the classical–classical (CC) model used as a reference. A trustworthy Grad-CAM++ heatmap with a hot zone covering the heart was visualized more often with the QC option than that with the CC option (94% vs. 61%, p < 0.001), which may boost the rate of acceptance by health professionals. Full article
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>Images from the cardiomegaly subset along with their counterpart from the control subset. First column: no positive label for any other finding. Three last columns: cases of pleural effusion, edema and lung opacity, which were the findings most frequently associated with cardiomegaly in the dataset.</p>
Full article ">Figure 2
<p>High-level model design. Six CXRs are represented (<b>A</b>–<b>F</b>) to describe the process output. Cardiomegaly is detected in (<b>A</b>,<b>C</b>,<b>F</b>).</p>
Full article ">Figure 3
<p>Training models for classification: On the left, a model based on pre-trained DenseNet-121. On the right, a model based on pre-trained AlexNet. In both versions, the flowchart forks into the classical and quantum versions of the trainable classifier. n: number of qubits.</p>
Full article ">Figure 4
<p>Qiskit rendering of the PQC with four qubits. After initialization in the ground state, all qubits are first placed in a superposition state by applying Hadamard gates (H). A feature map is produced by encoding each qubit by a φ rotation around the y-axis (Ry gates). Then, the ansatz consists of a series of entanglement by 2-qubit CNOT gates, each followed by a θ rotation around the <span class="html-italic">y</span>-axis at a quantum depth of 4.</p>
Full article ">Figure 5
<p>ROC curves obtained by 10-fold cross-validation in four CC models (test set).</p>
Full article ">Figure 6
<p>ROC curves obtained by 10-fold cross-validation in five CQ models (test set).</p>
Full article ">Figure 7
<p>Original CXRs (left) along with the corresponding Grad-CAM++ heatmaps obtained with the last convolutional layer from the three models compared for trusworthiness. (<b>a</b>): Normal heart. Large hot zone including the heart with the CC model, hot zones covering the heart with the CQ models. (<b>b</b>): Cardiomegaly and artificial pacemaker. Hot zones covering the heart with the three models. (<b>c</b>): Cardiomegaly. Hot zone in the right lung base for the CC model (example of non-trustworthy heatmap), hot zones covering the heart for the CQ models.</p>
Full article ">Figure 8
<p>(<b>a</b>) NGED for the quantum layer in the classifier in Qiskit four-qubit models with four-dimensional (4-dim) and two-dimensional output (2-dim), each with 24 trainable parameters. (<b>b</b>) Training loss curves observed in these models with and without freezer.</p>
Full article ">Figure A1
<p>ROC curves for the CC models by 70/30 train–test split: (<b>a</b>) Training set. (<b>b</b>) Test set.</p>
Full article ">Figure A2
<p>ROC curves for the QC models by 70/30 train–test split: (<b>a</b>) Training set. (<b>b</b>) Test set.</p>
Full article ">Figure A3
<p>Training loss curves and standard deviation for the CC (<b>a</b>) and QC (<b>b</b>) models.</p>
Full article ">Figure A4
<p>Upper left: confusion matrix for the training set observed for the CC model with Densenet 121 as a feature extractor. Upper right box: two CXRs labeled as control and predicted cardiomegaly. Lower box: 9 CXRs labeled as cardiomegaly and predicted control.</p>
Full article ">
17 pages, 4491 KiB  
Article
Measuring Dental Enamel Thickness: Morphological and Functional Relevance of Topographic Mapping
by Armen V. Gaboutchian, Vladimir A. Knyaz, Evgeniy N. Maschenko, Le Xuan Dac, Anatoly A. Maksimov, Anton V. Emelyanov, Dmitry V. Korost and Nikita V. Stepanov
J. Imaging 2023, 9(7), 127; https://doi.org/10.3390/jimaging9070127 - 23 Jun 2023
Cited by 1 | Viewed by 3031
Abstract
The interest in the development of dental enamel thickness measurement techniques is connected to the importance of metric data in taxonomic assessments and evolutionary research as well as in other directions of dental studies. At the same time, advances in non-destructive imaging techniques [...] Read more.
The interest in the development of dental enamel thickness measurement techniques is connected to the importance of metric data in taxonomic assessments and evolutionary research as well as in other directions of dental studies. At the same time, advances in non-destructive imaging techniques and the application of scanning methods, such as micro-focus-computed X-ray tomography, has enabled researchers to study the internal morpho-histological layers of teeth with a greater degree of accuracy and detail. These tendencies have contributed to changes in established views in different areas of dental research, ranging from the interpretation of morphology to metric assessments. In fact, a significant amount of data have been obtained using traditional metric techniques, which now should be critically reassessed using current technologies and methodologies. Hence, we propose new approaches for measuring dental enamel thickness using palaeontological material from the territories of northern Vietnam by means of automated and manually operated techniques. We also discuss method improvements, taking into account their relevance for dental morphology and occlusion. As we have shown, our approaches demonstrate the potential to form closer links between the metric data and dental morphology and provide the possibility for objective and replicable studies on dental enamel thickness through the application of automated techniques. These features are likely to be effective in more profound taxonomic research and for the development of metric and analytical systems. Our technique provides scope for its targeted application in clinical methods, which could help to reveal functional changes in the masticatory system. However, this will likely require improvements in clinically applicable imaging techniques. Full article
Show Figures

Figure 1

Figure 1
<p>Views of the <span class="html-italic">Gigantopithecus</span> tooth.</p>
Full article ">Figure 2
<p>Views of the orangutan tooth.</p>
Full article ">Figure 3
<p>Three-dimensional views of the coronal parts of the teeth with the dental enamel thickness mapped in colour scale. (<b>a</b>) Orangutan lower left second molar; (<b>b</b>) <span class="html-italic">Gigantopithecus</span> lower right second molar.</p>
Full article ">Figure 4
<p>Cervical edge of the enamel cap detected for the 3D model orientation.</p>
Full article ">Figure 5
<p>Occlusal surface contour of the <span class="html-italic">Gigantopithecus</span> tooth on the enamel (<b>a</b>) and dentine (<b>b</b>) surfaces. (<b>a</b>) Occlusal contour on the enamel; and (<b>b</b>) dentine.</p>
Full article ">Figure 6
<p>Orangutan tooth with sections on its occlusal surface and the midline set in green points.</p>
Full article ">Figure 7
<p>Two-dimensional model of the deepest point (yellow) on the enamel occlusal surface of the orangutan molar.</p>
Full article ">Figure 8
<p>Models of the occlusal surface and midline contours depicting the outer enamel (<b>a</b>) and dentine surfaces (<b>b</b>). (<b>a</b>) Occlusal and midline contours on the enamel; (<b>b</b>) occlusal and midline contours on the dentine.</p>
Full article ">Figure 9
<p>The bucco–lingual section of the <span class="html-italic">Gigantopithecus</span>’ lower molar (<b>left</b>) with the mapped parameters used for the enamel thickness measurements (<b>right</b>). 1—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>B</mi> <mi>A</mi> </mrow> </msub> </semantics></math>—enamel area in the buccal sector; 2—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>L</mi> <mi>A</mi> </mrow> </msub> </semantics></math>—enamel area in the lingual sector; 3—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>B</mi> <mi>C</mi> <mi>L</mi> <mi>o</mi> <mi>e</mi> <mi>s</mi> </mrow> </msub> </semantics></math>—outer enamel surface contour length in the buccal sector; 4—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>S</mi> <mi>B</mi> <mi>C</mi> <mi>L</mi> <mi>e</mi> <mi>d</mi> <mi>j</mi> </mrow> </msub> </semantics></math>—enamel–dentine junction contour length in the buccal sector; 5—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>S</mi> <mi>L</mi> <mi>C</mi> <mi>L</mi> <mi>o</mi> <mi>e</mi> <mi>s</mi> </mrow> </msub> </semantics></math>—outer enamel surface contour length in the lingual sector; 6—<math display="inline"><semantics> <msub> <mi>S</mi> <mrow> <mi>S</mi> <mi>L</mi> <mi>C</mi> <mi>L</mi> <mi>e</mi> <mi>d</mi> <mi>j</mi> </mrow> </msub> </semantics></math>—enamel–dentine junction contour length in the lingual sector.</p>
Full article ">
15 pages, 7229 KiB  
Article
Fast Reservoir Characterization with AI-Based Lithology Prediction Using Drill Cuttings Images and Noisy Labels
by Ekaterina Tolstaya, Anuar Shakirov, Mokhles Mezghani and Sergey Safonov
J. Imaging 2023, 9(7), 126; https://doi.org/10.3390/jimaging9070126 - 21 Jun 2023
Cited by 1 | Viewed by 2262
Abstract
In this paper, we considered one of the problems that arise during drilling automation, namely the automation of lithology identification from drill cuttings images. Usually, this work is performed by experienced geologists, but this is a tedious and subjective process. Drill cuttings are [...] Read more.
In this paper, we considered one of the problems that arise during drilling automation, namely the automation of lithology identification from drill cuttings images. Usually, this work is performed by experienced geologists, but this is a tedious and subjective process. Drill cuttings are the cheapest source of rock formation samples; therefore, reliable lithology prediction can greatly reduce the cost of analysis during drilling. To predict the lithology content from images of cuttings samples, we used a convolutional neural network (CNN). For training a model with an acceptable generalization ability, we applied dataset-cleaning techniques, which help to reveal bad samples, as well as samples with uncertain labels. It was shown that the model trained on a cleaned dataset performs better in terms of accuracy. Data cleaning was performed using a cross-validation technique, as well as a clustering analysis of embeddings, where it is possible to identify clusters with distinctive visual characteristics and clusters where visually similar samples of rocks are attributed to different lithologies during the labeling process. Full article
(This article belongs to the Section AI in Imaging)
Show Figures

Figure 1

Figure 1
<p>Workflow for predicting rock lithology from drill cuttings images by means of deep learning.</p>
Full article ">Figure 2
<p>Dataset balancing.</p>
Full article ">Figure 3
<p>Deep learning model architecture. N_lith stands for total number of lithotypes to predict.</p>
Full article ">Figure 4
<p>Examples of drill cutting images for different lithotypes.</p>
Full article ">Figure 5
<p>Bar plot representing total number of samples per lithotype within the dataset.</p>
Full article ">Figure 6
<p>Example of distinct visual appearance for siltstone coming from different depth intervals and formations.</p>
Full article ">Figure 7
<p>Accuracy curves for classifier training on 100% single lithotypes images (<b>left panel</b>); confusion matrix (<b>right panel</b>).</p>
Full article ">Figure 8
<p>Examples of bad images with mud contamination.</p>
Full article ">Figure 9
<p>Plot of RMSE error versus prediction confidence for one of the cross-validation runs.</p>
Full article ">Figure 10
<p>Distribution of lithotypes in clusters.</p>
Full article ">Figure 11
<p>Cluster with 72 samples of argillaceous limestone and 17 samples of limestone (<b>top row</b>, from train set; <b>bottom row</b>, from test set; true label in brackets). Here, visually similar images have different labels; therefore, labeling is not reliable.</p>
Full article ">Figure 12
<p>Cluster with 86 samples of dolomitic limestone and 83 samples of limestone (<b>top row</b>, from train set; <b>bottom row</b>, from test set; true label in brackets). Here, visually similar images have different labels; therefore, labeling is not reliable.</p>
Full article ">Figure 13
<p>Cluster where the amount of most represented lithotype is larger (in our experiment, 10 times larger) than second-most represented lithotype, providing good classification results (<b>top row</b>, from train set; <b>bottom row</b>, from test set; true label in brackets).</p>
Full article ">Figure 14
<p>Bar plot representing total number of samples per sub-cluster of lithotypes.</p>
Full article ">Figure 15
<p>Results of data balancing for original dataset.</p>
Full article ">Figure 16
<p>Results of data balancing for dataset with sublabels. Numbers after lithology short names indicate the relevance to the clusters.</p>
Full article ">Figure 17
<p>Accuracy curves for classifier training on the dataset without (<b>A</b>) and with (<b>B</b>) additional sublabels after data balancing.</p>
Full article ">
12 pages, 1692 KiB  
Article
Quantifying the Displacement of Data Matrix Code Modules: A Comparative Study of Different Approximation Approaches for Predictive Maintenance of Drop-on-Demand Printing Systems
by Peter Bischoff, André V. Carreiro, Christiane Schuster and Thomas Härtling
J. Imaging 2023, 9(7), 125; https://doi.org/10.3390/jimaging9070125 - 21 Jun 2023
Viewed by 1746
Abstract
Drop-on-demand printing using colloidal or pigmented inks is prone to the clogging of printing nozzles, which can lead to positional deviations and inconsistently printed patterns (e.g., data matrix codes, DMCs). However, if such deviations are detected early, they can be useful for determining [...] Read more.
Drop-on-demand printing using colloidal or pigmented inks is prone to the clogging of printing nozzles, which can lead to positional deviations and inconsistently printed patterns (e.g., data matrix codes, DMCs). However, if such deviations are detected early, they can be useful for determining the state of the print head and planning maintenance operations prior to reaching a printing state where the printed DMCs are unreadable. To realize this predictive maintenance approach, it is necessary to accurately quantify the positional deviation of individually printed dots from the actual target position. Here, we present a comparison of different methods based on affinity transformations and clustering algorithms for calculating the target position from the printed positions and, subsequently, the deviation of both for complete DMCs. Hence, our method focuses on the evaluation of the print quality, not on the decoding of DMCs. We compare our results to a state-of-the-art decoding algorithm, adopted to return the target grid positions, and find that we can determine the occurring deviations with significantly higher accuracy, especially when the printed DMCs are of low quality. The results enable the development of decision systems for predictive maintenance and subsequently the optimization of printing systems. Full article
(This article belongs to the Topic Computer Vision and Image Processing)
Show Figures

Figure 1

Figure 1
<p>Visualization of the different positional error modes on a specially printed test pattern of 16 columns and 20 rows (partially displayed): a fading shift of the first few dots (e.g., third and fourth column from left) (<b>a</b>) and a constant shift of one or more columns (<b>b</b>).</p>
Full article ">Figure 2
<p>(<b>a</b>) A standard data matrix code with connected modules. (<b>b</b>) A dot-printed data matrix code printed with the DoD process analyzed here.</p>
Full article ">Figure 3
<p>Flowchart of the methodology for the performance evaluation of our algorithms and comparison against the Adapted CDCR algorithm. After creating the synthetic DMCs with the target positions <span class="html-italic">T</span>, deviations are applied for the printed positions <span class="html-italic">P</span>. The algorithms are evaluated by the distance between the approximated target positions <math display="inline"><semantics> <msub> <mi>T</mi> <mrow> <mi>a</mi> <mi>p</mi> <mi>p</mi> <mi>r</mi> <mi>o</mi> <mi>x</mi> </mrow> </msub> </semantics></math> and the actual target positions <span class="html-italic">T</span>.</p>
Full article ">Figure 4
<p>Applying the algorithms to actually printed DMCs after preprocessing by localizing the region of interest (ROI) and segmenting the ROI to find the center positions of each blob. The printed positions <span class="html-italic">P</span> are rotated until the smallest bounding box is found to ensure that the DMCs finder pattern is parallel to the image axes. After applying the algorithms, the found positions are rotated by the inverse rotation matrix from the previous step to match <span class="html-italic">P</span>.</p>
Full article ">Figure 5
<p>Runtime of the compared algorithms for a single data matrix code averaged over 100 runs, including the standard deviation of the runtime.</p>
Full article ">Figure 6
<p>Distribution of the number of matches of real target positions and approximated target positions <span class="html-italic">T</span> after applying a threshold of <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>/</mo> <mn>6</mn> </mrow> </semantics></math> of the module size for each of the tested algorithms. (<b>a</b>) shows the distributions considering only the fading shift error while (<b>b</b>) shows the distributions of matches when analyzing only the constant shift error. The blue bar displays the range between the first and the third quartile. The whiskers span over 1.5 times the interquartile range (IQR) (<math display="inline"><semantics> <mrow> <mi>Q</mi> <mn>3</mn> <mo>−</mo> <mi>Q</mi> <mn>1</mn> </mrow> </semantics></math>) and the outer contour is approximated through kernel density estimation (KDE).</p>
Full article ">Figure 7
<p>Comparison of the best-performing algorithm from our work to an adapted version of the best-performing algorithm with respect to decoding DMCs from [<a href="#B6-jimaging-09-00125" class="html-bibr">6</a>], over all simulations described in <a href="#sec2dot2dot1-jimaging-09-00125" class="html-sec">Section 2.2.1</a>.</p>
Full article ">Figure 8
<p>Comparison of the time series analysis by both versions of both the clustering- and the affine-transformation-based methods. The differences in absolute values are not of as much interest as the variance along the time axis.</p>
Full article ">
Previous Issue
Next Issue
Back to TopTop