ELCVIA Electronic Letters on Computer Vision and Image Analysis https://elcvia.cvc.uab.cat/ Electronic Journal on Computer Vision and Image Analysis en-US Authors who publish with this journal agree to the following terms:<br /><ol type="a"><li>Authors retain copyright.</li><li>The texts published in this journal are – unless indicated otherwise – covered by the Creative Commons <a href="http://creativecommons.org/licenses/by-nc-nd/4.0">Attribution-NonCommercial-NoDerivatives 4.0</a> licence. You may copy, distribute, transmit and adapt the work, provided you attribute it (authorship, journal name, publisher) in the manner specified by the author(s) or licensor(s). The full text of the licence can be consulted here: <a href="http://creativecommons.org/licenses/by-nc-nd/4.0">http://creativecommons.org/licenses/by-nc-nd/4.0</a>.</li><li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li><li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_new">The Effect of Open Access</a>).</li></ol> elcvia@cvc.uab.cat (Electronic Letters on Computer Vision and Image Analysis) elcvia@cvc.uab.cat (ELCVIA) Fri, 06 Sep 2024 07:58:29 +0000 OJS 3.2.1.4 http://blogs.law.harvard.edu/tech/rss 60 Deep Learning based-framework for Math Formulas Understanding https://elcvia.cvc.uab.cat/article/view/1833 <p>Extracting mathematical formulas from images of scientific documents and converting them into structured data for database storage is essential for their further use. However, recognizing and extracting math formulas automatically, rapidly, and effectively is challenging. To address this problem, we propose a deep learning system that trains a YOLOv8 model on combined formula features to detect and classify formulas both embedded in and isolated from the text. Once formulas are extracted, a robust end-to-end recognition system automatically identifies and classifies math symbols using Faster R-CNN object detection, and a Convolutional Graph Neural Network (ConvGNN) then analyzes the formula layout, since a formula is best represented as a graph with complex relationships and object interdependencies. The ConvGNN can predict formula linkages without resorting to laborious feature engineering. Experimental results on the IBEM and CROHME 2019 datasets show that the proposed approach extracts isolated formulas with an mAP of 99.3%, extracts embedded formulas with an mAP of 80.3%, detects symbols with an mAP of 87.3%, and analyzes formula layout with an accuracy of 92%. We also show that our system is competitive with related work.</p> Kawther Khazri Ayeb, Afef Kacem, Mm Takwa Ben Aicha Copyright (c) 2024 Afef Kacem https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1833 Fri, 06 Sep 2024 00:00:00 +0000
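<p>To make the graph-based layout analysis concrete, the sketch below shows one way a detected formula could be encoded as a graph of symbol nodes and spatial-relation edges, the kind of structure a ConvGNN-style model consumes. This is a minimal illustration in Python; the class names, relation vocabulary, and coordinates are assumptions for exposition, not the authors' actual data structures.</p>
<pre><code>
# Illustrative only: a formula as a graph of detected symbols (nodes) and
# spatial relations (edges). Relation names here are assumed, not taken
# from the paper.
from dataclasses import dataclass, field

@dataclass
class SymbolNode:
    label: str   # symbol class predicted by the detection stage
    bbox: tuple  # (x_min, y_min, x_max, y_max) in page coordinates

@dataclass
class FormulaGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_idx, dst_idx, relation)

    def add_symbol(self, label, bbox):
        self.nodes.append(SymbolNode(label, bbox))
        return len(self.nodes) - 1

    def relate(self, src, dst, relation):
        # relation drawn from a small vocabulary such as
        # "right", "superscript", "subscript", "above", "below"
        self.edges.append((src, dst, relation))

# Encode "a^2 + b": four symbol nodes, three relation edges.
g = FormulaGraph()
a = g.add_symbol("a", (0, 10, 8, 20))
two = g.add_symbol("2", (9, 2, 13, 8))
plus = g.add_symbol("+", (16, 10, 24, 20))
b = g.add_symbol("b", (27, 10, 35, 20))
g.relate(a, two, "superscript")
g.relate(a, plus, "right")
g.relate(plus, b, "right")

for s, d, r in g.edges:
    print(g.nodes[s].label, r, g.nodes[d].label)
</code></pre>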
DAH-Unet: A modified UNet for Semantic Segmentation of MRI images for brain tumour detection https://elcvia.cvc.uab.cat/article/view/1755 <p>Using sophisticated image processing techniques on brain MR images for medical image segmentation significantly improves the ability to detect tumours. Manually segmenting a brain tumour takes a great deal of time and requires a doctor's training and experience. To address this issue, we propose a modification of the UNet architecture, called DAH-Unet, that combines residual blocks, a redesigned atrous spatial pyramid pooling (ASPP) module, and depth-wise convolutions. We also propose a hybrid loss function that is explicitly boundary-aware. Experiments conducted on two publicly available datasets show improvements on several metrics compared with existing semantic segmentation models.</p> Mohankrishna Potnuru, B. Suribabu Naick Copyright (c) 2024 Mohankrishna Potnuru, B. Suribabu Naick https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1755 Tue, 12 Nov 2024 00:00:00 +0000 An Efficient Deep Learning based License Plate Recognition for Smart Cities https://elcvia.cvc.uab.cat/article/view/1917 <p>Computer vision algorithms combined with deep learning enable a wide range of applications. With today's heavy vehicle traffic, it is difficult to trace and capture vehicle information through traffic surveillance on roads, in parking areas, or for safety purposes. Here we explore such a use case, training a deep learning model to detect and recognize vehicle license plates. In the proposed method, an EfficientDet-D0 object detection model is trained on a custom dataset for license plate detection, and the Tesseract optical character recognition engine reads the detected plates. A novel license plate extraction algorithm reduces false localizations before character recognition in a pipelined manner. We also explore model quantization to compress the model to reduced precision for efficient edge deployment in an end application. The study is dedicated to Indian vehicles and is evaluated on standard datasets such as CCPD and UFPR, achieving 97.9% in license plate localization and 95.15% in end-to-end detection and recognition. The system is implemented on Raspberry Pi 3 and NVIDIA Jetson Nano devices with improved performance. Compared with the state of the art, we achieve 2×, 3.8×, and 2.5× improvements on CPU, GPU, and edge platforms, respectively.</p> Swati, Shubh Dinesh Kawa, Shubham Kamble, Darshit Desai, Pratik Himanshu Karelia, Pinalkumar Engineer Copyright (c) 2024 Swati, Shubh Dinesh Kawa, Shubham Kamble, Darshit Desai, Pratik Himanshu Karelia, Pinalkumar Engineer https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1917 Tue, 12 Nov 2024 00:00:00 +0000
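<p>The detect-then-recognize pipeline described above can be summarized in a short sketch. Note that <code>detect_plate</code> below is only a placeholder for the trained EfficientDet-D0 detector (whose weights are not part of this feed); recognition is delegated to Tesseract through the pytesseract bindings, and the file name and fixed crop are assumptions for illustration.</p>
<pre><code>
# Minimal sketch of the detection -> crop -> OCR pipeline, not the authors'
# implementation. detect_plate() stands in for the EfficientDet-D0 model.
import cv2
import pytesseract

def detect_plate(image):
    """Placeholder detector: returns a single (x, y, w, h) box.
    A real EfficientDet-D0 model would return scored candidate boxes."""
    h, w = image.shape[:2]
    return int(w * 0.3), int(h * 0.6), int(w * 0.4), int(h * 0.15)

def read_plate(image_path):
    image = cv2.imread(image_path)
    x, y, w, h = detect_plate(image)
    plate = image[y:y + h, x:x + w]
    # Binarize the crop; Tesseract copes better with high-contrast input.
    gray = cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 treats the crop as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7").strip()

print(read_plate("car.jpg"))  # "car.jpg" is a placeholder path
</code></pre>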
A Labeled Array Distance Metric for Measuring Image Segmentation Quality https://elcvia.cvc.uab.cat/article/view/1941 <p>This work introduces two new distance metrics for comparing labeled arrays, a common output of image segmentation algorithms. Each pixel in an image is assigned a label; binary segmentation provides only two labels ('foreground' and 'background'), which can be represented by a simple binary matrix and compared using pixel differences. However, many segmentation algorithms output multiple regions in a labeled array. We propose two distance metrics, named LAD and MADLAD, that calculate the distance between two labeled images. By doing so, the accuracy of different image segmentation algorithms can be evaluated by measuring their outputs against a 'ground truth' labeling. Both proposed metrics, operating with a complexity of <em>O(N)</em> for images with <em>N</em> pixels, are designed to quickly identify similar labeled arrays even when different labeling methods are used. Comparisons are made between images labeled manually and those labeled by segmentation algorithms. This evaluation is crucial when searching a space of segmentation algorithms and their hyperparameters via a genetic algorithm to identify the optimal solution for automated segmentation, which is the goal of our lab, SEE-Insight. By measuring the distance from the ground truth, these metrics help determine which algorithm provides the most accurate segmentation.</p> Maryam Berijanian, Katrina Gensterblum, Doruk Alp Mutlu, Katelyn Reagan, Andrew Hart, Dirk Colbry Copyright (c) 2024 Maryam Berijanian, Katrina Gensterblum, Doruk Alp Mutlu, Katelyn Reagan, Andrew Hart, Dirk Colbry https://creativecommons.org/licenses/by-nc-nd/4.0 https://elcvia.cvc.uab.cat/article/view/1941 Tue, 12 Nov 2024 00:00:00 +0000
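<p>The precise LAD and MADLAD definitions are given in the article itself; the sketch below only illustrates the core difficulty they address: two segmentations can be identical up to a renumbering of labels, so a useful distance cannot compare raw label values pixel by pixel. The one-to-one greedy matching used here is an assumption for illustration, not the paper's metric.</p>
<pre><code>
# Illustrative only: a label-permutation-tolerant disagreement score,
# NOT the LAD or MADLAD metric defined in the article.
import numpy as np

def relabeled_disagreement(a, b):
    """Fraction of pixels still disagreeing after greedily matching labels
    of `a` one-to-one with labels of `b` by pixel co-occurrence."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    # Unique (label_a, label_b) pairs and how many pixels carry each pair.
    pairs, counts = np.unique(np.stack([a, b]), axis=1, return_counts=True)
    used_a, used_b, agreeing = set(), set(), 0
    for i in np.argsort(-counts):  # largest overlaps claim labels first
        la, lb = int(pairs[0, i]), int(pairs[1, i])
        if la not in used_a and lb not in used_b:
            used_a.add(la)
            used_b.add(lb)
            agreeing += int(counts[i])
    return 1.0 - agreeing / a.size

seg_a = np.array([[1, 1, 2],
                  [1, 2, 2]])
seg_b = np.array([[7, 7, 3],   # same regions as seg_a, renumbered
                  [7, 3, 3]])
print(relabeled_disagreement(seg_a, seg_b))                 # 0.0
print(relabeled_disagreement(seg_a, np.zeros_like(seg_a)))  # 0.5
</code></pre>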