ELCVIA Electronic Letters on Computer Vision and Image Analysis

Deep Learning based-framework for Math Formulas Understanding

2024-11-07T01:16:20+00:00

Extracting mathematical formulas from images of scientific documents and converting them into structured data for storage in a database is essential for their further use. However, recognizing and extracting math formulas automatically, rapidly, and effectively can be challenging. To address this problem, we propose a deep learning system that uses formula combination features to train the YOLOv8 model. This system detects and classifies formulas both inside the text (embedded) and outside it (isolated). Once the formulas are extracted, a robust end-to-end math formula recognition system automatically identifies and classifies math symbols using Faster R-CNN object detection, and then applies a Convolutional Graph Neural Network (ConvGNN) to analyze the formula layout, since a formula is better represented as a graph with complex relationships and object interdependencies. ConvGNN can predict formula linkages without resorting to laborious feature engineering. Experimental results on the IBEM and CROHME 2019 datasets show that the proposed approach can accurately extract isolated formulas with an mAP of 99.3%, extract embedded formulas with an mAP of 80.3%, detect symbols with an mAP of 87.3%, and analyze formula layout with an accuracy of 92%. We also show that our system is competitive with related work.
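The mAP figures reported above depend on matching predicted boxes to ground-truth boxes by their overlap. As a minimal sketch of that overlap measure (intersection over union), assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the function name `iou` and the box format are illustrative assumptions, not taken from the paper:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption
    made for illustration only.
    """
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold used for the results above is not stated in the abstract.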

Dr DAH-Unet: A modified UNet for Semantic Segmentation of MRI images for brain tumour detection

2024-08-16T14:45:46+00:00

Applying sophisticated image processing techniques to brain MR images for medical image segmentation significantly improves the ability to detect tumors. Manually segmenting a brain tumor is time-consuming and requires a doctor's training and experience. To address this issue, we propose a modification of the Unet architecture, called DAH-Unet, that combines residual blocks, a rebuilt atrous spatial pyramid pooling (ASPP) module, and depth-wise convolutions. We also propose a hybrid loss function that is explicitly aware of boundaries. Experiments were conducted on two publicly available datasets, and the model performed better on some metrics than existing semantic segmentation models.
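ASPP applies atrous (dilated) convolutions at several rates in parallel so that the same kernel covers different receptive fields. The following is a simplified 1D sketch of that idea, not the rebuilt ASPP module of DAH-Unet: the function names `dilated_conv1d` and `aspp_like` are hypothetical, no padding is used, and the per-rate outputs are returned separately rather than concatenated as feature channels.

```python
def dilated_conv1d(signal, kernel, rate):
    """Valid 1D cross-correlation with kernel taps spaced `rate` samples apart."""
    # Number of input samples spanned by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * rate] for i, k in enumerate(kernel)))
    return out


def aspp_like(signal, kernel, rates):
    """Apply the same kernel at several dilation rates (illustrative only)."""
    return {rate: dilated_conv1d(signal, kernel, rate) for rate in rates}


signal = [1, 2, 3, 4, 5, 6, 7]
branches = aspp_like(signal, kernel=[1, 1, 1], rates=[1, 2, 3])
# With rate 1 the kernel covers 3 neighbouring samples; with rate 3 it spans all 7.
```

Larger rates widen the receptive field without adding kernel weights, which is the property ASPP exploits.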

An Efficient Deep Learning based License Plate Recognition for Smart Cities

2024-08-16T17:31:42+00:00

Computer vision algorithms combined with deep learning technologies have enabled a wide range of applications. Currently, with the high load of vehicle traffic, it is very difficult to trace and capture vehicular information through traffic surveillance on roads, in parking areas, or for safety purposes. Here, we explore such a use case, in which a deep learning model is trained to detect and recognize the license plate of a vehicle. In the proposed method, an object detection model, EfficientDet-D0, is trained on a custom dataset for license plate detection, and an optical character recognition model, Tesseract, is used for character recognition. We use a novel license plate extraction algorithm that reduces false localization, followed by character recognition in a pipeline. We also explore model quantization to compress the model at reduced precision for efficient edge-based deployment in an end application. Our study focuses on Indian vehicles; we evaluate performance on standard datasets such as CCPD and UFPR and achieve 97.9% in license plate localization and 95.15% in end-to-end detection and recognition. We implemented the system on Raspberry Pi 3 and NVIDIA Jetson Nano devices with improved performance. Compared with the state of the art, we achieve 2×, 3.8× and 2.5× on CPU, GPU and edge platforms, respectively.
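The abstract does not state which quantization scheme was used. As a rough illustration of what compressing a model "at reduced precision" can mean, the sketch below applies generic affine quantization of floating-point weights to 8-bit integers; the function names `quantize_int8` and `dequantize` and the sample weights are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Affine quantization of floats to int8; returns (values, scale, zero_point)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Fall back to a scale of 1.0 when all weights are zero.
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = round(-128 - w_min / scale)
    values = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return values, scale, zero_point


def dequantize(values, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(v - zero_point) * scale for v in values]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_values, scale, zero_point = quantize_int8(weights)
restored = dequantize(q_values, scale, zero_point)
```

Each weight is stored in one byte instead of four, at the cost of a reconstruction error of roughly one quantization step at most.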

A Labeled Array Distance Metric for Measuring Image Segmentation Quality

2024-09-20T09:50:41+00:00

This work introduces two new distance metrics for comparing labeled arrays, which are common outputs of image segmentation algorithms. Each pixel in an image is assigned a label, with binary segmentation providing only two labels ('foreground' and 'background'). These can be represented by a simple binary matrix and compared using pixel differences. However, many segmentation algorithms output multiple regions in a labeled array. We propose two distance metrics, named LAD and MADLAD, that calculate the distance between two labeled images. By doing so, the accuracy of different image segmentation algorithms can be evaluated by measuring their outputs against a 'ground truth' labeling. Both proposed metrics, operating with a complexity of O(N) for images with N pixels, are designed to quickly identify similar labeled arrays, even when different labeling methods are used. Comparisons are made between images labeled manually and those labeled by segmentation algorithms. This evaluation is crucial when searching through a space of segmentation algorithms and their hyperparameters via a genetic algorithm to identify the optimal solution for automated segmentation, which is the goal in our lab, SEE-Insight. By measuring the distance from the ground truth, these metrics help determine which algorithm provides the most accurate segmentation.
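To illustrate why plain pixel differences suffice for binary masks but not for multi-region labeled arrays, consider the sketch below. It is not the LAD or MADLAD metric, whose definitions are not given in this abstract; `pixel_difference` and `same_partition` are hypothetical helpers, and the small arrays are made-up examples.

```python
def pixel_difference(a, b):
    """Count positions where two equally sized 2D label arrays disagree."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def same_partition(a, b):
    """Check in one pass whether two labelings describe the same regions.

    Two arrays match when there is a one-to-one mapping between their labels.
    """
    forward, backward = {}, {}
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            # setdefault records the first pairing and returns it afterwards.
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                return False
    return True


truth = [[0, 0, 1],
         [0, 2, 2]]
relabeled = [[5, 5, 7],
             [5, 9, 9]]  # same regions as `truth`, different label ids

naive = pixel_difference(truth, relabeled)   # every pixel counts as different
matches = same_partition(truth, relabeled)   # the regions are in fact identical
```

The naive count treats the two arrays as completely different even though they segment the image identically, which is the situation a labeling-independent distance is meant to handle.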