Ayush Das Report
AYUSH DAS
19MIM10021
January - 2024
Declaration
I hereby declare that the thesis Animal Species Prediction Using Deep Learning
submitted by Ayush Das (19MIM10021) to the School of Computing Science and
Engineering, VIT Bhopal University, Madhya Pradesh - 466114 in partial fulfillment of the
requirements for the award of Integrated Master of Technology in Artificial
Intelligence is a bona-fide record of the work carried out by me under the supervision of
Dr. Rudra Kalyan Nayak. I further declare that the work reported in this thesis, has not
been submitted and will not be submitted, either in part or in full, for the award of any
other degree or diploma of this institute or of any other institute or University.
Signature:
Name of the candidate: Ayush Das
Register Number: 19MIM10021
Date:
School of Computing Science and Engineering
Certificate
This is to certify that the thesis titled Animal Species Prediction Using Deep Learning
submitted by Ayush Das (19MIM10021) to VIT Bhopal University, in partial fulfillment of the
requirement for the award of the degree of Integrated Master of Technology in Artificial
Intelligence is a bona-fide work carried out under my supervision. The thesis fulfills the
requirements as per the regulations of this University and in my opinion meets the necessary
standards for submission. The contents of this thesis have not been submitted and will not be
submitted either in part or in full, for the award of any other degree or diploma and the same is
certified.
Signature Signature
Name: Name:
Designation: Designation:
Date: Date:
Examiner
Signature
Name:
Designation:
Date:
Abstract
Observing animals is a popular pastime, and field guides are helpful in identifying the
species they belong to. To assess biodiversity richness, track endangered species, and
investigate how climate change affects species distribution in a given area, accurate animal
species identification is essential. Finding and categorizing animal species is the first step
in figuring out how long they will survive and how humans might be affecting them.
Second, it helps people distinguish between animals that are predators and those that are
not, some of which are very dangerous to people and the environment. Thirdly, animals
frequently spotted on roads cause multiple collisions with cars. Several challenges must be
overcome in order to identify and classify animal species, including differences in behavior
and size between species. Camera traps, a passive monitoring method, produce millions of
ecological photos. Manual evaluation of such huge datasets is impractical given the high
quantity of photos. In recent years, advances in deep learning networks have enabled them
to achieve state-of-the-art outcomes in computer vision tasks, including the identification
of objects and species. With the goal of developing an adaptable and successful method for
predicting the animal type, this study looks into cutting-edge deep learning techniques: a
convolutional neural network (CNN) was trained to identify salient elements in animal
photos. The dataset was gathered via Kaggle. The workflow encompasses contemporary
exploratory data analysis, data preprocessing, and the development and training of deep
neural networks. During the picture processing and analysis stage, CNNs are used in a
novel way. The neural network, trained to identify and categorize species, enables
automated classification, which greatly speeds up the process of identifying species.
Acknowledgement
It is indeed with a great sense of pleasure and an immense sense of gratitude that I take
this opportunity to thank Dr. Rudra Kalyan Nayak for his support and advice throughout this work.
AYUSH DAS
19MIM10021
Contents
DECLARATION ⅱ
ABSTRACT ⅳ
LIST OF TABLES ⅷ
LIST OF FIGURES ⅸ
1 INTRODUCTION 1
2. LITERATURE REVIEW 4
3. THEORETICAL BACKGROUND/METHODOLOGY 9
3.1 THEORETICAL BACKGROUND 9
3.2 METHODOLOGY 10
3.2.1 EXPLORATORY DATA ANALYSIS 10
3.2.2 DATA PREPROCESSING 14
List of Tables
List of Figures
CHAPTER 1
Introduction
decreased productivity
CHAPTER 2
Literature Review
In paper [1] the researchers Yao, S. et al. aim to classify objects belonging to
the same species through the use of fine-grained visual categorization. This
innovative approach required only the original picture as input, yet it was
capable of autonomously producing visually discernible descriptions that were
sufficient for intricate visual classification. Large-scale picture processing and
computational cost are the main drawbacks of fine-grained visual categorization.
In paper [2] the researchers Xie et al. suggest that an instance search should
produce results that are more precise, which is typically what a user wants to see,
rather than just images that are nearly identical. By building a large-scale database of
reference images that are compressed at consistent bit rate levels using various JPEG
encoder optimisation techniques, it presents a baseline system that uses fine-grained
classification scores. In subjective tests, the comparison approach is used to rank them
in order to identify tiny differences. The primary disadvantage of fine-grained results
is the occurrence of duplication in the classification of items that belong to the same
species. A precise artificial intelligence (AI) system for automating snake species
identification is extremely valuable, since it enables medical professionals to promptly
diagnose injured individuals, hence minimising the number of fatal snake bites. The
SnakeCLEF 2022 challenge dataset includes 1,572 snake species, along with details on
their habitats. Accurate identification is challenging due to the dataset's long-tailed
distribution. They employed the AutoAugment, RandAugment, and Focal Loss
techniques to train the models. In the end, they successfully increased the recognition
accuracy by using the model integration strategy, ultimately obtaining a macro F1
score of 71.82% on the private leaderboard.
In paper [3] the researcher G. Chen proposed a novel deep convolutional neural
network based species recognition system to classify wild animals using extremely
difficult camera-trap imaging data. The cutting-edge graph-cut method was used to
automatically segment the picture data, which
were taken with a motion-triggered camera trap. As a baseline species recognition
algorithm, they employed the conventional bag of visual words model for
comparison. It is evident that better performance is obtained with the suggested deep
convolutional neural network based species recognition. This is the first attempt, as
far as they were aware, at fully automatic computer vision based species recognition
using real camera-trap images. Additionally, for assessment and benchmarking
purposes, they gathered and annotated a standard camera-trap dataset of 20 common
species in North America. This dataset comprises 14,346 training photos and 9,530
testing images, and it is open to the public.
In paper [4] the researcher M. S. Norouzzadeh looked into how such data may be
automatically, precisely, and cheaply collected. This could enable several disciplines
in ecology, wildlife biology, zoology, conservation biology, and animal behaviour
to become "big data" sciences. Motion-sensor "camera traps" make it possible to
regularly, affordably, and unobtrusively take photos of wildlife. Nevertheless,
information extraction from these images is still a costly, labor-intensive manual
process. They illustrated how deep learning, a state-of-the-art form of artificial
intelligence, can automatically retrieve such information. Subsequently, the 3.2
million-image Snapshot Serengeti dataset was used to train deep convolutional neural
networks to recognise, quantify, and characterise the behaviours of 48 different
species. With an accuracy of over 93.8%, their deep neural networks can recognise
animals automatically. They anticipate that this percentage will rise quickly in the
upcoming years.
In paper [6] the researcher H. Nguyen offered a framework for developing automated
animal recognition in the wild, in order to create an automated wildlife monitoring
system. Specifically, they trained a computational
system that can filter animal photos and identify species automatically using the state-
of-the-art deep convolutional neural network architectures and a single-labeled
dataset from the citizen scientist-led Wildlife Spotter project. The feasibility of
developing a fully automated wildlife observation system was demonstrated by the
experimental results, which achieved an accuracy of 96.6% for the task of detecting
images containing animals and 90.4% for identifying the three most common species
among the set of images of wild animals taken in South-central Victoria, Australia. As
a result, research findings may be produced more quickly, citizen science-based
monitoring systems may be built more effectively, and management decisions may be
made more quickly. These actions may have a substantial impact on the fields of
ecology and trap camera image processing.
In paper [10] researchers X. Yu and J. Wang proposed a method for
automatically identifying species in wildlife photos taken by remote camera traps.
They begin their process with photos that have been cropped from the background.
Then they employed enhanced sparse coding spatial pyramid matching (ScSPM),
which created global features using weighted sparse coding and max pooling with a
multi-scale pyramid kernel. The images are then classified using a linear support
vector machine algorithm. Dense SIFT descriptor and cell-structured LBP (cLBP) are
extracted as the local features. In feature space, both sparsity and locality of encoding
are enforced by the use of weighted sparse coding. They obtained an average
classification accuracy of 82% by testing the approach on a dataset of more than
7,000 camera trap photos of 18 species from two distinct field sites. Their
investigation shows that in practical, complex settings, the combination of SIFT and
cLBP can be an effective strategy for animal species recognition.
Table 2.1 Various papers with different datasets and different models used
CHAPTER 3
Theoretical Background/Methodology
3.1 Theoretical Background
The study of computer vision and deep learning provides the theoretical foundation for the
Convolutional Neural Networks (CNNs) animal species prediction project. CNNs are a
specific kind of neural network that is used for tasks like image classification. They are
made to process and analyse visual data. Convolutional, pooling, and fully connected
layers are used in the theoretical framework to automatically extract hierarchical
characteristics from images. In order to recognise complicated visual patterns in images,
CNNs are particularly good at capturing spatial hierarchies and local patterns. To make the
most of the knowledge gained from a variety of datasets, transfer learning, in which
models previously trained on big datasets are adjusted for particular tasks, is frequently
used. The capacity of CNNs to automatically extract and learn hierarchical features is the
foundation for their success in picture classification tasks. This ability allows for the
correct classification of animal species based on visual cues available in the image dataset.
CNNs in this research perform better thanks to theoretical ideas from deep learning,
convolutional operations, and transfer learning.
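The layer stack described above (convolutional, pooling, and fully connected layers) can be sketched as a minimal Keras model. The filter counts, dense width, and number of classes below are illustrative assumptions, not the thesis's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes, input_shape=(256, 256, 3)):
    """A minimal CNN: convolutional and pooling layers extract hierarchical
    features; fully connected layers map them to species probabilities."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),   # local pattern extraction
        layers.MaxPooling2D(),                     # spatial downsampling
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),      # fully connected head
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn(num_classes=10)
```

For transfer learning, the convolutional stack would instead be replaced by a pretrained backbone (e.g. one of the `tf.keras.applications` models) with a fresh classification head fine-tuned on the animal dataset.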
3.2 Methodology
[Workflow diagram: Image Dataset → TensorFlow CNN Training → Prediction Output]

3.2.1 Exploratory Data Analysis
An essential phase in the data analysis process, exploratory data analysis (EDA) helps to
identify the essential features of a dataset and offers important new information for
further research. EDA's fundamental strength is in its dependence on data visualisation
methods such as scatter plots, box plots, and histograms. The identification of underlying
trends is made possible by these visualisations, which offer a comprehensive picture of
data distributions, patterns, and possible outliers. EDA also entails fixing missing data,
as understanding the dataset's completeness is critical for successful analysis.
When it comes to an animal species prediction, exploratory data analysis (EDA) is vital
since it offers valuable insights into the properties and distribution of the image
collection. Researchers can use EDA to find patterns, variances, and possible problems
in the data, which helps them make wise judgements in later project phases. Key
components of EDA include determining potential biases or imbalances in the dataset,
evaluating image quality, and comprehending the distribution of species across different
classifications. EDA also guides the selection of acceptable models, helps determine
optimal preprocessing approaches, and improves the overall accuracy and resilience of
the animal species prediction system.
The image dataset used in this study, obtained from Kaggle, is an extensive
collection of high-quality photos covering a wide range of animal species. Every species
is painstakingly organised into designated folders that hold a variety of photos that
capture unique positions, lighting, and backgrounds. The dataset displays a wide variety
of wildlife, from widespread to threatened species. Preliminary preprocessing has been
applied to the images, such as scaling and normalisation, to guarantee consistency in
pixel values and dimensions. Interestingly, the dataset addresses any biases and
imbalances by displaying a balanced distribution across classes. The dataset's
representativeness is improved by including various viewpoints and environmental
situations in the photos, which strengthens the animal species prediction model's
resilience. A rigorous annotation method that assigns a unique class label to each image,
signifying the animal species it belongs to, characterises the labelled data in the image
dataset. In order to enable the model to understand the unique characteristics of each
species during training, this labelling is essential to supervised learning.
The model may be biased towards the more represented species if there is an imbalance in
the number of photos for each species in the training dataset. This could result in less than
ideal performance for less common species. In order to develop, train, and assess deep
learning models for the proposed automated wildlife species detection system utilising
Convolutional Neural Networks (CNNs), two necessary libraries are TensorFlow and
Keras. Training large-scale CNNs requires efficient computation on GPUs and TPUs,
which TensorFlow's entire deep learning ecosystem enables. As a high-level neural
network API, Keras makes model building easier and speeds up prototyping. OpenCV can
be used for feature extraction, image editing, and preprocessing. Its adaptable features
come in handy when managing the difficulties posed by the unpredictability of wildlife
images. Scikit-learn can also be used to compute metrics and evaluate models. Combined,
these libraries provide a stable, scalable, and effective framework for creating and
implementing the model for identifying animal species, utilising cutting-edge techniques
from the computer vision and deep learning fields.
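As a hedged illustration of the Scikit-learn evaluation step mentioned above, the snippet below computes standard classification metrics; the labels are made up for the example.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical ground-truth and predicted species labels for four test images.
y_true = ["cat", "dog", "dog", "horse"]
y_pred = ["cat", "dog", "horse", "horse"]

# Overall fraction of correct predictions.
print(accuracy_score(y_true, y_pred))  # 0.75

# Per-class precision, recall, and F1 — useful when classes are imbalanced.
print(classification_report(y_true, y_pred))
```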
For both the training and testing datasets, the algorithm in use counts the number of
photos available for each species of animal. This data is kept in dictionaries with
the associated image counts acting as the values and class names acting as keys. The
pandas dataframes train_data_df and test_data_df, which show the distribution of images
among various species in the training and testing datasets, respectively, are used to
organise the image count information. The training and testing data are also concatenated
to generate a composite dataframe all_data_df. The training and testing datasets' image
counts are used to sort the combined DataFrame. Next, a bar chart is plotted to show how
the photos are distributed among the various animal species. Any class imbalances in
which a particular species may be overrepresented or underrepresented in the dataset can
be found with the use of this visualisation.
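The dataframe construction described above can be sketched as follows. The per-class counts here are invented stand-ins for counts gathered by walking the train and test directories, and the combined frame is built with a merge on the species column as one plausible reading of the described concatenation.

```python
import pandas as pd

# Hypothetical per-class image counts (class name -> number of images),
# standing in for counts collected from the dataset directories.
train_counts = {"cat": 1200, "dog": 1150, "elephant": 300}
test_counts = {"cat": 400, "dog": 380, "elephant": 100}

train_data_df = pd.DataFrame(train_counts.items(),
                             columns=["species", "train_images"])
test_data_df = pd.DataFrame(test_counts.items(),
                            columns=["species", "test_images"])

# Combine the two frames and sort by image count so that any
# class imbalance is immediately visible.
all_data_df = train_data_df.merge(test_data_df, on="species")
all_data_df = all_data_df.sort_values("train_images", ascending=False)

# A bar chart then shows the distribution across species, e.g.:
# all_data_df.plot.bar(x="species")
print(all_data_df)
```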
Preprocessing the data is essential to improving the model's performance when using
an image dataset. Because of the inherent difficulties in photographing animals,
preprocessing techniques like downsizing, normalisation, and augmentation are
crucial. Consistency in image proportions is ensured through resizing, which makes
model training more effective. Pixel values are standardised by normalisation, which
eliminates variances brought on by different illumination. Through random
modifications like as flips and rotations, augmentation adds diversity to the dataset,
reducing overfitting and improving the model's capacity to accommodate changes in
views and deformations. By combining these preprocessing techniques, the
Convolutional Neural Network (CNN) becomes more resilient and broadly based,
which enables it to extract pertinent information from the wildlife photos.
ImageDataGenerator is a class from the Keras library. It is essential for
improving the training dataset's resilience and diversity for the animal species
prediction task. The ImageDataGenerator transforms the images in real time during
the model training phase. These modifications consist of rotating, flipping along
the horizontal axis, shearing, and rescaling the pixel values to a range between 0
and 1. With the addition of variability to the training set,
the model is better able to adapt to the various orientations, views, and deformations
that may arise in real-world situations.
Next, batches of augmented photos are produced directly from the directory
holding the training dataset that has been preprocessed using the flow_from_directory
function. RGB is the designated colour mode, and the intended size of the images is
set to (256, 256). In order to comply with the multi-class classification requirements
of the animal species prediction challenge, the images are classified into classes using
categorical encoding. The training sample order is made random by enabling
shuffling and setting the batch size to 32, which guarantees effective processing
during model training.
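The augmentation and flow_from_directory setup described above can be sketched as below. The tiny synthetic dataset (two species, two dummy images each) exists only to make the sketch self-contained; in the actual project the directory of preprocessed training images would be used, and the augmentation parameter values are illustrative assumptions.

```python
import numpy as np
from PIL import Image
from pathlib import Path
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a tiny stand-in dataset: one subfolder per species, as
# flow_from_directory expects. Replace with the real training directory.
root = Path("data/train")
for species in ("cat", "dog"):
    (root / species).mkdir(parents=True, exist_ok=True)
    for i in range(2):
        Image.fromarray(
            np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
        ).save(root / species / f"{i}.png")

# Real-time augmentation: rescaling to [0, 1], random rotations,
# horizontal flips, and shearing.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    horizontal_flip=True,
    shear_range=0.2,
)

train_generator = train_datagen.flow_from_directory(
    "data/train",
    target_size=(256, 256),
    color_mode="rgb",
    class_mode="categorical",  # one-hot labels for multi-class prediction
    batch_size=32,
    shuffle=True,
)

images, labels = next(train_generator)  # one batch of augmented images
```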
Significant results are obtained from this data preparation. This updated dataset
includes a wider variety of modifications and adjustments, enhancing the model's
exposure to a wider range of realistic images. The model's capacity to generalise well
to new data is improved by this approach, which also helps prevent overfitting.
Furthermore, the need to collect fresh photographs is obviated, thereby expanding the
dataset efficiently. This is especially advantageous when dealing with limited data.
Running the training dataset through this pipeline adds augmented images to it,
which improves the Convolutional Neural Network's overall performance and
robustness, both during training and when predicting animal species.