
Volume 8, Issue 5, May 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Convolutional Neural Networks for Image Classification
Jasmin Praful Bharadiya
Doctor of Philosophy Information Technology,
University of the Cumberlands, USA

Abstract:- Deep learning has recently been applied to scene labelling, object tracking, pose estimation, text detection and recognition, visual saliency detection, and image categorization. Deep learning typically uses models such as Auto Encoders, Sparse Coding, Restricted Boltzmann Machines, Deep Belief Networks, and Convolutional Neural Networks. Compared with other types of models, convolutional neural networks have exhibited good performance in image categorization. In this paper, a straightforward convolutional neural network was built and used to carry out image classification. On the basis of this network, we also examined several learning rate setting techniques and different optimisation algorithms for determining the parameters that have the greatest influence on image categorization.

Keywords:- Convolutional neural network, Deep Learning, Transfer Learning, ImageNet, Image classification, learning rate, parametric solution.

I. INTRODUCTION

Image classification in computer vision is important for our education, jobs, and daily life. Images are classified using a procedure that includes image preprocessing, image segmentation, key feature extraction, and matching identification. With the aid of modern image classification techniques, we can now acquire image data more quickly than ever before and put it to use in a number of fields, including face recognition, traffic identification, security, and medical equipment. With the emergence of deep learning, feature extraction and the classifier have been merged into a single learning framework, which addresses the shortcomings of the conventional approach of feature selection. The goal of deep learning is to identify several layers of representation, with the expectation that high-level features will capture the more abstract semantics of the data. Convolutional architectures are a crucial component of deep learning for image classification. The anatomy of the mammalian visual system serves as the inspiration for the convolutional neural network. In 1962, Hubel and Wiesel proposed a model of visual structure based on the cat visual cortex and put forward the idea of a receptive field. In 1980, Fukushima presented the Neocognitron, the first hierarchical framework of this kind for analysing images. To achieve translation invariance in the network, the Neocognitron utilised local connections between neurons.

There are several deep learning architectures available. Convolutional neural networks, the most effective and practical deep neural networks for this sort of data, were used to create the classifier system reported in this research. CNNs that have been trained on huge image datasets for recognition tasks can also be put to use by applying their learned representations to tasks that have less training data.

Since 2006, a variety of techniques have been developed to overcome the challenges involved in training deep neural networks. Krizhevsky et al. proposed AlexNet, a now-classic CNN architecture, and demonstrated a considerable advance over earlier approaches to image classification. In light of its success, numerous efforts to improve on AlexNet's performance have followed, including VGGNet, GoogLeNet, and ZFNet.

• Hierarchical Feature Extraction: CNNs excel at learning hierarchical representations of images. They consist of multiple layers, including convolutional layers and pooling layers, that progressively extract features at different levels of abstraction. This hierarchical approach allows CNNs to capture intricate patterns and structures in images, leading to more accurate classification.

II. KEY REASONS FOR THE SIGNIFICANCE OF CNN

A. Translation Invariance:
CNNs are designed to be largely translation invariant, meaning they can recognize patterns regardless of their location in an image. This is achieved through convolutional layers that apply the same filters at every position of an image, so a feature is detected wherever it appears, and through pooling layers that discard small positional differences. This property makes CNNs more robust and accurate in real-world scenarios, where objects rarely appear at a fixed position. Robustness to other changes, such as orientation, is not built in and usually has to be learned from varied or augmented training data.

B. Data Efficiency:
Because their filters are shared across all image positions, CNNs have far fewer parameters than fully connected networks of comparable size, and they can require fewer training examples than traditional machine learning pipelines built on hand-crafted features. They can learn from a comparatively small number of examples due to their ability to capture relevant features and generalize to unseen data, especially when combined with pre-trained models. This property makes CNNs well suited to scenarios where large amounts of labeled data are not available.

C. Transfer Learning:
CNNs are capable of transfer learning, meaning they can learn from one task and transfer that knowledge to another related task. This is achieved through the use of pre-trained models, which are trained on large datasets and can be fine-tuned for specific image classification tasks. Transfer

IJISRT23MAY881 www.ijisrt.com 673


learning reduces the amount of training data required and can lead to significant improvements in classification performance.

D. Scalability:
CNNs are scalable, meaning they can be used for image classification tasks with varying levels of complexity. This scalability comes from the ability to add or remove layers, adjust the number of filters in each layer, and change the size of the filters used in convolutional layers. This flexibility makes CNNs suitable for a wide range of applications, from simple image classification to more complex tasks such as object detection and segmentation.

To achieve this goal, the objectives below have been specified:
• Design a flexible gene encoding scheme of the architecture, which does not constrain the maximal length of the building blocks in CNNs. With this gene encoding scheme, the evolved architecture is expected to help CNNs achieve good performance in solving different tasks at hand.
• Investigate the connection weight encoding strategy, which is capable of representing tremendous numbers of connection weights in an efficient way. With this encoding approach, the weight initialization problem in CNNs is expected to be effectively optimized by the proposed genetic algorithm (GA).
• Develop associated selection (including the environmental selection), crossover, and mutation operators that can cope with the designed gene encoding strategies of both architectures and connection weights.
• Propose an effective fitness measure of the individuals representing different CNNs, which does not require intensive computational resources.
• Investigate whether the new approach significantly outperforms the existing methods in both classification accuracy and number of weights.

III. METHODOLOGY OF EVALUATION

The major goal of our research is to understand how effectively the networks operate on both static images and real-time video streams. The first stage of the process is to perform transfer learning on the networks using image datasets. This is followed by a testing phase, in which the prediction rate for the same object on still photos and on live video streams is examined.

The resulting accuracy rates are observed, recorded, and shown in the tables provided in subsequent sections. The third crucial factor for judging performance was whether there were any differences in prediction accuracy between the CNNs used in the study. It must be highlighted that videos are used only as testing datasets, not as training datasets. As a result, we are searching for the best image classifier where the object is the primary attribute for scene categorization.
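The evaluation described in this section, in which each network is scored on still photos and on video frames and the two scores are compared, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `accuracy` and `compare_sources`, the network names, and all labels below are hypothetical and are not results from the study.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def compare_sources(results):
    """Return {network: (still_acc, video_acc, gap)} for each network.

    `results` maps a network name to a dict with the keys "still" and
    "video", each holding a (predicted_labels, true_labels) pair.
    """
    report = {}
    for network, sources in results.items():
        still_acc = accuracy(*sources["still"])
        video_acc = accuracy(*sources["video"])
        report[network] = (still_acc, video_acc, still_acc - video_acc)
    return report


# Made-up labels for illustration only; they are not results from the study.
toy_results = {
    "network_a": {
        "still": (["cat", "dog", "cat", "car"], ["cat", "dog", "cat", "car"]),
        "video": (["cat", "dog", "dog", "car"], ["cat", "dog", "cat", "car"]),
    },
    "network_b": {
        "still": (["cat", "cat", "cat", "car"], ["cat", "dog", "cat", "car"]),
        "video": (["cat", "cat", "dog", "car"], ["cat", "dog", "cat", "car"]),
    },
}

report = compare_sources(toy_results)
for network, (still_acc, video_acc, gap) in report.items():
    print(f"{network}: still={still_acc:.2f} video={video_acc:.2f} gap={gap:.2f}")
```

Reporting the gap alongside the two accuracies makes it easy to see whether a network that does well on still images keeps that advantage on video frames.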

Fig. 1: Primary attribute for scene category categorization

The different layers of the convolutional neural networks used are:
• Input Layer: The first layer of each CNN used is the ‘input layer’, which takes images and resizes them before passing them on to further layers for feature extraction.
• Convolution Layer: The next few layers are ‘convolution layers’, which act as filters for images and so find features in them. They are also used for calculating the matching feature points during testing.
• Pooling Layer: The extracted feature sets are then passed to the ‘pooling layer’. This layer takes large images and shrinks them down while preserving the most important information in them. It keeps the maximum value from each window, so it preserves the best fit of each feature within the window.
• Rectified Linear Unit Layer: The next ‘Rectified Linear Unit’ or ReLU layer replaces every negative value in the output of the pooling layer with 0. This helps the CNN stay mathematically stable by keeping learned values from getting stuck near 0 or blowing up toward infinity.
• Fully Connected Layer: The final layers are fully connected layers, which take the high-level filtered images and translate them into categories with labels.
• Basic CNN components: The convolutional layer, pooling layer, and fully connected layer are the three major types of convolutional neural network layers.
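To make the role of each layer concrete, the sketch below passes a tiny image through a convolution, max pooling, ReLU, and a fully connected layer with a softmax, in the order listed above. It is a minimal pure-Python illustration, not the networks used in the paper: the 5x5 image, the 2x2 filter, and the untrained weights are all made up for the example. Many implementations apply ReLU before pooling. With max pooling the two orders give the same result, because both operations preserve the ordering of values.

```python
import math


def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in cols]
        for r in rows
    ]


def relu(feature_map):
    """Replace every negative value with 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def fully_connected(features, weights, biases):
    """One score per class: dot product of the features with each weight row."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# 5x5 toy image with a vertical edge, and a 2x2 vertical-edge filter.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]

conv_out = convolve2d(image, kernel)   # 4x4 feature map
pooled = max_pool(conv_out, size=2)    # 2x2 map after pooling
activated = relu(pooled)               # negative values clipped to 0
flat = [v for row in activated for v in row]

# Hypothetical, untrained weights for two classes ("edge", "no edge").
weights = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
biases = [0.0, 0.0]
probabilities = softmax(fully_connected(flat, weights, biases))
print(pooled, probabilities)
```

The filter responds only where the image changes from dark to bright, pooling keeps that strongest response from each window, and the fully connected layer turns the remaining four values into class scores.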


Fig. 2: Basic CNN components

The steps of the proposed method are as follows:
• Creating training and testing dataset: The images of the super classes used for training are resized to [227,227] pixels for AlexNet and [224,224] pixels for GoogLeNet and ResNet50, and the dataset is divided into two categories, i.e. training and validation data sets.
• Modifying CNNs network: Replace the last three layers of the network with a fully connected layer, a softmax layer, and a classification output layer. Set the final fully connected layer to have the same size as the number of classes in the training data set. Increase the learning rate factors of the fully connected layer so that the new layer trains faster.
• Train the network: Set the training options, including learning rate, mini-batch size, and validation data, according to the GPU specification of the system. Train the network using the training data.
• Test the accuracy of the network: Classify the validation images using the fine-tuned network, and calculate the classification accuracy. Similarly, test the fine-tuned network on real-time video feeds.

IV. MODELS

Several pre-trained CNNs are available, and these CNNs support transfer learning. A network therefore only needs the training and testing datasets at its input layer. The core layers and methods employed in the networks' architectures vary. The Inception modules in GoogLeNet perform convolutions of varying sizes and concatenate the resulting filter outputs for the following layer. AlexNet, on the other hand, uses the output of the preceding layer as its input rather than filter concatenation. Both networks have undergone independent testing and make use of the Caffe deep learning framework's implementation.

However, as networks grow deeper, training becomes challenging and accuracy begins to saturate before declining. Residual learning attempts to address both of these issues. A deep convolutional neural network typically has many stacked layers that are trained for the given task. At the end of its layers, the network learns a number of low-, mid-, and high-level features. In residual learning, the network tries to learn a residual rather than the features themselves. The residual is simply the learned feature minus the layer's input. ResNet does this through shortcut connections that pass the input of the n-th layer directly to some (n+x)-th layer. The comparison is made among three existing neural networks, i.e. AlexNet, GoogLeNet, and ResNet50. The existing networks are trained, and new networks are then created from them using transfer learning for additional comparison. The new models have the same number of layers as the original models, but their performance differs greatly from that of the original networks. The tables in the next section provide the accuracy rates that were calculated on the identical photos.

V. ADVANCEMENTS IN CONVOLUTIONAL NEURAL NETWORKS (CNNS) FOR IMAGE CLASSIFICATION

• Attention Mechanisms: Recent advancements in CNN architectures have introduced attention mechanisms, which enable the network to focus on the regions or features of an image that are most relevant for classification. Different attention mechanisms, such as self-attention and spatial attention, and their impact on the accuracy and interpretability of CNN models merit further exploration.
• Transformer-Based Architectures: The success of the Transformer model in natural language processing has led to its adaptation for image classification tasks. Transformer-based architectures, such as Vision Transformers (ViTs), replace convolutional layers with self-attention mechanisms, enabling the model to capture global dependencies in images. The performance and scalability of these architectures compared to traditional CNNs remain an open area of investigation.
• Meta-Learning and Few-Shot Learning: Meta-learning approaches aim to enhance the ability of CNNs to learn from a few labeled examples by leveraging prior knowledge learned from similar tasks or datasets. Few-shot learning techniques, such as meta-learning, metric learning, or generative modeling, enable CNNs to generalize to new classes with limited training data. Advances in meta-learning and few-shot learning for image classification can be compared against the performance of traditional CNN models.

• AutoML and Neural Architecture Search: Automated Machine Learning (AutoML) techniques, specifically Neural Architecture Search (NAS), have gained attention for automatically discovering well-performing CNN architectures for image classification. NAS algorithms leverage reinforcement learning, evolutionary algorithms, or gradient-based optimization to search for architectures with improved performance. Progress in AutoML and NAS should be judged by how effective these methods are at discovering superior CNN architectures.
• Explainability and Interpretability: As CNNs become more complex, understanding the decision-making process of these models becomes crucial. Future advancements in CNNs for image classification should focus on improving interpretability and explainability. Methods such as attention visualization, saliency maps, and class activation maps provide insights into which regions of an image contribute most to the classification decision.
• Robustness and Adversarial Defense: CNNs are susceptible to adversarial attacks, where subtle perturbations to input images can lead to misclassification. Future advancements in CNN architectures should address these robustness and security concerns by incorporating defenses against adversarial attacks. Different defense mechanisms can be compared by how much they improve the robustness of CNN models.

VI. FURTHER DISCUSSIONS

In this section we go into more detail about the fitness evaluation, weight-related parameters, and architecture encoding strategies of the proposed EvoCNN method. The experimental findings are also reviewed, which may offer helpful information about the potential uses of the suggested EvoCNN approach. Mutation operators serve as the exploration search, or the global search, whereas crossover operators serve as the exploitation search, or the local search. Since local and global searches should complement one another, performance can only be boosted significantly when both are properly designed. The commonly employed methods for CNN weight optimisation are based on gradient information. The sensitivity of gradient-based optimizers to the starting positions of the parameters being optimised is well known: without a suitable starting point, gradient-based methods are prone to becoming stuck in local minima. Identifying a better starting point for the connection weights directly with GAs seems impossible because of the vast number of parameters. As we have seen, such a large number of parameters cannot be efficiently stored in the chromosomes or successfully optimised. The proposed EvoCNN technique therefore uses an indirect encoding strategy, which encodes only the means and standard deviations of the weights in each layer. Existing methods for finding CNN architectures typically use the final classification accuracy as an individual's fitness. Obtaining a final classification accuracy normally requires training for many additional epochs, which takes a long time.

VII. SUMMARY

This article provides an overview of Convolutional Neural Networks (CNNs) for image classification. It begins by highlighting the importance of image classification in various domains and the limitations of traditional feature selection approaches. Deep learning, particularly CNNs, is introduced as a solution to address these limitations.

The article explains that CNNs excel at learning hierarchical representations of images by utilizing convolutional layers and pooling layers to extract features at different levels of abstraction. CNNs offer translation invariance, allowing them to recognize patterns regardless of their location in an image. They are also data-efficient, requiring fewer training examples due to their ability to capture relevant features and generalize to unseen data.

Transfer learning is emphasized as a key capability of CNNs, enabling them to leverage pre-trained models trained on large datasets and fine-tune them for specific image classification tasks. This reduces the amount of training data required and improves classification performance.

Scalability is another advantage of CNNs, as they can be adjusted by adding or removing layers, changing the number of filters, and modifying the size of filters used in convolutional layers. This flexibility makes CNNs suitable for various image classification tasks, from simple classification to complex tasks like object detection and segmentation.

The article outlines the methodology for evaluating CNN performance, which involves performing transfer learning on image datasets and testing accuracy on static images and real-time video streams. It mentions different types of CNN layers, including input layers, convolution layers, pooling layers, rectified linear unit (ReLU) layers, and fully connected layers.

Several models and architectures are discussed, such as AlexNet, GoogLeNet, and ResNet50. The article compares their performance and introduces advancements in CNNs, including attention mechanisms, transformer-based architectures, meta-learning, AutoML, and neural architecture search. It also emphasizes the need for explainability and interpretability in CNNs, as well as robustness against adversarial attacks.

VIII. CONCLUSION

In this study, a novel evolutionary technique is developed to automatically evolve the architectures and weights of CNNs for image classification problems. This goal has been successfully attained by putting forth a new representation for the weight initialization strategy, a new encoding scheme for variable-length chromosomes, a new genetic operator for chromosomes with different lengths, a slacked binary tournament selection for choosing promising individuals, and an effective fitness evaluation method to speed up evolution. Understanding deep learning is important, and it is useful since training time is limited. Future study will improve our system by incorporating evolutionary algorithms to address the classification feature extraction challenge and reduce the number of parameters required for this operation.
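The indirect weight encoding mentioned in Section VI, where a chromosome stores only the mean and standard deviation of the weights in each layer, can be illustrated with a short sketch. It assumes that the actual weights are drawn from a Gaussian distribution with those two parameters when an individual is evaluated, which the text above does not state explicitly. The function names, layer sizes, and values below are hypothetical.

```python
import random


def decode_layer_weights(mean, std, n_weights, rng):
    """Sample n_weights values from a Gaussian described by (mean, std)."""
    return [rng.gauss(mean, std) for _ in range(n_weights)]


def decode_chromosome(chromosome, seed=0):
    """Expand each (n_weights, mean, std) gene into a full weight list."""
    rng = random.Random(seed)
    return [decode_layer_weights(mean, std, n, rng)
            for n, mean, std in chromosome]


# Hypothetical chromosome: three layers, each stored as just three numbers.
chromosome = [
    (3 * 3 * 16, 0.0, 0.05),   # assumed small convolutional layer
    (3 * 3 * 32, 0.0, 0.03),   # assumed larger convolutional layer
    (128 * 10, 0.0, 0.01),     # assumed fully connected layer, 10 classes
]

weights = decode_chromosome(chromosome)
stored = 3 * len(chromosome)
decoded = sum(len(layer) for layer in weights)
print(f"values stored in chromosome: {stored}, weights decoded: {decoded}")
```

The point of the sketch is the size difference: the genetic operators only ever manipulate a handful of numbers per layer, while the full weight tensors are materialised on demand.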

