Online Note Categorization

COLLEGE OF APPLIED BUSINESS
Tribhuvan University
Institute of Science and Technology
“Online Note Categorization System”
A PROJECT REPORT
Submitted To:
Department of Computer Science and Information Technology
In partial fulfillment of the requirements for the Bachelor’s Degree in Computer Science
and Information Technology
Submitted By:
Bijay Basnet
Kripesh Prasad Bhattarai
Ritesh Shrestha
College of Applied Business

SUPERVISOR’S RECOMMENDATION
I am pleased to recommend the project titled "ONLINE NOTE CATEGORIZATION
SYSTEM," prepared under my supervision by Bijay Basnet, Kripesh Prasad Bhattarai, and
Ritesh Shrestha. This project fulfills the necessary requirements for the degree of B.Sc. in
Computer Science and Information Technology and should now proceed for evaluation.
…………………………………………
Tekendra Nath Yogi
Lecturer
Tribhuvan University

STUDENT’S DECLARATION
We, the undersigned, solemnly declare that we are the sole authors of this work and that no
sources other than those listed here have been utilized in its creation.
We further confirm that this work has not been submitted for any other academic evaluation
or degree program.
…………………………………………
Bijay Basnet (105)
…………………………………………
Kripesh Prasad Bhattarai (118)
…………………………………………
Ritesh Shrestha (126)

LETTER OF APPROVAL
This is to certify that this project prepared by BIJAY BASNET, KRIPESH PRASAD
BHATTARAI AND RITESH SHRESTHA entitled “ONLINE NOTE
CATEGORIZATION SYSTEM” in partial fulfillment of the requirements for the degree of
B.Sc. in Computer Science and Information Technology, has been thoroughly reviewed. In
our assessment, it demonstrates a satisfactory level of scope and quality, fulfilling the criteria
for the required degree.
……………………………… ……………………………….
Head Coordinator Supervisor
College of Applied Business Tekendra Nath Yogi
…………………………………. …………………………………..
Internal Examiner External Examiner
ACKNOWLEDGEMENT
We would like to express our sincere gratitude on the completion of our project
titled "Online Note Categorization System," carried out in accordance with the
syllabus of "Project Work CSC 412" prescribed by Tribhuvan University as the
semester project. In addition to our own efforts, several individuals made
invaluable contributions that were instrumental in the successful completion
of this endeavor.
First and foremost, we extend our deepest appreciation to our Supervisor, Tekendra
Nath Yogi, for his unwavering guidance and support throughout this project. His
expertise, insights, and encouragement were invaluable, and without his mentorship,
this project would not have achieved its success.
We would also like to express our sincere gratitude to the other supervisors who
played a crucial role in our project presentation. Their constructive comments and
advice not only enhanced our presentation skills but also contributed significantly to
the improvement of our project documentation.
Furthermore, we are grateful to our classmates for their support and valuable
suggestions during the development of this application. Their input and
collaboration were greatly appreciated and helped us refine our work.
Finally, we would like to thank all the individuals who provided us with the
opportunity and resources necessary to complete this report. Their unwavering
support and belief in our abilities have been invaluable.
With Respect,
Name: Bijay Basnet
Roll no.: 105
Name: Kripesh Prasad Bhattarai
Roll no.: 118
Name: Ritesh Shrestha
Roll no.: 126
ABSTRACT
As the volume of textual information continues to grow across various domains, the task of
text categorization has gained significant importance. Our project focuses on addressing this
challenge by automatically categorizing personal notes and generating relevant tags based on
their content.
To achieve our objective, we employed the Multinomial Naive Bayes algorithm for
classifying personal notes into predefined categories, including Technology, Politics,
Religion, Science, Sports, Space, Medicine, and Miscellaneous. Additionally, we
implemented 20 specific categories from the dataset, such as alt.atheism, comp.graphics,
comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware,
comp.windows.x, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball,
rec.sport.hockey, sci.crypt, sci.electronics, sci.med, sci.space, soc.religion.christian,
talk.politics.guns, talk.politics.mideast, talk.politics.misc, and talk.religion.misc. This
approach allowed us to generate automatic tags for each personal note.
Our system effectively processed and classified the textual personal notes using the
Multinomial Naive Bayes algorithm. The categorized notes were then displayed, enabling
users to filter them based on specific categories. This report highlights the successful
application of the Multinomial Naive Bayes algorithm for text classification, offering a
streamlined note-taking process and simplified note management for users.
Keywords: Multinomial Naive Bayes, Online Note, Python, Django
Table of Contents
SUPERVISOR’S RECOMMENDATION
STUDENT’S DECLARATION
LETTER OF APPROVAL
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 Introduction
1.2 Problem Statement
1.3 Objectives
1.4 Scope and Limitation
1.5 Development Methodology
1.6 Report Organization
CHAPTER 2
BACKGROUND STUDY AND LITERATURE REVIEW
2.1 Background Study
2.2 Literature Review
CHAPTER 3
SYSTEM ANALYSIS
3.1 System Analysis
3.1.1 Requirement Analysis
i. Functional Requirements
ii. Non-Functional Requirements
3.1.2 Feasibility Analysis
i. Technical Feasibility
ii. Operational Feasibility
iii. Economic Feasibility
iv. Schedule Feasibility
3.1.3 Analysis
CHAPTER 4
SYSTEM DESIGN
4.1 Design
4.2 Algorithm Details
CHAPTER 5
IMPLEMENTATION AND TESTING
5.1 Implementation
5.1.1 Tools Used
5.1.2 Implementation Details of Modules
5.2 Testing
5.2.1 Test Cases for Unit Testing
5.2.2 Test Cases for System Testing
5.3 Result Analysis
CHAPTER 6
CONCLUSION AND FUTURE RECOMMENDATIONS
6.1 Conclusion
6.2 Future Recommendation
REFERENCES
APPENDIX
LIST OF FIGURES
Figure 1.1 Waterfall Model
Figure 3.1 Use Case Diagram for Online Note Categorization System
Figure 3.2 Schedule Feasibility Gantt Chart
Figure 3.3 ER Diagram of Online Note Categorization System
Figure 3.4 Level 0 DFD of Online Note Categorization System
Figure 3.5 Level 1 DFD of Online Note Categorization System
Figure 4.1 Relational Data Model of Online Note Categorization System
LIST OF TABLES
Table 5.1 Test Case for User Registration

Table 5.2 Test Case for User Login
Table 5.3 Test Case for Admin Login
Table 5.4 Test Case for Text Preprocessing
Table 5.5 Test Case for Managing category
Table 5.6 Test Case for Managing Notes
Table 5.7 Test Case for Text Categorization
LIST OF ABBREVIATIONS
ML Machine Learning
NLP Natural Language Processing
MNB Multinomial Naïve Bayes
ER Entity-Relationship
DFD Data Flow Diagram
PK Primary Key
FK Foreign Key
TF Term Frequency
IDF Inverse Document Frequency
TFIDF Term Frequency-Inverse Document Frequency
NLTK Natural Language Toolkit
HTTP Hypertext Transfer Protocol
HTML Hypertext Markup Language
CHAPTER 1
INTRODUCTION
1.1 Introduction
In today's digital age, the volume of textual data being generated on a daily basis is staggering. It
has become a challenge to manage and extract insights from such an enormous amount of
unstructured data. However, with the advent of natural language processing (NLP) and machine
learning techniques, text classification has emerged as a powerful tool to organize, structure, and
categorize this data, leading to better decision-making. Text classification has numerous
applications, such as sentiment analysis, spam detection, intent detection, and topic labeling. It is
estimated that 80% of all information is unstructured, with text being the most common type of
unstructured data. This makes it a challenging task to analyze, understand, and sort through
textual data, which is why companies often fail to use it to its full potential.
Text classification involves categorizing text documents into predefined categories or classes.
This technique can be used to automatically sort through large volumes of textual data, enabling
the extraction of useful insights and patterns that might otherwise be difficult to identify. It is an
essential task in many fields, including information retrieval, web searching, and text mining.
Text classification is typically approached as a supervised learning problem, where a machine
learning algorithm is trained on a labeled dataset of text documents, with the goal of developing
a model that can accurately classify new, unseen documents.
In this document, we focus on "Online Note Categorization and Automatic Tag Generation"
using the Multinomial Naive Bayes algorithm. The aim of this project is to classify personal
notes into predefined categories and generate automatic tags for them, making it easier to search
and filter the notes based on their content. Our approach involves pre-processing the textual
personal notes and then training the Multinomial Naive Bayes algorithm to classify them into
predefined categories. We then use these categories to automatically generate tags for each note, enabling
users to quickly filter and find the information they need.
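The pre-processing step mentioned above is not detailed at this point in the report. As a minimal sketch only, assuming it consists of lowercasing, keeping alphabetic tokens, and dropping a small set of stop words (the `STOP_WORDS` set and the `preprocess` function are illustrative names, not the project's actual code), it could look like this in Python:

```python
import re
from typing import List

# Assumption: a deliberately tiny stop-word list for illustration.
# A real system would likely use a fuller list, for example the one shipped with NLTK.
STOP_WORDS = {"a", "an", "the", "is", "was", "in", "on", "of", "to", "and", "will", "be"}


def preprocess(note: str) -> List[str]:
    """Lowercase a note, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("A new President will be sworn in tomorrow."))
```

The resulting token list is what a classifier such as Multinomial Naive Bayes would count words from.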
Overall, the project aims to showcase the effectiveness of the Multinomial Naive Bayes
algorithm in text classification by accurately classifying textual notes into predefined categories
and generating automatic tags for them.
1.2 Problem Statement
With the ever-increasing amount of textual data generated on social media, news sites, and blogs,
there is a need for efficient organization and categorization of such data. This becomes
particularly important in countries like Nepal where natural language processing techniques are
not commonly used. Our project addresses the problem of categorizing personal notes into
predefined categories using Multinomial Naive Bayes algorithm and generating automatic tags
for those notes. The system can be useful for companies and governmental sectors to gather
insights from customer feedback, comments, and opinions.
For example, the system is expected to categorize and tag notes as follows:
“Jesus Christ was a famous Christian figure in the 1st century.” → Religion ( #soc #religion #christian )
“A new President will be sworn in tomorrow.” → Politics ( #talk #politics #misc )
“Just started using Adobe Photoshop.” → Technology ( #comp #graphics )
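The tags in these examples look like the parts of a dotted dataset label (such as `soc.religion.christian`) prefixed with `#`. The report does not state the exact rule here, so the following Python sketch is an assumption: `tags_from_label`, `CATEGORY_OF`, and `describe` are hypothetical names, and the mapping covers only the three labels shown above.

```python
from typing import Dict, List


def tags_from_label(label: str) -> List[str]:
    """Split a dotted label such as 'soc.religion.christian' into hashtag-style tags."""
    return ["#" + part for part in label.split(".") if part]


# Assumption: a hand-written mapping from fine-grained labels to display categories.
CATEGORY_OF: Dict[str, str] = {
    "soc.religion.christian": "Religion",
    "talk.politics.misc": "Politics",
    "comp.graphics": "Technology",
}


def describe(label: str) -> str:
    """Format a predicted label as 'Category ( #tag #tag )'; unknown labels fall back to Miscellaneous."""
    category = CATEGORY_OF.get(label, "Miscellaneous")
    return f"{category} ( {' '.join(tags_from_label(label))} )"


if __name__ == "__main__":
    print(describe("comp.graphics"))
```

Under this assumption, one predicted label yields both the coarse category shown to the user and the tags used for filtering.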
1.3 Objectives
The main objectives of this study are:
1. Develop a Multinomial Naive Bayes Classification model to identify, extract, and study
various personal notes, enabling effective text analysis.
2. Create a web application that automates the categorization of unstructured user notes into
different categories and generates relevant tags to describe the notes accurately.
1.4 Scope and Limitation
The scope of the project includes:
1. Analyze opinions and notes about different topics of different people.
2. Classify notes, comments, and subjective expressions into different categories and
generate automatic tags through a classification algorithm.
3. Efficient organization of notes to help users quickly organize and retrieve them easily.
4. Time-saving by eliminating the need for manual categorization and tagging of notes.
5. Consistency in categorization and tagging of notes, eliminating errors or inconsistencies
that may arise from manual categorization.
6. User-friendly and easy-to-use system, allowing users to focus on the content of their
notes instead of spending time organizing them.
Some of the limitations are:

1. Accuracy: The accuracy of the categorization and tagging is dependent on the quality of
the input data, and inaccurate categorization or tagging can lead to difficulty in finding
notes.
2. Limited categories: This system is limited to a pre-defined set of tags and categories,
which may not match the user’s specific needs or preferences.
3. Language limitations: The categorization and tagging algorithm is currently designed to
work only with notes written in English.
4. Dependency on input quality: The effectiveness of this system is highly dependent on
the quality and clarity of the notes provided as input, and unclear or poorly written notes
may result in inaccurate categorization and tagging.
1.5 Development Methodology
The software development methodology used to develop the Online Note Categorization system
is the Waterfall Methodology due to the simplicity of the project requirements and the absence of
stakeholders or users who may require frequent changes to the system. The Waterfall model
includes various phases such as requirements analysis, design, implementation, testing
(validation), integration, and maintenance. Each phase must be completed before the next one
can begin, and a review takes place at the end of each phase to determine the project's progress.
The testing starts only after the development is complete, and the outcome of one phase acts as
the input for the next phase sequentially, with no overlap between phases.
Figure 1.1: Waterfall Model
1.6 Report Organization
The report is organized as follows:
Preliminary Section: This section includes the title page, certificate page, acknowledgement,
abstract, table of contents, list of figures, list of tables, and list of abbreviations.
Chapter 1: Introduction - This chapter presents an overview of the project, including its
background, objectives, scope, limitations, development methodology, and report organization.
Chapter 2: Background Study and Literature Review - This chapter provides a detailed
description, summary, and critical evaluation of relevant literature and research related to the
project.
Chapter 3: System Analysis - This chapter focuses on the requirement analysis and feasibility
analysis of the project, including economic, technical, operational, and schedule feasibility.
Chapter 4: System Design - This chapter describes the system design process, including the
interpretation of findings and the definition of system components and data to satisfy
requirements.
Chapter 5: Implementation and Testing - This chapter discusses the results of implementing
the project, including checking the outputs and testing using various test cases.
Chapter 6: Conclusion and Recommendation - This chapter presents the conclusions and
recommendations based on the results of the project, including ideas for future development and
improvements to the system.
CHAPTER 2
BACKGROUND STUDY AND LITERATURE REVIEW
2.1 Background Study
Online note-taking applications have become increasingly popular in recent years due to the
convenience they offer in storing, organizing, and sharing notes. However, with the increasing
volume of notes and comments generated by users, it becomes difficult to efficiently categorize
and tag them for easy retrieval. This is where automatic note categorization systems come into
play. These systems use machine learning algorithms to categorize and tag notes based on their
content. Multinomial Naive Bayes Classification is one such algorithm that has been widely used
for text classification tasks, achieving high accuracy in various applications such as spam
filtering, sentiment analysis, and document categorization.
The Online Note Categorization system is designed to classify and tag notes, comments, and
subjective expressions into different categories using a classification algorithm. In order to
understand the fundamental theories and general concepts related to the project, it is important to
explore various areas such as Natural Language Processing (NLP), Machine Learning (ML),
Text Classification, Data Mining, and User Interface Design. NLP involves the analysis of
natural language text in order to categorize it into different classes. ML provides algorithms
that learn from labeled examples to classify notes into different categories based on patterns
and relationships within the data. Text classification is the process of categorizing text into
predefined categories based
on its content. Data mining is the process of discovering patterns in large datasets, which the
Online Note Categorization system uses to identify patterns and relationships within the data to
improve classification accuracy. Finally, user interface design will be utilized to create an
interface that is easy for users to understand and use.
Therefore, the aim of this project is to develop an automatic note categorization system using
Multinomial Naive Bayes Classification, which will allow users to efficiently organize their
notes and retrieve them easily. Understanding these fundamental theories, general concepts, and
terminologies related to the project is essential for the successful development of the Online Note
Categorization system.
2.2 Literature Review
As noted in the background study, automatic note categorization systems use machine learning
algorithms to categorize and tag notes based on their content.
Multinomial Naive Bayes classification is widely used for such text classification tasks. It
estimates the probability that a document belongs to each category from the frequencies of the
words in the document, and assigns the category with the highest probability. The algorithm has
been reported to achieve high accuracy in text classification tasks and has been used in various
applications such as spam filtering, sentiment analysis, and document categorization.
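To make the word-frequency idea concrete, the following is a minimal from-scratch sketch of Multinomial Naive Bayes in Python. It is not the project's implementation: the class name `TinyMultinomialNB`, the Laplace smoothing constant `alpha`, and the three-document training set are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Toy Multinomial Naive Bayes over pre-tokenized documents."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, tokens):
        # log P(class) + sum over words of log P(word | class), with smoothing.
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)
            for word in tokens:
                if word in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Illustrative toy data, not taken from the project's dataset.
train_docs = [
    ["church", "prayer", "faith"],
    ["election", "president", "vote"],
    ["photoshop", "graphics", "image"],
]
train_labels = ["Religion", "Politics", "Technology"]
model = TinyMultinomialNB().fit(train_docs, train_labels)

if __name__ == "__main__":
    print(model.predict(["president", "sworn"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from driving a class probability to zero.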
Several projects have been developed in the area of automatic note categorization.
"Automatic Text Classification System"
This system [1] uses a Naive Bayes classifier and a Support Vector Machine (SVM) to classify
text documents into different categories. The system achieved an accuracy of 87.3% in
categorizing text documents from the Reuters-21578 dataset.
"NoteCatcher: A Note Categorization and Classification System for Personal Notebooks"
This system [2] uses a combination of keyword extraction and supervised machine learning
algorithms to classify notes into different categories. The system achieved an accuracy of
83.87% in classifying notes into seven different categories.
"Automatic Note Categorization Using Multi-Level Attention-Based Convolutional Neural Network"
The authors of this work proposed a multi-level attention-based convolutional neural network
that uses both word-level and character-level embeddings to categorize notes automatically.
The model achieved an accuracy of 93.6% on a dataset consisting of personal notes from
college students.
Research [3] conducted by Li et al. (2017) on automatic text classification using Naive Bayes
showed that the algorithm achieved an average accuracy of 85.4% in classifying text documents
into different categories. Similarly, research [4] conducted by Zhang and Yang (2014) on
automatic categorization of emails using Naive Bayes showed that the algorithm achieved an
accuracy of 92.8%.
Another relevant study [5] was conducted by Chong et al. (2018) on automatic categorization of
scientific articles using Naive Bayes. The study showed that the algorithm achieved an accuracy
of 90.8% in classifying articles into different categories. They also compared Naive Bayes with
other classification algorithms, such as Support Vector Machines and Random Forest, and found
that Naive Bayes outperformed them in terms of accuracy and speed.
Automatic note categorization systems have become increasingly important due to the large
volume of notes and comments generated by users. Multinomial Naive Bayes Classification is a
widely used algorithm for text classification tasks and has been shown to achieve high accuracy
in various applications. Several projects have been developed in the area of automatic note
categorization using different techniques, such as keyword extraction and supervised machine
learning algorithms. Studies conducted by researchers have shown that Naive Bayes achieves
high accuracy in classifying text documents into different categories. Recent studies have also
proposed novel approaches using deep learning techniques, such as convolutional neural
networks, to improve the accuracy of note categorization systems.
CHAPTER 3
SYSTEM ANALYSIS
3.1 System Analysis
3.1.1 Requirement Analysis
Effective requirement analysis is critical for project success. Thorough research, clear
requirements definition, and careful tool selection are essential. It is crucial to choose
requirements wisely to align with project objectives and ensure effective output. The functional
and non-functional requirements are described below:
i. Functional Requirements
A functional requirement is something a system must do. The functional requirements of this
project are:
1. User authentication: The system should allow users to create and manage their accounts
and provide login functionality.
2. Manage categories: The system should allow users to create, add, save, edit and delete
categories.
3. Manage notes: The system should allow users to create, add, save, edit, view and delete
notes.
4. Automatic tag generation: The system should generate relevant tags for notes based on
their content using natural language processing (NLP) techniques.
5. Note categorization: The system should provide functionality to categorize notes.
6. Important notes: The system should provide the option to mark important notes, so they
are visible in the important note list in the dashboard.
7. Tag-based search: The system should allow users to search notes based on tags and
categories.
8. Security: The system should ensure the privacy and security of user data and protect
against unauthorized access.
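Requirements 6 and 7 (important notes and tag-based search) amount to filtering a user's notes. The sketch below illustrates that idea with plain Python objects; the `Note` fields and the `search` and `important_notes` functions are hypothetical and do not reflect the project's actual Django models or views.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Note:
    # Assumed fields for illustration only.
    title: str
    category: str
    tags: Set[str] = field(default_factory=set)
    important: bool = False


def search(notes: List[Note], tag: Optional[str] = None, category: Optional[str] = None) -> List[Note]:
    """Return notes that match the given tag and/or category; both filters are optional."""
    return [
        note for note in notes
        if (tag is None or tag in note.tags)
        and (category is None or note.category == category)
    ]


def important_notes(notes: List[Note]) -> List[Note]:
    """Return only the notes the user has marked as important."""
    return [note for note in notes if note.important]


if __name__ == "__main__":
    notes = [
        Note("Sunday reading", "Religion", {"#soc", "#religion", "#christian"}),
        Note("Photoshop tips", "Technology", {"#comp", "#graphics"}, important=True),
    ]
    print([n.title for n in search(notes, tag="#graphics")])
```

In a database-backed application the same filters would normally be expressed as queries rather than list comprehensions.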
Figure 3.1 Use Case Diagram for Online Note Categorization System
ii. Non-Functional Requirements
The non-functional requirements of this web application can be summarized as follows:
1. Performance: The website must load fast and respond quickly to user requests. The
website must be fault-tolerant.
2. Usability: The website must have a minimalistic and non-distracting UI, with easy-to-use
functionality that is easy to understand. The website must have a login system for
authentication, and the user must be helped appropriately to fill in the mandatory fields in
case of invalid input.
3. Security: The website must have secure access to confidential data. Only authenticated
users must be allowed to view their respective databases. User credentials must be stored
securely in a database as a hash string.
4. Maintainability: The website must be easy to extend and reliable, providing correct
results. The analysis procedure must be carried out to increase the precision of the
system.
Overall, the web application must be responsive, secure, user-friendly, maintainable, and
scalable, providing a positive user experience and optimal website performance.
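The security requirement above asks that credentials be stored as a hash string. The report does not say which scheme the project uses, so the following Python sketch assumes salted PBKDF2-HMAC-SHA256 from the standard library; the function names, the stored-string layout, and the iteration count are illustrative choices.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, salt: Optional[bytes] = None, iterations: int = 100_000) -> str:
    """Derive a salted hash and pack it as 'algorithm$iterations$salt$digest'."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


if __name__ == "__main__":
    stored = hash_password("s3cret")
    print(verify_password("s3cret", stored), verify_password("wrong", stored))
```

In a Django project this is normally handled by the framework's built-in authentication system rather than by hand-written code; the sketch only shows what storing a hash instead of the password means.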
3.1.2 Feasibility Analysis

Feasibility studies aim to provide an objective and rational evaluation of the strengths and
weaknesses of a proposed venture or existing business, as well as the resources required to carry
out the project. This feasibility analysis will evaluate the technical, operational, economic, and
schedule feasibility of the Online Note Categorization (ONC) and Automatic Tag Generation
System.
i. Technical Feasibility
The tools and software products required to construct the system are easily available on the web
and do not require a special environment to execute; a standard IDE is sufficient for
development. While the application requires only simple user interfaces, the implementation of
the algorithms and the underlying calculations is complex. However, with some assistance from a
supervisor, the project is technically feasible.
ii. Operational Feasibility
This system will aid decision makers and business owners to make proper decisions by providing
efficient reports and classification. Statistics and reports generated from the system are easier to
read and understand, thus making the system operationally feasible.
iii. Economic Feasibility
The main economic cost would be the investment in hardware, software, and manpower required
to develop the system. Most of the tools and software required for development are freely
available. Therefore, the initial investment in hardware and software is expected to be low.
Furthermore, the system can be developed by a team of students, which would eliminate the need
for additional manpower cost. The only cost associated with manpower would be for the
development of skills required for the project, which can be achieved through self-learning or
with the assistance of a supervisor.
As the project does not involve any commercial use or product, there is no direct revenue
expected to be generated from the system. However, the system has potential benefits for
academic and research purposes, which can contribute to the advancement of knowledge and
technology.
iv. Schedule Feasibility
[Gantt chart: the activities Requirements Analysis, Design, Coding, and Testing scheduled across Week (1-2), Week (3-4), Week (5-10), and Week (11-12)]
Figure 3.2 Schedule Feasibility Gantt Chart
In conclusion, the Online Note Categorization system project is technically feasible with some
assistance, operationally feasible, economically feasible, and schedule feasible.
3.1.3 Analysis
Structured Approach
3.1.3.1 Data modelling using ER Diagram
ER diagrams consist of entities, attributes, and relationships, which are represented using
symbols such as rectangles, ovals, and diamonds, respectively. An entity represents a real-world
object or concept, such as a user, a note, or a category. An attribute is a characteristic of an
entity, such as a user's name, a note's content, or a category's description. A relationship
represents a connection between two or more entities, such as a user creating a note, a note
belonging to a category, or a note having multiple tags.
By using an ER diagram to model the data for the Online Note Categorization system, we can
ensure that the relationships between the data are clearly defined and can be easily understood by
the people involved in this project. This can help to improve the accuracy and efficiency of the
development process and ensure that the application meets the requirements.
Figure 3.3: ER Diagram of Online Note Categorization System
3.1.3.2 Process Modelling using DFD
i. Level 0 DFD
Level 0 DFD is a basic overview of the whole system or process being modeled. It provides a
bird's eye view of the system and is an excellent starting point for understanding the system's
overall architecture and functionality.
Figure 3.4: Level 0 DFD of Online Note Categorization System
ii. Level 1 DFD
A Level 1 Data Flow Diagram (DFD) is a graphical representation of a system's processes,
inputs, outputs, and interactions with external entities at a more granular level than the Level 0
DFD. It is useful for identifying the sub-processes and data flows that make up the system. This
understanding is necessary for the development of the system's detailed design and implementation.
Figure 3.5: Level 1 DFD of Online Note Categorization System
CHAPTER 4
SYSTEM DESIGN
System Design is a vital aspect of software development that involves defining the architecture,
modules, components, and interfaces of a system, as well as the data that goes through it. It
enables businesses and organizations to satisfy specific needs and requirements by engineering a
well-structured and efficient system. By providing a means to manage and control the software
development process, System Design helps to ensure that the resulting system is scalable,
maintainable, and upgradable over time. Ultimately, the goal of System Design is to create a
coherent and well-running system that meets the needs of the users.
4.1 Design
Database Design: Transformation of ER to relations and normalizations
The ER diagram for the system includes entities such as Signup, Category, Notes, and
Noteshistory, each with their own set of attributes.
To transform the ER diagram into a set of relations, each entity in the diagram is mapped to a
corresponding table in the database. For example, the Signup entity becomes a table named
Signup, with columns such as user_id, ContactNo, About, Role, and RegDate. Similarly, the
Category entity becomes a table named Category, with columns such as signup_id and
categoryName.
Next, relationships between the entities are mapped to foreign keys in the corresponding tables.
For example, the relationship between Signup and Category is represented by a foreign key
named signup in the Category table. This foreign key links the Category table to the Signup table
and ensures that each category is associated with a specific signup.
Once the tables are created, normalization rules are applied to ensure that the data is consistent
and free of redundancy. The first normal form (1NF) ensures that each table has a primary key
and that each column in the table is atomic. The second normal form (2NF) ensures that each
non-key column in the table depends on the entire primary key, and the third normal form (3NF)
ensures that each non-key column depends only on the primary key and not on other non-key
columns.
In the case of the Online Note Categorization System, normalizing the tables ensures that the
data is consistent and can be easily queried and manipulated. For example, by ensuring that each
table has a primary key, queries can be performed more efficiently, and data can be easily
updated or deleted without affecting other data in the table.
Overall, transforming the ER diagram into a set of relations and applying normalization rules are
critical steps in database design for the Online Note Categorization System.
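The mapping from entities to tables and foreign keys described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the `id` surrogate keys, the column types, and the sample row values are assumptions, and only the table and column names mentioned in the text (Signup, Category, user_id, ContactNo, About, Role, RegDate, signup_id, categoryName) come from the report.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Signup (
    id        INTEGER PRIMARY KEY,   -- assumed surrogate primary key
    user_id   INTEGER NOT NULL,
    ContactNo TEXT,
    About     TEXT,
    Role      TEXT,
    RegDate   TEXT
);
CREATE TABLE Category (
    id           INTEGER PRIMARY KEY,                     -- assumed surrogate primary key
    signup_id    INTEGER NOT NULL REFERENCES Signup(id),  -- foreign key to Signup
    categoryName TEXT NOT NULL
);
""")

# Sample rows (made-up values) to show the relationship in use.
conn.execute(
    "INSERT INTO Signup (id, user_id, ContactNo, About, Role, RegDate) "
    "VALUES (1, 10, '9812345678', 'I am a dancer.', 'user', '2023-01-01')"
)
conn.execute("INSERT INTO Category (signup_id, categoryName) VALUES (1, 'Technology')")

rows = conn.execute(
    "SELECT s.user_id, c.categoryName "
    "FROM Category c JOIN Signup s ON s.id = c.signup_id"
).fetchall()

# A category that points at a non-existent signup is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Category (signup_id, categoryName) VALUES (99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Each non-key column here depends only on its table's primary key, which is the property the normal forms above are meant to guarantee.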
Figure 4.1: Relational Data Model of Online Note Categorization System
Forms Design
Forms design ensures an intuitive and efficient user experience when interacting with the system's
forms. Forms are the primary means for users to input and manipulate data in the system. When
documenting forms design, the following aspects were addressed:
1. User-Friendly Interface: Forms follow a user-centric approach with a clean and intuitive
interface. Appropriate labels, field types, and validation techniques guide users and prevent
errors.
2. Error Handling and Validation: Validation techniques ensure data integrity and minimize
errors. Clear error messages are shown when users submit invalid data or miss required fields.
Interface Design
Interface design elements ensure an intuitive and engaging user experience, allowing users to
interact with the system effectively. Here's an overview of interface design considerations for the
project:
1. Visual Consistency: Visual consistency throughout the interface to provide a cohesive and
unified experience. Consistent color schemes, typography, and iconography across different
screens and components.
2. Clear Information Hierarchy: Interface elements are organized in a logical and hierarchical
manner. Important information and features are easily discoverable and accessible to users.
Appropriate visual cues, such as headings, labels, and grouping, to convey the information
hierarchy.
3. Intuitive Navigation: Intuitive navigation menus and controls to help users navigate through
different sections and features of the system. Familiar navigation patterns.
4.2 Algorithm Details

Data Collection
The dataset used is the 20 Newsgroups dataset, which is fetched from sklearn.datasets. It
consists of 20 different categories. The dataset was divided into a train set and a test set.
The train set was used to train the model and the test set was used to test the accuracy of
the model. The documents in the train set and the test set were each kept inside a folder
named after their respective category.
Data Preprocessing
1. Text Cleaning and Tokenization
Punctuation is removed from the required field and the text is converted to lower case. Extra
spaces, if any, are removed. Then the input sentence is broken down into an array of words known
as tokens.
2. Stop Word Removal
After cleaning and tokenizing the input text, stop words are removed, since stop words do not
carry categorical meaning. Some of the stop words are a, an, the, and, are, as, etc.
3. Porter Stemming
It is the process of removing common morphological endings from words in English. It is
mainly used to collapse the different forms of a word that carry the same meaning. For example,
catch, catches, catching, etc. are different forms of the word catch. So, instead of keeping all of
those words, we represent them with a single word, i.e., catch, by performing Porter
Stemming.
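The cleaning, tokenization, and stop word removal steps above can be sketched with the Python standard library alone. The stop-word list below is a short hypothetical stand-in (a full list, such as the one shipped with NLTK, would be used in practice), the function names are illustrative, and the stemming step is left out to keep the sketch free of third-party dependencies.

```python
import string

# Hypothetical, abbreviated stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "and", "are", "as", "just", "its", "it",
              "what", "you", "can", "do", "with"}


def clean_and_tokenize(text):
    """Remove punctuation, convert to lower case, and split on whitespace."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # str.split() with no argument also collapses runs of extra spaces.
    return no_punct.lower().split()


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


sentence = "Just started using Adobe Photoshop and it's amazing what you can do with it."
tokens = clean_and_tokenize(sentence)
content_words = remove_stop_words(tokens)
print(content_words)
```

A Porter stemmer would then be applied to each remaining token before feature extraction.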
Feature Extraction
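In feature extraction, we use the Term Frequency and Inverse Document Frequency (TFIDF).
TFIDF gives the relative weight of the individual words in the given input.
Calculation of TF,
$$tf_{ik} = \text{frequency of term } T_k \text{ in document } D_i$$
Calculation of IDF,
$$idf_k = \log\left(\frac{N}{n_k}\right)$$
Calculation of TFIDF,
$$w_{ik} = tf_{ik} \times idf_k$$
Or,
$$w_{ik} = tf_{ik} \times \log\left(\frac{N}{n_k}\right)$$
Where,
TF = Term frequency,
IDF = Inverse document frequency,
$T_k$ = Term k in document $D_i$,
$tf_{ik}$ = Frequency of term $T_k$ in document $D_i$,
$idf_k$ = Inverse document frequency of term $T_k$ in document collection C,
N = Total number of documents in the collection C,
$n_k$ = The number of documents in C that contain $T_k$.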
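As a small worked illustration of these definitions, the sketch below computes a TF-IDF weight by hand. It assumes the common textbook form, in which the weight of a term is its raw frequency in the document multiplied by log(N / n_k) with a natural logarithm. The three-document collection and the function names are made up for the example.

```python
import math

# Toy collection C of already tokenized documents (made up for illustration).
docs = [
    ["photoshop", "layer", "photoshop"],
    ["engine", "car"],
    ["photoshop", "car"],
]


def tf(term, doc):
    """Raw frequency of the term in one document."""
    return doc.count(term)


def idf(term, collection):
    """log(N / n_k); assumes the term occurs in at least one document."""
    n_k = sum(1 for doc in collection if term in doc)
    return math.log(len(collection) / n_k)


def tfidf(term, doc, collection):
    return tf(term, doc) * idf(term, collection)


w_photoshop = tfidf("photoshop", docs[0], docs)  # 2 * ln(3 / 2)
w_layer = tfidf("layer", docs[0], docs)          # 1 * ln(3 / 1)
```

Although "photoshop" occurs twice in the first document, "layer" receives the larger weight because it appears in only one of the three documents.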
TfidfVectorizer
It is a built-in class provided by the scikit-learn library in Python. It combines both the TF and
IDF calculations into a single step. By using TfidfVectorizer, you can calculate TF-IDF scores,
and transform the data into a TF-IDF matrix in a single line of code. It is highly optimized and
efficient, making it suitable for processing large amounts of text data.
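A minimal usage sketch of TfidfVectorizer follows, assuming scikit-learn is installed. The three short documents are made up and stand in for preprocessed note descriptions. Note that, with its default settings, scikit-learn uses a smoothed IDF and scales each row to unit length, so the resulting values differ from a plain tf × log(N / n_k) calculation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already preprocessed documents.
corpus = [
    "start use adob photoshop amaz",
    "car engin repair",
    "photoshop layer mask",
]

vectorizer = TfidfVectorizer()
# fit_transform learns the vocabulary and IDF values, then returns a sparse
# matrix with one row per document and one column per vocabulary term.
matrix = vectorizer.fit_transform(corpus)

# vocabulary_ maps each term to its column index in the matrix.
photoshop_column = vectorizer.vocabulary_["photoshop"]
print(matrix.shape)
```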
Multinomial Naive Bayes Algorithm

Classifier
Classification is the problem of identifying to which of a set of categories a new observation
belongs, on the basis of a training set of data containing observations whose category is known. A
classifier is an algorithm that maps the input data to a specific category. The Multinomial Naive
Bayes model is easy to build and particularly useful for very large data sets. Despite its
simplicity, Multinomial Naive Bayes can perform competitively with more sophisticated
classification methods. Studies comparing classification algorithms have found a simple
Bayesian classifier known as the naive Bayesian classifier to be comparable in performance with
decision tree and selected neural network classifiers. Bayesian classifiers have also exhibited
high accuracy and speed when applied to large databases.
Naive Bayesian classifiers assume that the effect of an attribute value on a given class is
independent of the values of the other attributes. This assumption is called class conditional
independence. It is made to simplify the computations involved and, in this sense, is considered
“naive”.
Bayes’ Theorem
Bayes’ theorem is named after Thomas Bayes, a nonconformist English clergyman who did early
work in probability and decision theory during the 18th century. Let X be a data tuple. In
Bayesian terms, X is considered “evidence.” As usual, it is described by measurements made on
a set of n attributes. Let H be some hypothesis such as that the data tuple X belongs to a specified
class C. For classification problems, we want to determine P(H|X), the probability that the
hypothesis H holds given the “evidence” or observed data tuple X. In other words, we are
looking for the probability that tuple X belongs to class C, given that we know the attribute
description of X. P(H|X) is the posterior probability, or a posteriori probability, of H conditioned
on X.
In contrast, P(H) is the prior probability, or a priori probability, of H.
Similarly, P(X|H) is the posterior probability of X conditioned on H, also called the likelihood of X given H.
P(X) is the prior probability of X.
Bayes’ theorem is
$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}$$
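A short numeric illustration of the theorem is given below. All probabilities are made-up values chosen only to show the arithmetic: H is the hypothesis that a note belongs to a graphics category, and X is the observation that the note contains the word "photoshop".

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x


# Made-up numbers for illustration only.
p_h = 0.30              # P(H): prior probability of the graphics category
p_x_given_h = 0.40      # P(X|H): "photoshop" appears in a graphics note
p_x_given_not_h = 0.05  # P(X|not H): "photoshop" appears in any other note

# The law of total probability gives the evidence term P(X).
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

p_h_given_x = posterior(p_h, p_x_given_h, p_x)
print(round(p_h_given_x, 3))
```

Observing the word raises the probability of the hypothesis from the prior of 0.30 to a posterior of roughly 0.77.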
Naive Bayesian Classification
The naive Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is
represented by an n-dimensional attribute vector, X = (x1, x2,..., xn), depicting n measurements
made on the tuple from n attributes, respectively, A1, A2,..., An.
2. Suppose that there are m classes, C1, C2,..., Cm. Given a tuple, X, the classifier will predict
that X belongs to the class having the highest posterior probability, conditioned on X. That is, the
naive Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
P (Ci |X) > P (Cj |X) for 1 ≤ j ≤ m, j ≠ i.
Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the
maximum posteriori hypothesis. By Bayes’ theorem,
$$P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}.$$
3. As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized. If the class prior
probabilities are not known, then it is commonly assumed that the classes are equally likely, that
is, P(C1) = P(C2) = ··· = P(Cm), and we would therefore maximize P(X|Ci). Otherwise, we
maximize P(X|Ci) P(Ci). Note that the class prior probabilities may be estimated by P(Ci) = |Ci,
D|/|D|, where |Ci, D| is the number of training tuples of class Ci in D.
4. Given data sets with many attributes, it would be extremely computationally expensive to
compute P(X|Ci). To reduce computation in evaluating P(X|Ci), the naive assumption of class-
conditional independence is made. This presumes that the attributes’ values are conditionally
independent of one another, given the class label of the tuple (i.e., that there are no dependence
relationships among the attributes). Thus,
$$P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i).$$

We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) from the training tuples.
Recall that here xk refers to the value of attribute Ak for tuple X. For each attribute, we look at
whether the attribute is categorical or continuous-valued. For instance, to compute P(X|Ci), we
consider the following:
(a) If Ak is categorical, then P (xk |Ci) is the number of tuples of class Ci in D having the value
xk for Ak, divided by |Ci, D|, the number of tuples of class Ci in D.
(b) If Ak is continuous-valued, then we need to do a bit more work, but the calculation is pretty
straightforward. A continuous-valued attribute is typically assumed to have a Gaussian
distribution with a mean µ and standard deviation σ, defined by
$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
so that
$$P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}).$$
We need to compute µCi and σCi, which are the mean (i.e., average) and standard deviation,
respectively, of the values of attribute Ak for training tuples of class Ci. We then plug these two
quantities together with xk, to estimate P (xk |Ci).
5. To predict the class label of X, P(X|Ci) P(Ci) is evaluated for each class Ci. The classifier
predicts that the class label of tuple X is the class Ci if and only if
P(X|Ci) P(Ci) > P(X|Cj) P(Cj) for 1 ≤ j ≤ m, j ≠ i.
(In other words, the predicted class label is the class Ci for which P(X|Ci) P(Ci) is the maximum.)
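The five steps above can be sketched for categorical attributes in a few lines of Python. This is a minimal illustration, not the project's classifier: the training tuples, attribute meanings, and class names are invented, and no smoothing is applied, so a value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict


def train(tuples, labels):
    """Collect the counts needed to estimate P(Ci) and P(xk|Ci)."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute index)][value] -> number of training tuples
    value_counts = defaultdict(Counter)
    for x, c in zip(tuples, labels):
        for k, value in enumerate(x):
            value_counts[(c, k)][value] += 1
    return class_counts, value_counts


def predict(x, class_counts, value_counts):
    """Return the class Ci that maximizes P(X|Ci) * P(Ci), and that score."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                             # P(Ci) = |Ci,D| / |D|
        for k, value in enumerate(x):
            score *= value_counts[(c, k)][value] / n_c  # P(xk|Ci)
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score


# Invented training set: (note length, contains code) -> class label.
training_tuples = [("short", "yes"), ("short", "no"), ("long", "yes"),
                   ("long", "no"), ("short", "yes")]
training_labels = ["tech", "misc", "tech", "misc", "tech"]

class_counts, value_counts = train(training_tuples, training_labels)
label, score = predict(("short", "yes"), class_counts, value_counts)
```

For the tuple ("short", "yes"), the "tech" score is (3/5) × (2/3) × (3/3) = 0.4, while the "misc" score is zero because no "misc" tuple has the value "yes".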
“How effective are Bayesian classifiers?” Various empirical studies of this classifier in
comparison to decision tree and neural network classifiers have found it to be comparable in
some domains. In theory, Bayesian classifiers have the minimum error rate in comparison to all
other classifiers. However, in practice this is not always the case, owing to inaccuracies in the
assumptions made for its use, such as class-conditional independence, and the lack of available
probability data. Bayesian classifiers are also useful in that they provide a theoretical justification
for other classifiers that do not explicitly use Bayes’ theorem. For example, under certain
assumptions, it can be shown that many neural network and curve-fitting algorithms output the
maximum posteriori hypothesis, as does the naive Bayesian classifier.
Algorithm explanation with example:

1. Take the notes description field content from the logged in user.
For example: Just started using Adobe Photoshop and it's amazing what you can do with it.
2. Perform the text preprocessing
a) Remove punctuation marks.
For example: Just started using Adobe Photoshop and its amazing what you can do with it
b) Convert to lower case.
For example: just started using adobe photoshop and its amazing what you can do with it
c) Tokenize the text into words.
For example: ['just', 'started', 'using', 'adobe', 'photoshop', 'and', 'its', 'amazing', 'what', 'you',
'can', 'do', 'with', 'it']
d) Remove the stop words
For example: ['started', 'using', 'adobe', 'photoshop', 'amazing']
e) Stem the remaining words.
For example: ['start', 'use', 'adob', 'photoshop', 'amaz']
f) Join the words back into a string.
For example: start use adob photoshop amaz
3. Transform using a TF-IDF vectorizer.
(0, 110891) 0.17194467768154792
(0, 103045) 0.269221459723549
(0, 89179) 0.6323672913057574
(0, 21102) 0.4597762305091781
(0, 19675) 0.5354178370083506
4. Classify the text.
[1]
5. Map the predicted category index to the corresponding category label:
comp.graphics
In this way, every note description is categorized into predefined categories and the tags are
generated by splitting the words.
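The last two steps, mapping the predicted index to a label and splitting the words into tags, might look like the sketch below. The two-entry `CATEGORIES` list and the `generate_tags` helper are hypothetical. Dropping duplicate words, and taking tags from the preprocessed string rather than the raw description, are assumptions that the report does not state.

```python
def generate_tags(preprocessed_text):
    """Split a preprocessed description into unique tags, keeping first-seen order."""
    seen = set()
    tags = []
    for word in preprocessed_text.split():
        if word not in seen:
            seen.add(word)
            tags.append(word)
    return tags


# Hypothetical index-to-label lookup; the real list would come from the
# categories saved alongside the trained model.
CATEGORIES = ["alt.atheism", "comp.graphics"]

predicted = [1]                   # classifier output from step 4
label = CATEGORIES[predicted[0]]  # step 5: index -> category label
tags = generate_tags("start use adob photoshop amaz")
```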
CHAPTER 5
IMPLEMENTATION AND TESTING
5.1 Implementation
The implementation of the Online Note Categorization System with Automatic Tag Generation
was carried out in three distinct phases. This chapter provides an overview of the implementation
process, including the utilization of algorithms, the development of backend APIs and databases,
and the creation of a frontend website for note management.
Phase 1: Algorithm Implementation
In the first phase, the focus was on implementing the Multinomial Naive Bayes algorithm along
with the necessary preprocessors for handling textual data. This algorithm was chosen for its
effectiveness in categorizing notes based on their content. Preprocessors such as tokenization,
stemming, and stop word removal were applied to enhance the accuracy of the categorization
process.
Phase 2: Backend API and Database Implementation
The second phase involved the development of the backend API, which serves as the backbone
of the system. The API was responsible for handling user requests, managing user profiles,
categories, and notes, and integrating the categorization and tag generation algorithms. The
database, implemented using SQLite, was used to store user information, notes,
categories, and other relevant data.
Phase 3: Frontend Website Development
The final phase focused on creating a user-friendly frontend website for note management.
Technologies such as HTML, CSS, Bootstrap, and Django were employed to design and
implement the user interface. The website provided different dashboards for customers and
admin users with superuser privileges. While the landing page allowed visitors to browse the
site, features such as profile management, category management, and note management required
users to either log in or register in the system.
Execution and Deployment

At present, the system is deployed on a local machine. The system is initiated by running a
command in the local terminal, and users can access it by entering the appropriate URL in their
web browser. The user interface provides distinct dashboards for customers and admin users,
offering features tailored to their respective roles and permissions.
In conclusion, the implementation phase involved translating the project plan into action. It
followed the design process outlined in the previous chapter, ensuring that each module was
implemented as per the system's requirements. The implementation was carried out in a
structured manner, incorporating algorithms, backend APIs, databases, and frontend website
development. The resulting system is a functional online note categorization and tag generation
platform, providing users with an efficient and user-friendly experience.
5.1.1 Tools Used
Tools Used in the Implementation of Online Note Categorization and Automatic Tag Generation
System:
Web Technologies:
1. HTML (Hypertext Markup Language): The standard markup language used to create the
structure and content of web pages.
2. CSS (Cascading Style Sheets): A style sheet language used for defining the presentation
and layout of HTML documents.
3. JavaScript: A programming language that enables interactive and dynamic behavior on web
pages.
Web Framework and Backend Technologies:

1. Django: A Python-based web framework used for backend development, providing a robust
structure and features for building web applications.
2. Python: The core programming language used for implementing the backend logic and
algorithms of the system.
Database Platform:
1. SQLite: A lightweight and serverless relational database management system used for local
development and testing.
CSS Framework:
1. Bootstrap: A popular CSS framework that provides pre-built styles and components for
creating responsive and visually appealing web pages.
Additional Tools:
1. Jupyter Notebook: An interactive computational environment used for running Python
code and creating Jupyter notebook documents.
2. Draw.io: A browser-based diagramming tool used for creating system design diagrams,
flowcharts, and block diagrams.
These tools were essential for the implementation of the Online Note Categorization and
Automatic Tag Generation System. They provided the necessary frameworks, languages, and
utilities to build the frontend interface, develop the backend logic, manage databases, and create
visual system design diagrams. The combination of HTML, CSS, and JavaScript enabled the
creation of a user-friendly and interactive web interface. Django, along with Python, facilitated
the development of the backend API and database integration. SQLite served as the database
platform for local development and testing. Bootstrap provided a convenient set of styles and
components for responsive web design. Jupyter Notebook allowed for the execution of Python
code and the integration of natural language processing functionalities. Draw.io was utilized for
creating system design diagrams and visual representations of the project.
5.1.2 Implementation Details of Modules

Model Implementation:
The model implementation uses the scikit-learn library for text classification.
1. The `fetch_20newsgroups` function from `sklearn.datasets` is imported to load the 20
Newsgroups dataset.
2. The dataset is split into training and testing subsets using the `subset` and `categories`
parameters.
3. The NLTK library is imported to perform text preprocessing tasks such as removing
punctuation, converting to lowercase, tokenizing, removing stop words, and stemming.
4. The `preprocess` function is defined to preprocess the text data using the NLTK
preprocessing tasks.
5. The training and testing data are preprocessed using a loop and stored in separate lists.
6. The `TfidfVectorizer` from scikit-learn is used to convert the preprocessed text data into
feature vectors.
7. A pipeline is created using `make_pipeline`, which combines the vectorizer and a
`MultinomialNB` classifier.
8. The model is trained using the training features and target labels.
9. The model and categories are saved using the `pickle` module.
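The training and saving steps listed above can be sketched as follows, assuming scikit-learn is installed. To keep the example self-contained, a tiny made-up corpus of already preprocessed text stands in for the 20 Newsgroups data, and the model is pickled to an in-memory byte string rather than to a file. The variable names and the dictionary layout of the pickled object are illustrative assumptions.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set standing in for the preprocessed dataset.
categories = ["comp.graphics", "rec.autos"]
train_texts = [
    "photoshop layer imag edit",
    "render graphic imag pixel",
    "car engin oil brake",
    "drive car wheel tire",
]
train_targets = [0, 0, 1, 1]  # indices into `categories`

# The pipeline vectorizes the text and feeds the features to the classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_targets)

# Serialize the fitted model together with the category names, then restore it.
blob = pickle.dumps({"model": model, "categories": categories})
restored = pickle.loads(blob)

predicted_index = restored["model"].predict(["start use adob photoshop amaz"])[0]
predicted_label = restored["categories"][predicted_index]
print(predicted_label)
```

Because the vectorizer is part of the pipeline, the restored model accepts preprocessed strings directly, with no separate transform step at prediction time.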
Views Implementation:
The views implementation consists of several functions that handle different HTTP requests and
render the corresponding HTML templates.
1. The `index` and `about` functions simply render the corresponding HTML templates.
2. The `register` function handles the registration process. It retrieves the user's input from
the request, creates a new user object, and saves it to the database.
3. The `user_login` function handles the user login process. It authenticates the user using
the provided credentials and redirects to the appropriate page based on the user's role
(admin or regular user).
4. The `dashboard` function retrieves user-related information and renders the dashboard
template.
5. The `profile` function handles updating the user's profile information.
6. The `manageCategory` function handles managing user categories. It retrieves existing
categories and allows the user to add new categories.
7. The `editCategory` function handles editing a category's details.
8. The `deleteCategory` function handles deleting a category.
9. The `manageNotes` function handles managing user notes. It retrieves the user's
categories and allows the user to add new notes.
10. The `editNotes` function handles editing a note's details.
11. The `viewNotes` function handles viewing a specific note and allows the user to add
comments on the note.
12. The `deleteNotesHistory` function handles deleting a note's comment history.
13. The `deleteNotes` function handles deleting a note.
14. The `generalCategory` and `specificCategory` functions render the corresponding HTML
templates for category selection.
15. The `resultSpecific` and `resultGeneral` functions handle the prediction process based on
the notes description.
16. The `searchNotes` function handles searching for notes based on the user's query.
17. The `changePassword` function handles the password change process for the logged-in
user.
18. The `Logout` function handles logging out the user.
5.2 Testing
The testing phase identifies possible flaws and potential inefficiencies in the system.
After building the project, the following tests were applied, with the results shown
below:
5.2.1 Test Cases for Unit Testing

Unit testing is a software testing technique in which individual units or components of a software
application are tested in isolation from the rest of the system. The primary goal of unit testing is
to identify and fix bugs in the code early in the development process, reducing the overall cost of
development and maintenance. By writing and running tests for each unit, we ensured that the
code works as expected.
Table 5.1: Test Case for User Registration
| Test Case | Input | Expected Result | Test Result |
|---|---|---|---|
| Successful User Registration | First Name = Ram; Last Name = Pudasaini; Email = ram@gmail.com; Password = ram; Contact no. = 9812345678; About = I am a dancer. | A message “Registeration Successfull” should be displayed. | A message “Registeration Successfull” is displayed. |
| User Registration Fail | First Name = Ram; Last Name = Pudasaini; Email = ram@; Password = ram; Contact no. = 9812345678; About = I am a dancer. | An error message related to incomplete email address should be displayed. | An error message “Please enter a part following @. ram@ is incomplete.” is displayed. |
Table 5.2: Test Case for User Login
| Test Case | Input | Expected Result | Test Result |
|---|---|---|---|
| Successful User Login | Email = kripesh@gmail.com; Password = kripesh | The login page is redirected to personal dashboard with the message “Logged In successfully.” | The login page is redirected to personal dashboard with the message “Logged In successfully.” |
| User Login Fail | Email = kripesh@gmail.com; Password = kripesh123 | A message “Invalid Credentials, Try Again” should be displayed. | A message “Invalid Credentials, Try Again” is displayed. |
Table 5.3: Test Case for Admin Login
| S.N. | Action | Inputs | Expected Output | Actual Output | Test Result |
|---|---|---|---|---|---|
| 1 | Open Website Admin Dashboard. | http://127.0.0.1:8000/admin/ | Show Admin Dashboard. | Show Admin Dashboard. | Pass |
| 2 | Enter username and password. | Username: bijaynation@gmail.com; Password: bijaynation | Login Success. | Login Success. | Pass |
| 3 | Add user, category and note. | Insert user, category and note information. | Added successfully. | Added successfully. | Pass |
| 4 | Delete the data. | Delete the selected item. | Successfully deleted. | Successfully deleted. | Pass |
| 5 | Update data. | Insert updated data. | Updated successfully. | Updated successfully. | Pass |
Table 5.4: Test Case for Text Processing
| Test no. | Unit | Test | Expected Result | Test Outcome | Evidence |
|---|---|---|---|---|---|
| 1 | Removal of punctuation marks | Removing punctuation marks | Text with punctuation removed | Successful | Test A |
| 2 | Lower Case Conversion | Converting to lower case | Lower case text | Successful | Test B |
| 3 | Tokenizer | Tokenizing the text into words | Tokenized words | Successful | Test C |
| 4 | Removal of stop words | Removing the stop words | Words without stop words | Successful | Test D |
| 5 | Porter Stemming | Stem words | Root word of each word | Successful | Test E |
Test A: Punctuation marks removal.

Input: Just started using Adobe Photoshop and it's amazing what you can do with it.
Output: Just started using Adobe Photoshop and its amazing what you can do with it
Test B: Lower Case Conversion.

Input: Just started using Adobe Photoshop and its amazing what you can do with it
Output: just started using adobe photoshop and its amazing what you can do with it
Test C: Tokenization
Input: just started using adobe photoshop and its amazing what you can do with it
Output: ['just', 'started', 'using', 'adobe', 'photoshop', 'and', 'its', 'amazing', 'what', 'you', 'can', 'do',
'with', 'it']
Test D: Stop-Words removal
Precondition: A list of stop words is available
Assumption: Given words are preprocessed
Input: ['just', 'started', 'using', 'adobe', 'photoshop', 'and', 'its', 'amazing', 'what', 'you', 'can', 'do',
'with', 'it']
Output: ['started', 'using', 'adobe', 'photoshop', 'amazing']
Test E: Porter Stemming

Input: ['started', 'using', 'adobe', 'photoshop', 'amazing']
Output: ['start', 'use', 'adob', 'photoshop', 'amaz']
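Tests A to D could be automated with Python's built-in unittest module, as sketched below. The helper functions and the abbreviated stop-word list are hypothetical stand-ins for the project's own preprocessing code, and Test E is omitted because it would require a stemming library.

```python
import string
import unittest

# Hypothetical, abbreviated stop-word list used only for this illustration.
STOP_WORDS = {"just", "and", "its", "what", "you", "can", "do", "with", "it"}


def remove_punctuation(text):
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    return text.split()


def remove_stop_words(tokens):
    return [token for token in tokens if token not in STOP_WORDS]


class TextProcessingTests(unittest.TestCase):
    RAW = "Just started using Adobe Photoshop and it's amazing what you can do with it."
    NO_PUNCT = "Just started using Adobe Photoshop and its amazing what you can do with it"
    TOKENS = ["just", "started", "using", "adobe", "photoshop", "and", "its",
              "amazing", "what", "you", "can", "do", "with", "it"]

    def test_a_punctuation_removal(self):
        self.assertEqual(remove_punctuation(self.RAW), self.NO_PUNCT)

    def test_b_lower_case_conversion(self):
        self.assertEqual(self.NO_PUNCT.lower(), " ".join(self.TOKENS))

    def test_c_tokenization(self):
        self.assertEqual(tokenize(self.NO_PUNCT.lower()), self.TOKENS)

    def test_d_stop_word_removal(self):
        self.assertEqual(remove_stop_words(self.TOKENS),
                         ["started", "using", "adobe", "photoshop", "amazing"])


# Load and run the four test cases, keeping the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TextProcessingTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```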
5.2.2 Test Cases for System Testing

System testing is a crucial phase in software testing that focuses on evaluating the complete and
integrated system as a whole. It aims to ensure that all the components of the software interact
correctly and fulfill the specified requirements. By simulating real-world scenarios and testing
various system functionalities, system testing helps identify defects or inconsistencies that may
arise due to the interactions between different modules or components.
Table 5.5: Test Case for Managing Category
| S.N. | Test case | Input | Expected Result | Test Result |
|---|---|---|---|---|
| 1 | User login | Email: ritesh@gmail.com; Password: ritesh | Dashboard should be displayed | Dashboard is displayed |
| 2 | Add Category | Fill form to add category | A successful message should be displayed | “New Category has been Added.” message is displayed |
| 3 | Display Category | Go to the manage category page | Recently added category should be displayed | Recently added category is displayed |
Table 5.6: Test Case for Managing Notes
| S.N. | Test case | Input | Expected Result | Test Result |
|---|---|---|---|---|
| 1 | User login | Email: ritesh@gmail.com; Password: ritesh | Dashboard should be displayed | Dashboard is displayed |
| 2 | Add Note | Fill form to add note | A successful message should be displayed | “New Note has been Added.” message is displayed |
| 3 | Display Notes | Go to the manage note page | Recently added note should be displayed | Recently added note is displayed |
Table 5.7: Test Case for Text Categorization
| Test no. | Part | Test | Expected Result | Test Outcome | Evidence |
|---|---|---|---|---|---|
| 1 | General Text Classification | Classify input text into predefined general category | Input text classified into predefined general category | Successful | Test F |
| 2 | Specific Text Classification | Classify input text into predefined specific category | Input text classified into predefined specific category | Successful | Test G |
Test F: General Text Classification

Output: Technology
Test G: Specific Text Classification

Output: comp.graphics
5.3 Result Analysis

After the completion of the system, a comprehensive result analysis was conducted to ensure its
effectiveness. Every aspect of the system was thoroughly examined by providing both valid and
invalid inputs to evaluate the outcomes. In case of any errors encountered, the code was
promptly revised and retested to ensure optimal functionality.
The system behaved as expected throughout the testing phase. The admin successfully carried
out all the required CRUD operations without any issues, indicating that the system operates
in line with the specified requirements.
Upon analyzing the results of all the testing conditions, it can be confidently stated that the
system has successfully passed all the intended tests. It exhibited the desired functionality,
reliability, and accuracy, thus validating its effectiveness in note categorization and automatic tag
generation.
The comprehensive result analysis not only confirmed the system's ability to perform as expected
but also provided valuable insights for further improvements and enhancements.
CHAPTER 6
CONCLUSION AND FUTURE RECOMMENDATIONS
6.1 Conclusion
In conclusion, the development and implementation of the Online Note Categorization System
have been successfully accomplished. The system offers an efficient and user-friendly platform
for organizing and managing notes by automatically generating relevant tags based on their
content.
Throughout the project, careful attention was given to ensuring the system's accuracy, reliability,
and performance. The implementation phase involved thorough testing and result analysis, which
validated the system's ability to handle various scenarios and deliver the expected outcomes.
The system's key features, such as user registration, note creation, categorization, and tag
generation, were implemented with precision and have proven to be effective in enhancing
productivity and organizing notes effectively. The intuitive user interface and seamless user
experience contribute to the system's usability and overall satisfaction.
By automating the tag generation process, the system eliminates the need for manual tagging,
saving time and effort for users. The categorization of notes based on their content further
enhances accessibility and facilitates efficient retrieval of information.
6.2 Future Recommendation

Moving forward, there are several recommendations for enhancing the Online Note
Categorization System to further improve its functionality and user experience:
1. Collaboration Features: Introducing collaborative features would enable multiple users to
work on shared notes or projects. This could include real-time editing, version control, and
commenting features, fostering collaboration and teamwork among users.
2. Integration with External Platforms: To expand the system's reach and usability, integrating
with external platforms and services could be beneficial. Integration with popular productivity
tools, cloud storage services, or note-taking applications would allow seamless synchronization
and access to notes across multiple platforms.
3. Mobile Application: Developing a mobile application for the system would provide users
with the flexibility to access and manage their notes on the go. A mobile app would enhance
convenience and cater to users who prefer using their smartphones or tablets for note-taking and
organization.
4. User Feedback and Analytics: Implementing a feedback mechanism and analytics system
would enable users to provide suggestions and report issues. Analyzing user feedback and
system usage data can provide valuable insights for further system improvements and feature
enhancements.
By adopting these recommendations, the Online Note Categorization System can evolve into a
more comprehensive and powerful tool for efficient note management and organization.
Continuous development, attention to user feedback, and keeping pace with technological
advancements would contribute to the system's long-term success and user satisfaction.
REFERENCES
[1] X. Dong, J. Zhang and X. Li, "Automatic text classification system based on Naive Bayes
and SVM," Journal of Information Science and Engineering, vol. 31, no. 1, pp. 301-312,
2015.
[2] T. Ananthakrishnan and R. Krishnamurthy, "NoteCatcher: A Note Categorization and
Classification System for Personal Notebooks," in IEEE International Conference on Data
Science and Data Intensive Systems, 2015.
[3] X. Li, X. Dong and J. Zhang, "Research on Automatic Text Classification Based on Naive
Bayes," Journal of Physics: Conference Series, 2017.
[4] J. Zhang and W. Yang, "Automatic categorization of emails based on naïve Bayes
algorithm," Journal of Computational Information Systems, vol. 10, no. 2, pp. 781-788,
2014.
[5] F. K. Chong, C. S. G. Khoo and B. H. Kang, "Automatic categorization of scientific articles:
An empirical study," Journal of Information Science, vol. 44, no. 3, pp. 373-390, 2018.
APPENDIX
Figure: Home Page
Figure: Registration Page
Figure: Login Page
Figure: Dashboard
Figure: Manage Profile Page
Figure: Manage Category Page
Figure: Manage Notes Page
Figure: Categorization Page
Figure: Search Results for Technology
Figure: Django Admin Panel
