Sample Mini Project Report
A PROJECT REPORT
Submitted by
of
BACHELOR OF ENGINEERING
IN
KAVARAIPETTAI - 601206
ANNA UNIVERSITY: CHENNAI – 600 025
MAY – 2021
ANNA UNIVERSITY: CHENNAI – 600 025
BONAFIDE CERTIFICATE
SIGNATURE
Dr. P. Ezhumalai, B.E., M.Tech., Ph.D.,
HEAD OF THE DEPARTMENT,
Department of Computer Science Engineering,
RMD Engineering College,
R.S.M Nagar,
Kavaraipettai – 601 206.
SIGNATURE
Dr. A. Gnanasekar, AMIE., M.Tech., Ph.D.,
ASSOCIATE PROFESSOR,
Department of Computer Science Engineering,
RMD Engineering College,
R.S.M Nagar,
Kavaraipettai – 601 206.
VIVA-VOCE EXAMINATION
ACKNOWLEDGEMENT
We take this opportunity to give profound and heartfelt thanks to the Dean-Research,
Dr. K. SIVARAM, B.E., M.Tech., Ph.D., and the Dean-Academic, Dr. K. K. THYAGHARAJAN,
B.E., M.E., Ph.D., for their continuous support towards the successful completion of this project.
ABSTRACT
The presence of bots has been felt in many aspects of social media. Twitter, one
example of social media, has especially felt the impact, with bots accounting for a
large portion of its users. These bots have been used for malicious tasks such as
spreading false information about political candidates and inflating the perceived
popularity of celebrities. Furthermore, these bots can change the results of common
analyses performed on social media. With the significant increase in the volume,
velocity, and variety of user data (e.g., user-generated data) in online social networks,
there have been attempts to design new ways of collecting and analyzing such big
data. For example, social bots have been used to perform automated analytical
services and provide users with improved quality of service. However, malicious
social bots have also been used to disseminate false information (e.g., fake news), so
detecting malicious social bots in online social networks is crucial. Most existing
detection methods for malicious social bots analyze the quantitative features of their
behavior. These features are easily imitated by social bots, resulting in low accuracy of
the analysis. A novel method of detecting malicious social bots, including both feature
selection based on the transition probability of clickstream sequences and
semi-supervised clustering, is presented in this paper. This method not only analyzes the
transition probability of user behavior clickstreams but also considers the time feature of
behavior. Findings from our experiments on real online social network platforms
demonstrate that the detection method based on the transition probability of user behavior
clickstreams increases the detection accuracy for different types of malicious social bots
by an average of 12.8%, in comparison to the detection method based on quantitative
analysis of user behavior.
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
CHAPTER 1 - INTRODUCTION
GENERAL:
In online social networks, social bots are social accounts controlled by automated
programs that perform operations according to a set of procedures. The increasing use of
mobile devices (e.g., Android and iOS devices) has also increased the frequency and changed the
nature of user interaction via social networks. This is evidenced by the significant volume,
velocity and variety of data generated by the large user base of online social networks. Social
bots have been widely deployed to enhance the quality and efficiency of collecting and analyzing
data from social network services. For example, the social bot SF QuakeBot is designed to
generate earthquake reports for the San Francisco Bay Area, and it can analyze earthquake-related
information in social networks in real time. However, public opinion on social networks and
massive amounts of user data can also be mined or disseminated for malicious or nefarious
purposes. In online social networks, automatic social bots do not represent the real desires and
intentions of normal human beings, so they are usually regarded as malicious. For example, some
fake social bot accounts are created to imitate the profile of a normal user, steal user data and
compromise users' privacy, disseminate malicious or fake information, post malicious comments,
promote certain political or ideological agendas and propaganda, and influence the stock market
and other societal and economic markets. Such activities can adversely impact the security and
stability of social networking platforms. In previous research, various methods have been used to
protect the security of online social networks. User behavior is the most direct manifestation of
user intent, as different users have different habits, preferences, and online behavior (e.g., the
way one clicks or types, as well as the speed of typing). In other words, we may be able to mine
and analyze information hidden in a user's online behavior to profile and identify different users.
However, we also need to be conscious of situational factors that may change a user's online
behavior: user behavior is dynamic, and its environment is constantly changing. This includes both
the externally observable environment of the application context (e.g., environment and behavior)
and the hidden environment in user information.
OBJECTIVE:
In order to distinguish social bots from normal users accurately, detect malicious social
bots, and reduce the harm they cause, we need to acquire and analyze the social
situations of user behavior. We must also compare and understand the differences between
malicious social bots and normal users in dynamic behavior. Specifically, in this paper, we aim
to detect malicious social bots on social network platforms in real time by
(1) proposing transition probability features between user clickstreams based on social
situation analytics; and
(2) designing an algorithm for detecting malicious social bots based on spatiotemporal features.
EXISTING SYSTEM:
Recent statistics show that more than 50% of Twitter accounts are not human users.
Social network administrators are well aware of these harmful activities and try to delete these
users using their suspension/removal systems. By one estimate, 28% of accounts created in 2008
and half of the accounts created in 2014 have been suspended by Twitter. What is not well
addressed, however, is the role of bots in facilitating these malicious activities. In one study,
145,000 accounts survived for months without detection.
Today, 16% of spammers on Twitter are bots. Although social network spam detection
approaches are still insufficient, bot detection in social networks has received wide attention
from the research community in recent years. Botnets have become widespread in wired and wireless
networks. In particular, bots in a botnet are able to cooperate towards a common malicious
purpose. In recent years, social bots have become very popular in social networks, and they can
imitate human activities there. They are also programmed to work together to fulfill
prescribed tasks. Users with malicious or nefarious intent and social bots use a wide range of
methods, such as sophisticated techniques and tools that may be associated with nation states
and state-sponsored actors. For example, in order to imitate the features
of human users successfully, social bots may 'crawl' online social networks for words and
pictures to complete fabricated user profiles. Semi-social bots, which lie between humans and
social bots, have also reportedly emerged in social networks. These are highly complex social
bots that bear the characteristics of both human behaviour and social bot behaviour. The automated
procedure of a semi-social bot is generally activated by a human, and the subsequent actions are
automatically performed by the bot. This further increases the uncertainty of the
operation time of social bots.
Social bots are becoming more intelligent. They can imitate human behaviour more easily
and therefore cannot be easily detected. In existing literature, social bots are generally
detected using machine learning-based approaches, such as Bot or Not (BotOrNot), released in
2014 by researchers at Indiana University. Bot or Not uses a random forest model for both training
and analysis, based on historical social information from normal user and social bot accounts.
Using six classes of features (network, user, friends, time, content and sentiment), this model
distinguishes normal users from social bots. Morstatter et al. proposed BoostOR, a heuristic
supervised model that increases the recall rate of malicious bot detection. BoostOR uses the
proportion of forwarded tweets (retweets) to published tweets on Twitter, the mean length of
tweets, URLs, and the forwarding interval. Wang et al. constructed a semi-supervised clickstream
similarity graph model of user behaviour to detect abnormal accounts in Renren.
Based on the social interactions between Twitter users, users can be identified as active,
passive or inactive. A supervised machine learning method was proposed to identify social bots
using static features (such as age and location) of active, passive, and inactive Twitter users.
It also uses dynamic characteristics such as the interacting person, interaction content and
interaction theme. A timing model of user activity, Act-M, was constructed around the timing of
user behaviour activities. It can accurately determine the intervals between different behaviours
of social media users and thereby detect malicious users. Researchers have also focused on
detecting semi-social bots. For example, Chu et al. proposed a framework relying on an entropy
component, a spam detection component, an account attribute component, and a decision maker.
In this approach, Naive Bayes is adopted to categorize automated Twitter accounts as humans,
social bots, or semi-social bots. Previous studies have also shown that quantitative features such
as friends, fans, forwarders, and tweets can be used in feature selection. Supervised learning
methods can be effective in detecting social bots. However, they require annotating and training
on large amounts of data. Tagging data takes time and manpower, and it is generally unsuitable for
the big-data social networking environment.
1. Cai et al. [1] note that social bots are regarded as the most common kind of malware on
social platforms. They can produce fake messages, spread rumours, and even manipulate public
opinion. Recently, massive numbers of social bots have been created and widely spread on social
platforms, with negative effects on the public and on netizen security. Bot detection aims to
distinguish bots from humans, and it has attracted increasing attention in recent years. The
authors propose a behavior enhanced deep model (BeDM) for bot detection. The proposed model
treats user content as temporal text data rather than plain text in order to extract latent
temporal patterns. Moreover, BeDM fuses content information and behavior information using deep
learning methods. To the authors' knowledge, this is the first attempt to apply deep neural
networks to bot detection. Experiments on a real-world dataset collected from Twitter demonstrate
the effectiveness of the proposed model.
3. Fred Morstatter et al. [3] observe that the presence of bots has been felt in many aspects of
social media. Twitter, one example of social media, has especially felt the impact, with bots
accounting for a large portion of its users. These bots have been used for malicious tasks such as
spreading false information about political candidates and inflating the perceived popularity of
celebrities. Furthermore, these bots can change the results of common analyses performed on
social media. It is important that researchers and practitioners have tools in their arsenal to
remove them. Existing approaches to removing bots focus on precision when evaluating their
models, at the cost of recall. While these approaches are almost always correct about the bots
they delete, they ultimately delete very few, so many bots remain. The authors propose a model
that increases recall in detecting bots, allowing a researcher to delete more bots. The model is
evaluated on two real-world social media datasets. The results show that the detection algorithm
removes more bots from a dataset than current approaches.
5. Yadong Zhou et al. [5] observe that online social networks (OSNs) are gradually integrating
financial capabilities by enabling the use of real and virtual currency. OSNs serve as new
platforms to host a variety of business activities, such as online promotion events, where users
can get virtual currency as rewards for participating. Both OSNs and their business partners are
significantly concerned when attackers instrument a set of accounts to collect virtual currency
from these events. This makes the events ineffective and results in significant financial loss.
It is therefore important to proactively detect these malicious accounts before online promotion
activities and subsequently decrease their priority to be rewarded. The authors propose a novel
system, ProGuard, to accomplish this objective. ProGuard systematically integrates features that
characterize accounts from three perspectives: their general behaviors, their recharging patterns,
and the usage of their currency. Extensive experiments were performed on data collected from
Tencent QQ, a leading global OSN with built-in financial management activities. The results show
that the system achieves a high detection rate of 96.67% at a very low false positive rate of 0.3%.
PROPOSED SYSTEM:
In this paper, we aim to detect malicious social bots on social network platforms in
real time by (1) proposing transition probability features between user clickstreams based on
social situation analytics; and (2) designing an algorithm for detecting malicious social bots
based on spatiotemporal features. To better detect malicious social bots in online social
networks, we analyze user behavior features and identify transition probability features between
user clickstreams. Based on the transition probability features and time interval features, a
semi-supervised social bot detection method based on space-time features is proposed.
● We evaluate user behavior features and select the transition probability of user behaviour
on the basis of general behaviour characteristics.
● We then analyze and classify situation-aware user behaviors in social networks using our
proposed semi-supervised clustering detection method.
● This allows us to promptly detect malicious social bots using only a small number of
tagged users.
CHAPTER 2 - PROJECT DESCRIPTION
GENERAL
Most existing detection methods for malicious social bots analyze the quantitative features
of their behavior. These features are easily imitated by social bots, resulting in low accuracy of
the analysis. A novel method of detecting malicious social bots is presented in this paper. It
includes both feature selection based on the transition probability of clickstream sequences and
semi-supervised clustering. This method not only analyzes the transition probability of user
behavior clickstreams but also considers the time feature of behavior. Findings from our
experiments on real online social network platforms demonstrate that the detection method based
on the transition probability of user behavior clickstreams increases the detection accuracy for
different types of malicious social bots by an average of 12.8%, in comparison to the detection
method based on quantitative analysis of user behavior.
MODULES
● DATA COLLECTION
● EXPERIMENTAL DESIGN
● MALICIOUS SOCIAL BOTS DETECTION
DATA COLLECTION
The CyVOD platform comprises the website platform and Android and iOS applications.
On CyVOD, user clickstream behavior is captured through data burying points (event-tracking
code embedded in the website and apps), and the clickstream data is collected server-side. In a
real-world environment, the same burying-point technique can be used to collect such data on
one's own website. For other websites, the data must be obtained by cooperating with the website
or by calling the corresponding API (if one is provided).
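The collection step can be illustrated with a minimal Python sketch. It merges clickstream records from the website, Android and iOS clients into one time-ordered click sequence per user. The record layout (`ClickEvent` with `user_id`, `event`, `timestamp` and `source` fields) and the function name `build_sequences` are hypothetical, since the report does not describe CyVOD's actual log schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ClickEvent(NamedTuple):
    """One server-side clickstream record (hypothetical schema)."""
    user_id: str
    event: str        # e.g. "play", "like", "comment"
    timestamp: float  # seconds since the epoch
    source: str       # "web", "android" or "ios"


def build_sequences(*sources: Iterable[ClickEvent]) -> Dict[str, List[ClickEvent]]:
    """Merge events from several client platforms and return, for each user,
    that user's events sorted by timestamp."""
    per_user: Dict[str, List[ClickEvent]] = defaultdict(list)
    for source in sources:
        for ev in source:
            per_user[ev.user_id].append(ev)
    return {uid: sorted(evs, key=lambda e: e.timestamp) for uid, evs in per_user.items()}


web = [ClickEvent("u1", "play", 10.0, "web"), ClickEvent("u2", "play", 5.0, "web")]
android = [ClickEvent("u1", "like", 12.5, "android")]
ios = [ClickEvent("u1", "play", 3.0, "ios")]

sequences = build_sequences(web, android, ios)
print([e.event for e in sequences["u1"]])  # ['play', 'play', 'like']
```

The sketch sorts by timestamp after merging because events from different clients arrive at the server in no guaranteed order. The transition features depend on the true click order.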
EXPERIMENTAL DESIGN
Three types of malicious social bots are considered: social bots that perform a single task,
malicious social bots that coordinate to perform tasks, and malicious social bots that perform
mixed tasks. For example, a user can perform two or more of the actions liking, commenting,
sharing and so on. For a social bot that performs malicious likes, the value of P(play, like) would
be high. P(play, like) is the transition probability that the current click event is playing and the
next click event is liking. The values of the other transition probability features would be small or
zero.
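The transition probability P(a, b) can be estimated from one user's time-ordered clickstream. Count how often event b immediately follows event a, then divide by how often a is followed by any event. The sketch below assumes this count-based (maximum-likelihood) estimate and uses two hypothetical clickstreams. It shows why a like-bot's P(play, like) stands out. The function name `transition_probabilities` is illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transition_probabilities(clicks: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(a, b), the probability that click b immediately follows click a.

    P(a, b) = count(a -> b) / count(a -> any event), from one user's
    time-ordered clickstream. Pairs that never occur are simply absent.
    """
    pair_counts = Counter(zip(clicks, clicks[1:]))
    from_counts = Counter(clicks[:-1])  # every click that has a successor
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical like-bot: plays a video and immediately likes it, repeatedly.
like_bot = ["play", "like"] * 5
# Hypothetical human: mixes playing with commenting, liking and sharing.
human = ["play", "play", "comment", "play", "like", "play", "share", "play", "play"]

bot_p = transition_probabilities(like_bot)
human_p = transition_probabilities(human)
print(bot_p.get(("play", "like"), 0.0))    # 1.0
print(human_p.get(("play", "like"), 0.0))  # 0.2
```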
Data processing:
Some data are selected randomly from the normal user set and the social bot set to be labeled.
A normal user account is labeled as 1, and a social bot account is labeled as −1. These labeled
seed users determine the category of each cluster.
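The seed-labeling step can be sketched as follows. The sketch assumes string user IDs, disjoint normal and bot sets, and an equal number of seeds drawn from each set. The report does not state how many seeds are used, and `pick_seed_users` and `per_class` are hypothetical names.

```python
import random
from typing import Dict, Sequence

NORMAL, BOT = 1, -1  # label convention from the text


def pick_seed_users(normal_users: Sequence[str], bot_users: Sequence[str],
                    per_class: int, seed: int = 0) -> Dict[str, int]:
    """Randomly draw `per_class` accounts from each known set and label them.

    Returns {user_id: label}, with 1 for normal users and -1 for social bots.
    """
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    labeled = {user: NORMAL for user in rng.sample(list(normal_users), per_class)}
    labeled.update({user: BOT for user in rng.sample(list(bot_users), per_class)})
    return labeled


normal_users = [f"n{i}" for i in range(50)]
bot_users = [f"b{i}" for i in range(30)]
seed_labels = pick_seed_users(normal_users, bot_users, per_class=5)
print(sorted(seed_labels.values()))  # [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
```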
Feature selection:
In the spatial dimension, according to the main functions of the CyVOD platform, we
select the transition probability features related to the playback function: P(play, play),
P(play, like), P(play, feedback), P(play, comment), P(play, share) and P(play, more). In the time
dimension, we use the inter-arrival times (IATs) between clicks. Only these features are selected
because constructing all transition probability matrices of user behavior would produce an
extremely large and sparse matrix, which would make detection more difficult.
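Under this feature selection, each user can be represented by the six playback transition probabilities plus a temporal feature. The sketch below summarizes the temporal part as the mean inter-arrival time. That summary, the `(event, timestamp)` input format and the name `play_features` are assumptions, because the report does not say how the IATs enter the feature vector.

```python
from statistics import mean
from typing import List, Tuple

# The six playback transition features named in the text, in a fixed order.
PLAY_TARGETS = ["play", "like", "feedback", "comment", "share", "more"]


def play_features(clicks: List[Tuple[str, float]]) -> List[float]:
    """Feature vector for one user's time-ordered (event, timestamp) clicks.

    Spatial part: P(play, x) for each x in PLAY_TARGETS, i.e. the share of
    clicks immediately following a "play" click that are x.
    Temporal part: mean inter-arrival time (IAT) between consecutive clicks.
    """
    events = [event for event, _ in clicks]
    times = [t for _, t in clicks]
    after_play = [nxt for cur, nxt in zip(events, events[1:]) if cur == "play"]
    n = len(after_play)
    spatial = [after_play.count(x) / n if n else 0.0 for x in PLAY_TARGETS]
    iats = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    temporal = [mean(iats) if iats else 0.0]
    return spatial + temporal


clicks = [("play", 0.0), ("like", 2.0), ("play", 4.0), ("share", 6.0)]
print(play_features(clicks))  # [0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 2.0]
```

Restricting the spatial part to transitions out of "play" keeps the vector at a fixed length of seven. This avoids the large, sparse full transition matrix described above.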
● DATA CLEANING
● DATA PROCESSING
● FEATURE SELECTION
● SEMI SUPERVISED CLUSTERING
● OBTAIN NORMAL USER SET AND SOCIAL BOT SET
● RESULT EVALUATION
Data cleaning:
Data from users with few clicks must be cleaned out. This removes erroneous data, yields
accurate transition probabilities between clickstreams, and avoids transition probability errors
caused by insufficient data.
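A minimal sketch of the cleaning step follows, assuming each user's clicks are (event, timestamp) pairs. The threshold `MIN_CLICKS = 20` is a hypothetical value because the report does not state one. Dropping exact duplicate records is only an assumed example of "wrong data".

```python
from typing import Dict, List, Tuple

Clicks = List[Tuple[str, float]]
MIN_CLICKS = 20  # hypothetical threshold; the report does not give a value


def clean(sequences: Dict[str, Clicks], min_clicks: int = MIN_CLICKS) -> Dict[str, Clicks]:
    """Return only users whose click sequences are long enough to estimate
    transition probabilities reliably, after removing duplicate records."""
    cleaned: Dict[str, Clicks] = {}
    for user, clicks in sequences.items():
        # Assumed notion of "wrong data": exact duplicate (event, timestamp)
        # records, e.g. from a client re-sending a log. Order is preserved.
        unique = list(dict.fromkeys(clicks))
        if len(unique) >= min_clicks:
            cleaned[user] = unique
    return cleaned


data = {
    "active": [("play", float(t)) for t in range(25)],
    "sparse": [("play", 0.0), ("like", 1.0)],
    "duplicated": [("play", float(t)) for t in range(10)] * 2,
}
print(sorted(clean(data)))  # ['active']
```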
Semi-supervised clustering:
First, the initial centers of the two clusters are determined by the labeled seed users. Then,
the unlabeled data are used to iteratively optimize the clustering results.
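The two-step procedure above resembles seeded k-means: seeded initial centers, followed by iterative refinement with unlabeled data. Below is a minimal pure-Python sketch under stated assumptions. It uses Euclidean distance and two clusters labeled 1 and −1 as in the data processing step. It also keeps seed users in their labeled cluster (a constrained seeding variant). The report does not say which distance or variant it uses, and `seeded_kmeans` and `centroid` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def seeded_kmeans(seeds: Dict[int, List[Vector]], unlabeled: List[Vector],
                  max_iter: int = 100) -> List[int]:
    """Two-cluster k-means whose initial centers come from labeled seed users.

    `seeds` maps a label (1 = normal user, -1 = social bot) to that class's
    seed feature vectors; each class needs at least one seed. Seeds stay in
    their own cluster. Returns a predicted label for every unlabeled vector.
    """
    labels = list(seeds)
    centers = {lab: centroid(pts) for lab, pts in seeds.items()}
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each unlabeled user joins the nearest center.
        new_assignment = [min(labels, key=lambda lab: math.dist(x, centers[lab]))
                          for x in unlabeled]
        if new_assignment == assignment:
            break  # no user changed cluster: converged
        assignment = new_assignment
        # Update step: recompute each center from its seeds plus current members.
        for lab in labels:
            members = list(seeds[lab]) + [x for x, a in zip(unlabeled, assignment) if a == lab]
            centers[lab] = centroid(members)
    return assignment


# Hypothetical 2-D features: (P(play, like), mean IAT in seconds).
seeds = {1: [[0.1, 5.0], [0.2, 6.0]], -1: [[0.9, 0.5], [1.0, 0.4]]}
unlabeled = [[0.15, 5.5], [0.95, 0.45], [0.3, 4.0]]
print(seeded_kmeans(seeds, unlabeled))  # [1, -1, 1]
```

The two features in the example are on very different scales. In practice they would usually be standardized first so that the IAT column does not dominate the Euclidean distance.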
Obtain the normal user set and social bots set:
The normal user set and the social bot set are finally obtained from the detection results.
Result evaluation:
We evaluate results based on three different metrics: Precision, Recall, and F1 score, where
F1 is the harmonic mean of Precision and Recall:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
In addition, we use Accuracy as a metric and compare it with that of the SVM algorithm to verify
the effectiveness of the method. Accuracy is the ratio of the number of samples correctly
classified by the classifier to the total number of samples.
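A small sketch of the four evaluation metrics follows. It assumes social bots (label −1) are the positive class, since the report does not say which class is positive. The `evaluate` function is illustrative, not the project's code.

```python
from typing import Dict, Sequence

BOT, NORMAL = -1, 1  # label convention from the data processing step


def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = BOT) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy, treating `positive` as the detected class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = [-1, -1, -1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, 1, -1, 1, 1, 1, 1]
scores = evaluate(y_true, y_pred)
print({k: round(v, 3) for k, v in scores.items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667, 'accuracy': 0.75}
```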
SYSTEM TECHNIQUES:
Data Integration is the combination of technical and business processes used to combine
data from disparate sources into meaningful and valuable information. The process of Data
Integration is about taking data from many disparate sources (such as files, various databases,
mainframes, etc.) and combining that data to provide a unified view of the data for business
intelligence. Data integration is needed when a business decides to implement a new application
and migrate its data from legacy systems into the new application. It becomes even more
critical in company mergers, where the two companies need to consolidate their applications.
One of the most commonly known uses of data integration is building a data warehouse for an
enterprise, which enables a business to have a unified view of its data for analysis and business
intelligence (BI) needs.
CHAPTER 3 - REQUIREMENTS ENGINEERING
GENERAL
To be used efficiently, all computer software needs certain hardware components or
other software resources to be present on a computer. These prerequisites are known as
(computer) system requirements and are often used as a guideline rather than an absolute rule.
Most software defines two sets of system requirements: minimum and recommended. With
increasing demand for higher processing power and resources in newer versions of software,
system requirements tend to increase over time. Industry analysts suggest that this trend plays a
bigger part in driving upgrades to existing computer systems than technological advancements.
They are:
1. Hardware Requirements.
2. Software Requirements.
HARDWARE REQUIREMENTS
The most common set of requirements defined by any operating system or software
application is the physical computer resources, also known as hardware. A hardware
requirements list is often accompanied by a hardware compatibility list (HCL), especially in case
of operating systems. An HCL lists tested, compatible and sometimes incompatible hardware
devices for a particular operating system or application. The following subsections discuss the
various aspects of hardware requirements.
SOFTWARE REQUIREMENTS
GENERAL
Design Engineering deals with the various UML (Unified Modeling Language) diagrams
for the implementation of projects. Design is a meaningful engineering representation of a thing
that is to be built. Software design is a process through which the requirements are translated into
representation of the software. Design is the place where quality is rendered in software
engineering. Design is the means to accurately translate customer requirements into finished
products.
A Use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor.
A use case is a methodology used in system analysis to identify, clarify, and organize
system requirements. In this context, the term "system" refers to something being developed or
operated, such as a mail-order product sales and service Website. Use case diagrams are
employed in UML (Unified Modeling Language), a standard notation for the modeling of
real-world objects and systems.
Class diagram is an illustration of the relationships and source code dependencies among
classes in the Unified Modeling Language (UML). In this context, a class defines the methods
and variables in an object, which is a specific entity in a program or the unit of code representing
that entity. Class diagrams are useful in all forms of object-oriented programming (OOP). The
concept is several years old but has been refined as OOP modeling paradigms have
evolved.
EXPLANATION:
A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects and classes involved in the scenario and the sequence of messages exchanged between
the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically
associated with use case realizations in the Logical View of the system under development.
Sequence diagrams are sometimes called event diagrams or event scenarios. A sequence diagram
shows, as parallel vertical lines, different processes or objects that live simultaneously, and, as
horizontal arrows, the messages exchanged between them, in the order in which they occur. This
allows the specification of simple runtime scenarios in a graphical manner. Messages, written
with horizontal arrows with the message name written above them, display interaction. Solid
arrowheads represent synchronous calls, open arrowheads represent asynchronous messages, and
dashed lines represent reply messages. If a caller sends a synchronous message, it must wait until
the message is done, such as invoking a subroutine. If a caller sends an asynchronous message, it
can continue processing and doesn’t have to wait for a response. Asynchronous calls are present
in multithreaded applications, event-driven applications and in message-oriented middleware.
Activation boxes, or method-call boxes, are opaque rectangles drawn on top of lifelines to
represent that processes are being performed in response to the message.
EXPLANATION:
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of input data to the system, various processing carried out on
this data, and the output data generated by this system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system process, the data used by the
process, an external entity that interacts with the system and the information flows in the system.
3. DFD shows how the information moves through the system and how it is modified by
a series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. DFD is also known as bubble chart. A DFD may be used to represent a system at any
level of abstraction. DFD may be partitioned into levels that represent increasing information
flow and functional detail.
EXPLANATION:
System architecture is the conceptual model that defines the structure, behavior, and
more views of a system. An architecture description is a formal description and representation of
a system, organized in a way that supports reasoning about the structures and behaviors of the
system. A system architecture can consist of system components and the sub-systems developed,
that will work together to implement the overall system. There have been efforts to formalize
languages to describe system architecture; collectively these are called architecture description
languages (ADL).
CHAPTER 5 - SOFTWARE SPECIFICATION
DEVELOPMENT TOOLS
GENERAL
This chapter is about the software language and the tools used in the development of the
project. The platform used here is Python.
FEATURES OF PYTHON
● It provides very high-level dynamic data types and supports dynamic type checking.
● It supports automatic garbage collection.
● It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
OBJECTIVES OF PYTHON
Python is available on a wide variety of platforms, including Linux and Mac OS X. Python is a
high-level, interpreted, interactive and object-oriented scripting language. Python is designed to
be highly readable. It uses English keywords frequently where other languages use punctuation,
and it has fewer syntactic constructions than other languages.
HISTORY OF PYTHON:
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands. Python is
derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk,
and Unix shell and other scripting languages.
Python is copyrighted. Python source code is available under the Python Software Foundation
License, which is compatible with the GNU General Public License (GPL). Python is maintained by a
core development team; Guido van Rossum led the project as its "Benevolent Dictator For Life"
until he stepped down from that role in 2018.
OBJECT ORIENTED LANGUAGE
To be an Object Oriented language, a language must support at least the following four
characteristics.
1. Inheritance: the process of creating new classes that use the behavior of existing classes by
extending them, in order to reuse the existing code and add features as needed.
2. Encapsulation: the mechanism of combining data with the operations on it and providing
abstraction.
3. Polymorphism: as the name suggests, one name with multiple forms. Polymorphism provides
different functionality through functions that have the same name, based on the signatures of the
methods.
4. Dynamic binding: sometimes the specific types of objects are not known when the code is
written. Dynamic binding selects the functionality to run based on the object's actual type at
runtime.
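The four characteristics can be illustrated with a short Python example, using hypothetical account classes loosely themed on this project. Python does not overload methods by signature. Polymorphism is instead obtained by overriding a method with the same name in subclasses, and privacy is a naming convention rather than an enforced rule.

```python
from typing import List


class Account:
    """Base class. Encapsulation: the action log is kept behind methods."""

    def __init__(self, name: str) -> None:
        self._name = name               # leading underscore: internal by convention only
        self._actions: List[str] = []

    def record(self, action: str) -> None:
        self._actions.append(action)

    def action_count(self) -> int:
        return len(self._actions)

    def describe(self) -> str:
        return f"{self._name}: generic account"


class HumanAccount(Account):            # Inheritance: reuses record() and action_count()
    def describe(self) -> str:          # Polymorphism: same method name, different behaviour
        return f"{self._name}: human user"


class BotAccount(Account):
    def describe(self) -> str:
        return f"{self._name}: automated bot"


def summarise(accounts: List[Account]) -> List[str]:
    # Dynamic binding: which describe() runs is decided at runtime from each
    # object's actual class, not from the declared type Account.
    return [acc.describe() for acc in accounts]


bot = BotAccount("bot42")
bot.record("play")
bot.record("like")
print(summarise([HumanAccount("alice"), bot]))  # ['alice: human user', 'bot42: automated bot']
print(bot.action_count())                       # 2
```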
GETTING PYTHON:
The most up-to-date source code, binaries, documentation, news, etc., are available on the
official Python website, https://www.python.org.
Windows Installation: Here are the steps to install Python on Windows machines.
The Python language has many similarities to Perl, C, and Java. However, there are
some definite differences between the languages.
FIRST PYTHON PROGRAM:
Invoking the interpreter without passing a script file as a parameter brings up the
following prompt −
$ python
>>>
Type the following text at the Python prompt and press Enter −
>>> print("Hello, Python!")
In Python 3, print is a function, so the parentheses are required. In older versions such as
Python 2.4.3, print is a statement and can also be written without them, as in
print "Hello, Python!". In either case, this produces the following result −
Hello, Python!
Invoking the interpreter with a script parameter begins execution of the script and
continues until the script is finished. When the script is finished, the interpreter is no longer
active. Let us write a simple Python program in a script. Python files have the extension .py.
Type the following source code in a test.py file −
print("Hello, Python!")
We assume that you have the Python interpreter set in the PATH variable. Now, try to run
this program as follows −
$ python test.py
Hello, Python!
Flask Framework:
Flask is a web application framework written in Python. It is developed by Armin Ronacher,
who leads an international group of Python enthusiasts named Pocco. Flask is based on the
Werkzeug WSGI toolkit and the Jinja2 template engine. Both are Pocco projects.
SOFTWARE DESCRIPTION:
Python uses dynamic typing, and a combination of reference counting and a cycle-
detecting garbage collector for memory management. It also features dynamic name resolution
(late binding), which binds method and variable names during program execution.
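The following sketch, written for CPython, illustrates the three mechanisms described in the paragraph above. It shows dynamic typing and reference counting plus the cycle-detecting collector, observed through `weakref`. It also shows late binding of names. The class `Node` and the functions `greet` and `helper` are illustrative only, and the immediate freeing of the acyclic object is a CPython implementation detail.

```python
import gc
import weakref


class Node:
    """A tiny object type used only to observe when instances are freed."""


# Dynamic typing: a name can be rebound to objects of different types.
value = 42
value = "forty-two"

# Reference counting: an object that is not part of a cycle is freed as soon
# as its last reference disappears (this immediacy is a CPython detail).
solo = Node()
solo_ref = weakref.ref(solo)
del solo
solo_freed = solo_ref() is None

# Reference cycle: a and b refer to each other, so their counts never drop to
# zero by themselves; the cycle-detecting collector reclaims them.
a, b = Node(), Node()
a.partner, b.partner = b, a
cycle_ref = weakref.ref(a)
del a, b
gc.collect()
cycle_freed = cycle_ref() is None


# Late binding: the name `helper` is looked up when greet() runs, so it may be
# defined after greet() itself.
def greet() -> str:
    return helper()


def helper() -> str:
    return "resolved at call time"


print(type(value).__name__, solo_freed, cycle_freed, greet())
# str True True resolved at call time
```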
Python's design offers some support for functional programming in the Lisp tradition. It
has filter, map and reduce functions (reduce is in the functools module in Python 3), as well as
list comprehensions, dictionaries, sets and generator expressions. The standard library has two
modules (itertools and functools) that implement functional tools borrowed from Haskell and
Standard ML.
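The functional tools mentioned in the paragraph above can be shown on a toy clickstream. The data is hypothetical, and every call used is part of the standard library.

```python
from functools import reduce
from itertools import accumulate, groupby

clicks = ["play", "play", "like", "play", "share", "share"]

# map and filter return lazy iterators; list() materialises them.
lengths = list(map(len, clicks))                        # [4, 4, 4, 4, 5, 5]
non_play = list(filter(lambda c: c != "play", clicks))  # ['like', 'share', 'share']

# reduce folds a sequence into a single value (it lives in functools in Python 3).
total_chars = reduce(lambda acc, n: acc + n, lengths, 0)  # 26

# Comprehensions and generator expressions express the same ideas declaratively.
distinct = {c for c in clicks}                            # set comprehension
counts = {c: clicks.count(c) for c in sorted(distinct)}   # dict comprehension
total_again = sum(len(c) for c in clicks)                 # generator expression

# itertools: run-length encode consecutive repeats, and keep running totals.
runs = [(event, len(list(group))) for event, group in groupby(clicks)]
running = list(accumulate(lengths))

print(counts)   # {'like': 1, 'play': 3, 'share': 2}
print(runs)     # [('play', 2), ('like', 1), ('play', 1), ('share', 2)]
print(running)  # [4, 8, 12, 16, 21, 26]
```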
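The language's core philosophy is summarized in the document The Zen of Python (PEP
20), which includes aphorisms such as:
● Beautiful is better than ugly.
● Explicit is better than implicit.
● Simple is better than complex.
● Complex is better than complicated.
● Readability counts.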
Rather than having all of its functionality built into its core, Python was designed to be
highly extensible. This compact modularity has made it particularly popular as a means of
adding programmable interfaces to existing applications. Van Rossum's vision of a small core
language with a large standard library and easily extensible interpreter stemmed from his
frustrations with ABC, which espoused the opposite approach.
Python strives for a simpler, less-cluttered syntax and grammar while giving developers a
choice in their coding methodology. In contrast to Perl's "there is more than one way to do it"
motto, Python embraces a "there should be one—and preferably only one—obvious way to do it"
design philosophy. Alex Martelli, a Fellow at the Python Software Foundation and Python book
author, writes that "To describe something as 'clever' is not considered a compliment in the
Python culture."
Python's developers strive to avoid premature optimization, and reject patches to non-
critical parts of the CPython reference implementation that would offer marginal increases in
speed at the cost of clarity. When speed is important, a Python programmer can move time-
critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time
compiler. Cython is also available, which translates a Python script into C and makes direct C-
level API calls into the Python interpreter.
An important goal of Python's developers is keeping it fun to use. This is reflected in the
language's name as a tribute to the British comedy group Monty Python and in occasionally
playful approaches to tutorials and reference materials, such as examples that refer to spam and
eggs (from a famous Monty Python sketch) instead of the standard foo and bar.
A common neologism in the Python community is pythonic, which can have a wide
range of meanings related to program style. To say that code is pythonic is to say that it uses
Python idioms well, that it is natural or shows fluency in the language, that it conforms with
Python's minimalist philosophy and emphasis on readability. In contrast, code that is difficult to
understand or reads like a rough transcription from another programming language is called
unpythonic.
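Here is a small illustration using made-up event names. Both functions number a list of strings. The first transcribes a C-style index loop. The second uses the idioms that Python programmers would call pythonic: `enumerate`, a list comprehension and an f-string. The function names are hypothetical.

```python
# Hypothetical event names, used only for illustration
events = ["play", "like", "share"]


def numbered_unpythonic(items):
    # Index-based loop, as it might be transcribed from C or Java
    result = []
    i = 0
    while i < len(items):
        result.append(str(i + 1) + ". " + items[i])
        i = i + 1
    return result


def numbered_pythonic(items):
    # Iterate directly with enumerate and build the list in one expression
    return [f"{n}. {item}" for n, item in enumerate(items, start=1)]


print(numbered_pythonic(events))  # ['1. play', '2. like', '3. share']
```

Both functions return the same list. The pythonic version states the intent directly and has no loop counter to get wrong.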
Users and admirers of Python, especially those considered knowledgeable or
experienced, are often referred to as Pythonistas.
LIBRARIES:
Python's large standard library, commonly cited as one of its greatest strengths, provides
tools suited to many tasks. For Internet-facing applications, many standard formats and protocols
such as MIME and HTTP are supported. It includes modules for creating graphical user
interfaces, connecting to relational databases, generating pseudorandom numbers, arithmetic
with arbitrary precision decimals, manipulating regular expressions, and unit testing.
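The following minimal sketch touches several standard-library modules behind the tasks listed above: `decimal`, `re`, `sqlite3` and `random`. No third-party packages are needed. The table name, strings and seed are made up for illustration.

```python
import decimal
import random
import re
import sqlite3

# Arbitrary-precision decimal arithmetic (0.1 + 0.2 as floats gives 0.30000000000000004)
total = decimal.Decimal("0.1") + decimal.Decimal("0.2")

# Regular expressions: extract all integers from a string
numbers = [int(n) for n in re.findall(r"\d+", "sessions 12, clicks 340")]

# Relational database access through the built-in SQLite driver (in-memory database)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (event TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?)", [("play",), ("like",), ("play",)])
plays = conn.execute("SELECT COUNT(*) FROM clicks WHERE event = 'play'").fetchone()[0]
conn.close()

# Pseudorandom numbers from a seeded generator are reproducible
rng = random.Random(42)
sample = [rng.randrange(1, 6) for _ in range(3)]

print(total, numbers, plays, sample)
```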
Some parts of the standard library are covered by specifications, but most modules are
not. They are specified by their code, internal documentation, and test suites (if supplied).
However, because most of the standard library is cross-platform Python code, only a few
modules need altering or rewriting for variant implementations.
As of March 2018, the Python Package Index (PyPI), the official repository for third-party Python software, contained over 130,000 packages with a wide range of functionality.
CPython's public releases come in three types, distinguished by which part of the version number is incremented:
i) Backward-incompatible versions, where code is expected to break and must be ported manually. The first part of the version number is incremented. These releases happen infrequently; for example, version 3.0 was released 8 years after 2.0.
ii) Major or "feature" releases, about every 18 months, which are largely compatible but introduce new features. The second part of the version number is incremented. Each major version is supported by bug fixes for several years after its release.
iii) Bugfix releases, which introduce no new features, occur about every 3 months and are made when a sufficient number of bugs have been fixed upstream since the last release. Security vulnerabilities are also patched in these releases. The third and final part of the version number is incremented.
Many alpha, beta, and release-candidate versions are also released as previews and for testing before final releases. Although there is a rough schedule for each release, releases are often delayed if the code is not ready. Python's development team monitors the state of the code by running the large unit test suite during development and by using the BuildBot continuous integration system.
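The numbering scheme described above can be sketched as a small helper. The hypothetical function `release_type` classifies a release by the first part of an X.Y.Z version number that changed. It assumes plain three-part numeric versions and does not handle pre-release suffixes such as alpha or release-candidate tags.

```python
import sys


def release_type(old: str, new: str) -> str:
    """Classify the step from version `old` to `new` (both "X.Y.Z", numeric only)."""
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if new_parts[0] != old_parts[0]:
        return "backward-incompatible"  # first part changed
    if new_parts[1] != old_parts[1]:
        return "feature"                # second part changed
    if new_parts[2] != old_parts[2]:
        return "bugfix"                 # third part changed
    return "same version"


# The running interpreter reports the same three parts as (major, minor, micro)
major, minor, micro = sys.version_info[:3]
print(release_type("2.7.18", "3.0.0"), release_type("3.9.1", "3.9.2"))
```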
The community of Python developers has also contributed over 86,000 software modules
(as of 20 August 2016) to the Python Package Index (PyPI), the official repository of third-party
Python libraries.
The major academic conference on Python is PyCon. There are also special Python mentoring programmes, such as PyLadies.
CHAPTER 6 - IMPLEMENTATION
GENERAL
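In this chapter, the coding part is implemented in Python using Eclipse. The listings below show the code used for each stage of the scheme: generating the clickstream dataset, cleaning it, selecting features, clustering the sessions, and evaluating the results.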
# clickstream.py: generates synthetic clickstream sessions for one user and
# writes them to dataset.txt as a JSON array of click events
import datetime
import json
import random
import time

# Events that may follow a "play" click. The listing does not show how eventlist
# was defined; this list is assumed from the five features used later.
eventlist = ["like", "feedback", "comment", "share", "more"]


def make_event(user_id, now, session, event):
    # One click event (helper added for readability). "IP" holds the session
    # number and "location" is fixed to "IOS", as in the record template.
    return {
        "id": user_id,
        "time": str(now),
        "timestamp": int(time.time() * 1000.0),
        "event": event,
        "IP": session,
        "location": "IOS",
    }


def datasetcollect(numMsgs, id1):
    # Legitimate user: each session is a "play" click followed by 1-5 events
    records = []
    for bb in range(1, numMsgs + 1):
        now = datetime.datetime.now()
        records.append(make_event(id1, now, bb, "play"))
        p1 = random.randrange(1, 6)
        for counter1 in range(p1):
            records.append(make_event(id1, now, bb, eventlist[counter1]))
    # json.dump writes the comma-separated array, so no manual delimiters are needed
    with open("dataset.txt", "w") as f:
        json.dump(records, f)


def botcollect(numMsgs, id1):
    # Social bot: each session is 1-4 events with no preceding "play" click
    records = []
    for bb in range(1, numMsgs + 1):
        now = datetime.datetime.now()
        p1 = random.randrange(1, 5)
        for counter1 in range(p1):
            records.append(make_event(id1, now, bb, eventlist[counter1]))
    with open("dataset.txt", "w") as f:
        json.dump(records, f)
import clickstream  # the module listed above, saved as clickstream.py
import pandas as pd


def load_clean_sessions():
    # Data cleaning (helper added here): read the generated clicks and drop
    # every session (IP value) that contains only a single click
    df1 = pd.read_json("dataset.txt")
    return df1.groupby("IP").filter(lambda g: len(g) > 1)


# Users 1-19 are legitimate users (class "0"), 50 sessions each
frames = []
for gg in range(1, 20):
    clickstream.datasetcollect(50, gg)
    frames.append(load_clean_sessions())
lenn = sum(len(frame) for frame in frames)

# Users 20-25 are social bots (class "1"), 50 sessions each
for gg in range(20, 26):
    clickstream.botcollect(50, gg)
    frames.append(load_clean_sessions())

df = pd.concat(frames, ignore_index=True)
gh = ["0"] * lenn + ["1"] * (len(df) - lenn)
df["class"] = gh
df
# Feature selection: for every session compute the transition probabilities
# P(play;like), P(play;feedback), P(play;comment), P(play;share) and P(play;more)
like = []
feedback = []
comment = []
share = []
more = []
userid = []
seid = []
label = []
for j in range(1, 26):
    userclickstream = df[df["id"] == j]
    for nn in range(1, 51):
        m = userclickstream[userclickstream["IP"] == nn]
        # Class of the session; a session removed during cleaning (only bot
        # sessions can be removed) is labeled "1"
        kks = m["class"].values
        label.append(kks[-1] if len(kks) else "1")
        kk = m["event"].values
        cc = len(kk)
        # A session that starts with "play" gives each following event the
        # probability 1/(cc-1); otherwise every observed event gets weight 1
        weight = 1 / (cc - 1) if cc > 1 and kk[0] == "play" else 1
        probs = {"like": 0, "feedback": 0, "comment": 0, "share": 0, "more": 0}
        for jk in kk:
            if jk in probs:
                probs[jk] = weight
        like.append(probs["like"])
        feedback.append(probs["feedback"])
        comment.append(probs["comment"])
        share.append(probs["share"])
        more.append(probs["more"])
        userid.append(j)
        seid.append(nn)
import numpy as np

newDF = pd.DataFrame({
    "userid": userid,
    "seid": seid,
    "like": like,
    "share": share,
    "more": more,
    "comment": comment,
    "feedback": feedback,
    "class": label,
})
# Only the five transition-probability features are used for clustering, so the
# user id, session id and class label do not enter the distance computation
feature_cols = ["like", "share", "more", "comment", "feedback"]
bb = newDF
# Class centroids: mean feature vector of legitimate ("0") and bot ("1") sessions
m1 = bb[bb["class"] == "0"]
cls = m1[feature_cols].mean(axis=0).to_numpy()
m2 = bb[bb["class"] == "1"]
cls2 = m2[feature_cols].mean(axis=0).to_numpy()
test = newDF[feature_cols].to_numpy()
cc = [cls.tolist(), cls2.tolist()]
test
import math


def euc_dist(a, b):
    # Euclidean distance between two feature vectors
    total = 0
    for i, j in zip(a, b):
        total += (i - j) * (i - j)
    return math.sqrt(total)


def cal_dist(centroids, data):
    # Distance table: one row per centroid, one column per data point
    c_dist = []
    for i in centroids:
        temp = []
        for j in data:
            temp.append(euc_dist(i, j))
        c_dist.append(temp)
    return c_dist


def assign_clusters(dist_table, k):
    # Nearest-centroid assignment (the name assign_clusters is assumed; the
    # definition line is missing): each point joins its closest centroid
    clusters = [[] for _ in range(k)]
    for i in range(len(dist_table[0])):
        d = [dist_table[j][i] for j in range(len(dist_table))]
        clusters[d.index(min(d))].append(i)
    return clusters


# Distance calculation
centroids = cc
result = {}
distance_table = cal_dist(centroids, test)
cluster_table = assign_clusters(distance_table, len(centroids))
for i, members in enumerate(cluster_table):
    for idx in members:
        result[idx] = i
# Cluster index of every session, in session order
re = [v for _, v in sorted(result.items())]
re
from sklearn.metrics import f1_score

# A user is class 0 when more than 25 of its 50 sessions are class 0 (majority vote)
threshold = 25
clabel = []
for hj in range(0, len(label), 50):
    count = sum(1 for h in range(hj, hj + 50) if label[h] == "0")
    clabel.append(0 if count > threshold else 1)
clabel
predict = []
for hj in range(0, len(re), 50):
    count = sum(1 for h in range(hj, hj + 50) if re[h] == 0)
    predict.append(0 if count > threshold else 1)
predict
predictions = predict
test_y = clabel
# f1_score expects the true labels first, then the predictions
score = f1_score(test_y, predictions, average="binary")
for ccoun, jj in enumerate(predictions, start=1):
    # Output messages are illustrative
    if jj == 0:
        print("User %d: legitimate user" % ccoun)
    else:
        print("User %d: malicious social bot" % ccoun)
import matplotlib.pyplot as plt
import numpy as np

# Placeholder bar labels (the names of the two compared methods are not given)
objects = ("Method 1", "Method 2")
y_pos = np.arange(len(objects))
# Hard-coded percentages plotted for each metric
for ylabel, performance in (("Accuracy %", [96, 92]),
                            ("Precision %", [92, 95]),
                            ("Recall %", [97, 83])):
    plt.bar(y_pos, performance, align="center")
    plt.xticks(y_pos, objects)
    plt.ylabel(ylabel)
    plt.show()
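The listing computes the F1 score with scikit-learn's `f1_score`, and the charts plot fixed accuracy, precision and recall percentages. As a minimal sketch of how these four metrics are defined, they can be computed directly from the confusion counts. This assumes plain lists of 0/1 labels per user, with 1 (bot) as the positive class. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels,
    treating `positive` (1 = bot) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy labels: one legitimate user flagged as a bot, one bot missed
print(evaluate([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0]))
```

F1 is the harmonic mean of precision and recall. For binary labels with 1 as the positive class, it should agree with `f1_score(..., average="binary")`.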
CHAPTER 7 - SNAPSHOTS
CHAPTER 8 - TESTING
GENERAL
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.
DEVELOPING METHODOLOGIES
The test process is initiated by developing a comprehensive plan to test the general functionality and special features on a variety of platform combinations. Strict quality control procedures are used.
The process verifies that the application meets the requirements specified in the system requirements document and is free of known bugs. The following considerations were used to develop the testing framework.
TYPES OF TESTING
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application. It is done after the completion of an individual unit, before integration.
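Here is a minimal sketch of a unit test that exercises every decision branch, using Python's `unittest` module. It tests a hypothetical helper, `transition_features`, that restates the per-session feature rule from the implementation chapter. A session starting with "play" gives each following event probability 1/(n-1); otherwise each event gets weight 1. The helper and test names are assumptions, not part of the project code.

```python
import unittest


def transition_features(events):
    # Hypothetical helper mirroring the feature-selection rule for one session
    probs = dict.fromkeys(["like", "feedback", "comment", "share", "more"], 0.0)
    n = len(events)
    weight = 1 / (n - 1) if n > 1 and events[0] == "play" else 1.0
    for event in events:
        if event in probs:
            probs[event] = weight
    return probs


class TransitionFeaturesTest(unittest.TestCase):
    def test_play_session_splits_probability(self):
        # Branch 1: session starts with "play"
        probs = transition_features(["play", "like", "share"])
        self.assertAlmostEqual(probs["like"], 0.5)
        self.assertAlmostEqual(probs["share"], 0.5)
        self.assertEqual(probs["comment"], 0.0)

    def test_session_without_play_uses_weight_one(self):
        # Branch 2: session does not start with "play"
        probs = transition_features(["like", "feedback"])
        self.assertEqual(probs["like"], 1.0)
        self.assertEqual(probs["feedback"], 1.0)

    def test_empty_session_has_no_features(self):
        self.assertEqual(sum(transition_features([]).values()), 0.0)
```

If this is saved as a file, for example the hypothetical `test_features.py`, the tests can be run with `python -m unittest test_features.py`.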
System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing is
the configuration oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.
White Box Testing
White box testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.
Black Box Testing
Black box testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document.
In black box testing, the software under test is treated as a black box: you cannot “see” into it. The test provides inputs and checks the outputs without considering how the software works.
Test strategy and approach
Field testing will be performed manually and functional tests will be written in detail.
Test objectives
Features to be tested
Integration Testing
Test Results:
All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.
Test Results:
All the test cases mentioned above passed successfully. No defects encountered.
Follow the seven steps below to create a test plan as per IEEE 829
Any project can be divided into units that can be processed in further detail. A testing strategy is then carried out for each of these units.
CHAPTER 9 - CONCLUSION
CONCLUSION:
In this project, we proposed a novel method to accurately detect malicious social bots in online social networks. Experiments showed that the transition probabilities between events in user clickstreams, based on social situation analytics, can be used to accurately detect malicious social bots in online social platforms. In future research, additional behaviors of malicious social bots will be considered, and the proposed detection approach will be extended and optimized to identify the specific intentions and purposes of a broader range of malicious social bots.
FUTURE ENHANCEMENT:
Possible future directions are:
i) To develop an app that can detect malicious bots, with options to choose from the various social network applications available.
ii) To add a feature for blocking or reporting spam bots when they are detected.
REFERENCES
[1] C. Cai, L. Li, and D. Zengi, ``Behavior enhanced deep bot detection in social media,''
in Proc. IEEE Int. Conf. Intell. Secur. Inform. (ISI), Beijing, China, Jul.2017, pp. 128_130.
[3] F. Morstatter, L. Wu, T. H. Nazer, K. M. Carley, and H. Liu, ``A new approach to bot
detection: Striking the balance between precision and recall,'' in Proc. IEEE/ACM Int. Conf. Adv.
Social Netw. Anal. Mining, San Francisco, CA, USA, Aug. 2016, pp. 533_540.
[4] Z. Zhang, C. Li, B. B. Gupta, and D. Niu, ``Ef_cient compressed ciphertext length
scheme using multi-authority CP-ABE for hierarchical attributes,'' IEEE Access, vol. 6, pp.
38273_38284, 2018. doi:10.1109/ACCESS.2018.2854600.
[5] Y. Zhou et al., ``ProGuard: Detecting malicious accounts in social network- based
online promotions,'' IEEE Access, vol. 5, pp. 1990_1999, 2017.