Project Final Document
INTRODUCTION
Recent advances in technology have led to the introduction of cyber-physical systems, whose improved computational and communication abilities and tight integration of physical and cyber components have enabled significant advances in many dynamic applications. This improvement, however, comes at the cost of vulnerability to cyber-attacks. Cyber-physical systems are made up of logical elements and embedded computers that communicate over channels such as the Internet of Things (IoT). More specifically, these systems include digital or cyber components, analog components, physical devices, and humans that are designed to operate across the physical and cyber parts. In other words, a cyber-physical system is any system that includes cyber components, physical components, and humans, and that can exchange information between the physical and cyber parts. The security of such systems becomes more important because of the addition of the physical part.
Physical components, including sensors that receive data from the physical environment, may be attacked and have incorrect data injected into the system. One of the most important challenges in the physical part of a cyber-physical system is the presence of a large number of sensors in the environment, which collect data of great volume and variety at high speed. The connection between the sensors, the necessary computations, and the analysis of the collected data are also among the main challenges. Therefore, one of the most important capabilities of a cyber-physical system is to communicate between these sensors, perform computation, and control the system.
LITERATURE SURVEY
[1] Kwon, Cheolhyeon, Weiyi Liu, and Inseok Hwang. ”Security analysis for cyber-physical
systems against stealthy deception attacks.” In 2013 American control conference, IEEE
(2013): 3344-3349
The security issue in the state estimation problem is investigated for a networked control system
(NCS). The communication channels between the sensors and the remote estimator in the NCS are
vulnerable to attacks from malicious adversaries. The false data injection attacks are considered.
The aim of this work is to find the so-called insecurity conditions under which the estimation
system is insecure in the sense that there exist malicious attacks that can bypass the anomaly
detector but still lead to unbounded estimation errors. In particular, a new necessary and sufficient
condition for the insecurity is derived in the case that all communication channels are
compromised by the adversary. Moreover, a specific algorithm is proposed for generating attacks
with which the estimation system is insecure. Furthermore, for the insecure system, a system
protection scheme through which only a few (rather than all) communication channels require
protection against false data injection attacks is proposed. A simulation example is utilized to
demonstrate the effectiveness of the proposed conditions/algorithms in the secure estimation
problem for a flight vehicle.
[2] Pajic, Miroslav, James Weimer, Nicola Bezzo, Oleg Sokolsky, George J. Pappas, and
Insup Lee. ”Design and implementation of attack-resilient cyberphysical systems: With a
focus on attack-resilient state estimators.” IEEE Control Systems Magazine 37, no. 2 (2017):
66-81.
Recent years have witnessed a significant increase in the number of security-related incidents in
control systems. These include high-profile attacks in a wide range of application domains, from
attacks on critical infrastructure, as in the case of the Maroochy Water breach [1], and industrial
systems (such as the StuxNet virus attack on an industrial supervisory control and data acquisition
system [2], [3] and the German Steel Mill cyberattack [4], [5]), to attacks on modern vehicles [6]-
[8]. Even high-assurance military systems were shown to be vulnerable to attacks, as illustrated in
the highly publicized downing of the RQ-170 Sentinel U.S. drone [9]-[11]. These incidents have
greatly raised awareness of the need for security in cyberphysical systems (CPSs), which feature
tight coupling of computation and communication substrates with sensing and actuation
components. However, the complexity and heterogeneity of this next generation of safety-critical,
networked, and embedded control systems have challenged the existing design methods in which
security is usually considered an afterthought.
Embedded computational resources in autonomous robotic vehicles are becoming more abundant
and have enabled improved operational effectiveness of cooperative robotic systems in civilian
and military applications. Compared to autonomous robotic vehicles that perform single tasks, cooperative teamwork offers greater efficiency and operational capability. Multirobotic vehicle systems have many potential applications, such as platooning of vehicles in urban transportation, the operation of multiple robots, autonomous underwater vehicles, and formations of aircraft in military affairs [1–3]. The design of group behaviors for multirobot systems is the main objective of the work. Group cooperative behavior signifies that individuals in the group share a common objective and act according to the interest of the whole group. Group cooperation can be efficient if individuals in the group coordinate their actions well. Each individual can coordinate with other individuals in the group to facilitate group cooperative behavior in two ways, namely local coordination and global coordination. For local coordination, individuals react only to other individuals that are close, such as fish swimming in a school.
In this work, we consider the problem of reaching a consensus among all the agents in the
networked control systems (NCS) in the presence of misbehaving agents. A reputation-based
resilient distributed control algorithm is first proposed for the leader-follower consensus network.
The proposed algorithm embeds a resilience mechanism that includes four phases (detection,
mitigation, identification, and update), into the control process in a distributed manner. At each
phase, every agent only uses local and one-hop neighbors' information to identify and isolate the
misbehaving agents, and even compensate for their effect on the system. We then extend the proposed
algorithm to the leaderless consensus network by introducing and adding two recovery schemes
(rollback and excitation recovery) into the current framework to guarantee the accurate
convergence of the well-behaving agents in NCS. The effectiveness of the proposed method is
demonstrated through case studies in multirobot formation control and wireless sensor networks.
This work focuses on resilient control of networked control systems (NCSs) under denial-of-service (DoS) attacks characterized by a Markov process. Firstly, packet dropouts are modeled as a Markov process according to the game between attack strategies and defense strategies. Then, an NCS under such game outcomes is modeled as a Markovian jump linear system, and four theorems are proved for system stability analysis and controller design. Finally, a numerical example is used to illustrate the application of these theorems. Networked control systems have received increasing attention in the past decades and are now widely applied in industrial processes, electric power networks, intelligent transportation, and so on. With the growth of NCSs, the network, as a critical element of an NCS, is vulnerable to cyber-threats that can menace the control systems.
Existing Method:
In the existing system, implementing machine learning algorithms is somewhat complex due to the lack of data visualization. Manual mathematical calculations are used in the existing system for model building, which can take a lot of time and add complexity. To overcome this, we use the machine learning packages available in the scikit-learn library.
Disadvantages:
High complexity.
Time consuming.
Proposed System:
Several machine learning models have been proposed to classify whether there will be a cyber-attack or not, but none have adequately addressed this misclassification problem. Also, similar studies that have proposed models for evaluating such classification performance mostly do not consider the heterogeneity and size of the data. Therefore, we propose Support Vector Machine, Decision Tree, Random Forest, Extra Trees, AdaBoost, and neural network classification techniques.
Advantages:
Highest accuracy
Reduces time complexity.
Block Diagram
Architecture:
1. DECISION TREE:
A decision tree is a flowchart-like tree structure where an internal node represents a feature (or attribute), a branch represents a decision rule, and each leaf node represents the outcome. The topmost node in a decision tree is known as the root node. The tree learns to partition the data on the basis of attribute values, and it partitions recursively in a process called recursive partitioning. This flowchart-like structure helps in decision making, and its visualization, much like a flowchart diagram, mimics human-level thinking. That is why decision trees are easy to understand and interpret.
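As a concrete illustration, the sketch below fits a single scikit-learn decision tree on a labelled network-traffic dataset; the CSV file name and the assumption that all columns are already numeric are illustrative placeholders, not necessarily the exact dataset used in this project.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the labelled dataset (file name is a placeholder; columns assumed numeric/encoded)
df = pd.read_csv('cyber_dataset.csv')
x = df.iloc[:, :-1]                 # feature columns
y = df.iloc[:, -1]                  # class label (attack / normal)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# criterion='entropy' makes the splits use information gain
dt = DecisionTreeClassifier(criterion='entropy')
dt.fit(x_train, y_train)
print('Decision tree accuracy:', accuracy_score(y_test, dt.predict(x_test)))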
A single decision tree usually overfits the data it is learning from because it learns from only one pathway of decisions. As a result, a single decision tree usually does not make accurate predictions on new data.
Random forest models reduce the risk of overfitting by introducing randomness by:
splitting nodes on the best split among a random subset of the features selected at every node
Extra Trees is like Random Forest, in that it builds multiple trees and splits nodes using random
subsets of features, but with two key differences: it does not bootstrap observations (meaning it
samples without replacement), and nodes are split on random splits, not best splits. So, in summary,
ExtraTrees:
builds multiple trees with bootstrap = False by default, which means it samples without
replacement
nodes are split based on random splits among a random subset of the features selected at
every node
In Extra Trees, randomness doesn’t come from bootstrapping of data, but rather comes from the
random splits of all observations.
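As a quick sketch of this difference (reusing the x_train/x_test split from the decision-tree example above), the two ensembles differ only in the bootstrap and split-selection settings:

from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Random Forest: bootstrap sampling + best split within a random feature subset
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=42)
# Extra Trees: no bootstrap (whole sample) + random split within a random feature subset
et = ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=42)

for name, clf in [('Random Forest', rf), ('Extra Trees', et)]:
    clf.fit(x_train, y_train)
    print(name, 'accuracy:', accuracy_score(y_test, clf.predict(x_test)))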
A random forest is a machine learning technique that’s used to solve regression and classification
problems. It utilizes ensemble learning, which is a technique that combines many classifiers to
provide solutions to complex problems.
A random forest algorithm consists of many decision trees. The ‘forest’ generated by the random
forest algorithm is trained through bagging or bootstrap aggregating. Bagging is an ensemble meta-
algorithm that improves the accuracy of machine learning algorithms.
The (random forest) algorithm establishes the outcome based on the predictions of the decision
trees. It predicts by taking the average or mean of the output from various trees. Increasing the
number of trees increases the precision of the outcome.
A random forest overcomes the limitations of a single decision tree. It reduces the overfitting of datasets and increases precision, and it generates predictions without requiring many configuration settings in packages (like scikit-learn).
Features of a Random Forest Algorithm:
It’s more accurate than the decision tree algorithm.
It provides an effective way of handling missing data.
It can produce a reasonable prediction without hyper-parameter tuning.
It solves the issue of over fitting in decision trees.
In every random forest tree, a subset of features is selected randomly at the node’s splitting
point.
Decision trees are the building blocks of a random forest algorithm. A decision tree is a decision
support technique that forms a tree-like structure. An overview of decision trees will help us
understand how random forest algorithms work.
A decision tree consists of three components: decision nodes, leaf nodes, and a root node. A
decision tree algorithm divides a training dataset into branches, which further segregate into other
branches. This sequence continues until a leaf node is attained. The leaf node cannot be segregated
further.
Information theory can provide more insight into how decision trees work. Entropy and
information gain are the building blocks of decision trees. An overview of these fundamental
concepts will improve our understanding of how decision trees are built.
Entropy is a metric for calculating uncertainty. Information gain is a measure of how uncertainty
in the target variable is reduced, given a set of independent variables.
The information gain concept involves using independent variables (features) to gain information
about a target variable (class). The entropy of the target variable (Y) and the conditional entropy of
Y (given X) are used to estimate the information gain. In this case, the conditional entropy is
subtracted from the entropy of Y.
Information gain is used in the training of decision trees; it helps in reducing uncertainty in these trees. A high information gain means that a high degree of uncertainty (information entropy) has been removed.
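To make these definitions concrete, the short sketch below computes H(Y), the conditional entropy H(Y|X), and the resulting information gain for a tiny made-up binary feature; the numbers are purely illustrative.

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

y = np.array([1, 1, 1, 0, 0, 1, 0, 1])   # toy target variable Y
x = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # toy candidate feature X

h_y = entropy(y)
h_y_given_x = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
print('H(Y) =', round(h_y, 3), ' IG =', round(h_y - h_y_given_x, 3))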
The objective of the support vector machine (SVM) algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points. To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e., the maximum distance between data points of both classes.
Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends upon the number of features: if the number of input features is 2, the hyperplane is just a line; if the number of input features is 3, the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
Support Vectors
Support vectors are data points that are closer to the hyperplane and influence the position and orientation of the hyperplane. Using these support vectors, we maximize the margin of the classifier. Deleting the support vectors will change the position of the hyperplane. These are the points that help us build our SVM.
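A minimal scikit-learn sketch, reusing the earlier train/test split, shows how the fitted support vectors can be inspected; the linear kernel and C value are illustrative choices.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svc = SVC(kernel='linear', C=1.0)    # C balances margin width against misclassification penalties
svc.fit(x_train, y_train)
print('SVM accuracy:', accuracy_score(y_test, svc.predict(x_test)))
print('Support vectors per class:', svc.n_support_)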
In logistic regression, we take the output of the linear function and squash the value within the range [0,1] using the sigmoid function. If the squashed value is greater than a threshold value (0.5), we assign it the label 1; otherwise, we assign it the label 0. In SVM, we take the output of the linear function, and if that output is greater than 1, we identify it with one class; if the output is -1, we identify it with the other class. Since the threshold values are changed to 1 and -1 in SVM, we obtain this reinforcement range of values ([-1, 1]) which acts as the margin.
Hinge Loss Function
The hinge loss cost is 0 if the predicted value and the actual value are of the same sign; if they are not, we then calculate the loss value. We also add a regularization parameter to the cost function. The objective of the regularization parameter is to balance margin maximization and loss. After adding the regularization parameter, the cost function looks as below.
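Since the referenced equations are not reproduced in this document, the standard hinge-loss and regularized SVM cost expressions are written out here for reference (assuming labels y_i in {-1, +1} and a linear model w·x + b):

c(x_i, y_i, f(x_i)) = \max\bigl(0,\; 1 - y_i\,(w \cdot x_i + b)\bigr)

J(w, b) = \lambda \lVert w \rVert^{2} + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w \cdot x_i + b)\bigr)

Here λ is the regularization parameter that balances margin maximization against the total hinge loss.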
Gradients
When there is no misclassification, i.e., our model correctly predicts the class of a data point, we only have to update the gradient from the regularization parameter. When there is a misclassification, i.e., our model makes a mistake on the prediction of the class of a data point, we include the loss along with the regularization parameter to perform the gradient update.
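Written out explicitly, the corresponding (sub)gradient updates for w and b, assuming a learning rate α, take the standard form:

w \leftarrow w - \alpha\,(2\lambda w) \qquad \text{if } y_i\,(w \cdot x_i + b) \ge 1 \text{ (no misclassification)}

w \leftarrow w + \alpha\,(y_i x_i - 2\lambda w), \qquad b \leftarrow b + \alpha\, y_i \qquad \text{otherwise (misclassification)}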
5. Neural Network:
An artificial neural network (ANN) is a piece of a computing system designed to simulate the
way the human brain analyzes and processes information. It is the foundation of artificial
intelligence (AI) and solves problems that would prove impossible or difficult by human or
statistical standards. ANNs have self-learning capabilities that enable them to produce better
results as more data becomes available.
An ANN has hundreds or thousands of artificial neurons called processing units, which are
interconnected by nodes. These processing units are made up of input and output units. The input
units receive various forms and structures of information based on an internal weighting system,
and the neural network attempts to learn about the information presented to produce one output
report. Just like humans need rules and guidelines to come up with a result or output, ANNs also
use a set of learning rules called backpropagation, an abbreviation for backward propagation of
error, to perfect their output results.
An ANN initially goes through a training phase where it learns to recognize patterns in
data, whether visually, aurally, or textually. During this supervised phase, the network
compares its actual output produced with what it was meant to produce—the desired
output. The difference between both outcomes is adjusted using backpropagation. This
means that the network works backward, going from the output unit to the input units to
adjust the weight of its connections between the units until the difference between the
actual and desired outcome produces the lowest possible error.
Whenever we increase the number of layers in our ANN, we obtain a deep neural network.
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the
input and output layers. There are different types of neural networks but they always consist of the same
components: neurons, synapses, weights, biases, and functions.
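A minimal Keras sketch of such a deep feed-forward network is shown below; the layer sizes, epochs, and activation choices are illustrative assumptions rather than the exact architecture used in this project.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 41 input features, matching the prediction form used later; layer sizes are illustrative
model = Sequential([
    Dense(64, activation='relu', input_shape=(41,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),      # anomaly (0) vs. normal (1)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)
model.save('neural_network.h5')          # file name later loaded by the Flask application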
Requirements analysis is a very critical process that enables the success of a system or software project to be assessed. Requirements are generally split into two types: functional and non-functional requirements.
Functional Requirements: These are the requirements that the end user specifically
demands as basic facilities that the system should offer. All these functionalities need to be
necessarily incorporated into the system as a part of the contract. These are represented or stated
in the form of input to be given to the system, the operation performed and the output expected.
They are basically the requirements stated by the user which one can see directly in the final
product, unlike the non-functional requirements.
Examples of functional requirements:
1) Authentication of user whenever he/she logs into the system
2) System shutdown in case of a cyber-attack
3) A verification email is sent to the user whenever he/she registers for the first time on the software system.
Non-functional requirements: These are basically the quality constraints that the
system must satisfy according to the project contract. The priority or extent to which these factors
are implemented varies from one project to other. They are also called non-behavioral
requirements.
They basically deal with issues like:
Portability
Security
Scalability
Performance
Flexibility
Examples of non-functional requirements:
1) Emails should be sent with a latency of no greater than 12 hours from such an activity.
2) The processing of each request should be done within 10 seconds.
3) The site should load within 3 seconds whenever the number of simultaneous users is > 10,000.
Hardware:
RAM : 8 GB
Software:
IDE : PyCharm.
Framework : Flask
SYSTEM DESIGN:
Input Design:
In an information system, input is the raw data that is processed to produce output. During the
input design, the developers must consider the input devices such as PC, MICR, OMR, etc.
Therefore, the quality of system input determines the quality of system output. Well-designed input forms and screens have the following properties:
They should serve a specific purpose effectively, such as storing, recording, and retrieving information.
All these objectives are achieved using knowledge of basic design principles regarding:
How to design source documents for data capture or devise other data capture methods
How to design input data records, data entry screens, user interface screens, etc.
Output Design:
The design of output is the most important task of any system. During output design, developers
identify the type of outputs needed, and consider the necessary output controls and prototype
report layouts.
The objectives of output design are:
To develop an output design that serves the intended purpose and eliminates the production of unwanted output.
To develop an output design that meets the end user's requirements.
To form the output in an appropriate format and direct it to the right person.
MODULES:
1. User:
1.1 View Home page:
Here the user views the home page of the cyber-attack web application.
1.2 View about page:
On the about page, users can learn more about the cyber-attack classification.
1.3 Input Model:
The user must provide input values for certain fields in order to get results.
1.4 View Results:
The user views the results generated by the model.
1.5 View score:
Here the user has the ability to view the score as a percentage.
2. System
2.1 Working on dataset:
The system checks whether the data is available or not and loads the data from CSV files.
2.2 Pre-processing:
The data needs to be pre-processed according to the models; this helps to increase the accuracy of the model and gives better information about the data.
2.3 Training the data:
After pre-processing, the data will be split into two parts, train and test data, before training with the given algorithms (see the sketch after this module list).
2.4 Model Building
This module helps the user create a model that predicts cyber-attacks with better accuracy.
2.5 Generated Score:
Here the user views the score as a percentage.
2.6 Generate Results:
We train the machine learning algorithms and predict whether there is a cyber-attack.
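For reference, here is a short sketch of the pre-processing and train/test split described in modules 2.2 and 2.3; the 80/20 split ratio is an assumption, and df is the dataframe loaded in module 2.1.

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Encode the categorical columns of the loaded dataframe
le = LabelEncoder()
for col in ['protocol_type', 'service', 'flag']:
    df[col] = le.fit_transform(df[col])

x = df.iloc[:, :-1]
y = df.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)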
GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modelling language so that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns
and components.
7. Integrate best practices.
CLASS DIAGRAM
SEQUENCE DIAGRAM
COLLABORATION DIAGRAM
DEPLOYMENT DIAGRAM
ACTIVITY DIAGRAM
ER DIAGRAM
Home Page:
Here the user views the home page of the cyber-attack detection web application.
ABOUT
Here we can read about our project.
Register
On this page, users need to register by entering their credentials.
Log in
On this page, users have to enter their credentials to access the cyber-attack prediction.
Load
On the load page, users can load the cyber dataset.
Model
Here we train our data with different ML algorithms.
Prediction
This page shows the detection result for the cyber-attack data.
FUTURE SCOPE
There are quite a few things that can be polished or added in future work. We have opted to use two data mining classifiers in this project, namely the ID3 and Naive Bayes classifiers. There are more classifiers, such as the Bayesian network classifier, the neural network classifier, and the C4.5 classifier. Such classifiers were not included in this project and could be added in the future to give more data to compare with.
CONCLUSION
In this project, an attempt was made to use the resilient control consensus method in complex discrete cyber-physical networks in the presence of a number of local attacks. By applying this control method, it was observed that even in the presence of cyber-attacks, the system can remain stable and isolate the attacked node, and the performance of the system is not weakened. Using the neural network employed in this project, it was observed that a deep neural network with 7 hidden layers gives the system better performance. Also, in a recurrent neural network integrated with a deep neural network, a deep-layer network with a linear activation function performs better; therefore, it can be said that the system has less complexity. With deep learning methods, systems can analyse patterns and learn from them to help prevent similar attacks and respond to changing behaviour. In short, machine learning can make cyber security simpler, more proactive, less expensive, and far more effective. After observing the state of the system reported by the neural network, the control system makes decisions based on it and, if there is an attack, detects and isolates it so as not to have a detrimental effect on the behaviour of the other agents. In future research, more attacks on agents can be considered, and data mining and other machine learning methods, such as support vector machine (SVM) algorithms or other types of neural networks such as recurrent neural networks, can be used to evaluate system performance improvements.
SOURCE CODE
# Importing necessary libraries
import pandas as pd
import numpy as np
import mysql.connector
from flask import Flask, render_template, request, session, flash
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import load_model  # for the saved neural network model

# Database connection (credentials as in the original source)
db = mysql.connector.connect(user="root", password="", port='3306', database='cyber_attack')
cur = db.cursor()

app = Flask(__name__)
app.secret_key = "CBJcb786874wrf78chdchsdcv"

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/about')
def about():
    return render_template('about.html')

@app.route('/drug')
def drug():
    return render_template('drug.html')
@app.route('/login', methods=['POST', 'GET'])
def login():
    if request.method == 'POST':
        useremail = request.form['useremail']
        session['useremail'] = useremail
        userpassword = request.form['userpassword']
        # Assumed query and table/column names; the original omits the SQL statement
        sql = "SELECT * FROM users WHERE email=%s AND password=%s"
        cur.execute(sql, (useremail, userpassword))
        data = cur.fetchall()
        db.commit()
        if data == []:
            msg = "Invalid credentials, please try again"
            return render_template("login.html", name=msg)
        else:
            return render_template("load.html", myname=data[0][1])
    return render_template('login.html')
@app.route('/registration', methods=["POST", "GET"])
def registration():
    if request.method == 'POST':
        username = request.form['username']
        useremail = request.form['useremail']
        userpassword = request.form['userpassword']
        conpassword = request.form['conpassword']
        address = request.form['address']
        contact = request.form['contact']
        if userpassword == conpassword:
            # Assumed queries and table/column names; the original omits the SQL statements
            sql = "SELECT * FROM users WHERE email=%s"
            cur.execute(sql, (useremail,))
            data = cur.fetchall()
            db.commit()
            print(data)
            if data == []:
                sql = "INSERT INTO users (name, email, password, address, contact) VALUES (%s, %s, %s, %s, %s)"
                val = (username, useremail, userpassword, address, contact)
                cur.execute(sql, val)
                db.commit()
                flash("Registered successfully", "success")
                return render_template("login.html")
            else:
                flash("User already exists", "warning")
                return render_template("registration.html")
        else:
            flash("Passwords do not match", "warning")
            return render_template("registration.html")
    return render_template('registration.html')
@app.route('/load', methods=["GET", "POST"])
def load():
    global df, dataset
    if request.method == "POST":
        data = request.files['data']
        df = pd.read_csv(data)
        dataset = df.head(100)
    return render_template('load.html')

@app.route('/view')
def view():
    print(dataset)
    print(dataset.head(2))
    print(dataset.columns)
    # Render the uploaded rows and column names (template name assumed)
    return render_template('view.html', columns=dataset.columns.values, rows=dataset.values.tolist())
@app.route('/preprocess', methods=["GET", "POST"])  # route path assumed
def preprocess():
    global x_train, x_test, y_train, y_test
    if request.method == "POST":
        size = int(request.form['split'])  # test split percentage entered by the user
        le = LabelEncoder()
        df['protocol_type'] = le.fit_transform(df['protocol_type'])
        df['flag'] = le.fit_transform(df['flag'])
        df['service'] = le.fit_transform(df['service'])
        x = df.iloc[:, :-1]
        y = df.iloc[:, -1]
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=size / 100, random_state=0)
        print(x_train, x_test)
    return render_template('preprocess.html')
@app.route('/model', methods=["GET", "POST"])  # route path assumed
def model():
    msg = ''
    if request.method == "POST":
        s = int(request.form['algo'])
        if s == 0:
            msg = 'Please choose an algorithm'
        elif s == 1:
            et = ExtraTreesClassifier()
            et.fit(x_train, y_train)
            y_pred = et.predict(x_test)
            ac_et = accuracy_score(y_test, y_pred) * 100
            msg = 'The accuracy obtained by Extra Tree Classifier is ' + str(ac_et) + '%'
        elif s == 2:
            classifier = DecisionTreeClassifier()
            classifier.fit(x_train, y_train)
            y_pred = classifier.predict(x_test)
            ac_dt = accuracy_score(y_test, y_pred) * 100
            msg = 'The accuracy obtained by Decision Tree Classifier is ' + str(ac_dt) + '%'
        elif s == 3:
            svc = SVC()
            svc = svc.fit(x_train, y_train)
            y_pred = svc.predict(x_test)
            ac_svc = accuracy_score(y_test, y_pred) * 100
            msg = 'The accuracy obtained by Support Vector Classifier is ' + str(ac_svc) + '%'
        elif s == 4:
            knn = KNeighborsClassifier(n_neighbors=12)
            knn.fit(x_train, y_train)
            y_pred = knn.predict(x_test)
            ac_knn = accuracy_score(y_test, y_pred) * 100
            msg = 'The accuracy obtained by KNN Classifier is ' + str(ac_knn) + '%'
        elif s == 5:
            adb = AdaBoostClassifier()
            adb.fit(x_train, y_train)
            y_pred = adb.predict(x_test)
            ac_adb = accuracy_score(y_test, y_pred) * 100
            msg = 'The accuracy obtained by AdaBoost Classifier is ' + str(ac_adb) + '%'
        elif s == 6:
            nn_model = load_model('neural_network.h5')
            score = 0.9423418045043945  # pre-computed score kept from the original source
            msg = 'The accuracy obtained by the Neural Network is ' + str(score * 100) + '%'
    return render_template('model.html', msg=msg)
@app.route('/prediction', methods=["GET", "POST"])  # route path assumed
def prediction():
    msg = ''
    if request.method == "POST":
        f1 = float(request.form['duration'])
        f2 = float(request.form['protocol_type'])
        f3 = float(request.form['service'])
        f4 = float(request.form['flag'])
        f5 = float(request.form['src_bytes'])
        f6 = float(request.form['dst_bytes'])
        f7 = float(request.form['land'])
        f8 = float(request.form['wrong_fragment'])
        f9 = float(request.form['urgent'])
        f10 = float(request.form['hot'])
        f11 = float(request.form['num_failed_logins'])
        f12 = float(request.form['logged_in'])
        f13 = float(request.form['num_compromised'])
        f14 = float(request.form['root_shell'])
        f15 = float(request.form['su_attempted'])
        f16 = float(request.form['num_root'])
        f17 = float(request.form['num_file_creations'])
        f18 = float(request.form['num_shells'])
        f19 = float(request.form['num_access_files'])
        f20 = float(request.form['num_outbound_cmds'])
        f21 = float(request.form['is_host_login'])  # assumed field name; f21 was missing in the original
        f22 = float(request.form['is_guest_login'])
        f23 = float(request.form['count'])
        f24 = float(request.form['srv_count'])
        f25 = float(request.form['serror_rate'])
        f26 = float(request.form['srv_serror_rate'])
        f27 = float(request.form['rerror_rate'])
        f28 = float(request.form['srv_rerror_rate'])
        f29 = float(request.form['same_srv_rate'])
        f30 = float(request.form['diff_srv_rate'])
        f31 = float(request.form['srv_diff_host_rate'])
        f32 = float(request.form['dst_host_count'])
        f33 = float(request.form['dst_host_srv_count'])
        f34 = float(request.form['dst_host_same_srv_rate'])
        f35 = float(request.form['dst_host_diff_srv_rate'])
        f36 = float(request.form['dst_host_same_src_port_rate'])
        f37 = float(request.form['dst_host_srv_diff_host_rate'])
        f38 = float(request.form['dst_host_serror_rate'])
        f39 = float(request.form['dst_host_srv_serror_rate'])
        f40 = float(request.form['dst_host_rerror_rate'])
        f41 = float(request.form['dst_host_srv_rerror_rate'])
        li = [f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16, f17, f18, f19, f20,
              f21, f22, f23, f24, f25, f26, f27, f28, f29, f30, f31, f32, f33, f34, f35, f36, f37, f38,
              f39, f40, f41]
        model = ExtraTreesClassifier()
        model.fit(x_train, y_train)
        result = model.predict([li])
        print('result is ', result)
        # (Anomaly = 0, Normal = 1)
        if result == 0:
            msg = 'Cyber attack detected: the traffic is classified as an anomaly.'
        else:
            msg = 'No cyber attack detected: the traffic is classified as normal.'
    return render_template('prediction.html', msg=msg)
if __name__ == '__main__':
    app.run(debug=True)