US20220407769A1 - Control system anomaly detection using neural network consensus - Google Patents
Control system anomaly detection using neural network consensus Download PDFInfo
- Publication number
- US20220407769A1 US20220407769A1 US17/837,472 US202217837472A US2022407769A1 US 20220407769 A1 US20220407769 A1 US 20220407769A1 US 202217837472 A US202217837472 A US 202217837472A US 2022407769 A1 US2022407769 A1 US 2022407769A1
- Authority
- US
- United States
- Prior art keywords
- network
- data
- control system
- sensor data
- sensor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 107
- 238000001514 detection method Methods 0.000 title claims abstract description 40
- 238000000034 method Methods 0.000 claims abstract description 71
- 238000004891 communication Methods 0.000 claims description 61
- 238000011176 pooling Methods 0.000 claims description 36
- 238000012549 training Methods 0.000 claims description 34
- 238000013527 convolutional neural network Methods 0.000 claims description 28
- 238000012952 Resampling Methods 0.000 claims description 24
- 238000009826 distribution Methods 0.000 claims description 23
- 230000008569 process Effects 0.000 claims description 21
- 238000009825 accumulation Methods 0.000 claims description 16
- 238000007781 pre-processing Methods 0.000 claims description 15
- 238000010606 normalization Methods 0.000 claims description 10
- 238000005516 engineering process Methods 0.000 claims description 7
- 230000001934 delay Effects 0.000 claims description 6
- 238000003860 storage Methods 0.000 description 39
- 238000004422 calculation algorithm Methods 0.000 description 35
- 230000000670 limiting effect Effects 0.000 description 26
- 230000015654 memory Effects 0.000 description 25
- 230000006870 function Effects 0.000 description 18
- 238000004590 computer program Methods 0.000 description 17
- 230000006399 behavior Effects 0.000 description 13
- 230000004913 activation Effects 0.000 description 11
- 238000012545 processing Methods 0.000 description 10
- 230000003542 behavioural effect Effects 0.000 description 8
- 230000008859 change Effects 0.000 description 7
- 238000010801 machine learning Methods 0.000 description 6
- 230000000306 recurrent effect Effects 0.000 description 6
- 238000012360 testing method Methods 0.000 description 6
- 239000011159 matrix material Substances 0.000 description 5
- 230000002093 peripheral effect Effects 0.000 description 5
- 230000004044 response Effects 0.000 description 5
- 238000011161 development Methods 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 206010000117 Abnormal behaviour Diseases 0.000 description 3
- 241001092459 Rubus Species 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 239000004973 liquid crystal related substance Substances 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 238000000053 physical method Methods 0.000 description 3
- 230000002441 reversible effect Effects 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000003068 static effect Effects 0.000 description 3
- 229920001621 AMOLED Polymers 0.000 description 2
- 102100026278 Cysteine sulfinic acid decarboxylase Human genes 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 2
- 238000004883 computer application Methods 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 2
- 230000000875 corresponding effect Effects 0.000 description 2
- 230000009977 dual effect Effects 0.000 description 2
- NOESYZHRGYRDHS-UHFFFAOYSA-N insulin Chemical compound N1C(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(NC(=O)CN)C(C)CC)CSSCC(C(NC(CO)C(=O)NC(CC(C)C)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CCC(N)=O)C(=O)NC(CC(C)C)C(=O)NC(CCC(O)=O)C(=O)NC(CC(N)=O)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CSSCC(NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2C=CC(O)=CC=2)NC(=O)C(CC(C)C)NC(=O)C(C)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2NC=NC=2)NC(=O)C(CO)NC(=O)CNC2=O)C(=O)NCC(=O)NC(CCC(O)=O)C(=O)NC(CCCNC(N)=N)C(=O)NCC(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC(O)=CC=3)C(=O)NC(C(C)O)C(=O)N3C(CCC3)C(=O)NC(CCCCN)C(=O)NC(C)C(O)=O)C(=O)NC(CC(N)=O)C(O)=O)=O)NC(=O)C(C(C)CC)NC(=O)C(CO)NC(=O)C(C(C)O)NC(=O)C1CSSCC2NC(=O)C(CC(C)C)NC(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(N)CC=1C=CC=CC=1)C(C)C)CC1=CN=CN1 NOESYZHRGYRDHS-UHFFFAOYSA-N 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 108010064775 protein C activator peptide Proteins 0.000 description 2
- APTZNLHMIGJTEW-UHFFFAOYSA-N pyraflufen-ethyl Chemical compound C1=C(Cl)C(OCC(=O)OCC)=CC(C=2C(=C(OC(F)F)N(C)N=2)Cl)=C1F APTZNLHMIGJTEW-UHFFFAOYSA-N 0.000 description 2
- 239000010979 ruby Substances 0.000 description 2
- 229910001750 ruby Inorganic materials 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000012800 visualization Methods 0.000 description 2
- ORILYTVJVMAKLC-UHFFFAOYSA-N Adamantane Natural products C1C(C2)CC3CC1CC2C3 ORILYTVJVMAKLC-UHFFFAOYSA-N 0.000 description 1
- VYZAMTAEIAYCRO-UHFFFAOYSA-N Chromium Chemical compound [Cr] VYZAMTAEIAYCRO-UHFFFAOYSA-N 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 102000004877 Insulin Human genes 0.000 description 1
- 108090001061 Insulin Proteins 0.000 description 1
- 230000002159 abnormal effect Effects 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000002547 anomalous effect Effects 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 238000011217 control strategy Methods 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 229940125396 insulin Drugs 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 230000003534 oscillatory effect Effects 0.000 description 1
- 229920001690 polydopamine Polymers 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000006403 short-term memory Effects 0.000 description 1
- 239000010454 slate Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000002459 sustained effect Effects 0.000 description 1
- 238000010408 sweeping Methods 0.000 description 1
- WZWYJBNHTWCXIM-UHFFFAOYSA-N tenoxicam Chemical compound O=C1C=2SC=CC=2S(=O)(=O)N(C)C1=C(O)NC1=CC=CC=N1 WZWYJBNHTWCXIM-UHFFFAOYSA-N 0.000 description 1
- 229960002871 tenoxicam Drugs 0.000 description 1
- 239000010409 thin film Substances 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0604—Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
- H04L41/0622—Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/147—Network analysis or design for predicting network behaviour
Definitions
- Control systems provide transportation, essential utilities, and manufacturing of goods to the masses. It is critical that controlled processes within these systems are executed correctly and according to schedule. Monitoring the system's performance during their operation is important for maintaining their reliability and availability.
- ICS industrial control system
- ICS computer controlled electro-mechanical frameworks
- ICS industrial control system
- These frameworks coordinate industrial operations between protocols, connections, and devices in the control system, so they can be executed properly and on schedule.
- these various components are generally interoperable due to the emergence of standardized computer interfaces and networking protocols that support control system implementations.
- the control system demonstrates state-like behavior that characterizes its overall functionality.
- the protocols employed may be open communication protocols or the software may be open source, which may increase the risk of cyberattacks.
- control system may be supplied by various vendors and often have large state spaces with high complexity (e.g., cycling states), thus making it impractical to capture the complete behavior of the control system.
- critical processes may be disrupted, resulting in damage to essential utilities, without detection.
- An automated anomaly detection may be achieved using machine learning/artificial intelligence (ML) algorithms, such as neural networks, that analyze the overall health of the components in a control system and predict the states of the components.
- ML machine learning/artificial intelligence
- the ML algorithms may consider patterns in data related to network traffic as well data from equipment (e.g., sensor data), and the states may be classified taking into account random deviations that may occur. If the control system is functioning properly, the state classified should match or be reasonably similar (i.e., consensus from the two models is achieved). However, when faulty equipment or processing errors cause unexpected behavior in the system, the classification may diverge, causing loss of consensus. Because the system diverges from normal behavior, this classification can also be described as anomaly detection.
- a method for control system anomaly detection comprising: receiving input data comprising: sensor data from equipment in the control system; and network data from a network in communication with the control system; normalizing distributions of the sensor data and the network data; checking time alignment between the sensor data to the network data; selecting a time window for accumulating the sensor data and the network data; feeding the sensor data into a first neural network comprising a behavior classifier of the equipment of the control system to output a first classified state of the control system; feeding the network data into a second neural network comprising a network traffic classifier to output a second classified state of the control system; and comparing the first and the second classified states for consensus for system anomaly detection, wherein accumulation of differences in classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded control system, or a combination thereof.
- control system comprises a general purpose computer.
- the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors.
- the control system employs one or more network communication protocols.
- the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof.
- the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT).
- the sensor data comprises time series data.
- the sensor data is obtained from a standalone sensor or an integrated sensor.
- the integrated sensor is part of a control device comprising an actuator.
- the network data comprises packet data, metadata, or a combination thereof.
- the packet data comprises a packet's header, payload, trailer, or any combination thereof.
- the packet data from the packet's payload comprises bit streams.
- normalizing distributions of the sensor data and the network data comprises adjusting the distributions' mean, variance, higher-ordered moments, or a combination thereof.
- the method comprises resampling the sensor data, the network data, or a combination thereof for the time alignment between the sensor data and network data.
- the resampling results in the sensor data and the network data having a same number of samples.
- the resampling comprises downsampling.
- the resampling comprises upsampling.
- the resampling comprises unsampling.
- the method comprises windowing to adjust the time window for accumulating the sensor data, the network data, or a combination thereof. In further embodiments, the windowing accounts for delays in the network data, the sensor data, or a combination thereof.
- one or both of the first neural network and the second neural network are deep neural networks.
- the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks.
- the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers.
- the convolutional layers comprise 1D, 2D, or 3D convolutional layers.
- the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof.
- the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system.
- the convolutional neural networks are supervised for training to identify one or both of the first classified state and the second classified state.
- the comparing the first and the second classified states for consensus for system anomaly detection is unsupervised for detecting the differences between the first and the second classified states.
- the threshold is an average discrepancy rate between the first and the second classified state.
- the threshold is dynamically changed over time.
- the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- a computer-implemented systems for control system anomaly detection comprising: at least one logic element configured to perform operations on sensor data from equipment in the control system and network data from a network in the control system the operations comprising: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; a first neural network comprising a behavior classifier of the equipment of the control system for outputting a first classified state of the control system from the sensor data; a second neural network comprising a network traffic classifier for outputting a second classified state of the control system from the network data; and a discrepancy aggregator for comparing the first and the second classified state for consensus for control system anomaly detection, wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- the computer-implemented system comprises at least one processor, a memory, and instructions executable by at least one processor.
- the computer-implemented system comprises a general purpose computer.
- the at least one logic element comprises a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC).
- the at least one logic element is implemented on a general purpose computer.
- the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof.
- DCS distributed control system
- SCADA supervisory control and data acquisition
- the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors.
- the control system employs one or more network communication protocols.
- the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof.
- the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT).
- the sensor data comprises time series data. In some embodiments, the sensor data is obtained from a standalone sensor or an integrated sensor. In further embodiments, the integrated sensor is part of a control device comprising an actuator.
- the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or a combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, the normalization operation comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof.
- the at least one logic element is configured to perform a resampling operation of the sensor data, the network data, or a combination thereof for the time alignment between the network data and the sensor data.
- the resampling operation results in the sensor data and the network data having a same number of samples.
- the resampling operation comprises downsampling.
- the resampling operation comprises upsampling.
- the resampling operation comprises unsampling.
- the at least one logic element is configured to perform a windowing operation to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof.
- the windowing operation accounts for delays in the network data, sensors data, or a combination thereof.
- one or both of the first neural network and the second neural network are deep neural networks.
- the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks.
- the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers.
- the convolutional layers comprise 1D, 2D, or 3D convolutional layers.
- the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof.
- the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system. In further embodiments, the convolutional neural networks are supervised for training to identify the classified states. In some embodiments, the threshold is an average discrepancy rate between the first and the second classified state. In further embodiments, the threshold is dynamically changed over time. In some embodiments, the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- platforms for control system anomaly detection comprising: an apparatus comprising at least one logic element for performing operations on sensor data from equipment in the control system and network data from a network in communication with the control system; and a discrepancy aggregator for control system anomaly detection; and a cloud computing resource communicably coupled to the apparatus and comprising a first neural network and a second neural network; wherein the operations comprise: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; wherein the first neural network comprises a behavior classifier of the equipment of the control system outputting a first classified state of the control system from the sensor data from the operations; wherein the second neural network comprises a network traffic classifier outputting a second classified state of the control system from the network data from the operations; wherein the discrepancy aggregator compares the first and the second classified state for consensus for
- the apparatus comprising at least one logic element comprises at least one processor, a memory, and instructions executable by at least one processor.
- the at least one logic element comprises a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC).
- the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof.
- the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors.
- the control system employs one or more network communication protocols.
- the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof.
- the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT).
- the sensor data comprises time series data. In some embodiments, the sensor data is obtained from a standalone sensor or an integrated sensor. In further embodiments, the integrated sensor is part of a control device comprising an actuator.
- the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or a combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, the normalization operation comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof.
- the operations comprise a resampling operation of the sensor data, the network data, or a combination thereof for the time alignment between the network data and the sensor data.
- the resampling operation results in the sensor data and the network data having a same number of samples.
- the resampling operation comprises downsampling.
- the resampling operation comprises upsampling.
- the resampling operation comprises unsampling.
- the operations comprise a windowing operation to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof.
- the windowing operation accounts for delays in the network data, sensors data, or a combination thereof.
- the first neural network and the second neural network are deep neural networks.
- the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks.
- the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers.
- the convolutional layers comprise 1D, 2D, or 3D convolutional layers.
- the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof.
- the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system.
- the convolutional neural network is supervised for training to identify the first and the second classified states.
- the threshold is an average discrepancy rate between the first and the second classified state.
- the threshold is dynamically changed over time.
- the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- a computer-implemented methods of training neural networks for control system anomaly detection comprising: collecting input data comprising sensor data from equipment in the control system and network data from a network in communication with the control system; preprocessing the sensor data and the network data to output preprocessed sensor data and preprocessed network data, the preprocessing comprising: normalizing to adjust distributions of the sensor data and the network data; checking the sensor data and the network data for time alignment; and selecting a time window for accumulating the sensor data and the network data; creating training sets comprising a first training set comprising the preprocessed sensor data and a second training set comprising the preprocessed network data; and training a first neural network comprising a behavior classifier of the equipment of the control system with the first training set to output a first classified state; and training a second neural network comprising a network traffic classifier with the second training set to output a second classified state.
- the method is implemented on a general purpose computer, a server, a cluster of servers, a distributed computing platform, or a cloud computing platform.
- the network data comprises packet data, metadata, or a combination thereof.
- the packet data comprises a packet's header, payload, trailer, or a combination thereof.
- the packet data from the packet's payload comprises bit streams.
- normalizing comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof.
- the preprocessing comprises resampling for the time alignment of the sensor data, the network data, or a combination thereof.
- the resampling results in the sensor data and the network data having a same number of samples.
- resampling comprises downsampling, upsampling, or unsampling.
- the preprocessing comprises windowing to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof.
- the windowing accounts for delays in the network data, the sensor data, or a combination thereof.
- one or both of the first neural network and the second neural network are deep neural networks.
- the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks.
- the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers.
- the convolutional layers comprise 1D, 2D, or 3D.
- the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof.
- the convolutional neural networks have hyperparameters empirically chosen based on patterns in the network of the control system.
- FIG. 1 shows a non-limiting example of a computing device; in this case, a device with one or more processors, memory, storage, and a network interface;
- FIG. 2 shows a non-limiting example of a block diagram of a generic ICS feedback loop
- FIG. 3 shows a non-limiting example of a multi-view classification system for a control system; in this case, for an ICS;
- FIG. 4 shows a non-limiting example of an architecture for an ICS testbed; in this case, for a MITM attack;
- FIG. 5 shows a non-limiting example of a dual-CNN architecture
- FIGS. 6 A- 6 D show raw data obtained from a trial during an MITM attack
- FIGS. 7 A- 7 C show confusion matrices for the raw sensor and packet data
- FIGS. 8 A- 8 C show classifier outputs tracking the difference between classified states
- FIG. 9 shows precision-recall curve (PRC) of the classifier performances.
- FIG. 10 shows a distribution of total prediction errors before anomaly detection.
- Described herein, in certain embodiments, are computer-implemented methods for control system anomaly detection comprising: receiving input data comprising: sensor data from equipment in the control system; and network data from a network in communication with the control system; normalizing distributions of the sensor data and the network data; checking time alignment between the sensor data to the network data; selecting a time window for accumulating the sensor data and the network data; feeding the sensor data into a first neural network comprising a behavior classifier of the equipment of the control system to output a first classified state of the control system; feeding the network data into a second neural network comprising a network traffic classifier to output a second classified state of the control system; and comparing the first and the second classified states for consensus for system anomaly detection, wherein accumulation of differences in classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- ⁇ comprising: at least one logic element configured to perform operations on sensor data from equipment in the control system and network data from a network in the control system the operations comprising: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; a first neural network comprising a behavior classifier of the equipment of the control system for outputting a first classified state of the control system from the sensor data; a second neural network comprising a network traffic classifier for outputting a second classified state of the control system from the network data; and a discrepancy aggregator for comparing the first and the second classified state for consensus for control system anomaly detection, wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- platforms for control system anomaly detection comprising: an apparatus comprising at least one logic element for performing operations on sensor data from equipment in the control system and network data from a network in communication with the control system; and a discrepancy aggregator for control system anomaly detection; and a cloud computing resource communicably coupled to the apparatus and comprising a first neural network and a second neural network; wherein the operations comprise: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; wherein the first neural network comprises a behavior classifier of the equipment of the control system outputting a first classified state of the control system from the sensor data from the operations; wherein the second neural network comprises a network traffic classifier outputting a second classified state of the control system from the network data from the operations; wherein the discrepancy aggregator compares the first and the second classified state
- ⁇ are computer-implemented methods of training neural networks for control system anomaly detection comprising: collecting input data comprising sensor data from equipment in the control system and network data from a network in communication with the control system; preprocessing the sensor data and the network data to output preprocessed sensor data and preprocessed network data, the preprocessing comprising: normalizing to adjust distributions of the sensor data and the network data; checking the sensor data and the network data for time alignment; and selecting a time window for accumulating the sensor data and the network data; creating training sets comprising a first training set comprising the preprocessed sensor data and a second training set comprising the preprocessed network data; and training a first neural network comprising a behavior classifier of the equipment of the control system with the first training set to output a first classified state; and training a second neural network comprising a network traffic classifier with the second training set to output a second classified state.
- control system may generally refer to a framework to coordinate operations between components, such as protocols, connections, and devices, in a system.
- the operations may be executed with one or more logic elements.
- the control system comprises an industrial control system (ICS), distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof.
- the control system comprises a general purpose computing system with one or more network connections such as the Internet, Bluetooth, and the like, wherein the general purpose computing system is controlled by user input or by application programs running on the general purpose computing system.
- the general purpose computing system comprises an edge device such as a desktop or a notebook, tablet, smartphone, or other portable computing device.
- the general purpose computing system comprises a server or server cluster interconnected to a combination of local components and remote components via one or more network connections.
- neural network may generally refer to a computational network composed of nodes.
- the nodes of the neural network may be connected as layers or graphs.
- the neural network comprises an algorithm designed for solving a specific problem.
- the neural network may comprise a generalizable algorithm to solve a range of problems.
- the neural network may “learn” how to solve one or more problems.
- classified state may generally refer to a state(s) of a component(s) in a control system or in communication with a control system.
- the state may be determined based on values, ranges, or patterns detected in physical measurements of components in the control system or in communication with the control system.
- the states may be determined by a ML algorithm for classification or clustering.
- the ML algorithm may be a neural network.
- the term “discrepancy aggregator” may generally refer to a computational framework comprising at least one logic element for comparing classified states of components of a control system.
- the discrepancy aggregator may accumulate errors (or difference) between classified states for a given time period if the classified states of the components in the control system lack consensus.
- the accumulation of errors may be compared to a threshold. If the accumulation of errors is greater than the threshold, an anomaly may be identified in the control system.
- the term “anomaly” or “anomalies” may generally refer to abnormal behavior in one or more components in a control system or in communication with the control system. Abnormal behavior may comprise of irregular values, ranges, or patterns detected in physical measurements of components in the control system or in communication with the control system. In some embodiments, the anomaly may comprise of faulty components due to wearing of components over time or due to an accident. In some embodiments, the anomaly may be indicative of a cyberattack.
- FIG. 1 a block diagram is shown depicting an exemplary machine that includes a computer system 100 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies for static code scheduling of the present disclosure.
- a computer system 100 e.g., a processing or computing system
- the components in FIG. 1 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.
- Computer system 100 may include one or more processors 101 , a memory 103 , and a storage 108 that communicate with each other, and with other components, via a bus 140 .
- the bus 140 may also link a display 132 , one or more input devices 133 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 134 , one or more storage devices 135 , and various tangible storage media 136 . All of these elements may interface directly or via one or more interfaces or adaptors to the bus 140 .
- the various tangible storage media 136 can interface with the bus 140 via storage medium interface 126 .
- Computer system 100 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.
- ICs integrated circuits
- PCBs printed circuit boards
- Computer system 100 includes one or more processor(s) 101 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions.
- processor(s) 101 optionally contains a cache memory unit 102 for temporary local storage of instructions, data, or computer addresses.
- Processor(s) 101 are configured to assist in execution of computer readable instructions.
- Computer system 100 may provide functionality for the components depicted in FIG. 1 as a result of the processor(s) 101 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such as memory 103 , storage 108 , storage devices 135 , and/or storage medium 136 .
- the computer-readable media may store software that implements particular embodiments, and processor(s) 101 may execute the software.
- Memory 103 may read the software from one or more other computer-readable media (such as mass storage device(s) 135 , 136 ) or from one or more other sources through a suitable interface, such as network interface 120 .
- the software may cause processor(s) 101 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 103 and modifying the data structures as directed by the software.
- the memory 103 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 104 ) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 105 ), and any combinations thereof.
- ROM 105 may act to communicate data and instructions unidirectionally to processor(s) 101
- RAM 104 may act to communicate data and instructions bidirectionally with processor(s) 101 .
- ROM 105 and RAM 104 may include any suitable tangible computer-readable media described below.
- a basic input/output system 106 (BIOS) including basic routines that help to transfer information between elements within computer system 100 , such as during start-up, may be stored in the memory 103 .
- BIOS basic input/output system 106
- Fixed storage 108 is connected bidirectionally to processor(s) 101 , optionally through storage control unit 107 .
- Fixed storage 108 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein.
- Storage 108 may be used to store operating system 109 , executable(s) 110 , data 111 , applications 112 (application programs), and the like.
- Storage 108 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above.
- Information in storage 108 may, in appropriate cases, be incorporated as virtual memory in memory 103 .
- storage device(s) 135 may be removably interfaced with computer system 100 (e.g., via an external port connector (not shown)) via a storage device interface 125 .
- storage device(s) 135 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 100 .
- software may reside, completely or partially, within a machine-readable medium on storage device(s) 135 .
- software may reside, completely or partially, within processor(s) 101 .
- Bus 140 connects a wide variety of subsystems.
- reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate.
- Bus 140 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures.
- such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, an Accelerated Graphics Port (AGP) bus, HyperTransport (HTX) bus, serial advanced technology attachment (SATA) bus, and any combinations thereof.
- ISA Industry Standard Architecture
- EISA Enhanced ISA
- MCA Micro Channel Architecture
- VLB Video Electronics Standards Association local bus
- PCI Peripheral Component Interconnect
- PCI-X PCI-Express
- AGP Accelerated Graphics Port
- HTTP HyperTransport
- SATA serial advanced technology attachment
- Computer system 100 may also include an input device 133 .
- a user of computer system 100 may enter commands and/or other information into computer system 100 via input device(s) 133 .
- Examples of an input device(s) 133 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touchpad, a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof.
- an alpha-numeric input device e.g., a keyboard
- a pointing device e.g., a mouse or touchpad
- a touchpad e.g., a touch screen
- a multi-touch screen e.g., a
- the input device is a Kinect, Leap Motion, or the like.
- Input device(s) 133 may be interfaced to bus 140 via any of a variety of input interfaces 123 (e.g., input interface 123 ) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.
- computer system 100 when computer system 100 is connected to network 130 , computer system 100 may communicate with other devices, specifically mobile devices and enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like, connected to network 130 . Communications to and from computer system 100 may be sent through network interface 120 .
- network interface 120 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 130 , and computer system 100 may store the incoming communications in memory 103 for processing.
- Computer system 100 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 103 and communicated to network 130 from network interface 120 .
- Processor(s) 101 may access these communication packets stored in memory 103 for processing.
- Examples of the network interface 120 include, but are not limited to, a network interface card, a modem, and any combination thereof.
- Examples of a network 130 or network segment 130 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof.
- a network, such as network 130 may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.
- a display 132 can be displayed through a display 132 .
- a display 132 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic liquid crystal display (OLED) such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof.
- the display 132 can interface to the processor(s) 101 , memory 103 , and fixed storage 108 , as well as other devices, such as input device(s) 133 , via the bus 140 .
- the display 132 is linked to the bus 140 via a video interface 122 , and transport of data between the display 132 and the bus 140 can be controlled via the graphics control 121 .
- the display is a video projector.
- computer system 100 may include one or more other peripheral output devices 134 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof.
- peripheral output devices may be connected to the bus 140 via an output interface 124 .
- Examples of an output interface 124 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.
- computer system 100 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein.
- Reference to software in this disclosure may encompass logic, and reference to logic may encompass software.
- reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate.
- the present disclosure encompasses any suitable combination of hardware, software, or both.
- DSP digital signal processor
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
- a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium.
- the storage medium may be integral to the processor.
- the processor and the storage medium may reside in an ASIC.
- the ASIC may reside in a user terminal.
- the processor and the storage medium may reside as discrete components in a user terminal.
- suitable computing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, and vehicles.
- server computers desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, and vehicles.
- Suitable tablet computers include those with booklet, slate, and convertible configurations, known to those of skill in the art.
- the computing device includes an operating system configured to perform executable instructions.
- the operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications.
- suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®.
- suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®.
- the operating system is provided by cloud computing.
- suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®.
- suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV®, Roku®, Boxee®, Google TV®, Google Chromecast®, Amazon Fire®, and Samsung® HomeSync®.
- the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device.
- a computer readable storage medium is a tangible component of a computing device.
- a computer readable storage medium is optionally removable from a computing device.
- a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like.
- the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
- the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same.
- a computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device's CPU, written to perform a specified task.
- Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract data types.
- APIs Application Programming Interfaces
- a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
- a computer program includes a web application.
- a web application in various embodiments, utilizes one or more software frameworks and one or more database systems.
- a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR).
- a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, XML, and document oriented database systems.
- suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, mySQLTM, and Oracle®.
- a web application in various embodiments, is written in one or more versions of one or more languages.
- a web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof.
- a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML).
- a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS).
- CSS Cascading Style Sheets
- a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®.
- AJAX Asynchronous JavaScript and XML
- a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, JavaTM, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), PythonTM, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy.
- a web application is written to some extent in a database query language such as Structured Query Language (SQL).
- SQL Structured Query Language
- a web application integrates enterprise server products such as IBM® Lotus Domino®.
- a web application includes a media player element.
- a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, JavaTM, and Unity®.
- a computer program includes a mobile application provided to a mobile computing device.
- the mobile application is provided to a mobile computing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile computing device via the computer network described herein.
- a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, JavaTM, JavaScript, Pascal, Object Pascal, PythonTM, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
- Suitable mobile application development environments are available from several sources.
- Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform.
- Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and PhoneGap.
- mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, AndroidTM SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
- a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in.
- standalone applications are often compiled.
- a compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, JavaTM, Lisp, PythonTM, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program.
- a computer program includes one or more executable complied applications.
- the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same.
- software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art.
- the software modules disclosed herein are implemented in a multitude of ways.
- a software module comprises a file, a section of code, a programming object, a programming structure, a distributed computing resource, a cloud computing resource, or combinations thereof.
- a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, a plurality of distributed computing resources, a plurality of cloud computing resources, or combinations thereof.
- the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, a standalone application, and a distributed or cloud computing application.
- software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
- the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same.
- suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document oriented databases, and graph databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB.
- a database is Internet-based.
- a database is web-based.
- a database is cloud computing-based.
- a database is a distributed database.
- a database is based on one or more local computer storage devices.
- a control system may comprise a framework to coordinate operations between protocols, connections, and devices, so they may be executed properly and on schedule.
- the operations may be executed with one or more logic elements comprising a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC).
- PLC programmable logic controller
- PLA programmable logic array
- PAL programmable array logic
- GLA generic logic array
- CPLD complex programmable logic decide
- FPGA field programmable gate array
- ASIC application-specific integrated circuit
- the control system may comprise one or more network communication protocols that may be standard network communication protocols, non-standard network communication protocols, or a combination thereof.
- the standard network communication protocols are process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT).
- the control system may be an industrial control system (ICS), distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof.
- control systems may include industrial and manufacturing facilities (i.e., an ICS)
- the control system may support production and processing objectives on a mass-scale.
- An ICS may comprise one or more of PLCs, remote terminal units, intelligent electronic devices, engineering workstations, HMI, data historians, communication gateways, and front-end processors.
- an ICS may have different controllable states as steps of a process, and may use an open communication protocol (e.g., Modbus for ICS networks). Further, the open communication protocol, such as Modbus, may not be encrypted at any point during the communication, thus increasing the likelihood of an attack.
- an open communication protocol e.g., Modbus for ICS networks
- Modbus Modbus
- a generic ICS feedback control loop is exemplary illustrated in FIG. 1 .
- An ICS feedback loop may generally comprise a human-machine interface (HMI) 205 .
- the HMI 205 may be a user interface (e.g., GUI) that connects a person to one or more components (e.g., equipment, network, etc.) in the ICS.
- the HMI 205 may send a query to a programmable logic controller (PLC) 210 regarding the state or function of components in the ICS, and the PLC 210 may send a response back to the HMI 205 , which may be displayed on the user interface.
- PLC 210 may send status information regarding components of the ICS to the HMI 205 .
- the PLC 210 may implement control strategies using a system comprising a microprocessor for managing components in the ICS.
- the components may be a physical device 215 , such as equipment in the ICS. In some cases, the equipment may be on-site or remote.
- the PLC 210 may control a physical device 215 or a plurality thereof, such as control motors, valves, switches, etc. In some examples, the PLC 210 may control a physical device 215 based on measurements obtained from sensors 220 , which may determine when and how the physical device 215 should operate. In some cases, the measurements may be physical measurements obtained from sensors 220 , such as pressure, volume, temperature, humidity, torque, vacuum, motion, etc.
- the sensor 220 may be a standalone sensor or an integrated sensor. In some examples, the integrated sensor may be part of a control device comprising an actuator.
- the PLC 210 may receive commands for the physical device 215 to perform functions (e.g., pump actuation, stirrer operation, conveyor belt operation, etc.) from the HMI 205 .
- Safety, reliability, and resilience to cyberattacks may be key attributes for the successful operation of an ICS. These attributes may be threatened due to an increase in attack surfaces due to IOT devices, difficulties in performing patch updates to components in the ICS from downtime and vendor varieties, or an accumulation of small errors over time that may result in larger failures.
- An anomaly detection system for recognizing threats may increase the likelihood of the successful operation of an ICS.
- data obtained related to the ICS may be used for anomaly recognition.
- the data may be obtained from one or more sources, such as components of the ICS or communicably coupled to the ICS (e.g., data from a network, such as Modbus commands, sensor data, etc.).
- the data from one or more sources may be analyzed and compared to previous data for anomalies.
- a pressure sensor may have a normal operating range, and a pressure value outside that range may be flagged as an anomaly.
- network traffic patterns may be analyzed, and an unusually high or low traffic pattern may be flagged as an anomaly.
- the data from multiple sources may be analyzed and compared to one another for anomalies.
- having two different models i.e., ML algorithms
- predict the state of multiple sources may help identify miscommunication errors and the occurrence of an anomaly.
- sensor data and network traffic patterns may be analyzed and compared to one another to better assess when an anomaly has occurred.
- the anomaly detection as described herein, by way of non-limiting example, for an ICS may be performed with a classification system employing ML techniques.
- the classification system may employ neural networks.
- An architecture comprising neural networks may be used for predicting the states of components for a control system.
- the states may be predicted by the neural networks using classification of behavioral patterns of components in the control system (e.g., ‘FAST’, ‘SLOW’, ‘ON’, ‘OFF’, etc.).
- classification may be compared to past classifications or may be compared to other components in the control system for multi-view classification in order to identify the occurrence of an anomaly.
- raw data 305 may be obtained from components in the control system or in communication with the control system.
- Raw data 305 may comprise of one or more inputs from components as described herein.
- the raw data 305 may comprise sensor data from equipment in the control system (e.g., accelerometer or gyrometer in an ICS).
- the sensor data may be obtained from equipment in an embedded system (e.g., glucose sensors in an insulin pump, sensors in a pacemaker, etc.).
- the raw data 305 may comprise network data from a network in communication with the control system.
- the network data may comprise packet data, metadata, or a combination thereof.
- the packet data may comprise a packet's header, payload, trailer, or any combination thereof.
- the packet data from the packet's payload may comprise bit streams.
- the network data may comprise of interarrival times, which may be referred to as packet time deltas or the first difference.
- each packet may contain a timestamp for when it arrives to the ICS or a component of the ICS, and taking the difference between two adjacent timestamps may yield the amount of time between each packet arrival.
- interarrival times may change (e.g., increase or decrease) during a change in the state of a control system, which may then return to a baseline interarrival time.
- interarrival times may be used for detecting anomalous state changes.
- Preprocessing may be performed on the raw data 305 using at least one logic element, as described herein.
- the multi-view classification as exemplary illustrated in FIG. 3 , may preprocess data for time period 310 , in which the time period of the raw data 305 may be adjusted.
- preprocessing may comprise of normalizing distributions of one or more inputs of the raw rate 305 (e.g., the sensor data and the network data).
- a normalizing operation may adjust a distributions' mean, variance, higher-ordered moments, or a combination thereof.
- preprocessing may comprise of checking time alignment between one or more inputs of the raw data 305 (e.g., the sensor data to the network data).
- the checking operation may resample any one of the inputs of the raw data 305 (e.g., as i.e., the sensor data, the network data, or any combination thereof) for the time alignment between them.
- the resampling may result in the inputs of the raw data 305 having a same number of samples.
- the resampling comprises downsampling.
- the resampling comprises upsampling.
- the resampling comprises unsampling.
- preprocessing may comprise of selecting a time window for accumulating the one or more inputs of the raw data 305 (e.g., sensor data and the network data).
- this selection operation may comprise of windowing to adjust the time window for accumulating any one of the inputs of the raw data 305 .
- the windowing accounts for delays in any one of the inputs of the raw data 305 .
- using a smaller time window may allow control of false positive to false negative ratio of the classification, which can be optimized based on the costs of a misclassification.
- the size of the time window may be empirically chosen from observing the patterns of the raw data 305 .
- the data from preprocessing operations, as described herein, may be fed into one or more ML algorithms for identifying single or multi-stage attacks, or detecting anomalies in a control system.
- these attacks or anomalies may be detected by analyzing packet streams and content from a network.
- the network may use one or more communication protocol (e.g., the Modbus protocol).
- these attack or anomalies may be detected from time series data of sensors.
- the one or more ML algorithms may be supervised, semi-supervised, or unsupervised for training to identify anomalies.
- the one or more ML algorithms may perform classification or clustering to identify anomalies or attacks.
- the one or more ML algorithms may comprise classical ML algorithms for performing clustering to identify outliers.
- Classical ML algorithms may comprise of algorithms that learn from existing observations (i.e., known features) to predict outputs.
- the classical ML algorithms for performing clustering may be K-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation-maximization (EM) clustering (e.g., using Gaussian mixture models (GMM)), agglomerative hierarchical clustering, or a combination thereof.
- the one or more ML algorithms may comprise classical ML algorithms for classification.
- the classical ML algorithms may comprise logistic regression, na ⁇ ve Bayes, K-nearest neighbors, random forests or decision trees, gradient boosting, support vector machines (SVMs), or a combination thereof.
- the one or more ML algorithm may employ deep learning.
- a deep learning algorithm may comprise of an algorithm that learns by extracting new features to predict outputs.
- the deep learning algorithm may comprise of layers, which may comprise a neural network.
- Neural networks may comprise of connected nodes in a network, which may perform functions, such as transforming or translating input data.
- the output from a given node may be passed on as input to another node.
- the nodes in the network may comprise of input units, hidden units, output units, or a combination thereof.
- an input node may be connected to one or more hidden units.
- one or more hidden units may be connected to an output unit.
- the nodes may take in input and may generate an output based on an activation function.
- the input or output may be a tensor, a matrix, a vector, an array, or a scalar.
- the activation function may be a Rectified Linear Unit (ReLU) activation function, a sigmoid activation function, or a hyperbolic tangent activation function.
- the activation function may be a Softmax activation function.
- the connections between nodes may further comprise of weights for adjusting input data to a given node (i.e., to activate input data or deactivate input data).
- the weights may be learned by the neural network.
- the neural network may be trained using gradient-based optimizations. In some cases, the gradient-based optimization may comprise of one or more loss functions.
- the gradient-based optimization may be conjugate gradient descent, stochastic gradient descent, or a variation thereof (e.g., adaptive moment estimation (Adam)).
- the gradient in the gradient-based optimization may be computed using backpropagation.
- the nodes may be organized into graphs to generate a network (e.g., graph neural networks).
- the nodes may be organized into one or more layers to generate a network (e.g., feed forward neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), etc.).
- the neural network may be a deep neural network comprising of more than one layer.
- the neural network may comprise one or more recurrent layer.
- the one or more recurrent layer may be one or more long short-term memory (LSTM) layers or gated recurrent unit (GRU), which may perform sequential data classification and clustering.
- LSTM long short-term memory
- GRU gated recurrent unit
- future predictions may be made by the one or more recurrent layers according to the sequence of past events since data ordering is considered.
- the recurrent layer may retain or “remember” important information, while selectively “forgetting” what is not essential in the classification model.
- the neural network may comprise one or more convolutional layers.
- the input and output may be a tensor representing of variables or attributes in a data set (i.e., features), which may be referred to as a feature map (or activation map).
- the one or more convolutional layers may be referred to as a feature extraction phase.
- the convolutions may be one dimensional (1D) convolutions, two dimensional (2D) convolutions, three dimensional (3D) convolutions, or any combination thereof.
- the convolutions may be 1D transpose convolutions, 2D transpose convolutions, 3D transpose convolutions, or any combination thereof.
- one-dimensional convolutional layers may be suited for time series sensor data analysis since it may classify time series through parallel convolutions.
- convolutional layers may be used for analyzing raw data in the payload of a network packet. Further, the convolutional layers may be efficient for detecting properties in payload bit patterns of a control system since they may follow a recognizable pattern (e.g., payload bit patterns in an ICS follow recognizable ICS command patterns).
- the layers in a neural network may further comprise one or more pooling layers before or after a convolutional layer.
- the one or more pooling layers may reduce the dimensionality of the feature map using filters that summarize regions of the matrix. This may down sample the number of outputs, and thus reduce the parameters and computational resources needed for the neural network.
- the one or more pooling layers may be max pooling, min pooling, average pooling, global pooling, norm pooling, or a combination thereof. Max pooling may reduce the dimensionality of the data by taking only the maximums values in the region of the matrix, which helps capture the significant feature.
- the one or more pooling layers may be one dimensional (1D), two dimensional (2D), three dimensional (3D), or any combination thereof.
- the neural network may further comprise of one or more flattening layers, which may flatten the input to be passed on to the next layer.
- the input e.g., feature map
- the flattened inputs may be used to output a classification of an object (e.g., binary classification of an image, such as cat or dog, or of a system's performance, such as normal or abnormal, or multi-class classification identifying hand-written digits, etc.).
- the neural networks may further comprise of one or more dropout layers. Dropout layers may be used during training of the neural network (e.g., to perform binary or multi-class classifications).
- the one or more dropout layers may randomly set certain weights as 0, which may set corresponding elements in the feature map as 0, so the neural network may avoid overfitting.
- the neural network may further comprise of one or more dense layers, which comprise a fully connected network. In the dense layer, information may be passed through the fully connected network to generate a predicted classification of an object, and the error may be calculated. In some embodiments, the error may be backpropagated to improve the prediction.
- the one or more dense layers may comprise of a Softmax activation function, which may convert a vector of numbers to a vector of probabilities. These probabilities may be subsequently used in classifications, such as classifications of states in a control system as described herein. In some embodiments, the classifications of states from one or more components in a control system may be compared to detect the occurrence of an anomaly.
- An architecture for anomaly detection may comprise two neural networks for dual neural network state prediction as exemplary illustrated in FIG. 3 .
- the neural networks may use different sets of features for prediction, such as those obtained from network data and sensor data.
- two neural networks are employed in this example, one of skill in the art will appreciate that any one of the ML algorithms as described herein may be used which may be suited for a particular input data set and desired output.
- more than two ML algorithms may be employed in this architecture.
- the ML algorithms as described herein may be combined or that more than one input data may be fed into a single ML algorithm (e.g., the network data and sensor data may be fed into the same algorithm).
- the network data (e.g., network payload data) may be fed into a neural network comprising a network traffic classifier 315 .
- the neural network comprising the network traffic classifier 315 may be trained to learn “normal” network traffic patterns and classify the network traffic patterns in a given time period by comparing it to the “normal” network traffic pattern.
- the network traffic classifier 315 may use the comparison to classify the state of the network traffic pattern in a given time period (e.g., “FAST”, “SLOW”, “MEDIUM”, “HALT”, “OFF”, “REVERSE”, etc.).
- the output from the network traffic classifier 315 may comprise of a classified state, illustrated as y in FIG. 3 .
- the neural network may be trained to classify network data that is encrypted through various methods (e.g., Electronic Code Book, Cipher-Block Chaining, Cipher FeedBack, XOR encryption, etc.).
- the sensor data may be fed into a behavioral classifier.
- the sensor data may be time series data.
- the sensor data may be time series data obtained from an accelerometer, a gyrometer, or any other equipment of the ICS.
- the behavioral classifier may comprise a motor behavioral classifier 320 .
- the neural network comprising the behavioral classifier may be trained to learn “normal” sensor ranges or values for a given time period, and classify the sensor data in a given time period by comparing it to the “normal” range or values.
- the behavioral classifier may use the comparison to classify the state of the sensor data in a given time period (e.g., “FAST”, “SLOW”, “MEDIUM”, “HALT”, “OFF”, “REVERSE”, etc.).
- the output from the behavioral classifier may comprise of another classified state, illustrated as ⁇ in FIG. 3 .
- the classified states, y and ⁇ from the neural networks comprising classifiers may be compared to one another using a discrepancy aggregator which may comprise at least one logic element.
- consensus from the two neural networks is achieved and the classification system may return to preprocess data for time period 310 for new raw data 305 .
- the classified states may be logged, or data may be used for comparison against new raw data 305 .
- the discrepancy aggregator may then accumulate errors (or difference) for the current time window (or time period) of predictions 325 (i.e., E in FIG. 3 ) between the classified states.
- the accumulation of errors, E may then be compared to a threshold, T.
- the threshold may be empirically chosen from observing the patterns of the raw data 305 .
- T as a threshold 330 may be set according to an average discrepancy rate between the classified states.
- T as a threshold 330 may be dynamically changed over time.
- the threshold and the time window may be inversely related (i.e., the greater the time window, the lower the threshold may be needed).
- the classification system may return to preprocess data for time period 310 for new raw data 305 . If the accumulation of errors is greater than the threshold (i.e., E>T), an anomaly is identified 335 .
- the anomaly may comprise of faulty or abnormal behavior of components in the control system or in communication with the control system, or may be indicative of a cyberattack.
- An ICS test bed to detect anomalies using packet and sensor data patterns was created according to the architecture illustrated FIG. 4 .
- This test bed used two streams of data under the assumption that during normal operation, the patterns of command payloads would result in specific patterns of sensor behavior.
- the architecture was created for a man-in-the-middle (MITM) 410 attack.
- MITM 410 attack may comprise of a scenario in which an attacker may secretly relay and alter communications between two or more sources in an ICS without their knowledge.
- the testbed comprised a MITM 410 between an HMI/PLC 405 and a switch 415 .
- the set up comprised a Tolomatic industrial motor, which received commands from the HMI/PLC 405 .
- the data flow in FIG. 4 was as followed; 1) the HMI/PLC 405 sent a continual stream of motor commands to the switch 415 (e.g., off, on, change of speed, etc), 2) the commands from the switch 415 were sent to a sensor controller 420 comprising an inline Raspberry Pi for logging purposes, 3) the sensor controller 420 forwarded messages to a motor 425 or a sensor 430 , and 4) the motor 425 and sensor 430 responded with a continual stream of data that was read from the motor or recorded by the sensor, which was also routed through the inline Raspberry Pi.
- the sensor 430 recorded accelerometer and gyroscopic data related to the motor 425 .
- the accelerometer and gyroscopic sensor data were stored as Comma Separated Values (CSVs) in the X, Y, and Z directions that represented the acceleration and orientation of the attached sensor separated in the three-dimensional space.
- CSVs Comma Separated Values
- This data was collected as a constant stream as the sensor controller continuously logged data from the motor at a fixed sample rate of 10 thousand samples per second.
- the gyrometer data as logged as floating-point values represent angular velocity as degrees per second.
- the accelerometer data measured the force on the motor in that direction in meters per second squared. In total, six sensor data streams are used for sensor classification.
- This ICS ted bed system was constructed to communicate using Modbus packets between the HMI/PLC 405 and the motor 425 . Communication was structured such that all messages sent from the HMI/PLC 405 to the motor 425 resulted in a response message sent back to the HMI/PLC 405 . Modbus packets within the system were therefore the Read/Write commands sent to the motor 425 , and motor data sent back to the HMI/PLC 405 . Rather than being a constant stream of data input, each payload arrived at different times from the sensor controller 420 . The payload data was converted from its original byte format to binary, since network data was preprocessed from PCAP files. Each individual data payload was about 53 bytes between 0 to 255, which were converted to binary for machine learning input changing the input width from 53 bytes to 424 bits.
- FIGS. 6 A- 6 D Raw data obtained from a trial during an MITM attack is shown in FIGS. 6 A- 6 D .
- the payload data was represented in its byte format in FIG. 6 A , where each of the 53 bytes were vertically stacked and pixel color intensity represented the 0-255 value for that byte. Thus, the color changes represent how byte locations in some packet have static, cyclic, or random values.
- the accelerometer and gyrometer sensor data are shown in FIGS. 6 B and 6 C , respectively, where a state change from the random short burst of speed and forces were observed.
- FIG. 6 D illustrates packet deltas over time, as described herein, although this data was not used to predict the ICS states in the present test case.
- an error was defined as a difference between the predicted state of two CNNs, which classified the states of the payload or sensor data into one of six possible ICS states: ‘FAST’, ‘HALT’, ‘MEDIUM’, ‘OFF’, ‘REVERSE’, and ‘SLOW’.
- Preprocessing steps were performed on the raw sensor (accelerometer and gyrometer) and packet data.
- normalization was performed on the accelerometer sensor data and packet data. Normalization was not necessary for the gyrometer sensor data since each axis was already centered around 0 with a constant standard deviation.
- the accelerometer sensor data the z-axis was scaled down by dividing by 16767, which was the maximum value that the hardware sensors could read. This min-max scaling was done in order to reduce the large magnitude of forces in that direction to be between 0 and 1.
- the absolute values of the raw values were taken in order to specifically detect the magnitude of the rotational and straight-line forces. This was done since the direction itself was oscillatory around the axis, so the magnitude was the primary source of classification information.
- the payload data contained constant noise from a variety of packets that ping and maintain the connection. By taking a moving average of 100 of the packet bitstreams, a constant amount of noise on the network was accounted and the classification was improved.
- a time window was selected to accumulate the raw data.
- the packet payload messages sent on the network took some time to impact the ICS actuators, especially mechanical peripherals because of startup transients. This added delay between the observed state from the PCAP analysis and the observed state from the sensors. Further, the packet payload arrival time varied depending on whether an ICS state transition was occurring, which gave it a variable sampling rate. This meant each payload could not directly be correlated with a sensor output because many payloads could be correlated to only a few ICS sensor changes, and vice versa. The timing effects could be mitigated by using a larger input size. As sample input size increased, the variable sampling rate and differences among sample rates became less impactful.
- the data from the ICS testbed was fed into a dual-CNN architecture according to the architecture shows in FIG. 5 .
- the input 505 was either raw time series sensor data or bit streams from the payloads in packets over time. Training, validation, and testing splits were performed at the ratio of 70:20:10 to ensure the model can accurately detect ICS states from payloads and sensors.
- the Keras package was used for design and training.
- the model uses a combination of convolutional layers 510 / 520 , max pooling layers 515 / 525 , a flattening layer 530 , a dropout layer 535 , and a dense neural network layer 540 .
- the CNN models were first trained and tested on windows of 100 samples for both payload and sensor data streams. For the anomaly detection, the occurrence of errors (or disagreements in the states) between the two CNNs were monitored. Since the trials were about 500,000 samples each and the models predicted from 100 samples, there was be about 5000 predictions per trial. A sliding window of size 20 was used to calculate error prediction percentage over time. In other words, every group of 20 predictions, produced an error rate.
- FIGS. 6 A- 6 D shows the visualization of results of both the payload and gyrometer sensor classifiers, and the error rate per moving window of 20 predictions. The selection of a moving window error rate of 20 was used because, while random misclassification can occur, after around 20 predictions the error rate was observed to be fairly low. A threshold of 18% for the error rate is used to identify anomalies since the baseline error rate for a window of size 20 is around 15% for our models.
- the training data was analyzed using confusion matrices for the raw data to visualize the effectiveness of the classifiers.
- the raw sensor from the accelerometer and gyrometer are shown in FIG. 7 A and FIG. 7 B , respectively, and packet data is shown in FIG. 7 C .
- the F1 scores and weighted averages are also shown in these figures.
- the best performing model used the gyrometer sensor data ( FIG. 7 B ), with near perfect classification except for misclassifications for the ‘halt’ and ‘off’ states.
- FIGS. 8 A- 8 C show The results of combining the classifier outputs for tracking the number of occurrences when the classified states for the gyrometer sensor data and packet data differed.
- FIGS. 8 A- 8 C show how comparing the classified states in an unsupervised way allowed for a robust anomaly detection.
- a precision-recall curve was used to detect the precision to recall ratio as the threshold of anomaly detection was adjusted. This method revealed the degree at which the overall classifier performed greater than random chance. By sweeping the threshold from 0.0% to 100.0% of errors within a window, a diagram as shown in FIG. 9 was created where, as recall of anomalies increased, the false positives also increased, and precision decreased. Detecting true positives provides utility since this model had consistent results at detecting the baseline (true negative) at every threshold and had minimal false negatives. Further, to improved visualization through the PRC curve, emphasis on recalling true positives was important since the model had to be able to detect and mitigate threats before they caused permanent major failure to the ICS system.
- the calculated area under the precision-recall curve is about 86% in FIG. 9 .
- the optimal threshold was taken where precision and recall are equal (i.e., equal error rate point or EER).
- EER error rate point
- the model was run on our test set.
- a confusion matrix and statistics were used to evaluate the combined, unsupervised anomaly detector whose performances were shown to have an F1 score: 0.89, Sensitivity (Recall): 0.87, and Precision: 0.88.
- the results were obtained by analyzing the true positive and false negatives from anomaly injections and false positives and true negatives from baseline. These results represented the strength of the classifier after it was tuned to be an optimal threshold for this dataset.
- the precision of detection reached about 0.88 and its recall about 0.87. Detecting this percentage of anomalies generated was quite strong because the inserted anomalies in the system were of relatively short duration. Though some anomalies were not detected, a more sustained MITM attack would eventually trigger an alarm. Overall, the classifier was robust to the random noise of multiple classifiers and could accurately distinguish anomalies from baseline data.
- FIG. 10 shows the delay in prediction, which were used to estimate the latency. For example, the median number of predictions between the first incorrect prediction and the anomaly (threshold crossed) was 39.5. This meant that about 3950 sensor and payload data were used in total before the error was confirmed. At a rate of 10 samples per milliseconds, 395 milliseconds of sensor data passed until detection. When taking account of all timing information, the combined setup was fast enough to classify and compare windows of data from two data streams.
- a general purpose computing device in the form of a handheld tablet that is wirelessly connected to a network, for example, the Internet is utilized.
- Devices such as handheld tablets generally comprise many different types of sensors.
- One type of sensor that is commonly contained within a handheld tablet is a gyroscope that senses orientation.
- Yet another type of sensor is embedded within the touchscreen that produces pressure readings when the touchscreen is interacted with by the user.
- Such sensors are known to be useful for a variety of uses, one of which is demographic classification of the user. For example, using machine learning algorithms, a tablet user's interactions with the touchscreen and resulting pressure sensor output can be used to predict certain demographic characteristics of the user.
- monitoring and analyzing Internet packets received by, and sent from, the tablet device can additionally yield certain information about the user, including, by way of example, web sites being interacted with, and the like.
- a machine learning algorithm can classify or predict certain characteristics of the user based on characteristics of the Internet packets being received by and transmitted from the handheld tablet device when it is being manipulated by a user.
- analysis of such web packets can enable a machine learning algorithm to classify if the tablet has malicious software, e.g., “malware,” installed or not.
- the subject matter disclosed herein can utilize the two machine learning algorithms; the first algorithm processing sensor data and the second algorithm processing Internet packet characteristics to enhance the overall predictability and reliability of the prediction or classification task.
- the prediction or classification task in this example, could be to enhance the prediction or classification of certain user demographics, identify if the user is utilizing a tablet while it is infected with malware, or identify if the user is installing and executing malware.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Testing And Monitoring For Control Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Described herein are methods, systems, and platforms comprising neural networks for control system anomaly detection.
Description
- This application claims priority to U.S. Provisional Application No. 63/211,281, filed Jun. 16, 2021, and U.S. Provisional Application No. 63/275,759, filed Nov. 4, 2021, which are hereby incorporated by reference in their entireties.
- Control systems provide transportation, essential utilities, and manufacturing of goods to the masses. It is critical that controlled processes within these systems are executed correctly and according to schedule. Monitoring the system's performance during their operation is important for maintaining their reliability and availability.
- Many control systems, including those in industrial and manufacturing facilities, rely on computer controlled electro-mechanical frameworks (e.g., industrial control system (ICS)). These frameworks coordinate industrial operations between protocols, connections, and devices in the control system, so they can be executed properly and on schedule. Further, these various components are generally interoperable due to the emergence of standardized computer interfaces and networking protocols that support control system implementations. Thus, it may be the case that the control system demonstrates state-like behavior that characterizes its overall functionality. However, often times the protocols employed may be open communication protocols or the software may be open source, which may increase the risk of cyberattacks. Further, the components in the control system may be supplied by various vendors and often have large state spaces with high complexity (e.g., cycling states), thus making it impractical to capture the complete behavior of the control system. Thus, critical processes may be disrupted, resulting in damage to essential utilities, without detection.
- Detecting anomalies in a control system may help to increase its safety, reliability, and resilience. An automated anomaly detection may be achieved using machine learning/artificial intelligence (ML) algorithms, such as neural networks, that analyze the overall health of the components in a control system and predict the states of the components. The ML algorithms may consider patterns in data related to network traffic as well data from equipment (e.g., sensor data), and the states may be classified taking into account random deviations that may occur. If the control system is functioning properly, the state classified should match or be reasonably similar (i.e., consensus from the two models is achieved). However, when faulty equipment or processing errors cause unexpected behavior in the system, the classification may diverge, causing loss of consensus. Because the system diverges from normal behavior, this classification can also be described as anomaly detection.
- In one aspect, disclosed herein are computer-implemented methods for control system anomaly detection comprising: receiving input data comprising: sensor data from equipment in the control system; and network data from a network in communication with the control system; normalizing distributions of the sensor data and the network data; checking time alignment between the sensor data to the network data; selecting a time window for accumulating the sensor data and the network data; feeding the sensor data into a first neural network comprising a behavior classifier of the equipment of the control system to output a first classified state of the control system; feeding the network data into a second neural network comprising a network traffic classifier to output a second classified state of the control system; and comparing the first and the second classified states for consensus for system anomaly detection, wherein accumulation of differences in classified states in a given time interval above a threshold indicates occurrence of an anomaly. In some embodiments, the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded control system, or a combination thereof. In some embodiments, the control system comprises a general purpose computer. In some embodiments, the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors. In some embodiments, the control system employs one or more network communication protocols. In further embodiments, the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof. In still further embodiments, the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT). In some embodiments, the sensor data comprises time series data. In some embodiments, the sensor data is obtained from a standalone sensor or an integrated sensor. In further embodiments, the integrated sensor is part of a control device comprising an actuator. In some embodiments, the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or any combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, normalizing distributions of the sensor data and the network data comprises adjusting the distributions' mean, variance, higher-ordered moments, or a combination thereof. In some embodiments, the method comprises resampling the sensor data, the network data, or a combination thereof for the time alignment between the sensor data and network data. In further embodiments, the resampling results in the sensor data and the network data having a same number of samples. In further embodiments, the resampling comprises downsampling. In further embodiments, the resampling comprises upsampling. In further embodiments, the resampling comprises unsampling. In some embodiments, the method comprises windowing to adjust the time window for accumulating the sensor data, the network data, or a combination thereof. In further embodiments, the windowing accounts for delays in the network data, the sensor data, or a combination thereof. In some embodiments, one or both of the first neural network and the second neural network are deep neural networks. In further embodiments, the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks. In still further embodiments, the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers. In still further embodiments, the convolutional layers comprise 1D, 2D, or 3D convolutional layers. In still further embodiments, the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof. In still further embodiments, the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system. In still further embodiments, the convolutional neural networks are supervised for training to identify one or both of the first classified state and the second classified state. In some embodiments, the comparing the first and the second classified states for consensus for system anomaly detection is unsupervised for detecting the differences between the first and the second classified states. In some embodiments, the threshold is an average discrepancy rate between the first and the second classified state. In further embodiments, the threshold is dynamically changed over time. In some embodiments, the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- In another aspect, disclosed herein are computer-implemented systems for control system anomaly detection comprising: at least one logic element configured to perform operations on sensor data from equipment in the control system and network data from a network in the control system the operations comprising: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; a first neural network comprising a behavior classifier of the equipment of the control system for outputting a first classified state of the control system from the sensor data; a second neural network comprising a network traffic classifier for outputting a second classified state of the control system from the network data; and a discrepancy aggregator for comparing the first and the second classified state for consensus for control system anomaly detection, wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly. In some embodiments, the computer-implemented system comprises at least one processor, a memory, and instructions executable by at least one processor. In some embodiments, the computer-implemented system comprises a general purpose computer. In some embodiments, the at least one logic element comprises a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). In some embodiments, the at least one logic element is implemented on a general purpose computer. In some embodiments, the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof. In further embodiments, the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors. In some embodiments, the control system employs one or more network communication protocols. In further embodiments, the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof. In still further embodiments, the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT). In some embodiments, the sensor data comprises time series data. In some embodiments, the sensor data is obtained from a standalone sensor or an integrated sensor. In further embodiments, the integrated sensor is part of a control device comprising an actuator. In some embodiments, the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or a combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, the normalization operation comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof. In some embodiments, the at least one logic element is configured to perform a resampling operation of the sensor data, the network data, or a combination thereof for the time alignment between the network data and the sensor data. In further embodiments, the resampling operation results in the sensor data and the network data having a same number of samples. In further embodiments, the resampling operation comprises downsampling. In further embodiments, the resampling operation comprises upsampling. In further embodiments, the resampling operation comprises unsampling. In some embodiments, the at least one logic element is configured to perform a windowing operation to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof. In further embodiments, the windowing operation accounts for delays in the network data, sensors data, or a combination thereof. In some embodiments, one or both of the first neural network and the second neural network are deep neural networks. In further embodiments, the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks. In still further embodiments, the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers. In still further embodiments, the convolutional layers comprise 1D, 2D, or 3D convolutional layers. In still further embodiments, the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof. In further embodiments, the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system. In further embodiments, the convolutional neural networks are supervised for training to identify the classified states. In some embodiments, the threshold is an average discrepancy rate between the first and the second classified state. In further embodiments, the threshold is dynamically changed over time. In some embodiments, the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- In another aspect, disclosed herein are platforms for control system anomaly detection comprising: an apparatus comprising at least one logic element for performing operations on sensor data from equipment in the control system and network data from a network in communication with the control system; and a discrepancy aggregator for control system anomaly detection; and a cloud computing resource communicably coupled to the apparatus and comprising a first neural network and a second neural network; wherein the operations comprise: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; wherein the first neural network comprises a behavior classifier of the equipment of the control system outputting a first classified state of the control system from the sensor data from the operations; wherein the second neural network comprises a network traffic classifier outputting a second classified state of the control system from the network data from the operations; wherein the discrepancy aggregator compares the first and the second classified state for consensus for control system anomaly detection; and wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly. In some embodiments, the apparatus comprising at least one logic element comprises at least one processor, a memory, and instructions executable by at least one processor. In some embodiments, the at least one logic element comprises a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). In some embodiments, the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof. In some embodiments, the industrial control system comprises one or more of programmable logic controllers, remote terminal units, intelligent electronic devices, engineering workstations, human machine interfaces, data historians, communication gateways, and front-end processors. In some embodiments, the control system employs one or more network communication protocols. In further embodiments, the one or more network communication protocols comprise standard network communication protocols, non-standard network communication protocols, or a combination thereof. In still further embodiments, the standard network communication protocols comprise process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT). In some embodiments, the sensor data comprises time series data. In some embodiments, the sensor data is obtained from a standalone sensor or an integrated sensor. In further embodiments, the integrated sensor is part of a control device comprising an actuator. In some embodiments, the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or a combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, the normalization operation comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof. In some embodiments, the operations comprise a resampling operation of the sensor data, the network data, or a combination thereof for the time alignment between the network data and the sensor data. In further embodiments, the resampling operation results in the sensor data and the network data having a same number of samples. In further embodiments, the resampling operation comprises downsampling. In further embodiments, the resampling operation comprises upsampling. In further embodiments, the resampling operation comprises unsampling. In some embodiments, the operations comprise a windowing operation to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof. In further embodiments, the windowing operation accounts for delays in the network data, sensors data, or a combination thereof. In some embodiments, one or both of the first neural network and the second neural network are deep neural networks. In further embodiments, the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks. In still further embodiments, the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers. In still further embodiments, the convolutional layers comprise 1D, 2D, or 3D convolutional layers. In still further embodiments, the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof. In still further embodiments, the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system. In still further embodiments, the convolutional neural network is supervised for training to identify the first and the second classified states. In some embodiments, the threshold is an average discrepancy rate between the first and the second classified state. In further embodiments, the threshold is dynamically changed over time. In some embodiments, the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
- In another aspect, disclosed herein are computer-implemented methods of training neural networks for control system anomaly detection comprising: collecting input data comprising sensor data from equipment in the control system and network data from a network in communication with the control system; preprocessing the sensor data and the network data to output preprocessed sensor data and preprocessed network data, the preprocessing comprising: normalizing to adjust distributions of the sensor data and the network data; checking the sensor data and the network data for time alignment; and selecting a time window for accumulating the sensor data and the network data; creating training sets comprising a first training set comprising the preprocessed sensor data and a second training set comprising the preprocessed network data; and training a first neural network comprising a behavior classifier of the equipment of the control system with the first training set to output a first classified state; and training a second neural network comprising a network traffic classifier with the second training set to output a second classified state. In various embodiments, the method is implemented on a general purpose computer, a server, a cluster of servers, a distributed computing platform, or a cloud computing platform. In some embodiments, the network data comprises packet data, metadata, or a combination thereof. In further embodiments, the packet data comprises a packet's header, payload, trailer, or a combination thereof. In still further embodiments, the packet data from the packet's payload comprises bit streams. In some embodiments, normalizing comprises adjusting the distribution's mean, variance, higher-ordered moments, or a combination thereof. In some embodiments, the preprocessing comprises resampling for the time alignment of the sensor data, the network data, or a combination thereof. In further embodiments, the resampling results in the sensor data and the network data having a same number of samples. In further embodiments, resampling comprises downsampling, upsampling, or unsampling. In some embodiments, the preprocessing comprises windowing to adjust the time windows for accumulating the sensor data, the network data, or a combination thereof. In further embodiments, the windowing accounts for delays in the network data, the sensor data, or a combination thereof. In some embodiments, one or both of the first neural network and the second neural network are deep neural networks. In further embodiments, the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks. In still further embodiments, the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers. In still further embodiments, the convolutional layers comprise 1D, 2D, or 3D. In still further embodiments, the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof. In still further embodiments, the convolutional neural networks have hyperparameters empirically chosen based on patterns in the network of the control system.
- A better understanding of the features and advantages of the present subject matter will be obtained by reference to the following detailed description that sets forth illustrative embodiments and the accompanying drawings of which:
-
FIG. 1 shows a non-limiting example of a computing device; in this case, a device with one or more processors, memory, storage, and a network interface; -
FIG. 2 shows a non-limiting example of a block diagram of a generic ICS feedback loop; -
FIG. 3 shows a non-limiting example of a multi-view classification system for a control system; in this case, for an ICS; -
FIG. 4 shows a non-limiting example of an architecture for an ICS testbed; in this case, for a MITM attack; -
FIG. 5 shows a non-limiting example of a dual-CNN architecture; -
FIGS. 6A-6D show raw data obtained from a trial during an MITM attack; -
FIGS. 7A-7C show confusion matrices for the raw sensor and packet data; -
FIGS. 8A-8C show classifier outputs tracking the difference between classified states; -
FIG. 9 shows precision-recall curve (PRC) of the classifier performances; and -
FIG. 10 shows a distribution of total prediction errors before anomaly detection. - Described herein, in certain embodiments, are computer-implemented methods for control system anomaly detection comprising: receiving input data comprising: sensor data from equipment in the control system; and network data from a network in communication with the control system; normalizing distributions of the sensor data and the network data; checking time alignment between the sensor data to the network data; selecting a time window for accumulating the sensor data and the network data; feeding the sensor data into a first neural network comprising a behavior classifier of the equipment of the control system to output a first classified state of the control system; feeding the network data into a second neural network comprising a network traffic classifier to output a second classified state of the control system; and comparing the first and the second classified states for consensus for system anomaly detection, wherein accumulation of differences in classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- Also described herein, in certain embodiments, are computer-implemented systems for control system anomaly detection comprising: at least one logic element configured to perform operations on sensor data from equipment in the control system and network data from a network in the control system the operations comprising: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; a first neural network comprising a behavior classifier of the equipment of the control system for outputting a first classified state of the control system from the sensor data; a second neural network comprising a network traffic classifier for outputting a second classified state of the control system from the network data; and a discrepancy aggregator for comparing the first and the second classified state for consensus for control system anomaly detection, wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- Also described herein, in certain embodiments, are platforms for control system anomaly detection comprising: an apparatus comprising at least one logic element for performing operations on sensor data from equipment in the control system and network data from a network in communication with the control system; and a discrepancy aggregator for control system anomaly detection; and a cloud computing resource communicably coupled to the apparatus and comprising a first neural network and a second neural network; wherein the operations comprise: a normalization operation to normalize distributions of the sensor data and the network data; a checking operation to check time alignment between the sensor data and the network data; and a selection operation to select a time window for accumulating the sensor data and the network data; wherein the first neural network comprises a behavior classifier of the equipment of the control system outputting a first classified state of the control system from the sensor data from the operations; wherein the second neural network comprises a network traffic classifier outputting a second classified state of the control system from the network data from the operations; wherein the discrepancy aggregator compares the first and the second classified state for consensus for control system anomaly detection; and wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
- Also described herein, in certain embodiments, are computer-implemented methods of training neural networks for control system anomaly detection comprising: collecting input data comprising sensor data from equipment in the control system and network data from a network in communication with the control system; preprocessing the sensor data and the network data to output preprocessed sensor data and preprocessed network data, the preprocessing comprising: normalizing to adjust distributions of the sensor data and the network data; checking the sensor data and the network data for time alignment; and selecting a time window for accumulating the sensor data and the network data; creating training sets comprising a first training set comprising the preprocessed sensor data and a second training set comprising the preprocessed network data; and training a first neural network comprising a behavior classifier of the equipment of the control system with the first training set to output a first classified state; and training a second neural network comprising a network traffic classifier with the second training set to output a second classified state.
- Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present subject matter belongs.
- As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
- Reference throughout this specification to “some embodiments,” “further embodiments,” or “a particular embodiment,” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in some embodiments,” or “in further embodiments,” or “in a particular embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
- As used herein, the term “control system” may generally refer to a framework to coordinate operations between components, such as protocols, connections, and devices, in a system. In some embodiments, the operations may be executed with one or more logic elements. In various embodiments, the control system comprises an industrial control system (ICS), distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof. In some embodiments, the control system comprises a general purpose computing system with one or more network connections such as the Internet, Bluetooth, and the like, wherein the general purpose computing system is controlled by user input or by application programs running on the general purpose computing system. In further embodiments, the general purpose computing system comprises an edge device such as a desktop or a notebook, tablet, smartphone, or other portable computing device. In further embodiments, the general purpose computing system comprises a server or server cluster interconnected to a combination of local components and remote components via one or more network connections.
- As used herein, the term “neural network” may generally refer to a computational network composed of nodes. The nodes of the neural network may be connected as layers or graphs. In some embodiments, the neural network comprises an algorithm designed for solving a specific problem. In some embodiments, the neural network may comprise a generalizable algorithm to solve a range of problems. In some embodiments, the neural network may “learn” how to solve one or more problems.
- As used herein, the term “classified state” or “classified states” may generally refer to a state(s) of a component(s) in a control system or in communication with a control system. The state may be determined based on values, ranges, or patterns detected in physical measurements of components in the control system or in communication with the control system. In some embodiments, the states may be determined by a ML algorithm for classification or clustering. In some cases, the ML algorithm may be a neural network.
- As used herein, the term “discrepancy aggregator” may generally refer to a computational framework comprising at least one logic element for comparing classified states of components of a control system. In some embodiments, the discrepancy aggregator may accumulate errors (or difference) between classified states for a given time period if the classified states of the components in the control system lack consensus. In some embodiments, the accumulation of errors may be compared to a threshold. If the accumulation of errors is greater than the threshold, an anomaly may be identified in the control system.
- As used herein, the term “anomaly” or “anomalies” may generally refer to abnormal behavior in one or more components in a control system or in communication with the control system. Abnormal behavior may comprise of irregular values, ranges, or patterns detected in physical measurements of components in the control system or in communication with the control system. In some embodiments, the anomaly may comprise of faulty components due to wearing of components over time or due to an accident. In some embodiments, the anomaly may be indicative of a cyberattack.
- Referring to
FIG. 1 , a block diagram is shown depicting an exemplary machine that includes a computer system 100 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies for static code scheduling of the present disclosure. The components inFIG. 1 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments. -
Computer system 100 may include one ormore processors 101, amemory 103, and astorage 108 that communicate with each other, and with other components, via abus 140. Thebus 140 may also link adisplay 132, one or more input devices 133 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one ormore output devices 134, one ormore storage devices 135, and varioustangible storage media 136. All of these elements may interface directly or via one or more interfaces or adaptors to thebus 140. For instance, the varioustangible storage media 136 can interface with thebus 140 viastorage medium interface 126.Computer system 100 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers. -
Computer system 100 includes one or more processor(s) 101 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions. Processor(s) 101 optionally contains acache memory unit 102 for temporary local storage of instructions, data, or computer addresses. Processor(s) 101 are configured to assist in execution of computer readable instructions.Computer system 100 may provide functionality for the components depicted inFIG. 1 as a result of the processor(s) 101 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such asmemory 103,storage 108,storage devices 135, and/orstorage medium 136. The computer-readable media may store software that implements particular embodiments, and processor(s) 101 may execute the software.Memory 103 may read the software from one or more other computer-readable media (such as mass storage device(s) 135, 136) or from one or more other sources through a suitable interface, such asnetwork interface 120. The software may cause processor(s) 101 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored inmemory 103 and modifying the data structures as directed by the software. - The
memory 103 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 104) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 105), and any combinations thereof.ROM 105 may act to communicate data and instructions unidirectionally to processor(s) 101, andRAM 104 may act to communicate data and instructions bidirectionally with processor(s) 101.ROM 105 andRAM 104 may include any suitable tangible computer-readable media described below. In one example, a basic input/output system 106 (BIOS), including basic routines that help to transfer information between elements withincomputer system 100, such as during start-up, may be stored in thememory 103. -
Fixed storage 108 is connected bidirectionally to processor(s) 101, optionally throughstorage control unit 107.Fixed storage 108 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein.Storage 108 may be used to storeoperating system 109, executable(s) 110,data 111, applications 112 (application programs), and the like.Storage 108 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above. Information instorage 108 may, in appropriate cases, be incorporated as virtual memory inmemory 103. - In one example, storage device(s) 135 may be removably interfaced with computer system 100 (e.g., via an external port connector (not shown)) via a
storage device interface 125. Particularly, storage device(s) 135 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for thecomputer system 100. In one example, software may reside, completely or partially, within a machine-readable medium on storage device(s) 135. In another example, software may reside, completely or partially, within processor(s) 101. -
Bus 140 connects a wide variety of subsystems. Herein, reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate.Bus 140 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures. As an example and not by way of limitation, such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, an Accelerated Graphics Port (AGP) bus, HyperTransport (HTX) bus, serial advanced technology attachment (SATA) bus, and any combinations thereof. -
Computer system 100 may also include aninput device 133. In one example, a user ofcomputer system 100 may enter commands and/or other information intocomputer system 100 via input device(s) 133. Examples of an input device(s) 133 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touchpad, a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof. In some embodiments, the input device is a Kinect, Leap Motion, or the like. Input device(s) 133 may be interfaced tobus 140 via any of a variety of input interfaces 123 (e.g., input interface 123) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above. - In particular embodiments, when
computer system 100 is connected to network 130,computer system 100 may communicate with other devices, specifically mobile devices and enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like, connected tonetwork 130. Communications to and fromcomputer system 100 may be sent throughnetwork interface 120. For example,network interface 120 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) fromnetwork 130, andcomputer system 100 may store the incoming communications inmemory 103 for processing.Computer system 100 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets inmemory 103 and communicated to network 130 fromnetwork interface 120. Processor(s) 101 may access these communication packets stored inmemory 103 for processing. - Examples of the
network interface 120 include, but are not limited to, a network interface card, a modem, and any combination thereof. Examples of anetwork 130 ornetwork segment 130 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof. A network, such asnetwork 130, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used. - Information and data can be displayed through a
display 132. Examples of adisplay 132 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic liquid crystal display (OLED) such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof. Thedisplay 132 can interface to the processor(s) 101,memory 103, and fixedstorage 108, as well as other devices, such as input device(s) 133, via thebus 140. Thedisplay 132 is linked to thebus 140 via avideo interface 122, and transport of data between thedisplay 132 and thebus 140 can be controlled via thegraphics control 121. In some embodiments, the display is a video projector. - In addition to a
display 132,computer system 100 may include one or more otherperipheral output devices 134 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof. Such peripheral output devices may be connected to thebus 140 via anoutput interface 124. Examples of anoutput interface 124 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof. - In addition or as an alternative,
computer system 100 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein. Reference to software in this disclosure may encompass logic, and reference to logic may encompass software. Moreover, reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware, software, or both. - Those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality.
- The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by one or more processor(s), or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
- In accordance with the description herein, suitable computing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, personal digital assistants, and vehicles. Those of skill in the art will also recognize that select televisions, video players, and digital music players with optional computer network connectivity are suitable for use in the system described herein. Suitable tablet computers, in various embodiments, include those with booklet, slate, and convertible configurations, known to those of skill in the art.
- In some embodiments, the computing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing. Those of skill in the art will also recognize that suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®. Those of skill in the art will also recognize that suitable media streaming device operating systems include, by way of non-limiting examples, Apple TV®, Roku®, Boxee®, Google TV®, Google Chromecast®, Amazon Fire®, and Samsung® HomeSync®.
- In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device. In further embodiments, a computer readable storage medium is a tangible component of a computing device. In still further embodiments, a computer readable storage medium is optionally removable from a computing device. In some embodiments, a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like. In some cases, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
- In some embodiments, the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device's CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
- The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In some embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
- In some embodiments, a computer program includes a web application. In light of the disclosure provided herein, those of skill in the art will recognize that a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems. In some embodiments, a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR). In some embodiments, a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, XML, and document oriented database systems. In further embodiments, suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, mySQL™, and Oracle®. Those of skill in the art will also recognize that a web application, in various embodiments, is written in one or more versions of one or more languages. A web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof. In some embodiments, a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or eXtensible Markup Language (XML). In some embodiments, a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS). In some embodiments, a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®. In some embodiments, a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy. In some embodiments, a web application is written to some extent in a database query language such as Structured Query Language (SQL). In some embodiments, a web application integrates enterprise server products such as IBM® Lotus Domino®. In some embodiments, a web application includes a media player element. In various further embodiments, a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
- In some embodiments, a computer program includes a mobile application provided to a mobile computing device. In some embodiments, the mobile application is provided to a mobile computing device at the time it is manufactured. In other embodiments, the mobile application is provided to a mobile computing device via the computer network described herein.
- In view of the disclosure provided herein, a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
- Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and PhoneGap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
- Those of skill in the art will recognize that several commercial forums are available for distribution of mobile applications including, by way of non-limiting examples, Apple® App Store, Google® Play, Chrome WebStore, BlackBerry® App World, App Store for Palm devices, App Catalog for webOS, Windows® Marketplace for Mobile, Ovi Store for Nokia® devices, Samsung® Apps, and Nintendo® DSi Shop.
- In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable complied applications.
- In some embodiments, the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various embodiments, a software module comprises a file, a section of code, a programming object, a programming structure, a distributed computing resource, a cloud computing resource, or combinations thereof. In further various embodiments, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, a plurality of distributed computing resources, a plurality of cloud computing resources, or combinations thereof. In various embodiments, the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, a standalone application, and a distributed or cloud computing application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
- In some embodiments, the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of control system information. In various embodiments, suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document oriented databases, and graph databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB. In some embodiments, a database is Internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In a particular embodiment, a database is a distributed database. In other embodiments, a database is based on one or more local computer storage devices.
- A control system may comprise a framework to coordinate operations between protocols, connections, and devices, so they may be executed properly and on schedule. In some embodiments, the operations may be executed with one or more logic elements comprising a programmable logic controller (PLC), programmable logic array (PLA), programmable array logic (PAL), generic logic array (GLA), complex programmable logic decide (CPLD), field programmable gate array (FPGA), or application-specific integrated circuit (ASIC). The control system may comprise one or more network communication protocols that may be standard network communication protocols, non-standard network communication protocols, or a combination thereof. In some embodiments, the standard network communication protocols are process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), or ethernet for control automation technology (EtherCAT). In some embodiments, the control system may be an industrial control system (ICS), distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded system, or a combination thereof.
- In some embodiments, where the control systems may include industrial and manufacturing facilities (i.e., an ICS), the control system may support production and processing objectives on a mass-scale. An ICS may comprise one or more of PLCs, remote terminal units, intelligent electronic devices, engineering workstations, HMI, data historians, communication gateways, and front-end processors. In some embodiments, an ICS may have different controllable states as steps of a process, and may use an open communication protocol (e.g., Modbus for ICS networks). Further, the open communication protocol, such as Modbus, may not be encrypted at any point during the communication, thus increasing the likelihood of an attack. For example, A generic ICS feedback control loop is exemplary illustrated in
FIG. 1 . - An ICS feedback loop may generally comprise a human-machine interface (HMI) 205. The
HMI 205 may be a user interface (e.g., GUI) that connects a person to one or more components (e.g., equipment, network, etc.) in the ICS. TheHMI 205 may send a query to a programmable logic controller (PLC) 210 regarding the state or function of components in the ICS, and thePLC 210 may send a response back to theHMI 205, which may be displayed on the user interface. In some embodiments, thePLC 210 may send status information regarding components of the ICS to theHMI 205. In some embodiments, thePLC 210 may implement control strategies using a system comprising a microprocessor for managing components in the ICS. - In some cases, the components may be a
physical device 215, such as equipment in the ICS. In some cases, the equipment may be on-site or remote. In some examples, thePLC 210 may control aphysical device 215 or a plurality thereof, such as control motors, valves, switches, etc. In some examples, thePLC 210 may control aphysical device 215 based on measurements obtained fromsensors 220, which may determine when and how thephysical device 215 should operate. In some cases, the measurements may be physical measurements obtained fromsensors 220, such as pressure, volume, temperature, humidity, torque, vacuum, motion, etc. In some cases, thesensor 220 may be a standalone sensor or an integrated sensor. In some examples, the integrated sensor may be part of a control device comprising an actuator. In further embodiments, thePLC 210 may receive commands for thephysical device 215 to perform functions (e.g., pump actuation, stirrer operation, conveyor belt operation, etc.) from theHMI 205. - Safety, reliability, and resilience to cyberattacks may be key attributes for the successful operation of an ICS. These attributes may be threatened due to an increase in attack surfaces due to IOT devices, difficulties in performing patch updates to components in the ICS from downtime and vendor varieties, or an accumulation of small errors over time that may result in larger failures. An anomaly detection system for recognizing threats, such as those described herein, may increase the likelihood of the successful operation of an ICS. In some embodiments, data obtained related to the ICS may be used for anomaly recognition. In some cases, the data may be obtained from one or more sources, such as components of the ICS or communicably coupled to the ICS (e.g., data from a network, such as Modbus commands, sensor data, etc.). In some cases, the data from one or more sources may be analyzed and compared to previous data for anomalies. For example, a pressure sensor may have a normal operating range, and a pressure value outside that range may be flagged as an anomaly. In a further example, network traffic patterns may be analyzed, and an unusually high or low traffic pattern may be flagged as an anomaly. In some cases, the data from multiple sources may be analyzed and compared to one another for anomalies. In some examples, having two different models (i.e., ML algorithms) predict the state of multiple sources may help identify miscommunication errors and the occurrence of an anomaly. For example, sensor data and network traffic patterns may be analyzed and compared to one another to better assess when an anomaly has occurred. The anomaly detection as described herein, by way of non-limiting example, for an ICS, may be performed with a classification system employing ML techniques. In some embodiments, the classification system may employ neural networks.
- An architecture comprising neural networks may be used for predicting the states of components for a control system. The states may be predicted by the neural networks using classification of behavioral patterns of components in the control system (e.g., ‘FAST’, ‘SLOW’, ‘ON’, ‘OFF’, etc.). The classification may be compared to past classifications or may be compared to other components in the control system for multi-view classification in order to identify the occurrence of an anomaly.
- An example of a multi-view classification system for a control system, in this case, by way of non-limiting example, for an ICS, is illustrated in
FIG. 3 . First,raw data 305 may be obtained from components in the control system or in communication with the control system.Raw data 305 may comprise of one or more inputs from components as described herein. In some embodiments, theraw data 305 may comprise sensor data from equipment in the control system (e.g., accelerometer or gyrometer in an ICS). In alternative embodiments, the sensor data may be obtained from equipment in an embedded system (e.g., glucose sensors in an insulin pump, sensors in a pacemaker, etc.). In some embodiments, theraw data 305 may comprise network data from a network in communication with the control system. The network data may comprise packet data, metadata, or a combination thereof. In some cases, the packet data may comprise a packet's header, payload, trailer, or any combination thereof. In some further cases, the packet data from the packet's payload may comprise bit streams. In some embodiments, the network data may comprise of interarrival times, which may be referred to as packet time deltas or the first difference. In such embodiments, each packet may contain a timestamp for when it arrives to the ICS or a component of the ICS, and taking the difference between two adjacent timestamps may yield the amount of time between each packet arrival. The interarrival times (or time between packet arrivals) may change (e.g., increase or decrease) during a change in the state of a control system, which may then return to a baseline interarrival time. Thus, in such embodiments, interarrival times may be used for detecting anomalous state changes. - Preprocessing may be performed on the
raw data 305 using at least one logic element, as described herein. In some embodiments, the multi-view classification, as exemplary illustrated inFIG. 3 , may preprocess data fortime period 310, in which the time period of theraw data 305 may be adjusted. In some cases, preprocessing may comprise of normalizing distributions of one or more inputs of the raw rate 305 (e.g., the sensor data and the network data). In some examples, a normalizing operation may adjust a distributions' mean, variance, higher-ordered moments, or a combination thereof. In some cases, preprocessing may comprise of checking time alignment between one or more inputs of the raw data 305 (e.g., the sensor data to the network data). In some examples, the checking operation may resample any one of the inputs of the raw data 305 (e.g., as i.e., the sensor data, the network data, or any combination thereof) for the time alignment between them. The resampling may result in the inputs of theraw data 305 having a same number of samples. In some examples, the resampling comprises downsampling. In some examples, the resampling comprises upsampling. In some examples, the resampling comprises unsampling. In some cases, preprocessing may comprise of selecting a time window for accumulating the one or more inputs of the raw data 305 (e.g., sensor data and the network data). In some examples, this selection operation may comprise of windowing to adjust the time window for accumulating any one of the inputs of theraw data 305. In some examples, the windowing accounts for delays in any one of the inputs of theraw data 305. In some embodiments, using a smaller time window may allow control of false positive to false negative ratio of the classification, which can be optimized based on the costs of a misclassification. In some embodiments, the size of the time window may be empirically chosen from observing the patterns of theraw data 305. - The data from preprocessing operations, as described herein, may be fed into one or more ML algorithms for identifying single or multi-stage attacks, or detecting anomalies in a control system. In some examples, these attacks or anomalies may be detected by analyzing packet streams and content from a network. In some examples, the network may use one or more communication protocol (e.g., the Modbus protocol). In some examples, these attack or anomalies may be detected from time series data of sensors. In some embodiments, the one or more ML algorithms may be supervised, semi-supervised, or unsupervised for training to identify anomalies. In some embodiments, the one or more ML algorithms may perform classification or clustering to identify anomalies or attacks. In some embodiments, the one or more ML algorithms may comprise classical ML algorithms for performing clustering to identify outliers. Classical ML algorithms may comprise of algorithms that learn from existing observations (i.e., known features) to predict outputs. In some cases, the classical ML algorithms for performing clustering may be K-means clustering, mean-shift clustering, density-based spatial clustering of applications with noise (DBSCAN), expectation-maximization (EM) clustering (e.g., using Gaussian mixture models (GMM)), agglomerative hierarchical clustering, or a combination thereof. In some embodiments, the one or more ML algorithms may comprise classical ML algorithms for classification. In some cases, the classical ML algorithms may comprise logistic regression, naïve Bayes, K-nearest neighbors, random forests or decision trees, gradient boosting, support vector machines (SVMs), or a combination thereof. In some embodiments, the one or more ML algorithm may employ deep learning. A deep learning algorithm may comprise of an algorithm that learns by extracting new features to predict outputs. The deep learning algorithm may comprise of layers, which may comprise a neural network.
- Neural networks may comprise of connected nodes in a network, which may perform functions, such as transforming or translating input data. In some examples, the output from a given node may be passed on as input to another node. In some embodiments, the nodes in the network may comprise of input units, hidden units, output units, or a combination thereof. In some cases, an input node may be connected to one or more hidden units. In some cases, one or more hidden units may be connected to an output unit. The nodes may take in input and may generate an output based on an activation function. In some embodiments, the input or output may be a tensor, a matrix, a vector, an array, or a scalar. In some embodiments, the activation function may be a Rectified Linear Unit (ReLU) activation function, a sigmoid activation function, or a hyperbolic tangent activation function. In some embodiments, the activation function may be a Softmax activation function. The connections between nodes may further comprise of weights for adjusting input data to a given node (i.e., to activate input data or deactivate input data). In some embodiments, the weights may be learned by the neural network. In some embodiments, the neural network may be trained using gradient-based optimizations. In some cases, the gradient-based optimization may comprise of one or more loss functions. In some examples, the gradient-based optimization may be conjugate gradient descent, stochastic gradient descent, or a variation thereof (e.g., adaptive moment estimation (Adam)). In further examples, the gradient in the gradient-based optimization may be computed using backpropagation. In some embodiments, the nodes may be organized into graphs to generate a network (e.g., graph neural networks). In some embodiments, the nodes may be organized into one or more layers to generate a network (e.g., feed forward neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), etc.). In some cases, the neural network may be a deep neural network comprising of more than one layer.
- In some cases, the neural network may comprise one or more recurrent layer. In some examples, the one or more recurrent layer may be one or more long short-term memory (LSTM) layers or gated recurrent unit (GRU), which may perform sequential data classification and clustering. Thus, future predictions may be made by the one or more recurrent layers according to the sequence of past events since data ordering is considered. Further, the recurrent layer may retain or “remember” important information, while selectively “forgetting” what is not essential in the classification model. In some embodiments, the neural network may comprise one or more convolutional layers. The input and output may be a tensor representing of variables or attributes in a data set (i.e., features), which may be referred to as a feature map (or activation map). Thus, the one or more convolutional layers may be referred to as a feature extraction phase. In some cases, the convolutions may be one dimensional (1D) convolutions, two dimensional (2D) convolutions, three dimensional (3D) convolutions, or any combination thereof. In further cases, the convolutions may be 1D transpose convolutions, 2D transpose convolutions, 3D transpose convolutions, or any combination thereof. In some examples, one-dimensional convolutional layers may be suited for time series sensor data analysis since it may classify time series through parallel convolutions. In some examples, convolutional layers may be used for analyzing raw data in the payload of a network packet. Further, the convolutional layers may be efficient for detecting properties in payload bit patterns of a control system since they may follow a recognizable pattern (e.g., payload bit patterns in an ICS follow recognizable ICS command patterns).
- The layers in a neural network may further comprise one or more pooling layers before or after a convolutional layer. The one or more pooling layers may reduce the dimensionality of the feature map using filters that summarize regions of the matrix. This may down sample the number of outputs, and thus reduce the parameters and computational resources needed for the neural network. In some embodiments, the one or more pooling layers may be max pooling, min pooling, average pooling, global pooling, norm pooling, or a combination thereof. Max pooling may reduce the dimensionality of the data by taking only the maximums values in the region of the matrix, which helps capture the significant feature. In some embodiments, the one or more pooling layers may be one dimensional (1D), two dimensional (2D), three dimensional (3D), or any combination thereof. The neural network may further comprise of one or more flattening layers, which may flatten the input to be passed on to the next layer. In some cases, the input (e.g., feature map) may be flattened by reducing it to a one-dimensional array. The flattened inputs may be used to output a classification of an object (e.g., binary classification of an image, such as cat or dog, or of a system's performance, such as normal or abnormal, or multi-class classification identifying hand-written digits, etc.). The neural networks may further comprise of one or more dropout layers. Dropout layers may be used during training of the neural network (e.g., to perform binary or multi-class classifications). The one or more dropout layers may randomly set certain weights as 0, which may set corresponding elements in the feature map as 0, so the neural network may avoid overfitting. The neural network may further comprise of one or more dense layers, which comprise a fully connected network. In the dense layer, information may be passed through the fully connected network to generate a predicted classification of an object, and the error may be calculated. In some embodiments, the error may be backpropagated to improve the prediction. The one or more dense layers may comprise of a Softmax activation function, which may convert a vector of numbers to a vector of probabilities. These probabilities may be subsequently used in classifications, such as classifications of states in a control system as described herein. In some embodiments, the classifications of states from one or more components in a control system may be compared to detect the occurrence of an anomaly.
- An architecture for anomaly detection may comprise two neural networks for dual neural network state prediction as exemplary illustrated in
FIG. 3 . The neural networks may use different sets of features for prediction, such as those obtained from network data and sensor data. Although two neural networks are employed in this example, one of skill in the art will appreciate that any one of the ML algorithms as described herein may be used which may be suited for a particular input data set and desired output. One of skill in the art will also appreciate that more than two ML algorithms may be employed in this architecture. Further, one of skill in the art will appreciate that the ML algorithms as described herein may be combined or that more than one input data may be fed into a single ML algorithm (e.g., the network data and sensor data may be fed into the same algorithm). - In the dual neural network architecture illustrated in
FIG. 3 , the network data (e.g., network payload data) may be fed into a neural network comprising anetwork traffic classifier 315. The neural network comprising thenetwork traffic classifier 315 may be trained to learn “normal” network traffic patterns and classify the network traffic patterns in a given time period by comparing it to the “normal” network traffic pattern. Thenetwork traffic classifier 315 may use the comparison to classify the state of the network traffic pattern in a given time period (e.g., “FAST”, “SLOW”, “MEDIUM”, “HALT”, “OFF”, “REVERSE”, etc.). The output from thenetwork traffic classifier 315 may comprise of a classified state, illustrated as y inFIG. 3 . In further embodiments, the neural network may be trained to classify network data that is encrypted through various methods (e.g., Electronic Code Book, Cipher-Block Chaining, Cipher FeedBack, XOR encryption, etc.). In some embodiments, the sensor data may be fed into a behavioral classifier. In some cases, the sensor data may be time series data. In the case of an ICS, the sensor data may be time series data obtained from an accelerometer, a gyrometer, or any other equipment of the ICS. Further, the behavioral classifier may comprise a motorbehavioral classifier 320. The neural network comprising the behavioral classifier (e.g., motor behavioral classifier 320) may be trained to learn “normal” sensor ranges or values for a given time period, and classify the sensor data in a given time period by comparing it to the “normal” range or values. The behavioral classifier may use the comparison to classify the state of the sensor data in a given time period (e.g., “FAST”, “SLOW”, “MEDIUM”, “HALT”, “OFF”, “REVERSE”, etc.). The output from the behavioral classifier may comprise of another classified state, illustrated as ŷ inFIG. 3 . - The classified states, y and ŷ from the neural networks comprising classifiers, may be compared to one another using a discrepancy aggregator which may comprise at least one logic element. In some embodiments, the classified states may match (i.e., y==ŷ) or be reasonably similar. In such embodiments, consensus from the two neural networks is achieved and the classification system may return to preprocess data for
time period 310 for newraw data 305. In some cases, the classified states may be logged, or data may be used for comparison against newraw data 305. In alternative embodiments, the classified states may lack consensus (i.e., y !=ŷ). The discrepancy aggregator may then accumulate errors (or difference) for the current time window (or time period) of predictions 325 (i.e., E inFIG. 3 ) between the classified states. The accumulation of errors, E, may then be compared to a threshold, T. In some embodiments, the threshold may be empirically chosen from observing the patterns of theraw data 305. In some embodiments, T as athreshold 330, may be set according to an average discrepancy rate between the classified states. In some embodiments, T as athreshold 330, may be dynamically changed over time. In some embodiments, the threshold and the time window may be inversely related (i.e., the greater the time window, the lower the threshold may be needed). If the accumulation of errors is less than the threshold (i.e., E<T), then the classification system may return to preprocess data fortime period 310 for newraw data 305. If the accumulation of errors is greater than the threshold (i.e., E>T), an anomaly is identified 335. The anomaly may comprise of faulty or abnormal behavior of components in the control system or in communication with the control system, or may be indicative of a cyberattack. - The following illustrative examples are representative of embodiments of the software applications, systems, and methods described herein and are not meant to be limiting in any way.
- An ICS test bed to detect anomalies using packet and sensor data patterns was created according to the architecture illustrated
FIG. 4 . This test bed used two streams of data under the assumption that during normal operation, the patterns of command payloads would result in specific patterns of sensor behavior. The architecture was created for a man-in-the-middle (MITM) 410 attack. AMITM 410 attack may comprise of a scenario in which an attacker may secretly relay and alter communications between two or more sources in an ICS without their knowledge. The testbed comprised aMITM 410 between an HMI/PLC 405 and aswitch 415. - The set up comprised a Tolomatic industrial motor, which received commands from the HMI/
PLC 405. The data flow inFIG. 4 was as followed; 1) the HMI/PLC 405 sent a continual stream of motor commands to the switch 415 (e.g., off, on, change of speed, etc), 2) the commands from theswitch 415 were sent to asensor controller 420 comprising an inline Raspberry Pi for logging purposes, 3) thesensor controller 420 forwarded messages to amotor 425 or asensor 430, and 4) themotor 425 andsensor 430 responded with a continual stream of data that was read from the motor or recorded by the sensor, which was also routed through the inline Raspberry Pi. Here, thesensor 430 recorded accelerometer and gyroscopic data related to themotor 425. The accelerometer and gyroscopic sensor data were stored as Comma Separated Values (CSVs) in the X, Y, and Z directions that represented the acceleration and orientation of the attached sensor separated in the three-dimensional space. This data was collected as a constant stream as the sensor controller continuously logged data from the motor at a fixed sample rate of 10 thousand samples per second. The gyrometer data as logged as floating-point values represent angular velocity as degrees per second. The accelerometer data measured the force on the motor in that direction in meters per second squared. In total, six sensor data streams are used for sensor classification. - This ICS ted bed system was constructed to communicate using Modbus packets between the HMI/
PLC 405 and themotor 425. Communication was structured such that all messages sent from the HMI/PLC 405 to themotor 425 resulted in a response message sent back to the HMI/PLC 405. Modbus packets within the system were therefore the Read/Write commands sent to themotor 425, and motor data sent back to the HMI/PLC 405. Rather than being a constant stream of data input, each payload arrived at different times from thesensor controller 420. The payload data was converted from its original byte format to binary, since network data was preprocessed from PCAP files. Each individual data payload was about 53 bytes between 0 to 255, which were converted to binary for machine learning input changing the input width from 53 bytes to 424 bits. - Raw data obtained from a trial during an MITM attack is shown in
FIGS. 6A-6D . The payload data was represented in its byte format inFIG. 6A , where each of the 53 bytes were vertically stacked and pixel color intensity represented the 0-255 value for that byte. Thus, the color changes represent how byte locations in some packet have static, cyclic, or random values. The accelerometer and gyrometer sensor data are shown inFIGS. 6B and 6C , respectively, where a state change from the random short burst of speed and forces were observed.FIG. 6D illustrates packet deltas over time, as described herein, although this data was not used to predict the ICS states in the present test case. - Here, an error was defined as a difference between the predicted state of two CNNs, which classified the states of the payload or sensor data into one of six possible ICS states: ‘FAST’, ‘HALT’, ‘MEDIUM’, ‘OFF’, ‘REVERSE’, and ‘SLOW’.
- Preprocessing steps were performed on the raw sensor (accelerometer and gyrometer) and packet data. First, normalization was performed on the accelerometer sensor data and packet data. Normalization was not necessary for the gyrometer sensor data since each axis was already centered around 0 with a constant standard deviation. For the accelerometer sensor data, the z-axis was scaled down by dividing by 16767, which was the maximum value that the hardware sensors could read. This min-max scaling was done in order to reduce the large magnitude of forces in that direction to be between 0 and 1. The absolute values of the raw values were taken in order to specifically detect the magnitude of the rotational and straight-line forces. This was done since the direction itself was oscillatory around the axis, so the magnitude was the primary source of classification information. For this reason, an absolute value was used to reduce the neural network learning needed to find the magnitude. The payload data contained constant noise from a variety of packets that ping and maintain the connection. By taking a moving average of 100 of the packet bitstreams, a constant amount of noise on the network was accounted and the classification was improved.
- Next, time alignment between the raw data was checked, so that the sensor and packet data were from the same time period. The two data sets had varying amounts of data for each period of time because the sensor data arrived in constant intervals while the packet is arrived sporadically. In order to have around the same amount of data for the time period, the sensor data was downsampled by taking every other sensor reading. This reduction of sensor data to half its readings allowed payload data to be aligned to its corresponding sensor readings in time.
- Finally, a time window was selected to accumulate the raw data. The packet payload messages sent on the network took some time to impact the ICS actuators, especially mechanical peripherals because of startup transients. This added delay between the observed state from the PCAP analysis and the observed state from the sensors. Further, the packet payload arrival time varied depending on whether an ICS state transition was occurring, which gave it a variable sampling rate. This meant each payload could not directly be correlated with a sensor output because many payloads could be correlated to only a few ICS sensor changes, and vice versa. The timing effects could be mitigated by using a larger input size. As sample input size increased, the variable sampling rate and differences among sample rates became less impactful. 100 samples of payload data and 100 samples of sensor data (ending at the same point in time) as input for each CNN model was determined to be a conservative sample size that worked for the classification, since sensor visuals seemed to show that the state change happened over less than 100 samples. By increasing the sample input size to the ratio of the number of samples it took to change states, misclassification errors were reduced to a single prediction during an ICS state change.
- The data from the ICS testbed was fed into a dual-CNN architecture according to the architecture shows in
FIG. 5 . Theinput 505 was either raw time series sensor data or bit streams from the payloads in packets over time. Training, validation, and testing splits were performed at the ratio of 70:20:10 to ensure the model can accurately detect ICS states from payloads and sensors. To create the model, the Keras package was used for design and training. The model uses a combination ofconvolutional layers 510/520,max pooling layers 515/525, aflattening layer 530, adropout layer 535, and a denseneural network layer 540. All activation functions were ReLU except for the final Softmax activation for classification in thedense layer 540. The loss function employed for training was cross entropy across the six possible ICS states. An adaptive momentum (ADAM) optimizer was employed with a learning rate of 1e-5 and was used to iteratively update the weights. The model was trained for 100 epochs and dropout (dropout layer 535) was used to help prevent overfitting. - The CNN models were first trained and tested on windows of 100 samples for both payload and sensor data streams. For the anomaly detection, the occurrence of errors (or disagreements in the states) between the two CNNs were monitored. Since the trials were about 500,000 samples each and the models predicted from 100 samples, there was be about 5000 predictions per trial. A sliding window of
size 20 was used to calculate error prediction percentage over time. In other words, every group of 20 predictions, produced an error rate.FIGS. 6A-6D shows the visualization of results of both the payload and gyrometer sensor classifiers, and the error rate per moving window of 20 predictions. The selection of a moving window error rate of 20 was used because, while random misclassification can occur, after around 20 predictions the error rate was observed to be fairly low. A threshold of 18% for the error rate is used to identify anomalies since the baseline error rate for a window ofsize 20 is around 15% for our models. - The training data was analyzed using confusion matrices for the raw data to visualize the effectiveness of the classifiers. The raw sensor from the accelerometer and gyrometer are shown in
FIG. 7A andFIG. 7B , respectively, and packet data is shown inFIG. 7C . The F1 scores and weighted averages are also shown in these figures. The best performing model used the gyrometer sensor data (FIG. 7B ), with near perfect classification except for misclassifications for the ‘halt’ and ‘off’ states. - The results of combining the classifier outputs for tracking the number of occurrences when the classified states for the gyrometer sensor data and packet data differed are shown in
FIGS. 8A-8C . When the accumulation of differences in a given time window surpassed a certain threshold, the anomaly was marked. A threshold of 18% worked well in flagging the anomalies.FIGS. 8A-8C show how comparing the classified states in an unsupervised way allowed for a robust anomaly detection. - A precision-recall curve (PRC) was used to detect the precision to recall ratio as the threshold of anomaly detection was adjusted. This method revealed the degree at which the overall classifier performed greater than random chance. By sweeping the threshold from 0.0% to 100.0% of errors within a window, a diagram as shown in
FIG. 9 was created where, as recall of anomalies increased, the false positives also increased, and precision decreased. Detecting true positives provides utility since this model had consistent results at detecting the baseline (true negative) at every threshold and had minimal false negatives. Further, to improved visualization through the PRC curve, emphasis on recalling true positives was important since the model had to be able to detect and mitigate threats before they caused permanent major failure to the ICS system. The calculated area under the precision-recall curve (AUPRC) is about 86% inFIG. 9 . - From the precision-recall curve, the optimal threshold was taken where precision and recall are equal (i.e., equal error rate point or EER). At this threshold of around 0.17, the model was run on our test set. A confusion matrix and statistics were used to evaluate the combined, unsupervised anomaly detector whose performances were shown to have an F1 score: 0.89, Sensitivity (Recall): 0.87, and Precision: 0.88. The results were obtained by analyzing the true positive and false negatives from anomaly injections and false positives and true negatives from baseline. These results represented the strength of the classifier after it was tuned to be an optimal threshold for this dataset.
- For the classifier, the precision of detection reached about 0.88 and its recall about 0.87. Detecting this percentage of anomalies generated was quite strong because the inserted anomalies in the system were of relatively short duration. Though some anomalies were not detected, a more sustained MITM attack would eventually trigger an alarm. Overall, the classifier was robust to the random noise of multiple classifiers and could accurately distinguish anomalies from baseline data.
- Another important metric was latency of prediction. For every prediction there were 100 data points of sensor data and packets, and there was a potential anomaly flagged every 20 predictions. Latency was defined as: latency=(W·ei−ea)/s, where W was the window (number of samples per prediction), ei−ea were the number of predictions between the first error and the error where the anomaly threshold was crossed, and s was the sampling rate in samples per millisecond.
FIG. 10 shows the delay in prediction, which were used to estimate the latency. For example, the median number of predictions between the first incorrect prediction and the anomaly (threshold crossed) was 39.5. This meant that about 3950 sensor and payload data were used in total before the error was confirmed. At a rate of 10 samples per milliseconds, 395 milliseconds of sensor data passed until detection. When taking account of all timing information, the combined setup was fast enough to classify and compare windows of data from two data streams. - In another example, a general purpose computing device, in the form of a handheld tablet that is wirelessly connected to a network, for example, the Internet is utilized. Devices such as handheld tablets generally comprise many different types of sensors. One type of sensor that is commonly contained within a handheld tablet is a gyroscope that senses orientation. Yet another type of sensor is embedded within the touchscreen that produces pressure readings when the touchscreen is interacted with by the user. Such sensors are known to be useful for a variety of uses, one of which is demographic classification of the user. For example, using machine learning algorithms, a tablet user's interactions with the touchscreen and resulting pressure sensor output can be used to predict certain demographic characteristics of the user.
- Additionally, monitoring and analyzing Internet packets received by, and sent from, the tablet device can additionally yield certain information about the user, including, by way of example, web sites being interacted with, and the like. A machine learning algorithm can classify or predict certain characteristics of the user based on characteristics of the Internet packets being received by and transmitted from the handheld tablet device when it is being manipulated by a user. Furthermore, if a malicious process is running on the handheld device, analysis of such web packets can enable a machine learning algorithm to classify if the tablet has malicious software, e.g., “malware,” installed or not.
- In this example, the subject matter disclosed herein, as described above, can utilize the two machine learning algorithms; the first algorithm processing sensor data and the second algorithm processing Internet packet characteristics to enhance the overall predictability and reliability of the prediction or classification task. The prediction or classification task, in this example, could be to enhance the prediction or classification of certain user demographics, identify if the user is utilizing a tablet while it is infected with malware, or identify if the user is installing and executing malware.
- While preferred embodiments of the present subject matter have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the present subject matter. It should be understood that various alternatives to the embodiments of the present subject matter described herein may be employed in practicing the present subject matter.
Claims (30)
1. A computer-implemented method for control system anomaly detection comprising:
a) receiving input data comprising: sensor data from equipment in the control system; and network data from a network in communication with the control system;
b) normalizing distributions of the sensor data and the network data;
c) checking time alignment between the sensor data to the network data;
d) selecting a time window for accumulating the sensor data and the network data;
e) feeding the sensor data into a first neural network comprising a behavior classifier of the equipment of the control system to output a first classified state of the control system;
f) feeding the network data into a second neural network comprising a network traffic classifier to output a second classified state of the control system; and
g) comparing the first and the second classified states for consensus for system anomaly detection, wherein accumulation of differences in classified states in a given time interval above a threshold indicates occurrence of an anomaly.
2. The method of claim 1 , wherein the control system comprises an industrial control system, distributed control system (DCS), supervisory control and data acquisition (SCADA) system, embedded control system, or a combination thereof.
3. The method of claim 1 , wherein the control system comprises a general purpose computer.
4. The method of claim 1 , wherein the control system employs one or more standard network communication protocols selected from the group consisting of: process field bus (Profibus), process field net (Profinet), highway addressable remote transducer (HART), distributed network protocol (DNP3), Modbus, open platform communication (OPC), building automation and control networks (BACnet), common industrial protocol (CIP), and ethernet for control automation technology (EtherCAT).
5. The method of claim 1 , wherein the wherein the control system employs one or more non-standard network communication protocols, or a combination of standard network communication protocols and non-standard network communication protocols.
6. The method of claim 1 , wherein the sensor data comprises time series data.
7. The method of claim 1 , wherein the sensor data is obtained from a standalone sensor or an integrated sensor.
8. The method of claim 7 , wherein the integrated sensor is part of a control device comprising an actuator.
9. The method of claim 1 , wherein the network data comprises packet data, metadata, or a combination thereof.
10. The method of claim 9 , wherein the packet data comprises a packet's header, payload, trailer, or any combination thereof.
11. The method of claim 10 , wherein the packet data from the packet's payload comprises bit streams.
12. The method of claim 1 , wherein normalizing distributions of the sensor data and the network data comprises adjusting the distributions' mean, variance, higher-ordered moments, or a combination thereof.
13. The method of claim 1 , wherein the method comprises resampling the sensor data, the network data, or a combination thereof for the time alignment between the sensor data and network data.
14. The method of claim 13 , wherein the resampling results in the sensor data and the network data having a same number of samples and comprises downsampling, upsampling, or unsampling.
15. The method of claim 1 , wherein the method comprises windowing to adjust the time window for accumulating the sensor data, the network data, or a combination thereof.
16. The method of claim 15 , wherein the windowing accounts for delays in the network data, the sensor data, or a combination thereof.
17. The method of claim 1 , wherein one or both of the first neural network and the second neural network are deep neural networks.
18. The method of claim 17 , wherein the deep neural networks comprise convolutional layers such that one or both of the first neural network and the second neural network are convolutional neural networks.
19. The method of claim 18 , wherein the convolutional neural networks comprise convolutional layers, pooling layers, flattening layers, dropout layers, and dense layers.
20. The method of claim 19 , wherein the convolutional layers are 1D, 2D, or 3D convolutional layers.
21. The method of claim 19 , wherein the pooling layers comprise maximum pooling layers, minimum pooling layers, average pooling layers, or a combination thereof.
22. The method of claim 18 , wherein the convolutional neural networks have hyperparameters that are empirically chosen based on patterns in the network of the control system.
23. The method of claim 18 , wherein the convolutional neural networks are supervised for training to identify one or both of the first classified state and the second classified state.
24. The method of claim 1 , wherein the comparing the first and the second classified states for consensus for system anomaly detection is unsupervised for detecting the differences between the first and the second classified states.
25. The method of claim 1 , wherein the threshold is an average discrepancy rate between the first and the second classified state.
26. The method of claim 25 , wherein the threshold is dynamically changed over time.
27. The method of claim 1 , wherein the anomaly is due to attacks on at least one of the equipment in the control system and the network of the control system.
28. A computer-implemented system for control system anomaly detection comprising:
a) at least one logic element configured to perform operations on sensor data from equipment in the control system and network data from a network in the control system the operations comprising:
i) a normalization operation to normalize distributions of the sensor data and the network data;
ii) a checking operation to check time alignment between the sensor data and the network data; and
iii) a selection operation to select a time window for accumulating the sensor data and the network data;
b) a first neural network comprising a behavior classifier of the equipment of the control system for outputting a first classified state of the control system from the sensor data;
c) a second neural network comprising a network traffic classifier for outputting a second classified state of the control system from the network data; and
d) a discrepancy aggregator for comparing the first and the second classified state for consensus for control system anomaly detection, wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
29. A platform for control system anomaly detection comprising:
a) an apparatus comprising: at least one logic element for performing operations on sensor data from equipment in the control system and network data from a network in communication with the control system, and a discrepancy aggregator for control system anomaly detection; and
b) a cloud computing resource communicably coupled to the apparatus and comprising a first neural network and a second neural network;
wherein the operations comprise:
a) a normalization operation to normalize distributions of the sensor data and the network data;
b) a checking operation to check time alignment between the sensor data and the network data; and
c) a selection operation to select a time window for accumulating the sensor data and the network data;
wherein the first neural network comprises a behavior classifier of the equipment of the control system outputting a first classified state of the control system from the sensor data from the operations;
wherein the second neural network comprises a network traffic classifier outputting a second classified state of the control system from the network data from the operations;
wherein the discrepancy aggregator compares the first and the second classified state for consensus for control system anomaly detection; and wherein accumulation of differences in the classified states in a given time interval above a threshold indicates occurrence of an anomaly.
30. A computer-implemented method of training neural networks for control system anomaly detection comprising:
a) collecting input data comprising sensor data from equipment in the control system and network data from a network in communication with the control system;
b) preprocessing the sensor data and the network data to output preprocessed sensor data and preprocessed network data, the preprocessing comprising:
i) normalizing to adjust distributions of the sensor data and the network data;
ii) checking the sensor data and the network data for time alignment; and
iii) selecting a time window for accumulating the sensor data and the network data;
c) creating training sets comprising a first training set comprising the preprocessed sensor data and a second training set comprising the preprocessed network data; and
d) training a first neural network comprising a behavior classifier of the equipment of the control system with the first training set to output a first classified state; and
e) training a second neural network comprising a network traffic classifier with the second training set to output a second classified state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/837,472 US11546205B1 (en) | 2021-06-16 | 2022-06-10 | Control system anomaly detection using neural network consensus |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163211281P | 2021-06-16 | 2021-06-16 | |
US202163275759P | 2021-11-04 | 2021-11-04 | |
US17/837,472 US11546205B1 (en) | 2021-06-16 | 2022-06-10 | Control system anomaly detection using neural network consensus |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220407769A1 true US20220407769A1 (en) | 2022-12-22 |
US11546205B1 US11546205B1 (en) | 2023-01-03 |
Family
ID=84489506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/837,472 Active US11546205B1 (en) | 2021-06-16 | 2022-06-10 | Control system anomaly detection using neural network consensus |
Country Status (6)
Country | Link |
---|---|
US (1) | US11546205B1 (en) |
EP (1) | EP4356576A1 (en) |
CA (1) | CA3221679A1 (en) |
IL (1) | IL309318A (en) |
MX (1) | MX2023015248A (en) |
WO (1) | WO2022265923A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116541794A (en) * | 2023-07-06 | 2023-08-04 | 中国科学技术大学 | Sensor data anomaly detection method based on self-adaptive graph annotation network |
CN116663434A (en) * | 2023-07-31 | 2023-08-29 | 江铃汽车股份有限公司 | Whole vehicle load decomposition method based on LSTM deep neural network |
US12081640B2 (en) * | 2022-04-12 | 2024-09-03 | Turck Holding Gmbh | Device and method for connecting a field device to a communication system |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180157552A1 (en) * | 2015-05-27 | 2018-06-07 | Hewlett Packard Enterprise Development Lp | Data validation |
US20190138423A1 (en) * | 2018-12-28 | 2019-05-09 | Intel Corporation | Methods and apparatus to detect anomalies of a monitored system |
US20190166141A1 (en) * | 2017-11-30 | 2019-05-30 | Shape Security, Inc. | Detection of malicious activity using behavior data |
US20200209842A1 (en) * | 2017-09-06 | 2020-07-02 | Nippon Telegraph And Telephone Corporation | Anomalous sound detection apparatus, anomaly model learning apparatus, anomaly detection apparatus, anomalous sound detection method, anomalous sound generation apparatus, anomalous data generation apparatus, anomalous sound generation method and program |
US10902539B2 (en) * | 2013-08-02 | 2021-01-26 | Digimarc Corporation | Learning systems and methods |
US20210124913A1 (en) * | 2019-10-24 | 2021-04-29 | Deere & Company | Object identification on a mobile work machine |
JP2021089723A (en) * | 2019-11-15 | 2021-06-10 | エヌビディア コーポレーション | Multi-view deep neural network for LiDAR perception |
CN113298265A (en) * | 2021-05-22 | 2021-08-24 | 西北工业大学 | Heterogeneous sensor potential correlation learning method based on deep learning |
CN113376657A (en) * | 2020-02-25 | 2021-09-10 | 百度(美国)有限责任公司 | Automatic tagging system for autonomous vehicle LIDAR data |
US20210342652A1 (en) * | 2020-04-30 | 2021-11-04 | Bae Systems Information And Electronic Systems Integration Inc. | Anomaly detection system using multi-layer support vector machines and method thereof |
US20210365762A1 (en) * | 2020-05-19 | 2021-11-25 | Dell Products L.P. | Detecting behavior patterns utilizing machine learning model trained with multi-modal time series analysis of diagnostic data |
US20220019863A1 (en) * | 2020-07-16 | 2022-01-20 | Applied Materials, Inc. | Anomaly detection from aggregate statistics using neural networks |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3535625B1 (en) | 2016-12-07 | 2021-02-24 | Arilou Information Security Technologies Ltd. | System and method for using signal waveform analysis for detecting a change in a wired network |
US10728265B2 (en) | 2017-06-15 | 2020-07-28 | Bae Systems Information And Electronic Systems Integration Inc. | Cyber warning receiver |
US10548032B2 (en) | 2018-01-26 | 2020-01-28 | Verizon Patent And Licensing Inc. | Network anomaly detection and network performance status determination |
WO2019191506A1 (en) * | 2018-03-28 | 2019-10-03 | Nvidia Corporation | Detecting data anomalies on a data interface using machine learning |
-
2022
- 2022-06-10 MX MX2023015248A patent/MX2023015248A/en unknown
- 2022-06-10 CA CA3221679A patent/CA3221679A1/en active Pending
- 2022-06-10 US US17/837,472 patent/US11546205B1/en active Active
- 2022-06-10 IL IL309318A patent/IL309318A/en unknown
- 2022-06-10 EP EP22825565.9A patent/EP4356576A1/en active Pending
- 2022-06-10 WO PCT/US2022/032934 patent/WO2022265923A1/en active Application Filing
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10902539B2 (en) * | 2013-08-02 | 2021-01-26 | Digimarc Corporation | Learning systems and methods |
US20180157552A1 (en) * | 2015-05-27 | 2018-06-07 | Hewlett Packard Enterprise Development Lp | Data validation |
US20200209842A1 (en) * | 2017-09-06 | 2020-07-02 | Nippon Telegraph And Telephone Corporation | Anomalous sound detection apparatus, anomaly model learning apparatus, anomaly detection apparatus, anomalous sound detection method, anomalous sound generation apparatus, anomalous data generation apparatus, anomalous sound generation method and program |
US20190166141A1 (en) * | 2017-11-30 | 2019-05-30 | Shape Security, Inc. | Detection of malicious activity using behavior data |
US20190138423A1 (en) * | 2018-12-28 | 2019-05-09 | Intel Corporation | Methods and apparatus to detect anomalies of a monitored system |
US20210124913A1 (en) * | 2019-10-24 | 2021-04-29 | Deere & Company | Object identification on a mobile work machine |
JP2021089723A (en) * | 2019-11-15 | 2021-06-10 | エヌビディア コーポレーション | Multi-view deep neural network for LiDAR perception |
CN113376657A (en) * | 2020-02-25 | 2021-09-10 | 百度(美国)有限责任公司 | Automatic tagging system for autonomous vehicle LIDAR data |
US20210342652A1 (en) * | 2020-04-30 | 2021-11-04 | Bae Systems Information And Electronic Systems Integration Inc. | Anomaly detection system using multi-layer support vector machines and method thereof |
US20210365762A1 (en) * | 2020-05-19 | 2021-11-25 | Dell Products L.P. | Detecting behavior patterns utilizing machine learning model trained with multi-modal time series analysis of diagnostic data |
US20220019863A1 (en) * | 2020-07-16 | 2022-01-20 | Applied Materials, Inc. | Anomaly detection from aggregate statistics using neural networks |
CN113298265A (en) * | 2021-05-22 | 2021-08-24 | 西北工业大学 | Heterogeneous sensor potential correlation learning method based on deep learning |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12081640B2 (en) * | 2022-04-12 | 2024-09-03 | Turck Holding Gmbh | Device and method for connecting a field device to a communication system |
CN116541794A (en) * | 2023-07-06 | 2023-08-04 | 中国科学技术大学 | Sensor data anomaly detection method based on self-adaptive graph annotation network |
CN116663434A (en) * | 2023-07-31 | 2023-08-29 | 江铃汽车股份有限公司 | Whole vehicle load decomposition method based on LSTM deep neural network |
Also Published As
Publication number | Publication date |
---|---|
CA3221679A1 (en) | 2022-12-22 |
WO2022265923A1 (en) | 2022-12-22 |
US11546205B1 (en) | 2023-01-03 |
MX2023015248A (en) | 2024-03-19 |
IL309318A (en) | 2024-02-01 |
EP4356576A1 (en) | 2024-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11546205B1 (en) | Control system anomaly detection using neural network consensus | |
Shahraki et al. | A comparative study on online machine learning techniques for network traffic streams analysis | |
US11138376B2 (en) | Techniques for information ranking and retrieval | |
Vinayakumar et al. | Long short-term memory based operation log anomaly detection | |
US20190034497A1 (en) | Data2Data: Deep Learning for Time Series Representation and Retrieval | |
US10726335B2 (en) | Generating compressed representation neural networks having high degree of accuracy | |
Yang et al. | IoT data analytics in dynamic environments: From an automated machine learning perspective | |
US20230176562A1 (en) | Providing an alarm relating to anomaly scores assigned to input data method and system | |
US20230281310A1 (en) | Systems and methods of uncertainty-aware self-supervised-learning for malware and threat detection | |
US20230109260A1 (en) | Techniques for cursor trail capture using generative neural networks | |
Zhai et al. | Classification of high-dimensional evolving data streams via a resource-efficient online ensemble | |
Ray et al. | Contemporary developments and technologies in deep learning–based IoT | |
Hammad et al. | An unsupervised TinyML approach applied to the detection of urban noise anomalies under the smart cities environment | |
CN117829209A (en) | Abnormal operation detection method, computing device and computer program for process equipment | |
US20230289482A1 (en) | Machine learning methods and systems for detecting platform side-channel attacks | |
Siwach et al. | Anomaly detection for web log based data: a survey | |
US20220027400A1 (en) | Techniques for information ranking and retrieval | |
US11810351B2 (en) | Video analytic processing with neuro-symbolic artificial intelligence | |
Abbood et al. | Improving multimedia data transmission quality in wireless multimedia sensor networks though priority-based data collection. | |
WO2023191787A1 (en) | Recommendation for operations and asset failure prevention background | |
US11321586B2 (en) | Method, apparatus, and computer program product for determining burner operating state | |
Nguyen et al. | Comprehensive survey of sensor data verification in internet of things | |
WO2022115178A1 (en) | Methods and systems for recognizing video stream hijacking on edge devices | |
KR102598126B1 (en) | Method and apparatus for managing redundant security threat data in cluster environment | |
Sanyour et al. | A Light-Weight Real-Time Anomaly Detection Framework for Edge Computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
AS | Assignment |
Owner name: IRONWOOD CYBER INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:THORNTON, MITCHELL;LARSON, ERIC;MANIKAS, THEODORE;AND OTHERS;SIGNING DATES FROM 20220627 TO 20220812;REEL/FRAME:060893/0115 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |