CN110992690B

CN110992690B - False data detection method based on space-time outliers in Internet of vehicles

Info

Publication number: CN110992690B
Application number: CN201911200558.3A
Authority: CN
Inventors: 刘伎昭; 王佩雪; 张俊宝; 杨昌霖
Original assignee: Zhongyuan University of Technology
Current assignee: Zhongyuan University of Technology
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2021-09-17
Anticipated expiration: 2039-11-29
Also published as: CN110992690A

Abstract

The invention proposes a false data detection method based on spatiotemporal outliers in the Internet of Vehicles. The degree of similarity of the data within each data group is quantified, and abnormal data groups with significant data divergence are detected using Bayesian methods. A region of road space in which traffic data are highly correlated is defined as a spatiotemporal neighborhood; the neighborhood is established from historical data, and an outlier detection algorithm is used to detect false data that deviate significantly from the whole. The invention uses the spatiotemporal correlation of traffic data in real traffic environments to make a preliminary determination of false data, then applies a spatiotemporal outlier detection algorithm to quantify how far each data item in the spatiotemporal neighborhood deviates from the whole and so identify the false data. This allows road traffic conditions to be evaluated correctly and effectively and improves traffic management, supporting upper-layer applications such as traffic safety and intelligent navigation.

Description

False data detection method based on spatiotemporal outliers in the Internet of Vehicles

Technical Field

The invention relates to the technical field of Internet of Vehicles security, and in particular to a false data detection method based on spatiotemporal outliers in the Internet of Vehicles.

Background

By installing multiple types of sensors (GPS, radar, acceleration sensors, etc.) on vehicles, the Internet of Vehicles enables a vehicle to perceive, while driving, the motion state (position, speed, direction, instantaneous acceleration, etc.) of itself and of surrounding vehicles, as well as road environment information (weather, road conditions, etc.). Dedicated Short-Range Communications (DSRC) and 5G technology provide wireless communication between vehicles, between vehicles and roadside infrastructure, and between vehicles and the back-end management center. The collected information is shared with surrounding vehicles, or is aggregated by pre-deployed roadside units (RSU: Road Side Unit) and then provided to the back-end traffic management center. Based on this information, road traffic conditions can be evaluated effectively, vehicle travel times can be calculated, and abnormal events such as traffic congestion and traffic accidents can be detected, enabling a wide variety of upper-layer applications such as traffic safety and intelligent navigation.

The network architecture of the Internet of Vehicles is distributed and self-organizing. Vehicles have equal status, and any vehicle can send traffic data. Because of malicious attacks, user selfishness, sensor failures and other causes, vehicle nodes may provide inaccurate or even wrong data. For example, to reach a destination faster, a selfish driver may send forged reports of high vehicle density and low driving speed, creating the illusion of congestion and inducing surrounding vehicles to leave the current road, thereby clearing the road. Vehicles misled by wrong data may react wrongly, for example by braking suddenly or changing lanes, which disrupts normal traffic order and in extreme cases may cause accidents. The back-end management center may likewise misjudge traffic conditions and take wrong traffic management actions, leading to traffic chaos.

Summary of the Invention

In view of the deficiencies described in the background above, the present invention proposes a false data detection method based on spatiotemporal outliers in the Internet of Vehicles, which solves the technical problem of traffic chaos caused by internal adversaries injecting false traffic data into existing Internet of Vehicles systems.

The technical solution of the present invention is realized as follows:

A false data detection method based on spatiotemporal outliers in the Internet of Vehicles, with the following steps:

S1. Data collection and grouping: RSUs deployed at the roadside in the Internet of Vehicles collect the data reported by vehicle nodes within their coverage area and group the data according to their temporal and spatial attributes to obtain data groups;

S2. Data anomaly detection: the RSU calculates the data consistency factor C of each data group and, using Bayesian decision and hypothesis testing methods, judges whether the value of C has changed significantly along the time axis; if so, the data group is marked as an abnormal data group and step S3 is executed; otherwise, the method returns to step S1;

S3. Establishing the spatiotemporal neighborhood: association entropy is used to construct the spatiotemporal neighborhood of the abnormal data group;

S4. Spatiotemporal outlier detection: an outlier detection algorithm is used to detect outliers in the spatiotemporal neighborhood of the abnormal data group, thereby identifying the false data injected by the attacker.

The data anomaly detection in step S2 proceeds as follows:

S21. The data consistency factor C of a data group is calculated as:

where G_s(t) = {d₁, d₂, …, d_N} is the data group of traffic data collected by N different vehicles in road segment s during time slot t, with t = 1, 2, …, T and i, j = 1, 2, …, N;

S22. According to the principle of hypothesis testing, a hypothesis model is established:

where Θ^T = {C₁, …, C_T} is the vector of consistency factors of the current data group over time slots 1, 2, …, T; H₀ states that the value of C_t has not changed significantly, and H₁ states that the value of C_t changes significantly at time slot t₀;

S23. According to Bayes' theorem, the posterior probability is used to determine the probability that the value of the data consistency factor C changes significantly at time t:

$$P(H_q \mid \Theta^T) = \frac{P(\Theta^T \mid H_q)\, P(H_q)}{P(\Theta^T)}$$

where q ∈ {0, 1}, P(Θ^T | H_q) is the likelihood function, and P(H_q) is the prior probability;

S24. According to the likelihood formula, it is judged whether the data consistency factor C has changed significantly at t = t₀:

S25. The data group G_s(t₀) is regarded as an abnormal data group, and a spatiotemporal neighborhood is constructed to determine the false data injected by the attacker.
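The decision in steps S22 to S24 can be illustrated with a small sketch. The consistency-factor formula, the hypothesis model and the decision rule are not reproduced in this text, so everything below is an assumption made for illustration: the consistency factors are modeled as independent normal values with a fixed standard deviation `std`, H₀ uses one common mean, H₁ uses separate means before and after index `t0`, and the function names `gaussian_loglik` and `posterior_change` are hypothetical.

```python
import math


def gaussian_loglik(values, mean, std):
    """Log-likelihood of values under an i.i.d. normal model (assumed model)."""
    return sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (v - mean) ** 2 / (2 * std ** 2)
        for v in values
    )


def posterior_change(theta, t0, prior_h1=0.5, std=0.05):
    """Posterior P(H1 | theta) that the consistency factor changes at index t0.

    theta: list of consistency factors C_1..C_T (0-based list).
    H0 (assumed): all values share one mean.
    H1 (assumed): values before t0 and from t0 onward have different means.
    """
    if not 0 < t0 < len(theta):
        raise ValueError("t0 must split theta into two non-empty parts")
    before, after = theta[:t0], theta[t0:]
    ll_h0 = gaussian_loglik(theta, sum(theta) / len(theta), std)
    ll_h1 = (gaussian_loglik(before, sum(before) / len(before), std)
             + gaussian_loglik(after, sum(after) / len(after), std))
    # Bayes' rule, evaluated in log space for numerical stability.
    log_h1 = ll_h1 + math.log(prior_h1)
    log_h0 = ll_h0 + math.log(1.0 - prior_h1)
    top = max(log_h1, log_h0)
    w1, w0 = math.exp(log_h1 - top), math.exp(log_h0 - top)
    return w1 / (w1 + w0)
```

With `theta = [0.90, 0.91, 0.89, 0.90, 0.50, 0.52]` and `t0 = 4`, the posterior is close to 1, so under these assumptions the group at that slot would be marked abnormal; for a constant series the posterior stays at the prior.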

The spatiotemporal neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect spatiotemporal neighborhood. The direct temporal neighborhood comprises the data groups G_s(t-1) and G_s(t+1) collected at the same location in adjacent time slots; the direct spatial neighborhood comprises the data groups G_{s-1}(t) and G_{s+1}(t) of adjacent road segments in the same time slot; the indirect spatiotemporal neighborhood comprises data groups that are indirectly adjacent in both time and space.

In step S3, the spatiotemporal neighborhood of the abnormal data group is constructed using association entropy as follows:

S31. Let X be a random variable representing the traffic data collected by vehicles, and discretize the values of X into Q small sub-intervals;

S32. The data group G_s(t) = (d₁, d₂, …, d_N) consists of the data d₁, d₂, …, d_N reported by N vehicles, where d_i ∈ [0, Q-1]; among the N′ speed values, the number of speed values falling in the j-th sub-interval is s_j, where 0 ≤ j ≤ Q-1, and the entropy of the random variable X is:

$$H(X) = -\sum_{j=0}^{Q-1} \frac{s_j}{N'} \log \frac{s_j}{N'}$$

S33. Let X_G1 be the random variable of data group G₁(t) and X_G2 the random variable of data group G₂(t); the association entropy (joint entropy) of the data groups G₁(t) and G₂(t) is:

$$H(X_{G1}, X_{G2}) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y)$$

where p(x, y) denotes the probability that X_G1 = x and X_G2 = y; the mutual information of the random variables X_G1 and X_G2 is then:

$$I(X_{G1}; X_{G2}) = H(X_{G1}) + H(X_{G2}) - H(X_{G1}, X_{G2})$$

S34. The redundant information R(G₁, G₂) between the data groups G₁(t) and G₂(t) is used to quantify the association between them:

S35. If there exists s-l⁻ or s+l⁺ satisfying R(G_s(t), G_{s-l⁻}(t)) ≥ ε or R(G_s(t), G_{s+l⁺}(t)) ≥ ε, but R(G_s(t), G_{s-l⁻-1}(t)) < ε or R(G_s(t), G_{s+l⁺+1}(t)) < ε, then l⁻ or l⁺ is defined as the spatiotemporal neighborhood boundary of the data group G_s(t); the spatial neighborhood of G_s(t) is {G_{s-l⁻}(t), …, G_{s+l⁺}(t)}, where l⁻ and l⁺ are both positive integers;

S36. The temporal neighborhood of a data group contains data groups collected in the same road segment but in different time slots; if there exists t-l⁻ or t+l⁺ satisfying R(G_s(t), G_s(t-l⁻)) ≥ ε or R(G_s(t), G_s(t+l⁺)) ≥ ε, but R(G_s(t), G_s(t-l⁻-1)) < ε or R(G_s(t), G_s(t+l⁺+1)) < ε, then l⁻ or l⁺ is called the spatiotemporal neighborhood boundary.
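Steps S31 to S36 can be sketched in a few lines of Python. The entropy, joint entropy and mutual information below follow their standard definitions with empirical frequencies. Two things are assumptions, because the text here does not specify them: the normalization of the redundancy R (the sketch divides the mutual information by the smaller of the two entropies), and the pairing of samples (the i-th value of one group is paired with the i-th value of the other, for example historical values from the same time slots). All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits from empirical frequencies s_j / N."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def joint_entropy(xs, ys):
    """Joint entropy of the paired samples (xs[i], ys[i])."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def redundancy(xs, ys):
    """ASSUMED normalization: mutual information over the smaller entropy."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0


def spatial_neighborhood(groups, s, eps):
    """Grow the neighborhood of segment s while redundancy stays >= eps.

    groups: list indexed by road segment; each entry holds the discretized
    values of that segment (equal lengths, paired by position).
    Returns the inclusive segment range (lo, hi).
    """
    lo = s
    while lo - 1 >= 0 and redundancy(groups[s], groups[lo - 1]) >= eps:
        lo -= 1
    hi = s
    while hi + 1 < len(groups) and redundancy(groups[s], groups[hi + 1]) >= eps:
        hi += 1
    return lo, hi
```

The same expansion applies to the temporal neighborhood of S36 if `groups` is indexed by time slot instead of road segment.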

In step S4, outliers in the spatiotemporal neighborhood of the abnormal data group are detected with the outlier detection algorithm as follows:

S41. The spatiotemporal neighborhood of an abnormal data group consists of m data items, denoted M = {o₁, o₂, …, o_m}; calculate the mean of the data in M:

$$A = \frac{1}{m}\sum_{i=1}^{m} o_i$$

S42. Calculate the standard deviation σ of the data in M:

S43. Check whether each data item o lies within the interval [A - k₀·σ, A + k₀·σ], where k₀ > 1 is a predefined constant parameter; a data item falling outside this interval is regarded as an outlier, that is, as false data injected by an adversary, and its provider is regarded as an attacker.
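The check in steps S41 to S43 can be sketched as follows. The standard-deviation formula is not reproduced in this text, so the sketch assumes the population form (dividing by m); the function name `find_outliers` and the default `k0 = 2.0` are illustrative choices, not values from the source.

```python
import math


def find_outliers(values, k0=2.0):
    """Return indices of values outside [A - k0*sigma, A + k0*sigma].

    values: the traffic data (for example speeds) of the m items in the
    spatiotemporal neighborhood M.
    """
    m = len(values)
    mean = sum(values) / m                                       # A in S41
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / m)  # assumed population form
    low, high = mean - k0 * sigma, mean + k0 * sigma
    return [i for i, v in enumerate(values) if v < low or v > high]
```

For the speeds `[60, 62, 58, 61, 59, 60, 61, 5]` the mean is 53.25 and σ is about 18.3, so with `k0 = 2.0` only the last item (index 7) falls outside the interval.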

Beneficial effects of this technical solution: the invention uses the spatiotemporal correlation of traffic data in real traffic environments to make a preliminary determination of false data, then applies a spatiotemporal outlier detection algorithm to quantify how far each data item in the spatiotemporal neighborhood deviates from the whole and so identifies the false data. This allows road traffic conditions to be evaluated correctly and effectively and improves traffic management capability, enabling a wide variety of upper-layer applications such as traffic safety and intelligent navigation.

Description of Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the flow chart of the present invention;

Fig. 2 is the map used in an embodiment of the present invention;

Fig. 3 shows the detection rate of the present invention and the comparison algorithm under different vehicle densities;

Fig. 4 shows the detection rate of the present invention and the comparison algorithm under different proportions of attackers;

Fig. 5 shows the false detection rate of the present invention and the comparison algorithm under different proportions of attackers.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The present invention uses discretized time and road space. Time is divided into time slots of equal length, denoted t = 1, 2, …. The road is divided into road segments, denoted s = 1, 2, …. While driving, vehicles use on-board sensors to collect traffic data such as vehicle density, speed and flow, and share these data via DSRC (Dedicated Short-Range Communications) wireless communication to support upper-layer applications such as traffic safety and traffic management. The general message format is o = {t, (lon, lat), d, vid}, where t is the time of data collection, lon and lat give the location of data collection (longitude and latitude), d is the traffic data collected by the vehicle, and vid is the vehicle identifier. RSUs (Road Side Units) are assumed to be pre-deployed along the road; they collect the data reported by vehicles, execute the detection method proposed by the invention, evaluate the authenticity of the reported data, and detect false data injected by attackers.

As shown in Fig. 1, an embodiment of the present invention provides a false data detection method based on spatiotemporal outliers in the Internet of Vehicles, with the following specific steps:

S1. Data collection and grouping: RSUs deployed at the roadside in the Internet of Vehicles collect the data reported by vehicle nodes within their coverage area and group the data according to their temporal and spatial attributes to obtain data groups. Taking road segment s as an example, the RSU groups the received data into G_s(1), G_s(2), …, G_s(T), which represent the data collected in the same road segment s over the series of time slots 1, 2, …, T.
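The message format and the grouping of step S1 can be sketched as below. The field names follow the message o = {t, (lon, lat), d, vid}; the mapping from coordinates to a road segment is not described in this text, so `segment_of` is a hypothetical caller-supplied function, and `slot_length` (seconds per time slot) is likewise an assumed parameter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    t: float    # collection time in seconds
    lon: float  # longitude of the collection point
    lat: float  # latitude of the collection point
    d: float    # traffic datum, for example speed in km/h
    vid: str    # vehicle identifier


def group_reports(reports, segment_of, slot_length):
    """Group data values by (road segment s, time slot t).

    segment_of: hypothetical function (lon, lat) -> segment index s.
    Time slots are numbered from 1, as in the text.
    """
    groups = defaultdict(list)
    for r in reports:
        s = segment_of(r.lon, r.lat)
        slot = int(r.t // slot_length) + 1
        groups[(s, slot)].append(r.d)
    return dict(groups)
```

Each value of the returned dictionary corresponds to one data group G_s(t).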

The data anomaly detection method is as follows:

S21. Data consistency is defined through the data consistency factor, which quantifies the degree of similarity of the data within a group. The data consistency factor C of a data group is calculated as:

In the hypothesis model of step S22, Θ^T = {C₁, …, C_T} is the vector formed by the consistency factors of the current data group over time slots 1, 2, …, T; H₀ is the null hypothesis, stating that the value of C_t has not changed significantly, and H₁ is the alternative hypothesis, stating that the value of C_t changes significantly at time slot t₀;

S25. The data group G_s(t₀) is regarded as an abnormal data group, and a further detection process is performed to determine the false data injected by the attacker.

S3. Establishing the spatiotemporal neighborhood: association entropy is used to construct the spatiotemporal neighborhood of the abnormal data group. Step S2 can find data groups containing abnormal data, but it cannot determine which data within a group were injected by an attacker. The invention uses historical traffic data to construct the spatiotemporal neighborhood of the abnormal data group, compares each data item with the data in the neighborhood, and uses a spatiotemporal outlier algorithm to find data items that deviate significantly. In road scenarios, traffic data at two adjacent locations are generally strongly correlated in space and time, but the degree of correlation depends on factors such as geographic location and traffic rules. On a highway, for example, two consecutive road segments are strongly correlated, whereas in a city two consecutive segments separated by an intersection are geographically close but, because of traffic lights, only weakly correlated in vehicle motion. A spatiotemporal neighborhood denotes a geographic region of road space in which traffic data are highly correlated. Specifically, the data group G_s(t) contains the data of road segment s and time slot t, and its spatiotemporal neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect spatiotemporal neighborhood. The direct temporal neighborhood comprises the data groups G_s(t-1) and G_s(t+1) collected at the same location in adjacent time slots; the direct spatial neighborhood comprises the data groups G_{s-1}(t) and G_{s+1}(t) of adjacent road segments in the same time slot; the indirect spatiotemporal neighborhood comprises data groups that are indirectly adjacent in both time and space.

The method for constructing the spatiotemporal neighborhood of the abnormal data group using association entropy is as follows:

S32. The data group G_s(t) = (d₁, d₂, …, d_N) consists of the data d₁, d₂, …, d_N reported by N vehicles, where d_i ∈ [0, Q-1]; among the N′ speed values, the number of speed values falling in the j-th sub-interval is s_j, where 0 ≤ j ≤ Q-1, and the entropy of the random variable X is:

$$H(X) = -\sum_{j=0}^{Q-1} \frac{s_j}{N'} \log \frac{s_j}{N'}$$

In step S33, p(x, y) denotes the probability that X_G1 = x and X_G2 = y, and the mutual information of the random variables X_G1 and X_G2 is:

$$I(X_{G1}; X_{G2}) = H(X_{G1}) + H(X_{G2}) - H(X_{G1}, X_{G2})$$

S35. If there exists s-l⁻ or s+l⁺ satisfying R(G_s(t), G_{s-l⁻}(t)) ≥ ε or R(G_s(t), G_{s+l⁺}(t)) ≥ ε, but R(G_s(t), G_{s-l⁻-1}(t)) < ε or R(G_s(t), G_{s+l⁺+1}(t)) < ε, where l⁻ and l⁺ are both positive integers, then l⁻ or l⁺ is defined as the spatiotemporal neighborhood boundary of the data group G_s(t); the spatial neighborhood of G_s(t) is {G_{s-l⁻}(t), …, G_{s+l⁺}(t)}.

The method for detecting outliers in the spatiotemporal neighborhood of an abnormal data group with the outlier detection algorithm is as follows:

S41. The spatiotemporal neighborhood of an abnormal data group consists of m data items, denoted M = {o₁, o₂, …, o_m}; calculate the mean of the data in M:

$$A = \frac{1}{m}\sum_{i=1}^{m} o_i$$

S42. Calculate the standard deviation σ of the data in M:

The effectiveness of the invention is verified by simulation, and its technical advantages are evaluated through comparative analysis with an existing method.

(1) Simulation method

In the simulation experiments, the traffic simulation software SUMO is used to generate highly realistic traffic scenarios on an electronic map provided by the OpenStreetMap project. The simulation area measures 2000 m × 2000 m, as shown in Fig. 2. All roads are two-way with three lanes in each direction, and the position and direction at which each vehicle enters the simulation area are chosen at random. The average vehicle density in the simulation area is 75 vehicles/km, and vehicles report their real-time driving speed to the RSU while driving. The simulation is run 100 times, and all reported results are averages over the 100 runs.

To verify the accuracy of the detection method, a collusion attack scenario is set up in which multiple colluding attackers send forged low vehicle speeds to create the illusion of traffic congestion. The attack location and time are chosen at random, the forged vehicle speed is chosen at random in [0, 10] km/h, each vehicle sends its speed once per time slot, and the proportion of attackers among all vehicles is denoted γ.
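The attack scenario can be mimicked on a single data group with the sketch below. It is only an illustration of the stated parameters (forged speeds uniform in [0, 10] km/h, attacker proportion γ); it does not reproduce the SUMO setup, and the function name `inject_fake_speeds`, the rounding of γ·N to a whole number of attackers and the use of a seeded `random.Random` are assumptions.

```python
import random


def inject_fake_speeds(honest_speeds, gamma, rng):
    """Replace a fraction gamma of the reports with forged low speeds.

    Returns a list of (speed, is_fake) pairs in the original vehicle order.
    """
    n = len(honest_speeds)
    n_attackers = round(gamma * n)
    attackers = set(rng.sample(range(n), n_attackers))
    return [
        (rng.uniform(0.0, 10.0), True) if i in attackers else (v, False)
        for i, v in enumerate(honest_speeds)
    ]
```

For example, `inject_fake_speeds([60.0] * 20, 0.25, random.Random(0))` yields 20 reports, 5 of which are forged.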

The performance of the detection method is evaluated with the following metrics. The detection rate (DR) is defined as the proportion of all false data that is correctly detected. The false detection rate (false positive rate, FPR) is defined as the probability that a data item is classified wrongly, which covers both real data detected as false and false data detected as real.
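The two metrics can be computed from ground-truth labels and detector output as in the sketch below. It follows the definitions given in the text, in which the false detection rate counts both kinds of misclassification; the function name `detection_metrics` and the convention of returning 0.0 for empty inputs are assumptions.

```python
def detection_metrics(is_fake, flagged):
    """Return (detection rate, false detection rate).

    is_fake: ground-truth booleans, True for injected false data.
    flagged: detector output booleans, True when an item is reported as false.
    """
    pairs = list(zip(is_fake, flagged))
    detected = sum(1 for f, g in pairs if f and g)          # false data caught
    missed = sum(1 for f, g in pairs if f and not g)        # false data taken as real
    false_alarm = sum(1 for f, g in pairs if not f and g)   # real data taken as false
    total_fake = detected + missed
    dr = detected / total_fake if total_fake else 0.0
    fdr = (missed + false_alarm) / len(pairs) if pairs else 0.0
    return dr, fdr
```

With `is_fake = [True, True, False, False, False]` and `flagged = [True, False, True, False, False]`, one of two false items is caught and two of five items are misclassified, giving a detection rate of 0.5 and a false detection rate of 0.4.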

(2) Results and analysis

The effects of vehicle density and attacker proportion on the performance of the algorithm were examined. The simulation results are given in Figs. 3 to 5. Fig. 3 compares the performance of the method of the invention with the LEAVE (Local Eviction of Attackers by Voting Evaluators) protocol under different vehicle densities. The detection rate of the method is higher than that of LEAVE at all vehicle densities; at 80 vehicles/km it is 98.3%, against 91.5% for LEAVE. The detection rate of the method is lower at low vehicle density than at high density: 92.0% at 20 vehicles/km and 99.9% at 200 vehicles/km. The main reason is that at low density the distance between vehicles is large and vehicles barely constrain one another, so each can travel at its ideal speed. Because different types of vehicles have different ideal speeds, their speeds differ considerably, so the collected data show a large variance, which makes correct identification harder.

Figs. 4 and 5 evaluate the resistance of the method of the invention and of the LEAVE protocol to collusion attacks. Fig. 4 gives the detection rate under different attacker proportions. At low attacker proportions the detection rate of the method is similar to that of LEAVE. As the attacker proportion rises, however, the detection rate of LEAVE drops quickly: at attacker proportions of 30% and 40% it is 65.5% and 40%, respectively. The method of the invention shows strong resistance to collusion attacks and still reaches a detection rate of 96.5% at an attacker proportion of 25%. Its false detection rate is also significantly lower than that of LEAVE, at 5.9% with 25% attackers.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A false data detection method based on spatiotemporal outliers in the Internet of Vehicles, characterized in that its steps are as follows:

S1. Data collection and grouping: RSUs deployed at the roadside in the Internet of Vehicles collect the data reported by vehicle nodes within their coverage area and group the data according to their temporal and spatial attributes to obtain data groups;

S2. Data anomaly detection: the RSU calculates the data consistency factor C of each data group and, using Bayesian decision and hypothesis testing methods, judges whether the value of C has changed significantly along the time axis; if so, the data group is marked as an abnormal data group and step S3 is executed; otherwise, the method returns to step S1;

The data anomaly detection in step S2 proceeds as follows:

S21. The data consistency factor C of a data group is calculated as:

where G_s(t) = {d₁, d₂, …, d_N} is the data group of traffic data collected by N different vehicles in road segment s during time slot t, with t = 1, 2, …, T and i, j = 1, 2, …, N;

S22. According to the principle of hypothesis testing, a hypothesis model is established:

where Θ^T = {C₁, …, C_T} is the vector of consistency factors of the current data group over time slots 1, 2, …, T; H₀ states that the value of C_t has not changed significantly, and H₁ states that the value of C_t changes significantly at time slot t₀;

S23. According to Bayes' theorem, the posterior probability is used to determine the probability that the value of the data consistency factor C changes significantly at time t:

$$P(H_q \mid \Theta^T) = \frac{P(\Theta^T \mid H_q)\, P(H_q)}{P(\Theta^T)}$$

where q ∈ {0, 1}, P(Θ^T | H_q) is the likelihood function, and P(H_q) is the prior probability;

S24. According to the likelihood formula, it is judged whether the data consistency factor C has changed significantly at t = t₀:

S25. The data group G_s(t₀) is regarded as an abnormal data group, and a spatiotemporal neighborhood is constructed to determine the false data injected by the attacker;

S3. Establishing the spatiotemporal neighborhood: association entropy is used to construct the spatiotemporal neighborhood of the abnormal data group;

S4. Spatiotemporal outlier detection: an outlier detection algorithm is used to detect outliers in the spatiotemporal neighborhood of the abnormal data group, thereby identifying the false data injected by the attacker.

2. The false data detection method based on spatiotemporal outliers in the Internet of Vehicles according to claim 1, characterized in that the spatiotemporal neighborhood comprises a direct temporal neighborhood, a direct spatial neighborhood and an indirect spatiotemporal neighborhood; the direct temporal neighborhood comprises the data groups G_s(t-1) and G_s(t+1) collected at the same location in adjacent time slots; the direct spatial neighborhood comprises the data groups G_{s-1}(t) and G_{s+1}(t) of adjacent road segments in the same time slot; the indirect spatiotemporal neighborhood comprises data groups that are indirectly adjacent in both time and space.

3. The false data detection method based on spatiotemporal outliers in the Internet of Vehicles according to claim 1, characterized in that in step S3 the spatiotemporal neighborhood of the abnormal data group is constructed using association entropy as follows:

S31. Let X be a random variable representing the traffic data collected by vehicles, and discretize the values of X into Q small sub-intervals;

S32. The data group G_s(t) = (d₁, d₂, …, d_N) consists of the data d₁, d₂, …, d_N reported by N vehicles, where d_i ∈ [0, Q-1]; among the N′ speed values, the number of speed values falling in the j-th sub-interval is s_j, where 0 ≤ j ≤ Q-1, and the entropy of the random variable X is:

$$H(X) = -\sum_{j=0}^{Q-1} \frac{s_j}{N'} \log \frac{s_j}{N'}$$

S33. Let X_G1 be the random variable of data group G₁(t) and X_G2 the random variable of data group G₂(t); the association entropy (joint entropy) of the data groups G₁(t) and G₂(t) is:

$$H(X_{G1}, X_{G2}) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y)$$

where p(x, y) denotes the probability that X_G1 = x and X_G2 = y; the mutual information of the random variables X_G1 and X_G2 is then:

$$I(X_{G1}; X_{G2}) = H(X_{G1}) + H(X_{G2}) - H(X_{G1}, X_{G2})$$

S34. The redundant information R(G₁, G₂) between the data groups G₁(t) and G₂(t) is used to quantify the association between them:

S35. If there exists s-l⁻ or s+l⁺ satisfying R(G_s(t), G_{s-l⁻}(t)) ≥ ε or R(G_s(t), G_{s+l⁺}(t)) ≥ ε, but R(G_s(t), G_{s-l⁻-1}(t)) < ε or R(G_s(t), G_{s+l⁺+1}(t)) < ε, then l⁻ or l⁺ is defined as the spatiotemporal neighborhood boundary of the data group G_s(t), where l⁻ and l⁺ are both positive integers;

S36. The temporal neighborhood of a data group contains data groups collected in the same road segment but in different time slots; if there exists t-l⁻ or t+l⁺ satisfying R(G_s(t), G_s(t-l⁻)) ≥ ε or R(G_s(t), G_s(t+l⁺)) ≥ ε, but R(G_s(t), G_s(t-l⁻-1)) < ε or R(G_s(t), G_s(t+l⁺+1)) < ε, then l⁻ or l⁺ is called the spatiotemporal neighborhood boundary.

4. The false data detection method based on spatiotemporal outliers in the Internet of Vehicles according to claim 1, characterized in that in step S4 outliers in the spatiotemporal neighborhood of the abnormal data group are detected with the outlier detection algorithm as follows:

S41. The spatiotemporal neighborhood of an abnormal data group consists of m data items, denoted M = {o₁, o₂, …, o_m}; calculate the mean of the data in M:

$$A = \frac{1}{m}\sum_{i=1}^{m} o_i$$

S42. Calculate the standard deviation σ of the data in M:

S43. Check whether each data item o lies within the interval [A - k₀·σ, A + k₀·σ], where k₀ > 1 is a predefined constant parameter; a data item falling outside this interval is regarded as an outlier, that is, as false data injected by an adversary, and its provider is regarded as an attacker.