TWI723476B - Interpretation feature determination method, device and equipment for abnormal detection - Google Patents
- Publication number
- TWI723476B (application number TW108126301A)
- Authority
- TW
- Taiwan
- Prior art keywords
- sample
- feature
- anomaly detection
- detection model
- sample feature
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Testing And Monitoring For Control Systems (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of this specification provide a method and device for determining interpretation features for anomaly detection. The method may include: for a sample input to an anomaly detection model, where the sample includes at least one sample feature, determining the offset of each sample feature according to that feature's distribution parameters, where the distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model and the anomaly detection model is an unsupervised model; and determining, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample, where the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.
Description
The present disclosure relates to the field of big data technology, and in particular to a method and device for determining interpretation features for anomaly detection.
Anomaly detection is an important part of data mining and can be applied in many fields, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and ecosystem disturbance detection. In practical anomaly detection applications, one class of algorithm is the unsupervised anomaly detection model. An anomaly detection model is often a black box whose internal workings users cannot observe, so model interpretation is essential for making the model more trustworthy. Interpreting the model gives further insight into its output, for example which features of the input sample have the greatest influence on it, and thus provides a direction for analysing why the anomaly detection model produced a given output.
In view of this, one or more embodiments of this specification provide a method and device for determining interpretation features for anomaly detection, so as to improve the accuracy with which those interpretation features are obtained.
Specifically, one or more embodiments of this specification are implemented through the following technical solutions:
In a first aspect, a method for determining interpretation features for anomaly detection is provided. The method includes: for a sample input to an anomaly detection model, where the sample includes at least one sample feature, determining the offset of each sample feature according to that feature's distribution parameters, where the distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model and the anomaly detection model is an unsupervised model; and determining, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample, where the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.
In a second aspect, a device for determining interpretation features for anomaly detection is provided. The device includes: an offset calculation module, configured to, for a sample input to an anomaly detection model, where the sample includes at least one sample feature, determine the offset of each sample feature according to that feature's distribution parameters, where the distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model and the anomaly detection model is an unsupervised model; and a feature determination module, configured to determine, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample, where the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.
In a third aspect, an apparatus for determining interpretation features for anomaly detection is provided. The apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the following steps: for a sample input to an anomaly detection model, where the sample includes at least one sample feature, determining the offset of each sample feature according to that feature's distribution parameters, where the distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model and the anomaly detection model is an unsupervised model; and determining, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample, where the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.
The method and device of one or more embodiments of this specification find the interpretation features of an anomaly from distribution parameters, that is, from the data distribution of the sample features' own values. The approach is therefore model-agnostic and does not depend on the model, so imperfect model-related information, such as sample imbalance, does not affect the detection of interpretation features. Moreover, identifying interpretation features from distribution parameters matches the way outliers are distributed in anomaly detection, so the interpretation features are obtained with high accuracy.
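To help those skilled in the art better understand the technical solutions in one or more embodiments of this specification, these solutions are described clearly and completely below with reference to the drawings of one or more embodiments of this specification. The described embodiments are only some, not all, of the possible embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of one or more embodiments of this specification without inventive effort shall fall within the protection scope of this application.
Anomaly detection is also called outlier detection. An outlier is an object that deviates markedly from the other data points; outliers differ from most of the data and make up only a small fraction of the whole dataset. Anomaly detection must distinguish these outliers from the rest of the data; it can be used, for example, to identify abnormal transactions.
At least one embodiment of this specification provides a method for determining interpretation features for anomaly detection. The method can be applied to interpreting unsupervised anomaly detection models; the interpretation scheme needs no additional interpretation model and does not depend on the anomaly detection model itself.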
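Some of the terms used in describing the method are explained below:
Sample: a sample is an input to the anomaly detection model and corresponds to one model output result of that model. For example, if A is input to the anomaly detection model and the model outputs B, then A is the sample.
Sample feature: a sample has at least one sample feature, which describes one aspect of the sample's attributes. For example, the sample may be the user whose user ID is 1100, and its sample features may include the user's age, address, years of work experience, and so on. Age is one sample feature, and address can be another.
Interpretation feature: in machine learning tasks, various models are proposed to model a problem. Beyond a model's direct output, we also need to understand the result further, for example which features have the greatest influence on the output and which factors determine the corresponding output; this requires interpreting the model. In the embodiments of this specification, "interpretation feature" denotes a feature that can explain the model output result of the anomaly detection model, that is, explain the association between an input sample of the anomaly detection model and the model output result. For example, if sample Y1 is input to the anomaly detection model, the model output result is D1, and the determined interpretation features are t1 and t2, then features t1 and t2 of sample Y1 contribute strongly to output D1, and D1 may have been produced because of these two features. The interpretation features can be a subset of the sample features: for example, if the sample features are F1, F2, and F3, the interpretation features may be F1 and F2.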
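Building on the terms above, the method for determining interpretation features in the embodiments of this specification is described below.
As shown in Fig. 1, anomaly detection comprises two processes: "training" and "prediction". In the training stage, the anomaly detection model is trained on the training set data. In the prediction stage, a sample from the test set data is used as input to the anomaly detection model to predict whether that sample is anomalous. The interpretation scheme provided by at least one embodiment of this specification is independent of training the anomaly detection model and of using it for prediction: model interpretation and model training/prediction are two parts that run independently.
Continuing with Fig. 1 and referring also to Fig. 2, Fig. 2 shows a method for determining interpretation features for anomaly detection. Note first that the method interprets the anomaly detection model locally, that is, it explains the prediction made for one specific sample.
As shown in Fig. 2, the method may include the following steps: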
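In step 200, distribution parameters are obtained for each sample feature from the training set data of the anomaly detection model.
In this step, the anomaly detection model may be an unsupervised model.
The training set data is the data used to train the anomaly detection model. It may include multiple samples, each with at least one sample feature.
For example, a sample may be the user with user ID 1100, whose sample features include the user's age, address, years of work experience, annual income, and so on.
Each sample feature has its own distribution parameters. For example, the sample feature "age" corresponds to distribution parameter S1 and the sample feature "years of work experience" corresponds to distribution parameter S2.
To obtain the distribution parameters of a sample feature, the same sample feature, called the target sample feature, is extracted from every sample in the training set data, giving a target feature set containing multiple values of the target sample feature; the distribution parameters of the target sample feature are then determined from this target feature set.
Take the sample feature "annual income" as an example. Suppose the training set data contains multiple samples, including the users with IDs 1100, 1101, and 1102, and each user's sample features include "annual income". Extracting this target sample feature from every sample yields a target feature set containing the annual incomes of the three users. The distribution parameters of the feature "annual income" are then determined from the feature values in this target feature set.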
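Distribution parameters describe how a sample feature is distributed in the training set data of the anomaly detection model. For example, the multivariate Gaussian model is a classic anomaly detection algorithm that assumes the values of each feature dimension follow a normal distribution. Under this assumption the well-known 3-sigma rule applies: 99.7% of the data lies within three standard deviations of the mean, and a point outside this range can be regarded as an outlier. Analogous 2-sigma and 1-sigma rules also exist.
This describes one data distribution characteristic. From the distribution point of view, the outliers that anomaly detection must identify are usually points lying away from the region that contains most of the data, and that region has identifiable characteristics, for example lying within three standard deviations of the mean.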
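As a quick numerical check of the 1-, 2- and 3-sigma figures quoted above, the following Python sketch computes the fraction of a normal distribution lying within k standard deviations of the mean. It uses only `math.erf` from the standard library; the function name `normal_coverage` is illustrative, not part of the described method.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean.

    P(|X - mu| <= k * sigma) = erf(k / sqrt(2)), whatever the values of mu and sigma.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k}-sigma: {normal_coverage(k):.4f}")
# 1-sigma: 0.6827
# 2-sigma: 0.9545
# 3-sigma: 0.9973
```

Points beyond three standard deviations therefore make up only about 0.27% of normally distributed data, which is why they are treated as outliers.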
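Accordingly, the distribution parameters computed in this step may include, for example, the mean and standard deviation of the sample feature, denoted u and s respectively.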
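Step 200 might be sketched in Python as below, assuming each sample is a dict mapping a feature name to a numeric value and that every training sample has the same features (non-numeric features such as an address would first need a numeric encoding, which the text does not specify). The function name `fit_distribution_params` and the feature values are hypothetical; the population standard deviation is used here, though the sample standard deviation would be an equally reasonable choice.

```python
import math
from typing import Dict, List, Tuple


def fit_distribution_params(
    training_set: List[Dict[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Return {feature name: (mean u, standard deviation s)} over the training set."""
    params: Dict[str, Tuple[float, float]] = {}
    for name in training_set[0]:
        # Target feature set: the same feature taken from every training sample.
        values = [sample[name] for sample in training_set]
        u = sum(values) / len(values)
        s = math.sqrt(sum((v - u) ** 2 for v in values) / len(values))
        params[name] = (u, s)
    return params


# Hypothetical training samples for users 1100, 1101 and 1102.
training_set = [
    {"age": 30, "annual_income": 50000},
    {"age": 40, "annual_income": 60000},
    {"age": 50, "annual_income": 70000},
]
params = fit_distribution_params(training_set)
print(params["annual_income"])
```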
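In step 202, for a sample input to the anomaly detection model, where the input sample includes at least one sample feature, the offset of each sample feature is determined from that feature's distribution parameters.
In this step, the sample is one sample from the test set data; the test set data may include multiple samples, each with at least one sample feature. As noted above, this method's interpretation scheme is a local model interpretation, that is, it explains the anomaly detection result for each specific sample.
For example, if sample Y1 is input to the trained anomaly detection model and yields model output result D1, and sample Y2 yields D2, then the model interpretation of this method separately explains the association between Y1 and D1 and the association between Y2 and D2: which features of Y1 contributed most to D1, and which features of Y2 contributed most to D2. Steps 202 and 204 can therefore be executed for one sample of the test set data at a time.
Like the training set data, each sample in the test set data may include multiple sample features. In this step, an offset is computed for each sample feature; the offset indicates whether that feature lies in the "region containing most of the data" described above.
For example, the offset can be computed on the following principle: for each feature dimension, compute how many standard deviations a new sample's value lies from the training set mean; the larger the deviation, the more anomalous the data. Taking the mean and standard deviation as the distribution parameters, formula (1) below can be used to compute the offset: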
$$n = \frac{v - u}{s} \tag{1}$$
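In formula (1), n is the offset, which gives all sample features a common measure of abnormality; v is the actual value of the sample feature in the sample; u is the mean of that feature computed from the training set data; and s is its standard deviation computed from the training set data. Formula (1) thus gives the distance between the actual value and the mean, measured in multiples of the standard deviation, as the offset. Because n is read as a distance, it is its magnitude |n| that indicates how far the value deviates, whichever side of the mean it falls on.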
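Formula (1) applied to every feature of one input sample might look like the Python sketch below. The `params` dict (feature name to (u, s)) is assumed to come from step 200, and the function name and values are hypothetical. The zero-spread handling is an assumption the text does not cover: a feature that was constant in training gets offset 0 if the value is unchanged and an infinite offset otherwise.

```python
import math
from typing import Dict, Tuple


def compute_offsets(
    sample: Dict[str, float], params: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Offset n = (v - u) / s for every feature value v of one input sample."""
    offsets: Dict[str, float] = {}
    for name, v in sample.items():
        u, s = params[name]
        if s > 0:
            offsets[name] = (v - u) / s
        else:
            # Assumption: feature was constant in training; any change is maximally deviant.
            offsets[name] = 0.0 if v == u else math.copysign(math.inf, v - u)
    return offsets


params = {"age": (40.0, 10.0), "annual_income": (60000.0, 8000.0)}  # hypothetical
print(compute_offsets({"age": 45, "annual_income": 100000}, params))
# {'age': 0.5, 'annual_income': 5.0}
```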
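In step 204, according to the offsets of the sample features in the sample, at least one sample feature is determined as the interpretation feature of this anomaly detection for the sample.
The interpretation features explain the association between the sample input in this anomaly detection and the model output result. For example, if sample Y1 is input to the anomaly detection model, the model output result is D1, and the determined interpretation features are t1 and t2, then sample Y1 includes features t1 and t2, which contribute strongly to output D1; D1 may have been produced because of these two features. On the basis of the interpretation features, the reasons for the anomaly detection result D1 for Y1 can then be analysed in more detail.
For example, the interpretation features can be obtained by sorting the sample features of the input sample in descending order of offset and taking at least one feature within a preset number of top positions as the interpretation features; that is, a few features with the highest offsets are selected. Implementations are not limited to this approach: an offset threshold can also be set, and sample features whose offsets exceed the threshold are taken as interpretation features.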
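Both selection rules in step 204 can be sketched in Python as follows. One assumption departs from a literal reading of "descending order of offset": features are ranked by |n|, so a value far below the training mean counts as much as one far above it, consistent with reading the offset as a distance. The function names, offsets, `k` and `threshold` values are illustrative, not prescribed by the text.

```python
from typing import Dict, List


def top_k_features(offsets: Dict[str, float], k: int) -> List[str]:
    """Rank features by |offset|, largest first, and keep the first k."""
    ranked = sorted(offsets, key=lambda name: abs(offsets[name]), reverse=True)
    return ranked[:k]


def features_above_threshold(offsets: Dict[str, float], threshold: float) -> List[str]:
    """Keep every feature whose |offset| exceeds the threshold."""
    return [name for name, n in offsets.items() if abs(n) > threshold]


offsets = {"age": 0.5, "annual_income": 5.0, "working_years": -3.2}  # hypothetical
print(top_k_features(offsets, k=2))  # ['annual_income', 'working_years']
print(features_above_threshold(offsets, threshold=3))  # ['annual_income', 'working_years']
```

The top-k rule always returns a fixed number of features, while the threshold rule may return none for a sample that deviates little on every feature.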
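The steps above can be executed on the same device or on different devices. For example, step 200 can be executed on one device as part of the training stage; the training stage of the anomaly detection model then comprises two parts: conventional training of the anomaly detection model, and computing the distribution parameters from the training set data. Steps 202 and 204 can be executed on another device (or the same one) as part of the prediction stage, which likewise comprises two parts: conventional prediction with the model of whether a sample is anomalous, and obtaining the interpretation features from the distribution parameters. In each stage, training or prediction, the model interpretation scheme and the model training/prediction scheme can run independently. Alternatively, the distribution parameters can be computed while training proceeds, or the interpretation features can be computed from each input sample while prediction proceeds.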
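The text notes that the distribution parameters may be computed while training proceeds. One standard way to do that without holding all training values in memory is Welford's online algorithm; this is a technique chosen here for illustration, not one the text specifies. The names `RunningStats` and `stream_params` are hypothetical, and `std` returns the population standard deviation so that it matches a batch computation over the same values.

```python
import math
from typing import Dict, Iterable


class RunningStats:
    """Welford's online algorithm for the mean and standard deviation of a stream."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


def stream_params(samples: Iterable[Dict[str, float]]) -> Dict[str, RunningStats]:
    """Update one RunningStats per feature as each training sample arrives."""
    stats: Dict[str, RunningStats] = {}
    for sample in samples:
        for name, value in sample.items():
            stats.setdefault(name, RunningStats()).update(value)
    return stats


stats = stream_params([{"age": 30}, {"age": 40}, {"age": 50}])
print(stats["age"].mean, stats["age"].std)
```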
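The method of at least one embodiment of this specification finds the interpretation features of an anomaly from distribution parameters, that is, from the data distribution of the sample features' own values. It is therefore model-agnostic and does not depend on the model, so imperfect model-related information, such as sample imbalance, does not affect the detection of interpretation features. Moreover, identifying interpretation features from distribution parameters matches the way outliers are distributed in anomaly detection, so the interpretation features are obtained with high accuracy.
Fig. 3 shows a device for determining interpretation features for anomaly detection provided by one or more embodiments of this specification. As shown in Fig. 3, the device may include an offset calculation module 31 and a feature determination module 32.
The offset calculation module 31 is configured to, for a sample input to the anomaly detection model, where the sample includes at least one sample feature, determine the offset of each sample feature according to that feature's distribution parameters. The distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model.
The feature determination module 32 is configured to determine, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample; the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.
Fig. 4 shows another device for determining interpretation features for anomaly detection provided by one or more embodiments of this specification. As shown in Fig. 4, in addition to the structure of Fig. 3, the device may include a distribution computing module 33.
The distribution computing module 33 is configured to extract the target sample feature from every sample of the training set data to obtain a target feature set containing multiple values of the target sample feature, and to determine the distribution parameters of the target sample feature from the target feature set; the training set data includes multiple samples, each with at least one sample feature.
In another example, the offset calculation module 31 is specifically configured to: for one sample feature of the sample in the test set data of the anomaly detection model, determine the actual value of that feature in the sample; obtain the mean of the feature in the training set data; and determine the distance between the actual value and the mean, in multiples of the standard deviation, as the offset. Here the distribution parameters include the mean and standard deviation of the sample feature.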
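The module structure of Figs. 3 and 4 can be mirrored in a small Python class with one method per module. This is a sketch under stated assumptions, not the patented implementation: the class and method names are hypothetical, samples are dicts of numeric features, the distribution parameters are the mean and population standard deviation, and module 32 applies the top-k rule ranked by |offset|.

```python
import math
from typing import Dict, List, Tuple


class InterpretationFeatureDevice:
    def __init__(self, top_k: int = 2) -> None:
        self.top_k = top_k
        self.params: Dict[str, Tuple[float, float]] = {}

    def fit(self, training_set: List[Dict[str, float]]) -> None:
        """Distribution computing module 33: mean and std of each target feature set."""
        for name in training_set[0]:
            values = [sample[name] for sample in training_set]
            u = sum(values) / len(values)
            s = math.sqrt(sum((v - u) ** 2 for v in values) / len(values))
            self.params[name] = (u, s)

    def offsets(self, sample: Dict[str, float]) -> Dict[str, float]:
        """Offset calculation module 31: n = (v - u) / s per feature."""
        result: Dict[str, float] = {}
        for name, v in sample.items():
            u, s = self.params[name]
            # Assumption: zero spread in training gives 0 if unchanged, else infinity.
            result[name] = (v - u) / s if s > 0 else (0.0 if v == u else math.inf)
        return result

    def explain(self, sample: Dict[str, float]) -> List[str]:
        """Feature determination module 32: the top_k features by |offset|."""
        n = self.offsets(sample)
        return sorted(n, key=lambda name: abs(n[name]), reverse=True)[: self.top_k]


device = InterpretationFeatureDevice(top_k=1)
device.fit([
    {"age": 30, "annual_income": 50000},
    {"age": 40, "annual_income": 60000},
    {"age": 50, "annual_income": 70000},
])
print(device.explain({"age": 42, "annual_income": 95000}))  # ['annual_income']
```

Nothing in the class calls the anomaly detection model itself, which reflects the model independence the text claims: the same explainer could sit beside any unsupervised detector trained on the same data.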
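At least one embodiment of this specification further provides an apparatus for determining interpretation features for anomaly detection. The apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the following steps:
For a sample input to the anomaly detection model, where the sample includes at least one sample feature, determining the offset of each sample feature according to that feature's distribution parameters; the distribution parameters describe how the sample feature is distributed in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
Determining, according to the offsets of the sample features in the sample, at least one sample feature as the interpretation feature corresponding to the sample; the interpretation feature explains the association between the sample and the corresponding model output result of the anomaly detection model.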
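The execution order of the steps in the flows shown in the method embodiments above is not limited to the order in the flowcharts. The steps can be implemented in software, hardware, or a combination of the two; for example, a person skilled in the art can implement them as software code, that is, as computer-executable instructions that implement the logical functions of the steps. When implemented in software, the executable instructions can be stored in memory and executed by the device's processor.
The devices or modules described in the embodiments above can be implemented by computer chips or entities, or by products with certain functions. A typical implementing device is a computer, which can take the form of a personal computer, laptop, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet, wearable device, or a combination of any of these.
For convenience of description, the device above is described as divided into modules by function. When implementing one or more embodiments of this specification, the functions of the modules can of course be implemented in one or more pieces of software and/or hardware.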
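Those skilled in the art should understand that one or more embodiments of this specification can be provided as a method, system, or computer program product. Accordingly, they can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. They can also take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in that memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on it to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.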
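It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes it.
One or more embodiments of this specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. One or more embodiments can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network; in such environments, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the data acquisition device and data processing device embodiments are described briefly because they are substantially similar to the method embodiments; for related details, see the corresponding parts of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of one or more embodiments of this specification and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within its protection scope.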
- 31: Offset calculation module
- 32: Feature determination module
- 33: Distribution computing module
- 200: Step
- 202: Step
- 204: Step
To describe the technical solutions in one or more embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in one or more embodiments of this specification; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
- Fig. 1 is a schematic diagram of the principle of anomaly detection provided by one or more embodiments of this specification;
- Fig. 2 shows a method for determining interpretation features for anomaly detection provided by one or more embodiments of this specification;
- Fig. 3 is a schematic structural diagram of a device for determining interpretation features for anomaly detection provided by one or more embodiments of this specification;
- Fig. 4 is a schematic structural diagram of another device for determining interpretation features for anomaly detection provided by one or more embodiments of this specification.
Claims (10)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811208609.2A CN109583470A (en) | 2018-10-17 | 2018-10-17 | A kind of explanation feature of abnormality detection determines method and apparatus |
CN201811208609.2 | 2018-10-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202044111A TW202044111A (en) | 2020-12-01 |
TWI723476B true TWI723476B (en) | 2021-04-01 |
Family
ID=65920123
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW108126301A TWI723476B (en) | 2018-10-17 | 2019-07-25 | Interpretation feature determination method, device and equipment for abnormal detection |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109583470A (en) |
TW (1) | TWI723476B (en) |
WO (1) | WO2020078059A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583470A (en) * | 2018-10-17 | 2019-04-05 | 阿里巴巴集团控股有限公司 | A kind of explanation feature of abnormality detection determines method and apparatus |
CN112148763A (en) * | 2019-06-28 | 2020-12-29 | 京东数字科技控股有限公司 | Unsupervised data anomaly detection method and device and storage medium |
CN111027607B (en) * | 2019-11-29 | 2023-10-17 | 泰康保险集团股份有限公司 | Unsupervised high-dimensional data feature importance assessment and selection method and device |
CN111340102B (en) * | 2020-02-24 | 2022-03-01 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for evaluating model interpretation tools |
CN111262887B (en) * | 2020-04-26 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Network risk detection method, device, equipment and medium based on object characteristics |
CN111767938B (en) * | 2020-05-09 | 2023-12-19 | 北京奇艺世纪科技有限公司 | Abnormal data detection method and device and electronic equipment |
CN116130095B (en) * | 2023-04-04 | 2023-07-11 | 深圳市金瑞铭科技有限公司 | State monitoring method and device based on sensing technology and storage medium |
CN116304641B (en) * | 2023-05-15 | 2023-09-15 | 山东省计算中心(国家超级计算济南中心) | Anomaly detection interpretation method and system based on reference point search and feature interaction |
CN116881724B (en) * | 2023-09-07 | 2023-12-19 | 中国电子科技集团公司第十五研究所 | Sample labeling method, device and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140309886A1 (en) * | 2013-04-15 | 2014-10-16 | Flextronics Ap, Llc | Splitting mission critical systems and infotainment between operating systems |
TW201816530A (en) * | 2016-09-27 | 2018-05-01 | 日商東京威力科創股份有限公司 | Abnormality detection program, abnormality detection method and abnormality detection device |
CN108108743A (en) * | 2016-11-24 | 2018-06-01 | 百度在线网络技术(北京)有限公司 | Abnormal user recognition methods and the device for identifying abnormal user |
TW201831881A (en) * | 2012-07-25 | 2018-09-01 | 美商提拉諾斯股份有限公司 | Image analysis and measurement of biological samples |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776641B (en) * | 2015-11-24 | 2020-09-08 | 华为技术有限公司 | Data processing method and device |
CN108038211A (en) * | 2017-12-13 | 2018-05-15 | 南京大学 | A kind of unsupervised relation data method for detecting abnormality based on context |
CN108512827B (en) * | 2018-02-09 | 2021-09-21 | 世纪龙信息网络有限责任公司 | Method, device, equipment and storage medium for establishing abnormal login identification and supervised learning model |
CN109583470A (en) * | 2018-10-17 | 2019-04-05 | 阿里巴巴集团控股有限公司 | A kind of explanation feature of abnormality detection determines method and apparatus |
2018
- 2018-10-17: CN CN201811208609.2A, published as CN109583470A (active, pending)
2019
- 2019-07-23: WO PCT/CN2019/097171, published as WO2020078059A1 (active application filing)
- 2019-07-25: TW TW108126301A, published as TWI723476B (active)
Also Published As
Publication number | Publication date |
---|---|
TW202044111A (en) | 2020-12-01 |
WO2020078059A1 (en) | 2020-04-23 |
CN109583470A (en) | 2019-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI723476B (en) | Interpretation feature determination method, device and equipment for abnormal detection | |
US10817394B2 (en) | Anomaly diagnosis method and anomaly diagnosis apparatus | |
Wang et al. | FD4C: Automatic fault diagnosis framework for Web applications in cloud computing | |
US20180196837A1 (en) | Root cause analysis of performance problems | |
Arif et al. | A data mining approach for developing quality prediction model in multi-stage manufacturing | |
US20200342340A1 (en) | Techniques to use machine learning for risk management | |
JP2020501232A (en) | Risk control event automatic processing method and apparatus | |
US10311067B2 (en) | Device and method for classifying and searching data | |
CN112182508A (en) | Abnormity monitoring method and device for compliance business indexes | |
JPWO2014132612A1 (en) | System analysis apparatus and system analysis method | |
US20160162759A1 (en) | Abnormal pattern analysis method, abnormal pattern analysis apparatus performing the same and storage medium storing the same | |
Lee et al. | Assessing the lifetime performance index of exponential products with step-stress accelerated life-testing data | |
WO2021120845A1 (en) | Homogeneous risk unit feature set generation method, apparatus and device, and medium | |
EP3036655A1 (en) | K-nearest neighbor-based method and system to provide multi-variate analysis on tool process data | |
Grbac et al. | Stability of software defect prediction in relation to levels of data imbalance | |
JP2022177322A (en) | System and method for explanation of condition predictions in complex systems | |
Yuan et al. | Enhancing deep learning-based vulnerability detection by building behavior graph model | |
Gupta et al. | Eagle: User profile-based anomaly detection for securing Hadoop clusters | |
Kim et al. | An adaptive step-down procedure for fault variable identification | |
CN114297665A (en) | Intelligent contract vulnerability detection method and device based on deep learning | |
Genc | Sensitivity analysis on PROMETHEE and TOPSIS weights | |
US20180176108A1 (en) | State information completion using context graphs | |
CN108073629B (en) | Method and device for identifying purchase mode through website access data | |
Saroha et al. | Software effort estimation using enhanced use case point model | |
WO2016053231A1 (en) | Retain data above threshold |