
[Paper Review] Multivariate Time-series Anomaly Detection via Graph Attention Network

date
Feb 21, 2024
slug
Multivariate-Time-series-Anomaly-Detection-via-Graph-Attention-Network
author
status
Public
tags
paper
DeepLearning
summary
type
Post
thumbnail
캡처.PNG
category
📑Paper Review
updatedAt
Sep 6, 2024 01:46 PM

Abstract

Multivariate time-series data have many industrial applications, but the relationships between variables are hard to define explicitly, which can lead to wrong predictions (false alarms).
This paper proposes a method to address this problem.
To learn the complex dependencies in multivariate time series, the model treats each univariate series as an individual feature and places two graph attention layers in parallel.
It jointly optimizes a forecasting-based model and a reconstruction-based model, producing a time-series representation that accounts for both single-timestamp prediction and reconstruction of the entire time series.

Introduction

  1. Malhotra et al. proposed an LSTM-based encoder-decoder network and used it to detect errors across multiple sensors.
  1. Hundman et al. studied multivariate time-series anomaly detection based on prediction errors.
  1. OmniAnomaly uses a stochastic recurrent neural network with latent variables to capture the normal patterns of the multivariate time-series data distribution.
  1. As these three examples illustrate, capturing the relationships between variables is important for time-series anomaly detection.

Solution

MTAD-GAT(Multivariate Time-series Anomaly Detection via Graph Attention Network)

Contributions

  1. Addresses multivariate time-series anomaly detection in a self-supervised manner, achieving state-of-the-art scores at the time.
  1. Uses two parallel graph attention (GAT) layers to learn the relationships between different time series and between timestamps, without requiring any prior knowledge.
  1. Jointly optimizes a forecasting-based model (aimed at single-timestamp prediction) and a reconstruction-based model (aimed at a latent representation of the entire time series).
  1. Provides interpretable, intuitive attention scores.

Related work

  1. Univariate Anomaly Detection
    1. Hypothesis testing, wavelet analysis, SVD, ARIMA
    2. Netflix: anomaly detection via PCA
    3. Twitter: Seasonal Hybrid Extreme Studentized Deviate (S-H-ESD) test
    4. DONUT: unsupervised anomaly detection based on a Variational Auto-Encoder; SR-CNN: Spectral Residual combined with a CNN
  1. Multivariate Anomaly Detection
    1. Forecasting-based Models: detect anomalies from prediction errors
      1. LSTM-NDT: proposes an unsupervised, non-parametric thresholding method
      2. Ding et al.: a real-time anomaly detection algorithm based on Hierarchical Temporal Memory (HTM) and a Bayesian Network
      3. Gugulothu et al.: an end-to-end learning framework combining non-temporal dimension reduction with a recurrent auto-encoder
      4. DAGMM: does not model temporal dependencies; it simply takes whole observations as input for anomaly detection
    2. Reconstruction-based Models: learn to reconstruct the entire time series from latent variables
      1. Pankaj et al.: learns representations of normal time-series data with an LSTM-based encoder-decoder framework
      2. Kitsune: maps features to instances in an unsupervised manner and reconstructs them with a decoder
      3. MAD-GAN: considers the entire variable set to capture latent interactions
      4. GAN-Li: feeds real and generated (fake) samples into a GAN-trained discriminator to compute residuals
      5. LSTM-VAE: integrates an LSTM into a variational auto-encoder to fuse signals and reconstruct their distribution; an LSTM encoder compresses temporal dependencies into the latent space
      6. OmniAnomaly: since deterministic methods are prone to mistakes on unpredictable instances, it proposes a stochastic model for multivariate time-series anomaly detection, capturing normal patterns through stochastic variable connection and planar normalizing flow

Methodology

  1. overview
    1. [Figure: MTAD-GAT architecture overview]
    2. The first layer applies a 1-D convolution with kernel size 7 to extract high-level features.
    3. The convolution output is fed as input to the two GAT layers.
    4. The outputs of the two GAT layers are concatenated and fed into a GRU.
    5. The GRU output is passed to the forecasting and reconstruction models to produce the final results.
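The 1-D convolution step can be sketched in NumPy as below. This is a minimal illustration, not the paper's code: it assumes zero "same" padding and a ReLU after the convolution, so a window keeps its length n before the GAT layers. `conv1d_same` is a hypothetical helper.

```python
import numpy as np

def conv1d_same(x, weight, bias):
    """1-D convolution along the time axis with zero 'same' padding, then ReLU.

    x:      (n, k_in) window of n timesteps and k_in features
    weight: (k_out, k_in, ksize) kernels; ksize must be odd (7 in MTAD-GAT)
    bias:   (k_out,)
    """
    n, _ = x.shape
    k_out, _, ksize = weight.shape
    pad = (ksize - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.empty((n, k_out))
    for t in range(n):
        patch = xp[t : t + ksize]  # (ksize, k_in) neighborhood centered on t
        # Cross-correlation, as deep-learning "convolution" layers compute it
        out[t] = np.einsum("oik,ki->o", weight, patch) + bias
    return np.maximum(out, 0.0)


n, k = 100, 12
x = np.ones((n, k))
w = np.ones((k, k, 7))
y = conv1d_same(x, w, np.zeros(k))
print(y.shape)  # (100, 12)
```

With all-ones inputs and kernels, an interior output equals 7 × 12 = 84, while the first and last timesteps see three zero-padded rows and drop to 4 × 12 = 48, which is why "same" padding is what keeps the output length equal to n.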
  1. Data Preprocessing
    1. Data normalization
      1. [Figure: data normalization]
    2. Data cleaning
      1. Spectral Residual
        1. Using the FFT, subtract the averaged log-amplitude spectrum from the spectrum of the signal in each window, then apply the inverse transform to obtain a saliency map.
          As in SR-CNN, values whose saliency exceeds a threshold are labeled as anomalies.
          The labeled values are then imputed from their neighboring values to clean the data.
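The preprocessing steps can be sketched in NumPy as follows. The normalization formula is not shown in the text, so per-feature min-max scaling fitted on the training split is an assumption here; the moving-average width `q`, the threshold `tau`, and the relative-deviation score are likewise illustrative choices, and all function names are hypothetical.

```python
import numpy as np

def min_max_normalize(train, test):
    # Assumed scheme: per-feature min-max scaling using training-set statistics only
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant features against division by zero
    return (train - lo) / span, (test - lo) / span

def sr_saliency(x, q=3):
    # Spectral Residual: log-amplitude spectrum minus its moving average
    spec = np.fft.fft(x)
    log_amp = np.log(np.abs(spec) + 1e-8)
    avg_log_amp = np.convolve(log_amp, np.ones(q) / q, mode="same")
    residual = log_amp - avg_log_amp
    # Back to the time domain, keeping the original phase
    return np.abs(np.fft.ifft(np.exp(residual + 1j * np.angle(spec))))

def sr_clean(x, tau=3.0, q=3):
    # Flag points whose saliency stands far above the mean, then impute them
    s = sr_saliency(x, q)
    score = (s - s.mean()) / (s.mean() + 1e-8)
    flagged = score > tau
    cleaned = x.astype(float)  # astype returns a copy
    if flagged.any() and not flagged.all():
        idx = np.arange(len(x))
        # Linear interpolation from the nearest unflagged neighbors
        cleaned[flagged] = np.interp(idx[flagged], idx[~flagged], x[~flagged])
    return cleaned, flagged


t = np.arange(200)
series = np.sin(2 * np.pi * t / 50)
series[120] = 10.0  # injected spike
cleaned, flagged = sr_clean(series)
```

On this toy series the spike at index 120 stands out sharply in the saliency map and is replaced by a value interpolated from its neighbors; real data would need `tau` tuned per metric.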
  1. Graph attention
    1. [Figures: graph attention layers]
    2. Feature-oriented graph attention layer
      1. The multivariate time series is treated as a complete graph in which each node represents one feature.
      2. Each edge represents the relationship between two interacting features.
      3. Each node is $x_i = \{x_{i,t} \mid t \in [0, n)\}$, so there are k nodes, where n is the number of timestamps and k is the number of multivariate features.
    3. Time-oriented graph attention layer
      1. Captures temporal dependencies: all timestamps within the sliding window form a complete graph.
      2. Node $x_t$ is the feature vector at timestamp t, and its neighbors are all timestamps in the current sliding window.
    4. Finally, the output of the feature-oriented graph attention layer (a k × n matrix, transposed to n × k), the output of the time-oriented graph attention layer (n × k), and the original input (n × k) are concatenated into final node features of size n × 3k.
    5. A GRU processes the final node features, and its output feeds the jointly optimized forecasting and reconstruction models.
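Both attention layers can be sketched with one single-head NumPy helper. This is a minimal sketch assuming the original GAT scoring e_ij = LeakyReLU(wᵀ(v_i ⊕ v_j)) over a complete graph with no extra linear projection; the weights are random rather than trained, `graph_attention` is hypothetical, and `x` is a random stand-in for the n × k input window. It mainly shows where the k × n, n × k and n × 3k shapes come from.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def graph_attention(nodes, w):
    """Single-head graph attention over a complete graph (hypothetical helper).

    nodes: (num_nodes, dim) node vectors v_i
    w:     (2 * dim,) attention vector; random here, learned in a real model
    """
    dim = nodes.shape[1]
    # w^T [v_i ; v_j] splits into a term that depends on i plus one that depends on j
    e = leaky_relu((nodes @ w[:dim])[:, None] + (nodes @ w[dim:])[None, :])
    # Softmax over every neighbor j (complete graph, self-loop included)
    e = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)
    # h_i = sigmoid(sum_j alpha_ij * v_j)
    return sigmoid(alpha @ nodes), alpha


rng = np.random.default_rng(0)
n, k = 100, 12                      # window length, number of features
x = rng.standard_normal((n, k))     # stand-in for the n x k input window

# Feature-oriented: k nodes, each node is one feature's length-n series
h_feat, alpha_feat = graph_attention(x.T, rng.standard_normal(2 * n))
# Time-oriented: n nodes, each node is the k-dim vector at one timestamp
h_time, alpha_time = graph_attention(x, rng.standard_normal(2 * k))

# Transpose the k x n feature output so all three parts are n x k, then stack them
combined = np.concatenate([x, h_feat.T, h_time], axis=1)
print(h_feat.shape, h_time.shape, combined.shape)  # (12, 100) (100, 12) (100, 36)
```

The same helper serves both layers; only the choice of what counts as a node (a feature's series or a timestamp's vector) changes, which is the core idea of the two parallel layers.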
  1. Joint Optimization
    1. Forecasting-based model
      1. [Figure: forecasting-based model]
      2. ๋‹ค์Œ ์‹œ์ ์˜ ๊ฐ’์„ ์˜ˆ์ธกํ•˜๋„๋ก ํ•™์Šต
      3. anomaly score๋ฅผ ์‚ฐ์ถœ ํ•  ๋•Œ, ๋ณ€ํ™”์— ๋ฏผ๊ฐํ•˜๊ฒŒ ๋ฐ˜์‘
    2. Reconstruction-based model
      1. [Figure: reconstruction-based model]
      2. Mitigates the forecasting-based model's lack of robustness to changes in the data.
      3. Estimates the distribution of the input data.
      4. Used alone, the reconstruction-based model cannot catch anomalies that fall inside the learned distribution.
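    3. Model Inference
      1. [Figure: anomaly score]
      For each feature i, the score combines the squared forecasting error with the probability, from the reconstruction-based model, that the value is abnormal, weighted by γ: $s_i = \frac{(\hat{x}_i - x_i)^2 + \gamma\,(1 - p_i)}{1 + \gamma}$, where $\hat{x}_i$ is the forecast, $x_i$ the actual value, and $p_i$ the reconstruction probability. The anomaly score of a timestamp is $\sum_{i=1}^{k} s_i$ over the k features.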
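A minimal plain-Python sketch of the training objective and the inference score. It assumes RMSE for both loss terms, matching the `torch.sqrt(...criterion...)` losses in the experiment notes at the end of this post; the paper's reconstruction model is a VAE, so its actual loss also differs. The function names are hypothetical, and `gamma=0.8` follows the Setup section.

```python
import math

def rmse(pred, target):
    # Root-mean-squared error over paired values
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def joint_loss(forecast, next_value, recon, window):
    # Both heads are trained together by summing their losses
    return rmse(forecast, next_value) + rmse(recon, window)

def anomaly_score(forecast, actual, p_normal, gamma=0.8):
    # Per feature: ((x_hat - x)^2 + gamma * (1 - p)) / (1 + gamma), summed over features;
    # p is the reconstruction model's probability that the value is normal
    return sum(((xh - x) ** 2 + gamma * (1 - p)) / (1 + gamma)
               for xh, x, p in zip(forecast, actual, p_normal))


score = anomaly_score([1.0, 2.0], [1.0, 1.0], [1.0, 0.5])
```

Here the first feature is forecast exactly and fully "normal", so it contributes 0; the second contributes (1 + 0.8 × 0.5) / 1.8 ≈ 0.78. Raising γ shifts weight from the forecasting error toward the reconstruction probability.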

Experiments

  1. Datasets
    1. SMAP (Soil Moisture Active Passive satellite), MSL (Mars Science Laboratory rover), and TSA (Time Series Anomaly detection system)*
  1. Metrics
    1. Precision, recall, F1-score, and AUC
  1. Setup
    1. window size = 100
    2. γ is set to 0.8 via grid search.
    3. Adam optimizer, 100 epochs, learning rate 0.001.
  1. Comparison with SOTAs
    1. [Figure: comparison with state-of-the-art methods]
  1. Evaluation with Different Delays
    1. With a delay of δ = 10, the differences are 53.98%, 13.04%, and 19.93%.

Analyses

  1. Effectiveness of Graph Attention
    1. [Figures: effectiveness of graph attention]
  1. Effectiveness of Joint Optimization
    1. [Figure: effectiveness of joint optimization]
      The forecasting-based model is robust to the randomness of the time series, and the reconstruction-based model is robust to noise and changes in the data; optimizing them jointly yields a higher F1 score, showing that joint optimization is effective for anomaly detection.
  1. Analysis of γ
    1. [Figure: analysis of γ]

Case Study

  1. A case where a univariate approach would have flagged some features as anomalous, but this model, considering the features jointly, classified the point as normal because the correlations among all nodes did not change; the point was in fact normal.
    1. [Figure: case 1]
  1. A case where the other features were normal but Checkpoint and CPU showed an abnormal pattern that had never appeared in training, so the model misclassified a normal situation as an anomaly.
    1. [Figure: case 2]

Conclusion

Proposes a framework, validates its performance, and demonstrates the effectiveness of its components.
Performance is expected to improve further by incorporating domain knowledge and feedback.
Possible applications:
  1. The GAT approach of this work could be used to learn relationships between computing resources, between servers, and between hosts.
  1. Experiments need to be designed so that forecasts of future time steps can be used to judge whether those future points are anomalous.
However, if the data show non-stationarity or seasonality, this method, too, cannot learn every pattern in the population, so it may struggle to account for node relationships that change over time, and detecting anomalies far into the future may be difficult.
This calls for data measured after the servers have stabilized to some degree, or for a model that can learn the overall trend of the data.
MTAD-GAT Experiments
Parameter settings
class SlidingWindowDataset(Dataset): builds samples by sliding a window of window_size over the data
window_size is set via ‘arg.lookback’; the default of 100 was changed to 60
data_dim is set to the 12 features of the okestro data selected based on domain knowledge
[Figure: selected features (data_dim)]
target_dim is set to the 7 dependent variables of the okestro data
[Figure: target variables (target_dim)]
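A torch-free sketch of what a sliding-window dataset does, written as a plain class with the same `__len__`/`__getitem__` interface as `torch.utils.data.Dataset`. The class name `SlidingWindow`, the `target_cols` argument for picking target features (the role target_dim plays above), and `horizon` are illustrative assumptions, not the repository's actual code.

```python
class SlidingWindow:
    """Hypothetical sliding-window dataset over a list of timesteps.

    data:        list of timesteps, each a list of feature values
    window:      number of past timesteps per input sample (the lookback)
    target_cols: optional feature indices to keep in the target
    horizon:     number of future timesteps in the target
    """

    def __init__(self, data, window, target_cols=None, horizon=1):
        self.data = data
        self.window = window
        self.target_cols = target_cols
        self.horizon = horizon

    def __len__(self):
        # Every start index that leaves room for a full window plus its targets
        return len(self.data) - self.window - self.horizon + 1

    def __getitem__(self, i):
        x = self.data[i : i + self.window]
        y = self.data[i + self.window : i + self.window + self.horizon]
        if self.target_cols is not None:
            y = [[row[c] for c in self.target_cols] for row in y]
        return x, y


ds = SlidingWindow([[t, 10 * t] for t in range(10)], window=3, target_cols=[1])
print(len(ds), ds[0])  # 7 ([[0, 0], [1, 10], [2, 20]], [[30]])
```

With `horizon=1` the dataset has len(data) - window samples, so shrinking the lookback from 100 to 60 yields 40 more windows from the same series.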
์ฒ˜์Œ๋ถ€ํ„ฐ ์žฌ๊ตฌ์ถ• ํ•œ ๊ฒฝ์šฐ
  1. โ€˜x = window + 1โ€™ ๊นŒ์ง€๋กœ ์„ค์ • ํ›„ recon = x-1, reg = window_size๋กœ ์„ค์ • (data loader ์žฌ๊ตฌ์ถ•)
  1. forecast ๋Š” ๋งˆ์ง€๋ง‰ ์ธ๋ฑ์Šค, reconstruct๋Š” ๋งˆ์ง€๋ง‰ ์ธ๋ฑ์Šค ํ•˜๋‚˜ ์ „๊นŒ์ง€ ์‚ฌ์šฉ
forecast_loss = torch.sqrt(self.forecast_criterion(y[:,19,:], preds))
recon_loss = torch.sqrt(self.recon_criterion(x, recons))
  1. out_dim = 12
  1. For TemporalAttention, num_nodes is set to window_size (originally the number of features)
  1. At prediction time, the number of labels is the test length minus the window length
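The rebuilt data layout can be sketched in plain Python. `make_samples` and `split_sample` are hypothetical helpers that only illustrate the indexing described above; a real loader would batch these samples as tensors.

```python
def make_samples(data, window_size):
    # Each sample spans window_size + 1 consecutive timesteps
    return [data[i : i + window_size + 1] for i in range(len(data) - window_size)]

def split_sample(sample):
    # First window_size steps -> reconstruction input; last step -> forecast target
    return sample[:-1], sample[-1]


data = [[t] for t in range(10)]
samples = make_samples(data, window_size=4)
recon_input, forecast_target = split_sample(samples[0])
print(len(samples), recon_input, forecast_target)  # 6 [[0], [1], [2], [3]] [4]
```

With this layout the number of samples is len(data) - window_size, which lines up with the label count noted above.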