๐Ÿ“‘Paper Review

[paper review] GRAPH ATTENTION NETWORKS

date
Oct 10, 2023
slug
GAT
author
status
Public
tags
DeepLearning
paper
summary
type
Post
thumbnail
์บก์ฒ˜.PNG
category
๐Ÿ“‘Paper Review
updatedAt
Sep 6, 2024 03:02 PM
GAT hidden layer(embedding)
GAT hidden layer(embedding)
ย 

Introduction

CNN์€ local filter๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ grid ๊ธฐ๋ฐ˜์˜ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ(e.g. image)์— ์„ฑ๊ณต์ ์œผ๋กœ ์ ์šฉ๋˜์–ด ์™”์Šต๋‹ˆ๋‹ค.
ํ•˜์ง€๋งŒ, ์‹ค์ œ ๋งŽ์€ Task(3d-mesh, social networks, telecommunication networks, ..)๋“ค์€ ์ด๋Ÿฌํ•œ grid-like ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ ์ž„์˜์˜ ๊ตฌ์กฐํ™” ๋œ ๋ฐ์ดํ„ฐ(arbitrarily structured data)์— ์‹ ๊ฒฝ๋ง์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•œ ๋…ธ๋ ฅ๋“ค์ด ๊ณ„์†๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
GNN ๋˜ํ•œ ์ด๋Ÿฌํ•œ ๋…ธ๋ ฅ ์ค‘ ํ•˜๋‚˜ ์ž…๋‹ˆ๋‹ค.
ย 
GNN์—๋Š” ํฌ๊ฒŒ ๋‘๊ฐ€์ง€ ์ข…๋ฅ˜๊ฐ€ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค.
์ฒซ ๋ฒˆ์งธ๋Š”, GCN๊ณผ ๊ฐ™์€ Spectral approach์ž…๋‹ˆ๋‹ค.
Spectral approach๋Š” Graph์˜ signal(graph laplacian)์„ eigenvalue decompositionํ•˜์—ฌ ํ‘ธ๋ฆฌ์— ๋„๋ฉ”์ธ์—์„œ ์ •์˜๋œ Convolution ์—ฐ์‚ฐ์„ ํ†ตํ•ด ๊ทธ๋ž˜ํ”„(๋…ธ๋“œ)๋ฅผ ํ‘œํ˜„ํ•ฉ๋‹ˆ๋‹ค.
GCN์€ ํŠนํžˆ ์ถ”์ฒœ ์‹œ์Šคํ…œ๊ณผ ๊ฐ™์ด ๋นˆ ๊ฐ’์ด ๋งŽ์€ semi-supervised node classification์—์„œ ๋งค์šฐ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์ง€๋งŒ, ๋Š๋‚Œ์ ์œผ๋กœ๋„ ์•Œ ์ˆ˜ ์žˆ๋“ฏ, eigenvector๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ทธ๋ž˜ํ”„์˜ ๊ตฌ์กฐ๊ฐ€ ์กฐ๊ธˆ์ด๋ผ๋„ ๋ฐ”๋€๋‹ค๋ฉด ํ•™์Šตํ•œ ๋‚ด์šฉ์„ ๋‹ค๋ฅธ ๊ทธ๋ž˜ํ”„์— ์ ์šฉ(inductive case)ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
ย 
๋‘ ๋ฒˆ์งธ๋Š”, graph-sage์™€ ๊ฐ™์€ non-spectral approach์ž…๋‹ˆ๋‹ค.
non-spectral approach๋Š” spectral analysis๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ , ๋‹จ์ˆœํžˆ ์ด์›ƒ๋“ค์˜ ์ •๋ณด๋ฅผ aggregator๋ฅผ ํ†ตํ•ด ์ทจํ•ฉํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ทธ๋ž˜ํ”„๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ non-spectral approach์˜ ๊ด€๊ฑด์€ ๋ถˆ ํŠน์ •ํ•œ ๋‹ค์ˆ˜์˜ ์ด์›ƒ ์ •๋ณด๋ฅผ ์–ด๋– ํ•œ ๋ฐฉ์‹์œผ๋กœ aggregateํ•  ๊ฒƒ์ธ์ง€ ์ž…๋‹ˆ๋‹ค. ์˜ˆ์‹œ๋กœ, graph sage๋Š” fixed size์˜ ์ด์›ƒ์„ samplingํ•˜๋Š” ๋ฐฉ์‹์˜ aggregator๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
ย 
๋ณธ ๋…ผ๋ฌธ์€ non-spectral์˜ ๊ด€์ ์—์„œ attention์„ ์‚ฌ์šฉํ•œ aggregator ๋ชจ๋ธ์ธ GAT(graph attention networks)๋ฅผ ์ œ์•ˆํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
๋ณธ ๋…ผ๋ฌธ์˜ contribution์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
  • node pair๋กœ ๋ณ‘๋ ฌํ™” ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ๋งค์šฐ ํšจ์œจ์ ์ธ aggregation ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.
  • ๋™์ผํ•œ degree๋ฅผ ๊ฐ–์ง€ ์•Š๋Š” node์— ๋Œ€ํ•ด์„œ๋„ ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • unseen graph(inductive learning problem)์—๋„ ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Proposed Method

ย 
๊ฐ„๋‹จํ•˜๊ฒŒ ํ•˜๋‚˜์˜ GAT layer๋ฅผ ์˜ˆ์‹œ๋กœ ๋“ค์–ด ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์„ค๋ช…๋“œ๋ฆฌ๊ฒ ์Šต๋‹ˆ๋‹ค.
GNN๊ณผ Attention mechanism์„ ์•„์‹ ๋‹ค๋ฉด ์‰ฝ๊ฒŒ ์ดํ•ดํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ย 
  • input
๋จผ์ €, GAT layer์˜ input์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
set of node features.
is # or nodes, is # of features
  • output
GAT layer๋Š” output ์—ญ์‹œ input๊ณผ ๋™์ผํ•œ ๊ตฌ์กฐ๋ฅผ ๊ฐ–์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ผ๋ฐ˜์ ์ธ ์‹ ๊ฒฝ๋ง๊ณผ ๊ฐ™์ด feature space์˜ ํฌ๊ธฐ๋Š” ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
is # of nodes, is # of features
  • GAT layer computation
    • ๊ทธ๋ฆผ 1
      ๊ทธ๋ฆผ 1
๋ชจ๋“  ์ธ๊ณต์‹ ๊ฒฝ๋ง์ด ๊ทธ๋ ‡๋“ฏ input feature๋ฅผ high leavel output feature๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ตœ์†Œํ•œ ํ•˜๋‚˜์˜ learnable linear transformation layer๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. GAT์—ญ์‹œ ๋™์ผํ•˜๋ฉฐ, weight matrix ๋Š” ๊ฐ๊ฐ์— ๋…ธ๋“œ์— ์ ์šฉ๋˜์–ด ๊ทธ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ ์ด์›ƒ ๋…ธ๋“œ ์‚ฌ์ด์˜ attention์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค.
attention ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ node i,j ์‚ฌ์ด์˜ attention coefficients๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
<๊ทธ๋ฆผ 1> ์™ผ์ชฝ๊ณผ ๊ฐ™์ด, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” attention mechanism ๋ฅผ ๊ตฌํ•˜๊ธฐ ์œ„ํ•ด weight matrix ๋ฅผ ํฌํ•จํ•˜๋Š” single-layer feedforward network๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. (bahdanau attention)
๋”ฐ๋ผ์„œ ์ข…ํ•ฉํ•˜๋ฉด, ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ attention coefficients ๋Š” ์•„๋ž˜์˜ ์‹๊ณผ ๊ฐ™์•„์ง‘๋‹ˆ๋‹ค.
ย 
์ด์›ƒ ๋…ธ๋“œ์™€์˜ attention coefficients๋ฅผ ๋ฐ˜์˜ํ•œ GAT layer๋ฅผ ์ง€๋‚œ node i์˜ feature๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•„์ง‘๋‹ˆ๋‹ค.
๋‹น์—ฐํ•˜๊ฒŒ, k๊ฐœ์˜ multi-head attention์„ concatํ•˜์—ฌ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
final(prediction) layer์— ํ•œํ•ด์„œ๋Š” concat์ด ์•„๋‹Œ averaging์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
ย 

Experiments

์‹คํ—˜์€ transductive, inductive setting์—์„œ ๊ฐ๊ฐ node classification ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
์‹คํ—˜์— ์‚ฌ์šฉํ•œ dataset ์ž…๋‹ˆ๋‹ค.
3๊ฐœ์˜ transductive problem๊ณผ 1๊ฐœ์˜ inductive problem์ธ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
notion image
ย 
  • transductive problem
mean classfication acc๋ฅผ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
notion image
GAT๊ฐ€ spectral model์ธ GCN๋ณด๋‹ค ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.
ย 
  • inductive problem
micro-averaged F1 score๋ฅผ ๋น„๊ตํ•˜์˜€์Šต๋‹ˆ๋‹ค.
notion image
GraphSage๋ณด๋‹ค ๋งค์šฐ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค.
ย