[paper review] Resolution-robust Large Mask Inpainting with Fourier Convolutions

date: May 18, 2023
slug: Resolution-robust
status: Public
tags: DeepLearning, paper
type: Post
category: 📑Paper Review
updatedAt: Sep 6, 2024 03:34 PM
A. Abstract
Problems with modern image inpainting systems: they struggle with
  1. large missing areas
  2. complex images
  3. high-resolution images
    1. notion image
  • The main cause is that both the inpainting network and the loss function lack an effective receptive field.
  • receptive field
    • The spatial extent of the input neurons that influence a single neuron in the output layer
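To make the definition concrete, here is a minimal sketch of how the theoretical receptive field grows in a plain stack of convolutions. The helper name `receptive_field` and the example layer stacks are illustrative assumptions, not taken from the paper.

```python
def receptive_field(layers):
    """Theoretical receptive field (input pixels along one axis) of a conv stack.

    `layers` is a list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf = 1    # before any layer, one neuron corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between neighbouring neurons of the current layer
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Ten stacked 3x3 convolutions with stride 1.
print(receptive_field([(3, 1, 1)] * 10))  # 21

# Three stride-2 3x3 layers followed by seven stride-1 3x3 layers.
print(receptive_field([(3, 2, 1)] * 3 + [(3, 1, 1)] * 7))  # 127
```

Even the second stack sees only a 127-pixel window, which is small compared with a large mask in a high-resolution image.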
LaMa proposal
  1. A new inpainting network architecture that uses FFC (fast Fourier convolution) to obtain an image-wide receptive field
  2. High receptive field perceptual loss
  3. Large training masks
B. Introduction
For large masks, the network can only produce a plausible inpainting if its receptive field captures structure larger than the mask itself, but popular convolutional architectures do not have a receptive field that large.
Solution
  1. A new inpainting network architecture that uses FFC to obtain an image-wide receptive field
  2. High receptive field perceptual loss
  3. Large training masks
Contribution
  1. Generalizes to high-resolution images after training only on low-resolution data
  2. Can generate complex periodic structures
  3. Robust to large masks
  4. Achieves this with few parameters, so inference time is shorter than that of the baselines
C. Method
notion image
x : color image
m : binary mask of unknown pixels
x⊙m : masked image
x' = stack(x⊙m, m) : four-channel input tensor formed by stacking the masked image and the mask
fθ(·) : feed-forward inpainting network (generator)
notion image
Given x', the network processes it in a fully convolutional manner and produces the inpainted three-channel color image x_hat = fθ(x').
Training uses (image, mask) pairs built from real images and synthetically generated masks.
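As a rough illustration of how the four-channel input x' is assembled, the NumPy sketch below stacks a masked RGB image with its mask. The function name `make_generator_input` is hypothetical. The notes write the masked image as x⊙m; this sketch assumes m = 1 marks the unknown pixels, so hiding them means multiplying the image by (1 − m).

```python
import numpy as np


def make_generator_input(image, mask):
    """Build the 4-channel input x' from an RGB image and a binary mask.

    image: float array of shape (3, H, W)
    mask:  array of shape (H, W); 1 marks unknown pixels (assumed convention)
    """
    mask = mask.astype(image.dtype)[None]                # (1, H, W)
    masked_image = image * (1.0 - mask)                  # zero out the unknown pixels
    return np.concatenate([masked_image, mask], axis=0)  # (4, H, W)


rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))
m = np.zeros((8, 8))
m[2:6, 2:6] = 1
x_in = make_generator_input(x, m)
print(x_in.shape)  # (4, 8, 8)
```

The generator then maps this (4, H, W) tensor to a (3, H, W) image.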
C-1. Global context within early layers
Fully convolutional models such as ResNet use small kernels (e.g., 3x3) in their early layers, so many layers lack global context, and much computation and many parameters are spent building it up.
This is especially pronounced for high-resolution images.
Fast Fourier convolution (FFC) makes global context available from the early layers.
notion image
FFC
  • The local branch uses conventional convolutions.
  • The global branch uses the FFT.
    • Its purpose is to take global context into account in the early layers.
    • Transforming spatial-domain information into the spectral domain with the FFT brings in global context, because every frequency coefficient depends on every pixel (the FFT re-expresses a time-domain waveform in the frequency domain).
    • Steps
        1. Apply a real 2D FFT to the input tensor
        2. Concatenate the real and imaginary parts
        3. Apply a convolution in the spectral domain
        4. Apply the inverse transform to recover the spatial structure
  • The local and global branches are merged.
    • The network can therefore learn from both spatial and spectral information, i.e., both local and global context.
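The four steps of the global branch can be sketched in NumPy as below. This is a simplified illustration, not the paper's implementation: the function name `spectral_transform`, the random weights, and the orthonormal FFT scaling are assumptions, batch normalization is omitted, and the local branch and the merge are not shown.

```python
import numpy as np


def spectral_transform(x, weight):
    """Global-branch sketch: real FFT -> 1x1 conv on (real, imag) -> inverse FFT.

    x:      real array of shape (C_in, H, W)
    weight: array of shape (2 * C_out, 2 * C_in), a 1x1 conv in the frequency domain
    """
    _, h, w = x.shape
    # 1. Real 2D FFT over the spatial axes -> complex (C_in, H, W // 2 + 1)
    freq = np.fft.rfft2(x, axes=(-2, -1), norm="ortho")
    # 2. Concatenate real and imaginary parts along the channel axis
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # 3. Pointwise (1x1) convolution in the spectral domain, followed by ReLU
    mixed = np.maximum(np.einsum("oc,chw->ohw", weight, stacked), 0.0)
    # 4. Rebuild complex coefficients and invert to recover the spatial structure
    c_out = mixed.shape[0] // 2
    freq_out = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(freq_out, s=(h, w), axes=(-2, -1), norm="ortho")


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
weight = rng.standard_normal((8, 8))  # hypothetical weights, random for illustration
y = spectral_transform(x, weight)

# Perturb one input pixel and measure which output pixels react.
x_perturbed = x.copy()
x_perturbed[0, 0, 0] += 1.0
changed = np.abs(spectral_transform(x_perturbed, weight) - y) > 1e-9
print(y.shape, changed.mean())
```

A change to a single input pixel spreads across the output, which is the image-wide receptive field the notes attribute to the global branch. A 3x3 convolution would only affect the immediate neighbourhood.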
C-2. Loss function
  1. High receptive field perceptual loss
    1. The paper's perceptual loss (the error between the predicted image and the target image, computed on features of a pretrained network with a large receptive field)
    2. notion image
  2. Adversarial loss
    1. notion image
  3. The final loss function
    1. R1 : gradient penalty
    2. DiscPL : discriminator-based perceptual loss (feature matching loss)
    3. notion image
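The formulas survive only as images in these notes. The block below is a hedged reconstruction of how the terms listed above combine, written in the paper's notation as I understand it; the symbols should be checked against the original paper. The weights κ, α, β, γ are left symbolic. φ_HRF denotes the features of the pretrained high-receptive-field network, M a mean over features and layers, and D_ξ the discriminator.

```latex
\begin{aligned}
\mathcal{L}_{\text{HRFPL}}(x, \hat{x}) &= \mathcal{M}\big([\phi_{\text{HRF}}(x) - \phi_{\text{HRF}}(\hat{x})]^2\big) \\
R_1 &= \mathbb{E}_x \lVert \nabla D_\xi(x) \rVert^2 \\
\mathcal{L}_{\text{final}} &= \kappa\, \mathcal{L}_{\text{Adv}} + \alpha\, \mathcal{L}_{\text{HRFPL}} + \beta\, \mathcal{L}_{\text{DiscPL}} + \gamma\, R_1
\end{aligned}
```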
C-3. Generation of masks during training
Training uses a variety of masks, like the ones below, each covering no more than 50% of the image.
notion image
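A minimal sketch of sampling training masks under the 50% cap is shown below. It only draws random rectangles and skips any that would exceed the cap; the shapes used in the paper may be more varied, and the function name and parameters (`random_box_mask`, `max_boxes`, `max_coverage`) are hypothetical.

```python
import numpy as np


def random_box_mask(height, width, rng, max_boxes=5, max_coverage=0.5):
    """Sample a binary mask of random rectangles covering at most `max_coverage` of the image."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for _ in range(int(rng.integers(1, max_boxes + 1))):
        box_h = int(rng.integers(height // 8, height // 2 + 1))
        box_w = int(rng.integers(width // 8, width // 2 + 1))
        top = int(rng.integers(0, height - box_h + 1))
        left = int(rng.integers(0, width - box_w + 1))
        candidate = mask.copy()
        candidate[top:top + box_h, left:left + box_w] = 1
        # Skip any box that would push the masked area past the cap.
        if candidate.mean() <= max_coverage:
            mask = candidate
    return mask


rng = np.random.default_rng(0)
masks = [random_box_mask(64, 64, rng) for _ in range(100)]
print(max(m.mean() for m in masks))  # never exceeds 0.5
```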
D. Experiments
notion image
LaMa has relatively few parameters, yet it performs strongly overall and achieves the best results on segmentation masks in particular.
With FFC, the model is markedly better at restoring repetitive, regular patterns, and it loses less detail when generating high-resolution images.
notion image
notion image
E. Experiments on UAV data
Original image
notion image
Mask
notion image
Result
  • The vehicles' shapes reappear in the masked regions, like afterimages.
notion image
  • The model inpaints without access to the original content of the masked region and was not trained on this photo, so it should have no way to reconstruct the cars.
  • Moreover, red cars come back red and blue cars come back blue. This is probably not a problem of the GAN model itself; more likely the segmentation mask did not cover the objects completely.
  • To check this, the segmentation mask was grown outward one pixel at a time and the inpainting results were compared.
  • Conclusion
    • With the mask grown by 10 pixels, a faint afterimage remains, but it is hard to recognize it as a vehicle.
    • notion image
    • With the mask grown by 15 pixels, the inpainting works well.
    • notion image
    • After segmentation, the mask needs to be expanded so that it fully covers the object.
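Growing a mask by a fixed number of pixels is a binary dilation. The NumPy sketch below grows the mask one pixel per step using a 3x3 (8-connected) neighbourhood. That neighbourhood is an assumption, since the notes do not say which one was used, and `dilate_mask` is an illustrative name. Libraries such as OpenCV (`cv2.dilate`) or SciPy (`scipy.ndimage.binary_dilation`) provide the same operation.

```python
import numpy as np


def dilate_mask(mask, pixels):
    """Grow a binary mask outward by `pixels` steps of 3x3 (8-connected) dilation."""
    out = mask.astype(bool)
    rows, cols = out.shape
    for _ in range(pixels):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel is set if any pixel in its 3x3 neighbourhood was set.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + rows, dx:dx + cols]
        out = grown
    return out


mask = np.zeros((41, 41), dtype=bool)
mask[20, 20] = True
for pixels in (10, 15):
    print(pixels, int(dilate_mask(mask, pixels).sum()))  # 10 -> 441, 15 -> 961
```

In the experiment above, the dilated segmentation mask would be passed to the inpainting model in place of the raw one.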