📑Paper Review

[paper review] Going deeper with convolutions

date
Jul 15, 2023
slug
going-deeper
author
status
Public
tags
paper
DeepLearning
summary
Keyword : GoogLeNet, Inception module, Google
type
Post
thumbnail
category
📑Paper Review
updatedAt
Sep 6, 2024 03:18 PM
notion image
Going deeper beats going wider.
Is increasing the channel count of a conv layer a "wide" change or a "deep" change?? To me it looks like a wide one.
Dense: like an MLP, every input node is fully connected to every output node.
Sparse: like a conv layer (or maybe dropout??), not fully connected; some of the connections simply don't exist.
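A toy numpy sketch of the dense-vs-sparse distinction above (my own illustration, not from the paper): a sparse layer is a dense weight matrix with most connections masked to zero.

```python
import numpy as np

# Toy illustration: dense = every input connected to every output;
# sparse = the same layer with most connections masked out.
rng = np.random.default_rng(0)
x = rng.normal(size=4)               # 4 input nodes
W_dense = rng.normal(size=(3, 4))    # 3 output nodes, fully connected

mask = np.array([[1, 0, 0, 1],       # hypothetical sparsity pattern
                 [0, 1, 1, 0],
                 [0, 0, 1, 1]])
W_sparse = W_dense * mask            # most weights become exactly 0

y_dense = W_dense @ x                # every input affects every output
y_sparse = W_sparse @ x              # each output only sees 2 inputs
sparsity = 1 - mask.mean()           # fraction of absent connections
```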
Couldn't we train only the strong nodes in a filter and exclude the unremarkable ones (weights near 0)? Selecting at the node level doesn't seem easy, which is probably why the authors group at the filter level and feed that into dense connections. Even if it were possible, training that way seems like it would overfit; the "featureless" nodes probably aren't truly useless…
Is stacking layers really the optimal scheme? Do real neurons (the human brain) work this way?? Or is the brain more like a tangled graph, where every neuron is connected and all neurons update simultaneously (in which case there would be no gradient vanishing problem)? Could we build a model like that??
GoogLeNet (Inception v1)
1. Introduction
- Winner of ILSVRC 2014 (ImageNet Large-Scale Visual Recognition Challenge), beating VGGNet by a small margin.
- This result came not from bigger models or hardware but from new ideas, algorithms, and network architecture.
- Strong performance was achieved with relatively few parameters.
- The past trend had been ever more layers and ever larger layer sizes.
2. Related work
- LeNet-5: starting with this model, CNNs became the standard in vision.
- Serre et al.: inspired by neuroscience models of the visual cortex (similar in spirit to the Inception module), they used Gabor filters of various sizes to handle multiple scales.
* Gabor filter: a linear filter that behaves much like the human visual system, used to detect components of a specific orientation in an image.
notion image
- Network in Network: inspired by NiN, the authors introduced 1 × 1 conv layers.
* The 1 × 1 conv layer serves two purposes:
** a dimension-reduction module that removes computational bottlenecks
** increasing depth and width without hurting performance. Why doesn't the dimension reduction hurt much? A conv adds nonlinearity after the convolution (the activation function) but adds no nonlinearity to the nodes before convolving; a 1 × 1 conv, however, can add nonlinearity per node (fully connect across channels, then pass through the activation function).
- R-CNN: two-stage object detection. The authors improved on it slightly by applying multi-box prediction and an ensemble approach for bbox proposals.
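The bottleneck effect of the 1 × 1 conv can be checked with back-of-the-envelope multiply counts. The layer sizes below (28×28×192 input, reduction to 16 channels, 32 output channels) are illustrative numbers in the spirit of the paper's module, not figures quoted from it:

```python
# Multiply count for a 5x5 conv, with and without a 1x1 "bottleneck"
# that first reduces 192 channels down to 16.
H = W = 28
C_in, C_mid, C_out = 192, 16, 32

direct = H * W * C_out * (5 * 5 * C_in)       # 5x5 conv straight on 192 ch
reduce_ = H * W * C_mid * (1 * 1 * C_in)      # 1x1 reduction to 16 ch
conv5 = H * W * C_out * (5 * 5 * C_mid)       # 5x5 conv on the reduced input
bottleneck = reduce_ + conv5

ratio = direct / bottleneck                   # roughly a 10x saving here
```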
3. Motivation and high level considerations
- Naively, to make a DNN perform well you could just train a large model on a large dataset, but that has two drawbacks; growing the size (parameter count) without bound is no great feat.
- If the training set is small, you immediately overfit.
- Computational resources are limited. If you uniformly increase the filter count in two chained layers, computation grows quadratically. And if many parameters have weights near 0, that computation is being spent inefficiently.
- The fix is to switch from fully connected to sparsely connected structures, even inside the convolutions. The theory introduced in Arora et al. has a firm advantage: if the dataset's probability distribution is representable by a large, sparse DNN, the optimal network can be constructed by statistically analyzing the correlations between the neurons of the input layer and those of the output layer.
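The quadratic blow-up mentioned above can be made concrete: the second layer's multiply count scales with (filters in layer 1) × (filters in layer 2), so uniformly doubling both filter counts quadruples its cost. All numbers below are arbitrary, for illustration only:

```python
# Multiply count of the second of two chained conv layers:
# spatial size x output channels x (kernel area x input channels).
def second_layer_mults(c1, c2, k=3, h=14, w=14):
    return h * w * c2 * (k * k * c1)

base = second_layer_mults(64, 64)        # both layers at 64 filters
doubled = second_layer_mults(128, 128)   # both layers doubled
# doubled == 4 * base: cost grows with the product c1 * c2
```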
notion image
Left: sparse, right: dense.
- The mathematical proof requires strong conditions to hold. However, the idea underlying the Hebbian principle (if neuron A repeatedly fires at B, i.e. keeps the salivation signal on, A's weight toward B increases) applies under less strict conditions too.
notion image
- But today's computing infrastructure is inefficient at computing over non-uniform sparse data. Even if arithmetic operations drop by a factor of 100 or more, the lookup overhead and cache misses mean that switching to sparse matrices does not pay off. And as steadily improving, highly tuned numerical libraries exploit CPU and GPU hardware ever better, the gap (in compute efficiency between dense and sparse) only widens. Non-uniform sparse models also demand more sophisticated engineering and computing infrastructure. Most vision models use CNNs to exploit sparsity in the spatial domain, but early on the convolutions themselves were implemented as collections of dense connections over patches. ??ConvNets originally used sparse connections to break symmetry and improve learning, but switched back to full connections to better optimize parallel computing.?? What does that even mean?? Structural uniformity, many filters, and large batch sizes made efficient dense computation possible.
- This raises a question, or rather a hope: could we build an architecture that exploits sparsity at the filter level, while keeping dense computation as-is? Much of the literature on sparse matrices clusters a sparse matrix into relatively dense submatrices, with good results.
notion image
- The Inception model started as an experiment to test this sparse structure.
4. Architectural details
- The main idea rests on finding how to approximate an optimal local sparse structure and cover it with readily available dense components. Find the optimal local construction and repeat it spatially. In other words: cluster the sparse matrix into relatively dense submatrices.
- Assume that each unit of an earlier layer corresponds to some region of the input image, and that these units are grouped into filter banks. The lower the layer, the more it concentrates on local regions. This ends in many clusters concentrated on a single region (probably best read as "feature"), and those can be covered by 1 × 1 convolutions in the next layer.
- The concern: relative to a large patch there may also be a smaller number of more spatially spread-out units, and conversely the number of patches covering a large region may have to decrease (small patches can't capture such features effectively, so only larger ones get used). Filters over wider regions are needed to keep the ratio of correlated units (correlated units per kernel) high.
- To handle this the authors restricted filter sizes to 1 × 1, 3 × 3, and 5 × 5 (purely for convenience).
- The filter banks' outputs are concatenated and fed as input to the next layer.
- Since higher layers capture features of higher abstraction, their spatial concentration decreases, which means the 3 × 3 and 5 × 5 filters should be used more as we go up.
- The big problem: 5 × 5 convolutions are very expensive, and adding pooling units makes it worse; even if this realizes the optimal sparse structure, it runs far too inefficiently. This is where the second idea comes in.
- We want to keep our representation in most places and compress the signal only where it must be compressed. That is, the 1 × 1 convs are placed before the expensive 3 × 3 and 5 × 5 layers. Beyond dimension reduction, putting a ReLU there lets the information in the low-dimensional features be used nonlinearly.
notion image
ย 
- For memory efficiency, Inception modules are best used in the higher layers.
- The Inception module has two advantages:
1. The number of units at each stage can be increased without an explosion in computation.
2. Visual information is processed at multiple scales and then aggregated, so the next layer can process features from multiple scales at once.
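The multi-scale aggregation is, at the shape level, just a channel-wise concatenation of parallel branches that all preserve the spatial size. A numpy sketch (the channel counts follow the inception(3a) row of the paper's table, but the tensors are random stand-ins, not real conv outputs):

```python
import numpy as np

# Four parallel branches (1x1, 3x3, 5x5, pool-projection) each emit a
# (C, H, W) feature map with the same H, W; the module output is their
# concatenation along the channel axis.
H = W = 28
branch_channels = (64, 128, 32, 32)          # inception(3a) channel counts
branches = [np.zeros((c, H, W)) for c in branch_channels]
out = np.concatenate(branches, axis=0)       # channel-wise concat
# out.shape == (256, 28, 28): 64 + 128 + 32 + 32 channels
```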
  1. GoogLeNet
notion image
part 1) the lower layers, close to the input image
- A standard CNN stem is used; Inception layers are reserved for the higher layers for memory efficiency.
notion image
part 2) the Inception module
- A 1 × 1 conv comes first to reduce dimensions, then the 3 × 3 and 5 × 5 layers cluster the sparse nodes, and the results are concatenated at the end.
notion image
part 3) the auxiliary classifiers
- The deeper the model, the worse the gradient vanishing problem, so auxiliary classifiers were attached to pass gradients back from shallower layers. To keep them from overly influencing the final result, their losses are added to the total loss with a weight below 1 (e.g. 0.3).
notion image
part 4) the end of the model
- Average pooling here means GAP (global average pooling): each feature map from the previous layer is averaged into one element of a 1-D vector. With no weight multiplications it can return one value per class, and it makes fine-tuning on other label sets easier. GAP beat an FC layer by 0.6% in accuracy, but dropout is still necessary.
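GAP itself is just a spatial mean per channel. A tiny numpy example on a (C, H, W) tensor:

```python
import numpy as np

# Global average pooling: average each feature map over its spatial
# dimensions, turning (C, H, W) into a length-C vector with no weights.
fmap = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
gap = fmap.mean(axis=(1, 2))   # shape (2,): one scalar per channel
```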
notion image
6. Training methodology
- optimizer: SGD with 0.9 momentum
- learning rate: decreased by 4% every 8 epochs
- patches of various sizes, 8% to 100% of the image area, were sampled with an aspect ratio of 3/4 or 4/3. ??What does an aspect ratio of 3/4 mean for a patch? Aren't patches only 3×3 or 5×5??? (Here "patch" means an image crop for augmentation, not a filter.)
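The schedule above, decreasing the learning rate by 4% every 8 epochs, is a step decay. A sketch (the initial learning rate 0.01 is my assumption; the paper's notes here don't state it):

```python
# Step-decay schedule: multiply the lr by 0.96 once every 8 epochs.
def lr_at(epoch, lr0=0.01, decay=0.96, every=8):
    return lr0 * decay ** (epoch // every)

lrs = [lr_at(e) for e in (0, 7, 8, 16)]   # flat within each 8-epoch block
```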
7. Conclusion
- Instead of fully connected layers whose weights are partly used meaninglessly, Inception implements sparse connectivity at the filter level (finding the optimal local sparse structure). In other words, it approximates a sparse structure with dense components.
The larger the region of the input image a filter covers, the more correlated nodes there are.
Compared to a large image patch, information embedded in a lower dimension may hold relatively more information, but embedding information in a dense, compressed form is hard.
Added 22.10.30: splitting the conv layer into separate 3×3 and 5×5 paths is what secures the sparse network.