28 July 2020
Origin: YOLO9000: Better, Faster, Stronger
a. Train on ImageNet (224 x 224)
b. Resize & Finetune on ImageNet (448 x 448)
c. Finetune on dataset
d. The final output is a 13 x 13 grid (7 x 7 grid in YOLO)
a. Lower-level features are concatenated directly to higher-level features
b. A new layer is added for that purpose: reorg
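A minimal sketch of what the reorg (passthrough) layer does, assuming it behaves as a plain space-to-depth rearrangement with stride 2. The 26 x 26 x 512 to 13 x 13 x 2048 shapes are the ones quoted in the paper. The 1024-channel coarse map and the channel ordering are assumptions for illustration, and the ordering is not guaranteed to match Darknet's implementation.

```python
import numpy as np

def reorg(x: np.ndarray, stride: int = 2) -> np.ndarray:
    """Space-to-depth: (C, H, W) -> (C * stride**2, H // stride, W // stride)."""
    c, h, w = x.shape
    assert h % stride == 0 and w % stride == 0, "H and W must be divisible by stride"
    # Split each spatial axis into (coarse index, offset inside the stride block).
    x = x.reshape(c, h // stride, stride, w // stride, stride)
    # Move the two in-block offsets in front of the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * stride * stride, h // stride, w // stride)

# Fine-grained (lower) features and coarse (higher) features, channels first.
fine = np.random.rand(512, 26, 26).astype(np.float32)
coarse = np.random.rand(1024, 13, 13).astype(np.float32)  # assumed channel count

# After reorg both maps are 13 x 13, so they can be concatenated on channels.
merged = np.concatenate([reorg(fine), coarse], axis=0)
print(merged.shape)
```

No values are dropped: each 2 x 2 spatial block is moved into 4 channels, which is why the channel count grows by a factor of 4.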
Can accept inputs of any size, which improves model robustness.
[border % 32 = 0, as required by the 32x downsampling]
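A small sketch of how the input size could be sampled under the `border % 32 = 0` constraint. The range 320 to 608 and the switch every 10 batches follow the paper's description of multi-scale training. The helper names `valid_sizes` and `grid_size` are hypothetical.

```python
import random

STRIDE = 32  # total downsampling factor of the network

def valid_sizes(lo: int = 320, hi: int = 608, stride: int = STRIDE) -> list:
    """All square input sizes in [lo, hi] that are multiples of the stride."""
    return list(range(lo, hi + 1, stride))

def grid_size(input_size: int, stride: int = STRIDE) -> int:
    """Side length of the output grid for a given input size."""
    assert input_size % stride == 0, "input size must be a multiple of the stride"
    return input_size // stride

rng = random.Random(0)
for batch in range(30):
    if batch % 10 == 0:  # pick a new resolution every 10 batches
        size = rng.choice(valid_sizes())
        print(f"batch {batch}: input {size} x {size}, grid {grid_size(size)} x {grid_size(size)}")
```

With a 416 x 416 input, `grid_size(416)` gives the 13 x 13 grid mentioned above.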
Box Generation | # | Avg IOU |
---|---|---|
Cluster SSE | 5 | 58.7 |
Cluster IOU | 5 | 61.0 |
Anchor Boxes | 9 | 60.9 |
Cluster IOU | 9 | 67.2 |
a. Faster R-CNN: 9, picked by hand
b. YOLOv2: 5 by K-Means [dist: 1 − IOU(bbox, cluster)]
[10 numbers: ($a_{w_{i}}, a_{h_{i}}$) * 5]
anchors[0] = $a'_{w_{0}}$ = $\frac{a_{w_{0}}}{W}$ * 13
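The anchor clustering can be sketched roughly as follows: k-means over (w, h) pairs with distance 1 − IOU, then rescaling to 13 x 13 grid units. This is an illustrative version, not the authors' code. The synthetic boxes, the random initialisation, the mean-based centroid update, and the names `iou_wh`, `kmeans_anchors` and `to_grid_units` are all assumptions. Rescaling the height by the image height H is assumed by analogy with the width formula.

```python
import numpy as np

def iou_wh(boxes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """IOU between (N, 2) boxes and (K, 2) centroids, all anchored at the same corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes: np.ndarray, k: int = 5, iters: int = 100, seed: int = 0) -> np.ndarray:
    """K-means on (w, h) with distance d = 1 - IOU(bbox, cluster)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def to_grid_units(anchors_px: np.ndarray, img_w: float, img_h: float, grid: int = 13) -> np.ndarray:
    """Rescale pixel anchors to grid units: a_w / W * 13 and a_h / H * 13."""
    return anchors_px / np.array([img_w, img_h]) * grid

# Synthetic (w, h) pairs in pixels, standing in for the dataset's ground-truth boxes.
boxes = np.random.default_rng(1).uniform(10, 400, size=(200, 2))
anchors = to_grid_units(kmeans_anchors(boxes, k=5), img_w=416, img_h=416)
print(anchors.round(2))  # 5 rows of (w, h): the 10 numbers
```

Using 1 − IOU instead of Euclidean distance keeps large boxes from dominating the error, since only the overlap ratio matters.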
one grid cell: $S^2$ * B * [x, y, w, h, $C_{0}, ..., C_{N}$]
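A quick size check for the output layout, under one reading of the line above: each box carries x, y, w, h, one objectness score and N class scores, giving 5 + N values per box. The VOC numbers (13 x 13 grid, 5 anchors, 20 classes) are used as the example. The function names are hypothetical.

```python
def values_per_cell(num_boxes: int, num_classes: int) -> int:
    """Values predicted by one grid cell: B * (x, y, w, h, objectness + N classes)."""
    return num_boxes * (5 + num_classes)

def total_boxes(grid: int, num_boxes: int) -> int:
    """Number of boxes predicted for the whole image: S^2 * B."""
    return grid * grid * num_boxes

S, B, N = 13, 5, 20  # grid size, anchors per cell, VOC classes
print(values_per_cell(B, N))  # channels of the final feature map
print(total_boxes(S, B))      # boxes predicted per image
```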
 | YOLO | | | | | | | | YOLOv2 |
---|---|---|---|---|---|---|---|---|---|
batch norm? | | √ | √ | √ | √ | √ | √ | √ | √ |
hi-res classifier? | | | √ | √ | √ | √ | √ | √ | √ |
convolutional? | | | | √ | √ | √ | √ | √ | √ |
anchor boxes? | | | | √ | √ | | | | |
new network? | | | | | √ | √ | √ | √ | √ |
dimension priors? | | | | | | √ | √ | √ | √ |
location prediction? | | | | | | √ | √ | √ | √ |
passthrough? | | | | | | | √ | √ | √ |
multi-scale? | | | | | | | | √ | √ |
hi-res detector? | | | | | | | | | √ |
VOC2007 mAP | 63.4 | 65.8 | 69.5 | 69.2 | 69.6 | 74.4 | 75.4 | 76.8 | 78.6 |