Deep Residual Learning for Image Recognition
Paper - arXiv
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research)
Hypothesis / Main methods:
When deeper networks are able to start converging, the degradation problem occurs: the accuracy gets saturated and degrades rapidly. It is easier to optimize / fine tune the residual effect of each layer on the previous layer. Thus in their architecture, each layer learns the residual to the input.
The output is element-wise addition of input x
to F(x)
. F(x)
is the residual.