TY - JOUR
T1 - Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
AU - Li, Yuqing
AU - Luo, Tao
AU - Yip, Nung Kwan
JO - CSIAM Transactions on Applied Mathematics
VL - 4
SP - 692
EP - 760
PY - 2022
DA - 2022/11
IS - 3
DO - 10.4208/csiam-am.SO-2021-0053
UR - https://global-sci.org/intro/article_detail/csiam-am/21154.html
KW - Residual networks
KW - Training process
KW - Neural tangent kernel
KW - Neural tangent hierarchy
AB - Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of
a network trained by gradient descent in the infinite-width limit can be described by the
Neural Tangent Kernel (NTK) introduced in [25]. In this paper, we study the dynamics of
the NTK for a finite-width deep residual network (ResNet) using the neural tangent
hierarchy (NTH) proposed in [24]. For a ResNet with a smooth and Lipschitz activation
function, we reduce the requirement on the layer width $m$ with respect to the number
of training samples $n$ from quartic to cubic. Our analysis strongly suggests that the
particular skip-connection structure of ResNet is the main reason for its advantage over
fully connected networks.