Volume 1, Issue 1
Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks

Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo & Zhi-Qin John Xu

J. Mach. Learn., 1 (2022), pp. 60-113.

Published online: 2022-03

Category: Theory

  • Abstract

We prove a general Embedding Principle for the loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape: the loss landscape of an NN contains all critical points of all narrower NNs. This result is obtained by constructing a class of critical embeddings, which map any critical point of a narrower NN to a critical point of the target NN with the same output function. By discovering a wide class of general compatible critical embeddings, we provide a rough estimate of the dimension of the critical submanifolds embedded from critical points of narrower NNs. We further prove an irreversibility property of any critical embedding: the number of negative/zero/positive eigenvalues of the Hessian at a critical point may increase but never decrease as an NN becomes wider through the embedding. Using a special realization of the general compatible critical embedding, we prove a stringent necessary condition for being a “truly-bad” critical point, i.e., one that never becomes a strict-saddle point through any critical embedding. This result implies that strict-saddle points are commonplace in wide NNs, which may be an important reason underlying the easy optimization of wide NNs widely observed in practice.

  • General Summary

Understanding the loss landscape of a deep neural network is important for analyzing its training trajectory and generalization performance. This work proposes a new approach to this problem: examining the (embedding) relation between the loss landscapes of neural networks of different widths. The paper proves a general Embedding Principle, namely that the loss landscape of a neural network "contains" all critical points of the landscapes of narrower neural networks. The paper also demonstrates that (i) critical points embedded from narrower neural networks form submanifolds, and (ii) a local minimum can become a strict-saddle point in the landscape of a wider neural network, but not vice versa.
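The simplest well-known instance of such an embedding is neuron splitting: duplicate one hidden neuron of the narrow network and split its outgoing weight between the two copies. The toy two-layer network and variable names below are our own; this is a minimal sketch of that one instance, not the paper's general construction.

```python
import numpy as np

# Toy two-layer network: f(x) = a^T tanh(W x), with m hidden neurons.
def forward(W, a, x):
    return a @ np.tanh(W @ x)

rng = np.random.default_rng(0)

# Narrow network with m = 3 hidden neurons and 2-dimensional input.
W = rng.normal(size=(3, 2))   # input weights
a = rng.normal(size=3)        # output weights

# Splitting embedding: duplicate hidden neuron 0 (same input weights)
# and split its outgoing weight a[0] into t*a[0] and (1 - t)*a[0].
t = 0.3
W_wide = np.vstack([W, W[0:1]])         # wider net: 4 hidden neurons
a_wide = np.concatenate([a, [0.0]])
a_wide[0], a_wide[3] = t * a[0], (1 - t) * a[0]

# The wider network realizes exactly the same output function,
# so any loss evaluated on data is unchanged at the embedded point.
for _ in range(5):
    x = rng.normal(size=2)
    assert np.isclose(forward(W, a, x), forward(W_wide, a_wide, x))
print("output function preserved under the splitting embedding")
```

Because the output function (and hence the empirical loss) is preserved for every choice of the split parameter t, each critical point of the narrow network embeds into a one-parameter family of points in the wider network, which is the intuition behind the critical submanifolds discussed above.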

  • Keywords

Neural network, Loss landscape, Critical point, Embedding principle.

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTex
  • RIS
  • TXT
@Article{JML-1-60,
  author  = {Zhang, Yaoyu and Li, Yuqing and Zhang, Zhongwang and Luo, Tao and Xu, Zhi-Qin John},
  title   = {Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks},
  journal = {Journal of Machine Learning},
  year    = {2022},
  volume  = {1},
  number  = {1},
  pages   = {60--113},
  issn    = {2790-2048},
  doi     = {https://doi.org/10.4208/jml.220108},
  url     = {http://global-sci.org/intro/article_detail/jml/20372.html}
}
TY  - JOUR
T1  - Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks
AU  - Zhang, Yaoyu
AU  - Li, Yuqing
AU  - Zhang, Zhongwang
AU  - Luo, Tao
AU  - Xu, Zhi-Qin John
JO  - Journal of Machine Learning
VL  - 1
SP  - 60
EP  - 113
PY  - 2022
DA  - 2022/03
SN  - 2790-2048
DO  - 10.4208/jml.220108
UR  - https://global-sci.org/intro/article_detail/jml/20372.html
KW  - Neural network, Loss landscape, Critical point, Embedding principle
ER  -

Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo & Zhi-Qin John Xu. (2022). Embedding Principle: A Hierarchical Structure of Loss Landscape of Deep Neural Networks. Journal of Machine Learning. 1 (1). 60-113. doi:10.4208/jml.220108