In this work, we study the relationship between deep and shallow neural networks (NNs), focusing on the critical points of their loss landscapes. We discover
an embedding principle in depth: the loss landscape of an NN "contains" all critical
points of the loss landscapes of shallower NNs. The key tool for our discovery is the
critical lifting, which maps any critical point of a network to critical manifolds of any
deeper network while preserving the outputs. To investigate the practical implications
of this principle, we conduct a series of numerical experiments. The results confirm
that deep networks do encounter these lifted critical points during training, leading to
similar training dynamics across varying network depths. We provide theoretical and
empirical evidence that the lifting operation increases the degeneracy of the lifted
critical points. This principle also provides insights into the optimization benefits of batch normalization and larger datasets, and enables practical applications such as
network layer pruning. Overall, our discovery of the embedding principle in depth
uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which
serves as a solid foundation for further study of the role of depth in deep NNs.