Volume 3, Issue 2
Approximation Results for Gradient Flow Trained Neural Networks

Gerrit Welper

J. Mach. Learn., 3 (2024), pp. 107-175.

Published online: 2024-06

[An open-access article; the PDF is free to any online user.]

  • Abstract

The paper contains approximation guarantees for neural networks that are trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets that are Sobolev smooth. The networks are fully connected, of constant depth and increasing width. We show gradient flow convergence based on a neural tangent kernel (NTK) argument for the non-convex optimization of the second-to-last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, made possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in the form of a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.
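
For readers unfamiliar with the setting, the following is a minimal illustrative sketch and not the paper's construction. It assumes a two-layer ReLU network (rather than the deeper networks treated in the paper), trains only the hidden weights with the outer weights frozen, replaces the continuous $L_2(\mathbb{S}^{d-1})$ error by an average over random sample points on the sphere, and approximates gradient flow by forward Euler steps. The target function, the width, the step size, and all function names (`sample_sphere`, `forward`, `grad_hidden`, `empirical_ntk`, `run_demo`) are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def forward(W, a, X):
    """Two-layer ReLU network f(x) = m^{-1/2} * sum_r a_r * relu(w_r . x)."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])


def grad_hidden(W, a, X, residual):
    """Gradient of 0.5 * mean(residual**2) with respect to the hidden weights W."""
    n, m = X.shape[0], W.shape[0]
    active = (X @ W.T > 0).astype(float)          # ReLU derivative, shape (n, m)
    coeff = residual[:, None] * active * a / np.sqrt(m)
    return coeff.T @ X / n                         # shape (m, d)


def empirical_ntk(W, a, X):
    """Empirical NTK of the hidden layer: K_ij = <df(x_i)/dW, df(x_j)/dW>."""
    weighted = (X @ W.T > 0).astype(float) * a     # shape (n, m)
    return (weighted @ weighted.T) * (X @ X.T) / W.shape[0]


def run_demo(d=3, m=256, n=50, step=0.5, n_steps=500, seed=0):
    """Train the hidden layer by discretized gradient flow; return losses and NTK."""
    rng = np.random.default_rng(seed)
    X = sample_sphere(n, d, rng)
    y = X[:, 0] * X[:, 1]                          # assumed smooth target on the sphere
    W = rng.standard_normal((m, d))                # trained hidden weights
    a = rng.choice([-1.0, 1.0], size=m)            # frozen outer weights
    K0 = empirical_ntk(W, a, X)                    # NTK at initialization

    initial_loss = 0.5 * np.mean((forward(W, a, X) - y) ** 2)
    for _ in range(n_steps):
        residual = forward(W, a, X) - y
        W -= step * grad_hidden(W, a, X, residual)  # forward Euler step of gradient flow
    final_loss = 0.5 * np.mean((forward(W, a, X) - y) ** 2)
    return initial_loss, final_loss, K0


if __name__ == "__main__":
    init_loss, final_loss, K0 = run_demo()
    print("initial loss:", init_loss)
    print("final loss:  ", final_loss)
    print("smallest NTK eigenvalue:", np.linalg.eigvalsh(K0).min())
```

In this toy setting the sampled residual evolves, to first order, under the empirical kernel matrix `K0`, so the spectrum of `K0` indicates how quickly each component of the error decays. The paper's analysis concerns the continuous norm and the under-parametrized regime, which this sampled sketch does not reproduce.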

  • Copyright

COPYRIGHT: © Global Science Press

@Article{JML-3-107,
  author  = {Welper, Gerrit},
  title   = {Approximation Results for Gradient Flow Trained Neural Networks},
  journal = {Journal of Machine Learning},
  year    = {2024},
  volume  = {3},
  number  = {2},
  pages   = {107--175},
  issn    = {2790-2048},
  doi     = {https://doi.org/10.4208/jml.230924},
  url     = {http://global-sci.org/intro/article_detail/jml/23210.html}
}
TY  - JOUR
T1  - Approximation Results for Gradient Flow Trained Neural Networks
AU  - Welper, Gerrit
JO  - Journal of Machine Learning
VL  - 3
IS  - 2
SP  - 107
EP  - 175
PY  - 2024
DA  - 2024/06
SN  - 2790-2048
DO  - http://doi.org/10.4208/jml.230924
UR  - https://global-sci.org/intro/article_detail/jml/23210.html
KW  - Deep neural networks
KW  - Approximation
KW  - Gradient descent
KW  - Neural tangent kernel
ER  -

Gerrit Welper. (2024). Approximation Results for Gradient Flow Trained Neural Networks. Journal of Machine Learning. 3 (2). 107-175. doi:10.4208/jml.230924