Volume 1, Issue 4
A Mathematical Framework for Learning Probability Distributions

Hongkang Yang

J. Mach. Learn., 1 (2022), pp. 373-431.

Published online: 2022-12

Category: Theory

[An open-access article; the PDF is free to any online user.]

  • Abstract

The modeling of probability distributions, specifically generative modeling and density estimation, has become an immensely popular subject in recent years by virtue of its outstanding performance on sophisticated data such as images and texts. Nevertheless, a theoretical understanding of its success is still incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to be exactly the same as the empirical distribution of the finite samples, whereas in practice, the trained model can generate new samples or estimate the likelihood of unseen samples. Moreover, the overwhelming diversity of distribution learning models calls for a unified perspective on this subject. This paper provides a mathematical framework from which all the well-known models can be derived based on simple principles. To demonstrate its efficacy, we present a survey of our results on the approximation error, training error and generalization error of these models, all of which can be established within this framework. In particular, the aforementioned paradox is resolved by proving that these models enjoy implicit regularization during training, so that the generalization error under early stopping avoids the curse of dimensionality. Furthermore, we provide some new results on landscape analysis and the mode collapse phenomenon.

  • General Summary

The modeling of probability distributions is an important branch of machine learning. It has become popular in recent years thanks to the success of deep generative models in difficult tasks such as image synthesis and conversational text generation. Nevertheless, we still lack a theoretical understanding of the good performance of distribution learning models. One mystery is the following paradox: it is generally inevitable that the model suffers from memorization (converges to the empirical distribution of the training samples) and thus becomes useless, and yet in practice the trained model can generate new samples or estimate the probability density over unseen samples. Meanwhile, the existing models are so diverse that it has become overwhelming for practitioners and researchers to get a clear picture of this fast-growing subject. This paper provides a mathematical framework that unifies all the well-known models, so that they can be systematically derived from simple principles. This framework enables our analysis of the theoretical mysteries of distribution learning, in particular the paradox between memorization and generalization. It is established that the model enjoys implicit regularization during training, so that it approximates the hidden target distribution before eventually turning towards the empirical distribution. With early stopping, the generalization error escapes the curse of dimensionality and thus the model generalizes well.
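The memorization side of this paradox can be reproduced in a deliberately tiny setting. The sketch below is a toy illustration of our own under stated assumptions, not one of the models or analyses in the paper: the target is a hypothetical distribution on K = 40 bins, only n = 20 training samples are drawn, and the "model" is the probability vector q itself, trained from a uniform start by gradient descent on the squared distance to the empirical distribution. Because n < K, at least half of the bins are never observed; training drives their probability to zero, so the training loss vanishes while the cross-entropy against the target (a stand-in for a held-out likelihood, which in practice would be estimated on a validation set) eventually grows without bound. Keeping the iterate with the lowest held-out cross-entropy is a crude form of early stopping. All names and constants (K, N_TRAIN, STEPS, LR, target_distribution, train) are hypothetical.

```python
import math
import random

# Hypothetical toy constants (not from the paper): K bins, N_TRAIN < K samples.
K, N_TRAIN, STEPS, LR = 40, 20, 300, 0.1


def target_distribution(k: int) -> list[float]:
    # Weights 1, 2, 3, 4 repeating; with k = 40 every bin has probability <= 0.04.
    weights = [1 + (i % 4) for i in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def empirical_distribution(samples: list[int], k: int) -> list[float]:
    # Fraction of the training samples that fell into each bin.
    counts = [0] * k
    for s in samples:
        counts[s] += 1
    return [c / len(samples) for c in counts]


def cross_entropy(p: list[float], q: list[float]) -> float:
    # Expected negative log-likelihood of the model q when data come from p.
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)


def train(seed: int = 0) -> list[tuple[int, float, float]]:
    rng = random.Random(seed)
    p = target_distribution(K)
    samples = rng.choices(range(K), weights=p, k=N_TRAIN)
    p_n = empirical_distribution(samples, K)

    q = [1.0 / K] * K  # uniform initialization
    history = []
    for step in range(STEPS + 1):
        train_loss = 0.5 * sum((qk - pk) ** 2 for qk, pk in zip(q, p_n))
        held_out = cross_entropy(p, q)  # stand-in for a validation loss
        history.append((step, train_loss, held_out))
        # Gradient of 0.5 * ||q - p_n||^2 is (q - p_n); the step keeps q on the simplex.
        q = [qk - LR * (qk - pk) for qk, pk in zip(q, p_n)]
    return history


history = train()
best_step, _, best_test = min(history, key=lambda row: row[2])
final_step, final_train, final_test = history[-1]
print(f"best held-out step {best_step}: cross-entropy {best_test:.3f}")
print(f"final step {final_step}: train loss {final_train:.1e}, cross-entropy {final_test:.3f}")
```

Each update is the convex combination (1 - LR) q + LR p_n, so an unseen bin's probability shrinks by a factor of 0.9 per step and its share of the held-out cross-entropy grows linearly with the step count. Since every target bin has probability at most 0.04, the unseen bins carry at least 20% of the target mass, which with these constants forces the final cross-entropy above log K, its value at the uniform start. Whether the held-out loss dips before it climbs depends on the sample, so this toy does not reproduce the implicit-regularization result proved in the paper.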


  • Copyright

COPYRIGHT: © Global Science Press

  • Citation (BibTeX, RIS, TXT)
@Article{JML-1-373,
  author = {Yang, Hongkang},
  title = {A Mathematical Framework for Learning Probability Distributions},
  journal = {Journal of Machine Learning},
  year = {2022},
  volume = {1},
  number = {4},
  pages = {373--431},
  issn = {2790-2048},
  doi = {10.4208/jml.221202},
  url = {http://global-sci.org/intro/article_detail/jml/21298.html}
}
TY  - JOUR
T1  - A Mathematical Framework for Learning Probability Distributions
AU  - Yang, Hongkang
JO  - Journal of Machine Learning
VL  - 1
IS  - 4
SP  - 373
EP  - 431
PY  - 2022
DA  - 2022/12
SN  - 2790-2048
DO  - 10.4208/jml.221202
UR  - https://global-sci.org/intro/article_detail/jml/21298.html
KW  - Generative modeling
KW  - Density estimation
KW  - Generalization error
KW  - Memorization
KW  - Implicit regularization
ER  -

Hongkang Yang. (2022). A Mathematical Framework for Learning Probability Distributions. Journal of Machine Learning. 1 (4). 373-431. doi:10.4208/jml.221202