In this paper, we study the combination of compression and the Bayesian elastic net. By incorporating a
compression operation into the ℓ1 and ℓ2 regularization, the assumption of model sparsity is relaxed to
compressibility: model coefficients are compressed before being penalized, and sparsity is achieved in a
compressed domain rather than in the original space. We focus on the design of compression operations,
through which various compressibility assumptions and inductive biases can be encoded. We show that the use
of a compression operation also provides an opportunity to leverage auxiliary information from various sources.
The compressible Bayesian elastic net has two further major advantages. First, as a Bayesian method, it yields
straightforward distributional results on the estimates, which makes statistical inference easier. Second, it
chooses the two penalty parameters simultaneously, avoiding the “double shrinkage problem” of the elastic
net method. We conduct extensive experiments on brain-computer interfacing, handwritten character
recognition, and text classification. Empirical results show clear improvements in prediction performance from
including compression in the Bayesian elastic net. We also analyze the learned model coefficients under
appropriate compressibility assumptions, which further demonstrates the advantages of learning compressible
models instead of sparse models.
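As a minimal sketch of the idea (assuming a squared-error loss and a linear compression operator W, both of which are illustrative choices rather than the exact formulation developed later in the paper), the compressible elastic net criterion can be written as
\[
\hat{\beta} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 \;+\; \lambda_1 \|W\beta\|_1 \;+\; \lambda_2 \|W\beta\|_2^2 ,
\]
where X and y denote the design matrix and response, and λ1, λ2 are the two penalty parameters. When W is the identity this reduces to the standard elastic net; a nontrivial W (for example, a wavelet or differencing transform) moves the sparsity assumption from β itself to its compressed representation Wβ.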