February 2025 Digest
Jan 30, 2025 4:50 pm
Upcoming Office Hours
What's more romantic than really good statistical methodology? I'll be hosting open office hours on Tuesday, February 11th starting at 1:30 PM EST, with a VOD available immediately afterwards, https://www.patreon.com/posts/120148236. Drop your questions in the comments or send them to the email address included below.
Recent Writing
They say that you can’t account for taste, but they don’t say anything about modeling it. In my latest case study I model some of the Netflix Prize data to inform movie recommendations with a Bayesian flair.
HTML: https://betanalpha.github.io/assets/chapters_html/ratings.html
PDF: https://betanalpha.github.io/assets/chapters_pdf/ratings.pdf
The customer reviews in the Netflix Prize data set provide a nice example of ordinal data. I use cut point modeling techniques to model not only the reviews but also how the reviews tend to vary across customers. This allows for recommendations that account for customers that are particularly generous or austere with their stars.
I even go so far as to model the heterogeneous cut points hierarchically. This is then paired with a multivariate normal hierarchical model for the customer movie preferences. The final model is so hierarchical it could be the basis of a late-stage capitalist society.
While this is mostly a demonstration of various modeling techniques, I do spend some time at the end discussing computational scaling issues and what strategies one might consider to go beyond the small subset of data, and tens of thousands of parameters, that I considered here.
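As a rough illustration of the cut point construction at the heart of the model (a minimal sketch in Python, not the exact parameterization from the case study), ordinal rating probabilities can be built by slicing a latent customer affinity with customer-specific cut points:

```python
import numpy as np
from scipy.stats import norm

def ordinal_probs(affinity, cut_points):
    # Interior cut points slice the latent scale into K = len(cut_points) + 1
    # ordinal categories; the affinity shifts where the probability mass falls.
    cdfs = norm.cdf(cut_points - affinity)
    padded = np.concatenate(([0.0], cdfs, [1.0]))
    return np.diff(padded)

# Same underlying affinity, different reviewer temperaments
generous = ordinal_probs(0.5, np.array([-2.0, -1.0, 0.0, 1.0]))
austere  = ordinal_probs(0.5, np.array([ 0.0,  1.0, 2.0, 3.0]))
print(generous.round(3))  # mass concentrated on higher ratings
print(austere.round(3))   # mass concentrated on lower ratings
```

Shifting a customer's cut points up or down captures generous or austere reviewers without changing the underlying movie preferences.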
Consulting and Training
If you are interested in consulting or training engagements or even commissioning me to create a presentation on a topic of interest then don’t hesitate to reach out to me at inquiries@symplectomorphic.com.
Probabilistic Modeling Discord
I have a Discord server dedicated to discussion of (narrative) generative modeling of all kinds, https://discord.gg/QKmrk4hy.
Support Me on Patreon
If you would like to support my writing then consider becoming a patron, https://www.patreon.com/betanalpha.
Recent Rants
On Marginal Posterior Density Functions
In my nightmares.
I am surrounded.
By wiggly density function estimators.
Let’s discuss why we can never actually construct marginal posterior density functions in practical Bayesian inference.
Firstly, why are we so drawn to marginal posterior density functions? I place the blame on how we are so often taught Bayesian inference in the first place.
Most introductory presentations of Bayesian inference end with the construction of a one- or two-dimensional posterior density function p(theta | tilde{y} ), a function that can conveniently be fully visualized with, for example, a graph or contour plot. These visualizations can communicate all of the qualitative features of the posterior distribution at the same time. With a proper interpretation* they are both elegant and compelling representations of inferential uncertainty.
*For discussion on how to properly interpret probability density functions see https://betanalpha.github.io/assets/chapters_html/density_functions.html#sec:visualizing.
Unfortunately most Bayesian analyses in practice have to consider more than a few parameters. In these settings the simplicity of the low-dimensional demonstrations evaporates away.
To be clear, Bayes’ theorem holds in any dimension. Given a prior density function p(theta) and a likelihood function p( tilde{y} | theta) we can immediately construct an unnormalized posterior density function,
p( theta | tilde{y} ) \propto p( tilde{y}, theta) = p( tilde{y} | theta) p(theta).
Note that there is absolutely no computation involved here: the construction of the posterior density function is immediate and costless once we have defined the joint model
p( y, theta) = p( y | theta) p(theta).
Constructing the joint model is relatively straightforward with tools like probabilistic programming.
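To make the point concrete, here is a minimal sketch in Python for a toy one-dimensional normal model; the particular prior, likelihood, and data are purely illustrative:

```python
import numpy as np
from scipy.stats import norm

y_tilde = np.array([1.2, 0.7, 1.9, 1.4])  # observed data (illustrative)

def log_posterior_unnorm(theta):
    # log p(theta | y_tilde) up to a constant:
    # log p(y_tilde | theta) + log p(theta)
    log_prior = norm.logpdf(theta, loc=0.0, scale=3.0)
    log_likelihood = np.sum(norm.logpdf(y_tilde, loc=theta, scale=1.0))
    return log_likelihood + log_prior

# Evaluating the unnormalized posterior density anywhere is immediate;
# it is the normalization and marginalization that require real work.
print(log_posterior_unnorm(1.3))
```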
Problems arise, however, when we try to communicate the structure of p( theta | tilde{y} ) when the parameter space contains more than a few dimensions. As with any higher-dimensional function we cannot visualize p( theta | tilde{y} ) in its entirety. We can visualize one or two-dimensional slices, but most people aren’t trained to stitch those slices together into a coherent, high-dimensional understanding. To be honest I have my doubts as to whether anyone can be trained to do that effectively.
A more productive approach is to construct relevant, interpretable, and low-dimensional summaries onto which we can project our posterior inferences and then visualize.
Still motivated by those low-dimensional introductory examples, _marginal posterior density functions_ become particularly attractive summaries. For example, we might integrate out all but one parameter,
p( theta_{i} | tilde{y} )
= \int d theta_{1} … d theta_{i - 1} d theta_{i + 1} … d theta_{I}
p( theta_{1}, …, theta_{I} | tilde{y} ),
resulting in a function that we can visualize in its entirety.
Notice, however, that the only way to derive the marginal posterior density function p( theta_{i} | tilde{y} ) from the full posterior density function p( theta_{1}, …, theta_{I} | tilde{y} ) is to compute a series of integrals _for every value of the marginal parameter theta_{i}_.
In exceptionally nice circumstances we might be able to do these integrals analytically, resulting in a closed form expression for the marginal probability density function that we can visualize as easily as in those introductory examples. Most of the time, however, we have to resort to numerical calculations.
Frustratingly, integrating over some, but not all, of the parameters is not really possible with computational tools like Markov chain Monte Carlo. At the same time numerical integration isn’t feasible if we’re integrating out more than a few parameters. Even in an ideal numerical setting evaluating a marginal posterior density function requires immense computation. In more realistic settings the evaluation is outright impossible to implement.
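A back-of-the-envelope sketch, with illustrative numbers of my own choosing, shows just how badly naive numerical integration scales:

```python
# Naive grid quadrature cost for one marginal posterior density function
# (illustrative numbers, not a benchmark)
grid_points_per_dim = 50   # quadrature resolution in each nuisance dimension
n_nuisance = 10            # parameters being integrated out
n_marginal_evals = 100     # values of theta_i at which we want the marginal

evals_per_point = grid_points_per_dim ** n_nuisance
total_evals = n_marginal_evals * evals_per_point
print(f"{total_evals:.2e} joint density evaluations")  # ~9.77e+18
```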
But don’t many software packages claim to provide marginal posterior density function visualizations? Violin and ridge plots are everywhere after all, right?
The truth is that these packages don’t actually evaluate marginal posterior density functions. Instead they take posterior samples of the marginal parameters generated by Markov chain Monte Carlo and then use _kernel density estimators_ to try to _infer_ a marginal posterior density function. Unfortunately posterior samples alone don’t actually provide enough information to reconstruct marginal posterior density functions. Kernel density estimators have to complement the samples with _models_ that restrict the possible marginal behaviors. These models are built up from a series of strong assumptions, most of which are hidden behind defaults if accessible to users at all.
A consequence of these hidden modeling assumptions is that the resulting density function estimates are almost always plagued by awkward artifacts. Ripples obscure true marginal behaviors. Estimated marginal tail behaviors are highly variable and often unreliable. Ridge plots stacking rippling density function estimates over rippling density function estimates may not appear to be a substantial issue, but these wiggles really do make the visualizations an unreliable way to communicate the exact behavior of the posterior distribution.
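To see how much the hidden assumptions matter, here is a minimal sketch using SciPy’s gaussian_kde, where the bandwidth is just one of the modeling choices baked into the estimator; the same samples yield visibly different estimates:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(8675309)
samples = rng.standard_normal(1000)  # stand-in for marginal posterior samples

grid = np.linspace(-4.0, 4.0, 200)
for bandwidth in (0.05, 0.3, 1.0):
    estimate = gaussian_kde(samples, bw_method=bandwidth)(grid)
    # Narrow bandwidths ripple, wide bandwidths oversmooth; the samples
    # alone cannot tell us which estimate to trust.
    print(bandwidth, estimate.max().round(3))
```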
Is there anything we can do to communicate reliable posterior insights when working with high-dimensional models?
Yes! The key is to structure our visualizations around feasible computations. For example we can summarize particular marginal behaviors with component means, variances, and quantiles. Even better, we can represent marginal posterior behavior with histograms of marginal posterior probabilities.
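As a minimal sketch (not the implementation in the repository linked below), bin probabilities and quantiles are ordinary Monte Carlo estimands that come with their own error quantification:

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.standard_normal(4000)  # stand-in for marginal posterior samples

# Quantile summaries are ordinary Monte Carlo estimands
print(np.quantile(samples, [0.05, 0.25, 0.5, 0.75, 0.95]).round(2))

# Each bin probability is the expectation of an indicator function, so it
# comes with a Monte Carlo standard error (here assuming roughly independent
# draws; for Markov chain output the effective sample size matters)
bin_edges = np.linspace(-4.0, 4.0, 25)
counts, _ = np.histogram(samples, bins=bin_edges)
probs = counts / samples.size
std_errors = np.sqrt(probs * (1.0 - probs) / samples.size)
print(probs.round(3))
print(std_errors.round(4))
```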
If you’re curious how far one can take practical computations then take a look at my recommended posterior visualizations at https://github.com/betanalpha/mcmc_visualization_tools. I also discuss the motivation behind these in Section 5 of https://betanalpha.github.io/assets/chapters_html/transforming_probability_spaces.html#sec:1d-pushforward-characterizations.
It’s a new year. Don’t be afraid to transcend the oversimplification of Bayesian inference tutorials. Embrace the complexity of high-dimensional models and compatible visualizations. Most of all stop compromising your analyses with meaningless wiggles.