March 2024 Digest
Feb 28, 2024 8:19 pm
Consulting and Training
If you’re interested in consulting or training engagements then don’t hesitate to reach out to me at inquiries@symplectomorphic.com.
Upcoming Courses
I’ll be hosting an abbreviated suite of remote courses this summer, https://events.eventzilla.net/e/principled-bayesian-modeling-with-stan-2138610063. The material will cover foundations and a few advanced modeling techniques, including my Gaussian process module that I wasn’t able to offer last year.
If you've enjoyed my courses before then please don't hesitate to share this upcoming course with colleagues and collaborators. Many thanks to everyone who has helped to spread the word.
Recent Speaking
In January I attended a lovely event on (narratively) generative modeling in ecology, https://temporalecology.org/bayes2024/, where I gave a somewhat-unusual talk titled “Scientific Inference”. In this high-concept talk I discuss how statistics can be quite naturally integrated with the scientific method, and why that integration is critical to giving statistical analyses the context needed for practical success.
More recently I live streamed a presentation of the talk, which you can view at https://www.patreon.com/posts/scientific-talk-98124091.
Recent Writing
While I was in town for that seminar I also helped to teach a pretty novel ecology workshop that focused on interactivity and domain expertise rather than copying and pasting code that demonstrates a particular modeling or inferential technique. The workshop material is now online.
HTML: https://betanalpha.github.io/assets/chapters_html/tree_diameter_growth_analysis.html
PDF: https://betanalpha.github.io/assets/chapters_pdf/tree_diameter_growth_analysis.pdf
The goal of the workshop was to give the audience familiar data and let them develop their own analyses, with the instructors gently guiding them towards (narratively) generative modeling and Bayesian inference. In the end the audience saw that these ideas fall out pretty naturally from the science that they already know.
There's nothing particularly complicated about the models featured here, but I think that the material provides a useful example of building an analysis up from domain expertise rather than prebuilt parts. Yes, you might end up with a model that fits a regression template but you might also have to play around with some elliptical integrals to place that model into an interpretable context.
Support Me on Patreon
If you would like to support my writing then consider becoming a patron, https://www.patreon.com/betanalpha. Right now covector+ supporters have early access to what ended up being a very long chapter on conditional probability theory.
Probabilistic Modeling Discord
I’ve recently started a Discord server dedicated to discussion of (narratively) generative modeling of all kinds, https://discord.gg/7eeAcQFp.
Recent Rants
On Hierarchical Population Modeling
Oh, no. You need to account for some heterogeneity in an analysis. Fortunately you can just slap down a hierarchy and be done with it, right? Right?!?!
Strap in for a long thread on population modeling.
Often the heterogeneity of the behaviors that we are interested in modeling is _unstructured_; knowing the group we’re in doesn’t inform _how_ the behavior should vary. Mathematically, permuting or relabeling the contexts shouldn’t change the heterogeneity.
This kind of unstructured heterogeneity is also known as _exchangeability_. The assumption of exchangeability drastically simplifies the modeling problem. All consistent models of exchangeable heterogeneity are given, either exactly or to a very good approximation, by a particular mathematical form.
Specifically the variables theta_k modeling the individual behaviors in an exchangeable model are pooled towards a global _population model_, pi(theta_k | phi), which couples the individual parameters together. The theta_k and phi form a _hierarchy_ of parameters.
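In other words the joint model for the individual behaviors always takes, either exactly or to a very good approximation, the form

pi(theta_1, …, theta_K) = integral dphi pi(phi) pi(theta_1 | phi) * … * pi(theta_K | phi),

with the theta_k independent and identically distributed once we condition on phi.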
If we learn phi from observed data then this coupling is dynamic. Large deviations in the observed data lead to a wider population model and weaker coupling; smaller deviations lead to a narrower population model and stronger coupling.
In order to specify a complete hierarchical model, however, we need to define the exact form of this population model. Hierarchical modeling is not a black box but rather a generic modeling technique that leaves us with important choices to make!
When the individual behaviors are modeled with one-dimensional, unconstrained real variables theta_k then the most common choice of population model is a normal population model,
theta_k ~ normal(mu, tau).
For the normal population model the theta_k concentrate around the population location mu, with the population scale tau controlling how strong this concentration is. This population model is not only directly interpretable but also mathematically straightforward to manipulate and implement in practice.
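To make this structure concrete, here is a minimal Stan sketch of a normal hierarchical model; the one-dimensional observations y, the context assignments, and the known measurement scale sigma are all hypothetical, and the priors and observational model are just placeholders for this example.

data {
  int<lower=1> N;                          // number of observations
  int<lower=1> K;                          // number of contexts
  array[N] int<lower=1, upper=K> context;  // context assignments
  vector[N] y;                             // observations
  real<lower=0> sigma;                     // known measurement scale
}
parameters {
  vector[K] theta;                         // individual behaviors
  real mu;                                 // population location
  real<lower=0> tau;                       // population scale
}
model {
  mu ~ normal(0, 1);                       // hypothetical prior model
  tau ~ normal(0, 1);                      // half-normal due to the lower bound
  theta ~ normal(mu, tau);                 // normal population model
  y ~ normal(theta[context], sigma);       // placeholder observational model
}

In practice a non-centered parameterization of the theta_k is often needed for the posterior geometry to be well-behaved, but the centered form above exposes the hierarchical structure most directly.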
Because the normal population model enjoys so many nice properties it is often the _only_ population model that we learn; so much so that many assume that _every_ hierarchical model has to be of this form. Unfortunately this assumption drastically limits the potential of hierarchical modeling.
Even when working with one-dimensional, unconstrained, real-valued variables we can incorporate all kinds of interesting behavior by considering different population models. For example heavier-tailed population models allow us to encode sparse heterogeneity while multimodal population models allow us to incorporate different populations of behaviors.
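For example a Student-t population model,

theta_k ~ student_t(nu, mu, tau),

with a small degrees-of-freedom parameter nu places far more probability on large deviations from mu than a normal population model with the same scale, allowing most of the theta_k to cluster tightly around mu while a few stray far away. Similarly a mixture of normal population models allows the theta_k to cluster around several distinct baseline behaviors at once.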
Normal population models become particularly limiting, however, when we start to consider more general behaviors. For example if the theta_k are one-dimensional but constrained, such as to positive values or even to an interval of values, then the normal population model allows for inconsistent behaviors.
We can sometimes undo these constraints with an appropriate link function, allowing us to model the heterogeneity of the unconstrained values with a normal population model.
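For positive theta_k, for instance, we might take a logarithmic link function and model the heterogeneity of the unconstrained values with

log(theta_k) ~ normal(mu, tau),

which is equivalent to a log-normal population model for the theta_k themselves. On the constrained scale the heterogeneity is then multiplicative rather than additive.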
That said the interpretation of this unconstrained heterogeneity is completely different from the interpretation of the constrained heterogeneity, and the assumption of a normal population model can result in all kinds of counterintuitive behaviors.
What do we do when the theta_k aren’t even one-dimensional? Multivariate population models are rarely, if ever, discussed, which often forces practitioners to develop their own heuristic techniques.
For example many will assume that each of the M components of the theta_k varies independently with a separate normal population model for each. The utility of this assumption, however, depends on the interpretation of these components.
In particular, if our knowledge of the data generating process implies that the M components in an initial parameterization of theta_k might be coupled, then we will first have to engineer a different parameterization that isolates behaviors that actually should vary independently before dropping in an independent normal population model.
Others will use the possibility of coupled variation as motivation for a multivariate normal population model. While a multivariate normal population model is strictly more general than an independent normal population model, the kinds of heterogeneity it allows are still strongly constrained and not appropriate for every application.
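For concreteness here is a minimal Stan sketch of a multivariate normal population model built from a Cholesky-factored correlation matrix; as before the vector-valued observations y, the context assignments, the known measurement scale sigma, and the priors are all just hypothetical placeholders.

data {
  int<lower=1> N;                          // number of observations
  int<lower=1> K;                          // number of contexts
  int<lower=1> M;                          // dimension of each theta_k
  array[N] int<lower=1, upper=K> context;  // context assignments
  array[N] vector[M] y;                    // observations
  real<lower=0> sigma;                     // known measurement scale
}
parameters {
  array[K] vector[M] theta;                // individual behaviors
  vector[M] mu;                            // population locations
  vector<lower=0>[M] tau;                  // population scales
  cholesky_factor_corr[M] L;               // population correlations
}
model {
  mu ~ normal(0, 1);                       // hypothetical prior model
  tau ~ normal(0, 1);
  L ~ lkj_corr_cholesky(2);
  theta ~ multi_normal_cholesky(mu, diag_pre_multiply(tau, L));
  for (n in 1:N)
    y[n] ~ normal(theta[context[n]], sigma);  // placeholder observational model
}

Fixing L to the identity matrix recovers the independent normal population model as a special case.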
Again a productive hierarchical model requires that we use our domain expertise to consider what kinds of heterogeneities we actually want to incorporate!
Both independent and coupled multivariate normal population models are inappropriate when we’re dealing with behaviors subject to multivariate constraints. For example what kind of population model can consistently pool together different M-simplices, each subject to a sum-to-one constraint?
We can reparameterize each M-simplex into M - 1 unconstrained variables and then apply a multivariate normal population model, but there’s no guarantee that this will yield reasonable heterogeneities. Moreover translating our domain expertise to that unconstrained space in order to assess that reasonableness is no small feat.
If we’re not wedded to normal population models, however, then we can direct our attention to models that are defined directly over spaces of simplices. For instance we might assume a Dirichlet population model,
(theta_1k, …, theta_Mk) ~ Dirichlet(alpha_1, …, alpha_M).
With a little bit of mathematics we can even reparameterize the alpha_m into a base simplex and a concentration parameter that determines how strongly the simplices concentrate around that baseline.
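Here is one way that reparameterized Dirichlet population model might be sketched in Stan, writing alpha = kappa * psi for a base simplex psi and concentration kappa; the count observations and the priors are hypothetical placeholders for the sake of the example.

data {
  int<lower=1> K;                    // number of contexts
  int<lower=1> M;                    // number of simplex components
  array[K, M] int<lower=0> y;        // placeholder count observations
}
parameters {
  array[K] simplex[M] theta;         // individual simplices
  simplex[M] psi;                    // base simplex
  real<lower=0> kappa;               // concentration around the baseline
}
model {
  psi ~ dirichlet(rep_vector(1, M)); // hypothetical prior model
  kappa ~ normal(0, 10);             // half-normal due to the lower bound
  for (k in 1:K) {
    theta[k] ~ dirichlet(kappa * psi);  // Dirichlet population model
    y[k] ~ multinomial(theta[k]);       // placeholder observational model
  }
}

Under this population model the expected value of each theta_k is psi, with larger values of kappa concentrating the individual simplices more strongly around that baseline.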
By starting with the structure of the problem at hand, and not trying to work backwards from a normal population model, we are able to quickly develop a hierarchical model that is not only interpretable but also straightforward to implement in practice.
This is the critical lesson for hierarchical modeling more generally. Despite what so many tutorials and software packages imply, hierarchical models are not all normal hierarchical models. In many cases normal hierarchical models are really useful, but if that’s the only tool that we have then we’ll never be able to go beyond those cases.
For some more discussion see
https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html
https://betanalpha.github.io/assets/case_studies/factor_modeling.html
https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html