April 2024 Digest
Mar 28, 2024 5:12 pm
Consulting and Training
If you are interested in consulting or training engagements then don’t hesitate to reach out to me at inquiries@symplectomorphic.com.
Upcoming Courses
Only one more month to register for the first modules of my 2024 remote courses, https://events.eventzilla.net/e/principled-bayesian-modeling-with-stan-2138610063!
Friendly reminder that I will try to accommodate as many discounted tickets as I can for Black, Indigenous, and people of color in high-income countries and for those from low- and middle-income countries. For more info contact me at courses@symplectomorphic.com.
Recent Writing
Survival Modeling
I pushed some minor updates to my previous paper on threshold survival models, http://arxiv.org/abs/2212.07602. Additions include a closed form for the implied event density function and identifiability considerations. Perhaps the most novel addition is in the appendix where I show how any density function over the accumulated hazard function can be reversed into a modified survival model for a particular warping function. This drastically opens up the modeling possibilities.
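For context, and not as a restatement of the paper's specific construction, the standard relationships between a cumulative hazard function Lambda, the survival function, and the implied event density are

S(t) = \exp\big( -\Lambda(t) \big),
\qquad
f(t) = -\frac{\mathrm{d} S}{\mathrm{d} t}(t)
     = \frac{\mathrm{d} \Lambda}{\mathrm{d} t}(t) \, \exp\big( -\Lambda(t) \big),

which is what "implied event density function" refers to above.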
Conditional Probability Theory Chapter
My appreciation for conditional probability theory? It is very much unconditional.
In my latest chapter you, too, can learn why conditional probability theory is so great.
HTML: https://betanalpha.github.io/assets/chapters_html/conditional_probability_theory.html
PDF: https://betanalpha.github.io/assets/chapters_pdf/conditional_probability_theory.pdf
Conditional probability theory allows us to break up probability distributions into smaller, more manageable pieces. This is useful for not only simplifying calculations but also building up sophisticated probability distributions in the first place.
Realizing a consistent decomposition, however, requires careful use of all of the probability theory that we have developed to this point. This is especially true for the decomposition of probability density function representations of probability distributions.
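As a quick illustration of the kind of decomposition in play here, and not anything specific to the chapter itself, the joint probability density of two variables can always be broken up as

p(x_{1}, x_{2}) = p(x_{2} \mid x_{1}) \, p(x_{1}),

and, chaining this decomposition across more variables,

p(x_{1}, \ldots, x_{N}) = p(x_{1}) \, \prod_{n = 2}^{N} p(x_{n} \mid x_{1}, \ldots, x_{n - 1}).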
Support Me on Patreon
If you would like to support my writing then consider becoming a patron, https://www.patreon.com/betanalpha. Right now covector+ supporters have early access to a new chapter on modeling selection processes.
Probabilistic Modeling Discord
I’ve recently started a Discord server dedicated to discussion about (narratively) generative modeling of all kinds, https://discord.gg/xeDxXmBF.
Recent Rants
On Communicating p-values
You want to know why we keep seeing terrible explanations of p-values? Because concise explanations understandable by a general audience are fundamentally impossible.
Oh yeah it's a thread about communicating p-values.
The proper definition of a p-value — not just what it is but also how it is used — is technical and even in full formality pretty subtle.
It requires the definition of not one but _two_ different statistical models (because we’re never actually rejecting the null model in isolation), a particular test statistic (that has to be designed using probability theory on continuous spaces), an automated decision-making process between the two models based on that test statistic, and an ensemble calibration of the performance of that process!
Moreover that formal definition doesn’t even consider the implementation challenge of actually computing p-values in practice.
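To make those ingredients concrete, here is a toy sketch in Python of the full recipe; the models, test statistic, and threshold are all made up for illustration, and the tail probability is estimated by simulation rather than in closed form.

import numpy as np

rng = np.random.default_rng(1)

# Null model y_n ~ normal(0, 1), alternative model y_n ~ normal(mu, 1),
# test statistic |sample mean|, automated decision "reject null if p < alpha",
# and an ensemble calibration of how often that decision fires.

N = 50          # observations per data set
alpha = 0.05    # rejection threshold

def test_statistic(y):
    return abs(np.mean(y))

# Distribution of the test statistic under the null model, estimated by
# simulation because a closed form is not always available.
t_null = np.abs(rng.normal(0.0, 1.0, size=(10_000, N)).mean(axis=1))

def p_value(y_obs):
    # Tail probability of the observed test statistic under the null model.
    return np.mean(t_null >= test_statistic(y_obs))

def rejection_rate(mu, n_replications=1_000):
    # Ensemble calibration: simulate data sets from normal(mu, 1) and count
    # how often the automated decision process rejects the null model.
    y_reps = rng.normal(mu, 1.0, size=(n_replications, N))
    return np.mean([p_value(y) < alpha for y in y_reps])

print("Rejection rate under the null model:", rejection_rate(0.0))
print("Rejection rate under the alternative mu = 0.25:", rejection_rate(0.25))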
The strengths and weaknesses of p-values are relatively straightforward in this technical setting. We can have productive conversations about their tradeoffs and whether they’re robust enough for practical use. Those strengths and weaknesses become obscured, however, as soon as we move away from the technical definition to something more informal in an attempt at “accessibility”.
None of this is to say that we cannot explain anything and that statistical methodology has to be cordoned off from stakeholders. The key to accessibility is not to divorce p-values from their technical details but rather to avoid talking about p-values entirely.
Instead we can explain what p-values are being used for. “We want to choose between two models, and in some cases we can inform productive decisions by abandoning one model if the observed data appear to be too extreme. Here are some of the ways that this process can go wrong.”
The problems all arise when we try to explain the awkward way this intuition is actually implemented in math and how the deviations between the two manifest as practical issues. In particular from this broader perspective p-values are just _implementation details_. They are an ingredient in how a decision is made, not what the decision is or how productive the decision-making process is in a given application.
_We cannot productively define a p-value without the math_. Every attempt to do so will be a compromise and fail in important ways. Vague language that abstracts away the math might appear less intimidating but all it will do is facilitate confusion between important concepts, such as model rejection versus model probabilities versus parameter probabilities versus data probabilities versus data tail probabilities.
Oh, and related terms like “power” or “sample size” are no more forgiving, especially when working with nested models where the minimum power is technically zero without introducing additional, often unstated, assumptions. Imagine a world where people don’t use “power” to refer to ambiguous and often varying notions of “precision”!
On Objectivity
Convention is not objectivity.
On Interpretable Parameters in Narratively Generative Modeling
Narratively generative models interpret statistical models as collections of data generating processes that hopefully well-approximate some true data generation process.
This interpretability provides a way of connecting statistical models to our precious domain expertise, facilitating model development, critique, and more.
In particular the individual parameters of a narratively generative model each govern a distinct aspect of the underlying data generation process.
For example a regression model
y ~ normal(alpha + beta * x, sigma)
becomes narratively generative when alpha, beta, and sigma correspond to meaningful phenomena and not just mathematical patterns.
The linear response mu = alpha + beta * x models how some latent phenomenon responds, or perhaps approximately responds, to external variations in the covariate x.
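A minimal simulation sketch of this narratively generative reading, with parameter values invented purely for illustration, might look like the following in Python.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values standing in for meaningful phenomena.
alpha = 2.0    # baseline response of the latent phenomenon when x = 0
beta = 0.5     # how strongly the phenomenon responds to changes in x
sigma = 0.3    # scale of the measurement variability around that response

x = rng.uniform(-1.0, 1.0, size=100)   # external covariate values
mu = alpha + beta * x                  # latent linear response
y = rng.normal(mu, sigma)              # observed, noisy measurements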
When we treat models as black boxes we lose this connection between the model and the system we’re modeling. Parameters become unlabeled knobs; we can turn them but we’re largely oblivious to their consequences.
We may be able to learn parameter configurations consistent with observed data and use them to inform in-sample predictions, but we won’t have any idea how to adjust those configurations to account for new circumstances and inform predictions that generalize.
An interesting side effect of Bayesian methods is that developing an informative prior model is easiest when we can connect the parameters to our domain expertise. Prior modeling is outright frustrating when the parameters don’t admit a meaningful interpretation.
If we push through that struggle and establish a principled connection then we end up with an interpretable model that is easier to not only robustly apply to practical problems but also critique and improve as needed.
At the same time if we embrace narratively generative modeling from the beginning then we already have a meaningful interpretation and prior modeling becomes much less onerous.
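For example, continuing the hypothetical regression sketch above, domain expertise about the baseline response, the plausible strength of the covariate response, and the measurement precision translates directly into priors whose consequences we can check with a quick prior predictive simulation; all of the numbers here are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)

n_prior_draws = 1000
alpha = rng.normal(2.0, 1.0, n_prior_draws)           # baseline plausibly near 2 units
beta = rng.normal(0.0, 0.5, n_prior_draws)            # responses beyond about 1 unit per unit x would surprise us
sigma = np.abs(rng.normal(0.0, 0.3, n_prior_draws))   # variability set by the instrument precision

x = 1.0  # a representative covariate value
y_prior_pred = rng.normal(alpha + beta * x, sigma)
print("Central 90% prior predictive interval at x = 1:",
      np.quantile(y_prior_pred, [0.05, 0.95]))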
There’s a natural synergy between Bayesian inference and narratively generative modeling. Not all Bayesian analyses use generative models and generative models are useful in other inferential approaches, but Bayesian inference using generative models is particularly productive.
This is probably why “interpretable model” and “Bayesian inference” are synonymous for so many, especially in more applied fields.
For many, Bayesian methods are the first opportunity they have to build bespoke, interpretable models instead of relying on a collection of rigid black boxes that are incompatible, if not outright inconsistent, with their hard-earned domain expertise.
Incidentally this synergy is also why developing adequate models is so important in applied analyses. Models that are too rigid will often contort themselves to fit observed data, pulling the individual parameter inferences away from their proper generative interpretations.
These skewed interpretations then lead to decisions and predictions that poorly generalize beyond that particular data set.
For more on narratively generative modeling and strategies for developing narratively generative models of your own see https://betanalpha.github.io/assets/case_studies/generative_modeling.html.