June 2024 Digest

May 30, 2024 1:53 am

Consulting and Training


If you are interested in consulting or training engagements then don’t hesitate to reach out to me at inquiries@symplectomorphic.com.


Upcoming Courses


There’s still time to sign up for any of the last three modules of my remote courses this year, https://events.eventzilla.net/e/principled-bayesian-modeling-with-stan-2138610063. We’ll be covering the foundations of regression modeling, hierarchical modeling, and Gaussian process modeling.


Recent Writing


Two men looked out from prison bars. One saw the mud, the other saw a neat opportunity to demonstrate Bayesian inference.


HTML: https://betanalpha.github.io/assets/chapters_html/window_inference.html

PDF: https://betanalpha.github.io/assets/chapters_pdf/window_inference.pdf


In this case study I implement an analysis from David MacKay’s classic textbook where the goal is to infer the geometry of a window in a dark room from only the stars that we can see through it.


Along the way we’ll learn all about indicator functions, including how to invert the defining subset and argument of an indicator function so that observational constraints can be transformed into model configuration constraints, and design some cool posterior visualizations.
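
To sketch the key trick in my own stand-in notation (not that of the case study itself): if W(\theta) denotes the patch of sky visible through the window implied by a model configuration \theta, then the observational constraint that a star at position y is visible,

  \mathbb{I}[\, y \in W(\theta) \,],

can equivalently be read as a constraint on the model configurations themselves,

  \mathbb{I}[\, \theta \in \{ \theta' : y \in W(\theta') \} \,],

with the roles of the argument and the defining subset swapped.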


Support Me on Patreon


If you would like to support my writing then consider becoming a patron, https://www.patreon.com/betanalpha. 


Probabilistic Modeling Discord


I’ve recently started a Discord server dedicated to discussion of (narratively) generative modeling of all kinds, https://discord.gg/W2QVJaV6.


Recent Rants


On The Implementation of Science


I was asked about my thoughts on making a scientific case based on a Bayesian analysis, in particular strategies for publication. This question comes up from time to time and unfortunately I don't think that there is a satisfying, let alone productive, answer. The problem is how "science" becomes awkwardly operationalized in the literature.


We often start our journey into science with a presentation of the "scientific method" early in our education, one that we tend to internalize and take for granted. If you go back to the individual steps in the scientific method, however, they aren't all that well defined.


The vagueness of these definitions leaves all kinds of room for the scientific method to be operationalized into explicit procedures in many different ways. Usually it's tradition and heuristics that motivate these explicit implementations.


Perhaps the most common operationalization focuses on the concept of "discovery". This concept itself, however, is vague enough for many different approaches to fit under its guise.


For example we have the falsification perspective where the scientific method proceeds by showing that the current hypothesis is wrong, or more realistically that it is incomplete, without comparison to any other hypotheses.


In practice this is often heuristically operationalized as rejecting a null model using some kind of extremity statistic like a p-value. The physicists' fetish of judging models by their "chi2 per degree of freedom" also falls into this camp.


There are two immediate issues with this approach. Firstly, it's not how science has ever actually been done. Scientists may be suspicious of a null model but they never reject it outright until there is an explicit replacement to take its place.


Secondly, because we are always limited to finite data in practice we can never reject a wrong model with complete confidence. Ignoring this uncertainty leads to fragile results, as evidenced by the constant discussion of reproducibility crises in the sciences.


One of my favorite examples of this is the constantly evolving "discovery threshold" in particle physics. It started around 2 "sigma", about 5% statistical significance, and has been trending up to the current heuristic of 5 "sigma" due to a series of bad results.
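
For reference, converting these "sigma" thresholds into tail probabilities is a one-liner; here's a quick sketch with scipy, where the two-sided convention for 2 "sigma" and the one-sided convention for 5 "sigma" are my own assumptions about the relevant conventions:

  import scipy.stats as stats

  # Two-sided tail probability corresponding to a 2 "sigma" threshold
  print(2 * stats.norm.sf(2))  # ~0.0455, i.e. about 5%

  # One-sided tail probability corresponding to a 5 "sigma" threshold
  print(stats.norm.sf(5))      # ~2.9e-7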


A more robust, and normative, take on "discovery" is hypothesis _comparison_ where the initial null model is compared to some alternative model and a decision is made between the two based on experimental measurements.


One approach to hypothesis comparison, for example, is the frequentist null hypothesis significance testing technique. This approach has its issues but explicitly accounting for the alternative hypotheses with some notion of power does lead to more robust results (at least when implemented correctly).


Information criteria and other predictive performance scores, as well as Bayes factors, also fall into this perspective, selecting a "best" model amongst a collection of candidate models. That said, the performance of these approaches tends to be much more variable because they do not, in general, come with any notion of false positive and true positive guarantees, even under strong assumptions.


The ultimate problem is that choosing one hypothesis over another is not an inference problem but a _decision_ problem, and in order to formalize it we need to define an explicit utility function that quantifies the consequences of making the wrong and right choices.


Null hypothesis significance testing, for example, can be framed as a decision making process with a utility function constructed from false positive rates and true positive rates (or more often a heuristic lower bound on true positive rates).
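
To make that framing concrete, here is a minimal sketch of how a false positive rate and an assumed power translate into expected utilities for the "claim a discovery" decision rule. The utility numbers are entirely made up for illustration; they are not part of any standard.

  # Hypothetical utilities for each (decision, truth) pair
  U_true_positive  =  1.0   # claim discovery, alternative is actually true
  U_false_positive = -10.0  # claim discovery, null is actually true
  U_true_negative  =  0.0   # no claim, null is actually true
  U_false_negative = -1.0   # no claim, alternative is actually true

  alpha = 0.05  # false positive rate of the decision rule
  power = 0.80  # assumed true positive rate under the alternative

  # Expected utility of the rule conditional on each hypothesis being true
  eu_under_null = alpha * U_false_positive + (1 - alpha) * U_true_negative
  eu_under_alt  = power * U_true_positive  + (1 - power) * U_false_negative

  print(eu_under_null, eu_under_alt)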


Other "model comparison" methods are either completely heuristic or are based on implicit utility functions that often have no relevance to any meaningful scientific goals (I'm looking at you, Kullback-Leibler divergences).


Incidentally, for more on formalizing the performance of decision making processes in both frequentist and Bayesian inference see my paper, https://arxiv.org/abs/1803.08393, which I love so much despite maybe three people having ever read it.


All of this means that in order to operationalize the scientific method as a process of "discovery" we need the entire community to agree on what utility functions should be used to quantify the performance of discovery claims.


And this is where the shit show that is scientific publishing comes in. Different communities and different journals often rely on different, and most often implicit, utility functions, which define the norms for what needs to be presented in a scientific publication.


Every analysis has to conform to these expectations. If the scientific method is implicitly interpreted as rejecting null models based on a statistic then no Bayesian analyses will ever be acceptable.


That said, in practice publishable analyses often need to only superficially appear to conform to these expectations. Reporting tail posterior probabilities as "p-values" does wonders when editors and reviewers don't know what the p-values they keep requesting actually are...


Anyways, the ironic thing about Bayesian analyses is that their focus on explicit models often forces us to confront unstated assumptions such as those behind scientific publishing norms.


This is one of the reasons why I think that Bayesian analyses are so hard to publish. When you actually show all of the poor assumptions that everyone takes for granted, it's too easy to blame the analysis and not those poor assumptions.


Personally I'm with George Box that the most effective implementation of the scientific method is based on iterative model development, https://tandfonline.com/doi/abs/10.1080/01621459.1976.10480949. For more on this perspective see this presentation, https://www.patreon.com/posts/scientific-talk-98124091, and the accompanying slides.


From this perspective we can avoid "discovery" in publication entirely and focus on presenting the assumptions and consequences of our models so that others can decide how they might want to use those results in their own analyses (just like how science _actually_ works).


In other words we avoid premature "discovery" decisions entirely, instead using publications to only communicate the details of our analyses. Others can then make decisions based on those details and whatever utility function is right for them.


Ultimately there are two questions of interest: how does one get a Bayesian analysis accepted into an existing journal, and how should a Bayesian analysis be communicated so that it can be best used by others.


The latter is the far more interesting question, but the former is what most people actually care about. Because the conventions on which journals are based are so heuristic and variable, answers to the former question will also be heuristic and variable.


Me? I push collaborators to release preprints with all of the juicy modeling details and relevant inferences for science as early as possible and then let them deface the manuscript however they need for a chance at publication and the metrics they need to sustain their careers.


On Density Estimation


Remember: friends don’t let friends use kernel density estimators to try to convert posterior samples into marginal posterior density functions to conclude their Bayesian analyses.


Probability density functions are, by definition, objects that can be integrated to give expectation values. Samples are, again by definition, objects that can be summed to estimate expectation values. There is no way to compare these intermediate representations to each other beyond the resulting expectation values. In particular there is no well-posed way to transform one into the other.
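
In other words the natural way to use posterior samples is to average functions of them. A minimal sketch, with synthetic samples standing in for the output of an actual posterior sampler:

  import numpy as np

  rng = np.random.default_rng(8675309)
  # Stand-in for posterior samples of a one-dimensional parameter theta
  samples = rng.normal(loc=1.0, scale=0.5, size=4000)

  # Monte Carlo estimators of posterior expectation values
  post_mean     = np.mean(samples)       # estimates E[theta]
  post_second   = np.mean(samples**2)    # estimates E[theta^2]
  prob_positive = np.mean(samples > 0)   # estimates E[I(theta > 0)] = P(theta > 0)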


Kernel density estimators transform samples into density functions only with the introduction of structural assumptions. When these assumptions aren’t quite correct they introduce artifacts into the estimated density functions which can be easily misinterpreted.
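
For example the default Gaussian kernel assumes that the target density function is smooth and unconstrained, which smears probability across any hard boundaries; here’s a quick sketch of that artifact, assuming scipy’s gaussian_kde with its default bandwidth:

  import numpy as np
  from scipy.stats import gaussian_kde

  rng = np.random.default_rng(8675309)
  # Samples from a distribution constrained to positive values
  samples = rng.exponential(scale=1.0, size=4000)

  kde = gaussian_kde(samples)  # default Gaussian kernel and bandwidth

  # The kernel smears probability below zero even though every sample is positive
  print(kde.integrate_box_1d(-np.inf, 0.0))  # small but nonzero
  print(kde(0.0))                            # nonzero density at the boundary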


For additional discussion see Section 3.2.6 of https://betanalpha.github.io/assets/case_studies/sampling.html#326_No-Good-Sensity_Estimators.


Speaking of which, despite what so many introductions to Bayesian inference imply, a probability density function often shouldn’t be the goal of a Bayesian analysis anyways.


Probability density functions are useful visualizations only in the sense that they allow us to approximate various expectation values by eye. Many of their features, in particular their behavior at individual points, don’t have any consequences for expectation values and hence are entirely artifacts of the representation. If the consumers of an analysis are not trained to disregard these artifacts then probability density function visualizations can be more misleading than informative.


For some additional discussion see Section 4.3 of https://betanalpha.github.io/assets/chapters_html/density_functions.html#sec:visualizing.


The same is true for samples. Direct visualizations of samples, such as scatter plots, are useful only in the sense that they allow us to quickly gauge the behavior of certain expectation value estimators. For example if the samples are denser in one region of space then the probability allocated to any subset located in that region will be larger than if it were located somewhere else. Faithful interpretation of these visualizations is not automatic!
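
As a rough sketch of the kind of estimator that these visualizations implicitly gauge, here is the probability allocated to a rectangular subset of a hypothetical two-dimensional parameter space, estimated as the fraction of samples that fall inside it:

  import numpy as np

  rng = np.random.default_rng(8675309)
  # Stand-in for samples of a two-dimensional parameter (theta1, theta2)
  samples = rng.multivariate_normal(mean=[0.0, 0.0],
                                    cov=[[1.0, 0.5], [0.5, 1.0]],
                                    size=4000)

  # Probability allocated to the rectangle [0, 1] x [0, 1]
  inside = ((samples[:, 0] > 0) & (samples[:, 0] < 1) &
            (samples[:, 1] > 0) & (samples[:, 1] < 1))
  print(np.mean(inside))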


Oh, and finally note that probability density function and sample representations of a probability distribution naturally complement each other. Probability density functions are straightforward to condition (up to normalization) but difficult to marginalize and, more generally, to pushforward. On the other hand samples are straightforward to marginalize but difficult if not impossible to condition.
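
A small sketch of that asymmetry, using a bivariate normal distribution whose density function we happen to know analytically (an illustration only, not a recipe for general models):

  import numpy as np
  from scipy.stats import multivariate_normal

  rng  = np.random.default_rng(8675309)
  mean = np.array([0.0, 0.0])
  cov  = np.array([[1.0, 0.5], [0.5, 1.0]])

  # Marginalizing samples is trivial: just drop the other components
  samples  = rng.multivariate_normal(mean, cov, size=4000)
  marginal = samples[:, 0]  # samples from the marginal distribution of theta1

  # Conditioning the density function is also straightforward (up to
  # normalization): fix theta2 = 1.5 and renormalize over theta1 numerically
  joint   = multivariate_normal(mean, cov)
  theta1s = np.linspace(-4, 4, 401)
  unnorm  = joint.pdf(np.column_stack([theta1s, np.full_like(theta1s, 1.5)]))
  conditional = unnorm / np.trapz(unnorm, theta1s)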
