June 2025 Digest

May 28, 2025 2:06 pm

Upcoming Courses


In August I’ll be teaching four brand new remote, open-enrollment courses, with modules covering Bayesian mixture modeling, survival modeling, pairwise comparison modeling, and ordinal modeling: https://www.eventzilla.net/e/advanced-bayesian-modeling-in-stan-2138661763.


Sharing the course page with friends and colleagues is greatly appreciated!


Recent Writing


Earlier this month I gave a workshop on Bayesian methods in epidemiology. With my hosts I developed a pretty elaborate, demonstrative analysis for the workshop which I am happy to share publicly here. 


HTML: https://betanalpha.github.io/assets/chapters_html/fertility.html

PDF: https://betanalpha.github.io/assets/chapters_pdf/fertility.pdf


The analysis intentionally starts slowly, with a relatively simple initial model that allows people to familiarize themselves with the practical implementation of a Bayesian analysis in Stan. This includes principled prior modeling, directed graphical models, computational diagnostics, posterior retrodictive checks, and well-defined posterior summaries.


After this the complexity of the analysis accelerates, first with the consideration of heterogeneous behaviors and then with the modeling of population clinical conditions and demographics. This then allows for the analysis of not just hypothetical fixed interventions but also hypothetical _probabilistic_ interventions that account for uncertainty in the hypothetical intervention itself.


The analysis ends with a little surprise that demonstrates both the limitations and importance of model critique for well-behaved inferences.


I’ll also be reviewing this case study in detail this afternoon in a livestream for my covector+ supporters on Patreon, https://www.patreon.com/c/betanalpha. The livestream will be recorded for anyone who wants to watch later.


Upcoming Remote Talk


A few weeks ago I gave a talk at Imperial College London that was supposed to be live-streamed, but the live-stream was unfortunately plagued by technical issues that made the remote viewing experience unpleasant. To compensate I'm going to give the talk again on June 4, and this time anyone can join. For details see https://www.patreon.com/posts/generative-talk-128366935.


Consulting and Training


Are you, or is someone you know, struggling to develop, implement, or scale a Bayesian analysis compatible with your domain expertise? If so then I can help.

I can be reached at inquiries@symplectomorphic.com for any questions about potential consulting engagements and training courses.


Probabilistic Modeling Discord


I have a Discord server dedicated to discussion of (narratively) generative modeling of all kinds, https://discord.gg/7SZCxGgC. Come join the conversation!


Support Me on Patreon


If you would like to support my writing then consider becoming a patron, https://www.patreon.com/betanalpha.


Recent Rants


On The Reproducibility of Markov Chain Monte Carlo


One of the more surprising behaviors of Markov chains is that the more effectively a Markov transition explores its target distribution the more quickly it amplifies nearly-insignificant floating-point perturbations into substantial numerical differences.


Want to see for yourself? Try running the following two Stan programs using the same configurations, including the `seed` argument, and then compare the outputs.  


Model 1:

```
parameters {
  real<lower=0> x;
}

model {
  target += normal_lpdf({0, 1, 2} | 2, x);
}
```


Model 2:

```
parameters {
  real<lower=0> x;
}

model {
  target += normal_lupdf({0, 1, 2} | 2, x);
}
```


For example if you use RStan then you might run the following code. Note that `permuted=FALSE` is necessary when using the `rstan::extract` function to avoid random shuffling of the Markov chains.


```
library(rstan)
rstan_options(auto_write = TRUE)            # Cache compiled Stan programs
options(mc.cores = parallel::detectCores()) # Parallelize chains
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")

code1 <- "parameters { real<lower=0> x; } model { target += normal_lpdf({0, 1, 2} | 2, x); }"

fit1 <- stan(model_code=code1,
             seed=8438338,
             warmup=1000, iter=2024, refresh=0)

code2 <- "parameters { real<lower=0> x; } model { target += normal_lupdf({0, 1, 2} | 2, x); }"

fit2 <- stan(model_code=code2,
             seed=8438338,
             warmup=1000, iter=2024, refresh=0)

extract(fit1, permuted=FALSE)[1:5, 1, 1]
extract(fit2, permuted=FALSE)[1:5, 1, 1]
```


The only Markov chain outputs that are even approximately reproducible are ensemble expectation estimates, and even then only if a Markov chain Monte Carlo central limit theorem holds, and only up to the Markov chain Monte Carlo standard error.
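The same amplification is easy to see in any chaotic deterministic system. As a rough analogy, here is a minimal Python sketch using the logistic map; the map is my own stand-in for a well-mixing dynamical system, not anything from Stan itself:

```python
def logistic_map(x, steps):
    # Iterate the chaotic logistic map (r = 3.9); tiny input differences
    # grow exponentially, just as efficient Markov transitions amplify
    # floating-point differences between equivalent target densities.
    for _ in range(steps):
        x = 3.9 * x * (1.0 - x)
    return x

x1 = logistic_map(0.4, 100)
x2 = logistic_map(0.4 + 1e-15, 100)  # perturbation near machine precision
print(x1, x2)  # after 100 steps the two trajectories have decorrelated
```

The two trajectories start closer together than single floating-point precision can reliably resolve, yet after a hundred iterations they bear no resemblance to each other.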


On Interactions


I hate to be the bearer of bad news but higher-order interactions can’t just be ignored because they’re inconvenient or you don’t have enough observations to infer them all. Most real-world behaviors are complex and our models need to match that complexity.

The usual problem is that we want to model some behavior which may or may not vary across multiple, overlapping groups. More formally we might have K categorical variables f_k, each with L_k possible values, and there's no reason why a parameter theta should be homogeneous across any of them.


In general the only way to account for all of the possible variations is to allow theta to vary across each of the L_1 * ... * L_K intersections of these categories.


Unless K and each of the L_k are all small this general solution will require introducing many new parameters. Moreover this horde of new parameters is unlikely to be well-informed by anything but the largest and cleanest data sets.



One way to manage all of the possible heterogeneities is to decompose them into simpler pieces. In particular we might consider the homogeneous behavior shared by all groups first, then the behavior shared across the intersections of each pair of categorical variables, then triplets, and so on.


For a more detailed derivation of this decomposition and its utility see Section 3 of https://betanalpha.github.io/assets/case_studies/factor_modeling.html#3_Overlapping_Multifactor_Models. For those who really love tensors Section 3.4 shows how this expansion is just the decomposition of a tensor product into direct sums.


Note that these "interaction" expansions introduce more parameters than the original L_1 * ... * L_K intersections. Consequently the expansion parameters will in general be non-identified, and reasonable inferences will require strong regularization, for example in the form of a prior model.
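To make the parameter counting concrete, here is a small Python sketch; the specific sizes L = (4, 5, 6) are my own hypothetical example:

```python
import math
from itertools import combinations

# Hypothetical example: K = 3 categorical variables with sizes L_k.
L = (4, 5, 6)

# Fully heterogeneous model: one parameter per intersection of categories.
n_full = math.prod(L)  # L_1 * L_2 * L_3 = 120

# Interaction expansion: one block of parameters for each subset of the
# variables, from the global intercept (empty subset) up to the full
# K-way interaction.  The total is (L_1 + 1) * (L_2 + 1) * (L_3 + 1).
n_expansion = sum(
    math.prod(L[k] for k in subset)
    for r in range(len(L) + 1)
    for subset in combinations(range(len(L)), r)
)  # 5 * 6 * 7 = 210

print(n_full, n_expansion)  # 120 210 -- the expansion is over-complete
```

The expansion carries 210 parameters against only 120 distinct intersections, which is exactly why the expansion parameters are non-identified without regularization.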


Many presentations of heterogeneity modeling -- I'm looking at you "random effects" -- take this decomposition for granted, starting with an intercept and "main effects" before maybe considering higher-order "interactions".


The problem is that if _any_ of the higher-order contributions are not included the resulting model will only _approximate_ the full heterogeneity, and there's no universal reason why those approximations will be accurate in any given application.


Indeed keeping only low-order contributions results in a reasonable approximation only when those low-order contributions dominate the full contribution from all of the higher-order terms. 


This, in turn, usually occurs only when the higher-order contributions rapidly decay, which is lovely mathematically but in no way guaranteed to hold when modeling actual behaviors in any practical application.


The consequences of ignoring large higher-order terms can be drastic. In order to fit any observation as well as possible the lower-order contributions that are included will have to contort themselves, corrupting any meaningful interpretation and undermining generalization to other observations.
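This contortion can be seen even in a toy two-by-two example; the numbers below are my own invention, with one cell carrying a large interaction:

```python
import numpy as np

# Hypothetical 2x2 example: a strong interaction in the (2, 2) cell
# that an intercept + main effects model cannot capture.
theta = np.array([[0.0, 1.0],
                  [1.0, 5.0]])

# Least-squares additive fit, theta_ij ~ mu + a_i + b_j.  For a balanced
# two-way layout this is given by the grand mean and row/column mean
# deviations.
mu = theta.mean()
a = theta.mean(axis=1) - mu
b = theta.mean(axis=0) - mu
additive = mu + a[:, None] + b[None, :]

residual = theta - additive
print(residual)  # +/- 0.75 in every cell
```

The main effects spread the single large interaction across all four cells, leaving a residual of 0.75 in magnitude everywhere: every cell is misfit, and the "main effects" no longer mean what their names suggest.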


If you want to see this behavior in action, see the demonstration in Section 4.3 of https://betanalpha.github.io/assets/case_studies/factor_modeling.html#43_Overlapping_Factor_Models_in_Action.


All of this is to say that ignoring higher-order interactions simply because they are frustrating to implement or one has only limited data will generally lead to poor inferences. The adequacy of a lower-order model is a strong assumption, and one that we are responsible for validating in practice.


If you think this is worrisome then I hope that you don't rely on linear models too often either...

