Hi, I'm Jesse

Research

Check out Timaeus's research for more work from my team.

Studying Small Language Models with Susceptibilities

2025-04-25
Garrett Baker=, George Wang=, Jesse Hoogland, Daniel Murfet (= denotes equal contribution)

We develop a linear response framework for interpretability that treats a neural network as a Bayesian statistical mechanical system. A small, controlled perturbation of the data distribution, for example shifting the Pile toward GitHub or legal text, induces a first-order change in the posterior expectation of an observable localized on a chosen component of the network. The resulting susceptibility can be estimated efficiently with local SGLD samples and factorizes into signed, per-token contributions that serve as attribution scores. Building a set of perturbations (probes) yields a response matrix whose low-rank structure separates functional modules such as multigram and induction heads in a 3M-parameter transformer. Susceptibilities link local learning coefficients from singular learning theory with linear-response theory, and quantify how local loss landscape geometry deforms under shifts in the data distribution.
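
A minimal sketch of the linear-response estimator, assuming SGLD draws from the local posterior are already in hand; the function names, the sign convention, and the inverse temperature `beta_n` are my assumptions for illustration, not the paper's implementation:

```python
import torch

def estimate_susceptibility(posterior_samples, observable, perturbation_loss, beta_n):
    """Sketch: first-order linear response of a posterior expectation.

    posterior_samples: parameter settings drawn near a trained solution via SGLD
    observable:        w -> scalar localized on a chosen network component
    perturbation_loss: w -> empirical loss on the perturbing data (e.g. GitHub text)
    beta_n:            inverse temperature (scales with dataset size n)

    Linear response: chi ~ -beta_n * Cov(observable, perturbation_loss) under
    the local posterior; the sign says whether shifting the data toward the
    probe raises or lowers the observable.
    """
    phi = torch.tensor([observable(w) for w in posterior_samples])
    dL = torch.tensor([perturbation_loss(w) for w in posterior_samples])
    cov = torch.mean((phi - phi.mean()) * (dL - dL.mean()))
    return (-beta_n * cov).item()
```

Stacking such estimates over many probes and components gives the response matrix whose low-rank structure separates the functional modules described above.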

You Are What You Eat – AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

2025-02-08
Simon Pepin Lehalleur=, Jesse Hoogland=, Matthew Farrugia-Roberts=, Susan Wei, Alexander Gietelink Oldenziel, Stan van Wingerden, George Wang, Zach Furman, Liam Carroll, Daniel Murfet

In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.

Dynamics of Transient Structure in In-Context Linear Regression Transformers

2025-01-29
Liam Carroll=, Jesse Hoogland, Matthew Farrugia-Roberts, Daniel Murfet

Modern deep neural networks display striking examples of rich internal computational structure. Uncovering principles governing the development of such structure is a priority for the science of deep learning. In this paper, we explore the transient ridge phenomenon: when transformers are trained on in-context linear regression tasks with intermediate task diversity, they initially behave like ridge regression before specializing to the tasks in their training distribution. This transition from a general solution to a specialized solution is revealed by joint trajectory principal component analysis. Further, we draw on the theory of Bayesian internal model selection to suggest a general explanation for the phenomena of transient structure in transformers, based on an evolving tradeoff between loss and complexity. We empirically validate this explanation by measuring the model complexity of our transformers as defined by the local learning coefficient.
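
For concreteness, the "general solution" the transformers transiently imitate is ridge regression fit to the in-context examples. A sketch of that baseline predictor follows; the regularization strength `lam` is an arbitrary illustrative choice, not a value from the paper:

```python
import torch

def ridge_predict(xs, ys, x_query, lam=0.1):
    """Ridge-regression prediction from k in-context examples.

    xs:      (k, d) context inputs
    ys:      (k,)   context targets
    x_query: (d,)   query input

    Solves w_hat = (X^T X + lam * I)^{-1} X^T y, then predicts <x_query, w_hat>.
    """
    d = xs.shape[1]
    w_hat = torch.linalg.solve(xs.T @ xs + lam * torch.eye(d), xs.T @ ys)
    return (x_query @ w_hat).item()
```

Early in training, the transformers' in-context predictions track this baseline across tasks; later, at intermediate task diversity, they specialize to the tasks in the training distribution.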

Talks

Check out the SLT & AI Safety channel for more related videos.

Singular Learning Theory & AI Safety

2025-06-26

Singular learning theory (SLT) identifies the geometry of the loss landscape as key to understanding neural networks. In this talk, I will explore applications of this framework to interpretability, alignment, and other areas of AI safety.
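
The quantitative core of this claim is Watanabe's free energy expansion, in which loss landscape geometry enters through the learning coefficient; the notation below is the standard SLT formulation, included as context rather than taken from the talk:

```latex
% Asymptotic expansion of the local Bayesian free energy over n samples.
% The geometry of the loss landscape near a solution w^* enters through
% the (local) learning coefficient \lambda: more degenerate minima have
% smaller \lambda and hence lower free energy.
F_n = n L_n(w^*) + \lambda \log n + O(\log \log n)
```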

Embryology of AI

2025-06-19

Jesse Hoogland and Daniel Murfet, founders of Timaeus, introduce 'developmental interpretability', their mathematically rigorous approach to AI safety grounded in singular learning theory.

Jesse Hoogland on Singular Learning Theory

2024-12-01

You may have heard of singular learning theory, and its 'local learning coefficient', or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models.
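
For readers new to the LLC: one standard way to define it, ignoring lower-order logarithmic factors, is as the volume-scaling exponent of the near-optimal region around a trained parameter. The notation below is conventional SLT usage, not a formula from the episode:

```latex
% The local learning coefficient (LLC) at a trained parameter w^*:
% how quickly the volume of nearly-loss-optimal parameters in a
% neighborhood U(w^*) shrinks as the tolerance \varepsilon -> 0.
\lambda(w^*) = \lim_{\varepsilon \to 0}
  \frac{\log \mathrm{Vol}\{\, w \in U(w^*) : L(w) \le L(w^*) + \varepsilon \,\}}{\log \varepsilon}
```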

Other Writing

Check out my LessWrong profile for a more up-to-date list of writings.

SLT for AI Safety

2025-07-01

The Sweet Lesson: AI Safety Should Scale With Compute

2025-05-05

Timaeus in 2024

2025-02-20
Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel, Daniel Murfet