About the lab
The laboratory consists of several research groups working on learning and research projects on topics such as automated theorem proving, math for AI safety, ML for mathematical modeling, and the mathematical foundations of ML. Each group typically has 3–6 people at a mix of levels (undergraduates, graduate students, postdocs, and faculty), with 1–3 mentors per group. Periodically, groups make short presentations to the whole lab on what they have learned or discovered.
Organizers
- Daniel Halpern-Leistner – homepage
- Lionel Levine – homepage
- Alex Townsend – homepage
- Ziv Goldfeld – homepage
Spring 2026 Projects
There has been recent work modifying the architecture of large language models so that they can reason entirely in a space of latent variables, as opposed to reasoning through text. This project aims to understand that work better and to explore, more generally, models whose output is the result of a dynamical process, rather than a fixed-length computation.
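As a minimal illustration of the second idea (a hypothetical sketch, not the architecture from the work referenced above), here is a model that iterates a latent state until it approximately stops changing, so the amount of computation varies with the input rather than being fixed in advance:

```python
# Hypothetical sketch: the output is produced by running a latent dynamical
# process to an approximate fixed point, not by a fixed-depth forward pass.
import torch
import torch.nn as nn

class LatentIterator(nn.Module):
    def __init__(self, dim: int, max_steps: int = 32, tol: float = 1e-4):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)    # one update of the latent dynamics
        self.readout = nn.Linear(dim, dim)  # decode the final latent state
        self.max_steps, self.tol = max_steps, tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_steps):
            z_next = self.step(x, z)
            settled = (z_next - z).norm(dim=-1).max() < self.tol
            z = z_next
            if settled:                     # dynamics converged: stop early
                break
        return self.readout(z)

model = LatentIterator(dim=64)
y = model(torch.randn(8, 64))               # iteration count is input-dependent
```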
This project will develop a benchmarking task for applying AI to formalization of mathematics in Lean. The benchmark will be very challenging (to appeal to ML researchers), will involve research-level mathematics, and will focus on autoformalization—converting natural language statements and proofs into Lean code—as opposed to black-box automated theorem proving. The philosophy behind the project is explained here: View Slides →
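To make the target concrete, here is a toy illustration of autoformalization (not an item from the benchmark): an informal statement paired with one candidate Lean 4 / Mathlib formalization. Judging whether a candidate like this faithfully captures the informal statement is exactly the kind of question the benchmark has to settle.

```lean
-- Informal statement: "There are infinitely many primes."
-- One candidate formalization (toy example; assumes Mathlib is available):
import Mathlib

theorem infinitely_many_primes : ∀ n : ℕ, ∃ p, n ≤ p ∧ Nat.Prime p :=
  fun n => Nat.exists_infinite_primes n
```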
Automated theorem proving methods mostly use reinforcement learning to guide step-wise proof search in a formally verifiable language (like Lean), or prompt an LLM to generate a complete formal proof in one shot. This project investigates using tree search in a way closer to how people find proofs: identify a sequence of intermediate facts that would verifiably imply the result and seem easier to prove, then iteratively prove these simpler claims.
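A minimal sketch of that search in Python; the three helper functions are hypothetical placeholders for a prover backend, an LLM that proposes intermediate facts, and a checker that formally verifies the implication:

```python
def try_direct_proof(goal):          # placeholder: call an ATP or a Lean tactic
    return None

def propose_decompositions(goal):    # placeholder: ask an LLM for candidate lemma lists
    return []

def check_implication(lemmas, goal): # placeholder: formally verify that the lemmas imply the goal
    return False

def prove(goal, depth=0, max_depth=3):
    """Return a proof tree for `goal`, or None if the search fails."""
    proof = try_direct_proof(goal)
    if proof is not None:
        return {"goal": goal, "proof": proof}
    if depth == max_depth:
        return None
    for lemmas in propose_decompositions(goal):       # candidate intermediate facts
        if not check_implication(lemmas, goal):       # must verifiably imply the goal
            continue
        subproofs = [prove(l, depth + 1, max_depth) for l in lemmas]
        if all(sp is not None for sp in subproofs):   # every easier claim was proved
            return {"goal": goal, "lemmas": subproofs}
    return None
```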
Standard neural networks approximate functions using compositions of smooth or piecewise-linear nonlinearities, which are poorly suited for representing sharp transitions, boundary layers, or near-singular behavior. Rational neural networks replace standard activations with rational functions (ratios of polynomials). This project studies rational neural networks as a mathematically structured alternative to standard architectures.
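A minimal sketch of what a learnable rational activation could look like in PyTorch; the degree-(3, 2) parametrization and the absolute value in the denominator (to keep it bounded away from zero) are illustrative choices, not the project's fixed design:

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Elementwise ratio of polynomials with learnable coefficients."""
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(4))   # numerator coefficients a0..a3
        self.q = nn.Parameter(0.1 * torch.randn(2))   # denominator coefficients b1..b2

    def forward(self, x):
        num = self.p[0] + self.p[1] * x + self.p[2] * x**2 + self.p[3] * x**3
        den = 1 + torch.abs(self.q[0] * x + self.q[1] * x**2)  # bounded away from zero
        return num / den

# Drop-in replacement for a ReLU in a small MLP:
net = nn.Sequential(nn.Linear(1, 32), RationalActivation(), nn.Linear(32, 1))
```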
Modern AI systems are often evaluated by empirical performance, yet many learning tasks are fundamentally limited by the amount of information available in the data. This project studies and extends information-theoretic lower bounds for learning operators and models from data. The goal is to understand when learning is provably impossible or requires an infeasible number of samples, even for idealized learners.
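One representative tool of this kind (not necessarily the exact form the project will use) is Fano's method, which lower-bounds the worst-case error of any learner over a finite, well-separated family of hypotheses:

```latex
% If the target lies in a finite family {f_1, ..., f_M} whose elements are
% 2*delta-separated in the loss, and P_j is the distribution of the data X
% under f_j, then any estimator \hat{f} satisfies
\[
  \inf_{\hat f}\;\max_{1 \le j \le M}\;
  \mathbb{E}_{P_j}\!\big[\,\ell(\hat f, f_j)\,\big]
  \;\ge\;
  \delta\left(1 - \frac{I(J; X) + \log 2}{\log M}\right),
\]
% where J is uniform on {1, ..., M} and I(J; X) is the mutual information
% between the hypothesis index and the data. When I(J; X) grows slowly with
% the sample size, no learner can reach error delta without infeasibly many samples.
```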
AI coding agents (e.g. Codex-style systems) promise to automate large-scale software development, but it remains unclear whether they are reliable enough for serious scientific computing. This project tests these claims by attempting a systematic translation of a mature numerical computing library into Python.
An agent trained on a given reward (for example, to collect coins in a game) is not necessarily motivated by that reward. One way to see this is to deploy the agent in a new environment: it may act in ways that were correlated with coin collection in its training environment, even if those actions no longer yield coins. This project explores black-box definitions of “motivation” based on counterfactual experience (minimal changes to the environment that alter behavior) and counterfactual training (minimal changes to the curriculum that alter behavior).
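A minimal sketch of the counterfactual-experience idea (all interfaces hypothetical; a classic Gym-style environment API is assumed): run the same trained policy in the original environment and in a minimally edited copy, and measure how much its behavior changes. A policy actually motivated by coins should change a lot when the coins are removed; one that merely learned "go right" should not.

```python
import numpy as np

def behavior_divergence(policy, env, edited_env, n_episodes=100, horizon=200):
    """Fraction of steps on which the policy's action differs between the two environments."""
    diffs = []
    for _ in range(n_episodes):
        s1, s2 = env.reset(), edited_env.reset()
        for _ in range(horizon):
            a1, a2 = policy(s1), policy(s2)       # same policy, two environments
            diffs.append(float(a1 != a2))
            s1, _, done1, _ = env.step(a1)        # classic Gym-style step signature
            s2, _, done2, _ = edited_env.step(a2)
            if done1 or done2:
                break
    return float(np.mean(diffs))
```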
This project builds on emili by using facial emotion recognition (FER) to train a reward model for LLM fine-tuning.
Masked language models like BERT define a Markov chain on the set of texts of a given length: at each time step, a random token is resampled conditional on all other tokens. As the temperature decreases, more and more states become metastable “traps”. Building on research started in fall 2025, this project investigates a phase transition in the temperature, how framing the text with a fixed prefix or suffix affects the stationary distribution, and how to tilt the distribution with a reward model. The goal is to understand what this Markov chain reveals about the proclivities or biases of the reward model.
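A minimal sketch of one step of this chain, using a BERT-style masked LM through Hugging Face transformers (the model choice, temperature, and chain length are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def gibbs_step(ids: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Resample one randomly chosen token conditional on all the others."""
    ids = ids.clone()
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()     # keep [CLS]/[SEP] fixed
    ids[0, pos] = tok.mask_token_id
    with torch.no_grad():
        logits = model(ids).logits[0, pos] / temperature
    ids[0, pos] = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
    return ids

ids = tok("the cat sat on the mat", return_tensors="pt")["input_ids"]
for _ in range(100):      # at low temperature the chain tends to get stuck in "traps"
    ids = gibbs_step(ids, temperature=0.7)
print(tok.decode(ids[0], skip_special_tokens=True))
```

In this picture, framing the text with a fixed prefix or suffix amounts to never resampling those positions.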
AI labs compete fiercely to surpass each other on published benchmarks. This project examines the incentives of that process: What are the incentives of benchmark makers? What motivates labs to compete on one benchmark and ignore another? What techniques do labs use to improve performance on a benchmark, and what side-effects arise from this “benchmaxxing”? As an application, the project uses EigenBench to design a public-facing leaderboard where LLMs are ranked on their alignment to different value systems.
Modern neural networks can solve complex tasks, but it is often unclear how they do so internally. This project aims to “open the black box” by connecting an interpretable, high-level causal model to the internal computations of a neural network. The objective is to produce concrete “handles” for testing and steering a model, going beyond mere concept localization, and to develop practical methods for identifying and validating high-level algorithmic parts inside real neural networks.
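One standard primitive for this kind of connection is the interchange intervention ("activation patching"): run the model on a base input but overwrite one internal activation with its value from a counterfactual input, and check whether the output shifts the way the high-level causal model predicts. A minimal PyTorch sketch of that primitive (the choice of layer is left open; this illustrates the general technique, not the project's specific method):

```python
import torch

def patched_output(model, layer, base_input, counterfactual_input):
    """Run model(base_input), but replace `layer`'s activation with its value
    from the counterfactual run (an interchange intervention at that layer)."""
    stored = {}

    def save(_module, _inputs, output):    # record the activation on the counterfactual run
        stored["act"] = output

    def patch(_module, _inputs, output):   # overwrite the activation on the base run
        return stored["act"]

    handle = layer.register_forward_hook(save)
    with torch.no_grad():
        model(counterfactual_input)
    handle.remove()

    handle = layer.register_forward_hook(patch)
    with torch.no_grad():
        out = model(base_input)
    handle.remove()
    return out
```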