About the lab
The laboratory consists of several research groups working on learning and research projects on topics such as automated theorem proving, math for AI safety, ML for mathematical modeling, and the mathematical foundations of ML. Each group typically has 3–6 people at a mix of levels (undergraduates, graduate students, postdocs, and faculty), with 1–3 mentors per group. Periodically, groups make short presentations to the whole lab on what they have learned or discovered.
Organizers
- Daniel Halpern-Leistner – homepage
- Lionel Levine – homepage
- Alex Townsend – homepage
- Ziv Goldfeld – homepage
Spring 2026 Projects
There has been recent work modifying the architecture of large language models so that they can reason entirely in a space of latent variables, as opposed to reasoning through text. This project aims to understand that work better and to explore, more generally, models whose output is the result of a dynamical process, rather than a fixed-length computation.
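As a minimal illustration of the second idea (a hypothetical sketch, not the architecture from the work referenced above), here is a model that iterates a latent state until it approximately stops changing, so the amount of computation varies with the input rather than being fixed in advance:

```python
# Hypothetical sketch: the output is produced by running a latent dynamical
# process to an approximate fixed point, not by a fixed-depth forward pass.
import torch
import torch.nn as nn

class LatentIterator(nn.Module):
    def __init__(self, dim: int, max_steps: int = 32, tol: float = 1e-4):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)    # one update of the latent dynamics
        self.readout = nn.Linear(dim, dim)  # decode the final latent state
        self.max_steps, self.tol = max_steps, tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_steps):
            z_next = self.step(x, z)
            settled = (z_next - z).norm(dim=-1).max() < self.tol
            z = z_next
            if settled:                     # dynamics converged: stop early
                break
        return self.readout(z)

model = LatentIterator(dim=64)
y = model(torch.randn(8, 64))               # iteration count is input-dependent
```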
This project will develop a benchmarking task for applying AI to formalization of mathematics in Lean. The benchmark will be very challenging (to appeal to ML researchers), will involve research-level mathematics, and will focus on autoformalization—converting natural language statements and proofs into Lean code—as opposed to black-box automated theorem proving. The philosophy behind the project is explained here: View Slides →
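To make the target concrete, here is a toy illustration of autoformalization (not an item from the benchmark): an informal statement paired with one candidate Lean 4 / Mathlib formalization. Judging whether a candidate like this faithfully captures the informal statement is exactly the kind of question the benchmark has to settle.

```lean
-- Informal statement: "There are infinitely many primes."
-- One candidate formalization (toy example; assumes Mathlib is available):
import Mathlib

theorem infinitely_many_primes : ∀ n : ℕ, ∃ p, n ≤ p ∧ Nat.Prime p :=
  fun n => Nat.exists_infinite_primes n
```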
Automated theorem proving methods mostly use reinforcement learning to guide step-wise proof search in a formally verifiable language (like Lean), or prompt an LLM to generate a complete formal proof in one shot. This project investigates using tree search in a way closer to how people find proofs: identify a sequence of intermediate facts that would verifiably imply the result and seem easier to prove, then iteratively prove these simpler claims.
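A minimal sketch of that search in Python; the three helper functions are hypothetical placeholders for a prover backend, an LLM that proposes intermediate facts, and a checker that formally verifies the implication:

```python
def try_direct_proof(goal):          # placeholder: call an ATP or a Lean tactic
    return None

def propose_decompositions(goal):    # placeholder: ask an LLM for candidate lemma lists
    return []

def check_implication(lemmas, goal): # placeholder: formally verify that the lemmas imply the goal
    return False

def prove(goal, depth=0, max_depth=3):
    """Return a proof tree for `goal`, or None if the search fails."""
    proof = try_direct_proof(goal)
    if proof is not None:
        return {"goal": goal, "proof": proof}
    if depth == max_depth:
        return None
    for lemmas in propose_decompositions(goal):       # candidate intermediate facts
        if not check_implication(lemmas, goal):       # must verifiably imply the goal
            continue
        subproofs = [prove(l, depth + 1, max_depth) for l in lemmas]
        if all(sp is not None for sp in subproofs):   # every easier claim was proved
            return {"goal": goal, "lemmas": subproofs}
    return None
```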
Standard neural networks approximate functions using compositions of smooth or piecewise-linear nonlinearities, which are poorly suited for representing sharp transitions, boundary layers, or near-singular behavior. Rational neural networks replace standard activations with rational functions (ratios of polynomials). This project studies rational neural networks as a mathematically structured alternative to standard architectures.
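A minimal sketch of what a learnable rational activation could look like in PyTorch; the degree-(3, 2) parametrization and the absolute value in the denominator (to keep it bounded away from zero) are illustrative choices, not the project's fixed design:

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Elementwise ratio of polynomials with learnable coefficients."""
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(4))   # numerator coefficients a0..a3
        self.q = nn.Parameter(0.1 * torch.randn(2))   # denominator coefficients b1..b2

    def forward(self, x):
        num = self.p[0] + self.p[1] * x + self.p[2] * x**2 + self.p[3] * x**3
        den = 1 + torch.abs(self.q[0] * x + self.q[1] * x**2)  # bounded away from zero
        return num / den

# Drop-in replacement for a ReLU in a small MLP:
net = nn.Sequential(nn.Linear(1, 32), RationalActivation(), nn.Linear(32, 1))
```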
Modern AI systems are often evaluated by empirical performance, yet many learning tasks are fundamentally limited by the amount of information available in the data. This project studies and extends information-theoretic lower bounds for learning operators and models from data. The goal is to understand when learning is provably impossible or requires an infeasible number of samples, even for idealized learners.
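One representative tool of this kind (not necessarily the exact form the project will use) is Fano's method, which lower-bounds the worst-case error of any learner over a finite, well-separated family of hypotheses:

```latex
% If the target lies in a finite family {f_1, ..., f_M} whose elements are
% 2*delta-separated in the loss, and P_j is the distribution of the data X
% under f_j, then any estimator \hat{f} satisfies
\[
  \inf_{\hat f}\;\max_{1 \le j \le M}\;
  \mathbb{E}_{P_j}\!\big[\,\ell(\hat f, f_j)\,\big]
  \;\ge\;
  \delta\left(1 - \frac{I(J; X) + \log 2}{\log M}\right),
\]
% where J is uniform on {1, ..., M} and I(J; X) is the mutual information
% between the hypothesis index and the data. When I(J; X) grows slowly with
% the sample size, no learner can reach error delta without infeasibly many samples.
```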
AI coding agents (e.g. Codex-style systems) promise to automate large-scale software development, but it remains unclear whether they are reliable enough for serious scientific computing. This project tests these claims by attempting a systematic translation of a mature numerical computing library into Python.
An agent trained on a given reward (for example, to collect coins in a game) is not necessarily motivated by that reward. One way to see this is to deploy the agent in a new environment: it may act in ways that were correlated with coin collection in its training environment, even if those actions no longer yield coins. This project explores black-box definitions of “motivation” based on counterfactual experience (minimal changes to the environment that alter behavior) and counterfactual training (minimal changes to the curriculum that alter behavior).
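A minimal sketch of the counterfactual-experience idea (all interfaces hypothetical; a classic Gym-style environment API is assumed): run the same trained policy in the original environment and in a minimally edited copy, and measure how much its behavior changes. A policy actually motivated by coins should change a lot when the coins are removed; one that merely learned "go right" should not.

```python
import numpy as np

def behavior_divergence(policy, env, edited_env, n_episodes=100, horizon=200):
    """Fraction of steps on which the policy's action differs between the two environments."""
    diffs = []
    for _ in range(n_episodes):
        s1, s2 = env.reset(), edited_env.reset()
        for _ in range(horizon):
            a1, a2 = policy(s1), policy(s2)       # same policy, two environments
            diffs.append(float(a1 != a2))
            s1, _, done1, _ = env.step(a1)        # classic Gym-style step signature
            s2, _, done2, _ = edited_env.step(a2)
            if done1 or done2:
                break
    return float(np.mean(diffs))
```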
This project builds on emili by using facial emotion recognition (FER) to train a reward model for LLM fine-tuning.
Masked language models like BERT define a Markov chain on the set of texts of a given length: at each time step, a random token is resampled conditional on all other tokens. As the temperature decreases, more and more states become metastable “traps”. Building on research started in fall 2025, this project investigates a phase transition in the temperature, how framing the text with a fixed prefix or suffix affects the stationary distribution, and how to tilt the distribution with a reward model. The goal is to understand what this Markov chain reveals about the proclivities or biases of the reward model.
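A minimal sketch of one step of this chain, using a BERT-style masked LM through Hugging Face transformers (the model choice, temperature, and chain length are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def gibbs_step(ids: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Resample one randomly chosen token conditional on all the others."""
    ids = ids.clone()
    pos = torch.randint(1, ids.shape[1] - 1, (1,)).item()     # keep [CLS]/[SEP] fixed
    ids[0, pos] = tok.mask_token_id
    with torch.no_grad():
        logits = model(ids).logits[0, pos] / temperature
    ids[0, pos] = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
    return ids

ids = tok("the cat sat on the mat", return_tensors="pt")["input_ids"]
for _ in range(100):      # at low temperature the chain tends to get stuck in "traps"
    ids = gibbs_step(ids, temperature=0.7)
print(tok.decode(ids[0], skip_special_tokens=True))
```

In this picture, framing the text with a fixed prefix or suffix amounts to never resampling those positions.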
AI labs compete fiercely to surpass each other on published benchmarks. This project examines the incentives of that process: What are the incentives of benchmark makers? What motivates labs to compete on one benchmark and ignore another? What techniques do labs use to improve performance on a benchmark, and what side-effects arise from this “benchmaxxing”? As an application, the project uses EigenBench to design a public-facing leaderboard where LLMs are ranked on their alignment to different value systems.
Modern neural networks can solve complex tasks, but it is often unclear how they do so internally. This project aims to “open the black box” by connecting an interpretable, high-level causal model to the internal computations of a neural network. The objective is to produce concrete “handles” for testing and steering a model, going beyond mere concept localization, and to develop practical methods for identifying and validating high-level algorithmic parts inside real neural networks.
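One standard primitive for this kind of connection is the interchange intervention ("activation patching"): run the model on a base input but overwrite one internal activation with its value from a counterfactual input, and check whether the output shifts the way the high-level causal model predicts. A minimal PyTorch sketch of that primitive (the choice of layer is left open; this illustrates the general technique, not the project's specific method):

```python
import torch

def patched_output(model, layer, base_input, counterfactual_input):
    """Run model(base_input), but replace `layer`'s activation with its value
    from the counterfactual run (an interchange intervention at that layer)."""
    stored = {}

    def save(_module, _inputs, output):    # record the activation on the counterfactual run
        stored["act"] = output

    def patch(_module, _inputs, output):   # overwrite the activation on the base run
        return stored["act"]

    handle = layer.register_forward_hook(save)
    with torch.no_grad():
        model(counterfactual_input)
    handle.remove()

    handle = layer.register_forward_hook(patch)
    with torch.no_grad():
        out = model(base_input)
    handle.remove()
    return out
```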