PhD Thesis Proposal: Junyan Cheng

Wednesday, April 1, 2026
1:00pm–2:00pm ET

Rm 201, MacLean ESC (Rett's Rm) / Online

"Autonomous Agentic Systems for Complex Reasoning and Discovery"

Abstract

A long-standing goal of computer science and artificial intelligence is to understand the limits of computation and to build systems that can process complex information, augment human abilities in critical tasks, and reliably analyze, decide, and discover autonomously in long-term, large-scale deployments. Large Language Models (LLMs) have emerged as a promising, highly capable, and general-purpose backbone for building such systems. However, deploying these models in high-stakes scenarios exposes critical limitations and challenges. In this thesis, we systematically study autonomous LLM agentic systems, from their theoretical foundations, through their concrete system architectures, to their engineering frameworks. First, we establish the theoretical feasibility of neural networks learning necessary symbolic capabilities, such as compositionality and extrapolation, in an unsupervised manner via Transitional Dictionary Learning (TDL). We also conduct studies that explore the capability boundaries and features of LLM-based agentic system designs. Second, we develop the actual systems by anchoring core cognitive capabilities in extreme-case real-world applications to rigorously stress-test our designs. With SocioDojo, we construct a basic "Sense-Plan-Act" paradigm for lifelong operation, learning a macroscopic world model from continuous time series and texts. Focusing on its core reasoning component, we introduce Analytica, which uses Soft Propositional Reasoning to systematically reduce bias and variance in societal and scientific forecasting.
We then evaluate three advanced behaviors in highly complex domains: long-term memory via empirical asset pricing, using financial markets as a complex non-stationary environment where information advantages can be precisely measured; adaptation and personalization via demand-optimized application synthesis (Apeiron), directly confronting the highly amorphous demands of software development; and the ultimate capability of evolution and discovery via Genesys, which deploys distributed genetic programming to navigate the infinite search space of language model architectures themselves. Finally, to ground theory in practice, we introduce the Low-Level Language Model (LLLM) open-source framework for building scalable, extensible, and functional agentic systems. It ensures robust reproducibility and provides a foundation for the broader community to advance this field.

Thesis Committee

Peter Chin (Chair)
George Cybenko
Soroush Vosoughi
Kyle Richardson (Allen Institute for AI)
Jay Stokes (Microsoft Research)

Contact

For more information, contact the Thayer Registrar at thayer.registrar@dartmouth.edu.