PhD Thesis Proposal: Mateusz Nowak
Wednesday, May 13, 2026
1:00pm–2:00pm ET
Rm B12, ECSC (Conference Room) / Online
"Less can be More and More can be Less: Strategic Data Utilization and The Consequences of Data Overload"
Abstract
While machine learning has seen massive gains through large-scale data ingestion, the relationship between data volume, network architecture, and model reasoning remains underexplored. This thesis investigates the limitations of unstructured data scaling and demonstrates how targeted network design can enhance reasoning capabilities more effectively than sheer data volume. The research is structured into two main parts.
First, we propose architectural modifications that allow models to better exploit intrinsic data structures without increasing dataset size. We demonstrate that tailoring architectures to the underlying data structure significantly improves performance on complex tasks, such as 3D reconstruction and world modeling.
Second, we analyze the failure modes of high-capacity pre-trained models, quantifying how data overload degrades multiple-choice question understanding. We further examine how varying conceptual representations impact model performance across image and text modalities, focusing on image classification and sentiment analysis.
Ultimately, this work provides a comprehensive framework for optimizing machine learning models through targeted augmentation and data-aware design, offering a structured evaluation of when 'less can be more' and 'more can be less' in the context of data scaling.
Thesis Committee
- Prof. Peter Chin (Chair)
- Prof. Xiaoyao Fan
- Prof. Geoffrey P. Luke
Contact
For more information, contact the Thayer Registrar at thayer.registrar@dartmouth.edu.
