PhD Thesis Proposal: Mateusz Nowak

Wednesday, May 13, 2026
1:00pm–2:00pm ET

Rm B12, ECSC (Conference Room) / Online

"Less can be More and More can be Less: Strategic Data Utilization and The Consequences of Data Overload"

Abstract

While machine learning has seen massive gains through large-scale data ingestion, the relationship between data volume, network architecture, and model reasoning remains underexplored. This thesis investigates the limitations of unstructured data scaling and demonstrates how targeted network design can enhance reasoning capabilities more effectively than sheer data volume. The research is structured into two main parts.

First, we propose architectural modifications that allow models to better exploit intrinsic data structures without increasing dataset size. We demonstrate that tailoring architectures to the underlying data structure significantly improves performance on complex tasks, such as 3D reconstruction and world modeling. 

Second, we analyze the failure modes of high-capacity pre-trained models, quantifying how data overload degrades multiple-choice question understanding. We further examine how varying conceptual representations impact model performance across image and text modalities, focusing on image classification and sentiment analysis. 

Ultimately, this work provides a comprehensive framework for optimizing machine learning models through targeted augmentation and data-aware design, offering a structured evaluation of when 'less can be more' and 'more can be less' in the context of data scaling.

Thesis Committee

  • Prof. Peter Chin (Chair)
  • Prof. Xiaoyao Fan
  • Prof. Geoffrey P. Luke

Contact

For more information, contact the Thayer Registrar at thayer.registrar@dartmouth.edu.