PhD Thesis Proposal: Mai Pham

Tuesday, June 2, 2026
10:00am–11:00am ET

Rm 205, ECSC

"Learning-Based Decision Models for Sequential, Strategic, and Dynamic Resource Allocation"

Abstract

Resource allocation is a foundational problem in operations research, arising in domains such as transportation, market design, and logistics. Because modern allocation systems are sequential, stochastic, combinatorial, strategic, and large-scale, they are increasingly difficult to model and solve with static or fully specified formulations alone. This thesis develops learning-based decision models for resource allocation problems with these forms of complexity.

The thesis consists of two completed studies and one proposed study. The first study addresses policy optimization in sequential decision environments. It introduces Optimistic Policy Regularization (OPR), a PPO-compatible reinforcement learning method that uses high-performing episodes to construct a temporary reward-shaping signal. The method biases policy learning toward actions overrepresented in successful trajectories while decaying the shaping signal over training. OPR is evaluated on Atari 2600 and CAGE Challenge 2 environments against standard PPO, PPO with self-imitation learning, and other deep reinforcement learning baselines.

The second study addresses strategic combinatorial allocation through differentiable economics. It extends RegretNet-style neural auction design to combinatorial auctions where bidders value bundles of resources and the designer must balance revenue, welfare, and fairness. The study introduces constraint-aware allocation layers and neural architectures, including CAGraph, a graph-based model that represents bidder-bundle conflicts. The framework is evaluated on synthetic auction benchmarks and operational case studies in airport slot allocation and strategic cyber defense.

The proposed third study addresses scalable dynamic electric-vehicle ride-sharing. It will formulate EV ride-sharing as a sequential allocation problem with online demand, shared rides, routing constraints, vehicle capacity, battery dynamics, charging infrastructure, and public social-welfare objectives. The proposed method will combine graph neural networks and reinforcement learning, and will be evaluated against offline mixed-integer programming benchmarks, rolling-horizon re-optimization policies, and heuristic baselines.

Together, the three studies examine how learning-based decision models can improve resource allocation across policy optimization, mechanism design, and dynamic transportation operations.

Thesis Committee

Peter Chin (Chair)
Vikrant Vaze
Wesley Marrero

Contact

For more information, contact the Thayer Registrar at thayer.registrar@dartmouth.edu.