MS Thesis Defense: Sarah Oh

Wednesday, May 15, 2019, 4:00–6:00pm

Rm. 105, Cummings Hall

“Towards a Perceptual Distance Metric for Audio”


The question “What makes sensory stimuli seem alike or different?” is of fundamental importance to our understanding of the information processing underlying perception and cognition. Although perceptual (dis)similarity seems akin to distance, measuring the distance between points in Euclidean stimulus space is a poor estimator of subjective dissimilarity. Nonlinear response patterns, interactions between stimulus components, temporal effects, and top-down modulation transform the information contained in incoming stimuli in a way that seems to preserve some notion of distance, but not the one we are used to. This thesis proposes that transformations applied to stimuli during bottom-up stages of perception can be modeled as a function mapping stimulus points in Euclidean space to their representations in perceptual space. A dataset was collected in a subjective listening experiment, the results of which were used to explore possible approaches to approximating the perceptual transformation.

The first method is based on physiology and estimates the function transforming stimulus vectors into basilar membrane vibration vectors based on experimental mammalian basilar membrane responses. The second method is data-driven, optimizing the weights of a single matrix using the subjective listening experiment as training data. The third method combines aspects of the first two, using subjective ratings data to optimize the parameters of the function generating basilar membrane gain curves. Each of the proposed measures achieved correlations with subjective ratings (r≈0.8) comparable to or stronger than those of state-of-the-art objective audio quality measures.
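
As a rough illustration of the second, data-driven idea, the sketch below assumes the perceptual transformation is a single linear map W and that dissimilarity is the Euclidean distance between mapped stimuli. The abstract does not give these details, so the function names, the matrix values, and the toy vectors are all hypothetical and are not taken from the thesis.

```python
import math


def apply_matrix(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * c for w, c in zip(row, v)) for row in W]


def perceptual_distance(W, x, y):
    """Euclidean distance between x and y after mapping both through W.

    W is a hypothetical stand-in for a learned perceptual transformation.
    Because the map is linear, ||Wx - Wy|| equals ||W(x - y)||.
    """
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(c * c for c in apply_matrix(W, diff)))


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


# Made-up example: W stretches the first stimulus dimension and shrinks the
# second, so equal Euclidean offsets give unequal perceptual distances.
W = [[2.0, 0.0],
     [0.0, 0.5]]
reference = [0.0, 0.0]
d_first = perceptual_distance(W, [1.0, 0.0], reference)
d_second = perceptual_distance(W, [0.0, 1.0], reference)
```

In a setup like this, the entries of W would be adjusted so that the predicted distances correlate as strongly as possible with listeners' dissimilarity ratings, with `pearson_r` serving as the evaluation score.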

Thesis Committee

For more information, contact Daryl Laware at