A Repository of Modern American English

The goal of APIL's Repository project (using Harvard Sentence stimuli) is to create a repository of acoustic and corresponding ultrasound data representative of the sounds and articulations of Modern American English.



Experiment Methods


Stimuli


Five lists from the IEEE Harvard Sentences were used: lists 6, 7, 8, 9, and 10. Each sentence in these lists was displayed to the participant six times, for a total of 300 sentence readings (5 lists × 10 sentences × 6 repetitions).


Participants


Ten participants, five female and five male, between the ages of 22 and 33, took part in the experiment.


Experimental Set-up


The participant was seated in a lab chair across from a computer monitor that displayed the stimulus sentences. The participant wore a head-mounted probe holder (Derrick et al., 2015) to stabilize the relative probe-head placement during the experiment. In-house data collection software was used to display the stimuli and to collect the audio and ultrasound images. The participant read each sentence aloud, and the researcher then advanced the display to the next stimulus sentence. The participant was asked to repeat any sentence that was misspoken.


Preparing the Data


The ultrasound videos were demuxed into audio WAV files and the individual image frames of the ultrasound video. An in-house modified version of Praat was used to identify the sentence boundaries and the image frames corresponding to each sentence. The audio for each sentence and its corresponding image frames were placed in a separate directory. The Penn Forced Aligner was run on each sentence, creating a Praat TextGrid demarcating the word and phoneme boundaries in the audio. An in-house database for ultrasound image/trace storage (details and download available here, and here (tarball) or here (zipped), respectively) houses the storage files labelled with the corresponding word and phoneme.
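As a rough illustration of the demuxing step, the following Python sketch splits a recording into a WAV file and numbered image frames using ffmpeg. The file names, directory layout, and codec choice here are assumptions for illustration, not the exact in-house pipeline.

    import subprocess
    from pathlib import Path

    def demux(video_path, out_dir):
        """Split an ultrasound recording into an audio WAV file and
        per-frame PNG images (sketch; paths are hypothetical)."""
        out = Path(out_dir)
        (out / "frames").mkdir(parents=True, exist_ok=True)
        # Extract the audio track as 16-bit PCM WAV, discarding video (-vn).
        subprocess.run(["ffmpeg", "-i", str(video_path), "-vn",
                        "-acodec", "pcm_s16le", str(out / "audio.wav")],
                       check=True)
        # Dump every video frame as a numbered PNG image.
        subprocess.run(["ffmpeg", "-i", str(video_path),
                        str(out / "frames" / "frame_%05d.png")],
                       check=True)

    demux("session01.mp4", "sentences/sentence_001")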


NOTICE


At present, most TextGrids have been left as-is, meaning that word and phone boundaries have not been hand-adjusted. We have noticed that the vast majority of boundaries are quite accurate; however, each sentence contains several that are off by enough milliseconds to cause incorrect classification of an ultrasound image. These boundaries should therefore not be trusted and should be hand-verified. APIL will gladly replace existing TextGrids (along with the corresponding audio WAV files, available for download) with updated versions containing corrected boundaries. That said, some of the present entries in the Database are likely mislabelled. Furthermore, the ultrasound images for this study have not yet been traced, despite the intent of the Database to manage both images and trace files. APIL will likewise gladly upload any trace files for these images to the Database.
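To make the misclassification risk concrete, a small sketch: at a typical ultrasound frame rate (assumed here to be 60 fps, roughly 16.7 ms per frame), a boundary error of about 20 ms can move one or two frames into the wrong phone. The interval tuples and frame rate below are hypothetical stand-ins for intervals read from a Praat TextGrid.

    FPS = 60.0  # assumed ultrasound frame rate; ~16.7 ms per frame

    def label_frames(intervals, n_frames, fps=FPS):
        """Assign each image frame to the phone whose TextGrid interval
        contains the frame's midpoint timestamp."""
        labels = []
        for i in range(n_frames):
            t = (i + 0.5) / fps  # frame midpoint in seconds
            label = next((p for (start, end, p) in intervals
                          if start <= t < end), None)
            labels.append(label)
        return labels

    # A 20 ms shift in the /s/-/ih/ boundary relabels the frame near 0.34 s.
    original = [(0.30, 0.35, "s"), (0.35, 0.42, "ih")]
    shifted  = [(0.30, 0.33, "s"), (0.33, 0.42, "ih")]
    print(label_frames(original, 30)[20], label_frames(shifted, 30)[20])
    # prints: s ih

This is why hand-verification of boundaries matters: the audio alignment may look nearly correct while individual image frames are still assigned to the wrong phone.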