Analysis of Speech Recordings from a Head and Torso Simulator (HATS) with and without Face Coverings using both Spectrogram and Transcription Tools

Disciplines

Architectural Engineering | Engineering | Speech and Rhetorical Studies

Abstract (300 words maximum)

The COVID-19 pandemic led to the adoption of a wide assortment of face coverings around the globe. However, face coverings often reduce speech sound levels and intelligibility in exchange for reduced transmissibility of the virus. This made communication in all public locations more challenging, especially in classroom settings where one needs to listen to a single talker over a long period. Prior research at KSU measured and evaluated the sound levels and frequency bands of white noise played through the artificial mouth (loudspeaker) of an acoustic head and torso simulator (HATS) placed at the instructor location in an empty 40-seat classroom. Sound levels and recordings were taken at distances of 2 meters and 6.2 meters, simulating student listeners in the front row and the back row, respectively. This procedure was repeated with several popular face coverings, including the standard N95, KN95, various cloth masks, and transparent plastic face shields. In addition, WAV recordings were made with and without masks using short phrases from pre-recorded American English female and male talkers. This directed study seeks a way to make use of the speech recordings made at the 2-meter distance from the HATS. Adobe Audition software was used to improve the playback quality of the speech recordings and to trim the modified files to 10 seconds of speech each, all starting with the same spoken words. The same software is also being used to create spectrograms that identify the frequencies attenuated or amplified when a particular mask is worn. Adobe Premiere Pro software is used to transcribe these files. Documenting which words are transcribed correctly or incorrectly will provide a further indication of the effects of various face coverings on speech intelligibility.
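
The word-by-word documentation step described in the abstract could be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the function name `score_transcript`, the example phrase, and the example transcript are all hypothetical, and the alignment relies on Python's standard `difflib` module rather than any Adobe tool.

```python
from difflib import SequenceMatcher


def score_transcript(reference: str, hypothesis: str):
    """Align a known spoken phrase with an automatic transcript, word by word.

    Hypothetical helper: returns the reference words that were transcribed
    correctly, the reference words that were missed or replaced, and the
    fraction of reference words transcribed correctly.
    """
    # Normalize case and strip simple punctuation so "Planks." matches "planks".
    ref = [w.strip(".,!?;:").lower() for w in reference.split()]
    hyp = [w.strip(".,!?;:").lower() for w in hypothesis.split()]

    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    correct, missed = [], []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            correct.extend(ref[i1:i2])
        elif tag in ("replace", "delete"):
            missed.extend(ref[i1:i2])
        # "insert" opcodes are extra transcript words; they do not change
        # this reference-based score.

    accuracy = len(correct) / len(ref) if ref else 0.0
    return correct, missed, accuracy


# Assumed example data, not taken from the study's recordings.
reference = "The birch canoe slid on the smooth planks."
masked_transcript = "the birch canoe slid on the moth planks"

correct, missed, accuracy = score_transcript(reference, masked_transcript)
print(missed)    # ['smooth']
print(accuracy)  # 0.875
```

Running the same comparison for each face covering against the unmasked transcript would give a per-mask list of missed words, which is one possible way to tabulate the intelligibility effects the abstract refers to.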

Academic department under which the project should be listed

Mechanical Engineering

Primary Investigator (PI) Name

Richard Ruhala

Additional Faculty

Laura Ruhala, Mechanical Engineering, lruhala@kennesaw.edu

This document is currently not available here.
