Pitch Mark Detection from Noisy Speech Waveform Using Wave-U-Net
- Title
- Pitch Mark Detection from Noisy Speech Waveform Using Wave-U-Net
- Authors
- Nam, Hyun-Joon; Park, Hong-June
- Date Issued
- 2023-06-08
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Abstract
- A pitch mark (PM) is the time point at which the vocal folds close in voiced speech. PMs are useful for real-life speech processing because of their noise immunity. Wave-U-PM, a neural network based on Wave-U-Net, is proposed to detect PMs from noisy speech. The ground-truth PMs are generated from clean speech by using REAPER; this expands the speech data available for training to 100 hours, whereas the dataset for electroglottograph (EGG)-based PM detection is less than 5 hours. Wave-U-PM has one encoder and two decoders. The first decoder generates a sinusoidal PM waveform whose positive peak times represent the PMs. The second decoder generates a combined pitch and formant waveform below 1000 Hz. Measured by the identification rate (IDR) at 0 dB SNR, Wave-U-PM outperforms previous PM detection methods by 11% and 31% for the voiced and the entire speech intervals, respectively. The second decoder improves IDR by 2.5% for the entire speech interval.
- URI
- https://oasis.postech.ac.kr/handle/2014.oak/120131
- Article Type
- Conference
- Citation
- 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, 2023-06-08
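
The abstract states that the first decoder outputs a sinusoidal PM waveform whose positive peak times represent the PMs. The sketch below illustrates how such a waveform could be turned into PM times by simple peak picking. It is not the authors' implementation: the function name `pitch_marks_from_waveform`, the `min_height` threshold of 0.5, the 16 kHz sample rate, and the synthetic 100 Hz sine used as a stand-in for the decoder output are all assumptions made for illustration.

```python
import numpy as np


def pitch_marks_from_waveform(pm_wave, sample_rate, min_height=0.5):
    """Return the times (in seconds) of positive local maxima in pm_wave.

    Hypothetical helper: a sample counts as a peak if it is greater than
    its left neighbour, not smaller than its right neighbour, and above
    min_height (an assumed threshold, not taken from the paper).
    """
    x = np.asarray(pm_wave, dtype=float)
    if x.size < 3:
        return np.array([])
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > min_height)
    peak_indices = np.nonzero(is_peak)[0] + 1  # +1: mid starts at index 1
    return peak_indices / sample_rate


# Synthetic stand-in for the first decoder's output (assumption):
# 50 ms of a 100 Hz sine sampled at 16 kHz.
sample_rate = 16000
f0 = 100.0
t = np.arange(int(0.05 * sample_rate)) / sample_rate
pm_wave = np.sin(2 * np.pi * f0 * t)

pm_times = pitch_marks_from_waveform(pm_wave, sample_rate)
print(pm_times)
```

On this synthetic input the detected PMs are spaced one pitch period (1 / f0 = 10 ms) apart. A real decoder output would be noisier, so a minimum peak distance or smoothing step might also be needed; that choice is outside what the abstract specifies.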