Comparison of Noise Reduction Techniques for Dysarthric Speech Recognition
Mulfari, D;
2022-01-01
Abstract
The paper investigates the impact of denoising techniques on a deep learning recognition system for speakers with dysarthria, i.e., a neuromotor speech disorder that compromises speech intelligibility and affects approximately 46 million people worldwide. In particular, we compare a manual noise reduction technique with automatic approaches based on classical signal processing techniques, i.e., filtering and spectral analysis, as well as more recent deep learning techniques based on recurrent neural network models. The comparison results reported in this paper are based on a dataset of more than 21K audio files collected with the collaboration of 156 Italian native speakers with different disabilities that cause dysarthric speech impairment. As a result, different diseases and dysarthria severity levels have been taken into account. Moreover, unlike several other studies on automatic recognition systems, the audio files considered in our analysis have been collected in real environments, with very limited supervision, simply using the users' smartphones. Our analysis shows that, in this context, the effectiveness of automatic denoising tools is quite limited, particularly for dysarthric speakers with severe degrees of the disorder. However, comparisons with the proposed manual denoising intervention provide new and interesting insights that can be easily exploited to improve current automatic dysarthric speech recognition systems and that could drive future research in this field.
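
As a concrete illustration of the classical signal-processing family of denoisers compared in the paper (filtering and spectral analysis), the sketch below shows a minimal spectral-subtraction routine in Python. It is not the pipeline evaluated in the study: the function name, the assumption of a mono recording whose first half second contains only background noise, and all parameters (frame size, noise window length) are illustrative choices, not the paper's settings.

import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

def spectral_subtraction(path_in, path_out, noise_seconds=0.5):
    # Illustrative assumption: a mono WAV file whose leading noise_seconds
    # contain no speech, used to estimate the noise spectrum.
    sr, x = wavfile.read(path_in)
    x = x.astype(np.float32)

    # Short-time Fourier transform of the whole utterance (hop = 256 samples).
    f, t, X = stft(x, fs=sr, nperseg=512)

    # Estimate the noise magnitude spectrum from the leading noise-only frames.
    noise_frames = max(int(noise_seconds * sr / 256), 1)
    noise_mag = np.abs(X[:, :noise_frames]).mean(axis=1, keepdims=True)

    # Subtract the noise estimate from each frame's magnitude, floor at zero,
    # and reuse the noisy phase.
    mag = np.maximum(np.abs(X) - noise_mag, 0.0)
    X_clean = mag * np.exp(1j * np.angle(X))

    # Back to the time domain and write the denoised file.
    _, y = istft(X_clean, fs=sr, nperseg=512)
    wavfile.write(path_out, sr, np.clip(y, -32768, 32767).astype(np.int16))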