Toward a lightweight ASR solution for atypical speech on the edge

Mulfari, D;
2023-01-01

Abstract

In this article, with the aim of simplifying the challenges of designing automatic speech recognition (ASR) systems for disordered speech, we focus on an isolated word recognition solution based on a convolutional neural network architecture that predicts the presence of specific voice commands within atypical utterances. Italian speech recognizers have been trained from scratch on custom datasets following a speaker-dependent approach; their performance (in terms of word error rate) has then been evaluated with the collaboration of 16 Italian speakers with motor and speech impairments. Our ASR system relies on the Mel-frequency cepstral coefficient (MFCC) feature extraction method, and thanks to its lightweight architecture, the trained ASR model can be deployed on edge computing nodes where the local inference task requires limited computational resources; this may have significant implications in the field of assistive technology. Finally, we present a prototype of an edge computing device aimed at supporting voice interaction between a user with speech disorders and the surrounding smart environment through custom access to virtual assistants' services. © 2023 Elsevier B.V. All rights reserved.
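As a rough illustration of the MFCC front end the abstract refers to, the sketch below computes cepstral coefficients from a raw waveform using only NumPy. It is a minimal, assumed implementation for illustration (framing with a Hamming window, power spectrum, triangular mel filterbank, then a DCT-II); the paper's actual parameters and pipeline are not specified here, and all function names and defaults are hypothetical.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Toy MFCC extractor: returns an (n_frames, n_ceps) feature matrix."""
    # 1. Slice the waveform into overlapping Hamming-windowed frames.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Per-frame power spectrum (one-sided FFT).
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank spanning 0 .. sr/2.
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    hz_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)  # falling slope
    # 4. Log mel energies, then DCT-II to decorrelate into cepstra.
    mel_energy = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return mel_energy @ dct.T

# Usage: a 1-second 440 Hz tone at 16 kHz yields a 2-D feature map that a
# small CNN keyword classifier could take as input.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(feats.shape)  # (n_frames, n_ceps)
```

In a keyword-spotting setup of the kind the abstract describes, such a fixed-size time-by-coefficient matrix is what makes a convolutional architecture a natural fit, and its small footprint is what enables inference on edge hardware.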
Artificial intelligence; Machine learning; Dysarthria; Automatic speech recognition; Edge computing; Speech disorder
Files in this item:
File: 20.500.12610-76403.pdf (not available)
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 6.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/76403
Citations
  • PMC: n/a
  • Scopus: 3
  • Web of Science: 2