
Toward a lightweight ASR solution for atypical speech on the edge

Mulfari, D;
2023-01-01

Abstract

In this article, with the aim of simplifying the challenges of designing automatic speech recognition (ASR) systems for disordered speech, we focus on an isolated word recognition solution based on a convolutional neural network architecture that predicts the presence of specific voice commands within atypical utterances. Italian speech recognizers were trained from scratch on custom datasets following a speaker-dependent approach, and their performance (in terms of word error rate) was then evaluated with the collaboration of 16 Italian speakers with motor and speech impairments. Our ASR system relies on the Mel-frequency cepstral coefficients (MFCC) feature extraction method, and thanks to its compact structure, the trained ASR model can be deployed on edge computing nodes where the local inference task requires limited computational resources. We believe this may have significant implications in the field of assistive technology. Finally, we present a prototype of an edge computing device that supports voice interaction between a user with speech disorders and his or her surrounding smart environment through a custom integration with virtual assistants' services. © 2023 Elsevier B.V. All rights reserved.
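To illustrate the MFCC front end mentioned in the abstract, below is a minimal NumPy sketch of the standard pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT-II). This is not the authors' implementation; the frame length, hop size, and filter count are assumed values typical of 16 kHz speech.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix for a mono signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II along the filter axis; keep the lowest n_coeffs coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    return log_mel @ dct.T

# Example: a one-second 440 Hz tone sampled at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

In a setup like the one described, a matrix of this shape would be fed to the convolutional classifier as a single-channel "image"; the small fixed-size representation is part of what makes on-device inference feasible.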
Artificial intelligence; Machine learning; Dysarthria; Automatic speech recognition; Edge computing; Speech disorder
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/76403
Citations
  • PMC: ND
  • Scopus: 1
  • Web of Science: 1