Toward a lightweight ASR solution for atypical speech on the edge

Mulfari, D;
2023-01-01

Abstract

In this article, with the aim of simplifying the challenges of designing automatic speech recognition (ASR) systems for disordered speech, we focus on an isolated word recognition solution based on a convolutional neural network architecture that predicts the presence of specific voice commands within atypical utterances. Italian speech recognizers have been trained from scratch on custom datasets following a speaker-dependent approach; their performance (in terms of word error rate) has then been evaluated with the collaboration of 16 Italian speakers with motor and speech impairments. Our ASR system relies on the Mel-frequency cepstral coefficient (MFCC) feature extraction method, and thanks to its lightweight architecture, the trained ASR model can be deployed on edge computing nodes where the local inference task requires limited computational resources; this may have significant implications in the field of assistive technology. Finally, we present a prototype of an edge computing device aimed at supporting voice interaction between a user with speech disorders and the surrounding smart environment through custom access to virtual assistants' services. © 2023 Elsevier B.V. All rights reserved.
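As a rough illustration of the MFCC front end the abstract refers to, the sketch below computes cepstral coefficients from a raw waveform using only NumPy. It is a minimal, assumed implementation for illustration (framing with a Hamming window, power spectrum, triangular mel filterbank, then a DCT-II); the paper's actual parameters and pipeline are not specified here, and all function names and defaults are hypothetical.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Toy MFCC extractor: returns an (n_frames, n_ceps) feature matrix."""
    # 1. Slice the waveform into overlapping Hamming-windowed frames.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Per-frame power spectrum (one-sided FFT).
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank spanning 0 .. sr/2.
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    hz_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)  # rising slope
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)  # falling slope
    # 4. Log mel energies, then DCT-II to decorrelate into cepstra.
    mel_energy = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return mel_energy @ dct.T

# Usage: a 1-second 440 Hz tone at 16 kHz yields a 2-D feature map that a
# small CNN keyword classifier could take as input.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr)
print(feats.shape)  # (n_frames, n_ceps)
```

In a keyword-spotting setup of the kind the abstract describes, such a fixed-size time-by-coefficient matrix is what makes a convolutional architecture a natural fit, and its small footprint is what enables inference on edge hardware.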
Artificial intelligence; Machine learning; Dysarthria; Automatic speech recognition; Edge computing; Speech disorder
Files in this item:
File: 20.500.12610-76403.pdf (not available)
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 6.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/76403
Citations
  • PMC: n/a
  • Scopus: 3
  • Web of Science: 2