Exploring AI-based Speaker Dependent Methods in Dysarthric Speech Recognition

Mulfari, D.
2022-01-01

Abstract

In this paper, we present our recent improvements within the CapisciAMe project, an Italian initiative investigating the use of deep learning strategies for automatic speech recognition in the presence of speech disorders such as dysarthria. Our research focuses on isolated word recognition, exploiting a convolutional neural network (CNN) architecture to predict the presence of a reduced set of speech commands in atypical speech. Currently, following speaker-dependent approaches, our speech models have been trained on a 21K-sample dataset consisting of voice contributions, i.e., single speech recordings, from 156 Italian users with neuromotor disabilities and dysarthria. Having a large number of repetitions (into the thousands) for each word on which to train our deep learning model is of crucial importance for our project. Nevertheless, people with impaired speech generally struggle with repetitive vocalization tasks, so producing a large number of speech samples for each word is a demanding undertaking. To mitigate this difficulty, we investigate possible relationships between the number of samples per word and the accuracy of automatic speech recognition. This study plays a critical role in our research, allowing us to minimize the number of speech samples required for each word from people with dysarthria to train the automatic speech recognition system.
Year: 2022
ISBN: 978-1-6654-9956-9
Keywords: assistive technology; dysarthria; automatic speech recognition; artificial intelligence; machine learning; deep learning
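
The record does not include the paper's network architecture or experimental protocol. As a rough illustration of the two ideas summarized in the abstract (a small CNN classifier over a reduced command vocabulary, and a study of recognition accuracy as a function of samples per word), the following Python/TensorFlow sketch may be useful. Every name, shape, and hyperparameter in it (NUM_COMMANDS, INPUT_SHAPE, the layer sizes, the sample budgets) is an illustrative assumption, not the authors' published setup.

# A minimal, hypothetical sketch of a speaker-dependent keyword-spotting
# CNN plus a learning-curve experiment over the number of training samples
# per word. All shapes, layer sizes, and budgets below are assumptions for
# illustration; they are not taken from the paper.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_COMMANDS = 10          # assumed size of the reduced command vocabulary
INPUT_SHAPE = (98, 40, 1)  # assumed log-mel spectrogram: frames x mel bands x 1

def build_cnn(num_classes=NUM_COMMANDS):
    """Small CNN classifier over spectrogram inputs (illustrative only)."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

def learning_curve(x_train, y_train, x_test, y_test,
                   samples_per_word=(10, 25, 50, 100)):
    """Train one model per samples-per-word budget; return test accuracies."""
    results = {}
    for n in samples_per_word:
        # Keep at most n recordings of each word (speaker-dependent subset).
        idx = np.concatenate(
            [np.flatnonzero(y_train == c)[:n] for c in range(NUM_COMMANDS)])
        model = build_cnn()
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train[idx], y_train[idx],
                  epochs=20, batch_size=32, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        results[n] = acc  # accuracy achieved with n samples per word
    return results

Plotting the returned accuracy against n would show where the curve flattens, i.e., the point beyond which asking a speaker with dysarthria for more repetitions yields little additional recognition accuracy.
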
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/72244
Citations
  • PubMed Central: not available
  • Scopus: 5
  • Web of Science: 4