Exploring AI-based Speaker Dependent Methods in Dysarthric Speech Recognition

Mulfari, D.
2022-01-01

Abstract

In this paper, we present our recent improvements within the CapisciAMe project, an Italian initiative investigating the use of deep learning strategies for automatic speech recognition in the presence of speech disorders such as dysarthria. Our research focuses on isolated word recognition, exploiting a convolutional neural network (CNN) architecture to predict the presence of a reduced set of speech commands in atypical speech. Currently, following speaker-dependent approaches, our speech models have been trained on a 21K-sample dataset consisting of voice contributions, i.e., single speech recordings, from 156 Italian users with neuromotor disabilities and dysarthria. Having a large number of repetitions (into the thousands) for each word on which to train our deep learning model is of crucial importance for our project. Nevertheless, people with impaired speech generally struggle with repetitive vocalization tasks, so producing a large number of speech samples for each word is a demanding undertaking. To mitigate this difficulty, we investigate possible relationships between the number of samples per word and the accuracy of automatic speech recognition. This study plays a critical role in our research, allowing us to minimize the number of speech samples required for each word from people with dysarthria to train the automatic speech recognition system.
Year: 2022
ISBN: 978-1-6654-9956-9
Keywords: assistive technology; dysarthria; automatic speech recognition; artificial intelligence; machine learning; deep learning
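
The record does not include the paper's network architecture or experimental protocol. As a rough illustration of the two ideas summarized in the abstract (a small CNN classifier over a reduced command vocabulary, and a study of recognition accuracy as a function of samples per word), the following Python/TensorFlow sketch may be useful. Every name, shape, and hyperparameter in it (NUM_COMMANDS, INPUT_SHAPE, the layer sizes, the sample budgets) is an illustrative assumption, not the authors' published setup.

# A minimal, hypothetical sketch of a speaker-dependent keyword-spotting
# CNN plus a learning-curve experiment over the number of training samples
# per word. All shapes, layer sizes, and budgets below are assumptions for
# illustration; they are not taken from the paper.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_COMMANDS = 10          # assumed size of the reduced command vocabulary
INPUT_SHAPE = (98, 40, 1)  # assumed log-mel spectrogram: frames x mel bands x 1

def build_cnn(num_classes=NUM_COMMANDS):
    """Small CNN classifier over spectrogram inputs (illustrative only)."""
    return models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

def learning_curve(x_train, y_train, x_test, y_test,
                   samples_per_word=(10, 25, 50, 100)):
    """Train one model per samples-per-word budget; return test accuracies."""
    results = {}
    for n in samples_per_word:
        # Keep at most n recordings of each word (speaker-dependent subset).
        idx = np.concatenate(
            [np.flatnonzero(y_train == c)[:n] for c in range(NUM_COMMANDS)])
        model = build_cnn()
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train[idx], y_train[idx],
                  epochs=20, batch_size=32, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        results[n] = acc  # accuracy achieved with n samples per word
    return results

Plotting the returned accuracy against n would show where the curve flattens, i.e., the point beyond which asking a speaker with dysarthria for more repetitions yields little additional recognition accuracy.
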
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/72244
Citations
  • PubMed Central: not available
  • Scopus: 5
  • Web of Science: 4