This thesis applies Artificial Intelligence (AI)-driven techniques to support Automatic Speech Recognition (ASR) services for individuals with speech impairments, such as dysarthria. These conditions often result in poor speech intelligibility and are frequently accompanied by severe motor disabilities. It is estimated that over 22 million people in Europe (5% of the population) are affected by such speech disorders, which can appear in childhood, as in the case of cerebral palsy, or arise from neurodegenerative and progressive diseases such as Parkinson's disease, amyotrophic lateral sclerosis, spinal muscular atrophy, and multiple sclerosis. These individuals face significant barriers in interpersonal communication because articulation problems make their speech extremely variable, which in turn severely limits their social participation and independence in daily activities. While contemporary ASR tools integrated into voice assistants perform well on standard speech, their performance deteriorates significantly on impaired speech, particularly in cases of moderate to severe voice disorders. This creates a paradox: technologies that could be crucial in improving the lives of people with disabilities instead become additional barriers.
To address this, the present work proposes a technological ecosystem called CapisciAMe, designed for disordered speech recognition with Italian as the primary language. The research centers on isolated word recognition using speaker-dependent approaches and leverages deep learning techniques, including state-of-the-art ASR models based on encoder-decoder architectures. These models are fine-tuned on our private corpus of Italian impaired speech, enabling progress toward recognizing short sentences as combinations of individual words.
The proposed ecosystem rests on three interrelated pillars, each a crucial aspect of this research:
- Disordered Speech Collection: Given the scarcity of dysarthric speech corpora, especially in Italian, a significant effort has been devoted to collecting voice samples from individuals with speech disorders. This work has resulted in the first Italian atypical speech corpus for AI-based research. To support this effort, novel IoT-based assistive technologies have been introduced to streamline speech acquisition, alongside methodologies for enhancing impaired speech signals.
- Deep Learning Architectures: The study employs sequence-to-sequence frameworks, leveraging state-of-the-art encoder-decoder architectures such as Wav2Vec2 (by Meta AI) and Whisper (by OpenAI). These models, based on the transformer architecture and pre-trained on extensive multilingual datasets of standard speech, were fine-tuned on our private corpus. This fine-tuning process is critical to our work, as it enables accurate recognition of single voice commands and short sentences (as combinations of isolated words) spoken by Italian individuals with atypical speech and dysarthria.
- ASR Services and Application Prototypes: A cloud-based speech-to-text transcription service, powered by the ASR engine, has been developed as part of the CapisciAMe ecosystem. This platform lets software developers integrate speech recognition capabilities into custom applications, fostering interdisciplinary studies and applications that support individuals with speech impairments.
This research demonstrates the feasibility of creating tailored ASR systems for disordered speech by addressing challenges in data scarcity, model optimization, and application accessibility. The ecosystem achieves significant improvements in recognizing impaired speech in Italian, laying the groundwork for further development of inclusive communication technologies that enhance the independence and social participation of individuals with speech impairments.

Exploring Artificial Intelligence Technologies for Automatic Speech Recognition in Voice Disorders / Davide Mulfari, 2025 Jun 04. 37th cycle, Academic Year 2024/2025.

Exploring Artificial Intelligence Technologies for Automatic Speech Recognition in Voice Disorders

MULFARI, DAVIDE
2025-06-04

Files in this record:
Tesi.pdf (open access)
Description: Thesis
Type: Doctoral thesis
License: Creative Commons
Size: 7.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/95263