In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVM-based system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.

A machine learning approach for sentiment analysis for Italian reviews in healthcare

Bacco L.;Merone M.
;
2020-01-01

Abstract

In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVM-based system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.
File in questo prodotto:
File Dimensione Formato  
paper_32.pdf

accesso aperto

Descrizione: In this paper, we present our approach to the task of binary sentiment classification for Italian reviews in healthcare domain. We first collected a new dataset for such domain. Then, we compared the results obtained by two different systems, one including a Support Vector Machine and one with BERT. For the first one, we linguistic pre–processed the dataset to extract hand-crafted features exploited by the classifier. For the second one, we oversampled the dataset to achieve better results. Our results show that the SVMbased system, without the worry of having to oversample, has better performance than the BERT-based one, achieving an F1-score of 91.21%.
Tipologia: Abstract
Licenza: Copyright dell'editore
Dimensione 553.05 kB
Formato Adobe PDF
553.05 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12610/70324
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? ND
social impact