Livia Lilli at the IAC General Seminars, 27 May 2025 at 14:30


Tuesday 27 May brings a new talk in the 2025 IAC General Seminars series. The guest is Livia Lilli, Data Scientist at the Direzione Tecnica, ICT e Innovazione Tecnologie Sanitarie, Università Cattolica del Sacro Cuore.

The seminar, titled "Real-World Applications of Natural Language Processing in Healthcare – The MISTIC pipelines", will be held in hybrid mode: in person at CNR IAC and via streaming (see the link at the bottom), broadcast on the institute's YouTube channel.

The abstract follows.

The use of Natural Language Processing (NLP) has expanded across many domains, including healthcare, where it plays a crucial role in extracting data from unstructured clinical reports to support Real-World Evidence (RWE) generation. This is especially important in oncology, where key information on disease progression such as metastasis is often found only in free-text Electronic Health Records (EHRs). However, processing this data remains challenging, particularly in minor languages like Italian, where domain-specific NLP tools are limited. Additionally, adaptation of large language models typically requires substantial computational resources and large labeled datasets, limiting their use in real-world clinical settings.
The MISTIC pipeline (Metastases Italian Sentence Transformers Inference Classification) is a novel, lightweight NLP solution developed by Fondazione Policlinico Universitario Agostino Gemelli IRCCS in collaboration with the Istituto per le Applicazioni del Calcolo “Mauro Picone” (CNR-IAC). Designed and tested in a real-world clinical setting, MISTIC aims to identify breast cancer metastases in Italian electronic health records (EHRs) using a few-shot learning approach that requires minimal annotated data and computational resources. The pipeline combines linguistic preprocessing techniques, such as sentence segmentation and topic filtering, with a transformer-based classifier fine-tuned on a small dataset of 550 texts.
When evaluated against alternative methods—including zero-shot BERT models, rule-based systems, and large generative language models—MISTIC demonstrates a compelling balance of accuracy, generalization, and efficiency. With an F1-score exceeding 91%, it outperforms competing approaches while maintaining full explainability and requiring no GPU infrastructure.
The project addresses a key gap in biomedical NLP by targeting Italian, an underrepresented language in clinical research. With its scalable and adaptable design, MISTIC can help hospitals streamline retrospective studies, build real-world evidence datasets, and extract meaningful insights from unstructured clinical text, demonstrating the impact of tailored NLP solutions on medical research and care.
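
To make the pipeline shape described in the abstract more concrete (sentence segmentation, then topic filtering, then a sentence-level classifier), here is a minimal Python sketch. It is not the MISTIC implementation: the keyword list, the function names, and the negation-based `classify` stand-in are all assumptions introduced for illustration. In the system described above, the last step is a fine-tuned sentence-transformer classifier, which this sketch does not reproduce.

```python
import re

# Hypothetical topic keywords (assumption): the abstract does not say how
# MISTIC's topic filter is defined.
TOPIC_KEYWORDS = ("metasta", "secondarism")


def split_sentences(text):
    """Naive segmentation: split on whitespace that follows ., ;, ! or ?"""
    parts = re.split(r"(?<=[.;!?])\s+", text.strip())
    return [p for p in parts if p]


def filter_by_topic(sentences, keywords=TOPIC_KEYWORDS):
    """Keep only sentences that mention at least one topic keyword."""
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def classify(sentence):
    """Stand-in for the fine-tuned classifier (assumption).

    Returns True unless the sentence contains a simple Italian negation cue.
    A real system would score the sentence with a trained model instead.
    """
    lowered = sentence.lower()
    negation_cues = ("non ", "assenza di", "nessuna")
    return not any(cue in lowered for cue in negation_cues)


def report_has_metastasis(text):
    """Flag a report if any on-topic sentence is classified as positive."""
    candidates = filter_by_topic(split_sentences(text))
    return any(classify(s) for s in candidates)


if __name__ == "__main__":
    report = (
        "Paziente in buone condizioni generali. "
        "Presenza di metastasi epatiche alla TC. "
        "Non evidenza di versamento pleurico."
    )
    print(report_has_metastasis(report))
```

Filtering before classification keeps the expensive step (the classifier) focused on the few sentences that can actually carry the signal, which is one plausible reason a pipeline of this shape can run without GPU infrastructure.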
