Programme SpeD 2025

The 13th Conference on Speech Technology and Human-Computer Dialogue

October 19-22 – Cluj-Napoca, Romania

Aula HUB UTC-N, 3rd floor, George Barițiu 4, Cluj-Napoca

Monday, October 20th, 2025

8:30-9:30 Registration
9:00-9:30 Opening notes
9:30-10:30 Keynote “Building Trust in the Age of Speech Deepfakes: From Detection to Source Tracing and Robust Biometrics”
Tomi Kinnunen, University of Eastern Finland
10:30-11:00 Coffee break
11:00-12:20 Oral Session: Audio deepfakes and forensics
Chair: Tomi Kinnunen
11:00-11:20 Audio Splicing Detection Using Self-Supervised Representations
Octavian Pascu, Horia Cucu
11:20-11:40 Augmented Transfer Learning for Synthetic Speech Detection
Irina Mutica, Șerban Mihalache, Gheorghe Pop, Dragoș Burileanu
11:40-12:00 Spoofed Speech Detection for Physical and Logical Access Applications using Deep Neural Networks
Cristian-Teodor Neamțu, Șerban Mihalache, Dragoș Burileanu
12:00-12:20 Detecting Audio Deepfakes on the Edge: Lightweight SSL-Based Detection in a Browser Plugin
Octavian Pascu, Dan Oneata, Horia Cucu, Nicolas Müller
12:20-14:00 Lunch – onsite
14:00-14:40 Keynote “New research directions of the Romanian Academy Research Institute for Artificial Intelligence ‘Mihai Drăgănescu’ (ICIA)”
Paul-Andrei Păun, Romanian Academy
14:40-16:00 Oral Session: Natural Language Processing
Chair: Camelia Lemnaru
14:40-15:00 A Study on Metric-Human Alignment in Code Summarization by LLMs through Kolmogorov-Arnold Networks
Georgian Nicolae, Corneliu Burileanu
15:00-15:20 Effects of Attention Head Pruning on Encoder-only Language Models for Multilingual Recipe Classification
Angheluș Alin-Gabriel, Vlad Andrei Negru, Camelia Lemnaru, Rodica Potolea
15:20-15:40 Safeguarding Online Trust: NLP-ML Detection of Computer-Generated Reviews
Cornelia Ionela Bădoi
15:40-16:00 Transcription and Classification of Compound and Special Numeral Entities Using Artificial Intelligence and Rule-Based Methods
Vasile Mitruț, Ștefania Ștefănescu, Ștefan-Adrian Toma
16:00-16:30 Coffee break
16:30-17:50 Oral Session: Emotion, speaker recognition, diarization, multimodal
Chair: Peter Mihajlik
16:30-16:50 Improved visually prompted keyword localisation in real low-resource settings
Leanne Nortje, Dan Oneata, Gabriel Pîrlogeanu, Herman Kamper
16:50-17:10 On the Contribution of Lexical Features to Speech Emotion Recognition
David Combei
17:10-17:30 From Lab to Field: Practical Deployment and Evaluation of a Speaker Recognition System
Raluca-Ionela Costan, Mihai Coca, Ștefan-Adrian Toma
17:30-17:50 Expanding and Refining RoMEMEs: A Multimodal Corpus of Romanian Memes for Advanced AI Analysis
Vasile Păiș, Daniela Gîfu
19:00-21:00 Conference Banquet

Tuesday, October 21st, 2025

9:00-9:30 Registration
9:30-10:30 Keynote: “Evaluating speech technology – challenges and opportunities”
Cassia Valentini-Botinhao, University of Edinburgh
10:30-11:00 Coffee break
11:00-12:40 Oral Session: Synthetic speech
Chair: Cassia Valentini-Botinhao
11:00-11:20 Latent Insights: Exploring Phoneme Diversity in Natural and Synthetic Speech through Latent Representations
Diptasree Debnath, Helard Becerra, Andrew Hines
11:20-11:40 Style-Controlled VALL-E for Few-Shot Emotional German TTS
Rami Kammoun, Mohammed Salah Al-Radhi
11:40-12:00 Towards Hungarian-English Code-Switching Speech Dataset Construction by using Multi Speaker-adaptive Text-to-Speech Synthesis
Piroska Zsófia Barta, Peter Mihajlik
12:00-12:20 Adding Emotion Conditioning in Speech Synthesis via Multi-Term Classifier-Free Guidance
Radu-George Bolborici, Ana Antonia Neacșu
12:20-12:40 Improving Speech Synthesis by Using a Cognitive View on F0 Contours
Doina Jitcă
12:40-14:00 Lunch – onsite
14:00-14:20 Sponsor: “Orange Romania Research and Development Ecosystem”
Răzvan Mihai, Orange Romania
14:20-14:40 Sponsor: “Infineon Romania – Semiconductors R&D Competence Center”
Valentina Davidoiu, Infineon Romania
14:40-15:40 Oral Session: Automatic Speech Recognition
Chair: Ștefan-Adrian Toma
14:40-15:00 Impact of Text Origin and Real-Synthetic Data Ratio in TTS-Augmented Low-Resource ASR
Mengke Dalai, Peter Mihajlik
15:00-15:20 Open Source State-Of-the-Art Solution for Romanian Speech Recognition
Gabriel Pîrlogeanu, Alexandru-Lucian Georgescu, Horia Cucu
15:20-15:40 Cross-lingual Transfer Learning Experiments for Arabic ASR
Amin Hassairi, Peter Mihajlik
15:40-16:10 Coffee break
16:10-17:50 Oral Session: Signal processing
Chair: Constantin Paleologu
16:10-16:30 Pretrained Speech Models Learn Boundaries, Not Patterns: An Analysis of Supervised vs. Unsupervised Capabilities
Abdul Rehman, Kavisha Jayathunge, Jian-Jun Zhang, Xiaosong Yang
16:30-16:50 ADNAC: Audio Denoiser using Neural Audio Codec
Lucian Daniel Jimon, Mircea Florin Vaida, Adriana Stan
16:50-17:10 A Robust Decomposition-Based RLS Algorithm for Echo Cancellation Applications
Radu-Andrei Otopeleanu, Camelia Elisei-Iliescu, Constantin Paleologu, Jacob Benesty, Cristian-Lucian Stanciu, Cristian Anghel
17:10-17:30 Range of lip movement during the production of selected consonants in Kannada speakers
Ajish Abraham, V. Sivaramakrishnan, N. Swapna, N. Manohar
17:30-17:50 About the spatial arrangement of acoustic sensors for monitoring protected wildlife area
Corneliu Rusu
17:50 Closing notes

Page last updated: 02.10.2025