SpeD 2025 – Welcome message

The “SpeD 2025” Organizing Committee warmly invites you to attend the 13th Conference on Speech Technology and Human-Computer Dialogue in Cluj-Napoca, Romania. The conference will be held in person at the Technical University of Cluj-Napoca.

The conference will bring together scientists, developers, and professionals to present their work, meet colleagues, discuss new ideas, and build collaboration among university, research-center, and industry research groups. The technical program will include oral sessions, keynotes by renowned speakers, and demonstrations of the latest research on a wide range of topics at the forefront of science and engineering in speech technology and human-computer dialogue.

Past editions of the “SpeD” conference series were technically sponsored by IEEE and EURASIP, and their proceedings were indexed in the IEEE Xplore® Digital Library, Scopus, and the Web of Science Conference Proceedings Citation Index (the WoS indexing process for the previous, 2023 edition has not yet been finalized). This year, papers accepted and presented at the conference will again be submitted for inclusion in IEEE Xplore, subject to meeting IEEE Xplore’s scope and quality requirements, and will be sent for indexing in the Web of Science.

Joint event

This year, the Language Data Space (LDS) Workshop, organized by the Research Institute for Artificial Intelligence “Mihai Drăgănescu” of the Romanian Academy, will be co-located with “SpeD”.

Main Topics

  • Self-Supervised and Generative Models for Speech Representation
  • Robust Spoken Language Recognition and Understanding
  • Efficient and Low-Resource Speech Recognition for Edge and Embedded Systems
  • Neural Text-to-Speech (TTS) and Expressive Speech Synthesis
  • End-to-End Speech-to-Speech Translation and Multimodal Language Models
  • Speaker Recognition, Diarization, and Adaptive Speaker Embeddings
  • Conversational Search, Spoken Document Understanding, and Retrieval-Augmented Generation (RAG)
  • Paralinguistic Speech Processing and Emotion Recognition in the Wild
  • Speech Enhancement, Dereverberation, and Noise-Robust Processing
  • AI-driven Speech Technology: Large-Scale Models and Fine-Tuning Strategies
  • Conversational AI, Large Language Models, and Multimodal Dialogue Systems
  • Speech Forensics, Deepfake Detection, and Synthetic Speech Analysis
  • Clinical Speech Processing for Health, Well-being, and Cognitive Assessment
  • Multilingual and Low-Resource Speech Data Collection, Annotation, and Benchmarking
  • Human-Centric Speech Interfaces: UX, Personalization, and Ethical Design
  • Voice AI for Smart Environments, Assistive Tech, and Wearable Devices
  • Speech Pathology, Augmentative Communication, and AI-Driven Therapy
  • Bias, Fairness, and Ethical Considerations in Speech AI Deployment
  • Next-Gen Speech and Speaker Recognition: Continual Learning and Adaptation
  • Multimodal and Audio-Visual Speech Processing with Foundation Models
  • Cross-Modal Information Retrieval and Multisensory AI
  • Advanced Audio Signal Processing for Spatial and 3D Audio Applications
  • AI-Powered Human-Robot Interaction and Conversational Embodied Agents
  • Efficient, Scalable, and Sustainable Deep Learning for Speech Processing

Additional Topics in NLP and Multimodal Processing

  • Text Summarization and Abstractive Generation
  • Language Modeling and Pre-trained Architectures (e.g., Transformers)
  • Automatic Question Answering and Knowledge Extraction
  • Cross-lingual and Multilingual Natural Language Processing Applications
  • NLP for Social Media and Online Communication
  • NLP for Code and Programming Language Understanding
  • NLP in Virtual and Augmented Reality Applications
  • NLP for Low-Resource Languages
  • Bias Detection and Fairness in NLP Systems
  • Explainability and Interpretability in NLP Models
  • Speech-to-Image and Image-to-Text Systems
  • Event Detection and Narrative Understanding
  • Spoken and Written Language Alignment Models
  • Multimodal Emotion Recognition and Analysis
  • Human-like Emotion Generation in Multimodal AI Systems
  • Gesture and Gaze Integration in Multimodal Systems
  • End-to-End Multimodal Dialogue Systems
  • Multimodal Data Fusion Techniques
  • Real-Time Multimodal Interaction Systems

Schedule (provisional)

  • Paper submission (5–6 pages, IEEE format): June 2, 2025.
  • Notification of acceptance and reviewers’ comments: August 15, 2025.
  • Submission of final papers: September 5, 2025.
  • Conference: October 19–22, 2025.