Conference Aims and Scope

The International Conference on Speech Technology and Human–Computer Dialogue (SpeD) is a biennial scientific event that brings together researchers, practitioners, and industry professionals working in the fields of speech communication, language technologies, and multimodal human–computer interaction. Since its inception, the conference has aimed to promote the exchange of knowledge, foster collaboration, and showcase recent advances in speech and language technologies that shape the way humans and machines interact.

The aim of SpeD 2025 is to provide a platform for presenting and discussing innovative research, emerging trends, and practical applications in speech and audio processing, with particular emphasis on methods that bridge the gap between human communication and artificial intelligence. The conference encourages interdisciplinary approaches combining speech technology, linguistics, and cognitive science, as well as contributions exploring the ethical, social, and technological implications of speech-based systems.

The scope of the conference covers a broad range of topics in speech and language technologies, including, but not limited to:

  • Automatic Speech Recognition (ASR): algorithms, models, and systems for accurate and robust transcription of spoken language in diverse acoustic and linguistic conditions.
  • Audio Deepfakes and Forensics: detection, analysis, and prevention of manipulated or synthetic speech; forensic applications for speaker verification and authenticity assessment.
  • Text-to-Speech (TTS) Synthesis: neural and statistical approaches to generating natural, expressive, and intelligible synthetic speech.
  • Speech Emotion Recognition (SER): computational methods for analyzing affective and paralinguistic cues in speech.
  • Automatic Speaker Recognition and Diarization: techniques for speaker identification, verification, and segmentation in multi-speaker environments.
  • Audio and Speech Signal Processing: enhancement, separation, coding, and transformation of speech and audio signals.
  • Multimodal and Audio-Visual Speech Processing: integration of visual, linguistic, and contextual cues for improved understanding and synthesis of human communication.
  • Natural Language Processing (NLP): models and tools for understanding, generating, and interacting with human language, including dialogue systems and large language models.

By bringing together contributions from academia and industry, SpeD 2025 aims to strengthen the research community in speech and language technologies and to stimulate innovation in intelligent systems capable of perceiving, understanding, and generating human communication.