Keynote Speakers

Isabel Trancoso – bio

Isabel Trancoso is a full professor at Instituto Superior Técnico (IST, Univ. Lisbon), and the President of the Scientific Council of INESC ID Lisbon. She received her PhD in ECE from IST in 1987. She chaired the ECE Department of IST. She was Editor-in-Chief of the IEEE Transactions on Speech and Audio Processing and has held many leadership roles in SPS (Signal Processing Society of IEEE) and ISCA (International Speech Communication Association), including President of ISCA and Chair of the Fellow Evaluation Committees of both SPS and ISCA. She was elevated to IEEE Fellow in 2011, and to ISCA Fellow in 2014.

Abstract – Speech as Personally Identifiable Information

Speech is the most natural and immediate form of communication. It is ubiquitous. The tremendous progress in language technologies that we have witnessed in the past few years has led to the use of speech as an input/output modality in a panoply of applications which, until recently, were mostly reserved for text.

Many of these applications run on cloud-based platforms that provide remote access to powerful models in what is commonly known as Machine Learning as a Service (MLaaS), enabling the automation of time-consuming tasks (such as transcribing speech) and helping users to perform everyday tasks (e.g. through voice-based virtual assistants).

When a biometric signal such as speech is sent to a remote server for processing, however, this input signal can be used to determine information about the user, including their preferences, personality traits, mood, health, and political opinions, along with other data such as gender, age range, height, and accent.

Although there is a growing societal awareness of user data protection (the GDPR in Europe is one example), most users of such remote servers are unaware of the amount of information that can be extracted from a handful of their sentences, in particular about their health status. In fact, the potential of speech as a biomarker for health has been realized for diseases affecting respiratory organs, such as the common cold or obstructive sleep apnea, for mood disorders such as depression and bipolar disorder, and for neurodegenerative diseases such as Parkinson’s, Alzheimer’s, and Huntington’s disease. The potential for mining this type of information from speech is, however, largely unknown.

The current state of the art in speaker recognition is also largely unknown. Many research studies with humans involve speech recordings. In the past, such recordings were stored on the claim that all user information was anonymised, but given that recent challenges in speaker recognition involve corpora of around 6,000 speakers, this anonymity may nowadays be questionable.

Users are also generally unaware of the potential misuse of their speech data for voice cloning. In fact, the enormous progress in speech synthesis/morphing raises spoofing concerns for automatic speaker verification systems.

The discussion of all these issues requires joining the forces of different communities: the speech research community, the cryptography research community, and the legal community. Their differing taxonomies are probably the first obstacle to overcome. The GDPR contains few norms that apply directly to inferred data, so many of its norms must be interpreted extensively, with adaptations, to guarantee the effective protection of people’s rights in an era where speech must be legally regarded as PII (Personally Identifiable Information).

Jean-Christophe Pesquet – bio

Jean-Christophe Pesquet (Fellow, IEEE 2012) received the engineering degree from Supélec, Gif-sur-Yvette, France, in 1987, and the Ph.D. and HDR degrees from Université Paris-Sud in 1990 and 1999, respectively. From 1991 to 1999, he was an Assistant Professor at Université Paris-Sud, and a Research Scientist at the Laboratoire des Signaux et Systèmes (CNRS). From 1999 to 2016, he was a Full Professor at Université Paris-Est and, from 2012 to 2016, the Deputy Director of the Laboratoire d’Informatique of the university (CNRS). He is currently a Distinguished Professor at CentraleSupélec, Université Paris-Saclay, and the Director of the Center for Visual Computing and OPIS Inria group. His research interests include statistical signal/image processing and optimization methods with applications to data science. He has also been a Senior Member of the Institut Universitaire de France since 2016.

Abstract – Forward-backward Steps and Variations

This talk provides an overview of the forward-backward (FB) algorithm, which is a prominent tool for solving optimization problems in signal and image processing. This algorithm belongs to the class of proximal methods and places many traditional optimization schemes in a unifying framework. In particular, the FB algorithm is instrumental for solving the nonsmooth optimization problems encountered in sparse estimation or compressed sensing. A higher-level mathematical view of this algorithm can also be given through modern fixed point theory, making it possible to address sophisticated variational problems. Finally, it is shown that this algorithm is also closely related to feedforward neural networks. This link brings insight into the development of more robust and more explainable neural architectures.
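
As an illustration of the forward-backward iteration mentioned in this abstract, here is a minimal sketch, not taken from the talk, for one standard sparse estimation problem: minimizing 0.5 * ||A x - b||^2 + lam * ||x||_1. The function names, the choice of problem, and the toy data are assumptions made for this example only. Each iteration takes a gradient (forward) step on the smooth term, then applies the proximal operator (backward step) of the l1 term, which is soft thresholding.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: the backward step for the l1 norm."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward_lasso(A, b, lam, step, n_iter=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 (hypothetical helper).

    A is a list of rows and b a list of floats. For convergence, step should
    lie in (0, 2 / L), where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - b.
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth term: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Forward (gradient) step followed by backward (proximal) step.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy data (assumed for illustration): diagonal A, so L = 4 and step = 0.25.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [4.0, 0.5]
x_hat = forward_backward_lasso(A, b, lam=1.0, step=0.25)
print(x_hat)
```

On this toy problem the second coefficient is driven exactly to zero, which is the sparsity-promoting effect of the l1 term. Each iteration is an affine map followed by a pointwise nonlinearity, which is one way to read the link to feedforward network layers that the abstract mentions.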

Yoshikazu Miyanaga – bio

Dr. Yoshikazu Miyanaga is the President and CEO of the Chitose Institute of Science and Technology (CIST), Hokkaido, Japan. He is a professor emeritus of Hokkaido University, Japan, and an adjunct professor at the University of Technology Sydney (UTS), Australia, and at King Mongkut’s University of Technology Thonburi (KMUTT), Thailand. He was the President of the IEICE (Institute of Electronics, Information and Communication Engineers) Engineering Science (ES) Society (2015-2016) and an auditor of IEICE (2018-2020). He is a fellow of IEICE. He was a Distinguished Lecturer (DL) of the IEEE (Institute of Electrical and Electronics Engineers) CAS Society (2010-2011), an associate editor of IEEE Transactions on Circuits and Systems II (TCAS-II) (2011-2013), and a member of the Board of Governors (BoG) of the IEEE CAS Society (2011-2013). He is an ExCom member of the IEEE Sapporo Section. He has served as honorary chair and general chair/co-chair of international conferences, symposia, and workshops, e.g., ISMAC, IEEE 2016 – 2020, ISCIT, IEEE 2016 – 2020, IEEE ISCAS 2019, ICCE-Asia, IEEE 2019.

Abstract – Psychoacoustic Techniques for Noise-Robust Speech Recognition

This talk introduces the design of a noise-robust automatic speech recognition (ASR) system. It is suitable for speech communication robots, and in particular for ASR robots isolated from the internet. Almost all speech communication robots require strongly noise-robust speech recognition. For both continuous speech dialog-based and command-based ASR, we have designed ASR systems that are robust under a variety of noise conditions.

In this presentation, noise-robust speech analysis techniques are introduced. To improve robustness at low SNR, Dynamic Range Adjustment (DRA) and Modulation Spectrum Control (MSC) were first developed for robust speech features; they adjust the speech features with a focus on the important speech components. DRA normalizes dynamic ranges, and MSC eliminates the noise corruption of speech feature parameters.

In addition to DRA and MSC, psychoacoustic masking effects for speech feature extraction in ASR are also introduced in this presentation. They are based on the human auditory system. Mel-frequency cepstral coefficients (MFCC) are widely used speech features in ASR systems; however, one of their main drawbacks is the lack of psychoacoustic processing, which can hamper the results. This presentation introduces noise-robust speech features which improve upon MFCC and its variants. A psychoacoustic model-based feature extraction, which simulates the perception of sound in the human auditory system, is investigated and integrated into the front end of the proposed ASR system. This new approach has proved useful for noise-robust speech recognition embedded in AI robots.
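
The abstract states only that DRA normalizes dynamic ranges. The following is a minimal sketch under one assumed reading of that statement: each feature coefficient's trajectory over an utterance is scaled so that its largest absolute value becomes 1. The function name, the data layout (a list of frames, each a list of coefficients), and the sample values are hypothetical, and this is not presented as the speaker's implementation.

```python
def dynamic_range_adjustment(features):
    """Scale each coefficient trajectory to a peak magnitude of 1.

    features: list of frames, each a list of coefficients (e.g. cepstra).
    Assumption: normalization by the per-coefficient maximum absolute value
    over the whole utterance.
    """
    n_coef = len(features[0])
    # Peak magnitude of each coefficient across all frames.
    peaks = [max(abs(frame[k]) for frame in features) for k in range(n_coef)]
    # Divide each trajectory by its peak; an all-zero trajectory stays zero.
    return [
        [frame[k] / peaks[k] if peaks[k] > 0 else 0.0 for k in range(n_coef)]
        for frame in features
    ]


# Two frames with two coefficients each (made-up values).
frames = [[2.0, -0.5], [-4.0, 0.25]]
adjusted = dynamic_range_adjustment(frames)
print(adjusted)
```

Under this reading, clean and noisy versions of the same utterance are mapped to the same amplitude range per coefficient, which is the kind of mismatch reduction the abstract attributes to DRA.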