Title: Cross-modal understanding and generation of multimodal content
Author: Nicu Sebe
University of Trento, Italy
Video generation consists of generating a video sequence in which an object in a source image is animated according to some external information (a conditioning label, a driving video, a piece of text). In this talk I will present some of our recent achievements in generating videos without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class. Building on this, I will present our framework for training game-engine-like neural models solely from monocular annotated videos. The result, a Learnable Game Engine (LGE), maintains the states of the scene and of the objects and agents in it, and enables rendering the environment from a controllable viewpoint. Like a game engine, it models the logic of the game and the underlying rules of physics, so that a user can play the game by specifying both high- and low-level action sequences. Our LGE can also unlock the director’s mode, where the game is played by plotting behind the scenes, specifying high-level actions and goals for the agents in the form of language and desired states. This requires learning “game AI”, encapsulated by our animation model, to navigate the scene using high-level constraints, play against an adversary, and devise a strategy to win a point.
Nicu Sebe is a professor at the University of Trento, Italy, where he leads research in multimedia information retrieval and human-computer interaction in computer vision applications. He received his PhD from the University of Leiden, The Netherlands, and was previously with the University of Amsterdam, The Netherlands, and the University of Illinois at Urbana-Champaign, USA. He has been involved in organizing major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG) 2008, the ACM International Conference on Multimedia Retrieval (ICMR) 2017 and ACM Multimedia 2013. He was a program chair of ACM Multimedia 2011 and 2007, ECCV 2016, ICCV 2017 and ICPR 2020, and a general chair of ACM Multimedia 2022. He is a program chair of ECCV 2024. He is a fellow of ELLIS and IAPR and a senior member of ACM and IEEE.
Title: Synthetic Speaking Children – Why We Need Them and How to Make Them
Author: Peter Corcoran
University of Galway, Ireland
Researchers working on human-centric machine vision and speech analysis have found GDPR to be a huge challenge. Much of today’s research relies on neural network models and requires large training datasets for optimal performance. But what do you do when your research focus is to build HCI interfaces for a Smart-Toy and you need data from a vulnerable population, such as young children, in order to train your Edge-AI HCI modules? GDPR imposes many complexities in collecting, managing and processing data from real children. Fortunately, it is now feasible to leverage state-of-the-art GAN and other generative neural technologies to build data samples at scale. Here we show how StyleGAN-2 can be fine-tuned to build a gender-balanced dataset of children’s faces, including controllable facial expressions, age variations, facial pose and even speech-driven animations with photo-realistic lip-sync. By combining FastPitch, advanced voice augmentation approaches and generative text-to-speech models, we can create realistic children’s voices. By leveraging the Media framework together with speech-driven neural lip-sync models, we can then build highly realistic, completely synthetic talking child heads that can be used, for example, to fine-tune neural computer vision and speech recognition models on an Edge-AI smart-toy platform. This talk will outline the relevant technologies and demonstrate how a methodology and pipeline architecture can be used to build such data samples.
Peter Corcoran (Fellow, IEEE) currently holds the Personal Chair of Electronic Engineering at the College of Science and Engineering, University of Galway, Ireland. He was a co-founder of several start-up companies, notably FotoNation (currently the Imaging Division, Xperi Corporation). He has more than 600 cited technical publications and patents, more than 120 peer-reviewed journal articles, and 160 international conference papers, and is a co-inventor on more than 300 granted U.S. patents. He is an IEEE Fellow recognized for his contributions to digital camera technologies, notably in-camera red-eye correction and facial detection. He has been a member of the IEEE Consumer Technology Society for more than 25 years. He is the founding Editor of IEEE Consumer Electronics Magazine.
Title: Information Technology and Geolinguistics
Authors: Silviu-Ioan Bejinariu (1, 2), Vasile Apopei (1), Manuela Nevaci (2), Florin-Teodor Olariu (3), Nicolae Saramandu (2)
1 Institute of Computer Science, Romanian Academy Iasi Branch
2 “Iorgu Iordan – Al. Rosetti” Institute of Linguistics, Romanian Academy
3 “A. Philippide” Institute of Romanian Philology, Romanian Academy Iasi Branch
Dialectology studies the evolution in time and space of the spoken language, as well as the relationships and influences between more or less related languages spoken in the same area. Traditionally, linguistic atlases are published to give a picture of linguistic variation. This concern exists in all countries, which periodically publish such works: Atlas linguistique de la France, The Linguistic Atlas of England, Atlante Linguistico Italiano, Atlas of North American English. Similar studies are carried out through international collaborations for regions where the languages used have common origins, such as Atlas linguistique roman. More complex works are also carried out, e.g., the Mediterranean linguistic atlas and Atlas Linguarum Europae. In recent years, traditional printed atlases have been replaced by electronic versions on CD/DVD or by web applications that include multimedia facilities, e.g., the Linguistic Atlas of Dolomitic Ladinian and neighbouring dialects and The Audio-Visual Linguistic Atlas of Bukovina. Although Romania has a tradition of more than a century in the study of dialects, only in the last two decades have computer technologies started to be used for the development of specific research tools. This paper presents the main achievements in the field resulting from the collaboration between computer scientists and dialectologists. We believe that the greatest advantage of computerization is the digitization and storage of linguistic information in electronic databases, allowing its use in further research. Specific image processing methods were used to synthesize the images of all possible symbols (more than 100,000) used in the phonetic transcription specific to the Romanian language, and also to recover old, manually edited materials. Computer graphics methods were required to implement the applications used to prepare the phonetic and interpretive maps for inclusion in printed language atlases.
Because geographical position is an important attribute of the data in dialectology, the specific methods of Geographical Information Systems are needed, with all their advantages. The collaboration over the last 25 years between the Institute of Computer Science, the “Iorgu Iordan – Al. Rosetti” Institute of Linguistics and the “A. Philippide” Institute of Romanian Philology has enabled the development of the applications used for the publication of: the New Romanian Linguistic Atlas by Regions – Moldavia and Bukovina, volumes III, IV and V; the Romanian Linguistic Atlas by Regions – Muntenia and Dobrogea, vol. VI; the Linguistic Atlas of the Aromanian Dialect, vol. II; and Atlas Linguarum Europae, vol. I, fasc. IX. The preparation of other atlases is in progress: the Romanian Linguistic Atlas by Regions – Synthesis, vol. IV, and the Atlas of Romanian Dialects from the North and South of the Danube. Tools for editing dialectal texts and for synchronizing dialectal texts with audio recordings were also developed.
Silviu-Ioan Bejinariu (speaker) is a senior researcher at the Institute of Computer Science, Romanian Academy, Iaşi Branch. He graduated from the Faculty of Computer Science of “Alexandru Ioan Cuza” University of Iaşi, Romania, and obtained his Ph.D. in Electronic Engineering and Telecommunications at the School of Advanced Studies of the Romanian Academy. His research interests include image processing, computer vision, human motion analysis, nature-inspired optimization algorithms, parallel computing, geographic information systems and geolinguistics. Drawing on his experience in image processing and in implementing geographic information systems, he developed research tools that dialectologists have used for the computer-aided publication of regional, national and international linguistic atlases, as well as for editing dialectal texts written in the phonetic transcription specific to the Romanian language. He has published over 100 papers in international journals and conferences. He received the “Gheorghe Cartianu” Romanian Academy Prize in the Information Science and Technology domain in 2013 and the “Octav Mayer” Prize awarded by the Iaşi Branch of the Romanian Academy in 2006.
Vasile Apopei is a senior researcher at the Institute of Computer Science, Romanian Academy, Iaşi Branch. He graduated from the Faculty of Electrical Engineering of “Gheorghe Asachi” Technical University of Iasi and obtained his Ph.D. in Electronic Engineering and Telecommunications at the School of Advanced Studies of the Romanian Academy. His research interests include the development of methods and algorithms for the management, processing and publication of resources within the Romanian Academy’s fundamental project “Linguistic Atlases”, and the development of algorithms and methods for the analysis of audio signals, with applications in detecting dangerous events and in modeling elements of prosody. Together with the teams he coordinated, he received the “Mihai Drăgănescu” Romanian Academy Prize in 2017 and the “Octav Mayer” Prize of the Romanian Academy, Iași Branch in 2006.
Manuela Nevaci is Senior Researcher I at the “Iorgu Iordan – Al. Rosetti” Institute of Linguistics of the Romanian Academy and a professor and PhD supervisor at the University of Bucharest, specializing in dialectology, linguistic geography and Balkanology. Publications: The verb in Aromanian. Structure and Values, 2006, Publishing House of the Romanian Academy; Romanian Identity in the Balkan Context, Bucharest, 2013, Publishing House of the National Museum of Romanian Literature; Syntheses of Romanian Dialectology (with Nicolae Saramandu), 2013, Publishing House of the University of Bucharest; Romanian Linguistic Atlas by Regions. Synthesis, vol. II, III, 2012, 2018 (coordinator: Nicolae Saramandu), Publishing House of the Romanian Academy; Atlas Linguarum Europae (ALE), volumes I.8, I.9, 2014, 2015, Publishing House of the University of Bucharest; Linguistic Atlas of the Aromanian dialect, vol. I, II (Nicolae Saramandu, author, and Manuela Nevaci, editor), 2014, 2020, Publishing House of the Romanian Academy.
Florin-Teodor Olariu is a senior researcher at the Department of Dialectology and Sociolinguistics of the “Alexandru Philippide” Institute of Romanian Philology, Iasi Branch of the Romanian Academy. He works on the Romanian Academy’s priority project The New Romanian Linguistic Atlas by Regions. Moldavia and Bukovina, and coordinates the research for The Audio-Visual Linguistic Atlas of Bukovina, a pilot project for Romanian geolinguistics. Areas of competence: Romanian dialectology, sociolinguistics (sociolinguistics of migration, sociolinguistics of minority languages), geolinguistics (computerization of linguistic cartography), pragmalinguistics (conversation analysis, sociopragmatics), corpus linguistics (spoken and dialectal corpora).
Nicolae Saramandu is a corresponding member of the Romanian Academy, Senior Researcher I at the “Iorgu Iordan – Al. Rosetti” Institute of Linguistics of the Romanian Academy, and professor emeritus and PhD supervisor at the University of Bucharest. He has long experience in dialect research, as well as in the development of regional, national and international linguistic atlases. He held a scholarship from the “Alexander von Humboldt” Foundation at the University of Tübingen, and was a lecturer and visiting professor at the universities of Freiburg, Bamberg and Marburg. He is the president of Atlas Linguarum Europae and has coordinated the publication of several works: Linguistic Atlas of the Aromanian dialect, Romanian Linguistic Atlas by Regions – Synthesis, Megleno-Romanian Dictionary, Toponymic dictionary of Romania – Muntenia. Over 59 years of scientific research at the Romanian Academy, he has presented over 200 papers at international congresses and published more than 300 articles in specialized journals.