PT-STAR: Speech Translation Advanced Research to and from Portuguese
Start Date: May 1st, 2009
End Date: July 31st, 2012
PIs: Luísa Coheur (INESC-ID), Alan Black (CMU)
Dual Degree Ph.D. Students: João Tiago de Sousa Miranda (Language Technologies), Wang Lin (Language Technologies), Gopala Krishna Anumanchipalli (Language Technologies)
Teams: INESC-ID, IST/UTL, FLUL, UBI, Carnegie Mellon University
Speech-to-Speech Machine Translation (S2SMT) technologies aim to enable natural language communication between people who do not share the same language.
PT-STAR addresses Speech-to-Speech Machine Translation (S2SMT), one of the two main research topics of the CMU-Portugal cooperation agreement in the area of language technologies. On the CMU side, this cooperation involves the Language Technologies Institute (LTI); on the Portuguese side, it involves a consortium of universities and research centers that also helped draft the original cooperation proposal in this area: the Spoken Language Systems Lab (L2F) of INESC-ID Lisboa, the Center of Linguistics of the University of Lisbon (CLUL), and the University of Beira Interior (UBI).
S2SMT is a multilingual, multidisciplinary topic in which LTI's research ranks among the best worldwide, and where the language-specific expertise of the Portuguese research teams can greatly complement this know-how. The main goal is to improve current S2SMT systems from Portuguese to English and vice versa. A third language, Chinese, will be the focus of a PhD thesis on machine translation jointly supervised by professors from L2F and the University of Macau. The informal cooperation of the University of Macau within this proposal will thus broaden the project's scope to typologically different languages.
S2SMT can be seen as a cascade of three major components: Automatic Speech Recognition (ASR), Machine Translation (MT) and Text-to-Speech Synthesis (TTS). One of the main problems of this multidisciplinary area, however, is that the integration between these three components is still weak. The main goal of this project is therefore to improve speech translation systems for Portuguese by strengthening this integration. Accordingly, the project comprises three main research tasks: Task T1 addresses the ASR/MT interface, Task T2 the MT/TTS interface, and Task T3 the MT module itself. A fourth task (T4) builds a proof-of-concept prototype.
One of the main goals of Task T1 is the translation of spontaneous speech, on which ASR performance degrades seriously; the resulting recognition errors have an even greater impact on the translation system. Another major goal is tightening the integration between the two modules, which can be approached in several ways: for example, by feeding the confusion networks produced by the ASR module to the MT module instead of a single best transcription, or by investigating how the segmentation of the input speech affects its translation, since the phrase segmentation of the input is a critical issue in phrase-based translation.
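To make the confusion-network idea more concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the PT-STAR implementation. A confusion network is modeled as a sequence of slots, each mapping candidate words (including an empty "epsilon" word) to posterior probabilities. The names `ConfusionNetwork`, `consensus_hypothesis`, `num_paths` and `path_score`, as well as the toy Portuguese network, are hypothetical. The consensus hypothesis (the best word per slot) is what a loosely coupled cascade would pass to MT; handing over the whole network instead lets the MT decoder weigh every alternative path.

```python
from math import prod
from typing import Dict, List

# Hypothetical representation: one dict per slot, word -> posterior probability.
# The empty string is the epsilon word (the option of emitting nothing there).
ConfusionNetwork = List[Dict[str, float]]
EPSILON = ""


def consensus_hypothesis(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in each slot and drop epsilons.

    This single transcription is what a plain 1-best cascade would send to MT.
    """
    best = [max(slot, key=slot.get) for slot in cn]
    return [w for w in best if w != EPSILON]


def num_paths(cn: ConfusionNetwork) -> int:
    """Number of paths through the network (each path is one candidate transcription)."""
    return prod(len(slot) for slot in cn)


def path_score(cn: ConfusionNetwork, choice_per_slot: List[str]) -> float:
    """Product of the slot posteriors along one path (one word chosen per slot)."""
    return prod(slot[w] for slot, w in zip(cn, choice_per_slot))


# Toy network: the recognizer is unsure between "ao"/"o" and "carro"/"caro".
toy_cn: ConfusionNetwork = [
    {"vou": 0.9, "sou": 0.1},
    {"ao": 0.6, "o": 0.3, EPSILON: 0.1},
    {"carro": 0.55, "caro": 0.45},
]

print(consensus_hypothesis(toy_cn))  # ['vou', 'ao', 'carro']
print(num_paths(toy_cn))             # 12
```

In this toy example the 1-best cascade would discard 11 of the 12 paths, including "caro", which still has a posterior of 0.45. Passing the full network keeps that evidence available to the translation model, which is the motivation behind tighter ASR/MT coupling.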
As for Task T2, one of the major problems to be addressed is that the output of current MT modules is often ungrammatical, which makes it unsuitable as input to the TTS module and, as a consequence, produces speech that is hard to understand. Another major subtask is voice conversion, which allows the synthesized speech to retain the characteristics of the original speaker's voice, a property that is very useful for a wide range of S2SMT applications.
Task T3 addresses major problems in statistical machine translation: methods for automatically building aligned parallel corpora from non-aligned ones, updating the translation model, and fully supervised, semi-supervised and completely unsupervised approaches to adapting the system using results from actual users.
Finally, Task T4 targets the implementation of a proof-of-concept prototype. The goal of the project is not to build a complete speech-to-speech translation system, which would involve many engineering issues, but rather to invest in the core research needed to strengthen this area, thus providing an umbrella project for the PhD theses that will be devoted to this topic within the CMU-Portugal program.
The Principal Investigator of this project is Luísa Coheur, on behalf of the L2F team. The co-PI for the LTI team will be Alan W. Black, from CMU. The co-PIs for CLUL and UBI will be Céu Viana and Gaël Dias, respectively. Isabel Trancoso will be in charge of the joint CMU-Portugal doctoral programs in the area of language technologies on behalf of the team of Portuguese universities; hence, her involvement in this project will also extend beyond direct participation in some of the tasks.
The PT-STAR prototype translates spoken Portuguese into English and vice versa, and users can choose between two synthetic voices. We had the opportunity to give a live demo of the prototype during the project evaluations (a video is available at https://www.l2f.inesc-id.pt/demos/pt-star/Demo_S2S.mov).
Articles published in the Portuguese media:
PT-STAR Project: Researchers Create a Speech Translation System for Portuguese [Ciência Hoje, July 2012]
Computers that Talk with People [Sábado Magazine, January 2011]
PT STAR at SIC Notícias [SIC Notícias, July 2009]