From Portuguese to English and Vice-Versa: An Innovative Speech Translation System

After three years of intensive, funded research, the project PT-STAR: Speech Translation Advanced Research to and from Portuguese has reached its end. The achievements are remarkable: the prototype of a live Speech-to-Speech Machine Translation system from Portuguese to English and vice-versa is ready, and the recent work done within this project allowed the research team to join the Universal Speech Translation Advanced Research (U-STAR) Consortium.

Each year, the European institutions spend more than a billion euros translating documents and interpreting speeches. Against this background, the main goal of the PT-STAR project, carried out in the scope of the Carnegie Mellon Portugal Program, was to improve speech translation systems for Portuguese by strengthening the integration of the three components of Speech-to-Speech Machine Translation (S2SMT), namely Automatic Speech Recognition (ASR), Machine Translation (MT) and Text-to-Speech Synthesis (TTS).

“When we address speech-to-speech translation we need to work with very complex models of speech-to-text conversion (recognizers), of text in a language-to-text in another language (translators), and text-to-speech (synthesizers),” explained Luísa Coheur, adding that “this is a fascinating work in a scientific level.”

During the PT-STAR project, the researchers completed four tasks. The first was the translation of spontaneous speech; by the end of this task, the researchers had integrated two modules, Automatic Speech Recognition (ASR) and Machine Translation (MT). The second task was voice conversion, which allows the synthesized speech to retain the characteristics of the original voice, making it very useful for a wide range of S2SMT applications. The third task addressed major problems in statistical machine translation: the study of different methods to automatically build aligned parallel corpora from non-aligned ones, the updating of the translation model, and the use of fully supervised, semi-supervised and completely unsupervised approaches for adapting the system, using actual user results. The fourth task targeted the implementation of a proof-of-concept prototype. All the tasks were completed successfully and the prototype is working properly.

Demonstration of Real-time S2SMT System

In the course of this project it was possible to make “improvements in full stops and commas insertions, capitalization and detection of interrogatives; to take advantage of in-domain texts to build domain adapted language models for ASR/MT; to take advantage of imperfect transcriptions (in which annotations do not include laughter, applause, filled pauses, repetitions, or other disfluencies, and sometimes contain errors); to build acoustic models for ASR; to build Statistical Parametric Synthetic voices for Portuguese; to develop a language independent statistical Intonation model; to cross lingual voice morphing to match source speaker; to develop techniques for optimal synchronization using MT N-best list; to carry out a framework for building real-time translation systems; among other achievements.”

The project's research team believes that “in social terms the project is very important because it makes us move towards the elimination of language barriers,” said Coheur, whose team is already working “on a system that will enable speech-to-speech translation from Portuguese to Chinese and vice-versa.”

Although the project has finished, Luísa Coheur said that “there is a lot to do and the work between the INESC-ID and the LTI at CMU will continue. We would love to bet on a PT-STAR 2.” There are “several doctoral students focusing their work on the conversion of the pitch from one language to another, on the extraction of pairs of phrases from parallel text, the processing of disfluencies in spontaneous speech, the combination of systems for simultaneous translation into various languages, among others,” she explained.

Integration at the Universal Speech Translation Advanced Research (U-STAR) Consortium
Luísa Coheur is very pleased to be part of the Universal Speech Translation Advanced Research (U-STAR - http://www.ustar-consortium.com/) consortium. “Our recent work within this project made it possible,” Coheur said. The Consortium is an international research collaboration formed to develop a network-based speech-to-speech translation (S2ST) system. It aims to break language barriers around the world and to enable vocal communication between different languages.

The L2F Group from the INESC-ID.

The PT-STAR project, funded by the Portuguese Foundation for Science and Technology (FCT), started in May 2009 and finished in July 2012. It involved a consortium of researchers from several entities: INESC-ID - Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa; Universidade da Beira Interior (UBI); Fundação da Universidade de Lisboa (FUL/UL); and Carnegie Mellon University. Like all CMU Portugal research projects, PT-STAR has two principal investigators: Luísa Coheur from INESC-ID in Lisbon, Portugal, and Alan Black from Carnegie Mellon University.

August, 2012
