Ph.D. Student Investigates New Techniques to Improve Speech Recognition

 João Miranda

João Miranda is a dual-degree Ph.D. student in Language Technologies at the Instituto Superior Técnico of the Universidade Técnica de Lisboa (IST/UTL) and Carnegie Mellon University (CMU), within the CMU Portugal Program. As part of his Ph.D., he is developing new techniques to improve speech recognition with his two advisors, João Paulo Neto of IST/UTL and Alan W. Black of CMU.

Recently, João Miranda presented a poster at the CMU Portugal Annual Symposium about his research on “Combining multiple parallel streams for improved Automatic Speech Recognition,” which aims to combine multiple information streams to improve natural language processing tasks such as speech recognition. The techniques João Miranda is developing aim to enhance simultaneous interpretation. “Our method enables us to improve the Automatic Speech Recognizer (ASR) output for each of the interpreters by relying on information from the other streams,” he explained. The same techniques can also improve speech recognition of lectures and of sports events broadcast simultaneously by multiple TV channels, cases in which the streams are not direct translations of each other but comparable versions.

To achieve his goals, João Miranda is using phrase tables, originally developed for statistical machine translation, to map word sequences in one stream to those in another. “This method capitalizes on the redundancy across multiple speech streams by biasing them to agree on certain corresponding phrases,” João Miranda said.
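The idea of biasing streams toward agreement can be illustrated with a toy sketch. The code below is not João Miranda's system: the phrase table entries, the function names `agreement_bonus` and `rescore`, the scores and the bonus value are all hypothetical, chosen only to show how a phrase pair shared by two hypotheses could tip the choice between competing recognizer outputs.

```python
from itertools import product

# Hypothetical phrase table: English phrase -> corresponding Portuguese phrase.
PHRASE_TABLE = {
    "european parliament": "parlamento europeu",
    "thank you": "obrigado",
}


def agreement_bonus(en_hyp, pt_hyp, phrase_table, bonus=1.0):
    """Add `bonus` for every phrase pair found in both hypotheses.

    Matching is plain substring search, a simplification for illustration.
    """
    return sum(
        bonus
        for en_phrase, pt_phrase in phrase_table.items()
        if en_phrase in en_hyp and pt_phrase in pt_hyp
    )


def rescore(en_nbest, pt_nbest, phrase_table, bonus=1.0):
    """Pick the pair of hypotheses with the best combined score.

    Each n-best list holds (text, score) tuples, higher scores being better.
    The combined score is the sum of both scores plus the agreement bonus.
    """
    best_en, best_pt = max(
        product(en_nbest, pt_nbest),
        key=lambda pair: pair[0][1]
        + pair[1][1]
        + agreement_bonus(pair[0][0], pair[1][0], phrase_table, bonus),
    )
    return best_en[0], best_pt[0]


# Hypothetical recognizer outputs for two parallel streams.
en_nbest = [
    ("the european parliment meets", -1.0),    # top-ranked, but misrecognized
    ("the european parliament meets", -1.2),
]
pt_nbest = [
    ("o parlamento europeu reúne", -0.9),
    ("o parlamento europa reúne", -1.1),
]

print(rescore(en_nbest, pt_nbest, PHRASE_TABLE))
```

In this toy setting, the second English hypothesis is preferred once the bonus is applied, because it shares a phrase pair with the Portuguese stream. A real system would work on recognizer lattices and learned phrase probabilities rather than on fixed strings and a constant bonus.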

The Ph.D. student is also studying the segmentation of the output and the detection of, and recovery from, disfluencies. “The combination of these two techniques is expected to produce significant improvements in the output quality in the European Parliament interpretation task,” he said. João Miranda explained, “We have observed significant improvements in the European Parliament interpretation task, with up to 25% relative reduction in Word Error Rate when compared to a baseline of ASR only, and when combining three speech streams, in the English, Portuguese and Spanish languages.”
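For readers unfamiliar with the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A relative reduction compares two such rates. The sketch below shows the arithmetic; the function names and the 20% and 15% figures are hypothetical examples and are not results from this research.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error rate that was removed."""
    return (baseline_wer - improved_wer) / baseline_wer


# One missing word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))

# Hypothetical figures: going from 20% to 15% WER is a 25% relative reduction.
print(relative_reduction(0.20, 0.15))
```

Note that a relative reduction of 25% does not mean the error rate falls by 25 percentage points; it means a quarter of the baseline errors are eliminated.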

João Miranda is in his fourth year of studies. Asked about his experience as a dual-degree doctoral student with two advisors, he said that so far it has been “very positive,” since “they can offer two different perspectives on the research.” The researcher highlighted the contribution of the Language Technologies Institute at CMU, where there is a “wider variety of subjects related to Language Technologies, since it is a hub for researchers in the area.”

The student was also involved in the research project “PT-STAR: Speech Translation Advanced Research to and from Portuguese,” which was carried out by Portuguese and CMU researchers within the scope of the CMU Portugal Program. The main goal of the project, which began in 2009 and ended last year, was to improve current Speech-to-Speech Machine Translation (S2SMT) systems from Portuguese to English and vice versa.

In the future, João Miranda wants to continue working on speech technologies. “I want to contribute to a near-term future where speech technologies are seamlessly integrated with the environment, with human-like or superior levels of performance, and provide for a closer interaction with technology that would otherwise be less accessible,” he stressed.

May 2013