Doctoral Student Wants to Give a Strong Contribution to the Machine Translation Community

 Wang Ling 2011

Wang Ling is a dual degree Ph.D. student in Language Technologies at the Instituto Superior Técnico of the Universidade Técnica de Lisboa (IST/UTL) and Carnegie Mellon University (CMU), where he has been carrying out research on Machine Translation. He is supervised by Isabel Trancoso, from IST/UTL and a researcher at the L2F Spoken Systems Lab / INESC ID, and Alan Black, from Carnegie Mellon University. Wang Ling's goal is to “give a strong contribution to the Machine Translation community.”

During his first academic year, 2010/2011, this doctoral student developed an open-source Machine Translation toolkit named Geppetto and had six papers accepted at several top conferences in the Machine Translation area. The Geppetto toolkit is used “to generate translation models for Phrase-based Machine Translation systems,” explained Wang Ling. The tool is available at http://code.google.com/p/geppetto/. With this project, Wang Ling presented one system description paper and a paper about the toolkit (http://www.inesc-id.pt/pt/indicadores/Ficheiros/6696.pdf) at the International Workshop on Spoken Language Translation (IWSLT) 2010, in the Machine Translation track. The paper was written by several authors: Wang Ling, Tiago Luís and João Graça, Ph.D. students at IST/UTL, and Luísa Coheur and Isabel Trancoso, researchers at the L2F Spoken Systems Lab, INESC ID.

As part of his Ph.D. course project, Wang Ling was involved in the conception of an educational game that employs Machine Translation to teach a second language to foreign students. “An automated agent is employed as an opponent in order to improve the user’s motivation and maintain the user focused,” said Wang Ling, adding that “the agent’s actions are based on statistical machine translation outputs.” The system demo and description paper, written by Wang Ling, Isabel Trancoso, and Rui Prada, from IST/UTL, were submitted to and accepted at the ISCA (International Speech Communication Association) Special Interest Group (SIG) on Speech and Language Technology in Education (SLaTE) 2011 [http://www.inesc-id.pt/pt/indicadores/Ficheiros/7361.pdf]. The system was tested with 20 Portuguese learners of Mandarin at the Missão Macau facilities in Lisbon and at the Centro Científico e Cultural de Macau, where weekly Mandarin classes were given. The system has a web-based implementation and is easily accessible to language learners.

Finally, he contributed to work on Brazilian Portuguese to European Portuguese translation, which led to a paper at the 15th Annual Conference of the European Association for Machine Translation (EAMT) 2011. In this paper, the authors (Luís Marujo, a dual degree doctoral student in Language Technologies; Nuno Grazina and Tiago Luís, from the Spoken Language Laboratory / INESC-ID Lisboa; Wang Ling; Luísa Coheur; and Isabel Trancoso) describe a method to “efficiently leverage Brazilian Portuguese resources as European Portuguese resources.” Based on this study, the authors derived a rule-based system to translate Brazilian Portuguese resources. Some resources were enriched with multiword units retrieved semi-automatically from phrase tables created using statistical machine translation tools. Their experiments suggest that applying this adaptation step improves translation quality between English and Portuguese, relative to the same process without it. [http://www.inesc-id.pt/pt/indicadores/Ficheiros/7268.pdf]

Reflecting on this first year, he finds “working in research very entrancing, especially when the results of your research are the fruit of your own ideas.”

Wang Ling’s work in Portugal has focused mainly on Machine Translation, namely on improving the quality of the translation system. Although the PT-STAR project, where he is a researcher, is focused on Speech-to-Speech translation, in which Machine Translation is only one component of the pipeline, improvements in this component generally lead to an improvement in the overall Speech-to-Speech translation quality. Wang Ling said that in the future he might be interested in working on Speech Recognition, since he thinks the field is both relevant to the project and interesting, but for the time being, “I think that there is still a lot to be done in Machine Translation.”

Wang Ling is an open-minded student who wants to learn as much as possible from the most influential people in the fields of language processing and machine learning. He expects “to make a strong contribution to the Machine Translation community,” a goal that he says is “not limited to expanding the current knowledge base in the Machine Translation Field, but also provides tools that can aid the current and the following generations of researchers to expand this field.”

October 2011