FCT Programs
Friday, July 04, 2008
Language Technology

The CMU-Portugal Program on language technology involves a consortium of Portuguese research centers and universities and the Language Technologies Institute (LTI) at CMU. The LTI was formed about 20 years ago, first as a research center and later as an academic department in the School of Computer Science, and is the world's leading center in language technologies. The consortium in Portugal, which we will refer to as the L2F consortium, includes the Spoken Language Systems Laboratory (Laboratório de Sistemas de Língua Falada, L2F) at INESC-ID, IST; the Center of Linguistics of the Universidade de Lisboa (CLUL); the linguistics group of the Universidade do Algarve (UALG); and the Center for Human Language Technology and Bioinformatics (HULTIG) of the Universidade da Beira Interior (UBI). In addition, close cooperation is expected with the LINGUATEC network, established in Portugal through FCCN (Fundação para a Computação Científica Nacional, the national foundation for scientific computing) and its "Centro de Recursos Distribuído para o Processamento Computacional da Língua Portuguesa" (Distributed Resource Center for the Computational Processing of the Portuguese Language). The collaboration between L2F and CLUL dates back to the early 1990s and forms the basis for a truly interdisciplinary cooperation between engineering and linguistics. The cooperation with UALG and HULTIG is much more recent and, although these groups are much smaller in terms of human language technologies, is also very active.

The partners share strong interests in several areas that will be pursued: computer aided language learning (CALL), speech-to-speech machine translation (S2SMT), speech recognition, speech synthesis, dialogue systems, summarization, and topic detection and tracking. In particular, the CMU-Portugal Program will pursue two major multilingual research projects, one in CALL and one in S2SMT. These projects will involve at least two languages. One of them is Portuguese, which is the target language of the CALL system to be developed and the source language, the target language, or both for the S2SMT system. The other language is English, Chinese (Mandarin), or both. Chinese is of particular interest to both partners because of LTI's existing expertise in language technologies for Chinese and the strong demand from China for products involving Portuguese.

The CALL research project will involve developing a Portuguese version of the REAP project (Reader-Specific Lexical Practice for Improved Reading Comprehension), currently in progress at LTI at CMU, and research on the associated topics. This research uses a variety of language technologies to help native and non-native students learn to read. Examples include a search engine that finds appropriate authentic documents for students to read according to reading level, topic, vocabulary list, and other teacher-specified criteria, including text passages that satisfy very specific lexical constraints; selecting materials from an open corpus, so as to satisfy a wide range of student interests and classroom needs; and modeling each individual's degree of acquisition and fluency for every word in a constantly expanding lexicon, so as to provide student-specific practice and remediation. The challenges posed by such a system open up research on a wide range of very difficult reading comprehension topics.

The speech-to-speech machine translation (S2SMT) research area, one of the most strategically relevant areas for the Portuguese consortium, will investigate a Portuguese-to-English/Chinese translation system (or the reverse) that addresses one of two challenges of current S2SMT systems: the need to remove disfluencies on the speech input side, or the inability of current translation systems to produce text that is suitable for a synthesizer, i.e., text that can be read aloud in a natural-sounding way. We will investigate statistical machine translation approaches, as well as a hybrid approach developed by LTI that starts from a relatively small parallel elicitation corpus and uses rule induction. Research in S2SMT depends crucially on several core technologies, from speech recognition to machine translation to text-to-speech synthesis, including voice morphing.

These two projects provide a focus for the proposed research; through them, the collaboration will explore the main core areas of language technology.

The CMU-Portugal Program includes a PhD degree component offered in partnership between IST and LTI at CMU, and research projects in language technologies between the L2F consortium and LTI. Other potential partners in Portugal will be identified during the initial phase of the Program.
