TUESDAY, 8 October
MT. TAMALPAIS

9:00 – 12:00
Workshop

12:00 – 2:00
LUNCH

2:00 – 5:00
Workshop

WEDNESDAY, 9 October
BELVEDERE, TIBURON, MT. TAMALPAIS

9:00 – 12:00
Tutorials (three parallel sessions)

12:00 – 2:00
LUNCH

2:00 – 5:00
Tutorials (three parallel sessions)

6:00 – 8:00
WELCOME RECEPTION – BELVERON ROOM

THURSDAY, 10 October
BELVERON, MT. TAMALPAIS

8:45 – 9:00
Opening Remarks: Elliott Macklovitch, Conference Chair & AMTA President

9:00 – 10:00
Invited Speaker: Empiricism from TMI-1992 to AMTA-2002 to AMTA-2012: Have IBM Models 1-5 failed? (Ken Church)

10:00 – 10:30
BREAK – Exhibits

10:30 – 12:00
Panel 1: Taking MT from research to real users: have we made any progress?
Technical Papers:
- Using Word Formation Rules to Extend MT Lexicons
- Deriving Semantic Knowledge from Descriptive Texts using an MT System
- Korean-Chinese Machine Translation Based on Verb Patterns

12:00 – 1:30
LUNCH – Exhibits

1:30 – 3:00
Technical Papers:
- Semi-automatic Compilation of Bilingual Lexicon Entries from Cross-Lingually Relevant News Articles on WWW News Sites
- Adaptive Bilingual Sentence Alignment
- Fast and Accurate Sentence Alignment of Bilingual Corpora
System Demos:
- Translation by the Numbers: Language Weaver
- The KANTOO MT System: Controlled Language Checker and Lexical Maintenance Tool
- Fluent Machines’ EliMT System

3:00 – 3:30
BREAK – Exhibits

3:30 – 5:00
Technical Papers:
- Better Contextual Translation Using Machine Learning
- Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation
- DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment
System Demos:
- Natural Intelligence in a Machine Translation System
- MSR-MT: The Microsoft Research Machine Translation System

5:15 – 6:00
AMTA General Membership Meeting – BELVERON ROOM
>>>ALL AMTA MEMBERS ARE INVITED TO ATTEND<<<

FRIDAY, 11 October
BELVERON, MT. TAMALPAIS

9:00 – 10:30
Technical Papers:
- Efficient Integration of Maximum Entropy Lexicon Models within the Training of Statistical Alignment Models
- Using a Large Monolingual Corpus to Improve Translation Accuracy
- Text Prediction with Fuzzy Alignments
User Studies:
- An Assessment of Machine Translation for Vehicle Assembly Process Planning at Ford Motor Company
- A Report on the Experiences of Implementing an MT System for Use in a Commercial Environment
- Getting the Message In: A Global Company’s Experience with the New Generation of Low-Cost, High Performance Machine Translation Systems

10:30 – 11:00
BREAK – Exhibits

11:00 – 12:00
Invited Speaker: Romantics of the Translation Market (Jaap van der Meer)

12:00 – 1:30
LUNCH – Exhibits

1:30 – 3:00
Panel 2: MT methodologies: what works, what doesn’t, and what’s the right mix?
System Demos:
- The NESPOLE! Speech-to-Speech Translation System
- Approaches to Spoken Translation
- LogoMedia TRANSLATE™, version 2.0

3:00 – 3:30
BREAK – Exhibits

3:30 – 5:00
Technical Papers:
- Bootstrapping the Lexicon Building Process for Machine Translation between 'New' Languages
- Automatic Rule Learning for Resource-Limited MT
- Classification Approach to Word Selection in Machine Translation
System Demo:
- A New Family of the PARS Translation Systems
User Presentation:
- Cisco Systems and Systran Software: An Ongoing Partnership in MT

7:00 – 9:00
BANQUET: The Waterfront Restaurant and Cafe, Pier 7, The Embarcadero, San Francisco

SATURDAY, 12 October
BELVERON, MT. TAMALPAIS

9:00 – 10:30
Panel 3: Who’s making/saving money with MT and how are they doing it?
Technical Papers:
- Example-based Machine Translation via the Web
- Toward a Hybrid Integrated Translation Environment
- Merging Example-Based and Statistical Machine Translation: An Experiment

10:30 – 11:00
BREAK – last chance to see Exhibits!

11:00 – 12:00
Invited Speaker: Stone soup revisited, or the unity of MT as the prime NLP task (Yorick Wilks)

12:00 – 12:15
Closing Remarks: Elliott Macklovitch, Conference Chair

Empiricism from TMI-1992 to AMTA-2002 to AMTA-2012: Have IBM Models 1-5 failed?
Ken Church (AT&T Labs Research)
Abstract: The organizers of this conference asked me to comment on what's changed since TMI-92 (if anything). There was great excitement at TMI-92 about using aligned parallel corpora to assist human translation. There was also a lot of controversy over the IBM Models 1-5, which was shaking up the field.
So what's happened since then? Empiricism has come of age. What used to be considered radical is now accepted practice. The new field of Machine Learning has absorbed many good (and formerly controversial) ideas, including the IBM Models 1-5. Yarowsky's work on Word Sense Disambiguation grew out of Machine Translation, but is now widely cited in Machine Learning as an early example of co-training. Mercer's fighting words, "More data is better data," don't seem as shocking when Bill makes the case a decade later.
It took a decade or two for the revival of empirical methods to become popular (perhaps too popular). I worry that the pendulum has swung so far that we are no longer training students for the possibility that the pendulum might swing the other way. We ought to be preparing students with a broad education including Statistics and Machine Learning as well as Linguistic Theory.
Empiricism has not only come of age in academic venues (e.g., conferences, textbooks), but also in commercial venues. Many of the alignment tools and suggestions proposed in "Good Applications for Crummy MT" and elsewhere are currently being sold by Trados and others. There are some even better apps than the ones we imagined: e.g., CLIR (cross-language information retrieval) and MT in web search engines (Systran & AltaVista).
So, what do I expect to happen over the next decade?
Bio: Ken Church is currently the head of a data mining department in AT&T Labs-Research. He received his BS, Masters and PhD from MIT in computer science in 1978, 1980 and 1983, and immediately joined AT&T Bell Labs, where he has been ever since (though the name of the organization has changed). In 2001, Ken received the honor of being selected as an AT&T fellow. He has worked in many areas of computational linguistics including: acoustics, speech recognition, speech synthesis, OCR, phonetics, phonology, morphology, word-sense disambiguation, spelling correction, terminology, translation, lexicography, information retrieval, compression, language modeling and text analysis. He enjoys working with very large corpora such as the Associated Press newswire (1 million words per week). His data mining department is currently applying similar methods to much larger data sets such as telephone call detail (1-10 billion records per month).
Romantics of the Translation Market
Jaap van der Meer
(SYSTRAN consultant; former CEO of ALPNET)
Abstract: No matter how much effort has been put into treating translation as a process like any other, the profession is really still regarded as a vocation or an art that cannot be measured or standardized. This cultural background has contributed to an archaic industry with a cascaded supply chain. But demand is changing and, even more important, information consumption is changing. A new environment is emerging where machine translation can really make a big difference. Is the world of MT ready for the challenge? In his talk, Jaap van der Meer will address the following topics:
- Historical overview of the translation market and typology of actors
- Cost and efficiencies in the translation market
- The role of technology in the translation market
- Changing demands: information recycling as a paradigm
- Enterprise applications for machine translation and their ROI
- New hybrid solutions
Bio: Until the end of 2001 Jaap van der Meer was President and CEO of ALPNET. Since his departure from ALPNET upon its merger with SDL, he has been advising various technology companies. Since his debut in the localization market in 1980, he has been a great advocate of translation automation. His first company, INK International, launched the first desktop translation memory software. He also published the Language Technology magazine for several years, covering many of the pioneering technologies. In 1990 he launched the idea for a localization industry association, and he funded the establishment of LISA. In 1999 he helped to start the SAE TopTec Multilingual Automotive Information conference. At ALPNET he spearheaded the implementation of an end-to-end automated localization process, including machine translation, centralized translation memory and automated workflow. Currently, Jaap is consulting with SYSTRAN on the promotion of machine translation in the enterprise market. He is also a member of the Translation Vendor Web Services steering committee.
Stone soup revisited, or the unity of MT as the prime NLP task
Yorick Wilks
(University of Sheffield, UK)
Abstract: MT researchers of a certain age will remember that, about fifteen years ago, the group under Jelinek and Brown at IBM mounted an attack on the idea of MT as a purely linguistic/symbolic enterprise, and argued that engineering methods based purely on text statistics, and derived from their success in speech recognition, could yield fundamental advances in MT. There were debates at conferences and in newsletters, and matters came to a head in the DARPA MT competitions of the early Nineties, where both types of system (supported by DARPA) were pitted against each other and against commercial systems, including SYSTRAN. The answer was pretty clear: statistical MT did well, better than many expected, but never beat SYSTRAN over texts and domains for which neither had been trained.
Many believe that nothing much has happened since, but I will argue that that is not so. What has happened above all is the web, which has both provided a new, easily accessible market for MT, through page translation, and provided a source of vast corpora, unimaginable before. However, that availability has not yet been cashed in: there is an enormous amount of work, of both sorts and above all as hybrids of both, but nothing fundamental has yet enabled purely empirical methods to overcome the data-sparseness problem, not even the web itself, viewed as a corpus. It seems pretty clear that some form of symbolic methods will be needed to do that. Again, that opposition is increasingly hard to make, as "symbolic" methods now themselves tend to be empirically based, and refer only to information types, rather than to structures that are written down directly from intuition.
Most striking has been the division of the old MT task into sub-tasks, each tackled and evaluated independently (the most MT-relevant case has been Word Sense Discrimination), but whose limited successes have not, so far, been built back into more advanced MT itself. Again, MT has disintegrated in another way, in that multilingual functions across a whole range of tasks, from simple text editing up to summarization and information extraction, are now such that one cannot really say whether they are MT or not. None of this should matter if real advances for language workers are being made, and they are. But intellectually, it can be bewildering, as in the recent turning of the tables in which it has been argued that information retrieval should be seen as a form of machine translation (as opposed to vice versa!).
Bio: Yorick Wilks is Professor of Computer Science at the University of Sheffield and Director of ILASH, the Institute of Language, Speech and Hearing. For the eight years 1985-93 he was Director of the Computing Research Laboratory at New Mexico State University, a centre for research in artificial intelligence and its applications. He received his doctorate from Cambridge University in 1968 for work in computer programs that understand written English in terms of a theory later called "preference semantics": the claim that language is to be understood by means of a search for semantic "gists", combined with a coherence function over such structures that minimises effort in the analyser.
This has continued as the focus of his work, and has had applications in the areas of machine translation, the use of English as a "front end" for users of databases, and the computation of belief structures. He was a researcher at Stanford AI Laboratory, and then Professor of Computer Science and Linguistics at the University of Essex. He has published numerous articles and nine books in that area of artificial intelligence, of which the most recent are Artificial Believers (with Afzal Ballim) from Lawrence Erlbaum Associates (1991) and Electric Words: dictionaries, computers and meanings (with Brian Slator and Louise Guthrie) from MIT Press (1995).
He is a member of the (UK) EPSRC College of Computing, and also a Fellow of the American Association for Artificial Intelligence and the European Society for Artificial Intelligence (ECCAI). He is on the boards of some fifteen AI-related journals. He currently works in the areas of information extraction from text sources, computational pragmatics and dialogue systems, and the automatic construction and maintenance of linguistic resources such as lexicons, ontologies and grammars. He has been principal researcher at Sheffield on a range of Fourth and Fifth Framework EU contracts: AVENTINUS, ECRAN, EUROWORDNET, SIMPLE, PAROLE, NAMIC, MUMIS and AMITIES. He has substantial experience managing and directing large research projects for national agencies such as the EPSRC (UK) and DARPA and NSF (US).