Tutorials Tuesday, September 23, 2003

Morning Tutorials 9:00 AM - 12:30 PM

T1 Computer-assisted business process management for the language industry
Adriane Rinsche, Language Technology Centre, Ltd.
T2 Finite-state Language Processing and its applications to MT
Shuly Wintner, Department of Computer Science, University of Haifa
T3 Thanks for the Memories: Translation Memory Demo
Hans Fenstermacher, ArchiText Inc.

Afternoon Tutorials 2:00 PM - 5:30 PM

T4 Information Architecture for Controlled Authoring and Translation
Joerg Schuetz, Institute for Applied Information Sciences (IAI)
T5 Introduction to Statistical Machine Translation
Kevin Knight and Philipp Koehn, University of Southern California, Information Sciences Institute
T6 MT Customization
Remi Zajac, SYSTRAN Software, Inc.

T1: Computer-assisted management of translation/localisation projects.

Dr Adriane Rinsche, Language Technology Centre Ltd., UK

Organising multilingual documentation projects, including the authoring process, is time-consuming and expensive. Errors can occur at many levels, and human memory is limited. A typical project manager is responsible for a broad range of very time-consuming tasks: budgeting, project co-ordination, client contact, resource management, establishment of deadlines and generation of quotes are all integral parts of project management.

The Language Technology Centre has developed LTC Organiser, an innovative business process management and workflow control tool that supports translation and multilingual documentation projects. The tool has revolutionised business process management at many translation and localisation companies, as well as in the translation and localisation departments of multinational companies and international organisations.

The application is designed to reduce the cost of managing documentation and translation projects, including DTP and printing processes, decrease the time to market, and maximise the benefits derived from human and technical resources.

The tutorial will describe the most important aspects of the integrated solution, such as

  • Client management
  • Supplier management
  • Project management
  • Finance management
  • Translation software management
  • Reporting facilities
  • Security and user management
  • Directory management
  • Sort and search facilities
  • Web facilities: working from distributed sites
  • Web forms: giving customers and suppliers access to their records in LTC Organiser via the internet
  • Freelance edition for freelancers' administrative requirements

In addition, usability and customisability issues will be examined.

T2: Finite-state Language Processing (and its applications to MT)

Shuly Wintner

Outline: Finite-state technology is becoming an invaluable tool for various levels of language processing. It is the computational means of choice for describing the phonology, lexicon and morphology of natural languages, but it is increasingly used for other purposes as well, including (shallow) parsing, word-level translation and named entity recognition.

The tutorial will provide an introduction to the technology and its many applications in natural language processing. Aimed at linguists and computer scientists alike, it starts with the very basics of finite-state devices and regular expressions and concludes with a sketch of how to design and implement a large-scale project. Several examples of real applications illustrate the formal material.

Contents:

  • Finite-state automata (FSA)
  • Regular expressions
  • Operations on automata
  • Applications of FSA in NLP
    • Storing lexicons
  • Regular relations
  • Finite-state transducers (FSTs)
  • Properties of FSTs
  • Applications of FSTs in NLP
    • Morphological analysis
    • Part of speech tagging
    • Translation dictionaries
  • Extended regular expression languages
  • Replace rules and composition
  • Applications
    • Markup
    • Morphological analysis and generation
    • Shallow parsing
  • Available tools

Prerequisites: Acquaintance with basic formal language theory and knowledge of some programming language will be useful but not mandatory.

Tutorialist: Shuly Wintner
Department of Computer Science
University of Haifa
31905 Haifa, Israel
Phone: +972 (4) 8288180
Fax: +972 (4) 8249331
shuly@cs.haifa.ac.il

Bio: I am an assistant professor with the Department of Computer Science, University of Haifa, Israel. My research involves adaptation of computer science techniques and paradigms to computational linguistics, with an emphasis on formal grammars and finite-state devices. I have extensive teaching experience (with very good teaching evaluations), including a course titled "Unification-based linguistic formalisms" at ESSLLI-98, a tutorial at COLING-2000, a course titled "Formal language theory for natural language processing" at ESSLLI-2001 and a course on Unification Grammars at ESSLLI-2003. A complete, updated CV is available at http://www.cs.haifa.ac.il/~shuly.

T3: Thanks for the Memories: Translation Memory Demo

Hans Fenstermacher

Overview

This tutorial focuses on demonstration of translation memory technology, and is designed to give participants a detailed overview of the Translation Memory (TM) process. The session will explain in detail:

  • what TM is
  • what the different types of TM are
  • how content is processed in TM
  • what the advantages and disadvantages of TM are
  • how TM generates cost savings, including a price quote simulation
  • some tips and tricks for working with TM from the content developer’s perspective
  • how to prepare the source content better before it goes into TM

These issues will be demonstrated by running an actual piece of preselected content through one or more commercially available TM packages (for example, SDLX or Trados). Participants will see how the actual workflow progresses, so they can judge for themselves how content is affected. The demo should be highly interactive throughout the session, and participants will be encouraged to share their own TM experiences and insights with everyone. Efforts will be made to have one or more representatives of the TM software companies present to demonstrate their products and answer questions.

Objectives

  • Demonstrate what Translation Memory is and how it works
  • Explain the differences between TM and MT
  • Demonstrate how content is processed in TM in different tools
  • Explain the pricing issues surrounding TM
  • Outline some steps to take to make TM more efficient and cost-effective

Audience

This demo is intended for a broad audience, including:

  • Project/Product managers
  • Product developers (engineers, web developers, etc.)
  • Content developers (writers, editors, etc.)
  • Anyone involved in localization or translation

Bio: Hans Fenstermacher is President and founder of ArchiText Inc., a localization and globalization services provider. Born in Germany, Hans speaks six languages fluently and worked as an in-house and contract translator and production manager before founding ArchiText. He and his staff work with a variety of TM tools, including Trados, Déjà Vu, STAR Transit and Catalyst. At ArchiText, Hans pioneered the ABREVE® process for improving content usability and substantially reducing localization costs by globalizing and streamlining content before it is translated.

T4: Information Architecture for Controlled Authoring and Translation

Joerg Schuetz

T5: Introduction to Statistical Machine Translation

Kevin Knight and Philipp Koehn

Accurate translation requires a great deal of knowledge about the usage and meaning of words, the structure of phrases, the meaning of sentences, and which real-life situations are plausible. Recently, there has been a fair amount of research into extracting translation-relevant knowledge automatically from human-built bilingual texts. In the early 1990s, IBM pioneered automatic bilingual-text analysis. A 1999 workshop at Johns Hopkins University saw a re-implementation of many of the core components of this work, aimed at attracting more researchers into the field. Over the past few years, several statistical MT projects have appeared in North America, Europe, and Asia, and the literature is growing substantially. We will give an overview of this progress.

Tutorial Contents:

Data for MT

  • bilingual corpora: what's out there?
  • acquisition and cleaning
  • what does three million words really mean?

MT Evaluation

  • manual and automatic

Core Models and Decoders

  • IBM Models 1-5 and HMM models, training, decoding
  • word alignment and its evaluation
  • phrase models
  • syntax-based translation and language models

Specialized Models

  • named entity MT, numbers and dates, morphology, noun phrase MT

Available Resources

  • tools and data

Presenters:

Kevin Knight is a Senior Research Scientist at the USC/Information Sciences Institute and a Research Associate Professor in the Computer Science Department at USC. He has written a number of articles on statistical MT, plus a widely-circulated MT workbook (http://www.isi.edu/natural-language/mt/wkbk.rtf). Dr. Knight has also given several MT-related talks at the AMTA and EMNLP conferences, such as "Statistical Machine Translation: Where Did It Go?", "Every Time I Fire a Statistician, I Get a Warm Fuzzy Feeling", and "Deeper Representations for Machine Translation: Ready or Not?"

Philipp Koehn is a Ph.D. candidate in Computer Science at the University of Southern California. He has written a number of articles on topics in statistical machine translation, including bilingual lexicon induction from monolingual corpora, word-level translation models, and translation with scarce resources. He has also worked at AT&T Laboratories on text-to-speech systems, and at WhizBang! Labs on text categorization.

T6: MT Customization

Dr. Remi Zajac, SYSTRAN Software, Inc., San Diego, CA

MT customization is becoming the preferred option for deploying high-quality machine translation systems for specific applications. This tutorial will give a detailed description of the process and tools for customizing MT systems, with examples. Topics include why to customize an MT system, how to evaluate the costs and the potential benefits, and how to test and evaluate the customized system.

Contents:

  • High quality MT
    • Different translation requirements for different applications
    • The notion of linguistic closure
    • Machine translatability
  • Customization assessment and expected results
  • Manual vs. automatic customization
  • Customization process
    • Corpus analysis
    • Term extraction/coding
    • Syntactic pattern extraction/coding
    • Evaluation/refinement
  • MT Evaluation Issues

Presenter: Remi Zajac is Director of Computational Linguistics at SYSTRAN Software, Inc. in San Diego. He has written numerous articles on MT technologies and MT issues, and has presented several tutorials on MT-related topics at conferences such as COLING.
