This tool helps Tibetan-to-English translators partially automate the translation process. In Tibetan, word boundaries are not marked the way spaces mark them in English. Instead, a punctuation mark called a "tsheg" separates each syllable. Thus, while syllable boundaries are utterly explicit, word boundaries are often unclear. As a result, one of the main difficulties beginning students face when translating Tibetan texts is figuring out where one word ends and the next begins, and deciding which series of syllables to look up in the dictionary, whether as a single word or as a larger compound phrase. This requires a very time-consuming process of looking up multiple combinations of syllables to determine which ones appear in a given dictionary.
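To make the cost of that manual process concrete, here is a minimal Python sketch (an illustration, not the tool's own code) that splits input into syllables and lists every run of consecutive syllables a reader might have to look up. It assumes Extended Wylie input, where a space stands for the tsheg, or Tibetan script, where the tsheg is the character U+0F0B; the function names are hypothetical.

```python
import re

def syllables(text):
    """Split text into syllables on tshegs (U+0F0B) or whitespace.

    In Extended Wylie a space stands for the tsheg, so both scripts work.
    """
    return [s for s in re.split(r"[\s\u0f0b]+", text) if s]

def candidate_lookups(sylls):
    """Return every run of consecutive syllables, joined with spaces."""
    return [" ".join(sylls[i:j])
            for i in range(len(sylls))
            for j in range(i + 1, len(sylls) + 1)]

sylls = syllables("chos nyid kyi ngo bo")
print(len(candidate_lookups(sylls)))  # 15 runs for 5 syllables
```

A passage of n syllables has n(n+1)/2 such runs, so the number of candidate lookups grows quadratically with its length.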
The tool partially automates this process by breaking a sentence or paragraph, entered in Extended Wylie or Tibetan script, into the largest components it can find in multiple dictionary databases. For each component found, it then displays the stored definitions and other relevant information. As a result, it will often yield only the definition of a long phrase rather than those of its component words, but the syllables of that phrase can also be looked up one by one.
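Breaking text into the largest components found in a dictionary corresponds to a greedy longest-match strategy. The following Python sketch shows that technique in isolation; it is an assumption about the general approach, not the tool's actual implementation. The dictionary is modeled as a set of space-separated Extended Wylie keys, and `max_len` is a hypothetical cap on phrase length.

```python
def segment(sylls, dictionary, max_len=8):
    """Greedy longest-match segmentation of a list of syllables.

    At each position, take the longest run of up to max_len syllables that
    appears in `dictionary`; a syllable with no match stands on its own.
    """
    result = []
    i = 0
    while i < len(sylls):
        # Try the longest candidate first, shrinking one syllable at a time.
        for j in range(min(len(sylls), i + max_len), i, -1):
            candidate = " ".join(sylls[i:j])
            if candidate in dictionary or j == i + 1:
                result.append(candidate)
                i = j
                break
    return result

toy_dictionary = {"chos", "nyid", "chos nyid", "kyi", "ngo bo"}
print(segment("chos nyid kyi ngo bo".split(), toy_dictionary))
# ['chos nyid', 'kyi', 'ngo bo']
```

Because the longest match wins, "chos nyid" comes back as one unit even though "chos" and "nyid" are also entries in the toy dictionary; their individual definitions require separate lookups.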
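The tool can run on-line in three ways: through a web interface, as a stand-alone desktop application, or as an applet within a browser.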
In the web interface, the text is typed (or pasted) in Extended Wylie into a text box within a form. All of the processing is done on the server, and the results are returned as plain HTML. This allows the user to run this version on even the most basic browser without installing any additional software. Also, because the results are returned in HTML, features such as hyperlinks, tables, and text formatting make them easier to skim. The user can choose to see the Tibetan in the results either in Extended Wylie or in Tibetan script (using the Tibetan Machine Web font, now freely available).
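As an illustration of returning results as plain HTML, the Python sketch below renders (term, definitions) pairs as a table, escaping the text so that characters such as `<` and `&` display correctly. The function name and the input shape are assumptions, not the tool's actual server code.

```python
from html import escape

def results_to_html(results):
    """Render a list of (term, [definitions]) pairs as a plain HTML table."""
    rows = []
    for term, definitions in results:
        # Escape every piece of text; join multiple definitions with line breaks.
        cell = "<br>".join(escape(d) for d in definitions) or "<i>not found</i>"
        rows.append(f"<tr><td>{escape(term)}</td><td>{cell}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(results_to_html([("chos nyid", ["(placeholder definition)"]), ("kyi", [])]))
```

Output like this needs nothing beyond basic HTML support in the browser, which is the point of this version.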
The text is again typed in Extended Wylie, but with the added option of seeing it rendered directly in Tibetan script (using the Tibetan Machine Web font, now freely available) as it is typed. We eventually plan to support other keyboard input methods as well. Here, too, all the processing is done on the server side, and the results are displayed interactively within the program's window. Again, the user can choose to see the results in Extended Wylie or in Tibetan script.
Even though it runs as a stand-alone application on the user's desktop, an Internet connection is still needed to access the dictionary databases. The application can be launched easily over the Internet using Java Web Start, which comes with Sun's Java Runtime Environment version 1.4 or higher. This is the recommended way to run the tool.
The applet runs within a browser. Not only must the browser support Java, but because the classes that handle the Tibetan font use Swing, Sun's Java Runtime Environment version 1.4 or higher must also be installed.
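The version requirement can also be checked programmatically. The Python sketch below (a hypothetical helper, not part of the tool) reads a `java.version` string such as `1.4.2_19` or `17.0.2`. Releases up to Java 8 report versions in the form `1.x`, while Java 9 and later drop the leading `1.` (JEP 223), so both forms are normalized to a single feature-release number before comparing.

```python
import re

def java_feature_release(version):
    """Return the feature release number (4 for "1.4.2_19", 17 for "17.0.2")."""
    numbers = [int(n) for n in re.findall(r"\d+", version)]
    if not numbers:
        raise ValueError(f"unrecognized Java version: {version!r}")
    # Up to Java 8 the scheme is "1.<feature>..."; from Java 9 on it is "<feature>...".
    if numbers[0] == 1 and len(numbers) > 1:
        return numbers[1]
    return numbers[0]

def meets_minimum(version, minimum=4):
    """True if the runtime is Java 1.4 (feature release 4) or later."""
    return java_feature_release(version) >= minimum

print(meets_minimum("1.4.2_19"), meets_minimum("1.3.1"))  # True False
```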
The tool can also run off-line in:
The classes designed to be run from the command line are:
Notes on Input:
If the entry is:
chos nyid
or
chos
nyid
the parser will recognize it as a single word "chos nyid". But if the entry is:
chos / nyid
or
chos
nyid
the parser will assume "chos" and "nyid" are independent and will look them up separately.
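The rules above can be expressed in a few lines. In this minimal Python sketch (an illustration, not the parser's actual code), the shad `/` closes a segment, and each segment is parsed independently of the next, while spaces and single line breaks only separate syllables within a segment, so "chos" and "nyid" on consecutive lines still form one segment.

```python
def split_input(text):
    """Split Extended Wylie input into segments that are parsed independently.

    A shad ("/") ends a segment; spaces and line breaks inside a segment only
    separate syllables, so "chos" and "nyid" on consecutive lines stay together.
    """
    segments = []
    for chunk in text.split("/"):
        syllables = chunk.split()  # splits on any whitespace, including "\n"
        if syllables:
            segments.append(syllables)
    return segments

print(split_input("chos\nnyid"))   # [['chos', 'nyid']]
print(split_input("chos / nyid"))  # [['chos'], ['nyid']]
```

Each resulting segment would then be searched on its own, so no dictionary match can span a shad.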
Author: Andrés Montano Pellegrini