diff --git a/source/org/thdl/tib/scanner/AcipToWylie.java b/source/org/thdl/tib/scanner/AcipToWylie.java index f7f1485..659685d 100644 --- a/source/org/thdl/tib/scanner/AcipToWylie.java +++ b/source/org/thdl/tib/scanner/AcipToWylie.java @@ -21,8 +21,20 @@ package org.thdl.tib.scanner; import java.net.*; import java.io.*; -/** Provides interfase to convert from tibetan text transliterated in - the Acip scheme to THDL's extended wylie scheme. +/** Provides an interface to convert from Tibetan text transliterated in the Acip scheme to THDL's Extended Wylie scheme. + +
If no arguments are given, it reads the Acip text from standard input and writes the +Wylie text to standard output. If one argument is given, it is interpreted as the +file name for the input. If two arguments are given, the first is interpreted as the file name for the input and +the second as the file name for the output. For example, the following +command converts lam-rim-chen-mo.act, storing the results in +lam-rim-chen-mo.txt:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act lam-rim-chen-mo.txt+
Alternatively, by redirecting the standard input/output, you can perform the same +job:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie < lam-rim-chen-mo.act > lam-rim-chen-mo.txt+
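The argument handling described above (no arguments, one argument, two arguments) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `StreamSelectionSketch` and its methods are hypothetical names, not taken from AcipToWylie, and the conversion step is replaced by a placeholder that copies lines unchanged.

```java
import java.io.*;

// Hypothetical sketch; names are illustrative and not part of org.thdl.tib.scanner.
class StreamSelectionSketch {
    /** No arguments: standard input. Otherwise: the file named by args[0]. */
    static InputStream openInput(String[] args) {
        if (args.length < 1) return System.in;
        try {
            return new FileInputStream(args[0]);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fewer than two arguments: standard output. Otherwise: the file named by args[1]. */
    static OutputStream openOutput(String[] args) {
        if (args.length < 2) return System.out;
        try {
            return new FileOutputStream(args[1]);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(openInput(args)));
             PrintWriter out = new PrintWriter(new OutputStreamWriter(openOutput(args)), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Placeholder: the real tool would convert the Acip line to Wylie here.
                out.println(line);
            }
        }
    }
}
```

Selecting the streams up front keeps the conversion loop identical whether the text comes from a file or from redirected standard input.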
If you only want to display the results to the screen, you can run:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.AcipToWylie lam-rim-chen-mo.act | more@author Andrés Montano Pellegrini @see WindowScannerFilter diff --git a/source/org/thdl/tib/scanner/BinaryFileGenerator.java b/source/org/thdl/tib/scanner/BinaryFileGenerator.java index 33d3ac4..68ae631 100644 --- a/source/org/thdl/tib/scanner/BinaryFileGenerator.java +++ b/source/org/thdl/tib/scanner/BinaryFileGenerator.java @@ -23,8 +23,121 @@ import java.io.*; into a binary file tree structure format, to be used by some implementations of the SyllableListTree. -
The text files must be in the format used by the - The Rangjung Yeshe Tibetan-English Dictionary of Buddhist Culture.
+Syntax (Dictionary files are assumed to be .txt. Don't include extensions!):
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator [-delimiter] dict-name+
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator dest-file-name [-delimiter1] dict-name1 [[-delimiter2] dict-name2 ...]+
-delimiter
bkra shis - 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name. +bde legs - 1) goodness, happiness, well-being, wellfare, auspiciousness, good fortune. 2) well, fine.+
If this were the content of a file called "my-glossary.txt" the + binary tree file would be generated with the command:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator my-glossary+
bkra shis 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name. +bde legs 1) goodness, happiness, well-being, wellfare, auspiciousness, good fortune. 2) well, fine.+
Here, the + binary tree file would be generated with the command:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -tab my-glossary+
bkra shis ** 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name. +bde legs ** 1) goodness, happiness, well-being, wellfare, auspiciousness, good fortune. 2) well, fine.+
Here, the + binary tree file would be generated with the command:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -** my-glossary+
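As a rough illustration of what the delimiter options imply, the sketch below splits one dictionary line into its definiendum and definition at the first occurrence of a delimiter. It is an assumption-laden example, not the BinaryFileGenerator implementation: the class `EntrySplitSketch` and the method `splitEntry` are hypothetical, and it assumes the default delimiter is a dash, that `-tab` means a tab character, and that any other flag (such as `-**`) names the delimiter string literally.

```java
// Hypothetical sketch; names are illustrative and not part of org.thdl.tib.scanner.
class EntrySplitSketch {
    /**
     * Splits a line at the first occurrence of the delimiter.
     * Returns {definiendum, definition}, or null if the delimiter is absent.
     */
    static String[] splitEntry(String line, String delimiter) {
        int at = line.indexOf(delimiter);
        if (at < 0) return null;
        String word = line.substring(0, at).trim();
        String definition = line.substring(at + delimiter.length()).trim();
        return new String[] { word, definition };
    }

    public static void main(String[] args) {
        String[] dash = splitEntry("bde legs - 1) goodness, happiness, well-being. 2) well, fine.", "-");
        String[] tab = splitEntry("bde legs\t1) goodness, happiness. 2) well, fine.", "\t");
        String[] stars = splitEntry("bde legs ** 1) goodness, happiness. 2) well, fine.", "**");
        System.out.println(dash[0] + " => " + dash[1]);
        System.out.println(tab[0] + " => " + tab[1]);
        System.out.println(stars[0] + " => " + stars[1]);
    }
}
```

Splitting at the first occurrence only means that a later dash inside the definition (as in "well-being") is left untouched.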
@1 + +(ka) + +ka ba/ gdung 'degs don byed nus pa/ + +rkyen/ grogs byed + +@2 + +(kha) + +khyod dngos po dang de byung 'brel/ khyod dngos po las byung +zhing/ dngos po ldog stops kyis khyod ldog pa/ + +khyod dngos po dang bdag gcig 'brel/ khyod ngos po dang bdag +nyid gcig pa'i sgo nas tha dad gang zhig/ dngos po ldog +stops kyis khyod ldog pa/ + +khyod dngos po dang 'brel pa/ khyod dngos po dang tha dad gang + +@3 + +zhig/ ngos po ldog stobs kyis khyod ldog pa/ + +kha dog mdog du rung ba'am/ sngo ser dkar dmar sogs mdog tu +rung ba'i gzugs/+
Here the + binary tree file would be generated with the command:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -acip my-glossary+
Comments: Notice in the sample text that at the beginning of page 2, "zhig" is not a
+new definiendum, but still is part of the definition of "khyod dngos po dang 'brel
+pa". Also the definiendum of the last entry is "kha dog"
+(the shad was omitted after "ga" suffix) and not "kha dog mdog du rung ba'am".
+Nevertheless the definiendum of the second term is not "khyod dngos po dang bdag"
+since there is no omitted shad after that "ga" suffix; the
+definiendum is "khyod dngos po dang bdag gcig 'brel". As is clear from the
+sample text, the tool has to make a series of "smart guesses" to try to figure
+out where each definiendum ends and its definition starts. This process is
+not 100% foolproof, so expect some mistakes.
+
Dictionaries in different formats can be processed together. For instance, the +command: +
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator alldicts ry-dic99 -acip myglossary_uma -tab myglossary_rdzogs-chen+
would generate alldicts.def and alldicts.wrd processing ry-dic99.txt
+as dash-separated, myglossary_rdzogs-chen.txt as tab-separated and
+myglossary_uma.txt in the transliteration format explained above.
+
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99+
It reads from the standard input and prints the results to the + standard output. For example, if you want to parse a text stored in puja.txt + and save the results in puja_words.txt, you can run the command:
+java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.ConsoleScannerFilter ry-dic99 < puja.txt > puja_words.txt@author Andrés Montano Pellegrini */ diff --git a/source/org/thdl/tib/scanner/WindowScannerFilter.java b/source/org/thdl/tib/scanner/WindowScannerFilter.java index 9858008..b6e76a7 100644 --- a/source/org/thdl/tib/scanner/WindowScannerFilter.java +++ b/source/org/thdl/tib/scanner/WindowScannerFilter.java @@ -33,7 +33,12 @@ import org.thdl.tib.input.DuffPane; Tibetan script) and displays the words (Roman or Tibetan script) with their definitions. Works without Tibetan script in platforms that don't support Swing. Can access dictionaries stored - locally or remotely. + locally or remotely. For example, to access the public dictionary database run the command: +
java -jar DictionarySearchStandalone.jar http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.RemoteScannerFilter+
If the JRE you installed does not support Swing classes but supports + + AWT (such as the JRE for handhelds), run the command:
+java -jar DictionarySearchHandheld.jar -simple ry-dic99@author Andrés Montano Pellegrini */ diff --git a/source/org/thdl/tib/scanner/package.html b/source/org/thdl/tib/scanner/package.html index bf7b40e..0d87955 100644 --- a/source/org/thdl/tib/scanner/package.html +++ b/source/org/thdl/tib/scanner/package.html @@ -8,14 +8,134 @@ --> -Provides classes and methods for translating Tibetan text to English. -
-Right now, this package scans Tibetan text, but we aim to make it parse Tibetan text. -
-Author: Andrés Montano Pellegrini +Provides the classes to take Tibetan language passages and divide the passages up +into their component phrases and words, and display corresponding dictionary definitions. +
This tool helps Tibetan-to-English translators partially automate the +translation process. In the Tibetan language, the boundaries of individual words +are not marked the way spaces separate and mark +words in English. Instead, there is +a punctuation mark called a "tsheg", which separates each syllable. Thus, while syllabic boundaries are fully explicit, word boundaries are +often unclear. One of the main +difficulties beginning students have with translating Tibetan texts is +figuring out where each word ends and the next word starts, and determining what +series of syllables to look up in the dictionary, either as constituting a single +word or a larger compound phrase. This +entails a very time-consuming process of looking up multiple combinations of +syllables to determine which are found within a given dictionary.
+It partially automates that process by +breaking up a sentence/paragraph entered in + Extended Wylie or Tibetan script +into the biggest component parts it can find in multiple dictionary databases. +Then for each component part found, it displays its stored definitions and +relevant information. This will +thus often yield only the definition of a long phrase, rather than its component +words, but one can also search for the syllables of that phrase one by one +separately.
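The idea of breaking a passage into the biggest component parts found in a dictionary can be illustrated with a greedy longest-match pass over the syllables. The sketch below is a simplified assumption, not the package's actual algorithm: `LongestMatchSketch` and `segment` are hypothetical names, the dictionary is a plain in-memory set rather than a SyllableListTree, and syllables are assumed to be separated by whitespace as in Extended Wylie input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not part of org.thdl.tib.scanner.
class LongestMatchSketch {
    /** Greedily splits text into the longest runs of syllables found in the dictionary. */
    static List<String> segment(String text, Set<String> dictionary) {
        List<String> parts = new ArrayList<>();
        if (text.trim().isEmpty()) return parts;
        String[] syllables = text.trim().split("\\s+");
        int start = 0;
        while (start < syllables.length) {
            int end = syllables.length;
            String match = null;
            // Try the longest candidate first, dropping one syllable at a time.
            while (end > start) {
                String candidate = String.join(" ", Arrays.copyOfRange(syllables, start, end));
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No entry starts here: emit the single syllable and move on.
                match = syllables[start];
                end = start + 1;
            }
            parts.add(match);
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("bkra", "bkra shis", "bde legs"));
        System.out.println(segment("bkra shis bde legs", dictionary));
    }
}
```

Because the longest candidate wins, "bkra shis" is returned as one entry even though "bkra" is also in the dictionary, which matches the behavior described above of yielding the long phrase rather than its component words.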
+The tool can run on-line through a:
+The text is typed (or pasted) using
+ Extended Wylie in a
+text box within a form. All of the processing is done on the server, and the
+results are returned in plain HTML. This allows the user to run this version on
+even the most basic browser without needing any additional software installed.
+Also, because the results are returned in HTML, features of HTML like
+hyperlinks, tables, and text formatting allow it to be skimmed more easily. The
+user can choose between seeing the Tibetan within the results in
+ Extended Wylie or in Tibetan script
+(using
+Tibetan Machine Web font now available for free).
+
The text is typed in + Extended Wylie, but with the added +value that optionally the user can choose to see it directly in the Tibetan script +(using +Tibetan Machine Web font now available for free) as he types. We +eventually plan to support other keyboard methods of entry as well. Here all the processing is also done on the server side, and the results +are displayed interactively within the program's window. Again the user can choose +to see the results in + Extended Wylie or in the Tibetan script.
+Even though the application runs as a stand-alone application on the +user's desktop, a connection to the Internet is still necessary to access the +dictionary databases. Easy launching of the application can be done over the +Internet using Java Web +Start, which comes with +Sun's Java Runtime Environment version 1.4 or higher. This is the +recommended way to run the tool.
+The applet runs within a browser. The browser not only needs
+to support Java, but since the classes that handle the Tibetan font use Swing,
+Sun's Java Runtime Environment version 1.4 or higher must additionally be installed.
+
The tool can also run off-line in:
+The classes designed to be run from the command-line are:
+Notes on Input:
+chos nyid
+or
+chos
+nyid
the parser will recognize it as a single word "chos nyid". +But if the entry is:
+chos / nyid
+or
+chos
+nyid
+the parser will assume "chos" and "nyid" are independent, +and will look them up separately.
+Author: Andrés Montano Pellegrini