/* The contents of this file are subject to the AMP Open Community License
Version 1.0 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License on the AMP web site
(http://www.tibet.iteso.mx/Guatemala/).

Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for
the specific terms governing rights and limitations under the License.

The Initial Developer of this software is Andres Montano Pellegrini. Portions
created by Andres Montano Pellegrini are Copyright 2001 Andres Montano
Pellegrini. All Rights Reserved.

Contributor(s): ______________________________________.
*/

package org.thdl.tib.scanner;

import java.io.*;

/** Converts Tibetan dictionaries stored in text files into a binary file tree
structure format, to be used by some implementations of the SyllableListTree.
Syntax (dictionary files are assumed to have a .txt extension; do not include the extension on the command line):
For a single dictionary:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator [-delimiter] dict-name
For several dictionaries combined into one output:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator dest-file-name [-delimiter1] dict-name1 [[-delimiter2] dict-name2 ...]
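The two syntax forms can be told apart from the arguments alone. The following is a minimal sketch, not the tool's own argument handling: the class GeneratorArgs and its fields are hypothetical, and it assumes that a leading flag or a lone argument means the single-dictionary form (with the output named after that dictionary, which is itself an assumption), and that each flag applies only to the dictionary name right after it, as the bracketed syntax suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the tool's own code) of telling the two command-line
// forms apart. Assumption: a leading flag or a lone argument means the
// single-dictionary form; otherwise the first argument is dest-file-name.
final class GeneratorArgs {
    final String destName;                            // base name of the output
    final List<String[]> inputs = new ArrayList<>();  // pairs {delimiter flag, file name}

    private GeneratorArgs(String destName) {
        this.destName = destName;
    }

    static GeneratorArgs parse(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("no dictionary given");
        }
        boolean single = args.length == 1 || args[0].startsWith("-");
        GeneratorArgs result = new GeneratorArgs(single ? args[args.length - 1] : args[0]);
        String pendingFlag = "";  // "" stands for the default, dash-separated format
        for (int i = single ? 0 : 1; i < args.length; i++) {
            if (args[i].startsWith("-")) {
                pendingFlag = args[i].substring(1);  // "-tab" becomes "tab"
            } else {
                // Dictionary names are given without extension; ".txt" is assumed.
                result.inputs.add(new String[] {pendingFlag, args[i] + ".txt"});
                pendingFlag = "";  // each flag applies only to the dictionary after it
            }
        }
        if (!pendingFlag.isEmpty()) {
            throw new IllegalArgumentException("delimiter without a dictionary name");
        }
        return result;
    }

    public static void main(String[] args) {
        GeneratorArgs parsed = parse(new String[] {
            "alldicts", "ry-dic99", "-acip", "myglossary_uma", "-tab", "myglossary_rdzogs-chen"});
        for (String[] in : parsed.inputs) {
            String format = in[0].isEmpty() ? "dash" : in[0];
            System.out.println(parsed.destName + " <- " + in[1] + " (" + format + ")");
        }
    }
}
```

With this reading, "alldicts" is recognized as dest-file-name because it is not a flag and is followed by other arguments, while "-tab my-glossary" is recognized as the single-dictionary form.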
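-delimiter tells the tool how each entry separates the definiendum (the term being defined) from its definition. Each format is illustrated by a sample below.
Default (no delimiter given): dash-separated. Sample:
bkra shis - 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs - 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.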
If this were the content of a file called "my-glossary.txt", the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator my-glossary
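A minimal sketch of splitting one line in the default format, assuming (the documentation does not confirm it) that the separator is the first dash with a space on each side, which leaves hyphenated words such as "well-being" in the definition intact. DashFormat and split are hypothetical names.

```java
// Hypothetical sketch of reading one line in the default, dash-separated format.
// Assumption: the separator is the first " - " (a dash with a space on each side),
// so hyphens inside words such as "well-being" are not mistaken for it.
final class DashFormat {
    // Returns {definiendum, definition}, or null if the line has no separator.
    static String[] split(String line) {
        int sep = line.indexOf(" - ");
        if (sep < 0) {
            return null;
        }
        return new String[] {line.substring(0, sep).trim(), line.substring(sep + 3).trim()};
    }

    public static void main(String[] args) {
        String[] entry = split("bde legs - 1) goodness, happiness, well-being. 2) well, fine.");
        System.out.println("definiendum: " + entry[0]);
        System.out.println("definition:  " + entry[1]);
    }
}
```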
-tab: tab-separated, with a tab character between the definiendum and its definition. Sample:
bkra shis	1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs	1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.
Here, the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -tab my-glossary
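For -tab, reading a dictionary amounts to splitting each line at its first tab. The sketch below illustrates this under assumptions: the class TabFormat is hypothetical, lines without a tab are simply skipped, and a StringReader stands in for the dictionary file. For a real file, java.nio.file.Files.newBufferedReader(path, charset) would supply the BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read -tab entries (definiendum, tab, definition), one per line.
// Lines without a tab are skipped rather than reported; that choice is an assumption.
final class TabFormat {
    static List<String[]> read(BufferedReader in) throws IOException {
        List<String[]> entries = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t'); // the first tab ends the definiendum
            if (tab < 0) {
                continue;
            }
            entries.add(new String[] {line.substring(0, tab).trim(), line.substring(tab + 1).trim()});
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for my-glossary.txt to keep the sketch self-contained.
        String text = "bkra shis\t1) auspiciousness, good luck.\nbde legs\t1) goodness, happiness.\n";
        for (String[] e : read(new BufferedReader(new StringReader(text)))) {
            System.out.println(e[0] + " => " + e[1]);
        }
    }
}
```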
Custom delimiter (for example -**): the string given after the dash, here "**", separates the definiendum from its definition. Sample:
bkra shis ** 1) auspiciousness, good luck, good fortune, goodness, prosperity, happiness. 2) auspicious, favorable, fortunate, successful, felicitous, lucky. 3) verse of auspiciousness; benediction, blessing. 4) a personal name.
bde legs ** 1) goodness, happiness, well-being, welfare, auspiciousness, good fortune. 2) well, fine.
Here, the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -** my-glossary
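With a custom delimiter, one detail matters in Java: String.split interprets its argument as a regular expression, and "*" is a regular-expression metacharacter, so "**" has to be quoted with Pattern.quote. The hypothetical CustomDelimiter class below is a sketch under that assumption; it splits only at the first occurrence (limit 2), so a later "**" inside a definition is preserved.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: split a line on the delimiter given after the dash, e.g. -**.
// String.split takes a regular expression and "*" is a metacharacter, so the
// delimiter is quoted; the limit of 2 splits at the first occurrence only.
final class CustomDelimiter {
    // Returns {definiendum, definition}, or null if the delimiter does not occur.
    static String[] split(String line, String flag) {
        String delimiter = flag.startsWith("-") ? flag.substring(1) : flag;
        String[] parts = line.split(Pattern.quote(delimiter), 2);
        if (parts.length < 2) {
            return null;
        }
        return new String[] {parts[0].trim(), parts[1].trim()};
    }

    public static void main(String[] args) {
        String[] e = split("bde legs ** 1) goodness, happiness. 2) well, fine.", "-**");
        System.out.println(e[0] + " | " + e[1]);
    }
}
```

Without Pattern.quote, passing "**" straight to split would throw a PatternSyntaxException, and a delimiter such as "|" would silently match the empty string.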
-acip: a transliteration format in which "@" marks the start of a new page and a shad ("/") normally closes each definiendum. Sample:
@1 (ka)
ka ba/ gdung 'degs don byed nus pa/
rkyen/ grogs byed
@2 (kha)
khyod dngos po dang de byung 'brel/ khyod dngos po las byung zhing/ dngos po ldog stops kyis khyod ldog pa/
khyod dngos po dang bdag gcig 'brel/ khyod ngos po dang bdag nyid gcig pa'i sgo nas tha dad gang zhig/ dngos po ldog stops kyis khyod ldog pa/
khyod dngos po dang 'brel pa/ khyod dngos po dang tha dad gang
@3
zhig/ ngos po ldog stobs kyis khyod ldog pa/
kha dog mdog du rung ba'am/ sngo ser dkar dmar sogs mdog tu rung ba'i gzugs/
Here, the binary tree file would be generated with the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator -acip my-glossary
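A first, naive pass over the -acip sample could strip page markers and cut each entry at its first shad. This sketch uses hypothetical names (AcipFirstPass, stripPageMarker, splitAtFirstShad) and assumes a marker is "@" plus a page number, optionally followed by a heading such as "(kha)". As the comments that follow explain, cutting at the first shad is not enough: on the last sample entry it yields "kha dog mdog du rung ba'am" instead of "kha dog".

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a naive first pass over -acip text. It is not the tool's
// algorithm; the marker pattern below is an assumption based on the sample.
final class AcipFirstPass {
    // "@" + page number, optionally followed by a section heading such as "(kha)".
    private static final Pattern PAGE_MARKER = Pattern.compile("^@\\d+\\s*(\\([^)]*\\))?\\s*");

    // Removes a leading page marker and returns the rest of the line (possibly empty).
    static String stripPageMarker(String line) {
        return PAGE_MARKER.matcher(line).replaceFirst("");
    }

    // Naive split: everything before the first shad is taken as the definiendum.
    static String[] splitAtFirstShad(String line) {
        int shad = line.indexOf('/');
        if (shad < 0) {
            return new String[] {line.trim(), ""};
        }
        return new String[] {line.substring(0, shad).trim(), line.substring(shad + 1).trim()};
    }

    public static void main(String[] args) {
        String line = stripPageMarker("@1 (ka) ka ba/ gdung 'degs don byed nus pa/");
        String[] entry = splitAtFirstShad(line);
        System.out.println(entry[0] + " => " + entry[1]);
    }
}
```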
Comments: Notice in the sample text that "zhig", at the beginning of a new page
(right after the "@3" marker), is not a new definiendum but is still part of the
definition of "khyod dngos po dang 'brel pa". Also, the definiendum of the last
entry is "kha dog" (the shad was omitted after the "ga" suffix), not
"kha dog mdog du rung ba'am". Nevertheless, in the entry that begins with
"khyod dngos po dang bdag", the definiendum is not "khyod dngos po dang bdag",
since no shad was omitted after that "ga" suffix; the definiendum is
"khyod dngos po dang bdag gcig 'brel". As the sample shows, the tool has to make
a series of "smart guesses" to figure out where each definiendum ends and its
definition starts. This process is not foolproof, so expect some mistakes.
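The ambiguity described above can be made concrete by listing every place where a definiendum might end: at the first shad, or after any earlier syllable ending in the "ga" suffix, where a shad may have been omitted. This is a hypothetical sketch (DefiniendumCandidates is not part of the tool); it treats a Wylie syllable ending in "g" but not "ng" as having a "ga" suffix, and it deliberately stops short of choosing among the candidates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the tool's algorithm: lists the places where the
// definiendum of one Wylie line could end. Candidates are the first shad "/" and,
// because the shad is omitted after the "ga" suffix, every earlier syllable that
// ends in "g" but not "ng" (a final "ng" is the "nga" suffix). It does not pick one.
final class DefiniendumCandidates {
    static List<String> candidates(String line) {
        int shad = line.indexOf('/');
        String head = (shad < 0 ? line : line.substring(0, shad)).trim();
        String[] syllables = head.split("\\s+");
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < syllables.length; i++) {
            if (i > 0) {
                prefix.append(' ');
            }
            prefix.append(syllables[i]);
            boolean gaSuffix = syllables[i].endsWith("g") && !syllables[i].endsWith("ng");
            if (gaSuffix && i < syllables.length - 1) {
                result.add(prefix.toString()); // a shad may have been omitted here
            }
        }
        result.add(prefix.toString()); // the text up to the first shad is always a candidate
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "kha dog mdog du rung ba'am/ sngo ser dkar dmar sogs mdog tu rung ba'i gzugs/",
            "khyod dngos po dang bdag gcig 'brel/ khyod ngos po dang bdag nyid gcig pa'i sgo nas"
        };
        for (String line : lines) {
            System.out.println(candidates(line));
        }
    }
}
```

For the "kha dog" entry the candidates are "kha dog", "kha dog mdog" and "kha dog mdog du rung ba'am", and the right one is the first; for the "khyod dngos po dang bdag gcig 'brel" entry the right one is the last of three. No fixed position works for both, which is why the tool has to guess.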
Dictionaries in different formats can be processed together. For instance, the command:
java -cp DictionarySearchStandalone.jar org.thdl.tib.scanner.BinaryFileGenerator alldicts ry-dic99 -acip myglossary_uma -tab myglossary_rdzogs-chen
would generate alldicts.def and alldicts.wrd, processing ry-dic99.txt as
dash-separated, myglossary_uma.txt in the -acip transliteration format
explained above, and myglossary_rdzogs-chen.txt as tab-separated.
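When several dictionaries are processed together, their entries end up in one index. The sketch below only illustrates that idea under stated assumptions and does not describe the layout of the .def and .wrd files: entries already parsed from each source are grouped by definiendum in a TreeMap, and each definition keeps the index of the dictionary it came from. DictionaryMerge, DictEntry and the names dict-a and dict-b are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group entries parsed from several dictionaries by definiendum.
// Each definition keeps the index of the dictionary it came from. This does not
// describe the actual layout of the .def and .wrd files.
final class DictionaryMerge {
    static final class DictEntry {
        final String term;        // definiendum
        final String definition;
        final int source;         // index of the source dictionary

        DictEntry(String term, String definition, int source) {
            this.term = term;
            this.definition = definition;
            this.source = source;
        }
    }

    // TreeMap keeps keys in String.compareTo order, which is not Tibetan alphabetical order.
    static TreeMap<String, List<DictEntry>> merge(List<DictEntry> entries) {
        TreeMap<String, List<DictEntry>> byTerm = new TreeMap<>();
        for (DictEntry e : entries) {
            byTerm.computeIfAbsent(e.term, k -> new ArrayList<>()).add(e);
        }
        return byTerm;
    }

    public static void main(String[] args) {
        String[] names = {"dict-a", "dict-b"}; // hypothetical dictionary names
        List<DictEntry> parsed = new ArrayList<>();
        parsed.add(new DictEntry("bkra shis", "auspiciousness, good luck", 0));
        parsed.add(new DictEntry("bde legs", "goodness, happiness", 0));
        parsed.add(new DictEntry("bkra shis", "verse of auspiciousness; benediction, blessing", 1));
        for (Map.Entry<String, List<DictEntry>> t : merge(parsed).entrySet()) {
            for (DictEntry e : t.getValue()) {
                System.out.println(t.getKey() + " [" + names[e.source] + "]: " + e.definition);
            }
        }
    }
}
```

Keeping the source index with each definition lets a lookup report which dictionary a definition came from; a real implementation would also need a Tibetan-aware ordering rather than plain string order.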