I really hesitate to commit this because I'm not sure what it brings to the
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier.  The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness.  If each were addressed,
cleanliness could perhaps be achieved.

I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion.  The lexer is probably really, really
primitive.  I concentrate here on converting a single tsheg bar rather than
a whole document.

Eclipse was used during part of my journey here and some imports were
reorganized merely because I could.  :)

(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest.  And I wanted its debugger.)

Next steps: end-to-end EWTS tests should bring many problems to light.  Fix
those.  Triage all the TODO comments.

I don't know that I'll ever really trust the implementation.  The tests are
valuable, though.  A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
dchandler 2005-06-20 06:18:00 +00:00
parent f64bae8ea6
commit 7198f23361
45 changed files with 1666 additions and 695 deletions

@@ -19,10 +19,6 @@ Contributor(s): ______________________________________.
 package org.thdl.tib.text.ttt;
-import org.thdl.util.ThdlDebug;
-import org.thdl.tib.text.TibetanMachineWeb;
-import org.thdl.tib.text.DuffCode;
-import java.util.ArrayList;
 /** An ordered pair used in ACIP/EWTS-to-TMW/Unicode conversion. The
  * left side is the consonant or empty; the right side is either the
@@ -182,8 +178,14 @@ class TPair {
     /** Returns true if this pair contains a Tibetan number. */
     boolean isNumeric() {
-        char ch;
-        return (l != null && l.length() == 1 && (ch = l.charAt(0)) >= '0' && ch <= '9');
+        if (l != null && l.length() == 1) {
+            char ch = l.charAt(0);
+            return ((ch >= '0' && ch <= '9')
+                    || (ch >= '\u0f18' && ch <= '\u0f33')
+                    || ch == '\u0f3e' || ch == '\u0f3f');
+        }
+        return false;
+        // TODO(DLC)[EWTS->Tibetan]: what about half-numbers?
     }
     String getWylie() {
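
The isNumeric hunk above contains both a one-line, ASCII-only check and a wider multi-line check. As a rough standalone sketch of the wider check (the class name and the String parameter are assumptions for illustration; in TPair the method takes no argument and reads the field l), it could be exercised like this:

```java
// Hypothetical standalone class; only the character ranges come from the diff.
class NumericCheckSketch {

    /**
     * Returns true for a one-character string that is an ASCII digit,
     * lies in U+0F18..U+0F33, or is U+0F3E or U+0F3F.
     */
    static boolean isNumeric(String l) {
        if (l != null && l.length() == 1) {
            char ch = l.charAt(0);
            return (ch >= '0' && ch <= '9')
                || (ch >= '\u0f18' && ch <= '\u0f33')
                || ch == '\u0f3e' || ch == '\u0f3f';
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("7"));       // ASCII digit
        System.out.println(isNumeric("\u0f20"));  // TIBETAN DIGIT ZERO
        System.out.println(isNumeric("\u0f2a"));  // TIBETAN DIGIT HALF ONE
        System.out.println(isNumeric("\u0f40"));  // TIBETAN LETTER KA, outside every range
        System.out.println(isNumeric("12"));      // two characters, so rejected
    }
}
```

In the Unicode Tibetan block, U+0F20..U+0F29 are the digits and U+0F2A..U+0F33 are the half-digit forms, so the range in the hunk already seems to admit half-numbers; whether they should count as numeric here is the open question the TODO raises.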
@@ -209,7 +211,7 @@ class TPair {
         if (null == leftWylie) leftWylie = "";
         if (justLeft) return leftWylie;
         String rightWylie = null;
-        if ("-".equals(getRight()))
+        if (traits.disambiguator().equals(getRight()))
             rightWylie = ".";
         else if ("+".equals(getRight()))
             rightWylie = "+";
@@ -238,8 +240,9 @@ class TPair {
             consonantSB.append(x);
         }
         if (null != getRight()
-            && !("-".equals(getRight()) || "+".equals(getRight()) || "A".equals(getRight()))) {
-            String x = traits.getUnicodeFor(getRight(), subscribed);
+            && !(traits.disambiguator().equals(getRight())
+                 || "+".equals(getRight()) || traits.aVowel().equals(getRight()))) {
+            String x = traits.getUnicodeForWowel(getRight());
             if (null == x) throw new Error("TPair: " + getRight() + " has no Uni");
             vowelSB.append(x);
         }
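
The last two hunks contain comparisons against the literals "-" and "A" alongside comparisons against traits.disambiguator() and traits.aVowel(), which reads as the transliteration-specific literals being moved behind a traits object. A minimal sketch of that idea follows. Every type name here is hypothetical; only the method names disambiguator() and aVowel() appear in the diff, and the EWTS return values are assumptions based on the usual Wylie conventions rather than on this commit.

```java
// Hypothetical demo class, listed first so it can serve as the entry point.
class TraitsDemo {
    public static void main(String[] args) {
        PairSketch acip = new PairSketch(new AcipTraitsSketch(), "-");
        PairSketch ewts = new PairSketch(new EwtsTraitsSketch(), "i");
        System.out.println(acip.rightNeedsVowelSign());
        System.out.println(ewts.rightNeedsVowelSign());
    }
}

// Hypothetical interface; the two method names are the ones the diff calls.
interface TraitsSketch {
    String disambiguator();
    String aVowel();
}

class AcipTraitsSketch implements TraitsSketch {
    public String disambiguator() { return "-"; }  // literal seen in the diff
    public String aVowel() { return "A"; }          // literal seen in the diff
}

class EwtsTraitsSketch implements TraitsSketch {
    public String disambiguator() { return "."; }  // assumption: EWTS convention
    public String aVowel() { return "a"; }          // assumption: EWTS convention
}

// Hypothetical stand-in for the part of TPair that inspects the right side.
class PairSketch {
    private final TraitsSketch traits;
    private final String right;

    PairSketch(TraitsSketch traits, String right) {
        this.traits = traits;
        this.right = right;
    }

    /** Mirrors the condition in the last hunk: a right side that is not the
     *  disambiguator, "+", or the implicit a vowel needs its own vowel sign. */
    boolean rightNeedsVowelSign() {
        return right != null
            && !(traits.disambiguator().equals(right)
                 || "+".equals(right)
                 || traits.aVowel().equals(right));
    }
}
```

With this shape the pair logic stays the same for both transliteration schemes, and only the traits implementation changes.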