The Manipulate class now includes: acipToWylie, wylieToAcip, and unicodeToWylie. They provide a simple interface to David Chandler's converters that are used by the translation tool.
Also removed the remark worrying about whether convertEwtsTo
should be concerned about what kind of String it returns.
It need not be: it returns a plain java.lang.String, which the
XSLT processor treats as such and converts to an XSL string in
the appropriate encoding.
transformations. I haven't actually used it with Xalan XSLT yet, but
it ought to work if TibetanHTML did (which it must have at one point).
I do have a unit test, but an end-to-end test with Xalan is what we
need.
Fixed ordering of Unicode wowels; [ku+A], for example, now gives the
correct Unicode.
EWTS->TMW looks better for some wacky wowels like, I'm guessing here, [ku+A].
EWTS->TMW should now give errors any time the full input isn't used.
Previously, wacky wowels like [kai+-i] would lead to some droppage.
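To illustrate the "error whenever the full input isn't used" rule, here is a minimal sketch. It is not the actual EWTS->TMW code: the class name, the method name, and the tiny token inventory are all hypothetical, and the real EWTS grammar is far larger. The only point is that the scanner raises an error rather than silently dropping whatever it cannot match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the THDL code base.
final class FullInputCheckSketch {
    // Toy token inventory (an assumption for illustration only).
    // Multi-character tokens are listed first so they win over prefixes.
    private static final String[] TOKENS = {"+A", "ai", "k", "u", "i"};

    // Splits the input into known tokens, failing loudly on any leftover.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String match = null;
            for (String t : TOKENS) {
                if (input.startsWith(t, pos)) {
                    match = t;
                    break;
                }
            }
            if (match == null) {
                // Nothing matches here: report it instead of dropping it.
                throw new IllegalArgumentException(
                    "unconsumed input at offset " + pos + ": "
                    + input.substring(pos));
            }
            out.add(match);
            pos += match.length();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ku+A"));
        try {
            tokenize("kai+-i");
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

With this toy inventory, "ku+A" is consumed completely, while "kai+-i" stops at the "+-i" remainder and is reported as an error.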
EWTS->TMW->Unicode testing is now in effect. This found a ton of
EWTS->TMW bugs, most or all of which are fixed now.
TMW->Unicode is improved/fixed for {
\u5350,\u534D,\u0F88+k,\u0F88+kh,U }. (Why U? "\u0f75" is
discouraged in favor of "\u0f71\u0f74".)
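The U substitution can be sketched as a plain string replacement. This is only an illustration of the mapping described above, not the converter's own code; the class and method names are hypothetical.

```java
// Hypothetical sketch; not part of the THDL code base.
final class DiscouragedVowelSketch {
    // Replaces the single code point U+0F75 with the
    // two-code-point sequence U+0F71 U+0F74.
    static String expandLongU(String s) {
        return s.replace("\u0f75", "\u0f71\u0f74");
    }

    public static void main(String[] args) {
        // U+0F40 (the letter KA) followed by U+0F75.
        String out = expandLongU("\u0f40\u0f75");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < out.length(); i++) {
            sb.append(String.format("U+%04X ", (int) out.charAt(i)));
        }
        System.out.println(sb.toString().trim());
    }
}
```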
NOTE: TMW_RTF_TO_THDL_WYLIETest is still disabled for the nightly
builds' sake, but I ran it in my sandbox and it passed.
[hUM^]) are not yet converting correctly.
I have not yet committed the end-to-end test that I'm manually doing
to find these problems. It will be another document for
TMW_RTF_TO_THDL_WYLIETest.java. Note that thdl.debug=true is
essential to access the GUI for the EWTS->* converters.
debug mode for now.
I tested against a really simple-but-real document and found a bug with '*'.
I tried to implement TMW vowel code, but I don't trust it yet. Differentiated
EWTS code from ACIP where needed.
Several bugs in EWTS->Tibetan have been exposed; see the TODO
comments.
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness. If each were addressed,
cleanliness could perhaps be achieved.
I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion. The lexer is probably really, really
primitive. I concentrate here on converting a single tsheg bar rather than
a whole document.
Eclipse was used during part of my journey here and some imports were
reorganized merely because I could. :)
(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest. And I wanted its debugger.)
Next steps: end-to-end EWTS tests should bring many problems to light. Fix
those. Triage all the TODO comments.
I don't know that I'll ever really trust the implementation. The tests are
valuable, though. A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.