Also removed the remark worrying about whether convertEwtsTo
should be concerned about what kind of String it returns.
It need not be: it returns a plain java.lang.String, which the
XSLT processor treats as such and converts to an XSL String in the
appropriate encoding.
transformations. I haven't actually used it with Xalan XSLT yet, but
it ought to work if TibetanHTML did (which it must have at one point).
I do have a unit test, but an end-to-end test with Xalan is what we
need.
Fixed ordering of Unicode wowels. [ku+A] gives the correct Unicode
now, e.g.
EWTS->TMW looks better for some wacky wowels like, I'm guessing here, [ku+A].
EWTS->TMW should now give errors any time the full input isn't used.
Previously, wacky wowels like [kai+-i] would lead to some droppage.
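
To illustrate the "error whenever the full input isn't used" rule, here
is a minimal sketch. It is not the actual EWTS->TMW code: the class
FullInputCheck, the method tokenize, and the toy token table KNOWN are
all hypothetical. It only shows the idea of tracking how far the lexer
got and refusing to return a partial result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the THDL converter. */
class FullInputCheck {
    /** Toy token table (an assumption), longest match listed first. */
    static final String[] KNOWN = {"kh", "k", "u", "a", "i", "+"};

    /** Tokenizes the input, failing loudly if any of it is left over. */
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            String match = null;
            for (String t : KNOWN) {
                if (in.startsWith(t, pos)) {
                    match = t;
                    break;
                }
            }
            if (match == null) {
                break; // nothing recognizable at this offset
            }
            out.add(match);
            pos += match.length();
        }
        // The key check: never silently drop the unread tail.
        if (pos != in.length()) {
            throw new IllegalArgumentException(
                "unconsumed input at offset " + pos + ": " + in.substring(pos));
        }
        return out;
    }
}
```

With this toy table, tokenize("ku") succeeds, while tokenize("kai+-i")
throws at the '-' instead of dropping the tail.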
EWTS->TMW->Unicode testing is now in effect. This found a ton of
EWTS->TMW bugs, most or all of which are fixed now.
TMW->Unicode is improved/fixed for
{ \u5350, \u534D, \u0F88+k, \u0F88+kh, U }.
(Why U? "\u0f75" is discouraged in favor of "\u0f71\u0f74".)
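
The U case can be sketched as a plain string substitution. U+0F75
(TIBETAN VOWEL SIGN UU) canonically decomposes to U+0F71 followed by
U+0F74, so emitting the two-character form avoids the discouraged code
point. The class and method names below are hypothetical, and the real
TMW->Unicode code may well produce the sequence differently.

```java
/** Hypothetical sketch; not the THDL TMW->Unicode code. */
class UuDecomposition {
    /** Replaces the discouraged U+0F75 with U+0F71 followed by U+0F74. */
    static String decomposeUu(String s) {
        return s.replace("\u0F75", "\u0F71\u0F74");
    }

    public static void main(String[] args) {
        // KA followed by the discouraged UU vowel sign.
        String out = decomposeUu("\u0F40\u0F75");
        for (char c : out.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```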
NOTE: TMW_RTF_TO_THDL_WYLIETest is still disabled for the nightly
builds' sake, but I ran it in my sandbox and it passed.
[hUM^]) are not yet converting correctly.
I have not yet committed the end-to-end test that I'm manually doing
to find these problems. It will be another document for
TMW_RTF_TO_THDL_WYLIETest.java. Note that thdl.debug=true is
essential to access the GUI for the EWTS->* converters.
debug mode for now.
I tested against a really simple-but-real document and found a bug with '*'.
I tried to implement the TMW vowel code, but I don't trust it yet.
Differentiated EWTS code from ACIP where needed.
Several bugs in ewts->tibetan have been exposed; see the TODO
comments.
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness. If each were addressed,
cleanliness could perhaps be achieved.
I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion. The lexer is probably really, really
primitive. I concentrate here on converting a single tsheg bar rather than
a whole document.
Eclipse was used during part of my journey here and some imports were
reorganized merely because I could. :)
(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest. And I wanted its debugger.)
Next steps: end-to-end EWTS tests should bring many problems to light. Fix
those. Triage all the TODO comments.
I don't know that I'll ever really trust the implementation. The tests are
valuable, though. A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
The servlet version's minimum is now 1.3, which is the version number on UVa's iris server.
The handheld version's minimum is now 1.4. That seems high, but it has been designed for some
time to run on WebSphere Everyplace Micro Environment Personal Profile 1.0, and 1.4 is
the maximum it takes. I didn't find any compatible freeware Java Runtime Environment for Windows CE.
For everything else, the minimum is still set at 1.2.
standalone and the servlet version of the translation tool respectively. The
specific parameters are for debugging in Netbeans IDE but probably could
be generalized. Added comments on how to debug the servlet version
via sockets or via shared memory.
'Tibetan Machine Web (non-Unicode)' rather than 'Tibetan'.
I often use the term 'Tibetan' to mean 'Tibetan (either in Unicode,
TM or TMW in RTF, or any other scheme where it appears as Tibetan
instead of Roman transliteration)'. So this is a good change in my opinion,
though 'TMW' or 'Legacy TMW' is shorter.