Fixed ordering of Unicode wowels. [ku+A] gives the correct Unicode
now, e.g.
EWTS->TMW looks better for some wacky wowels like, I'm guessing here, [ku+A].
EWTS->TMW should now give errors any time the full input isn't used.
Previously, wacky wowels like [kai+-i] would lead to some droppage.
EWTS->TMW->Unicode testing is now in effect. This found a ton of
EWTS->TMW bugs, most or all of which are fixed now.
TMW->Unicode is improved/fixed for {
\u5350,\u534D,\u0F88+k,\u0F88+kh,U }. (Why U? "\u0f75" is
discouraged in favor of "\u0f71\u0f74".)
NOTE: TMW_RTF_TO_THDL_WYLIETest is still disabled for the nightly
builds' sake, but I ran it in my sandbox and it passed.
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness. If each were addressed,
cleanliness could perhaps be achieved.
I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion. The lexer is probably really, really
primitive. I concentrate here on converting a single tsheg bar rather than
a whole document.
Eclipse was used during part of my journey here and some imports were
reorganized merely because I could. :)
(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest. And I wanted its debugger.)
Next steps: end-to-end EWTS tests should bring many problems to light. Fix
those. Triage all the TODO comments.
I don't know that I'll ever really trust the implementation. The tests are
valuable, though. A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
conversion. The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place. EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.
I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
as text to be passed through (without the brackets in the case of {}) literally,
which is the case by default because Robert Chilton requested it, or the old,
ad-hoc mechanism which could be useful for finding some ugly input.
Made a couple of error messages a little more verbose now that we have
short-message mode.
stacks that use full-form subjoined RA and YA consonants.
ACIP {RVA} was converting to the wrong things.
The TMW for {RVA} was converting to the wrong ACIP.
Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars in
in the file, too.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.