table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness. If each were addressed,
cleanliness could perhaps be achieved.
I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion. The lexer is probably really, really
primitive. I concentrate here on converting a single tsheg bar rather than
a whole document.
Eclipse was used during part of my journey here and some imports were
reorganized merely because I could. :)
(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest. And I wanted its debugger.)
Next steps: end-to-end EWTS tests should bring many problems to light. Fix
those. Triage all the TODO comments.
I don't know that I'll ever really trust the implementation. The tests are
valuable, though. A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
One, TMW->EWTS gives dbas and dngas instead of dabs and dangs
because Chris Fynn's e-mail from today has dbas and dngas.
Second, Down with ACIPRules. Long live ACIPTraits. EWTS->Tibetan
conversion is closer still.
conversion. The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place. EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.
I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
Fixed part of bug 998476 and part of an undocumented bug. Discovered a
new bug, "aM" should be generated but only "M" is.
The undocumented bug was that laMA was generated when lAM should have been.
The part of bug 998476 that was fixed: laM, laH, etc. are now generated.
This does nothing about paN etc.
Some refactoring here; this is not a minimal diff.
Added tests of TMW->EWTS that use ACIP to get the TMW in place
because EWTS->TMW is a faulty keyboard at present.
as text to be passed through (without the brackets in the case of {}) literally,
which is the case by default because Robert Chilton requested it, or the old,
ad-hoc mechanism which could be useful for finding some ugly input.
Made a couple of error messages a little more verbose now that we have
short-message mode.
warnings in ACIP->Tibetan conversion much more configurable. You can
now choose from short or long error messages, for one thing. You can change
the severity of almost all warnings. Each error and warning has an error code.
Errors and warnings are better tested.
The converter GUI has a new checkbox for short messages; the converter
CLI has a new mandatory option for short messages.
I also fixed a bug whereby certain errors were not being appended to the
'errors' StringBuffer.
stacks that use full-form subjoined RA and YA consonants.
ACIP {RVA} was converting to the wrong things.
The TMW for {RVA} was converting to the wrong ACIP.
Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars in
in the file, too.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place.
The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.
The ACIP->EWTS subroutine has improved.
TMW->Wylie and TMW->ACIP are improved in error cases.
TMW->ACIP has friendly embedded error messages now.
because I don't know which glyphs o and x correspond to. For that
reason, they cause ERRORs.
The proposed THDL Extended Wylie ~X and X is now used for U+0F35 and
U+0F37 respectively.