transformations. I haven't actually used it with Xalan XSLT yet, but
it ought to work if TibetanHTML did (which it must have at one point).
I do have a unit test, but an end-to-end test with Xalan is what we
need.
Fixed ordering of Unicode wowels; [ku+A], for example, now gives the
correct Unicode.
EWTS->TMW looks better for some wacky wowels like, I'm guessing here, [ku+A].
EWTS->TMW should now give errors any time the full input isn't used.
Previously, wacky wowels like [kai+-i] would lead to some droppage.
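
A minimal sketch of the "give an error any time the full input isn't used" check, not the converter's actual code. The class StrictConverter, its methods, and the toy lexer that only understands a handful of characters are all hypothetical; the point is only that the converter compares what was consumed against the input length instead of silently dropping the tail.

```java
/** Hypothetical sketch: refuse to convert unless every input character is consumed. */
final class StrictConverter {
    // Toy stand-in for a real EWTS lexer: only these characters are "understood".
    private static final String KNOWN = "kaiuAeo+";

    /** Returns how many leading characters of the input the toy lexer understands. */
    static int lexKnownPrefix(String input) {
        int i = 0;
        while (i < input.length() && KNOWN.indexOf(input.charAt(i)) >= 0) {
            i++;
        }
        return i;
    }

    /** Converts the input, or throws instead of silently dropping a leftover tail. */
    static String convertStrictly(String input) {
        int consumed = lexKnownPrefix(input);
        if (consumed != input.length()) {
            throw new IllegalArgumentException(
                "Unconsumed input at offset " + consumed + ": \""
                + input.substring(consumed) + "\"");
        }
        return "<converted:" + input + ">"; // placeholder for the real TMW output
    }
}
```

With this toy lexer, "ku+A" is consumed in full, while "kai+-i" stops at the '-' and raises an error rather than losing the rest of the input.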
EWTS->TMW->Unicode testing is now in effect. This found a ton of
EWTS->TMW bugs, most or all of which are fixed now.
TMW->Unicode is improved/fixed for {
\u5350,\u534D,\u0F88+k,\u0F88+kh,U }. (Why U? "\u0f75" is
discouraged in favor of "\u0f71\u0f74".)
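
As an illustration of the U case only: a sketch of replacing the discouraged single code point with the two-character spelling. The class and method names are hypothetical and this is not how the TMW->Unicode tables are actually wired up.

```java
/** Hypothetical helper: prefer the two-character spelling over the discouraged U+0F75. */
final class DiscouragedVowels {
    /** Rewrites U+0F75 as U+0F71 followed by U+0F74; other text is unchanged. */
    static String expandLongU(String tibetan) {
        return tibetan.replace("\u0f75", "\u0f71\u0f74");
    }
}
```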
NOTE: TMW_RTF_TO_THDL_WYLIETest is still disabled for the nightly
builds' sake, but I ran it in my sandbox and it passed.
[hUM^]) are not yet converting correctly.
I have not yet committed the end-to-end test that I'm manually doing
to find these problems. It will be another document for
TMW_RTF_TO_THDL_WYLIETest.java. Note that thdl.debug=true is
essential to access the GUI for the EWTS->* converters.
debug mode for now.
I tested against a really simple-but-real document and found a bug with '*'.
I tried to implement TMW vowel code, but I don't trust it yet. Differentiated
EWTS code from ACIP where needed.
Several bugs in ewts->tibetan have been exposed; see the TODO
comments.
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness. If each were addressed,
cleanliness could perhaps be achieved.
I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion. The lexer is probably really, really
primitive. I concentrate here on converting a single tsheg bar rather than
a whole document.
Eclipse was used during part of my journey here and some imports were
reorganized merely because I could. :)
(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest. And I wanted its debugger.)
Next steps: end-to-end EWTS tests should bring many problems to light. Fix
those. Triage all the TODO comments.
I don't know that I'll ever really trust the implementation. The tests are
valuable, though. A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
'Tibetan Machine Web (non-Unicode)' rather than 'Tibetan'.
I often use the term 'Tibetan' to mean 'Tibetan (either in Unicode,
TM or TMW in RTF, or any other scheme where it appears as Tibetan
instead of Roman transliteration)'. So this is a good change in my opinion,
though 'TMW' or 'Legacy TMW' is shorter.
One, TMW->EWTS gives dbas and dngas instead of dabs and dangs
because Chris Fynn's e-mail from today has dbas and dngas.
Two, down with ACIPRules. Long live ACIPTraits. EWTS->Tibetan
conversion is closer still.
conversion. The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place. EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.
I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
Fixed part of bug 998476 and part of an undocumented bug. Discovered a
new bug: "aM" should be generated but only "M" is.
The undocumented bug was that laMA was generated when lAM should have been.
The part of bug 998476 that was fixed: laM, laH, etc. are now generated.
This does nothing about paN etc.
Some refactoring here; this is not a minimal diff.
Added tests of TMW->EWTS that use ACIP to get the TMW in place
because EWTS->TMW is a faulty keyboard at present.
I refactored the code trying to fit it onto one screen. So not all of the
changes are material to the bug fix.
About this commit: TMW->Wylie for {b.s.d} now gives bsad instead of bas.d.
This fixes part of bug 998476, and is done because Andres thinks it'll work
most of the time. But don't be surprised if an exception comes up in the
future and we have to trivially change the code to catch it.
well-formed. They still do, but they do it less often.
Chris Fynn wrote this a while back:
By normal Tibetan & Dzongkha spelling, writing, and input rules
Tibetan script stacks should be entered and written: 1 headline
consonant (0F40->0F6A), any subjoined consonant(s) (0F90-> 0F9C),
achung (0F71), shabkyu (0F74), any above headline vowel(s) (0F72
0F7A 0F7B 0F7C 0F7D and 0F80); any ngaro (0F7E, 0F82 and 0F83).
Now efforts are made to ensure that the converters conform to the
above rules.
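
The ordering quoted above can be sketched as a stable sort over the code points of one stack. This is an illustration, not the converters' actual code; StackOrder, rank, and reorder are hypothetical names. One assumption to flag: the quote gives the subjoined range as ending at 0F9C, but the sketch uses U+0FBC, the end of the subjoined consonants in the Unicode Tibetan block.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: put the code points of one stack into the quoted order. */
final class StackOrder {
    /** Rank of a code point within a stack; lower ranks come first. */
    static int rank(int cp) {
        if (cp >= 0x0F40 && cp <= 0x0F6A) return 0; // headline consonant
        if (cp >= 0x0F90 && cp <= 0x0FBC) return 1; // subjoined (assumed range end, see lead-in)
        if (cp == 0x0F71) return 2;                 // achung
        if (cp == 0x0F74) return 3;                 // shabkyu
        if (cp == 0x0F72 || (cp >= 0x0F7A && cp <= 0x0F7D) || cp == 0x0F80) {
            return 4;                               // above-headline vowels
        }
        if (cp == 0x0F7E || cp == 0x0F82 || cp == 0x0F83) return 5; // ngaro
        return 6;                                   // anything else goes last
    }

    /** Reorders one stack; the sort is stable, so equal-rank marks keep their order. */
    static String reorder(String stack) {
        List<Integer> cps = new ArrayList<>();
        for (int i = 0; i < stack.length(); ) {
            int cp = stack.codePointAt(i);
            cps.add(cp);
            i += Character.charCount(cp);
        }
        cps.sort(Comparator.comparingInt(StackOrder::rank));
        StringBuilder sb = new StringBuilder();
        for (int cp : cps) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }
}
```

Because List.sort is stable, several subjoined consonants in one stack stay in the order they were entered; only the classes of marks are moved relative to each other.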
Fixed crashing bug reported by Teresa Lam. Added tests so that I'm fairly
certain that no more crashing bugs exist. Removed a marker for iffy code
after understanding that code via test cases.
that the paste method is overridden for "smart-pasting". If pasting TMW, pastes
as is. If pasting TM, converts to TMW. If neither of these fonts is used,
assumes transliteration and converts to TMW.
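
The three smart-paste cases can be sketched as a dispatch on the font of the pasted run. Everything here is hypothetical: the SmartPaste class, the Action enum, and the assumption that the TMW and TM font families can be told apart by the name prefixes "TibetanMachineWeb" and "TibetanMachine". The real paste override may decide differently.

```java
/** Hypothetical sketch of the smart-paste decision; not the editor's actual code. */
final class SmartPaste {
    /** What to do with a pasted run of text. */
    enum Action { PASTE_AS_IS, CONVERT_TM_TO_TMW, CONVERT_TRANSLITERATION_TO_TMW }

    /** Chooses an action from the font family name of the pasted run. */
    static Action actionFor(String fontFamily) {
        if (fontFamily != null) {
            // Check the longer prefix first: TMW names also start with "TibetanMachine".
            if (fontFamily.startsWith("TibetanMachineWeb")) {
                return Action.PASTE_AS_IS;
            }
            if (fontFamily.startsWith("TibetanMachine")) {
                return Action.CONVERT_TM_TO_TMW;
            }
        }
        // Neither font: treat the text as transliteration.
        return Action.CONVERT_TRANSLITERATION_TO_TMW;
    }
}
```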
as text to be passed through (without the brackets in the case of {}) literally,
which is the case by default because Robert Chilton requested it, or the old,
ad-hoc mechanism which could be useful for finding some ugly input.
Made a couple of error messages a little more verbose now that we have
short-message mode.