Commit graph

15 commits

Author SHA1 Message Date
dchandler
63ff0fb0c9 Fixed important EWTS->Tibetan conversion bugs. [g.yogs] (and maybe
[hUM^]) are not yet converting correctly.

I have not yet committed the end-to-end test that I'm manually doing
to find these problems.  It will be another document for
TMW_RTF_TO_THDL_WYLIETest.java.  Note that thdl.debug=true is
essential to access the GUI for the EWTS->* converters.
2005-07-06 07:46:21 +00:00
dchandler
0b3a636f63 Tremendously better EWTS->Unicode and EWTS->TMW conversion, though still not tested end-to-end and without perfect unit tests. See EWTSTest.RUN_FAILING_TESTS, for example, to find imperfection. 2005-07-06 02:19:38 +00:00
dchandler
7198f23361 I really hesitate to commit this because I'm not sure what it brings to the
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier.  The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness.  If each were addressed,
cleanliness could perhaps be achieved.

I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion.  The lexer is probably really, really
primitive.  I concentrate here on converting a single tsheg bar rather than
a whole document.

Eclipse was used during part of my journey here and some imports were
reorganized merely because I could.  :)

(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest.  And I wanted its debugger.)

Next steps: end-to-end EWTS tests should bring many problems to light.  Fix
those.  Triage all the TODO comments.

I don't know that I'll ever really trust the implementation.  The tests are
valuable, though.  A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
2005-06-20 06:18:00 +00:00
dchandler
c16f633ecf Two things:
First, TMW->EWTS gives dbas and dngas instead of dabs and dangs
because Chris Fynn's e-mail from today has dbas and dngas.

Second, Down with ACIPRules.  Long live ACIPTraits.  EWTS->Tibetan
conversion is closer still.
2005-02-22 04:36:54 +00:00
dchandler
37bf9a736d I did this stuff back in August. It's all in support of EWTS->Tibetan
conversion.  The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place.  EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.

I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
2005-02-21 01:16:10 +00:00
dchandler
e7a9e7968f ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc.
ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA.  Fixed.
2003-12-15 07:32:14 +00:00
dchandler
76c2e969ac Fixed ACIP->Unicode bug for YYE etc., things with full-formed
subjoined consonants and vowels.

Fixed ACIP->TMW for YYA etc., things with full-formed subjoined
consonants.
2003-12-14 07:36:21 +00:00
dchandler
ac412c994b Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}. 2003-11-30 02:06:48 +00:00
dchandler
04816acb74 ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.

ACIP {RVA} was converting to the wrong things.

The TMW for {RVA} was converting to the wrong ACIP.

Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler
557ed7ed44 DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are. 2003-10-18 17:49:29 +00:00
dchandler
5e18feb47d ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
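
A minimal sketch of what "stacks greedily" means in the commit above, not the converter's actual code. It assumes a tiny, hypothetical consonant inventory (CONSONANTS) and a hypothetical helper name (stack_greedily), and it ignores the prefix, suffix, and appendage rules that the real ACIP->Tibetan parser applies to native Tibetan tsheg bars. It only shows that every leading consonant is pulled into a single stack, whether or not a glyph for that stack exists.

```python
# Hypothetical subset of ACIP consonants, for illustration only.
CONSONANTS = ["TZ", "TS", "TH", "T", "K", "D", "Y"]


def stack_greedily(syllable):
    """Join every leading consonant of `syllable` into one '+'-separated stack.

    Whatever follows the consonant run (typically the vowel) is appended
    unchanged. No check is made that the resulting stack exists in any font.
    """
    parts = []
    i = 0
    while i < len(syllable):
        # Try longer consonant names first so "TH" is not read as "T" + "H".
        for c in sorted(CONSONANTS, key=len, reverse=True):
            if syllable.startswith(c, i):
                parts.append(c)
                i += len(c)
                break
        else:
            # No consonant matches at this position: the run has ended.
            break
    return "+".join(parts) + syllable[i:]


print(stack_greedily("TTTTTA"))  # T+T+T+T+TA
```

Under these assumptions, TTTTTA comes out as T+T+T+T+TA, matching the example in the commit message. A real implementation would first have to decide whether the tsheg bar is native Tibetan before stacking this way.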
dchandler
16817d0b8e Fixed Javadocs. 2003-09-10 01:19:05 +00:00
dchandler
045c4069c9 Preliminary ACIP->TMW support is in place. {DU} gives you something
less beautiful than what Jskad would give, so more work is needed.
2003-08-31 16:06:35 +00:00
dchandler
1afb3a0fdd ACIP->Unicode, without going through TMW, is now possible, so long as
\, the Sanskrit virama, is not used.  Of the 1370-odd ACIP texts I've
got here, about 57% make it through the gauntlet (fewer if you demand
a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).
2003-08-18 02:38:54 +00:00
dchandler
e21d3774a9 Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter.  It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00