Commit Graph

23 Commits

Author SHA1 Message Date
dchandler 7198f23361 I really hesitate to commit this because I'm not sure what it brings to the
table exactly and I fear that it makes the ACIP->Tibetan converter code
a lot uglier.  The TODO(DLC)[EWTS->Tibetan] comments littered throughout
are part of the ugliness; they point to the ugliness.  If each were addressed,
cleanliness could perhaps be achieved.

I've largely forgotten exactly what this change does, but it attempts to
improve EWTS->Tibetan conversion.  The lexer is probably really, really
primitive.  I concentrate here on converting a single tsheg bar rather than
a whole document.

Eclipse was used during part of my journey here and some imports were
reorganized merely because I could.  :)

(Eclipse was needed when the usual ant build failed to run a new test
EWTSTest.  And I wanted its debugger.)

Next steps: end-to-end EWTS tests should bring many problems to light.  Fix
those.  Triage all the TODO comments.

I don't know that I'll ever really trust the implementation.  The tests are
valuable, though.  A clean implementation of EWTS->Tibetan in Jython
might hold enough interest for me; I'd like to learn Python.
2005-06-20 06:18:00 +00:00
dchandler 37bf9a736d I did this stuff back in August. It's all in support of EWTS->Tibetan
conversion.  The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place.  EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.

I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
2005-02-21 01:16:10 +00:00
dchandler 8a9271a3d8 I broke warning 507 into two warnings, one high-priority (512) and one
low-priority (507).
2004-05-01 20:49:53 +00:00
dchandler e2d42f36eb Robert Chilton's experience inspired me to make the handling of errors and
warnings in ACIP->Tibetan conversion much more configurable.  You can
now choose from short or long error messages, for one thing.  You can change
the severity of almost all warnings.  Each error and warning has an error code.
Errors and warnings are better tested.

The converter GUI has a new checkbox for short messages; the converter
CLI has a new mandatory option for short messages.

I also fixed a bug whereby certain errors were not being appended to the
'errors' StringBuffer.
2004-04-24 17:49:16 +00:00
dchandler 1bfd3772e6 TMW->ACIP is much improved. V and W were confused, # and * were
confused; many glyphs that should have yielded errors were not.

I've added a test case that transforms every TMW glyph save the one with
no TM mapping to ACIP.  I hand-checked that it was correct.

ACIP->TMW is fixed for # and *.  I never noticed it, but each needed an
extra swoosh (U+0F05).

Round-tripping would be good, as would testing real-world use of
TMW->ACIP.
2004-04-14 05:44:51 +00:00
dchandler 04816acb74 ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.

ACIP {RVA} was converting to the wrong things.

The TMW for {RVA} was converting to the wrong ACIP.

Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler 61cf19932e ACIP {B5} and {7'} were problematic; that's fixed. 2003-10-26 17:47:35 +00:00
dchandler 31b3020d07 Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter.  The frequencies of these tsheg bars in
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler 6bda550157 The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler c764eee8d0 Added a new warning for DMAR and others affected similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning. 2003-10-21 03:36:57 +00:00
dchandler 2f81a801ef Added three new kinds of warnings to ACIP->Tibetan conversions. 2003-10-21 02:00:49 +00:00
dchandler a47af2c165 Bulletproofing -- code cleanup. 2003-10-21 00:31:10 +00:00
dchandler 188b9c322e Warn about prefix rules only in Most and All modes. 2003-10-21 00:23:55 +00:00
dchandler 1224030898 Speedup. 2003-10-21 00:19:15 +00:00
dchandler 5e18feb47d ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
dchandler b10098cc61 "Most" warnings now excludes "the last stack has no vowel", making it much more useful. 2003-10-04 15:10:18 +00:00
dchandler 115d0e0e6c Fixed ACIP->TMW vowels like 'I etc.
Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.

The ACIP->EWTS subroutine has improved.

TMW->Wylie and TMW->ACIP are improved in error cases.

TMW->ACIP has friendly embedded error messages now.
2003-09-12 05:06:37 +00:00
dchandler 16817d0b8e Fixed Javadocs. 2003-09-10 01:19:05 +00:00
dchandler 0d6d6ed611 Added GUI support for color-coding. Added support for color-coding
and choosing the warning level to TibetanConverter.

Better error checking in the GUI converter.
2003-09-06 22:56:10 +00:00
dchandler 1982c5847b Jskad's converter now has ACIP-to-Unicode built in. There are known
bugs; it is pre-alpha.  It's usable, though, and finds tons of errors
in ACIP input files, with the user deciding just how pedantic to be.
The biggest outstanding bug is the silent one: treating { }, space, as
tsheg instead of whitespace when we ought to know better.
2003-08-24 06:40:53 +00:00
dchandler d5ad760230 TMW->Wylie conversion now takes advantage of prefix rules, the rules
that say "ya can take a ga prefix" etc.

The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline).  This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
2003-08-23 22:03:37 +00:00
dchandler 57f506384f The ACIP->Tibetan converter now has perfect low-level functionality,
and it has the capability to produce error messages and warnings that
make sense to the user.  One can now get the correct parse, if one
exists, for an ACIP tsheg bar.

One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if you really wanted to.

Still to do:

o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
  the affected documents (most of the Kangyur has unparseable
  creatures).
2003-08-12 04:13:11 +00:00
dchandler e21d3774a9 Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter.  It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00