Commit graph

294 commits

Author SHA1 Message Date
dchandler
c16f633ecf Two things:
First, TMW->EWTS now gives dbas and dngas instead of dabs and dangs,
because Chris Fynn's e-mail from today has dbas and dngas.

Second, down with ACIPRules.  Long live ACIPTraits.  EWTS->Tibetan
conversion is closer still.
2005-02-22 04:36:54 +00:00
dchandler
4c268c5ea2 Refactored so that there can be an EWTS scanner and an ACIP scanner. 2005-02-21 05:37:01 +00:00
amontano
7854e4fd93 Changed minor optimization things 2005-02-21 05:27:36 +00:00
dchandler
3e0168b384 Renamed ACIPConverter to TConverter. Added a needed parameter (the
only needed parameter in that class's interface, AFAIK).
2005-02-21 01:35:23 +00:00
dchandler
37bf9a736d I did this stuff back in August. It's all in support of EWTS->Tibetan
conversion.  The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the
place.  EWTS->Tibetan isn't here yet; lexing isn't here yet; this is
mainly a refactoring so that the ACIP->Tibetan code can be reused to
do EWTS->Tibetan.

I'm committing this because tests pass (it shouldn't be breaking
anything), because I want a checkpoint, and because the laptop this
sandbox was on isn't my preferred development environment.
2005-02-21 01:16:10 +00:00
dchandler
83f499b7a8 Formatting in TMW documents is not preserved. I've added an identity
transformation, TMW->TMW, to help me debug this problem.
2005-02-13 00:34:47 +00:00
dchandler
9025fb42d6 TMW->EWTS bug 998476 partial fix: "aM" is now generated correctly. Before,
you got "M".
2005-02-07 04:00:42 +00:00
dchandler
8dcb623382 TMW->EWTS:
Fixed part of bug 998476 and part of an undocumented bug.  Discovered a
new bug, "aM" should be generated but only "M" is.

The undocumented bug was that laMA was generated when lAM should have been.

The part of bug 998476 that was fixed: laM, laH, etc. are now generated.

This does nothing about paN etc.

Some refactoring here; this is not a minimal diff.

Added tests of TMW->EWTS that use ACIP to get the TMW in place
because EWTS->TMW is a faulty keyboard at present.
2005-02-07 03:17:40 +00:00
dchandler
96d0d0d9d0 My previous commit message failed to mention the following:
I refactored the code trying to fit it onto one screen.  So not all of the
changes are material to the bug fix.

About this commit: TMW->Wylie for {b.s.d} now gives bsad instead of bas.d.
This fixes part of bug 998476, and is done because Andres thinks it'll work
most of the time.  But don't be surprised if an exception comes up in the
future and we have to trivially change the code to catch it.
2005-02-05 22:37:02 +00:00
dchandler
287fc181a0 Fix for part of bug 998476. 2005-02-05 22:16:39 +00:00
dchandler
be632e1874 Cleaned up some code that is relevant to the bug I'm looking into. I need to
instrument it.  Functionally, this change is a no-op.  I just don't want to
confuse refactoring with the actual bug fix.
2005-02-05 19:29:37 +00:00
dchandler
b4155c3264 After a1tsal's changes to tibwn.ini, the tests failed. I'm a bit disheartened
that more tests didn't fail.
2005-02-05 16:51:13 +00:00
dchandler
7304c770c9 Just a better comment. 2005-02-05 16:27:34 +00:00
a1tsal
28d46bb207 Add corrective comment regarding the bogus Unicode OM characters. 2005-01-19 01:07:52 +00:00
a1tsal
affccad9e5 Use 00A0 rather than 0020 for _, per the Unicode spec. 2005-01-17 08:58:56 +00:00
a1tsal
91d0e7f4da Use "precomposed" Sanskrit consonant combinations consistently throughout. 2005-01-17 08:49:04 +00:00
a1tsal
22cfee69db Had wrong Unicode for n+n+y.
Had wrong Unicode for space -- but only in comment.

Cleaned up punctuation in another comment.
2005-01-16 01:17:44 +00:00
dchandler
0b0af67ed9 Ximalaya is not nearly as nice as Tibetan Machine Uni, so use the latter. 2005-01-04 02:20:59 +00:00
dchandler
aa5d86a6e3 The *->Unicode conversions were outputting Unicode that was not
well-formed.  They still do, but they do it less often.

Chris Fynn wrote this a while back:

   By normal Tibetan & Dzongkha spelling, writing, and input rules
   Tibetan script stacks should be entered and written: 1 headline
   consonant (0F40->0F6A), any subjoined consonant(s) (0F90->0F9C),
   achung (0F71), shabkyu (0F74), any above headline vowel(s) (0F72
   0F7A 0F7B 0F7C 0F7D and 0F80); any ngaro (0F7E, 0F82 and 0F83).

Now efforts are made to ensure that the converters conform to the
above rules.
2004-12-13 02:32:46 +00:00
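Chris Fynn's ordering rule quoted in the commit above amounts to a stable sort of each stack's code points by position class. The following is a minimal Java sketch of that idea, not the converters' actual code: the class StackOrder and its methods are hypothetical, the input is assumed to be exactly one stack, and the subjoined-consonant class is assumed to cover the whole Unicode block U+0F90 to U+0FBC.

```java
// Hypothetical helper, not part of Jskad: reorders the code points of ONE
// Tibetan stack into the order given by Chris Fynn's rules.
class StackOrder {
    // Position class of a code point within a stack, or -1 if the
    // code point is not covered by the rules.
    static int rank(char c) {
        if (c >= '\u0F40' && c <= '\u0F6A') return 0; // headline consonant
        if (c >= '\u0F90' && c <= '\u0FBC') return 1; // subjoined consonant (assumed full block)
        switch (c) {
            case '\u0F71': return 2;                   // achung
            case '\u0F74': return 3;                   // shabkyu
            case '\u0F72': case '\u0F7A': case '\u0F7B':
            case '\u0F7C': case '\u0F7D': case '\u0F80':
                return 4;                              // above-headline vowels
            case '\u0F7E': case '\u0F82': case '\u0F83':
                return 5;                              // ngaro
            default:
                return -1;
        }
    }

    // Stable bucket sort by position class: code points of the same class
    // (e.g. several subjoined consonants) keep their original relative order.
    static String normalizeStack(String stack) {
        StringBuilder[] buckets = new StringBuilder[6];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new StringBuilder();
        }
        for (char c : stack.toCharArray()) {
            int r = rank(c);
            if (r < 0) {
                throw new IllegalArgumentException(
                    String.format("U+%04X is not covered by the ordering rules", (int) c));
            }
            buckets[r].append(c);
        }
        StringBuilder out = new StringBuilder();
        for (StringBuilder b : buckets) {
            out.append(b);
        }
        return out.toString();
    }
}
```

For example, KA + shabkyu + subjoined YA (U+0F40 U+0F74 U+0FB1) comes out as U+0F40 U+0FB1 U+0F74. Bucketing by class rather than calling a general-purpose comparison sort keeps the reordering trivially stable.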
eg3p
a39d5c2ba3 Changed all occurrences of 'Color.BLACK' to 'Color.black', since the former causes a runtime error on Mac OS X (for Java 1.3.1 at least). Not sure why this is, but it may be related to the bug mentioned at http://www.oxygenxml.com/forum/viewtopic.php?p=239 2004-08-19 14:59:06 +00:00
dchandler
e101cc8294 Now for the Sambhota keyboard crashing bug.
Fixed crashing bug reported by Teresa Lam.  Added tests so that I'm fairly
certain that no more crashing bugs exist.  Removed a marker for iffy code
after understanding that code via test cases.
2004-07-05 04:36:35 +00:00
dchandler
6cbea9f894 Fixed crashing bug reported by Teresa Lam. Added tests so that I'm fairly
certain that no more crashing bugs exist.  Removed a marker for iffy code
after understanding that code via test cases.
2004-07-05 04:10:38 +00:00
dchandler
bff0e6b2fc Fixed TMW->Wylie/ACIP when multiple font sizes are in use. I was not
properly incrementing the offset at which I was inserting text.
2004-06-25 00:22:10 +00:00
dchandler
14fb449f95 I thought my earlier commit preserved font size info for TMW->ACIP/Wylie
conversions.  It was only at a very coarse level.  The feature is now truly
here.
2004-06-20 02:57:28 +00:00
dchandler
e18a4417dc Added a FIXME comment. 2004-06-12 02:26:28 +00:00
dchandler
9f78cabb18 TMW->{Wylie,ACIP} conversions now preserve font size information. 2004-06-12 02:09:28 +00:00
dchandler
7acbce3361 Added errors 142 and 143, which are produced when converting yig chung
to a Unicode text file, which cannot support font size changes.
2004-06-06 21:59:16 +00:00
dchandler
df262aa148 It is now a compile-time option whether to pass []- and {}-bracketed sequences
through literally as text (without the brackets, in the case of {}) or to use the old,
ad-hoc mechanism, which could be useful for finding some ugly input.  Literal
pass-through is the default because Robert Chilton requested it.

Made a couple of error messages a little more verbose now that we have
short-message mode.
2004-06-06 21:39:06 +00:00
dchandler
8a9271a3d8 I broke warning 507 into two warnings, one high-priority (512) and one
low-priority (507).
2004-05-01 20:49:53 +00:00
dchandler
31bdd39fec The TMW for 'da'i was converting to 'aad'i. Andres found this; it is bug
945744.  I've made it more correct -- 'ad'i is now produced.  The wrong stack
is still thought to be the root stack.
2004-05-01 19:11:15 +00:00
dchandler
1a055f3472 I don't think warning level "None" was really doing the trick. Fixed that.
You can now customize the severities of all warnings, even 504 and 510.

When warning level is "None", scanning, i.e. lexical analysis, is faster.
2004-04-25 00:37:57 +00:00
dchandler
e2d42f36eb Robert Chilton's experience inspired me to make the handling of errors and
warnings in ACIP->Tibetan conversion much more configurable.  You can
now choose from short or long error messages, for one thing.  You can change
the severity of almost all warnings.  Each error and warning has an error code.
Errors and warnings are better tested.

The converter GUI has a new checkbox for short messages; the converter
CLI has a new mandatory option for short messages.

I also fixed a bug whereby certain errors were not being appended to the
'errors' StringBuffer.
2004-04-24 17:49:16 +00:00
dchandler
cc5d096918 David Chapman's latest fix to tibwn.ini (clearing up an issue that Than or I
dropped the ball on) introduced two lines for 8,95.  This is a bad thing, so
I've taken out the second line.  I've also introduced a check in
TibetanMachineWeb.java such that we'll know that tibwn.ini has no such
error in the future just by running 'ant clean jskad-run' and making sure that
the GUI is indeed visible.

I also updated the test baselines now that F03A and 0F82 are squared away.
2004-04-24 13:23:56 +00:00
a1tsal
9e071ea178 Differentiated 0F82 (~M`) and F03A (nyi.zla editor's mark). 2004-04-21 10:04:11 +00:00
dchandler
0ee90a0fb0 Added many ACIP->TMW->ACIP tests. They found no bugs. 2004-04-17 17:28:26 +00:00
dchandler
63438d243b getACIP was getting EWTS, not ACIP. 2004-04-17 15:49:40 +00:00
dchandler
de3a19761e Fixes for javadoc tool. 2004-04-17 15:48:50 +00:00
dchandler
adcf9de952 Two new tests. 2004-04-17 15:14:46 +00:00
dchandler
1bfd3772e6 TMW->ACIP is much improved. V and W were confused, # and * were
confused, and many glyphs that should have yielded errors did not.

I've added a test case that converts to ACIP every TMW glyph except the one
with no TM mapping.  I hand-checked that it was correct.

ACIP->TMW is fixed for # and *.  I never noticed it, but each needed an
extra swoosh (U+0F05).

Round-tripping would be good, as would testing real-world use of
TMW->ACIP.
2004-04-14 05:44:51 +00:00
dchandler
56a02ba41d Fixed the worst TMW->ACIP bug, the one regarding U+0F04 and U+0F05.
TMW->EWTS requires no context information, but TMW->ACIP does.
2004-04-10 18:26:57 +00:00
dchandler
7eca276a62 TMW->Unicode conversions have changed; now using U+0F6A for the stacks
whose EWTS transliteration begins with "R+".

ACIP->* conversions and test baselines were updated to deal with the
"r+..."=>"R+..."  change.
2004-04-10 16:03:25 +00:00
dchandler
aff34174ab The new EWTS rule regarding R, W, and Y requires that these change. It
may also require changes to the following, but I'm going to ask if it really
should or not.

// Y+Y~185,3~~6,98~1,109~6,120~1,123~1,125~6,106~6,113~f61,fbb
// Y+r~186,3~~6,99~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb2
// Y+w~187,3~~6,100~1,109~6,120~1,123~1,125~6,106~6,113~f61,fad
// Y+s~188,3~~6,101~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb6

// W+y~69,4~~7,79~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb1
// W+r~70,4~~7,80~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb2
// W+n~195,4~~7,81~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fa3
// W+W~194,4~~7,82~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fba
2004-04-08 02:55:59 +00:00
dchandler
76356f4009 ACIP->Tibetan now gives an error when {?} is seen alone (not in {[?]} or {[*FOO?]}, but alone). Bug 860192 is fixed. 2004-03-15 00:49:01 +00:00
dchandler
542fb50bf1 The ~M and ~M` EWTS change had not been fully made. Someone submitted bug report 911472, which alerted me to this. 2004-03-07 17:02:35 +00:00
dchandler
d436a4d462 Removed David Chapman's recently added line for U+0F82 -- a line for U+0F82 already existed, and the new line had incorrect TM and incorrect TMW mappings. I changed the existing line for U+0F82 to use the EWTS {~M`}. 2004-03-02 04:29:41 +00:00
a1tsal
8eaaeaa202 Fix careless error: I had the same TMW character for ~M and ~M`! 2004-02-22 09:14:56 +00:00
a1tsal
b14833b5b9 Change ^M to ~M to conform to spec.
Introduce ~M` (for 0F82).
2004-02-20 15:07:49 +00:00
dchandler
274e1736be Deleted cut-and-paste goof. 2004-01-17 19:45:31 +00:00
dchandler
c69ba26c60 TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings. 2004-01-17 19:28:54 +00:00
dchandler
48b4c5cb07 Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'. 2004-01-17 17:10:12 +00:00
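The commit above names the debugging tool and its invocation but not its output format. As a rough illustration of what a Unicode->ASCII dump for checking *->Unicode output can look like, here is a minimal Java sketch; the class UnicodeDumpSketch and its "<U+XXXX NAME>" format are hypothetical and are not taken from org.thdl.util.VerboseUnicodeDump. It relies on Character.getName, which requires Java 7 or later.

```java
// Hypothetical sketch, not the actual org.thdl.util.VerboseUnicodeDump:
// renders a string as pure ASCII so that the exact code points produced
// by a *->Unicode conversion can be inspected.
class UnicodeDumpSketch {
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp >= 0x20 && cp < 0x7F) {
                sb.append((char) cp);                 // printable ASCII passes through
            } else {
                sb.append(String.format("<U+%04X", cp));
                String name = Character.getName(cp);  // null for unassigned code points
                if (name != null) {
                    sb.append(' ').append(name);
                }
                sb.append('>');
            }
            i += Character.charCount(cp);             // step over surrogate pairs correctly
        }
        return sb.toString();
    }
}
```

With this format, a stray or misordered code point in converter output (for instance a subjoined consonant after a vowel sign) is visible at a glance in any terminal.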