Commit Graph

666 Commits

Author SHA1 Message Date
dchandler 31bdd39fec The TMW for 'da'i was converting to 'aad'i. Andres found this; it is bug
945744.  I've made it more correct -- 'ad'i is now produced.  The wrong stack
is thought to be the root stack still.
2004-05-01 19:11:15 +00:00
dchandler 1a055f3472 I don't think warning level "None" was really doing the trick. Fixed that.
You can now customize the severities of all warnings, even 504 and 510.

When warning level is "None", scanning, i.e. lexical analysis, is faster.
2004-04-25 00:37:57 +00:00
dchandler e2d42f36eb Robert Chilton's experience inspired me to make the handling of errors and
warnings in ACIP->Tibetan conversion much more configurable.  You can
now choose from short or long error messages, for one thing.  You can change
the severity of almost all warnings.  Each error and warning has an error code.
Errors and warnings are better tested.

The converter GUI has a new checkbox for short messages; the converter
CLI has a new mandatory option for short messages.

I also fixed a bug whereby certain errors were not being appended to the
'errors' StringBuffer.
2004-04-24 17:49:16 +00:00
dchandler cc5d096918 David Chapman's latest fix to tibwn.ini (clearing up an issue that Than or I
dropped the ball on) introduced two lines for 8,95.  This is a bad thing, so
I've taken out the second line.  I've also introduced a check in
TibetanMachineWeb.java such that we'll know that tibwn.ini has no such
error in the future just by running 'ant clean jskad-run' and making sure that
the GUI is indeed visible.

I also updated the test baselines now that F03A and 0F82 are squared away.
2004-04-24 13:23:56 +00:00
a1tsal 9e071ea178 Differentiated 0F82 (~M`) and F03A (nyi.zla editor's mark). 2004-04-21 10:04:11 +00:00
dchandler 72442788c1 This displayed poorly for me, so I untabified it. Whitespace changed only. 2004-04-18 18:56:01 +00:00
dchandler 0ee90a0fb0 Added many ACIP->TMW->ACIP tests. They found no bugs. 2004-04-17 17:28:26 +00:00
dchandler 63438d243b getACIP was getting EWTS, not ACIP. 2004-04-17 15:49:40 +00:00
dchandler de3a19761e Fixes for javadoc tool. 2004-04-17 15:48:50 +00:00
dchandler adcf9de952 Two new tests. 2004-04-17 15:14:46 +00:00
dchandler 1bfd3772e6 TMW->ACIP is much improved. V and W were confused, # and * were
confused; many glyphs that should have yielded errors were not.

I've added a test case that transforms every TMW glyph save the one with
no TM mapping to ACIP.  I hand-checked that it was correct.

ACIP->TMW is fixed for # and *.  I never noticed it, but each needed an
extra swoosh (U+0F05).

Round-tripping would be good, as would testing real-world use of
TMW->ACIP.
2004-04-14 05:44:51 +00:00
dchandler 244a9d1370 TiblEdit's diacritics panel now works -- dia.dat has been added to the
repository and to TiblEdit's jar.
2004-04-14 05:12:00 +00:00
dchandler 56a02ba41d Fixed the worst TMW->ACIP bug, the one regarding U+0F04 and U+0F05.
TMW->EWTS requires no context information, but TMW->ACIP does.
2004-04-10 18:26:57 +00:00
dchandler 9e7ccf2894 TMW->Unicode conversions have changed; now using U+0F6A for the stacks
whose EWTS transliteration begins with "R+".

ACIP->* conversions and test baselines were updated to deal with the
"r+..."=>"R+..."  change.
2004-04-10 16:58:45 +00:00
dchandler 7eca276a62 TMW->Unicode conversions have changed; now using U+0F6A for the stacks
whose EWTS transliteration begins with "R+".

ACIP->* conversions and test baselines were updated to deal with the
"r+..."=>"R+..."  change.
2004-04-10 16:03:25 +00:00
dchandler aff34174ab The new EWTS rule regarding R, W, and Y requires that these change. It
may also require changes to the following, but I'm going to ask if it really
should or not.

// Y+Y~185,3~~6,98~1,109~6,120~1,123~1,125~6,106~6,113~f61,fbb
// Y+r~186,3~~6,99~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb2
// Y+w~187,3~~6,100~1,109~6,120~1,123~1,125~6,106~6,113~f61,fad
// Y+s~188,3~~6,101~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb6

// W+y~69,4~~7,79~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb1
// W+r~70,4~~7,80~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb2
// W+n~195,4~~7,81~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fa3
// W+W~194,4~~7,82~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fba
2004-04-08 02:55:59 +00:00
dchandler 76356f4009 ACIP->Tibetan now gives an error when {?} is seen alone (not in {[?]} or {[*FOO?]}, but alone). Bug 860192 is fixed. 2004-03-15 00:49:01 +00:00
dchandler 542fb50bf1 The ~M and ~M` EWTS change had not fully been made. Someone submitted a bug report 911472 that alerted me to this. 2004-03-07 17:02:35 +00:00
dchandler e0928d8472 New EWTS for 0F82 and 0F83. 2004-03-06 23:00:40 +00:00
amontano bb8fa6c58f Now the clear button in the http servlet version actually clears. Also added "synchronized" to some methods to ensure that concurrent threads don't crash. 2004-03-03 00:33:18 +00:00
dchandler d436a4d462 Removed David Chapman's recently added line for U+0F82 -- a line for U+0F82 already existed, and the new line had incorrect TM and incorrect TMW mappings. I changed the existing line for U+0F82 to use the EWTS {~M`}. 2004-03-02 04:29:41 +00:00
a1tsal 8eaaeaa202 Fix careless error: I had the same TMW character for ~M and ~M`! 2004-02-22 09:14:56 +00:00
a1tsal b14833b5b9 Change ^M to ~M to conform to spec.
Introduce ~M` (for 0F82).
2004-02-20 15:07:49 +00:00
amontano e5454d3720 Updated the translation tool to conform to the Personal Profile specification of Java.
Before, it ran on Pocket PCs through the more restricted PersonalJava specification,
but Sun's VM project for Pocket PCs was terminated. Now it is designed to run under
J9, IBM's VM for Pocket PCs, which implements the Personal Profile specification.
That specification also supports AWT but not Swing, so there is still no (hope for) support
of Tibetan script on Pocket PCs.
2004-02-07 18:21:17 +00:00
dchandler 274e1736be Deleted cut-and-paste goof. 2004-01-17 19:45:31 +00:00
dchandler c69ba26c60 TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings. 2004-01-17 19:28:54 +00:00
dchandler 48b4c5cb07 Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'. 2004-01-17 17:10:12 +00:00
dchandler 6fdb2a26bb Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'. 2004-01-17 16:52:38 +00:00
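The idea of a Unicode->ASCII dump can be sketched as follows. This is only an illustration, not the code of org.thdl.util.VerboseUnicodeDump: the class name, the method name, and the backslash-u output format are all assumptions made for the example.

```java
// Hypothetical sketch of a Unicode->ASCII dump; the real tool's output
// format is not described in the log, so the escape style is an assumption.
class UnicodeDumpSketch {
    /** Returns text with every non-printable-ASCII char written as a backslash-u escape. */
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7F && c != '\\') {
                sb.append(c); // printable ASCII passes through unchanged
            } else {
                // four uppercase hex digits, e.g. U+0F84 becomes backslash, u, 0F84
                sb.append("\\u").append(String.format("%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints K followed by the escaped form of U+0F84.
        System.out.println(dump("K\u0F84"));
    }
}
```

Escaping the backslash itself keeps the output unambiguous, so a dumped string can be told apart from one that already contained a literal escape.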
dchandler 9dd95c5524 I saw this error when I wasn't expecting it, so now, curious, I print more details. 2004-01-17 16:51:33 +00:00
dchandler 4dd40809a5 A user reported that q` caused a crash with TCC keyboard #1. Fixed. TCC keyboard #1 does not support q~ though. 2003-12-21 06:27:36 +00:00
dchandler c1aa81e943 RFE 860190: ACIP->Unicode now gives a warning when it outputs something that can't be represented in TMW. 2003-12-16 07:45:40 +00:00
dchandler 848349fd3a More tests. 2003-12-15 08:16:06 +00:00
dchandler e7a9e7968f ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc.
ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA.  Fixed.
2003-12-15 07:32:14 +00:00
dchandler e9f7b2dfed If you want curly brackets around folio markers, you'll have to set
the system property
thdl.acip.to.x.output.curly.brackets.around.folio.markers to true.
2003-12-14 08:47:03 +00:00
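A minimal sketch of how such a system property might be consulted. Only the property name comes from the commit message; the class, the method, and the sample folio marker are hypothetical and are not the converter's actual code.

```java
// Hypothetical helper; only the property name is taken from the log.
class FolioMarkerSketch {
    static final String PROP =
        "thdl.acip.to.x.output.curly.brackets.around.folio.markers";

    /** Wraps the marker in curly brackets only when the property is set to "true". */
    static String formatFolioMarker(String marker) {
        // Boolean.getBoolean is true only if the named system property
        // exists and equals "true" (ignoring case).
        return Boolean.getBoolean(PROP) ? "{" + marker + "}" : marker;
    }

    public static void main(String[] args) {
        // "@001A" is an illustrative marker, not taken from a real text.
        System.out.println(formatFolioMarker("@001A"));
    }
}
```

Under these assumptions, the brackets would be switched on from the command line with 'java -Dthdl.acip.to.x.output.curly.brackets.around.folio.markers=true ...'.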
dchandler 8664571577 Warnings were not being detected correctly. Fixed.
ACIP->Unicode uses U+0020, ' ', for whitespace.  ACIP->TMW uses the
TMW whitespace for whitespace.
2003-12-14 08:38:10 +00:00
dchandler 01e65176d4 Using less memory and time to figure out if warnings occurred. 2003-12-14 07:41:15 +00:00
dchandler 76c2e969ac Fixed ACIP->Unicode bug for YYE etc., things with full-formed
subjoined consonants and vowels.

Fixed ACIP->TMW for YYA etc., things with full-formed subjoined
consonants.
2003-12-14 07:36:21 +00:00
dchandler f625c937ee ACIP {B} was not being treated like {BA}; instead, an error was resulting. All the five prefixes were affected. 2003-12-14 05:54:07 +00:00
dchandler a0e6db11c0 Very minor cleanup. 2003-12-13 21:59:31 +00:00
dchandler 4c30657afa Adding tests for an ACIP keyboard that will never work correctly, and
probably never even be useful.  But they were lying around from a
while back, so here are the tests.
2003-12-13 21:34:33 +00:00
dchandler 02967539b0 Slightly improved Jskad's internal documentation. Links to converters' docs. 2003-12-10 07:04:35 +00:00
dchandler 581643cf59 {DAN,\nLHAG} used to be treated like {DAN, LHAG} but that got broken. Fixed.
Added tests for lexer's handling of ACIP spaces etc.
2003-12-10 06:55:16 +00:00
dchandler 8e673bbc2c {NGA,} becomes {NGA\u0f0c,} now instead of {NGA\u0f0b,}.
Note: ACIP->Unicode for {NGA,} was not giving the Unicode that {NGA\u0f0b,} gives before.
2003-12-10 06:50:14 +00:00
dchandler a466bad939 ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing. 2003-12-08 07:51:45 +00:00
dchandler a39c5c12b0 ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing. 2003-12-08 07:15:27 +00:00
dchandler 8f7322a056 Use absolute paths when invoking the external viewer; it doesn't know what our current working directory is. 2003-12-08 06:53:37 +00:00
dchandler b617f761d5 ACIP->TMW for {^GONG SA } used to fail; fixed. 2003-12-07 20:05:41 +00:00
dchandler 115534e688 ACIP->TMW for {^GONG SA } used to fail because we had \u0F38 in the ToWylie section. Now it's in the <?Input:Numbers?> section because I didn't want to introduce a new section. If WylieWord has trouble due to this misuse of the 'numbers' category, we'll introduce a new category, 'other'.
TMW->EWTS improved as a result -- {\u0F38.gonga sa } is produced now where {\u0F38agonga sa } was once produced.  Even the better version is imperfect; see bug 855877.
2003-12-07 19:40:59 +00:00
dchandler 597cf408dd Fixed help message. 2003-12-07 19:10:36 +00:00
dchandler 4adf87c401 Updated comments only. 2003-12-06 20:36:56 +00:00
dchandler 3f18623977 Added comments only. 2003-12-06 20:26:45 +00:00
dchandler 6232ee9170 Added comments referring to a user guide in development now. 2003-12-06 20:26:15 +00:00
dchandler c43e9a446b Revamped some ACIP->Tibetan error messages. 2003-12-06 20:19:40 +00:00
dchandler c9c771d1ee ACIP {&}, as in {KO&HAm,}, is supported. 2003-11-30 02:18:59 +00:00
dchandler ac412c994b Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}. 2003-11-30 02:06:48 +00:00
dchandler e7c4cc1874 Updated to be in sync with latest EWTS draft. 2003-11-29 22:59:39 +00:00
dchandler ffd041e32c ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}.
Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.

Also, we've traded some speed for memory -- DuffCode now uses bytes, not ints.
2003-11-29 22:57:12 +00:00
dchandler dfaae4be93 ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}.
Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.
2003-11-29 22:56:18 +00:00
dchandler 946d8cbc72 Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW. 2003-11-29 16:22:26 +00:00
dchandler 16bfeac641 These issues are non-issues; removing these comments. 2003-11-25 00:31:33 +00:00
dchandler d3d0ff23a8 Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E. 2003-11-25 00:28:18 +00:00
dchandler b8608797aa Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW. 2003-11-24 05:59:32 +00:00
dchandler 8d18ac53cb N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.
Found another inconsistency between Unicode and the TM/TMW docs.  I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day.

Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate.  A few TMW glyphs still give errors.

Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested).  The baselines have not been hand-verified, but changes will be detected.
2003-11-24 05:50:42 +00:00
dchandler 5d053b41fe Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day.
Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate.  A few TMW glyphs still give errors.

Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested).  The baselines have not been hand-verified, but changes will be detected.
2003-11-24 05:49:15 +00:00
dchandler 9a247f5932 N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate. 2003-11-24 04:55:11 +00:00
dchandler 1ec668c018 Dza is not in the latest EWTS draft. 2003-11-24 04:28:55 +00:00
dchandler f76c089366 Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty). 2003-11-24 04:05:59 +00:00
dchandler 08c676c186 Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues.
Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW.  I tested the whole mapping by hand once; this incident shows that automation is very helpful.

'{' and '}' were swapped...

The Unicode for something was "", not "none".

+R, +W, +Y, R+ now in use (though more testing is needed)
2003-11-24 02:40:40 +00:00
dchandler 216c5b0d54 Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes). 2003-11-23 01:22:27 +00:00
dchandler 37e8dfa917 The menu now says (Buggy) in front of "Convert Selection from Wylie to Tibetan" because this feature is, you guessed it, buggy. 2003-11-22 22:48:41 +00:00
dchandler 113480a882 X is now better supported, so this changed. 2003-11-15 20:00:59 +00:00
dchandler 8d4fb5d13f We crashed before when '~' was entered. 2003-11-14 04:50:55 +00:00
dchandler b59b86fd73 Commented this to mention some recent testing. 2003-11-11 03:45:58 +00:00
dchandler 4023be9612 Better prettyprinting. Untested. 2003-11-11 03:43:26 +00:00
dchandler 4e6a9c299f ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning. 2003-11-11 03:43:11 +00:00
dchandler 2cb90bd231 ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended.
The Unicode for ACIP {o} is U+0F37.
2003-11-09 23:15:58 +00:00
dchandler 084e12a02c Import Wylie is a buggy feature. The menu now calls it "(Buggy) Import Wylie...". t+s+w doesn't even convert correctly!
Bug-free EWTS->TMW using the org.thdl.tib.text.ttt codebase will be here soon.
2003-11-09 01:25:58 +00:00
dchandler 04816acb74 ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.

ACIP {RVA} was converting to the wrong things.

The TMW for {RVA} was converting to the wrong ACIP.

Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler 8193cef5d1 Better comments. 2003-11-09 01:07:07 +00:00
dchandler dbd9c80ca0 Special tests for rwa and r+wa, which are the only two different stacks with the same hash key modulo - and +. 2003-11-09 01:06:26 +00:00
dchandler 85e1e0701e Fixed crashing bug in Import Wylie. 2003-11-08 23:32:53 +00:00
dchandler 8fbd8850f8 New feature: Convert Selection from TMW to ACIP. 2003-11-08 23:22:06 +00:00
dchandler bab47c4910 There are now extensive tests to make sure that each Tibetan stack in TMW can be typed in using EWTS and correctly converted to TMW and then back to EWTS. These tests unearthed new bugs in the Tibetan! 5.1 docs. 2003-11-08 22:11:24 +00:00
dchandler 3fa417d3ee phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent. 2003-11-08 21:53:34 +00:00
dchandler e058d6252e phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent. 2003-11-08 21:48:08 +00:00
dchandler 55aaeef9d0 l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent. 2003-11-08 21:23:50 +00:00
dchandler 06edf17b04 Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually. 2003-11-08 21:17:18 +00:00
dchandler f626a04d72 Tests t+r+n glyph. 2003-11-08 20:28:34 +00:00
dchandler 74d6bc61ab The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually. 2003-11-08 20:25:16 +00:00
dchandler a0ae0bf70d Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too. 2003-11-08 17:50:10 +00:00
dchandler 0ac90d7c0f Nathanial -> Nathaniel 2003-11-08 03:42:51 +00:00
dchandler e3f1ed5914 Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars. 2003-10-27 13:58:45 +00:00
dchandler 94a43d3f39 Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}. 2003-10-26 18:56:48 +00:00
dchandler 5c36dd81d3 Fixed bug 830332, "Convert selected ACIP=>Tibetan busted". 2003-10-26 18:25:25 +00:00
dchandler e74547d743 GA-YOGS now parses like G-YOGS and GAYOGS do. 2003-10-26 18:06:38 +00:00
dchandler 61cf19932e ACIP {B5} and {7'} were problematic; that's fixed. 2003-10-26 17:47:35 +00:00
dchandler ad7b20e485 Added yet more metadata. 2003-10-26 16:05:30 +00:00
dchandler 1550fee41a Removed garbage. 2003-10-26 16:05:07 +00:00
dchandler fe33d67573 Added more metadata. There are 35 million+ tsheg bars here. 2003-10-26 15:35:08 +00:00
dchandler 050666d735 I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.

Freaky.
2003-10-26 06:56:12 +00:00
dchandler 31b3020d07 Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter.  The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler 7ba1ad0735 Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc. 2003-10-26 02:42:06 +00:00
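The postprocessing mentioned above could also be done in Java rather than perl or awk. The sketch below assumes the converter prints one tsheg bar per line, which the log does not state; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical postprocessor; assumes one tsheg bar per input line.
class TshegBarFrequencySketch {
    /** Counts how often each non-blank line occurs, sorted by tsheg bar. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String bar = line.trim();
            if (bar.isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(bar, 1, Integer::sum); // add one to the running total
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(java.util.Arrays.asList("NGA", "KA", "NGA")));
    }
}
```

A TreeMap keeps the result in a stable, sorted order, which makes it easier to diff two runs against each other.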
dchandler ef24c608bf Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.
2003-10-26 02:17:19 +00:00
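A rough sketch of what a substitution list like "KAsh=>K+sh" implies. This is not the converter's implementation: the class and method names are hypothetical, and the comma used to separate several pairs is an assumption, since the log shows only a single pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a "FROM=>TO" replacement map.
class ReplacementMapSketch {
    /** Parses "A=>B,C=>D"; the comma separator is an assumption. */
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return map;
        }
        for (String pair : spec.split(",")) {
            int arrow = pair.indexOf("=>");
            if (arrow <= 0) {
                throw new IllegalArgumentException("Bad pair: " + pair);
            }
            map.put(pair.substring(0, arrow), pair.substring(arrow + 2));
        }
        return map;
    }

    /** Applies each substitution, in the order given, to the input text. */
    static String apply(Map<String, String> map, String text) {
        String out = text;
        for (Map.Entry<String, String> e : map.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(apply(parse("KAsh=>K+sh"), "KAsh")); // prints K+sh
    }
}
```

A LinkedHashMap preserves the order in which the pairs were written, so earlier substitutions are applied before later ones.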
dchandler 6bda550157 The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler d99ae50d8a The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:24:28 +00:00
dchandler 1415fc43e3 The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. 2003-10-26 00:21:54 +00:00
dchandler 306cf2817c Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler f106deb884 Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:40:21 +00:00
dchandler af013a6a39 I renamed this function a while ago. 2003-10-22 02:49:16 +00:00
dchandler 7d24ab393f Code cleanup. 2003-10-21 03:44:02 +00:00
dchandler c764eee8d0 Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning. 2003-10-21 03:36:57 +00:00
dchandler 2f39921381 Added more test cases. 2003-10-21 02:14:45 +00:00
dchandler 2f81a801ef Added three new kinds of warnings to ACIP->Tibetan conversions. 2003-10-21 02:00:49 +00:00
dchandler a47af2c165 Bulletproofing -- code cleanup. 2003-10-21 00:31:10 +00:00
dchandler 188b9c322e Warn about prefix rules only in Most and All modes. 2003-10-21 00:23:55 +00:00
dchandler 1224030898 Speedup. 2003-10-21 00:19:15 +00:00
dchandler 1d9b405bb8 Forgot to add this file earlier. 2003-10-20 13:49:54 +00:00
dchandler 5d9305c9d5 "Browse..." buttons are smart about file types now. 2003-10-19 23:17:25 +00:00
dchandler 3aa3859354 ACIP->Unicode crash fixed.
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler 5aab4acc93 I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
The code would be cleaner if I could bear to delete my terrible hack.  Maybe in a month, when I don't feel so dumb for coding it up in the first place.

The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism.  This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
dchandler 4b1395e0ba Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler 5ce84d4d9a Tiny code cleanup. 2003-10-19 04:43:34 +00:00
dchandler 0edebd55d7 We were dying in the "can ts+h take a ga prefix?" check for GTZHAN. 2003-10-19 03:47:33 +00:00
dchandler 47648186b4 Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences. 2003-10-18 18:34:49 +00:00
dchandler e5534f69ee Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences. 2003-10-18 18:29:46 +00:00
dchandler 557ed7ed44 DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are. 2003-10-18 17:49:29 +00:00
dchandler e799438f86 CVS ignoring backup files. 2003-10-18 17:47:56 +00:00
dchandler 3b55ea509f Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar."
I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.
2003-10-18 05:48:53 +00:00
dchandler f28bee4c71 The appendage 'um is here too. 2003-10-18 05:10:49 +00:00
dchandler 8c99adeb63 TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following:
'e (used in foreign transliteration)
'ongs
'is
'os
'ur
'us
'ung
2003-10-18 03:04:47 +00:00
dchandler 5e18feb47d ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
dchandler 5f4fbfab7c Bulletproofing and debugging support. 2003-10-16 04:13:14 +00:00
dchandler 129ebccd67 In TCC #1 keyboard, h>cj now works. I may have fixed this in a terrible way, breaking other things even. Hard to say because I don't really understand the code I changed. But DuffPaneTest passes.
If we ever clean up the keyboards, the changes made here to tcc_keyboard.ini should probably be undone.
2003-10-12 18:16:17 +00:00
dchandler d7fdacfcdc Open menu is now Open..., Save as is now Save as... 2003-10-12 18:12:19 +00:00
dchandler 8dbfff17e1 All .rtf and .Rtf and .RTF files are selectable now. 2003-10-12 18:11:50 +00:00
dchandler 35209ce7fd I'm going to have to debug this, and the tab stops make the source unreadable. I don't like messing with whitespace, but it seems like I'll be the main maintainer for a while, and the people after me can use cvs diff -wb. So I'm untabifying. 2003-10-12 16:44:28 +00:00
dchandler 749b8d6727 Added toString for debugging. 2003-10-04 16:33:47 +00:00
dchandler b983af8031 r-t, not rt. This was why converting 'brtul' from TMW to Wylie didn't work. 2003-10-04 16:33:23 +00:00
dchandler 6a11eddb1e Warning level "None" wasn't working. 2003-10-04 16:12:48 +00:00
dchandler b10098cc61 "Most" warnings now excludes "the last stack has no vowel", making it much more useful. 2003-10-04 15:10:18 +00:00
dchandler ee50291ed4 Andres found that "THAG PA" caused a NullPointerException. That's fixed.
Renamed ACIPString to TString -- we'll use this for EWTS and ACIP both.

TMW->ACIP for TMW9.61 should work now.
2003-10-04 01:22:59 +00:00
amontano c8927b827c Fixed bugs in the scanner. Added reference to yogacara bhumi in the about window. 2003-09-23 19:05:23 +00:00
amontano e89c49651c Now translation tool accepts synonyms separated by ';' in the entry field. 2003-09-14 05:56:20 +00:00
dchandler 115d0e0e6c Fixed ACIP->TMW vowels like 'I etc.
Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.

The ACIP->EWTS subroutine has improved.

TMW->Wylie and TMW->ACIP are improved in error cases.

TMW->ACIP has friendly embedded error messages now.
2003-09-12 05:06:37 +00:00
dchandler 16817d0b8e Fixed Javadocs. 2003-09-10 01:19:05 +00:00
amontano cc853be387 Fixed a bug with regards to the word order in the servlet version. 2003-09-09 16:02:03 +00:00
amontano 1467f9cd3f Fixed display of servlet version and added option to include links to
other versions. See http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 21:32:40 +00:00
amontano 73d01111ca Fixed the "clicking on the translate button makes the thdl menu go away"
error on the servlet version of the translation tool.
2003-09-08 16:39:18 +00:00
amontano 07fbbcaf45 Solved some sorting errors with the servlet version.
Also if the service parameter thdlBanner=anything is sent, the THDL's
java script menu is displayed (if it is running on the thdl server). There is
still a bug. Menu goes away when pressing "translate" button. See:
http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 08:12:56 +00:00
dchandler e42d76b3b8 Nicer default Latin font for ACIP->* conversions.
Performance improvement in non-color-coding mode.
2003-09-07 22:08:35 +00:00
dchandler 6872ea8028 Corrected the usage info. 2003-09-07 22:08:00 +00:00
dchandler d8657abd44 ACIP font shrinking as in {KA (GA)} is now supported. 2003-09-07 18:30:59 +00:00
dchandler 07e360d9a8 The ACIP {NYA%} is supported. {NYAo} and {NYAx} are confusing to me,
because I don't know which glyphs o and x correspond to.  For that
reason, they cause ERRORs.

The proposed THDL Extended Wylie ~X and X is now used for U+0F35 and
U+0F37 respectively.
2003-09-07 16:19:50 +00:00
amontano f57cdda867 Now the translation tool displays what it is connected to. 2003-09-07 03:40:51 +00:00
amontano b489034598 Fixed a call to a deprecated method 2003-09-07 03:39:08 +00:00
dchandler 0d6d6ed611 Added GUI support for color-coding. Added support for color-coding
and choosing the warning level to TibetanConverter.

Better error checking in the GUI converter.
2003-09-06 22:56:10 +00:00
dchandler 1308f14807 sanskrit=green, prefix-rule-afflicted-tsheg-bar=yellow 2003-09-05 06:05:46 +00:00
dchandler 899b042ec0 Preliminary, untested color support in ACIP->TMW conversion. 2003-09-05 05:54:35 +00:00
dchandler 717c3b94f3 Fixed ACIP->Unicode spaces/tshegs and newlines, especially with shads.
"NGA," becomes "NGA-tsheg-," automatically now.
2003-09-05 05:08:47 +00:00
dchandler 5c240ac072 From the converter GUI, you can now choose TMW->ACIP text and
TMW->Wylie text.  All the conversions show you which format they take
as input and which format they give as output.

File filter for ACIP files added.

The GUI converter suggests a file extension wisely.

Fixed newline bug in ACIP->Unicode converter.
2003-09-05 02:05:34 +00:00
dchandler 4abbf6db37 --to-acip-text and --to-wylie-text added; these get you text files,
not RTF files like --to-acip and --to-wylie do.  The GUI converter
doesn't yet allow you to get text files.
2003-09-04 05:16:47 +00:00
dchandler cc615f34df ACIP->TMW and ACIP->Unicode have my pre-stamp of non-approval. Except
for {NYAx} and {NYAo}, they're as good as I'll get them without input
from experts or the employ of a complementary, syllabary-based
approach.
2003-09-04 04:34:18 +00:00
dchandler ae7a7577bc ACIP->TMW and ACIP->Unicode are now smart about when a newline is
really a newline and when a space is really a tsheg. The space in {KA
,MDO} is a tsheg, but the space in {GA ,MDO} is not.
2003-09-04 04:13:01 +00:00
dchandler d2749cecd0 ACIP->TMW and ACIP->Unicode are now smart about when a newline is
really a newline and when a space is really a tsheg. The space in {KA
,MDO} is a tsheg, but the space in {GA ,MDO} is not.
2003-09-04 04:04:21 +00:00
dchandler 72e531e515 Use shortened 'dreng-bu, not regular. As per TM glyphs. I suspect
that the following would look better with shortened 'dreng-bu also,
but I'm sticking with the TM/TMW docs:

dz+r~137,2~~4,46~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fb2
dz+w~138,2~~4,47~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fad
dz+h~139,2~~4,48~1,110~4,120~1,123~1,126~4,106~4,113~0F5C
dz+h+y~140,2~~4,49~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb1
dz+h+r~141,2~~4,50~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb2
dz+h+l~249,2~~4,51~1,110~4,123~1,123~1,126~4,110~4,117~f5c,fb3
dz+h+w~143,2~~4,52~1,110~4,122~1,123~1,126~4,108~4,115~f5c,fad
2003-09-04 03:46:35 +00:00
a1tsal 2f58ec2760 A bunch of Sanskrit stacks of the form ts+... and dz+... had 1,125 for their
drengbu, but that is actually a naro.  I changed it to 1,123
(which is one of the two drengbus).
2003-09-04 02:06:58 +00:00
dchandler 316f59107b A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation. 2003-09-02 06:39:33 +00:00
dchandler cc9ab06864 Added utility routine. Better comments. 2003-08-31 20:38:28 +00:00
dchandler 045c4069c9 Preliminary ACIP->TMW support is in place. {DU} gives you something
less beautiful than what Jskad would give, so more work is needed.
2003-08-31 16:06:35 +00:00
a1tsal 1f4d53be2e Moved ^M to punctuation section.
Removed obsolete comment.
2003-08-31 00:44:23 +00:00
a1tsal 522812996e Remove unused sections of tibwn.ini. 2003-08-31 00:34:15 +00:00
dchandler dd22e161a5 Code cleanup for Jskad's Tibetan font converter GUI. 2003-08-30 05:01:15 +00:00
dchandler 896344f2d1 David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW
mappings, so I've put them back, but with the EWTS non-correspondences
\tmwXYYY.

Jskad no longer supports superscribed or subscribed numerals, because
EWTS does not.
2003-08-26 01:28:02 +00:00
a1tsal ccdebf6719 Removed half numbers (no longer in EWTS)
Brought <?Other?> closer to EWTS
Removed __TILDE__ (no longer in EWTS)
Changed M^ to ^M per new EWTS draft
Added ai, au, -i from WW tibwn.ini -- they were missing in this version
2003-08-25 23:19:48 +00:00
dchandler 1982c5847b Jskad's converter now has ACIP-to-Unicode built in. There are known
bugs; it is pre-alpha.  It's usable, though, and finds tons of errors
in ACIP input files, with the user deciding just how pedantic to be.
The biggest outstanding bug is the silent one: treating { }, space, as
tsheg instead of whitespace when we ought to know better.
2003-08-24 06:40:53 +00:00
dchandler d5ad760230 TMW->Wylie conversion now takes advantage of prefix rules, the rules
that say "ya can take a ga prefix" etc.

The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline).  This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
2003-08-23 22:03:37 +00:00
dchandler 21ef657921 I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc. 2003-08-22 05:13:32 +00:00
dchandler 1afb3a0fdd ACIP->Unicode, without going through TMW, is now possible, so long as
\, the Sanskrit virama, is not used.  Of the 1370-odd ACIP texts I've
got here, about 57% make it through the gauntlet (fewer if you demand
a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).
2003-08-18 02:38:54 +00:00
dchandler 245aac4911 I'm now stricter about accepting alphabetic characters. F, Q, X, a,
b, c, d, e, ... do not belong in ACIP, so the scanner rejects them.
This should make it even easier to distinguish automatically between
Tibetan and English texts.
2003-08-17 02:38:58 +00:00
dchandler 39451d8879 Fixed a couple of small bugs.
Only 250 errors are reported now; this is important if you try to
convert an English document.
2003-08-17 02:12:49 +00:00
dchandler 4581a2d8ab Improved the ACIP scanner (the part of the converter that says, "This
is a correction, that's a comment, this is Tibetan, that's Latin
(English), that's Tibetan inter-tsheg-bar punctuation," etc.).  It now
accepts more real-world ACIP files, i.e. it handles illegal
constructs.  The error checking is more user-friendly.  There are now
tests.

Added to the tests some tsheg bars that Peter E. Hauer of Linguasoft
sent me.  Many thanks, Peter.  I still need to implement rules that say,
"This is not Tibetan, it must be Sanskrit, because that letter doesn't
take a MA prefix."
2003-08-17 01:45:55 +00:00
dchandler 0b91ed0beb I've improved the ACIP tsheg bar scanner to handle a lot of illegal
constructions that occur in practice.
2003-08-16 16:13:53 +00:00
amontano 2a57439516 Updated the info displayed on the about window. 2003-08-14 14:16:49 +00:00
amontano da384c6c2f Now when loading, takes the default font options from the DuffPane. 2003-08-14 14:16:23 +00:00
dchandler 2b59d9838d I now have a function that takes as input a String of ACIP and breaks
up that String into tsheg bars, punctuation, etc., while finding
errors.  I've tested it some, but I'm not yet committing the tests.

Next step: a converter that takes an ACIP file as input and outputs
TMW+Latin.
2003-08-14 05:10:47 +00:00
dchandler 57f506384f The ACIP->Tibetan converter now has perfect low-level functionality,
and it has the capability to produce error messages and warnings that
make sense to the user.  One can now get the correct parse, if one
exists, for an ACIP tsheg bar.

One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if one really wanted to.

Still to do:

o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
  the affected documents (most of the Kangyur has unparseable
  creatures).
2003-08-12 04:13:11 +00:00
dchandler 87266646fb Removed misinformation. 2003-08-10 19:33:01 +00:00
dchandler e21d3774a9 Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter.  It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00
dchandler 39e0435b6b Refactored this code so that Wylie->Tibetan and ACIP->Tibetan
conversions can make use of it.  Hooray for reuse.
2003-08-10 19:02:56 +00:00
dchandler bcf1c12b6a We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:46:01 +00:00
dchandler 9093fd3c05 We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:38:20 +00:00
dchandler 251d8feae5 brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report
http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.
2003-08-09 17:48:40 +00:00
dchandler 7dffc47cb7 'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came
across this one, so we've added it to the list of ambiguous three-consonant
combos.
2003-08-09 17:05:43 +00:00
amontano 52cdc17794 Added support for multiple keyboards and the ability to set the preferences
for the size of the Tibetan font and the type and size of the Roman font.
2003-08-09 08:00:58 +00:00
amontano 8e4b508de8 Made a new class for the preference window so that other software
(i.e. the translation tool) can re-use that same code to set up the
attributes of the Tibetan and Roman fonts.
2003-08-09 07:57:21 +00:00
amontano ef0df405d9 Redesigned the interface of the handheld version. 2003-08-03 06:29:08 +00:00
amontano 2b5a5fe67a Got rid of redundant code 2003-08-03 06:28:22 +00:00
amontano cce779bf88 Added a wizard window to avoid using the command line as much as possible.
From the wizard, one can choose to connect to the available on-line dicts,
open a local dict, or generate a dict database.
2003-08-03 06:27:30 +00:00
dchandler 4caeafa1b1 You shouldn't have one of these without the other, now that there are two.
This way neither TM nor TMW fonts will be loaded.
2003-07-26 00:55:32 +00:00
dchandler 2bb499e5a7 This was dying with a NullPointerException when you started it up using
'ant tt-run' with no dictionary.  Now it starts up and shows you a nice
error message, "Dictionary could not be loaded!", instead.
2003-07-26 00:53:59 +00:00