dchandler
7acbce3361
Added errors 142 and 143, which are produced when converting yig chung
to a Unicode text file, which cannot support font size changes.
2004-06-06 21:59:16 +00:00
dchandler
df262aa148
It is now a compile-time option whether to treat []- and {}-bracketed sequences
as text to be passed through (without the brackets in the case of {}) literally,
which is the case by default because Robert Chilton requested it, or the old,
ad-hoc mechanism which could be useful for finding some ugly input.
Made a couple of error messages a little more verbose now that we have
short-message mode.
2004-06-06 21:39:06 +00:00
dchandler
8a9271a3d8
I broke warning 507 into two warnings, one high-priority (512) and one
low-priority (507).
2004-05-01 20:49:53 +00:00
dchandler
31bdd39fec
The TMW for 'da'i was converting to 'aad'i. Andres found this; it is bug
945744. I've made it more correct -- 'ad'i is now produced. However, the
converter still takes the wrong stack to be the root stack.
2004-05-01 19:11:15 +00:00
dchandler
1a055f3472
I don't think warning level "None" was really doing the trick. Fixed that.
You can now customize the severities of all warnings, even 504 and 510.
When warning level is "None", scanning, i.e. lexical analysis, is faster.
2004-04-25 00:37:57 +00:00
dchandler
e2d42f36eb
Robert Chilton's experience inspired me to make the handling of errors and
warnings in ACIP->Tibetan conversion much more configurable. You can
now choose from short or long error messages, for one thing. You can change
the severity of almost all warnings. Each error and warning has an error code.
Errors and warnings are better tested.
The converter GUI has a new checkbox for short messages; the converter
CLI has a new mandatory option for short messages.
I also fixed a bug whereby certain errors were not being appended to the
'errors' StringBuffer.
2004-04-24 17:49:16 +00:00
dchandler
cc5d096918
David Chapman's latest fix to tibwn.ini (clearing up an issue that Than or I
dropped the ball on) introduced two lines for 8,95. This is a bad thing, so
I've taken out the second line. I've also introduced a check in
TibetanMachineWeb.java such that we'll know that tibwn.ini has no such
error in the future just by running 'ant clean jskad-run' and making sure that
the GUI is indeed visible.
I also updated the test baselines now that F03A and 0F82 are squared away.
2004-04-24 13:23:56 +00:00
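The check described above is meant to make a bad tibwn.ini fail loudly at startup. A minimal sketch of such a check, assuming each glyph is identified by a "font,character" key such as 8,95; the class and method names are hypothetical, and TibetanMachineWeb.java may structure its check differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: reject a glyph table that lists the same glyph twice. */
public final class DuplicateGlyphCheck {
    private DuplicateGlyphCheck() {}

    /**
     * Returns every occurrence of a key after its first one, so a table that
     * lists "8,95" twice yields ["8,95"].
     */
    public static List<String> findDuplicates(List<String> glyphKeys) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String key : glyphKeys) {
            // Set.add returns false when the key is already present.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    /** Throws while the table loads, so a bad table keeps the GUI from appearing. */
    public static void requireNoDuplicates(List<String> glyphKeys) {
        List<String> duplicates = findDuplicates(glyphKeys);
        if (!duplicates.isEmpty()) {
            throw new IllegalStateException(
                "tibwn.ini lists these glyphs more than once: " + duplicates);
        }
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(List.of("8,94", "8,95", "1,109", "8,95")));
    }
}
```

Throwing rather than logging matches the stated goal: if the table is wrong, the GUI simply never shows up.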
a1tsal
9e071ea178
Differentiated 0F82 (~M`) and F03A (nyi.zla editor's mark).
2004-04-21 10:04:11 +00:00
dchandler
0ee90a0fb0
Added many ACIP->TMW->ACIP tests. They found no bugs.
2004-04-17 17:28:26 +00:00
dchandler
63438d243b
getACIP was getting EWTS, not ACIP.
2004-04-17 15:49:40 +00:00
dchandler
de3a19761e
Fixes for javadoc tool.
2004-04-17 15:48:50 +00:00
dchandler
adcf9de952
Two new tests.
2004-04-17 15:14:46 +00:00
dchandler
1bfd3772e6
TMW->ACIP is much improved. V and W were confused, # and * were
confused; many glyphs that should have yielded errors were not.
I've added a test case that transforms to ACIP every TMW glyph except the
one with no TM mapping. I hand-checked that the output was correct.
ACIP->TMW is fixed for # and *. I never noticed it, but each needed an
extra swoosh (U+0F05).
Round-tripping would be good, as would testing real-world use of
TMW->ACIP.
2004-04-14 05:44:51 +00:00
dchandler
56a02ba41d
Fixed the worst TMW->ACIP bug, the one regarding U+0F04 and U+0F05.
TMW->EWTS requires no context information, but TMW->ACIP does.
2004-04-10 18:26:57 +00:00
dchandler
7eca276a62
TMW->Unicode conversions have changed; now using U+0F6A for the stacks
whose EWTS transliteration begins with "R+".
ACIP->* conversions and test baselines were updated to deal with the
"r+..."=>"R+..." change.
2004-04-10 16:03:25 +00:00
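The rule above amounts to choosing the code point for the RA that heads a stack. A minimal sketch, assuming the input is the stack's EWTS transliteration and that "r+..." stacks keep U+0F62; the class and method names are hypothetical, not Jskad's.

```java
/** Hypothetical sketch: choose the Unicode letter for the RA that heads a stack. */
public final class LeadingRa {
    private LeadingRa() {}

    public static final char RA = 0x0F62;            // TIBETAN LETTER RA
    public static final char FIXED_FORM_RA = 0x0F6A; // TIBETAN LETTER FIXED-FORM RA

    /** Returns U+0F6A for "R+..." stacks and U+0F62 for "r+..." stacks. */
    public static char leadingRa(String ewtsStack) {
        if (ewtsStack.startsWith("R+")) {
            return FIXED_FORM_RA;
        }
        if (ewtsStack.startsWith("r+")) {
            return RA;
        }
        throw new IllegalArgumentException("stack does not begin with RA: " + ewtsStack);
    }

    public static void main(String[] args) {
        // Prints "f6a f62"
        System.out.println(Integer.toHexString(leadingRa("R+y")) + " "
            + Integer.toHexString(leadingRa("r+k")));
    }
}
```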
dchandler
aff34174ab
The new EWTS rule regarding R, W, and Y requires that these change. It
may also require changes to the following, but I'm going to ask whether
they really should change.
// Y+Y~185,3~~6,98~1,109~6,120~1,123~1,125~6,106~6,113~f61,fbb
// Y+r~186,3~~6,99~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb2
// Y+w~187,3~~6,100~1,109~6,120~1,123~1,125~6,106~6,113~f61,fad
// Y+s~188,3~~6,101~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb6
// W+y~69,4~~7,79~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb1
// W+r~70,4~~7,80~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb2
// W+n~195,4~~7,81~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fa3
// W+W~194,4~~7,82~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fba
2004-04-08 02:55:59 +00:00
dchandler
76356f4009
ACIP->Tibetan now gives an error when {?} is seen alone (not in {[?]} or {[*FOO?]}, but alone). Bug 860192 is fixed.
2004-03-15 00:49:01 +00:00
dchandler
542fb50bf1
The ~M and ~M` EWTS change had not fully been made. Someone submitted a bug report 911472 that alerted me to this.
2004-03-07 17:02:35 +00:00
dchandler
d436a4d462
Removed David Chapman's recently added line for U+0F82 -- a line for U+0F82 already existed, and the new line had incorrect TM and incorrect TMW mappings. I changed the existing line for U+0F82 to use the EWTS {~M`}.
2004-03-02 04:29:41 +00:00
a1tsal
8eaaeaa202
Fix careless error: I had the same TMW character for ~M and ~M`!
2004-02-22 09:14:56 +00:00
a1tsal
b14833b5b9
Change ^M to ~M to conform to spec.
Introduce ~M` (for 0F82).
2004-02-20 15:07:49 +00:00
dchandler
274e1736be
Deleted cut-and-paste goof.
2004-01-17 19:45:31 +00:00
dchandler
c69ba26c60
TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings.
2004-01-17 19:28:54 +00:00
dchandler
48b4c5cb07
Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, run 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'.
2004-01-17 17:10:12 +00:00
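A dump like this escapes every non-ASCII character so that the output of a *->Unicode conversion can be inspected on any terminal. The sketch below is hypothetical: it illustrates the idea and does not reproduce VerboseUnicodeDump's actual output format.

```java
/** Hypothetical sketch of a Unicode->ASCII dump; not the real VerboseUnicodeDump. */
public final class UnicodeDumpSketch {
    private UnicodeDumpSketch() {}

    /**
     * Copies printable ASCII (except backslash) as-is and renders every other
     * UTF-16 code unit as a backslash, a 'u', and four uppercase hex digits.
     */
    public static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F && c != '\\') {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0F40 TIBETAN LETTER KA followed by U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
        System.out.println(dump("\u0F40\u0F0B"));
    }
}
```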
dchandler
4dd40809a5
A user reported that q` caused a crash with TCC keyboard #1. Fixed. TCC keyboard #1 does not support q~, though.
2003-12-21 06:27:36 +00:00
dchandler
c1aa81e943
RFE 860190: ACIP->Unicode now gives a warning when it outputs something that can't be represented in TMW.
2003-12-16 07:45:40 +00:00
dchandler
848349fd3a
More tests.
2003-12-15 08:16:06 +00:00
dchandler
e7a9e7968f
ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc.
ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA. Fixed.
2003-12-15 07:32:14 +00:00
dchandler
e9f7b2dfed
If you want curly brackets around folio markers, you'll have to set
the system property
thdl.acip.to.x.output.curly.brackets.around.folio.markers to true.
2003-12-14 08:47:03 +00:00
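A minimal sketch of how converter code might honor that property, using the standard Boolean.getBoolean, which returns true only when the named system property exists and equals "true" (ignoring case). The class, the render method, and the placeholder marker text are hypothetical.

```java
/** Hypothetical sketch of honoring the folio-marker system property. */
public final class FolioMarkerOption {
    private FolioMarkerOption() {}

    public static final String PROPERTY =
        "thdl.acip.to.x.output.curly.brackets.around.folio.markers";

    /** Wraps the marker in curly brackets only when the property is "true". */
    public static String render(String folioMarker) {
        // Unset, or set to anything other than "true", means no brackets.
        if (Boolean.getBoolean(PROPERTY)) {
            return "{" + folioMarker + "}";
        }
        return folioMarker;
    }

    public static void main(String[] args) {
        // "MARKER" is a placeholder, not real ACIP folio-marker syntax.
        System.out.println(render("MARKER"));
    }
}
```

The property would be set on the command line in the usual way, e.g. `java -Dthdl.acip.to.x.output.curly.brackets.around.folio.markers=true -jar Jskad.jar`.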
dchandler
8664571577
Warnings were not being detected correctly. Fixed.
ACIP->Unicode uses U+0020, ' ', for whitespace. ACIP->TMW uses the
TMW whitespace for whitespace.
2003-12-14 08:38:10 +00:00
dchandler
01e65176d4
Using less memory and time to figure out if warnings occurred.
2003-12-14 07:41:15 +00:00
dchandler
76c2e969ac
Fixed ACIP->Unicode bug for YYE etc., things with full-formed
subjoined consonants and vowels.
Fixed ACIP->TMW for YYA etc., things with full-formed subjoined
consonants.
2003-12-14 07:36:21 +00:00
dchandler
f625c937ee
ACIP {B} was not being treated like {BA}; instead, an error was resulting. All the five prefixes were affected.
2003-12-14 05:54:07 +00:00
dchandler
581643cf59
{DAN,\nLHAG} used to be treated like {DAN, LHAG} but that got broken. Fixed.
Added tests for lexer's handling of ACIP spaces etc.
2003-12-10 06:55:16 +00:00
dchandler
8e673bbc2c
{NGA,} becomes {NGA\u0f0c,} now instead of {NGA\u0f0b,}.
Note: before this change, ACIP->Unicode for {NGA,} did not give the same Unicode that {NGA\u0f0b,} gave.
2003-12-10 06:50:14 +00:00
dchandler
a466bad939
ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.
2003-12-08 07:51:45 +00:00
dchandler
a39c5c12b0
ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.
2003-12-08 07:15:27 +00:00
dchandler
b617f761d5
ACIP->TMW for {^GONG SA } used to fail; fixed.
2003-12-07 20:05:41 +00:00
dchandler
115534e688
ACIP->TMW for {^GONG SA } used to fail because we had \u0F38 in the ToWylie section. Now it's in the <?Input:Numbers?> section because I didn't want to introduce a new section. If WylieWord has trouble due to this misuse of the 'numbers' category, we'll introduce a new category, 'other'.
TMW->EWTS improved as a result -- {\u0F38.gonga sa } is produced now where {\u0F38agonga sa } was once produced. Even the better version is imperfect; see bug 855877.
2003-12-07 19:40:59 +00:00
dchandler
4adf87c401
Updated comments only.
2003-12-06 20:36:56 +00:00
dchandler
3f18623977
Added comments only.
2003-12-06 20:26:45 +00:00
dchandler
6232ee9170
Added comments referring to a user guide in development now.
2003-12-06 20:26:15 +00:00
dchandler
c43e9a446b
Revamped some ACIP->Tibetan error messages.
2003-12-06 20:19:40 +00:00
dchandler
c9c771d1ee
ACIP {&}, as in {KO&HAm,}, is supported.
2003-11-30 02:18:59 +00:00
dchandler
ac412c994b
Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}.
2003-11-30 02:06:48 +00:00
dchandler
e7c4cc1874
Updated to be in sync with latest EWTS draft.
2003-11-29 22:59:39 +00:00
dchandler
ffd041e32c
ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}.
Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.
Also, we've traded some speed for memory -- DuffCode now uses bytes, not ints.
2003-11-29 22:57:12 +00:00
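A minimal sketch of the escape expansion described above, assuming an escape is always a backslash, a lowercase u, and exactly four hex digits, and that anything else (including a lone ACIP backslash) passes through untouched. The class and method names are hypothetical; Jskad's lexer may treat malformed escapes differently.

```java
/**
 * Hypothetical sketch: expands backslash-u escapes (a backslash, a lowercase
 * u, and exactly four hex digits) in ACIP input before lexing.
 */
public final class UnicodeEscapes {
    private UnicodeEscapes() {}

    public static String expand(String acip) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < acip.length()) {
            char c = acip.charAt(i);
            if (c == '\\' && i + 6 <= acip.length()
                    && acip.charAt(i + 1) == 'u'
                    && isHex(acip, i + 2, i + 6)) {
                // Decode the four hex digits into a single UTF-16 code unit.
                out.append((char) Integer.parseInt(acip.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Everything else, including a lone backslash, is copied unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** True if every character in s[from, to) is an ASCII hex digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                || (ch >= 'a' && ch <= 'f')
                || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In the literal below, K is followed by the six characters backslash, u, 0, F, 8, 4.
        String expanded = expand("K\\u0F84");
        System.out.println(expanded.length() + " " + Integer.toHexString(expanded.charAt(1)));
    }
}
```

Running main prints `2 f84`: the K plus a single U+0F84 (TIBETAN MARK HALANTA).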
dchandler
dfaae4be93
ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}.
Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.
2003-11-29 22:56:18 +00:00
dchandler
16bfeac641
These issues are non-issues; removing these comments.
2003-11-25 00:31:33 +00:00
dchandler
d3d0ff23a8
Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E.
2003-11-25 00:28:18 +00:00
dchandler
b8608797aa
Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.
2003-11-24 05:59:32 +00:00
dchandler
5d053b41fe
Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day.
Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors.
Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM; that one isn't tested). The baselines have not been hand-verified, but changes will be detected.
2003-11-24 05:49:15 +00:00
dchandler
9a247f5932
N+D+Ya, not N+D+ya; w+Wa, not w+wa. Use W, R, and Y where appropriate.
2003-11-24 04:55:11 +00:00
dchandler
1ec668c018
Dza is not in the latest EWTS draft.
2003-11-24 04:28:55 +00:00
dchandler
f76c089366
Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty).
2003-11-24 04:05:59 +00:00
dchandler
08c676c186
Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues.
Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW. I tested the whole mapping by hand once; this incident shows that automation is very helpful.
'{' and '}' were swapped...
The Unicode for something was "", not "none".
+R, +W, +Y, R+ now in use (though more testing is needed)
2003-11-24 02:40:40 +00:00
dchandler
216c5b0d54
Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).
2003-11-23 01:22:27 +00:00
dchandler
8d4fb5d13f
We crashed before when '~' was entered.
2003-11-14 04:50:55 +00:00
dchandler
b59b86fd73
Commented this to mention some recent testing.
2003-11-11 03:45:58 +00:00
dchandler
4023be9612
Better prettyprinting. Untested.
2003-11-11 03:43:26 +00:00
dchandler
4e6a9c299f
ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.
2003-11-11 03:43:11 +00:00
dchandler
2cb90bd231
ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended.
The Unicode for ACIP {o} is U+0F37.
2003-11-09 23:15:58 +00:00
dchandler
04816acb74
ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.
ACIP {RVA} was converting to the wrong things.
The TMW for {RVA} was converting to the wrong ACIP.
Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler
8193cef5d1
Better comments.
2003-11-09 01:07:07 +00:00
dchandler
3fa417d3ee
phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent.
2003-11-08 21:53:34 +00:00
dchandler
e058d6252e
phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:48:08 +00:00
dchandler
55aaeef9d0
l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:23:50 +00:00
dchandler
06edf17b04
Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 21:17:18 +00:00
dchandler
74d6bc61ab
The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 20:25:16 +00:00
dchandler
a0ae0bf70d
Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.
2003-11-08 17:50:10 +00:00
dchandler
e3f1ed5914
Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.
2003-10-27 13:58:45 +00:00
dchandler
94a43d3f39
Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.
2003-10-26 18:56:48 +00:00
dchandler
5c36dd81d3
Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".
2003-10-26 18:25:25 +00:00
dchandler
e74547d743
GA-YOGS now parses like G-YOGS and GAYOGS do.
2003-10-26 18:06:38 +00:00
dchandler
61cf19932e
ACIP {B5} and {7'} were problematic; that's fixed.
2003-10-26 17:47:35 +00:00
dchandler
ad7b20e485
Added yet more metadata.
2003-10-26 16:05:30 +00:00
dchandler
1550fee41a
Removed garbage.
2003-10-26 16:05:07 +00:00
dchandler
fe33d67573
Added more metadata. There are 35 million+ tsheg bars here.
2003-10-26 15:35:08 +00:00
dchandler
050666d735
I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.
Freaky.
2003-10-26 06:56:12 +00:00
dchandler
31b3020d07
Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler
7ba1ad0735
Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.
2003-10-26 02:42:06 +00:00
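The post-processing mentioned above is short in Java as well. This sketch assumes the converter's listing has one tsheg bar per line on standard input; the class name is hypothetical, and the option that produces the listing is not named here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical post-processor: counts tsheg bars, assuming one per input line. */
public final class TshegBarFrequencies {
    private TshegBarFrequencies() {}

    /** Counts each distinct non-blank line, ignoring surrounding whitespace. */
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String bar = line.trim();
            if (!bar.isEmpty()) {
                counts.merge(bar, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Sorts entries by descending count, breaking ties alphabetically. */
    public static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        // One "count<TAB>tsheg bar" line per distinct tsheg bar, most frequent first.
        for (Map.Entry<String, Integer> e : sorted(count(lines))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```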
dchandler
ef24c608bf
Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.
2003-10-26 02:17:19 +00:00
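A minimal sketch of the substitution, assuming a single FROM=>TO rule that is applied to whole tsheg bars by exact match. The real ReplacementMap syntax for several rules isn't documented here, so the sketch doesn't guess at it; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a "FROM=>TO" rule applied to whole tsheg bars. */
public final class ReplacementMapSketch {
    private ReplacementMapSketch() {}

    /** Parses one "FROM=>TO" rule; returns an empty map if spec is null or malformed. */
    public static Map<String, String> parse(String spec) {
        Map<String, String> rules = new HashMap<>();
        if (spec == null) {
            return rules;
        }
        int arrow = spec.indexOf("=>");
        if (arrow > 0 && arrow + 2 < spec.length()) {
            rules.put(spec.substring(0, arrow), spec.substring(arrow + 2));
        }
        return rules;
    }

    /** Replaces a tsheg bar only when it matches a rule's left-hand side exactly. */
    public static String apply(Map<String, String> rules, String tshegBar) {
        return rules.getOrDefault(tshegBar, tshegBar);
    }

    public static void main(String[] args) {
        // Falls back to the example rule from the log when the property is unset.
        Map<String, String> rules = parse(
            System.getProperty("org.thdl.tib.text.ttt.ReplacementMap", "KAsh=>K+sh"));
        System.out.println(apply(rules, "KAsh"));
    }
}
```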
dchandler
6bda550157
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built into Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler
d99ae50d8a
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built into Jskad proper (not the converter, that is).
2003-10-26 00:24:28 +00:00
dchandler
306cf2817c
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler
f106deb884
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:40:21 +00:00
dchandler
7d24ab393f
Code cleanup.
2003-10-21 03:44:02 +00:00
dchandler
c764eee8d0
Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.
2003-10-21 03:36:57 +00:00
dchandler
2f39921381
Added more test cases.
2003-10-21 02:14:45 +00:00
dchandler
2f81a801ef
Added three new kinds of warnings to ACIP->Tibetan conversions.
2003-10-21 02:00:49 +00:00
dchandler
a47af2c165
Bulletproofing -- code cleanup.
2003-10-21 00:31:10 +00:00
dchandler
188b9c322e
Warn about prefix rules only in Most and All modes.
2003-10-21 00:23:55 +00:00
dchandler
1224030898
Speedup.
2003-10-21 00:19:15 +00:00
dchandler
1d9b405bb8
Forgot to add this file earlier.
2003-10-20 13:49:54 +00:00
dchandler
3aa3859354
ACIP->Unicode crash fixed.
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler
5aab4acc93
I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place.
The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
dchandler
4b1395e0ba
Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler
5ce84d4d9a
Tiny code cleanup.
2003-10-19 04:43:34 +00:00
dchandler
0edebd55d7
We were dying in the "can ts+h take a ga prefix?" check for GTZHAN.
2003-10-19 03:47:33 +00:00
dchandler
47648186b4
Untabified -- only whitespace has changed. Use 'cvs diff -wb' to avoid seeing these differences.
2003-10-18 18:34:49 +00:00