Commit graph

207 commits

Author SHA1 Message Date
dchandler
2cb90bd231 ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended.
The Unicode for ACIP {o} is U+0F37.
2003-11-09 23:15:58 +00:00
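A minimal sketch of the two facts in this commit, assuming the converter can report a warning per character offset. The class name `AcipMarkScanner`, the warning text, and leaving `{%}` itself unconverted are hypothetical. Only the `{%}`/U+0F14 warning and the `{o}` -> U+0F37 mapping come from the commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: warn on every ACIP {%} and expose the Unicode for ACIP {o}.
final class AcipMarkScanner {
    // ACIP {o} converts to U+0F37 (TIBETAN MARK NGAS BZUNG SGOR RTAGS).
    static final char ACIP_O_UNICODE = '\u0F37';

    /** Returns one warning per '%' in the input, in order of appearance. */
    static List<String> percentWarnings(String acip) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < acip.length(); i++) {
            if (acip.charAt(i) == '%') {
                // Warn every time, not just the first time, as the commit says.
                warnings.add("Offset " + i + ": found {%}; U+0F14 might have been intended.");
            }
        }
        return warnings;
    }
}
```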
dchandler
04816acb74 ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.

ACIP {RVA} was converting to the wrong things.

The TMW for {RVA} was converting to the wrong ACIP.

Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
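Unicode encodes the full-form (fixed-form) subjoined RA and YA as separate characters, U+0FBC and U+0FBB. These are distinct from the ordinary subjoined RA and YA, U+0FB2 and U+0FB1. The hypothetical helper below shows only that choice. The `fullForm` flag is an assumption, and deciding when a stack such as KshR or ndRY needs it is left to the caller.

```java
// Hypothetical helper: which Unicode character to emit for a subjoined RA or YA.
// Whether a given stack needs the full form is decided elsewhere (not modeled).
final class SubjoinedRaYa {
    static char subjoined(char consonant, boolean fullForm) {
        switch (consonant) {
            case 'R':
                return fullForm ? '\u0FBC' : '\u0FB2'; // fixed-form vs. ordinary subjoined RA
            case 'Y':
                return fullForm ? '\u0FBB' : '\u0FB1'; // fixed-form vs. ordinary subjoined YA
            default:
                throw new IllegalArgumentException("Only R and Y are handled here: " + consonant);
        }
    }
}
```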
dchandler
8193cef5d1 Better comments. 2003-11-09 01:07:07 +00:00
dchandler
3fa417d3ee phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent. 2003-11-08 21:53:34 +00:00
dchandler
e058d6252e phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent. 2003-11-08 21:48:08 +00:00
dchandler
55aaeef9d0 l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent. 2003-11-08 21:23:50 +00:00
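The three commits above follow one policy. If the Tibetan! 5.1 docs call a vowel inapplicable to a stack, Jskad may honor the request anyway, but it must never drop the request silently. The sketch below illustrates that rule. It assumes a hypothetical `notApplicable` set of `stack+vowel` keys and a warning callback; none of these names come from Jskad.

```java
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: honor or drop a vowel request, but never silently.
final class VowelPolicy {
    /**
     * Returns true if the vowel should be kept on the stack. Combinations not
     * in notApplicable are always kept. Documented-inapplicable ones are kept
     * when lenient is true (the behavior these commits choose). Otherwise they
     * are dropped, and a warning explains why.
     */
    static boolean keepVowel(String stack, String vowel, Set<String> notApplicable,
                             boolean lenient, Consumer<String> warn) {
        if (!notApplicable.contains(stack + "+" + vowel) || lenient) {
            return true;
        }
        warn.accept("Dropping " + vowel + " on " + stack
                + ": the Tibetan! 5.1 docs say it is not applicable.");
        return false;
    }
}
```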
dchandler
06edf17b04 Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually. 2003-11-08 21:17:18 +00:00
dchandler
74d6bc61ab The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually. 2003-11-08 20:25:16 +00:00
dchandler
a0ae0bf70d Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too. 2003-11-08 17:50:10 +00:00
dchandler
e3f1ed5914 Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars. 2003-10-27 13:58:45 +00:00
dchandler
94a43d3f39 Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}. 2003-10-26 18:56:48 +00:00
dchandler
5c36dd81d3 Fixed bug 830332, "Convert selected ACIP=>Tibetan busted". 2003-10-26 18:25:25 +00:00
dchandler
e74547d743 GA-YOGS now parses like G-YOGS and GAYOGS do. 2003-10-26 18:06:38 +00:00
dchandler
61cf19932e ACIP {B5} and {7'} were problematic; that's fixed. 2003-10-26 17:47:35 +00:00
dchandler
ad7b20e485 Added yet more metadata. 2003-10-26 16:05:30 +00:00
dchandler
1550fee41a Removed garbage. 2003-10-26 16:05:07 +00:00
dchandler
fe33d67573 Added more metadata. There are 35 million+ tsheg bars here. 2003-10-26 15:35:08 +00:00
dchandler
050666d735 I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.

Freaky.
2003-10-26 06:56:12 +00:00
dchandler
31b3020d07 Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler
7ba1ad0735 Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful, e.g., for getting a list of all the tsheg bars in ACIP texts, which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a Perl script, awk, etc. 2003-10-26 02:42:06 +00:00
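The frequency postprocessing mentioned here (Perl, awk, or simply `sort | uniq -c`) can also be done in Java. This sketch assumes the converter's output has one tsheg bar per line. That format and the class name `TshegBarFrequencies` are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Counts how many times each tsheg bar occurs, assuming one tsheg bar per line.
final class TshegBarFrequencies {
    static Map<String, Integer> count(Reader in) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by tsheg bar
        BufferedReader reader = new BufferedReader(in); // the caller owns 'in' and closes it
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String bar = line.trim();
                if (!bar.isEmpty()) {
                    counts.merge(bar, 1, Integer::sum); // add one to this bar's count
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

To read the printed list from a pipe, wrap standard input: `TshegBarFrequencies.count(new InputStreamReader(System.in))`.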
dchandler
ef24c608bf Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and it won't really be for power users until I document it outside of the source code.
2003-10-26 02:17:19 +00:00
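A sketch of how a `FROM=>TO` replacement map like the one in this commit might be parsed and applied. The property name and the `KAsh=>K+sh` form come from the commit. Everything else is an assumption and is not taken from Jskad's source: the comma separating several pairs, the rule that only whole tsheg bars are replaced, and the class `ReplacementMapSketch` itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "FROM=>TO" replacement map. Assumed: pairs are
// comma-separated, and a replacement applies only when a whole tsheg bar matches.
final class ReplacementMapSketch {
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new HashMap<>();
        if (spec == null || spec.isEmpty()) {
            return map;
        }
        for (String pair : spec.split(",")) {
            int arrow = pair.indexOf("=>");
            if (arrow <= 0) {
                throw new IllegalArgumentException("Expected FROM=>TO but got: " + pair);
            }
            map.put(pair.substring(0, arrow), pair.substring(arrow + 2));
        }
        return map;
    }

    /** Returns the replacement for this tsheg bar, or the bar itself if none. */
    static String apply(Map<String, String> map, String tshegBar) {
        return map.getOrDefault(tshegBar, tshegBar);
    }

    /** Reads the map from the system property named in the commit. */
    static Map<String, String> fromSystemProperty() {
        return parse(System.getProperty("org.thdl.tib.text.ttt.ReplacementMap"));
    }
}
```

With the `-D` option shown in the commit, `fromSystemProperty()` would yield a map in which `KAsh` maps to `K+sh`.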
dchandler
6bda550157 The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built into Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler
d99ae50d8a The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built into Jskad proper (not the converter, that is).
2003-10-26 00:24:28 +00:00
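The rule in this commit can be stated as a small decision: two leading consonants are read as prefix + root only if the second can take the first as a prefix; otherwise they are stacked. The sketch below assumes a hypothetical `prefixRules` set keyed by the concatenated consonants. The rule set in the usage is illustrative, not Jskad's actual table.

```java
import java.util.Set;

// Hypothetical sketch: decide between prefix + root ("B-NA") and a stack ("B+NA").
final class PrefixOrStack {
    static String parse(String first, String second, String vowel, Set<String> prefixRules) {
        if (prefixRules.contains(first + second)) {
            return first + "-" + second + vowel; // legal prefix: keep them apart
        }
        return first + "+" + second + vowel;     // not a legal prefix: stack them
    }
}
```

With BN absent from the rule set (the commit below removed it), `parse("B", "N", "A", rules)` yields `B+NA`.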
dchandler
306cf2817c Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler
f106deb884 Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:40:21 +00:00
dchandler
7d24ab393f Code cleanup. 2003-10-21 03:44:02 +00:00
dchandler
c764eee8d0 Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning. 2003-10-21 03:36:57 +00:00
dchandler
2f39921381 Added more test cases. 2003-10-21 02:14:45 +00:00
dchandler
2f81a801ef Added three new kinds of warnings to ACIP->Tibetan conversions. 2003-10-21 02:00:49 +00:00
dchandler
a47af2c165 Bulletproofing -- code cleanup. 2003-10-21 00:31:10 +00:00
dchandler
188b9c322e Warn about prefix rules only in Most and All modes. 2003-10-21 00:23:55 +00:00
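These commits name four warning levels: None, Some, Most, and All. Prefix-rule warnings appear only at Most and All. If the levels nest (an assumption), each warning can be tagged with the lowest level that shows it, and the levels can be compared by ordinal. The names `WarningLevel` and `shows` are hypothetical.

```java
// Hypothetical sketch of nested warning levels. The level names come from the
// commit log; comparing them by ordinal is an assumption.
enum WarningLevel {
    NONE, SOME, MOST, ALL;

    /**
     * True if a warning tagged with warningTag should be shown when the user
     * has chosen this level. Nothing is ever shown at NONE.
     */
    boolean shows(WarningLevel warningTag) {
        return this != NONE && this.compareTo(warningTag) >= 0;
    }
}
```

For example, a prefix-rule warning would be tagged `MOST`. Then `WarningLevel.SOME.shows(WarningLevel.MOST)` is false, while `WarningLevel.ALL.shows(WarningLevel.MOST)` is true.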
dchandler
1224030898 Speedup. 2003-10-21 00:19:15 +00:00
dchandler
1d9b405bb8 Forgot to add this file earlier. 2003-10-20 13:49:54 +00:00
dchandler
3aa3859354 ACIP->Unicode crash fixed.
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler
5aab4acc93 I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
The code would be cleaner if I could bear to delete my terrible hack.  Maybe in a month, when I don't feel so dumb for coding it up in the first place.

The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism.  This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
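The commit only proposes a pre-filter. It could be as simple as a list of regular-expression substitutions applied to the raw ACIP text before lexing. The sketch below uses `java.util.regex` and assumes sed-like `s/PATTERN/REPLACEMENT/g` semantics. The class `AcipPreFilter` is hypothetical. Its filter runs on all the text, not only on tsheg bars; the commit leaves that choice open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-filter applied to raw ACIP text before lexing. Each rule acts
// like sed's s/PATTERN/REPLACEMENT/g. Replacements use Matcher.replaceAll syntax
// ($1 for groups). Rules run in the order they were added.
final class AcipPreFilter {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    AcipPreFilter add(String regex, String replacement) {
        patterns.add(Pattern.compile(regex));
        replacements.add(replacement);
        return this;
    }

    String filter(String text) {
        String result = text;
        for (int i = 0; i < patterns.size(); i++) {
            result = patterns.get(i).matcher(result).replaceAll(replacements.get(i));
        }
        return result;
    }
}
```

For example, `new AcipPreFilter().add("SNYAM'AM", "S+NYAMA'AMA")` expresses the substitution from the commit message.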
dchandler
4b1395e0ba Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler
5ce84d4d9a Tiny code cleanup. 2003-10-19 04:43:34 +00:00
dchandler
0edebd55d7 We were dying in the "can ts+h take a ga prefix?" check for GTZHAN. 2003-10-19 03:47:33 +00:00
dchandler
47648186b4 Untabified -- only whitespace has changed. Use 'cvs diff -wb' to avoid seeing these differences. 2003-10-18 18:34:49 +00:00
dchandler
557ed7ed44 DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are. 2003-10-18 17:49:29 +00:00
dchandler
e799438f86 CVS ignoring backup files. 2003-10-18 17:47:56 +00:00
dchandler
3b55ea509f Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar."
I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.
2003-10-18 05:48:53 +00:00
dchandler
f28bee4c71 The appendage 'um is here too. 2003-10-18 05:10:49 +00:00
dchandler
8c99adeb63 TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following:
'e (used in foreign transliteration)
'ongs
'is
'os
'ur
'us
'ung
2003-10-18 03:04:47 +00:00
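Every appendage named here (plus 'um, added in the commit above) is an apostrophe followed by letters. Recognizing one at the end of a tsheg bar is therefore a suffix check. The sketch below works on lowercase EWTS-style strings. The class `Appendages` is hypothetical. It ignores the uppercase ACIP spelling and any context the real converters use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split a tsheg bar into base + appendage,
// e.g. "mtsho'i" -> {"mtsho", "'i"}.
final class Appendages {
    static final List<String> KNOWN = Arrays.asList(
            "'am", "'ang", "'o", "'i", "'u", "'e", "'ongs",
            "'is", "'os", "'ur", "'us", "'ung", "'um");

    /** Returns {base, appendage}, or {tshegBar, ""} if no appendage is found. */
    static String[] split(String tshegBar) {
        // Each appendage starts with an apostrophe and contains no other, so none
        // is a suffix of another and at most one can match. The length check
        // keeps a bare appendage such as "'o" from being split into nothing.
        for (String a : KNOWN) {
            if (tshegBar.length() > a.length() && tshegBar.endsWith(a)) {
                return new String[] {tshegBar.substring(0, tshegBar.length() - a.length()), a};
            }
        }
        return new String[] {tshegBar, ""};
    }
}
```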
dchandler
5e18feb47d ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
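Greedy stacking means that every consonant before the vowel joins one stack, whether or not TM or TMW has a glyph for it. The sketch below uses a deliberately tiny, hypothetical consonant list and longest-match tokenization, so KH and SH are read as single consonants. It ignores prefixes, suffixes, and the many ACIP consonants not listed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of greedy ACIP stacking: "TTTTTA" -> "T+T+T+T+TA".
final class GreedyStacker {
    // A small subset of ACIP consonants; two-letter entries come first so they win.
    private static final List<String> CONSONANTS = Arrays.asList(
            "KH", "NG", "TH", "SH", "K", "G", "T", "D", "N", "B", "M", "Y", "R", "L", "S");
    private static final String VOWELS = "AIUEO";

    static String stack(String syllable) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < syllable.length() && VOWELS.indexOf(syllable.charAt(i)) < 0) {
            String match = null;
            for (String c : CONSONANTS) {
                if (syllable.startsWith(c, i)) {
                    match = c;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("Unknown consonant at " + i + " in " + syllable);
            }
            parts.add(match);
            i += match.length();
        }
        // Join all leading consonants into one stack, then append the vowel and the rest.
        return String.join("+", parts) + syllable.substring(i);
    }
}
```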
dchandler
5f4fbfab7c Bulletproofing and debugging support. 2003-10-16 04:13:14 +00:00
dchandler
129ebccd67 In the TCC #1 keyboard, h>cj now works. I may have fixed this in a terrible way, even breaking other things. Hard to say because I don't really understand the code I changed. But DuffPaneTest passes.
If we ever clean up the keyboards, the changes made here to tcc_keyboard.ini should probably be undone.
2003-10-12 18:16:17 +00:00
dchandler
749b8d6727 Added toString for debugging. 2003-10-04 16:33:47 +00:00
dchandler
b983af8031 r-t, not rt. This was why converting 'brtul' from TMW to Wylie didn't work. 2003-10-04 16:33:23 +00:00
dchandler
6a11eddb1e Warning level "None" wasn't working. 2003-10-04 16:12:48 +00:00