Jskad

Author	SHA1	Message	Date
dchandler	16bfeac641	These issues are non-issues; removing these comments.	2003-11-25 00:31:33 +00:00
dchandler	d3d0ff23a8	Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E.	2003-11-25 00:28:18 +00:00
dchandler	b8608797aa	Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.	2003-11-24 05:59:32 +00:00
dchandler	5d053b41fe	Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day. Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors. Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.	2003-11-24 05:49:15 +00:00
dchandler	9a247f5932	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.	2003-11-24 04:55:11 +00:00
dchandler	1ec668c018	Dza is not in the latest EWTS draft.	2003-11-24 04:28:55 +00:00
dchandler	f76c089366	Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty).	2003-11-24 04:05:59 +00:00
dchandler	08c676c186	Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues. Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW. I tested the whole mapping by hand once; this incident shows that automation is very helpful. '{' and '}' were swapped... The Unicode for something was "", not "none". +R, +W, +Y, R+ now in use (though more testing is needed)	2003-11-24 02:40:40 +00:00
dchandler	216c5b0d54	Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).	2003-11-23 01:22:27 +00:00
dchandler	8d4fb5d13f	We crashed before when '~' was entered.	2003-11-14 04:50:55 +00:00
dchandler	b59b86fd73	Commented this to mention some recent testing.	2003-11-11 03:45:58 +00:00
dchandler	4023be9612	Better prettyprinting. Untested.	2003-11-11 03:43:26 +00:00
dchandler	4e6a9c299f	ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.	2003-11-11 03:43:11 +00:00
dchandler	2cb90bd231	ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended. The Unicode for ACIP {o} is U+0F37.	2003-11-09 23:15:58 +00:00
dchandler	04816acb74	ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those stacks that use full-form subjoined RA and YA consonants. ACIP {RVA} was converting to the wrong things. The TMW for {RVA} was converting to the wrong ACIP. Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.	2003-11-09 01:07:45 +00:00
dchandler	8193cef5d1	Better comments.	2003-11-09 01:07:07 +00:00
dchandler	3fa417d3ee	phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent.	2003-11-08 21:53:34 +00:00
dchandler	e058d6252e	phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:48:08 +00:00
dchandler	55aaeef9d0	l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:23:50 +00:00
dchandler	06edf17b04	Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 21:17:18 +00:00
dchandler	74d6bc61ab	The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 20:25:16 +00:00
dchandler	a0ae0bf70d	Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.	2003-11-08 17:50:10 +00:00
dchandler	e3f1ed5914	Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.	2003-10-27 13:58:45 +00:00
dchandler	94a43d3f39	Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.	2003-10-26 18:56:48 +00:00
dchandler	5c36dd81d3	Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".	2003-10-26 18:25:25 +00:00
dchandler	e74547d743	GA-YOGS now parses like G-YOGS and GAYOGS do.	2003-10-26 18:06:38 +00:00
dchandler	61cf19932e	ACIP {B5} and {7'} were problematic; that's fixed.	2003-10-26 17:47:35 +00:00
dchandler	ad7b20e485	Added yet more metadata.	2003-10-26 16:05:30 +00:00
dchandler	1550fee41a	Removed garbage.	2003-10-26 16:05:07 +00:00
dchandler	fe33d67573	Added more metadata. There are 35 million+ tsheg bars here.	2003-10-26 15:35:08 +00:00
dchandler	050666d735	I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There is no compelling technical reason, but this way I get to have two commits that are both before and after each other. Freaky.	2003-10-26 06:56:12 +00:00
dchandler	31b3020d07	Added a test case that runs almost all the tsheg bars from all non-reference, publicly available ACIP files (hundreds of megabytes of them) through the converter. The frequencies of these tsheg bars are in the file, too.	2003-10-26 06:02:48 +00:00
dchandler	7ba1ad0735	Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.	2003-10-26 02:42:06 +00:00
dchandler	ef24c608bf	Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically. This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.	2003-10-26 02:17:19 +00:00
dchandler	6bda550157	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:32:55 +00:00
dchandler	d99ae50d8a	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:24:28 +00:00
dchandler	306cf2817c	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:47:34 +00:00
dchandler	f106deb884	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:40:21 +00:00
dchandler	7d24ab393f	Code cleanup.	2003-10-21 03:44:02 +00:00
dchandler	c764eee8d0	Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.	2003-10-21 03:36:57 +00:00
dchandler	2f39921381	Added more test cases.	2003-10-21 02:14:45 +00:00
dchandler	2f81a801ef	Added three new kinds of warnings to ACIP->Tibetan conversions.	2003-10-21 02:00:49 +00:00
dchandler	a47af2c165	Bulletproofing -- code cleanup.	2003-10-21 00:31:10 +00:00
dchandler	188b9c322e	Warn about prefix rules only in Most and All modes.	2003-10-21 00:23:55 +00:00
dchandler	1224030898	Speedup.	2003-10-21 00:19:15 +00:00
dchandler	1d9b405bb8	Forgot to add this file earlier.	2003-10-20 13:49:54 +00:00
dchandler	3aa3859354	ACIP->Unicode crash fixed. 5% of the code for support of ACIP->Unicode.rtf is here.	2003-10-19 22:19:16 +00:00
dchandler	5aab4acc93	I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton. The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place. The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".	2003-10-19 20:48:22 +00:00
dchandler	4b1395e0ba	Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work. Improved some error messages from the ACIP->Tibetan converter.	2003-10-19 20:16:06 +00:00
dchandler	5ce84d4d9a	Tiny code cleanup.	2003-10-19 04:43:34 +00:00

220 commits