Jskad

Author	SHA1	Message	Date
dchandler	63438d243b	getACIP was getting EWTS, not ACIP.	2004-04-17 15:49:40 +00:00
dchandler	de3a19761e	Fixes for javadoc tool.	2004-04-17 15:48:50 +00:00
dchandler	adcf9de952	Two new tests.	2004-04-17 15:14:46 +00:00
dchandler	1bfd3772e6	TMW->ACIP is much improved. V and W were confused, # and * were confused; many glyphs that should have yielded errors were not. I've added a test case that transforms every TMW glyph save the one with no TM mapping to ACIP. I hand-checked that it was correct. ACIP->TMW is fixed for # and *. I never noticed it, but each needed an extra swoosh (U+0F05). Round-tripping would be good, as would testing real-world use of TMW->ACIP.	2004-04-14 05:44:51 +00:00
dchandler	244a9d1370	TiblEdit's diacritics panel now works -- dia.dat has been added to the repository and to TiblEdit's jar.	2004-04-14 05:12:00 +00:00
dchandler	56a02ba41d	Fixed the worst TMW->ACIP bug, the one regarding U+0F04 and U+0F05. TMW->EWTS requires no context information, but TMW->ACIP does.	2004-04-10 18:26:57 +00:00
dchandler	9e7ccf2894	TMW->Unicode conversions have changed; now using U+0F6A for the stacks whose EWTS transliteration begins with "R+". ACIP->* conversions and test baselines were updated to deal with the "r+..."=>"R+..." change.	2004-04-10 16:58:45 +00:00
dchandler	7eca276a62	TMW->Unicode conversions have changed; now using U+0F6A for the stacks whose EWTS transliteration begins with "R+". ACIP->* conversions and test baselines were updated to deal with the "r+..."=>"R+..." change.	2004-04-10 16:03:25 +00:00
dchandler	aff34174ab	The new EWTS rule regarding R, W, and Y requires that these change. It may also require changes to the following, but I'm going to ask if it really should or not. // Y+Y~185,3~~6,98~1,109~6,120~1,123~1,125~6,106~6,113~f61,fbb // Y+r~186,3~~6,99~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb2 // Y+w~187,3~~6,100~1,109~6,120~1,123~1,125~6,106~6,113~f61,fad // Y+s~188,3~~6,101~1,109~6,120~1,123~1,125~6,106~6,113~f61,fb6 // W+y~69,4~~7,79~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb1 // W+r~70,4~~7,80~1,109~8,121~1,123~1,125~8,107~8,114~f5d,fb2 // W+n~195,4~~7,81~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fa3 // W+W~194,4~~7,82~1,109~8,120~1,123~1,125~8,106~8,113~f5d,fba	2004-04-08 02:55:59 +00:00
dchandler	76356f4009	ACIP->Tibetan now gives an error when {?} is seen alone (not in {[?]} or {[*FOO?]}, but alone). Bug 860192 is fixed.	2004-03-15 00:49:01 +00:00
dchandler	542fb50bf1	The ~M and ~M` EWTS change had not fully been made. Someone submitted a bug report 911472 that alerted me to this.	2004-03-07 17:02:35 +00:00
dchandler	e0928d8472	New EWTS for 0F82 and 0F83.	2004-03-06 23:00:40 +00:00
amontano	bb8fa6c58f	Now the clear button in the http servlet version actually clears. Also added "synchronized" to some methods to ensure that concurrent threads don't crash.	2004-03-03 00:33:18 +00:00
dchandler	d436a4d462	Removed David Chapman's recently added line for U+0F82 -- a line for U+0F82 already existed, and the new line had incorrect TM and incorrect TMW mappings. I changed the existing line for U+0F82 to use the EWTS {~M`}.	2004-03-02 04:29:41 +00:00
a1tsal	8eaaeaa202	Fix careless error: I had the same TMW character for ~M and ~M`!	2004-02-22 09:14:56 +00:00
a1tsal	b14833b5b9	Change ^M to ~M to conform to spec. Introduce ~M` (for 0F82).	2004-02-20 15:07:49 +00:00
amontano	e5454d3720	Updated the translation tool to conform to the Personal Profile specification of Java. Before, it would run on Pocket PCs through the more restricted PersonalJava specification, but Sun's project to build a VM for Pocket PCs was terminated. Now it is designed to run under IBM's VM for Pocket PCs, called J9, which implements the Personal Profile specification. That specification also supports AWT, but not Swing, so there is still no (hope for) support of Tibetan script on Pocket PCs.	2004-02-07 18:21:17 +00:00
dchandler	274e1736be	Deleted cut-and-paste goof.	2004-01-17 19:45:31 +00:00
dchandler	c69ba26c60	TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings.	2004-01-17 19:28:54 +00:00
dchandler	48b4c5cb07	Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'.	2004-01-17 17:10:12 +00:00
dchandler	9dd95c5524	I saw this error when I wasn't expecting it, so now, curious, I print more details.	2004-01-17 16:51:33 +00:00
dchandler	4dd40809a5	A user reported that q` caused a crash with TCC keyboard #1 . Fixed. TCC keyboard #1 does not support q~ though.	2003-12-21 06:27:36 +00:00
dchandler	c1aa81e943	RFE 860190: ACIP->Unicode now gives a warning when it outputs something that can't be represented in TMW.	2003-12-16 07:45:40 +00:00
dchandler	848349fd3a	More tests.	2003-12-15 08:16:06 +00:00
dchandler	e7a9e7968f	ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc. ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA. Fixed.	2003-12-15 07:32:14 +00:00
dchandler	e9f7b2dfed	If you want curly brackets around folio markers, you'll have to set the system property thdl.acip.to.x.output.curly.brackets.around.folio.markers to true.	2003-12-14 08:47:03 +00:00
dchandler	8664571577	Warnings were not being detected correctly. Fixed. ACIP->Unicode uses U+0020, ' ', for whitespace. ACIP->TMW uses the TMW whitespace for whitespace.	2003-12-14 08:38:10 +00:00
dchandler	01e65176d4	Using less memory and time to figure out if warnings occurred.	2003-12-14 07:41:15 +00:00
dchandler	76c2e969ac	Fixed ACIP->Unicode bug for YYE etc., things with full-formed subjoined consonants and vowels. Fixed ACIP->TMW for YYA etc., things with full-formed subjoined consonants.	2003-12-14 07:36:21 +00:00
dchandler	f625c937ee	ACIP {B} was not being treated like {BA}; instead, an error was resulting. All the five prefixes were affected.	2003-12-14 05:54:07 +00:00
dchandler	a0e6db11c0	Very minor cleanup.	2003-12-13 21:59:31 +00:00
dchandler	4c30657afa	Adding tests for an ACIP keyboard that will never work correctly, and probably never even be useful. But they were lying around from a while back, so here are the tests.	2003-12-13 21:34:33 +00:00
dchandler	02967539b0	Slightly improved Jskad's internal documentation. Links to converters' docs.	2003-12-10 07:04:35 +00:00
dchandler	581643cf59	{DAN,\nLHAG} used to be treated like {DAN, LHAG} but that got broken. Fixed. Added tests for lexer's handling of ACIP spaces etc.	2003-12-10 06:55:16 +00:00
dchandler	8e673bbc2c	{NGA,} becomes {NGA\u0f0c,} now instead of {NGA\u0f0b,}. Note: ACIP->Unicode for {NGA,} was not giving the Unicode that {NGA\u0f0b,} gives before.	2003-12-10 06:50:14 +00:00
dchandler	a466bad939	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:51:45 +00:00
dchandler	a39c5c12b0	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:15:27 +00:00
dchandler	8f7322a056	Use absolute paths when invoking the external viewer; it doesn't know what our current working directory is.	2003-12-08 06:53:37 +00:00
dchandler	b617f761d5	ACIP->TMW for {^GONG SA } used to fail; fixed.	2003-12-07 20:05:41 +00:00
dchandler	115534e688	ACIP->TMW for {^GONG SA } used to fail because we had \u0F38 in the ToWylie section. Now it's in the <?Input:Numbers?> section because I didn't want to introduce a new section. If WylieWord has trouble due to this misuse of the 'numbers' category, we'll introduce a new category, 'other'. TMW->EWTS improved as a result -- {\u0F38.gonga sa } is produced now where {\u0F38agonga sa } was once produced. Even the better version is imperfect; see bug 855877.	2003-12-07 19:40:59 +00:00
dchandler	597cf408dd	Fixed help message.	2003-12-07 19:10:36 +00:00
dchandler	4adf87c401	Updated comments only.	2003-12-06 20:36:56 +00:00
dchandler	3f18623977	Added comments only.	2003-12-06 20:26:45 +00:00
dchandler	6232ee9170	Added comments referring to a user guide in development now.	2003-12-06 20:26:15 +00:00
dchandler	c43e9a446b	Revamped some ACIP->Tibetan error messages.	2003-12-06 20:19:40 +00:00
dchandler	c9c771d1ee	ACIP {&}, as in {KO&HAm,}, is supported.	2003-11-30 02:18:59 +00:00
dchandler	ac412c994b	Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}.	2003-11-30 02:06:48 +00:00
dchandler	e7c4cc1874	Updated to be in sync with latest EWTS draft.	2003-11-29 22:59:39 +00:00
dchandler	ffd041e32c	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided. Also, we've traded some speed for memory -- DuffCode now uses bytes, not ints.	2003-11-29 22:57:12 +00:00
dchandler	dfaae4be93	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.	2003-11-29 22:56:18 +00:00
dchandler	946d8cbc72	Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.	2003-11-29 16:22:26 +00:00
dchandler	16bfeac641	These issues are non-issues; removing these comments.	2003-11-25 00:31:33 +00:00
dchandler	d3d0ff23a8	Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E.	2003-11-25 00:28:18 +00:00
dchandler	b8608797aa	Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.	2003-11-24 05:59:32 +00:00
dchandler	8d18ac53cb	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate. Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day. Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors. Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.	2003-11-24 05:50:42 +00:00
dchandler	5d053b41fe	Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day. Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors. Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.	2003-11-24 05:49:15 +00:00
dchandler	9a247f5932	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.	2003-11-24 04:55:11 +00:00
dchandler	1ec668c018	Dza is not in the latest EWTS draft.	2003-11-24 04:28:55 +00:00
dchandler	f76c089366	Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty).	2003-11-24 04:05:59 +00:00
dchandler	08c676c186	Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues. Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW. I tested the whole mapping by hand once; this incident shows that automation is very helpful. '{' and '}' were swapped... The Unicode for something was "", not "none". +R, +W, +Y, R+ now in use (though more testing is needed)	2003-11-24 02:40:40 +00:00
dchandler	216c5b0d54	Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).	2003-11-23 01:22:27 +00:00
dchandler	37e8dfa917	The menu now says (Buggy) in front of "Convert Selection from Wylie to Tibetan" because this feature is, you guessed it, buggy.	2003-11-22 22:48:41 +00:00
dchandler	113480a882	X is now better supported, so this changed.	2003-11-15 20:00:59 +00:00
dchandler	8d4fb5d13f	We crashed before when '~' was entered.	2003-11-14 04:50:55 +00:00
dchandler	b59b86fd73	Commented this to mention some recent testing.	2003-11-11 03:45:58 +00:00
dchandler	4023be9612	Better prettyprinting. Untested.	2003-11-11 03:43:26 +00:00
dchandler	4e6a9c299f	ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.	2003-11-11 03:43:11 +00:00
dchandler	2cb90bd231	ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended. The Unicode for ACIP {o} is U+0F37.	2003-11-09 23:15:58 +00:00
dchandler	084e12a02c	Import Wylie is a buggy feature. The menu now calls it "(Buggy) Import Wylie...". t+s+w doesn't even convert correctly! Bug-free EWTS->TMW using the org.thdl.tib.text.ttt codebase will be here soon.	2003-11-09 01:25:58 +00:00
dchandler	04816acb74	ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those stacks that use full-form subjoined RA and YA consonants. ACIP {RVA} was converting to the wrong things. The TMW for {RVA} was converting to the wrong ACIP. Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.	2003-11-09 01:07:45 +00:00
dchandler	8193cef5d1	Better comments.	2003-11-09 01:07:07 +00:00
dchandler	dbd9c80ca0	Special tests for rwa and r+wa, which are the only two different stacks with the same hash key modulo - and +.	2003-11-09 01:06:26 +00:00
dchandler	85e1e0701e	Fixed crashing bug in Import Wylie.	2003-11-08 23:32:53 +00:00
dchandler	8fbd8850f8	New feature: Convert Selection from TMW to ACIP.	2003-11-08 23:22:06 +00:00
dchandler	bab47c4910	There are now extensive tests to make sure that each Tibetan stack in TMW can be typed in using EWTS and correctly converted to TMW and then back to EWTS. These tests unearthed new bugs in the Tibetan! 5.1 docs.	2003-11-08 22:11:24 +00:00
dchandler	3fa417d3ee	phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent.	2003-11-08 21:53:34 +00:00
dchandler	e058d6252e	phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:48:08 +00:00
dchandler	55aaeef9d0	l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:23:50 +00:00
dchandler	06edf17b04	Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 21:17:18 +00:00
dchandler	f626a04d72	Tests t+r+n glyph.	2003-11-08 20:28:34 +00:00
dchandler	74d6bc61ab	The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 20:25:16 +00:00
dchandler	a0ae0bf70d	Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.	2003-11-08 17:50:10 +00:00
dchandler	0ac90d7c0f	Nathanial -> Nathaniel	2003-11-08 03:42:51 +00:00
dchandler	e3f1ed5914	Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.	2003-10-27 13:58:45 +00:00
dchandler	94a43d3f39	Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.	2003-10-26 18:56:48 +00:00
dchandler	5c36dd81d3	Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".	2003-10-26 18:25:25 +00:00
dchandler	e74547d743	GA-YOGS now parses like G-YOGS and GAYOGS do.	2003-10-26 18:06:38 +00:00
dchandler	61cf19932e	ACIP {B5} and {7'} were problematic; that's fixed.	2003-10-26 17:47:35 +00:00
dchandler	ad7b20e485	Added yet more metadata.	2003-10-26 16:05:30 +00:00
dchandler	1550fee41a	Removed garbage.	2003-10-26 16:05:07 +00:00
dchandler	fe33d67573	Added more metadata. There are 35 million+ tsheg bars here.	2003-10-26 15:35:08 +00:00
dchandler	050666d735	I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There is no compelling technical reason, but this way I get to have two commits that are both before and after each other. Freaky.	2003-10-26 06:56:12 +00:00
dchandler	31b3020d07	Added a test case that runs almost all the tsheg bars from all non-reference, publicly available ACIP files (hundreds of megabytes of them) through the converter. The frequencies of these tsheg bars are in the file, too.	2003-10-26 06:02:48 +00:00
dchandler	7ba1ad0735	Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.	2003-10-26 02:42:06 +00:00
dchandler	ef24c608bf	Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically. This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.	2003-10-26 02:17:19 +00:00
dchandler	6bda550157	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:32:55 +00:00
dchandler	d99ae50d8a	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:24:28 +00:00
dchandler	1415fc43e3	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.	2003-10-26 00:21:54 +00:00
dchandler	306cf2817c	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:47:34 +00:00
dchandler	f106deb884	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:40:21 +00:00


485 commits