Jskad

Author	SHA1	Message	Date
amontano	bb8fa6c58f	Now the clear button in the http servlet version actually clears. Also added "synchronized" to some methods to ensure that concurrent threads don't crash.	2004-03-03 00:33:18 +00:00
dchandler	d436a4d462	Removed David Chapman's recently added line for U+0F82 -- a line for U+0F82 already existed, and the new line had incorrect TM and incorrect TMW mappings. I changed the existing line for U+0F82 to use the EWTS {~M`}.	2004-03-02 04:29:41 +00:00
a1tsal	8eaaeaa202	Fix careless error: I had the same TMW character for ~M and ~M`!	2004-02-22 09:14:56 +00:00
a1tsal	b14833b5b9	Change ^M to ~M to conform to spec. Introduce ~M` (for 0F82).	2004-02-20 15:07:49 +00:00
amontano	e5454d3720	Updated the translation tool to conform to the Personal Profile specification of Java. Previously it ran on Pocket PCs through the more restricted PersonalJava specification, but Sun's VM project for Pocket PCs was terminated. It is now designed to run under J9, IBM's VM for Pocket PCs, which implements the Personal Profile specification. That specification also supports AWT but not Swing, so there is still no (hope for) support of Tibetan script on Pocket PCs.	2004-02-07 18:21:17 +00:00
dchandler	274e1736be	Deleted cut-and-paste goof.	2004-01-17 19:45:31 +00:00
dchandler	c69ba26c60	TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings.	2004-01-17 19:28:54 +00:00
dchandler	48b4c5cb07	Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'.	2004-01-17 17:10:12 +00:00
dchandler	6fdb2a26bb	Added a Unicode->ASCII dump for debugging *->Unicode conversions. To use it, use 'java -cp Jskad.jar org.thdl.util.VerboseUnicodeDump'.	2004-01-17 16:52:38 +00:00
dchandler	9dd95c5524	I saw this error when I wasn't expecting it, so now, curious, I print more details.	2004-01-17 16:51:33 +00:00
dchandler	4dd40809a5	A user reported that q` caused a crash with TCC keyboard #1. Fixed. TCC keyboard #1 does not support q~ though.	2003-12-21 06:27:36 +00:00
dchandler	c1aa81e943	RFE 860190: ACIP->Unicode now gives a warning when it outputs something that can't be represented in TMW.	2003-12-16 07:45:40 +00:00
dchandler	848349fd3a	More tests.	2003-12-15 08:16:06 +00:00
dchandler	e7a9e7968f	ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc. ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA. Fixed.	2003-12-15 07:32:14 +00:00
dchandler	e9f7b2dfed	If you want curly brackets around folio markers, you'll have to set the system property thdl.acip.to.x.output.curly.brackets.around.folio.markers to true.	2003-12-14 08:47:03 +00:00
dchandler	8664571577	Warnings were not being detected correctly. Fixed. ACIP->Unicode uses U+0020, ' ', for whitespace. ACIP->TMW uses the TMW whitespace for whitespace.	2003-12-14 08:38:10 +00:00
dchandler	01e65176d4	Using less memory and time to figure out if warnings occurred.	2003-12-14 07:41:15 +00:00
dchandler	76c2e969ac	Fixed ACIP->Unicode bug for YYE etc., things with full-formed subjoined consonants and vowels. Fixed ACIP->TMW for YYA etc., things with full-formed subjoined consonants.	2003-12-14 07:36:21 +00:00
dchandler	f625c937ee	ACIP {B} was not being treated like {BA}; instead, an error was resulting. All the five prefixes were affected.	2003-12-14 05:54:07 +00:00
dchandler	a0e6db11c0	Very minor cleanup.	2003-12-13 21:59:31 +00:00
dchandler	4c30657afa	Adding tests for an ACIP keyboard that will never work correctly, and probably never even be useful. But they were lying around from a while back, so here are the tests.	2003-12-13 21:34:33 +00:00
dchandler	02967539b0	Slightly improved Jskad's internal documentation. Links to converters' docs.	2003-12-10 07:04:35 +00:00
dchandler	581643cf59	{DAN,\nLHAG} used to be treated like {DAN, LHAG} but that got broken. Fixed. Added tests for lexer's handling of ACIP spaces etc.	2003-12-10 06:55:16 +00:00
dchandler	8e673bbc2c	{NGA,} becomes {NGA\u0f0c,} now instead of {NGA\u0f0b,}. Note: ACIP->Unicode for {NGA,} was not giving the Unicode that {NGA\u0f0b,} gives before.	2003-12-10 06:50:14 +00:00
dchandler	a466bad939	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:51:45 +00:00
dchandler	a39c5c12b0	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:15:27 +00:00
dchandler	8f7322a056	Use absolute paths when invoking the external viewer; it doesn't know what our current working directory is.	2003-12-08 06:53:37 +00:00
dchandler	b617f761d5	ACIP->TMW for {^GONG SA } used to fail; fixed.	2003-12-07 20:05:41 +00:00
dchandler	115534e688	ACIP->TMW for {^GONG SA } used to fail because we had \u0F38 in the ToWylie section. Now it's in the <?Input:Numbers?> section because I didn't want to introduce a new section. If WylieWord has trouble due to this misuse of the 'numbers' category, we'll introduce a new category, 'other'. TMW->EWTS improved as a result -- {\u0F38.gonga sa } is produced now where {\u0F38agonga sa } was once produced. Even the better version is imperfect; see bug 855877.	2003-12-07 19:40:59 +00:00
dchandler	597cf408dd	Fixed help message.	2003-12-07 19:10:36 +00:00
dchandler	4adf87c401	Updated comments only.	2003-12-06 20:36:56 +00:00
dchandler	3f18623977	Added comments only.	2003-12-06 20:26:45 +00:00
dchandler	6232ee9170	Added comments referring to a user guide in development now.	2003-12-06 20:26:15 +00:00
dchandler	c43e9a446b	Revamped some ACIP->Tibetan error messages.	2003-12-06 20:19:40 +00:00
dchandler	c9c771d1ee	ACIP {&}, as in {KO&HAm,}, is supported.	2003-11-30 02:18:59 +00:00
dchandler	ac412c994b	Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}.	2003-11-30 02:06:48 +00:00
dchandler	e7c4cc1874	Updated to be in sync with latest EWTS draft.	2003-11-29 22:59:39 +00:00
dchandler	ffd041e32c	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided. Also, we've traded some speed for memory -- DuffCode now uses bytes, not ints.	2003-11-29 22:57:12 +00:00
dchandler	dfaae4be93	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.	2003-11-29 22:56:18 +00:00
dchandler	946d8cbc72	Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.	2003-11-29 16:22:26 +00:00
dchandler	16bfeac641	These issues are non-issues; removing these comments.	2003-11-25 00:31:33 +00:00
dchandler	d3d0ff23a8	Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E.	2003-11-25 00:28:18 +00:00
dchandler	b8608797aa	Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.	2003-11-24 05:59:32 +00:00
dchandler	8d18ac53cb	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate. Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day. Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors. Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.	2003-11-24 05:50:42 +00:00
dchandler	5d053b41fe	Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day. Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors. Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.	2003-11-24 05:49:15 +00:00
dchandler	9a247f5932	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.	2003-11-24 04:55:11 +00:00
dchandler	1ec668c018	Dza is not in the latest EWTS draft.	2003-11-24 04:28:55 +00:00
dchandler	f76c089366	Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty).	2003-11-24 04:05:59 +00:00
dchandler	08c676c186	Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues. Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW. I tested the whole mapping by hand once; this incident shows that automation is very helpful. '{' and '}' were swapped... The Unicode for something was "", not "none". +R, +W, +Y, R+ now in use (though more testing is needed)	2003-11-24 02:40:40 +00:00
dchandler	216c5b0d54	Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).	2003-11-23 01:22:27 +00:00
dchandler	37e8dfa917	The menu now says (Buggy) in front of "Convert Selection from Wylie to Tibetan" because this feature is, you guessed it, buggy.	2003-11-22 22:48:41 +00:00
dchandler	113480a882	X is now better supported, so this changed.	2003-11-15 20:00:59 +00:00
dchandler	8d4fb5d13f	We crashed before when '~' was entered.	2003-11-14 04:50:55 +00:00
dchandler	b59b86fd73	Commented this to mention some recent testing.	2003-11-11 03:45:58 +00:00
dchandler	4023be9612	Better prettyprinting. Untested.	2003-11-11 03:43:26 +00:00
dchandler	4e6a9c299f	ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.	2003-11-11 03:43:11 +00:00
dchandler	2cb90bd231	ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended. The Unicode for ACIP {o} is U+0F37.	2003-11-09 23:15:58 +00:00
dchandler	084e12a02c	Import Wylie is a buggy feature. The menu now calls it "(Buggy) Import Wylie...". t+s+w doesn't even convert correctly! Bug-free EWTS->TMW using the org.thdl.tib.text.ttt codebase will be here soon.	2003-11-09 01:25:58 +00:00
dchandler	04816acb74	ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those stacks that use full-form subjoined RA and YA consonants. ACIP {RVA} was converting to the wrong things. The TMW for {RVA} was converting to the wrong ACIP. Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.	2003-11-09 01:07:45 +00:00
dchandler	8193cef5d1	Better comments.	2003-11-09 01:07:07 +00:00
dchandler	dbd9c80ca0	Special tests for rwa and r+wa, which are the only two different stacks with the same hash key modulo - and +.	2003-11-09 01:06:26 +00:00
dchandler	85e1e0701e	Fixed crashing bug in Import Wylie.	2003-11-08 23:32:53 +00:00
dchandler	8fbd8850f8	New feature: Convert Selection from TMW to ACIP.	2003-11-08 23:22:06 +00:00
dchandler	bab47c4910	There are now extensive tests to make sure that each Tibetan stack in TMW can be typed in using EWTS and correctly converted to TMW and then back to EWTS. These tests unearthed new bugs in the Tibetan! 5.1 docs.	2003-11-08 22:11:24 +00:00
dchandler	3fa417d3ee	phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent.	2003-11-08 21:53:34 +00:00
dchandler	e058d6252e	phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:48:08 +00:00
dchandler	55aaeef9d0	l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.	2003-11-08 21:23:50 +00:00
dchandler	06edf17b04	Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 21:17:18 +00:00
dchandler	f626a04d72	Tests t+r+n glyph.	2003-11-08 20:28:34 +00:00
dchandler	74d6bc61ab	The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.	2003-11-08 20:25:16 +00:00
dchandler	a0ae0bf70d	Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.	2003-11-08 17:50:10 +00:00
dchandler	0ac90d7c0f	Nathanial -> Nathaniel	2003-11-08 03:42:51 +00:00
dchandler	e3f1ed5914	Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.	2003-10-27 13:58:45 +00:00
dchandler	94a43d3f39	Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.	2003-10-26 18:56:48 +00:00
dchandler	5c36dd81d3	Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".	2003-10-26 18:25:25 +00:00
dchandler	e74547d743	GA-YOGS now parses like G-YOGS and GAYOGS do.	2003-10-26 18:06:38 +00:00
dchandler	61cf19932e	ACIP {B5} and {7'} were problematic; that's fixed.	2003-10-26 17:47:35 +00:00
dchandler	ad7b20e485	Added yet more metadata.	2003-10-26 16:05:30 +00:00
dchandler	1550fee41a	Removed garbage.	2003-10-26 16:05:07 +00:00
dchandler	fe33d67573	Added more metadata. There are 35 million+ tsheg bars here.	2003-10-26 15:35:08 +00:00
dchandler	050666d735	I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There is no compelling technical reason, but this way I get to have two commits that are both before and after each other. Freaky.	2003-10-26 06:56:12 +00:00
dchandler	31b3020d07	Added a test case that runs almost all the tsheg bars from all non-reference, publicly available ACIP files (hundreds of megabytes of them) through the converter. The frequencies of these tsheg bars are in the file, too.	2003-10-26 06:02:48 +00:00
dchandler	7ba1ad0735	Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.	2003-10-26 02:42:06 +00:00
dchandler	ef24c608bf	Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically. This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.	2003-10-26 02:17:19 +00:00
dchandler	6bda550157	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:32:55 +00:00
dchandler	d99ae50d8a	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question. Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).	2003-10-26 00:24:28 +00:00
dchandler	1415fc43e3	The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.	2003-10-26 00:21:54 +00:00
dchandler	306cf2817c	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:47:34 +00:00
dchandler	f106deb884	Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone. Added a few new tests.	2003-10-25 21:40:21 +00:00
dchandler	af013a6a39	I renamed this function a while ago.	2003-10-22 02:49:16 +00:00
dchandler	7d24ab393f	Code cleanup.	2003-10-21 03:44:02 +00:00
dchandler	c764eee8d0	Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.	2003-10-21 03:36:57 +00:00
dchandler	2f39921381	Added more test cases.	2003-10-21 02:14:45 +00:00
dchandler	2f81a801ef	Added three new kinds of warnings to ACIP->Tibetan conversions.	2003-10-21 02:00:49 +00:00
dchandler	a47af2c165	Bulletproofing -- code cleanup.	2003-10-21 00:31:10 +00:00
dchandler	188b9c322e	Warn about prefix rules only in Most and All modes.	2003-10-21 00:23:55 +00:00
dchandler	1224030898	Speedup.	2003-10-21 00:19:15 +00:00
dchandler	1d9b405bb8	Forgot to add this file earlier.	2003-10-20 13:49:54 +00:00
dchandler	5d9305c9d5	"Browse..." buttons are smart about file types now.	2003-10-19 23:17:25 +00:00
dchandler	3aa3859354	ACIP->Unicode crash fixed. 5% of the code for support of ACIP->Unicode.rtf is here.	2003-10-19 22:19:16 +00:00
dchandler	5aab4acc93	I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton. The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place. The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".	2003-10-19 20:48:22 +00:00
dchandler	4b1395e0ba	Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work. Improved some error messages from the ACIP->Tibetan converter.	2003-10-19 20:16:06 +00:00
dchandler	5ce84d4d9a	Tiny code cleanup.	2003-10-19 04:43:34 +00:00
dchandler	0edebd55d7	We were dying in the "can ts+h take a ga prefix?" check for GTZHAN.	2003-10-19 03:47:33 +00:00
dchandler	47648186b4	Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.	2003-10-18 18:34:49 +00:00
dchandler	e5534f69ee	Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.	2003-10-18 18:29:46 +00:00
dchandler	557ed7ed44	DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are.	2003-10-18 17:49:29 +00:00
dchandler	e799438f86	CVS ignoring backup files.	2003-10-18 17:47:56 +00:00
dchandler	3b55ea509f	Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar." I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.	2003-10-18 05:48:53 +00:00
dchandler	f28bee4c71	The appendage 'um is here too.	2003-10-18 05:10:49 +00:00
dchandler	8c99adeb63	TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following: 'e (used in foreign transliteration), 'ongs, 'is, 'os, 'ur, 'us, and 'ung.	2003-10-18 03:04:47 +00:00
dchandler	5e18feb47d	ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things. ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.	2003-10-16 04:15:10 +00:00
dchandler	5f4fbfab7c	Bulletproofing and debugging support.	2003-10-16 04:13:14 +00:00
dchandler	129ebccd67	In TCC #1 keyboard, h>cj now works. I may have fixed this in a terrible way, breaking other things even. Hard to say because I don't really understand the code I changed. But DuffPaneTest passes. If we ever clean up the keyboards, the changes made here to tcc_keyboard.ini should probably be undone.	2003-10-12 18:16:17 +00:00
dchandler	d7fdacfcdc	Open menu is now Open..., Save as is now Save as...	2003-10-12 18:12:19 +00:00
dchandler	8dbfff17e1	All .rtf and .Rtf and .RTF files are selectable now.	2003-10-12 18:11:50 +00:00
dchandler	35209ce7fd	I'm going to have to debug this, and the tab stops make the source unreadable. I don't like messing with whitespace, but it seems like I'll be the main maintainer for a while, and the people after me can use cvs diff -wb. So I'm untabifying.	2003-10-12 16:44:28 +00:00
dchandler	749b8d6727	Added toString for debugging.	2003-10-04 16:33:47 +00:00
dchandler	b983af8031	r-t, not rt. This was why converting 'brtul' from TMW to Wylie didn't work.	2003-10-04 16:33:23 +00:00
dchandler	6a11eddb1e	Warning level "None" wasn't working.	2003-10-04 16:12:48 +00:00
dchandler	b10098cc61	"Most" warnings now exclude "the last stack has no vowel", making that level much more useful.	2003-10-04 15:10:18 +00:00
dchandler	ee50291ed4	Andres found that "THAG PA" caused a NullPointerException. That's fixed. Renamed ACIPString to TString -- we'll use this for EWTS and ACIP both. TMW->ACIP for TMW9.61 should work now.	2003-10-04 01:22:59 +00:00
amontano	c8927b827c	Fixed bugs in the scanner. Added reference to yogacara bhumi in the about window.	2003-09-23 19:05:23 +00:00
amontano	e89c49651c	Now translation tool accepts synonyms separated by ';' in the entry field.	2003-09-14 05:56:20 +00:00
dchandler	115d0e0e6c	Fixed ACIP->TMW vowels like 'I etc. Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan. The ACIP->EWTS subroutine has improved. TMW->Wylie and TMW->ACIP are improved in error cases. TMW->ACIP has friendly embedded error messages now.	2003-09-12 05:06:37 +00:00
dchandler	16817d0b8e	Fixed Javadocs.	2003-09-10 01:19:05 +00:00
amontano	cc853be387	Fixed a bug with regards to the word order in the servlet version.	2003-09-09 16:02:03 +00:00
amontano	1467f9cd3f	Fixed display of servlet version and added option to include links to other versions. See http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on	2003-09-08 21:32:40 +00:00
amontano	73d01111ca	Fixed the "clicking on the translate button makes the thdl menu go away" error on the servlet version of the translation tool.	2003-09-08 16:39:18 +00:00
amontano	07fbbcaf45	Solved some sorting errors with the servlet version. Also if the service parameter thdlBanner=anything is sent, the THDL's java script menu is displayed (if it is running on the thdl server). There is still a bug. Menu goes away when pressing "translate" button. See: http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on	2003-09-08 08:12:56 +00:00
dchandler	e42d76b3b8	Nicer default Latin font for ACIP->* conversions. Performance improvement in non-color-coding mode.	2003-09-07 22:08:35 +00:00
dchandler	6872ea8028	Corrected the usage info.	2003-09-07 22:08:00 +00:00
dchandler	d8657abd44	ACIP font shrinking as in {KA (GA)} is now supported.	2003-09-07 18:30:59 +00:00
dchandler	07e360d9a8	The ACIP {NYA%} is supported. {NYAo} and {NYAx} are confusing to me, because I don't know which glyphs o and x correspond to. For that reason, they cause ERRORs. The proposed THDL Extended Wylie ~X and X are now used for U+0F35 and U+0F37 respectively.	2003-09-07 16:19:50 +00:00
amontano	f57cdda867	The translation tool now displays what it is connected to.	2003-09-07 03:40:51 +00:00
amontano	b489034598	Fixed a call to a deprecated method	2003-09-07 03:39:08 +00:00
dchandler	0d6d6ed611	Added GUI support for color-coding. Added support for color-coding and choosing the warning level to TibetanConverter. Better error checking in the GUI converter.	2003-09-06 22:56:10 +00:00
dchandler	1308f14807	sanskrit=green, prefix-rule-afflicted-tsheg-bar=yellow	2003-09-05 06:05:46 +00:00
dchandler	899b042ec0	Preliminary, untested color support in ACIP->TMW conversion.	2003-09-05 05:54:35 +00:00
dchandler	717c3b94f3	Fixed ACIP->Unicode spaces/tshegs and newlines, especially with shads. "NGA," becomes "NGA-tsheg-," automatically now.	2003-09-05 05:08:47 +00:00
dchandler	5c240ac072	From the converter GUI, you can now choose TMW->ACIP text and TMW->Wylie text. All the conversions show you which format they take as input and which format they give as output. File filter for ACIP files added. The GUI converter suggests a file extension wisely. Fixed newline bug in ACIP->Unicode converter.	2003-09-05 02:05:34 +00:00
dchandler	4abbf6db37	--to-acip-text and --to-wylie-text added; these get you text files, not RTF files like --to-acip and --to-wylie do. The GUI converter doesn't yet allow you to get text files.	2003-09-04 05:16:47 +00:00
dchandler	cc615f34df	ACIP->TMW and ACIP->Unicode have my pre-stamp of non-approval. Except for {NYAx} and {NYAo}, they're as good as I'll get them without input from experts or the employ of a complementary, syllabary-based approach.	2003-09-04 04:34:18 +00:00
dchandler	ae7a7577bc	ACIP->TMW and ACIP->Unicode are now smart about when a newline is really a newline and when a space is really a tsheg. The space in {KA ,MDO} is a tsheg, but the space in {GA ,MDO} is not.	2003-09-04 04:13:01 +00:00
dchandler	d2749cecd0	ACIP->TMW and ACIP->Unicode are now smart about when a newline is really a newline and when a space is really a tsheg. The space in {KA ,MDO} is a tsheg, but the space in {GA ,MDO} is not.	2003-09-04 04:04:21 +00:00
dchandler	72e531e515	Use shortened 'dreng-bu, not regular. As per TM glyphs. I suspect that the following would look better with shortened 'dreng-bu also, but I'm sticking with the TM/TMW docs: dz+r~137,2~~4,46~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fb2 dz+w~138,2~~4,47~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fad dz+h~139,2~~4,48~1,110~4,120~1,123~1,126~4,106~4,113~0F5C dz+h+y~140,2~~4,49~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb1 dz+h+r~141,2~~4,50~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb2 dz+h+l~249,2~~4,51~1,110~4,123~1,123~1,126~4,110~4,117~f5c,fb3 dz+h+w~143,2~~4,52~1,110~4,122~1,123~1,126~4,108~4,115~f5c,fad	2003-09-04 03:46:35 +00:00
a1tsal	2f58ec2760	A bunch of Sanskrit stacks of the form ts+... and dz+...had 1,125 for their drengbu, but that is actually a naro. I changed it to 1,123 (which is one of the two drengbus).	2003-09-04 02:06:58 +00:00
dchandler	316f59107b	A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation.	2003-09-02 06:39:33 +00:00
dchandler	cc9ab06864	Added utility routine. Better comments.	2003-08-31 20:38:28 +00:00
dchandler	045c4069c9	Preliminary ACIP->TMW support is in place. {DU} gives you something less beautiful than what Jskad would give, so more work is needed.	2003-08-31 16:06:35 +00:00
a1tsal	1f4d53be2e	Moved ^M to punctuation section. Removed obsolete comment.	2003-08-31 00:44:23 +00:00
a1tsal	522812996e	Remove unused sections of tibwn.ini.	2003-08-31 00:34:15 +00:00
dchandler	dd22e161a5	Code cleanup for Jskad's Tibetan font converter GUI.	2003-08-30 05:01:15 +00:00
dchandler	896344f2d1	David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW mappings, so I've put them back, but with the EWTS non-correspondences \tmwXYYY. Jskad no longer supports superscribed or subscribed numerals, because EWTS does not.	2003-08-26 01:28:02 +00:00
a1tsal	ccdebf6719	Removed half numbers (no longer in EWTS). Brought <?Other?> closer to EWTS. Removed __TILDE__ (no longer in EWTS). Changed M^ to ^M per new EWTS draft. Added ai, au, -i from WW tibwn.ini -- they were missing in this version.	2003-08-25 23:19:48 +00:00
dchandler	1982c5847b	Jskad's converter now has ACIP-to-Unicode built in. There are known bugs; it is pre-alpha. It's usable, though, and finds tons of errors in ACIP input files, with the user deciding just how pedantic to be. The biggest outstanding bug is the silent one: treating { }, space, as tsheg instead of whitespace when we ought to know better.	2003-08-24 06:40:53 +00:00
dchandler	d5ad760230	TMW->Wylie conversion now takes advantage of prefix rules, the rules that say "ya can take a ga prefix" etc. The ACIP->Unicode converter now gives warnings (optionally, and by default, inline). This converter now produces output even when lexical errors occur, but the output has errors and warnings inline.	2003-08-23 22:03:37 +00:00
dchandler	21ef657921	I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc.	2003-08-22 05:13:32 +00:00
dchandler	1afb3a0fdd	ACIP->Unicode, without going through TMW, is now possible, so long as \, the Sanskrit virama, is not used. Of the 1370-odd ACIP texts I've got here, about 57% make it through the gauntlet (fewer if you demand a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).	2003-08-18 02:38:54 +00:00
dchandler	245aac4911	I'm now stricter about accepting alphabetic characters. F, Q, X, a, b, c, d, e, ... do not belong in ACIP, so the scanner rejects them. This should make it even easier to distinguish automatically between Tibetan and English texts.	2003-08-17 02:38:58 +00:00
dchandler	39451d8879	Fixed a couple of small bugs. Only 250 errors are reported now; this is important if you try to convert an English document.	2003-08-17 02:12:49 +00:00
dchandler	4581a2d8ab	Improved the ACIP scanner (the part of the converter that says, "This is a correction, that's a comment, this is Tibetan, that's Latin (English), that's Tibetan inter-tsheg-bar punctuation, etc."). It now accepts more real-world ACIP files, i.e. it handles illegal constructs. The error checking is more user-friendly. There are now tests. Added some tsheg bars that Peter E. Hauer of Linguasoft sent me to the tests. Many thanks, Peter. I still need to implement rules that say, "This is not Tibetan, it must be Sanskrit, because that letter doesn't take a MA prefix."	2003-08-17 01:45:55 +00:00
dchandler	0b91ed0beb	I've improved the ACIP tsheg bar scanner to handle a lot of illegal constructions that occur in practice.	2003-08-16 16:13:53 +00:00
amontano	2a57439516	Updated the info displayed on the about window.	2003-08-14 14:16:49 +00:00
amontano	da384c6c2f	Now when loading, takes the default font options from the DuffPane.	2003-08-14 14:16:23 +00:00
dchandler	2b59d9838d	I now have a function that takes as input a String of ACIP and breaks up that String into tsheg bars, punctuation, etc., while finding errors. I've tested it some, but I'm not yet committing the tests. Next step: a converter that takes an ACIP file as input and outputs TMW+Latin.	2003-08-14 05:10:47 +00:00
dchandler	57f506384f	The ACIP->Tibetan converter now has perfect low-level functionality, and it has the capability to produce error messages and warnings that make sense to the user. One can now get the correct parse, if one exists, for an ACIP tsheg bar. One could even feed in ACIP and get a list of warnings about things as innocuous as PADMA, which a dumb converter would have trouble with. One could then turn ACIP into well-behaved ACIP for that dumb converter, if you really wanted to. Still to do: (1) Scan ACIP files into tsheg bars. (2) Produce TMW/Latin (from which you can get Unicode, etc.). (3) E-mail the illegal tsheg bars to the ACIP fellows so they can fix the affected documents (most of the Kangyur has unparseable creatures).	2003-08-12 04:13:11 +00:00
dchandler	87266646fb	Removed misinformation.	2003-08-10 19:33:01 +00:00
dchandler	e21d3774a9	Added an unfinished ACIP->Tibetan converter. Once it works properly for ACIP, it'll easily be made to work as a perfect EWTS Wylie->Tibetan converter. It has an extensive suite of tests for the existing functionality.	2003-08-10 19:30:07 +00:00
dchandler	39e0435b6b	Refactored this code so that Wylie->Tibetan and ACIP->Tibetan conversions can make use of it. Hooray for reuse.	2003-08-10 19:02:56 +00:00
dchandler	bcf1c12b6a	We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie. Our disambiguation is now perfect, happening when and only when it is necessary. These are all illegal, so it shouldn't affect many existing conversions. But if there were typos, it could.	2003-08-10 18:46:01 +00:00
dchandler	9093fd3c05	We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie. Our disambiguation is now perfect, happening when and only when it is necessary. These are all illegal, so it shouldn't affect many existing conversions. But if there were typos, it could.	2003-08-10 18:38:20 +00:00
dchandler	251d8feae5	brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.	2003-08-09 17:48:40 +00:00
dchandler	7dffc47cb7	'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came across this one, so we've added it to the list of ambiguous three-consonant combos.	2003-08-09 17:05:43 +00:00
amontano	52cdc17794	Added support for multiple keyboards and ability to set the preferences for size of tibetan font and type and size of roman font.	2003-08-09 08:00:58 +00:00
amontano	8e4b508de8	Made a new class for the preference window so that other software (i.e. the translation tool) can re-use that same code to set up the attributes of the tibetan and roman fonts.	2003-08-09 07:57:21 +00:00
amontano	ef0df405d9	Redesigned the interface of the handheld version.	2003-08-03 06:29:08 +00:00
amontano	2b5a5fe67a	Got rid of redundant code	2003-08-03 06:28:22 +00:00
amontano	cce779bf88	Added a wizard window to avoid using the command line as much as possible. After starting the application by clicking on it, one can use the wizard to connect to the available on-line dicts, open a local dict, or generate a dict database.	2003-08-03 06:27:30 +00:00
dchandler	4caeafa1b1	You shouldn't have one of these without the other, now that there are two. This way neither TM nor TMW fonts will be loaded.	2003-07-26 00:55:32 +00:00
dchandler	2bb499e5a7	This was dying with a NullPointerException when you started it up using 'ant tt-run' with no dictionary. Now it starts up and shows you a nice error message, "Dictionary could not be loaded!", instead.	2003-07-26 00:53:59 +00:00
dchandler	e198519c5f	Jskad now supports EWTS ~, i.e. TMW8.91.	2003-07-25 02:35:31 +00:00
amontano	5df9b5b91a	now supports sorting	2003-07-25 01:43:58 +00:00
amontano	97f5fe91b3	when invalid wylie is encountered, instead of displaying a message it raises an exception.	2003-07-25 01:43:18 +00:00
amontano	7cdbf33333	changed it to support 30 dictionaries (instead of just 15)	2003-07-25 01:42:17 +00:00
amontano	7b04d7bca5	changed the "about" info	2003-07-25 01:41:30 +00:00
dchandler	a7f0c35738	Added a test for ts.ha vs. tsha ambiguity; there is no ambiguity.	2003-07-18 03:51:29 +00:00
dchandler	dc454b8c0c	More test cases related to the following: The Tibetan d.za was being converted into the Wylie dza incorrectly. This is a rare case, but I want TMW->Wylie to be perfectly unambiguous.	2003-07-18 02:31:02 +00:00
dchandler	f8c959bfb0	The Tibetan d.za was being converted into the Wylie dza incorrectly. This is a rare case, but I want TMW->Wylie to be perfectly unambiguous.	2003-07-18 00:30:27 +00:00
dchandler	1c29566aee	I'm now using the Unix diff built in to Apache Jakarta Commons JRCS (which I found on suigeneris.org, not apache.org) in order to bulletproof the Tibetan Converter tests. They used to fail due to nondeterminism in the Java RTF writer; they should no longer fail. I've also changed it so that the Tibetan Converter tests run in headless mode, which means that they'll run on the nightly builds server.	2003-07-14 12:26:26 +00:00
dchandler	06fb77a82b	Initial revision	2003-07-14 12:22:29 +00:00
dchandler	f900154e7a	Tests disambiguation in TMW->Wylie conversion.	2003-07-14 12:21:02 +00:00
dchandler	0622ac5062	Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>, or <?Numbers?> commands; it instead hard-codes the appropriate comma-delimited lists. This is cleaner because WylieWord and Jskad had different values for these lists.	2003-07-14 12:19:46 +00:00
dchandler	fb85f6e8ce	Fix comment.	2003-07-14 12:17:04 +00:00
dchandler	79b3b97326	Remove warning message from menu item.	2003-07-13 23:19:11 +00:00
dchandler	c986684beb	Updated help to talk about new features.	2003-07-13 22:51:35 +00:00
dchandler	f695b1a6c1	Updated baselines because conversions have improved since the last update.	2003-07-13 19:14:41 +00:00
dchandler	d10f97fc06	Disambiguation was not being used appropriately. This makes previous TMW->Wylie conversions with the new-and-improved TMW->Wylie algorithm faulty. Now I'm using it a little more than you need to, e.g. b.lha instead of blha is generated because bla and b.la are ambiguous.	2003-07-13 19:14:15 +00:00
dchandler	96afae795c	Disambiguation was not being used appropriately. This makes previous TMW->Wylie conversions with the new-and-improved TMW->Wylie algorithm faulty. Now I'm using it a little more than you need to, e.g. b.lha instead of blha is generated because bla and b.la are ambiguous.	2003-07-13 18:46:29 +00:00
dchandler	802e0cb588	If this method uses the Wylie representation, you get an infinite recursion when you do a TMW->Wylie conversion for a document with glyphs that have no known Wylie.	2003-07-13 17:40:02 +00:00

647 commits