Jskad

Author	SHA1	Message	Date
dchandler	a39c5c12b0	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:15:27 +00:00
dchandler	4adf87c401	Updated comments only.	2003-12-06 20:36:56 +00:00
dchandler	ffd041e32c	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided. Also, we've traded some speed for memory -- DuffCode now uses bytes, not ints.	2003-11-29 22:57:12 +00:00
dchandler	9a247f5932	N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.	2003-11-24 04:55:11 +00:00
dchandler	1ec668c018	Dza is not in the latest EWTS draft.	2003-11-24 04:28:55 +00:00
dchandler	216c5b0d54	Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).	2003-11-23 01:22:27 +00:00
dchandler	8d4fb5d13f	We crashed before when '~' was entered.	2003-11-14 04:50:55 +00:00
dchandler	f28bee4c71	The appendage 'um is here too.	2003-10-18 05:10:49 +00:00
dchandler	8c99adeb63	TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following: 'e (used in foreign transliteration), 'ongs, 'is, 'os, 'ur, 'us, and 'ung.	2003-10-18 03:04:47 +00:00
dchandler	5f4fbfab7c	Bulletproofing and debugging support.	2003-10-16 04:13:14 +00:00
dchandler	115d0e0e6c	Fixed ACIP->TMW vowels like 'I etc. Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan. The ACIP->EWTS subroutine has improved. TMW->Wylie and TMW->ACIP are improved in error cases. TMW->ACIP has friendly embedded error messages now.	2003-09-12 05:06:37 +00:00
dchandler	16817d0b8e	Fixed Javadocs.	2003-09-10 01:19:05 +00:00
dchandler	717c3b94f3	Fixed ACIP->Unicode spaces/tshegs and newlines, especially with shads. "NGA," becomes "NGA-tsheg-," automatically now.	2003-09-05 05:08:47 +00:00
dchandler	316f59107b	A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation.	2003-09-02 06:39:33 +00:00
dchandler	896344f2d1	David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW mappings, so I've put them back, but with the EWTS non-correspondences \tmwXYYY. Jskad no longer supports superscribed or subscribed numerals, because EWTS does not.	2003-08-26 01:28:02 +00:00
dchandler	d5ad760230	TMW->Wylie conversion now takes advantage of prefix rules, the rules that say "ya can take a ga prefix" etc. The ACIP->Unicode converter now gives warnings (optionally, and by default, inline). This converter now produces output even when lexical errors occur, but the output has errors and warnings inline.	2003-08-23 22:03:37 +00:00
dchandler	87266646fb	Removed misinformation.	2003-08-10 19:33:01 +00:00
dchandler	39e0435b6b	Refactored this code so that Wylie->Tibetan and ACIP->Tibetan conversions can make use of it. Hooray for reuse.	2003-08-10 19:02:56 +00:00
dchandler	9093fd3c05	We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie. Our disambiguation is now perfect, happening when and only when it is necessary. These are all illegal, so it shouldn't affect many existing conversions. But if there were typos, it could.	2003-08-10 18:38:20 +00:00
dchandler	251d8feae5	brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.	2003-08-09 17:48:40 +00:00
dchandler	e198519c5f	Jskad now supports EWTS ~, i.e. TMW8.91.	2003-07-25 02:35:31 +00:00
dchandler	f8c959bfb0	The Tibetan d.za was being converted into the Wylie dza incorrectly. This is a rare case, but I want TMW->Wylie to be perfectly unambiguous.	2003-07-18 00:30:27 +00:00
dchandler	0622ac5062	Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>, or <?Numbers?> commands; it instead hard-codes the appropriate comma-delimited lists. This is cleaner because WylieWord and Jskad had different values for these lists.	2003-07-14 12:19:46 +00:00
dchandler	96afae795c	Disambiguation was not being used appropriately. This makes previous TMW->Wylie conversions with the new-and-improved TMW->Wylie algorithm faulty. Now I'm using it a little more than you need to, e.g. b.lha instead of blha is generated because bla and b.la are ambiguous.	2003-07-13 18:46:29 +00:00
dchandler	6677d1e245	Code cleanup.	2003-07-13 16:53:03 +00:00
dchandler	85176cd9f3	Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF files, not for Mac RTF files. So if you want a TM->TMW conversion to work, use MS Word for Windows, not for the Mac.	2003-07-11 13:30:22 +00:00
dchandler	d726bc0258	A couple of changes to TMW->Unicode thanks to Than's reply to my questions.	2003-07-09 01:44:15 +00:00
dchandler	02558a1d78	Jskad supports <7, >8, etc. again; it no longer supports the punctuation '<' and '>'. The current keyboard implementation makes this an either-or proposition, when fundamentally it need not be. Added a <?Numbers?> command and an <?Input:Numbers?> command to tibwn.ini; broke the numbers apart from the consonants. This facilitates the new-and-improved Tibetan->Wylie conversion. Tibetan->Wylie is now done by forming legal tsheg-bars. A legal tsheg-bar is converted into perfect THDL Wylie. See code comments to learn what it thinks is a legal tsheg-bar, but it includes, e.g., bskyUMbsH minus the trailing punctuation (H). Illegal sequences, such as runs of transliterated Sanskrit, are turned into unambiguous Wylie; each glyph is followed by a vowel or a disambiguator ('.'). I've made it so that the illegal sequences are as beautiful as possible. You get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.	2003-07-08 14:30:17 +00:00
dchandler	a463b686b3	Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts by default, not just TMW. Thus users need not install these fonts on their systems.	2003-07-05 18:00:29 +00:00
dchandler	a48ec641d5	Better error messages in TMW->Wylie conversions. The user knows what's up.	2003-07-01 03:43:33 +00:00
dchandler	229536884f	I've validated by hand the TM<->TMW mappings. A few things changed, so no previous TM->TMW or TMW->TM conversions can be trusted.	2003-06-30 02:24:11 +00:00
dchandler	3f76c3692d	Fixed Javadoc warnings.	2003-06-29 15:37:35 +00:00
dchandler	7938648ca8	TM->TMW conversion has no known bugs. Oddballs have been comprehensively handled.	2003-06-29 03:03:07 +00:00
dchandler	4e279defb4	Fixed a couple of array bounds checks. Added support for two more oddballs. Deprecated the oddball lookup method because it drops up to 30 glyphs in TibetanMachine. The correct solution is to transform the RTF before Java's busted RTF readers ever see it. \'97 becomes \u151, e.g.	2003-06-28 16:33:58 +00:00
dchandler	f547734043	Added Than's converter GUI code; adapted it to work with Jskad's converters. TMW->Unicode now uses Ximalaya by default.	2003-06-24 03:02:29 +00:00
dchandler	917864574c	Fixed a logic bug in mapTMWtoTM and mapTMtoTMW. You can now specify which Unicode font to use via 'java -Dthdl.tmw.to.unicode.font=Ximalaya ...'.	2003-06-23 01:58:11 +00:00
dchandler	1f4343bed0	TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2) orders of magnitude faster.	2003-06-22 22:10:58 +00:00
dchandler	da70434e52	Jskad now allows for TMW->Unicode conversion.	2003-06-15 16:27:36 +00:00
dchandler	af5b95b08d	A TMW->Unicode table is here. Note these issues, however: Is the EWTS '_' to be represented as U+0020, or is it a wider space? Does TMW9.42, Dza, map to U+0F5F,U+0F39? Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB? (Likewise with r+w, TMW6.61, TMW6.62, etc.) Is U+0F7E a bindu? What Unicode does TMW7.96 map to, for example? What does TMW7.91 map to? Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode? If so, which codepoints? Likewise with TMW9.60, a Chinese character. Does TMW7.68 map to U+0F39? Does TMW7.74, the ITHI secret sign, have a Unicode mapping? f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it? What Unicode does TMW9.61 map to? Is it for sequences like f40,f7c,f60,f72? Or is it for f60,f72,f7c?	2003-06-15 03:25:45 +00:00
dchandler	189fef9aec	Made Jskad smart enough to handle a few more EWTS characters; some it can only convert to Wylie, others are live key sequences. This will make converting the shechen documents go more smoothly.	2003-06-09 13:35:43 +00:00
dchandler	09a55110b7	Handles more TibetanMachine oddballs.	2003-06-09 02:01:13 +00:00
dchandler	b9219640e5	Handles more TibetanMachine oddballs.	2003-06-09 01:53:01 +00:00
dchandler	e97e1c8464	Handles more TibetanMachine oddballs.	2003-06-09 01:20:32 +00:00
dchandler	0f724989b5	The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90. I've fixed that. I've also added a couple of Unicode mappings to give a flavor for how multi-codepoint mappings will be represented. TM->TMW conversion takes about 1 second per thousand glyphs on my PIII-550.	2003-06-01 23:05:32 +00:00
dchandler	e2caf99085	Some code cleanup. tibwn.ini must now have, in the Unicode column, either nothing, or 0FXX(,0FXX)*. E.g., 0F04,0F05 is valid. Debugging code ensures this is the case.	2003-06-01 18:09:49 +00:00
dchandler	0235263ddf	TM->TMW and TMW->TM conversion in RTF is now supported. I've noticed that formatting is mostly OK but sometimes gets bungled slightly. I tried everything I could think of, and now I'm passing the buck to Java's RTF support. TMW_RTF_TO_THDL_WYLIE (now misnamed) supports TMW->TM conversion (but not TM->TMW). There is an automated test case for a TMW->TM conversion. I have full confidence in this conversion. Even the smallest glitch in the core functionality (not formatting) would surprise me. Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails due to one- or two-line diffs between the actual and expected outputs. This is because Java's RTF support is not deterministic, I'm guessing, and is not a real failure. I'm too lazy to make a more elaborate sed/diff mechanism that works on all platforms, and that would complicate the build anyway.	2003-05-31 23:21:29 +00:00
dchandler	bfacd6c998	Accurate TM->TMW and TMW->TM mappings are now available. I've verified this extensively and have full confidence that these mappings agree with Tony Duff's Tibetan! 5.1 documentation (except as described below). To get them, I had to disregard Tony Duff's tables for a few glyphs: the characters with ordinal 32 and 45 (space and hyphen in Roman ASCII, space and tsheg in Tibetan). For these glyphs, we must have mappings from TibetanMachineSkt4.32 to something, etc., and those mappings were not present. I've normalized the mapping for these glyphs, as it is arbitrary because the same two glyphs just appear fifteen times each.	2003-05-31 20:13:15 +00:00
dchandler	a4bc23a9ab	Made performance improvements, doc improvements, and code cleanup to DuffCode.	2003-05-31 17:02:06 +00:00
dchandler	a144b125ca	I've made Jskad adhere to the THDL Extended Wylie spec. Some punctuation has changed {@, #, %, and $}. Fixed some errors in tibwn.ini so that all the TM<->TMW mappings are correct.	2003-05-26 13:11:51 +00:00
dchandler	ec7fec695f	Added some automated JUnit tests for TMW_RTF_TO_THDL_WYLIE.	2003-05-18 17:17:52 +00:00
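
Commit e2caf99085 above says the Unicode column of tibwn.ini must be either empty or of the form 0FXX(,0FXX)*, with 0F04,0F05 given as a valid value. A minimal Java sketch of such a check follows. It is not Jskad's actual debugging code: the class name UnicodeColumnCheck and the method isValid are hypothetical, and it assumes "XX" means exactly two hexadecimal digits in either case.

```java
import java.util.regex.Pattern;

public class UnicodeColumnCheck {
    // Assumption: "XX" in 0FXX stands for exactly two hex digits (either case).
    // The whole expression is optional so that an empty column is accepted.
    private static final Pattern COLUMN =
        Pattern.compile("(0F[0-9A-Fa-f]{2}(,0F[0-9A-Fa-f]{2})*)?");

    /** Returns true if the column is empty or a comma-separated list of 0FXX values. */
    public static boolean isValid(String column) {
        return COLUMN.matcher(column).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0F04,0F05")); // the example from the commit message
        System.out.println(isValid(""));          // an empty column is allowed
        System.out.println(isValid("F021"));      // a PUA-style value lacks the 0F prefix
    }
}
```

Under these assumptions the three calls in main print true, true, and false. A stricter or looser reading of "XX" would only change the character class in the pattern.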


73 commits