Jskad

Author	SHA1	Message	Date
a1tsal	ccdebf6719	Removed half numbers (no longer in EWTS) Brought <?Other?> closer to EWTS Removed __TILDE__ (no longer in EWTS) Changed M^ to ^M per new EWTS draft Added ai, au, -i from WW tibwn.ini -- they were missing in this version	2003-08-25 23:19:48 +00:00
dchandler	1982c5847b	Jskad's converter now has ACIP-to-Unicode built in. There are known bugs; it is pre-alpha. It's usable, though, and finds tons of errors in ACIP input files, with the user deciding just how pedantic to be. The biggest outstanding bug is the silent one: treating { }, space, as tsheg instead of whitespace when we ought to know better.	2003-08-24 06:40:53 +00:00
dchandler	d5ad760230	TMW->Wylie conversion now takes advantage of prefix rules, the rules that say "ya can take a ga prefix" etc. The ACIP->Unicode converter now gives warnings (optionally, and by default, inline). This converter now produces output even when lexical errors occur, but the output has errors and warnings inline.	2003-08-23 22:03:37 +00:00
dchandler	21ef657921	I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc.	2003-08-22 05:13:32 +00:00
dchandler	1afb3a0fdd	ACIP->Unicode, without going through TMW, is now possible, so long as \, the Sanskrit virama, is not used. Of the 1370-odd ACIP texts I've got here, about 57% make it through the gauntlet (fewer if you demand a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).	2003-08-18 02:38:54 +00:00
dchandler	245aac4911	I'm now stricter about accepting alphabetic characters. F, Q, X, a, b, c, d, e, ... do not belong in ACIP, so the scanner rejects them. This should make it even easier to distinguish automatically between Tibetan and English texts.	2003-08-17 02:38:58 +00:00
dchandler	39451d8879	Fixed a couple of small bugs. Only 250 errors are reported now; this is important if you try to convert an English document.	2003-08-17 02:12:49 +00:00
dchandler	4581a2d8ab	Improved the ACIP scanner (the part of the converter that says, "This is a correction, that's a comment, this is Tibetan, that's Latin (English), that's Tibetan inter-tsheg-bar punctuation, etc."). It now accepts more real-world ACIP files, i.e. it handles illegal constructs. The error checking is more user-friendly. There are now tests. Added some tsheg bars that Peter E. Hauer of Linguasoft sent me to the tests. Many thanks, Peter. I still need to implement rules that say, "This is not Tibetan, it must be Sanskrit, because that letter doesn't take a MA prefix."	2003-08-17 01:45:55 +00:00
dchandler	0b91ed0beb	I've improved the ACIP tsheg bar scanner to handle a lot of illegal constructions that occur in practice.	2003-08-16 16:13:53 +00:00
dchandler	2b59d9838d	I now have a function that takes as input a String of ACIP and breaks up that String into tsheg bars, punctuation, etc., while finding errors. I've tested it some, but I'm not yet committing the tests. Next step: a converter that takes an ACIP file as input and outputs TMW+Latin.	2003-08-14 05:10:47 +00:00
dchandler	57f506384f	The ACIP->Tibetan converter now has perfect low-level functionality, and it has the capability to produce error messages and warnings that make sense to the user. One can now get the correct parse, if one exists, for an ACIP tsheg bar. One could even feed in ACIP and get a list of warnings about things as innocuous as PADMA, which a dumb converter would have trouble with. One could then turn ACIP into well-behaved ACIP for that dumb converter, if you really wanted to. Still to do: o Scan ACIP files into tsheg bars. o Produce TMW/Latin (from which you can get Unicode, etc.). o E-mail the illegal tsheg bars to the ACIP fellows so they can fix the affected documents (most of the Kangyur has unparseable creatures).	2003-08-12 04:13:11 +00:00
dchandler	87266646fb	Removed misinformation.	2003-08-10 19:33:01 +00:00
dchandler	e21d3774a9	Added an unfinished ACIP->Tibetan converter. Once it works properly for ACIP, it'll easily be made to work as a perfect EWTS Wylie->Tibetan converter. It has an extensive suite of tests for the existing functionality.	2003-08-10 19:30:07 +00:00
dchandler	39e0435b6b	Refactored this code so that Wylie->Tibetan and ACIP->Tibetan conversions can make use of it. Hooray for reuse.	2003-08-10 19:02:56 +00:00
dchandler	9093fd3c05	We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie. Our disambiguation is now perfect, happening when and only when it is necessary. These are all illegal, so it shouldn't affect many existing conversions. But if there were typos, it could.	2003-08-10 18:38:20 +00:00
dchandler	251d8feae5	brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.	2003-08-09 17:48:40 +00:00
dchandler	7dffc47cb7	'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came across this one, so we've added it to the list of ambiguous three-consonant combos.	2003-08-09 17:05:43 +00:00
dchandler	e198519c5f	Jskad now supports EWTS ~, i.e. TMW8.91.	2003-07-25 02:35:31 +00:00
amontano	97f5fe91b3	when invalid wylie is encountered, instead of displaying a message it raises an exception.	2003-07-25 01:43:18 +00:00
dchandler	f8c959bfb0	The Tibetan d.za was being converted into the Wylie dza incorrectly. This is a rare case, but I want TMW->Wylie to be perfectly unambiguous.	2003-07-18 00:30:27 +00:00
dchandler	0622ac5062	Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>, or <?Numbers?> commands; it instead hard-codes the appropriate comma- delimited lists. This is cleaner because WylieWord and Jskad had different values for these lists.	2003-07-14 12:19:46 +00:00
dchandler	fb85f6e8ce	Fix comment.	2003-07-14 12:17:04 +00:00
dchandler	96afae795c	Disambiguation was not being used appropriately. This makes previous TMW->Wylie conversions with the new-and-improved TMW->Wylie algorithm faulty. Now I'm using it a little more than you need to, e.g. b.lha instead of blha is generated because bla and b.la are ambiguous.	2003-07-13 18:46:29 +00:00
dchandler	802e0cb588	If this method uses the Wylie representation, you get an infinite recursion when you do a TMW->Wylie conversion for a document with glyphs that have no known Wylie.	2003-07-13 17:40:02 +00:00
dchandler	a86a0f235b	I was missing a break; statement; this caused an Error to be thrown during some TMW->Wylie conversions. No conversions were erroneous, though.	2003-07-13 17:38:00 +00:00
dchandler	6677d1e245	Code cleanup.	2003-07-13 16:53:03 +00:00
dchandler	3b6eaa792e	Fixed javadocs.	2003-07-11 13:33:30 +00:00
dchandler	85176cd9f3	Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF files, not for Mac RTF files. So if you want a TM->TMW conversion to work, use MS Word for Windows, not for the Mac.	2003-07-11 13:30:22 +00:00
dchandler	d726bc0258	A couple of changes to TMW->Unicode thanks to Than's reply to my questions.	2003-07-09 01:44:15 +00:00
dchandler	02558a1d78	Jskad supports <7, >8, etc. again; it no longer supports the punctuation '<' and '>'. The current keyboard implementation makes this an either-or proposition, when fundamentally it need not be. Added a <?Numbers?> command and an <?Input:Numbers?> command to tibwn.ini; broke the numbers apart from the consonants. This facilitates the new-and-improved Tibetan->Wylie conversion. Tibetan->Wylie is now done by forming legal tsheg-bars. A legal tsheg bar is converted into perfect THDL Wylie. See code comments to learn what it thinks is a legal tsheg-bar, but it includes, e.g., bskyUMbsH minus the trailing punctuation (H). Illegal sequences, such as runs of transliterated Sanskrit, are turned into unambiguous Wylie; each glyph is followed by a vowel or a disambiguator ('.'). I've made it so that the illegal sequences are as beautiful as possible. You get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.	2003-07-08 14:30:17 +00:00
dchandler	23d18c925f	Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong vowels.	2003-07-08 02:59:17 +00:00
dchandler	72d2eee503	Code cleanup.	2003-07-05 19:26:58 +00:00
dchandler	a463b686b3	Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts by default, not just TMW. Thus users need not install these fonts on their systems.	2003-07-05 18:00:29 +00:00
dchandler	6c286573ba	Fixed Javadocs.	2003-07-04 00:12:59 +00:00
dchandler	a48ec641d5	Better error messages in TMW->Wylie conversions. The user knows what's up.	2003-07-01 03:43:33 +00:00
dchandler	3113a4b8de	Some of the \tmw80.. mappings were out of date. 3+1/2 is not EWTS; took these out.	2003-07-01 03:42:30 +00:00
dchandler	6151a7bc94	TMW->Wylie now occurs in the TibetanDocument, not in DuffPane, which means that the command-line tool can finally function with a headless graphics device. Hopefully it will speed things up, too. It also means that entering Roman text into the TMW->Unicode conversion and TMW->TM conversion will be easy.	2003-07-01 01:21:57 +00:00
dchandler	61d29fc355	The TMW->Wylie mapping was busted w.r.t. tshegs. Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.	2003-07-01 00:17:18 +00:00
dchandler	229536884f	I've validated by hand the TM<->TMW mappings. A few things changed, so no previous TM->TMW or TMW->TM conversions can be trusted.	2003-06-30 02:24:11 +00:00
dchandler	b16fb8a85c	This is correct; the Tibetan! 5.1 documentation is not. This affects TM->TMW conversions. See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515 for a full list of Tibetan! 5.1 documentation errors.	2003-06-29 22:11:00 +00:00
dchandler	aedef4b44d	An error now appears if you try to convert from format A to format B but no glyphs in format A appear. In this case, it is likely that you meant to convert a different file or do a different conversion.	2003-06-29 21:31:48 +00:00
dchandler	3f76c3692d	Fixed Javadoc warnings.	2003-06-29 15:37:35 +00:00
dchandler	7938648ca8	TM->TMW conversion has no known bugs. Oddballs have been comprehensively handled.	2003-06-29 03:03:07 +00:00
dchandler	4e279defb4	Fixed a couple of array bounds checks. Added support for two more oddballs. Deprecated the oddball lookup method because it drops up to 30 glyphs in TibetanMachine. The correct solution is to transform the RTF before Java's busted RTF readers ever see it. \'97 becomes \u151, e.g.	2003-06-28 16:33:58 +00:00
dchandler	2a359c45ef	Bad conversions were not leaving the unconvertible characters at the beginning of the document as they should and as they are documented to. They now do, and they bracket the bad characters with the TM or TMW for U+0F3C on the left and the TM or TMW for U+0F3D on the right. Some cleanup.	2003-06-28 16:20:19 +00:00
dchandler	f547734043	Added Than's converter GUI code; adapted it to work with Jskad's converters. TMW->Unicode now uses Ximalaya by default.	2003-06-24 03:02:29 +00:00
dchandler	917864574c	Fixed a logic bug in mapTMWtoTM and mapTMtoTMW. You can now specify which Unicode font to use via 'java -Dthdl.tmw.to.unicode.font=Ximalaya ...'.	2003-06-23 01:58:11 +00:00
dchandler	b6d8fd89f9	When errors in (all but TMW->Wylie and Wylie->TMW) conversion occur, the troublesome glyphs are now put at the beginning of the document AFTER AN ACHEN. This makes a glyph like \tmw7095 visible atop the achen. Major fix to the handling of paragraphs in conversion; we were (for whatever reason) dropping paragraphs before.	2003-06-23 01:24:02 +00:00
dchandler	1f4343bed0	TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2) orders of magnitude faster.	2003-06-22 22:10:58 +00:00
dchandler	900f7492b0	'ant clean check' was failing because I hadn't updated the --find-some-non-tmw and --find-all-non-tmw baselines. Code cleanup.	2003-06-22 16:11:58 +00:00
dchandler	6540b260bd	Fixes a (small, I think) TMW->Unicode performance glitch. I was inserting 5 characters at a time and then skipping ahead just one position. I don't think this affected correctness. I believe there's still a terrible (exponential?) slowdown as the input file gets bigger, however. Perhaps not -- but we run through the first 1000 TMW glyphs in 6 seconds, while the 20th thousand takes at least 60 seconds. Is TMW->Wylie faster than TMW->Unicode? If so, why? Thought: don't use a DuffPane within TibetanConverter -- it can only add overhead, right? My hprof profile said that the conversion was taking just a couple of percent of the work; the rest was going to display-related stuff that you should only see if you were displaying the document. I'm not!	2003-06-22 04:08:33 +00:00
dchandler	dfe64a1927	Added --find-some-non-tm and --find-all-non-tm modes to the converter to help ensure worry-free TM->TMW conversions.	2003-06-22 00:14:18 +00:00
dchandler	80101666c7	Included a fix from WylieWord's tibwn.ini. Removed some needless trailing tildes.	2003-06-21 02:35:21 +00:00
dchandler	5067683121	Edward corrected me; he had intended to have M map to 7.91, not 7.90.	2003-06-17 01:46:19 +00:00
dchandler	da70434e52	Jskad now allows for TMW->Unicode conversion.	2003-06-15 16:27:36 +00:00
dchandler	af5b95b08d	A TMW->Unicode table is here. Note these issues, however: Is the EWTS '_' to be represented as U+0020, or is it a wider space? Does TMW9.42, Dza, map to U+0F5F,U+0F39? Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB? (Likewise with r+w, TMW6.61, TMW6.62, etc.) Is U+0F7E a bindu? What Unicode does TMW7.96 map to, for example? What does TMW7.91 map to? Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode? If so, which codepoints? Likewise with TMW9.60, a Chinese character. Does TMW7.68 map to U+0F39? Does TMW7.74, the ITHI secret sign, have a Unicode mapping? f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it? What Unicode does TMW9.61 map to? Is it for sequences like f40,f7c,f60,f72? Or is it for f60,f72,f7c?	2003-06-15 03:25:45 +00:00
dchandler	b387c512e9	Fixed two bugs.	2003-06-15 03:08:57 +00:00
dchandler	189fef9aec	Made Jskad smart enough to handle a few more EWTS characters; some it can only convert to Wylie, others are live key sequences. This will make converting the shechen documents go more smoothly.	2003-06-09 13:35:43 +00:00
dchandler	09a55110b7	Handles more TibetanMachine oddballs.	2003-06-09 02:01:13 +00:00
dchandler	b9219640e5	Handles more TibetanMachine oddballs.	2003-06-09 01:53:01 +00:00
dchandler	e97e1c8464	Handles more TibetanMachine oddballs.	2003-06-09 01:20:32 +00:00
dchandler	70b31558fa	Tried to fix a crashing bug that happened when you converted TM->TMW and then tried to convert that TMW to Wylie. I swear it's Java's problem (see the ugly stack trace in the code and decide for yourself), and I tried replacing rather than inserting-and-then-removing, but it didn't work. I've left these things as options.	2003-06-08 23:12:52 +00:00
dchandler	32831b698f	If bad (oddball) TM glyphs appear, then converting to TMW causes, by default, all oddballs to appear once in the resulting document. This'll help me find the correct glyphs for the oddballs, and it'll prevent the average user from converting a document with oddballs.	2003-06-08 22:37:38 +00:00
dchandler	d45f5ab8c8	Improved performance (I suppose).	2003-06-03 23:49:34 +00:00
dchandler	7d768c9e06	Fixed a crashing bug that happened upon converting wylie to tibetan.	2003-06-03 23:45:15 +00:00
dchandler	0f724989b5	The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90. I've fixed that. I've also added a couple of Unicode mappings to give a flavor for how multi-codepoint mappings will be represented. TM->TMW conversion takes about 1 second per thousand glyphs on my PIII-550.	2003-06-01 23:05:32 +00:00
dchandler	54ca37c824	The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90. I've fixed that. I've also added a couple of Unicode mappings to give a flavor for how multi-codepoint mappings will be represented.	2003-06-01 19:14:08 +00:00
dchandler	e2caf99085	Some code cleanup. tibwn.ini must now have, in the Unicode column, either nothing, or 0FXX(,0FXX)*. E.g., 0F04,0F05 is valid. Debugging code ensures this is the case.	2003-06-01 18:09:49 +00:00
dchandler	1f6bb07d53	Fixes bogus Unicode mappings mentioned in http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515.	2003-06-01 04:02:04 +00:00
dchandler	0235263ddf	TM->TMW and TMW->TM conversion in RTF is now supported. I've noticed that formatting is mostly OK but sometimes gets bungled slightly. I tried everything I could think of, and now I'm passing the buck to Java's RTF support. TMW_RTF_TO_THDL_WYLIE (now misnamed) supports TMW->TM conversion (but not TM->TMW). There is an automated test case for a TMW->TM conversion. I have full confidence in this conversion. Even the smallest glitch in the core functionality (not formatting) would surprise me. Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails due to one- or two-line diffs between the actual and expected outputs. This is because Java's RTF support is not deterministic, I'm guessing, and is not a real failure. I'm too lazy to make a more elaborate sed/diff mechanism that works on all platforms, and that would complicate the build anyway.	2003-05-31 23:21:29 +00:00
dchandler	bfacd6c998	Accurate TM->TMW and TMW->TM mappings are now available. I've verified this extensively and have full confidence that these mappings agree with Tony Duff's Tibetan! 5.1 documentation (except as described below). To get them, I had to disregard Tony Duff's tables for a few glyphs: the characters with ordinal 32 and 45 (space and hyphen in Roman ASCII, space and tsheg in Tibetan). For these glyphs, we must have mappings from TibetanMachineSkt4.32 to something, etc., and those mappings were not present. I've normalized the mapping for these glyphs, as it is arbitrary because the same two glyphs just appear fifteen times each.	2003-05-31 20:13:15 +00:00
dchandler	a4bc23a9ab	Made performance improvements, doc improvements, and code cleanup to DuffCode.	2003-05-31 17:02:06 +00:00
dchandler	6f0390c5d6	By default (controllable via options.txt), Jskad now fixes the Tahoma curly brace problem upon opening any RTF document. The TMW_RTF_TO_THDL_WYLIE test baselines changed because I fixed (a while ago) some inconsistencies between the EWTS standard and Jskad. Conversion of TibetanMachineWeb8.40, @#, to Wylie now works correctly. Unfortunately, though, typing @# doesn't produce 8.40, it still produces 8.38 and 8.39, two glyphs.	2003-05-28 00:40:59 +00:00
dchandler	a144b125ca	I've made Jskad adhere to the THDL Extended Wylie spec. Some punctuation has changed {@, #, %, and $}. Fixed some errors in tibwn.ini so that all the TM<->TMW mappings are correct.	2003-05-26 13:11:51 +00:00
dchandler	ec7fec695f	Added some automated JUnit tests for TMW_RTF_TO_THDL_WYLIE.	2003-05-18 17:17:52 +00:00
dchandler	e2a9720d9b	I've added a command-line converter, org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE. It converts RTF files consisting of TMW characters to the corresponding THDL Extended Wylie. It supports --find-some-non-tmw mode, which allows you to ensure that no unusual characters will spoil the conversion. The converter has built-in intelligence that allows it to handle Tahoma '{', '}', and '\\' characters properly. The converter works on mixed Roman/TMW also, but --find-some-non-tmw and --find-all-non-tmw modes are not as useful. Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in Jskad's jar, with no command-line options to see usage information.	2003-05-18 14:14:47 +00:00
dchandler	78dc46a979	Jskad keyboards are now configured via keyboards.ini, a file that has comments that explain its function. It's quite simple. This is in response to Jeff C. H. Wu's request.	2003-05-14 03:25:36 +00:00
dchandler	dcb36ec338	Clearer status message; cleanup.	2003-05-14 02:37:28 +00:00
dchandler	8958366a07	Bad RTF now causes an error message to appear in the transcription instead of causing a fatal exception. The error allows you to look up the DuffCode that caused the trouble.	2003-05-14 01:37:49 +00:00
dchandler	59175ccfd6	Added a few tests for the ACIP keyboard, which I've improved a bit. Noted some failures. "Fixed" the code to do what I want it to do for the (no sanskrit stacking, tibetan stacking) case [which is exercised by this keyboard only].	2003-04-14 23:55:00 +00:00
dchandler	efa8fc1f25	DuffPane now has the start of a unit test suite. Invoke it via 'ant clean check'. Right now there are tests to ensure that typing certain sequences of keys in the Extended Wylie keyboard gives the expected Extended Wylie back when "Tools/Convert Tibetan to Wylie" is invoked. The syntactically illegal d.wa now converts to Tibetan and then back to d.wa (not dwa, as it did); likewise with the illegal g.wa. wa doesn't take any prefixes, but I prefer clean end-to-end behavior. (jeskd doesn't go end-to-end, though.) Note that you cannot successfully run the DuffPane tests on a Linux box unless your DISPLAY variable is set correctly. Thus, my nightly builds will fail with an Error (as opposed to a Failure).	2003-04-14 05:22:27 +00:00
dchandler	6636d03a41	ant private-javadocs runs without warnings; cleaned up some as-yet-unused code.	2003-04-13 01:46:20 +00:00
dchandler	644c0d3801	Updated the HTML help file; removed some useless code.	2003-04-13 01:17:10 +00:00
dchandler	daacf6ee3b	I've got too many sandboxes, so I'm committing these changes, half-done, from one sandbox so as to consolidate my sandboxes.	2003-04-12 20:56:20 +00:00
dchandler	6e05b60cff	I'll need these when I turn a sequence of UnicodeGraphemeClusters into LegalTshegBars.	2003-04-12 20:19:02 +00:00
dchandler	cbccfc5277	Fixed bug 718207. 'byungs now converts from Tibetan to Wylie correctly.	2003-04-10 02:14:15 +00:00
dchandler	7dd67bbf6a	Now turns Tibetan into pa'am, not pa'm. Works with or without vowels in the part preceding the 'am or 'ang, overcoming the inconsistency that I'd put here for a short time.	2003-04-08 04:56:40 +00:00
dchandler	eb71fb6075	"sgom pa'am " is correct, not "sgom pa'm ".	2003-04-07 23:49:07 +00:00
dchandler	d836b850e8	"sgom pa'm ", not "sgom pa'am", is now used. "pe'm " was being produced already, so the code was inconsistent. If it turns out that "pe'am " is preferred, I'll fix it later. Consistency is very appealing.	2003-03-31 01:38:27 +00:00
dchandler	33b3080068	Fixed a bunch of bugs; supports le'u'i'o, sgom pa'am, etc. Better tests. As part of that, I had to break TibetanMachineWeb into TibetanMachineWeb+THDLWylieConstants, because I don't want the class-wide initialization code from TibetanMachineWeb causing errors in LegalTshegBarTest.	2003-03-31 00:33:50 +00:00
dchandler	1987f7d80a	b-r-g, b-l-g-s, etc., when converted from Tibetan to Wylie, give correct, unambiguous Wylie.	2003-03-30 21:49:55 +00:00
dchandler	f9670233ba	Removed documentation FIXMEs from this code; did away for good with some really iffy code that I think was behind the "Tibetan->Wylie conversion fails when keyboard isn't Extended Wylie" bug.	2003-03-30 16:13:00 +00:00
dchandler	58f7371e66	Revamped the "Tools>Convert Tibetan To Wylie" feature that converts TibetanMachineWeb glyphs to THDL Wylie. Three-glyph and four-glyph sequences with implicit "a" vowels are now handled correctly, except for disambiguation w.r.t. things like b-la-g vs. bla-g and d-wa vs. dwa. pa'am, pa'ang etc. now work too. Illegal Tibetan sequences now become very ugly, but "correct" Wylie. Correct in the sense that converting it back to glyphs should get you the glyphs you started with. I also made a change to TibetanMachineWeb.java that I hope will clear up problems with this feature when keyboards other than "Extended Wylie" are selected. Took nga out of the farRightSet [postsuffixes]; only da and sa belong there, right? I tried to get the system in a state such that I could run automated tests of this stuff, but I ran into difficulties. I have some manual test cases; ask if you're interested.	2003-03-30 02:31:16 +00:00
dchandler	2b81020b0e	More and better tests; fixed some bugs in LegalTshegBar.	2003-03-28 03:49:49 +00:00
dchandler	08d2a5d702	Added a test for org.thdl.tib.text.tshegbar.UnicodeCodepointToThdlWylie.	2003-03-22 04:55:17 +00:00
dchandler	f2dcb0cbc3	I said I removed this earlier; I lied. Now it's gone.	2003-03-22 03:58:13 +00:00
dchandler	16cbfb6033	Moved ad-hoc test.java test cases to UnicodeGraphemeClusterTest.java, a JUnit test which can be run via 'ant check'. Removed test.java and its build process.	2003-03-22 03:55:39 +00:00
dchandler	395eca7bb1	Moved ad-hoc test.java test cases to LegalTshegBarTest.java, a JUnit test which can be run via 'ant check'.	2003-03-22 03:46:32 +00:00
dchandler	879b477902	Made some ad-hoc tests in test.java into JUnit tests, run by 'ant check'. NORM_NFD was replaced with NORM_NFKD in three cases in testMostlyNFKD.	2003-03-22 03:24:56 +00:00
dchandler	190a3d9b60	achen must appear before a vowel.	2003-01-05 05:58:32 +00:00
dchandler	fcb75c55eb	Small performance improvement involving String.intern(). Plus a little bit of code cleanup.	2003-01-05 05:57:44 +00:00
dchandler	e5a63df1c1	Added a class skeleton that may not stay for long. I'm committing in order to sync with my laptop, really. This stuff will disappear and reappear in better form later, after a holiday of coding and eggless, alcohol-free nog.	2002-12-20 04:46:13 +00:00
dchandler	fdfedb4419	Added some tests for org.thdl.tib.text.tshegbar. These tests are preliminary, and for this package only. I'm committing in order to sync with my laptop, really. This stuff will disappear and reappear in better form later, after a holiday of coding and eggless, alcohol-free nog.	2002-12-20 04:34:56 +00:00
dchandler	7ea185fa01	Renamed UnicodeCharToExtendedWylie to UnicodeCodepointToThdlWylie.java. Added a new class, UnicodeGraphemeCluster, that can tell you the components of a grapheme cluster from top to bottom. It does not yet have good error checking; it is not yet finished. Next is to parse clean Unicode into GraphemeClusters. After that comes scanning dirty Unicode into best-guess GraphemeClusters, and scanning dirty Unicode to get nice error messages.	2002-12-17 13:51:18 +00:00
dchandler	8e8a23c6a6	Extended Wylie is referred to as THDL Extended Wylie or THDL Wylie because a Japanese scholar has an "Extended Wylie" also. NFKD and NFD have a new brother, NFTHDL. I wish there weren't a need, but as my yet-to-be-put-into-CVS break-unicode-into-grapheme-clusters code demonstrates, the-need-is-there. forgive-me for the hyphens, it's late.	2002-12-15 06:57:32 +00:00
dchandler	a42347b224	Now uses terminology from the Unicode standard. No more talk of characters, for example. Normalization forms NFKD and NFD are supported for the Tibetan Unicode range. I don't like either, actually. I've tested NFKD, but I've not yet committed the tests.	2002-12-15 03:35:24 +00:00
dchandler	26993a5093	So that Unicode escape sequences appear correctly in javadocs.	2002-12-09 02:35:39 +00:00
dchandler	2d6c8be804	So that Unicode escape sequences appear correctly in javadocs.	2002-12-09 02:29:09 +00:00
dchandler	22c6ec5406	Javadoc now works without warnings.	2002-12-09 01:48:34 +00:00
dchandler	f4a16f8e9d	This commit is for my benefit only; these classes are not ready for prime time, and the build system is not yet aware of them. I'm adding some classes for representing legal tsheg-bars (syllables, for the most part) in Unicode. These classes were designed bottom-up (OK, OK -- they weren't designed designed, but I had to write down everything I knew about Tibetan syntax somewhere). The classes are aware of extended wylie. I doubt the Javadocs work yet, and I'm still testing (and am not committing my testing code with these as it is not yet ready). Next on my list: fix these up to reflect my new awareness of suffix particles (like le'u'i'o), then add classes to support syntactically incorrect Unicode sequences. Then add a UnicodeReader, and we've got the back end of a Tibetan Unicode shaping system (like half of MS's Uniscribe or Apple's Worldscript or FreeType Layout or Omega's OTPs). A top-down design would not have included LegalTshegBar. But now that my itch has been scratched, potential uses are lingering about. For example, it would be nice to scan some input and break it into LegalTshegBars, punctuation/marks/signs, and illegal stacks. Then we could alert the client of the illegality, its precise form, and its precise location. The real system for turning a Unicode stream into an internal representation suitable for conversion to EWTS/ACIP/XHTML/what-have-you need not be aware of Tibetan syntax. But to make the very best conversion from Unicode to, e.g., EWTS, it is necessary to know that gaskad is better represented as gskad, but that jaskad is not the same as jskad.	2002-12-09 01:02:23 +00:00
eg3p	9eedfcd909	This is Tashi's TibetanSyllable class for sorting Wylie Tibetan. It does not have many methods for determining the root letter, suffix, and so on, but these should be easy to add. David, please use this class to the extent that it and your new work overlap.	2002-12-05 01:48:41 +00:00
dchandler	d200b03d66	Updated the build system so that you must do a cvs checkout of the 'Fonts' module inside the 'Jskad' module. I.e., you must now have the tree like so: Jskad/ source/ dist/ Fonts/ TibetanMachineWeb/ . . . This is because the THDL tools now optionally (and by default) load the TibetanMachineWeb fonts automatically. Updated the build system so that the 'web-start-releases' and 'self-contained-dist' targets JAR up optional JARs to create double-clickable, self-contained joy. Even the TMW fonts are in the JARs now. Changed the strings describing two Jskad keyboards so that "keyboard" is no longer in the description. It's in the label next to the combo box. Jskad now saves preferences on exit or when the user selects a menu item (that is there for debugging mainly) to ~/my_thdl_preferences.txt on *nix or C:\my_thdl_preferences.txt on Win32. I don't know the correct Mac location. There's a new paradigm for telling org.thdl.util.ThdlOptions that a user preference has been changed. If, for example, a combo box is manipulated so that the ACIP keyboard is selected, then you must call a certain method in ThdlOptions.	2002-11-18 16:12:25 +00:00
eg3p	c9349f6846	These files are not used.	2002-11-12 16:47:02 +00:00
dchandler	ecf61bc892	A DuffPane is now a TibetanPane. A TibetanPane is much more lightweight but does line breaks correctly. I.e., I refactored DuffPane into two classes. I did this trying to track down a subtle bug in line breaking: 'gye ' breaks after 'gy' sometimes, with the dreng bo on the next line, but only when you resize the window certain ways, and only in Savant (and maybe QD and the translation tool, I don't know) but not in Jskad. I was not successful in finding the bug, but it still exists when I use TibetanPanes instead of DuffPanes in org.thdl.savant.tib.*.	2002-11-08 04:11:42 +00:00
dchandler	d462f4e41c	Fixes all known bugs with the ACIP keyboard except for one: ACIP's 'WA' represents Wylie's 'wa', but ACIP's 'ZHVA' represents Wylie's 'zhwa'. The key for wasur is the same as the key for the twentieth consonant in extended Wylie, but not in ACIP.	2002-11-03 17:34:33 +00:00
dchandler	de6ae79959	Fixes bug 624133, "Input freezes after impossible character". Try 'shsM' in ACIP or 'ShSm' in Extended Wylie to see the new behavior. We use a trie to store valid input sequences. In the future, we could use the same trie as a replacement for the more inefficient HashSets we use to store characters, vowels, and punctuation. For example, we'd use 'validInputSequences.put("K", new Pair("consonant", "k"))' when reading in the ACIP keyboard's description of the first consonant of the Tibetan alphabet in 'TibetanKeyboard.java'. Note that the current trie implementation is only useful for 7- or 8-bit transcription systems, and works best for tries with low average depth, which describes a transcription system's trie very well. If you used arbitrary Unicode in your keyboard, you'd need a different trie implementation. Improved the optional keyboard input mode status messages.	2002-11-02 18:44:24 +00:00
dchandler	a6cc4a7ff3	Removed/commented out/tagged some unused local variables. Added a JUnit test for the new Trie that fails at present since the Trie is case-insensitive. Running JUnit tests is not something our build system knows about at present, but Eclipse 2.0 makes it very easy. Fixed a few compiler errors due to imports I'd forgotten.	2002-11-02 16:01:40 +00:00
dchandler	aa580e0bea	Undoing my erroneous commit of buggy code.	2002-11-02 03:46:44 +00:00
dchandler	abcf8f19b3	Factored TibetanDocument into two classes, one that is a DefaultStyledDocument, and another consisting entirely of static utility methods for processing Tibetan text. Moved TibetanDocument.DuffData into its own class. I think this makes things a bit more transparent, and gets us a little closer to making clean use of Swing.	2002-11-02 03:38:59 +00:00
dchandler	5249c48807	Factored TibetanDocument into two classes, one that is a DefaultStyledDocument, and another consisting entirely of static utility methods for processing Tibetan text. Moved TibetanDocument.DuffData into its own class. I think this makes things a bit more transparent, and gets us a little closer to making clean use of Swing.	2002-11-02 03:33:09 +00:00
dchandler	97c530e974	GHA and KR'i now work.	2002-10-28 05:31:19 +00:00
dchandler	1ecbfe6a7c	Fixed some Javadoc comments in preparation for putting up new Javadocs on http://thdltools.sf.net/.	2002-10-28 04:49:24 +00:00
dchandler	fd1b4dd468	Now breaks the line after the last whitespace, not the first. I cleaned things up a bit, and I've made logging optional since I don't yet trust the code fully. A Wylie underscore at the end of a line is worth looking into further, at the very least.	2002-10-28 04:12:49 +00:00
dchandler	8433369d60	Now with slightly better error handling.	2002-10-28 03:17:28 +00:00
dchandler	0ad135f8f1	This may well be a fix to the "Improper line wrapping" bug. The fix is basically that we use our own special ViewFactory, with a new subclass of LabelView (the view RTFEditorKit uses for the nitty gritty) that is aware of Tibetan. There are a couple of nasty hacks still here, and Swing's documentation for doing what I did was quite poor. I searched the web for hours, read the Javadocs and the tutorials, and consulted a Swing reference book, but I still don't have tremendous confidence in this solution. If it fundamentally doesn't work, though, we have to define our own first-class Document, Element hierarchy, ViewFactory, Views, and EditorKit. So let's hope it does work fundamentally. I can't say for sure if this even works, as I have yet to run this code on a machine where Jskad works properly. I had major trouble installing the TMW fonts on Linux, and have yet to resolve it, even after verifying via xlsfonts that the fonts were installed and then changing TibetanMachineWeb.java to look for them. Because I haven't tested this yet, a lot of nasty code is tagged 'DLC' and commented out.	2002-10-28 03:08:04 +00:00
dchandler	2a923f83f8	Added a first attempt at an ACIP keyboard following their document http://www.asianclassics.org/download/tibetancode/ticode.pdf	2002-10-20 07:59:25 +00:00
dchandler	4f9bdab7f7	Changed a /* */ comment to a Javadoc (/** */) comment.	2002-10-13 19:13:59 +00:00
dchandler	403f21c8db	Added Javadoc files overview.html and several package.html files. Added a "Quit" option to Savant's File menu. Factored out the Close option in doing so. Exceptions in many action listeners are now handled by org.thdl.util.ThdlActionListener or org.thdl.util.ThdlAbstractAction. Many exceptions that we used to just log now optionally cause aborts. This option is on by default for developers using 'ant savant-run'-style targets, but it is off for users. An erroneous CLASSPATH now causes a useful error message in almost all situations. Fixed some typos and bad links in Javadoc comments. Added a simple assertion facility, but the overhead is suffered even in release builds. Factored out the code that sets up log files like savant.log and jskad.log.	2002-10-06 18:23:27 +00:00
dchandler	859a7731fb	More robust--handles the case when tibwn.ini cannot be found.	2002-10-04 04:37:32 +00:00
dchandler	10d86fc3b7	Updated comments so that Javadoc 1.4 warnings went away.	2002-09-30 03:10:00 +00:00
dchandler	1b47e7c268	Fixed the line feeds, which were botched DOS line feeds. Added copyright boilerplate.	2002-09-28 14:35:09 +00:00
dchandler	3c82f0a24c	Initial revision	2002-09-28 00:53:39 +00:00
dchandler	c6d6116ff2	Initial revision	2002-09-23 23:15:39 +00:00

283 commits
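
Note on commit de6ae79959: the message describes storing valid keyboard input sequences in a trie that suits 7- or 8-bit transcription systems. The following is a minimal sketch of that idea, not the Trie class from the Jskad sources. The class name InputTrie, its methods, and the use of plain strings as values (instead of the Pair mentioned in the commit) are assumptions made for illustration. Unlike the Trie that a6cc4a7ff3 reports as case-insensitive, this sketch is case-sensitive.

```java
import java.util.Objects;

/** Hypothetical sketch of a trie keyed on 8-bit characters; not Jskad's own Trie. */
final class InputTrie<V> {
    private static final int FANOUT = 256; // one child slot per 8-bit character

    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] next = (Node<V>[]) new Node[FANOUT];
        V value; // non-null only where a complete input sequence ends
    }

    private final Node<V> root = new Node<>();

    /** Registers a complete input sequence, e.g. "K" for the first consonant. */
    void put(String key, V value) {
        Objects.requireNonNull(value, "value");
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= FANOUT) {
                throw new IllegalArgumentException("not an 8-bit character: " + c);
            }
            if (node.next[c] == null) {
                node.next[c] = new Node<>();
            }
            node = node.next[c];
        }
        node.value = value;
    }

    /** Returns the value stored for exactly this sequence, or null if there is none. */
    V get(String key) {
        Node<V> node = find(key);
        return node == null ? null : node.value;
    }

    /** True if some registered sequence starts with the given prefix. */
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    // Walks the trie one character at a time; returns null once the path runs out.
    private Node<V> find(String s) {
        Node<V> node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            char c = s.charAt(i);
            node = c < FANOUT ? node.next[c] : null;
        }
        return node;
    }

    public static void main(String[] args) {
        InputTrie<String> valid = new InputTrie<>();
        valid.put("K", "consonant:k");
        valid.put("KH", "consonant:kh");
        valid.put("SH", "consonant:sh");
        System.out.println(valid.get("KH"));       // a complete sequence
        System.out.println(valid.hasPrefix("S"));  // could still become "SH"
        System.out.println(valid.hasPrefix("SX")); // impossible input
    }
}
```

An input handler built on such a structure could call hasPrefix after each keystroke and discard or reset the pending input as soon as it returns false, which is one plausible way to avoid the freeze described in bug 624133. The fixed 256-slot child array is what limits this design to 7- or 8-bit keys; a keyboard using arbitrary Unicode would need a map-based node instead.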