Jskad

Author	SHA1	Message	Date
dchandler	1982c5847b	Jskad's converter now has ACIP-to-Unicode built in. There are known bugs; it is pre-alpha. It's usable, though, and finds tons of errors in ACIP input files, with the user deciding just how pedantic to be. The biggest outstanding bug is the silent one: treating { }, space, as tsheg instead of whitespace when we ought to know better.	2003-08-24 06:40:53 +00:00
dchandler	d5ad760230	TMW->Wylie conversion now takes advantage of prefix rules, the rules that say "ya can take a ga prefix" etc. The ACIP->Unicode converter now gives warnings (optionally, and by default, inline). This converter now produces output even when lexical errors occur, but the output has errors and warnings inline.	2003-08-23 22:03:37 +00:00
dchandler	21ef657921	I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc.	2003-08-22 05:13:32 +00:00
dchandler	1afb3a0fdd	ACIP->Unicode, without going through TMW, is now possible, so long as \, the Sanskrit virama, is not used. Of the 1370-odd ACIP texts I've got here, about 57% make it through the gauntlet (fewer if you demand a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).	2003-08-18 02:38:54 +00:00
dchandler	245aac4911	I'm now stricter about accepting alphabetic characters. F, Q, X, a, b, c, d, e, ... do not belong in ACIP, so the scanner rejects them. This should make it even easier to distinguish automatically between Tibetan and English texts.	2003-08-17 02:38:58 +00:00
dchandler	39451d8879	Fixed a couple of small bugs. At most 250 errors are reported now; this matters if you try to convert an English document.	2003-08-17 02:12:49 +00:00
dchandler	4581a2d8ab	Improved the ACIP scanner (the part of the converter that says, "This is a correction, that's a comment, this is Tibetan, that's Latin (English), that's Tibetan inter-tsheg-bar punctuation, etc."). It now accepts more real-world ACIP files, i.e., it handles illegal constructs. The error checking is more user-friendly. There are now tests. Added to the tests some tsheg bars that Peter E. Hauer of Linguasoft sent me. Many thanks, Peter. I still need to implement rules that say, "This is not Tibetan, it must be Sanskrit, because that letter doesn't take a MA prefix."	2003-08-17 01:45:55 +00:00
dchandler	0b91ed0beb	I've improved the ACIP tsheg bar scanner to handle a lot of illegal constructions that occur in practice.	2003-08-16 16:13:53 +00:00
dchandler	2b59d9838d	I now have a function that takes as input a String of ACIP and breaks up that String into tsheg bars, punctuation, etc., while finding errors. I've tested it some, but I'm not yet committing the tests. Next step: a converter that takes an ACIP file as input and outputs TMW+Latin.	2003-08-14 05:10:47 +00:00
dchandler	57f506384f	The ACIP->Tibetan converter now has perfect low-level functionality, and it can produce error messages and warnings that make sense to the user. One can now get the correct parse, if one exists, for an ACIP tsheg bar. One could even feed in ACIP and get a list of warnings about things as innocuous as PADMA, which a dumb converter would have trouble with. One could then turn ACIP into well-behaved ACIP for that dumb converter, if one really wanted to. Still to do: (1) scan ACIP files into tsheg bars; (2) produce TMW/Latin (from which you can get Unicode, etc.); (3) e-mail the illegal tsheg bars to the ACIP fellows so they can fix the affected documents (most of the Kangyur has unparseable creatures).	2003-08-12 04:13:11 +00:00
dchandler	87266646fb	Removed misinformation.	2003-08-10 19:33:01 +00:00
dchandler	e21d3774a9	Added an unfinished ACIP->Tibetan converter. Once it works properly for ACIP, it'll easily be made to work as a perfect EWTS Wylie->Tibetan converter. It has an extensive suite of tests for the existing functionality.	2003-08-10 19:30:07 +00:00
dchandler	39e0435b6b	Refactored this code so that Wylie->Tibetan and ACIP->Tibetan conversions can make use of it. Hooray for reuse.	2003-08-10 19:02:56 +00:00
dchandler	9093fd3c05	We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie. Our disambiguation is now perfect, happening when and only when it is necessary. These are all illegal, so it shouldn't affect many existing conversions. But if there were typos, it could.	2003-08-10 18:38:20 +00:00
dchandler	251d8feae5	brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.	2003-08-09 17:48:40 +00:00
dchandler	7dffc47cb7	'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came across this one, so we've added it to the list of ambiguous three-consonant combos.	2003-08-09 17:05:43 +00:00
dchandler	e198519c5f	Jskad now supports EWTS ~, i.e. TMW8.91.	2003-07-25 02:35:31 +00:00
amontano	97f5fe91b3	when invalid wylie is encountered, instead of displaying a message it raises an exception.	2003-07-25 01:43:18 +00:00
dchandler	f8c959bfb0	The Tibetan d.za was being converted into the Wylie dza incorrectly. This is a rare case, but I want TMW->Wylie to be perfectly unambiguous.	2003-07-18 00:30:27 +00:00
dchandler	0622ac5062	Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>, or <?Numbers?> commands; it instead hard-codes the appropriate comma-delimited lists. This is cleaner because WylieWord and Jskad had different values for these lists.	2003-07-14 12:19:46 +00:00
dchandler	fb85f6e8ce	Fix comment.	2003-07-14 12:17:04 +00:00
dchandler	96afae795c	Disambiguation was not being used appropriately, so TMW->Wylie conversions done earlier with the new-and-improved TMW->Wylie algorithm are faulty. Now I'm using it a little more than you need to, e.g., b.lha instead of blha is generated because bla and b.la are ambiguous.	2003-07-13 18:46:29 +00:00
dchandler	802e0cb588	If this method uses the Wylie representation, you get an infinite recursion when you do a TMW->Wylie conversion for a document with glyphs that have no known Wylie.	2003-07-13 17:40:02 +00:00
dchandler	a86a0f235b	I was missing a break statement; this caused an Error to be thrown during some TMW->Wylie conversions. No conversions were erroneous, though.	2003-07-13 17:38:00 +00:00
dchandler	6677d1e245	Code cleanup.	2003-07-13 16:53:03 +00:00
dchandler	3b6eaa792e	Fixed javadocs.	2003-07-11 13:33:30 +00:00
dchandler	85176cd9f3	Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF files, not for Mac RTF files. So if you want a TM->TMW conversion to work, use MS Word for Windows, not for the Mac.	2003-07-11 13:30:22 +00:00
dchandler	d726bc0258	A couple of changes to TMW->Unicode thanks to Than's reply to my questions.	2003-07-09 01:44:15 +00:00
dchandler	02558a1d78	Jskad supports <7, >8, etc. again; it no longer supports the punctuation '<' and '>'. The current keyboard implementation makes this an either-or proposition, when fundamentally it need not be. Added a <?Numbers?> command and an <?Input:Numbers?> command to tibwn.ini; broke the numbers apart from the consonants. This facilitates the new-and-improved Tibetan->Wylie conversion. Tibetan->Wylie is now done by forming legal tsheg bars. A legal tsheg bar is converted into perfect THDL Wylie. See the code comments to learn what it considers a legal tsheg bar; it includes, e.g., bskyUMbsH minus the trailing punctuation (H). Illegal sequences, such as runs of transliterated Sanskrit, are turned into unambiguous Wylie; each glyph is followed by a vowel or a disambiguator ('.'). I've made the illegal sequences as beautiful as possible. You get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.	2003-07-08 14:30:17 +00:00
dchandler	23d18c925f	Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong vowels.	2003-07-08 02:59:17 +00:00
dchandler	72d2eee503	Code cleanup.	2003-07-05 19:26:58 +00:00
dchandler	a463b686b3	Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts by default, not just TMW. Thus users need not install these fonts on their systems.	2003-07-05 18:00:29 +00:00
dchandler	6c286573ba	Fixed Javadocs.	2003-07-04 00:12:59 +00:00
dchandler	a48ec641d5	Better error messages in TMW->Wylie conversions. The user knows what's up.	2003-07-01 03:43:33 +00:00
dchandler	3113a4b8de	Some of the \tmw80.. mappings were out of date. 3+1/2 is not EWTS; took these out.	2003-07-01 03:42:30 +00:00
dchandler	6151a7bc94	TMW->Wylie now occurs in the TibetanDocument, not in DuffPane, which means that the command-line tool can finally function with a headless graphics device. Hopefully it will speed things up, too. It also means that entering Roman text into the TMW->Unicode conversion and TMW->TM conversion will be easy.	2003-07-01 01:21:57 +00:00
dchandler	61d29fc355	The TMW->Wylie mapping was busted w.r.t. tshegs. Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.	2003-07-01 00:17:18 +00:00
dchandler	229536884f	I've validated by hand the TM<->TMW mappings. A few things changed, so no previous TM->TMW or TMW->TM conversions can be trusted.	2003-06-30 02:24:11 +00:00
dchandler	b16fb8a85c	This is correct; the Tibetan! 5.1 documentation is not. This affects TM->TMW conversions. See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515 for a full list of Tibetan! 5.1 documentation errors.	2003-06-29 22:11:00 +00:00
dchandler	aedef4b44d	An error now appears if you try to convert from format A to format B but no glyphs in format A appear. In this case, it is likely that you meant to convert a different file or do a different conversion.	2003-06-29 21:31:48 +00:00
dchandler	3f76c3692d	Fixed Javadoc warnings.	2003-06-29 15:37:35 +00:00
dchandler	7938648ca8	TM->TMW conversion has no known bugs. Oddballs have been comprehensively handled.	2003-06-29 03:03:07 +00:00
dchandler	4e279defb4	Fixed a couple of array bounds checks. Added support for two more oddballs. Deprecated the oddball lookup method because it drops up to 30 glyphs in TibetanMachine. The correct solution is to transform the RTF before Java's busted RTF readers ever see it. \'97 becomes \u151, e.g.	2003-06-28 16:33:58 +00:00
dchandler	2a359c45ef	Bad conversions were not leaving the unconvertible characters at the beginning of the document, as they should and as they are documented to. They now do, and they bracket the bad characters with the TM or TMW for U+0F3C on the left and the TM or TMW for U+0F3D on the right. Some cleanup.	2003-06-28 16:20:19 +00:00
dchandler	f547734043	Added Than's converter GUI code; adapted it to work with Jskad's converters. TMW->Unicode now uses Ximalaya by default.	2003-06-24 03:02:29 +00:00
dchandler	917864574c	Fixed a logic bug in mapTMWtoTM and mapTMtoTMW. You can now specify which Unicode font to use via 'java -Dthdl.tmw.to.unicode.font=Ximalaya ...'.	2003-06-23 01:58:11 +00:00
dchandler	b6d8fd89f9	When errors in (all but TMW->Wylie and Wylie->TMW) conversion occur, the troublesome glyphs are now put at the beginning of the document AFTER AN ACHEN. This makes a glyph like \tmw7095 visible atop the achen. Major fix to the handling of paragraphs in conversion; we were (for whatever reason) dropping paragraphs before.	2003-06-23 01:24:02 +00:00
dchandler	1f4343bed0	TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2) orders of magnitude faster.	2003-06-22 22:10:58 +00:00
dchandler	900f7492b0	'ant clean check' was failing because I hadn't updated the --find-some-non-tmw and --find-all-non-tmw baselines. Code cleanup.	2003-06-22 16:11:58 +00:00
dchandler	6540b260bd	Fixes a (small, I think) TMW->Unicode performance glitch. I was inserting 5 characters at a time and then skipping ahead just one position. I don't think this affected correctness. I believe there's still a terrible (exponential?) slowdown as the input file gets bigger, however. Perhaps not, but we get through the first 1000 TMW glyphs in 6 seconds, while the 20th thousand takes at least 60 seconds. Is TMW->Wylie faster than TMW->Unicode? If so, why? Thought: don't use a DuffPane within TibetanConverter; it can only add overhead, right? My hprof profile showed that the conversion itself accounted for just a couple of percent of the work; the rest went to display-related stuff that you should only see if you were displaying the document. I'm not!	2003-06-22 04:08:33 +00:00
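Commit 4e279defb4 suggests transforming the RTF before Java's RTF readers ever see it, so that \'97 becomes \u151. The sketch below shows one way such a pre-pass could work; it is not Jskad's code, and the class and method names are hypothetical. It assumes the downstream reader honors \uN, and it appends a '?' fallback character because, under the RTF default \uc1, a reader skips one character after \uN (documents that set a different \ucN are not handled). Escaped backslashes (\\) are copied through untouched so they are never mistaken for the start of \'hh.

```java
/**
 * Hypothetical pre-pass (not Jskad's actual API): rewrites RTF hex escapes
 * such as \'97 into Unicode escapes such as \u151? so that the raw code point
 * (here U+0097) survives instead of being decoded through a code page.
 */
public final class RtfEscapeRewriter {

    private RtfEscapeRewriter() {}

    public static String rewriteHexEscapes(String rtf) {
        StringBuilder out = new StringBuilder(rtf.length());
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '\\' && i + 1 < rtf.length()) {
                char next = rtf.charAt(i + 1);
                if (next == '\'' && i + 3 < rtf.length()
                        && isHex(rtf.charAt(i + 2)) && isHex(rtf.charAt(i + 3))) {
                    int value = Integer.parseInt(rtf.substring(i + 2, i + 4), 16);
                    // '?' is the one-character fallback that the default \uc1 tells readers to skip.
                    out.append("\\u").append(value).append('?');
                    i += 4;
                    continue;
                }
                // Copy any other two-character escape (including \\ and \{) verbatim.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        // Input RTF: {\f1 \'97\'cb}
        System.out.println(rewriteHexEscapes("{\\f1 \\'97\\'cb}"));
    }
}
```

Running main prints {\f1 \u151?\u203?}. Scanning character by character rather than using a single regular expression makes it easy to skip over \\ correctly.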


132 commits
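Commits f547734043 and 917864574c say that TMW->Unicode uses Ximalaya by default and that the font can be chosen with 'java -Dthdl.tmw.to.unicode.font=Ximalaya ...'. A minimal sketch of how such a property lookup with a default might look follows; the class and method names are hypothetical, and treating a blank value as "use the default" is an assumption, not documented Jskad behavior.

```java
/**
 * Sketch only (hypothetical names, not Jskad's code): choose the Unicode font
 * for TMW->Unicode output from the thdl.tmw.to.unicode.font system property,
 * falling back to Ximalaya when the property is unset or blank.
 */
public final class UnicodeFontChoice {
    public static final String FONT_PROPERTY = "thdl.tmw.to.unicode.font";
    public static final String DEFAULT_FONT = "Ximalaya";

    private UnicodeFontChoice() {}

    public static String unicodeFontName() {
        String name = System.getProperty(FONT_PROPERTY);
        // Assumption: an unset or blank property means "use the default font".
        return (name == null || name.trim().isEmpty()) ? DEFAULT_FONT : name.trim();
    }

    public static void main(String[] args) {
        System.out.println("TMW->Unicode font: " + unicodeFontName());
    }
}
```

For example, 'java -Dthdl.tmw.to.unicode.font=Jomolhari UnicodeFontChoice' would report Jomolhari, and running it without the -D flag would report Ximalaya.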