Jskad

Author	SHA1	Message	Date
dchandler	8ccd68789a	Since I had Eclipse fired up, I had it automatically organize the imports. It made two errors, but the compiler found them. I've cvs tagged the tree before doing this, just in case.	2005-07-11 03:10:32 +00:00
dchandler	6d419fe641	Numerous EWTS->Unicode and especially EWTS->TMW improvements. Fixed ordering of Unicode wowels. [ku+A] gives the correct Unicode now, e.g. EWTS->TMW looks better for some wacky wowels like, I'm guessing here, [ku+A]. EWTS->TMW should now give errors any time the full input isn't used. Previously, wacky wowels like [kai+-i] would lead to some droppage. EWTS->TMW->Unicode testing is now in effect. This found a ton of EWTS->TMW bugs, most or all of which are fixed now. TMW->Unicode is improved/fixed for { \u5350,\u534D,\u0F88+k,\u0F88+kh,U }. (Why U? "\u0f75" is discouraged in favor of "\u0f71\u0f74".) NOTE: TMW_RTF_TO_THDL_WYLIETest is still disabled for the nightly builds' sake, but I ran it in my sandbox and it passed.	2005-07-11 02:51:06 +00:00
dchandler	0b3a636f63	Tremendously better EWTS->Unicode and EWTS->TMW conversion, though still not tested end-to-end and without perfect unit tests. See EWTSTest.RUN_FAILING_TESTS, for example, to find imperfection.	2005-07-06 02:19:38 +00:00
dchandler	7198f23361	I really hesitate to commit this because I'm not sure what it brings to the table exactly and I fear that it makes the ACIP->Tibetan converter code a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout are part of the ugliness; they point to the ugliness. If each were addressed, cleanliness could perhaps be achieved. I've largely forgotten exactly what this change does, but it attempts to improve EWTS->Tibetan conversion. The lexer is probably really, really primitive. I concentrate here on converting a single tsheg bar rather than a whole document. Eclipse was used during part of my journey here and some imports were reorganized merely because I could. :) (Eclipse was needed when the usual ant build failed to run a new test EWTSTest. And I wanted its debugger.) Next steps: end-to-end EWTS tests should bring many problems to light. Fix those. Triage all the TODO comments. I don't know that I'll ever really trust the implementation. The tests are valuable, though. A clean implementation of EWTS->Tibetan in Jython might hold enough interest for me; I'd like to learn Python.	2005-06-20 06:18:00 +00:00
dchandler	37bf9a736d	I did this stuff back in August. It's all in support of EWTS->Tibetan conversion. The tag 'TODO(DLC)[EWTS->Tibetan]' exists all over the place. EWTS->Tibetan isn't here yet; lexing isn't here yet; this is mainly a refactoring so that the ACIP->Tibetan code can be reused to do EWTS->Tibetan. I'm committing this because tests pass (it shouldn't be breaking anything), because I want a checkpoint, and because the laptop this sandbox was on isn't my preferred development environment.	2005-02-21 01:16:10 +00:00
dchandler	df262aa148	It is now a compile-time option whether to treat []- and {}-bracketed sequences as text to be passed through (without the brackets in the case of {}) literally, which is the case by default because Robert Chilton requested it, or the old, ad-hoc mechanism which could be useful for finding some ugly input. Made a couple of error messages a little more verbose now that we have short-message mode.	2004-06-06 21:39:06 +00:00
dchandler	de3a19761e	Fixes for javadoc tool.	2004-04-17 15:48:50 +00:00
dchandler	274e1736be	Deleted cut-and-paste goof.	2004-01-17 19:45:31 +00:00
dchandler	c69ba26c60	TString now tracks which Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings.	2004-01-17 19:28:54 +00:00
dchandler	a39c5c12b0	ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.	2003-12-08 07:15:27 +00:00
dchandler	dfaae4be93	ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}. Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.	2003-11-29 22:56:18 +00:00
dchandler	04816acb74	ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those stacks that use full-form subjoined RA and YA consonants. ACIP {RVA} was converting to the wrong things. The TMW for {RVA} was converting to the wrong ACIP. Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.	2003-11-09 01:07:45 +00:00
dchandler	31b3020d07	Added a test case that runs almost all the tsheg bars from all non-reference, publicly available ACIP files (hundreds of megabytes of them) through the converter. The frequencies of these tsheg bars are in the file, too.	2003-10-26 06:02:48 +00:00
dchandler	7ba1ad0735	Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.	2003-10-26 02:42:06 +00:00
dchandler	ef24c608bf	Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically. This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.	2003-10-26 02:17:19 +00:00
dchandler	ee50291ed4	Andres found that "THAG PA" caused a NullPointerException. That's fixed. Renamed ACIPString to TString -- we'll use this for EWTS and ACIP both. TMW->ACIP for TMW9.61 should work now.	2003-10-04 01:22:59 +00:00

16 commits