Jskad

Author	SHA1	Message	Date
dchandler	1982c5847b	Jskad's converter now has ACIP-to-Unicode built in. There are known bugs; it is pre-alpha. It's usable, though, and finds tons of errors in ACIP input files, with the user deciding just how pedantic to be. The biggest outstanding bug is the silent one: treating { }, space, as tsheg instead of whitespace when we ought to know better.	2003-08-24 06:40:53 +00:00
dchandler	1afb3a0fdd	ACIP->Unicode, without going through TMW, is now possible, so long as \, the Sanskrit virama, is not used. Of the 1370-odd ACIP texts I've got here, about 57% make it through the gauntlet (fewer if you demand a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).	2003-08-18 02:38:54 +00:00
dchandler	4581a2d8ab	Improved the ACIP scanner (the part of the converter that says, "This is a correction, that's a comment, this is Tibetan, that's Latin (English), that's Tibetan inter-tsheg-bar punctuation, etc."). It now accepts more real-world ACIP files, i.e. it handles illegal constructs. The error checking is more user-friendly. There are now tests. Added to the tests some tsheg bars that Peter E. Hauer of Linguasoft sent me. Many thanks, Peter. I still need to implement rules that say, "This is not Tibetan, it must be Sanskrit, because that letter doesn't take a MA prefix."	2003-08-17 01:45:55 +00:00
dchandler	2b59d9838d	I now have a function that takes as input a String of ACIP and breaks up that String into tsheg bars, punctuation, etc., while finding errors. I've tested it some, but I'm not yet committing the tests. Next step: a converter that takes an ACIP file as input and outputs TMW+Latin.	2003-08-14 05:10:47 +00:00

4 commits
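The scanning step these commits describe (breaking a String of ACIP into tsheg bars, punctuation, comments, and errors in one pass) can be sketched roughly as follows. This is a hypothetical illustration, not Jskad's code: the class `AcipScanner`, its `scan` method, and the token kinds are invented here. The sketch assumes only three ACIP conventions (space as tsheg, `/` as shad, `[...]` as a comment) and ignores corrections, Latin text, and everything else real ACIP files contain. Like the behavior the 2003-08-24 commit calls its biggest outstanding bug, it treats every space as a tsheg.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, minimal ACIP scanner sketch (NOT Jskad's actual API).
 * Assumed conventions: ' ' = tsheg, '/' = shad, "[...]" = comment.
 * Every space is treated as a tsheg, even where it should be whitespace.
 */
public class AcipScanner {
    enum Kind { TSHEG_BAR, TSHEG, SHAD, COMMENT, ERROR }

    record Token(Kind kind, String text) {}

    static List<Token> scan(String acip) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = acip.length();
        while (i < n) {
            char c = acip.charAt(i);
            if (c == ' ') {
                // Naive: every space becomes a tsheg.
                out.add(new Token(Kind.TSHEG, " "));
                i++;
            } else if (c == '/') {
                out.add(new Token(Kind.SHAD, "/"));
                i++;
            } else if (c == '[') {
                int close = acip.indexOf(']', i + 1);
                if (close < 0) {
                    // Unterminated comment: report the rest of the input as an error
                    // instead of aborting, so scanning of real-world files can continue.
                    out.add(new Token(Kind.ERROR, acip.substring(i)));
                    i = n;
                } else {
                    out.add(new Token(Kind.COMMENT, acip.substring(i, close + 1)));
                    i = close + 1;
                }
            } else if (c == ']') {
                // A closing bracket with no opener is an illegal construct.
                out.add(new Token(Kind.ERROR, "]"));
                i++;
            } else {
                // Anything else up to the next delimiter is one tsheg bar.
                int start = i;
                while (i < n && " /[]".indexOf(acip.charAt(i)) < 0) {
                    i++;
                }
                out.add(new Token(Kind.TSHEG_BAR, acip.substring(start, i)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : scan("BKRA SHIS/ [note]")) {
            System.out.println(t);
        }
    }
}
```

A real scanner needs many more token kinds and the Tibetan-versus-Sanskrit rules mentioned above; the point of the sketch is only that malformed input (here, an unmatched bracket) surfaces as an ERROR token rather than stopping the scan.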