Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.
The ACIP->EWTS subroutine has been improved.
TMW->Wylie and TMW->ACIP are improved in error cases.
TMW->ACIP has friendly embedded error messages now.
mappings, so I've put them back, but with the EWTS non-correspondences
\tmwXYYY.
Jskad no longer supports superscribed or subscribed numerals, because
EWTS does not.
that say "ya can take a ga prefix" etc.
The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline). This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
Our disambiguation is now perfect, happening when and only when it is
necessary. These are all illegal, so it shouldn't affect many
existing conversions. But if there were typos, it could.
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists. This is cleaner because WylieWord and Jskad had different
values for these lists.
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.
Now I'm using it a little more than is necessary, e.g. b.lha is generated
instead of blha because bla and b.la are ambiguous.
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files. So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
'<' and '>'. The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.
Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants. This facilitates the
new-and-improved Tibetan->Wylie conversion.
Tibetan->Wylie is now done by forming legal tsheg-bars. A legal tsheg-bar
is converted into perfect THDL Wylie. See code comments to learn what
it thinks is a legal tsheg-bar, but it includes, e.g., bskyUMbsH minus the
trailing punctuation (H).
Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').
I've made it so that the illegal sequences are as beautiful as possible. You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
Added support for two more oddballs.
Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine. The correct solution is to transform the RTF before Java's
busted RTF readers ever see it. For example, \'97 becomes \u151.
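
A minimal sketch of that kind of pre-processing, assuming the RTF is
already in memory as a String. The class and method names are made up for
illustration, and the choice to rewrite only ordinals of 0x80 and above is an
assumption, not necessarily what the converter does. The regular expression
is naive: it would also rewrite an escaped backslash that happens to be
followed by '97. A real RTF Unicode control word is normally followed by a
fallback character; this sketch emits only the form shown in the note above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RtfHexEscapeSketch {
    // Matches an RTF hex escape: backslash, apostrophe, two hex digits.
    private static final Pattern HEX_ESCAPE =
        Pattern.compile("\\\\'([0-9a-fA-F]{2})");

    /**
     * Rewrites high hex escapes as Unicode control words carrying the same
     * ordinal in decimal. Escapes below 0x80 are left alone (an assumption).
     */
    static String rewriteHexEscapes(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int ordinal = Integer.parseInt(m.group(1), 16);
            String replacement =
                (ordinal >= 0x80) ? "\\u" + ordinal : m.group();
            // quoteReplacement keeps the backslash from being read as a
            // replacement-string escape.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewriteHexEscapes("{\\f1 \\'97}"));
    }
}
```

Hex 97 is decimal 151, so the escape in the sample input comes out as the
Unicode control word with ordinal 151, and the rest of the text is untouched.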
Is the EWTS '_' to be represented as U+0020, or is it a wider space?
Does TMW9.42, Dza, map to U+0F5F,U+0F39?
Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB? (Likewise with r+w, TMW6.61, TMW6.62, etc.)
Is U+0F7E a bindu? What Unicode does TMW7.96 map to, for example? What does TMW7.91 map to?
Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode? If so, which codepoints? Likewise with TMW9.60, a Chinese character.
Does TMW7.68 map to U+0F39?
Does TMW7.74, the ITHI secret sign, have a Unicode mapping? f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it?
What Unicode does TMW9.61 map to? Is it for sequences like f40,f7c,f60,f72? Or is it for f60,f72,f7c?
I've fixed that.
I've also added a couple of Unicode mappings to give a flavor of how
multi-codepoint mappings will be represented.
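
For illustration only, here is a sketch of turning a comma-delimited list of
hex codepoints, written the way the questions above write them
(f40,f7c,f60,f72 or U+0F62,U+0FBB), into a Java String. The class and method
names are hypothetical, and it is an assumption that the mappings use this
notation; the real representation may differ.

```java
final class CodepointListSketch {
    /**
     * Parses a list such as "f40,f7c,f60,f72" or "U+0F62,U+0FBB" into the
     * String made of those codepoints, in order.
     */
    static String parse(String list) {
        StringBuilder sb = new StringBuilder();
        for (String item : list.split(",")) {
            String hex = item.trim();
            // Accept an optional "U+" prefix, in either case.
            if (hex.regionMatches(true, 0, "U+", 0, 2)) {
                hex = hex.substring(2);
            }
            sb.appendCodePoint(Integer.parseInt(hex, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = parse("f40,f7c,f60,f72");
        // Print each codepoint back out in hex, one per line.
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            System.out.println(Integer.toHexString(s.codePointAt(i)));
        }
    }
}
```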
TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.
TMW_RTF_TO_THDL_WYLIE (now misnamed) support TMW->TM
conversion (but not TM->TMW). There is an automated test case for a
TMW->TM conversion.
I have full confidence in this conversion. Even the smallest glitch in the core
functionality (not formatting) would surprise me.
Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs. I'm
guessing this is because Java's RTF support is not deterministic; it is not
a real failure. I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).
To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan). For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present. I've normalized the mapping for these glyphs; the choice is
arbitrary because the same two glyphs simply appear fifteen times each.
org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE. It converts RTF files
consisting of TMW characters to the corresponding THDL Extended Wylie.
It supports --find-some-non-tmw mode, which allows you to ensure that no
unusual characters will spoil the conversion. The converter has built-in
intelligence that allows it to handle Tahoma '{', '}', and '\\' characters
properly.
The converter also works on mixed Roman/TMW text, but --find-some-non-tmw
and --find-all-non-tmw modes are less useful there.
Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in
Jskad's jar, with no command-line options to see usage information.
clean check'. Right now there are tests to ensure that typing certain
sequences of keys in the Extended Wylie keyboard gives the expected
Extended Wylie back when "Tools/Convert Tibetan to Wylie" is invoked.
The syntactically illegal d.wa now converts to Tibetan and then back
to d.wa (not dwa, as it did); likewise with the illegal g.wa. wa
doesn't take any prefixes, but I prefer clean end-to-end
behavior. (Jskad doesn't go end-to-end, though.)
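
The d.wa versus dwa distinction can be sketched as follows. This is only an
illustration under stated assumptions: the class, the method names, and the
tiny table of stack spellings are hypothetical and far from exhaustive, and
this is not the converter's actual algorithm. The idea is that a '.' goes
between two consonants exactly when writing them together would read as a
single stack.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DisambiguatorSketch {
    // Hypothetical, non-exhaustive table: consonant pairs that Wylie reads
    // as one stacked glyph when they are written with nothing between them.
    private static final Set<String> READS_AS_STACK =
        new HashSet<String>(Arrays.asList("dw", "gw", "bl", "gy"));

    /** Wylie for two separate glyphs: first alone, then second with "a". */
    static String separate(String first, String second) {
        // Disambiguate only when the plain concatenation would be misread.
        String sep = READS_AS_STACK.contains(first + second) ? "." : "";
        return first + sep + second + "a";
    }

    /** Wylie for a single stacked glyph with the implicit "a" vowel. */
    static String stacked(String top, String bottom) {
        return top + bottom + "a";
    }

    public static void main(String[] args) {
        System.out.println(separate("d", "w"));  // d.wa
        System.out.println(stacked("d", "w"));   // dwa
        System.out.println(separate("b", "lh")); // blha
    }
}
```

With this table, two separate glyphs da and wa come out as d.wa while the
stack comes out as dwa, so the two round-trip to different glyph sequences.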
Note that you cannot successfully run the DuffPane tests on a Linux
box unless your DISPLAY variable is set correctly. Thus, my nightly
builds will fail with an Error (as opposed to a Failure).
Better tests. As part of that, I had to break TibetanMachineWeb into
TibetanMachineWeb+THDLWylieConstants, because I don't want the
class-wide initialization code from TibetanMachineWeb causing errors
in LegalTshegBarTest.
converts TibetanMachineWeb glyphs to THDL Wylie. Three-glyph and
four-glyph sequences with implicit "a" vowels are now handled
correctly, except for disambiguation w.r.t. things like b-la-g
vs. bla-g and d-wa vs. dwa.
pa'am, pa'ang etc. now work too.
Illegal Tibetan sequences now become very ugly but "correct" Wylie:
correct in the sense that converting it back to glyphs should get you
the glyphs you started with.
I also made a change to TibetanMachineWeb.java that I hope will clear
up problems with this feature when keyboards other than "Extended
Wylie" are selected.
Took nga out of the farRightSet [postsuffixes]; only da and sa belong
there, right?
I tried to get the system into a state such that I could run automated
tests of this stuff, but I ran into difficulties. I have some manual
test cases; ask if you're interested.