I've fixed that.
I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.
TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.
TMW_RTF_TO_THDL_WYLIE (now misnamed) support TMW->TM
conversion (but not TM->TMW). There is an automated test case for a
TMW->TM conversion.
I have full confidence in this conversion. Even the smallest glitch in the core
functionality (not formatting) would surprise me.
Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs. This
is because Java's RTF support is not deterministic, I'm guessing, and is not
a real failure. I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).
To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan). For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present. I've normalized the mapping for these glyphs, as it is arbitrary
because the same two glyphs just appear fifteen times each.
org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE. It converts RTF files
consisting of TMW characters to the corresponding THDL Extended Wylie.
It supports --find-some-non-tmw mode, which allows you to ensure that no
unusual characters will spoil the conversion. The converter has built-in
intelligence that allows it to handle Tahoma '{', '}', and '\\' characters
properly.
The converter works on mixed Roman/TMW also, but --find-some-non-tmw
and --find-all-non-tmw modes are not as useful.
Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in
Jskad's jar, with no command-line options to see usage information.
clean check'. Right now there are tests to ensure that typing certain
sequences of keys in the Extended Wylie keyboard gives the expected
Extended Wylie back when "Tools/Convert Tibetan to Wylie" is invoked.
The syntactically illegal d.wa now converts to Tibetan and then back
to d.wa (not dwa, as it did); likewise with the illegal g.wa. wa
doesn't take any prefixes, but I prefer clean end-to-end
behavior. (jeskd doesn't go end-to-end, though.)
Note that you cannot successfully run the DuffPane tests on a Linux
box unless your DISPLAY variable is set correctly. Thus, my nightly
builds will fail with an Error (as opposed to a Failure).
Better tests. As part of that, I had to break TibetanMachineWeb into
TibetanMachineWeb+THDLWylieConstants, because I don't want the
class-wide initialization code from TibetanMachineWeb causing errors
in LegalTshegBarTest.
converts TibetanMachineWeb glyphs to THDL Wylie. Three-glyph and
four-glyph sequences with implicit "a" vowels are now handled
correctly, except for disambiguation w.r.t. things like b-la-g
vs. bla-g and d-wa vs. dwa.
pa'am, pa'ang etc. now work too.
Illegal Tibetan sequences now become very ugly, but "correct" Wylie.
Correct in the sense that converting it back to glyphs should get you
the glyphs you started with.
I also made a change to TibetanMachineWeb.java that I hope will clear
up problems with this feature when keyboards other than "Extended
Wylie" are selected.
Took nga out of the farRightSet [postsuffixes]; only da and sa belong
there, right?
I tried to get the system in a state such that I could run automated
tests of this stuff, but I ran into difficulties. I have some manual
test cases; ask if you're interested.
'Fonts' module inside the 'Jskad' module. I.e., you must now have the
tree like so:
Jskad/
source/
dist/
Fonts/
TibetanMachineWeb/
.
.
.
This is because the THDL tools now optionally (and by default) load
the TibetanMachineWeb fonts automatically.
Updated the build system so that the 'web-start-releases' and
'self-contained-dist' targets JAR up optional JARs to create
double-clickable, self-contained joy. Even the TMW fonts are in the
JARs now.
Changed the strings describing two Jskad keyboards so that "keyboard"
is no longer in the description. It's in the label next to the combo
box.
Jskad now saves preferences on exit or when the user selects a menu
item (that is there for debugging mainly) to ~/my_thdl_preferences.txt
on *nix or C:\my_thdl_preferences.txt on Win32. I don't know the
correct Mac location.
There's a new paradigm for telling org.thdl.util.ThdlOptions that a
user preference has been changed. If, for example, a combo box is
manipulated so that the ACIP keyboard is selected, then you must call
a certain method in ThdlOptions.
ACIP or 'ShSm' in Extended Wylie to see the new behavior.
We use a trie to store valid input sequences. In the future, we could use
the same trie as a replacement for the more inefficient HashSets we use to
store characters, vowels, and punctuation. For example, we'd use
'validInputSequences.put("K", new Pair("consonant", "k"))' when reading
in the ACIP keyboard's description of the first consonant of the Tibetan
alphabet in 'TibetanKeyboard.java'.
Note that the current trie implementation is only useful for 7- or 8-bit
transcription systems, and works best for tries with low average depth, which
describes a transcription system's trie very well. If you used arbitrary
Unicode in your keyboard, you'd need a different trie implementation.
Improved the optional keyboard input mode status messages.
Added a JUnit test for the new Trie that fails at present since the Trie is
case-insensitive. Running JUnit tests is not something our build system
knows about at present, but Eclipse 2.0 makes it very easy.
Fixed a few compiler errors due to imports I'd forgotten.
DefaultStyledDocument, and another consisting entirely of static utility
methods for processing Tibetan text. Moved TibetanDocument.DuffData
into its own class.
I think this makes things a bit more transparent, and gets us a little closer to
making clean use of Swing.
I cleaned things up a bit, and I've made logging optional since I don't yet
trust the code fully.
A Wylie underscore at the end of a line is worth looking into further, at the
very least.
is basically that we use our own special ViewFactory, with a new
subclass of LabelView (the view RTFEditorKit uses for the nitty
gritty) that is aware of Tibetan.
There are a couple of nasty hacks still here, and Swing's
documentation for doing what I did was quite poor. I searched the web
for hours, read the Javadocs and the tutorials, and consulted a Swing
reference book, but I still don't have tremendous confidence in this
solution. If it fundamentally doesn't work, though, we have to define
our own first-class Document, Element hierarchy, ViewFactory, Views,
and EditorKit. So let's hope it *does* work fundamentally.
I can't say for sure if this even works, as I have yet to run this
code on a machine where Jskad works properly. I had major trouble
installing the TMW fonts on Linux, and have yet to resolve it, even
after verifying via xlsfonts that the fonts were installed and then
changing TibetanMachineWeb.java to look for them. Because I haven't
tested this yet, a lot of nasty code is tagged 'DLC' and commented
out.