Better tests. As part of that, I had to break TibetanMachineWeb into
TibetanMachineWeb+THDLWylieConstants, because I don't want the
class-wide initialization code from TibetanMachineWeb causing errors
in LegalTshegBarTest.
converts TibetanMachineWeb glyphs to THDL Wylie. Three-glyph and
four-glyph sequences with implicit "a" vowels are now handled
correctly, except for disambiguation w.r.t. things like b-la-g
vs. bla-g and d-wa vs. dwa.
pa'am, pa'ang etc. now work too.
Illegal Tibetan sequences now become very ugly, but "correct" Wylie.
Correct in the sense that converting it back to glyphs should get you
the glyphs you started with.
I also made a change to TibetanMachineWeb.java that I hope will clear
up problems with this feature when keyboards other than "Extended
Wylie" are selected.
Took nga out of the farRightSet [postsuffixes]; only da and sa belong
there, right?
I tried to get the system in a state such that I could run automated
tests of this stuff, but I ran into difficulties. I have some manual
test cases; ask if you're interested.
2. Added support extreme uses of 'a' like le'u'i'o
3. Now parses correctly syllables that have the particles "ang" and "am" added to them. Second works only in "roman script" mode. The converter from tibetan script to roman script does not convert correctly this combinations. ("pa'ang" is converted wrongly into "pa'ng" and "pa'am" is converted wrongly into "pa'ma").
now a preference.
In addition, Jskad now raises an error dialog when you try to "Save
As" to a bad place or open a file that doesn't exist or isn't
readable.
into Jskad's JAR file.
Doing so required that I cut out a lot of fancy HTML code. The correct fix
is to use XML to store the meat and then use XSL to generate two forms of
HTML: one dumb enough for Java, one for use on the THDL tools website.
e-mailed to me. Tibbibl is an editor for XML-based bibliographies of
Tibetan texts. All I did was change the package from org.thdl.xml to
org.thdl.tib.bibl and add boilerplate; no changes to Than's code were
made.
Tibbibl features a diacritic input tool which Jskad might want to
swipe.
I'm committing in order to sync with my laptop, really. This stuff will disappear
and reappear in better form later, after a holiday of coding and eggless,
alcohol-free nog.
and for this package only.
I'm committing in order to sync with my laptop, really. This stuff will disappear
and reappear in better form later, after a holiday of coding and eggless,
alcohol-free nog.
UnicodeCodepointToThdlWylie.java.
Added a new class, UnicodeGraphemeCluster, that can tell you
the components of a grapheme cluster from top to bottom. It does not
yet have good error checking; it is not yet finished.
Next is to parse clean Unicode into GraphemeClusters. After that comes
scanning dirty Unicode into best-guess GraphemeClusters, and scanning
dirty Unicode to get nice error messages.
because a Japanese scholar has an "Extended Wylie" also.
NFKD and NFD have a new brother, NFTHDL. I wish there weren't a need,
but as my yet-to-be-put-into-CVS break-unicode-into-grapheme-clusters code
demonstrates, the-need-is-there. forgive-me for the hyphens, it's late.
characters, for example.
Normalization forms NFKD and NFD are supported for the Tibetan Unicode
range. I don't like either, actually. I've tested NFKD, but I've not yet
committed the tests.
and the build system is not yet aware of them.
I'm adding some classes for representing legal tsheg-bars (syllables, for the
most part) in Unicode. These classes were designed bottom-up (OK, OK --
they weren't designed designed, but I had to write down everything I knew
about Tibetan syntax somewhere). The classes are aware of extended
wylie. I doubt the Javadocs work yet, and I'm still testing (and am not
committing my testing code with these as it is not yet ready).
Next on my list--fix these up to reflect my new awareness of suffix particles
(like le'u'i'o) add classes to support syntactically incorrect Unicode
sequences. Then add a UnicodeReader, and we've got the back end of
a Tibetan Unicode shaping system (like half of MS's Uniscribe or Apple's
Worldscript or FreeType Layout or Omega's OTPs).
A top-down design would not have included LegalTshegBar. But now that
my itch has been scratched, potential uses are lingering about. For example,
it would be nice to scan some input and break it into LegalTshegBars,
punctuation/marks/signs, and illegal stacks. Then we could alert the client
of the illegality, its precise form, and its precise location.
The real system for turning a Unicode stream into an internal representation
suitable for conversion to EWTS/ACIP/XHTML/what-have-you need not be
aware of Tibetan syntax. But to make the very best conversion from
Unicode to, e.g., EWTS, it is necessary to konw that gaskad is better
represented as gskad, but that jaskad is not the same as jskad.
It does not have many methods for determining the root letter, suffix,
and so on, but these should be easy to add. David, please use this
class to the extent that it and your new work overlap.
Enabled the property thdl.rely.on.system.tmw.fonts before the production of TibetanMachineWeb HTML. This avoids the call to readInFontFiles() within the TibetanMachineWeb class (which raises an exception when it cannot find for whatever reason the fonts). The servlet doesn't need to load the fonts anyway!
texts. All the catalogs of the Literary Collection are at this time using this
DTD and a proprietary program, Dynaweb. There is also an XML version
of it, xtibbibl.dtd, which has some slight modifications.
'Fonts' module inside the 'Jskad' module. I.e., you must now have the
tree like so:
Jskad/
source/
dist/
Fonts/
TibetanMachineWeb/
.
.
.
This is because the THDL tools now optionally (and by default) load
the TibetanMachineWeb fonts automatically.
Updated the build system so that the 'web-start-releases' and
'self-contained-dist' targets JAR up optional JARs to create
double-clickable, self-contained joy. Even the TMW fonts are in the
JARs now.
Changed the strings describing two Jskad keyboards so that "keyboard"
is no longer in the description. It's in the label next to the combo
box.
Jskad now saves preferences on exit or when the user selects a menu
item (that is there for debugging mainly) to ~/my_thdl_preferences.txt
on *nix or C:\my_thdl_preferences.txt on Win32. I don't know the
correct Mac location.
There's a new paradigm for telling org.thdl.util.ThdlOptions that a
user preference has been changed. If, for example, a combo box is
manipulated so that the ACIP keyboard is selected, then you must call
a certain method in ThdlOptions.
but does line breaks correctly. I.e., I refactored DuffPane into two classes.
I did this trying to track down a subtle bug in line breaking: 'gye ' breaks
after 'gy' sometimes, with the dreng bo on the next line, but only when you
resize the window certain ways, and only in Savant (and maybe QD and the
translation tool, I don't know) but not in Jskad.
I was not successful in finding the bug, but it still exists when I use
TibetanPanes instead of DuffPanes in org.thdl.savant.tib.*.
but does line breaks correctly. I.e., I refactored DuffPane into two classes.
I did this trying to track down a subtle bug in line breaking: 'gye ' breaks
after 'gy' sometimes, with the dreng bo on the next line, but only when you
resize the window certain ways, and only in Savant (and maybe QD and the
translation tool, I don't know) but not in Jskad.
I was not successful in finding the bug, but it still exists when I use
TibetanPanes instead of DuffPanes in org.thdl.savant.tib.*.
ACIP's 'WA' represents Wylie's 'wa', but ACIP's 'ZHVA' represents Wylie's
'zhwa'. The key for wasur is the same as the key for the twentieth
consonant in extended Wylie, but not in ACIP.
Savant. That is, org.thdl.savant.SoundPanel
has been eliminated in favour of these classes,
which are shared between QD and Savant.
The main change is that SmartMoviePanels
can now communicate with the outside world,
for example to send messages to a Savant
text window telling it to update highlights.
ACIP or 'ShSm' in Extended Wylie to see the new behavior.
We use a trie to store valid input sequences. In the future, we could use
the same trie as a replacement for the more inefficient HashSets we use to
store characters, vowels, and punctuation. For example, we'd use
'validInputSequences.put("K", new Pair("consonant", "k"))' when reading
in the ACIP keyboard's description of the first consonant of the Tibetan
alphabet in 'TibetanKeyboard.java'.
Note that the current trie implementation is only useful for 7- or 8-bit
transcription systems, and works best for tries with low average depth, which
describes a transcription system's trie very well. If you used arbitrary
Unicode in your keyboard, you'd need a different trie implementation.
Improved the optional keyboard input mode status messages.
Added a JUnit test for the new Trie that fails at present since the Trie is
case-insensitive. Running JUnit tests is not something our build system
knows about at present, but Eclipse 2.0 makes it very easy.
Fixed a few compiler errors due to imports I'd forgotten.