well-formed. They still do, but they do it less often.
Chris Fynn wrote this a while back:
By normal Tibetan & Dzongkha spelling, writing, and input rules
Tibetan script stacks should be entered and written: 1 headline
consonant (0F40->0F6A), any subjoined consonant(s) (0F90-> 0F9C),
achung (0F71), shabkyu (0F74), any above headline vowel(s) (0F72
0F7A 0F7B 0F7C 0F7D and 0F80); any ngaro (0F7E, 0F82 and 0F83).
Now efforts are made to ensure that the converters conform to the
above rules.
which means that the command-line tool can finally function with a headless
graphics device. Hopefully it will speed things up, too. It also means that
entering Roman text into the TMW->Unicode conversion and TMW->TM
conversion will be easy.
beginning of the document as they should and as they are documented to.
They now do, and they bracket the bad characters with the TM or TMW for
U+0F3C on the left and the TM or TMW for U+0F3D on the right.
Some cleanup.
the troublesome glyphs are now put at the beginning of the document
AFTER AN ACHEN. This makes a glyph like \tmw7095 visible atop the
achen.
Major fix to the handling of paragraphs in conversion; we were (for
whatever reason) dropping paragraphs before.
inserting 5 characters at a time and then skipping ahead just one
position. I don't think this affected correctness.
I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however. Perhaps not -- but we run through
the first 1000 TMW glyphs in 6 seconds, the 20th thousand takes at
least 60 seconds. Is TMW->Wylie faster than TMW->Unicode? If so,
why?
Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right? My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document. I'm not!
and then tried to convert that TMW to Wylie. I swear it's Java's
problem (see the ugly stack trace in the code and decide for
yourself), and I tried replacing rather than
inserting-and-then-removing, but it didn't work. I've left these
things as options.
default, all oddballs to appear once in the resulting document.
This'll help me find the correct glyphs for the oddballs, and it'll
prevent the average user from converting a document with oddballs.
I've fixed that.
I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.
TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.
TMW_RTF_TO_THDL_WYLIE (now misnamed) support TMW->TM
conversion (but not TM->TMW). There is an automated test case for a
TMW->TM conversion.
I have full confidence in this conversion. Even the smallest glitch in the core
functionality (not formatting) would surprise me.
Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs. This
is because Java's RTF support is not deterministic, I'm guessing, and is not
a real failure. I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
brace problem upon opening any RTF document.
The TMW_RTF_TO_THDL_WYLIE test baselines changed because
I fixed (a while ago) some inconsistencies between the EWTS standard and
Jskad.
Conversion of TibetanMachineWeb8.40, @#, to Wylie now works correctly.
Unfortunately, though, typing @# doesn't produce 8.40, it still produces
8.38 and 8.39, two glyphs.
org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE. It converts RTF files
consisting of TMW characters to the corresponding THDL Extended Wylie.
It supports --find-some-non-tmw mode, which allows you to ensure that no
unusual characters will spoil the conversion. The converter has built-in
intelligence that allows it to handle Tahoma '{', '}', and '\\' characters
properly.
The converter works on mixed Roman/TMW also, but --find-some-non-tmw
and --find-all-non-tmw modes are not as useful.
Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in
Jskad's jar, with no command-line options to see usage information.
but does line breaks correctly. I.e., I refactored DuffPane into two classes.
I did this trying to track down a subtle bug in line breaking: 'gye ' breaks
after 'gy' sometimes, with the dreng bo on the next line, but only when you
resize the window certain ways, and only in Savant (and maybe QD and the
translation tool, I don't know) but not in Jskad.
I was not successful in finding the bug, but it still exists when I use
TibetanPanes instead of DuffPanes in org.thdl.savant.tib.*.
DefaultStyledDocument, and another consisting entirely of static utility
methods for processing Tibetan text. Moved TibetanDocument.DuffData
into its own class.
I think this makes things a bit more transparent, and gets us a little closer to
making clean use of Swing.
Added a "Quit" option to Savant's File menu. Factored out the Close
option in doing so.
Exceptions in many action listeners are now handled by
org.thdl.util.ThdlActionListener or org.thdl.util.ThdlAbstractAction.
Many exceptions that we used to just log now optionally cause aborts.
This option is on by default for developers using 'ant savant-run'-style
targets, but it is off for users.
An erroneous CLASSPATH now causes a useful error message in almost
all situations.
Fixed some typos and bad links in Javadoc comments.
Added a simple assertion facility, but the overhead is suffered even in
release builds.
Factored out the code that sets up log files like savant.log and jskad.log.