Commit graph

339 commits

Author SHA1 Message Date
dchandler
f8c959bfb0 The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 00:30:27 +00:00
dchandler
1c29566aee I'm now using the Unix diff built in to Apache Jakarta Commons JRCS
(which I found on suigeneris.org, not apache.org) in order to bulletproof the
Tibetan Converter tests.  They used to fail due to nondeterminism in the
Java RTF writer; they should no longer fail.

I've also changed it so that the Tibetan Converter tests run in headless
mode, which means that they'll run on the nightly builds server.
2003-07-14 12:26:26 +00:00
dchandler
f900154e7a Tests disambiguation in TMW->Wylie conversion. 2003-07-14 12:21:02 +00:00
dchandler
0622ac5062 Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>,
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists.  This is cleaner because WylieWord and Jskad had different
values for these lists.
2003-07-14 12:19:46 +00:00
dchandler
fb85f6e8ce Fix comment. 2003-07-14 12:17:04 +00:00
dchandler
79b3b97326 Remove warning message from menu item. 2003-07-13 23:19:11 +00:00
dchandler
c986684beb Updated help to talk about new features. 2003-07-13 22:51:35 +00:00
dchandler
f695b1a6c1 Updated baselines because conversions have improved since the last
update.
2003-07-13 19:14:41 +00:00
dchandler
d10f97fc06 Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 19:14:15 +00:00
dchandler
96afae795c Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 18:46:29 +00:00
dchandler
802e0cb588 If this method uses the Wylie representation, you get an infinite recursion
when you do a TMW->Wylie conversion for a document with glyphs that
have no known Wylie.
2003-07-13 17:40:02 +00:00
dchandler
a86a0f235b I was missing a break; statement; this caused an Error to be thrown during
some TMW->Wylie conversions.  No conversions were erroneous, though.
2003-07-13 17:38:00 +00:00
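
The commit above does not show the affected code, so the following is only a generic sketch of how a missing break; can cause an Error to be thrown: control falls out of a handled case into a default branch that throws. The class name, method names, and case values are hypothetical and are not taken from the project.

```java
final class MissingBreakSketch {
    // Buggy variant: case 1 is handled, but without a break control
    // falls through into the default branch and an Error is thrown.
    static String buggy(int kind) {
        String result;
        switch (kind) {
            case 1:
                result = "one";
                // Missing break: execution continues into default.
            default:
                throw new Error("unexpected kind " + kind);
        }
    }

    // Fixed variant: the break stops the fall-through.
    static String fixed(int kind) {
        String result;
        switch (kind) {
            case 1:
                result = "one";
                break;
            default:
                throw new Error("unexpected kind " + kind);
        }
        return result;
    }
}
```

In the buggy variant the result is computed but never returned, which is consistent with the note that an Error was thrown while no conversion came out wrong.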
dchandler
6677d1e245 Code cleanup. 2003-07-13 16:53:03 +00:00
dchandler
3b6eaa792e Fixed javadocs. 2003-07-11 13:33:30 +00:00
dchandler
85176cd9f3 Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files.  So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
2003-07-11 13:30:22 +00:00
dchandler
d726bc0258 A couple of changes to TMW->Unicode thanks to Than's reply to my
questions.
2003-07-09 01:44:15 +00:00
dchandler
9db233bdf8 Cosmetic change. 2003-07-08 14:31:14 +00:00
dchandler
02558a1d78 Jskad supports <7, >8, etc. again; it no longer supports the punctuation
'<' and '>'.  The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.

Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants.  This facilitates the
new-and-improved Tibetan->Wylie conversion.

Tibetan->Wylie is now done by forming legal tsheg-bars.  A legal tsheg-bar
is converted into perfect THDL Wylie.  See code comments to learn what
it thinks is a legal tsheg-bar, but it includes, e.g., bskyUMbsH minus the
trailing punctuation (H).

Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').

I've made it so that the illegal sequences are as beautiful as possible.  You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
2003-07-08 14:30:17 +00:00
dchandler
c04a3f189b Rearranged the topics. 2003-07-08 12:50:27 +00:00
dchandler
23d18c925f Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong
vowels.
2003-07-08 02:59:17 +00:00
dchandler
24ac6fd06c The Trie of possible inputs fixed this bug. 2003-07-06 16:31:13 +00:00
dchandler
d88141512b Small changes w.r.t. clearing preferences. Some code cleanup. 2003-07-06 16:24:29 +00:00
dchandler
086f4bb6ec Renamed the Info menu Help.
Now using CalHTMLPane to surf the offline and the online help.
2003-07-05 22:25:21 +00:00
dchandler
8c4ab30a52 Rearranged the Tools menu; made the converter smart about "find some..."
and "find all..." modes.
2003-07-05 21:02:46 +00:00
dchandler
72d2eee503 Code cleanup. 2003-07-05 19:26:58 +00:00
dchandler
a463b686b3 Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts
by default, not just TMW.  Thus users need not install these fonts on their
systems.
2003-07-05 18:00:29 +00:00
dchandler
9effee0564 If you opened a file from the recently opened files list and very quickly
mouse-clicked on the new Jskad window, you could cause an infinite
regression of requestFocus() operations because the menu would try
to get focus back.  I grab focus from the menu now.
2003-07-05 02:30:00 +00:00
dchandler
51679c158b Final fixes completed; recently opened files can now be selected from
Jskad's file menu.
2003-07-05 02:15:33 +00:00
dchandler
4410b52c07 There's still a small bug in this, but here's the real stuff:
Recently opened files can now be selected from Jskad's file menu.

A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 03:29:25 +00:00
dchandler
d863446d25 I think *this* compiles... 2003-07-04 02:32:40 +00:00
dchandler
407020108f I didn't mean to commit the previous revision; I'm still tweaking it. 2003-07-04 02:32:03 +00:00
dchandler
9f0b1c3250 Recently opened files can now be selected from Jskad's file menu.
A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 02:31:23 +00:00
dchandler
7500b4e06b Jskad won't allow you to exit by closing the last window anymore. Instead,
you get a dialog box saying to use File/Exit.
2003-07-04 00:21:07 +00:00
dchandler
6c286573ba Fixed Javadocs. 2003-07-04 00:12:59 +00:00
dchandler
0a1bc0d30b getWylie now takes a parameter for error detection; I'm not detecting errors
here though.

Fixed a typo in a property name.
2003-07-01 23:20:08 +00:00
dchandler
0d1999d055 getWylie now takes a parameter for error detection; I'm not detecting errors
here though.
2003-07-01 22:52:18 +00:00
dchandler
a48ec641d5 Better error messages in TMW->Wylie conversions. The user knows what's
up.
2003-07-01 03:43:33 +00:00
dchandler
3113a4b8de Some of the \tmw80.. mappings were out of date.
3+1/2 is not EWTS; took these out.
2003-07-01 03:42:30 +00:00
dchandler
e7e7c2bf15 The command-line tool runs in headless mode by default, so it will
work on, e.g., a Linux console.  The JUnit tests will too, though 'ant
check' still fails because we don't sneak the -Djava.awt.headless=true
into the process early enough.
2003-07-01 02:50:09 +00:00
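
As a minimal sketch of the "early enough" problem described above: the java.awt.headless property only has an effect if it is set before AWT first checks it. The class and method names below are hypothetical; only the property name comes from the commit message.

```java
import java.awt.GraphicsEnvironment;

final class HeadlessSketch {
    // Sets the property programmatically, then asks AWT whether it is
    // running headless. This only works if no AWT code has already
    // queried the headless state earlier in the same JVM.
    static boolean forceHeadless() {
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }
}
```

Passing -Djava.awt.headless=true on the java command line avoids the ordering question, because the property is then in place before any class is loaded.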
dchandler
6151a7bc94 TMW->Wylie now occurs in the TibetanDocument, not in DuffPane,
which means that the command-line tool can finally function with a headless
graphics device.  Hopefully it will speed things up, too.  It also means that
entering Roman text into the TMW->Unicode conversion and TMW->TM
conversion will be easy.
2003-07-01 01:21:57 +00:00
dchandler
61d29fc355 The TMW->Wylie mapping was busted w.r.t. tshegs.
Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.
2003-07-01 00:17:18 +00:00
dchandler
229536884f I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW or TMW->TM conversions can be trusted.
2003-06-30 02:24:11 +00:00
dchandler
dc03083433 I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW conversions can be trusted.
2003-06-30 02:22:09 +00:00
dchandler
58644a6ef9 Better error handling. 2003-06-30 02:20:52 +00:00
dchandler
b16fb8a85c This is correct; the Tibetan! 5.1 documentation is not. This affects
TM->TMW conversions.

See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515
for a full list of Tibetan! 5.1 documentation errors.
2003-06-29 22:11:00 +00:00
dchandler
aedef4b44d An error now appears if you try to convert from format A to format B but no
glyphs in format A appear.  In this case, it is likely that you meant to convert
a different file or do a different conversion.
2003-06-29 21:31:48 +00:00
dchandler
ee14b7b97f Jskad now has the ability to open its buffer with an external viewer, e.g.
Microsoft Word.

Better OOM error handling in the GUI converter; untested, though.
2003-06-29 20:49:30 +00:00
dchandler
646e23b4a4 Tweaked the converter GUI so that you can open the old and the new files
with the external viewer.
2003-06-29 16:45:15 +00:00
dchandler
3f76c3692d Fixed Javadoc warnings. 2003-06-29 15:37:35 +00:00
dchandler
b841a7f14b The converter GUI can now be run standalone or from Jskad's Tools menu.
The converter GUI gives nicer error messages in at least one case.
2003-06-29 04:18:36 +00:00
dchandler
7938648ca8 TM->TMW conversion has no known bugs. Oddballs have been
comprehensively handled.
2003-06-29 03:03:07 +00:00
dchandler
689c1910aa To deal with javax.swing.text.rtf bugs regarding hexadecimal escape
sequences, I've created RTFFixerInputStream.  It turns illegal hexadecimal
escapes into Unicode escapes.
2003-06-29 02:30:08 +00:00
dchandler
0b849aed97 Fixed comments w.r.t. javadoc warnings. 2003-06-29 02:22:20 +00:00
dchandler
4e279defb4 Fixed a couple of array bounds checks.
Added support for two more oddballs.

Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine.  The correct solution is to transform the RTF before Java's
busted RTF readers ever see it.  For example, \'97 becomes \u151.
2003-06-28 16:33:58 +00:00
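
A minimal sketch of the transformation described above, in which \'97 becomes \u151 (0x97 is 151 in decimal). This is not the project's implementation. The class and method names are hypothetical, and the choice to rewrite only values 0x80 through 0x9F is an assumption, since the commit does not say which escapes count as troublesome. The sketch also ignores escaped backslashes and the RTF rule that a Unicode escape may be followed by fallback characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RtfHexEscapeSketch {
    // Matches an RTF hexadecimal escape: backslash, apostrophe, two hex digits.
    private static final Pattern HEX_ESCAPE =
        Pattern.compile("\\\\'([0-9A-Fa-f]{2})");

    // Rewrites hex escapes in the assumed range 0x80..0x9F as decimal
    // Unicode escapes; every other escape is copied through unchanged.
    static String rewriteHexEscapes(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1), 16);
            String replacement = (value >= 0x80 && value <= 0x9F)
                ? "\\u" + value   // decimal value, e.g. 151 for hex 97
                : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Working on the raw RTF text, before any RTF reader parses it, is the point of the approach: the reader never sees the escapes it mishandles.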
dchandler
2a359c45ef Bad conversions were not leaving the unconvertible characters at the
beginning of the document as they should and as they are documented to.

They now do, and they bracket the bad characters with the TM or TMW for
U+0F3C on the left and the TM or TMW for U+0F3D on the right.

Some cleanup.
2003-06-28 16:20:19 +00:00
dchandler
c39d8d6326 My earlier code cleanup introduced this bug; TMW->TM conversion was
busted.
2003-06-26 22:48:51 +00:00
dchandler
25510542b2 Now with a nicer error message in one case. 2003-06-26 22:48:05 +00:00
dchandler
c34259b105 Code cleanup. 2003-06-25 01:04:24 +00:00
dchandler
9e6c3009ac Added an About button. Code cleanup. Changed the Cancel button to the
Close button.
2003-06-25 00:49:11 +00:00
dchandler
569fba6467 Made the comments in the my_thdl_preferences.txt file use standard line
separators.
2003-06-25 00:03:46 +00:00
dchandler
0f3c4174b6 Made the comments in the my_thdl_preferences.txt file more useful. 2003-06-24 23:48:00 +00:00
dchandler
33beb7b782 Bye bye debugging output. 2003-06-24 12:23:37 +00:00
dchandler
f547734043 Added Than's converter GUI code; adapted it to work with Jskad's
converters.

TMW->Unicode now uses Ximalaya by default.
2003-06-24 03:02:29 +00:00
dchandler
19d7cabfe6 Forget the final=faster myth. 2003-06-24 03:01:13 +00:00
dchandler
917864574c Fixed a logic bug in mapTMWtoTM and mapTMtoTMW.
You can now specify which Unicode font to use via 'java
-Dthdl.tmw.to.unicode.font=Ximalaya ...'.
2003-06-23 01:58:11 +00:00
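
A minimal sketch of how a -D setting like the one above is typically read in Java. Only the property name thdl.tmw.to.unicode.font comes from the commit message; the class name, method name, and the fallback parameter are hypothetical.

```java
final class UnicodeFontSetting {
    // Property name as given on the command line with -D.
    static final String PROPERTY = "thdl.tmw.to.unicode.font";

    // Returns the configured font name, or the supplied fallback when
    // the property was not set on the command line.
    static String fontName(String fallback) {
        return System.getProperty(PROPERTY, fallback);
    }
}
```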
dchandler
b6d8fd89f9 When errors occur in any conversion other than TMW->Wylie and
Wylie->TMW, the troublesome glyphs are now put at the beginning of the
document AFTER AN ACHEN.  This makes a glyph like \tmw7095 visible atop
the achen.

Major fix to the handling of paragraphs in conversion; we were (for
whatever reason) dropping paragraphs before.
2003-06-23 01:24:02 +00:00
dchandler
1f4343bed0 TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2)
orders of magnitude faster.
2003-06-22 22:10:58 +00:00
dchandler
afe73c2228 The pseudo-file '-', referring to standard input, is now accepted as a
command-line argument.
2003-06-22 21:05:16 +00:00
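
A minimal sketch of the '-' convention described above, assuming the tool opens its input as an InputStream. The class and method names are hypothetical and not taken from the converter's source.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class InputArgSketch {
    // Treats the pseudo-file "-" as standard input; any other argument
    // is opened as an ordinary file.
    static InputStream open(String arg) {
        if ("-".equals(arg)) {
            return System.in;
        }
        try {
            return new FileInputStream(arg);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One consequence of this convention is that a file literally named "-" has to be given with a path, e.g. ./-.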
dchandler
900f7492b0 'ant clean check' was failing because I hadn't updated the
--find-some-non-tmw and --find-all-non-tmw baselines.

Code cleanup.
2003-06-22 16:11:58 +00:00
dchandler
66287f3cc9 Small TMW->Wylie performance improvements. TMW->Wylie is *much*
faster than TMW->Unicode etc.; this is because many fewer replacements
are made (i.e., more text is replaced each time a replacement is
performed).

I must find a way to still preserve formatting but do many fewer
replacements in TMW->{Unicode,TM} and TM->TMW.
2003-06-22 04:32:59 +00:00
dchandler
6540b260bd Fixes a (small, I think) TMW->Unicode performance glitch. I was
inserting 5 characters at a time and then skipping ahead just one
position.  I don't think this affected correctness.

I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however.  Perhaps not -- but we run through
the first 1000 TMW glyphs in 6 seconds, the 20th thousand takes at
least 60 seconds.  Is TMW->Wylie faster than TMW->Unicode?  If so,
why?

Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right?  My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document.  I'm not!
2003-06-22 04:08:33 +00:00
dchandler
dfe64a1927 Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:14:18 +00:00
dchandler
80101666c7 Included a fix from WylieWord's tibwn.ini. Removed some needless trailing
tildes.
2003-06-21 02:35:21 +00:00
dchandler
9a41f512d9 It used to be the case that you could select 'Close', and then when asked
"do you want to save?" you could press yes and then press cancel and
Jskad would still exit.  That's no longer the case.

Added File->Exit to Jskad.
2003-06-21 02:07:51 +00:00
dchandler
45b87b0fb4 In Jskad, you can now clear the preferences and return to default values. 2003-06-21 01:26:17 +00:00
eg3p
fbb6245fdb Added cut() and copy() methods to override JTextPane's methods of same name. 2003-06-20 15:27:20 +00:00
dchandler
5067683121 Edward corrected me; he had intended to have M map to 7.91, not 7.90. 2003-06-17 01:46:19 +00:00
dchandler
ced830a7d3 Renamed TMW_RTF_TO_THDL_WYLIE TibetanConverter. 2003-06-15 19:19:23 +00:00
dchandler
34a7b5da9b This converter now performs TMW->Unicode conversions. 2003-06-15 18:38:42 +00:00
dchandler
da70434e52 Jskad now allows for TMW->Unicode conversion. 2003-06-15 16:27:36 +00:00
dchandler
af5b95b08d A TMW->Unicode table is here. Note these issues, however:
Is the EWTS '_' to be represented as U+0020, or is it a wider space?

Does TMW9.42, Dza, map to U+0F5F,U+0F39?

Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB?  (Likewise with r+w, TMW6.61, TMW6.62, etc.)

Is U+0F7E a bindu?  What Unicode does TMW7.96 map to, for example?  What does TMW7.91 map to?

Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode?  If so, which codepoints?  Likewise with TMW9.60, a Chinese character.

Does TMW7.68 map to U+0F39?

Does TMW7.74, the ITHI secret sign, have a Unicode mapping?  f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it?

What Unicode does TMW9.61 map to?  Is it for sequences like f40,f7c,f60,f72?  Or is it for f60,f72,f7c?
2003-06-15 03:25:45 +00:00
dchandler
b387c512e9 Fixed two bugs. 2003-06-15 03:08:57 +00:00
dchandler
189fef9aec Made Jskad smart enough to handle a few more EWTS characters; some
it can only convert to Wylie, others are live key sequences.  This will make
converting the shechen documents go more smoothly.
2003-06-09 13:35:43 +00:00
dchandler
09a55110b7 Handles more TibetanMachine oddballs. 2003-06-09 02:01:13 +00:00
dchandler
b9219640e5 Handles more TibetanMachine oddballs. 2003-06-09 01:53:01 +00:00
dchandler
e97e1c8464 Handles more TibetanMachine oddballs. 2003-06-09 01:20:32 +00:00
dchandler
651a599188 Fixed usage info. 2003-06-08 23:23:12 +00:00
dchandler
70b31558fa Tried to fix a crashing bug that happened when you converted TM->TMW
and then tried to convert that TMW to Wylie.  I swear it's Java's
problem (see the ugly stack trace in the code and decide for
yourself), and I tried replacing rather than
inserting-and-then-removing, but it didn't work.  I've left these
things as options.
2003-06-08 23:12:52 +00:00
dchandler
212414edef TMW_RTF_TO_THDL_WYLIE now converts TM->TMW. 2003-06-08 22:43:27 +00:00
dchandler
32831b698f If bad (oddball) TM glyphs appear, then converting to TMW causes, by
default, all oddballs to appear once in the resulting document.
This'll help me find the correct glyphs for the oddballs, and it'll
prevent the average user from converting a document with oddballs.
2003-06-08 22:37:38 +00:00
dchandler
d45f5ab8c8 Improved performance (I suppose). 2003-06-03 23:49:34 +00:00
dchandler
7d768c9e06 Fixed a crashing bug that happened upon converting Wylie to Tibetan. 2003-06-03 23:45:15 +00:00
dchandler
0f724989b5 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.

TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
2003-06-01 23:05:32 +00:00
dchandler
54ca37c824 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.
2003-06-01 19:14:08 +00:00
dchandler
e2caf99085 Some code cleanup.
tibwn.ini must now have, in the Unicode column, either nothing, or
0FXX(,0FXX)*.  E.g., 0F04,0F05 is valid.  Debugging code ensures this is
the case.
2003-06-01 18:09:49 +00:00
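
The rule above, that the Unicode column holds either nothing or 0FXX(,0FXX)*, can be sketched as a regular-expression check. This is not the project's debugging code. The class and method names are hypothetical, and treating XX as two hex digits in either case is an assumption.

```java
import java.util.regex.Pattern;

final class UnicodeColumnCheck {
    // Either empty, or one or more comma-separated entries of the
    // form 0FXX, where XX is assumed to be two hexadecimal digits.
    private static final Pattern VALID =
        Pattern.compile("(0F[0-9A-Fa-f]{2}(,0F[0-9A-Fa-f]{2})*)?");

    static boolean isValidUnicodeColumn(String column) {
        return VALID.matcher(column).matches();
    }
}
```

With this pattern the example from the commit, 0F04,0F05, is accepted, while a trailing comma or an entry outside the 0F00 block is rejected.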
dchandler
1f6bb07d53 Fixes bogus Unicode mappings mentioned in
http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515.
2003-06-01 04:02:04 +00:00
dchandler
7a8264d87c Fixed typo. 2003-06-01 03:30:49 +00:00
dchandler
0235263ddf TM->TMW and TMW->TM conversion in RTF is now supported. I've
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.

TMW_RTF_TO_THDL_WYLIE (now misnamed) supports TMW->TM
conversion (but not TM->TMW).  There is an automated test case for a
TMW->TM conversion.

I have full confidence in this conversion.  Even the smallest glitch in the core
functionality (not formatting) would surprise me.

Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs.  This
is because Java's RTF support is not deterministic, I'm guessing, and is not
a real failure.  I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
2003-05-31 23:21:29 +00:00
dchandler
bfacd6c998 Accurate TM->TMW and TMW->TM mappings are now available. I've
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).

To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan).  For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present.  I've normalized the mapping for these glyphs, as it is arbitrary
because the same two glyphs just appear fifteen times each.
2003-05-31 20:13:15 +00:00
dchandler
a4bc23a9ab Made performance improvements, doc improvements, and code cleanup to
DuffCode.
2003-05-31 17:02:06 +00:00