Commit graph

239 commits

Author SHA1 Message Date
dchandler
afe73c2228 The pseudo-file '-', referring to standard input, is now accepted as a
command-line argument.
2003-06-22 21:05:16 +00:00
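The '-' convention from afe73c2228 can be sketched as follows. This is a minimal illustration, not the converter's actual code; the class and method names (`StdinArgSketch`, `isStdin`, `open`) are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the real converter's argument handling may differ.
final class StdinArgSketch {
    /** Returns true if the argument is the pseudo-file "-". */
    static boolean isStdin(String arg) {
        return "-".equals(arg);
    }

    /** Opens the named input, treating "-" as standard input. */
    static InputStream open(String arg) throws IOException {
        if (isStdin(arg)) {
            return System.in;
        }
        return new FileInputStream(arg);
    }
}
```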
dchandler
900f7492b0 'ant clean check' was failing because I hadn't updated the
--find-some-non-tmw and --find-all-non-tmw baselines.

Code cleanup.
2003-06-22 16:11:58 +00:00
dchandler
66287f3cc9 Small TMW->Wylie performance improvements. TMW->Wylie is *much*
faster than TMW->Unicode etc.; this is because many fewer replacements
are made (i.e., more text is replaced each time a replacement is
performed).

I must find a way to still preserve formatting but do many fewer
replacements in TMW->{Unicode,TM} and TM->TMW.
2003-06-22 04:32:59 +00:00
dchandler
6540b260bd Fixes a (small, I think) TMW->Unicode performance glitch. I was
inserting 5 characters at a time and then skipping ahead just one
position.  I don't think this affected correctness.

I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however.  Perhaps not -- but while we run through
the first 1000 TMW glyphs in 6 seconds, the 20th thousand takes at
least 60 seconds.  Is TMW->Wylie faster than TMW->Unicode?  If so,
why?

Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right?  My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document.  I'm not!
2003-06-22 04:08:33 +00:00
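The skip-ahead glitch described in 6540b260bd can be illustrated with a plain `StringBuilder`. This is only a sketch under that assumption: the real code operates on a styled document rather than a string, and `ReplaceInPlaceSketch` and `expand` are hypothetical names. The point is that after inserting a multi-character replacement, the index should advance past everything just inserted rather than by one position.

```java
// Hypothetical illustration; not the TibetanConverter implementation.
final class ReplaceInPlaceSketch {
    /** Replaces every occurrence of glyph with replacement, in place. */
    static String expand(String text, char glyph, String replacement) {
        StringBuilder sb = new StringBuilder(text);
        int i = 0;
        while (i < sb.length()) {
            if (sb.charAt(i) == glyph) {
                sb.replace(i, i + 1, replacement);
                // Skip all inserted characters, not just one, so that
                // freshly inserted text is never rescanned.
                i += replacement.length();
            } else {
                i++;
            }
        }
        return sb.toString();
    }
}
```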
dchandler
dfe64a1927 Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:14:18 +00:00
dchandler
80101666c7 Included a fix from WylieWord's tibwn.ini. Removed some needless trailing
tildes.
2003-06-21 02:35:21 +00:00
dchandler
9a41f512d9 It used to be the case that you could select 'Close', and then when asked
"do you want to save?" you could press yes and then press cancel and
Jskad would still exit.  That's no longer the case.

Added File->Exit to Jskad.
2003-06-21 02:07:51 +00:00
dchandler
45b87b0fb4 In Jskad, you can now clear the preferences and return to default values. 2003-06-21 01:26:17 +00:00
eg3p
fbb6245fdb Added cut() and copy() methods to override JTextPane's methods of same name. 2003-06-20 15:27:20 +00:00
dchandler
5067683121 Edward corrected me; he had intended to have M map to 7.91, not 7.90. 2003-06-17 01:46:19 +00:00
dchandler
6712b47e13 Added an option to control the Unicode font for TMW->Unicode
conversions.
2003-06-15 20:28:56 +00:00
dchandler
ced830a7d3 Renamed TMW_RTF_TO_THDL_WYLIE to TibetanConverter. 2003-06-15 19:19:23 +00:00
dchandler
34a7b5da9b This converter now performs TMW->Unicode conversions. 2003-06-15 18:38:42 +00:00
dchandler
da70434e52 Jskad now allows for TMW->Unicode conversion. 2003-06-15 16:27:36 +00:00
dchandler
af5b95b08d A TMW->Unicode table is here. Note these issues, however:
Is the EWTS '_' to be represented as U+0020, or is it a wider space?

Does TMW9.42, Dza, map to U+0F5F,U+0F39?

Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB?  (Likewise with r+w, TMW6.61, TMW6.62, etc.)

Is U+0F7E a bindu?  What Unicode does TMW7.96 map to, for example?  What does TMW7.91 map to?

Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode?  If so, which codepoints?  Likewise with TMW9.60, a Chinese character.

Does TMW7.68 map to U+0F39?

Does TMW7.74, the ITHI secret sign, have a Unicode mapping?  f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it?

What Unicode does TMW9.61 map to?  Is it for sequences like f40,f7c,f60,f72?  Or is it for f60,f72,f7c?
2003-06-15 03:25:45 +00:00
dchandler
b387c512e9 Fixed two bugs. 2003-06-15 03:08:57 +00:00
dchandler
189fef9aec Made Jskad smart enough to handle a few more EWTS characters; some
it can only convert to Wylie, others are live key sequences.  This will make
converting the shechen documents go more smoothly.
2003-06-09 13:35:43 +00:00
dchandler
09a55110b7 Handles more TibetanMachine oddballs. 2003-06-09 02:01:13 +00:00
dchandler
b9219640e5 Handles more TibetanMachine oddballs. 2003-06-09 01:53:01 +00:00
dchandler
e97e1c8464 Handles more TibetanMachine oddballs. 2003-06-09 01:20:32 +00:00
dchandler
651a599188 Fixed usage info. 2003-06-08 23:23:12 +00:00
dchandler
70b31558fa Tried to fix a crashing bug that happened when you converted TM->TMW
and then tried to convert that TMW to Wylie.  I swear it's Java's
problem (see the ugly stack trace in the code and decide for
yourself), and I tried replacing rather than
inserting-and-then-removing, but it didn't work.  I've left these
things as options.
2003-06-08 23:12:52 +00:00
dchandler
212414edef TMW_RTF_TO_THDL_WYLIE now converts TM->TMW. 2003-06-08 22:43:27 +00:00
dchandler
32831b698f If bad (oddball) TM glyphs appear, then converting to TMW causes, by
default, all oddballs to appear once in the resulting document.
This'll help me find the correct glyphs for the oddballs, and it'll
prevent the average user from converting a document with oddballs.
2003-06-08 22:37:38 +00:00
dchandler
d45f5ab8c8 Improved performance (I suppose). 2003-06-03 23:49:34 +00:00
dchandler
7d768c9e06 Fixed a crashing bug that happened upon converting wylie to tibetan. 2003-06-03 23:45:15 +00:00
dchandler
0f724989b5 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.

TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
2003-06-01 23:05:32 +00:00
dchandler
54ca37c824 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.
2003-06-01 19:14:08 +00:00
dchandler
e2caf99085 Some code cleanup.
tibwn.ini must now have, in the Unicode column, either nothing, or
0FXX(,0FXX)*.  E.g., 0F04,0F05 is valid.  Debugging code ensures this is
the case.
2003-06-01 18:09:49 +00:00
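The Unicode-column rule from e2caf99085 (either nothing, or 0FXX(,0FXX)*) can be checked with a regular expression. This is a minimal sketch, not the project's debugging code: it assumes each X is an uppercase hexadecimal digit, and `UnicodeColumnSketch` and `isValid` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical validator; assumes XX means two uppercase hex digits.
final class UnicodeColumnSketch {
    private static final Pattern VALID =
            Pattern.compile("0F[0-9A-F]{2}(,0F[0-9A-F]{2})*");

    /** Accepts an empty column or a comma-separated list of 0FXX codes. */
    static boolean isValid(String column) {
        return column.isEmpty() || VALID.matcher(column).matches();
    }
}
```

Under these assumptions, `0F04,0F05` is accepted, as is an empty column, while a trailing comma or a code outside the 0FXX range is rejected.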
dchandler
1f6bb07d53 Fixes bogus Unicode mappings mentioned in
http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515.
2003-06-01 04:02:04 +00:00
dchandler
7a8264d87c Fixed typo. 2003-06-01 03:30:49 +00:00
dchandler
0235263ddf TM->TMW and TMW->TM conversion in RTF is now supported. I've
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.

TMW_RTF_TO_THDL_WYLIE (now misnamed) supports TMW->TM
conversion (but not TM->TMW).  There is an automated test case for a
TMW->TM conversion.

I have full confidence in this conversion.  Even the smallest glitch in the core
functionality (not formatting) would surprise me.

Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs.  I'm
guessing this is because Java's RTF support is not deterministic; it is not
a real failure.  I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
2003-05-31 23:21:29 +00:00
dchandler
bfacd6c998 Accurate TM->TMW and TMW->TM mappings are now available. I've
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).

To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan).  For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present.  I've normalized the mapping for these glyphs; the choice is
arbitrary because the same two glyphs simply appear fifteen times each.
2003-05-31 20:13:15 +00:00
dchandler
a4bc23a9ab Made performance improvements, doc improvements, and code cleanup to
DuffCode.
2003-05-31 17:02:06 +00:00
dchandler
08d2ea3e2d Jeff C. H. Wu found a bug whereby typing 'cuig' just after starting Jskad fails
(by producing 'cug') although typing 'kcuig' succeeds.

This is now fixed, and test cases now exist to ensure that the problem
doesn't reappear.
2003-05-31 12:58:36 +00:00
dchandler
bc9a8f4754 Jeff C. H. Wu found a bug whereby typing 'cuig' just after starting Jskad fails
(by producing 'cug') although typing 'kcuig' succeeds.

This is now fixed.
2003-05-31 12:49:44 +00:00
dchandler
6f0390c5d6 By default (controllable via options.txt), Jskad now fixes the Tahoma curly
brace problem upon opening any RTF document.

The TMW_RTF_TO_THDL_WYLIE test baselines changed because
I fixed (a while ago) some inconsistencies between the EWTS standard and
Jskad.

Conversion of TibetanMachineWeb8.40, @#, to Wylie now works correctly.

Unfortunately, though, typing @# doesn't produce 8.40, it still produces
8.38 and 8.39, two glyphs.
2003-05-28 00:40:59 +00:00
dchandler
a144b125ca I've made Jskad adhere to the THDL Extended Wylie spec. Some
punctuation has changed {@, #, %, and $}.

Fixed some errors in tibwn.ini so that all the TM<->TMW mappings are
correct.
2003-05-26 13:11:51 +00:00
dchandler
ec7fec695f Added some automated JUnit tests for TMW_RTF_TO_THDL_WYLIE. 2003-05-18 17:17:52 +00:00
dchandler
e2a9720d9b I've added a command-line converter,
org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE.  It converts RTF files
consisting of TMW characters to the corresponding THDL Extended Wylie.

It supports --find-some-non-tmw mode, which allows you to ensure that no
unusual characters will spoil the conversion.  The converter has built-in
intelligence that allows it to handle Tahoma '{', '}', and '\\' characters
properly.

The converter works on mixed Roman/TMW also, but --find-some-non-tmw
and --find-all-non-tmw modes are not as useful.

Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in
Jskad's jar, with no command-line options to see usage information.
2003-05-18 14:14:47 +00:00
dchandler
17ea8fdf2a Copying from Word XP used to crash Jskad sometimes. Now you get a
dialog box telling you something about RTF support in Java.
2003-05-15 01:41:56 +00:00
dchandler
78dc46a979 Jskad keyboards are now configured via keyboards.ini, a file that has
comments that explain its function.  It's quite simple.  This is in
response to Jeff C. H. Wu's request.
2003-05-14 03:25:36 +00:00
dchandler
dcb36ec338 Clearer status message; cleanup. 2003-05-14 02:37:28 +00:00
dchandler
8958366a07 Bad RTF now causes an error message to appear in the transcription
instead of causing a fatal exception.  The error allows you to look up
the DuffCode that caused the trouble.
2003-05-14 01:37:49 +00:00
dchandler
8275afeb41 Bad RTF files cause a polite error message to appear instead of an
exception to be thrown.

Jskad windows now always have "Jskad" in their window titles.
2003-05-14 01:34:39 +00:00
eg3p
3e847ed009 DELETE was not working properly in Roman entry mode.
Now it works ok.
2003-04-17 19:48:22 +00:00
amontano
0bacdcc229 fixed the paste problem for the translation tool 2003-04-17 11:12:59 +00:00
dchandler
59175ccfd6 Added a few tests for the ACIP keyboard, which I've improved a bit.
Noted some failures.  "Fixed" the code to do what I want it to do for
the (no sanskrit stacking, tibetan stacking) case [which is exercised
by this keyboard only].
2003-04-14 23:55:00 +00:00
dchandler
efa8fc1f25 DuffPane now has the start of a unit test suite. Invoke it via 'ant
clean check'.  Right now there are tests to ensure that typing certain
sequences of keys in the Extended Wylie keyboard gives the expected
Extended Wylie back when "Tools/Convert Tibetan to Wylie" is invoked.

The syntactically illegal d.wa now converts to Tibetan and then back
to d.wa (not dwa, as it did); likewise with the illegal g.wa.  wa
doesn't take any prefixes, but I prefer clean end-to-end
behavior. (jeskd doesn't go end-to-end, though.)

Note that you cannot successfully run the DuffPane tests on a Linux
box unless your DISPLAY variable is set correctly.  Thus, my nightly
builds will fail with an Error (as opposed to a Failure).
2003-04-14 05:22:27 +00:00
dchandler
6636d03a41 ant private-javadocs runs without warnings; cleaned up some
as-yet-unused code.
2003-04-13 01:46:20 +00:00