Commit graph

453 commits

Author SHA1 Message Date
dchandler
d2749cecd0 ACIP->TMW and ACIP->Unicode are now smart about when a newline is
really a newline and when a space is really a tsheg. The space in {KA
,MDO} is a tsheg, but the space in {GA ,MDO} is not.
2003-09-04 04:04:21 +00:00
dchandler
72e531e515 Use shortened 'dreng-bu, not regular. As per TM glyphs. I suspect
that the following would look better with shortened 'dreng-bu also,
but I'm sticking with the TM/TMW docs:

dz+r~137,2~~4,46~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fb2
dz+w~138,2~~4,47~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fad
dz+h~139,2~~4,48~1,110~4,120~1,123~1,126~4,106~4,113~0F5C
dz+h+y~140,2~~4,49~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb1
dz+h+r~141,2~~4,50~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb2
dz+h+l~249,2~~4,51~1,110~4,123~1,123~1,126~4,110~4,117~f5c,fb3
dz+h+w~143,2~~4,52~1,110~4,122~1,123~1,126~4,108~4,115~f5c,fad
2003-09-04 03:46:35 +00:00
a1tsal
2f58ec2760 A bunch of Sanskrit stacks of the form ts+... and dz+... had 1,125 for their
drengbu, but that is actually a naro.  I changed it to 1,123
(which is one of the two drengbus).
2003-09-04 02:06:58 +00:00
dchandler
316f59107b A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation. 2003-09-02 06:39:33 +00:00
dchandler
cc9ab06864 Added utility routine. Better comments. 2003-08-31 20:38:28 +00:00
dchandler
045c4069c9 Preliminary ACIP->TMW support is in place. {DU} gives you something
less beautiful than what Jskad would give, so more work is needed.
2003-08-31 16:06:35 +00:00
a1tsal
1f4d53be2e Moved ^M to punctuation section.
Removed obsolete comment.
2003-08-31 00:44:23 +00:00
a1tsal
522812996e Remove unused sections of tibwn.ini. 2003-08-31 00:34:15 +00:00
dchandler
dd22e161a5 Code cleanup for Jskad's Tibetan font converter GUI. 2003-08-30 05:01:15 +00:00
dchandler
896344f2d1 David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW
mappings, so I've put them back, but with the EWTS non-correspondences
\tmwXYYY.

Jskad no longer supports superscribed or subscribed numerals, because
EWTS does not.
2003-08-26 01:28:02 +00:00
a1tsal
ccdebf6719 Removed half numbers (no longer in EWTS)
Brought <?Other?> closer to EWTS
Removed __TILDE__ (no longer in EWTS)
Changed M^ to ^M per new EWTS draft
Added ai, au, -i from WW tibwn.ini -- they were missing in this version
2003-08-25 23:19:48 +00:00
dchandler
1982c5847b Jskad's converter now has ACIP-to-Unicode built in. There are known
bugs; it is pre-alpha.  It's usable, though, and finds tons of errors
in ACIP input files, with the user deciding just how pedantic to be.
The biggest outstanding bug is the silent one: treating { }, space, as
tsheg instead of whitespace when we ought to know better.
2003-08-24 06:40:53 +00:00
dchandler
d5ad760230 TMW->Wylie conversion now takes advantage of prefix rules, the rules
that say "ya can take a ga prefix" etc.

The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline).  This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
2003-08-23 22:03:37 +00:00
dchandler
21ef657921 I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc. 2003-08-22 05:13:32 +00:00
dchandler
1afb3a0fdd ACIP->Unicode, without going through TMW, is now possible, so long as
\, the Sanskrit virama, is not used.  Of the 1370-odd ACIP texts I've
got here, about 57% make it through the gauntlet (fewer if you demand
a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).
2003-08-18 02:38:54 +00:00
dchandler
245aac4911 I'm now stricter about accepting alphabetic characters. F, Q, X, a,
b, c, d, e, ... do not belong in ACIP, so the scanner rejects them.
This should make it even easier to distinguish automatically between
Tibetan and English texts.
2003-08-17 02:38:58 +00:00
dchandler
39451d8879 Fixed a couple of small bugs.
Only 250 errors are reported now; this is important if you try to
convert an English document.
2003-08-17 02:12:49 +00:00
dchandler
4581a2d8ab Improved the ACIP scanner (the part of the converter that says, "This
is a correction, that's a comment, this is Tibetan, that's Latin
(English), that's Tibetan inter-tsheg-bar punctuation, etc.)  It now
accepts more real-world ACIP files, i.e. it handles illegal
constructs.  The error checking is more user-friendly.  There are now
tests.

Added some tsheg bars that Peter E. Hauer of Linguasoft sent me to the
tests.  Many thanks, Peter.  I still need to implement rules that say,
"This is not Tibetan, it must be Sanskrit, because that letter doesn't
take a MA prefix."
2003-08-17 01:45:55 +00:00
dchandler
0b91ed0beb I've improved the ACIP tsheg bar scanner to handle a lot of illegal
constructions that occur in practice.
2003-08-16 16:13:53 +00:00
amontano
2a57439516 Updated the info displayed on the about window. 2003-08-14 14:16:49 +00:00
amontano
da384c6c2f Now when loading, takes the default font options from the DuffPane. 2003-08-14 14:16:23 +00:00
dchandler
2b59d9838d I now have a function that takes as input a String of ACIP and breaks
up that String into tsheg bars, punctuation, etc., while finding
errors.  I've tested it some, but I'm not yet committing the tests.

Next step: a converter that takes an ACIP file as input and outputs
TMW+Latin.
2003-08-14 05:10:47 +00:00
dchandler
57f506384f The ACIP->Tibetan converter now has perfect low-level functionality,
and it has the capability to produce error messages and warnings that
make sense to the user.  One can now get the correct parse, if one
exists, for an ACIP tsheg bar.

One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if you really wanted to.

Still to do:

o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
  the affected documents (most of the Kangyur has unparseable
  creatures).
2003-08-12 04:13:11 +00:00
dchandler
87266646fb Removed misinformation. 2003-08-10 19:33:01 +00:00
dchandler
e21d3774a9 Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter.  It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00
dchandler
39e0435b6b Refactored this code so that Wylie->Tibetan and ACIP->Tibetan
conversions can make use of it.  Hooray for reuse.
2003-08-10 19:02:56 +00:00
dchandler
bcf1c12b6a We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:46:01 +00:00
dchandler
9093fd3c05 We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:38:20 +00:00
dchandler
251d8feae5 brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report
http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.
2003-08-09 17:48:40 +00:00
dchandler
7dffc47cb7 'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came
across this one, so we've added it to the list of ambiguous three-consonant
combos.
2003-08-09 17:05:43 +00:00
amontano
52cdc17794 Added support for multiple keyboards and ability to set the preferences
for size of tibetan font and type and size of roman font.
2003-08-09 08:00:58 +00:00
amontano
8e4b508de8 Made a new class for the preference window so that other software
(i.e. the translation tool) can re-use that same code to set up the
attributes of the tibetan and roman fonts.
2003-08-09 07:57:21 +00:00
amontano
ef0df405d9 Redesigned the interface of the handheld version. 2003-08-03 06:29:08 +00:00
amontano
2b5a5fe67a Got rid of redundant code 2003-08-03 06:28:22 +00:00
amontano
cce779bf88 Added a wizard window to avoid using the command line as much as possible.
Through the wizard, one can choose to connect to the available on-line dicts,
open a local dict or generate a dict database.
2003-08-03 06:27:30 +00:00
dchandler
4caeafa1b1 You shouldn't have one of these without the other, now that there are two.
This way neither TM nor TMW fonts will be loaded.
2003-07-26 00:55:32 +00:00
dchandler
2bb499e5a7 This was dying with a NullPointerException when you started it up using
'ant tt-run' with no dictionary.  Now it starts up and shows you a nice
error message, "Dictionary could not be loaded!", instead.
2003-07-26 00:53:59 +00:00
dchandler
e198519c5f Jskad now supports EWTS ~, i.e. TMW8.91. 2003-07-25 02:35:31 +00:00
amontano
5df9b5b91a now supports sorting 2003-07-25 01:43:58 +00:00
amontano
97f5fe91b3 when invalid wylie is encountered, instead of displaying a message it raises an exception. 2003-07-25 01:43:18 +00:00
amontano
7cdbf33333 changed it to support for 30 dictionaries (instead of just 15) 2003-07-25 01:42:17 +00:00
amontano
7b04d7bca5 changed the "about" info 2003-07-25 01:41:30 +00:00
dchandler
a7f0c35738 Added a test for ts.ha vs. tsha ambiguity; there is no ambiguity. 2003-07-18 03:51:29 +00:00
dchandler
dc454b8c0c More test cases related to the following:
The Tibetan d.za was being converted into the Wylie dza incorrectly.  This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 02:31:02 +00:00
dchandler
f8c959bfb0 The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 00:30:27 +00:00
dchandler
1c29566aee I'm now using the Unix diff built in to Apache Jakarta Commons JRCS
(which I found on suigeneris.org, not apache.org) in order to bulletproof the
Tibetan Converter tests.  They used to fail due to nondeterminism in the
Java RTF writer; they should no longer fail.

I've also changed it so that the Tibetan Converter tests run in headless
mode, which means that they'll run on the nightly builds server.
2003-07-14 12:26:26 +00:00
dchandler
06fb77a82b Initial revision 2003-07-14 12:22:29 +00:00
dchandler
f900154e7a Tests disambiguation in TMW->Wylie conversion. 2003-07-14 12:21:02 +00:00
dchandler
0622ac5062 Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>,
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists.  This is cleaner because WylieWord and Jskad had different
values for these lists.
2003-07-14 12:19:46 +00:00
dchandler
fb85f6e8ce Fix comment. 2003-07-14 12:17:04 +00:00
dchandler
79b3b97326 Remove warning message from menu item. 2003-07-13 23:19:11 +00:00
dchandler
c986684beb Updated help to talk about new features. 2003-07-13 22:51:35 +00:00
dchandler
f695b1a6c1 Updated baselines because conversions have improved since the last
update.
2003-07-13 19:14:41 +00:00
dchandler
d10f97fc06 Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 19:14:15 +00:00
dchandler
96afae795c Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 18:46:29 +00:00
dchandler
802e0cb588 If this method uses the Wylie representation, you get an infinite recursion
when you do a TMW->Wylie conversion for a document with glyphs that
have no known Wylie.
2003-07-13 17:40:02 +00:00
dchandler
a86a0f235b I was missing a break; statement; this caused an Error to be thrown during
some TMW->Wylie conversions.  No conversions were erroneous, though.
2003-07-13 17:38:00 +00:00
dchandler
6677d1e245 Code cleanup. 2003-07-13 16:53:03 +00:00
dchandler
31f0f852cc The ACIP keyboard is context-sensitive, so it'll likely never be supported by
the current infrastructure as a keyboard.  I'm taking it out of Jskad.
2003-07-13 16:30:58 +00:00
dchandler
3b6eaa792e Fixed javadocs. 2003-07-11 13:33:30 +00:00
dchandler
85176cd9f3 Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files.  So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
2003-07-11 13:30:22 +00:00
dchandler
d726bc0258 A couple of changes to TMW->Unicode thanks to Than's reply to my
questions.
2003-07-09 01:44:15 +00:00
dchandler
9db233bdf8 Cosmetic change. 2003-07-08 14:31:14 +00:00
dchandler
02558a1d78 Jskad supports <7, >8, etc. again; it no longer supports the punctuation
'<' and '>'.  The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.

Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants.  This facilitates the
new-and-improved Tibetan->Wylie conversion.

Tibetan->Wylie is now done by forming legal tsheg-bars.  A legal tsheg bar
is converted into perfect THDL Wylie.  See code comments to learn what
it thinks is a legal tsheg-bar, but it includes bskyUMbsH minus the trailing
punctuation (H), e.g.

Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').

I've made it so that the illegal sequences are as beautiful as possible.  You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
2003-07-08 14:30:17 +00:00
dchandler
c04a3f189b Rearranged the topics. 2003-07-08 12:50:27 +00:00
dchandler
23d18c925f Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong
vowels.
2003-07-08 02:59:17 +00:00
dchandler
24ac6fd06c The Trie of possible inputs fixed this bug. 2003-07-06 16:31:13 +00:00
dchandler
d88141512b Small changes w.r.t. clearing preferences. Some code cleanup. 2003-07-06 16:24:29 +00:00
dchandler
086f4bb6ec Renamed the Info menu Help.
Now using CalHTMLPane to surf the offline and the online help.
2003-07-05 22:25:21 +00:00
dchandler
8c4ab30a52 Rearranged the Tools menu; made the converter smart about "find some..."
and "find all..." modes.
2003-07-05 21:02:46 +00:00
dchandler
72d2eee503 Code cleanup. 2003-07-05 19:26:58 +00:00
dchandler
a463b686b3 Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts
by default, not just TMW.  Thus users need not install these fonts on their
systems.
2003-07-05 18:00:29 +00:00
dchandler
9effee0564 If you opened a file from the recently opened files list and very quickly
mouse-clicked on the new Jskad window, you could cause an infinite
regression of requestFocus() operations because the menu would try
to get focus back.  I grab focus from the menu now.
2003-07-05 02:30:00 +00:00
dchandler
51679c158b Final fixes completed; recently opened files can now be selected from
Jskad's file menu.
2003-07-05 02:15:33 +00:00
dchandler
4410b52c07 There's still a small bug in this, but here's the real stuff:
Recently opened files can now be selected from Jskad's file menu.

A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 03:29:25 +00:00
dchandler
d863446d25 I think *this* compiles... 2003-07-04 02:32:40 +00:00
dchandler
407020108f I didn't mean to commit the previous revision; I'm still tweaking it. 2003-07-04 02:32:03 +00:00
dchandler
9f0b1c3250 Recently opened files can now be selected from Jskad's file menu.
A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 02:31:23 +00:00
dchandler
7500b4e06b Jskad won't allow you to exit by closing the last window anymore. Instead,
you get a dialog box saying to use File/Exit.
2003-07-04 00:21:07 +00:00
dchandler
6c286573ba Fixed Javadocs. 2003-07-04 00:12:59 +00:00
dchandler
0a1bc0d30b getWylie now takes a parameter for error detection; I'm not detecting errors
here though.

Fixed a typo in a property name.
2003-07-01 23:20:08 +00:00
dchandler
0d1999d055 getWylie now takes a parameter for error detection; I'm not detecting errors
here though.
2003-07-01 22:52:18 +00:00
dchandler
a48ec641d5 Better error messages in TMW->Wylie conversions. The user knows what's
up.
2003-07-01 03:43:33 +00:00
dchandler
3113a4b8de Some of the \tmw80.. mappings were out of date.
3+1/2 is not EWTS; took these out.
2003-07-01 03:42:30 +00:00
dchandler
e7e7c2bf15 The command-line tool runs in headless mode by default, so it will
work on a Linux console, e.g.  The JUnit tests will too, though 'ant
check' still fails because we don't sneak the -Djava.awt.headless=true
into the process early enough.
2003-07-01 02:50:09 +00:00
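
The "early enough" point in the commit above can be illustrated with a minimal sketch. It assumes the `java.awt.headless` property is set before any AWT class is initialized. The class and method names are hypothetical and are not taken from the Jskad sources.

```java
import java.awt.GraphicsEnvironment;

/** Hypothetical illustration; not the project's actual startup code. */
public class HeadlessSketch {
    /**
     * Requests headless mode and reports whether AWT honors it.
     * This only works if it runs before AWT reads the property,
     * which is why setting it late in the process has no effect.
     */
    public static boolean forceHeadless() {
        System.setProperty("java.awt.headless", "true");
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless: " + forceHeadless());
    }
}
```

Passing `-Djava.awt.headless=true` on the `java` command line has the same effect without any code, which is the form the commit message mentions.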
dchandler
6151a7bc94 TMW->Wylie now occurs in the TibetanDocument, not in DuffPane,
which means that the command-line tool can finally function with a headless
graphics device.  Hopefully it will speed things up, too.  It also means that
entering Roman text into the TMW->Unicode conversion and TMW->TM
conversion will be easy.
2003-07-01 01:21:57 +00:00
dchandler
61d29fc355 The TMW->Wylie mapping was busted w.r.t. tshegs.
Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.
2003-07-01 00:17:18 +00:00
dchandler
229536884f I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW or TMW->TM conversions can be trusted.
2003-06-30 02:24:11 +00:00
dchandler
dc03083433 I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW conversions can be trusted.
2003-06-30 02:22:09 +00:00
dchandler
58644a6ef9 Better error handling. 2003-06-30 02:20:52 +00:00
dchandler
b16fb8a85c This is correct; the Tibetan! 5.1 documentation is not. This affects
TM->TMW conversions.

See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515
for a full list of Tibetan! 5.1 documentation errors.
2003-06-29 22:11:00 +00:00
dchandler
aedef4b44d An error now appears if you try to convert from format A to format B but no
glyphs in format A appear.  In this case, it is likely that you meant to convert
a different file or do a different conversion.
2003-06-29 21:31:48 +00:00
dchandler
ee14b7b97f Jskad now has the ability to open its buffer with an external viewer, e.g.
Microsoft Word.

Better OOM error handling in the GUI converter; untested, though.
2003-06-29 20:49:30 +00:00
dchandler
646e23b4a4 Tweaked the converter GUI so that you can open the old and the new files
with the external viewer.
2003-06-29 16:45:15 +00:00
dchandler
3f76c3692d Fixed Javadoc warnings. 2003-06-29 15:37:35 +00:00
dchandler
b841a7f14b The converter GUI can now be run standalone or from Jskad's Tools menu.
The converter GUI gives nicer error messages in at least one case.
2003-06-29 04:18:36 +00:00
dchandler
7938648ca8 TM->TMW conversion has no known bugs. Oddballs have been
comprehensively handled.
2003-06-29 03:03:07 +00:00
dchandler
689c1910aa To deal with javax.swing.text.rtf bugs regarding hexadecimal escape
sequences, I've created RTFFixerInputStream.  It turns illegal hexadecimal
escapes into Unicode escapes.
2003-06-29 02:30:08 +00:00
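
A minimal string-based sketch of the rewriting idea follows. The real RTFFixerInputStream works on a stream and is not shown in this log, so the class name, the method, and the choice to rewrite every hexadecimal escape (rather than only the "illegal" ones) are assumptions made for illustration. It follows the example given elsewhere in this log, where `\'97` becomes `\u151`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the actual RTFFixerInputStream. */
public class RtfHexEscapeSketch {
    // Matches an RTF hexadecimal escape: backslash, apostrophe, two hex digits.
    private static final Pattern HEX_ESCAPE =
        Pattern.compile("\\\\'([0-9A-Fa-f]{2})");

    /**
     * Rewrites each hexadecimal escape as a backslash-u escape whose
     * argument is the decimal value of the two hex digits.
     * Assumption: every hex escape is rewritten, not only illegal ones.
     */
    public static String fix(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps the backslash literal in the output.
            m.appendReplacement(out, Matcher.quoteReplacement("\\u" + value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("dash\\'97here"));
    }
}
```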
dchandler
0b849aed97 Fixed comments w.r.t. javadoc warnings. 2003-06-29 02:22:20 +00:00
dchandler
4e279defb4 Fixed a couple of array bounds checks.
Added support for two more oddballs.

Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine.  The correct solution is to transform the RTF before Java's
busted RTF readers ever see it.  \'97 becomes \u151, e.g.
2003-06-28 16:33:58 +00:00
dchandler
2a359c45ef Bad conversions were not leaving the unconvertible characters at the
beginning of the document as they should and as they are documented to.

They now do, and they bracket the bad characters with the TM or TMW for
U+0F3C on the left and the TM or TMW for U+0F3D on the right.

Some cleanup.
2003-06-28 16:20:19 +00:00
dchandler
c39d8d6326 My earlier code cleanup introduced this bug; TMW->TM conversion was
busted.
2003-06-26 22:48:51 +00:00
dchandler
25510542b2 Now with a nicer error message in one case. 2003-06-26 22:48:05 +00:00
dchandler
c34259b105 Code cleanup. 2003-06-25 01:04:24 +00:00
dchandler
9e6c3009ac Added an About button. Code cleanup. Changed the Cancel button to the
Close button.
2003-06-25 00:49:11 +00:00
dchandler
569fba6467 Made the comments in the my_thdl_preferences.txt file use standard line
separators.
2003-06-25 00:03:46 +00:00
dchandler
0f3c4174b6 Made the comments in the my_thdl_preferences.txt file more useful. 2003-06-24 23:48:00 +00:00
dchandler
c67ddb2d6c Use Ximalaya, not Arial Unicode MS, by default. 2003-06-24 12:51:32 +00:00
dchandler
33beb7b782 Bye bye debugging output. 2003-06-24 12:23:37 +00:00
dchandler
f547734043 Added Than's converter GUI code; adapted it to work with Jskad's
converters.

TMW->Unicode now uses Ximalaya by default.
2003-06-24 03:02:29 +00:00
dchandler
19d7cabfe6 Forget the final=faster myth. 2003-06-24 03:01:13 +00:00
dchandler
917864574c Fixed a logic bug in mapTMWtoTM and mapTMtoTMW.
You can now specify which Unicode font to use via 'java
-Dthdl.tmw.to.unicode.font=Ximalaya ...'.
2003-06-23 01:58:11 +00:00
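
Reading such a `-D` setting from Java is typically a one-line system property lookup. The sketch below assumes that approach. Only the property name `thdl.tmw.to.unicode.font` comes from the commit message; the class, the method, and the caller-supplied fallback are hypothetical.

```java
/** Hypothetical helper; not the project's actual preference code. */
public class UnicodeFontProperty {
    /** Property name quoted in the commit message above. */
    public static final String KEY = "thdl.tmw.to.unicode.font";

    /**
     * Returns the font named on the command line, or the given fallback
     * if the property was not set. The fallback is the caller's choice;
     * this sketch does not assume what the converter's default is.
     */
    public static String fontName(String fallback) {
        return System.getProperty(KEY, fallback);
    }

    public static void main(String[] args) {
        System.out.println(fontName("SomeFallbackFont"));
    }
}
```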
dchandler
b6d8fd89f9 When errors in (all but TMW->Wylie and Wylie->TMW) conversion occur,
the troublesome glyphs are now put at the beginning of the document
AFTER AN ACHEN.  This makes a glyph like \tmw7095 visible atop the
achen.

Major fix to the handling of paragraphs in conversion; we were (for
whatever reason) dropping paragraphs before.
2003-06-23 01:24:02 +00:00
dchandler
1f4343bed0 TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2)
orders of magnitude faster.
2003-06-22 22:10:58 +00:00
dchandler
afe73c2228 The pseudo-file '-', referring to standard input, is now accepted as a
command-line argument.
2003-06-22 21:05:16 +00:00
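
The usual way to support a `-` pseudo-file is to special-case the argument before opening anything. This is a minimal sketch of that convention, not the converter's actual code. The class and method names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the "-" convention; not the converter's code. */
public class StdinArgSketch {
    /** Returns standard input for "-", otherwise opens the named file. */
    public static InputStream open(String arg) {
        if ("-".equals(arg)) {
            return System.in;
        }
        try {
            return new FileInputStream(arg);
        } catch (FileNotFoundException e) {
            // Rethrown unchecked to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this in place a caller can treat piped input and named files uniformly by reading from whatever `open` returns.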
dchandler
900f7492b0 'ant clean check' was failing because I hadn't updated the
--find-some-non-tmw and --find-all-non-tmw baselines.

Code cleanup.
2003-06-22 16:11:58 +00:00
dchandler
66287f3cc9 Small TMW->Wylie performance improvements. TMW->Wylie is *much*
faster than TMW->Unicode etc.; this is because many fewer replacements
are made (i.e., more text is replaced each time a replacement is
performed).

I must find a way to still preserve formatting but do many fewer
replacements in TMW->{Unicode,TM} and TM->TMW.
2003-06-22 04:32:59 +00:00
dchandler
6540b260bd Fixes a (small, I think) TMW->Unicode performance glitch. I was
inserting 5 characters at a time and then skipping ahead just one
position.  I don't think this affected correctness.

I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however.  Perhaps not -- but we run through
the first 1000 TMW glyphs in 6 seconds, while the 20th thousand takes at
least 60 seconds.  Is TMW->Wylie faster than TMW->Unicode?  If so,
why?

Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right?  My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document.  I'm not!
2003-06-22 04:08:33 +00:00
dchandler
dfe64a1927 Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:14:18 +00:00
dchandler
80101666c7 Included a fix from WylieWord's tibwn.ini. Removed some needless trailing
tildes.
2003-06-21 02:35:21 +00:00
dchandler
9a41f512d9 It used to be the case that you could select 'Close', and then when asked
"do you want to save?" you could press yes and then press cancel and
Jskad would still exit.  That's no longer the case.

Added File->Exit to Jskad.
2003-06-21 02:07:51 +00:00
dchandler
45b87b0fb4 In Jskad, you can now clear the preferences and return to default values. 2003-06-21 01:26:17 +00:00
eg3p
fbb6245fdb Added cut() and copy() methods to override JTextPane's methods of same name. 2003-06-20 15:27:20 +00:00
dchandler
5067683121 Edward corrected me; he had intended to have M map to 7.91, not 7.90. 2003-06-17 01:46:19 +00:00
dchandler
6712b47e13 Added an option to control the Unicode font for TMW->Unicode
conversions.
2003-06-15 20:28:56 +00:00
dchandler
ced830a7d3 Renamed TMW_RTF_TO_THDL_WYLIE TibetanConverter. 2003-06-15 19:19:23 +00:00
dchandler
34a7b5da9b This converter now performs TMW->Unicode conversions. 2003-06-15 18:38:42 +00:00
dchandler
da70434e52 Jskad now allows for TMW->Unicode conversion. 2003-06-15 16:27:36 +00:00
dchandler
af5b95b08d A TMW->Unicode table is here. Note these issues, however:
Is the EWTS '_' to be represented as U+0020, or is it a wider space?

Does TMW9.42, Dza, map to U+0F5F,U+0F39?

Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB?  (Likewise with r+w, TMW6.61, TMW6.62, etc.)

Is U+0F7E a bindu?  What Unicode does TMW7.96 map to, for example?  What does TMW7.91 map to?

Should TMW8.97 and TMW8.98 map to swastikas elsewhere in Unicode?  If so, which codepoints?  Likewise with TMW9.60, a Chinese character.

Does TMW7.68 map to U+0F39?

Does TMW7.74, the ITHI secret sign, have a Unicode mapping?  f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it?

What Unicode does TMW9.61 map to?  Is it for sequences like f40,f7c,f60,f72?  Or is it for f60,f72,f7c?
2003-06-15 03:25:45 +00:00
dchandler
b387c512e9 Fixed two bugs. 2003-06-15 03:08:57 +00:00
dchandler
189fef9aec Made Jskad smart enough to handle a few more EWTS characters; some
it can only convert to Wylie, others are live key sequences.  This will make
converting the shechen documents go more smoothly.
2003-06-09 13:35:43 +00:00
dchandler
09a55110b7 Handles more TibetanMachine oddballs. 2003-06-09 02:01:13 +00:00
dchandler
b9219640e5 Handles more TibetanMachine oddballs. 2003-06-09 01:53:01 +00:00
dchandler
e97e1c8464 Handles more TibetanMachine oddballs. 2003-06-09 01:20:32 +00:00
dchandler
651a599188 Fixed usage info. 2003-06-08 23:23:12 +00:00
dchandler
70b31558fa Tried to fix a crashing bug that happened when you converted TM->TMW
and then tried to convert that TMW to Wylie.  I swear it's Java's
problem (see the ugly stack trace in the code and decide for
yourself), and I tried replacing rather than
inserting-and-then-removing, but it didn't work.  I've left these
things as options.
2003-06-08 23:12:52 +00:00
dchandler
212414edef TMW_RTF_TO_THDL_WYLIE now converts TM->TMW. 2003-06-08 22:43:27 +00:00
dchandler
32831b698f If bad (oddball) TM glyphs appear, then converting to TMW causes, by
default, all oddballs to appear once in the resulting document.
This'll help me find the correct glyphs for the oddballs, and it'll
prevent the average user from converting a document with oddballs.
2003-06-08 22:37:38 +00:00
dchandler
d45f5ab8c8 Improved performance (I suppose). 2003-06-03 23:49:34 +00:00
dchandler
7d768c9e06 Fixed a crashing bug that happened upon converting wylie to tibetan. 2003-06-03 23:45:15 +00:00
dchandler
0f724989b5 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.

TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
2003-06-01 23:05:32 +00:00
dchandler
54ca37c824 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.
2003-06-01 19:14:08 +00:00
dchandler
e2caf99085 Some code cleanup.
tibwn.ini must now have, in the Unicode column, either nothing, or
0FXX(,0FXX)*.  E.g., 0F04,0F05 is valid.  Debugging code ensures this is
the case.
2003-06-01 18:09:49 +00:00
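
The rule stated in the commit above, "either nothing, or 0FXX(,0FXX)*", maps directly onto a regular expression. The following is a minimal sketch of such a check, assuming each X is one hexadecimal digit in either case. The class and method names are hypothetical and this is not the project's actual debugging code.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the Unicode column of tibwn.ini. */
public class UnicodeColumnCheck {
    // Assumption: each "X" in 0FXX is one hex digit, upper or lower case.
    private static final Pattern VALID =
        Pattern.compile("(0F[0-9A-Fa-f]{2}(,0F[0-9A-Fa-f]{2})*)?");

    /** True if the column is empty or a comma-separated list of 0FXX values. */
    public static boolean isValid(String column) {
        return VALID.matcher(column).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0F04,0F05")); // true
        System.out.println(isValid(""));          // true
        System.out.println(isValid("0F04,"));     // false
    }
}
```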
dchandler
1f6bb07d53 Fixes bogus Unicode mappings mentioned in
http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515.
2003-06-01 04:02:04 +00:00
dchandler
7a8264d87c Fixed typo. 2003-06-01 03:30:49 +00:00
dchandler
0235263ddf TM->TMW and TMW->TM conversion in RTF is now supported. I've
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.

TMW_RTF_TO_THDL_WYLIE (now misnamed) supports TMW->TM
conversion (but not TM->TMW).  There is an automated test case for a
TMW->TM conversion.

I have full confidence in this conversion.  Even the smallest glitch in the core
functionality (not formatting) would surprise me.

Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs.  This
is because Java's RTF support is not deterministic, I'm guessing, and is not
a real failure.  I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
2003-05-31 23:21:29 +00:00
dchandler
bfacd6c998 Accurate TM->TMW and TMW->TM mappings are now available. I've
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).

To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan).  For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present.  I've normalized the mapping for these glyphs, as it is arbitrary
because the same two glyphs just appear fifteen times each.
2003-05-31 20:13:15 +00:00
dchandler
a4bc23a9ab Made performance improvements, doc improvements, and code cleanup to
DuffCode.
2003-05-31 17:02:06 +00:00
dchandler
08d2ea3e2d Jeff C. H. Wu found a bug whereby typing 'cuig' just after starting Jskad fails
(by producing 'cug') although typing 'kcuig' succeeds.

This is now fixed, and test cases now exist to ensure that the problem
doesn't reappear.
2003-05-31 12:58:36 +00:00
dchandler
bc9a8f4754 Jeff C. H. Wu found a bug whereby typing 'cuig' just after starting Jskad fails
(by producing 'cug') although typing 'kcuig' succeeds.

This is now fixed.
2003-05-31 12:49:44 +00:00