Commit graph

60 commits

Author SHA1 Message Date
dchandler
316f59107b A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation. 2003-09-02 06:39:33 +00:00
dchandler
896344f2d1 David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW
mappings, so I've put them back, but with the EWTS non-correspondences
\tmwXYYY.

Jskad no longer supports superscribed or subscribed numerals, because
EWTS does not.
2003-08-26 01:28:02 +00:00
dchandler
d5ad760230 TMW->Wylie conversion now takes advantage of prefix rules, the rules
that say "ya can take a ga prefix" etc.

The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline).  This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
2003-08-23 22:03:37 +00:00
dchandler
87266646fb Removed misinformation. 2003-08-10 19:33:01 +00:00
dchandler
39e0435b6b Refactored this code so that Wylie->Tibetan and ACIP->Tibetan
conversions can make use of it.  Hooray for reuse.
2003-08-10 19:02:56 +00:00
dchandler
9093fd3c05 We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:38:20 +00:00
dchandler
251d8feae5 brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report
http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.
2003-08-09 17:48:40 +00:00
dchandler
e198519c5f Jskad now supports EWTS ~, i.e. TMW8.91. 2003-07-25 02:35:31 +00:00
dchandler
f8c959bfb0 The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 00:30:27 +00:00
dchandler
0622ac5062 Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>,
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists.  This is cleaner because WylieWord and Jskad had different
values for these lists.
2003-07-14 12:19:46 +00:00
dchandler
96afae795c Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 18:46:29 +00:00
dchandler
6677d1e245 Code cleanup. 2003-07-13 16:53:03 +00:00
dchandler
85176cd9f3 Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files.  So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
2003-07-11 13:30:22 +00:00
dchandler
d726bc0258 A couple of changes to TMW->Unicode thanks to Than's reply to my
questions.
2003-07-09 01:44:15 +00:00
dchandler
02558a1d78 Jskad supports <7, >8, etc. again; it no longer supports the punctuation
'<' and '>'.  The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.

Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants.  This facilitates the
new-and-improved Tibetan->Wylie conversion.

Tibetan->Wylie is now done by forming legal tsheg-bars.  A legal tsheg bar
is converted into perfect THDL Wylie.  See code comments to learn what
it thinks is a legal tsheg-bar, but it inlcudes bskyUMbsH minus the trailing
punctuation (H), e.g.

Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').

I've made it so that the illegal sequences are as beautiful as possible.  You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
2003-07-08 14:30:17 +00:00
dchandler
a463b686b3 Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts
by default, not just TMW.  Thus users need not install these fonts on their
systems.
2003-07-05 18:00:29 +00:00
dchandler
a48ec641d5 Better error messages in TMW->Wylie conversions. The user knows what's
up.
2003-07-01 03:43:33 +00:00
dchandler
229536884f I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW or TMW->TM conversions can be trusted.
2003-06-30 02:24:11 +00:00
dchandler
3f76c3692d Fixed Javadoc warnings. 2003-06-29 15:37:35 +00:00
dchandler
7938648ca8 TM->TMW conversion has no known bugs. Oddballs have been
comprehensively handled.
2003-06-29 03:03:07 +00:00
dchandler
4e279defb4 Fixed a couple of array bounds checks.
Added support for two more oddballs.

Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine.  The correct solution is to transform the RTF before Java's
busted RTF readers ever see it.  \'97 becomes \u151, e.g.
2003-06-28 16:33:58 +00:00
dchandler
f547734043 Added Than's converter GUI code; adapted it to work with Jskad's
converters.

TMW->Unicode now uses Ximalaya by default.
2003-06-24 03:02:29 +00:00
dchandler
917864574c Fixed a logic bug in mapTMWtoTM and mapTMtoTMW.
You can now specify which Unicode font to use via 'java
-Dthdl.tmw.to.unicode.font=Ximalaya ...'.
2003-06-23 01:58:11 +00:00
dchandler
1f4343bed0 TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2)
orders of magnitude faster.
2003-06-22 22:10:58 +00:00
dchandler
da70434e52 Jskad now allows for TMW->Unicode conversion. 2003-06-15 16:27:36 +00:00
dchandler
af5b95b08d A TMW->Unicode table is here. Note these issues, however:
Is the EWTS '_' to be represented as U+0020, or is it a wider space?

Does TMW9.42, Dza, map to U+0F5F,U+0F39?

Does TMW6.60, r+y, map to U+0F62,U+0FBB or to U+0F6A,U+0FBB?  (Likewise with r+w, TMW6.61, TMW6.62, etc.)

Is U+0F7E a bindu?  What Unicode does TMW7.96 map to, for example?  What does TMW7.91 map to?

Should TMW8.97 and TMW8.98 map to swastiskas elsewhere in Unicode?  If so, which codepoints?  Likewise with TMW9.60, a Chinese character.

Does TMW7.68 map to U+0F39?

Does TMW7.74, the ITHI secret sign, have a Unicode mapping?  f68,fa0,f80,f72 comes close, but fa0 would be too large, wouldn't it?

What Unicode does TMW9.61 map to?  Is it for sequences like f40,f7c,f60,f72?  Or is it for f60,f72,f7c?
2003-06-15 03:25:45 +00:00
dchandler
189fef9aec Made Jskad smart enough to handle a few more EWTS characters; some
it can only convert to Wylie, others are live key sequences.  This will make
converting the shechen documents go more smoothly.
2003-06-09 13:35:43 +00:00
dchandler
09a55110b7 Handles more TibetanMachine oddballs. 2003-06-09 02:01:13 +00:00
dchandler
b9219640e5 Handles more TibetanMachine oddballs. 2003-06-09 01:53:01 +00:00
dchandler
e97e1c8464 Handles more TibetanMachine oddballs. 2003-06-09 01:20:32 +00:00
dchandler
0f724989b5 The Wylie 'M' used to map to TMW7.91, when it should map to TMW7.90.
I've fixed that.

I've also added a couple of Unicode mappings to give a flavor for how
multi-codepoint mappings will be represented.

TM->TMW conversion takes about 1 second per thousand glyphs on my
PIII-550.
2003-06-01 23:05:32 +00:00
dchandler
e2caf99085 Some code cleanup.
tibwn.ini must now have, in the Unicode column, either nothing, or
0FXX(,0FXX)*.  E.g., 0F04,0F05 is valid.  Debugging code ensures this is
the case.
2003-06-01 18:09:49 +00:00
dchandler
0235263ddf TM->TMW and TMW->TM conversion in RTF is now supported. I've
noticed that formatting is mostly OK but sometimes gets bungled slightly.
I tried everything I could think of, and now I'm passing the buck to Java's
RTF support.

TMW_RTF_TO_THDL_WYLIE (now misnamed) support TMW->TM
conversion (but not TM->TMW).  There is an automated test case for a
TMW->TM conversion.

I have full confidence in this conversion.  Even the smallest glitch in the core
functionality (not formatting) would surprise me.

Note that the JUnit test TMW_RTF_TO_THDL_WYLIETest sometimes fails
due to one- or two-line diffs between the actual and expected outputs.  This
is because Java's RTF support is not deterministic, I'm guessing, and is not
a real failure.  I'm too lazy to make a more elaborate sed/diff mechanism
that works on all platforms, and that would complicate the build anyway.
2003-05-31 23:21:29 +00:00
dchandler
bfacd6c998 Accurate TM->TMW and TMW->TM mappings are now available. I've
verified this extensively and have full confidence that these mappings
agree with Tony Duff's Tibetan! 5.1 documentation (except as described
below).

To get them, I had to disregard Tony Duff's tables for a few glyphs: the
characters with ordinal 32 and 45 (space and hyphen in Roman ASCII,
space and tsheg in Tibetan).  For these glyphs, we must have mappings
from TibetanMachineSkt4.32 to something, etc., and those mappings were
not present.  I've normalized the mapping for these glyphs, as it is arbitrary
because the same two glyphs just appear fifteen times each.
2003-05-31 20:13:15 +00:00
dchandler
a4bc23a9ab Made performance improvements, doc improvements, and code cleanup to
DuffCode.
2003-05-31 17:02:06 +00:00
dchandler
a144b125ca I've made Jskad adhere to the THDL Extended Wylie spec. Some
punctuation has changed {@, #, %, and $}.

Fixed some errors in tibwn.ini so that all the TM<->TMW mappings are
correct.
2003-05-26 13:11:51 +00:00
dchandler
ec7fec695f Added some automated JUnit tests for TMW_RTF_TO_THDL_WYLIE. 2003-05-18 17:17:52 +00:00
dchandler
e2a9720d9b I've added a command-line converter,
org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE.  It converts RTF files
consisting of TMW characters to the corresponding THDL Extended Wylie.

It supports --find-some-non-tmw mode, which allows you to ensure that no
unusual characters will spoil the conversion.  The converter has built-in
intelligence that allows it to handle Tahoma '{', '}', and '\\' characters
properly.

The converter works on mixed Roman/TMW also, but --find-some-non-tmw
and --find-all-non-tmw modes are not as useful.

Invoke org.thdl.tib.input.TMW_RTF_TO_THDL_WYLIE, which resides in
Jskad's jar, with no command-line options to see usage information.
2003-05-18 14:14:47 +00:00
dchandler
78dc46a979 Jskad keyboards are now configured via keyboards.ini, a file that has
comments that explain its function.  It's quite simple.  This is in
response to Jeff C. H. Wu's request.
2003-05-14 03:25:36 +00:00
dchandler
8958366a07 Bad RTF now causes an error message to appear in the transcription
instead of causing a fatal exception.  The error allows you to look up
the DuffCode that caused the trouble.
2003-05-14 01:37:49 +00:00
dchandler
efa8fc1f25 DuffPane now has the start of a unit test suite. Invoke it via 'ant
clean check'.  Right now there are tests to ensure that typing certain
sequences of keys in the Extended Wylie keyboard gives the expected
Extended Wylie back when "Tools/Convert Tibetan to Wylie" is invoked.

The syntactically illegal d.wa now converts to Tibetan and then back
to d.wa (not dwa, as it did); likewise with the illegal g.wa.  wa
doesn't take any prefixes, but I prefer clean end-to-end
behavior. (jeskd doesn't go end-to-end, though.)

Note that you cannot successfully run the DuffPane tests on a Linux
box unless your DISPLAY variable is set correctly.  Thus, my nightly
builds will fail with an Error (as opposed to a Failure).
2003-04-14 05:22:27 +00:00
dchandler
644c0d3801 Updated the HTML help file; removed some useless code. 2003-04-13 01:17:10 +00:00
dchandler
7dd67bbf6a Now turns Tibetan into pa'am, not pa'm. Works with or without vowels
in the part preceding the 'am or 'ang, overcoming the inconsistency
that I'd put here for a short time.
2003-04-08 04:56:40 +00:00
dchandler
33b3080068 Fixed a bunch of bugs; supports le'u'i'o, sgom pa'am, etc.
Better tests.  As part of that, I had to break TibetanMachineWeb into
TibetanMachineWeb+THDLWylieConstants, because I don't want the
class-wide initialization code from TibetanMachineWeb causing errors
in LegalTshegBarTest.
2003-03-31 00:33:50 +00:00
dchandler
1987f7d80a b-r-g, b-l-g-s, etc., when converted from Tibetan to Wylie, give
correct, unambiguous Wylie.
2003-03-30 21:49:55 +00:00
dchandler
f9670233ba Removed documentation FIXMEs from this code; did away for good with
some really iffy code that I think was behind the "Tibetan->Wylie
conversion fails when keyboard isn't Extended Wylie" bug.
2003-03-30 16:13:00 +00:00
dchandler
58f7371e66 I hope that Revamped the "Tools>Convert Tibetan To Wylie" feature that
converts TibetanMachineWeb glyphs to THDL Wylie.  Three-glyph and
four-glyph sequences with implicit "a" vowels are now handled
correctly, except for disambiguation w.r.t. things like b-la-g
vs. bla-g and d-wa vs. dwa.

pa'am, pa'ang etc. now work too.

Illegal Tibetan sequences now become very ugly, but "correct" Wylie.
Correct in the sense that converting it back to glyphs should get you
the glyphs you started with.

I also made a change to TibetanMachineWeb.java that I hope will clear
up problems with this feature when keyboards other than "Extended
Wylie" are selected.

Took nga out of the farRightSet [postsuffixes]; only da and sa belong
there, right?

I tried to get the system in a state such that I could run automated
tests of this stuff, but I ran into difficulties.  I have some manual
test cases; ask if you're interested.
2003-03-30 02:31:16 +00:00
dchandler
fcb75c55eb Small performance improvement involving String.intern(). Plus a
little bit of code cleanup.
2003-01-05 05:57:44 +00:00
dchandler
d200b03d66 Updated the build system so that you must do a cvs checkout of the
'Fonts' module inside the 'Jskad' module.  I.e., you must now have the
tree like so:

Jskad/
   source/
   dist/
   Fonts/
       TibetanMachineWeb/
   .
   .
   .

This is because the THDL tools now optionally (and by default) load
the TibetanMachineWeb fonts automatically.

Updated the build system so that the 'web-start-releases' and
'self-contained-dist' targets JAR up optional JARs to create
double-clickable, self-contained joy.  Even the TMW fonts are in the
JARs now.

Changed the strings describing two Jskad keyboards so that "keyboard"
is no longer in the description.  It's in the label next to the combo
box.

Jskad now saves preferences on exit or when the user selects a menu
item (that is there for debugging mainly) to ~/my_thdl_preferences.txt
on *nix or C:\my_thdl_preferences.txt on Win32.  I don't know the
correct Mac location.

There's a new paradigm for telling org.thdl.util.ThdlOptions that a
user preference has been changed.  If, for example, a combo box is
manipulated so that the ACIP keyboard is selected, then you must call
a certain method in ThdlOptions.
2002-11-18 16:12:25 +00:00
dchandler
de6ae79959 Fixes bug 624133, "Input freezes after impossible character". Try 'shsM' in
ACIP or 'ShSm' in Extended Wylie to see the new behavior.

We use a trie to store valid input sequences.  In the future, we could use
the same trie as a replacement for the more inefficient HashSets we use to
store characters, vowels, and punctuation.  For example, we'd use
'validInputSequences.put("K", new Pair("consonant", "k"))' when reading
in the ACIP keyboard's description of the first consonant of the Tibetan
alphabet in 'TibetanKeyboard.java'.

Note that the current trie implementation is only useful for 7- or 8-bit
transcription systems, and works best for tries with low average depth, which
describes a transcription system's trie very well.  If you used arbitrary
Unicode in your keyboard, you'd need a different trie implementation.

Improved the optional keyboard input mode status messages.
2002-11-02 18:44:24 +00:00