Commit graph

315 commits

Author SHA1 Message Date
dchandler
245aac4911 I'm now stricter about accepting alphabetic characters. F, Q, X, a,
b, c, d, e, ... do not belong in ACIP, so the scanner rejects them.
This should make it even easier to distinguish automatically between
Tibetan and English texts.
2003-08-17 02:38:58 +00:00
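
The stricter letter check described in commit 245aac4911 might look roughly like the following. This is a minimal sketch, not the project's scanner: the class and method names are hypothetical, and the rejected set holds only the letters the message names explicitly, because the message's "..." elides the rest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's actual ACIP scanner.
final class AcipLetterCheckSketch {
    // Only the letters the commit message names explicitly. The message's
    // "..." elides the remainder, so this set is deliberately incomplete.
    private static final String NAMED_REJECTS = "FQXabcde";

    /** Returns the positions of the characters that this sketch rejects. */
    static List<Integer> rejectedPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (NAMED_REJECTS.indexOf(text.charAt(i)) >= 0) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

English prose contains many lowercase letters, so a check of this kind would flag such a text almost immediately, which fits the stated goal of telling Tibetan and English texts apart.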
dchandler
39451d8879 Fixed a couple of small bugs.
Only 250 errors are reported now; this is important if you try to
convert an English document.
2003-08-17 02:12:49 +00:00
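
The 250-error cap from commit 39451d8879 could be implemented along these lines. The figure 250 comes from the message; the class name, the method names, and the choice to count suppressed errors are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an error log that stops reporting after a cap.
final class CappedErrorLogSketch {
    static final int MAX_REPORTED = 250; // figure taken from the commit message

    private final List<String> reported = new ArrayList<>();
    private int total = 0;

    /** Records an error, keeping its message only while under the cap. */
    void add(String message) {
        total++;
        if (reported.size() < MAX_REPORTED) {
            reported.add(message);
        }
    }

    /** The messages that would be shown to the user. */
    List<String> reported() {
        return Collections.unmodifiableList(reported);
    }

    /** How many errors were counted but not kept. */
    int suppressed() {
        return total - reported.size();
    }
}
```

Capping the list keeps the output readable when the input is not ACIP at all, as with an English document that would otherwise produce an error for nearly every word.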
dchandler
4581a2d8ab Improved the ACIP scanner (the part of the converter that says, "This
is a correction, that's a comment, this is Tibetan, that's Latin
(English), that's Tibetan inter-tsheg-bar punctuation," etc.).  It now
accepts more real-world ACIP files, i.e. it handles illegal
constructs.  The error checking is more user-friendly.  There are now
tests.

Added some tsheg bars that Peter E. Hauer of Linguasoft sent me to the
tests.  Many thanks, Peter.  I still need to implement rules that say,
"This is not Tibetan, it must be Sanskrit, because that letter doesn't
take a MA prefix."
2003-08-17 01:45:55 +00:00
dchandler
0b91ed0beb I've improved the ACIP tsheg bar scanner to handle a lot of illegal
constructions that occur in practice.
2003-08-16 16:13:53 +00:00
amontano
2a57439516 Updated the info displayed on the about window. 2003-08-14 14:16:49 +00:00
amontano
da384c6c2f Now when loading, takes the default font options from the DuffPane. 2003-08-14 14:16:23 +00:00
dchandler
2b59d9838d I now have a function that takes as input a String of ACIP and breaks
up that String into tsheg bars, punctuation, etc., while finding
errors.  I've tested it some, but I'm not yet committing the tests.

Next step: a converter that takes an ACIP file as input and outputs
TMW+Latin.
2003-08-14 05:10:47 +00:00
dchandler
57f506384f The ACIP->Tibetan converter now has perfect low-level functionality,
and it has the capability to produce error messages and warnings that
make sense to the user.  One can now get the correct parse, if one
exists, for an ACIP tsheg bar.

One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if one really wanted to.

Still to do:

o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
  the affected documents (most of the Kangyur has unparseable
  creatures).
2003-08-12 04:13:11 +00:00
dchandler
87266646fb Removed misinformation. 2003-08-10 19:33:01 +00:00
dchandler
e21d3774a9 Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter.  It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00
dchandler
39e0435b6b Refactored this code so that Wylie->Tibetan and ACIP->Tibetan
conversions can make use of it.  Hooray for reuse.
2003-08-10 19:02:56 +00:00
dchandler
bcf1c12b6a We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:46:01 +00:00
dchandler
9093fd3c05 We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary.  These are all illegal, so it shouldn't affect many
existing conversions.  But if there were typos, it could.
2003-08-10 18:38:20 +00:00
dchandler
251d8feae5 brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report
http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515.
2003-08-09 17:48:40 +00:00
dchandler
7dffc47cb7 'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came
across this one, so we've added it to the list of ambiguous three-consonant
combos.
2003-08-09 17:05:43 +00:00
amontano
52cdc17794 Added support for multiple keyboards and the ability to set preferences
for the size of the Tibetan font and the type and size of the Roman font.
2003-08-09 08:00:58 +00:00
amontano
8e4b508de8 Made a new class for the preference window so that other software
(i.e. the translation tool) can re-use that same code to set up the
attributes of the Tibetan and Roman fonts.
2003-08-09 07:57:21 +00:00
amontano
ef0df405d9 Redesigned the interface of the handheld version. 2003-08-03 06:29:08 +00:00
amontano
2b5a5fe67a Got rid of redundant code 2003-08-03 06:28:22 +00:00
amontano
cce779bf88 Added a wizard window so that the command line can be avoided as much as
possible.  After launching the application, one can use the wizard to connect to the
available on-line dicts, open a local dict, or generate a dict database.
2003-08-03 06:27:30 +00:00
dchandler
4caeafa1b1 You shouldn't have one of these without the other, now that there are two.
This way neither TM nor TMW fonts will be loaded.
2003-07-26 00:55:32 +00:00
dchandler
2bb499e5a7 This was dying with a NullPointerException when you started it up using
'ant tt-run' with no dictionary.  Now it starts up and shows you a nice
error message, "Dictionary could not be loaded!", instead.
2003-07-26 00:53:59 +00:00
dchandler
e198519c5f Jskad now supports EWTS ~, i.e. TMW8.91. 2003-07-25 02:35:31 +00:00
amontano
97f5fe91b3 When invalid Wylie is encountered, it now raises an exception instead of displaying a message. 2003-07-25 01:43:18 +00:00
amontano
7cdbf33333 Changed it to support 30 dictionaries (instead of just 15). 2003-07-25 01:42:17 +00:00
amontano
7b04d7bca5 Changed the "about" info. 2003-07-25 01:41:30 +00:00
dchandler
a7f0c35738 Added a test for ts.ha vs. tsha ambiguity; there is no ambiguity. 2003-07-18 03:51:29 +00:00
dchandler
dc454b8c0c More test cases related to the following:
The Tibetan d.za was being converted into the Wylie dza incorrectly.  This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 02:31:02 +00:00
dchandler
f8c959bfb0 The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 00:30:27 +00:00
dchandler
1c29566aee I'm now using the Unix diff built in to Apache Jakarta Commons JRCS
(which I found on suigeneris.org, not apache.org) in order to bulletproof the
Tibetan Converter tests.  They used to fail due to nondeterminism in the
Java RTF writer; they should no longer fail.

I've also changed it so that the Tibetan Converter tests run in headless
mode, which means that they'll run on the nightly builds server.
2003-07-14 12:26:26 +00:00
dchandler
f900154e7a Tests disambiguation in TMW->Wylie conversion. 2003-07-14 12:21:02 +00:00
dchandler
0622ac5062 Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>,
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists.  This is cleaner because WylieWord and Jskad had different
values for these lists.
2003-07-14 12:19:46 +00:00
dchandler
fb85f6e8ce Fix comment. 2003-07-14 12:17:04 +00:00
dchandler
79b3b97326 Remove warning message from menu item. 2003-07-13 23:19:11 +00:00
dchandler
c986684beb Updated help to talk about new features. 2003-07-13 22:51:35 +00:00
dchandler
f695b1a6c1 Updated baselines because conversions have improved since the last
update.
2003-07-13 19:14:41 +00:00
dchandler
d10f97fc06 Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 19:14:15 +00:00
dchandler
96afae795c Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.

Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 18:46:29 +00:00
dchandler
802e0cb588 If this method uses the Wylie representation, you get an infinite recursion
when you do a TMW->Wylie conversion for a document with glyphs that
have no known Wylie.
2003-07-13 17:40:02 +00:00
dchandler
a86a0f235b I was missing a break; statement; this caused an Error to be thrown during
some TMW->Wylie conversions.  No conversions were erroneous, though.
2003-07-13 17:38:00 +00:00
dchandler
6677d1e245 Code cleanup. 2003-07-13 16:53:03 +00:00
dchandler
3b6eaa792e Fixed javadocs. 2003-07-11 13:33:30 +00:00
dchandler
85176cd9f3 Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files.  So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
2003-07-11 13:30:22 +00:00
dchandler
d726bc0258 A couple of changes to TMW->Unicode thanks to Than's reply to my
questions.
2003-07-09 01:44:15 +00:00
dchandler
02558a1d78 Jskad supports <7, >8, etc. again; it no longer supports the punctuation
'<' and '>'.  The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.

Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants.  This facilitates the
new-and-improved Tibetan->Wylie conversion.

Tibetan->Wylie is now done by forming legal tsheg-bars.  A legal tsheg-bar
is converted into perfect THDL Wylie.  See the code comments to learn what
it considers a legal tsheg-bar; for example, it includes bskyUMbsH minus the
trailing punctuation (H).

Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').

I've made it so that the illegal sequences are as beautiful as possible.  You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
2003-07-08 14:30:17 +00:00
dchandler
c04a3f189b Rearranged the topics. 2003-07-08 12:50:27 +00:00
dchandler
23d18c925f Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong
vowels.
2003-07-08 02:59:17 +00:00
dchandler
24ac6fd06c The Trie of possible inputs fixed this bug. 2003-07-06 16:31:13 +00:00
dchandler
d88141512b Small changes w.r.t. clearing preferences. Some code cleanup. 2003-07-06 16:24:29 +00:00
dchandler
086f4bb6ec Renamed the Info menu to Help.
Now using CalHTMLPane to surf the offline and the online help.
2003-07-05 22:25:21 +00:00
dchandler
8c4ab30a52 Rearranged the Tools menu; made the converter smart about "find some..."
and "find all..." modes.
2003-07-05 21:02:46 +00:00
dchandler
72d2eee503 Code cleanup. 2003-07-05 19:26:58 +00:00
dchandler
a463b686b3 Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts
by default, not just TMW.  Thus users need not install these fonts on their
systems.
2003-07-05 18:00:29 +00:00
dchandler
9effee0564 If you opened a file from the recently opened files list and very quickly
mouse-clicked on the new Jskad window, you could cause an infinite
regression of requestFocus() operations because the menu would try
to get focus back.  I grab focus from the menu now.
2003-07-05 02:30:00 +00:00
dchandler
51679c158b Final fixes completed; recently opened files can now be selected from
Jskad's file menu.
2003-07-05 02:15:33 +00:00
dchandler
4410b52c07 There's still a small bug in this, but here's the real stuff:
Recently opened files can now be selected from Jskad's file menu.

A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 03:29:25 +00:00
dchandler
d863446d25 I think *this* compiles... 2003-07-04 02:32:40 +00:00
dchandler
407020108f I didn't mean to commit the previous revision; I'm still tweaking it. 2003-07-04 02:32:03 +00:00
dchandler
9f0b1c3250 Recently opened files can now be selected from Jskad's file menu.
A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 02:31:23 +00:00
dchandler
7500b4e06b Jskad won't allow you to exit by closing the last window anymore. Instead,
you get a dialog box saying to use File/Exit.
2003-07-04 00:21:07 +00:00
dchandler
6c286573ba Fixed Javadocs. 2003-07-04 00:12:59 +00:00
dchandler
0a1bc0d30b getWylie now takes a parameter for error detection; I'm not detecting errors
here though.

Fixed a typo in a property name.
2003-07-01 23:20:08 +00:00
dchandler
0d1999d055 getWylie now takes a parameter for error detection; I'm not detecting errors
here though.
2003-07-01 22:52:18 +00:00
dchandler
a48ec641d5 Better error messages in TMW->Wylie conversions. The user knows what's
up.
2003-07-01 03:43:33 +00:00
dchandler
3113a4b8de Some of the \tmw80.. mappings were out of date.
3+1/2 is not EWTS; took these out.
2003-07-01 03:42:30 +00:00
dchandler
e7e7c2bf15 The command-line tool runs in headless mode by default, so it will
work on, e.g., a Linux console.  The JUnit tests will too, though 'ant
check' still fails because we don't sneak the -Djava.awt.headless=true
into the process early enough.
2003-07-01 02:50:09 +00:00
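
The timing problem that commit e7e7c2bf15 mentions, getting java.awt.headless set early enough, can be illustrated with a small sketch. The property name is the standard Java one that appears in the message; the class and method names are hypothetical, and nothing here is taken from the project's actual launcher or build file.

```java
// Hypothetical sketch; the project may set the property differently.
final class HeadlessSketch {
    /** Sets java.awt.headless unless the caller already chose a value. */
    static String ensureHeadless() {
        if (System.getProperty("java.awt.headless") == null) {
            System.setProperty("java.awt.headless", "true");
        }
        return System.getProperty("java.awt.headless");
    }

    public static void main(String[] args) {
        // This call has to happen before any AWT or Swing class is
        // initialised, because the toolkit reads the property only once.
        ensureHeadless();
        System.out.println("headless: " + java.awt.GraphicsEnvironment.isHeadless());
    }
}
```

Passing -Djava.awt.headless=true on the java command line avoids the ordering question entirely, which is presumably why the message talks about sneaking the flag into the process.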
dchandler
6151a7bc94 TMW->Wylie now occurs in the TibetanDocument, not in DuffPane,
which means that the command-line tool can finally function with a headless
graphics device.  Hopefully it will speed things up, too.  It also means that
entering Roman text into the TMW->Unicode conversion and TMW->TM
conversion will be easy.
2003-07-01 01:21:57 +00:00
dchandler
61d29fc355 The TMW->Wylie mapping was busted w.r.t. tshegs.
Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.
2003-07-01 00:17:18 +00:00
dchandler
229536884f I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW or TMW->TM conversions can be trusted.
2003-06-30 02:24:11 +00:00
dchandler
dc03083433 I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW conversions can be trusted.
2003-06-30 02:22:09 +00:00
dchandler
58644a6ef9 Better error handling. 2003-06-30 02:20:52 +00:00
dchandler
b16fb8a85c This is correct; the Tibetan! 5.1 documentation is not. This affects
TM->TMW conversions.

See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515
for a full list of Tibetan! 5.1 documentation errors.
2003-06-29 22:11:00 +00:00
dchandler
aedef4b44d An error now appears if you try to convert from format A to format B but no
glyphs in format A appear.  In this case, it is likely that you meant to convert
a different file or do a different conversion.
2003-06-29 21:31:48 +00:00
dchandler
ee14b7b97f Jskad now has the ability to open its buffer with an external viewer, e.g.
Microsoft Word.

Better OOM error handling in the GUI converter; untested, though.
2003-06-29 20:49:30 +00:00
dchandler
646e23b4a4 Tweaked the converter GUI so that you can open the old and the new files
with the external viewer.
2003-06-29 16:45:15 +00:00
dchandler
3f76c3692d Fixed Javadoc warnings. 2003-06-29 15:37:35 +00:00
dchandler
b841a7f14b The converter GUI can now be run standalone or from Jskad's Tools menu.
The converter GUI gives nicer error messages in at least one case.
2003-06-29 04:18:36 +00:00
dchandler
7938648ca8 TM->TMW conversion has no known bugs. Oddballs have been
comprehensively handled.
2003-06-29 03:03:07 +00:00
dchandler
4e279defb4 Fixed a couple of array bounds checks.
Added support for two more oddballs.

Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine.  The correct solution is to transform the RTF before Java's
busted RTF readers ever see it; for example, \'97 becomes \u151.
2003-06-28 16:33:58 +00:00
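
The RTF pre-processing idea from commit 4e279defb4 can be sketched as a plain string rewrite. This is an illustration under assumptions, not the project's code: the class name is hypothetical, the 0x80 threshold is a guess at which escapes matter, and the sketch ignores RTF details such as escaped backslashes and the delimiter or fallback character that normally follows a Unicode control word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of rewriting RTF hex escapes before parsing.
final class RtfEscapeSketch {
    // Matches a backslash, an apostrophe, and two hex digits.
    private static final Pattern HEX_ESCAPE =
            Pattern.compile("\\\\'([0-9a-fA-F]{2})");

    /** Rewrites high hex escapes as backslash-u plus the decimal value. */
    static String rewriteHighHexEscapes(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1), 16);
            // Assumption: only values of 0x80 and above need rewriting.
            String replacement = value >= 0x80 ? "\\u" + value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Hex 97 is decimal 151, so this rewrite reproduces the one example the message gives.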
dchandler
2a359c45ef Bad conversions were not leaving the unconvertible characters at the
beginning of the document as they should and as they are documented to.

They now do, and they bracket the bad characters with the TM or TMW for
U+0F3C on the left and the TM or TMW for U+0F3D on the right.

Some cleanup.
2003-06-28 16:20:19 +00:00
dchandler
c39d8d6326 My earlier code cleanup introduced this bug; TMW->TM conversion was
busted.
2003-06-26 22:48:51 +00:00
dchandler
25510542b2 Now with a nicer error message in one case. 2003-06-26 22:48:05 +00:00
dchandler
c34259b105 Code cleanup. 2003-06-25 01:04:24 +00:00
dchandler
9e6c3009ac Added an About button. Code cleanup. Changed the Cancel button to the
Close button.
2003-06-25 00:49:11 +00:00
dchandler
33beb7b782 Bye bye debugging output. 2003-06-24 12:23:37 +00:00
dchandler
f547734043 Added Than's converter GUI code; adapted it to work with Jskad's
converters.

TMW->Unicode now uses Ximalaya by default.
2003-06-24 03:02:29 +00:00
dchandler
917864574c Fixed a logic bug in mapTMWtoTM and mapTMtoTMW.
You can now specify which Unicode font to use via 'java
-Dthdl.tmw.to.unicode.font=Ximalaya ...'.
2003-06-23 01:58:11 +00:00
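
Reading the option introduced in commit 917864574c might look like this. The property name comes from the message; the class name is hypothetical, and the fallback value is an assumption, since the message does not say what happens when the property is absent.

```java
// Hypothetical sketch; not the project's actual option handling.
final class UnicodeFontOptionSketch {
    static final String PROPERTY = "thdl.tmw.to.unicode.font"; // from the commit message
    static final String ASSUMED_DEFAULT = "Ximalaya";          // assumption

    /** Returns the font named by -Dthdl.tmw.to.unicode.font, if any. */
    static String unicodeFontName() {
        return System.getProperty(PROPERTY, ASSUMED_DEFAULT);
    }
}
```

A system property keeps the option out of the argument list, so it works the same way whether the converter is started from the command line or from a build script.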
dchandler
b6d8fd89f9 When errors occur in any conversion other than TMW->Wylie and
Wylie->TMW, the troublesome glyphs are now put at the beginning of the
document AFTER AN ACHEN.  This makes a glyph like \tmw7095 visible atop
the achen.

Major fix to the handling of paragraphs in conversion; we were (for
whatever reason) dropping paragraphs before.
2003-06-23 01:24:02 +00:00
dchandler
1f4343bed0 TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2)
orders of magnitude faster.
2003-06-22 22:10:58 +00:00
dchandler
afe73c2228 The pseudo-file '-', referring to standard input, is now accepted as a
command-line argument.
2003-06-22 21:05:16 +00:00
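
The '-' convention from commit afe73c2228 is a common command-line idiom and can be sketched as follows. The class and method names are hypothetical, and wrapping the file error in an unchecked exception is a choice made for brevity here, not something the message describes.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of treating "-" as standard input.
final class StdinArgumentSketch {
    /** Opens the named file, or standard input when the name is "-". */
    static InputStream open(String fileArgument) {
        if ("-".equals(fileArgument)) {
            return System.in;
        }
        try {
            return new FileInputStream(fileArgument);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this in place the converter can sit in the middle of a shell pipeline, reading whatever the previous command writes.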
dchandler
900f7492b0 'ant clean check' was failing because I hadn't updated the
--find-some-non-tmw and --find-all-non-tmw baselines.

Code cleanup.
2003-06-22 16:11:58 +00:00
dchandler
66287f3cc9 Small TMW->Wylie performance improvements. TMW->Wylie is *much*
faster than TMW->Unicode etc.; this is because many fewer replacements
are made (i.e., more text is replaced each time a replacement is
performed).

I must find a way to still preserve formatting but do many fewer
replacements in TMW->{Unicode,TM} and TM->TMW.
2003-06-22 04:32:59 +00:00
dchandler
6540b260bd Fixes a (small, I think) TMW->Unicode performance glitch. I was
inserting 5 characters at a time and then skipping ahead just one
position.  I don't think this affected correctness.

I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however.  Perhaps not, but we run through
the first 1000 TMW glyphs in 6 seconds, while the 20th thousand takes
at least 60 seconds.  Is TMW->Wylie faster than TMW->Unicode?  If so,
why?

Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right?  My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document.  I'm not!
2003-06-22 04:08:33 +00:00
dchandler
dfe64a1927 Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:14:18 +00:00
dchandler
80101666c7 Included a fix from WylieWord's tibwn.ini. Removed some needless trailing
tildes.
2003-06-21 02:35:21 +00:00
dchandler
9a41f512d9 It used to be the case that you could select 'Close', and then when asked
"do you want to save?" you could press yes and then press cancel and
Jskad would still exit.  That's no longer the case.

Added File->Exit to Jskad.
2003-06-21 02:07:51 +00:00
dchandler
45b87b0fb4 In Jskad, you can now clear the preferences and return to default values. 2003-06-21 01:26:17 +00:00
eg3p
fbb6245fdb Added cut() and copy() methods to override JTextPane's methods of same name. 2003-06-20 15:27:20 +00:00
dchandler
5067683121 Edward corrected me; he had intended to have M map to 7.91, not 7.90. 2003-06-17 01:46:19 +00:00
dchandler
ced830a7d3 Renamed TMW_RTF_TO_THDL_WYLIE to TibetanConverter. 2003-06-15 19:19:23 +00:00