dchandler
e058d6252e
phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:48:08 +00:00
dchandler
55aaeef9d0
l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:23:50 +00:00
dchandler
06edf17b04
Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 21:17:18 +00:00
dchandler
f626a04d72
Tests t+r+n glyph.
2003-11-08 20:28:34 +00:00
dchandler
74d6bc61ab
The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 20:25:16 +00:00
dchandler
a0ae0bf70d
Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.
2003-11-08 17:50:10 +00:00
dchandler
0ac90d7c0f
Nathanial -> Nathaniel
2003-11-08 03:42:51 +00:00
dchandler
e3f1ed5914
Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.
2003-10-27 13:58:45 +00:00
dchandler
94a43d3f39
Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.
2003-10-26 18:56:48 +00:00
dchandler
5c36dd81d3
Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".
2003-10-26 18:25:25 +00:00
dchandler
e74547d743
GA-YOGS now parses like G-YOGS and GAYOGS do.
2003-10-26 18:06:38 +00:00
dchandler
61cf19932e
ACIP {B5} and {7'} were problematic; that's fixed.
2003-10-26 17:47:35 +00:00
dchandler
ad7b20e485
Added yet more metadata.
2003-10-26 16:05:30 +00:00
dchandler
1550fee41a
Removed garbage.
2003-10-26 16:05:07 +00:00
dchandler
fe33d67573
Added more metadata. There are 35 million+ tsheg bars here.
2003-10-26 15:35:08 +00:00
dchandler
050666d735
I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.
Freaky.
2003-10-26 06:56:12 +00:00
dchandler
31b3020d07
Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler
7ba1ad0735
Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.
2003-10-26 02:42:06 +00:00
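The post-processing mentioned in the entry above (turning the printed tsheg bars into frequency counts) could also be done in Java rather than perl or awk. The following is only a sketch under assumptions: it supposes the converter prints one tsheg bar per line, which the commit message does not state, and the class name `TshegBarCounter` is hypothetical and not part of Jskad.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical helper, not part of Jskad: counts tsheg bars read from standard input.
final class TshegBarCounter {

    // Counts each non-blank line; ASSUMES one tsheg bar per line of converter output.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String bar = line.trim();
            if (!bar.isEmpty()) {
                counts.merge(bar, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        Scanner in = new Scanner(System.in, "US-ASCII");
        while (in.hasNextLine()) {
            lines.add(in.nextLine());
        }
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(count(lines).entrySet());
        // Most frequent first; the stable sort keeps ties in alphabetical order.
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        for (Map.Entry<String, Integer> e : entries) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Piping the converter's standard output into this class would give a tab-separated "count, tsheg bar" listing that could then be pasted into a test file.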
dchandler
ef24c608bf
Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.
2003-10-26 02:17:19 +00:00
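A rough illustration of how a substitution list such as "KAsh=>K+sh" might be parsed and applied follows. This is a sketch, not Jskad's implementation: the class `ReplacementMapSketch` is hypothetical, and the comma used to separate multiple rules is an assumption, since the commit message shows only a single rule. The property name is the one quoted in the entry above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real parsing code in Jskad may differ.
final class ReplacementMapSketch {

    // Parses "from=>to" rules. ASSUMPTION: multiple rules are comma-separated.
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return map;
        }
        for (String rule : spec.split(",")) {
            int arrow = rule.indexOf("=>");
            if (arrow <= 0) {
                throw new IllegalArgumentException("Bad rule: " + rule);
            }
            map.put(rule.substring(0, arrow), rule.substring(arrow + 2));
        }
        return map;
    }

    // Applies every rule, in the order given, as a plain substring replacement.
    static String apply(Map<String, String> map, String acip) {
        String result = acip;
        for (Map.Entry<String, String> e : map.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("org.thdl.tib.text.ttt.ReplacementMap");
        System.out.println(apply(parse(spec), "KAsh"));
    }
}
```

Run with `-Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh"`, the sketch turns KAsh into K+sh; with the property unset it leaves the input unchanged.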
dchandler
6bda550157
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
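The decision described above (B-NA versus B+NA) can be sketched as a lookup: if the second consonant can take a BA prefix, B is a prefix; otherwise the two consonants must be stacked. The class name and the set of consonants below are illustrative assumptions, not Jskad's actual prefix-rule table; only the BNA result comes from the commit message, and BDE is the case mentioned in a later log entry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the prefix-or-stack decision for a leading B.
final class BPrefixSketch {

    // ILLUSTRATIVE subset of roots assumed to take a BA prefix; not Jskad's real list.
    private static final Set<String> TAKES_B_PREFIX =
            new HashSet<>(Arrays.asList("K", "G", "T", "D", "S"));

    // Returns B-<root><vowel> when B can be a prefix, otherwise B+<root><vowel>.
    static String disambiguate(String second, String vowel) {
        String joiner = TAKES_B_PREFIX.contains(second) ? "-" : "+";
        return "B" + joiner + second + vowel;
    }

    public static void main(String[] args) {
        System.out.println(disambiguate("N", "A"));
        System.out.println(disambiguate("D", "E"));
    }
}
```

Under this assumed table, N is absent from the set, so BNA is stacked, while D is present, so BDE is read as prefix plus root.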
dchandler
d99ae50d8a
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:24:28 +00:00
dchandler
1415fc43e3
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
2003-10-26 00:21:54 +00:00
dchandler
306cf2817c
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler
f106deb884
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:40:21 +00:00
dchandler
af013a6a39
I renamed this function a while ago.
2003-10-22 02:49:16 +00:00
dchandler
7d24ab393f
Code cleanup.
2003-10-21 03:44:02 +00:00
dchandler
c764eee8d0
Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.
2003-10-21 03:36:57 +00:00
dchandler
2f39921381
Added more test cases.
2003-10-21 02:14:45 +00:00
dchandler
2f81a801ef
Added three new kinds of warnings to ACIP->Tibetan conversions.
2003-10-21 02:00:49 +00:00
dchandler
a47af2c165
Bulletproofing -- code cleanup.
2003-10-21 00:31:10 +00:00
dchandler
188b9c322e
Warn about prefix rules only in Most and All modes.
2003-10-21 00:23:55 +00:00
dchandler
1224030898
Speedup.
2003-10-21 00:19:15 +00:00
dchandler
1d9b405bb8
Forgot to add this file earlier.
2003-10-20 13:49:54 +00:00
dchandler
5d9305c9d5
"Browse..." buttons are smart about file types now.
2003-10-19 23:17:25 +00:00
dchandler
3aa3859354
ACIP->Unicode crash fixed.
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler
5aab4acc93
I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place.
The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
dchandler
4b1395e0ba
Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler
5ce84d4d9a
Tiny code cleanup.
2003-10-19 04:43:34 +00:00
dchandler
0edebd55d7
We were dying in the "can ts+h take a ga prefix?" check for GTZHAN.
2003-10-19 03:47:33 +00:00
dchandler
47648186b4
Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.
2003-10-18 18:34:49 +00:00
dchandler
e5534f69ee
Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.
2003-10-18 18:29:46 +00:00
dchandler
557ed7ed44
DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are.
2003-10-18 17:49:29 +00:00
dchandler
e799438f86
CVS ignoring backup files.
2003-10-18 17:47:56 +00:00
dchandler
3b55ea509f
Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar."
I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.
2003-10-18 05:48:53 +00:00
dchandler
f28bee4c71
The appendage 'um is here too.
2003-10-18 05:10:49 +00:00
dchandler
8c99adeb63
TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following:
'e (used in foreign transliteration)
'ongs
'is
'os
'ur
'us
'ung
2003-10-18 03:04:47 +00:00
dchandler
5e18feb47d
ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
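Greedy stacking as described above can be sketched as follows: every leading consonant is joined into one stack with '+', and whatever follows (the vowel) is appended unchanged. This is a simplified sketch, not the converter's algorithm: the class name is hypothetical, the consonant list is an assumed rendering of the ACIP consonants, and prefixes, suffixes and appendages are ignored entirely.

```java
// Hypothetical sketch of greedy stacking for the leading consonant cluster only.
final class GreedyStackSketch {

    // ASSUMED consonant inventory; two-letter consonants come first so they match first.
    private static final String[] CONSONANTS = {
        "KH", "NG", "CH", "NY", "TH", "PH", "TZ", "TS", "DZ", "ZH", "SH",
        "K", "G", "C", "J", "T", "D", "N", "P", "B", "M",
        "W", "Z", "Y", "R", "L", "S", "H"
    };

    // Joins every leading consonant with '+' and appends the rest as-is.
    static String stack(String acip) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < acip.length()) {
            String c = consonantAt(acip, i);
            if (c == null) {
                break; // reached the vowel (or anything else that is not a consonant)
            }
            if (out.length() > 0) {
                out.append('+');
            }
            out.append(c);
            i += c.length();
        }
        return out.append(acip.substring(i)).toString();
    }

    // Returns the first listed consonant that starts at position i, or null.
    private static String consonantAt(String s, int i) {
        for (String c : CONSONANTS) {
            if (s.startsWith(c, i)) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(stack("TTTTTA"));
    }
}
```

The sketch makes no attempt to check whether the resulting stack exists in TM or TMW, which matches the spirit of the entry: the stack is formed regardless.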
dchandler
5f4fbfab7c
Bulletproofing and debugging support.
2003-10-16 04:13:14 +00:00
dchandler
129ebccd67
In TCC #1 keyboard, h>cj now works. I may have fixed this in a terrible way, breaking other things even. Hard to say because I don't really understand the code I changed. But DuffPaneTest passes.
If we ever clean up the keyboards, the changes made here to tcc_keyboard.ini should probably be undone.
2003-10-12 18:16:17 +00:00
dchandler
d7fdacfcdc
Open menu is now Open..., Save as is now Save as...
2003-10-12 18:12:19 +00:00
dchandler
8dbfff17e1
All .rtf and .Rtf and .RTF files are selectable now.
2003-10-12 18:11:50 +00:00
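One common way to make .rtf, .Rtf and .RTF all selectable is a case-insensitive extension check in a Swing file filter. The class below is a hypothetical sketch, not the filter Jskad actually uses; note that it also accepts other mixed-case spellings such as .rTf, which goes slightly beyond the three spellings named above.

```java
import java.io.File;
import java.util.Locale;
import javax.swing.filechooser.FileFilter;

// Hypothetical sketch of a case-insensitive RTF filter for a JFileChooser.
final class RtfFileFilter extends FileFilter {

    @Override
    public boolean accept(File f) {
        // Directories stay visible so the user can navigate into them.
        return f.isDirectory()
                || f.getName().toLowerCase(Locale.ROOT).endsWith(".rtf");
    }

    @Override
    public String getDescription() {
        return "Rich Text Format (*.rtf)";
    }
}
```

Such a filter would typically be installed with `JFileChooser.setFileFilter`.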
dchandler
35209ce7fd
I'm going to have to debug this, and the tab stops make the source unreadable. I don't like messing with whitespace, but it seems like I'll be the main maintainer for a while, and the people after me can use cvs diff -wb. So I'm untabifying.
2003-10-12 16:44:28 +00:00
dchandler
749b8d6727
Added toString for debugging.
2003-10-04 16:33:47 +00:00
dchandler
b983af8031
r-t, not rt. This was why converting 'brtul' from TMW to Wylie didn't work.
2003-10-04 16:33:23 +00:00
dchandler
6a11eddb1e
Warning level "None" wasn't working.
2003-10-04 16:12:48 +00:00
dchandler
b10098cc61
"Most" warnings now excludes "the last stack has no vowel", making it much more useful.
2003-10-04 15:10:18 +00:00
dchandler
ee50291ed4
Andres found that "THAG PA" caused a NullPointerException. That's fixed.
Renamed ACIPString to TString -- we'll use this for EWTS and ACIP both.
TMW->ACIP for TMW9.61 should work now.
2003-10-04 01:22:59 +00:00
amontano
c8927b827c
Fixed bugs in the scanner. Added reference to yogacara bhumi in the about window.
2003-09-23 19:05:23 +00:00
amontano
e89c49651c
Now translation tool accepts synonyms separated by ';' in the entry field.
2003-09-14 05:56:20 +00:00
dchandler
115d0e0e6c
Fixed ACIP->TMW vowels like 'I etc.
Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.
The ACIP->EWTS subroutine has improved.
TMW->Wylie and TMW->ACIP are improved in error cases.
TMW->ACIP has friendly embedded error messages now.
2003-09-12 05:06:37 +00:00
dchandler
16817d0b8e
Fixed Javadocs.
2003-09-10 01:19:05 +00:00
amontano
cc853be387
Fixed a bug with regards to the word order in the servlet version.
2003-09-09 16:02:03 +00:00
amontano
1467f9cd3f
Fixed display of servlet version and added option to include links to
other versions. See http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 21:32:40 +00:00
amontano
73d01111ca
Fixed the "clicking on the translate button makes the thdl menu go away"
error on the servlet version of the translation tool.
2003-09-08 16:39:18 +00:00
amontano
07fbbcaf45
Solved some sorting errors with the servlet version.
Also if the service parameter thdlBanner=anything is sent, the THDL's
JavaScript menu is displayed (if it is running on the THDL server). There is
still a bug. Menu goes away when pressing "translate" button. See:
http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 08:12:56 +00:00
dchandler
e42d76b3b8
Nicer default Latin font for ACIP->* conversions.
Performance improvement in non-color-coding mode.
2003-09-07 22:08:35 +00:00
dchandler
6872ea8028
Corrected the usage info.
2003-09-07 22:08:00 +00:00
dchandler
d8657abd44
ACIP font shrinking as in {KA (GA)} is now supported.
2003-09-07 18:30:59 +00:00
dchandler
07e360d9a8
The ACIP {NYA%} is supported. {NYAo} and {NYAx} are confusing to me,
because I don't know which glyphs o and x correspond to. For that
reason, they cause ERRORs.
The proposed THDL Extended Wylie ~X and X are now used for U+0F35 and
U+0F37 respectively.
2003-09-07 16:19:50 +00:00
amontano
f57cdda867
Now the translation tool displays where it is connected to
2003-09-07 03:40:51 +00:00
amontano
b489034598
Fixed a call to a deprecated method
2003-09-07 03:39:08 +00:00
dchandler
0d6d6ed611
Added GUI support for color-coding. Added support for color-coding
and choosing the warning level to TibetanConverter.
Better error checking in the GUI converter.
2003-09-06 22:56:10 +00:00
dchandler
1308f14807
sanskrit=green, prefix-rule-afflicted-tsheg-bar=yellow
2003-09-05 06:05:46 +00:00
dchandler
899b042ec0
Preliminary, untested color support in ACIP->TMW conversion.
2003-09-05 05:54:35 +00:00
dchandler
717c3b94f3
Fixed ACIP->Unicode spaces/tshegs and newlines, especially with shads.
"NGA," becomes "NGA-tsheg-," automatically now.
2003-09-05 05:08:47 +00:00
dchandler
5c240ac072
From the converter GUI, you can now choose TMW->ACIP text and
TMW->Wylie text. All the conversions show you which format they take
as input and which format they give as output.
File filter for ACIP files added.
The GUI converter suggests a file extension wisely.
Fixed newline bug in ACIP->Unicode converter.
2003-09-05 02:05:34 +00:00
dchandler
4abbf6db37
--to-acip-text and --to-wylie-text added; these get you text files,
not RTF files like --to-acip and --to-wylie do. The GUI converter
doesn't yet allow you to get text files.
2003-09-04 05:16:47 +00:00
dchandler
cc615f34df
ACIP->TMW and ACIP->Unicode have my pre-stamp of non-approval. Except
for {NYAx} and {NYAo}, they're as good as I'll get them without input
from experts or the employ of a complementary, syllabary-based
approach.
2003-09-04 04:34:18 +00:00
dchandler
ae7a7577bc
ACIP->TMW and ACIP->Unicode are now smart about when a newline is
really a newline and when a space is really a tsheg. The space in {KA
,MDO} is a tsheg, but the space in {GA ,MDO} is not.
2003-09-04 04:13:01 +00:00
dchandler
d2749cecd0
ACIP->TMW and ACIP->Unicode are now smart about when a newline is
really a newline and when a space is really a tsheg. The space in {KA
,MDO} is a tsheg, but the space in {GA ,MDO} is not.
2003-09-04 04:04:21 +00:00
dchandler
72e531e515
Use shortened 'dreng-bu, not regular. As per TM glyphs. I suspect
that the following would look better with shortened 'dreng-bu also,
but I'm sticking with the TM/TMW docs:
dz+r~137,2~~4,46~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fb2
dz+w~138,2~~4,47~1,110~4,120~1,123~1,126~4,106~4,113~f5b,fad
dz+h~139,2~~4,48~1,110~4,120~1,123~1,126~4,106~4,113~0F5C
dz+h+y~140,2~~4,49~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb1
dz+h+r~141,2~~4,50~1,110~4,121~1,123~1,126~4,107~4,114~f5c,fb2
dz+h+l~249,2~~4,51~1,110~4,123~1,123~1,126~4,110~4,117~f5c,fb3
dz+h+w~143,2~~4,52~1,110~4,122~1,123~1,126~4,108~4,115~f5c,fad
2003-09-04 03:46:35 +00:00
a1tsal
2f58ec2760
A bunch of Sanskrit stacks of the form ts+... and dz+...had 1,125 for their
drengbu, but that is actually a naro. I changed it to 1,123
(which is one of the two drengbus).
2003-09-04 02:06:58 +00:00
dchandler
316f59107b
A preliminary TMW->ACIP converter is here. There are known bugs, mostly with rare punctuation.
2003-09-02 06:39:33 +00:00
dchandler
cc9ab06864
Added utility routine. Better comments.
2003-08-31 20:38:28 +00:00
dchandler
045c4069c9
Preliminary ACIP->TMW support is in place. {DU} gives you something
less beautiful than what Jskad would give, so more work is needed.
2003-08-31 16:06:35 +00:00
a1tsal
1f4d53be2e
Moved ^M to punctuation section.
Removed obsolete comment.
2003-08-31 00:44:23 +00:00
a1tsal
522812996e
Remove unused sections of tibwn.ini.
2003-08-31 00:34:15 +00:00
dchandler
dd22e161a5
Code cleanup for Jskad's Tibetan font converter GUI.
2003-08-30 05:01:15 +00:00
dchandler
896344f2d1
David Chapman removed some lines from tibwn.ini. That breaks TM<->TMW
mappings, so I've put them back, but with the EWTS non-correspondences
\tmwXYYY.
Jskad no longer supports superscribed or subscribed numerals, because
EWTS does not.
2003-08-26 01:28:02 +00:00
a1tsal
ccdebf6719
Removed half numbers (no longer in EWTS)
Brought <?Other?> closer to EWTS
Removed __TILDE__ (no longer in EWTS)
Changed M^ to ^M per new EWTS draft
Added ai, au, -i from WW tibwn.ini -- they were missing in this version
2003-08-25 23:19:48 +00:00
dchandler
1982c5847b
Jskad's converter now has ACIP-to-Unicode built in. There are known
bugs; it is pre-alpha. It's usable, though, and finds tons of errors
in ACIP input files, with the user deciding just how pedantic to be.
The biggest outstanding bug is the silent one: treating { }, space, as
tsheg instead of whitespace when we ought to know better.
2003-08-24 06:40:53 +00:00
dchandler
d5ad760230
TMW->Wylie conversion now takes advantage of prefix rules, the rules
that say "ya can take a ga prefix" etc.
The ACIP->Unicode converter now gives warnings (optionally, and by
default, inline). This converter now produces output even when
lexical errors occur, but the output has errors and warnings inline.
2003-08-23 22:03:37 +00:00
dchandler
21ef657921
I'd broken the ACIP->Wylie for ACIP vowels {'A}, {'I}, etc.
2003-08-22 05:13:32 +00:00
dchandler
1afb3a0fdd
ACIP->Unicode, without going through TMW, is now possible, so long as
\, the Sanskrit virama, is not used. Of the 1370-odd ACIP texts I've
got here, about 57% make it through the gauntlet (fewer if you demand
a vowel or disambiguator on every stack of a non-Tibetan tsheg bar).
2003-08-18 02:38:54 +00:00
dchandler
245aac4911
I'm now stricter about accepting alphabetic characters. F, Q, X, a,
b, c, d, e, ... do not belong in ACIP, so the scanner rejects them.
This should make it even easier to distinguish automatically between
Tibetan and English texts.
2003-08-17 02:38:58 +00:00
dchandler
39451d8879
Fixed a couple of small bugs.
Only 250 errors are reported now; this is important if you try to
convert an English document.
2003-08-17 02:12:49 +00:00
dchandler
4581a2d8ab
Improved the ACIP scanner (the part of the converter that says, "This
is a correction, that's a comment, this is Tibetan, that's Latin
(English), that's Tibetan inter-tsheg-bar punctuation, etc.) It now
accepts more real-world ACIP files, i.e. it handles illegal
constructs. The error checking is more user-friendly. There are now
tests.
Added some tsheg bars that Peter E. Hauer of Linguasoft sent me to the
tests. Many thanks, Peter. I still need to implement rules that say,
"This is not Tibetan, it must be Sanskrit, because that letter doesn't
take a MA prefix."
2003-08-17 01:45:55 +00:00
dchandler
0b91ed0beb
I've improved the ACIP tsheg bar scanner to handle a lot of illegal
constructions that occur in practice.
2003-08-16 16:13:53 +00:00
amontano
2a57439516
Updated the info displayed on the about window.
2003-08-14 14:16:49 +00:00
amontano
da384c6c2f
Now when loading, takes the default font options from the DuffPane.
2003-08-14 14:16:23 +00:00
dchandler
2b59d9838d
I now have a function that takes as input a String of ACIP and breaks
up that String into tsheg bars, punctuation, etc., while finding
errors. I've tested it some, but I'm not yet committing the tests.
Next step: a converter that takes an ACIP file as input and outputs
TMW+Latin.
2003-08-14 05:10:47 +00:00
dchandler
57f506384f
The ACIP->Tibetan converter now has perfect low-level functionality,
and it has the capability to produce error messages and warnings that
make sense to the user. One can now get the correct parse, if one
exists, for an ACIP tsheg bar.
One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if you really wanted to.
Still to do:
o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
the affected documents (most of the Kangyur has unparseable
creatures).
2003-08-12 04:13:11 +00:00
dchandler
87266646fb
Removed misinformation.
2003-08-10 19:33:01 +00:00
dchandler
e21d3774a9
Added an unfinished ACIP->Tibetan converter. Once it works properly
for ACIP, it'll easily be made to work as a perfect EWTS
Wylie->Tibetan converter. It has an extensive suite of tests for the
existing functionality.
2003-08-10 19:30:07 +00:00
dchandler
39e0435b6b
Refactored this code so that Wylie->Tibetan and ACIP->Tibetan
conversions can make use of it. Hooray for reuse.
2003-08-10 19:02:56 +00:00
dchandler
bcf1c12b6a
We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary. These are all illegal, so it shouldn't affect many
existing conversions. But if there were typos, it could.
2003-08-10 18:46:01 +00:00
dchandler
9093fd3c05
We now produce EWTS m.ya, g.rwa, d.rwa, and b.ya during TMW->Wylie.
Our disambiguation is now perfect, happening when and only when it is
necessary. These are all illegal, so it shouldn't affect many
existing conversions. But if there were typos, it could.
2003-08-10 18:38:20 +00:00
dchandler
251d8feae5
brtan now gives TMW->Wylie brtan, not b.rtan. Etc. See bug report
http://sourceforge.net/tracker/index.php?func=detail&aid=785791&group_id=61934&atid=502515 .
2003-08-09 17:48:40 +00:00
dchandler
7dffc47cb7
'bad now gives TMW->Wylie 'bad, not TMW->Wylie 'abd. Andres came
across this one, so we've added it to the list of ambiguous three-consonant
combos.
2003-08-09 17:05:43 +00:00
amontano
52cdc17794
Added support for multiple keyboards and ability to set the preferences
for size of tibetan font and type and size of roman font.
2003-08-09 08:00:58 +00:00
amontano
8e4b508de8
Made a new class for the preference window so that other software
(i.e. the translation tool) can re-use that same code to set up the
attributes of the tibetan and roman fonts.
2003-08-09 07:57:21 +00:00
amontano
ef0df405d9
Redesigned the interface of the handheld version.
2003-08-03 06:29:08 +00:00
amontano
2b5a5fe67a
Got rid of redundant code
2003-08-03 06:28:22 +00:00
amontano
cce779bf88
Added a wizard window to avoid using the command line as much as possible.
This way, by clicking through the wizard, one can choose
to connect to the available on-line dicts, open a local dict or generate a dict database.
2003-08-03 06:27:30 +00:00
dchandler
4caeafa1b1
You shouldn't have one of these without the other, now that there are two.
This way neither TM nor TMW fonts will be loaded.
2003-07-26 00:55:32 +00:00
dchandler
2bb499e5a7
This was dying with a NullPointerException when you started it up using
'ant tt-run' with no dictionary. Now it starts up and shows you a nice
error message, "Dictionary could not be loaded!", instead.
2003-07-26 00:53:59 +00:00
dchandler
e198519c5f
Jskad now supports EWTS ~, i.e. TMW8.91.
2003-07-25 02:35:31 +00:00
amontano
5df9b5b91a
now supports sorting
2003-07-25 01:43:58 +00:00
amontano
97f5fe91b3
when invalid wylie is encountered, instead of displaying a message it raises an exception.
2003-07-25 01:43:18 +00:00
amontano
7cdbf33333
changed it to support 30 dictionaries (instead of just 15)
2003-07-25 01:42:17 +00:00
amontano
7b04d7bca5
changed the "about" info
2003-07-25 01:41:30 +00:00
dchandler
a7f0c35738
Added a test for ts.ha vs. tsha ambiguity; there is no ambiguity.
2003-07-18 03:51:29 +00:00
dchandler
dc454b8c0c
More test cases related to the following:
The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 02:31:02 +00:00
dchandler
f8c959bfb0
The Tibetan d.za was being converted into the Wylie dza incorrectly. This
is a rare case, but I want TMW->Wylie to be perfectly unambiguous.
2003-07-18 00:30:27 +00:00
dchandler
1c29566aee
I'm now using the Unix diff built in to Apache Jakarta Commons JRCS
(which I found on suigeneris.org, not apache.org) in order to bulletproof the
Tibetan Converter tests. They used to fail due to nondeterminism in the
Java RTF writer; they should no longer fail.
I've also changed it so that the Tibetan Converter tests run in headless
mode, which means that they'll run on the nightly builds server.
2003-07-14 12:26:26 +00:00
dchandler
06fb77a82b
Initial revision
2003-07-14 12:22:29 +00:00
dchandler
f900154e7a
Tests disambiguation in TMW->Wylie conversion.
2003-07-14 12:21:02 +00:00
dchandler
0622ac5062
Jskad no longer relies on the <?Consonants?>, <?Vowels?>, <?Other?>,
or <?Numbers?> commands; it instead hard-codes the appropriate comma-
delimited lists. This is cleaner because WylieWord and Jskad had different
values for these lists.
2003-07-14 12:19:46 +00:00
dchandler
fb85f6e8ce
Fix comment.
2003-07-14 12:17:04 +00:00
dchandler
79b3b97326
Remove warning message from menu item.
2003-07-13 23:19:11 +00:00
dchandler
c986684beb
Updated help to talk about new features.
2003-07-13 22:51:35 +00:00
dchandler
f695b1a6c1
Updated baselines because conversions have improved since the last
update.
2003-07-13 19:14:41 +00:00
dchandler
d10f97fc06
Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.
Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 19:14:15 +00:00
dchandler
96afae795c
Disambiguation was not being used appropriately. This makes previous
TMW->Wylie conversions with the new-and-improved TMW->Wylie
algorithm faulty.
Now I'm using it a little more than you need to, e.g. b.lha instead of blha is
generated because bla and b.la are ambiguous.
2003-07-13 18:46:29 +00:00
dchandler
802e0cb588
If this method uses the Wylie representation, you get an infinite recursion
when you do a TMW->Wylie conversion for a document with glyphs that
have no known Wylie.
2003-07-13 17:40:02 +00:00
dchandler
a86a0f235b
I was missing a break; statement; this caused an Error to be thrown during
some TMW->Wylie conversions. No conversions were erroneous, though.
2003-07-13 17:38:00 +00:00
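For readers unfamiliar with how a missing break can raise an Error without producing a wrong result, here is a generic Java illustration. It is not the Jskad code in question; the class, method names and case values are all hypothetical.

```java
// Hypothetical illustration of switch fall-through; not the actual Jskad code.
final class MissingBreakSketch {

    // Buggy version: case 1 falls through into default and throws.
    static String buggy(int kind) {
        String result = "";
        switch (kind) {
            case 0:
                result = "vowel";
                break;
            case 1:
                result = "consonant";
                // BUG: no break here, so control falls through to default.
            default:
                throw new Error("unknown glyph kind: " + kind);
        }
        return result;
    }

    // Fixed version: the added break lets case 1 return normally.
    static String fixed(int kind) {
        String result = "";
        switch (kind) {
            case 0:
                result = "vowel";
                break;
            case 1:
                result = "consonant";
                break;
            default:
                throw new Error("unknown glyph kind: " + kind);
        }
        return result;
    }
}
```

In this pattern the buggy path fails loudly instead of returning a wrong value, which is consistent with the note that no conversions were erroneous.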
dchandler
6677d1e245
Code cleanup.
2003-07-13 16:53:03 +00:00
dchandler
3b6eaa792e
Fixed javadocs.
2003-07-11 13:33:30 +00:00
dchandler
85176cd9f3
Put in a fix for a new bug in Swing's RTF support. This bug is w.r.t. escapes
like \bullet, \emdash, etc., and this fix only works for Windows or OS/2 RTF
files, not for Mac RTF files. So if you want a TM->TMW conversion to work,
use MS Word for Windows, not for the Mac.
2003-07-11 13:30:22 +00:00
dchandler
d726bc0258
A couple of changes to TMW->Unicode thanks to Than's reply to my
questions.
2003-07-09 01:44:15 +00:00
dchandler
9db233bdf8
Cosmetic change.
2003-07-08 14:31:14 +00:00
dchandler
02558a1d78
Jskad supports <7, >8, etc. again; it no longer supports the punctuation
'<' and '>'. The current keyboard implementation makes this an either-or
proposition, when fundamentally it need not be.
Added a <?Numbers?> command and an <?Input:Numbers?> command to
tibwn.ini; broke the numbers apart from the consonants. This facilitates the
new-and-improved Tibetan->Wylie conversion.
Tibetan->Wylie is now done by forming legal tsheg-bars. A legal tsheg bar
is converted into perfect THDL Wylie. See code comments to learn what
it thinks is a legal tsheg-bar, but it includes, e.g., bskyUMbsH minus the
trailing punctuation (H).
Illegal sequences, such as runs of transliterated Sanskrit, are turned into
unambiguous Wylie; each glyph is followed by a vowel or a disambiguator
('.').
I've made it so that the illegal sequences are as beautiful as possible. You
get 'pad+me', for example, not the equivalent but uglier 'pad+m.e.'.
2003-07-08 14:30:17 +00:00
dchandler
c04a3f189b
Rearranged the topics.
2003-07-08 12:50:27 +00:00
dchandler
23d18c925f
Tibetan! 5.1's docs were again faulty. fa and va were getting the wrong
...
vowels.
2003-07-08 02:59:17 +00:00
dchandler
24ac6fd06c
The Trie of possible inputs fixed this bug.
2003-07-06 16:31:13 +00:00
dchandler
d88141512b
Small changes w.r.t. clearing preferences. Some code cleanup.
2003-07-06 16:24:29 +00:00
dchandler
086f4bb6ec
Renamed the Info menu Help.
Now using CalHTMLPane to surf the offline and the online help.
2003-07-05 22:25:21 +00:00
dchandler
8c4ab30a52
Rearranged the Tools menu; made the converter smart about "find some..."
and "find all..." modes.
2003-07-05 21:02:46 +00:00
dchandler
72d2eee503
Code cleanup.
2003-07-05 19:26:58 +00:00
dchandler
a463b686b3
Jskad now ships with both TibetanMachine and TibetanMachineWeb fonts
by default, not just TMW. Thus users need not install these fonts on their
systems.
2003-07-05 18:00:29 +00:00
dchandler
9effee0564
If you opened a file from the recently opened files list and very quickly
mouse-clicked on the new Jskad window, you could cause an infinite
regression of requestFocus() operations because the menu would try
to get focus back. I grab focus from the menu now.
2003-07-05 02:30:00 +00:00
dchandler
51679c158b
Final fixes completed; recently opened files can now be selected from
Jskad's file menu.
2003-07-05 02:15:33 +00:00
dchandler
4410b52c07
There's still a small bug in this, but here's the real stuff:
Recently opened files can now be selected from Jskad's file menu.
A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 03:29:25 +00:00
dchandler
d863446d25
I think *this* compiles...
2003-07-04 02:32:40 +00:00
dchandler
407020108f
I didn't mean to commit the previous revision; I'm still tweaking it.
2003-07-04 02:32:03 +00:00
dchandler
9f0b1c3250
Recently opened files can now be selected from Jskad's file menu.
A Jskad now gives the focus to the DuffPane when that Jskad gets the
focus.
2003-07-04 02:31:23 +00:00
dchandler
7500b4e06b
Jskad won't allow you to exit by closing the last window anymore. Instead,
you get a dialog box saying to use File/Exit.
2003-07-04 00:21:07 +00:00
dchandler
6c286573ba
Fixed Javadocs.
2003-07-04 00:12:59 +00:00
dchandler
0a1bc0d30b
getWylie now takes a parameter for error detection; I'm not detecting errors
here though.
Fixed a typo in a property name.
2003-07-01 23:20:08 +00:00
dchandler
0d1999d055
getWylie now takes a parameter for error detection; I'm not detecting errors
here though.
2003-07-01 22:52:18 +00:00
dchandler
a48ec641d5
Better error messages in TMW->Wylie conversions. The user knows what's
up.
2003-07-01 03:43:33 +00:00
dchandler
3113a4b8de
Some of the \tmw80.. mappings were out of date.
3+1/2 is not EWTS; took these out.
2003-07-01 03:42:30 +00:00
dchandler
e7e7c2bf15
The command-line tool runs in headless mode by default, so it will
work on a Linux console, e.g. The JUnit tests will too, though 'ant
check' still fails because we don't sneak the -Djava.awt.headless=true
into the process early enough.
2003-07-01 02:50:09 +00:00
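A minimal sketch of the "early enough" point in the commit above, not the project's actual code: the java.awt.headless property only takes effect if it is set before AWT is first initialized. The class and method names here are hypothetical.

```java
import java.awt.GraphicsEnvironment;

public class HeadlessSketch {
    /**
     * Defaults java.awt.headless to "true" unless the user already chose a
     * value (e.g. with -Djava.awt.headless=false on the command line).
     * This must run before any AWT or Swing class initializes the toolkit.
     */
    public static boolean enableHeadlessByDefault() {
        if (System.getProperty("java.awt.headless") == null) {
            System.setProperty("java.awt.headless", "true");
        }
        // Reports the mode the JVM actually ended up in.
        return GraphicsEnvironment.isHeadless();
    }

    public static void main(String[] args) {
        System.out.println("headless: " + enableHeadlessByDefault());
    }
}
```

Setting the property inside main() works only when nothing earlier has touched AWT, which is presumably why a test runner that forks late still fails.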
dchandler
6151a7bc94
TMW->Wylie now occurs in the TibetanDocument, not in DuffPane,
which means that the command-line tool can finally function with a headless
graphics device. Hopefully it will speed things up, too. It also means that
entering Roman text into the TMW->Unicode conversion and TMW->TM
conversion will be easy.
2003-07-01 01:21:57 +00:00
dchandler
61d29fc355
The TMW->Wylie mapping was busted w.r.t. tshegs.
Also, I now map both TMW7.90 and TMW7.91 to EWTS 'M'.
2003-07-01 00:17:18 +00:00
dchandler
229536884f
I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW or TMW->TM conversions can be trusted.
2003-06-30 02:24:11 +00:00
dchandler
dc03083433
I've validated by hand the TM<->TMW mappings. A few things changed, so
no previous TM->TMW conversions can be trusted.
2003-06-30 02:22:09 +00:00
dchandler
58644a6ef9
Better error handling.
2003-06-30 02:20:52 +00:00
dchandler
b16fb8a85c
This is correct; the Tibetan! 5.1 documentation is not. This affects
TM->TMW conversions.
See http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515
for a full list of Tibetan! 5.1 documentation errors.
2003-06-29 22:11:00 +00:00
dchandler
aedef4b44d
An error now appears if you try to convert from format A to format B but no
glyphs in format A appear. In this case, it is likely that you meant to convert
a different file or do a different conversion.
2003-06-29 21:31:48 +00:00
dchandler
ee14b7b97f
Jskad now has the ability to open its buffer with an external viewer, e.g.
Microsoft Word.
Better OOM error handling in the GUI converter; untested, though.
2003-06-29 20:49:30 +00:00
dchandler
646e23b4a4
Tweaked the converter GUI so that you can open the old and the new files
with the external viewer.
2003-06-29 16:45:15 +00:00
dchandler
3f76c3692d
Fixed Javadoc warnings.
2003-06-29 15:37:35 +00:00
dchandler
b841a7f14b
The converter GUI can now be run standalone or from Jskad's Tools menu.
The converter GUI gives nicer error messages in at least one case.
2003-06-29 04:18:36 +00:00
dchandler
7938648ca8
TM->TMW conversion has no known bugs. Oddballs have been
comprehensively handled.
2003-06-29 03:03:07 +00:00
dchandler
689c1910aa
To deal with javax.swing.text.rtf bugs regarding hexadecimal escape
sequences, I've created RTFFixerInputStream. It turns illegal hexadecimal
escapes into Unicode escapes.
2003-06-29 02:30:08 +00:00
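The rewrite described in the commit above can be sketched roughly as follows. This is an illustration only, not RTFFixerInputStream itself: it works on a String rather than a stream, the class and method names are hypothetical, and treating exactly the range 0x80-0x9F as "illegal" is an assumption. It also ignores escaped backslashes and RTF's rules about delimiters and fallback characters after \u.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfHexEscapeSketch {
    // Matches an RTF hexadecimal escape: backslash, apostrophe, two hex digits.
    private static final Pattern HEX_ESCAPE =
        Pattern.compile("\\\\'([0-9a-fA-F]{2})");

    /**
     * Rewrites hex escapes in the (assumed) troublesome range 0x80-0x9F as
     * decimal Unicode escapes, so \'97 becomes \u151. Other escapes are
     * copied through unchanged.
     */
    public static String rewrite(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1), 16);
            String replacement = (value >= 0x80 && value <= 0x9F)
                ? "\\u" + value   // RTF \u takes a decimal number
                : m.group();      // leave everything else as it was
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("a\\'97b"));
    }
}
```

Doing this before the RTF reader sees the input means no glyph is lost to the reader's handling of those escapes.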
dchandler
0b849aed97
Fixed comments w.r.t. javadoc warnings.
2003-06-29 02:22:20 +00:00
dchandler
4e279defb4
Fixed a couple of array bounds checks.
Added support for two more oddballs.
Deprecated the oddball lookup method because it drops up to 30 glyphs in
TibetanMachine. The correct solution is to transform the RTF before Java's
busted RTF readers ever see it. \'97 becomes \u151, e.g.
2003-06-28 16:33:58 +00:00
dchandler
2a359c45ef
Bad conversions were not leaving the unconvertible characters at the
beginning of the document as they should and as they are documented to.
They now do, and they bracket the bad characters with the TM or TMW for
U+0F3C on the left and the TM or TMW for U+0F3D on the right.
Some cleanup.
2003-06-28 16:20:19 +00:00
dchandler
c39d8d6326
My earlier code cleanup introduced this bug; TMW->TM conversion was
busted.
2003-06-26 22:48:51 +00:00
dchandler
25510542b2
Now with a nicer error message in one case.
2003-06-26 22:48:05 +00:00
dchandler
c34259b105
Code cleanup.
2003-06-25 01:04:24 +00:00
dchandler
9e6c3009ac
Added an About button. Code cleanup. Changed the Cancel button to the
Close button.
2003-06-25 00:49:11 +00:00
dchandler
569fba6467
Made the comments in the my_thdl_preferences.txt file use standard line
separators.
2003-06-25 00:03:46 +00:00
dchandler
0f3c4174b6
Made the comments in the my_thdl_preferences.txt file more useful.
2003-06-24 23:48:00 +00:00
dchandler
33beb7b782
Bye bye debugging output.
2003-06-24 12:23:37 +00:00
dchandler
f547734043
Added Than's converter GUI code; adapted it to work with Jskad's
converters.
TMW->Unicode now uses Ximalaya by default.
2003-06-24 03:02:29 +00:00
dchandler
19d7cabfe6
Forget the final=faster myth.
2003-06-24 03:01:13 +00:00
dchandler
917864574c
Fixed a logic bug in mapTMWtoTM and mapTMtoTMW.
You can now specify which Unicode font to use via 'java
-Dthdl.tmw.to.unicode.font=Ximalaya ...'.
2003-06-23 01:58:11 +00:00
dchandler
b6d8fd89f9
When errors in (all but TMW->Wylie and Wylie->TMW) conversion occur,
the troublesome glyphs are now put at the beginning of the document
AFTER AN ACHEN. This makes a glyph like \tmw7095 visible atop the
achen.
Major fix to the handling of paragraphs in conversion; we were (for
whatever reason) dropping paragraphs before.
2003-06-23 01:24:02 +00:00
dchandler
1f4343bed0
TMW->TM, TM->TMW, and TMW->Unicode conversions are all (at least 2)
orders of magnitude faster.
2003-06-22 22:10:58 +00:00
dchandler
afe73c2228
The pseudo-file '-', referring to standard input, is now accepted as a
command-line argument.
2003-06-22 21:05:16 +00:00
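A minimal sketch of the '-' convention from the commit above, assuming a hypothetical helper (the converter's real argument handling is not shown in this log): the argument "-" selects standard input, and anything else is opened as a file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StdinArgSketch {
    /**
     * Returns System.in when the argument is the pseudo-file "-",
     * otherwise opens the named file for reading.
     */
    public static InputStream open(String arg) {
        if ("-".equals(arg)) {
            return System.in;
        }
        try {
            return new FileInputStream(arg);
        } catch (IOException e) {
            // Wrapped so that callers in this sketch need no checked handling.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String arg = args.length > 0 ? args[0] : "-";
        try (InputStream in = open(arg)) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```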
dchandler
900f7492b0
'ant clean check' was failing because I hadn't updated the
--find-some-non-tmw and --find-all-non-tmw baselines.
Code cleanup.
2003-06-22 16:11:58 +00:00
dchandler
66287f3cc9
Small TMW->Wylie performance improvements. TMW->Wylie is *much*
faster than TMW->Unicode etc.; this is because many fewer replacements
are made (i.e., more text is replaced each time a replacement is
performed).
I must find a way to still preserve formatting but do many fewer
replacements in TMW->{Unicode,TM} and TM->TMW.
2003-06-22 04:32:59 +00:00
dchandler
6540b260bd
Fixes a (small, I think) TMW->Unicode performance glitch. I was
inserting 5 characters at a time and then skipping ahead just one
position. I don't think this affected correctness.
I believe there's still a terrible (exponential?) slowdown as the
input file gets bigger, however. Perhaps not -- but while we run
through the first 1000 TMW glyphs in 6 seconds, the 20th thousand
takes at least 60 seconds. Is TMW->Wylie faster than TMW->Unicode?
If so, why?
Thought: don't use a DuffPane within TibetanConverter -- it can only
add overhead, right? My hprof profile said that the conversion was
taking just a couple of percent of the work; the rest was going to
display-related stuff that you should only see if you were displaying
the document. I'm not!
2003-06-22 04:08:33 +00:00
dchandler
dfe64a1927
Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:14:18 +00:00
dchandler
80101666c7
Included a fix from WylieWord's tibwn.ini. Removed some needless trailing
tildes.
2003-06-21 02:35:21 +00:00
dchandler
9a41f512d9
It used to be the case that you could select 'Close', and then when asked
"do you want to save?" you could press yes and then press cancel and
Jskad would still exit. That's no longer the case.
Added File->Exit to Jskad.
2003-06-21 02:07:51 +00:00
dchandler
45b87b0fb4
In Jskad, you can now clear the preferences and return to default values.
2003-06-21 01:26:17 +00:00
eg3p
fbb6245fdb
Added cut() and copy() methods to override JTextPane's methods of same name.
2003-06-20 15:27:20 +00:00