dchandler
d3d0ff23a8
Chris Fynn and Tony Duff answered my questions about U+0F3F and U+0F3E.
2003-11-25 00:28:18 +00:00
dchandler
b8608797aa
Updated the code I used for testing to generate the file containing all glyphs in TM and all glyphs but one in TMW.
2003-11-24 05:59:32 +00:00
dchandler
8d18ac53cb
N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.
...
Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day.
Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors.
Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.
2003-11-24 05:50:42 +00:00
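
The escape generation mentioned in this commit can be sketched roughly as follows. This is only an illustration and not the actual TMW->EWTS code: the class and method names are hypothetical, the uppercase four-digit hex format is an assumption, and the sketch escapes every character in the two ranges, whereas the real converter decides case by case when an escape is "appropriate".

```java
// Hypothetical sketch; EscapeSketch is not a class from the Jskad sources.
final class EscapeSketch {

    /** True for code units in the two ranges named in the commit message. */
    static boolean inEscapedRange(char c) {
        return (c >= 0xF021 && c <= 0xF0FF) || (c >= 0x0F00 && c <= 0x0FFF);
    }

    /** Formats one UTF-16 code unit as a backslash-u escape with four hex digits. */
    static String escape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    /** Replaces every character in the two ranges by its escape; other characters pass through. */
    static String escapeAll(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(inEscapedRange(c) ? escape(c) : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0F3F is built from its numeric value to keep the source ASCII-only.
        System.out.println(escapeAll("ka " + (char) 0x0F3F));
    }
}
```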
dchandler
5d053b41fe
Found another inconsistency between Unicode and the TM/TMW docs. I've sent e-mail to Tony Duff asking who's right, but I'm putting this in the errata under the assumption that even if Unicode is wrong, Unicode's wrong view will somehow rule the day.
...
Also, TMW->EWTS now generates \uF021-\uF0FF or \u0F00-\u0FFF escapes when appropriate. A few TMW glyphs still give errors.
Also, there's now a test to be sure that TM<->TMW and TMW->EWTS won't break in the future (except for the one glyph in TMW that isn't in TM, that one isn't tested). The baselines have not been hand-verified, but changes will be detected.
2003-11-24 05:49:15 +00:00
dchandler
9a247f5932
N+D+Ya, not N+D+ya, w+Wa, not w+wa .. use W, R, and Y where appropriate.
2003-11-24 04:55:11 +00:00
dchandler
1ec668c018
Dza is not in the latest EWTS draft.
2003-11-24 04:28:55 +00:00
dchandler
f76c089366
Using Y, R, and W everywhere needed. R+... is never needed in TM/TMW, I concluded (with 50% certainty).
2003-11-24 04:05:59 +00:00
dchandler
08c676c186
Bug fixes. Plus, now 99% in sync with the new EWTS draft. Search for 'DLC' to find a few open issues.
...
Readded the line for reversed dza; it should never have been deleted, as that breaks TM<->TMW. I tested the whole mapping by hand once; this incident shows that automation is very helpful.
'{' and '}' were swapped...
The Unicode for something was "", not "none".
+R, +W, +Y, R+ now in use (though more testing is needed)
2003-11-24 02:40:40 +00:00
dchandler
216c5b0d54
Fixed TMW->Wylie for achen. I even tested this by pretending achen could take a da prefix (when in reality it takes no prefixes).
2003-11-23 01:22:27 +00:00
dchandler
37e8dfa917
The menu now says (Buggy) in front of "Convert Selection from Wylie to Tibetan" because this feature is, you guessed it, buggy.
2003-11-22 22:48:41 +00:00
dchandler
113480a882
X is now better supported, so this changed.
2003-11-15 20:00:59 +00:00
dchandler
8d4fb5d13f
We crashed before when '~' was entered.
2003-11-14 04:50:55 +00:00
dchandler
b59b86fd73
Commented this to mention some recent testing.
2003-11-11 03:45:58 +00:00
dchandler
4023be9612
Better prettyprinting. Untested.
2003-11-11 03:43:26 +00:00
dchandler
4e6a9c299f
ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.
2003-11-11 03:43:11 +00:00
dchandler
2cb90bd231
ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended.
...
The Unicode for ACIP {o} is U+0F37.
2003-11-09 23:15:58 +00:00
dchandler
084e12a02c
Import Wylie is a buggy feature. The menu now calls it "(Buggy) Import Wylie...". t+s+w doesn't even convert correctly!
...
Bug-free EWTS->TMW using the org.thdl.tib.text.ttt codebase will be here soon.
2003-11-09 01:25:58 +00:00
dchandler
04816acb74
ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.
ACIP {RVA} was converting to the wrong things.
The TMW for {RVA} was converting to the wrong ACIP.
Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler
8193cef5d1
Better comments.
2003-11-09 01:07:07 +00:00
dchandler
dbd9c80ca0
Special tests for rwa and r+wa, which are the only two different stacks with the same hash key modulo - and +.
2003-11-09 01:06:26 +00:00
dchandler
85e1e0701e
Fixed crashing bug in Import Wylie.
2003-11-08 23:32:53 +00:00
dchandler
8fbd8850f8
New feature: Convert Selection from TMW to ACIP.
2003-11-08 23:22:06 +00:00
dchandler
bab47c4910
There are now extensive tests to make sure that each Tibetan stack in TMW can be typed in using EWTS and correctly converted to TMW and then back to EWTS. These tests unearthed new bugs in the Tibetan! 5.1 docs.
2003-11-08 22:11:24 +00:00
dchandler
3fa417d3ee
phywI, phywU, drwI and drwU now produce vowels and subjoined a-chungs. The Tibetan! 5.1 docs say I and U are not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the I or U request -- we were silent.
2003-11-08 21:53:34 +00:00
dchandler
e058d6252e
phywu and drwu now produce zhabs-kyus. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to these stacks, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:48:08 +00:00
dchandler
55aaeef9d0
l+h+wu now produces a zhabs-kyu. The Tibetan! 5.1 docs say the zhabs-kyu is not applicable to l+h+w, but I say Jskad lets the user decide what's applicable. If you disagree, be sure to give an error message before dropping the zhabs-kyu request -- we were silent.
2003-11-08 21:23:50 +00:00
dchandler
06edf17b04
Once again, the wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 21:17:18 +00:00
dchandler
f626a04d72
Tests t+r+n glyph.
2003-11-08 20:28:34 +00:00
dchandler
74d6bc61ab
The wrong 'dreng-bu glyphs were listed in the Tibetan! 5.1 docs -- they were na-ro glyphs, actually.
2003-11-08 20:25:16 +00:00
dchandler
a0ae0bf70d
Fixes bug 800164. Jskad users can now enter t+r+n on the keyboard. Wylie Word should work for t+r+n too.
2003-11-08 17:50:10 +00:00
dchandler
0ac90d7c0f
Nathanial -> Nathaniel
2003-11-08 03:42:51 +00:00
dchandler
e3f1ed5914
Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.
2003-10-27 13:58:45 +00:00
dchandler
94a43d3f39
Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.
2003-10-26 18:56:48 +00:00
dchandler
5c36dd81d3
Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".
2003-10-26 18:25:25 +00:00
dchandler
e74547d743
GA-YOGS now parses like G-YOGS and GAYOGS do.
2003-10-26 18:06:38 +00:00
dchandler
61cf19932e
ACIP {B5} and {7'} were problematic; that's fixed.
2003-10-26 17:47:35 +00:00
dchandler
ad7b20e485
Added yet more metadata.
2003-10-26 16:05:30 +00:00
dchandler
1550fee41a
Removed garbage.
2003-10-26 16:05:07 +00:00
dchandler
fe33d67573
Added more metadata. There are 35 million+ tsheg bars here.
2003-10-26 15:35:08 +00:00
dchandler
050666d735
I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.
Freaky.
2003-10-26 06:56:12 +00:00
dchandler
31b3020d07
Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler
7ba1ad0735
Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.
2003-10-26 02:42:06 +00:00
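
The frequency-count postprocessing mentioned in this commit could also be done in Java rather than perl or awk. The following is a minimal sketch under stated assumptions: it assumes the converter's output has been saved with one tsheg bar per line, and the class name `TshegBarCountSketch` is hypothetical and not part of the Jskad sources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes one tsheg bar per input line.
final class TshegBarCountSketch {

    /** Counts how often each non-blank line occurs; keys come back sorted. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String bar = line.trim();
            if (!bar.isEmpty()) {
                counts.merge(bar, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the converter's output from standard input.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        // Print "count<TAB>tsheg bar" for each distinct tsheg bar.
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```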
dchandler
ef24c608bf
Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
...
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.
2003-10-26 02:17:19 +00:00
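
To illustrate the idea behind the substitution list, here is a minimal sketch of parsing and applying such a map. It is not the converter's actual code: the class name is hypothetical, and the comma as a separator between several `OLD=>NEW` entries is an assumption, since the commit message only shows a single entry. Only the property name and the `KAsh=>K+sh` example come from the message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the org.thdl.tib.text.ttt implementation.
final class ReplacementMapSketch {

    /** Parses "OLD=>NEW" entries. The comma separator is an assumption. */
    static Map<String, String> parse(String spec) {
        Map<String, String> rules = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return rules;
        }
        for (String entry : spec.split(",")) {
            int arrow = entry.indexOf("=>");
            if (arrow <= 0) {
                throw new IllegalArgumentException("Expected OLD=>NEW but got: " + entry);
            }
            rules.put(entry.substring(0, arrow).trim(),
                      entry.substring(arrow + 2).trim());
        }
        return rules;
    }

    /** Applies each literal substitution in the order the rules were given. */
    static String apply(Map<String, String> rules, String text) {
        String result = text;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Falls back to the example from the commit message if the property is unset.
        String spec = System.getProperty(
                "org.thdl.tib.text.ttt.ReplacementMap", "KAsh=>K+sh");
        System.out.println(apply(parse(spec), "KAsh"));
    }
}
```

The substitutions are literal rather than regular expressions, which keeps characters such as `+` and `'` in ACIP text from being misread as pattern syntax.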
dchandler
6bda550157
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
...
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler
d99ae50d8a
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
...
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:24:28 +00:00
dchandler
1415fc43e3
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
2003-10-26 00:21:54 +00:00
dchandler
306cf2817c
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
...
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler
f106deb884
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here, BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
...
Added a few new tests.
2003-10-25 21:40:21 +00:00
dchandler
af013a6a39
I renamed this function a while ago.
2003-10-22 02:49:16 +00:00
dchandler
7d24ab393f
Code cleanup.
2003-10-21 03:44:02 +00:00
dchandler
c764eee8d0
Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.
2003-10-21 03:36:57 +00:00
dchandler
2f39921381
Added more test cases.
2003-10-21 02:14:45 +00:00
dchandler
2f81a801ef
Added three new kinds of warnings to ACIP->Tibetan conversions.
2003-10-21 02:00:49 +00:00
dchandler
a47af2c165
Bulletproofing -- code cleanup.
2003-10-21 00:31:10 +00:00
dchandler
188b9c322e
Warn about prefix rules only in Most and All modes.
2003-10-21 00:23:55 +00:00
dchandler
1224030898
Speedup.
2003-10-21 00:19:15 +00:00
dchandler
1d9b405bb8
Forgot to add this file earlier.
2003-10-20 13:49:54 +00:00
dchandler
5d9305c9d5
"Browse..." buttons are smart about file types now.
2003-10-19 23:17:25 +00:00
dchandler
3aa3859354
ACIP->Unicode crash fixed.
...
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler
5aab4acc93
I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
...
The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place.
The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
dchandler
4b1395e0ba
Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
...
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler
5ce84d4d9a
Tiny code cleanup.
2003-10-19 04:43:34 +00:00
dchandler
0edebd55d7
We were dying in the "can ts+h take a ga prefix?" check for GTZHAN.
2003-10-19 03:47:33 +00:00
dchandler
47648186b4
Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.
2003-10-18 18:34:49 +00:00
dchandler
e5534f69ee
Untabified -- whitespace only has changed. Use 'cvs diff -wb' to avoid seeing these differences.
2003-10-18 18:29:46 +00:00
dchandler
557ed7ed44
DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are.
2003-10-18 17:49:29 +00:00
dchandler
e799438f86
CVS ignoring backup files.
2003-10-18 17:47:56 +00:00
dchandler
3b55ea509f
Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar."
...
I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.
2003-10-18 05:48:53 +00:00
dchandler
f28bee4c71
The appendage 'um is here too.
2003-10-18 05:10:49 +00:00
dchandler
8c99adeb63
TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following:
...
'e (used in foreign transliteration)
'ongs
'is
'os
'ur
'us
'ung
2003-10-18 03:04:47 +00:00
dchandler
5e18feb47d
ACIP now stacks greedily. TTTTTA is T+T+T+T+TA, even though that stack doesn't exist in TM or TMW. Robert Chilton, in personal correspondence, agreed that this is the way to do things.
...
ACIP handles the appendages 'AM, 'ANG, 'US, 'UR, 'I, 'O, and 'U correctly.
2003-10-16 04:15:10 +00:00
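
As a rough illustration of greedy stacking, the sketch below joins every consonant before the first vowel into one stack. It is heavily simplified and not the converter's algorithm: it assumes every consonant is a single letter (real ACIP has multi-letter consonants), it treats only A, E, I, O, and U as vowels, and it ignores prefix rules, suffixes, and appendages. The class name is hypothetical.

```java
// Hypothetical, simplified sketch of greedy stacking; not the ttt package's code.
final class GreedyStackSketch {

    // Assumption: only these letters count as vowels in this sketch.
    private static final String VOWELS = "AEIOU";

    /** Joins all leading single-letter consonants with '+', then appends the rest unchanged. */
    static String stack(String syllable) {
        // Find the first vowel; everything before it is treated as one stack.
        int firstVowel = 0;
        while (firstVowel < syllable.length()
                && VOWELS.indexOf(syllable.charAt(firstVowel)) < 0) {
            firstVowel++;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < firstVowel; i++) {
            if (i > 0) {
                out.append('+');
            }
            out.append(syllable.charAt(i));
        }
        out.append(syllable.substring(firstVowel));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stack("TTTTTA"));
    }
}
```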
dchandler
5f4fbfab7c
Bulletproofing and debugging support.
2003-10-16 04:13:14 +00:00
dchandler
129ebccd67
In TCC #1 keyboard, h>cj now works. I may have fixed this in a terrible way, breaking other things even. Hard to say because I don't really understand the code I changed. But DuffPaneTest passes.
...
If we ever clean up the keyboards, the changes made here to tcc_keyboard.ini should probably be undone.
2003-10-12 18:16:17 +00:00
dchandler
d7fdacfcdc
Open menu is now Open..., Save as is now Save as...
2003-10-12 18:12:19 +00:00
dchandler
8dbfff17e1
All .rtf and .Rtf and .RTF files are selectable now.
2003-10-12 18:11:50 +00:00
dchandler
35209ce7fd
I'm going to have to debug this, and the tab stops make the source unreadable. I don't like messing with whitespace, but it seems like I'll be the main maintainer for a while, and the people after me can use cvs diff -wb. So I'm untabifying.
2003-10-12 16:44:28 +00:00
dchandler
749b8d6727
Added toString for debugging.
2003-10-04 16:33:47 +00:00
dchandler
b983af8031
r-t, not rt. This was why converting 'brtul' from TMW to Wylie didn't work.
2003-10-04 16:33:23 +00:00
dchandler
6a11eddb1e
Warning level "None" wasn't working.
2003-10-04 16:12:48 +00:00
dchandler
b10098cc61
"Most" warnings now exclude "the last stack has no vowel", making that level much more useful.
2003-10-04 15:10:18 +00:00
dchandler
ee50291ed4
Andres found that "THAG PA" caused a NullPointerException. That's fixed.
...
Renamed ACIPString to TString -- we'll use this for EWTS and ACIP both.
TMW->ACIP for TMW9.61 should work now.
2003-10-04 01:22:59 +00:00
amontano
c8927b827c
Fixed bugs in the scanner. Added reference to yogacara bhumi in the about window.
2003-09-23 19:05:23 +00:00
amontano
e89c49651c
The translation tool now accepts synonyms separated by ';' in the entry field.
2003-09-14 05:56:20 +00:00
dchandler
115d0e0e6c
Fixed ACIP->TMW vowels like 'I etc.
...
Fixed ACIP->Unicode/TMW for BDE, which should be B-DE, not B+DE, because the former is legal Tibetan.
The ACIP->EWTS subroutine has improved.
TMW->Wylie and TMW->ACIP are improved in error cases.
TMW->ACIP has friendly embedded error messages now.
2003-09-12 05:06:37 +00:00
dchandler
16817d0b8e
Fixed Javadocs.
2003-09-10 01:19:05 +00:00
amontano
cc853be387
Fixed a bug with regard to the word order in the servlet version.
2003-09-09 16:02:03 +00:00
amontano
1467f9cd3f
Fixed display of servlet version and added option to include links to
other versions. See http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 21:32:40 +00:00
amontano
73d01111ca
Fixed the "clicking on the translate button makes the thdl menu go away"
error on the servlet version of the translation tool.
2003-09-08 16:39:18 +00:00
amontano
07fbbcaf45
Solved some sorting errors with the servlet version.
...
Also, if the service parameter thdlBanner=anything is sent, the THDL's
JavaScript menu is displayed (if it is running on the THDL server). There is
still a bug: the menu goes away when pressing the "translate" button. See:
http://iris.lib.virginia.edu/tibetan/servlet/org.thdl.tib.scanner.OnLineScannerFilter?thdlBanner=on
2003-09-08 08:12:56 +00:00
dchandler
e42d76b3b8
Nicer default Latin font for ACIP->* conversions.
...
Performance improvement in non-color-coding mode.
2003-09-07 22:08:35 +00:00
dchandler
6872ea8028
Corrected the usage info.
2003-09-07 22:08:00 +00:00
dchandler
d8657abd44
ACIP font shrinking as in {KA (GA)} is now supported.
2003-09-07 18:30:59 +00:00
dchandler
07e360d9a8
The ACIP {NYA%} is supported. {NYAo} and {NYAx} are confusing to me,
because I don't know which glyphs o and x correspond to. For that
reason, they cause ERRORs.
The proposed THDL Extended Wylie ~X and X are now used for U+0F35 and
U+0F37 respectively.
2003-09-07 16:19:50 +00:00
amontano
f57cdda867
The translation tool now displays where it is connected to.
2003-09-07 03:40:51 +00:00
amontano
b489034598
Fixed a call to a deprecated method
2003-09-07 03:39:08 +00:00
dchandler
0d6d6ed611
Added GUI support for color-coding. Added support for color-coding
and choosing the warning level to TibetanConverter.
Better error checking in the GUI converter.
2003-09-06 22:56:10 +00:00
dchandler
1308f14807
sanskrit=green, prefix-rule-afflicted-tsheg-bar=yellow
2003-09-05 06:05:46 +00:00
dchandler
899b042ec0
Preliminary, untested color support in ACIP->TMW conversion.
2003-09-05 05:54:35 +00:00
dchandler
717c3b94f3
Fixed ACIP->Unicode spaces/tshegs and newlines, especially with shads.
...
"NGA," becomes "NGA-tsheg-," automatically now.
2003-09-05 05:08:47 +00:00
dchandler
5c240ac072
From the converter GUI, you can now choose TMW->ACIP text and
TMW->Wylie text. All the conversions show you which format they take
as input and which format they give as output.
File filter for ACIP files added.
The GUI converter suggests a file extension wisely.
Fixed newline bug in ACIP->Unicode converter.
2003-09-05 02:05:34 +00:00