dchandler
c69ba26c60
TString now tracks what Roman transliteration system it is using. Next up is to make ACIPConverter handle EWTS or ACIP TStrings.
2004-01-17 19:28:54 +00:00
dchandler
c1aa81e943
RFE 860190: ACIP->Unicode now gives a warning when it outputs something that can't be represented in TMW.
2003-12-16 07:45:40 +00:00
dchandler
848349fd3a
More tests.
2003-12-15 08:16:06 +00:00
dchandler
e7a9e7968f
ACIP->Unicode now uses two characters for consonants instead of one. This matches the dislike for characters like U+0F77 etc.
ACIP->Tibetan was not giving an error for BCWA because it parsed like BCVA. Fixed.
2003-12-15 07:32:14 +00:00
dchandler
e9f7b2dfed
If you want curly brackets around folio markers, you'll have to set
the system property
thdl.acip.to.x.output.curly.brackets.around.folio.markers to true.
2003-12-14 08:47:03 +00:00
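A minimal sketch of how a boolean system property like this one can be consulted, assuming it is read with the standard `Boolean.getBoolean` call. The class name, the method name, and the `@001A` marker are hypothetical and for illustration only; how the converter itself reads the property is not shown in this log.

```java
// Hypothetical illustration; not the converter's actual code.
final class FolioMarkerOptionSketch {
    // Property name taken from the commit message above.
    static final String PROP =
        "thdl.acip.to.x.output.curly.brackets.around.folio.markers";

    // Wraps the marker in curly brackets only when the property is "true".
    // Boolean.getBoolean returns false when the property is unset.
    static String formatFolioMarker(String marker) {
        return Boolean.getBoolean(PROP) ? "{" + marker + "}" : marker;
    }

    public static void main(String[] args) {
        // "@001A" is only an illustrative marker string.
        System.out.println(formatFolioMarker("@001A"));
    }
}
```

The property would be set on the command line with the JVM's usual `-Dname=value` syntax, e.g. `-Dthdl.acip.to.x.output.curly.brackets.around.folio.markers=true`.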
dchandler
8664571577
Warnings were not being detected correctly. Fixed.
ACIP->Unicode uses U+0020, ' ', for whitespace. ACIP->TMW uses the
TMW whitespace for whitespace.
2003-12-14 08:38:10 +00:00
dchandler
01e65176d4
Using less memory and time to figure out if warnings occurred.
2003-12-14 07:41:15 +00:00
dchandler
76c2e969ac
Fixed ACIP->Unicode bug for YYE etc., things with full-formed
subjoined consonants and vowels.
Fixed ACIP->TMW for YYA etc., things with full-formed subjoined
consonants.
2003-12-14 07:36:21 +00:00
dchandler
f625c937ee
ACIP {B} was not being treated like {BA}; instead, an error was resulting. All five prefixes were affected.
2003-12-14 05:54:07 +00:00
dchandler
581643cf59
{DAN,\nLHAG} used to be treated like {DAN, LHAG} but that got broken. Fixed.
Added tests for lexer's handling of ACIP spaces etc.
2003-12-10 06:55:16 +00:00
dchandler
8e673bbc2c
{NGA,} becomes {NGA\u0f0c,} now instead of {NGA\u0f0b,}.
Note: before this change, ACIP->Unicode for {NGA,} was not giving the Unicode that {NGA\u0f0b,} gives.
2003-12-10 06:50:14 +00:00
dchandler
a466bad939
ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.
2003-12-08 07:51:45 +00:00
dchandler
a39c5c12b0
ACIP->TMW now supports EWTS PUA {\uF021}-style escapes. Our extended ACIP is thus TMW-complete and useful for testing.
2003-12-08 07:15:27 +00:00
dchandler
b617f761d5
ACIP->TMW for {^GONG SA } used to fail; fixed.
2003-12-07 20:05:41 +00:00
dchandler
c43e9a446b
Revamped some ACIP->Tibetan error messages.
2003-12-06 20:19:40 +00:00
dchandler
c9c771d1ee
ACIP {&}, as in {KO&HAm,}, is supported.
2003-11-30 02:18:59 +00:00
dchandler
ac412c994b
Now {Pm} is treated like {PAm}; {Pm:} is like {PAm:}; {P:} is like {PA:}.
2003-11-30 02:06:48 +00:00
dchandler
dfaae4be93
ACIP->TMW and ACIP->Unicode now allow for Unicode escapes like K\u0F84. This means that the lack of support for ACIP's backslash, '\\', is mitigated because you can turn ACIP {K\} into ACIP {K\u0F84}.
Support for U+F021-U+F0FF, the PUA that the latest EWTS uses, is not provided.
2003-11-29 22:56:18 +00:00
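A rough sketch of what expanding such escapes involves at the text level, assuming an escape is a backslash, the letter u, and exactly four hex digits. The class and method names are hypothetical. This only substitutes the escaped code point into the string; it does not model the rest of the ACIP conversion, and unlike the commit above it does not reject the U+F021-U+F0FF range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the converter's actual lexer.
final class UnicodeEscapeSketch {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE =
        Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each escape with the single character it names.
    static String expand(String acip) {
        Matcher m = ESCAPE.matcher(acip);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash yields a literal backslash in the input.
        String expanded = expand("K\\u0F84");
        System.out.println(expanded.length());
    }
}
```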
dchandler
4e6a9c299f
ACIP % {MTHAR%} and o {Ko} and ^ {^GONG SA} are now supported. A % always causes a warning.
2003-11-11 03:43:11 +00:00
dchandler
2cb90bd231
ACIP->Tibetan converters now warn every time {%} is encountered that U+0F14 might've been intended.
The Unicode for ACIP {o} is U+0F37.
2003-11-09 23:15:58 +00:00
dchandler
04816acb74
ACIP->Unicode was broken for KshR, ndRY, ndY, YY, and RY -- those
stacks that use full-form subjoined RA and YA consonants.
ACIP {RVA} was converting to the wrong things.
The TMW for {RVA} was converting to the wrong ACIP.
Checked all the 'DLC' tags in the ttt (ACIP->Tibetan) package.
2003-11-09 01:07:45 +00:00
dchandler
e3f1ed5914
Removed a DOS EOF character (^Z). I haven't a clue how it crept in -- the lexer doesn't let that kind of thing get into tsheg bars.
2003-10-27 13:58:45 +00:00
dchandler
94a43d3f39
Now anything not clearly native Tibetan is colored green when coloring is enabled. G'EEm is "native", though -- the only "vowel" that implies non-nativeness is {:}, as in {KA:}.
2003-10-26 18:56:48 +00:00
dchandler
5c36dd81d3
Fixed bug 830332, "Convert selected ACIP=>Tibetan busted".
2003-10-26 18:25:25 +00:00
dchandler
e74547d743
GA-YOGS now parses like G-YOGS and GAYOGS do.
2003-10-26 18:06:38 +00:00
dchandler
61cf19932e
ACIP {B5} and {7'} were problematic; that's fixed.
2003-10-26 17:47:35 +00:00
dchandler
ad7b20e485
Added yet more metadata.
2003-10-26 16:05:30 +00:00
dchandler
1550fee41a
Removed garbage.
2003-10-26 16:05:07 +00:00
dchandler
fe33d67573
Added more metadata. There are 35 million+ tsheg bars here.
2003-10-26 15:35:08 +00:00
dchandler
050666d735
I'm committing this at 1:55 am EST on Sunday, October 26, 2003. There
is no compelling technical reason, but this way I get to have two
commits that are both before and after each other.
Freaky.
2003-10-26 06:56:12 +00:00
dchandler
31b3020d07
Added a test case that runs almost all the tsheg bars from all
non-reference, publicly available ACIP files (hundreds of megabytes of
them) through the converter. The frequencies of these tsheg bars are
in the file, too.
2003-10-26 06:02:48 +00:00
dchandler
7ba1ad0735
Added a mechanism for end users to have the ACIP/EWTS=>Tibetan converters print all tsheg bars or all unique tsheg bars to standard output. This will be useful for getting a list of all the tsheg bars in ACIP texts, e.g., which can then go into PackageTest.java. A lot of postprocessing would be required to get frequency counts, but you could do it with a perl script, awk, etc.
2003-10-26 02:42:06 +00:00
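The frequency-count postprocessing mentioned above could also be done in Java. This is a sketch under the assumption that the converter prints one tsheg bar per line, which the commit message does not confirm; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical postprocessor; assumes one tsheg bar per input line.
final class TshegBarFrequencySketch {
    // Counts how often each non-blank line occurs, sorted by tsheg bar.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String bar = line.trim();
            if (bar.isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(bar, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Reads the converter's standard output, piped in on standard input.
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```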
dchandler
ef24c608bf
Added a mechanism for end users to customize ACIP/EWTS=>Tibetan conversions by giving a list of substitutions to be performed. E.g., when I invoke Jskad via 'java -Dorg.thdl.tib.text.ttt.VerboseReplacementMap=false -Dorg.thdl.tib.text.ttt.ReplacementMap="KAsh=>K+sh" -jar Jskad.jar', then the ACIP KAsh becomes K+sh automatically.
This mechanism is for Andres (who noticed KAsh=>K+sh in practice) and power users only, and not power users until I document the thing outside of the source code.
2003-10-26 02:17:19 +00:00
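A sketch of how a substitution list in the "KAsh=>K+sh" form might be parsed and applied. Only the single-entry form appears in this log, so the comma separator for multiple entries and the whole-tsheg-bar matching are both assumptions, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the converter's actual ReplacementMap code.
final class ReplacementMapSketch {
    // Parses "FROM=>TO" entries. The comma separator is an assumption.
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return map;
        }
        for (String entry : spec.split(",")) {
            int arrow = entry.indexOf("=>");
            if (arrow < 0) {
                throw new IllegalArgumentException("Missing '=>' in: " + entry);
            }
            map.put(entry.substring(0, arrow), entry.substring(arrow + 2));
        }
        return map;
    }

    // Substitutes only when the whole tsheg bar matches (an assumption).
    static String apply(Map<String, String> map, String tshegBar) {
        String replacement = map.get(tshegBar);
        return replacement != null ? replacement : tshegBar;
    }

    public static void main(String[] args) {
        // Property name taken from the commit message above.
        String spec = System.getProperty("org.thdl.tib.text.ttt.ReplacementMap");
        System.out.println(apply(parse(spec), "KAsh"));
    }
}
```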
dchandler
6bda550157
The ACIP "BNA" was converting to B-NA instead of B+NA, even though NA cannot take a BA prefix. This was because BNA was interpreted as root-suffix. In ACIP, BN is surely B+N unless N takes a B prefix, so root-suffix is out of the question.
Now Jskad has two "Convert selected ACIP to Tibetan" conversions, one with and one without warnings, built in to Jskad proper (not the converter, that is).
2003-10-26 00:32:55 +00:00
dchandler
306cf2817c
Private correspondence with Robert Chilton led me to add and remove a few prefix rules. BLC and BGL are here; BLK, BLG, BLNG, BLJ, BNG, BJ, BNY, BN, and BDZ are gone.
Added a few new tests.
2003-10-25 21:47:34 +00:00
dchandler
7d24ab393f
Code cleanup.
2003-10-21 03:44:02 +00:00
dchandler
c764eee8d0
Added a new warning for DMAR and others similarly affected by prefix rules, where seeing D+MAR, not D-MAR, could have caused an input operator to type in DMAR. This is a "Most" warning, but DMA causes a higher-priority "Some" warning.
2003-10-21 03:36:57 +00:00
dchandler
2f39921381
Added more test cases.
2003-10-21 02:14:45 +00:00
dchandler
2f81a801ef
Added three new kinds of warnings to ACIP->Tibetan conversions.
2003-10-21 02:00:49 +00:00
dchandler
a47af2c165
Bulletproofing -- code cleanup.
2003-10-21 00:31:10 +00:00
dchandler
188b9c322e
Warn about prefix rules only in Most and All modes.
2003-10-21 00:23:55 +00:00
dchandler
1224030898
Speedup.
2003-10-21 00:19:15 +00:00
dchandler
3aa3859354
ACIP->Unicode crash fixed.
5% of the code for support of ACIP->Unicode.rtf is here.
2003-10-19 22:19:16 +00:00
dchandler
5aab4acc93
I've undone the SNYAM'AM == SNYAMA'AM hack. The only occurrence of SNYAM'AM in the ACIP texts I've got is likely a typo, says Robert Chilton.
The code would be cleaner if I could bear to delete my terrible hack. Maybe in a month, when I don't feel so dumb for coding it up in the first place.
The correct solution for such things is to give the ACIP->Tibetan converters a pre-filter mechanism. This would be before the lexer or part of the lexer (maybe you only want to filter tsheg bars), and it would allow the end user to specify things like "s/SNYAM'AM/S+NYAMA'AMA/g".
2003-10-19 20:48:22 +00:00
dchandler
4b1395e0ba
Jskad has a new feature: Convert Selection from ACIP to Tibetan. It uses the ACIP converter to do its work.
Improved some error messages from the ACIP->Tibetan converter.
2003-10-19 20:16:06 +00:00
dchandler
5ce84d4d9a
Tiny code cleanup.
2003-10-19 04:43:34 +00:00
dchandler
0edebd55d7
We were dying in the "can ts+h take a ga prefix?" check for GTZHAN.
2003-10-19 03:47:33 +00:00
dchandler
557ed7ed44
DKY'O etc. weren't being handled properly by ACIP->Tibetan. Now they are.
2003-10-18 17:49:29 +00:00
dchandler
3b55ea509f
Prefix rules have changed. A few are gone; a few new ones are here. I've implemented here a list that Robert Chilton sent me in private correspondence. He doesn't describe it as definitive, but since it affects ACIP->Tibetan conversions, and it's the best I've got, here they are. There's still an optional warning about "Hey, prefix rules matter for this tsheg bar."
I've left in a few rules that I didn't find on RC's list; I've asked him to look into these further.
2003-10-18 05:48:53 +00:00
dchandler
8c99adeb63
TMW->EWTS, TMW->ACIP, and ACIP->Unicode/TMW now support more appendages. Personal correspondence with Robert Chilton led me to support, besides 'am, 'ang, 'o, 'i, and 'u, the following:
'e (used in foreign transliteration)
'ongs
'is
'os
'ur
'us
'ung
2003-10-18 03:04:47 +00:00