2e9ea92a3a
SourceForge). Implemented (somewhat kludgily) option for phonetics scheme to replace e with é iff it is the last letter of the last tsheg bar. This is required by the new THDL phonetics spec. New algorithm, per new THDL phonetics spec, for ba->wa processing. The heuristic is that it applies only to the last tsheg bar in multi-tsheg-bar words. (Previously, ba always generated "?ba/wa?", which is maybe more correct but less attractive.) This heuristic fails on, e.g., "tsheg bar". Oh well. Rationalized format of phonetics file: > is used as separator in exceptions as well as rules. (Previously, : was used in exceptions only.)
155 lines
No EOL
3.5 KiB
Text
; THDL Phonetics rules for WylieWord.

; File format: there are two sections, for rules and exceptions.
;
; There is one rule per line. Typically a rule has two parts,
; separated by a space. The first part is a sequence of letters
; that, if it appears in a Wylie transliteration, is to be replaced
; by the second part in the phonetic transcription. The second
; part can be missing, in which case the first part is simply
; deleted from transcriptions when found.
;
; A semicolon precedes a comment. Blank lines are OK.
;
; The rules are applied in the order they appear in this file.
; Each rule is applied as many times as possible, but we never
; go back to a previous rule. (This simple rewrite-rule grammar is
; not sufficient to implement all phonetic schemes, at least not
; compactly. For example, it would be difficult to capture the
; effects of preinitial consonants on tone (as in the scheme used
; in Joe Wilson's book, for instance).) Also note that not even the
; whole of the present scheme is implemented using these rules. For
; example, the deletion of prefix and superscript consonants,
; and of wa-zur, is done in program code, not using the rules here.

; This makes e come out é only when it is the last letter in a "word"
; (*not* a syllable). Our grammar engine is not nearly powerful enough
; to do this in a clean way.
<?Enable THDL final é kludge?>

; Miscellaneous prefix transformations
g. ; delete this (representing g prefix, used before root y only)
dby y ; must come before db->w, for dbyang
dbr r ; must come before db->w, for dbral
db w ; must come before by->j

; Removal of confusing 'h's
th t
ph p
tsh ts

; c and ch are both transcribed ch. To get this we need a kludge
; (involving x), because the rule c -> ch would apply recursively.
ch c
c x
x ch
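; Worked trace of the kludge (illustrative only; it covers just the three
; rules above, ignores the processing done in program code, and assumes
; x does not otherwise occur in the input):
;   cha: ch->c gives ca, c->x gives xa, x->ch gives cha
;   ca:  ch->c finds no match, c->x gives xa, x->ch gives cha
; A direct rule c->ch would keep matching its own output, since each rule
; is applied as many times as possible: ca, cha, chha, ...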

; Bad behavior from Y
py ch
phy ch
by j
my ny

; Retroflexes
kr tr
khr tr
gr dr
pr tr
phr tr
br dr

; Other bad behavior from R
mr m
sr s

; Uniquely random case
zl d

; Umlaut of a, o, u followed by d, n, l, s, and 'i
; Note: this must be done before suffix-stripping.
; Before actually doing the umlaut, we "hide" the n in ng, so that ng doesn't
; induce umlaut. This is gross; if we had a real grammar engine it wouldn't
; be necessary.
ng x
ad e
an en
al el
as e
a'i e
od ö
on ön
ol öl
os ö
o'i ö
ud ü
un ün
ul ül
us ü
u'i ü
; restore ng
x ng
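; Worked trace of the hiding trick (illustrative strings; only the rules
; in this block are considered, not the processing done in program code):
;   lung: ng->x gives lux, no umlaut rule matches, x->ng gives lung
;   lun:  there is no ng to hide, un->ün gives lün
; Without the hiding step, un->ün would also match inside lung and
; produce lüng.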

; Stripping of 'i from e'i
; (It is stripped from a, o, u by umlaut rules, and from i by vowel-doubling rule.)
e'i e

; Stripping of suffix d, s, and ' from i and e
; Note: this has already been done by the umlaut rules for some cases,
; which don't need to be repeated here.
id i
ed e
is i
es e
a' a
e' e
i' i
o' o
u' u

; Remove doubled vowels (e.g. pa'ang -> pang, not paang)
aa a
ee e
ii i
oo o
uu u
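; Worked trace of the pa'ang example (illustrative; rules in this file only):
;   pa'ang: the suffix-stripping rule a'->a gives paang, then aa->a gives pang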

; Devoicing of suffix g, b
ag ak
eg ek
ig ik
og ok
ug uk
ab ap
eb ep
ib ip
ob op
ub up

<?Exceptions?>

; There is one exception per line. Each exception consists of
; the transliteration (which may be several syllables separated
; by spaces), followed by a space, a greater-than, a space, and the
; pronunciation (which may also contain spaces). A semicolon
; precedes a comment. Blank lines are OK.

mkha' 'gro > khandro
sprul sku > tulku
rta mgrin > tamdrin
dga' ldan > ganden
dge 'dun > gendün
a mdo > amdo
bka' 'gyur > kangyur
rgyu 'bras > gyundré
ngos 'dzin > ngöndzin
chab mdo > chamdo
dpal ldan > penden
dpal 'bar > pembar
rig 'dzin > rindzin
skyabs 'gro > kyamdro
'bri ru > biru
sbra nag zhol > banakzhöl
rdo rje > dorje
o rgyan > orgyen
lha rje > lharjé
rgyal rtse > gyantsé