Fixed phonetics code to strip post-suffix d (bug 800167 in SourceForge).

Implemented (somewhat kludgily) an option for a phonetics scheme to replace e with é iff it is the last letter of the last tsheg bar. This is required by the new THDL phonetics spec.

New algorithm for ba->wa processing, per the new THDL phonetics spec: the heuristic applies only to the last tsheg bar in multi-tsheg-bar words. (Previously, ba always generated "?ba/wa?", which is arguably more correct but less attractive.) This heuristic fails on, e.g., "tsheg bar". Oh well.

Rationalized the format of the phonetics file: > is now the separator in exceptions as well as in rules. (Previously, : was used in exceptions only.)
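The three behavior changes described in this message can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program's actual code. It assumes a word arrives as a list of Wylie tsheg bars. It assumes the ba->wa heuristic targets a last tsheg bar beginning with ba or bo, the cases the old exception lists spelled out (ba, bo, ba'i, bo'i, bar, bor). It also assumes a post-suffix d only ever follows n, r, or l. All function and constant names are hypothetical.

```python
# Hypothetical sketch of the heuristics in this commit; not the program's code.
# A "word" is a list of Wylie tsheg bars, e.g. ["lha", "sa", "ba"].

POST_SUFFIX_D_HOSTS = "nrl"  # assumption: a post-suffix d follows only n, r, l


def strip_post_suffix_d(tsheg_bar: str) -> str:
    """Drop a post-suffix d, as in the made-up test word rand > ren."""
    if len(tsheg_bar) >= 2 and tsheg_bar[-1] == "d" and tsheg_bar[-2] in POST_SUFFIX_D_HOSTS:
        return tsheg_bar[:-1]
    return tsheg_bar


def ba_to_wa(tsheg_bars: list[str]) -> list[str]:
    """Turn b into w only in the last tsheg bar of a multi-tsheg-bar word."""
    if len(tsheg_bars) > 1 and tsheg_bars[-1].startswith(("ba", "bo")):
        return tsheg_bars[:-1] + ["w" + tsheg_bars[-1][1:]]
    return list(tsheg_bars)


def final_e_acute(word: str) -> str:
    """Write e as é only when it is the last letter of the joined word."""
    return word[:-1] + "é" if word.endswith("e") else word


if __name__ == "__main__":
    print(strip_post_suffix_d("rand"))     # ran
    print(ba_to_wa(["lha", "sa", "ba"]))   # ['lha', 'sa', 'wa']
    print(ba_to_wa(["tsheg", "bar"]))      # ['tsheg', 'war'], the known failure
    print(final_e_acute("dorje"), final_e_acute("gegen"))  # dorjé gegen
```

Because the ba->wa test only looks at position, the "tsheg bar" failure mentioned above falls straight out of it, while single-syllable ba and first-position syllables such as bon in bon po are left alone.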
parent 3910d355f9
commit 2e9ea92a3a
5 changed files with 136 additions and 72 deletions
@@ -96,20 +96,14 @@ x ng
 ; pronunciation (which may also contain spaces). A semicolon
 ; precedes a comment. Blank lines are OK.
 
-ba : wa ; mind you, ba (pronounced ba) means cow. But that's much rarer than wa.
-bo : wo
-ba'i : wa'i
-bo'i : wo'i
-bar : ?bar/war? ; bar = "middle"; could be either, so supply both and let user sort it out
-bor : ?bor/wor? ; bor = "cast away"; could be either, so supply both and let user sort it out
-rdo rje : dorjé
-mkha' 'gro : khandro
-sku mnye : kumnyé
-sprul sku : tulku
-mtsho rgyal : tsogyèl
-rta mgrin: tamdrin
-dga' ldan : ganden
-dge 'dun : gendün
-a mdo : amdo
-srid pa : sipa
-pad ma : pèma
+rdo rje > dorjé
+mkha' 'gro > khandro
+sku mnye > kumnyé
+sprul sku > tulku
+mtsho rgyal > tsogyèl
+rta mgrin> tamdrin
+dga' ldan > ganden
+dge 'dun > gendün
+a mdo > amdo
+srid pa > sipa
+pad ma > pèma
@@ -40,7 +40,7 @@ klad pa > l
 glog > log
 le'u > lé'u
 pa'ang > pa'ang
-ba'i > wa'i
+bar ba'i > barwa'i
 rta mgrin > tamdrin
 
 ; Other tests, to exercise particular rules in the grammar that aren't covered in the rules above
@@ -18,16 +18,26 @@
 ; compactly. For example, it would be difficult to capture the
 ; effects of preinitial consonants on tone (as in the scheme used
 ; in Joe Wilson's book, for instance).) Also note that not even the
-; whole of the present scheme is implemented using these rules. In
-; particular, the deletion of prefix and superscript consonants,
+; whole of the present scheme is implemented using these rules. For
+; example, the deletion of prefix and superscript consonants,
 ; and of wa-zur, are done in program code, not using the rules here.
 
+; This makes e come out é only when the last letter in a "word" (*not*
+; syllable). Our grammar engine is not nearly powerful enough to do
+; this in a clean way.
+<?Enable THDL final é kludge?>
+
 ; Miscellaneous prefix transformations
 g. ; delete this (representing g prefix, used before root y only)
 dby y ; must come before db->w, for dbyang
 dbr r ; must come before db->w, for dbral
 db w ; must come before by->j
 
+; Removal of confusing 'h's
+th t
+ph p
+tsh ts
+
 ; c and ch are both transcribed ch. To get this we need a kludge
 ; (involving x), because the rule c -> ch would apply recursively.
 ch c
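The "must come before" comments in the prefix section only make sense if rules fire in file order, each seeing the output of the rules above it. The following is a minimal Python sketch of that ordering. It assumes each rule is a plain substring replacement applied once, which is a simplification: the comment about c -> ch applying recursively suggests the real engine may re-scan its output. The by -> j rule is assumed from the comment on the db rule. `apply_rules` and `PREFIX_RULES` are illustrative names.

```python
# Minimal sketch (assumption): rules fire once each, in file order, as plain
# substring replacements over a single tsheg bar.
PREFIX_RULES = [
    ("g.", ""),    # g. ; delete this (g prefix, used before root y only)
    ("dby", "y"),  # must come before db->w, for dbyang
    ("dbr", "r"),  # must come before db->w, for dbral
    ("db", "w"),   # must come before by->j
    ("by", "j"),   # assumed by->j rule, known here only from the comment above
]


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) rule once, in list order."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


if __name__ == "__main__":
    for word in ["dbyang", "dbral", "dbang", "byang", "g.yon"]:
        print(word, "->", apply_rules(word, PREFIX_RULES))
    # Firing db->w before dby->y loses the y: dbyang -> wyang.
    print(apply_rules("dbyang", [("db", "w"), ("dby", "y")]))
```

In file order, dbyang, dbral, and dbang come out as yang, ral, and wang (dbang > wang is one of the tests in this commit). Swapping the first two prefix rules turns dbyang into wyang, which is exactly what the comments guard against.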
@@ -42,10 +52,10 @@ my ny
 
 ; Retroflexes
 kr tr
-khr thr
+khr tr
 gr dr
 pr tr
-phr thr
+phr tr
 br dr
 
 ; Other bad behavior from R
@@ -55,7 +65,7 @@ sr s
 ; Uniquely random case
 zl d
 
-; Umlaut of a, o, u followed by d, n, l, s
+; Umlaut of a, o, u followed by d, n, l, s, and 'i
 ; Note: this must be done before suffix-stripping.
 ; Before actually doing the umlaut, we "hide" the n in ng, so that ng doesn't
 ; induce umlaut. This is gross; if we had a real grammar engine it wouldn't
@@ -65,17 +75,24 @@ ad e
 an en
 al el
 as e
+a'i e
 od ö
 on ön
 ol öl
 os ö
+o'i ö
 ud ü
 un ün
 ul ül
 us ü
+u'i ü
 ; restore ng
 x ng
 
+; Stripping of 'i from e'i
+; (It is stripped from a, o, u by umlaut rules, and from i by vowel-doubling rule.)
+e'i e
+
 ; Stripping of suffix d, s, and ' from i and e
 ; Note: this has already been done by the umlaut rules for some cases,
 ; which don't need to be repeated here.
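The umlaut rules depend on two ordering tricks. First, ng is hidden (as x) so that it does not trigger an -> en. Second, umlaut runs before suffix-stripping, so the d, n, l, s, and 'i that trigger it are still present. The following is a minimal Python sketch, assuming rules fire once each, in file order, as plain substring replacements. The hide rule (ng -> x) is assumed, since only its restore rule, x ng, appears in these hunks. The ad -> e rule comes from the hunk header context. `UMLAUT_RULES` and `apply_rules` are illustrative names.

```python
# Minimal sketch (assumption): rules fire once each, in file order, as plain
# substring replacements over a single tsheg bar.
UMLAUT_RULES = [
    ("ng", "x"),  # assumed "hide" rule; only the restore rule (x ng) is shown
    ("ad", "e"), ("an", "en"), ("al", "el"), ("as", "e"), ("a'i", "e"),
    ("od", "ö"), ("on", "ön"), ("ol", "öl"), ("os", "ö"), ("o'i", "ö"),
    ("ud", "ü"), ("un", "ün"), ("ul", "ül"), ("us", "ü"), ("u'i", "ü"),
    ("x", "ng"),   # restore ng
    ("e'i", "e"),  # strip 'i from e'i (a, o, u lose it through umlaut above)
]


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) rule once, in list order."""
    for pattern, replacement in rules:
        text = text.replace(pattern, replacement)
    return text


if __name__ == "__main__":
    for syllable in ["bon", "yul", "pa'i", "che'i", "ngos", "bang"]:
        print(syllable, "->", apply_rules(syllable, UMLAUT_RULES))
    # Without the hide/restore pair, the ng of bang triggers an -> en.
    print(apply_rules("bang", UMLAUT_RULES[1:-2]))
```

With the hide and restore rules in place, bang survives unchanged and ngos becomes ngö. Dropping them turns bang into beng, which is the failure the "hide the n in ng" comment describes.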
@@ -112,22 +129,27 @@ ub up
 
 ; There is one exception per line. Each exception consists of
 ; the transliteration (which may be several syllables separated
-; by spaces), followed by a space, a colon, a space, and the
+; by spaces), followed by a space, a greater-than, a space, and the
 ; pronunciation (which may also contain spaces). A semicolon
 ; precedes a comment. Blank lines are OK.
 
-ba : wa ; mind you, ba (pronounced ba) means cow. But that's much rarer than wa.
-bo : wo
-ba'i : wai
-bo'i : woi
-bar : ?bar/war? ; bar = "middle"; could be either, so supply both and let user sort it out
-bor : ?bor/wor? ; bor = "cast away"; could be either, so supply both and let user sort it out
-rdo rje : dorje
-mkha' 'gro : khandro
-sprul sku : tulku
-rta mgrin: tamdrin
-dga' ldan : ganden
-dge 'dun : gendün
-a mdo : amdo
-blo bzang : lobzang
-sbra nag zhol : banakzhöl
+mkha' 'gro > khandro
+sprul sku > tulku
+rta mgrin > tamdrin
+dga' ldan > ganden
+dge 'dun > gendün
+a mdo > amdo
+bka' 'gyur > kangyur
+rgyu 'bras > gyundré
+ngos 'dzin > ngöndzin
+chab mdo > chamdo
+dpal ldan > penden
+dpal 'bar > pembar
+rig 'dzin > rindzin
+skyabs 'gro > kyamdro
+'bri ru > biru
+sbra nag zhol > banakzhöl
+rdo rje > dorje
+o rgyan > orgyen
+lha rje > lharjé
+rgyal rtse > gyantsé
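With this commit, exceptions use the same > separator as rules, as the updated header comment describes. The following is a minimal Python sketch of a reader for that format. It assumes one entry per line and treats everything after the first semicolon as a comment. `parse_exceptions` is a hypothetical name, not the program's actual loader. The sketch strips whitespace around > so that an entry written without the space, like rta mgrin> tamdrin earlier in this diff, still parses. Lines still in the old : format are rejected.

```python
def parse_exceptions(text: str) -> dict[str, str]:
    """Parse 'transliteration > pronunciation' lines (hypothetical helper)."""
    table: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # a semicolon precedes a comment
        if not line:
            continue  # blank lines and comment-only lines are OK
        translit, sep, pron = line.partition(">")
        if not sep:
            raise ValueError(f"line {lineno}: no '>' separator in {raw!r}")
        # Strip around '>' so "rta mgrin> tamdrin" is accepted too.
        table[translit.strip()] = pron.strip()
    return table


SAMPLE = """\
; precedes a comment. Blank lines are OK.

mkha' 'gro > khandro
rta mgrin> tamdrin
sbra nag zhol > banakzhöl
"""

if __name__ == "__main__":
    print(parse_exceptions(SAMPLE))
```

Multi-syllable transliterations keep their internal spaces as the dictionary key, so a lookup would need to join a word's tsheg bars with single spaces before consulting the table.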
@@ -1,72 +1,120 @@
 ;
-; These examples come from the draft (8/21/03) THDL Phonetics document
+; These examples mostly come from the THDL Phonetics document (Jan 2004 draft)
 ;
-lha sa > lhasa
+dag pa > dakpa
+ring po > ringpo
+rin chen > rinchen
+lab > lap
+dum bu > dumbu
+dmar po > marpo
+ril bu > rilbu
 sa skya pa > sakyapa
-blo bzang > lobzang
+blo bzang > lozang
 rnying ma pa > nyingmapa
-rdo rje > dorje
+rdo rje > dorjé
 dge lugs pa > gelukpa
-gzhis ka rtse > zhikatse
-mar me > marme
+gzhis ka rtse > zhikatsé
+mar me > marmé
+dge bshes > geshé
 bcu > chu
-lce > che
-rin chen bzang po > rinchenzangpo
+gcig pa > chikpa
 nag chu > nakchu
-bka' rgyud pa > kagyüpa
+'phag pa > pakpa
+gser thang > sertang
+khang tshan > khangtsen
+lce > ché
+rin chen bzang po > rinchenzangpo
+bka' rgyud > kagyü
 bsod nams> sönam
-thub bstan > thupten
+yul > yül
+dus tshod > dütsö
+bon po > bönpo
+sde dge > degé
+brgyad > gyé
+dge rgan > gegen
+ral pa can > relpachen
+tshe ring > tsering
+byes > jé
+bstan 'dzin > tendzin
 'jam dpal dbyangs > jampelyang
 dge legs > gelek
 kha btags > khatak
+sngags pa > ngakpa
+byang chub > jangchup
+thub bstan > tupten
+tabs > tap
 bka' shag > kashak
 sbra nag zhol > banakzhöl
-thabs > thap
+thabs > tap
 lha sa ba > lhasawa
 jo bo > jowo
 dpa' bo > pawo
+gsal bar > selwar
+; nga'i deb > ngé dep -- can't do this one, it depends on word segmentation
+bar ba > barwa
 spyan ras gzig > chenrezik
+phyag > chak
 sbyin bdag > jindak
 smyong > nyong
+dmyal ba > nyelwa
 sgrol ma > drölma
 rten 'brel > tendrel
 'bras spungs > drepung
-'phrin las > thrinle
-dbang > wang
-dbral > rel
-dbyar kha > yarkha
-zla ba > dawa
+'phrin las > trinlé
+srung ma > sungma
+rdzun smra ba > dzünmawa
 klad pa > lepa
 glog > lok
+zla ba > dawa
+lha sa > lhasa
+lho phyogs > lhochok
+lhun grub > lhündrup
+dbang > wang
+dbyar kha > yarkha
+dbral > rel
 le'u > leu
+khyi'u > khyiu
 pa'ang > pang
-ba'i > wai
+gri'i > dri
+'gro ba'i > drowé
+rgyal bu'i > gyelbü
+rin po che'i > rinpoché
+bdag po'i > dakpö
+le'u'i > leü
 rta mgrin > tamdrin
-
-; Other tests, to exercise particular rules in the grammar that aren't covered in the rules above
 g.yon > yön
 phyag > chak
 bkra shis > trashi
-khros ma > thröma
+khros ma > tröma
 sprul > trül
 mri tam ga > mitamga
 srid pa > sipa
 pad ma > pema
 pan chen > penchen
-ral pa can > relpachen
-thun > thün
+thun > tün
 dus gsum > düsum
-sbed > be
-ces > che
-pa'i > pai
-che'i > chei
-gri'i > dri
-po'i > poi
-le'u'i > leui
-rdzogs > dzok
-thug pa > thukpa
-'debs > dep
+sbed > bé
+ces > ché
+btsan dbang > tsenwang
+tshong khang > tsongkhang
+rdzong > dzong
+stabs > tap
+thug pa > tukpa
+debs > dep
 sib sib > sipsip
 lobs pa > loppa
 grub > drup
 kla col > lachöl
+spyan snga ba > chenngawa
+sems dpa'i > sempé
+bon po'i > bönpö
+rdzogs > dzok
+
+; Other random tests
+phreng > treng
+
+; Test of second-suffix d removal. Made-up word because I don't know real ones.
+rand > ren
+; Test that we don't spazz out on single-letter words.
+a > a
+ai > ai
Binary file not shown.