Fixed phonetics code to strip post-suffix d (bug 800167 in SourceForge).

  Implemented (somewhat kludgily) option for phonetics scheme to
  replace e with é iff it is the last letter of the last tsheg bar.
  This is required by the new THDL phonetics spec.
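
  A minimal sketch of the final-é behavior described above, not the actual
  implementation. It assumes the phonetics of each tsheg bar in a word have
  already been produced; the function name `apply_final_e_acute` and the
  list-of-syllables input are hypothetical.

```python
def apply_final_e_acute(syllables):
    """Return the syllables with a word-final 'e' rewritten as 'é'.

    `syllables` holds the already-converted phonetics of each tsheg bar
    in one word, e.g. ["dor", "je"].  Only the last letter of the last
    tsheg bar is touched; every other 'e' is left alone.
    """
    if not syllables or not syllables[-1].endswith("e"):
        return list(syllables)
    return list(syllables[:-1]) + [syllables[-1][:-1] + "é"]


print("".join(apply_final_e_acute(["dor", "je"])))  # dorjé
print("".join(apply_final_e_acute(["ge", "lek"])))  # gelek
```

  The two calls mirror the test cases "rdo rje > dorjé" and
  "dge legs > gelek" in this commit.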

  New algorithm, per new THDL phonetics spec, for ba->wa processing.
  The heuristic is that it applies only to the last tsheg bar in
  multi-tsheg-bar words.  (Previously, ba always generated "?ba/wa?",
  which is maybe more correct but less attractive.)  This heuristic
  fails on, e.g., "tsheg bar".  Oh well.
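
  The heuristic can be sketched roughly as follows. This is an illustration
  under assumptions, not the committed code: it works on transliterated
  tsheg bars, the name `soften_final_ba` is hypothetical, and the set of
  affected tsheg bars is assumed to be the six that the old exception list
  covered (ba, bo, ba'i, bo'i, bar, bor).

```python
# Assumption: these are the tsheg bars subject to ba->wa processing,
# taken from the exception entries that this commit removes.
BA_WA_SYLLABLES = {"ba", "bo", "ba'i", "bo'i", "bar", "bor"}


def soften_final_ba(tsheg_bars):
    """Turn a leading 'b' into 'w' in the last tsheg bar of a longer word."""
    if len(tsheg_bars) > 1 and tsheg_bars[-1] in BA_WA_SYLLABLES:
        return list(tsheg_bars[:-1]) + ["w" + tsheg_bars[-1][1:]]
    return list(tsheg_bars)


print(soften_final_ba(["lha", "sa", "ba"]))  # ['lha', 'sa', 'wa']
print(soften_final_ba(["bar", "ba"]))        # ['bar', 'wa']
print(soften_final_ba(["ba"]))               # ['ba']
print(soften_final_ba(["tsheg", "bar"]))     # ['tsheg', 'war'] (the known misfire)
```

  The last call shows the failure mentioned above: "tsheg bar" is treated
  as one two-tsheg-bar word, so its final bar is softened to war.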

  Rationalized format of phonetics file: > is used as separator in exceptions
  as well as rules.  (Previously, : was used in exceptions only.)
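
  For illustration, an exception line in the new format could be read like
  this. The parser below is a hypothetical sketch, not the project's reader;
  it is deliberately more lenient about spaces around > than the file's own
  comment requires, so that a line such as "rta mgrin> tamdrin" still parses.

```python
def parse_exception_line(line):
    """Parse 'transliteration > pronunciation ; comment' into a pair.

    Returns None for blank and comment-only lines.
    """
    body = line.split(";", 1)[0].strip()  # a semicolon precedes a comment
    if not body:
        return None
    if ">" not in body:
        raise ValueError("missing '>' separator: " + repr(line))
    source, target = body.split(">", 1)
    return source.strip(), target.strip()


print(parse_exception_line("rdo rje > dorjé"))     # ('rdo rje', 'dorjé')
print(parse_exception_line("rta mgrin> tamdrin"))  # ('rta mgrin', 'tamdrin')
print(parse_exception_line("; just a comment"))    # None
```

  A line still using the old colon separator raises ValueError here, which
  makes leftover entries easy to spot.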
a1tsal 2004-02-20 09:37:23 +00:00
parent 3910d355f9
commit 2e9ea92a3a
5 changed files with 136 additions and 72 deletions

@@ -96,20 +96,14 @@ x ng
; pronunciation (which may also contain spaces). A semicolon ; pronunciation (which may also contain spaces). A semicolon
; precedes a comment. Blank lines are OK. ; precedes a comment. Blank lines are OK.
ba : wa ; mind you, ba (pronounced ba) means cow. But that's much rarer than wa. rdo rje > dorjé
bo : wo mkha' 'gro > khandro
ba'i : wa'i sku mnye > kumnyé
bo'i : wo'i sprul sku > tulku
bar : ?bar/war? ; bar = "middle"; could be either, so supply both and let user sort it out mtsho rgyal > tsogyèl
bor : ?bor/wor? ; bor = "cast away"; could be either, so supply both and let user sort it out rta mgrin> tamdrin
rdo rje : dorjé dga' ldan > ganden
mkha' 'gro : khandro dge 'dun > gendün
sku mnye : kumnyé a mdo > amdo
sprul sku : tulku srid pa > sipa
mtsho rgyal : tsogyèl pad ma > pèma
rta mgrin: tamdrin
dga' ldan : ganden
dge 'dun : gendün
a mdo : amdo
srid pa : sipa
pad ma : pèma

@@ -40,7 +40,7 @@ klad pa > l
glog > log glog > log
le'u > lé'u le'u > lé'u
pa'ang > pa'ang pa'ang > pa'ang
ba'i > wa'i bar ba'i > barwa'i
rta mgrin > tamdrin rta mgrin > tamdrin
; Other tests, to exercise particular rules in the grammar that aren't covered in the rules above ; Other tests, to exercise particular rules in the grammar that aren't covered in the rules above

@@ -18,16 +18,26 @@
; compactly. For example, it would be difficult to capture the ; compactly. For example, it would be difficult to capture the
; effects of preinitial consonants on tone (as in the scheme used ; effects of preinitial consonants on tone (as in the scheme used
; in Joe Wilson's book, for instance).) Also note that not even the ; in Joe Wilson's book, for instance).) Also note that not even the
; whole of the present scheme is implemented using these rules. In ; whole of the present scheme is implemented using these rules. For
; particular, the deletion of prefix and superscript consonants, ; example, the deletion of prefix and superscript consonants,
; and of wa-zur, are done in program code, not using the rules here. ; and of wa-zur, are done in program code, not using the rules here.
; This makes e come out é only when the last letter in a "word" (*not*
; syllable). Our grammar engine is not nearly powerful enough to do
; this in a clean way.
<?Enable THDL final é kludge?>
; Miscellaneous prefix transformations ; Miscellaneous prefix transformations
g. ; delete this (representing g prefix, used before root y only) g. ; delete this (representing g prefix, used before root y only)
dby y ; must come before db->w, for dbyang dby y ; must come before db->w, for dbyang
dbr r ; must come before db->w, for dbral dbr r ; must come before db->w, for dbral
db w ; must come before by->j db w ; must come before by->j
; Removal of confusing 'h's
th t
ph p
tsh ts
; c and ch are both transcribed ch. To get this we need a kludge ; c and ch are both transcribed ch. To get this we need a kludge
; (involving x), because the rule c -> ch would apply recursively. ; (involving x), because the rule c -> ch would apply recursively.
ch c ch c
@@ -42,10 +52,10 @@ my ny
; Retroflexes ; Retroflexes
kr tr kr tr
khr thr khr tr
gr dr gr dr
pr tr pr tr
phr thr phr tr
br dr br dr
; Other bad behavior from R ; Other bad behavior from R
@@ -55,7 +65,7 @@ sr s
; Uniquely random case ; Uniquely random case
zl d zl d
; Umlaut of a, o, u followed by d, n, l, s ; Umlaut of a, o, u followed by d, n, l, s, and 'i
; Note: this must be done before suffix-stripping. ; Note: this must be done before suffix-stripping.
; Before actually doing the umlaut, we "hide" the n in ng, so that ng doesn't ; Before actually doing the umlaut, we "hide" the n in ng, so that ng doesn't
; induce umlaut. This is gross; if we had a real grammar engine it wouldn't ; induce umlaut. This is gross; if we had a real grammar engine it wouldn't
@@ -65,17 +75,24 @@ ad e
an en an en
al el al el
as e as e
a'i e
od ö od ö
on ön on ön
ol öl ol öl
os ö os ö
o'i ö
ud ü ud ü
un ün un ün
ul ül ul ül
us ü us ü
u'i ü
; restore ng ; restore ng
x ng x ng
; Stripping of 'i from e'i
; (It is stripped from a, o, u by umlaut rules, and from i by vowel-doubling rule.)
e'i e
; Stripping of suffix d, s, and ' from i and e ; Stripping of suffix d, s, and ' from i and e
; Note: this has already been done by the umlaut rules for some cases, ; Note: this has already been done by the umlaut rules for some cases,
; which don't need to be repeated here. ; which don't need to be repeated here.
@@ -112,22 +129,27 @@ ub up
; There is one exception per line. Each exception consists of ; There is one exception per line. Each exception consists of
; the transliteration (which may be several syllables separated ; the transliteration (which may be several syllables separated
; by spaces), followed by a space, a colon, a space, and the ; by spaces), followed by a space, a greater-than, a space, and the
; pronunciation (which may also contain spaces). A semicolon ; pronunciation (which may also contain spaces). A semicolon
; precedes a comment. Blank lines are OK. ; precedes a comment. Blank lines are OK.
ba : wa ; mind you, ba (pronounced ba) means cow. But that's much rarer than wa. mkha' 'gro > khandro
bo : wo sprul sku > tulku
ba'i : wai rta mgrin > tamdrin
bo'i : woi dga' ldan > ganden
bar : ?bar/war? ; bar = "middle"; could be either, so supply both and let user sort it out dge 'dun > gendün
bor : ?bor/wor? ; bor = "cast away"; could be either, so supply both and let user sort it out a mdo > amdo
rdo rje : dorje bka' 'gyur > kangyur
mkha' 'gro : khandro rgyu 'bras > gyundré
sprul sku : tulku ngos 'dzin > ngöndzin
rta mgrin: tamdrin chab mdo > chamdo
dga' ldan : ganden dpal ldan > penden
dge 'dun : gendün dpal 'bar > pembar
a mdo : amdo rig 'dzin > rindzin
blo bzang : lobzang skyabs 'gro > kyamdro
sbra nag zhol : banakzhöl 'bri ru > biru
sbra nag zhol > banakzhöl
rdo rje > dorje
o rgyan > orgyen
lha rje > lharjé
rgyal rtse > gyantsé

@@ -1,72 +1,120 @@
; ;
; These examples come from the draft (8/21/03) THDL Phonetics document ; These examples mostly come from the THDL Phonetics document (Jan 2004 draft)
; ;
lha sa > lhasa dag pa > dakpa
ring po > ringpo
rin chen > rinchen
lab > lap
dum bu > dumbu
dmar po > marpo
ril bu > rilbu
sa skya pa > sakyapa sa skya pa > sakyapa
blo bzang > lobzang blo bzang > lozang
rnying ma pa > nyingmapa rnying ma pa > nyingmapa
rdo rje > dorje rdo rje > dorjé
dge lugs pa > gelukpa dge lugs pa > gelukpa
gzhis ka rtse > zhikatse gzhis ka rtse > zhikatsé
mar me > marme mar me > marmé
dge bshes > geshé
bcu > chu bcu > chu
lce > che gcig pa > chikpa
rin chen bzang po > rinchenzangpo
nag chu > nakchu nag chu > nakchu
bka' rgyud pa > kagyüpa 'phag pa > pakpa
gser thang > sertang
khang tshan > khangtsen
lce > ché
rin chen bzang po > rinchenzangpo
bka' rgyud > kagyü
bsod nams> sönam bsod nams> sönam
thub bstan > thupten yul > yül
dus tshod > dütsö
bon po > bönpo
sde dge > degé
brgyad > gyé
dge rgan > gegen
ral pa can > relpachen
tshe ring > tsering
byes > jé
bstan 'dzin > tendzin
'jam dpal dbyangs > jampelyang 'jam dpal dbyangs > jampelyang
dge legs > gelek dge legs > gelek
kha btags > khatak kha btags > khatak
sngags pa > ngakpa
byang chub > jangchup
thub bstan > tupten
tabs > tap
bka' shag > kashak bka' shag > kashak
sbra nag zhol > banakzhöl sbra nag zhol > banakzhöl
thabs > thap thabs > tap
lha sa ba > lhasawa lha sa ba > lhasawa
jo bo > jowo jo bo > jowo
dpa' bo > pawo dpa' bo > pawo
gsal bar > selwar
; nga'i deb > ngé dep -- can't do this one, it depends on word segmentation
bar ba > barwa
spyan ras gzig > chenrezik spyan ras gzig > chenrezik
phyag > chak
sbyin bdag > jindak sbyin bdag > jindak
smyong > nyong smyong > nyong
dmyal ba > nyelwa
sgrol ma > drölma sgrol ma > drölma
rten 'brel > tendrel rten 'brel > tendrel
'bras spungs > drepung 'bras spungs > drepung
'phrin las > thrinle 'phrin las > trinlé
dbang > wang srung ma > sungma
dbral > rel rdzun smra ba > dzünmawa
dbyar kha > yarkha
zla ba > dawa
klad pa > lepa klad pa > lepa
glog > lok glog > lok
zla ba > dawa
lha sa > lhasa
lho phyogs > lhochok
lhun grub > lhündrup
dbang > wang
dbyar kha > yarkha
dbral > rel
le'u > leu le'u > leu
khyi'u > khyiu
pa'ang > pang pa'ang > pang
ba'i > wai gri'i > dri
'gro ba'i > drowé
rgyal bu'i > gyelbü
rin po che'i > rinpoché
bdag po'i > dakpö
le'u'i > leü
rta mgrin > tamdrin rta mgrin > tamdrin
; Other tests, to exercise particular rules in the grammar that aren't covered in the rules above
g.yon > yön g.yon > yön
phyag > chak phyag > chak
bkra shis > trashi bkra shis > trashi
khros ma > thröma khros ma > tröma
sprul > trül sprul > trül
mri tam ga > mitamga mri tam ga > mitamga
srid pa > sipa srid pa > sipa
pad ma > pema pad ma > pema
pan chen > penchen pan chen > penchen
ral pa can > relpachen thun > tün
thun > thün
dus gsum > düsum dus gsum > düsum
sbed > be sbed > bé
ces > che ces > ché
pa'i > pai btsan dbang > tsenwang
che'i > chei tshong khang > tsongkhang
gri'i > dri rdzong > dzong
po'i > poi stabs > tap
le'u'i > leui thug pa > tukpa
rdzogs > dzok debs > dep
thug pa > thukpa
'debs > dep
sib sib > sipsip sib sib > sipsip
lobs pa > loppa lobs pa > loppa
grub > drup grub > drup
kla col > lachöl kla col > lachöl
spyan snga ba > chenngawa
sems dpa'i > sempé
bon po'i > bönpö
rdzogs > dzok
; Other random tests
phreng > treng
; Test of second-suffix d removal. Made-up word because I don't know real ones.
rand > ren
; Test that we don't spazz out on single-letter words.
a > a
ai > ai

Binary file not shown.