WylieWord/THDL phonetics scheme.txt

; THDL Phonetics rules for WylieWord.

; File format: there are two sections, for rules and exceptions.
;
; There is one rule per line.  Typically a rule has two parts,
; separated by a space.  The first part is a sequence of letters
; that, if it appears in a Wylie transliteration, is to be replaced
; by the second part in the phonetic transcription.  The second
; part can be missing, in which case the first part is simply 
; deleted from transcriptions when found.  
;
; A semicolon precedes a comment.  Blank lines are OK.
;
; The rules are applied in the order they appear in this file.
; Each rule is applied as many times as possible, but we never
; go back to a previous rule.  (This simple rewrite-rule grammar is
; not sufficient to implement all phonetic schemes, at least not
; compactly.  For example, it would be difficult to capture the
; effects of preinitial consonants on tone (as in the scheme used
; in Joe Wilson's book, for instance).)  Also note that not even the
; whole of the present scheme is implemented using these rules.  For
; example, the deletion of prefix and superscript consonants,
; and of wa-zur, is done in program code, not by the rules here.
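;
; A worked trace, offered as an illustration only (the final output also
; depends on the program code mentioned above): the Wylie syllable thub
; first matches the rule "th t", giving tub, and later matches the rule
; "ub up", giving tup.  No other rule in this file matches along the way.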

; This makes e come out é only when it is the last letter in a "word" (*not*
; syllable).  Our grammar engine is not nearly powerful enough to do
; this in a clean way.
<?Enable THDL final é kludge?>

; Nasalization before a-chung is another thing that is handled kludgily.
<?Enable nasalization before a-chung?>

; Miscellaneous prefix transformations
g.      ; delete this (representing g prefix, used before root y only)
dby y   ; must come before db->w, for dbyang
dbr r   ; must come before db->w, for dbral
db w    ; must come before by->j

; Removal of confusing 'h's
th t
ph p
tsh ts

; c and ch are both transcribed ch.  To get this we need a kludge
; (involving x), because the rule c -> ch would apply recursively.
ch c
c x
x ch
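; Illustrative trace of the three rules above: ci -> ci -> xi -> chi, and
; chu -> cu -> xu -> chu.  A direct rule "c ch" would keep matching its
; own output, since ch still contains c.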

; Bad behavior from Y
py ch
phy ch
by j
my ny

; Retroflexes
kr tr
khr tr
gr dr
pr tr
phr tr
br dr

; Other bad behavior from R
mr m
nr n
sr s

; Uniquely random case
zl d

; Umlaut of a, o, u followed by d, n, l, s, and 'i
; Note: this must be done before suffix-stripping.
; Before actually doing the umlaut, we "hide" the n in ng, so that ng doesn't
;   induce umlaut.  This is gross; if we had a real grammar engine it wouldn't
;   be necessary.
ng x
ad e
an en
al el
as e
a'i e
od ö
on ön
ol öl
os ö
o'i ö
ud ü
un ün
ul ül
us ü
u'i ü
; restore ng
x ng
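; Illustrative trace of the hiding step: lang -> lax -> (no umlaut rule
; matches) -> lang, whereas lan -> len.  Without the rule "ng x", the rule
; "an en" would turn lang into leng.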

; Stripping of 'i from e'i 
; (It is stripped from a, o, u by the umlaut rules, and from i by the
; '-stripping and doubled-vowel rules below.)
e'i e

; Stripping of suffix d and s from i and e, and of ' from all vowels
; Note: this has already been done by the umlaut rules for some cases, 
;       which don't need to be repeated here.
id i
ed e
is i
es e
a' a
e' e
i' i
o' o
u' u

; Remove doubled vowels (e.g. pa'ang -> pang, not paang)
aa a
ee e
ii i
oo o
uu u

; Devoicing of suffix g, b
ag ak
eg ek
ig ik
og ok
ug uk
ab ap
eb ep
ib ip
ob op
ub up

<?Exceptions?>

; There is one exception per line.  Each exception consists of
; the transliteration (which may be several syllables separated
; by spaces), followed by a space, a greater-than, a space, and the
; pronunciation (which may also contain spaces).  A semicolon 
; precedes a comment.  Blank lines are OK.

; Exceptions to nasalization rule:
skyabs 'gro > kyamdro
rten 'brel > tendrel
lam 'bras > lamdré

; Other exceptions:
sprul sku > tulku
rta mgrin > tamdrin
a mdo > amdo
chab mdo > chamdo
dpal ldan > penden
'bri ru > biru
sbra nag zhol > banakzhöl
rdo rje > dorjé
o rgyan > orgyen
lha rje > lharjé
rgyal rtse > gyantsé