// Test cases for Wylie -> [other representation] -> Wylie round-trip conversion.
//
// Each line in the file is a comment, blank, or one test case.
//
// A comment line starts with //.
//
// Each test case is understood as EWTS; it should convert back into itself.
// The following test cases are from DLC's DuffPaneTest.
tsha
tsa
dza
sha
nga
nag
nga /
bkra shis bde legs/
sgom pa'am
sgom pe'am
le'u'i'o
la'u'i'o/la'am/pa'ang/pe'ang
bras
dwa
gwa
gyug
g.yag
gyag
g.yas
gyas
'gas
gangs
gnags
'byung
'byungs
blags
mnags
gdams
'gams
// Random things known to be hard
// Difficult to parse deterministically
brgyud
// Featuritis
brtswand
// two-letter suffix (not to be treated as n suffix + g second-suffix)
dbang
// plus second suffix
dbangs
// A common word that is not grammatical, according to some
// sources, and that is also hard because br- looks like
// b should be the root
brlabs
// "'is" is the form of kyis used after a bare vowel or a chung
re'u'is
he'is
// Bare vowels
a
i
'o
'a
// Bare polyvowels
a'o
i'u'i
'o'i
// things that at one time or another tickled bugs in WW code
gyurd
ga*ra
ha_sa
gangs
mngas
'gram
// The exceptions to the three-letter root rule
rags lags nags bags bangs gangs rangs langs nangs sangs babs rabs rams nams
// This is my old root-letter-capitalization and pronunciation test suite.
// It should also be good for tickling conversion bugs.
grwa
bsgrubs
bzlog
brlabs
sprul
shabs
shes
ldan
mgon
rgyud
ten
rigs
chos
dbang
dbying
dbyang
spyang
phyug
byang
sbyin
myur
po'i
mri
srung
mkha'
'gro
rta'i
khwa
rdzogs
rdo
rje
zla
bar
ngan
rnga
g.yo
gyo
dpa'
'dag
klu
khyung
// These are from DLC's PackageTest
brtan
blta
blag
brag
bsabs
zungs
brtib
spyoms
sku'i
bskyangs
ag
bda'
dbang
dga'
dgra
dmar
gda'
// Sanskrit
// Tests from DLC
ai
heM hiM h-iM heM haiM hoM hauM hUM
pad+me
D+h+D+ha
gaD+h+D+ha
autapa
parItshatsawa
ke:
ka:
n+yadA
ak+Sha
Asa
AT+Ta
b+da
d+ba
d+ga
d+g+ra
d+gara
// Other Sanskrit
rak+Shasa
DAkinI
haM
pad+ma
samuts+tsaya
gang+gA
sid+d+hi
shrI
sat+t+wa
sarma
// bug ticklers
ragyada
'agarma
gayaradasa
// Taken from THDL Sechen text collection
dashadigAMsarbatimirabi ni saranan+nAmashrIguh+yagar+b+hatatwaMsh+ts+yayastanataras+yaTa'ikAbiharatisma
shrI tan+t+radzamAyadzalasamudAyArtas+yadwArAt+nish+tsitasub+haShitaMguh+yapatimukhAgamanAmabiharatisma
guh+ya pa ti sa shad+h+ya laM kA ra nA ma shrA iguh+ya gar+b+ha ta twa pa in ish+tsa ya tan+t+ra s+ya pr-i t+t-i bi ha rti sma
// Bare vowels:
A I U -i -I
// leading weird vowels
aiga
-Iba
// two or more vowels in a row
A.I.ai
a.a
ai.-i
-I.e
// M,H on bare vowel
aiM
eM
U^M
au^M
aH
aiH
// M and ^M in the middle of a word
gaMb+ha
hU^MhU^MhU^M
aMu^MraH
raMH
// Semivowel as subscript.
hr-I kl-iba
// Difficult because ny+d is not legal
ny+dza
raH
hai
hau
// One instance of every stack in tibwn.ini, with sequential vowels attached
k+Sha
k+ke
k+khi
k+ngo
k+tsu
k+tA
k+t+yI
k+t+rU
k+t+r+yai
k+t+wau
k+th-i
k+th+y-I
k+Na
k+ne
k+n+yi
k+pho
k+mu
k+m+yA
k+r+yI
k+w+yU
k+shai
k+sau
k+s+n-i
k+s+m-I
k+s+ya
k+s+we
kh+khi
kh+no
kh+lu
g+gA
g+g+hI
g+nyU
g+dai
g+d+hau
g+d+h+y-i
g+d+h+w-I
g+na
g+n+ye
g+pi
g+b+ho
g+b+h+yu
g+mA
g+m+yI
g+r+yU
g+hai
g+h+g+hau
g+h+ny-i
g+h+n-I
g+h+n+ya
g+h+me
g+h+li
g+h+yo
g+h+ru
g+h+wA
ng+kI
ng+k+tU
ng+k+t+yai
ng+k+yau
ng+kh-i
ng+kh+y-I
ng+ga
ng+g+re
ng+g+yi
ng+g+ho
ng+g+h+yu
ng+g+h+rA
ng+ngI
ng+tU
ng+nai
ng+mau
ng+y-i
ng+l-I
ng+sha
ng+he
ng+k+Shi
ng+k+Sh+wo
ng+k+Sh+yu
ts+tsA
ts+tshI
ts+tsh+wU
ts+tsh+rai
ts+nyau
ts+n+y-i
ts+m-I
ts+ya
ts+re
ts+li
ts+h+yo
tsh+thu
tsh+tshA
tsh+yI
tsh+rU
tsh+lai
dz+dzau
dz+dz+ny-i
dz+dz+w-I
dz+dz+ha
dz+h+dz+he
dz+nyi
dz+ny+yo
dz+nu
dz+n+wA
dz+mI
dz+yU
dz+rai
dz+wau
dz+h-i
dz+h+y-I
dz+h+ra
dz+h+le
dz+h+wi
ny+tso
ny+ts+mu
ny+ts+yA
ny+tshI
ny+dzU
ny+dz+yai
ny+dz+hau
ny+ny-i
ny+p-I
ny+pha
ny+ye
ny+ri
ny+lo
ny+shu
T+kA
T+TI
T+T+hU
T+nai
T+pau
T+m-i
T+y-I
T+wa
T+se
Th+yi
Th+ro
D+gu
D+g+yA
D+g+hI
D+g+h+rU
D+Dai
D+D+hau
D+D+h+y-i
D+n-I
D+ma
D+ye
D+ri
D+wo
D+hu
D+h+D+hA
D+h+mI
D+h+yU
D+h+rai
D+h+wau
N+T-i
N+Th-I
N+Da
N+D+ye
N+D+ri
N+D+r+yo
N+D+hu
N+NA
N+d+rI
N+mU
N+yai
N+wau
t+k-i
t+k+r-I
t+k+wa
t+k+se
t+gi
t+nyo
t+Thu
t+tA
t+t+yI
t+t+rU
t+t+wai
t+thau
t+th+y-i
t+n-I
t+n+ya
t+pe
t+p+ri
t+pho
t+mu
t+m+yA
t+yI
// t+rnU -- !!! I think that this is a bogus tibwn.ini entry, but I'm not sure
t+sai
t+s+thau
t+s+n-i
t+s+n+y-I
t+s+ma
t+s+m+ye
t+s+yi
t+s+ro
t+s+wu
t+r+yA
t+w+yI
t+k+ShU
th+yai
th+wau
d+g-i
d+g+y-I
d+g+ra
d+g+he
d+g+h+ri
d+dzo
d+du
d+d+yA
d+d+rI
d+d+wU
d+d+hai
d+d+h+nau
d+d+h+y-i
d+d+h+r-I
d+d+h+wa
d+ne
d+bi
d+b+ro
d+b+hu
d+b+h+yA
d+b+h+rI
d+mU
d+yai
d+r+yau
d+w+y-i
d+h-I
d+h+na
d+h+n+ye
d+h+mi
d+h+yo
d+h+ru
d+h+r+yA
d+h+wI
n+kU
n+k+tai
n+g+hau
n+ng-i
n+dz-I
n+dz+ya
n+De
n+ti
n+t+yo
n+t+ru
n+t+r+yA
n+t+wI
n+t+sU
n+thai
n+dau
n+d+d-i
n+d+d+r-I
n+d+ya
n+d+re
n+d+hi
n+d+h+ro
n+d+h+yu
n+nA
n+n+yI
n+pU
n+p+rai
n+phau
n+m-i
n+b+h+y-I
n+tsa
n+ye
n+ri
n+wo
n+w+yu
n+sA
n+s+yI
n+hU
n+h+rai
p+tau
p+t+y-i
p+t+r+y-I
p+da
p+ne
p+n+yi
p+po
p+mu
p+lA
p+wI
p+sU
p+s+n+yai
p+s+wau
p+s+y-i
b+g+h-I
b+dza
b+de
b+d+dzi
b+d+ho
b+d+h+wu
b+tA
b+nI
b+bU
b+b+hai
b+b+h+yau
b+m-i
b+h-I
b+h+Na
b+h+ne
b+h+mi
b+h+yo
b+h+ru
b+h+wA
m+nyI
m+NU
m+nai
m+n+yau
m+p-i
m+p+r-I
m+pha
m+be
m+b+hi
m+b+h+yo
m+mu
m+lA
m+wI
m+sU
m+hai
y+yau
y+r-i
y+w-I
y+sa
r+khe
r+g+hi
r+g+h+yo
r+ts+yu
r+tshA
r+dz+nyI
r+dz+yU
r+Tai
r+Thau
r+D-i
r+N-I
r+t+wa
r+t+te
r+t+si
r+t+s+no
r+t+s+n+yu
r+thA
r+th+yI
r+d+d+hU
r+d+d+h+yai
r+d+yau
r+d+h-i
r+d+h+m-I
r+d+h+ya
r+d+h+re
r+pi
r+b+po
r+b+bu
r+b+hA
r+m+mI
r+yU
r+wai
r+shau
r+sh+y-i
r+Sh-I
r+Sh+Na
r+Sh+N+ye
r+Sh+mi
r+Sh+yo
r+su
r+hA
r+k+ShI
l+g+wU
l+b+yai
l+mau
l+y-i
l+w-I
l+la
l+h+we
w+yi
w+ro
w+nu
w+wA
sh+tsI
sh+ts+yU
sh+tshai
sh+Nau
sh+n-i
sh+p-I
sh+b+ya
sh+me
sh+yi
sh+r+yo
sh+lu
sh+w+gA
sh+w+yI
sh+shU
Sh+kai
Sh+k+rau
Sh+T-i
Sh+T+y-I
Sh+T+ra
Sh+T+r+ye
Sh+T+wi
Sh+Tho
Sh+Th+yu
Sh+NA
Sh+N+yI
Sh+DU
Sh+thai
Sh+pau
Sh+p+r-i
Sh+m-I
Sh+ya
Sh+we
Sh+Shi
s+k+so
s+khu
s+ts+yA
s+TI
s+ThU
s+t+yai
s+t+rau
s+t+w-i
s+th-I
s+th+ya
s+n+ye
s+n+wi
s+pho
s+ph+yu
s+yA
s+r+wI
s+sU
s+s+wai
s+hau
s+w+y-i
h+ny-I
h+Na
h+te
h+ni
h+n+yo
h+pu
h+phA
h+mI
h+yU
h+lai
h+sau
h+s+w-i
h+w+y-I
k+Sh+Na
k+Sh+me
k+Sh+m+yi
k+Sh+yo
k+Sh+ru
k+Sh+lA
k+Sh+wI
a+yU
a+rai
a+r+yau
// Non-Tibetan, Non-Sanskrit words that should be convertible
fava
vafa
vI
fai
vo
fI/ fai/ fo/ fuM
// punctuation
012345678901234
!@#$%&*()_={}:;<>?
a_tsal852ja$)@#%(!Ta)0daM)%!@sa
// test special handling for @#
@# #@ @a#
// English insert
rta'i [(horse's)] mgrin
<?Reject?>
// The dbu can [m][ng][s] should be read "mngas".
mangs
// Is this legal?
pa'am'ang
// is this actually illegal?
pe'as
'angs
'ag
bskyUMbs
bskyUMbsHgro
b+g+ka
'ad
xan
tax
qan
taq
shae
aeb
.
+
.y
// Arguable whether these are illegal
g.ra
r.ya
+a
a+ba
e+ya
ba+
b+
M
^M
Ma
Mam
^Ma
^Mam
r+ra
tat
sbla
// Skt can't end without a vowel
ragyad
// no vowel between ' and g
'garma