WylieWord/Conversion test cases.txt
a1tsal 830db8d980 Better rejection tests, changes for Unicode, new test case "baraga",
which 2.0 gets wrong (I think -- still not sure on this one!)
2005-01-12 11:38:55 +00:00

// Test cases for Wylie -> [other representation] -> Wylie round-trip conversion.
//
// Each line in the file is a comment, blank, or one test case.
//
// A comment line starts with //.
//
// Each test case is understood as EWTS; it should convert back into itself.
// The following test cases are from DLC's DuffPaneTest.
tsha
tsa
dza
sha
nga
nag
nga /
bkra shis bde legs/
sgom pa'am
sgom pe'am
le'u'i'o
la'u'i'o/la'am/pa'ang/pe'ang
bras
dwa
gwa
gyug
g.yag
gyag
g.yas
gyas
'gas
gangs
gnags
'byung
'byungs
blags
mnags
gdams
'gams
// Random things known to be hard
// Difficult to parse deterministically
brgyud
// Featuritis
brtswand
// the only other case of superscript + root + wazur
rgwug
// two-letter suffix (not to be treated as n suffix + g second-suffix)
dbang
// plus second suffix
dbangs
// A common word that is not grammatical, according to some
// sources, and that is also hard because br- looks like
// b should be the root
brlabs
// "'is" is the form of kyis used after a bare vowel or a-chung
re'u'is
he'is
// Bare vowels
a
i
'o
'a
// Bare polyvowels; not clear if these are really legal, but:
a'o
i'u'i
'o'i
// things that at one time or another tickled bugs in WW code
gyurd
ga*ra
ha_sa
gangs
mngas
'gram
la'i
a'i
od
bem a ra
lnga
lang
mkha'i
mkhe'i
khe'i
// A-chung as root
'a
'od
'angs
'ag
'ad
// The exceptions to the three-letter root rule
rags lags nags bags bangs gangs rangs langs nangs sangs babs rabs rams nams
// This is my old root-letter-capitalization and pronunciation test suite.
// It should also be good for tickling conversion bugs.
grwa
bsgrubs
bzlog
brlabs
sprul
// this one is important because it's an exception to the rule that in a
// row of three consonants, the second is almost always the root.
shabs
shes
ldan
mgon
rgyud
ten
rigs
chos
dbang
dbying
dbyang
spyang
phyug
byang
sbyin
myur
po'i
mri
srung
mkha'
'gro
rta'i
khwa
rdzogs
rdo
rje
zla
bar
ngan
rnga
g.yo
gyo
dpa'
'dag
klu
khyung
// These are from DLC's PackageTest
brtan
blta
blag
brag
bsabs
zungs
brtib
spyoms
sku'i
bskyangs
ag
bda'
dbang
dga'
dgra
dmar
gda'
// Sanskrit
// Tests from DLC
ai
heM hiM h-iM heM haiM hoM hauM hUM
pad+me
D+h+D+ha
gaD+h+D+ha
autapa
parItshatsawa
ke:
ka:
n+yadA
ak+Sha
Asa
AT+Ta
b+da
d+ba
d+ga
d+g+ra
d+gara
// Other Sanskrit
rak+Shasa
DAkinI
haM
pad+ma
samuts+tsaya
gang+gA
sid+d+hi
shrI
sat+t+wa
sarma
hU~M`!
wa~M`ba~MthangumaM
// Sanskrit bug ticklers
ragyada
'agarma
gayaradasa
// WylieWord 2.0 got this wrong (and I didn't have a test for it then)
baraga
// There's special-case code for lng in Sanskrit
lngaTa
// R+, +W, +Y, +R
paN+D+R+yau
k+Sh+RUM
kaR+Wabi
waw+Wa
fuN+D+Yo
yay+Yaya
R+Yi
// Taken from THDL Sechen text collection
dashadigAMsarbatimirabi ni saranan+nAmashrIguh+yagar+b+hatatwaMsh+ts+yayastanataras+yaTa'ikAbiharatisma
shrI tan+t+radzamAyadzalasamudAyArtas+yadwArAt+nish+tsitasub+haShitaMguh+yapatimukhAgamanAmabiharatisma
guh+ya pa ti sa shad+h+ya laM kA ra nA ma shrA iguh+ya gar+b+ha ta twa pa in ish+tsa ya tan+t+ra s+ya pr-i t+t-i bi ha rti sma
// Bare vowels:
A I U -i -I
// leading weird vowels
aiga
-Iba
// two or more vowels in a row
a.a
// A.I.ai !!! WW is known not to be able to deal with .-
ai.a
I.a
A.a
A.A
a.a
// ai.-i !!! WW is known not to be able to deal with .-
-I.e
he.a
ha.e
ha.a
hai.a
ha.ai
hU.a
ha.A
ha.U
// fun with weird vowels; these test cases look similar, but actually mostly go through different paths
// in the (wretchedly convoluted) code.
hA.A
hA.a
hA.e
hA.ai
// hA.-i !!! WW is known not to be able to deal with .-
hU.A
hU.a
hU.e
hU.ai
// hU.-i !!! WW is known not to be able to deal with .-
hAMA
hAMa
hAMe
hAMai
hAM-i
hAMu
hA.Ada
hA.ada
hA.eda
hA.aida
hiMa
ho~M`a
// hA.-ida !!! WW is known not to be able to deal with .-
// M,H on bare vowel
aiM
eM
U~M
au~M`
aH
aiH
// bindus in the middle of a word
gaMb+ha
hUMhU~MhU~M`
aMu~MraH
raMH
// Semivowel as subscript.
hr-I kl-iba
// Difficult because ny+d is not legal
ny+dza
raH
hai
hau
// One instance of every stack in tibwn.ini, with sequential vowels attached
k+Sha
k+ke
k+khi
k+ngo
k+tsu
k+tA
k+t+yI
k+t+rU
k+t+r+yai
k+t+wau
k+th-i
k+th+y-I
k+Na
k+ne
k+n+yi
k+pho
k+mu
k+m+yA
k+r+yI
k+w+yU
k+shai
k+sau
k+s+n-i
k+s+m-I
k+s+ya
k+s+we
kh+khi
kh+no
kh+lu
g+gA
g+g+hI
g+nyU
g+dai
g+d+hau
g+d+h+y-i
g+d+h+w-I
g+na
g+n+ye
g+pi
g+b+ho
g+b+h+yu
g+mA
g+m+yI
g+r+yU
g+hai
g+h+g+hau
g+h+ny-i
g+h+n-I
g+h+n+ya
g+h+me
g+h+li
g+h+yo
g+h+ru
g+h+wA
ng+kI
ng+k+tU
ng+k+t+yai
ng+k+yau
ng+kh-i
ng+kh+y-I
ng+ga
ng+g+re
ng+g+yi
ng+g+ho
ng+g+h+yu
ng+g+h+rA
ng+ngI
ng+tU
ng+nai
ng+mau
ng+y-i
ng+l-I
ng+sha
ng+he
ng+k+Shi
ng+k+Sh+wo
ng+k+Sh+yu
ts+tsA
ts+tshI
ts+tsh+wU
ts+tsh+rai
ts+nyau
ts+n+y-i
ts+m-I
ts+ya
ts+re
ts+li
ts+h+yo
tsh+thu
tsh+tshA
tsh+yI
tsh+rU
tsh+lai
dz+dzau
dz+dz+ny-i
dz+dz+w-I
dz+dz+ha
dz+h+dz+he
dz+nyi
dz+ny+yo
dz+nu
dz+n+wA
dz+mI
dz+yU
dz+rai
dz+wau
dz+h-i
dz+h+y-I
dz+h+ra
dz+h+le
dz+h+wi
ny+tso
ny+ts+mu
ny+ts+yA
ny+tshI
ny+dzU
ny+dz+yai
ny+dz+hau
ny+ny-i
ny+p-I
ny+pha
ny+ye
ny+ri
ny+lo
ny+shu
T+kA
T+TI
T+T+hU
T+nai
T+pau
T+m-i
T+y-I
T+wa
T+se
Th+yi
Th+ro
D+gu
D+g+yA
D+g+hI
D+g+h+rU
D+Dai
D+D+hau
D+D+h+y-i
D+n-I
D+ma
D+ye
D+ri
D+wo
D+hu
D+h+D+hA
D+h+mI
D+h+yU
D+h+rai
D+h+wau
N+T-i
N+Th-I
N+Da
N+D+Ye
N+D+ri
N+D+R+yo
N+D+hu
N+NA
N+d+rI
N+mU
N+yai
N+wau
t+k-i
t+k+r-I
t+k+wa
t+k+se
t+gi
t+nyo
t+Thu
t+tA
t+t+yI
t+t+rU
t+t+wai
t+thau
t+th+y-i
t+n-I
t+n+ya
t+pe
t+p+ri
t+pho
t+mu
t+m+yA
t+yI
t+r+nU
t+sai
t+s+thau
t+s+n-i
t+s+n+y-I
t+s+ma
t+s+m+ye
t+s+yi
t+s+ro
t+s+wu
t+r+yA
t+w+yI
t+k+ShU
th+yai
th+wau
d+g-i
d+g+y-I
d+g+ra
d+g+he
d+g+h+ri
d+dzo
d+du
d+d+yA
d+d+rI
d+d+wU
d+d+hai
d+d+h+nau
d+d+h+y-i
d+d+h+r-I
d+d+h+wa
d+ne
d+bi
d+b+ro
d+b+hu
d+b+h+yA
d+b+h+rI
d+mU
d+yai
d+r+yau
d+w+y-i
d+h-I
d+h+na
d+h+n+ye
d+h+mi
d+h+yo
d+h+ru
d+h+r+yA
d+h+wI
n+kU
n+k+tai
n+g+hau
n+ng-i
n+dz-I
n+dz+ya
n+De
n+ti
n+t+yo
n+t+ru
n+t+r+yA
n+t+wI
n+t+sU
n+thai
n+dau
n+d+d-i
n+d+d+r-I
n+d+yA
n+d+re
n+d+hi
n+d+h+ro
n+d+h+yu
n+nA
n+n+yI
n+pU
n+p+rai
n+phau
n+m-i
n+b+h+y-I
n+tsa
n+ye
n+ri
n+wo
n+w+yu
n+sA
n+s+yI
n+hU
n+h+rai
p+tau
p+t+y-i
p+t+r+y-I
p+da
p+ne
p+n+yi
p+po
p+mu
p+lA
p+wI
p+sU
p+s+n+yai
p+s+wau
p+s+y-i
b+g+h-I
b+dza
b+de
b+d+dzi
b+d+ho
b+d+h+wu
b+tA
b+nI
b+bU
b+b+hai
b+b+h+yau
b+m-i
b+h-I
b+h+Na
b+h+ne
b+h+mi
b+h+yo
b+h+ru
b+h+wA
m+nyI
m+NU
m+nai
m+n+yau
m+p-i
m+p+r-I
m+pha
m+be
m+b+hi
m+b+h+yo
m+mu
m+lA
m+wI
m+sU
m+hai
y+Yau
y+r-i
y+w-I
y+sa
r+khe
r+g+hi
r+g+h+yo
r+ts+yu
r+tshA
r+dz+nyI
r+dz+yU
r+Tai
r+Thau
r+D-i
r+N-I
r+t+wa
r+t+te
r+t+si
r+t+s+no
r+t+s+n+yu
r+thA
r+th+yI
r+d+d+hU
r+d+d+h+yai
r+d+yau
r+d+h-i
r+d+h+m-I
r+d+h+ya
r+d+h+re
r+pi
r+b+po
r+b+bu
r+b+hA
r+m+mI
R+YU
R+Wai
R+shau
R+sh+y-i
R+Sh-I
R+Sh+Na
R+Sh+N+ye
R+Sh+mi
R+Sh+yo
R+su
r+hA
r+k+ShI
l+g+wU
l+b+yai
l+mau
l+y-i
l+w-I
l+la
l+h+we
w+yi
w+ro
w+nu
w+WA
sh+tsI
sh+ts+yU
sh+tshai
sh+Nau
sh+n-i
sh+p-I
sh+b+ya
sh+me
sh+yi
sh+r+yo
sh+lu
sh+w+gA
sh+w+yI
sh+shU
Sh+kai
Sh+k+rau
Sh+T-i
Sh+T+y-I
Sh+T+ra
Sh+T+r+ye
Sh+T+wi
Sh+Tho
Sh+Th+yu
Sh+NA
Sh+N+yI
Sh+DU
Sh+thai
Sh+pau
Sh+p+r-i
Sh+m-I
Sh+ya
Sh+we
Sh+Shi
s+k+so
s+khu
s+ts+yA
s+TI
s+ThU
s+t+yai
s+t+rau
s+t+w-i
s+th-I
s+th+ya
s+n+ye
s+n+wi
s+pho
s+ph+yu
s+yA
s+r+wI
s+sU
s+s+wai
s+hau
s+w+y-i
h+ny-I
h+Na
h+te
h+ni
h+n+yo
h+pu
h+phA
h+mI
h+yU
h+lai
h+sau
h+s+w-i
h+w+y-I
k+Sh+Na
k+Sh+me
k+Sh+m+yi
k+Sh+yo
k+Sh+Ru
k+Sh+lA
k+Sh+wI
a+yU
a+rai
a+r+yau
// Non-Tibetan, Non-Sanskrit words that should be convertible
fava
vafa
vI
fai
vo
fI/ fai/ fo/ fuM
// punctuation
012345678901234
!@#$%&*()_={}:;<>?
a_tsal852ja$)@#%(!Ta)0daM)%!@sa
// test special handling for @#
@# #@ @a#
// English inserts
rta'i [(of horse)] mgrin
// It would be nice to have a test of the \ syntax, but that doesn't translate back to itself, so we can't pass it.
<?Reject?>
g+a
+a
Su
S+ta
t+Sa
b~a
ba~
// ??? legal???
pa'am'ang
// is this actually illegal?
pe'as
pe's
p'is
p'e
bskyUMbs
bskyUMbsHgro
xan
tax
qan
taq
shae
aeb
.
+
.y
// Arguable whether these are illegal
g.ra
r.ya
+a
e+ya
ba+
b+
M
~M
Ma
Mam
~Ma
~Mam
tat
sbla
// Skt can't end in consonant
ragyad
// no vowel between ' and g
'garma
// stuff testing handling of ' as root
b'o
r'o
'yo
'wo
// Obscure special case
lng+ta
// These were rejected by WylieWord 2.0 because they are non-TMW stacks; but they are legal Unicode stacks,
// so we shouldn't reject them (when in Unicode mode).
//
// b+g+ka
// a+ba
// r+ra