830db8d980
which 2.0 gets wrong (I think -- still not sure on this one!)
844 lines
8.1 KiB
Text
844 lines
8.1 KiB
Text
// Test cases for Wylie -> [other representation] -> Wylie round-trip conversion.
|
|
//
|
|
// Each line in the file is a comment, blank, or one test case.
|
|
//
|
|
// A comment line starts with //.
|
|
//
|
|
// Each test case is understood as EWTS; it should convert back into itself.
|
|
|
|
// The following test cases are from DLC's DuffPaneTest.
|
|
tsha
|
|
tsa
|
|
dza
|
|
sha
|
|
nga
|
|
nag
|
|
nga /
|
|
bkra shis bde legs/
|
|
sgom pa'am
|
|
sgom pe'am
|
|
le'u'i'o
|
|
la'u'i'o/la'am/pa'ang/pe'ang
|
|
bras
|
|
dwa
|
|
gwa
|
|
gyug
|
|
g.yag
|
|
gyag
|
|
g.yas
|
|
gyas
|
|
'gas
|
|
gangs
|
|
gnags
|
|
'byung
|
|
'byungs
|
|
blags
|
|
mnags
|
|
gdams
|
|
'gams
|
|
|
|
// Random things known to be hard
|
|
|
|
// Difficult to parse deterministically
|
|
brgyud
|
|
|
|
// Featuritis
|
|
brtswand
|
|
|
|
// the only other case of superscript + root + wazur
|
|
rgwug
|
|
|
|
// two-letter suffix (not to be treated as n suffix + g second-suffix)
|
|
dbang
|
|
// plus second sufix
|
|
dbangs
|
|
|
|
// A common word that is not grammatical, according to some
|
|
// sources, and that is also hard because br- looks like
|
|
// b should be the root
|
|
brlabs
|
|
|
|
// "'is" is the form of kyis used after bare vowel or a chung
|
|
re'u'is
|
|
he'is
|
|
|
|
// Bare vowels
|
|
a
|
|
i
|
|
'o
|
|
'a
|
|
|
|
// Bare polyvowels; not clear if these are really legal, but:
|
|
a'o
|
|
i'u'i
|
|
'o'i
|
|
|
|
// things that at one time or another tickled bugs in WW code
|
|
gyurd
|
|
ga*ra
|
|
ha_sa
|
|
gangs
|
|
mngas
|
|
'gram
|
|
la'i
|
|
a'i
|
|
od
|
|
bem a ra
|
|
lnga
|
|
lang
|
|
mkha'i
|
|
mkhe'i
|
|
khe'i
|
|
|
|
// A-chung as root
|
|
'a
|
|
'od
|
|
'angs
|
|
'ag
|
|
'ad
|
|
|
|
// The exceptions to the three-letter root rule
|
|
rags lags nags bags bangs gangs rangs langs nangs sangs babs rabs rams nams
|
|
|
|
// This is my old root-letter-capitalization and pronunciation test suite.
|
|
// It's should also be good for tickling conversion bugs.
|
|
grwa
|
|
bsgrubs
|
|
bzlog
|
|
brlabs
|
|
sprul
|
|
// this one is important because it's an exception to the rule that in a
|
|
// row of three consonants, the second is almost always the root.
|
|
shabs
|
|
shes
|
|
ldan
|
|
mgon
|
|
rgyud
|
|
ten
|
|
rigs
|
|
chos
|
|
dbang
|
|
dbying
|
|
dbyang
|
|
spyang
|
|
phyug
|
|
byang
|
|
sbyin
|
|
myur
|
|
po'i
|
|
mri
|
|
srung
|
|
mkha'
|
|
'gro
|
|
rta'i
|
|
khwa
|
|
rdzogs
|
|
rdo
|
|
rje
|
|
zla
|
|
bar
|
|
ngan
|
|
rnga
|
|
g.yo
|
|
gyo
|
|
dpa'
|
|
'dag
|
|
klu
|
|
khyung
|
|
|
|
// These are from DLC's PackageTest
|
|
brtan
|
|
blta
|
|
blag
|
|
brag
|
|
bsabs
|
|
zungs
|
|
brtib
|
|
spyoms
|
|
sku'i
|
|
bskyangs
|
|
ag
|
|
bda'
|
|
dbang
|
|
dga'
|
|
dgra
|
|
dmar
|
|
gda'
|
|
|
|
// Sanskrit
|
|
|
|
// Tests from DLC
|
|
ai
|
|
heM hiM h-iM heM haiM hoM hauM hUM
|
|
pad+me
|
|
D+h+D+ha
|
|
gaD+h+D+ha
|
|
autapa
|
|
parItshatsawa
|
|
ke:
|
|
ka:
|
|
n+yadA
|
|
ak+Sha
|
|
Asa
|
|
AT+Ta
|
|
b+da
|
|
d+ba
|
|
d+ga
|
|
d+g+ra
|
|
d+gara
|
|
|
|
// Other Sanskrit
|
|
rak+Shasa
|
|
DAkinI
|
|
haM
|
|
pad+ma
|
|
samuts+tsaya
|
|
gang+gA
|
|
sid+d+hi
|
|
shrI
|
|
sat+t+wa
|
|
sarma
|
|
hU~M`!
|
|
wa~M`ba~MthangumaM
|
|
|
|
// Sanskrit bug ticklers
|
|
ragyada
|
|
'agarma
|
|
gayaradasa
|
|
// WylieWord 2.0 got this wrong (and I didn't have a test for it then)
|
|
baraga
|
|
// There's special-case code for lng in Sanskrit
|
|
lngaTa
|
|
|
|
// R+, +W, +Y, +R
|
|
paN+D+R+yau
|
|
k+Sh+RUM
|
|
kaR+Wabi
|
|
waw+Wa
|
|
fuN+D+Yo
|
|
yay+Yaya
|
|
R+Yi
|
|
|
|
// Taken from THDL Sechen text collection
|
|
dashadigAMsarbatimirabi ni saranan+nAmashrIguh+yagar+b+hatatwaMsh+ts+yayastanataras+yaTa'ikAbiharatisma
|
|
shrI tan+t+radzamAyadzalasamudAyArtas+yadwArAt+nish+tsitasub+haShitaMguh+yapatimukhAgamanAmabiharatisma
|
|
guh+ya pa ti sa shad+h+ya laM kA ra nA ma shrA iguh+ya gar+b+ha ta twa pa in ish+tsa ya tan+t+ra s+ya pr-i t+t-i bi ha rti sma
|
|
|
|
// Bare vowels:
|
|
A I U -i -I
|
|
// leading weird vowels
|
|
aiga
|
|
-Iba
|
|
// two or more vowels in a row
|
|
a.a
|
|
// A.I.ai !!! WW is known not to be able to deal with .-
|
|
ai.a
|
|
I.a
|
|
A.a
|
|
A.A
|
|
a.a
|
|
// ai.-i !!! WW is known not to be able to deal with .-
|
|
-I.e
|
|
he.a
|
|
ha.e
|
|
ha.a
|
|
hai.a
|
|
ha.ai
|
|
hU.a
|
|
ha.A
|
|
ha.U
|
|
|
|
// fun with weird vowels; these test cases look similar, but actually mostly go through different paths
|
|
// in the (wretchedly convoluted) code.
|
|
hA.A
|
|
hA.a
|
|
hA.e
|
|
hA.ai
|
|
// hA.-i !!! WW is known not to be able to deal with .-
|
|
hU.A
|
|
hU.a
|
|
hU.e
|
|
hU.ai
|
|
// hU.-i !!! WW is known not to be able to deal with .-
|
|
hAMA
|
|
hAMa
|
|
hAMe
|
|
hAMai
|
|
hAM-i
|
|
hAMu
|
|
hA.Ada
|
|
hA.ada
|
|
hA.eda
|
|
hA.aida
|
|
hiMa
|
|
ho~M`a
|
|
// hA.-ida !!! WW is known not to be able to deal with .-
|
|
|
|
// M,H on bare vowel
|
|
aiM
|
|
eM
|
|
U~M
|
|
au~M`
|
|
aH
|
|
aiH
|
|
|
|
// bindus in the middle of a word
|
|
gaMb+ha
|
|
hUMhU~MhU~M`
|
|
aMu~MraH
|
|
|
|
raMH
|
|
|
|
// Semivowel as subscript.
|
|
hr-I kl-iba
|
|
|
|
// Difficult because ny+d is not legal
|
|
ny+dza
|
|
|
|
raH
|
|
hai
|
|
hau
|
|
|
|
// One instance of every stack in tibwn.ini, with sequential vowels attached
|
|
k+Sha
|
|
k+ke
|
|
k+khi
|
|
k+ngo
|
|
k+tsu
|
|
k+tA
|
|
k+t+yI
|
|
k+t+rU
|
|
k+t+r+yai
|
|
k+t+wau
|
|
k+th-i
|
|
k+th+y-I
|
|
k+Na
|
|
k+ne
|
|
k+n+yi
|
|
k+pho
|
|
k+mu
|
|
k+m+yA
|
|
k+r+yI
|
|
k+w+yU
|
|
k+shai
|
|
k+sau
|
|
k+s+n-i
|
|
k+s+m-I
|
|
k+s+ya
|
|
k+s+we
|
|
kh+khi
|
|
kh+no
|
|
kh+lu
|
|
g+gA
|
|
g+g+hI
|
|
g+nyU
|
|
g+dai
|
|
g+d+hau
|
|
g+d+h+y-i
|
|
g+d+h+w-I
|
|
g+na
|
|
g+n+ye
|
|
g+pi
|
|
g+b+ho
|
|
g+b+h+yu
|
|
g+mA
|
|
g+m+yI
|
|
g+r+yU
|
|
g+hai
|
|
g+h+g+hau
|
|
g+h+ny-i
|
|
g+h+n-I
|
|
g+h+n+ya
|
|
g+h+me
|
|
g+h+li
|
|
g+h+yo
|
|
g+h+ru
|
|
g+h+wA
|
|
ng+kI
|
|
ng+k+tU
|
|
ng+k+t+yai
|
|
ng+k+yau
|
|
ng+kh-i
|
|
ng+kh+y-I
|
|
ng+ga
|
|
ng+g+re
|
|
ng+g+yi
|
|
ng+g+ho
|
|
ng+g+h+yu
|
|
ng+g+h+rA
|
|
ng+ngI
|
|
ng+tU
|
|
ng+nai
|
|
ng+mau
|
|
ng+y-i
|
|
ng+l-I
|
|
ng+sha
|
|
ng+he
|
|
ng+k+Shi
|
|
ng+k+Sh+wo
|
|
ng+k+Sh+yu
|
|
ts+tsA
|
|
ts+tshI
|
|
ts+tsh+wU
|
|
ts+tsh+rai
|
|
ts+nyau
|
|
ts+n+y-i
|
|
ts+m-I
|
|
ts+ya
|
|
ts+re
|
|
ts+li
|
|
ts+h+yo
|
|
tsh+thu
|
|
tsh+tshA
|
|
tsh+yI
|
|
tsh+rU
|
|
tsh+lai
|
|
dz+dzau
|
|
dz+dz+ny-i
|
|
dz+dz+w-I
|
|
dz+dz+ha
|
|
dz+h+dz+he
|
|
dz+nyi
|
|
dz+ny+yo
|
|
dz+nu
|
|
dz+n+wA
|
|
dz+mI
|
|
dz+yU
|
|
dz+rai
|
|
dz+wau
|
|
dz+h-i
|
|
dz+h+y-I
|
|
dz+h+ra
|
|
dz+h+le
|
|
dz+h+wi
|
|
ny+tso
|
|
ny+ts+mu
|
|
ny+ts+yA
|
|
ny+tshI
|
|
ny+dzU
|
|
ny+dz+yai
|
|
ny+dz+hau
|
|
ny+ny-i
|
|
ny+p-I
|
|
ny+pha
|
|
ny+ye
|
|
ny+ri
|
|
ny+lo
|
|
ny+shu
|
|
T+kA
|
|
T+TI
|
|
T+T+hU
|
|
T+nai
|
|
T+pau
|
|
T+m-i
|
|
T+y-I
|
|
T+wa
|
|
T+se
|
|
Th+yi
|
|
Th+ro
|
|
D+gu
|
|
D+g+yA
|
|
D+g+hI
|
|
D+g+h+rU
|
|
D+Dai
|
|
D+D+hau
|
|
D+D+h+y-i
|
|
D+n-I
|
|
D+ma
|
|
D+ye
|
|
D+ri
|
|
D+wo
|
|
D+hu
|
|
D+h+D+hA
|
|
D+h+mI
|
|
D+h+yU
|
|
D+h+rai
|
|
D+h+wau
|
|
N+T-i
|
|
N+Th-I
|
|
N+Da
|
|
N+D+Ye
|
|
N+D+ri
|
|
N+D+R+yo
|
|
N+D+hu
|
|
N+NA
|
|
N+d+rI
|
|
N+mU
|
|
N+yai
|
|
N+wau
|
|
t+k-i
|
|
t+k+r-I
|
|
t+k+wa
|
|
t+k+se
|
|
t+gi
|
|
t+nyo
|
|
t+Thu
|
|
t+tA
|
|
t+t+yI
|
|
t+t+rU
|
|
t+t+wai
|
|
t+thau
|
|
t+th+y-i
|
|
t+n-I
|
|
t+n+ya
|
|
t+pe
|
|
t+p+ri
|
|
t+pho
|
|
t+mu
|
|
t+m+yA
|
|
t+yI
|
|
t+r+nU
|
|
t+sai
|
|
t+s+thau
|
|
t+s+n-i
|
|
t+s+n+y-I
|
|
t+s+ma
|
|
t+s+m+ye
|
|
t+s+yi
|
|
t+s+ro
|
|
t+s+wu
|
|
t+r+yA
|
|
t+w+yI
|
|
t+k+ShU
|
|
th+yai
|
|
th+wau
|
|
d+g-i
|
|
d+g+y-I
|
|
d+g+ra
|
|
d+g+he
|
|
d+g+h+ri
|
|
d+dzo
|
|
d+du
|
|
d+d+yA
|
|
d+d+rI
|
|
d+d+wU
|
|
d+d+hai
|
|
d+d+h+nau
|
|
d+d+h+y-i
|
|
d+d+h+r-I
|
|
d+d+h+wa
|
|
d+ne
|
|
d+bi
|
|
d+b+ro
|
|
d+b+hu
|
|
d+b+h+yA
|
|
d+b+h+rI
|
|
d+mU
|
|
d+yai
|
|
d+r+yau
|
|
d+w+y-i
|
|
d+h-I
|
|
d+h+na
|
|
d+h+n+ye
|
|
d+h+mi
|
|
d+h+yo
|
|
d+h+ru
|
|
d+h+r+yA
|
|
d+h+wI
|
|
n+kU
|
|
n+k+tai
|
|
n+g+hau
|
|
n+ng-i
|
|
n+dz-I
|
|
n+dz+ya
|
|
n+De
|
|
n+ti
|
|
n+t+yo
|
|
n+t+ru
|
|
n+t+r+yA
|
|
n+t+wI
|
|
n+t+sU
|
|
n+thai
|
|
n+dau
|
|
n+d+d-i
|
|
n+d+d+r-I
|
|
n+d+yA
|
|
n+d+re
|
|
n+d+hi
|
|
n+d+h+ro
|
|
n+d+h+yu
|
|
n+nA
|
|
n+n+yI
|
|
n+pU
|
|
n+p+rai
|
|
n+phau
|
|
n+m-i
|
|
n+b+h+y-I
|
|
n+tsa
|
|
n+ye
|
|
n+ri
|
|
n+wo
|
|
n+w+yu
|
|
n+sA
|
|
n+s+yI
|
|
n+hU
|
|
n+h+rai
|
|
p+tau
|
|
p+t+y-i
|
|
p+t+r+y-I
|
|
p+da
|
|
p+ne
|
|
p+n+yi
|
|
p+po
|
|
p+mu
|
|
p+lA
|
|
p+wI
|
|
p+sU
|
|
p+s+n+yai
|
|
p+s+wau
|
|
p+s+y-i
|
|
b+g+h-I
|
|
b+dza
|
|
b+de
|
|
b+d+dzi
|
|
b+d+ho
|
|
b+d+h+wu
|
|
b+tA
|
|
b+nI
|
|
b+bU
|
|
b+b+hai
|
|
b+b+h+yau
|
|
b+m-i
|
|
b+h-I
|
|
b+h+Na
|
|
b+h+ne
|
|
b+h+mi
|
|
b+h+yo
|
|
b+h+ru
|
|
b+h+wA
|
|
m+nyI
|
|
m+NU
|
|
m+nai
|
|
m+n+yau
|
|
m+p-i
|
|
m+p+r-I
|
|
m+pha
|
|
m+be
|
|
m+b+hi
|
|
m+b+h+yo
|
|
m+mu
|
|
m+lA
|
|
m+wI
|
|
m+sU
|
|
m+hai
|
|
y+Yau
|
|
y+r-i
|
|
y+w-I
|
|
y+sa
|
|
r+khe
|
|
r+g+hi
|
|
r+g+h+yo
|
|
r+ts+yu
|
|
r+tshA
|
|
r+dz+nyI
|
|
r+dz+yU
|
|
r+Tai
|
|
r+Thau
|
|
r+D-i
|
|
r+N-I
|
|
r+t+wa
|
|
r+t+te
|
|
r+t+si
|
|
r+t+s+no
|
|
r+t+s+n+yu
|
|
r+thA
|
|
r+th+yI
|
|
r+d+d+hU
|
|
r+d+d+h+yai
|
|
r+d+yau
|
|
r+d+h-i
|
|
r+d+h+m-I
|
|
r+d+h+ya
|
|
r+d+h+re
|
|
r+pi
|
|
r+b+po
|
|
r+b+bu
|
|
r+b+hA
|
|
r+m+mI
|
|
R+YU
|
|
R+Wai
|
|
R+shau
|
|
R+sh+y-i
|
|
R+Sh-I
|
|
R+Sh+Na
|
|
R+Sh+N+ye
|
|
R+Sh+mi
|
|
R+Sh+yo
|
|
R+su
|
|
r+hA
|
|
r+k+ShI
|
|
l+g+wU
|
|
l+b+yai
|
|
l+mau
|
|
l+y-i
|
|
l+w-I
|
|
l+la
|
|
l+h+we
|
|
w+yi
|
|
w+ro
|
|
w+nu
|
|
w+WA
|
|
sh+tsI
|
|
sh+ts+yU
|
|
sh+tshai
|
|
sh+Nau
|
|
sh+n-i
|
|
sh+p-I
|
|
sh+b+ya
|
|
sh+me
|
|
sh+yi
|
|
sh+r+yo
|
|
sh+lu
|
|
sh+w+gA
|
|
sh+w+yI
|
|
sh+shU
|
|
Sh+kai
|
|
Sh+k+rau
|
|
Sh+T-i
|
|
Sh+T+y-I
|
|
Sh+T+ra
|
|
Sh+T+r+ye
|
|
Sh+T+wi
|
|
Sh+Tho
|
|
Sh+Th+yu
|
|
Sh+NA
|
|
Sh+N+yI
|
|
Sh+DU
|
|
Sh+thai
|
|
Sh+pau
|
|
Sh+p+r-i
|
|
Sh+m-I
|
|
Sh+ya
|
|
Sh+we
|
|
Sh+Shi
|
|
s+k+so
|
|
s+khu
|
|
s+ts+yA
|
|
s+TI
|
|
s+ThU
|
|
s+t+yai
|
|
s+t+rau
|
|
s+t+w-i
|
|
s+th-I
|
|
s+th+ya
|
|
s+n+ye
|
|
s+n+wi
|
|
s+pho
|
|
s+ph+yu
|
|
s+yA
|
|
s+r+wI
|
|
s+sU
|
|
s+s+wai
|
|
s+hau
|
|
s+w+y-i
|
|
h+ny-I
|
|
h+Na
|
|
h+te
|
|
h+ni
|
|
h+n+yo
|
|
h+pu
|
|
h+phA
|
|
h+mI
|
|
h+yU
|
|
h+lai
|
|
h+sau
|
|
h+s+w-i
|
|
h+w+y-I
|
|
k+Sh+Na
|
|
k+Sh+me
|
|
k+Sh+m+yi
|
|
k+Sh+yo
|
|
k+Sh+Ru
|
|
k+Sh+lA
|
|
k+Sh+wI
|
|
a+yU
|
|
a+rai
|
|
a+r+yau
|
|
|
|
// Non-Tibetan, Non-Sanskrit words that should be convertible
|
|
fava
|
|
vafa
|
|
vI
|
|
fai
|
|
vo
|
|
fI/ fai/ fo/ fuM
|
|
|
|
// punctuation
|
|
012345678901234
|
|
!@#$%&*()_={}:;<>?
|
|
a_tsal852ja$)@#%(!Ta)0daM)%!@sa
|
|
// test special handing for @#
|
|
@# #@ @a#
|
|
|
|
// English inserts
|
|
rta'i [(of horse)] mgrin
|
|
// It would be nice to have a test of the \ syntax, but that doesn't translate back to itself, so we can't pass it.
|
|
|
|
<?Reject?>
|
|
|
|
g+a
|
|
+a
|
|
Su
|
|
S+ta
|
|
t+Sa
|
|
b~a
|
|
ba~
|
|
|
|
// ??? legal???
|
|
pa'am'ang
|
|
// is this actually illegal?
|
|
pe'as
|
|
|
|
pe's
|
|
p'is
|
|
p'e
|
|
|
|
bskyUMbs
|
|
bskyUMbsHgro
|
|
xan
|
|
tax
|
|
qan
|
|
taq
|
|
shae
|
|
aeb
|
|
.
|
|
+
|
|
.y
|
|
// Arguable if these are illegal
|
|
g.ra
|
|
r.ya
|
|
+a
|
|
e+ya
|
|
ba+
|
|
b+
|
|
M
|
|
~M
|
|
Ma
|
|
Mam
|
|
~Ma
|
|
~Mam
|
|
|
|
tat
|
|
sbla
|
|
|
|
// Skt can't end in consonant
|
|
ragyad
|
|
|
|
// no vowel between ' and g
|
|
'garma
|
|
|
|
// stuff testing handling of ' as root
|
|
b'o
|
|
r'o
|
|
'yo
|
|
'wo
|
|
|
|
// Obscure special case
|
|
lng+ta
|
|
|
|
// These were rejected by WylieWord 2.0 because they are non-TMW stacks; but they are legal Unicode stacks,
|
|
// so we shouldn't reject them (when in Unicode mode).
|
|
//
|
|
// b+g+ka
|
|
// a+ba
|
|
// r+ra
|