WylieWord/Conversion test cases.txt

// Test cases for Wylie -> [other representation] -> Wylie round-trip conversion.
//
// Each line in the file is a comment, blank, or one test case.
//
// A comment line starts with //.
//
// Each test case is understood as EWTS; it should convert back into itself.
// The following test cases are from DLC's DuffPaneTest.
tsha
tsa
dza
sha
nga
nag
nga /
bkra shis bde legs/
sgom pa'am
sgom pe'am
le'u'i'o
la'u'i'o/la'am/pa'ang/pe'ang
bras
dwa
gwa
gyug
g.yag
gyag
g.yas
gyas
'gas
gangs
gnags
'byung
'byungs
blags
mnags
gdams
'gams
// All the compound native-Tibetan stacks; list taken from http://iris.lib.virginia.edu/tibet/collections/langling/tibstacks.html
rka rga rnga rja rnya rta rda rna rba rma rtsa rdza
lka lga lnga lca lja lta lda lpa lba lha
ska sga snga snya sta sda sna spa sba sma stsa
kwa khwa gwa cwa nywa twa dwa tswa tshwa zhwa zwa
rwa shwa swa hwa
kya khya gya pya phya bya mya
kra khra gra tra thra dra pra phra bra mra shra sra hra
kla gla bla zla rla sla
rkya rgya rmya rgwa rtswa
skya sgya spya sbya smya
skra sgra snra spra sbra smra
grwa drwa phywa
// Random things known to be hard
// Difficult to parse deterministically
brgyud
// Featuritis
brtswand
// the only other case of superscript + root + wazur
rgwug
// two-letter suffix (not to be treated as n suffix + g second-suffix)
dbang
// plus second suffix
dbangs
// A common word that is not grammatical, according to some
// sources, and that is also hard because br- looks like
// b should be the root. (Actually r is, but it's pronounced "lap"!)
brlabs
// One of the few native Tibetan words with nr in it.
snrubs
// "'is" is the form of kyis used after bare vowel or a chung
re'u'is
he'is
// Bare vowels
a
i
'o
'a
// Bare polyvowels; not clear if these are really legal, but:
a'o
i'u'i
'o'i
// things that at one time or another tickled bugs in WW code
gyurd
ga*ra
ha_sa
gangs
'gram
la'i
a'i
od
bem a ra
lnga
lang
mkha'i
mkhe'i
khe'i
dam
// A-chung as root
'a
'od
'angs
'ag
'ad
// The exceptions to the three-letter root rule (that the second letter is normally the root):
rags lags nags bags bangs gangs rangs langs mangs sangs babs rabs rams nams
// From Chris Fynn's list of three-letter root ambiguities (see SourceForge bug #614464),
// these are the ones not covered above:
dgas 'gas dngas gnad mnad dbas 'bas mags dmas
// This is my old root-letter-capitalization and pronunciation test suite.
// It should also be good for tickling conversion bugs.
grwa
bsgrubs
bzlog
brlabs
sprul
// this one is important because it's an exception to the rule that in a
// row of three consonants, the second is almost always the root.
shabs
shes
ldan
mgon
rgyud
ten
rigs
chos
dbang
dbying
dbyang
spyang
phyug
byang
sbyin
myur
po'i
mri
srung
mkha'
'gro
rta'i
khwa
rdzogs
rdo
rje
zla
bar
ngan
rnga
g.yo
gyo
dpa'
'dag
klu
khyung
// These are from DLC's PackageTest
brtan
blta
blag
brag
bsabs
zungs
brtib
spyoms
sku'i
bskyangs
ag
bda'
dbang
dga'
dgra
dmar
gda'
// Sanskrit
// Tests from DLC
ai
heM hiM h-iM haiM hoM hauM hUM
pad+me
D+h+D+ha
gaD+h+D+ha
autapa
parItshatsawa
ke:
ka:
n+yadA
ak+Sha
Asa
AT+Ta
b+da
d+ba
d+ga
d+g+ra
d+gara
// Other Sanskrit
rak+Shasa
DAkinI
haM
pad+ma
samuts+tsaya
gang+gA
sid+d+hi
shrI
sat+t+wa
sarma
hU~M`!
wa~M`ba~MthangumaM
// Sanskrit bug ticklers
ragyada
'agarma
gayaradasa
// This has been discussed on tibetscript@list.mail.virginia.edu. Currently WW gets it wrong.
// Plausibly WW can be fixed by disallowing b as a prefix for r unless it has a subscript
// (maybe unless it has an l subscript specifically).
baraga
// There's special-case code for lng in Sanskrit
lngaTa
// R+, +W, +Y, +R
paN+D+R+yau
k+Sh+RUM
kaR+Wabi
waw+Wa
fuN+D+Yo
yay+Yaya
R+Yi
// Taken from THDL Sechen text collection
dashadigAMsarbatimirabi ni saranan+nAmashrIguh+yagar+b+hatatwaMsh+ts+yayastanataras+yaTa'ikAbiharatisma
shrI tan+t+radzamAyadzalasamudAyArtas+yadwArAt+nish+tsitasub+haShitaMguh+yapatimukhAgamanAmabiharatisma
guh+ya pa ti sa shad+h+ya laM kA ra nA ma shrA iguh+ya gar+b+ha ta twa pa in ish+tsa ya tan+t+ra s+ya pr-i t+t-i bi ha rti sma
// Bare vowels:
A I U -i -I
// leading weird vowels
aiga
-Iba
// two or more vowels in a row
a.a
// A.I.ai !!! WW is known not to be able to deal with .-
ai.a
I.a
A.a
A.A
a.a
// ai.-i !!! WW is known not to be able to deal with .-
-I.e
he.a
ha.e
ha.a
hai.a
ha.ai
hU.a
ha.A
ha.U
// fun with weird vowels; these test cases look similar, but actually mostly go through different paths
// in the (wretchedly convoluted) code.
hA.A
hA.a
hA.e
hA.ai
// hA.-i !!! WW is known not to be able to deal with .-
hU.A
hU.a
hU.e
hU.ai
// hU.-i !!! WW is known not to be able to deal with .-
hAMA
hAMa
hAMe
hAMai
hAM-i
hAMu
hA.Ada
hA.ada
hA.eda
hA.aida
hiMa
ho~M`a
// hA.-ida !!! WW is known not to be able to deal with .-
// M,H on bare vowel
aiM
eM
U~M
au~M`
aH
aiH
// bindus in the middle of a word
gaMb+ha
hUMhU~MhU~M`
aMu~MraH
raMH
// Semivowel as subscript.
hr-I kl-iba
// Difficult because ny+d is not legal
ny+dza
raH
hai
hau
// One instance of every stack in tibwn.ini, with sequential vowels attached
k+Sha
k+ke
k+khi
k+ngo
k+tsu
k+tA
k+t+yI
k+t+rU
k+t+r+yai
k+t+wau
k+th-i
k+th+y-I
k+Na
k+ne
k+n+yi
k+pho
k+mu
k+m+yA
k+r+yI
k+w+yU
k+shai
k+sau
k+s+n-i
k+s+m-I
k+s+ya
k+s+we
kh+khi
kh+no
kh+lu
g+gA
g+g+hI
g+nyU
g+dai
g+d+hau
g+d+h+y-i
g+d+h+w-I
g+na
g+n+ye
g+pi
g+b+ho
g+b+h+yu
g+mA
g+m+yI
g+r+yU
g+hai
g+h+g+hau
g+h+ny-i
g+h+n-I
g+h+n+ya
g+h+me
g+h+li
g+h+yo
g+h+ru
g+h+wA
ng+kI
ng+k+tU
ng+k+t+yai
ng+k+yau
ng+kh-i
ng+kh+y-I
ng+ga
ng+g+re
ng+g+yi
ng+g+ho
ng+g+h+yu
ng+g+h+rA
ng+ngI
ng+tU
ng+nai
ng+mau
ng+y-i
ng+l-I
ng+sha
ng+he
ng+k+Shi
ng+k+Sh+wo
ng+k+Sh+yu
ts+tsA
ts+tshI
ts+tsh+wU
ts+tsh+rai
ts+nyau
ts+n+y-i
ts+m-I
ts+ya
ts+re
ts+li
ts+h+yo
tsh+thu
tsh+tshA
tsh+yI
tsh+rU
tsh+lai
dz+dzau
dz+dz+ny-i
dz+dz+w-I
dz+dz+ha
dz+h+dz+he
dz+nyi
dz+ny+yo
dz+nu
dz+n+wA
dz+mI
dz+yU
dz+rai
dz+wau
dz+h-i
dz+h+y-I
dz+h+ra
dz+h+le
dz+h+wi
ny+tso
ny+ts+mu
ny+ts+yA
ny+tshI
ny+dzU
ny+dz+yai
ny+dz+hau
ny+ny-i
ny+p-I
ny+pha
ny+ye
ny+ri
ny+lo
ny+shu
T+kA
T+TI
T+T+hU
T+nai
T+pau
T+m-i
T+y-I
T+wa
T+se
Th+yi
Th+ro
D+gu
D+g+yA
D+g+hI
D+g+h+rU
D+Dai
D+D+hau
D+D+h+y-i
D+n-I
D+ma
D+ye
D+ri
D+wo
D+hu
D+h+D+hA
D+h+mI
D+h+yU
D+h+rai
D+h+wau
N+T-i
N+Th-I
N+Da
N+D+Ye
N+D+ri
N+D+R+yo
N+D+hu
N+NA
N+d+rI
N+mU
N+yai
N+wau
t+k-i
t+k+r-I
t+k+wa
t+k+se
t+gi
t+nyo
t+Thu
t+tA
t+t+yI
t+t+rU
t+t+wai
t+thau
t+th+y-i
t+n-I
t+n+ya
t+pe
t+p+ri
t+pho
t+mu
t+m+yA
t+yI
t+r+nU
t+sai
t+s+thau
t+s+n-i
t+s+n+y-I
t+s+ma
t+s+m+ye
t+s+yi
t+s+ro
t+s+wu
t+r+yA
t+w+yI
t+k+ShU
th+yai
th+wau
d+g-i
d+g+y-I
d+g+ra
d+g+he
d+g+h+ri
d+dzo
d+du
d+d+yA
d+d+rI
d+d+wU
d+d+hai
d+d+h+nau
d+d+h+y-i
d+d+h+r-I
d+d+h+wa
d+ne
d+bi
d+b+ro
d+b+hu
d+b+h+yA
d+b+h+rI
d+mU
d+yai
d+r+yau
d+w+y-i
d+h-I
d+h+na
d+h+n+ye
d+h+mi
d+h+yo
d+h+ru
d+h+r+yA
d+h+wI
n+kU
n+k+tai
n+g+hau
n+ng-i
n+dz-I
n+dz+ya
n+De
n+ti
n+t+yo
n+t+ru
n+t+r+yA
n+t+wI
n+t+sU
n+thai
n+dau
n+d+d-i
n+d+d+r-I
n+d+yA
n+d+re
n+d+hi
n+d+h+ro
n+d+h+yu
n+nA
n+n+yI
n+pU
n+p+rai
n+phau
n+m-i
n+b+h+y-I
n+tsa
n+ye
n+ri
n+wo
n+w+yu
n+sA
n+s+yI
n+hU
n+h+rai
p+tau
p+t+y-i
p+t+r+y-I
p+da
p+ne
p+n+yi
p+po
p+mu
p+lA
p+wI
p+sU
p+s+n+yai
p+s+wau
p+s+y-i
b+g+h-I
b+dza
b+de
b+d+dzi
b+d+ho
b+d+h+wu
b+tA
b+nI
b+bU
b+b+hai
b+b+h+yau
b+m-i
b+h-I
b+h+Na
b+h+ne
b+h+mi
b+h+yo
b+h+ru
b+h+wA
m+nyI
m+NU
m+nai
m+n+yau
m+p-i
m+p+r-I
m+pha
m+be
m+b+hi
m+b+h+yo
m+mu
m+lA
m+wI
m+sU
m+hai
y+Yau
y+r-i
y+w-I
y+sa
r+khe
r+g+hi
r+g+h+yo
r+ts+yu
r+tshA
r+dz+nyI
r+dz+yU
r+Tai
r+Thau
r+D-i
r+N-I
r+t+wa
r+t+te
r+t+si
r+t+s+no
r+t+s+n+yu
r+thA
r+th+yI
r+d+d+hU
r+d+d+h+yai
r+d+yau
r+d+h-i
r+d+h+m-I
r+d+h+ya
r+d+h+re
r+pi
r+b+po
r+b+bu
r+b+hA
r+m+mI
R+YU
R+Wai
R+shau
R+sh+y-i
R+Sh-I
R+Sh+Na
R+Sh+N+ye
R+Sh+mi
R+Sh+yo
R+su
r+hA
r+k+ShI
l+g+wU
l+b+yai
l+mau
l+y-i
l+w-I
l+la
l+h+we
w+yi
w+ro
w+nu
w+WA
sh+tsI
sh+ts+yU
sh+tshai
sh+Nau
sh+n-i
sh+p-I
sh+b+ya
sh+me
sh+yi
sh+r+yo
sh+lu
sh+w+gA
sh+w+yI
sh+shU
Sh+kai
Sh+k+rau
Sh+T-i
Sh+T+y-I
Sh+T+ra
Sh+T+r+ye
Sh+T+wi
Sh+Tho
Sh+Th+yu
Sh+NA
Sh+N+yI
Sh+DU
Sh+thai
Sh+pau
Sh+p+r-i
Sh+m-I
Sh+ya
Sh+we
Sh+Shi
s+k+so
s+khu
s+ts+yA
s+TI
s+ThU
s+t+yai
s+t+rau
s+t+w-i
s+th-I
s+th+ya
s+n+ye
s+n+wi
s+pho
s+ph+yu
s+yA
s+r+wI
s+sU
s+s+wai
s+hau
s+w+y-i
h+ny-I
h+Na
h+te
h+ni
h+n+yo
h+pu
h+phA
h+mI
h+yU
h+lai
h+sau
h+s+w-i
h+w+y-I
k+Sh+Na
k+Sh+me
k+Sh+m+yi
k+Sh+yo
k+Sh+Ru
k+Sh+lA
k+Sh+wI
a+yU
a+rai
a+r+yau
// Non-Tibetan, Non-Sanskrit words that should be convertible
fava
vafa
vI
fai
vo
fI/ fai/ fo/ fuM
// punctuation
012345678901234
!@#$%&*()_={}:;<>?
a_tsal852ja$)@#%(!Ta)0daM)%!@sa
// test special handling for @#
@# #@ @a#
// Test of [] nesting. Anything ending in t will fail if it isn't "protected" by [].
// Note that this will fail in the simulated typing test until the key binding for ]
// is made to respect nesting.
ra[rat[bat[hat]sat]mat]fa
// !!! We *don't* test X/~X here because they are known not to work.
// English inserts
rta'i [(of horse)] mgrin
// It would be really good to have a test of the \ syntax, but that
// doesn't translate back to itself, so we can't pass it.
<?Reject?>
g+a
+a
Su
S+ta
t+Sa
b~a
ba~
// ??? legal???
pa'am'ang
// is this actually illegal?
pe'as
pe's
p'is
p'e
bskyUMbs
bskyUMbsHgro
xan
tax
qan
taq
shae
aeb
.
+
.y
// Arguable whether these are illegal
g.ra
r.ya
+a
e+ya
ba+
b+
M
~M
Ma
Mam
~Ma
~Mam
tat
sbla
// Skt can't end in consonant
ragyad
// no vowel between ' and g
'garma
// stuff testing handling of ' as root
b'o
r'o
'yo
'wo
// Obscure special case
lng+ta
rSha
// These were rejected by WylieWord 2.0 because they are non-TMW stacks; but they are legal Unicode stacks,
// so we shouldn't reject them (when in Unicode mode).
//
// b+g+ka
// a+ba
// r+ra
// This is legal in Unicode but not TMW:
// b+'u