// Test cases for Wylie -> [other representation] -> Wylie round-trip conversion. // // Each line in the file is a comment, blank, or one test case. // // A comment line starts with //. // // Each test case is understood as EWTS; it should convert back into itself. // The following test cases are from DLC's DuffPaneTest. tsha tsa dza sha nga nag nga / bkra shis bde legs/ sgom pa'am sgom pe'am le'u'i'o la'u'i'o/la'am/pa'ang/pe'ang bras dwa gwa gyug g.yag gyag g.yas gyas 'gas gangs gnags 'byung 'byungs blags mnags gdams 'gams // All the compound native-Tibetan stacks; list taken from http://iris.lib.virginia.edu/tibet/collections/langling/tibstacks.html rka rga rnga rja rnya rta rda rna rba rma rtsa rdza lka lga lnga lca lja lta lda lpa lba lha ska sga snga snya sta sda sna spa sba sma stsa kwa khwa gwa cwa nywa twa dwa tswa tshwa zhwa zwa rwa shwa swa hwa kya khya gya pya phya bya mya kra khra gra tra thra dra pra phra bra mra shra sra hra kla gla bla zla rla sla rkya rgya rmya rgwa rtswa skya sgya spya sbya smya skra sgra snra spra sbra smra grwa drwa phywa // Random things known to be hard // Difficult to parse deterministically brgyud // Featuritis brtswand // the only other case of superscript + root + wazur rgwug // two-letter suffix (not to be treated as n suffix + g second-suffix) dbang // plus second sufix dbangs // A common word that is not grammatical, according to some // sources, and that is also hard because br- looks like // b should be the root. (Actually r is, but it's pronounced "lap"!) brlabs // One of the few native Tibetan words with nr in it. snrubs // "'is" is the form of kyis used after bare vowel or a chung re'u'is he'is // Bare vowels a i 'o 'a // Bare polyvowels; not clear if these are really legal, but: a'o i'u'i 'o'i // things that at one time or another tickled bugs in WW code gyurd ga*ra gangs 'gram la'i a'i od bem a ra lnga lang mkha'i mkhe'i khe'i dam // A-chung as root 'a 'od 'angs 'ag 'ad // The exceptions to the three-letter root rule (that the second letter is normally the root): rags lags nags bags bangs gangs rangs langs mangs sangs babs rabs rams nams // From Chris Fynn's list of three-letter root ambiguities (see SourceForge bug #614464), // these are the ones not covered above: dgas 'gas dngas gnad mnad dbas 'bas mags dmas // This is my old root-letter-capitalization and pronunciation test suite. // It's should also be good for tickling conversion bugs. grwa bsgrubs bzlog brlabs sprul // this one is important because it's an exception to the rule that in a // row of three consonants, the second is almost always the root. shabs shes ldan mgon rgyud ten rigs chos dbang dbying dbyang spyang phyug byang sbyin myur po'i mri srung mkha' 'gro rta'i khwa rdzogs rdo rje zla bar ngan rnga g.yo gyo dpa' 'dag klu khyung // These are from DLC's PackageTest brtan blta blag brag bsabs zungs brtib spyoms sku'i bskyangs ag bda' dbang dga' dgra dmar gda' // Sanskrit // Tests from DLC ai heM hiM h-iM haiM hoM hauM hUM pad+me D+h+D+ha gaD+h+D+ha autapa parItshatsawa ke: ka: n+yadA ak+Sha Asa AT+Ta b+da d+ba d+ga d+g+ra d+gara // Other Sanskrit rak+Shasa DAkinI haM pad+ma samuts+tsaya gang+gA sid+d+hi shrI sat+t+wa sarma hU~M`! wa~M`ba~MthangumaM // Sanskrit bug ticklers ragyada 'agarma gayaradasa // This has been discussed on tibetscript@list.mail.virginia.edu. Currently WW gets it wrong. // Plausibly WW can be fixed by disallowing b as a prefix for r unless it has a subscript // (maybe unless it has an l subscript specifically). baraga // There's special-case code for lng in Sanskrit lngaTa // R+, +W, +Y, +R paN+D+R+yau k+Sh+RUM kaR+Wabi waw+Wa fuN+D+Yo yay+Yaya R+Yi // Taken from THDL Sechen text collection dashadigAMsarbatimirabi ni saranan+nAmashrIguh+yagar+b+hatatwaMsh+ts+yayastanataras+yaTa'ikAbiharatisma shrI tan+t+radzamAyadzalasamudAyArtas+yadwArAt+nish+tsitasub+haShitaMguh+yapatimukhAgamanAmabiharatisma guh+ya pa ti sa shad+h+ya laM kA ra nA ma shrA iguh+ya gar+b+ha ta twa pa in ish+tsa ya tan+t+ra s+ya pr-i t+t-i bi ha rti sma // Bare vowels: A I U -i -I // leading weird vowels aiga -Iba // two or more vowels in a row a.a // A.I.ai !!! WW is known not to be able to deal with .- ai.a I.a A.a A.A a.a // ai.-i !!! WW is known not to be able to deal with .- -I.e he.a ha.e ha.a hai.a ha.ai hU.a ha.A ha.U // fun with weird vowels; these test cases look similar, but actually mostly go through different paths // in the (wretchedly convoluted) code. hA.A hA.a hA.e hA.ai // hA.-i !!! WW is known not to be able to deal with .- hU.A hU.a hU.e hU.ai // hU.-i !!! WW is known not to be able to deal with .- hAMA hAMa hAMe hAMai hAM-i hAMu hA.Ada hA.ada hA.eda hA.aida hiMa ho~M`a // hA.-ida !!! WW is known not to be able to deal with .- // M,H on bare vowel aiM eM U~M au~M` aH aiH // bindus in the middle of a word gaMb+ha hUMhU~MhU~M` aMu~MraH raMH // Semivowel as subscript. hr-I kl-iba // Difficult because ny+d is not legal ny+dza raH hai hau // One instance of every stack in tibwn.ini, with sequential vowels attached k+Sha k+ke k+khi k+ngo k+tsu k+tA k+t+yI k+t+rU k+t+r+yai k+t+wau k+th-i k+th+y-I k+Na k+ne k+n+yi k+pho k+mu k+m+yA k+r+yI k+w+yU k+shai k+sau k+s+n-i k+s+m-I k+s+ya k+s+we kh+khi kh+no kh+lu g+gA g+g+hI g+nyU g+dai g+d+hau g+d+h+y-i g+d+h+w-I g+na g+n+ye g+pi g+b+ho g+b+h+yu g+mA g+m+yI g+r+yU g+hai g+h+g+hau g+h+ny-i g+h+n-I g+h+n+ya g+h+me g+h+li g+h+yo g+h+ru g+h+wA ng+kI ng+k+tU ng+k+t+yai ng+k+yau ng+kh-i ng+kh+y-I ng+ga ng+g+re ng+g+yi ng+g+ho ng+g+h+yu ng+g+h+rA ng+ngI ng+tU ng+nai ng+mau ng+y-i ng+l-I ng+sha ng+he ng+k+Shi ng+k+Sh+wo ng+k+Sh+yu ts+tsA ts+tshI ts+tsh+wU ts+tsh+rai ts+nyau ts+n+y-i ts+m-I ts+ya ts+re ts+li ts+h+yo tsh+thu tsh+tshA tsh+yI tsh+rU tsh+lai dz+dzau dz+dz+ny-i dz+dz+w-I dz+dz+ha dz+h+dz+he dz+nyi dz+ny+yo dz+nu dz+n+wA dz+mI dz+yU dz+rai dz+wau dz+h-i dz+h+y-I dz+h+ra dz+h+le dz+h+wi ny+tso ny+ts+mu ny+ts+yA ny+tshI ny+dzU ny+dz+yai ny+dz+hau ny+ny-i ny+p-I ny+pha ny+ye ny+ri ny+lo ny+shu T+kA T+TI T+T+hU T+nai T+pau T+m-i T+y-I T+wa T+se Th+yi Th+ro D+gu D+g+yA D+g+hI D+g+h+rU D+Dai D+D+hau D+D+h+y-i D+n-I D+ma D+ye D+ri D+wo D+hu D+h+D+hA D+h+mI D+h+yU D+h+rai D+h+wau N+T-i N+Th-I N+Da N+D+Ye N+D+ri N+D+R+yo N+D+hu N+NA N+d+rI N+mU N+yai N+wau t+k-i t+k+r-I t+k+wa t+k+se t+gi t+nyo t+Thu t+tA t+t+yI t+t+rU t+t+wai t+thau t+th+y-i t+n-I t+n+ya t+pe t+p+ri t+pho t+mu t+m+yA t+yI t+r+nU t+sai t+s+thau t+s+n-i t+s+n+y-I t+s+ma t+s+m+ye t+s+yi t+s+ro t+s+wu t+r+yA t+w+yI t+k+ShU th+yai th+wau d+g-i d+g+y-I d+g+ra d+g+he d+g+h+ri d+dzo d+du d+d+yA d+d+rI d+d+wU d+d+hai d+d+h+nau d+d+h+y-i d+d+h+r-I d+d+h+wa d+ne d+bi d+b+ro d+b+hu d+b+h+yA d+b+h+rI d+mU d+yai d+r+yau d+w+y-i d+h-I d+h+na d+h+n+ye d+h+mi d+h+yo d+h+ru d+h+r+yA d+h+wI n+kU n+k+tai n+g+hau n+ng-i n+dz-I n+dz+ya n+De n+ti n+t+yo n+t+ru n+t+r+yA n+t+wI n+t+sU n+thai n+dau n+d+d-i n+d+d+r-I n+d+yA n+d+re n+d+hi n+d+h+ro n+d+h+yu n+nA n+n+yI n+pU n+p+rai n+phau n+m-i n+b+h+y-I n+tsa n+ye n+ri n+wo n+w+yu n+sA n+s+yI n+hU n+h+rai p+tau p+t+y-i p+t+r+y-I p+da p+ne p+n+yi p+po p+mu p+lA p+wI p+sU p+s+n+yai p+s+wau p+s+y-i b+g+h-I b+dza b+de b+d+dzi b+d+ho b+d+h+wu b+tA b+nI b+bU b+b+hai b+b+h+yau b+m-i b+h-I b+h+Na b+h+ne b+h+mi b+h+yo b+h+ru b+h+wA m+nyI m+NU m+nai m+n+yau m+p-i m+p+r-I m+pha m+be m+b+hi m+b+h+yo m+mu m+lA m+wI m+sU m+hai y+Yau y+r-i y+w-I y+sa r+khe r+g+hi r+g+h+yo r+ts+yu r+tshA r+dz+nyI r+dz+yU r+Tai r+Thau r+D-i r+N-I r+t+wa r+t+te r+t+si r+t+s+no r+t+s+n+yu r+thA r+th+yI r+d+d+hU r+d+d+h+yai r+d+yau r+d+h-i r+d+h+m-I r+d+h+ya r+d+h+re r+pi r+b+po r+b+bu r+b+hA r+m+mI R+YU R+Wai R+shau R+sh+y-i R+Sh-I R+Sh+Na R+Sh+N+ye R+Sh+mi R+Sh+yo R+su r+hA r+k+ShI l+g+wU l+b+yai l+mau l+y-i l+w-I l+la l+h+we w+yi w+ro w+nu w+WA sh+tsI sh+ts+yU sh+tshai sh+Nau sh+n-i sh+p-I sh+b+ya sh+me sh+yi sh+r+yo sh+lu sh+w+gA sh+w+yI sh+shU Sh+kai Sh+k+rau Sh+T-i Sh+T+y-I Sh+T+ra Sh+T+r+ye Sh+T+wi Sh+Tho Sh+Th+yu Sh+NA Sh+N+yI Sh+DU Sh+thai Sh+pau Sh+p+r-i Sh+m-I Sh+ya Sh+we Sh+Shi s+k+so s+khu s+ts+yA s+TI s+ThU s+t+yai s+t+rau s+t+w-i s+th-I s+th+ya s+n+ye s+n+wi s+pho s+ph+yu s+yA s+r+wI s+sU s+s+wai s+hau s+w+y-i h+ny-I h+Na h+te h+ni h+n+yo h+pu h+phA h+mI h+yU h+lai h+sau h+s+w-i h+w+y-I k+Sh+Na k+Sh+me k+Sh+m+yi k+Sh+yo k+Sh+Ru k+Sh+lA k+Sh+wI a+yU a+rai a+r+yau // Non-Tibetan, Non-Sanskrit words that should be convertible fava vafa vI fai vo fI/ fai/ fo/ fuM // Punctuation. Underscore is not tested because it (correctly) turns into [ ] in round-trip testing. // 012345678901234 !@#$%&*()={}:;<>? a1tsal852ja$)@#%(!Ta)0daM)%!@sa // test special handing for @# @# #@ @a# // Test of [] nesting. Anything ending in t will fail if it isn't "protected" by []. // Note that this will fail in the simulated typing test until the key binding for ] // is made to respect nesting. ra[rat[bat[hat]sat]mat]fa // !!! We *don't* test X/~X here because they are known not to work. // English inserts rta'i [(of horse)] mgrin // It would be really good to have a test of the \ syntax, but that // doesn't translate back to itself, so we can't pass it. g+a +a Su S+ta t+Sa b~a ba~ h- // ??? legal??? pa'am'ang // is this actually illegal? pe'as pe's p'is p'e bskyUMbs bskyUMbsHgro xan tax qan taq shae aeb . + .y // Arguable if these are illegal g.ra r.ya +a e+ya ba+ b+ M ~M Ma Mam ~Ma ~Mam tat sbla // Skt can't end in consonant ragyad // no vowel between ' and g 'garma // stuff testing handling of ' as root b'o r'o 'yo 'wo // Obscure special case lng+ta rSha // These were rejected by WylieWord 2.0 because they are non-TMW stacks; but they are legal Unicode stacks, // so we shouldn't reject them (when in Unicode mode). // // b+g+ka // a+ba // r+ra // This is legal in Unicode but not TMW: // b+'u