up that String into tsheg bars, punctuation, etc., while finding
errors. I've tested it some, but I'm not yet committing the tests.
Next step: a converter that takes an ACIP file as input and outputs
TMW+Latin.
and it has the capability to produce error messages and warnings that
make sense to the user. One can now get the correct parse, if one
exists, for an ACIP tsheg bar.
One could even feed in ACIP and get a list of warnings about things as
innocuous as PADMA, which a dumb converter would have trouble with.
One could then turn ACIP into well-behaved ACIP for that dumb
converter, if you really wanted to.
Still to do:
o Scan ACIP files into tsheg bars.
o Produce TMW/Latin (from which you can get Unicode, etc.).
o E-mail the illegal tsheg bars to the ACIP fellows so they can fix
the affected documents (most of the Kangyur has unparseable
creatures).