www/htdocs/TMW_RTF_TO_THDL_WYLIE.html
2003-11-01 04:50:02 +00:00

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- @author David Chandler -->
<!-- @date-created May 18, 2003 -->
<!-- @editor Emacs, baby! -->
<head>
<title>Tibetan Machine Web Converter</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
<link rel="stylesheet" type="text/css" href="http://iris.lib.virginia.edu/tibet/style/thdl-styles.css"/>
</head>
<body>
<div id="banner">
<a id="logo" href="http://iris.lib.virginia.edu/tibet/index.html"><img id="test" alt="THDL Logo" src="http://iris.lib.virginia.edu/tibet/images/logo.png"/></a>
<h1>The Tibetan &amp; Himalayan Digital Library</h1>
<div id="menubar">
<script type='text/javascript'>function Go(){return}</script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/thdl_menu_config.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu_new.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu9_com.js'></script>
<noscript><p>Your browser does not support javascript.</p></noscript>
<div id='MenuPos' >Menu Loading... </div>
</div><!--END menubar-->
</div><!--END banner-->
<div id="sub_banner">
<div id="search">
<form method="get" action="http://www.google.com/u/thdl">
<p>
<input type="text" name="q" id="q" size="15" maxlength="255" value="" />
<input type="submit" name="sa" id="sa" value="Search"/>
<input type="hidden" name="hq" id="hq" value="inurl:iris.lib.virginia.edu"/>
</p>
</form>
</div>
<div id="breadcrumbs">
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> &gt; <a href="index.html">Tools</a> &gt; <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> &gt; Nightly Builds
</div>
</div><!--END sub_banner-->
<div id="main">
<h2>Tibetan Machine Web Converter</h2>
<p>
In recent versions of Jskad, the 'Tools' menu has an option 'Launch
Converter...'.&nbsp; If you use that option, you will find a
first-class Tibetan-to-Tibetan and Tibetan-to-Wylie converter.&nbsp;
That converter has a user-friendly graphical interface, and it tells you
when things go wrong (even things as subtle as your having selected
the wrong conversion).&nbsp; If you need a command-line interface to
that converter, however, read on.
</p>
<p>
In the same JAR file as Jskad, power users will find a command-line
utility that converts Tibetan documents from one digital
representation to another.&nbsp; The converter embodies the same
technology as Jskad itself, but often works even when Jskad fails
due to Java's presently poor support for viewing RTF
documents.&nbsp; This command-line utility converts a Tibetan
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
any of these three output formats:
</p>
<ul>
<li>RTF files in Unicode</li>
<li>RTF files with the appropriate THDL Extended Wylie (Wylie) used
instead of TMW</li>
<li>RTF files in Tibetan Machine (used in legacy systems)</li>
</ul>
<p>
In addition, this converter can convert Tibetan Machine RTF files to
Tibetan Machine Web RTF files, and takes precautions to ensure that
only a fully accurate conversion is performed in either direction
(TM-&gt;TMW and TMW-&gt;TM).&nbsp; One such precaution is that two
independent teams (Garrett and Garson, Chandler) turned the Tibetan
Machine Web <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> into TM&lt;-&gt;TMW tables.&nbsp; These tables
were compared, giving full confidence that the tables are as
accurate as the documentation (which has a <a
href="http://sourceforge.net/tracker/index.php?func=detail&amp;aid=746871&amp;group_id=61934&amp;atid=502515">
few flaws</a> itself).&nbsp; That documentation has not been
extensively verified against the actual fonts, however.&nbsp;
Another precaution is that any unknown characters cause the
conversion to fail, and the result is a document containing merely
the unknown characters.&nbsp; (There are some known, illegal glyphs
created by Tibet Doc, and the converter handles the ones it knows of
and treats the rest as unknown.)
</p>
<p>
This converter is smart enough to solve the &quot;curly-brace
problem&quot;, wherein Tahoma '{', '}', and '\' characters appear
instead of the TMW stacks they are supposed to represent.&nbsp; This
problem originates with certain versions of Microsoft Word's Rich
Text Format writing capabilities.
</p>
<p>
Further, this converter gives a polite error message when a given
.rtf file simply cannot be read by the version of Java used.
</p>
<p>
Perhaps most importantly, the converter has a
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
user, confidence that RTF reading and writing idiosyncrasies are not
going to interfere with a flawless conversion.&nbsp; It does so by
printing out the first occurrence of a given character in a non-TMW
font.&nbsp; Here is some example output:
</p>
<pre>
java -cp "c:\my thdl tools\Jskad.jar" \
org.thdl.tib.input.TibetanConverter \
--find-some-non-tmw \
"Dalai Lama Fifth History 01.rtf"
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
</pre>
<p>
Given the above output, you can be sure that a flawless conversion
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
History 01.rtf" &gt; "Dalai Lama Fifth History 01 in THDL Extended
Wylie.rtf"</tt>.&nbsp; (Note that the '&gt;' causes the output to be
directed to the file named thereafter; this is quite handy.)&nbsp;
This is because the only text in the input file besides Tibetan is
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
they are symptoms of the &quot;curly-brace problem&quot;.
</p>
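<p>
The check described above can be automated.&nbsp; The following is a
minimal sketch, not part of Jskad: the class <tt>NonTmwReport</tt> and
its method <tt>isHarmless</tt> are hypothetical names, and the line
format it matches is inferred from the sample output above, so it may
need adjusting for other versions of the converter.&nbsp; It treats
whitespace in any font, and the Tahoma characters of the curly-brace
problem, as harmless, and flags everything else.
</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonTmwReport {
    // Shape of one report line, inferred from the sample output (an assumption).
    private static final Pattern LINE = Pattern.compile(
        "Non-TMW character .* \\[decimal (\\d+)\\] in the font (.+) appears first at location (\\d+)");

    // True for whitespace in any font and for Tahoma '{', '}', and '\'.
    static boolean isHarmless(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return false; // unrecognized lines are treated as suspicious
        }
        int code = Integer.parseInt(m.group(1));
        String font = m.group(2);
        if (Character.isWhitespace(code)) {
            return true;
        }
        if (font.equals("Tahoma")) {
            // Symptoms of the curly-brace problem, which the converter understands.
            return code == '{' || code == '}' || code == '\\';
        }
        return false;
    }

    // Reads a --find-some-non-tmw report on standard input and prints suspicious lines.
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    if (!isHarmless(line)) {
                        System.out.println("Check by hand: " + line);
                    }
                }
            }
        }
    }
}
```

<p>
If the program prints nothing for a given report, the report contains
only the kinds of non-TMW text discussed above.
</p>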
<h3>Failed Conversions</h3>
<p>
In this section, you'll learn how to tell whether a conversion has
succeeded in full, run into minor problems, or failed altogether.
</p>
<h4>TMW to Wylie</h4>
<p>
<span style="color: red">
This section runs ahead of the software: it documents plans for the
future.&nbsp; At present, an error message like
<code>&lt;&lt;[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot
convert DuffCode &lt;duffcode font=TibetanMachineWeb7 charNum=72
character=H/&gt; to THDL Extended Wylie. Please see the
documentation for the TMW font and transcribe this
yourself.]]&gt;&gt;</code> appears.
</span>
</p>
<p>
Note that some TMW glyphs have no transliteration in Extended
Wylie.&nbsp; When you encounter such a glyph, you'll find
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
the troublesome glyph comes from and YYY is the decimal number of
the glyph in that font (which is a number between 000 and 255
inclusive, usually between 33 and 126).&nbsp; The following are
values corresponding to X:
</p>
<ul>
<li>
When X is 0, the TibetanMachineWeb font contains the glyph.
</li>
<li>
When X is 1, the TibetanMachineWeb1 font contains the glyph.
</li>
<li>
When X is 2, the TibetanMachineWeb2 font contains the glyph.
</li>
<li>
When X is 3, the TibetanMachineWeb3 font contains the glyph.
</li>
<li>
When X is 4, the TibetanMachineWeb4 font contains the glyph.
</li>
<li>
When X is 5, the TibetanMachineWeb5 font contains the glyph.
</li>
<li>
When X is 6, the TibetanMachineWeb6 font contains the glyph.
</li>
<li>
When X is 7, the TibetanMachineWeb7 font contains the glyph.
</li>
<li>
When X is 8, the TibetanMachineWeb8 font contains the glyph.
</li>
<li>
When X is 9, the TibetanMachineWeb9 font contains the glyph.
</li>
</ul>
<p>
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, you should
consult the <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> for the specific TMW font named.&nbsp; Find the
glyph (by its YYY value) and decide how to proceed.&nbsp; If you
find a glyph that you believe should have been converted into
Extended Wylie by the tool, please report this as a bug through the
SourceForge website or via e-mail.
</p>
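<p>
To make the <tt>\tmwXYYY</tt> notation concrete, here is a minimal
sketch that decodes such sequences into a font name and a decimal
glyph number.&nbsp; The class <tt>TmwEscape</tt> and its methods are
hypothetical and are not part of Jskad.&nbsp; The sketch assumes you
have the converter's output as plain text; in raw RTF the backslash
may itself be escaped, which this code does not handle.
</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TmwEscape {
    // A backslash, "tmw", one font digit X, then three glyph digits YYY.
    private static final Pattern ESCAPE = Pattern.compile("\\\\tmw(\\d)(\\d{3})");

    // X = 0 names TibetanMachineWeb; X = 1..9 names TibetanMachineWeb1..9.
    static String fontName(int x) {
        return x == 0 ? "TibetanMachineWeb" : "TibetanMachineWeb" + x;
    }

    // Turns one \tmwXYYY sequence into "font, glyph N".
    static String describe(String escape) {
        Matcher m = ESCAPE.matcher(escape);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a \\tmwXYYY sequence: " + escape);
        }
        int x = Integer.parseInt(m.group(1));
        int glyph = Integer.parseInt(m.group(2));
        if (glyph > 255) {
            throw new IllegalArgumentException("Glyph number out of range: " + glyph);
        }
        return fontName(x) + ", glyph " + glyph;
    }

    // Lists every \tmwXYYY sequence found in some sample text.
    public static void main(String[] args) {
        String sample = "bod \\tmw7091 skad";
        Matcher m = ESCAPE.matcher(sample);
        while (m.find()) {
            System.out.println(m.group() + " = " + describe(m.group()));
        }
    }
}
```

<p>
Run as is, the sketch reports that <tt>\tmw7091</tt> is
TibetanMachineWeb7, glyph 91.
</p>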
<h4>Other Conversions</h4>
<p>
The other conversions are all-or-nothing.&nbsp; That is, if you run
into any trouble whatsoever, the result will be a file containing
just the problematic glyphs, each preceded by achen (i.e., U+0F68,
the letter whose THDL Extended Wylie representation is 'a').&nbsp;
These glyphs will be bracketed on the left by U+0F3C (for which the
THDL Extended Wylie is '(') and on the right by U+0F3D (for which
the THDL Extended Wylie is ')').&nbsp; If your result is as long as
your input, then the conversion went flawlessly.
</p>
<p>
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
that has no Tibetan Machine equivalent.&nbsp; This glyph is the only
TMW glyph that can cause a TMW-&gt;TM conversion to fail.&nbsp; It
is fairly common, though, especially if you've used Jskad to prepare
your document.&nbsp; It might be appropriate to change the document
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
does have a TM equivalent.
</p>
<p>
You might consider using Jskad to convert documents that give
errors, as it has better error reporting and can tell you just
what's wrong.
</p>
<p>
If you ever encounter problems in a TM-&gt;TMW conversion, please
send us mail with the error report (and the document that resulted
from the problematic input) so that we can improve our tools.
</p>
<h3>Invoking the Converter</h3>
<p>
First add Jskad.jar to your CLASSPATH.&nbsp; You can do this by
setting an environment variable CLASSPATH to contain the absolute
path of the Jskad.jar file and then running the command <tt>java
org.thdl.tib.input.TibetanConverter</tt>.&nbsp; Alternatively, you
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
org.thdl.tib.input.TibetanConverter</code> where you put in the
appropriate path to Jskad.jar.&nbsp; You will see usage information
appear if you do this correctly; you'll see a message like
<code>java.lang.NoClassDefFoundError:
org/thdl/tib/input/TibetanConverter; Exception in thread
"main"</code> if you've not correctly told Java where to find
Jskad.jar.
</p>
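<p>
If you want to run the converter from another Java program rather
than from a shell, something like the following sketch may
help.&nbsp; The class <tt>RunConverter</tt> and its method
<tt>toWylie</tt> are hypothetical; only the converter's class name
and the <tt>--to-wylie</tt> option come from this page.&nbsp; The
sketch assumes that <tt>java</tt> is on your PATH and sends the
converter's standard output to a file, just as '&gt;' does on the
command line.
</p>

```java
import java.io.File;
import java.io.IOException;

public class RunConverter {
    // Builds the same command line shown on this page, with output sent to a file.
    static ProcessBuilder toWylie(String jarPath, String inputRtf, String outputRtf) {
        ProcessBuilder pb = new ProcessBuilder(
            "java", "-cp", jarPath,
            "org.thdl.tib.input.TibetanConverter",
            "--to-wylie", inputRtf);
        pb.redirectOutput(new File(outputRtf));            // plays the role of '>' in a shell
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // let error messages reach the console
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.err.println("usage: java RunConverter JAR INPUT OUTPUT");
            return;
        }
        int status = toWylie(args[0], args[1], args[2]).start().waitFor();
        // This page does not document the converter's exit statuses, so just report the value.
        System.out.println("Converter exited with status " + status);
    }
}
```

<p>
Because each argument is passed separately, paths that contain
spaces need no extra quoting.
</p>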
<h3><a name="knownbugs"></a>Known Bugs</h3>
<p>
All known bugs are listed in this section.&nbsp; They're more likely
to be fixed if users complain, so complain away.
</p>
<p>
There are no known bugs at present.
</p>
<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">
e-mail us</a>
your comments about this page.
</p>
<p>
The
<a href="http://www.sourceforge.net/projects/thdltools">
THDL Tools</a>
project is generously hosted by:
<!--
DO NOT DELETE THE SF.NET LOGO.
We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".
-->
<a href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&amp;type=1"
width="88" height="31" alt="SourceForge Logo" />
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>
</div>
</body>
</html>