www/htdocs/TMW_RTF_TO_THDL_WYLIE.html
dchandler 59c398f1ce Added --find-some-non-tm and --find-all-non-tm modes to the converter to
help ensure worry-free TM->TMW conversions.
2003-06-22 00:13:57 +00:00

446 lines
15 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<!-- @author David Chandler -->
<!-- @date-created May 18, 2003 -->
<!-- @editor Emacs, baby! -->
<!--
==============
INSERT KEYWORDS AND DESCRIPTION HERE
==============
-->
<meta name="keywords" content="tibetan fonts, tibetan software, digital ethnography">
<meta name="description" content="This presents intellectual and electronic tools for using Tibetan language in a digital medium and for viewing the THDL site.">
<!--
==============
INSERT PAGE TITLE HERE
==============
In order to facilitate the use of the unicode character set the charset declaration will be set equal to utf-8
Always incorporate Tibetan and Himalayan
-->
<title>Tibetan Machine Web Converter</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">
<!--
==============
THIS IS THE STYLE FOR THE BODY; IT'S WHAT CALLS ON THE BANNER AS A BACKGROUND IMAGE AND SETS UP THE BACKGROUND COLOR. MAKE SURE TO LINK TO THE BANNER HERE.
==============
-->
<style type="text/css">
<!--
body {background-attachment: scroll; background-image: url(http://iris.lib.virginia.edu/tibet/images/bannerTools.gif); background-repeat: no-repeat; background-position: left top; backgroud-color: white}
-->
</style>
<!--
==============
THE STYLE FOR THE TEXT, ETC. IT GOES TO THE TOP-LEVEL STYLE PAGE. IS THERE A NEED FOR EACH COLLECTION TO HAVE INDIVIDUAL STYLE PAGES?
==============
-->
<link rel="stylesheet" href="http://iris.lib.virginia.edu/tibet/style/tools.css">
<!--
==============
MAKES NETSCAPE RELOAD IF THE WINDOW IS RESIZED
==============
-->
<script language="JavaScript">
<!--
function MM_reloadPage(init) { //reloads the window if Nav4 resized
if (init==true) with (navigator) {if ((appName=="Netscape")&&(parseInt(appVersion)==4)) {
document.MM_pgW=innerWidth; document.MM_pgH=innerHeight; onresize=MM_reloadPage; }}
else if (innerWidth!=document.MM_pgW || innerHeight!=document.MM_pgH) location.reload();
}
MM_reloadPage(true);
// -->
</script>
</head>
<!--
==============
SETS THE PAGE MARGINS TO "0" SO THE MENU DOESN'T GET SCREWED UP
==============
-->
<body topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">
<!--
==============
THE JAVASCRIPT MENUS
==============
-->
<script type='text/javascript'>
//HV Menu v5- by Ger Versluis (http://www.burmees.nl/)
//Submitted to Dynamic Drive (http://www.dynamicdrive.com)
//Visit http://www.dynamicdrive.com for this script and more
function Go(){return}
//==============
// --CALL UP THE MENU. THE 1ST OF THE CHOICES HAS THE VARIABLES
// AND NEEDS TO BE UPDATED FOR EACH SECTION.
// MENU_LOADER.JS WILL ALWAYS BE THE SAME.
//==============
</script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/menu_tools.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu//tibet/scripts/menu_loader.js'></script>
<script type="text/javascript" src="http://iris.lib.virginia.edu//tibet/scripts/thdl_scripts.js"></script>
<noscript> Your browser does not support Javascript menus. Please utilize the
Site Map for navigation.</noscript>
<div align="left" style="position:absolute; left:0px; top:0px; width:100px; height:50px; z-index:7">
<a href="http://iris.lib.virginia.edu/tibet/index.html">
<img src="http://iris.lib.virginia.edu/tibet/images/spacer.gif" style="border-width:0" width="100" height="50">
</a>
</div>
<div id="Layer1" style="position:absolute; left:100px; top:28px; width:708px; height:20px; z-index:1">
<!--
==============
INSERT THE BREADCRUMBS
==============
A HREF tags should not be applied to current page, e.g. THDL and Collections get link, Literature and Home do not. If you were on a subpage of Literature, then Literature would link to the Literature home page.
-->
<div align="right"><font color="#000000"><A HREF="http://iris.lib.virginia.edu/tibet/index.html">THDL</A> : <A HREF="http://iris.lib.virginia.edu/tibet/tools/index.html">Tools</A> : <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> : Tibetan Machine Web Converter</a></font>
</div>
</div>
<!--
==============
INSERT THE MENU
==============
-->
<div id='MenuPos' style="text-size:9px; layer-background-color:#CCCCCC; background-color: #CCCCCC; position:absolute; left:0px; top:50px; width:808px"><table width="808" cellpadding="0" cellspacing="0" height="19"><tr><td><p style="font-size:9px">Menu loading...</p></td></tr></table></div>
<!--
==============
MAKE THE LAYER THAT WILL HOLD THE CONTENT OF THE PAGE - THE TEXT, IMAGES, WHATEVER. THIS LAYER WILL BE CLOSED AT THE END OF THIS HTML DOCUMENT
==============
Maximum table width not to exceed 750
All images must have borders of 1 pixel
No image will exceed 325 X 325 height and width measurements
Position attribute on layer3 may need to be changed to absolute to accomodate Netscape
-->
<div id="Layer3" style="position:relative; left:7px; top:80px; width:801px; height:396px; z-index:1; overflow: visible; background-color: #FFFFFF; layer-background-color: #FFFFFF; border: 1px none #000000" >
<table width="750" border="0" cellspacing="0" cellpadding="10">
<tr>
<td valign="top" align="left">
<!--
=============
FOR EVERY "ADMINISTRATIVELY DISTINCT" PROJECT, THERE SHOULD BE A CREDIT FOR THE PROVIDER OF THE INFO
=============
-->
<script langauge="JavaScript">
function openWin(url, name) { popupWin = window.open(url, name,"resizable=1,scrollbars=1,toolbar=0,width=400 ,height=450")
}
</script>
<!--
============
THIS LINK WILL OPEN A SEPARATE WINDOW THAT WILL PROVIDE INFORMATION REGARDING THE ADVISORY BOARD MEMBERS. ALSO PROVIDED IN THE PAGE WILL BE PARTICIPATION AND DONATION INFORMATION.
============
-->
<!--
==============
SETS THE BODY TEXT TO JUSTIFIED
==============
-->
<div align=justify>
<!--
=============
INSERT LINK TO GUIDED TOUR HERE -uncomment when ready.
=============
-->
<!--
==============
INSERT BODY TEXT HERE
The first section of text is the short "introduction" about the Theme and the various discplines that have a vested interest in them.
Design principle: Bold the first few words of this text section.
==============
-->
<h2>Tibetan Machine Web Converter</h2>
<p>
In the same JAR file as Jskad, power users will find a command-line
utility that converts Tibetan documents from one digital
representation to another.&nbsp; The converter embodies the same
technology as Jskad itself, but often works even when Jskad fails
due to Java's presently poor support for viewing RTF
documents.&nbsp; This command-line utility converts a Tibetan
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
either of these three output formats:
</p>
<ul>
<li>RTF files in Unicode</li>
<li>RTF files with the appropriate THDL Extended Wylie (Wylie) used
instead of TMW</li>
<li>RTF files in Tibetan Machine (used in legacy systems)</li>
</ul>
<p>
In addition, this converter can convert Tibetan Machine RTF files to
Tibetan Machine Web RTF files, and takes precautions to ensure that
only a 100% perfect conversion is done in both directions
(TM-&gt;TMW and TMW&gt;TM).&nbsp; One such precaution is that two
independent teams (Garrett and Garson, Chandler) turned the Tibetan
Machine Web <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> into TM&lt;-&gt;TMW tables.&nbsp; These tables
were compared, giving full confidence that the tables are as
accurate as the documentation (which has a <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515">
few flaws</a> itself).&nbsp; That documentation has not been
extensively verified against the actual fonts, however.&nbsp;
Another precaution is that any unknown characters cause the
conversion to fail, and the result is a document containing merely
the unknown characters.&nbsp; (There are some known, illegal glyphs
created by Tibet Doc, and the converter handles the ones it knows of
and treats the rest as unknown.)
</p>
<p>
This converter is smart enough to solve the &quot;curly-brace
problem&quot;, wherein Tahoma '{', '}', and '\' characters appear
instead of the TMW stacks they are supposed to represent.&nbsp; This
problem originates with certain versions of Microsoft Word's Rich
Text Format writing capabilities.
</p>
<p>
Further, this converter gives a polite error message when a given
.rtf file simply cannot be read by the version of Java used.
</p>
<p>
Perhaps most importantly, the converter has a
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
user, confidence that RTF reading and writing idiosyncrasies are not
going to interfere with a flawless conversion.&nbsp; It does so by
printing out the first occurrence of a given character in a non-TMW
font.&nbsp; Here is some example output:
</p>
<pre>
java -cp "c:\my thdl tools\Jskad.jar" \
org.thdl.tib.input.TibetanConverter \
--find-some-non-tmw \
"Dalai Lama Fifth History 01.rtf"
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
</pre>
<p>
Given the above output, you can be sure that a flawless conversion
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
History 01.rtf" &gt; "Dalai Lama Fifth History 01 in THDL Extended
Wylie.rtf"</tt>.&nbsp; (Note that the '&gt;' causes the output to be
directed to the file named thereafter; this is quite handy.)&nbsp;
This is because the only text in the input file besides Tibetan is
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
they are symptoms of the &quot;curly-brace problem&quot;.
</p>
<h3>Failed Conversions</h3>
<p>
In this section, you'll learn how to tell if a conversion has
succeeded in full, ran into minor problems, or failed altogether.
</p>
<h4>TMW to Wylie</h4>
<p>
Note that some TMW glyphs have no transliteration in Exteded
Wylie.&nbsp; When you encounter such a glyph, you'll find
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
the troublesome glyph comes from and YYY is the decimal number of
the glyph in that font (which is a number between 000 and 255
inclusive, usually between 33 and 126).&nbsp; The following are
values corresponding to X:
</p>
<ul>
<li>
When X is 0, the TibetanMachineWeb font contains the glyph.
</li>
<li>
When X is 1, the TibetanMachineWeb1 font contains the glyph.
</li>
<li>
When X is 2, the TibetanMachineWeb2 font contains the glyph.
</li>
<li>
When X is 3, the TibetanMachineWeb3 font contains the glyph.
</li>
<li>
When X is 4, the TibetanMachineWeb4 font contains the glyph.
</li>
<li>
When X is 5, the TibetanMachineWeb5 font contains the glyph.
</li>
<li>
When X is 6, the TibetanMachineWeb6 font contains the glyph.
</li>
<li>
When X is 7, the TibetanMachineWeb7 font contains the glyph.
</li>
<li>
When X is 8, the TibetanMachineWeb8 font contains the glyph.
</li>
<li>
When X is 9, the TibetanMachineWeb9 font contains the glyph.
</li>
</ul>
<p>
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, you should
consult the <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> for the specific TMW font named.&nbsp; Find the
glyph (by its YYY value) and decide how to proceed.&nbsp; If you
find a glyph that you believe should have been converted into
Extended Wylie by the tool, please report this as a bug through the
SourceForge website or via e-mail.
</p>
<h4>Other Conversions</h4>
<p>
The other conversions are all-or-nothing.&nbsp; That is, if you run
into any trouble whatsoever, the result will be a file containing
just the problematic glyphs.&nbsp; If your result is as long as your
input, then the conversion went flawlessly.
</p>
<p>
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
that has no Tibetan Machine equivalent.&nbsp; This glyph is the only
TMW glyph that can cause a TMW-&gt;TM conversion to fail.&nbsp; It
is fairly common, though, especially if you've used Jskad to prepare
your document.&nbsp; It might be appropriate to change the document
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
does have a TM equivalent.
</p>
<p>
You might consider using Jskad to convert documents that give
errors, as it has better error reporting and can tell you just
what's wrong.
</p>
<p>
If you ever encounter problems in a TM-&gt;TMW conversion, please
send us mail with the error report (and the problem input document's
resulting document) so that we can improve our tools.
</p>
<h3>Invoking the Converter</h3>
<p>
First add Jskad.jar to your CLASSPATH.&nbsp; You can do this by
setting an environment variable CLASSPATH to contain the absolute
path of the Jskad.jar file and then running the command <tt>java
org.thdl.tib.input.TibetanConverter</tt>.&nbsp; Alternatively, you
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
org.thdl.tib.input.TibetanConverter</code> where you put in the
appropriate path to Jskad.jar.&nbsp; You will see usage information
appear if you do this correctly; you'll see a message like
<code>java.lang.NoClassDefFoundError:
org/thdl/tib/input/TibetanConverter; Exception in thread
"main"</code> if you've not correctly told Java where to find
Jskad.jar.
</p>
<h3><a name="knownbugs"></a>Known Bugs</h3>
<p>
All known bugs are listed in this section.&nbsp; They're more likely
to be fixed if users complain, so complain away.
</p>
<p>
First, if the TMW given is not syntactically legal, then the Wylie
that results will not necessarily yield, if imported into Jskad, the
same Tibetan with which the converter started.&nbsp; The glyphs
corresponding to the Wylie 'jaskadaskeda' have this problem, for
example.
</p>
<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">
e-mail us</a>
your comments about this page.
</p>
<p>
The
<a target="_blank" href="http://www.sourceforge.net/projects/thdltools">
THDL Tools</a>
project is generously hosted by:
<!--
DO NOT DELETE THE SF.NET LOGO.
We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".
-->
<a target="_blank" href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&amp;type=1"
width="88" height="31" border="0" alt="SourceForge Logo">
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>
</div>
</td>
</tr>
</table>
</div>
</body>
</html>