<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<!-- @author David Chandler -->
<!-- @date-created May 18, 2003 -->
<!-- @editor Emacs, baby! -->

<!--
==============
INSERT KEYWORDS AND DESCRIPTION HERE
==============
-->

<meta name="keywords" content="tibetan fonts, tibetan software, digital ethnography">

<meta name="description" content="This presents intellectual and electronic tools for using Tibetan language in a digital medium and for viewing the THDL site.">

<!--
==============
INSERT PAGE TITLE HERE
==============
In order to facilitate the use of the Unicode character set, the charset declaration will be set equal to utf-8
Always incorporate Tibetan and Himalayan
-->

<title>Tibetan Machine Web Converter</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<!--
==============
THIS IS THE STYLE FOR THE BODY; IT'S WHAT CALLS ON THE BANNER AS A BACKGROUND IMAGE AND SETS UP THE BACKGROUND COLOR. MAKE SURE TO LINK TO THE BANNER HERE.
==============
-->

<style type="text/css">
<!--
body {background-attachment: scroll; background-image: url(http://iris.lib.virginia.edu/tibet/images/bannerTools.gif); background-repeat: no-repeat; background-position: left top; background-color: white}
-->
</style>

<!--
==============
THE STYLE FOR THE TEXT, ETC. IT GOES TO THE TOP-LEVEL STYLE PAGE. IS THERE A NEED FOR EACH COLLECTION TO HAVE INDIVIDUAL STYLE PAGES?
==============
-->

<link rel="stylesheet" href="http://iris.lib.virginia.edu/tibet/style/tools.css">

<!--
==============
MAKES NETSCAPE RELOAD IF THE WINDOW IS RESIZED
==============
-->

<script language="JavaScript">
<!--
function MM_reloadPage(init) { //reloads the window if Nav4 resized
if (init==true) with (navigator) {if ((appName=="Netscape")&&(parseInt(appVersion)==4)) {
document.MM_pgW=innerWidth; document.MM_pgH=innerHeight; onresize=MM_reloadPage; }}
else if (innerWidth!=document.MM_pgW || innerHeight!=document.MM_pgH) location.reload();
}
MM_reloadPage(true);
// -->
</script>
</head>

<!--
==============
SETS THE PAGE MARGINS TO "0" SO THE MENU DOESN'T GET SCREWED UP
==============
-->

<body topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">

<!--
==============
THE JAVASCRIPT MENUS
==============
-->

<script type='text/javascript'>
//HV Menu v5- by Ger Versluis (http://www.burmees.nl/)
//Submitted to Dynamic Drive (http://www.dynamicdrive.com)
//Visit http://www.dynamicdrive.com for this script and more

function Go(){return}

//==============
// --CALL UP THE MENU. THE 1ST OF THE CHOICES HAS THE VARIABLES
// AND NEEDS TO BE UPDATED FOR EACH SECTION.
// MENU_LOADER.JS WILL ALWAYS BE THE SAME.
//==============

</script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/menu_tools.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu//tibet/scripts/menu_loader.js'></script>
<script type="text/javascript" src="http://iris.lib.virginia.edu//tibet/scripts/thdl_scripts.js"></script>
<noscript> Your browser does not support JavaScript menus. Please use the
Site Map for navigation.</noscript>

<div align="left" style="position:absolute; left:0px; top:0px; width:100px; height:50px; z-index:7">
<a href="http://iris.lib.virginia.edu/tibet/index.html">
<img src="http://iris.lib.virginia.edu/tibet/images/spacer.gif" style="border-width:0" width="100" height="50">
</a>
</div>
<div id="Layer1" style="position:absolute; left:100px; top:28px; width:708px; height:20px; z-index:1">

<!--
==============
INSERT THE BREADCRUMBS
==============
A HREF tags should not be applied to current page, e.g. THDL and Collections get link, Literature and Home do not. If you were on a subpage of Literature, then Literature would link to the Literature home page.
-->

<div align="right"><font color="#000000"><A HREF="http://iris.lib.virginia.edu/tibet/index.html">THDL</A> : <A HREF="http://iris.lib.virginia.edu/tibet/tools/index.html">Tools</A> : <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> : Tibetan Machine Web Converter</font>
</div>
</div>

<!--
==============
INSERT THE MENU
==============
-->

<div id='MenuPos' style="text-size:9px; layer-background-color:#CCCCCC; background-color: #CCCCCC; position:absolute; left:0px; top:50px; width:808px"><table width="808" cellpadding="0" cellspacing="0" height="19"><tr><td><p style="font-size:9px">Menu loading...</p></td></tr></table></div>

<!--
==============
MAKE THE LAYER THAT WILL HOLD THE CONTENT OF THE PAGE - THE TEXT, IMAGES, WHATEVER. THIS LAYER WILL BE CLOSED AT THE END OF THIS HTML DOCUMENT
==============
Maximum table width not to exceed 750
All images must have borders of 1 pixel
No image will exceed 325 X 325 height and width measurements
Position attribute on layer3 may need to be changed to absolute to accommodate Netscape
-->

<div id="Layer3" style="position:relative; left:7px; top:80px; width:801px; height:396px; z-index:1; overflow: visible; background-color: #FFFFFF; layer-background-color: #FFFFFF; border: 1px none #000000" >

<table width="750" border="0" cellspacing="0" cellpadding="10">
<tr>
<td valign="top" align="left">

<!--
=============
FOR EVERY "ADMINISTRATIVELY DISTINCT" PROJECT, THERE SHOULD BE A CREDIT FOR THE PROVIDER OF THE INFO
=============
-->

<script language="JavaScript">
function openWin(url, name) { popupWin = window.open(url, name,"resizable=1,scrollbars=1,toolbar=0,width=400,height=450")
}
</script>

<!--
============
THIS LINK WILL OPEN A SEPARATE WINDOW THAT WILL PROVIDE INFORMATION REGARDING THE ADVISORY BOARD MEMBERS. ALSO PROVIDED IN THE PAGE WILL BE PARTICIPATION AND DONATION INFORMATION.
============
-->

<!--
==============
SETS THE BODY TEXT TO JUSTIFIED
==============
-->

<div align=justify>

<!--
=============
INSERT LINK TO GUIDED TOUR HERE -uncomment when ready.
=============
-->

<!--
==============
INSERT BODY TEXT HERE

The first section of text is the short "introduction" about the Theme and the various disciplines that have a vested interest in them.
Design principle: Bold the first few words of this text section.
==============
-->

<h2>Tibetan Machine Web Converter</h2>

<p>
In the same JAR file as Jskad, power users will find a command-line
utility that converts Tibetan documents from one digital
representation to another.  The converter embodies the same
technology as Jskad itself, but often works even when Jskad fails
due to Java's presently poor support for viewing RTF
documents.  This command-line utility converts a Tibetan
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
any of these three output formats:
</p>
<ul>
<li>RTF files in Unicode</li>
<li>RTF files with the appropriate THDL Extended Wylie (Wylie) used
instead of TMW</li>
<li>RTF files in Tibetan Machine (used in legacy systems)</li>
</ul>

<p>
In addition, this converter can convert Tibetan Machine (TM) RTF files
to Tibetan Machine Web RTF files, and in both directions (TM-&gt;TMW and
TMW-&gt;TM) it takes precautions to ensure that only a 100% perfect
conversion is done.  One such precaution is that two
independent teams (Garrett and Garson, Chandler) turned the Tibetan
Machine Web <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> into TM&lt;-&gt;TMW tables.  These tables
were compared, giving full confidence that the tables are as
accurate as the documentation (which has a <a
href="http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515">
few flaws</a> itself).  That documentation has not been
extensively verified against the actual fonts, however.
Another precaution is that any unknown character causes the
conversion to fail; the result is then a document containing only
the unknown characters.  (Tibet Doc creates some known, illegal
glyphs; the converter handles the ones it knows of and treats the
rest as unknown.)
</p>
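<p>
Cross-checking two independently built tables amounts to comparing two
mappings entry by entry.  The following is a minimal sketch of such a
comparison, not the code the two teams actually used; the class name
<tt>TableCrossCheck</tt>, the string keys such as <tt>"TM:34"</tt>, and
the sample entries are all hypothetical and are not real TM or TMW
code points.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: compares two independently built TM->TMW tables. */
public class TableCrossCheck {

    /** Returns one human-readable line for every disagreement between the tables. */
    static List<String> compare(Map<String, String> teamA, Map<String, String> teamB) {
        List<String> problems = new ArrayList<>();
        // Union of both key sets, sorted so the report comes out in a stable order.
        Set<String> keys = new TreeSet<>(teamA.keySet());
        keys.addAll(teamB.keySet());
        for (String key : keys) {
            String a = teamA.get(key);
            String b = teamB.get(key);
            if (a == null) {
                problems.add(key + ": missing from table A (B says " + b + ")");
            } else if (b == null) {
                problems.add(key + ": missing from table B (A says " + a + ")");
            } else if (!a.equals(b)) {
                problems.add(key + ": A says " + a + ", B says " + b);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Invented entries for illustration only.
        Map<String, String> a = new TreeMap<>();
        a.put("TM:33", "TMW:33");
        a.put("TM:34", "TMW1:40");
        Map<String, String> b = new TreeMap<>();
        b.put("TM:33", "TMW:33");
        b.put("TM:34", "TMW1:41");
        b.put("TM:35", "TMW:35");
        compare(a, b).forEach(System.out::println);
    }
}
```

<p>
Any entry that appears in only one table, or that the two tables map
differently, is reported, so an empty report means the tables agree.
</p>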

<p>
This converter is smart enough to solve the "curly-brace
problem", wherein Tahoma '{', '}', and '\' characters appear
instead of the TMW stacks they are supposed to represent.  This
problem originates with the Rich Text Format output of certain
versions of Microsoft Word.
</p>

<p>
Further, this converter gives a polite error message when a given
.rtf file simply cannot be read by the version of Java used.
</p>

<p>
Perhaps most importantly, the converter has a
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
user, confidence that RTF reading and writing idiosyncrasies are not
going to interfere with a flawless conversion.  It does so by
printing, for each character that appears in a non-TMW font, the
location of its first occurrence in that font.  Here is an example
command and its output:
</p>
<pre>
java -cp "c:\my thdl tools\Jskad.jar" \
    org.thdl.tib.input.TibetanConverter \
    --find-some-non-tmw \
    "Dalai Lama Fifth History 01.rtf"

Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
</pre>

<p>
Given the above output, you can be sure that a flawless conversion
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
History 01.rtf" &gt; "Dalai Lama Fifth History 01 in THDL Extended
Wylie.rtf"</tt>.  This is because the only text in the input file
besides Tibetan is whitespace and the Tahoma characters
<tt>'{'</tt>, <tt>'}'</tt>, and <tt>'\'</tt>.  The tool understands
these Tahoma characters; they are symptoms of the "curly-brace
problem".  (Note that the '&gt;' directs the output to the file named
after it; this is quite handy.)
</p>
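<p>
The reasoning in the previous paragraph can be automated.  The sketch
below is an illustration, not part of Jskad: the class
<tt>NonTmwReport</tt> is hypothetical, and it assumes that
<tt>--find-some-non-tmw</tt> prints lines in exactly the format shown
above.  It flags every reported character that is neither whitespace
nor one of the Tahoma curly-brace-problem characters.  The third sample
line is invented to show a flagged case.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: scans --find-some-non-tmw output and keeps only lines
 * that are neither whitespace nor Tahoma '{', '}', '\' (curly-brace problem).
 */
public class NonTmwReport {

    // Assumed line format, e.g.
    // Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
    private static final Pattern LINE = Pattern.compile(
        "Non-TMW character .* \\[decimal (\\d+)\\] in the font (.+) appears first at location (\\d+)");

    static List<String> suspicious(List<String> outputLines) {
        List<String> flagged = new ArrayList<>();
        for (String raw : outputLines) {
            String line = raw.trim();
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // not a report line (e.g. blank); skip it
            }
            int code = Integer.parseInt(m.group(1));
            String font = m.group(2);
            boolean whitespace = Character.isWhitespace(code);
            boolean curlyBraceSymptom = font.equals("Tahoma")
                    && (code == '{' || code == '}' || code == '\\');
            if (!whitespace && !curlyBraceSymptom) {
                flagged.add(line);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39",
            "Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66",
            // Invented line: an 'x' in Arial is NOT something the converter can ignore.
            "Non-TMW character 'x' [decimal 120] in the font Arial appears first at location 500");
        for (String line : suspicious(sample)) {
            System.out.println("check this: " + line);
        }
    }
}
```

<p>
An empty result corresponds to the situation described above, in which
a flawless conversion can be expected (barring known bugs).
</p>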

<h3>Failed Conversions</h3>

<p>
In this section, you'll learn how to tell whether a conversion
succeeded in full, ran into minor problems, or failed altogether.
</p>

<h4>TMW to Wylie</h4>

<p>
Note that some TMW glyphs have no transliteration in Extended
Wylie.  When you encounter such a glyph, you'll find
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
the troublesome glyph comes from and YYY is the decimal number of
the glyph in that font (a number between 000 and 255
inclusive, usually between 33 and 126).  X identifies the font as
follows:
</p>

<ul>
<li>When X is 0, the TibetanMachineWeb font contains the glyph.</li>
<li>When X is 1, the TibetanMachineWeb1 font contains the glyph.</li>
<li>When X is 2, the TibetanMachineWeb2 font contains the glyph.</li>
<li>When X is 3, the TibetanMachineWeb3 font contains the glyph.</li>
<li>When X is 4, the TibetanMachineWeb4 font contains the glyph.</li>
<li>When X is 5, the TibetanMachineWeb5 font contains the glyph.</li>
<li>When X is 6, the TibetanMachineWeb6 font contains the glyph.</li>
<li>When X is 7, the TibetanMachineWeb7 font contains the glyph.</li>
<li>When X is 8, the TibetanMachineWeb8 font contains the glyph.</li>
<li>When X is 9, the TibetanMachineWeb9 font contains the glyph.</li>
</ul>

<p>
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, consult
the <a
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
documentation</a> for the specific TMW font named.  Find the
glyph (by its YYY value) and decide how to proceed.  If you
find a glyph that you believe the tool should have converted into
Extended Wylie, please report this as a bug through the
SourceForge website or via e-mail.
</p>
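<p>
In a long document, finding every <tt>\tmwXYYY</tt> sequence by hand is
tedious.  Here is a minimal sketch, assuming you have the converter's
output as text; the class <tt>TmwEscapes</tt> and its <tt>find</tt>
method are hypothetical names, not part of Jskad.  It decodes X into a
font name using the list above and YYY into the glyph's decimal number,
which is what you need to look the glyph up in the documentation.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists every \tmwXYYY sequence found in converter output. */
public class TmwEscapes {

    /** One untransliterated glyph: the TMW font that holds it and its decimal number there. */
    record Glyph(String font, int number) { }

    // A literal backslash, "tmw", one digit for X (the font), three digits for YYY (the glyph).
    private static final Pattern ESCAPE = Pattern.compile("\\\\tmw(\\d)(\\d{3})");

    static List<Glyph> find(String text) {
        List<Glyph> glyphs = new ArrayList<>();
        Matcher m = ESCAPE.matcher(text);
        while (m.find()) {
            int x = Integer.parseInt(m.group(1));
            // X = 0 is TibetanMachineWeb; X = 1..9 is TibetanMachineWeb1..TibetanMachineWeb9.
            String font = (x == 0) ? "TibetanMachineWeb" : "TibetanMachineWeb" + x;
            glyphs.add(new Glyph(font, Integer.parseInt(m.group(2))));
        }
        return glyphs;
    }

    public static void main(String[] args) {
        // Invented sample text containing one escape sequence.
        String sample = "bkra shis \\tmw7091 bde legs";
        for (Glyph g : find(sample)) {
            System.out.println(g.font() + ", glyph " + g.number()); // prints "TibetanMachineWeb7, glyph 91"
        }
    }
}
```

<p>
Because the pattern only looks for a backslash followed by
<tt>tmw</tt> and four digits, it should also find these sequences in
raw RTF source, where a literal backslash is written as <tt>\\</tt>.
</p>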

<h4>Other Conversions</h4>

<p>
The other conversions are all-or-nothing.  That is, if you run
into any trouble whatsoever, the result will be a file containing
just the problematic glyphs, each preceded by achen (i.e., U+0F68,
the letter whose THDL Extended Wylie representation is 'a').
If your result is as long as your input, then the conversion went
flawlessly.
</p>

<p>
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
that has no Tibetan Machine equivalent.  This glyph is the only
TMW glyph that can cause a TMW-&gt;TM conversion to fail.  It
is fairly common, though, especially if you've used Jskad to prepare
your document.  It might be appropriate to change the document
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
does have a TM equivalent.
</p>

<p>
You might consider using Jskad to convert documents that give
errors, as it has better error reporting and can tell you just
what's wrong.
</p>

<p>
If you ever encounter problems in a TM-&gt;TMW conversion, please
e-mail us the error report, along with the input document that caused
the problem and the resulting output document, so that we can improve
our tools.
</p>

<h3>Invoking the Converter</h3>

<p>
First, add Jskad.jar to your CLASSPATH.  You can do this by
setting the CLASSPATH environment variable to the absolute
path of the Jskad.jar file and then running the command <tt>java
org.thdl.tib.input.TibetanConverter</tt>.  Alternatively, you
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
org.thdl.tib.input.TibetanConverter</code>, substituting the
appropriate path to Jskad.jar.  If you do this correctly, you will see
usage information.  If you have not correctly told Java where to find
Jskad.jar, you will instead see a message like <code>Exception in
thread "main" java.lang.NoClassDefFoundError:
org/thdl/tib/input/TibetanConverter</code>.
</p>
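<p>
The same invocation can also be scripted from Java.  The sketch below
assumes that <tt>java</tt> is on your PATH and that the converter
writes its result to standard output (as the use of <tt>&gt;</tt>
earlier on this page implies).  The class <tt>RunConverter</tt> is
hypothetical, and since this page does not document the converter's
exit status, the sketch only prints it.
</p>

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical launcher: runs TibetanConverter --to-wylie in a child JVM. */
public class RunConverter {

    /** Builds the command line, one argument per list element (no shell quoting needed). */
    static List<String> command(String jarPath, String mode, String inputRtf) {
        return List.of("java", "-cp", jarPath,
                "org.thdl.tib.input.TibetanConverter", mode, inputRtf);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.err.println("usage: java RunConverter <Jskad.jar path> <input.rtf> <output.rtf>");
            return;
        }
        ProcessBuilder pb = new ProcessBuilder(command(args[0], "--to-wylie", args[1]));
        pb.redirectOutput(new File(args[2]));              // plays the role of '>' in the shell
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // standard error appears on the console
        int status = pb.start().waitFor();
        // Exit-status conventions are not documented here; just report the value.
        System.out.println("TibetanConverter exited with status " + status);
    }
}
```

<p>
For example, <tt>java RunConverter "c:\my thdl tools\Jskad.jar" "Dalai
Lama Fifth History 01.rtf" "Dalai Lama Fifth History 01 in THDL
Extended Wylie.rtf"</tt> would mirror the command line shown earlier.
Because <tt>ProcessBuilder</tt> passes each argument separately rather
than through a shell, paths containing spaces need no extra quoting.
</p>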

<h3><a name="knownbugs"></a>Known Bugs</h3>

<p>
All known bugs are listed in this section.  They're more likely
to be fixed if users complain, so complain away.
</p>

<p>
If the TMW given is not syntactically legal, then the resulting Wylie,
if imported into Jskad, will not necessarily yield the same Tibetan
with which the converter started.  The glyphs corresponding to the
Wylie 'jaskadaskeda' have this problem, for example.
</p>

<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">
e-mail us</a>
your comments about this page.
</p>

<p>
The
<a target="_blank" href="http://www.sourceforge.net/projects/thdltools">
THDL Tools</a>
project is generously hosted by:
<!--

DO NOT DELETE THE SF.NET LOGO.

We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".

-->
<a target="_blank" href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&type=1"
width="88" height="31" border="0" alt="SourceForge Logo">
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>
</div>
</td>
</tr>
</table>
</div>

</body>
</html>