Updated documentation on the converters. Added significant documentation for ACIP->Tibetan converters.
This commit is contained in:
parent
ab83c76d8b
commit
b7104fd188
|
@ -7,7 +7,7 @@
|
||||||
|
|
||||||
|
|
||||||
<head>
|
<head>
|
||||||
<title>Tibetan Machine Web Converter</title>
|
<title>Converters in Jskad</title>
|
||||||
|
|
||||||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||||
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
|
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
|
||||||
|
@ -44,259 +44,141 @@
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
<div id="breadcrumbs">
|
<div id="breadcrumbs">
|
||||||
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> > Nightly Builds
|
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/allfonts.html">Fonts & Input</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/conv.html">Converters</a> > Converters in Jskad
|
||||||
</div>
|
</div>
|
||||||
</div><!--END banner-->
|
</div><!--END banner-->
|
||||||
|
|
||||||
|
|
||||||
<div id="main">
|
<div id="main">
|
||||||
|
|
||||||
<h2>Tibetan Machine Web Converter</h2>
|
<h2>Converters in Jskad</h2>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
In recent versions of Jskad, the 'Tools' menu has an option 'Launch
|
In recent versions of Jskad, the 'Tools' menu has an option 'Launch
|
||||||
Converter...'. If you use that option, you will find a
|
Converter...'. If you use that option, you will find a set of
|
||||||
first-class Tibetan-to-Tibetan and Tibetan-to-Wylie converter.
|
first-class converters that can convert digital Tibetan from one
|
||||||
That converter has a user-friendly GUI interface, and it tells you
|
form to another. (A command-line interface is also available;
|
||||||
when things go wrong (even things as subtle as your having selected
|
see below.)
|
||||||
the wrong conversion). If you need a command-line interface to
|
|
||||||
that converter, however, read on.
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
In the same JAR file as Jskad, power users will find a command-line
|
Some of the converters there are based on Jskad technology, but all
|
||||||
utility that converts Tibetan documents from one digital
|
are first-class in the sense that they are well thought-out, well
|
||||||
representation to another. The converter embodies the same
|
tested<!-- DLC LINK TO V&V story -->, and handle errors
|
||||||
technology as Jskad itself, but often works even when Jskad fails
|
nicely. Certain features in Jskad are quite buggy; for
|
||||||
due to Java's presently poor support for viewing RTF
|
example, its keyboards do not work as desired, but even when they
|
||||||
documents. This command-line utility converts a Tibetan
|
do, they silently drop certain input characters. Do not worry
|
||||||
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
|
that the converters described here suffer from these flaws; not one
|
||||||
either of these three output formats:
|
character of input is ever silently dropped. It is the
|
||||||
|
intention of the developers that a Buddhist canon one day could be
|
||||||
|
entrusted to these converters. Before you do that, though,
|
||||||
|
please contact <a
|
||||||
|
href="mailto:thdltools-devel@lists.sourceforge.net">the
|
||||||
|
developers</a> to be sure that this documentation is up-to-date and
|
||||||
|
to develop a custom validation and verification plan. None of
|
||||||
|
the converters has yet been hand-validated on a real text of any
|
||||||
|
size, but extensive unit testing has been performed for each
|
||||||
|
conversion at every stage of development.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
In the same JAR file as Jskad, power users will find a command-line
|
The following converters are available:
|
||||||
utility that converts Tibetan documents from one digital
|
|
||||||
representation to another. The converter embodies the same
|
|
||||||
technology as Jskad itself, but often works even when Jskad fails
|
|
||||||
due to Java's presently poor support for viewing RTF
|
|
||||||
documents. This command-line utility converts a Tibetan
|
|
||||||
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
|
|
||||||
either of these three output formats:
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li>RTF files in Unicode</li>
|
<li><a href="ACIP_To_Tibetan_Converter.html">ACIP->Unicode</a>
|
||||||
<li>RTF files with the appropriate THDL Extended Wylie (Wylie) used
|
(Text->Text)</li>
|
||||||
instead of TMW</li>
|
|
||||||
<li>RTF files in Tibetan Machine (used in legacy systems)</li>
|
<li><a href="ACIP_To_Tibetan_Converter.html">ACIP->Tibetan
|
||||||
|
Machine Web</a> (Text->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->ACIP</a> (RTF->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->ACIP</a> (RTF->Text)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TM->TMW</a> (RTF->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->TM</a> (RTF->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->Unicode</a> (RTF->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->EWTS</a> (RTF->RTF)</li>
|
||||||
|
|
||||||
|
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->EWTS</a> (RTF->Text)</li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
In addition, this converter can convert Tibetan Machine RTF files to
|
Moreover, EWTS->Unicode and EWTS->TMW converters are in
|
||||||
Tibetan Machine Web RTF files, and takes precautions to ensure that
|
development. <a
|
||||||
only a 100% perfect conversion is done in both directions
|
href="http://iris.lib.virginia.edu/tibet/tools/wyword.html">Wylie
|
||||||
(TM->TMW and TMW->TM). One such precaution is that two
|
Word 2.0</a> has better EWTS support at present.
|
||||||
independent teams (Garrett and Garson, Chandler) turned the Tibetan
|
|
||||||
Machine Web <a
|
|
||||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
|
||||||
documentation</a> into TM<->TMW tables. These tables
|
|
||||||
were compared, giving full confidence that the tables are as
|
|
||||||
accurate as the documentation (which has a <a
|
|
||||||
href="http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515">
|
|
||||||
few flaws</a> itself). That documentation has not been
|
|
||||||
extensively verified against the actual fonts, however.
|
|
||||||
Another precaution is that any unknown characters cause the
|
|
||||||
conversion to fail, and the result is a document containing merely
|
|
||||||
the unknown characters. (There are some known, illegal glyphs
|
|
||||||
created by Tibet Doc, and the converter handles the ones it knows of
|
|
||||||
and treats the rest as unknown.)
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
This converter is smart enough to solve the "curly-brace
|
Above, <em>RTF</em> is an abbreviation for Rich Text Format;
|
||||||
problem", wherein Tahoma '{', '}', and '\' characters appear
|
<em>Text</em> refers to an unformatted text file (in one of several
|
||||||
instead of the TMW stacks they are supposed to represent. This
|
encodings); <em>TMW</em> refers to the <a
|
||||||
problem originates with certain versions of Microsoft Word's Rich
|
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html">Tibetan
|
||||||
Text Format writing capabilities.
|
Machine Web</a> font; <em>TM</em> refers to the <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/tm.html">Tibetan
|
||||||
|
Machine</a> font; <em>Unicode</em> refers to the Tibetan <a
|
||||||
|
href="http://www.unicode.org/">Unicode</a> characters in the range
|
||||||
|
U+0F00-U+0FFF mainly but also sometimes includes other Unicode
|
||||||
|
characters; <em>EWTS</em> refers to Tibetan encoded using the <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">Extended
|
||||||
|
Wylie Transliteration Scheme</a>, a Roman transliteration scheme;
|
||||||
|
<em>ACIP</em> refers to Tibetan encoded using <a
|
||||||
|
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||||
|
(ACIP) <a
|
||||||
|
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||||
|
Input Code</a>, another Roman transliteration scheme.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<a name="invok"></a><h3>Invoking the Converters</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The converters have a user-friendly GUI, which tells you
|
||||||
|
when things go wrong (from things like the lack of a needed glyph in
|
||||||
|
the output font to things like your having selected the wrong
|
||||||
|
conversion). The GUI is not properly documented here, and
|
||||||
|
probably will not be until you contact <a
|
||||||
|
href="mailto:thdltools-devel@lists.sourceforge.net">the
|
||||||
|
developers</a> and ask them to document it.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Further, this converter gives a polite error message when a given
|
To use the GUI, first launch <a
|
||||||
.rtf file simply cannot be read by the version of Java used.
|
href="http://iris.lib.virginia.edu/tibet/tools/jskad.html">Jskad</a>
|
||||||
|
itself. Then select 'Launch Converter...' from the 'Tools'
|
||||||
|
menu. Let's hope from there it's self-explanatory, because it
|
||||||
|
is not yet properly documented.<!-- DLC -->
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Perhaps most importantly, the converter has a
|
For batch conversions of many files, a command-line interface to the
|
||||||
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
|
converters may be more suitable than the GUI interface. In the
|
||||||
user, confidence that RTF reading and writing idiosyncrasies are not
|
same JAR file as Jskad, power users will find a command-line utility
|
||||||
going to interfere with a flawless conversion. It does so by
|
that can do everything the GUI interface to the converters can
|
||||||
printing out the first occurrence of a given character in a non-TMW
|
do. To learn how to invoke it, see the output you get when you
|
||||||
font. Here is some example output:
|
use this invocation:
|
||||||
</p>
|
</p>
|
||||||
<pre>
|
<pre>
|
||||||
java -cp "c:\my thdl tools\Jskad.jar" \
|
java -cp "c:\my thdl tools\Jskad.jar" \
|
||||||
org.thdl.tib.input.TibetanConverter \
|
org.thdl.tib.input.TibetanConverter --help
|
||||||
--find-some-non-tmw \
|
|
||||||
"Dalai Lama Fifth History 01.rtf"
|
|
||||||
|
|
||||||
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
|
|
||||||
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
|
|
||||||
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
|
|
||||||
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
|
|
||||||
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
|
|
||||||
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
|
|
||||||
</pre>
|
</pre>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Given the above output, you can be sure that a flawless conversion
|
where you must replace "c:\my thdl tools\Jskad.jar" with the
|
||||||
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
|
appropriate path on your system.
|
||||||
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
|
|
||||||
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
|
|
||||||
History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended
|
|
||||||
Wylie.rtf"</tt>. (Note that the '>' causes the output to be
|
|
||||||
directed to the file named thereafter; this is quite handy.)
|
|
||||||
This is because the only text in the input file besides Tibetan is
|
|
||||||
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
|
|
||||||
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
|
|
||||||
they are symptoms of the "curly-brace problem".
|
|
||||||
</p>
|
</p>
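<p>
For batch conversions, the command line can be driven from a small
script. What follows is only a sketch, not part of Jskad: the helper
names (<tt>build_command</tt>, <tt>output_path_for</tt>,
<tt>convert_all</tt>) are hypothetical, and it assumes the
<tt>--to-wylie</tt> mode that appears in this documentation's other
examples. The output of <tt>--help</tt> remains the authority on which
modes exist.
</p>

```python
import subprocess
from pathlib import Path

# Class name taken from the invocation shown in this documentation.
CONVERTER_CLASS = "org.thdl.tib.input.TibetanConverter"
# Hypothetical naming convention for converted files.
OUTPUT_SUFFIX = " in THDL Extended Wylie"


def build_command(jar_path, rtf_path, mode="--to-wylie"):
    """Return the argument list for one converter invocation."""
    return ["java", "-cp", str(jar_path), CONVERTER_CLASS, mode, str(rtf_path)]


def output_path_for(rtf_path, suffix=OUTPUT_SUFFIX):
    """Derive the output file name by appending a suffix to the stem."""
    p = Path(rtf_path)
    return p.with_name(p.stem + suffix + p.suffix)


def convert_all(jar_path, directory):
    """Convert every .rtf file in a directory, redirecting stdout to a file.

    The meaning of the converter's exit status is not documented here,
    so it is recorded per file rather than interpreted.
    """
    results = {}
    for rtf in sorted(Path(directory).glob("*.rtf")):
        if rtf.stem.endswith(OUTPUT_SUFFIX):
            continue  # skip files produced by an earlier run
        with open(output_path_for(rtf), "wb") as sink:
            completed = subprocess.run(build_command(jar_path, rtf), stdout=sink)
        results[rtf.name] = completed.returncode
    return results
```

<p>
Passing the arguments as a list avoids shell quoting problems with
paths that contain spaces, such as "c:\my thdl tools\Jskad.jar".
</p>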
|
||||||
|
|
||||||
<h3>Failed Conversions</h3>
|
<!-- DLC link to V&V story... -->
|
||||||
|
|
||||||
<p>
|
<h5>License</h5>
|
||||||
In this section, you'll learn how to tell if a conversion has
|
|
||||||
succeeded in full, ran into minor problems, or failed altogether.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h4>TMW to Wylie</h4>
|
<p>Both the converters and this document are released under the <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/thdl_license.txt">THDL
|
||||||
|
Open Community License Version 1.0</a>.</p>
|
||||||
|
|
||||||
<p>
|
|
||||||
<font color="red">
|
|
||||||
This section is too up-to-date -- this is documenting plans for the
|
|
||||||
future. At present, an error message like
|
|
||||||
<code><<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot
|
|
||||||
convert DuffCode <duffcode font=TibetanMachineWeb7 charNum=72
|
|
||||||
character=H/> to THDL Extended Wylie. Please see the
|
|
||||||
documentation for the TMW font and transcribe this
|
|
||||||
yourself.]]>></code> appears.
|
|
||||||
</font>
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Note that some TMW glyphs have no transliteration in Extended
|
|
||||||
Wylie. When you encounter such a glyph, you'll find
|
|
||||||
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
|
|
||||||
the troublesome glyph comes from and YYY is the decimal number of
|
|
||||||
the glyph in that font (which is a number between 000 and 255
|
|
||||||
inclusive, usually between 33 and 126). The following are
|
|
||||||
values corresponding to X:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<ul>
|
|
||||||
<li>
|
|
||||||
When X is 0, the TibetanMachineWeb font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 1, the TibetanMachineWeb1 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 2, the TibetanMachineWeb2 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 3, the TibetanMachineWeb3 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 4, the TibetanMachineWeb4 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 5, the TibetanMachineWeb5 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 6, the TibetanMachineWeb6 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 7, the TibetanMachineWeb7 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 8, the TibetanMachineWeb8 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
<li>
|
|
||||||
When X is 9, the TibetanMachineWeb9 font contains the glyph.
|
|
||||||
</li>
|
|
||||||
</ul>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, you should
|
|
||||||
consult the <a
|
|
||||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
|
||||||
documentation</a> for the specific TMW font named. Find the
|
|
||||||
glyph (by its YYY value) and decide how to proceed. If you
|
|
||||||
find a glyph that you believe should have been converted into
|
|
||||||
Extended Wylie by the tool, please report this as a bug through the
|
|
||||||
SourceForge website or via e-mail.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h4>Other Conversions</h4>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
The other conversions are all-or-nothing. That is, if you run
|
|
||||||
into any trouble whatsoever, the result will be a file containing
|
|
||||||
just the problematic glyphs, each preceded by achen (i.e., U+0F68,
|
|
||||||
the letter whose THDL Extended Wylie representation is 'a').
|
|
||||||
These glyphs will be bracketed on the left by U+0F3C (for which the
|
|
||||||
THDL Extended Wylie is '(') and on the right by U+0F3D (for which
|
|
||||||
the THDL Extended Wylie is ')'). If your result is as long as
|
|
||||||
your input, then the conversion went flawlessly.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
|
|
||||||
that has no Tibetan Machine equivalent. This glyph is the only
|
|
||||||
TMW glyph that can cause a TMW->TM conversion to fail. It
|
|
||||||
is fairly common, though, especially if you've used Jskad to prepare
|
|
||||||
your document. It might be appropriate to change the document
|
|
||||||
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
|
|
||||||
does have a TM equivalent.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
You might consider using Jskad to convert documents that give
|
|
||||||
errors, as it has better error reporting and can tell you just
|
|
||||||
what's wrong.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
If you ever encounter problems in a TM->TMW conversion, please
|
|
||||||
send us mail with the error report (and the problem input document's
|
|
||||||
resulting document) so that we can improve our tools.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h3>Invoking the Converter</h3>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
First add Jskad.jar to your CLASSPATH. You can do this by
|
|
||||||
setting an environment variable CLASSPATH to contain the absolute
|
|
||||||
path of the Jskad.jar file and then running the command <tt>java
|
|
||||||
org.thdl.tib.input.TibetanConverter</tt>. Alternatively, you
|
|
||||||
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
|
|
||||||
org.thdl.tib.input.TibetanConverter</code> where you put in the
|
|
||||||
appropriate path to Jskad.jar. You will see usage information
|
|
||||||
appear if you do this correctly; you'll see a message like
|
|
||||||
<code>java.lang.NoClassDefFoundError:
|
|
||||||
org/thdl/tib/input/TibetanConverter; Exception in thread
|
|
||||||
"main"</code> if you've not correctly told Java where to find
|
|
||||||
Jskad.jar.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<h3><a name="knownbugs"></a>Known Bugs</h3>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
All known bugs are listed in this section. They're more likely
|
|
||||||
to be fixed if users complain, so complain away.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
There are no known bugs at present.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Please
|
Please
|
||||||
|
|
|
@ -0,0 +1,368 @@
|
||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||||
|
|
||||||
|
<!-- @author David Chandler -->
|
||||||
|
<!-- @editor Emacs, baby! -->
|
||||||
|
|
||||||
|
|
||||||
|
<head>
|
||||||
|
<title>Converting from TM or TMW</title>
|
||||||
|
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||||
|
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
|
||||||
|
<link rel="stylesheet" type="text/css" href="http://iris.lib.virginia.edu/tibet/style/thdl-styles.css"/>
|
||||||
|
</head>
|
||||||
|
|
||||||
|
<body>
|
||||||
|
|
||||||
|
<div id="banner">
|
||||||
|
<a id="logo" href="http://iris.lib.virginia.edu/tibet/index.html"><img id="test" alt="THDL Logo" src="http://iris.lib.virginia.edu/tibet/images/logo.png"/></a>
|
||||||
|
<h1>The Tibetan & Himalayan Digital Library</h1>
|
||||||
|
|
||||||
|
<div id="menubar">
|
||||||
|
<script type='text/javascript'>function Go(){return}</script>
|
||||||
|
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/thdl_menu_config.js'></script>
|
||||||
|
|
||||||
|
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu_new.js'></script>
|
||||||
|
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu9_com.js'></script>
|
||||||
|
<noscript><p>Your browser does not support javascript.</p></noscript>
|
||||||
|
<div id='MenuPos' >Menu Loading... </div>
|
||||||
|
</div><!--END menubar-->
|
||||||
|
|
||||||
|
</div><!--END banner-->
|
||||||
|
|
||||||
|
<div id="sub_banner">
|
||||||
|
<div id="search">
|
||||||
|
<form method="get" action="http://www.google.com/u/thdl">
|
||||||
|
<p>
|
||||||
|
<input type="text" name="q" id="q" size="15" maxlength="255" value="" />
|
||||||
|
<input type="submit" name="sa" id="sa" value="Search"/>
|
||||||
|
<input type="hidden" name="hq" id="hq" value="inurl:iris.lib.virginia.edu"/>
|
||||||
|
</p>
|
||||||
|
</form>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
<div id="breadcrumbs">
|
||||||
|
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/allfonts.html">Fonts & Input</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/conv.html">Converters</a> > <a href="TMW_RTF_TO_THDL_WYLIE.html">Converters in Jskad</a> > Converting from TM or TMW
|
||||||
|
</div>
|
||||||
|
</div><!--END banner-->
|
||||||
|
|
||||||
|
|
||||||
|
<div id="main">
|
||||||
|
|
||||||
|
<h2>Converting from Tibetan Machine or Tibetan Machine Web</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Among the <a href="TMW_RTF_TO_THDL_WYLIE.html">converters in
|
||||||
|
Jskad</a> are some converters that take input that is encoded to use
|
||||||
|
either the <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/tm.html">Tibetan
|
||||||
|
Machine</a> (TM) or <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html">Tibetan
|
||||||
|
Machine Web</a> (TMW) fonts. These converters are described
|
||||||
|
here.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
First, to learn how to invoke the converters, see <a
|
||||||
|
href="TMW_RTF_TO_THDL_WYLIE.html#invok">these instructions</a>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The converters embody the same technology as <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/jskad.html">Jskad</a>
|
||||||
|
itself, but often work even when Jskad fails due to Java's presently
|
||||||
|
poor support for viewing Rich Text Format (RTF) documents.
|
||||||
|
These converters can convert a TMW-encoded RTF file to any of these
|
||||||
|
output formats:
|
||||||
|
</p>
|
||||||
|
<ul>
|
||||||
|
<li>an RTF file using <a href="http://www.unicode.org/">Unicode</a>,
|
||||||
|
a standard encoding that will be widely supported in the future</li>
|
||||||
|
|
||||||
|
<li>an RTF file using the appropriate THDL Extended Wylie (<a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>)
|
||||||
|
instead of TMW</li>
|
||||||
|
|
||||||
|
<li>a text file using the appropriate THDL Extended Wylie (<a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>)
|
||||||
|
instead of TMW</li>
|
||||||
|
|
||||||
|
<li>an RTF file using the appropriate <a
|
||||||
|
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||||
|
(ACIP) <a
|
||||||
|
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||||
|
Input Code</a> instead of TMW</li>
|
||||||
|
|
||||||
|
<li>a text file using the appropriate <a
|
||||||
|
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||||
|
(ACIP) <a
|
||||||
|
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||||
|
Input Code</a> instead of TMW</li>
|
||||||
|
|
||||||
|
<li>an RTF file using the Tibetan Machine encoding (used in legacy
|
||||||
|
systems).</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In addition, this converter can convert a Tibetan Machine RTF file to
|
||||||
|
a Tibetan Machine Web RTF file.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<a name="vv"></a>
|
||||||
|
<p>
|
||||||
|
All the converters take precautions to ensure that only a 100%
|
||||||
|
perfect conversion is done. One such precaution is that two
|
||||||
|
independent teams (Garrett and Garson, Chandler) turned the Tibetan
|
||||||
|
Machine Web <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||||
|
documentation</a> into TM<->TMW tables. These tables
|
||||||
|
were compared, giving full confidence that the tables are as
|
||||||
|
accurate as the documentation (which has a few flaws itself,
|
||||||
|
documented in the <a href="Tibetan51Errata.html">errata</a> we have
|
||||||
|
created). That documentation has been verified against the
|
||||||
|
actual fonts. David Chapman's assistance in this area has been
|
||||||
|
invaluable.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Another precaution is that any unknown characters (in the font being
|
||||||
|
converted from) cause the conversion to <a href="#failure">fail</a>,
|
||||||
|
and the result is either a document containing merely the unknown
|
||||||
|
characters or a document with conspicuous error messages
|
||||||
|
interspersed.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
These converters are smart enough to solve the "curly-brace
|
||||||
|
problem", wherein '{', '}', and '\' characters in the Tahoma
|
||||||
|
font appear instead of the TMW stacks they are supposed to
|
||||||
|
represent. This problem originates with certain versions of
|
||||||
|
Microsoft Word's Rich Text Format writing capabilities. These
|
||||||
|
converters are also smart enough to work around Java's <a
|
||||||
|
href="http://developer.java.sun.com/developer/bugParade/bugs/4907759.html">Bug
|
||||||
|
4907759</a>.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Furthermore, these converters give a polite error message when a
|
||||||
|
given RTF file simply cannot be read by the version of Java used.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
|
||||||
|
<h2>Invoking the Converters</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
See <a href="TMW_RTF_TO_THDL_WYLIE.html#invok">here</a> for details
|
||||||
|
on how to invoke the converters.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<!-- DLC TEST TMW->UNICODE F021... does that appear? -->
|
||||||
|
|
||||||
|
<a name="failure"></a><h2>Failed Conversions</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
In this section, you'll learn how to tell if a conversion has
|
||||||
|
succeeded in full, ran into minor problems, or failed altogether.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>TMW to ACIP</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
When a TMW->ACIP conversion fails, a message such as
|
||||||
|
<tt>[# JSKAD_TMW_TO_ACIP_ERROR_NO_SUCH_ACIP: Cannot convert
|
||||||
|
<glyph font=TibetanMachineWeb8 charNum=38 character=&/> to
|
||||||
|
ACIP. Please transcribe this yourself.]</tt> will appear in your
|
||||||
|
output, but it will be amidst the successfully converted text.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h3>TMW to Wylie (i.e., EWTS)</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
A TMW to EWTS conversion rarely fails; EWTS is almost entirely
|
||||||
|
comprehensive (and may have been revised to be comprehensive by the
|
||||||
|
time you read this).
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
That said, you may want to search the output for EWTS constructs
|
||||||
|
that you don't like, such as <tt>\u0F39</tt>- and
|
||||||
|
<tt>\uF021</tt>-style escape sequences.
|
||||||
|
</p>
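<p>
One way to search plain-text EWTS output for such escape sequences is
sketched below. It is an illustration only: the function name
<tt>find_escapes</tt> is hypothetical, and the pattern assumes that
every escape has exactly four hexadecimal digits, as the two examples
above do.
</p>

```python
import re

# A backslash, the letter u, then four hexadecimal digits.
ESCAPE = re.compile(r"\\u[0-9A-Fa-f]{4}")


def find_escapes(text):
    """Return (offset, escape) pairs for each escape sequence in text."""
    return [(m.start(), m.group(0)) for m in ESCAPE.finditer(text)]
```

<p>
The offsets make it easy to jump to each occurrence in a text editor
and decide case by case whether to keep or rewrite it.
</p>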
|
||||||
|
|
||||||
|
<p>
|
||||||
|
If a TMW glyph has no transliteration according to <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>,
|
||||||
|
then an error message like
|
||||||
|
<tt><<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert
|
||||||
|
<glyph font=TibetanMachineWeb7 charNum=95 character=_/> to
|
||||||
|
THDL Extended Wylie. Please see the documentation for the TM or TMW
|
||||||
|
font and transcribe this yourself.]]>></tt> appears in the
|
||||||
|
output.
|
||||||
|
</p>
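<p>
For long texts it may help to list these messages mechanically. The
sketch below is not part of the converters; the function name
<tt>find_unconverted_glyphs</tt> is hypothetical, and the pattern
assumes plain-text output in which messages have exactly the shape of
the TMW->EWTS and TMW->ACIP examples in this document.
</p>

```python
import re

# Matches e.g. "JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert
# <glyph font=TibetanMachineWeb7 charNum=95 ...", capturing the error
# name, the font, and the glyph's decimal number.
ERROR = re.compile(
    r"(JSKAD_[A-Z_]+): Cannot convert\s+<glyph font=(\w+) charNum=(\d+)"
)


def find_unconverted_glyphs(text):
    """Return (error_name, font, char_num) for each error message found."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in ERROR.finditer(text)]
```

<p>
An empty result means only that no message of this shape was found; it
does not prove the conversion was flawless.
</p>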
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Upon finding such a message in your output, you should consult the
|
||||||
|
<a href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||||
|
documentation</a> for the specific TMW font named. Find the
|
||||||
|
glyph and decide how to proceed. If you find a glyph that you
|
||||||
|
believe should have been converted into Extended Wylie by the tool,
|
||||||
|
please report this as a bug through the SourceForge website or via
|
||||||
|
e-mail.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
|
||||||
|
<h3>TMW to Unicode, TM to TMW, and TMW to TM Conversions</h3>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The TMW->Unicode, TM->TMW, and TMW->TM conversions are
|
||||||
|
all-or-nothing. That is, if you run into any trouble
|
||||||
|
whatsoever, the result will be a file containing just the
|
||||||
|
problematic glyphs, each preceded by a-chen (i.e., U+0F68, the
|
||||||
|
letter whose THDL Extended Wylie representation is 'a'). These
|
||||||
|
glyphs will be bracketed on the left by U+0F3C (for which the THDL
|
||||||
|
Extended Wylie is '(') and on the right by U+0F3D (for which the
|
||||||
|
THDL Extended Wylie is ')'). If your result is as long as your
|
||||||
|
input, then the conversion went flawlessly.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
|
||||||
|
that has no Tibetan Machine equivalent. This glyph is the only
|
||||||
|
TMW glyph that can cause a TMW->TM conversion to fail. It
|
||||||
|
is fairly common, though, especially if you've used Jskad to prepare
|
||||||
|
your document. It might be appropriate to change the document
|
||||||
|
to use TibetanMachineWeb7, glyph 90 (decimal ordinal 90, that is), a
|
||||||
|
similar glyph that does have a TM equivalent.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
You might consider using the GUI converter interface in Jskad to
|
||||||
|
convert documents that give impenetrable errors when converted by
|
||||||
|
the command-line tool, as the GUI has better error reporting and can
|
||||||
|
tell you just what's wrong.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
|
||||||
|
<h2>Finding Potential Problems Before Conversion</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The converters that take TM and TMW input deal with problematic
|
||||||
|
input in a clean way, but you might prefer the mechanism described
|
||||||
|
here.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There is a <tt>--find-some-non-tmw</tt> mode of operation that gives
|
||||||
|
you, the user, confidence that RTF reading and writing
|
||||||
|
idiosyncrasies are not going to interfere with a flawless
|
||||||
|
conversion. It does so by printing out the first occurrence of
|
||||||
|
a given character in a non-TMW font. Here is some example
|
||||||
|
output:
|
||||||
|
</p>
|
||||||
|
<pre>
|
||||||
|
java -cp "c:\my thdl tools\Jskad.jar" \
|
||||||
|
org.thdl.tib.input.TibetanConverter \
|
||||||
|
--find-some-non-tmw \
|
||||||
|
"Dalai Lama Fifth History 01.rtf"
|
||||||
|
|
||||||
|
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
|
||||||
|
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
|
||||||
|
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
|
||||||
|
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
|
||||||
|
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
|
||||||
|
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Given the above output, you can be sure that a flawless conversion
|
||||||
|
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
|
||||||
|
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
|
||||||
|
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
|
||||||
|
History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended
|
||||||
|
Wylie.rtf"</tt>. (Note that the '>' causes the output to be
|
||||||
|
directed to the file named thereafter; this is quite handy.)
|
||||||
|
This is because the only text in the input file besides Tibetan is
|
||||||
|
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
|
||||||
|
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
|
||||||
|
they are symptoms of the "curly-brace problem".
|
||||||
|
</p>
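<p>
The reasoning above can be automated for batches of files. The sketch
below is an illustration, not part of the converters: the function
names are hypothetical, and it assumes every report line has the exact
shape of the example output. An entry counts as harmless if it is
whitespace, or if it is one of the three Tahoma characters that are
symptoms of the "curly-brace problem".
</p>

```python
import re

LINE = re.compile(
    r"^Non-TMW character (.+) \[decimal (\d+)\] in the font (.+) "
    r"appears first at location (\d+)$"
)

# Decimal codes of '{', '}' and '\', harmless only in the Tahoma font.
CURLY_BRACE_CODES = {123, 125, 92}


def parse_report(report):
    """Return (decimal_code, font, location) for each report line."""
    entries = []
    for line in report.splitlines():
        m = LINE.match(line.strip())
        if m:
            entries.append((int(m.group(2)), m.group(3), int(m.group(4))))
    return entries


def unexpected_entries(report):
    """Return the entries that are neither whitespace nor curly-brace symptoms."""
    return [
        (code, font, location)
        for code, font, location in parse_report(report)
        if not chr(code).isspace()
        and not (font == "Tahoma" and code in CURLY_BRACE_CODES)
    ]
```

<p>
If <tt>unexpected_entries</tt> returns an empty list for a file's
report, the file contains nothing beyond what the example above
describes; anything it does return deserves a look before converting.
</p>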
|
||||||
|
|
||||||
|
<p>
|
||||||
|
There is a similar <tt>--find-some-non-tm</tt> mode of operation,
|
||||||
|
useful for ensuring a trouble-free TM->TMW conversion.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
|
||||||
|
<a name="knownbugs"></a><h2>Known Bugs</h2>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
All known bugs are listed in this section. They're more likely
|
||||||
|
to be fixed if users complain, so complain away. And if you
|
||||||
|
ever encounter problems in a conversion that are not listed here,
|
||||||
|
please send us mail with the error report (and the problem input
|
||||||
|
document and its resulting output) so that we can improve our
|
||||||
|
tools. The bugs are as follows:
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>
|
||||||
|
TMW->ACIP does not produce {KA (KHA)} to indicate differing
|
||||||
|
font sizes.
|
||||||
|
</li>
|
||||||
|
<li>
|
||||||
|
TMW to Unicode fails subtly when the TMW for {\u0F28\u0F3E} is
|
||||||
|
converted: {\u0F3E\u0F28} appears instead. [<a
|
||||||
|
href="http://sourceforge.net/tracker/index.php?func=detail&aid=855480&group_id=61934&atid=502515">855480</a>]
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<h2>License</h2>
|
||||||
|
|
||||||
|
<p>Both the converters and this document are released under the <a
|
||||||
|
href="http://iris.lib.virginia.edu/tibet/tools/thdl_license.txt">THDL
|
||||||
|
Open Community License Version 1.0</a>.</p>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<p>
|
||||||
|
Please
|
||||||
|
|
||||||
|
<a href="mailto:thdltools-devel@lists.sourceforge.net">
|
||||||
|
e-mail us</a>
|
||||||
|
|
||||||
|
your comments about this page.
|
||||||
|
</p>
|
||||||
|
|
||||||
|
<p>
|
||||||
|
The
|
||||||
|
<a href="http://www.sourceforge.net/projects/thdltools">
|
||||||
|
THDL Tools</a>
|
||||||
|
project is generously hosted by:
|
||||||
|
<!--
|
||||||
|
|
||||||
|
DO NOT DELETE THE SF.NET LOGO.
|
||||||
|
|
||||||
|
We have a choice of colors and sizes for this logo (see
|
||||||
|
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
|
||||||
|
but we do not have the option of removing it. SourceForge requests
|
||||||
|
that we put it on each web page for our project, and to give us
|
||||||
|
incentive to do so, they will not track the number of hits for our
|
||||||
|
project web pages unless we put this link in. To track hits, see
|
||||||
|
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".
|
||||||
|
|
||||||
|
-->
|
||||||
|
<a href="http://sourceforge.net/">
|
||||||
|
<img src="http://sourceforge.net/sflogo.php?group_id=61934&type=1"
|
||||||
|
width="88" height="31" alt="SourceForge Logo" />
|
||||||
|
</a>
|
||||||
|
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
|
||||||
|
</body>
|
||||||
|
</html>
|