Updated documentation on the converters. Added significant documentation for ACIP->Tibetan converters.
parent
ab83c76d8b
commit
b7104fd188
3 changed files with 1999 additions and 215 deletions
1534
htdocs/ACIP_To_Tibetan_Converter.html
Normal file
File diff suppressed because it is too large
|
@ -7,7 +7,7 @@
|
|||
|
||||
|
||||
<head>
|
||||
<title>Tibetan Machine Web Converter</title>
|
||||
<title>Converters in Jskad</title>
|
||||
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
|
||||
|
@ -44,259 +44,141 @@
|
|||
|
||||
</div>
|
||||
<div id="breadcrumbs">
|
||||
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> > Nightly Builds
|
||||
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/allfonts.html">Fonts & Input</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/conv.html">Converters</a> > Converters in Jskad
|
||||
</div>
|
||||
</div><!--END banner-->
|
||||
|
||||
|
||||
<div id="main">
|
||||
|
||||
<h2>Tibetan Machine Web Converter</h2>
|
||||
<h2>Converters in Jskad</h2>
|
||||
|
||||
<p>
|
||||
In recent versions of Jskad, the 'Tools' menu has an option 'Launch
|
||||
Converter...'. If you use that option, you will find a
|
||||
first-class Tibetan-to-Tibetan and Tibetan-to-Wylie converter.
|
||||
That converter has a user-friendly GUI interface, and it tells you
|
||||
when things go wrong (even things as subtle as your having selected
|
||||
the wrong conversion). If you need a command-line interface to
|
||||
that converter, however, read on.
|
||||
Converter...'. If you use that option, you will find a set of
|
||||
first-class converters that can convert digital Tibetan from one
|
||||
form to another. (A command-line interface is also available;
|
||||
see below.)
|
||||
</p>
|
||||
|
||||
<p>
|
||||
In the same JAR file as Jskad, power users will find a command-line
|
||||
utility that converts Tibetan documents from one digital
|
||||
representation to another. The converter embodies the same
|
||||
technology as Jskad itself, but often works even when Jskad fails
|
||||
due to Java's presently poor support for viewing RTF
|
||||
documents. This command-line utility converts a Tibetan
|
||||
Machine Web-encoded (TMW-encoded) Rich Text Format (RTF) file to
|
||||
any of these three output formats:
|
||||
Some of the converters there are based on Jskad technology, but all
|
||||
are first-class in the sense that they are well thought-out, well
|
||||
tested<!-- DLC LINK TO V&V story -->, and handle errors
|
||||
nicely. Certain features in Jskad are quite buggy; for
|
||||
example, its keyboards do not work as desired, but even when they
|
||||
do, they silently drop certain input characters. Do not worry
|
||||
that the converters described here suffer from these flaws; not one
|
||||
character of input is ever silently dropped. It is the
|
||||
intention of the developers that a Buddhist canon one day could be
|
||||
entrusted to these converters. Before you do that, though,
|
||||
please contact <a
|
||||
href="mailto:thdltools-devel@lists.sourceforge.net">the
|
||||
developers</a> to be sure that this documentation is up-to-date and
|
||||
to develop a custom validation and verification plan. None of
|
||||
the converters has yet been hand-validated on a real text of any
|
||||
size, but extensive unit testing has been performed for each
|
||||
conversion at every stage of development.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The following converters are available:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>RTF files in Unicode</li>
|
||||
<li>RTF files with the appropriate THDL Extended Wylie (Wylie) used
|
||||
instead of TMW</li>
|
||||
<li>RTF files in Tibetan Machine (used in legacy systems)</li>
|
||||
<li><a href="ACIP_To_Tibetan_Converter.html">ACIP->Unicode</a>
|
||||
(Text->Text)</li>
|
||||
|
||||
<li><a href="ACIP_To_Tibetan_Converter.html">ACIP->Tibetan
|
||||
Machine Web</a> (Text->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->ACIP</a> (RTF->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->ACIP</a> (RTF->Text)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TM->TMW</a> (RTF->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->TM</a> (RTF->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->Unicode</a> (RTF->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->EWTS</a> (RTF->RTF)</li>
|
||||
|
||||
<li><a href="TMW_or_TM_To_X_Converters.html">TMW->EWTS</a> (RTF->Text)</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
In addition, this converter can convert Tibetan Machine RTF files to
|
||||
Tibetan Machine Web RTF files, and takes precautions to ensure that
|
||||
only a 100% perfect conversion is done in both directions
|
||||
(TM->TMW and TMW->TM). One such precaution is that two
|
||||
independent teams (Garrett and Garson, Chandler) turned the Tibetan
|
||||
Machine Web <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||
documentation</a> into TM<->TMW tables. These tables
|
||||
were compared, giving full confidence that the tables are as
|
||||
accurate as the documentation (which has a <a
|
||||
href="http://sourceforge.net/tracker/index.php?func=detail&aid=746871&group_id=61934&atid=502515">
|
||||
few flaws</a> itself). That documentation has not been
|
||||
extensively verified against the actual fonts, however.
|
||||
Another precaution is that any unknown characters cause the
|
||||
conversion to fail, and the result is a document containing merely
|
||||
the unknown characters. (There are some known, illegal glyphs
|
||||
created by Tibet Doc, and the converter handles the ones it knows of
|
||||
and treats the rest as unknown.)
|
||||
Moreover, EWTS->Unicode and EWTS->TMW converters are in
|
||||
development. <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/wyword.html">Wylie
|
||||
Word 2.0</a> has better EWTS support at present.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This converter is smart enough to solve the "curly-brace
|
||||
problem", wherein Tahoma '{', '}', and '\' characters appear
|
||||
instead of the TMW stacks they are supposed to represent. This
|
||||
problem originates with certain versions of Microsoft Word's Rich
|
||||
Text Format writing capabilities.
|
||||
Above, <em>RTF</em> is an abbreviation for Rich Text Format;
|
||||
<em>Text</em> refers to an unformatted text file (in one of several
|
||||
encodings); <em>TMW</em> refers to the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html">Tibetan
|
||||
Machine Web</a> font; <em>TM</em> refers to the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tm.html">Tibetan
|
||||
Machine</a> font; <em>Unicode</em> refers to the Tibetan <a
|
||||
href="http://www.unicode.org/">Unicode</a> characters in the range
|
||||
U+0F00-U+0FFF mainly but also sometimes includes other Unicode
|
||||
characters; <em>EWTS</em> refers to Tibetan encoded using the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">Extended
|
||||
Wylie Transliteration Scheme</a>, a Roman transliteration scheme;
|
||||
<em>ACIP</em> refers to Tibetan encoded using <a
|
||||
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||
(ACIP) <a
|
||||
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||
Input Code</a>, another Roman transliteration scheme.
|
||||
</p>
|
||||
|
||||
<a name="invok"></a><h3>Invoking the Converters</h3>
|
||||
|
||||
<p>
|
||||
The converters have a user-friendly GUI, which tells you
|
||||
when things go wrong (from things like the lack of a needed glyph in
|
||||
the output font to things like your having selected the wrong
|
||||
conversion). The GUI is not properly documented here, and
|
||||
probably will not be until you contact <a
|
||||
href="mailto:thdltools-devel@lists.sourceforge.net">the
|
||||
developers</a> and ask them to document it.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Further, this converter gives a polite error message when a given
|
||||
.rtf file simply cannot be read by the version of Java used.
|
||||
To use the GUI, first launch <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/jskad.html">Jskad</a>
|
||||
itself. Then select 'Launch Converter...' from the 'Tools'
|
||||
menu. Let's hope from there it's self-explanatory, because it
|
||||
is not yet properly documented.<!-- DLC -->
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Perhaps most importantly, the converter has a
|
||||
<tt>--find-some-non-tmw</tt> mode of operation that gives you, the
|
||||
user, confidence that RTF reading and writing idiosyncrasies are not
|
||||
going to interfere with a flawless conversion. It does so by
|
||||
printing out the first occurrence of a given character in a non-TMW
|
||||
font. Here is some example output:
|
||||
For batch conversions of many files, a command-line interface to the
|
||||
converters may be more suitable than the GUI interface. In the
|
||||
same JAR file as Jskad, power users will find a command-line utility
|
||||
that can do everything the GUI interface to the converters can
|
||||
do. To learn how to invoke it, see the output you get when you
|
||||
use this invocation:
|
||||
</p>
|
||||
<pre>
|
||||
java -cp "c:\my thdl tools\Jskad.jar" \
|
||||
org.thdl.tib.input.TibetanConverter \
|
||||
--find-some-non-tmw \
|
||||
"Dalai Lama Fifth History 01.rtf"
|
||||
|
||||
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
|
||||
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
|
||||
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
|
||||
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
|
||||
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
|
||||
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
|
||||
org.thdl.tib.input.TibetanConverter --help
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Given the above output, you can be sure that a flawless conversion
|
||||
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
|
||||
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
|
||||
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
|
||||
History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended
|
||||
Wylie.rtf"</tt>. (Note that the '>' causes the output to be
|
||||
directed to the file named thereafter; this is quite handy.)
|
||||
This is because the only text in the input file besides Tibetan is
|
||||
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
|
||||
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
|
||||
they are symptoms of the "curly-brace problem".
|
||||
where you must replace "c:\my thdl tools\Jskad.jar" with the
|
||||
appropriate path on your system.
|
||||
</p>
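<p>
For batch work, one option is to generate the command lines with a
small script. The Python sketch below is illustrative only: the
class name and the --to-wylie option are taken from this
documentation, while the jar location, the helper names
build_command and plan_batch, and the output-naming scheme are
assumptions made for the example. It only builds the commands; it
does not run Java.
</p>

```python
from pathlib import Path

# Class name as given in this documentation.
CONVERTER_CLASS = "org.thdl.tib.input.TibetanConverter"


def build_command(jar_path, mode_flag, input_file):
    """Return the argument list for one converter run (hypothetical helper)."""
    return ["java", "-cp", str(jar_path), CONVERTER_CLASS, mode_flag, str(input_file)]


def plan_batch(jar_path, mode_flag, input_files, suffix):
    """Pair each input file with its command and an output path.

    The output name is the input name with `suffix` inserted before the
    extension; this naming scheme is an assumption, not a converter feature.
    """
    plan = []
    for name in input_files:
        src = Path(name)
        dst = src.with_name(src.stem + suffix + src.suffix)
        plan.append((build_command(jar_path, mode_flag, src), dst))
    return plan
```

<p>
Each command in the plan could then be handed to subprocess.run with
stdout pointed at the paired output path, which has the same effect
as redirecting the converter's output to a file in the shell.
</p>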
|
||||
|
||||
<h3>Failed Conversions</h3>
|
||||
<!-- DLC link to V&V story... -->
|
||||
|
||||
<p>
|
||||
In this section, you'll learn how to tell if a conversion has
|
||||
succeeded in full, run into minor problems, or failed altogether.
|
||||
</p>
|
||||
<h5>License</h5>
|
||||
|
||||
<h4>TMW to Wylie</h4>
|
||||
<p>Both the converters and this document are released under the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/thdl_license.txt">THDL
|
||||
Open Community License Version 1.0</a>.</p>
|
||||
|
||||
<p>
|
||||
<font color="red">
|
||||
This section is too up-to-date -- this is documenting plans for the
|
||||
future. At present, an error message like
|
||||
<code><<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot
|
||||
convert DuffCode <duffcode font=TibetanMachineWeb7 charNum=72
|
||||
character=H/> to THDL Extended Wylie. Please see the
|
||||
documentation for the TMW font and transcribe this
|
||||
yourself.]]>></code> appears.
|
||||
</font>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Note that some TMW glyphs have no transliteration in Extended
|
||||
Wylie. When you encounter such a glyph, you'll find
|
||||
<tt>\tmwXYYY</tt> in your output, where X tells you which TMW font
|
||||
the troublesome glyph comes from and YYY is the decimal number of
|
||||
the glyph in that font (which is a number between 000 and 255
|
||||
inclusive, usually between 33 and 126). The following are
|
||||
values corresponding to X:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
When X is 0, the TibetanMachineWeb font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 1, the TibetanMachineWeb1 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 2, the TibetanMachineWeb2 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 3, the TibetanMachineWeb3 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 4, the TibetanMachineWeb4 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 5, the TibetanMachineWeb5 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 6, the TibetanMachineWeb6 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 7, the TibetanMachineWeb7 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 8, the TibetanMachineWeb8 font contains the glyph.
|
||||
</li>
|
||||
<li>
|
||||
When X is 9, the TibetanMachineWeb9 font contains the glyph.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
Upon finding a <tt>\tmwXYYY</tt> sequence in your output, you should
|
||||
consult the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||
documentation</a> for the specific TMW font named. Find the
|
||||
glyph (by its YYY value) and decide how to proceed. If you
|
||||
find a glyph that you believe should have been converted into
|
||||
Extended Wylie by the tool, please report this as a bug through the
|
||||
SourceForge website or via e-mail.
|
||||
</p>
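<p>
If you have many such sequences to look up, a short script can list
them for you. The Python sketch below assumes the \tmwXYYY layout
exactly as described above (one digit for the font, three decimal
digits for the glyph); the function names are made up for this
example and are not part of the converter.
</p>

```python
import re

# \tmw, then one font digit X, then three decimal digits YYY.
TMW_ESCAPE = re.compile(r"\\tmw(\d)(\d{3})")


def font_name(x):
    """Map X to a font name: 0 is TibetanMachineWeb, 1-9 append the digit."""
    return "TibetanMachineWeb" if x == 0 else f"TibetanMachineWeb{x}"


def find_tmw_escapes(text):
    """Return (font name, glyph number, offset) for each \\tmwXYYY in text."""
    results = []
    for match in TMW_ESCAPE.finditer(text):
        x = int(match.group(1))
        yyy = int(match.group(2))
        if yyy <= 255:  # glyph numbers run from 000 to 255 inclusive
            results.append((font_name(x), yyy, match.start()))
    return results
```

<p>
For example, the sequence \tmw7091 is reported as glyph 91 of
TibetanMachineWeb7, which you can then look up in the documentation
for that font.
</p>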
|
||||
|
||||
<h4>Other Conversions</h4>
|
||||
|
||||
<p>
|
||||
The other conversions are all-or-nothing. That is, if you run
|
||||
into any trouble whatsoever, the result will be a file containing
|
||||
just the problematic glyphs, each preceded by achen (i.e., U+0F68,
|
||||
the letter whose THDL Extended Wylie representation is 'a').
|
||||
These glyphs will be bracketed on the left by U+0F3C (for which the
|
||||
THDL Extended Wylie is '(') and on the right by U+0F3D (for which
|
||||
the THDL Extended Wylie is ')'). If your result is as long as
|
||||
your input, then the conversion went flawlessly.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
|
||||
that has no Tibetan Machine equivalent. This glyph is the only
|
||||
TMW glyph that can cause a TMW->TM conversion to fail. It
|
||||
is fairly common, though, especially if you've used Jskad to prepare
|
||||
your document. It might be appropriate to change the document
|
||||
to use TibetanMachineWeb7, glyph 90 [\tmw7090], a similar glyph that
|
||||
does have a TM equivalent.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
You might consider using Jskad to convert documents that give
|
||||
errors, as it has better error reporting and can tell you just
|
||||
what's wrong.
|
||||
</p>
|
||||
<p>
|
||||
If you ever encounter problems in a TM->TMW conversion, please
|
||||
send us mail with the error report (and the document that resulted
from the problematic input) so that we can improve our tools.
|
||||
</p>
|
||||
|
||||
<h3>Invoking the Converter</h3>
|
||||
|
||||
<p>
|
||||
First add Jskad.jar to your CLASSPATH. You can do this by
|
||||
setting an environment variable CLASSPATH to contain the absolute
|
||||
path of the Jskad.jar file and then running the command <tt>java
|
||||
org.thdl.tib.input.TibetanConverter</tt>. Alternatively, you
|
||||
can use <code>java -cp "c:\my tibetan documents\Jskad.jar"
|
||||
org.thdl.tib.input.TibetanConverter</code> where you put in the
|
||||
appropriate path to Jskad.jar. You will see usage information
|
||||
appear if you do this correctly; you'll see a message like
|
||||
<code>java.lang.NoClassDefFoundError:
|
||||
org/thdl/tib/input/TibetanConverter; Exception in thread
|
||||
"main"</code> if you've not correctly told Java where to find
|
||||
Jskad.jar.
|
||||
</p>
|
||||
|
||||
<h3><a name="knownbugs"></a>Known Bugs</h3>
|
||||
|
||||
<p>
|
||||
All known bugs are listed in this section. They're more likely
|
||||
to be fixed if users complain, so complain away.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There are no known bugs at present.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Please
|
||||
|
|
368
htdocs/TMW_or_TM_To_X_Converters.html
Normal file
|
@ -0,0 +1,368 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
|
||||
<!-- @author David Chandler -->
|
||||
<!-- @editor Emacs, baby! -->
|
||||
|
||||
|
||||
<head>
|
||||
<title>Converting from TM or TMW</title>
|
||||
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||
<script type="text/javascript" src="http://iris.lib.virginia.edu/tibet/scripts/thdl_scripts.js"></script>
|
||||
<link rel="stylesheet" type="text/css" href="http://iris.lib.virginia.edu/tibet/style/thdl-styles.css"/>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
|
||||
<div id="banner">
|
||||
<a id="logo" href="http://iris.lib.virginia.edu/tibet/index.html"><img id="test" alt="THDL Logo" src="http://iris.lib.virginia.edu/tibet/images/logo.png"/></a>
|
||||
<h1>The Tibetan & Himalayan Digital Library</h1>
|
||||
|
||||
<div id="menubar">
|
||||
<script type='text/javascript'>function Go(){return}</script>
|
||||
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/thdl_menu_config.js'></script>
|
||||
|
||||
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu_new.js'></script>
|
||||
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/new/menu9_com.js'></script>
|
||||
<noscript><p>Your browser does not support javascript.</p></noscript>
|
||||
<div id='MenuPos' >Menu Loading... </div>
|
||||
</div><!--END menubar-->
|
||||
|
||||
</div><!--END banner-->
|
||||
|
||||
<div id="sub_banner">
|
||||
<div id="search">
|
||||
<form method="get" action="http://www.google.com/u/thdl">
|
||||
<p>
|
||||
<input type="text" name="q" id="q" size="15" maxlength="255" value="" />
|
||||
<input type="submit" name="sa" id="sa" value="Search"/>
|
||||
<input type="hidden" name="hq" id="hq" value="inurl:iris.lib.virginia.edu"/>
|
||||
</p>
|
||||
</form>
|
||||
|
||||
</div>
|
||||
<div id="breadcrumbs">
|
||||
<a href="http://iris.lib.virginia.edu/tibet/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/allfonts.html">Fonts & Input</a> > <a href="http://iris.lib.virginia.edu/tibet/tools/conv.html">Converters</a> > <a href="TMW_RTF_TO_THDL_WYLIE.html">Converters in Jskad</a> > Converting from TM or TMW
|
||||
</div>
|
||||
</div><!--END banner-->
|
||||
|
||||
|
||||
<div id="main">
|
||||
|
||||
<h2>Converting from Tibetan Machine or Tibetan Machine Web</h2>
|
||||
|
||||
<p>
|
||||
Among the <a href="TMW_RTF_TO_THDL_WYLIE.html">converters in
|
||||
Jskad</a> are some converters that take input that is encoded to use
|
||||
either the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tm.html">Tibetan
|
||||
Machine</a> (TM) or <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html">Tibetan
|
||||
Machine Web</a> (TMW) fonts. These converters are described
|
||||
here.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
First, to learn how to invoke the converters, see <a
|
||||
href="TMW_RTF_TO_THDL_WYLIE.html#invok">these instructions</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The converters embody the same technology as <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/jskad.html">Jskad</a>
|
||||
itself, but often work even when Jskad fails due to Java's presently
|
||||
poor support for viewing Rich Text Format (RTF) documents.
|
||||
These converters can convert a TMW-encoded RTF file to any of these
|
||||
output formats:
|
||||
</p>
|
||||
<ul>
|
||||
<li>an RTF file using <a href="http://www.unicode.org/">Unicode</a>,
|
||||
a standard encoding that will be widely supported in the future</li>
|
||||
|
||||
<li>an RTF file using the appropriate THDL Extended Wylie (<a
|
||||
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>)
|
||||
instead of TMW</li>
|
||||
|
||||
<li>a text file using the appropriate THDL Extended Wylie (<a
|
||||
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>)
|
||||
instead of TMW</li>
|
||||
|
||||
<li>an RTF file using the appropriate <a
|
||||
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||
(ACIP) <a
|
||||
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||
Input Code</a> instead of TMW</li>
|
||||
|
||||
<li>a text file using the appropriate <a
|
||||
href="http://asianclassics.org">Asian Classics Input Project</a>
|
||||
(ACIP) <a
|
||||
href="http://asianclassics.org/download/tibetancode/ticode.pdf">Tibetan
|
||||
Input Code</a> instead of TMW</li>
|
||||
|
||||
<li>an RTF file using the Tibetan Machine encoding (used in legacy
|
||||
systems).</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
In addition, this converter can convert a Tibetan Machine RTF file to
|
||||
a Tibetan Machine Web RTF file.
|
||||
</p>
|
||||
|
||||
<a name="vv"></a>
|
||||
<p>
|
||||
All the converters take precautions to ensure that only a 100%
|
||||
perfect conversion is done. One such precaution is that two
|
||||
independent teams (Garrett and Garson, Chandler) turned the Tibetan
|
||||
Machine Web <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||
documentation</a> into TM<->TMW tables. These tables
|
||||
were compared, giving full confidence that the tables are as
|
||||
accurate as the documentation (which has a few flaws itself,
|
||||
documented in the <a href="Tibetan51Errata.html">errata</a> we have
|
||||
created). That documentation has been verified against the
|
||||
actual fonts. David Chapman's assistance in this area has been
|
||||
invaluable.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Another precaution is that any unknown characters (in the font being
|
||||
converted from) cause the conversion to <a href="#failure">fail</a>,
|
||||
and the result is either a document containing merely the unknown
|
||||
characters or a document with conspicuous error messages
|
||||
interspersed.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
These converters are smart enough to solve the "curly-brace
|
||||
problem", wherein '{', '}', and '\' characters in the Tahoma
|
||||
font appear instead of the TMW stacks they are supposed to
|
||||
represent. This problem originates with certain versions of
|
||||
Microsoft Word's Rich Text Format writing capabilities. These
|
||||
converters are also smart enough to work around Java's <a
|
||||
href="http://developer.java.sun.com/developer/bugParade/bugs/4907759.html">Bug
|
||||
4907759</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Furthermore, these converters give a polite error message when a
|
||||
given RTF file simply cannot be read by the version of Java used.
|
||||
</p>
|
||||
|
||||
|
||||
<h2>Invoking the Converters</h2>
|
||||
|
||||
<p>
|
||||
See <a href="TMW_RTF_TO_THDL_WYLIE.html#invok">here</a> for details
|
||||
on how to invoke the converters.
|
||||
</p>
|
||||
|
||||
<!-- DLC TEST TMW->UNICODE F021... does that appear? -->
|
||||
|
||||
<a name="failure"></a><h2>Failed Conversions</h2>
|
||||
|
||||
<p>
|
||||
In this section, you'll learn how to tell if a conversion has
|
||||
succeeded in full, run into minor problems, or failed altogether.
|
||||
</p>
|
||||
|
||||
<h3>TMW to ACIP</h3>
|
||||
|
||||
<p>
|
||||
When a TMW->ACIP conversion fails, a message such as
|
||||
<tt>[# JSKAD_TMW_TO_ACIP_ERROR_NO_SUCH_ACIP: Cannot convert
|
||||
<glyph font=TibetanMachineWeb8 charNum=38 character=&/> to
|
||||
ACIP. Please transcribe this yourself.]</tt> will appear in your
|
||||
output, but it will be amidst the successfully converted text.
|
||||
</p>
|
||||
|
||||
<h3>TMW to Wylie (i.e., EWTS)</h3>
|
||||
|
||||
<p>
|
||||
A TMW to EWTS conversion rarely fails; EWTS is almost entirely
|
||||
comprehensive (and may have been revised to be comprehensive by the
|
||||
time you read this).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
That said, you may want to search the output for EWTS constructs
|
||||
that you don't like, such as <tt>\u0F39</tt>- and
|
||||
<tt>\uF021</tt>-style escape sequences.
|
||||
</p>
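<p>
One way to search for such escape sequences is a small script. The
Python sketch below assumes that each escape is a backslash, a
lowercase u, and exactly four hexadecimal digits, as in the two
examples above; the function name is hypothetical. It is meant for
plain-text EWTS output, since raw RTF has \u control words of its
own.
</p>

```python
import re
from collections import Counter

# Backslash, 'u', then four hex digits, e.g. \u0F39 or \uF021.
U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def count_u_escapes(ewts_text):
    """Return a dict mapping each escaped code point (upper-case hex) to its count."""
    counts = Counter(match.group(1).upper() for match in U_ESCAPE.finditer(ewts_text))
    return dict(counts)
```

<p>
An empty result means no escapes of this form were found.
</p>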
|
||||
|
||||
<p>
|
||||
If a TMW glyph has no transliteration according to <a
|
||||
href="http://iris.lib.virginia.edu/tibet/collections/langling/ewts/">EWTS</a>,
|
||||
then an error message like
|
||||
<tt><<[[JSKAD_TMW_TO_WYLIE_ERROR_NO_SUCH_WYLIE: Cannot convert
|
||||
<glyph font=TibetanMachineWeb7 charNum=95 character=_/> to
|
||||
THDL Extended Wylie. Please see the documentation for the TM or TMW
|
||||
font and transcribe this yourself.]]>></tt> appears in the
|
||||
output.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Upon finding such a message in your output, you should consult the
|
||||
<a href="http://iris.lib.virginia.edu/tibet/tools/tmw.html#doc">
|
||||
documentation</a> for the specific TMW font named. Find the
|
||||
glyph and decide how to proceed. If you find a glyph that you
|
||||
believe should have been converted into Extended Wylie by the tool,
|
||||
please report this as a bug through the SourceForge website or via
|
||||
e-mail.
|
||||
</p>
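<p>
Error messages like the TMW to ACIP and TMW to EWTS examples shown
above can also be collected by a script. The Python sketch below
assumes that every message contains the fragment "JSKAD_...: Cannot
convert" followed by a glyph description in the form of those two
examples; other message shapes, if any exist, will not be matched.
The function name is hypothetical.
</p>

```python
import re

# Matches the part that the two documented messages have in common.
ERROR = re.compile(
    r"(JSKAD_[A-Z_]+):\s+Cannot\s+convert\s+"
    r"<glyph\s+font=(\w+)\s+charNum=(\d+)\s+character=(.)/>",
    re.DOTALL,
)


def find_conversion_errors(output):
    """Return (error code, font, glyph number, character) for each message."""
    return [
        (m.group(1), m.group(2), int(m.group(3)), m.group(4))
        for m in ERROR.finditer(output)
    ]
```

<p>
The font and glyph number in each result tell you where to look in
the TMW font documentation.
</p>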
|
||||
|
||||
|
||||
<h3>TMW to Unicode, TM to TMW, and TMW to TM Conversions</h3>
|
||||
|
||||
<p>
|
||||
The TMW->Unicode, TM->TMW, and TMW->TM conversions are
|
||||
all-or-nothing. That is, if you run into any trouble
|
||||
whatsoever, the result will be a file containing just the
|
||||
problematic glyphs, each preceded by a-chen (i.e., U+0F68, the
|
||||
letter whose THDL Extended Wylie representation is 'a'). These
|
||||
glyphs will be bracketed on the left by U+0F3C (for which the THDL
|
||||
Extended Wylie is '(') and on the right by U+0F3D (for which the
|
||||
THDL Extended Wylie is ')'). If your result is as long as your
|
||||
input, then the conversion went flawlessly.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There is one TMW glyph (TibetanMachineWeb7, glyph 91 [\tmw7091])
|
||||
that has no Tibetan Machine equivalent. This glyph is the only
|
||||
TMW glyph that can cause a TMW->TM conversion to fail. It
|
||||
is fairly common, though, especially if you've used Jskad to prepare
|
||||
your document. It might be appropriate to change the document
|
||||
to use TibetanMachineWeb7, glyph 90 (decimal ordinal 90, that is), a
|
||||
similar glyph that does have a TM equivalent.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
You might consider using the GUI converter interface in Jskad to
|
||||
convert documents that give impenetrable errors when converted by
|
||||
the command-line tool, as the GUI has better error reporting and can
|
||||
tell you just what's wrong.
|
||||
</p>
|
||||
|
||||
|
||||
<h2>Finding Potential Problems Before Conversion</h2>
|
||||
|
||||
<p>
|
||||
The converters that take TM and TMW input deal with problematic
|
||||
input in a clean way, but you might prefer the mechanism described
|
||||
here.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
There is a <tt>--find-some-non-tmw</tt> mode of operation that gives
|
||||
you, the user, confidence that RTF reading and writing
|
||||
idiosyncrasies are not going to interfere with a flawless
|
||||
conversion. It does so by printing out the first occurrence of
|
||||
a given character in a non-TMW font. Here is some example
|
||||
output:
|
||||
</p>
|
||||
<pre>
|
||||
java -cp "c:\my thdl tools\Jskad.jar" \
|
||||
org.thdl.tib.input.TibetanConverter \
|
||||
--find-some-non-tmw \
|
||||
"Dalai Lama Fifth History 01.rtf"
|
||||
|
||||
Non-TMW character newline [decimal 10] in the font Tahoma appears first at location 39
|
||||
Non-TMW character ' ' [decimal 32] in the font TimesNewRoman appears first at location 45
|
||||
Non-TMW character '}' [decimal 125] in the font Tahoma appears first at location 66
|
||||
Non-TMW character '{' [decimal 123] in the font Tahoma appears first at location 219
|
||||
Non-TMW character '\' [decimal 92] in the font Tahoma appears first at location 1237
|
||||
Non-TMW character newline [decimal 10] in the font Times New Roman appears first at location 9754
|
||||
</pre>
|
||||
|
||||
<p>
|
||||
Given the above output, you can be sure that a flawless conversion
|
||||
(barring the appearance of <a href="#knownbugs">known bugs</a>) will
|
||||
result when you run <tt>java -cp "c:\my thdl tools\Jskad.jar"
|
||||
org.thdl.tib.input.TibetanConverter --to-wylie "Dalai Lama Fifth
|
||||
History 01.rtf" > "Dalai Lama Fifth History 01 in THDL Extended
|
||||
Wylie.rtf"</tt>. (Note that the '>' causes the output to be
|
||||
directed to the file named thereafter; this is quite handy.)
|
||||
This is because the only text in the input file besides Tibetan is
|
||||
whitespace and the Tahoma characters <tt>'{'</tt>, <tt>'}'</tt>, and
|
||||
<tt>'\'</tt>. These Tahoma characters are understood by the tool;
|
||||
they are symptoms of the "curly-brace problem".
|
||||
</p>
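<p>
The reasoning above can be automated for long reports. The Python
sketch below assumes that report lines look exactly like the
example output; it treats decimal 9, 10, 13, and 32 as whitespace
(only 10 and 32 appear in the example, so the other two are an
assumption), and it accepts the three Tahoma characters of the
curly-brace problem. Anything else is returned for you to inspect.
The function name is hypothetical.
</p>

```python
import re

LINE = re.compile(
    r"^Non-TMW character (.+?) \[decimal (\d+)\] "
    r"in the font (.+?) appears first at location (\d+)$"
)
WHITESPACE = {9, 10, 13, 32}          # tab, newline, carriage return, space
CURLY_BRACE_SYMPTOMS = {92, 123, 125}  # '\', '{', '}' in the Tahoma font


def unexpected_characters(report):
    """Return (decimal code, font, location) for entries that need a closer look."""
    unexpected = []
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # not a report line (e.g. the command itself)
        code = int(match.group(2))
        font = match.group(3)
        location = int(match.group(4))
        if code in WHITESPACE:
            continue
        if font == "Tahoma" and code in CURLY_BRACE_SYMPTOMS:
            continue
        unexpected.append((code, font, location))
    return unexpected
```

<p>
For the example output above, the result is an empty list, which
matches the conclusion that the file contains only Tibetan,
whitespace, and curly-brace symptoms.
</p>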
|
||||
|
||||
<p>
|
||||
There is a similar <tt>--find-some-non-tm</tt> mode of operation,
|
||||
useful for ensuring a trouble-free TM->TMW conversion.
|
||||
</p>
|
||||
|
||||
|
||||
<a name="knownbugs"></a><h2>Known Bugs</h2>
|
||||
|
||||
<p>
|
||||
All known bugs are listed in this section. They're more likely
|
||||
to be fixed if users complain, so complain away. And if you
|
||||
ever encounter problems in a conversion that are not listed here,
|
||||
please send us mail with the error report (and the document that
resulted from the problematic input) so that we can improve our
|
||||
tools. The bugs are as follows:
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
TMW->ACIP does not produce {KA (KHA)} to indicate differing
|
||||
font sizes.
|
||||
</li>
|
||||
<li>
|
||||
TMW to Unicode fails subtly when the TMW for {\u0F28\u0F3E} is
|
||||
converted: {\u0F3E\u0F28} appears instead. [<a
|
||||
href="http://sourceforge.net/tracker/index.php?func=detail&aid=855480&group_id=61934&atid=502515">855480</a>]
|
||||
</li>
|
||||
</ul>
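<p>
Until the second bug is fixed, you may want to check converted text
for the reversed pair. The Python sketch below is only a rough aid:
it works on already-decoded Unicode text (not on raw RTF), the
function name is hypothetical, and a match shows only where U+0F3E
is directly followed by U+0F28, so each hit still needs to be
checked by eye against the original.
</p>

```python
# The order the bug produces: U+0F3E followed by U+0F28.
SWAPPED = "\u0F3E\u0F28"


def find_swapped_pairs(text):
    """Return the offsets in text where U+0F3E is directly followed by U+0F28."""
    positions = []
    start = text.find(SWAPPED)
    while start != -1:
        positions.append(start)
        start = text.find(SWAPPED, start + 1)
    return positions
```

<p>
An empty list means the reversed pair does not occur in the text
you checked.
</p>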
|
||||
|
||||
<p>
|
||||
</p>
|
||||
|
||||
<h2>License</h2>
|
||||
|
||||
<p>Both the converters and this document are released under the <a
|
||||
href="http://iris.lib.virginia.edu/tibet/tools/thdl_license.txt">THDL
|
||||
Open Community License Version 1.0</a>.</p>
|
||||
|
||||
|
||||
|
||||
<p>
|
||||
Please
|
||||
|
||||
<a href="mailto:thdltools-devel@lists.sourceforge.net">
|
||||
e-mail us</a>
|
||||
|
||||
your comments about this page.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The
|
||||
<a href="http://www.sourceforge.net/projects/thdltools">
|
||||
THDL Tools</a>
|
||||
project is generously hosted by:
|
||||
<!--
|
||||
|
||||
DO NOT DELETE THE SF.NET LOGO.
|
||||
|
||||
We have a choice of colors and sizes for this logo (see
|
||||
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
|
||||
but we do not have the option of removing it. SourceForge requests
|
||||
that we put it on each web page for our project, and to give us
|
||||
incentive to do so, they will not track the number of hits for our
|
||||
project web pages unless we put this link in. To track hits, see
|
||||
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".
|
||||
|
||||
-->
|
||||
<a href="http://sourceforge.net/">
|
||||
<img src="http://sourceforge.net/sflogo.php?group_id=61934&type=1"
|
||||
width="88" height="31" alt="SourceForge Logo" />
|
||||
</a>
|
||||
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
|
||||
</p>
|
||||
</div>
|
||||
|
||||
|
||||
</body>
|
||||
</html>
|