<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<!-- @author David Chandler -->
<!-- @date-created October 20, 2002 -->
<!-- @editor Emacs, baby! -->

<head>
<title>tibwn.ini File Format</title>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript" src="http://orion.lib.virginia.edu/thdl/scripts/thdl_scripts.js"></script>
<link rel="stylesheet" type="text/css" href="http://orion.lib.virginia.edu/thdl/style/thdl-styles.css"/>
</head>

<body>

<div id="banner">
<a id="logo" href="http://orion.lib.virginia.edu/thdl/index.html"><img id="test" alt="THDL Logo" src="http://orion.lib.virginia.edu/thdl/images/logo.png"/></a>
<h1>The Tibetan &amp; Himalayan Digital Library</h1>

<div id="menubar">
<script type='text/javascript'>function Go(){return}</script>
<script type='text/javascript' src='http://orion.lib.virginia.edu/thdl/scripts/new/thdl_menu_config.js'></script>

<script type='text/javascript' src='http://orion.lib.virginia.edu/thdl/scripts/new/menu_new.js'></script>
<script type='text/javascript' src='http://orion.lib.virginia.edu/thdl/scripts/new/menu9_com.js'></script>
<noscript><p>Your browser does not support JavaScript.</p></noscript>
<div id='MenuPos' >Menu Loading... </div>
</div><!--END menubar-->

</div><!--END banner-->

<div id="sub_banner">
<div id="search">
<form method="get" action="http://www.google.com/u/thdl">
<p>
<input type="text" name="q" id="q" size="15" maxlength="255" value="" />
<input type="submit" name="sa" id="sa" value="Search"/>
<input type="hidden" name="hq" id="hq" value="inurl:orion.lib.virginia.edu"/>
</p>
</form>

</div>
<div id="breadcrumbs">
<a href="http://orion.lib.virginia.edu/thdl/index.html">Home</a> > <a href="index.html">Tools</a> > <a href="http://orion.lib.virginia.edu/thdl/tools/software.html">Software</a> > tibwn.ini File Format
</div>
</div><!--END sub_banner-->

<div id="main">

<h2>tibwn.ini File Format</h2>

<p>
<a
href="http://orion.lib.virginia.edu/thdl/tools/jskad.html">Jskad</a>
and <a
href="http://orion.lib.virginia.edu/thdl/tools/wyword.html">WylieWord</a>
both use a data file named <a
href="http://cvs.sourceforge.net/viewcvs.py/thdltools/Jskad/source/org/thdl/tib/text/tibwn.ini?view=markup"><code>tibwn.ini</code></a>.
This document describes the structure and content of that data file.
</p>

<p>
The file's purpose is to encode all of the tools' knowledge about the <a
href="http://orion.lib.virginia.edu/thdl/tools/tm.html">Tibetan
Machine</a> (TM) and <a
href="http://orion.lib.virginia.edu/thdl/tools/tmw.html">Tibetan
Machine Web</a> (TMW) fonts. Specifically, it records:
</p>

<ul>
<li>
which TM glyphs correspond to which TMW glyphs (a two-way
mapping, which is one-to-one except for TMW7.90 and TMW7.91, which
both map to the same TM glyph for our purposes)
</li>
<li>
which Unicode codepoints are suitable for a TM-to-Unicode or
TMW-to-Unicode conversion
</li>
<li>
which THDL Extended Wylie is suitable for a TM-to-Wylie or
TMW-to-Wylie conversion
</li>
<li>
which vowel/bindu/vowel+bindu/achung-as-vowel glyphs correspond to
which consonant/consonant-stack glyphs (needed for composing
beautiful stacks, and thus for Wylie->TMW conversions and input
methods)
</li>
</ul>

<p>
Much of this knowledge is found in the <a
href="http://orion.lib.virginia.edu/thdl/tools/tib5doc.pdf">documentation</a>
for the TM and TMW fonts. Note the <a
href="Tibetan51Errata.html">errata</a> for that documentation; they
bear directly on the data in <code>tibwn.ini</code>.
</p>

<h3>File Format -- Overview</h3>

<p>
The <code>tibwn.ini</code> file format allows for comments, blank
lines (entirely blank, containing not even whitespace), section
headers, comma-delimited lists, and newline-delimited rows of
tilde-delimited data. A comment line is any line that begins
with two slashes ('//'); the entire line is ignored. A blank
line is ignored, too. A section header is a line of the form
'&lt;?<i>section-name</i>?&gt;'; the data lines that follow it, up to
the next section header, belong to that section. A comma-delimited
list must fit entirely on one line.
</p>
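<p>
These rules are simple enough to sketch in a few lines. The following
Python is a minimal, hypothetical reader (it is not the code Jskad
uses, and the name <code>read_sections</code> is invented for this
example); it assumes the whole file has already been read into a
string and groups the remaining data lines by section name.
</p>

<pre>
import re

# A section header line wraps the section name in the delimiters shown
# above.  In this pattern \x3c and \x3e stand for the two angle brackets.
SECTION_HEADER = re.compile(r"^\x3c\?(.+)\?\x3e$")


def read_sections(text):
    """Group the data lines of a tibwn.ini file by section name.

    Hypothetical helper for illustration; returns a dict that maps
    each section name to its data lines, in file order.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        # Blank means truly empty: a line holding only spaces is not
        # blank under these rules, so it falls through as data.
        if line == "" or line.startswith("//"):
            continue
        header = SECTION_HEADER.match(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        elif current is None:
            # Assumption: the format does not say what a data line
            # before the first header means, so this sketch rejects it.
            raise ValueError("data line before any section header: %r" % line)
        else:
            sections[current].append(line)
    return sections
</pre>

<p>
Returning a dict keeps the sections in file order, which makes it easy
to hand each section's lines to a section-specific parser.
</p>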

<h3>File Format -- Sections</h3>

<p>
The <code>tibwn.ini</code> file is broken up into sections.
Currently, these are as follows:
</p>

<ul>
<li>
Consonants - set of Tibetan/Tibetanized Sanskrit consonants that
the application reading tibwn.ini supports; does not include
stacks
</li>

<li>
Numbers - set of Tibetan numbers that the application reading
tibwn.ini supports
</li>

<li>
Vowels - set of Tibetan vowels that the application reading
tibwn.ini supports
</li>

<li>
Other - set of Tibetan punctuation and other characters that the
application reading tibwn.ini supports
</li>

<li>
Input:Punctuation - application-independent data for all Tibetan
punctuation and other miscellaneous characters that can be listed
in the Other section
</li>

<li>
Input:Vowels - application-independent data for all Tibetan vowels
that can be listed in the Vowels section
</li>

<li>
Input:Tibetan - application-independent data for all Tibetan
consonants and Tibetan (but not Tibetanized Sanskrit) consonant
stacks that can be listed in the Consonants section
</li>

<li>
Input:Numbers - application-independent data for all Tibetan
numbers that can be listed in the Numbers section
</li>

<li>
Input:Sanskrit - application-independent data for all Tibetanized
Sanskrit stacks
</li>

<li>
ToWylie - data needed only for Tibetan-to-THDL Extended Wylie
and Tibetan-to-Unicode conversions
</li>

<li>
Ignore - a section containing data to be ignored for all purposes
other than Tibetan-to-Unicode conversions (it should probably be
called ToUnicode by now <!-- FIXME -->, but is called
Ignore because it was once entirely ignored)
</li>
</ul>

<p>
The Consonants, Numbers, Vowels, and Other sections contain
comma-delimited lists of the THDL Extended Wylie representations of
each consonant, number, etc. that the application reading
<code>tibwn.ini</code> wishes to support. Note that a comma cannot be
represented in these lists (though the format could likely be extended
to allow one along the same lines as the __TILDE__ escape for '~'
characters). These sections should probably be
discarded entirely so that applications and users can choose which
characters to support themselves. The rest of the file is
application-independent.<!-- FIXME -->
</p>
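<p>
Because a comma has no escape inside these lists, reading one of the
application-specific sections reduces to a plain split. A hypothetical
sketch (the function name is invented here), assuming the section's
lines have already been collected:
</p>

<pre>
def read_support_list(lines):
    """Collect the THDL Extended Wylie strings listed in a Consonants,
    Numbers, Vowels, or Other section into a set."""
    supported = set()
    for line in lines:
        # Each list fits on one line, and ',' has no escape, so every
        # comma is a delimiter.
        for item in line.split(","):
            # Assumption: stray spaces around items are not significant.
            item = item.strip()
            if item:
                supported.add(item)
    return supported
</pre>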

<p>
The remaining sections (the Input sections, ToWylie, and Ignore) are
the meat of the file. Each one contains zero or more rows of data,
one row per line. Each line looks like this:
</p>

<pre>
h~61,1~1,100~1,62~1,109~1,112~1,123~1,125~10,115~10,122~0F67~1,102~
</pre>


<p>
Each line describes one glyph in the Tibetan Machine Web font.
There are thirteen tilde-delimited columns per line, though the last
two columns are optional. An empty column is written as nothing at
all between its two tildes; for example,
</p>

<pre>
_~32,1~~2,32~~~~~~~0020
</pre>

<p>
has non-empty data only in columns 1 (the first column, i.e. the one
containing an underscore), 2, 4, and 11, and omits the optional
columns 12 and 13. To represent a literal tilde ('~') character
inside a column, write __TILDE__ instead. For example, the THDL
Extended Wylie for the nyi zla sign contains a tilde, so this line
occurs:
</p>

<pre>
__TILDE__^~91,5~~9,89~~~~~~~0F82
</pre>
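<p>
The order of operations matters when decoding a row: split on '~'
first, then turn each __TILDE__ back into '~'. Unescaping first would
turn the escaped tilde into a spurious column delimiter. A minimal
Python sketch (the helper name <code>split_row</code> is hypothetical):
</p>

<pre>
TILDE_ESCAPE = "__TILDE__"
COLUMN_COUNT = 13


def split_row(line):
    """Split one data row into exactly 13 column strings.

    Columns 12 and 13 are optional; when a row omits them they are
    padded with empty strings, the same as an empty column.
    """
    raw = line.split("~")
    if len(raw) not in (11, 12, 13):
        raise ValueError("expected 11 to 13 columns, got %d: %r" % (len(raw), line))
    # Unescape only after splitting, so an escaped tilde never acts
    # as a column delimiter.
    columns = [column.replace(TILDE_ESCAPE, "~") for column in raw]
    return columns + [""] * (COLUMN_COUNT - len(columns))
</pre>

<p>
Applied to the nyi zla row above, this yields '~^' as column 1 and
'0F82' as column 11, with columns 12 and 13 empty.
</p>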

<p>
Each column has its own meaning, as follows:
</p>

<ul>
<li>
Column 1 - the THDL Extended Wylie corresponding to this glyph
</li>

<li>
Column 2 - '<i>ord</i>,<i>fn</i>' where <i>fn</i> corresponds to a
<a href="#tmindex">Tibetan Machine font</a> and <i>ord</i> tells
which glyph in <b>Tibetan Machine</b> corresponds to the Tibetan
Machine Web glyph this line describes (note that this column puts
<i>ord</i> first, unlike all the other glyph columns)
</li>

<li>
Column 3 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the Tibetan Machine
Web <b>reduced-height glyph</b> corresponding to the full-height
glyph this line describes
</li>

<li>
<b>Column 4 - key -</b> '<i>fn</i>,<i>ord</i>' where <i>fn</i>
corresponds to a <a href="#tmwindex">Tibetan Machine Web font</a>
and <i>ord</i> tells which Tibetan Machine Web glyph this line
describes. No two rows of data may have the same value for
this column. (Note that TMW is a superset of TM, so there is
one glyph in TM that could reasonably appear twice, mapped to both
TibetanMachineWeb7.90 and TMW7.91.) <i>But note that Jskad
and similar tools must handle a superset of TMW (for example, when
converting the ACIP {<tt>W+W+W+KA</tt>} into Unicode) and thus cannot
internally use the TMW glyph alone to represent arbitrary Tibetan
text. Nor is the Extended Wylie transliteration a unique
key; see, e.g., the many glyphs that EWTS transliterates as
{<tt>r</tt>}. For this reason, a smart tool uses the pair
(EWTS, TMW) as its internal representation. (Jskad does
this in a way that is hard to follow, but it does do it;
see, e.g., the code implementing the TMW->ACIP conversion of
TibetanMachineWeb7.69.)</i>
</li>

<li>
Column 5 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>gi-gu</b> vowel
that looks most beautiful with the glyph this line describes
</li>

<li>
Column 6 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>zhabs-kyu</b>
vowel that looks most beautiful with the glyph this line describes
</li>

<li>
Column 7 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>'greng-bu</b>
vowel that looks most beautiful with the glyph this line describes
</li>

<li>
Column 8 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>na-ro</b> vowel
that looks most beautiful with the glyph this line describes
</li>

<li>
Column 9 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to a
<a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>a-chung</b>
[Sanskrit] vowel that looks most beautiful with the glyph this line
describes
</li>

<li>
Column 10 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to
a <a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the form of the <b>a-chung plus
zhabs-kyu</b> [Sanskrit] vowel that looks most beautiful with the
glyph this line describes
</li>

<li>
Column 11 - '<i>x<sub>1</sub></i>,<i>x<sub>2</sub></i>,...' or
'none', where each <i>x<sub>i</sub></i> is a Unicode codepoint,
most often one in the Tibetan range U+0F00 to U+0FFF, written as a
case-insensitive string of one to four hexadecimal digits.
The full, comma-separated sequence is the preferred Unicode
representation of the glyph this line describes; 'none' appears
for glyphs that have no Unicode correspondence.
</li>

<li>
Column 12 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to
a <a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the Tibetan Machine
Web <b>severely-reduced-height glyph</b> corresponding to the
full-height glyph this line describes
</li>

<li>
Column 13 - '<i>fn</i>,<i>ord</i>' where <i>fn</i> corresponds to
a <a href="#tmwindex">Tibetan Machine Web font</a> and <i>ord</i>
tells which glyph in that font is the Tibetan Machine
Web <b>vowel+bindu glyph</b> corresponding to the vowel glyph this
line describes
</li>

</ul>
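<p>
Once a row has been split into its columns, the glyph columns can be
decoded. The sketch below is hypothetical (every name in it is
invented for this example) and assumes the columns arrive as a list of
strings with __TILDE__ already decoded. Two details from the list
above are easy to get wrong: column 2 is '<i>ord</i>,<i>fn</i>' while
the other glyph columns are '<i>fn</i>,<i>ord</i>', and column 4 must
be unique across all rows.
</p>

<pre>
import re
from typing import NamedTuple


class GlyphRef(NamedTuple):
    font: int     # font index, as in the index tables on this page
    ordinal: int  # position of the glyph within that font


HEX_CODEPOINT = re.compile(r"[0-9A-Fa-f]{1,4}")

# Columns 5 through 10, in order.
VOWEL_COLUMNS = ("gi-gu", "zhabs-kyu", "'greng-bu", "na-ro",
                 "a-chung", "a-chung plus zhabs-kyu")


def fn_ord(text):
    """Decode a 'fn,ord' column; an empty column gives None."""
    if text == "":
        return None
    fn, ordinal = text.split(",")
    return GlyphRef(int(fn), int(ordinal))


def ord_fn(text):
    """Decode column 2, which is written 'ord,fn'."""
    if text == "":
        return None
    ordinal, fn = text.split(",")
    return GlyphRef(int(fn), int(ordinal))


def unicode_text(text):
    """Decode column 11 into a string, or None for 'none'."""
    if text == "none":
        return None
    chars = []
    for digits in text.split(","):
        if not HEX_CODEPOINT.fullmatch(digits):
            raise ValueError("bad codepoint %r" % digits)
        chars.append(chr(int(digits, 16)))
    return "".join(chars)


def parse_row(columns):
    """Map a row's column strings to a dict of decoded fields."""
    # Pad the optional columns 12 and 13 if the row omitted them.
    columns = list(columns) + [""] * (13 - len(columns))
    return {
        "wylie": columns[0],
        "tm": ord_fn(columns[1]),
        "reduced": fn_ord(columns[2]),
        "key": fn_ord(columns[3]),
        "vowels": {name: fn_ord(col)
                   for name, col in zip(VOWEL_COLUMNS, columns[4:10])},
        "unicode": unicode_text(columns[10]),
        "severely_reduced": fn_ord(columns[11]),
        "vowel_bindu": fn_ord(columns[12]),
    }


def check_unique_keys(rows):
    """Raise if two parsed rows share a column-4 key."""
    seen = {}
    for row in rows:
        key = row["key"]
        if key in seen:
            raise ValueError("key %r used by %r and %r"
                             % (key, seen[key], row["wylie"]))
        seen[key] = row["wylie"]
</pre>

<p>
For the row for h shown earlier, this gives a TM glyph of font 1,
ordinal 61, a key of font 1, ordinal 62, and a Unicode value of
U+0F67. Keeping the TM column's reversed order inside
<code>ord_fn</code> means every decoded value uses the same
(font, ordinal) shape.
</p>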

<a name="tmindex"></a>
<h4>Tibetan Machine Font Indices</h4>

<p>
The following are the indices this file format uses to refer to the
TibetanMachine font files:
</p>
<ul>
<li>
1 corresponds to TibetanMachine, the normal font
</li>
<li>
2 corresponds to TibetanMachineSkt1
</li>
<li>
3 corresponds to TibetanMachineSkt2
</li>
<li>
4 corresponds to TibetanMachineSkt3
</li>
<li>
5 corresponds to TibetanMachineSkt4
</li>
</ul>

<a name="tmwindex"></a>
<h4>Tibetan Machine Web Font Indices</h4>

<p>
The following are the indices this file format uses to refer to the
TibetanMachineWeb font files:
</p>
<ul>
<li>
1 corresponds to TibetanMachineWeb
</li>
<li>
2 corresponds to TibetanMachineWeb1
</li>
<li>
3 corresponds to TibetanMachineWeb2
</li>
<li>
4 corresponds to TibetanMachineWeb3
</li>
<li>
5 corresponds to TibetanMachineWeb4
</li>
<li>
6 corresponds to TibetanMachineWeb5
</li>
<li>
7 corresponds to TibetanMachineWeb6
</li>
<li>
8 corresponds to TibetanMachineWeb7
</li>
<li>
9 corresponds to TibetanMachineWeb8
</li>
<li>
10 corresponds to TibetanMachineWeb9
</li>
</ul>
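<p>
The two index tables translate directly into lookup tables. The sketch
below (hypothetical names throughout) also shows how a TMW
'<i>fn</i>,<i>ord</i>' pair from the file relates to the
font-dot-ordinal notation used elsewhere on this page: TMW index 8 is
TibetanMachineWeb7, so the pair 8,90 is the glyph written
TibetanMachineWeb7.90 (TMW7.90).
</p>

<pre>
# The index tables above, transcribed.
TM_FONTS = {
    1: "TibetanMachine",
    2: "TibetanMachineSkt1",
    3: "TibetanMachineSkt2",
    4: "TibetanMachineSkt3",
    5: "TibetanMachineSkt4",
}

# TMW index 1 is the unnumbered font; index n (2 to 10) names
# TibetanMachineWeb followed by the number n - 1.
TMW_FONTS = {1: "TibetanMachineWeb"}
TMW_FONTS.update({n: "TibetanMachineWeb%d" % (n - 1) for n in range(2, 11)})


def glyph_name(fonts, fn, ordinal):
    """Render a font index and ordinal as FontName.ordinal,
    e.g. glyph_name(TMW_FONTS, 8, 90) gives 'TibetanMachineWeb7.90'."""
    return "%s.%d" % (fonts[fn], ordinal)
</pre>

<p>
For example, the key column of the row for h shown earlier, 1,62,
names the glyph TibetanMachineWeb.62.
</p>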

<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">
e-mail us</a>
your comments about this page.
</p>

<p>
The
<a href="http://www.sourceforge.net/projects/thdltools">
THDL Tools</a>
project is generously hosted by:
<!--

DO NOT DELETE THE SF.NET LOGO.

We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".

-->
<a href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&amp;type=1"
width="88" height="31" alt="SourceForge Logo" />
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>
</div>

</body>
</html>