Tibetan and Himalayan Library - THL

tibwn.ini File Format

Jskad and WylieWord both make use of a data file named tibwn.ini.  This document concerns the structure and content of that data file.

The purpose of the file is to encode all knowledge of the Tibetan Machine (TM) and Tibetan Machine Web (TMW) fonts.  Specifically, the following knowledge is found:

  • which TM glyphs correspond to which TMW glyphs (a two-way mapping, which is one-to-one except for TMW7.90 and TMW7.91, which both map to the same TM glyph for our purposes)
  • which Unicode codepoints are suitable for a TM or TMW to Unicode conversion
  • which THDL Extended Wylie is suitable for a TM or TMW to Wylie conversion
  • which vowel/bindu/vowel+bindu/achung-as-vowel glyphs correspond to which consonant/consonant stack glyphs (needed for composing beautiful stacks; needed for Wylie->TMW conversions and input methods).

Much of this knowledge is found in the documentation for the TM and TMW fonts.  Note the errata for that documentation, as they especially concern the data in tibwn.ini.

File Format -- Overview

The tibwn.ini file format allows for comments, blank lines (entirely blank, containing not even whitespace), section headers, comma-delimited lists, and newline-delimited rows of tilde-delimited data.  A comment line is any line that begins with two slashes ('//'); the entire line is ignored.  A blank line is ignored, too.  A section header is a line of the form '<?section-name?>'.  A comma-delimited list must fit entirely on one line.
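
The line kinds described above can be told apart with a few string tests.  The following is a minimal Python sketch, not the code that Jskad or WylieWord actually uses; the function name classify_line and the returned tags are made up for illustration.  It assumes that a section header occupies the whole line, and it leaves whitespace-only lines classified as data because the format defines a blank line as containing not even whitespace.

```python
import re

# Assumed shape of a section header line such as "<?Consonants?>".
SECTION_RE = re.compile(r"^<\?(.+)\?>$")


def classify_line(line):
    """Return a (kind, payload) tuple for one line of a tibwn.ini-style file.

    kind is one of "blank", "comment", "section", or "data".
    """
    line = line.rstrip("\r\n")  # drop the line terminator only
    if line == "":
        # Entirely blank: not even whitespace.
        return ("blank", None)
    if line.startswith("//"):
        # The whole line is a comment and is ignored.
        return ("comment", None)
    match = SECTION_RE.match(line)
    if match:
        return ("section", match.group(1))
    # Anything else is a comma-delimited list or a tilde-delimited row.
    return ("data", line)
```

For example, classify_line("<?Input:Tibetan?>") yields ("section", "Input:Tibetan").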

File Format -- Sections

The tibwn.ini file is broken up into sections.  Currently, these are as follows:

  • Consonants - set of Tibetan/Tibetanized Sanskrit consonants that the application reading tibwn.ini supports; does not include stacks
  • Numbers - set of Tibetan numbers that the application reading tibwn.ini supports
  • Vowels - set of Tibetan vowels that the application reading tibwn.ini supports
  • Other - set of Tibetan punctuation and other characters that the application reading tibwn.ini supports
  • Input:Punctuation - application-independent data for all Tibetan punctuation and other miscellaneous characters that can be listed in the Other section
  • Input:Vowels - application-independent data for all Tibetan vowels that can be listed in the Vowels section
  • Input:Tibetan - application-independent data for all Tibetan consonants and Tibetan (but not Tibetanized Sanskrit) consonant stacks that can be listed in the Consonants section
  • Input:Numbers - application-independent data for all Tibetan numbers that can be listed in the Numbers section
  • Input:Sanskrit - application-independent data for all Tibetanized Sanskrit stacks
  • ToWylie - data needed merely for Tibetan-to-THDL Extended Wylie and Tibetan-to-Unicode conversions
  • Ignore - a section containing data to be ignored for all purposes other than Tibetan-to-Unicode conversions (should probably be called ToUnicode at this point, but is called Ignore because it was once entirely ignored)

The Consonants, Numbers, Vowels, and Other sections contain comma-delimited lists of the THDL Extended Wylie representations of each consonant, number, etc. that the application reading tibwn.ini wishes to support.  Note that it is not possible to represent a comma (though extending the file format to do so would likely work along the same lines as the support for '~' characters via __TILDE__).  These sections should probably be discarded entirely so that applications and users can choose which characters to support themselves.  The rest of the file is application-independent.
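
As an illustration of how the two kinds of section differ, here is a rough Python sketch that groups a file's lines by section and splits the four list sections on commas.  The function name read_sections and the set LIST_SECTIONS are hypothetical, and the choice to skip any data that precedes the first section header is an assumption of this sketch, not documented behavior.

```python
# Sections that hold comma-delimited lists of THDL Extended Wylie.
LIST_SECTIONS = {"Consonants", "Numbers", "Vowels", "Other"}


def read_sections(text):
    """Group the lines of a tibwn.ini-style file by section name.

    List sections map to a list of Wylie strings; every other section
    maps to a list of raw tilde-delimited rows.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if line == "" or line.startswith("//"):
            continue  # blank lines and comments are ignored
        if line.startswith("<?") and line.endswith("?>"):
            current = line[2:-2]
            sections.setdefault(current, [])
            continue
        if current is None:
            continue  # assumption: data before any header is skipped
        if current in LIST_SECTIONS:
            # A comma cannot itself be represented, so a plain split is enough.
            sections[current].extend(line.split(","))
        else:
            sections[current].append(line)
    return sections
```

Splitting the data rows themselves into columns is a separate step; this sketch only keeps them as raw strings.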

The remaining sections are the meat of the file.  Each one contains zero or more rows of data, one row per line.  Each line looks like this:

h~61,1~1,100~1,62~1,109~1,112~1,123~1,125~10,115~10,122~0F67~1,102~

Each line describes one glyph in the Tibetan Machine Web font.  There are thirteen tilde-delimited columns per line, though the last two columns are optional.  An empty column takes no space; for example,

_~32,1~~2,32~~~~~~~0020

has non-empty data for only columns 1 (the first column, i.e. the one containing an underscore), 2, 4, and 11.  When it is desirable to represent a tilde ('~') character, __TILDE__ is used.  For example, in the THDL Extended Wylie for nyi zla, a tilde is used, so this line occurs:

__TILDE__^~91,5~~9,89~~~~~~~0F82
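
A row like the ones above can be split into its thirteen columns along these lines.  This is only a sketch: the name split_row is invented here, padding short rows with empty strings is one way to model the optional trailing columns, and replacing __TILDE__ in every column (rather than only in column 1) is an assumption.

```python
NUM_COLUMNS = 13


def split_row(line):
    """Split one tilde-delimited row into exactly NUM_COLUMNS strings.

    Missing trailing columns come back as empty strings.  The result is
    zero-indexed, so column 1 of the format is element 0.
    """
    fields = line.split("~")
    if len(fields) > NUM_COLUMNS:
        raise ValueError("too many columns: %d" % len(fields))
    # Pad rows that omit the optional trailing columns.
    fields += [""] * (NUM_COLUMNS - len(fields))
    # Restore literal tildes only after splitting, so they are never
    # mistaken for delimiters.
    return [field.replace("__TILDE__", "~") for field in fields]
```

With the nyi zla row shown above, split_row returns "~^" as its first element and "0F82" as its eleventh.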

The columns themselves each have a meaning.  They are as follows:

  • Column 1 - the THDL Extended Wylie corresponding to this glyph
  • Column 2 - 'ord,fn' where fn corresponds to a Tibetan Machine font and ord tells which glyph in Tibetan Machine corresponds to the Tibetan Machine Web glyph this line describes
  • Column 3 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web reduced-height glyph corresponding to the full-height glyph this line describes
  • Column 4 - key - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which Tibetan Machine Web glyph this line describes.  No two rows of data may have the same value for this column.  (Note that TMW is a superset of TM, so there is one glyph in TM that could reasonably appear twice, mapped to both TibetanMachineWeb7.90 and TMW7.91.)  But note that Jskad etc. must deal with a superset of TMW -- such as when converting the ACIP {W+W+W+KA} into Unicode -- and thus cannot internally use the TMW glyph alone to represent arbitrary Tibetan text.  The Extended Wylie Transliteration (EWTS) is not a unique key either; see, e.g., the many glyphs that share the EWTS {r}.  For this reason, a smart tool uses the pair (EWTS, TMW) as an internal representation.  (In Jskad, this is done in a way that's hard to understand, but it is done -- see, e.g., the code implementing the TMW->ACIP conversion of TibetanMachineWeb7.69.)
  • Column 5 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the gi-gu vowel that looks most beautiful with the glyph this line describes
  • Column 6 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the zhabs-kyu vowel that looks most beautiful with the glyph this line describes
  • Column 7 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the 'greng-bu vowel that looks most beautiful with the glyph this line describes
  • Column 8 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the na-ro vowel that looks most beautiful with the glyph this line describes
  • Column 9 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the a-chung [Sanskrit] vowel that looks most beautiful with the glyph this line describes
  • Column 10 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web glyph for the a-chung plus zhabs-kyu [Sanskrit] vowel that looks most beautiful with the glyph this line describes
  • Column 11 - 'x1,x2,...' or 'none', where xi describes a Unicode codepoint, most often a codepoint in the Tibetan range U+0F00 to U+0FFF.  A case-insensitive string of one, two, three, or four hexadecimal digits composes each xi.  The full, comma-separated sequence is the preferred Unicode representation of the glyph this line describes; 'none' appears for glyphs that have no Unicode correspondence.
  • Column 12 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web severely-reduced-height glyph corresponding to the full-height glyph this line describes
  • Column 13 - 'fn,ord' where fn corresponds to a Tibetan Machine Web font and ord tells which glyph in that font corresponds to the Tibetan Machine Web vowel+bindu glyph corresponding to the vowel glyph this line describes
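
Two details in the column list are easy to get wrong: column 2 is written 'ord,fn' while the other glyph columns are written 'fn,ord', and column 11 is either 'none' or a comma-separated list of hexadecimal codepoints.  The Python sketch below shows one way to decode both.  The helper names parse_pair and parse_unicode are hypothetical, and treating an empty column 11 the same as 'none' is a defensive assumption of this sketch.

```python
def parse_pair(field, ord_first=False):
    """Decode a glyph reference and return it as (fn, ord), or None if empty.

    Pass ord_first=True for column 2, which is written 'ord,fn'.
    """
    if field == "":
        return None
    first, second = (int(part) for part in field.split(","))
    return (second, first) if ord_first else (first, second)


def parse_unicode(field):
    """Decode column 11 into a Python string, or None if there is no mapping."""
    if field == "" or field == "none":
        return None
    # Each item is one to four hexadecimal digits; int() accepts either case.
    return "".join(chr(int(item, 16)) for item in field.split(","))
```

For the 'h' row shown earlier, parse_pair("61,1", ord_first=True) gives (1, 61) for the TM glyph, parse_pair("1,62") gives (1, 62) for the TMW glyph, and parse_unicode("0F67") gives the single character U+0F67.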

Tibetan Machine Font Indices

The following are the indices this file format uses to refer to the TibetanMachine font files:

  • 1 corresponds to TibetanMachine, the normal font
  • 2 corresponds to TibetanMachineSkt1
  • 3 corresponds to TibetanMachineSkt2
  • 4 corresponds to TibetanMachineSkt3
  • 5 corresponds to TibetanMachineSkt4

Tibetan Machine Web Font Indices

The following are the indices this file format uses to refer to the TibetanMachineWeb font files:

  • 1 corresponds to TibetanMachineWeb
  • 2 corresponds to TibetanMachineWeb1
  • 3 corresponds to TibetanMachineWeb2
  • 4 corresponds to TibetanMachineWeb3
  • 5 corresponds to TibetanMachineWeb4
  • 6 corresponds to TibetanMachineWeb5
  • 7 corresponds to TibetanMachineWeb6
  • 8 corresponds to TibetanMachineWeb7
  • 9 corresponds to TibetanMachineWeb8
  • 10 corresponds to TibetanMachineWeb9
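
The two index tables can be written down as lookup tables, as in the Python sketch below.  The names TM_FONTS, TMW_FONTS, and tmw_glyph_label are made up for this example, and the 'FontName.ord' label format is only inferred from the way this page writes glyphs such as TibetanMachineWeb7.90.

```python
# Index -> font name for the TibetanMachine family.
TM_FONTS = {
    1: "TibetanMachine",
    2: "TibetanMachineSkt1",
    3: "TibetanMachineSkt2",
    4: "TibetanMachineSkt3",
    5: "TibetanMachineSkt4",
}

# Index -> font name for the TibetanMachineWeb family.  Index 1 has no
# numeric suffix; index i (2..10) is TibetanMachineWeb followed by i - 1.
TMW_FONTS = {1: "TibetanMachineWeb"}
TMW_FONTS.update({i: "TibetanMachineWeb%d" % (i - 1) for i in range(2, 11)})


def tmw_glyph_label(fn, ord_):
    """Format a TMW (fn, ord) pair as a 'FontName.ord' label."""
    return "%s.%d" % (TMW_FONTS[fn], ord_)
```

Note the off-by-one between index and suffix: the pair (8, 90) refers to TibetanMachineWeb7.90, not to a font named TibetanMachineWeb8.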


