www/htdocs/TibetanFormatConverterDesign.html
eg3p ee840fe816 I have embedded the SourceForge documentation into
THDL's web design and menu system. At present, the design
protocols for the THDL web site are a bit messy, but
they are being cleaned up and hopefully will eventually
emerge in a way that makes them easier for non-THDL
staff like David C. to make sense of.
2003-05-14 20:52:13 +00:00


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<!-- @author David Chandler -->
<!-- @date November 14, 2002 -->
<!-- @editor Emacs, baby! -->
<!--
==============
INSERT KEYWORDS AND DESCRIPTION HERE
==============
-->
<meta name="keywords" content="tibetan fonts, tibetan software, digital ethnography">
<meta name="description" content="This presents intellectual and electronic tools for using Tibetan language in a digital medium and for viewing the THDL site.">
<!--
==============
INSERT PAGE TITLE HERE
==============
In order to facilitate the use of the unicode character set the charset declaration will be set equal to utf-8
Always incorporate Tibetan and Himalayan
-->
<title>Tibetan Format Converter Design Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!--
==============
THIS IS THE STYLE FOR THE BODY; IT'S WHAT CALLS ON THE BANNER AS A BACKGROUND IMAGE AND SETS UP THE BACKGROUND COLOR. MAKE SURE TO LINK TO THE BANNER HERE.
==============
-->
<style type="text/css">
<!--
body {background-attachment: scroll; background-image: url(http://iris.lib.virginia.edu/tibet/images/bannerTools.gif); background-repeat: no-repeat; background-position: left top; background-color: white}
-->
</style>
<!--
==============
THE STYLE FOR THE TEXT, ETC. IT GOES TO THE TOP-LEVEL STYLE PAGE. IS THERE A NEED FOR EACH COLLECTION TO HAVE INDIVIDUAL STYLE PAGES?
==============
-->
<link rel="stylesheet" href="http://iris.lib.virginia.edu/tibet/style/tools.css">
<!--
==============
MAKES NETSCAPE RELOAD IF THE WINDOW IS RESIZED
==============
-->
<script language="JavaScript">
<!--
function MM_reloadPage(init) { //reloads the window if Nav4 resized
if (init==true) with (navigator) {if ((appName=="Netscape")&&(parseInt(appVersion)==4)) {
document.MM_pgW=innerWidth; document.MM_pgH=innerHeight; onresize=MM_reloadPage; }}
else if (innerWidth!=document.MM_pgW || innerHeight!=document.MM_pgH) location.reload();
}
MM_reloadPage(true);
// -->
</script>
</head>
<!--
==============
SETS THE PAGE MARGINS TO "0" SO THE MENU DOESN'T GET SCREWED UP
==============
-->
<body topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">
<!--
==============
THE JAVASCRIPT MENUS
==============
-->
<script type='text/javascript'>
//HV Menu v5- by Ger Versluis (http://www.burmees.nl/)
//Submitted to Dynamic Drive (http://www.dynamicdrive.com)
//Visit http://www.dynamicdrive.com for this script and more
function Go(){return}
//==============
// --CALL UP THE MENU. THE 1ST OF THE CHOICES HAS THE VARIABLES
// AND NEEDS TO BE UPDATED FOR EACH SECTION.
// MENU_LOADER.JS WILL ALWAYS BE THE SAME.
//==============
</script>
<script type='text/javascript' src='http://iris.lib.virginia.edu/tibet/scripts/menu_tools.js'></script>
<script type='text/javascript' src='http://iris.lib.virginia.edu//tibet/scripts/menu_loader.js'></script>
<script type="text/javascript" src="http://iris.lib.virginia.edu//tibet/scripts/thdl_scripts.js"></script>
<noscript> Your browser does not support JavaScript menus. Please use the
Site Map for navigation.</noscript>
<div align="left" style="position:absolute; left:0px; top:0px; width:100px; height:50px; z-index:7">
<a href="http://iris.lib.virginia.edu/tibet/index.html">
<img src="http://iris.lib.virginia.edu/tibet/images/spacer.gif" style="border-width:0" width="100" height="50">
</a>
</div>
<div id="Layer1" style="position:absolute; left:100px; top:28px; width:708px; height:20px; z-index:1">
<!--
==============
INSERT THE BREADCRUMBS
==============
A HREF tags should not be applied to current page, e.g. THDL and Collections get link, Literature and Home do not. If you were on a subpage of Literature, then Literature would link to the Literature home page.
-->
<div align="right"><font color="#000000"><A HREF="http://iris.lib.virginia.edu/tibet/index.html">THDL</A> : <A HREF="http://iris.lib.virginia.edu/tibet/tools/index.html">Tools</A> : <a href="http://iris.lib.virginia.edu/tibet/tools/software.html">Software</a> : Converter Design</font>
</div>
</div>
<!--
==============
INSERT THE MENU
==============
-->
<div id='MenuPos' style="font-size:9px; layer-background-color:#CCCCCC; background-color: #CCCCCC; position:absolute; left:0px; top:50px; width:808px"><table width="808" cellpadding="0" cellspacing="0" height="19"><tr><td><p style="font-size:9px">Menu loading...</p></td></tr></table></div>
<!--
==============
MAKE THE LAYER THAT WILL HOLD THE CONTENT OF THE PAGE - THE TEXT, IMAGES, WHATEVER. THIS LAYER WILL BE CLOSED AT THE END OF THIS HTML DOCUMENT
==============
Maximum table width not to exceed 750
All images must have borders of 1 pixel
No image will exceed 325 X 325 height and width measurements
Position attribute on layer3 may need to be changed to absolute to accommodate Netscape
-->
<div id="Layer3" style="position:relative; left:7px; top:80px; width:801px; height:396px; z-index:1; overflow: visible; background-color: #FFFFFF; layer-background-color: #FFFFFF; border: 1px none #000000" >
<table width="750" border="0" cellspacing="0" cellpadding="10">
<tr>
<td valign="top" align="left">
<!--
=============
FOR EVERY "ADMINISTRATIVELY DISTINCT" PROJECT, THERE SHOULD BE A CREDIT FOR THE PROVIDER OF THE INFO
=============
-->
<script language="JavaScript">
function openWin(url, name) { popupWin = window.open(url, name, "resizable=1,scrollbars=1,toolbar=0,width=400,height=450");
}
</script>
<!--
============
THIS LINK WILL OPEN A SEPARATE WINDOW THAT WILL PROVIDE INFORMATION REGARDING THE ADVISORY BOARD MEMBERS. ALSO PROVIDED IN THE PAGE WILL BE PARTICIPATION AND DONATION INFORMATION.
============
-->
<!--
==============
SETS THE BODY TEXT TO JUSTIFIED
==============
-->
<div align=justify>
<!--
=============
INSERT LINK TO GUIDED TOUR HERE -uncomment when ready.
=============
-->
<!--
==============
INSERT BODY TEXT HERE
The first section of text is the short "introduction" about the Theme and the various disciplines that have a vested interest in them.
Design principle: Bold the first few words of this text section.
==============
-->
<h2>Tibetan Format Converter Design Document</h2>
<p>
This document describes the design of a mechanism for converting
Tibetan+Roman text from any of several input representations to any
of several output representations. The converter will store
Tibetan+Roman text internally in an
org.thdl.tib.text.TibetanDocument, and it will use an
org.thdl.tib.text.TibetanKeyboard to populate that TibetanDocument.
Both classes presently exist inside the Jskad application, but they
will be modified as needed so that servlets, console applications,
and AWT/Swing-based applications can all make use of them.
</p>
<p>
The difficulty lies in fault tolerance, reliability (DLC: address
both verification AND validation), and speed. Of these, speed is the
least concern.
</p>
<h3>Input formats</h3>
<p>
The converter will support, in a modular fashion, <b>mixed Tibetan
and Roman</b> input in the following formats:
</p>
<ul>
<li>
An HTML file with embedded &lt;tibetan
translit="extended-wylie"&gt;sgra&lt;/tibetan&gt; tags (from the
SimpleTibetanAndRomanDocument DTD mentioned below)
</li>
<li>
Unicode (regardless of the order of consonants in a stack)
</li>
<li>
RTF for TibetanMachine
</li>
<li>
RTF for TibetanMachineWeb
</li>
<li>
RTF for Sambhota Old
</li>
<li>
RTF for Sambhota New
</li>
<li>
Edward and Than's XHTML
</li>
</ul>
<p>
In addition, the converter will support, in a modular fashion,
<b>strictly Tibetan</b> input in the following formats:
</p>
<ul>
<li>
Extended Wylie, ACIP, and any other format for which there
exists a Jskad keyboard (i.e., a .ini file in the desired
format). In practice, only ACIP and some Wylie variants are
used for storing Tibetan, but the mechanism is general. (This
will be plain UTF-8 text with no metadata.)
</li>
</ul>
<p>
The converter will attempt to accept input that has minor flaws, but
it will also have a mode that rejects input with even the slightest
flaw.
</p>
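<p>
The two modes described above could share one code path that differs
only in how it reacts to a flaw. The following is a minimal sketch
under stated assumptions, not the converter's actual design: the class
name, the tiny allowed-character set, and the "drop the offending
character" recovery policy are all invented for illustration.
</p>
<pre>
```java
// Hypothetical sketch: none of these names come from Jskad or the THDL tools.
final class LenientOrStrict {
    // Assumed, deliberately tiny alphabet for the demo; NOT the real
    // Extended Wylie or ACIP character inventory.
    static final String ALLOWED = "abcdefghijklmnopqrstuvwxyz' ./";

    // Holds the cleaned text plus a count of the flaws that were skipped.
    static final class Result {
        final String text;
        final int flaws;
        Result(String text, int flaws) { this.text = text; this.flaws = flaws; }
    }

    // Strict mode throws on the first flaw; lenient mode drops the
    // offending character, counts it, and keeps going.
    static Result clean(String input, boolean strict) {
        StringBuilder out = new StringBuilder();
        int flaws = 0;
        for (char c : input.toCharArray()) {
            if (ALLOWED.indexOf(c) >= 0) {
                out.append(c);
            } else if (strict) {
                throw new IllegalArgumentException(
                    "unexpected character U+" + Integer.toHexString(c).toUpperCase());
            } else {
                flaws++;
            }
        }
        return new Result(out.toString(), flaws);
    }
}
```
</pre>
<p>
Calling LenientOrStrict.clean(text, false) returns the repaired text
together with a flaw count that a caller could report, while
LenientOrStrict.clean(text, true) refuses the same input outright.
</p>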
<h3>Output formats</h3>
<p>
The converter will support, in a modular fashion, outputting a
TibetanDocument that is <b>entirely Tibetan, entirely Roman, or a
mix of Tibetan and Roman</b>, to the following output formats:
</p>
<ul>
<li>
A proprietary, not-very-well-thought-out XML file of David
Chandler's design. For ease of reference, let's say that this
will adhere to the LetterByLetterTibetanAndRomanDocument DTD.
This format is useful for testing the software, and also because
it can easily be transformed into as-yet-unthought-of output
formats.
</li>
<li>
Extended Wylie or ACIP (inside a trivial XML [UTF-8] document that
describes the tool that output this file and links to a
versioned DTD on the THDL web server) [only these two are used,
but we could generate output in the TCC keyboard #1
"transliteration" because the mechanism is general]
</li>
<li>
Unicode (DLC: in which order for consonantal stacks? also,
normalized or not?)
</li>
<li>
RTF for TibetanMachine
</li>
<li>
RTF for TibetanMachineWeb
</li>
<li>
RTF for Sambhota Old
</li>
<li>
RTF for Sambhota New
</li>
<li>
Edward and Than's XHTML
</li>
<li>
XML that is much leaner and has &lt;tibetan translit="acip |
extended-wylie"&gt; and &lt;roman&gt; tags (just a minimum of
them). This will be according to the not-yet-in-existence
SimpleTibetanAndRomanDocument DTD.
</li>
</ul>
<p>
The converter will support, in a modular fashion, outputting a
TibetanDocument that contains <b>only Tibetan and no Roman text</b>
to the following additional output formats:
</p>
<ul>
<li>
Extended Wylie, ACIP, and any other format for which there
exists a Jskad keyboard (i.e., a .ini file in the desired
format). In practice, only ACIP and some Wylie variants are
used for storing Tibetan, but the mechanism is general. (This
will be plain UTF-8 text with no metadata.)
</li>
<li>
Phonetic Tibetan (ACIP loose standard)
</li>
<li>
Phonetic Tibetan (THDL standard)
</li>
</ul>
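<p>
Because every format is read into, or written out of, one internal
document, the "modular fashion" mentioned in the lists above can be
pictured as a hub with one small module per format. The sketch below
is only an illustration under that assumption: Doc, Importer,
Exporter, and the two toy modules are hypothetical names and are not
the org.thdl.tib.text API (the real TibetanDocument is far richer
than the stand-in used here). It also assumes a Java runtime with
lambda support.
</p>
<pre>
```java
// Hypothetical sketch of the "modular" design: every name here is invented
// for illustration and is not the org.thdl.tib.text API.
final class ConverterSketch {
    // Stand-in for the internal document; the real TibetanDocument is richer.
    static final class Doc {
        final String[] syllables;
        Doc(String[] syllables) { this.syllables = syllables; }
    }

    // One implementation per input format.
    interface Importer { Doc read(String text); }

    // One implementation per output format.
    interface Exporter { String write(Doc doc); }

    // N importers plus M exporters cover N*M conversions, because every
    // conversion goes through the shared internal document.
    static String convert(String text, Importer in, Exporter out) {
        return out.write(in.read(text));
    }

    // Toy importer: treats runs of spaces as syllable separators.
    static final Importer SPACE_SEPARATED = text -> new Doc(text.trim().split(" +"));

    // Toy exporter: one syllable per line.
    static final Exporter ONE_PER_LINE = doc -> String.join("\n", doc.syllables);
}
```
</pre>
<p>
Under this arrangement, supporting a new format means writing one
Importer or one Exporter rather than a converter for every pair of
formats.
</p>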
<p>
What formats am I missing? Please <a
href="mailto:dchandler@users.sourceforge.net">e-mail me</a> any that I have overlooked.
</p>
<h3>Advantages and Benefits</h3>
<p>
After this work item is completed, Jskad will be a powerful viewer
of the various input formats described above.
</p>
<p>
Command-line tools will exist to convert to-and-fro this-and-that.
The most useful conversions will be to and from Unicode. This will
allow long-term storage in a format that will exist for years, while
still allowing day-to-day work on systems without support for
rendering Unicode.
</p>
<p>
In addition, it will be possible with a little extra work to use
Jskad as an HTML source editor rather than Notepad. You will be able
to save a document as the ugly, uneditable XHTML source that browsers
can display, or preview it in your system's default browser.
</p>
<p>
Edward envisions a servlet that allows users to paste in, type in,
or upload Tibetan in their format of choice. This input will be shown
on the left half of the web page. Once users have identified the
input format (perhaps the servlet will even make an educated guess),
they can select any of our supported output formats and see the
result on the right half of the web page, where they can also
download it at their leisure.
</p>
<h3>Implementation Plan</h3>
<p>
To implement this converter, we will do the following:
</p>
<ol>
<li>
Have TibetanDocument output a dense XML document that adheres to
the LetterByLetterTibetanAndRomanDocument DTD.
</li>
<li>
Play with XSLT and use it where appropriate to create output.
</li>
<li>
Get the keyboard input logic out of org.thdl.tib.input.DuffPane.
At this point, it will be possible to programmatically simulate
a human user at the keyboard. Automated tests that certain
Tibetan keyboards are working correctly will be performed at
this point, and these tests will work off the
LetterByLetterTibetanAndRomanDocument that TibetanDocument was
made to output above.
</li>
<li>
Create a command-line tool to convert from ACIP or Extended
Wylie to the currently supported output formats using Chandler's
modified gengetopt-2.4 [dubbed 2.4j] for command-line parameter
processing.
</li>
<li>
Add "Save As
[Unicode|Extended-Wylie|ACIP|XHTML|RTF(TMW)|RTF(SambhotaNew)|...]"
options to Jskad.
</li>
<li>
Code up Edward's servlet (described above).
</li>
</ol>
<p>
DLC: address fault-tolerance etc.
</p>
<h3>Things to think more about...</h3>
<ul>
<li>
Unicode normalization
</li>
</ul>
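<p>
To make the normalization question concrete, here is a small sketch
that inspects what the standard canonical normalization forms do to
Tibetan text. It assumes a Java runtime that provides
java.text.Normalizer (Java 6 or later, which postdates this
document); the class and method names are invented for the example
and are not part of the THDL tools.
</p>
<pre>
```java
import java.text.Normalizer;

// Sketch only: shows what canonical normalization does to Tibetan text.
// The class and method names are invented for this example.
final class NormalizationSketch {
    // Renders each UTF-16 unit as U+XXXX (enough for Tibetan, which lies
    // entirely in the Basic Multilingual Plane).
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() != 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    // Canonical composition (NFC).
    static String nfc(String s) { return Normalizer.normalize(s, Normalizer.Form.NFC); }

    // Canonical decomposition (NFD).
    static String nfd(String s) { return Normalizer.normalize(s, Normalizer.Form.NFD); }
}
```
</pre>
<p>
For example, NormalizationSketch.hex(NormalizationSketch.nfc("\u0F43"))
yields "U+0F42 U+0FB7": the precomposed letter GHA is excluded from
recomposition, so even NFC output uses the two-code-point form. Any
Unicode output module would need to decide whether to emit such
normalized sequences.
</p>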
<p>
Please
<a href="mailto:thdltools-devel@lists.sourceforge.net">
e-mail us</a>
your comments about this page.
</p>
<p>
The
<a target="_blank" href="http://www.sourceforge.net/projects/thdltools">
THDL Tools</a>
project is generously hosted by:
<!--
DO NOT DELETE THE SF.NET LOGO.
We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it. SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in. To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".
-->
<a target="_blank" href="http://sourceforge.net/">
<img src="http://sourceforge.net/sflogo.php?group_id=61934&amp;type=1"
width="88" height="31" border="0" alt="SourceForge Logo">
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->
</p>
</div>
</td>
</tr>
</table>
</div>
</body>
</html>