Added a design document concerning the Tibetan Format Converter,
a.k.a. the "Rosetta Stone".

Commit 5cfbcdfd30 (parent 58287c09a5): 1 changed file with 293 additions and 0 deletions.

htdocs/TibetanFormatConverterDesign.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<!-- @author David Chandler -->
<!-- @date November 14, 2002 -->
<!-- @editor Emacs, baby! -->

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>Tibetan Format Converter Design Document</title>
</head>

<body>
<h1>Tibetan Format Converter Design Document</h1>

<p>
  This document describes the design of a mechanism for converting
  mixed Tibetan and Roman text from any of several representations
  into any of several others.  Internally, the converter will store
  Tibetan+Roman text in an org.thdl.tib.text.TibetanDocument, and it
  will use an org.thdl.tib.text.TibetanKeyboard to populate that
  TibetanDocument.  Both classes currently live inside the Jskad
  application; they will be modified as needed so that servlets,
  console applications, and AWT/Swing-based applications can all use
  them.
</p>
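<p>
  A minimal sketch of how the pieces might fit together, assuming
  hypothetical FormatReader and FormatWriter interfaces and a
  simplified stand-in for TibetanDocument (an ordered list of runs,
  each marked Tibetan or Roman).  None of these names come from the
  existing Jskad code.  The point is only that every input format is
  parsed into one intermediate document and every output format is
  written from it, so N input formats and M output formats need N + M
  modules rather than N &times; M pairwise converters.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.thdl.tib.text.TibetanDocument: an ordered
// list of runs, each marked as Tibetan or Roman. Not the real Jskad class.
final class PipelineRun {
    final boolean tibetan;
    final String text;

    PipelineRun(boolean tibetan, String text) {
        this.tibetan = tibetan;
        this.text = text;
    }
}

final class PipelineDocument {
    final List<PipelineRun> runs = new ArrayList<>();
}

// Hypothetical: one reader per supported input format...
interface FormatReader {
    PipelineDocument read(String input);
}

// ...and one writer per supported output format.
interface FormatWriter {
    String write(PipelineDocument doc);
}

final class FormatConverter {
    // Every conversion passes through the intermediate document, so adding
    // a format means writing one reader or one writer, not a new converter
    // for every pair of formats.
    static String convert(String input, FormatReader reader, FormatWriter writer) {
        return writer.write(reader.read(input));
    }
}
```

<p>
  Converting Wylie to RTF would then be a single call such as
  FormatConverter.convert(text, wylieReader, rtfWriter), where both
  arguments are hypothetical modules.
</p>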

<p>
  The main challenges are fault tolerance, reliability (DLC: address
  both verification AND validation), and speed.  Of these, speed is
  the least important.
</p>

<h3>Input formats</h3>

<p>
  The converter will support, in a modular fashion, <b>mixed Tibetan
  and Roman</b> input in the following formats:
</p>
  <ul>
    <li>
      An HTML file with embedded &lt;tibetan
      translit="extended-wylie"&gt;sgra&lt;/tibetan&gt; tags (from the
      SimpleTibetanAndRomanDocument DTD mentioned below)
    </li>
    <li>
      Unicode (regardless of the order of consonants in a stack)
    </li>
    <li>
      RTF for TibetanMachine
    </li>
    <li>
      RTF for TibetanMachineWeb
    </li>
    <li>
      RTF for Sambhota Old
    </li>
    <li>
      RTF for Sambhota New
    </li>
    <li>
      Edward and Than's XHTML
    </li>
  </ul>
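<p>
  To make the first input format concrete, here is a sketch of a
  reader that splits an HTML fragment into Tibetan runs (the contents
  of &lt;tibetan&gt; elements, together with their translit attribute)
  and Roman runs (everything else).  It is regex-based and assumes
  well-formed, non-nested &lt;tibetan&gt; elements with a
  double-quoted translit attribute; the class names are hypothetical,
  and a real reader would more likely use an HTML or XML parser and
  validate against the SimpleTibetanAndRomanDocument DTD once it
  exists.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical run type: translit is null for Roman text.
final class TaggedRun {
    final String translit;
    final String text;

    TaggedRun(String translit, String text) {
        this.translit = translit;
        this.text = text;
    }

    @Override
    public String toString() {
        return (translit == null ? "roman" : translit) + ":" + text;
    }
}

final class TibetanTagReader {
    // Matches <tibetan translit="...">...</tibetan>; the reluctant .*?
    // stops at the first closing tag, so nesting is not supported.
    private static final Pattern TIBETAN = Pattern.compile(
        "<tibetan\\s+translit=\"([^\"]*)\">(.*?)</tibetan>", Pattern.DOTALL);

    static List<TaggedRun> read(String html) {
        List<TaggedRun> runs = new ArrayList<>();
        Matcher m = TIBETAN.matcher(html);
        int last = 0;
        while (m.find()) {
            // Text between the previous match and this one is Roman.
            if (m.start() > last) {
                runs.add(new TaggedRun(null, html.substring(last, m.start())));
            }
            runs.add(new TaggedRun(m.group(1), m.group(2)));
            last = m.end();
        }
        // Trailing Roman text after the last <tibetan> element.
        if (last < html.length()) {
            runs.add(new TaggedRun(null, html.substring(last)));
        }
        return runs;
    }
}
```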

<p>
  In addition, the converter will support, in a modular fashion,
  <b>strictly Tibetan</b> input in the following formats:
</p>
  <ul>
    <li>
      Extended Wylie, ACIP, and any other format for which a Jskad
      keyboard exists (i.e., a .ini file for the desired format).  In
      practice, only ACIP and some Wylie variants are used for storing
      Tibetan, but the mechanism is general.  (Such input will be
      UTF-8 text with no metadata.)
    </li>
  </ul>
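<p>
  Because a Jskad keyboard amounts to a mapping from typed strings to
  Tibetan characters, one table-driven reader could in principle serve
  every transliteration scheme.  The sketch below assumes the mapping
  has already been loaded into a Java Map (it does not assume anything
  about the actual .ini file syntax) and converts input by greedy
  longest match, so that "kh" wins over "k".  The KeyboardTokenizer
  class and the three-entry table in the usage note are hypothetical;
  a real keyboard must also handle stacking, vowels, and ambiguities
  that greedy matching alone does not resolve.
</p>

```java
import java.util.Map;

// Hypothetical table-driven reader; not Jskad's TibetanKeyboard.
final class KeyboardTokenizer {
    private final Map<String, String> table; // typed string -> Tibetan Unicode
    private final int maxKeyLength;

    KeyboardTokenizer(Map<String, String> table) {
        this.table = table;
        int max = 0;
        for (String key : table.keySet()) {
            max = Math.max(max, key.length());
        }
        this.maxKeyLength = max;
    }

    // At each position, take the longest key that matches; throw if none does.
    String toUnicode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String match = null;
            for (int len = Math.min(maxKeyLength, input.length() - i); len > 0; len--) {
                String candidate = input.substring(i, i + len);
                if (table.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException(
                    "No keyboard entry at offset " + i + ": '" + input.charAt(i) + "'");
            }
            out.append(table.get(match));
            i += match.length();
        }
        return out.toString();
    }
}
```

<p>
  For example, with the illustrative table {"k" &rarr; U+0F40 KA,
  "kh" &rarr; U+0F41 KHA, "a" &rarr; "" (inherent vowel)}, the input
  "kha" becomes U+0F41 and "ka" becomes U+0F40.
</p>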

<p>
  The converter will attempt to accept input that has minor flaws, but
  it will also have a strict mode that rejects input with even the
  slightest flaw.
</p>
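<p>
  The two modes could share one code path and differ only in what
  happens when a flaw is found.  In this sketch, the Mode enum, the
  FlawCheckingReader class, and the definition of a "flaw" as a
  character outside an allowed set are all assumptions made for
  illustration.  Strict mode stops at the first flaw; lenient mode
  passes the offending character through unchanged and records a
  warning so the user can review it afterwards.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical modes and reader, for illustration only.
enum Mode { STRICT, LENIENT }

final class FlawCheckingReader {
    final List<String> warnings = new ArrayList<>();
    private final Mode mode;
    private final String allowed;

    FlawCheckingReader(Mode mode, String allowed) {
        this.mode = mode;
        this.allowed = allowed;
    }

    // Returns the input unchanged. Each disallowed character is a flaw:
    // STRICT throws on the first one, LENIENT records a warning and continues.
    String read(String input) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (allowed.indexOf(c) < 0) {
                String msg = "unexpected '" + c + "' at offset " + i;
                if (mode == Mode.STRICT) {
                    throw new IllegalArgumentException(msg);
                }
                warnings.add(msg);
            }
        }
        return input;
    }
}
```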

<h3>Output formats</h3>

<p>
  The converter will support, in a modular fashion, outputting a
  TibetanDocument that is <b>entirely Tibetan, entirely Roman, or a
  mix of Tibetan and Roman</b> to the following output formats:
</p>
  <ul>
    <li>
      A proprietary, not-very-well-thought-out XML file of David
      Chandler's design.  For ease of reference, let's say that this
      will adhere to the LetterByLetterTibetanAndRomanDocument DTD.
      This format is useful for testing the software, and also because
      it can easily be transformed into output formats we have not yet
      thought of.
    </li>
    <li>
      Extended Wylie or ACIP (inside a trivial UTF-8 XML document that
      describes the tool that produced the file and links to a
      versioned DTD on the THDL web server) [only these two are used,
      but because the mechanism is general we could also generate
      output in the TCC keyboard #1 "transliteration"]
    </li>
    <li>
      Unicode (DLC: in which order for consonantal stacks? also,
      normalized or not?)
    </li>
    <li>
      RTF for TibetanMachine
    </li>
    <li>
      RTF for TibetanMachineWeb
    </li>
    <li>
      RTF for Sambhota Old
    </li>
    <li>
      RTF for Sambhota New
    </li>
    <li>
      Edward and Than's XHTML
    </li>
    <li>
      XML that is much leaner and has &lt;tibetan translit="acip"&gt;
      or &lt;tibetan translit="extended-wylie"&gt; tags and
      &lt;roman&gt; tags (only as many as needed).  This will follow
      the not-yet-existent SimpleTibetanAndRomanDocument DTD.
    </li>
  </ul>
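<p>
  The second bullet calls for Extended Wylie or ACIP output wrapped in
  a trivial UTF-8 XML document that names the generating tool and
  links to a versioned DTD.  A sketch of such a wrapper follows.  The
  root and child element names, the DTD file name, and the
  example.org URL are placeholders invented for this example, not an
  existing THDL DTD or server path.  Text and attribute values are
  escaped so that any &amp;, &lt;, &gt;, or " characters cannot break
  the markup.
</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical writer; element names and DTD location are placeholders.
final class TranslitXmlWriter {
    static final String DTD_URL = "http://example.org/dtd/TransliteratedTibetan-0.1.dtd";

    // Escape '&' first so that the entities added afterwards are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // scheme is "acip" or "extended-wylie"; tool describes the producing program.
    static byte[] write(String scheme, String tool, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE transliteratedTibetan SYSTEM \"" + DTD_URL + "\">\n"
            + "<transliteratedTibetan scheme=\"" + escape(scheme) + "\">\n"
            + "  <generator>" + escape(tool) + "</generator>\n"
            + "  <text>" + escape(text) + "</text>\n"
            + "</transliteratedTibetan>\n";
        return xml.getBytes(StandardCharsets.UTF_8);
    }
}
```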

<p>
  The converter will support, in a modular fashion, outputting a
  TibetanDocument that contains <b>only Tibetan and no Roman text</b>
  to the following additional output formats:
</p>
  <ul>
    <li>
      Extended Wylie, ACIP, and any other format for which a Jskad
      keyboard exists (i.e., a .ini file for the desired format).  In
      practice, only ACIP and some Wylie variants are used for storing
      Tibetan, but the mechanism is general.  (This output will be
      UTF-8 text with no metadata.)
    </li>
    <li>
      Phonetic Tibetan (ACIP loose standard)
    </li>
    <li>
      Phonetic Tibetan (THDL standard)
    </li>
  </ul>
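<p>
  These additional formats can represent only Tibetan, so a writer for
  them should refuse a document that contains Roman text rather than
  silently dropping it.  A minimal sketch, assuming a hypothetical
  reverse table from Tibetan Unicode characters to transliteration
  strings.  The per-character lookup is deliberately naive: real
  Wylie or ACIP output (inherent vowels, stacks, tsheg handling) needs
  much more than this.
</p>

```java
import java.util.List;
import java.util.Map;

// Hypothetical writer for Tibetan-only output formats.
final class TibetanOnlyWriter {
    // A run of text; roman == true marks Roman text.
    static final class Segment {
        final boolean roman;
        final String text;

        Segment(boolean roman, String text) {
            this.roman = roman;
            this.text = text;
        }
    }

    private final Map<Character, String> reverseTable; // Tibetan char -> transliteration

    TibetanOnlyWriter(Map<Character, String> reverseTable) {
        this.reverseTable = reverseTable;
    }

    String write(List<Segment> doc) {
        StringBuilder out = new StringBuilder();
        for (Segment s : doc) {
            // Refuse, rather than drop, text this format cannot represent.
            if (s.roman) {
                throw new IllegalArgumentException(
                    "Roman text cannot be written to a Tibetan-only format: " + s.text);
            }
            for (char c : s.text.toCharArray()) {
                String t = reverseTable.get(c);
                if (t == null) {
                    throw new IllegalArgumentException(
                        "No transliteration for U+" + String.format("%04X", (int) c));
                }
                out.append(t);
            }
        }
        return out.toString();
    }
}
```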

<p>
What formats am I missing?  Please <a
href="mailto:dchandler@users.sourceforge.net">e-mail me</a>.
</p>

<h3>Advantages and Benefits</h3>

<p>
  After this work item is completed, Jskad will be a powerful viewer
  of the various input formats described above.
</p>

<p>
  Command-line tools will convert between the various supported
  formats.  The most useful conversions will be to and from Unicode:
  storing text in Unicode allows long-term storage in a format that
  will remain readable for years, while converting out of it still
  allows day-to-day work on systems that cannot render Unicode.
</p>
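<p>
  A sketch of what the command-line interface might accept.  The flag
  names --from and --to, the ConverterOptions class, and the idea of
  reading standard input and writing standard output are assumptions
  made for illustration.  The implementation plan calls for Chandler's
  modified gengetopt to handle option parsing; this hand-rolled parser
  only shows the shape of the interface.
</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical option holder for a "convert --from=X --to=Y" tool.
final class ConverterOptions {
    final String from;
    final String to;

    private ConverterOptions(String from, String to) {
        this.from = from;
        this.to = to;
    }

    // Parses arguments of the form --name=value; --from and --to are required.
    static ConverterOptions parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--") || !arg.contains("=")) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            int eq = arg.indexOf('=');
            opts.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        String from = opts.get("from");
        String to = opts.get("to");
        if (from == null || to == null) {
            throw new IllegalArgumentException(
                "usage: convert --from=FORMAT --to=FORMAT < input > output");
        }
        return new ConverterOptions(from, to);
    }
}
```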

<p>
  In addition, with a little extra work it will be possible to use
  Jskad instead of Notepad as an HTML source editor.  You could then
  save the ugly, practically uneditable XHTML source that browsers can
  display, or preview the page in your system's default browser.
</p>

<p>
  Edward envisions a servlet that allows users to paste in, type in,
  or upload Tibetan in their format of choice.  The input will be
  shown on the left half of the web page.  Once the input format has
  been identified (perhaps the servlet can even make an educated
  guess), users can select any of our supported output formats, see
  the result on the right half of the page, and download it at their
  leisure.
</p>
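<p>
  The "educated guess" could start as a few cheap checks on the
  submitted text.  The heuristics below are assumptions for
  illustration, not a tested classifier: RTF files begin with
  {\rtf, the HTML input format uses &lt;tibetan&gt; tags, and Unicode
  Tibetan falls in the block U+0F00 to U+0FFF; anything else is
  treated as a transliteration that the user should confirm.  Telling
  TibetanMachine RTF apart from TibetanMachineWeb or Sambhota RTF
  would require inspecting the RTF font table, which this sketch does
  not attempt.
</p>

```java
// Hypothetical format guesser for the servlet; the categories are coarse.
final class FormatGuesser {
    enum Guess { RTF, TIBETAN_TAGGED_HTML, UNICODE, TRANSLITERATION }

    static Guess guess(String input) {
        // RTF documents start with the "{\rtf" control word.
        if (input.stripLeading().startsWith("{\\rtf")) {
            return Guess.RTF;
        }
        // The SimpleTibetanAndRomanDocument-style HTML input uses <tibetan> tags.
        if (input.contains("<tibetan")) {
            return Guess.TIBETAN_TAGGED_HTML;
        }
        // Any code point from the Tibetan block suggests Unicode input.
        boolean hasTibetan = input.codePoints().anyMatch(cp -> cp >= 0x0F00 && cp <= 0x0FFF);
        if (hasTibetan) {
            return Guess.UNICODE;
        }
        return Guess.TRANSLITERATION;
    }
}
```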

<h3>Implementation Plan</h3>

<p>
  To implement this converter, we will do the following:
</p>
  <ol>
    <li>
      Have TibetanDocument output a dense XML document that adheres to
      the LetterByLetterTibetanAndRomanDocument DTD.
    </li>
    <li>
      Experiment with XSLT and use it where appropriate to create
      output.
    </li>
    <li>
      Move the keyboard input logic out of org.thdl.tib.input.DuffPane.
      At that point, it will be possible to programmatically simulate
      a human user at the keyboard.  We will then write automated
      tests verifying that certain Tibetan keyboards work correctly;
      these tests will check the LetterByLetterTibetanAndRomanDocument
      output that TibetanDocument gains in step 1.
    </li>
    <li>
      Create a command-line tool that converts ACIP or Extended Wylie
      to the currently supported output formats, using Chandler's
      modified gengetopt-2.4 [dubbed 2.4j] for command-line parameter
      processing.
    </li>
    <li>
      Add "Save As
      [Unicode|Extended-Wylie|ACIP|XHTML|RTF(TMW)|RTF(SambhotaNew)|...]"
      options to Jskad.
    </li>
    <li>
      Code up Edward's servlet (described above).
    </li>
  </ol>
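<p>
  Step 2 could be as small as a stylesheet applied through the
  standard javax.xml.transform API.  The sketch assumes, purely for
  illustration, that the dense letter-by-letter format groups each
  letter in a &lt;c&gt; element inside a &lt;t translit="..."&gt;
  (Tibetan) or &lt;r&gt; (Roman) run under a &lt;doc&gt; root; the
  real LetterByLetterTibetanAndRomanDocument DTD has not been written,
  and the &lt;simple&gt; output root is likewise a placeholder.  The
  stylesheet collapses each run into the leaner &lt;tibetan&gt; or
  &lt;roman&gt; form by taking the run's string value.
</p>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class LetterByLetterToSimple {
    // Hypothetical stylesheet: <t> runs become <tibetan>, <r> runs become
    // <roman>; xsl:value-of concatenates the run's <c> letters.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
        "  <xsl:template match=\"/doc\">",
        "    <simple><xsl:apply-templates select=\"t|r\"/></simple>",
        "  </xsl:template>",
        "  <xsl:template match=\"t\">",
        "    <tibetan translit=\"{@translit}\"><xsl:value-of select=\".\"/></tibetan>",
        "  </xsl:template>",
        "  <xsl:template match=\"r\">",
        "    <roman><xsl:value-of select=\".\"/></roman>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String transform(String letterByLetterXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(letterByLetterXml)), new StreamResult(out));
        return out.toString();
    }
}
```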

<p>
DLC: address fault-tolerance etc.
</p>

<h3>Things to think more about...</h3>

<p>
  The following issues need further thought:
</p>
  <ul>
    <li>
      Unicode normalization
    </li>
  </ul>
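<p>
  One concrete reason normalization needs thought: Unicode gives some
  Tibetan letters both a precomposed code point and a decomposed
  spelling.  U+0F43 TIBETAN LETTER GHA is canonically equivalent to
  U+0F42 (GA) followed by U+0FB7 (subjoined HA), and because U+0F43 is
  a composition exclusion, both NFC and NFD yield the decomposed pair.
  A converter that compares or stores strings without normalizing them
  first could treat the two spellings as different text.  The check
  below uses only java.text.Normalizer from the JDK; the demo class
  name is hypothetical.
</p>

```java
import java.text.Normalizer;

final class TibetanNormalizationDemo {
    static final String PRECOMPOSED_GHA = "\u0F43";      // TIBETAN LETTER GHA
    static final String DECOMPOSED_GHA = "\u0F42\u0FB7"; // GA + SUBJOINED HA

    // Formats a string's code points as "U+XXXX" labels, for display.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The raw strings differ...
        System.out.println(PRECOMPOSED_GHA.equals(DECOMPOSED_GHA));
        // ...but under either normalization form they become identical.
        Normalizer.Form[] forms = {Normalizer.Form.NFC, Normalizer.Form.NFD};
        for (Normalizer.Form form : forms) {
            String a = Normalizer.normalize(PRECOMPOSED_GHA, form);
            String b = Normalizer.normalize(DECOMPOSED_GHA, form);
            System.out.println(form + ": " + codePoints(a) + " / " + codePoints(b)
                + " equal=" + a.equals(b));
        }
    }
}
```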


<!-- THDLTools FOOTER: -->
<hr>

  <i>
    Please

    <a href="mailto:thdltools-devel@lists.sourceforge.net">
      e-mail us</a>

    your comments about this page.
  </i>

<hr>

The

<a href="index.html">
  THDL Tools</a>

project is generously hosted by:

<!--

DO NOT DELETE THE SF.NET LOGO.

We have a choice of colors and sizes for this logo (see
"https://sourceforge.net/docman/display_doc.php?docid=790&group_id=1"),
but we do not have the option of removing it.  SourceForge requests
that we put it on each web page for our project, and to give us
incentive to do so, they will not track the number of hits for our
project web pages unless we put this link in.  To track hits, see
"http://sourceforge.net/project/stats/index.php?report=months&group_id=61934".

-->
<a href="http://sourceforge.net/">
  <img src="http://sourceforge.net/sflogo.php?group_id=61934&type=1"
       width="88" height="31" border="0" alt="SourceForge Logo">
</a>
<!-- AGAIN, DO NOT DELETE THE SF.NET LOGO. -->


</body>
</html>