
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--

  @(#)package.html

  Copyright 2002 Tibetan and Himalayan Digital Library

  This software is the confidential and proprietary information of
  the Tibetan and Himalayan Digital Library. You shall use such
  information only in accordance with the terms of the license
  agreement you entered into with the THDL.

-->
</head>
<body bgcolor="white">

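  Provides classes for manipulating Tibetan text at the <i>tsheg bar</i> level.
  Roughly speaking, a <i>tsheg bar</i> (pronounced <i>tsek bar</i>) is a
  syllable: the text between two <i>tshegs</i>, the intersyllabic dots
  (U+0F0B) that separate Tibetan syllables.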
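
<p>
  Tsheg bars are delimited by the intersyllabic tsheg (U+0F0B), so one
  crude way to approximate splitting text into tsheg bars is to split at
  every tsheg. The following is a hypothetical, standalone sketch of that
  idea. The class <code>TshegBarSplitSketch</code> and its method are
  illustrative names and are not part of this package. The sketch ignores
  shad, other punctuation and syntactic legality, which a real tsheg bar
  parser would have to handle.
</p>
<pre>
```java
public class TshegBarSplitSketch {

    // U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG, the dot between syllables.
    // It is not a regex metacharacter, so it can be passed to split() as-is.
    public static final String TSHEG = "\u0F0B";

    // Hypothetical helper: treats every run of text between tshegs as one
    // tsheg bar. Shad (U+0F0D), other punctuation and legality are ignored.
    // String.split drops trailing empty strings, so a final tsheg does not
    // produce an empty last element.
    public static String[] splitAtTsheg(String text) {
        return text.split(TSHEG);
    }

    public static void main(String[] args) {
        // "bod skad" (Tibetan language): BA, VOWEL SIGN O, DA, TSHEG,
        // SA, SUBJOINED KA, DA.
        String text = "\u0F56\u0F7C\u0F51\u0F0B\u0F66\u0F90\u0F51";
        String[] bars = splitAtTsheg(text);
        System.out.println(bars.length); // prints 2
    }
}
```
</pre>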

<p>
  This package converts a string of Unicode codepoints into
  <i>TTBIR</i>, our Tibetan Tsheg Bar Internal Representation.
  The input string may also contain codepoints outside the Tibetan
  block (U+0F00 through U+0FFF).
</p>
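
<p>
  The input may mix Tibetan and non-Tibetan codepoints, so a converter
  needs to decide for each codepoint whether it belongs to the Tibetan
  block. Iterating by codepoint rather than by Java <code>char</code>
  keeps supplementary-plane characters whole. The following hypothetical
  sketch groups a string into alternating Tibetan and non-Tibetan runs
  using <code>Character.UnicodeBlock</code>. The class
  <code>TibetanRunSketch</code> is an illustrative name and is not part of
  this package. It produces a summary string, not TTBIR.
</p>
<pre>
```java
public class TibetanRunSketch {

    // True if the codepoint lies in the Unicode Tibetan block (U+0F00 through U+0FFF).
    public static boolean isTibetan(int cp) {
        return Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.TIBETAN;
    }

    // Hypothetical helper: summarizes the input as alternating runs,
    // e.g. "TIBETAN:7 OTHER:8". Iterates by codepoint, so a
    // supplementary-plane character counts once, not as two chars.
    public static String describeRuns(String text) {
        StringBuilder out = new StringBuilder();
        boolean inTibetan = false; // kind of the current run
        int runLength = 0;         // codepoints in the current run; 0 before the first
        for (int cp : text.codePoints().toArray()) {
            boolean tib = isTibetan(cp);
            if (runLength != 0) {
                if (tib != inTibetan) {
                    // The kind of codepoint changed: close the current run.
                    appendRun(out, inTibetan, runLength);
                    runLength = 0;
                }
            }
            inTibetan = tib;
            runLength++;
        }
        if (runLength != 0) {
            appendRun(out, inTibetan, runLength); // close the final run
        }
        return out.toString();
    }

    // Appends one "KIND:length" entry, space-separated from earlier entries.
    private static void appendRun(StringBuilder out, boolean tibetan, int length) {
        if (out.length() != 0) {
            out.append(' ');
        }
        out.append(tibetan ? "TIBETAN:" : "OTHER:").append(length);
    }

    public static void main(String[] args) {
        // "bod skad" in Tibetan script, then a space and the Latin word "Tibetan".
        String mixed = "\u0F56\u0F7C\u0F51\u0F0B\u0F66\u0F90\u0F51 Tibetan";
        System.out.println(describeRuns(mixed)); // prints TIBETAN:7 OTHER:8
    }
}
```
</pre>
<p>
  Because the sketch counts codepoints, a character such as U+1F600
  counts as one codepoint even though Java stores it as two UTF-16
  <code>char</code>s.
</p>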

</body>
</html>
This commit is for my benefit only; these classes are not ready for prime time, and the build system is not yet aware of them. I'm adding some classes for representing legal tsheg-bars (syllables, for the most part) in Unicode. These classes were designed bottom-up (OK, OK, they weren't designed designed, but I had to write down everything I knew about Tibetan syntax somewhere). The classes are aware of Extended Wylie. I doubt the Javadocs work yet, and I'm still testing (and am not committing my testing code with these as it is not yet ready). Next on my list: fix these up to reflect my new awareness of suffix particles (like le'u'i'o), and add classes to support syntactically incorrect Unicode sequences. Then add a UnicodeReader, and we've got the back end of a Tibetan Unicode shaping system (like half of MS's Uniscribe or Apple's Worldscript or FreeType Layout or Omega's OTPs). A top-down design would not have included LegalTshegBar. But now that my itch has been scratched, potential uses are lingering about. For example, it would be nice to scan some input and break it into LegalTshegBars, punctuation/marks/signs, and illegal stacks. Then we could alert the client to the illegality, its precise form, and its precise location. The real system for turning a Unicode stream into an internal representation suitable for conversion to EWTS/ACIP/XHTML/what-have-you need not be aware of Tibetan syntax. But to make the very best conversion from Unicode to, e.g., EWTS, it is necessary to know that gaskad is better represented as gskad, but that jaskad is not the same as jskad. (2002-12-09 01:02:23 +00:00)

Now uses terminology from the Unicode standard. No more talk of characters, for example. Normalization forms NFKD and NFD are supported for the Tibetan Unicode range. I don't like either, actually. I've tested NFKD, but I've not yet committed the tests. (2002-12-15 03:35:24 +00:00)