/*
The contents of this file are subject to the THDL Open Community License
Version 1.0 (the "License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License on the THDL web site
(http://www.thdl.org/).

Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
License for the specific terms governing rights and limitations under the
License.

The Initial Developer of this software is the Tibetan and Himalayan Digital
Library (THDL). Portions created by the THDL are Copyright 2001 THDL.
All Rights Reserved.

Contributor(s): ______________________________________.
*/

package org.thdl.tib.text.tshegbar;

/** A TshegBar (pronounced tsek bar) is roughly a Tibetan
 * syllable. In truth, it is the stuff between two tseks.
 *
First, some terminology.
 *
 * \u0F74 is not a grapheme cluster. In
* addition, in English, many fonts have a single glyph (a
* "ligature") for the combination of two grapheme clusters,
* e.g. "fi". A single grapheme cluster may have one or more
* representations by sequences of Unicode codepoints, or it may not
* be representable because it is only part of one Unicode codepoint
 * or pictures a nonstandard character.
 *
 * \u0F35, plus an optional a-chung (\u0F71), plus an
 * optional simple vowel.
 *
 * \u0F72, \u0F74, \u0F7A, \u0F7B, \u0F7C, \u0F7D, or \u0F80.
 *
 * (Note: The string "\u0F68\u0F7E\u0F7C"
 * seems to equal "\u0F00", though the Unicode
 * standard does not indicate that it is so. This code treats it
 * that way.)
 *
 * This class allows for invalid tsheg bars, like those
 * containing more than one prefix, more than two suffixes, an
 * invalid postsuffix (secondary suffix), more than one consonant
 * stack (excluding the special case of what we call in THDL Extended
 * Wylie "'i", which is technically a consonant stack but is used in
 * Tibetan like a suffix).
 *
 * Subclasses exist for valid, grammatically correct tsheg bars,
 * and for invalid tsheg bars. Note that correctness is at the tsheg
 * bar level only; it may be grammatically incorrect to concatenate
 * two valid tsheg bars. Some subclasses can be represented in
 * Unicode, but others contain nonstandard glyphs/characters and
 * cannot be.
 *
 * @author David Chandler
 */
public abstract class TshegBar implements UnicodeReadyThunk {
    /** Returns true, as we consider a transliteration in the Tibetan
     *  alphabet of a non-Tibetan language, say Chinese, as being
     *  Tibetan.
     *  @return true */
    public boolean isTibetan() { return true; }
}