I really hesitate to commit this because I'm not sure exactly what it brings to the table, and I fear that it makes the ACIP->Tibetan converter code a lot uglier. The TODO(DLC)[EWTS->Tibetan] comments littered throughout are part of the ugliness; they mark where it lies. If each were addressed, the code could perhaps be made clean again.

I've largely forgotten exactly what this change does, but it attempts to improve EWTS->Tibetan conversion. The lexer is probably really, really primitive. I concentrate here on converting a single tsheg bar rather than a whole document.

Eclipse was used during part of my journey here, and some imports were reorganized merely because I could. :) (Eclipse was needed when the usual ant build failed to run the new test, EWTSTest. And I wanted its debugger.)

Next steps: end-to-end EWTS tests should bring many problems to light. Fix those. Triage all the TODO comments.

I don't know that I'll ever really trust the implementation. The tests are valuable, though. A clean implementation of EWTS->Tibetan in Jython might hold enough interest for me; I'd like to learn Python.
parent f64bae8ea6
commit 7198f23361
45 changed files with 1666 additions and 695 deletions
```diff
@@ -506,5 +506,25 @@ public class UnicodeUtils implements UnicodeConstants {
         } while (mutated_this_time_through);
         return mutated;
     }
+
+    /** Returns true iff ch is a valid Tibetan codepoint in Unicode
+     * 4.0: */
+    public boolean isTibetanUnicodeCodepoint(char ch) {
+        // NOTE: could use an array of 256 booleans for speed but I'm lazy
+        return ((ch >= '\u0f00' && ch <= '\u0fcf')
+                && !(ch == '\u0f48'
+                     || (ch > '\u0f6a' && ch < '\u0f71')
+                     || (ch > '\u0f8b' && ch < '\u0f90')
+                     || ch == '\u0f98'
+                     || ch == '\u0fbd'
+                     || ch == '\u0fcd'
+                     || ch == '\u0fce'));
+    }
+
+    /** Returns true iff ch is in 0F00-0FFF but isn't a valid Tibetan
+     * codepoint in Unicode 4.0: */
+    public boolean isInvalidTibetanUnicode(char ch) {
+        return (isInTibetanRange(ch) && !isTibetanUnicodeCodepoint(ch));
+    }
 }
 
```
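The NOTE inside isTibetanUnicodeCodepoint mentions using an array of 256 booleans for speed. What follows is a minimal sketch of that idea, not part of this commit. It assumes isInTibetanRange (defined elsewhere in UnicodeUtils and not shown in the diff) means U+0F00..U+0FFF, as the second Javadoc states. The class name TibetanCodepointTable is hypothetical. The table excludes the same Unicode 4.0 holes the diff lists, so the lookups should agree with the diff's predicates.

```java
/**
 * Hypothetical sketch of the "array of 256 booleans" idea from the NOTE.
 * Not Jskad code; names are illustrative only.
 */
final class TibetanCodepointTable {
    // One entry per codepoint in U+0F00..U+0FFF (256 entries); false by default.
    private static final boolean[] VALID = new boolean[256];

    static {
        // Accept the same span the diff's predicate starts from: U+0F00..U+0FCF.
        for (char ch = '\u0f00'; ch <= '\u0fcf'; ch++) {
            VALID[ch - '\u0f00'] = true;
        }
        // Remove the holes the diff treats as unassigned in Unicode 4.0.
        markInvalid('\u0f48', '\u0f48');
        markInvalid('\u0f6b', '\u0f70');
        markInvalid('\u0f8c', '\u0f8f');
        markInvalid('\u0f98', '\u0f98');
        markInvalid('\u0fbd', '\u0fbd');
        markInvalid('\u0fcd', '\u0fce');
        // U+0FD0..U+0FFF are never set, so they stay invalid.
    }

    private TibetanCodepointTable() { }

    private static void markInvalid(char from, char to) {
        for (char ch = from; ch <= to; ch++) {
            VALID[ch - '\u0f00'] = false;
        }
    }

    /** Assumed meaning of isInTibetanRange: true iff ch is in U+0F00..U+0FFF. */
    static boolean isInTibetanRange(char ch) {
        return ch >= '\u0f00' && ch <= '\u0fff';
    }

    /** Table-lookup counterpart of the diff's isTibetanUnicodeCodepoint. */
    static boolean isTibetanUnicodeCodepoint(char ch) {
        return isInTibetanRange(ch) && VALID[ch - '\u0f00'];
    }

    /** Table-lookup counterpart of the diff's isInvalidTibetanUnicode. */
    static boolean isInvalidTibetanUnicode(char ch) {
        return isInTibetanRange(ch) && !VALID[ch - '\u0f00'];
    }
}
```

The table costs 256 bytes or so and replaces the chain of comparisons with one range check and one array index. Whether that speedup matters for the converter is untested here.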