230 commits

Author SHA1 Message Date
dchandler
7ea185fa01 Renamed UnicodeCharToExtendedWylie to
UnicodeCodepointToThdlWylie.java.

Added a new class, UnicodeGraphemeCluster, that can tell you
the components of a grapheme cluster from top to bottom.  It does not
yet have good error checking; it is not yet finished.

Next is to parse clean Unicode into GraphemeClusters.  After that comes
scanning dirty Unicode into best-guess GraphemeClusters, and scanning
dirty Unicode to get nice error messages.
2002-12-17 13:51:18 +00:00
dchandler
8e8a23c6a6 Extended Wylie is referred to as THDL Extended Wylie or THDL Wylie
because a Japanese scholar has an "Extended Wylie" also.

NFKD and NFD have a new brother, NFTHDL.  I wish there weren't a need,
but as my yet-to-be-put-into-CVS break-unicode-into-grapheme-clusters code
demonstrates, the-need-is-there.  forgive-me for the hyphens, it's late.
2002-12-15 06:57:32 +00:00
dchandler
a42347b224 Now uses terminology from the Unicode standard. No more talk of
characters, for example.

Normalization forms NFKD and NFD are supported for the Tibetan Unicode
range.  I don't like either, actually.  I've tested NFKD, but I've not yet
committed the tests.
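
Illustrative note (not part of the commit): a minimal sketch of what NFD and NFKD do to two code points in the Tibetan block. It uses the JDK's java.text.Normalizer, which postdates this commit, so it is not the code committed here. The class and method names are made up for the example.

```java
import java.text.Normalizer;

class TibetanNormalizationDemo {
    /** Formats the code points of s as space-separated U+XXXX labels. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Canonical decomposition. */
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Compatibility decomposition. */
    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    public static void main(String[] args) {
        // TIBETAN LETTER GHA has a canonical decomposition (GA + subjoined HA),
        // so both NFD and NFKD split it.
        String gha = "\u0F43";
        // TIBETAN MARK DELIMITER TSHEG BSTAR has only a compatibility
        // decomposition, so NFD leaves it alone and NFKD maps it to the
        // ordinary tsheg.
        String tshegBstar = "\u0F0C";

        System.out.println("NFD  GHA:         " + codePoints(nfd(gha)));
        System.out.println("NFKD GHA:         " + codePoints(nfkd(gha)));
        System.out.println("NFD  TSHEG BSTAR: " + codePoints(nfd(tshegBstar)));
        System.out.println("NFKD TSHEG BSTAR: " + codePoints(nfkd(tshegBstar)));
    }
}
```

The second case shows one reason a tool might be unhappy with NFKD: the distinction between the non-breaking and the ordinary tsheg is lost.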
2002-12-15 03:35:24 +00:00
dchandler
26993a5093 So that Unicode escape sequences appear correctly in javadocs. 2002-12-09 02:35:39 +00:00
dchandler
2d6c8be804 So that Unicode escape sequences appear correctly in javadocs. 2002-12-09 02:29:09 +00:00
dchandler
22c6ec5406 Javadoc now works without warnings. 2002-12-09 01:48:34 +00:00
dchandler
f4a16f8e9d This commit is for my benefit only; these classes are not ready for prime time,
and the build system is not yet aware of them.

I'm adding some classes for representing legal tsheg-bars (syllables, for the
most part) in Unicode.  These classes were designed bottom-up (OK, OK --
they weren't designed designed, but I had to write down everything I knew
about Tibetan syntax somewhere).  The classes are aware of Extended
Wylie.  I doubt the Javadocs work yet, and I'm still testing (and am not
committing my testing code with these as it is not yet ready).

Next on my list--fix these up to reflect my new awareness of suffix particles
(like le'u'i'o), and add classes to support syntactically incorrect Unicode
sequences.  Then add a UnicodeReader, and we've got the back end of
a Tibetan Unicode shaping system (like half of MS's Uniscribe or Apple's
Worldscript or FreeType Layout or Omega's OTPs).

A top-down design would not have included LegalTshegBar.  But now that
my itch has been scratched, potential uses are lingering about.  For example,
it would be nice to scan some input and break it into LegalTshegBars,
punctuation/marks/signs, and illegal stacks.  Then we could alert the client
of the illegality, its precise form, and its precise location.

The real system for turning a Unicode stream into an internal representation
suitable for conversion to EWTS/ACIP/XHTML/what-have-you need not be
aware of Tibetan syntax.  But to make the very best conversion from
Unicode to, e.g., EWTS, it is necessary to know that gaskad is better
represented as gskad, but that jaskad is not the same as jskad.
2002-12-09 01:02:23 +00:00
eg3p
9eedfcd909 This is Tashi's TibetanSyllable class for sorting Wylie Tibetan.
It does not have many methods for determining the root letter, suffix,
and so on, but these should be easy to add. David, please use this
class to the extent that it and your new work overlap.
2002-12-05 01:48:41 +00:00
dchandler
d200b03d66 Updated the build system so that you must do a cvs checkout of the
'Fonts' module inside the 'Jskad' module.  I.e., you must now have the
tree like so:

Jskad/
   source/
   dist/
   Fonts/
       TibetanMachineWeb/
   .
   .
   .

This is because the THDL tools now optionally (and by default) load
the TibetanMachineWeb fonts automatically.

Updated the build system so that the 'web-start-releases' and
'self-contained-dist' targets JAR up optional JARs to create
double-clickable, self-contained joy.  Even the TMW fonts are in the
JARs now.

Changed the strings describing two Jskad keyboards so that "keyboard"
is no longer in the description.  It's in the label next to the combo
box.

Jskad now saves preferences on exit or when the user selects a menu
item (that is there for debugging mainly) to ~/my_thdl_preferences.txt
on *nix or C:\my_thdl_preferences.txt on Win32.  I don't know the
correct Mac location.
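
Illustrative note (not part of the commit): a rough sketch of choosing the preferences file by platform and writing preferences out. The use of java.util.Properties, the key name "hypothetical.keyboard", and the class and method names are all assumptions for the example. Every non-Windows system, including the Mac, is simply sent to the home directory here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;
import java.util.Properties;

class PreferencesSketch {
    static final String FILE_NAME = "my_thdl_preferences.txt";

    /**
     * Picks the preferences path from the values of the "os.name" and
     * "user.home" system properties. Anything that is not Windows is
     * treated like *nix (an assumption; the Mac location is not settled).
     */
    static String preferencesPath(String osName, String userHome) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            return "C:\\" + FILE_NAME;
        }
        return userHome + "/" + FILE_NAME;
    }

    /** Writes the preferences in java.util.Properties text format. */
    static void save(Properties prefs, Writer out) throws IOException {
        prefs.store(out, "THDL user preferences");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(preferencesPath(
                System.getProperty("os.name"),
                System.getProperty("user.home")));

        Properties prefs = new Properties();
        prefs.setProperty("hypothetical.keyboard", "ACIP"); // made-up key
        StringWriter text = new StringWriter();
        save(prefs, text);
        System.out.print(text);
    }
}
```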

There's a new paradigm for telling org.thdl.util.ThdlOptions that a
user preference has been changed.  If, for example, a combo box is
manipulated so that the ACIP keyboard is selected, then you must call
a certain method in ThdlOptions.
2002-11-18 16:12:25 +00:00
eg3p
c9349f6846 These files are not used. 2002-11-12 16:47:02 +00:00
dchandler
ecf61bc892 A DuffPane is now a TibetanPane. A TibetanPane is much more lightweight
but does line breaks correctly.  I.e., I refactored DuffPane into two classes.

I did this trying to track down a subtle bug in line breaking: 'gye ' breaks
after 'gy' sometimes, with the dreng bo on the next line, but only when you
resize the window certain ways, and only in Savant (and maybe QD and the
translation tool, I don't know) but not in Jskad.

I was not successful in finding the bug, but it still exists when I use
TibetanPanes instead of DuffPanes in org.thdl.savant.tib.*.
2002-11-08 04:11:42 +00:00
dchandler
d462f4e41c Fixes all known bugs with the ACIP keyboard except for one:
ACIP's 'WA' represents Wylie's 'wa', but ACIP's 'ZHVA' represents Wylie's
'zhwa'.  The key for wasur is the same as the key for the twentieth
consonant in extended Wylie, but not in ACIP.
2002-11-03 17:34:33 +00:00
dchandler
de6ae79959 Fixes bug 624133, "Input freezes after impossible character". Try 'shsM' in
ACIP or 'ShSm' in Extended Wylie to see the new behavior.

We use a trie to store valid input sequences.  In the future, we could use
the same trie as a replacement for the less efficient HashSets we use to
store characters, vowels, and punctuation.  For example, we'd use
'validInputSequences.put("K", new Pair("consonant", "k"))' when reading
in the ACIP keyboard's description of the first consonant of the Tibetan
alphabet in 'TibetanKeyboard.java'.

Note that the current trie implementation is only useful for 7- or 8-bit
transcription systems, and works best for tries with low average depth, which
describes a transcription system's trie very well.  If you used arbitrary
Unicode in your keyboard, you'd need a different trie implementation.
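
Illustrative note (not part of the commit): a minimal sketch of a trie restricted to 8-bit keys, in the spirit of the description above. It is not the Trie class from this commit. The class name, the hasPrefix method, and the choice to be case-sensitive are assumptions, and plain String values stand in for the Pair mentioned above.

```java
class ByteTrie<V> {
    /** One child slot per possible 8-bit character. */
    private static final int FANOUT = 256;

    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] next = (Node<V>[]) new Node[FANOUT];
        V value; // null when no key ends at this node
    }

    private final Node<V> root = new Node<>();

    /** Stores value under key; rejects characters outside the 8-bit range. */
    void put(String key, V value) {
        Node<V> n = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= FANOUT) {
                throw new IllegalArgumentException("not an 8-bit character: " + c);
            }
            if (n.next[c] == null) {
                n.next[c] = new Node<>();
            }
            n = n.next[c];
        }
        n.value = value;
    }

    /** Walks the trie along s; returns null if the walk falls off. */
    private Node<V> find(String s) {
        Node<V> n = root;
        for (int i = 0; i < s.length() && n != null; i++) {
            char c = s.charAt(i);
            n = (c < FANOUT) ? n.next[c] : null;
        }
        return n;
    }

    /** Returns the value stored under key, or null if there is none. */
    V get(String key) {
        Node<V> n = find(key);
        return n == null ? null : n.value;
    }

    /** True if some stored key starts with prefix, so more input could still be valid. */
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    public static void main(String[] args) {
        ByteTrie<String> valid = new ByteTrie<>();
        valid.put("K", "k");   // the mapping used as the example above
        valid.put("KH", "kh"); // further entries are illustrative only
        System.out.println(valid.get("K"));
        System.out.println(valid.hasPrefix("KH"));
        System.out.println(valid.hasPrefix("KX")); // an impossible sequence
    }
}
```

A keyboard handler could call hasPrefix on the pending keystrokes and discard them as soon as it returns false, rather than waiting for a sequence that can never complete.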

Improved the optional keyboard input mode status messages.
2002-11-02 18:44:24 +00:00
dchandler
a6cc4a7ff3 Removed/commented out/tagged some unused local variables.
Added a JUnit test for the new Trie that fails at present since the Trie is
case-insensitive.  Running JUnit tests is not something our build system
knows about at present, but Eclipse 2.0 makes it very easy.

Fixed a few compiler errors due to imports I'd forgotten.
2002-11-02 16:01:40 +00:00
dchandler
aa580e0bea Undoing my erroneous commit of buggy code. 2002-11-02 03:46:44 +00:00
dchandler
abcf8f19b3 Factored TibetanDocument into two classes, one that is a
DefaultStyledDocument, and another consisting entirely of static utility
methods for processing Tibetan text.  Moved TibetanDocument.DuffData
into its own class.

I think this makes things a bit more transparent, and gets us a little closer to
making clean use of Swing.
2002-11-02 03:38:59 +00:00
dchandler
5249c48807 Factored TibetanDocument into two classes, one that is a
DefaultStyledDocument, and another consisting entirely of static utility
methods for processing Tibetan text.  Moved TibetanDocument.DuffData
into its own class.

I think this makes things a bit more transparent, and gets us a little closer to
making clean use of Swing.
2002-11-02 03:33:09 +00:00
dchandler
97c530e974 GHA and KR'i now work. 2002-10-28 05:31:19 +00:00
dchandler
1ecbfe6a7c Fixed some Javadoc comments in preparation for putting up new Javadocs
on http://thdltools.sf.net/.
2002-10-28 04:49:24 +00:00
dchandler
fd1b4dd468 Now breaks the line after the last whitespace, not the first.
I cleaned things up a bit, and I've made logging optional since I don't yet
trust the code fully.
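
Illustrative note (not part of the commit): a toy sketch of the difference between breaking after the first and after the last whitespace that fits on a line. It works on a plain string with a width measured in characters, which is a simplification. The committed code works inside Swing's view classes, and the names here are made up.

```java
class LineBreakSketch {
    /**
     * Returns the offset just after the last whitespace character among the
     * first maxChars characters of text, or -1 if there is none.
     */
    static int breakAfterLastWhitespace(String text, int maxChars) {
        int limit = Math.min(maxChars, text.length());
        for (int i = limit - 1; i >= 0; i--) {
            if (Character.isWhitespace(text.charAt(i))) {
                return i + 1;
            }
        }
        return -1;
    }

    /** The older behavior, for contrast: break after the first whitespace. */
    static int breakAfterFirstWhitespace(String text, int maxChars) {
        int limit = Math.min(maxChars, text.length());
        for (int i = 0; i < limit; i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String wylie = "bkra shis bde legs";
        int width = 12; // pretend only 12 characters fit on the line
        int first = breakAfterFirstWhitespace(wylie, width);
        int last = breakAfterLastWhitespace(wylie, width);
        System.out.println("[" + wylie.substring(0, first) + "]");
        System.out.println("[" + wylie.substring(0, last) + "]");
    }
}
```

Breaking after the last whitespace keeps as much text as possible on the current line.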

A Wylie underscore at the end of a line is worth looking into further, at the
very least.
2002-10-28 04:12:49 +00:00
dchandler
8433369d60 Now with slightly better error handling. 2002-10-28 03:17:28 +00:00
dchandler
0ad135f8f1 This may well be a fix to the "Improper line wrapping" bug. The fix
is basically that we use our own special ViewFactory, with a new
subclass of LabelView (the view RTFEditorKit uses for the nitty
gritty) that is aware of Tibetan.

There are a couple of nasty hacks still here, and Swing's
documentation for doing what I did was quite poor.  I searched the web
for hours, read the Javadocs and the tutorials, and consulted a Swing
reference book, but I still don't have tremendous confidence in this
solution.  If it fundamentally doesn't work, though, we have to define
our own first-class Document, Element hierarchy, ViewFactory, Views,
and EditorKit.  So let's hope it *does* work fundamentally.

I can't say for sure if this even works, as I have yet to run this
code on a machine where Jskad works properly.  I had major trouble
installing the TMW fonts on Linux, and have yet to resolve it, even
after verifying via xlsfonts that the fonts were installed and then
changing TibetanMachineWeb.java to look for them.  Because I haven't
tested this yet, a lot of nasty code is tagged 'DLC' and commented
out.
2002-10-28 03:08:04 +00:00
dchandler
2a923f83f8 Added a first attempt at an ACIP keyboard following their document
http://www.asianclassics.org/download/tibetancode/ticode.pdf
2002-10-20 07:59:25 +00:00
dchandler
4f9bdab7f7 Changed a /* */ comment to a Javadoc (/** */) comment. 2002-10-13 19:13:59 +00:00
dchandler
403f21c8db Added Javadoc files overview.html and several package.html files.
Added a "Quit" option to Savant's File menu.  Factored out the Close
option in doing so.

Exceptions in many action listeners are now handled by
org.thdl.util.ThdlActionListener or org.thdl.util.ThdlAbstractAction.

Many exceptions that we used to just log now optionally cause aborts.
This option is on by default for developers using 'ant savant-run'-style
targets, but it is off for users.

An erroneous CLASSPATH now causes a useful error message in almost
all situations.

Fixed some typos and bad links in Javadoc comments.

Added a simple assertion facility, but the overhead is suffered even in
release builds.

Factored out the code that sets up log files like savant.log and jskad.log.
2002-10-06 18:23:27 +00:00
dchandler
859a7731fb More robust--handles the case when tibwn.ini cannot be found. 2002-10-04 04:37:32 +00:00
dchandler
10d86fc3b7 Updated comments so that Javadoc 1.4 warnings went away. 2002-09-30 03:10:00 +00:00
dchandler
1b47e7c268 Fixed the line feeds, which were botched DOS line feeds.
Added copyright boilerplate.
2002-09-28 14:35:09 +00:00
dchandler
3c82f0a24c Initial revision 2002-09-28 00:53:39 +00:00
dchandler
c6d6116ff2 Initial revision 2002-09-23 23:15:39 +00:00