Java™ 2 Platform Std. Ed. v1.4.0
java.lang.Object
  |
  +--java.text.Collator
        |
        +--java.text.RuleBasedCollator
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.

RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):
The collation table is composed of a list of collation rules, where each rule is of one of three forms:

    <modifier>
    <relation> <text-argument>
    <reset> <text-argument>

The definitions of the rule elements are as follows:
Unquoted whitespace in a text-argument is ignored; for example, "b c" is treated as "bc".
'@' : Indicates that accents are sorted backwards, as in French.
'&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:

    a < b < c
    a < b & b < c
    a < c & a < b

Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:

    a < b & a < c
    a < c & a < b

Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present (e.g. "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... &ae;\u00e4&AE;\u00c4"). [\u00e4 and \u00c4 are, of course, the escape sequences for lowercase and uppercase a-umlaut.]
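The equivalences above, and the traditional-Spanish contraction "c < ch < d", can be checked with the constructor and compare. This is a minimal sketch rather than part of the original documentation: each fragment is given a leading "<" (a complete rule string should start with a relation, as the next section explains), the helper names build and sign are illustrative, and the extra letters "h" and "z" in the contraction table are assumptions added so that every test character is mapped.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleSyntaxDemo {

    // Builds a collator, turning the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    // Returns -1, 0 or 1 so that results are easy to read and compare.
    static int sign(Collator collator, String x, String y) {
        return Integer.signum(collator.compare(x, y));
    }

    public static void main(String[] args) {
        // All three place b directly after a and c after b, so b sorts before c.
        String[] equivalent = { "< a < b < c", "< a < b & b < c", "< a < c & a < b" };
        for (String rules : equivalent) {
            System.out.println(rules + "  ->  compare(b, c) = " + sign(build(rules), "b", "c"));
        }
        // Here the reset puts c directly after a, ahead of b, so b sorts after c.
        System.out.println("< a < b & a < c  ->  compare(b, c) = "
                + sign(build("< a < b & a < c"), "b", "c"));

        // Contraction: "ch" is a single unit between c and d.
        RuleBasedCollator spanish = build("< a < b < c < ch < d < h < z");
        System.out.println("compare(cz, ch) = " + sign(spanish, "cz", "ch"));
        System.out.println("compare(ch, d)  = " + sign(spanish, "ch", "d"));
    }
}
```

In the contraction table "cz" sorts before "ch", the reverse of plain code-point order, because "ch" is compared as one element that follows "c".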
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
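A sketch of the ignorable-character rule, using a small hypothetical table that lists only the letters of "blackbirds". The hyphen is quoted here as a precaution (the constructor documentation indicates that unquoted punctuation such as '?' is rejected), and comparing at PRIMARY strength is a choice made for this example so that only base-letter differences count. The helper name primaryCompare is illustrative.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class IgnorableDemo {

    // The first relation is "," rather than "<", so the hyphen before the first "<"
    // becomes ignorable. It is quoted as a precaution against the parser rejecting
    // unquoted punctuation.
    static final String RULES = ", '-' < a < b < c < d < i < k < l < r < s";

    // Compares two strings at PRIMARY strength, where only base-letter differences count.
    static int primaryCompare(String x, String y) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(x, y);
    }

    public static void main(String[] args) {
        // The hyphen drops out of the comparison.
        System.out.println(primaryCompare("black-birds", "blackbirds"));
        // A real extra letter still makes a primary difference.
        System.out.println(primaryCompare("black-birds", "blackbird"));
    }
}
```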
Normalization and Accents
RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.

This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings. For more information, see The Unicode Standard, Version 2.0.
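The two caveats can be illustrated by switching modes with setDecomposition. The two-letter rule string and the helper compareWith are hypothetical. U+00E5 (a with ring above) is canonically equivalent to "a" followed by U+030A, so CANONICAL_DECOMPOSITION suffices; U+FF21 (fullwidth Latin capital A) maps to "A" only through a compatibility decomposition, so FULL_DECOMPOSITION is needed.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class DecompositionDemo {

    // Compares a and b with a small hypothetical table under the given decomposition mode.
    static int compareWith(int decomposition, String a, String b) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator("< a, A < b, B");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        collator.setDecomposition(decomposition);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Precomposed U+00E5 versus "a" + combining ring above (U+030A).
        System.out.println(compareWith(Collator.CANONICAL_DECOMPOSITION, "\u00e5", "a\u030a"));
        // Fullwidth A versus ASCII A: needs the compatibility mapping.
        System.out.println(compareWith(Collator.FULL_DECOMPOSITION, "\uff21", "A"));
    }
}
```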
Errors
The following are errors:
When a rule string contains one of these errors, the RuleBasedCollator constructor throws a ParseException.
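Because the constructor reports rule errors as a checked ParseException, code that builds rules at run time typically catches it. A minimal sketch: tryRules is a hypothetical helper, the first candidate is the Simple rule from the examples, and the second reuses the unquoted '?' case described under the constructor (with a leading "<" added so the fragment is otherwise well formed).

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class RuleErrorDemo {

    // Returns null if the rules parse, otherwise the parser's message and error offset.
    static String tryRules(String rules) {
        try {
            new RuleBasedCollator(rules);
            return null;
        } catch (ParseException e) {
            return e.getMessage() + " (offset " + e.getErrorOffset() + ")";
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "< a < b < c < d", // well formed
            "< a < ? < d",     // '?' is unquoted punctuation
            "< a < '?' < d"    // the same rule with the '?' quoted
        };
        for (String rules : candidates) {
            String error = tryRules(rules);
            System.out.println(rules + " -> " + (error == null ? "ok" : error));
        }
    }
}
```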
Examples
Simple: "< a < b < c < d"
Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < \u00E5=a\u030A,\u00C5=A\u030A ;aa,AA< \u00E6,\u00C6< \u00F8,\u00D8"
Normally, to create a rule-based Collator object, you will use Collator's factory method getInstance.
However, to create a rule-based Collator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:

    String Simple = "< a< b< c< d";
    RuleBasedCollator mySimple = new RuleBasedCollator(Simple);

Or:

    String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
                       "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
                       "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
                       "< \u00E5=a\u030A,\u00C5=A\u030A" +
                       ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";
    RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
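A RuleBasedCollator is also a Comparator, so it can sort a list directly. This sketch reuses the Norwegian rules above; the word list and the helper name sorted are made up for illustration. Under these rules "Zoo" sorts after "abc" (case is only a tertiary difference), and U+00E6 (æ) and U+00F8 (ø) sort after z, whereas String's own compareTo would put "Zoo" first because 'Z' has a smaller code point than 'a'.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NorwegianSortDemo {

    // The Norwegian rules from the example above.
    static final String NORWEGIAN = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J"
            + "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T"
            + "< u,U< v,V< w,W< x,X< y,Y< z,Z"
            + "< \u00E5=a\u030A,\u00C5=A\u030A"
            + ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";

    // Returns a sorted copy of words, ordered by the Norwegian rules.
    static List<String> sorted(List<String> words) {
        RuleBasedCollator myNorwegian;
        try {
            myNorwegian = new RuleBasedCollator(NORWEGIAN);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        List<String> copy = new ArrayList<>(words);
        copy.sort(myNorwegian); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("\u00f8l", "Zoo", "\u00e6re", "abc");
        System.out.println(sorted(words));
    }
}
```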
Combining Collators is as simple as concatenating strings. Here's an example that combines two Collators from two different locales:

    // Create an en_US Collator object
    RuleBasedCollator en_USCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("en", "US", ""));
    // Create a da_DK Collator object
    RuleBasedCollator da_DKCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("da", "DK", ""));
    // Combine the two
    // First, get the collation rules from en_USCollator
    String en_USRules = en_USCollator.getRules();
    // Second, get the collation rules from da_DKCollator
    String da_DKRules = da_DKCollator.getRules();
    RuleBasedCollator newCollator =
        new RuleBasedCollator(en_USRules + da_DKRules);
    // newCollator has the combined rules
A more interesting example is to modify an existing table to create a new Collator object. For example, add "&C< ch, cH, Ch, CH" to the rules of the en_USCollator object to create your own:

    // Create a new Collator object with additional rules
    String addRules = "&C< ch, cH, Ch, CH";
    // Append to the rule string from getRules(), not to the Collator object itself
    RuleBasedCollator myCollator =
        new RuleBasedCollator(en_USCollator.getRules() + addRules);
    // myCollator contains the new rules
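The effect of the added "ch" rule can be checked against the unmodified en_US collator. A sketch, assuming (as the examples above do) that Collator.getInstance returns a RuleBasedCollator for Locale.US; the helper name withCh and the words "czar" and "chart" are arbitrary choices for illustration. Because "ch" now sorts as a single unit right after C, "czar" should come before "chart", the reverse of the plain en_US order.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

class TailoringDemo {

    // en_US rules plus the "ch" contraction from the example above.
    static RuleBasedCollator withCh() {
        RuleBasedCollator en_USCollator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        String addRules = "&C< ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(en_USCollator.getRules() + addRules);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Collator plain = Collator.getInstance(Locale.US);
        Collator myCollator = withCh();
        // Plain en_US compares 'z' with 'h'; the tailored collator compares 'c' with "ch".
        System.out.println("en_US:    " + plain.compare("czar", "chart"));
        System.out.println("tailored: " + myCollator.compare("czar", "chart"));
    }
}
```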
The following example demonstrates how to change the order of non-spacing accents:

    // old rule
    String oldRules = "=\u0301;\u0300;\u0302;\u0308"    // main accents
                    + ";\u0327;\u0303;\u0304;\u0305"    // main accents
                    + ";\u0306;\u0307;\u0309;\u030A"    // main accents
                    + ";\u030B;\u030C;\u030D;\u030E"    // main accents
                    + ";\u030F;\u0310;\u0311;\u0312"    // main accents
                    + "< a , A ; ae, AE ; \u00e6 , \u00c6"
                    + "< b , B < c, C < e, E & C < d, D";
    // change the order of accent characters
    String addOn = "& \u0300 ; \u0308 ; \u0302";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
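To see what addOn changes, compare "a" plus a circumflex with "a" plus a diaeresis under both rule sets. In oldRules the circumflex (U+0302) precedes the diaeresis (U+0308) as a secondary difference; the reset moves the diaeresis in front of it. A minimal sketch: the helper names are illustrative, and the test strings use the decomposed (combining) forms.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class AccentOrderDemo {

    // The rule strings from the example above.
    static final String OLD_RULES = "=\u0301;\u0300;\u0302;\u0308" // main accents
            + ";\u0327;\u0303;\u0304;\u0305"                       // main accents
            + ";\u0306;\u0307;\u0309;\u030A"                       // main accents
            + ";\u030B;\u030C;\u030D;\u030E"                       // main accents
            + ";\u030F;\u0310;\u0311;\u0312"                       // main accents
            + "< a , A ; ae, AE ; \u00e6 , \u00c6"
            + "< b , B < c, C < e, E & C < d, D";
    static final String ADD_ON = "& \u0300 ; \u0308 ; \u0302";

    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Sign of comparing "a" + circumflex (U+0302) with "a" + diaeresis (U+0308).
    static int circumflexVsDiaeresis(String rules) {
        return Integer.signum(build(rules).compare("a\u0302", "a\u0308"));
    }

    public static void main(String[] args) {
        System.out.println("old rules:  " + circumflexVsDiaeresis(OLD_RULES));
        System.out.println("with addOn: " + circumflexVsDiaeresis(OLD_RULES + ADD_ON));
    }
}
```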
The last example shows how to put new primary ordering in before the default setting. For example, in a Japanese Collator, you can sort English characters either before or after Japanese characters:

    // get en_US Collator rules
    RuleBasedCollator en_USCollator =
        (RuleBasedCollator) Collator.getInstance(Locale.US);
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is \u2212
    String jaString = "& \u2212 < \u3041, \u3042 < \u3043, \u3044";
    RuleBasedCollator myJapaneseCollator =
        new RuleBasedCollator(en_USCollator.getRules() + jaString);
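A runnable variant of the last example needs a reset character that actually exists in the en_US rules, since a reset to a missing character makes the constructor throw a ParseException. The \u2212 above is explicitly a supposition, so this sketch instead resets to the digit '9', assuming only that digits sort before letters in the en_US table. That substitution, and the helper name build, are assumptions of this sketch rather than part of the original example.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

class KanaBeforeLatinDemo {

    static RuleBasedCollator build() {
        RuleBasedCollator en_USCollator = (RuleBasedCollator) Collator.getInstance(Locale.US);
        // ASSUMPTION: reset to '9' instead of U+2212. If digits precede letters in the
        // en_US rules, the hiragana land after '9' and before everything that followed
        // it, including 'a'.
        String jaString = "& 9 < \u3041, \u3042 < \u3043, \u3044";
        try {
            return new RuleBasedCollator(en_USCollator.getRules() + jaString);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator myJapaneseCollator = build();
        System.out.println(myJapaneseCollator.compare("\u3042", "a")); // hiragana vs Latin a
        System.out.println(myJapaneseCollator.compare("9", "\u3041")); // digit vs hiragana
    }
}
```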
See Also: Collator, CollationElementIterator
Field Summary

Fields inherited from class java.text.Collator:
CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, SECONDARY, TERTIARY
Constructor Summary

RuleBasedCollator(String rules)
    RuleBasedCollator constructor.
Method Summary

Object clone()
    Standard override; no change in semantics.
int compare(String source, String target)
    Compares the character data stored in two different strings based on the collation rules.
boolean equals(Object obj)
    Compares the equality of two collation objects.
CollationElementIterator getCollationElementIterator(CharacterIterator source)
    Return a CollationElementIterator for the given CharacterIterator.
CollationElementIterator getCollationElementIterator(String source)
    Return a CollationElementIterator for the given String.
CollationKey getCollationKey(String source)
    Transforms the string into a series of characters that can be compared with CollationKey.compareTo.
String getRules()
    Gets the table-based rules for the collation object.
int hashCode()
    Generates the hash code for the table-based collation object.
Methods inherited from class java.text.Collator:
compare, equals, getAvailableLocales, getDecomposition, getInstance, getInstance, getStrength, setDecomposition, setStrength

Methods inherited from class java.lang.Object:
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public RuleBasedCollator(String rules) throws ParseException
    Parameters:
        rules - the collation rules to build the collation table from.
    Throws:
        ParseException - A format exception will be thrown if the build process of the rules fails. For example, build rule "a < ? < d" will cause the constructor to throw the ParseException because the '?' is not quoted.
    See Also: Locale
Method Detail
public String getRules()
public CollationElementIterator getCollationElementIterator(String source)
    See Also: CollationElementIterator

public CollationElementIterator getCollationElementIterator(CharacterIterator source)
    See Also: CollationElementIterator
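Both getCollationElementIterator overloads return an iterator over the collation elements of the text. This sketch walks the elements for a hypothetical four-letter table through the String overload and the CharacterIterator overload (the latter via java.text.StringCharacterIterator), keeping each element's primary order from CollationElementIterator.primaryOrder. With one primary relation per letter, the primaries should rise from element to element. The helper names are illustrative.

```java
import java.text.CharacterIterator;
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;

class ElementIteratorDemo {

    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Drains an iterator, keeping the primary component of each collation element.
    static List<Integer> primaries(CollationElementIterator it) {
        List<Integer> result = new ArrayList<>();
        // next() returns NULLORDER once the text is exhausted.
        for (int order = it.next(); order != CollationElementIterator.NULLORDER; order = it.next()) {
            result.add(CollationElementIterator.primaryOrder(order));
        }
        return result;
    }

    static List<Integer> viaString(String rules, String text) {
        return primaries(build(rules).getCollationElementIterator(text));
    }

    static List<Integer> viaCharacterIterator(String rules, String text) {
        CharacterIterator source = new StringCharacterIterator(text);
        return primaries(build(rules).getCollationElementIterator(source));
    }

    public static void main(String[] args) {
        String rules = "< a < b < c < d";
        System.out.println(viaString(rules, "abcd"));
        System.out.println(viaCharacterIterator(rules, "abcd"));
    }
}
```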
public int compare(String source, String target)
    Overrides: compare in class Collator
    Parameters:
        source - the source string.
        target - the target string.
    See Also: CollationKey, Collator.getCollationKey(java.lang.String)
public CollationKey getCollationKey(String source)
    Overrides: getCollationKey in class Collator
    Parameters:
        source - the string to be transformed into a collation key.
    See Also: CollationKey, Collator.compare(java.lang.String, java.lang.String)
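getCollationKey pays off when the same strings are compared many times, for example while sorting: each key is computed once, and CollationKey.compareTo then gives the same ordering as compare on the source strings. A sketch with a hypothetical three-letter table in which lowercase sorts before uppercase as a tertiary difference; the helper names simple and sortWithKeys are illustrative.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class CollationKeyDemo {

    // Hypothetical table: a < b < c at primary level, lowercase before uppercase at tertiary.
    static Collator simple() {
        try {
            return new RuleBasedCollator("< a, A < b, B < c, C");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Computes each key once, then sorts the keys instead of calling compare repeatedly.
    static String[] sortWithKeys(Collator collator, String... words) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable<CollationKey>
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = simple();
        System.out.println(Arrays.toString(sortWithKeys(collator, "cab", "Abc", "bca", "abc")));
        // Key comparison agrees in sign with Collator.compare on the same strings.
        int viaKeys = collator.getCollationKey("abc").compareTo(collator.getCollationKey("Abc"));
        System.out.println(Integer.signum(viaKeys) == Integer.signum(collator.compare("abc", "Abc")));
    }
}
```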
public Object clone()
    Overrides: clone in class Collator
    See Also: Cloneable
public boolean equals(Object obj)
    Specified by: equals in interface Comparator
    Overrides: equals in class Collator
    Parameters:
        obj - the table-based collation object to be compared with this.
public int hashCode()
    Overrides: hashCode in class Collator
    See Also: Object.equals(java.lang.Object), Hashtable
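The equals, hashCode and clone entries can be exercised together. This is a sketch with a hypothetical rule string and an illustrative build helper; besides the documented behavior, it assumes, as in the JDK implementation, that equals also takes the strength setting inherited from Collator into account.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class EqualityDemo {

    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator first = build("< a < b < c");
        RuleBasedCollator second = build("< a < b < c");
        RuleBasedCollator copy = (RuleBasedCollator) first.clone();

        // Same rules and settings: equal, with matching hash codes.
        System.out.println(first.equals(second) + " " + (first.hashCode() == second.hashCode()));
        System.out.println(first.equals(copy));

        // Changing the strength of the clone leaves the original untouched.
        copy.setStrength(Collator.PRIMARY);
        System.out.println(first.equals(copy));
    }
}
```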
Java, Java 2D, and JDBC are trademarks or registered trademarks of Sun Microsystems, Inc. in the US and other countries.
Copyright 1993-2002 Sun Microsystems, Inc. 901 San Antonio Road
Palo Alto, California, 94303, U.S.A. All Rights Reserved.