Unicode Version: 5.0.0
Date: 2006-06-13, 23:23:45 GMT
This page illustrates the application of the boundary specifications. The first chart shows where breaks would appear between different sample characters or strings. The sample characters are chosen mechanically to represent the different properties used by the specification. Where properties used in the rules have 'overlaps', the samples are given 'composed' names. For example, SentenceBreak uses GCLF_Sep: Sep is the SentenceBreak property, but it overlaps with the GraphemeClusterBreak property LF.
| | Other | GCControl | GCExtend | GCLF_Sep | GCCR_Sep | GCControl_Sep | GCControl_Format | Katakana | ALetter | MidLetter | MidNum | Numeric | ExtendNumLet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Other | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCControl | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCExtend | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCLF_Sep | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCCR_Sep | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCControl_Sep | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
GCControl_Format | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
Katakana | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | × | ÷ | ÷ | ÷ | ÷ | × |
ALetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | × | × |
MidLetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
MidNum | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
Numeric | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | × | × |
ExtendNumLet | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | × | × | ÷ | ÷ | × | × |
ALetter GCControl_Format | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | × | × |
ALetter MidLetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | ÷ | ÷ |
ALetter MidLetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | ÷ | ÷ |
ALetter MidLetter GCControl_Format | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | × | ÷ | ÷ | ÷ | ÷ |
ALetter MidNum | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
Numeric MidLetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
Numeric MidLetter | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | ÷ | ÷ |
Numeric MidNum | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | × | ÷ |
Numeric MidNum GCControl_Format | ÷ | ÷ | × | ÷ | ÷ | ÷ | × | ÷ | ÷ | ÷ | ÷ | × | ÷ |
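The chart can be read as a lookup table. The following is a minimal sketch of that reading, not the generator that produced this page. It assumes that each row label is the sample before the candidate boundary and each column label is the sample after it (the GCCR_Sep row and GCLF_Sep column, which show ×, suggest this orientation), that ÷ marks a break and × marks no break, and it transcribes only a handful of rows. The names `COLUMNS`, `ROWS`, `chart_status`, and `is_break` are hypothetical and introduced only for illustration.

```python
# Column order, copied from the chart header (left to right).
COLUMNS = [
    "Other", "GCControl", "GCExtend", "GCLF_Sep", "GCCR_Sep",
    "GCControl_Sep", "GCControl_Format", "Katakana", "ALetter",
    "MidLetter", "MidNum", "Numeric", "ExtendNumLet",
]

# A few rows transcribed from the chart; each symbol lines up with COLUMNS.
# Assumption: the row label is the sample *before* the candidate boundary.
ROWS = {
    "GCCR_Sep":          "÷ ÷ ÷ × ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷ ÷",
    "Katakana":          "÷ ÷ × ÷ ÷ ÷ × × ÷ ÷ ÷ ÷ ×",
    "ALetter":           "÷ ÷ × ÷ ÷ ÷ × ÷ × ÷ ÷ × ×",
    "MidLetter":         "÷ ÷ × ÷ ÷ ÷ × ÷ ÷ ÷ ÷ ÷ ÷",
    "Numeric":           "÷ ÷ × ÷ ÷ ÷ × ÷ × ÷ ÷ × ×",
    "ALetter MidLetter": "÷ ÷ × ÷ ÷ ÷ × ÷ × ÷ ÷ ÷ ÷",
    "Numeric MidNum":    "÷ ÷ × ÷ ÷ ÷ × ÷ ÷ ÷ ÷ × ÷",
}


def chart_status(before: str, after: str) -> str:
    """Return the chart symbol ('÷' or '×') for the given row and column."""
    symbols = ROWS[before].split()
    return symbols[COLUMNS.index(after)]


def is_break(before: str, after: str) -> bool:
    """True if the chart shows a break (÷) between the two samples."""
    return chart_status(before, after) == "÷"


if __name__ == "__main__":
    # Print a few cells in "row symbol column" form.
    for before, after in [
        ("GCCR_Sep", "GCLF_Sep"),
        ("MidLetter", "ALetter"),
        ("ALetter MidLetter", "ALetter"),
        ("Numeric MidNum", "Numeric"),
    ]:
        print(f"{before} {chart_status(before, after)} {after}")
```

Comparing the MidLetter row with the ALetter MidLetter row shows why the chart includes multi-sample rows: the status in the ALetter column depends on what precedes the MidLetter, so a lookup on a single preceding property is not enough.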
Because the rules have been mechanically processed for generation, they do not match the UAX rules precisely. For the original rules, see the UAX.
The following samples illustrate the application of the rules. The blue lines indicate possible break points. If your browser supports titles, positioning the mouse over a character shows its name, while positioning it between characters shows the number of the rule responsible for the break status.