Re: Authority For Western Line Breaking Rules
I have pointed Asmus Freytag, who is an author of UAX#14, to this thread. His reply follows; I have copied it here with his permission.

> Mr. Eliot:
>In thinking about it, I think that the Annex 14 rules are stated in such
>a way that the rules are appropriate for languages that do not use space
>to determine line breaks without explicitly disallowing Western-style
>line breaking behavior.

Kobayashi:
>The question is whether UAX#14 is appropriate for Western languages or not.

(Freytag, from here until the end of the body of this mail:)

The answer to that is YES. The whole idea of UAX#14 is to have a single default algorithm that does well in both a Western (space-based) and an East Asian environment, by giving special treatment to characters that are of concern in both environments.

The results should be usable in standard text handling, perhaps with minor tailoring as suggested in the document. High-end publishing systems may need to apply some additional tailoring. These systems often give users a choice of line-breaking rules. There may be some languages that require tailoring in specific situations.

In the message you pointed me to, the following statements were made:

>For background, Annex 14 is very permissive, implicitly allowing line
>breaks wherever they are not explicitly disallowed and does not, for
>example, disallow breaks following closing punctuation, allowing for
>example, this break:
>
>
>"e.
>g., a thing"
>
>That is, Annex 14 allows this break, even though it would be wrong in any
>Western language I'm familiar with.

However, this statement is incorrect. UAX#14 allows breaks after closing punctuation, but not if the punctuation precedes alphabetic characters. There are no breaks in "e.g.", but there is a break in "...tailoring. These...", since there is a space after the ".".
>Annex 14 is also informative--it does not require conforming Unicode
>implementations to implement the Annex 14 rules except for those
>characters that have normative line breaking properties, such as line
>separator and soft hyphen.

This statement is correct. The rules in UAX#14 define what I would like to call, for the purpose of this discussion, a 'best default practice with normative nucleus'.

Some of the rules (and the properties they are based on) describe behavior that is required. Usually, this is limited to special behaviors, such as the non-breaking behavior of the NO-BREAK SPACE. Without such requirements, users would not be able to rely on NO-BREAK SPACE to express the kinds of line-break behavior for which it has always been intended.

However, many of the other rules are subject to customization (tailoring) to fit the requirements of particular languages more precisely, or to match the needs of a particular in-house style at a large publisher's.

In other words, the main reason that those rules are informative is that there is no single set of rules for line breaking, often not even a single one for a given language.

However, using UAX#14 as the starting point will allow an implementation to cover all Unicode characters, so that texts with foreign material inserted will behave quite reasonably, without the need for all implementers to become experts in *all* languages. In some instances, a small amount of tailoring will be useful if texts are known to be predominantly in a given language which has special requirements.

--------------Up to here------

Best regards,

Tokushige Kobayashi
Antenna House, Inc.
E-mail koba@xxxxxxxxxxxxx
WWW http://www.antenna.co.jp/XML/xml-top.htm
WWW http://www.antennahouse.com/xslformatter.html (English)
TEL +81-3-3234-1361(direct call)
FAX +81-3-3221-9975
Antenna House XSL School http://www.antenna.co.jp/XML/school/xslday.htm

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
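
Illustration (not part of Freytag's reply): the two behaviors discussed above, no break inside "e.g." and no break at a NO-BREAK SPACE, can be sketched with a deliberately tiny, space-based model. This is not the UAX#14 algorithm; it ignores line-break classes, East Asian text, hyphens and tailoring, and the function name break_opportunities is made up for this example.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def break_opportunities(text: str) -> list[int]:
    """Return every index i where a line may break just before text[i].

    Toy rule (an assumption for illustration, far simpler than UAX#14):
    a break is allowed only after a run of ordinary spaces (U+0020).
    Punctuation followed directly by a letter therefore never breaks,
    and NO-BREAK SPACE, which is not U+0020, never creates a break.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # Break after the last space of a run, never before a space.
        if prev == " " and cur != " ":
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    for sample in ['"e.g., a thing"', "tailoring. These", "100" + NBSP + "km away"]:
        print(repr(sample), break_opportunities(sample))
```

In this sketch the only opportunities in '"e.g., a thing"' fall before "a" and before "thing", so the break between "e." and "g." from the quoted message cannot occur, while "tailoring. These" may break before "These" because a space follows the full stop.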