
Re: Whitespace rules (v2)

  • From: "Neil Bradley" <neil@b...>
  • To: liamquin@i..., xml-dev@i...
  • Date: Sat, 16 Aug 1997 06:35:23 +0000

Dear Liam,

Thanks for the feedback.

> > [...]
> > RULE 2. All whitespace preceding the start-tag and following the end-tag 
> > of a 'block enclosing' element is discarded.
> > ---
> > Note: a non-validating application must refer to a style sheet or
> > configuration file to identify 'block enclosing' elements (perhaps by 
> > applying this rule to elements not specified as in-line elements).
> 
> No -- "blockness" is not at all the same as element content.
> For example, you have to allow for a run-in heading, which starts out
> looking like an HTML H3 (say) except that the rest of the paragraph
> follows on the same line.  So it isn't a block in the paragraph sense.
> 
> > As a validating application cannot easily determine this rule from the
> > content model (the first mixed content element in the hierarchy is 
> > block enclosing, as well as all outer layers), it may choose the same 
> > approach. 
> 
> I think this is too complicated, as well as being not 100% right.
> I don't think there's a single "right" solution.  This is why it's
> best to allow the parser to pass _all_ whitespace back to the application,
> although it is certainly useful if a DTD-aware parser, even if it isn't
> validating, distinguishes element content whitespace from PCDATA whitespace
> in some way.

Note that these rules are intended for the application, not the 
parser or any other part of the XML processor. As I state at the 
top of the rules, "A formatting application should ... according to 
the following 5 rules".

> > Note: If PI's, comments or empty elements remain in the data stream,
> > they are deemed transparent to this process, so:
> >  [SP]<!--comment--><p>Some text...
> > 
> > becomes:
> > 
> >  <!--comment--><p>Some text...
> 
> Note that if you have a very large comment, you might need a lot of
> lookahead here.

Actually no, because the application would already KNOW that it is 
currently in block content.
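
A minimal sketch of why no lookahead is needed, assuming the application 
sees a stream of (kind, value) tokens and is configured with the names of 
elements that contain only blocks. The token shape, the BLOCK_CONTAINERS 
set and the function name are all hypothetical, not part of the rules:

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, value); kind: "start", "end", "text", "comment"

# Hypothetical configuration: elements whose children are blocks.
BLOCK_CONTAINERS = {"doc", "section"}

def drop_block_whitespace(tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop whitespace-only text that sits between blocks (RULE 2)."""
    stack: List[str] = []  # names of the currently open elements
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            if stack:
                stack.pop()
        elif kind == "text":
            # The state is already known here, so the decision is local.
            in_block_content = not stack or stack[-1] in BLOCK_CONTAINERS
            if in_block_content and value.strip(" \t\r\n") == "":
                continue
        # Comments are passed through and never change the state.
        yield kind, value

tokens = [
    ("start", "doc"), ("text", " "), ("comment", "comment"),
    ("start", "p"), ("text", "Some text..."), ("end", "p"),
    ("text", "\n"), ("end", "doc"),
]
result = list(drop_block_whitespace(tokens))
```

The space before the comment is discarded as soon as it arrives, however 
long the comment that follows turns out to be.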

> > RULE 3. A sequence of one or more line-end codes immediately
> > following a start-tag, or immediately preceding an end-tag, are
> > discarded (except in preserved content).
> 
> This means that
> <Paragraph>This is<Emphasis>
> very
> </Emphasis>strange.</Paragraph>
> 
> becomes
> <Paragraph>This is<Emphasis>very</Emphasis>strange.</Paragraph>
> 
> or, if you format without distinguishing emphasis,
> <Paragraph>This isverystrange.</Paragraph>
> 
> which I don't think is what you want.
> 
> But SGML itself is broken in this regard.

I know, and it is impossible to cover all angles. I think your 
example is one of the least likely things to happen in reality, and if 
necessary document authors must be educated to avoid it.

I am open to other suggestions, of course. I am only trying to get 
detailed discussions rolling. For example, we could get rid of both 
rules 2 and 3, and improve rule 5 to say that all surrounding white 
space is removed. 
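
For discussion, here is a rough sketch of rule 3 as currently worded, run 
against Liam's example. The function name and the regular expressions are 
my own illustration. They assume tags contain no '<' or '>' inside 
attribute values, and they do not handle preserved content:

```python
import re

# Start-tag followed by line-end codes. Empty-element tags ("<x/>") and
# comments ("<!-- -->") are deliberately not matched.
_AFTER_START = re.compile(r"(<[A-Za-z][^<>/]*>)[\r\n]+")
# Line-end codes followed by an end-tag.
_BEFORE_END = re.compile(r"[\r\n]+(</[A-Za-z][^<>]*>)")

def apply_rule_3(text: str) -> str:
    """Discard line-ends right after a start-tag or right before an end-tag."""
    text = _AFTER_START.sub(r"\1", text)
    return _BEFORE_END.sub(r"\1", text)

sample = "<Paragraph>This is<Emphasis>\nvery\n</Emphasis>strange.</Paragraph>"
print(apply_rule_3(sample))
# <Paragraph>This is<Emphasis>very</Emphasis>strange.</Paragraph>
```

This reproduces the result Liam describes, including the lost word 
spacing around "very".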
 
> > RULE 4.  A remaining line-end code is converted into a space, except 
> > when it is preceded by a normal (hard) hyphen, or by a soft hyphen 
> > ('&#176;'), in which case it is removed (a soft hyphen is also then 
> > removed). 
> > ---
> > Note:
> > 
> >  A[CR]
> >  line-[CR]
> >  end code sep&#176;[CR]
> >  arates lines.
> > 
> > becomes:
> > 
> >  A line-end code separates lines.
> 
> Well, note that there is no hyphen in that paragraph!!
> The character "-" in ISO 8859-1 (Latin 1) and ASCII is _not_ a hyphen.
> It is a minus sign.

Well, most people in the past have used it as a hyphen in text 
documents, which I think is the important point here.

Also, my source tells me that this character is the official ISO 
hyphen - but my source may be wrong.

> The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
> There is no soft hyphen in Latin 1

OK. I will take your word on this. Again, my source of information may be wrong.
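
Whichever code point turns out to be correct, the rule itself is easy to 
sketch. The version below takes the soft hyphen as a parameter and 
defaults it to U+00AD (decimal 173), which is the code point Unicode names 
SOFT HYPHEN. That default is an assumption on my part and differs from the 
'&#176;' written in the rule. The function name is illustrative only:

```python
import re

SOFT_HYPHEN = "\u00ad"  # assumption: Unicode SOFT HYPHEN, decimal 173

def apply_rule_4(text: str, soft_hyphen: str = SOFT_HYPHEN) -> str:
    """Convert remaining line-end codes to spaces, joining hyphenated breaks."""
    # Treat CR LF, CR and LF alike as one line-end code.
    text = re.sub(r"\r\n|\r|\n", "\n", text)
    # Soft hyphen before a line-end: remove both.
    text = text.replace(soft_hyphen + "\n", "")
    # Hard hyphen before a line-end: keep the hyphen, remove the line-end.
    text = text.replace("-\n", "-")
    # Any other line-end becomes a single space.
    return text.replace("\n", " ")

sample = "A\rline-\rend code sep\u00ad\rarates lines."
print(apply_rule_4(sample))
# A line-end code separates lines.
```

If another character is agreed as the soft hyphen, only the default value 
changes.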
 
> I don't have the necessary copy of Unicode in front of me, but last time
> I checked (Unicode 1.1) it was the same in this regard, and also in having
> the ` character be a spacing grave accent, not a single quote.
> 
> This should be done by applications.  I wouldn't want your message:

It is being done by the application.

What "wouldn't you want your message:"?

>     ----------
>     RULE 5. Consecutive whitespace characters (including translated 
> turning into
>     ----------RULE 5. Consecutive whitespace characters (including translated 
> for example.
> 
> > Note: Multiple spaces can be preserved using the non-break space
> > character ('&#160;').
> > 
> >  <p>Some&#160;&#160;&#160;spaces.
> Er, is this defined in Unicode or in ISO 10646??

Don't know. I have it as a non-breaking space, which I am 'liberally' 
interpreting here as a required space (if it can't be broken over 
lines, it must be pretty important). If Unicode has a more explicit 
required space character, then fine, let's use that.
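
A small sketch of how this could work, assuming that rule 5 reduces each 
run of whitespace to a single space (the rule text is cut short in the 
quote above, so that reading is my assumption) and that only space, tab, 
CR and LF count as whitespace. The function name is illustrative:

```python
import re

NBSP = "\u00a0"  # the character written as '&#160;'

def collapse_whitespace(text: str) -> str:
    """Reduce runs of space, tab, CR and LF to a single space."""
    # An explicit character class is used rather than \s, because in
    # Python \s also matches U+00A0 and would swallow the preserved spaces.
    return re.sub(r"[ \t\r\n]+", " ", text)

sample = "<p>Some" + NBSP * 3 + "spaces.   More \n text"
result = collapse_whitespace(sample)
```

The three non-break spaces survive, while the ordinary runs after 
"spaces." and around the line-end each become one space.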

> Lee

Neil.


-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@b...
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

