Re: XML vs the Dreaded Whitespace
Thanks very much Chris,

I'm probably not going to be much practical help, but I hope your posting catalyses a practical response from the SGML experts. I'd be surprised if conventional XML-enhanced SGML tools couldn't handle this problem, but I have no idea what they would cost. [The last flier I got was 2 orders of magnitude greater than an impecunious academic could afford.]

At 03:00 11/12/97 -0500, Chris Smith wrote:
> [... first problem punted ...]
>The second question is much less firm right now. We would like make
>whitespace handling robust - if someone along the way uses a tool
>which breaks a line, we should be able to fix it rather than die.
>
>If we add the following character entities to our DTD,
>
><!ENTITY spc " ">
><!ENTITY tab "	">
><!ENTITY cr " ">
><!ENTITY lf " ">
>
>then it should be possible to use these to represent 'wanted'
>whitespace, and thus allow for a simple rule prior to checking message
>authentication - that is, remove all 'native' space, tab, LF, and CR
>from the #PCDATA and check what remains (whitespace inside tags is
>handled in a more draconian fashion). (According to the previous
>section, "Hi&spc;there!" will be checked exactly that way you see it
>here - not as "Hi there!" The question? - is this distinction (between
>eg the native 0x0009 and &tab; (which converts to 0x0009) going to be
>difficult to keep track of?

As one of the few authors of a generic native XML application I have to face this problem and have repeatedly failed to get practical solutions. The main response is:

	Yes, it's a problem
and
	Yes, it's your problem

As I understand it, your XML document may contain two sorts of white space:

	whitespace that matters
	whitespace that doesn't matter

The latter may be inserted randomly by authors whose lines don't wrap. From my very limited experience of SGML I would say your approach looks a sensible one. However the major problem is 'where is your application software going to come from?'
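To make the rule Chris describes concrete, here is a minimal sketch in Python of what "remove all native whitespace, then check what remains" might look like. It assumes the code sees the raw, unexpanded character data. That is a big assumption, since a conforming parser normally expands &spc; and friends before the application ever sees the text, which is exactly the tracking difficulty being asked about. The function names and the mapping of the four entity names to characters are mine, for illustration only; they are not taken from Chris's system.

```python
import re

# Hypothetical mapping for the four entities named in the quoted DTD
# fragment; the code points are inferred from the entity names.
WANTED_WHITESPACE = {"spc": " ", "tab": "\t", "cr": "\r", "lf": "\n"}


def strip_native_whitespace(raw_pcdata: str) -> str:
    """Remove literal space, tab, CR and LF; entity references are untouched.

    The result is the form that would be fed to the authentication check.
    """
    return re.sub(r"[ \t\r\n]", "", raw_pcdata)


def expand_wanted_whitespace(checked_text: str) -> str:
    """Expand &spc; &tab; &cr; &lf; for display, after the check is done."""
    return re.sub(
        r"&(spc|tab|cr|lf);",
        lambda match: WANTED_WHITESPACE[match.group(1)],
        checked_text,
    )


# A line that some tool re-wrapped along the way.
damaged = "Hi&spc;\r\n    there!"
checked = strip_native_whitespace(damaged)
displayed = expand_wanted_whitespace(checked)
```

With this sketch the re-wrapped text and the original "Hi&spc;there!" reduce to the same string for checking, and only the display step turns &spc; back into a real space.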
I have argued very strongly (and shall continue to do so) that there need to be generic conventions honoured by common application programs. Otherwise you have to write your own application for your problem. At present you have only two options:

	- write it yourself (and maintain it)
	- pay an SGML house to solve your problem for you

I hope shortly to propose some generic whitespace problems (implemented in JUMBO) for certain types of document. I don't know whether they would solve your problems, but thanks for giving me the chance to think about a real problem. :-)

As a corollary: Is anyone testing the ESIS output of the current crop of XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model or the value of xml:space they should all produce identical ESIS (right?). If not, then one or more is wrong. And all applications should (IMO) be prepared to work with ESIS, which I think is isomorphous with a WF XML document.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)