Re: heritage (was Re: SGML on the Web)
Hi Patrick,

> Oh, you mean prior to processing! Well, it has no structure prior to
> processing does it? Until something determines what is or is not
> markup, and what rules that markup must follow, the document is not
> anything at all with regards to XML.

Right. The syntax used by the original documents is completely irrelevant. So why use example documents that look like XML? Why not use {}s around your tags instead, for example? It would be a lot less confusing for people who see a .xml extension on your source document (prior to processing) and therefore assume that it's an XML document.

It's interesting how close you are to Walter Perry's position here -- that nothing matters aside from the output of your process, and that the input can be anything at all; it's up to the processor to work out what it is.

But I, as a user, have to write a document that your JITT processors can interpret as an XML document -- say, just for example, a document with multiple overlapping hierarchies. If you don't tell me what syntax to use for that document, how am I supposed to know what to write? You might say that I can write in any syntax at all, but I know that if I present some LMNL, say, to your JITTFilter.xsl stylesheet, it won't be able to extract a tree from it because it doesn't recognise the syntax. I do need to know something about what your processes expect.

> I belabor this because it is very important: A JITTs parser can use
> standard XML syntax and do things that are simply not possible with
> a standard XML parser. The example I gave earlier today of the
> dictionary entry is only one example. JITTs does not, has not and
> will not require a new syntax to produce benefits that current XML
> processes cannot produce.

I understand that. But users cannot write an XML document and have it interpreted as multiple hierarchies, I believe? Or if they can, I'd love to see an example.
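As an illustration of why a conforming XML parser can't simply be handed overlapping hierarchies, here is a minimal sketch using Python's standard xml.etree.ElementTree (an ordinary XML parser, not a JITT processor). The poem/line/quote markup is hypothetical, made up for this example; the point is only that the parser reports overlapping tags as a well-formedness error rather than building some other structure from them.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with two overlapping structures written as tags:
# the <quote> starts in the first <line> and ends in the second.
# This is NOT well-formed XML: </line> arrives while <quote> is still open.
OVERLAPPING = ("<poem><line>Sing, <quote>muse,</line>"
               "<line> of wrath</quote></line></poem>")

# A well-formed alternative: the quote is cut off at the line boundary,
# so the tree can no longer record that it runs on into the next line.
NESTED = ("<poem><line>Sing, <quote>muse,</quote></line>"
          "<line> of wrath</line></poem>")


def is_well_formed(text: str) -> bool:
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))  # False
    print(is_well_formed(NESTED))       # True
```

Either the document is rejected, or the overlap has to be given up to fit the tree.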
>> Of course that doesn't detract from the idea of using configurable
>> parsers to interpret a true XML document in different ways, and I
>> appreciate that you're just using an existing syntax to try out
>> these ideas, but as an XML person I'd feel a lot more comfortable
>> with your examples if you'd use well-formed XML, with milestones to
>> represent the overlapping structures, in your examples, rather than
>> a pseudo-XML.
>
> I suspect the discomfort is due in part to the persistence of the
> idea that an XML document, or any other document for that matter,
> has some inherent structure. There is no structure until something
> in the document is interpreted as "markup" and that "markup" is
> subjected to a set of content models, and with XML, for its
> adherence to the rules for well-formedness.

Right. I understand that, but what I'm saying is that if you interpret a well-formed XML document in terms of its XML-defined markup, the very nature of its syntactic rules -- the fact that tags must match, the fact that attributes have string values -- limits the ways in which that markup can be interpreted. In XML's case, it is limited to tree structures and to unstructured attributes. I don't have *any* problem with the approach of "choose what you want to see in the document", but you can't pull out a structure that the markup syntax cannot legally represent.

> As I pointed out in our paper (and here) JITTs is not limited to
> overlapping hierarchies. It addresses a number of issues with
> current markup strategies.
>
> We set out to solve one problem (overlap) and eventually arrived at
> a solution that appears to have a much broader applicability.

I understand that, and I think that the approach is very powerful and useful. I'm just trying to persuade you that using pseudo-XML documents as the source documents for your processes is confusing. Starting with well-formed XML documents, and hiding or showing particular markup within them, is great.
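The milestone approach mentioned above, combined with "choose what you want to see", can be sketched roughly as follows. This is only an illustration with Python's standard library, assuming hypothetical quote-start and quote-end milestone elements; it is not how JITTs or JITTFilter.xsl actually work. The document stays well-formed, the primary hierarchy (lines) is read directly off the element tree, and the secondary hierarchy (quotes) is rebuilt by application code that chooses to treat the milestones as span boundaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical example: lines are real elements (the primary hierarchy);
# a quote spanning the line break is marked with empty milestone elements.
DOC = ("<poem><line>Sing, <quote-start/>muse,</line>"
       "<line> of wrath<quote-end/></line></poem>")


def line_view(root):
    """Primary hierarchy: the text of each <line>, read straight off the tree."""
    return ["".join(line.itertext()).strip() for line in root.iter("line")]


def quote_view(root):
    """Secondary hierarchy: text between quote-start and quote-end milestones."""
    quotes = []
    current = None  # list of text pieces while inside a quote, else None

    def visit(elem):
        nonlocal current
        if elem.tag == "quote-start":
            current = []
        elif elem.tag == "quote-end" and current is not None:
            quotes.append("".join(current))
            current = None
        if current is not None and elem.text:
            current.append(elem.text)
        for child in elem:
            visit(child)
            # A child's tail is text that follows its end tag inside elem.
            if current is not None and child.tail:
                current.append(child.tail)

    visit(root)
    return quotes


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(line_view(root))   # ['Sing, muse,', 'of wrath']
    print(quote_view(root))  # ['muse, of wrath']
```

The XML parser itself only ever sees a tree; the overlapping quote exists only in the interpretation the application layers on top of the milestones.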
If you start with ill-formed XML, you're using a new syntax for those documents, and I think you should be up front about that.

> A third alternative is to change how one interprets markup for the
> purpose of imposing structures on a text. There is no natural law
> requirement that markup processing recognize all the markup in a
> document. Actually the XML 1.0 spec specifies a syntax for markup
> but it never says that all markup has to be recognized. It does have
> all the other restrictions that have been mentioned but it omits
> that one. So long as the markup presented to the parser meets all
> the stated requirements, it appears to be valid XML.

You're surprisingly right that the XML 1.0 Rec. doesn't say anything about whether or not elements and attributes are reported to an application. (I actually think that this was because this assumption was so fundamental that they didn't think they needed to spell it out; it *does* explicitly say that some things *don't* need to be reported to the application, such as comments, which is what makes me think the default is "report everything".)

But the XML 1.0 Rec. does say:

  "An XML processor must always pass all characters in a document that are not markup through to the application."

which would seem to say that an XML processor must not hide particular parts of an XML document, and, more importantly:

  "Validating and non-validating processors alike must report violations of this specification's well-formedness constraints in the content of the document entity and any other parsed entities that they read."

which would seem to say that an XML processor should detect errors such as overlapping markup and report them.

Of course I'm not saying that a JITT processor, or any other processor, can't treat a document that happens to use XML markup in some other way; it's just that if it *does*, it's not an XML processor.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/