Re: character entities
David,

>> that lists the mappings between character numbers and entities.
>
> That assumes that the entities do correspond to characters so that
> you can do this as a linearisation option; just asking that certain
> character numbers are output as entity references. (Of course you
> may think that the subject line above allows you to make that
> assumption, but it's best never to assume anything:-)

Yes indeed.

> Taking a random example from the dtd on which I use that shell
> script, how would you tell the serialiser to output
[snip complex XML entity]
> as &e04BL-a; ?

I think we've had this discussion before, haven't we?

First I'll say that I do think that character entity references are the cause of 90% of the problems in this area. Of course you do get instances where people use other entities in source documents, particularly when dealing with document-oriented XML, but in most transformations these should be parsed along with the rest of the document. It's also more than a little tempting to just say "use XInclude or XLink" rather than entities, which are *so* last millennium ;)

I think that control over how characters are output would be a good addition in XSLT 2.0. After all, you get control over which elements get to have CDATA section content, which is another physical structure. Saxon gives you nice control in HTML over whether you want native characters, character references in decimal or hexadecimal, or character entity references. And I like the Xalan technique of pointing to a file describing the mapping between characters and character entity references; it would be even better if it could take several files and could interpret DTD syntax. Actually I notice this is partly covered by Requirement 2.7.

But anyway...
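To make the character case concrete, here is a rough sketch in Python of the kind of serialisation choice I mean. It is not how Saxon or Xalan actually do it; the function name `serialise`, its `style` values and the `mapping` parameter are all made up for illustration, and the default mapping is simply the HTML 4 entity table that ships with Python. It assumes markup characters such as `<` and `&` have already been escaped.

```python
from html.entities import codepoint2name


def serialise(text, style="entity", mapping=codepoint2name):
    """Rewrite non-ASCII characters in already-escaped text.

    style is one of (hypothetical names, for illustration only):
      "native"  - leave characters as they are
      "decimal" - decimal character references, e.g. &#233;
      "hex"     - hexadecimal character references, e.g. &#xE9;
      "entity"  - entity references from `mapping`, falling back to
                  decimal references for characters it does not list
    mapping maps code points to entity names; it could equally be
    loaded from a file or built from DTD entity declarations.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128 or style == "native":
            out.append(ch)                       # ASCII is always left alone
        elif style == "entity" and cp in mapping:
            out.append("&%s;" % mapping[cp])     # named entity reference
        elif style == "hex":
            out.append("&#x%X;" % cp)            # hexadecimal reference
        else:
            out.append("&#%d;" % cp)             # decimal reference
    return "".join(out)


if __name__ == "__main__":
    for style in ("native", "decimal", "hex", "entity"):
        print(style, serialise("caf\u00e9", style))
```

Passing a different `mapping` is the equivalent of pointing the serialiser at a different mapping file; merging several dictionaries would cover the "several files" case.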
For general entities, one option would be to make sure that you store the XML for the entity as canonical XML, and then do a text-based substitution of the entity XML on the canonical XML generated from the result tree, before finally outputting it according to the xsl:output instructions.

The other possibility is comparing the trees for the result and the entity. Interestingly, it looks as though the XQuery/XPath 2.0 data model includes the notion of 'value-equal' which includes deep equality between node sequences. So possibly you could say that a sequence of nodes in the result tree should be replaced by a given entity if the sequence is value-equal to the sequence defined by a given XML fragment. Very probably that's very time-consuming, especially for 1000 entities on a long document.

With character entity references quite often you want characters to be included differently in the input (where they're probably native characters or character references) to the output (where you want e.g. HTML character entity references). But with general entities, I wonder how often you actually want the result tree to be examined to find whatever entities might be included. Usually the real problem, as illustrated by your shell script, is how to get the XSLT processor to pass through entities from the source document or to include directly entities specified in the stylesheet.

From the stylesheet to the output you could use something like saxon:entity-ref, which is covered by Requirement 2.8.

From the source to the output, it's a different matter because as we know entities aren't in the data model. The only options are to include them in the data model (which I don't think's going to happen) or to change them into something that is in the data model (which is essentially what you're doing with your text substitution).
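For what it's worth, the canonical-XML substitution idea can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the function name `substitute_entities` is made up, each entity's replacement text is assumed to be a single well-formed element with no namespace complications, and the output is canonical XML rather than whatever xsl:output would have asked for. It uses `xml.etree.ElementTree.canonicalize`, which needs Python 3.8 or later.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def substitute_entities(result_xml, entities):
    """Replace occurrences of entity content in result_xml with references.

    result_xml - serialised result tree (a well-formed document)
    entities   - dict of entity name -> replacement XML; each value is
                 assumed to be a single well-formed element (assumption)
    Returns canonical XML with matching content replaced by &name;.
    """
    # Canonicalise both sides so that attribute order, quote style and
    # similar lexical differences do not defeat the text comparison.
    canonical = canonicalize(xml_data=result_xml)
    fragments = {name: canonicalize(xml_data=xml)
                 for name, xml in entities.items()}

    # Try longer fragments first so an entity containing another
    # entity's content is matched before the smaller one.
    for name, fragment in sorted(fragments.items(),
                                 key=lambda item: len(item[1]),
                                 reverse=True):
        canonical = canonical.replace(fragment, "&%s;" % name)
    return canonical


if __name__ == "__main__":
    result = ('<doc><p>Intro</p>'
              '<note type="legal" id="n1">All rights reserved.</note></doc>')
    entities = {"legal": "<note id='n1' type='legal'>All rights reserved.</note>"}
    print(substitute_entities(result, entities))
```

The string that comes back contains entity references but no declarations, so it would still need a DOCTYPE that declares the entities before anything could parse it. And it does one pass over the document per entity, which is the time cost worried about above.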
I'm not exactly sure, but presumably XSLT processors access the stylesheet before they access the source document - it would make sense in that it would allow them to build the tree without whitespace-only text nodes in the first place rather than stripping them after building it. So perhaps you could have a switch within the stylesheet - something like an xsl:input top-level element with an include-entity-references attribute - that governed whether entity references were included in the node tree.

You could use elements to hold the resolved content of the entity, just in case you *did* need to have access to it within the stylesheet, or you could point to the file holding the entity if it was an external parsed entity (which means you could control within the stylesheet whether you ever retrieved it or not). Then you could use a similar instruction to saxon:entity-ref to create the entity reference in the output by matching on the entity-reference element, if you wanted.

Of course the trouble with that is that you'd need to make sure that the XPaths in the stylesheet took into account that a 'child' element might actually be a grandchild, within one of these entity-reference elements. And it greatly adds to the size of the node tree (for which reason I'd say that it shouldn't apply to character entity references). But you might be happy to put up with that.

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list