Re: Using entities for me dash problem
JCS wrote:
> Once again, you're arguing computer logic with me. Okay, yes, it's
> ASCII, fine. But I wanted to *preserve* the NCR and keep the
> declaration as UTF-8. That's a perfectly acceptable thing to ask for.
> Unfortunately, XSL does not allow me to preserve the NCR. It's
> "dumb".

Not really. The NCR has disappeared before the xslt processor ever sees it; the parser takes care of that. Wherever there was an NCR, the parser sticks the code for the actual character into the data. It makes no record of the fact that an NCR had been used.

Now, certain parsers may let you intercept an NCR and do something when one appears. You could write your own handler that inserts the NCR text instead of the actual character that it represents.

As to output, an xslt processor may choose to output an NCR depending on the character and the encoding, but it would never know that the input had originally contained an NCR.

> If anything, your above statement *proves* that the output method
> shouldn't be linked to the result declaration, because then the
> computer is assuming what the declaration should be based on how it
> was transformed. If the transformed result does not necessarily
> represent the declaration, I should be able to change the
> declaration. In other words, if I've preserved the NCR for the sake
> of making the result UTF-8, then it shouldn't say US-ASCII just
> because I *had* to transform it due to the way the computer is
> programmed to encode these documents.

Not at all. When the parser has done its parsing, knowledge of the original encoding is not captured and sent to the xslt processor; it is not part of the xpath model. Remember, xml data is always unicode, even when it is stored in some other encoding. The transformation is designed to act on the actual characters involved, not on their encoded representation (because that is how xml processing works). Upon output, any implemented encoding may be selected. So the output encoding is independent of the input encoding.
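The points above can be seen with any conforming parser and serializer. Here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree rather than an xslt processor (that substitution is my assumption for illustration; the sample document and variable names are made up). It shows the parser replacing the NCR with the real character, and the serializer deciding whether to write an NCR based only on the output encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: UTF-8 declaration, em dash written as an NCR.
source = b'<?xml version="1.0" encoding="UTF-8"?><p>a&#8212;b</p>'

root = ET.fromstring(source)

# After parsing, the data holds the actual character (U+2014).
# Nothing records that an NCR was used in the input.
text = root.text

# Output as US-ASCII: the em dash has no ASCII encoding,
# so the serializer writes it as a character reference.
ascii_out = ET.tostring(root, encoding="us-ascii")

# Output as UTF-8: the same character is written as its UTF-8 bytes.
utf8_out = ET.tostring(root, encoding="utf-8")

# With a declaration requested, it names the encoding chosen for output,
# not the encoding of the input document.
declared_out = ET.tostring(root, encoding="us-ascii", xml_declaration=True)

print(repr(text))
print(ascii_out)
print(utf8_out)
print(declared_out)
```

The choice between an NCR and the raw character is made at output time from the character and the output encoding alone, which is why the input NCR cannot be "preserved" as such.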
The output encoding will match the _output_ encoding declaration.

> To make it simpler, if I want to preserve NCR, there should be an
> option without using ASCII encoding, or rather, I should be able to
> declare whatever encoding I wish the result to be, regardless of how
> the transformation was encoded.

You can specify the output encoding, as long as it has been implemented by the processor. UTF-8 and UTF-16 are universal by specification. You can usually get iso-8859-1 and often, us-ascii. Otherwise, the available encodings are processor-dependent.

> I think I've come to grips with the fact that it's illogical and
> output encoding should NOT be linked to the result declaration as
> they can be two different things.

Your perspective needs to be enlarged here. An xml document can be assembled from pieces, each of which can have a different encoding. The xslt stylesheet can be in a different encoding from the source document. The stylesheet can import other stylesheets, and load other documents, all of which may be in arbitrary encodings. The processor has to be able to handle them all.

So what should a processor count as "THE" encoding? It is an impossible question to answer. Instead, all the encoded data gets decoded into a standard working format, which may be utf-8, utf-16, or whatever the processor uses. All character references are taken by specification to mean their unicode characters, not characters in the encoding that was used.

Cheers,

Tom P