XML-DEV Mailing List Archive
Re: Attributes as tags in namespaces and how to guess characte
Martin Olsson wrote:

> --- QUESTION 2
>
> XML files can use different character encodings including UNICODE and
> normal ascii text files. An XML parser must know what encoding is used
> before it starts to process the file, loading a UNICODE file is very
> different from loading a normal text file. The parser can obviously not
> first read the encoding attribute of the XML declaration which is the
> first line of the XML file and then load the file.

On the contrary, the XML declaration is entirely in ASCII except for a possible byte order mark, so the processor can determine 8-bit vs. 16-bit encodings from the BOM and the <?xml, and then read the encoding declaration, knowing that it is in ASCII.

> ... Should the XML parser use a brute force approach and try all of these?

The only problem would come if the actual encoding does not match the declared encoding (I am leaving aside those cases where the processor knows the encoding by some other means). The processor is not expected to sort out such discrepancies.

XML is one of the few formats out there that can handle multiple encodings and Unicode decently, and much of this is due to the XML declaration.

Cheers,

Tom P

--
Thomas B. Passin
Explorer's Guide to the Semantic Web (Manning Books)
http://www.manning.com/catalog/view.php?book=passin
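
For illustration, the two-step detection described in the reply (look at the BOM and the leading <?xml to pick 8-bit vs. 16-bit, then read the encoding declaration as ASCII-range text) might be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the function names `sniff_family` and `detect_encoding` are hypothetical, UTF-32 and EBCDIC inputs are deliberately left aside, and, as the reply notes, no attempt is made to reconcile a declared encoding that disagrees with the actual bytes.

```python
import codecs
import re

# Byte order marks checked first. UTF-32 is deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def sniff_family(data: bytes):
    """Return (codec, bom_length) good enough to read the declaration."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    # No BOM: the bytes of "<?" reveal 16-bit encodings and their byte order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be", 0
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le", 0
    # Otherwise assume an 8-bit, ASCII-compatible encoding (assumption:
    # EBCDIC and other exotic families are out of scope here).
    return "ascii", 0


def detect_encoding(data: bytes) -> str:
    """Guess the encoding name from the BOM and the XML declaration."""
    family, skip = sniff_family(data)
    # The declaration uses only ASCII-range characters, so decoding a short
    # prefix with the sniffed family is enough to read it.
    head = data[skip:skip + 200].decode(family, errors="replace")
    match = _DECL.match(head)
    if match:
        return match.group(1).lower()
    # No declared encoding: fall back to UTF-8 for 8-bit input,
    # otherwise keep the family that was sniffed.
    return "utf-8" if family == "ascii" else family
```

A caller would pass the first few hundred bytes of the file to `detect_encoding` and then decode the whole document with the returned name; no brute-force trial of encodings is involved.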