Microsoft Notepad writes a BOM at the start of a UTF-8 file (as someone else pointed out in another post). As much as I'd like to dispense with the BOM, since XML has no use for it, I think Notepad illustrates a valid use case for keeping it.

One of the strengths of XML is that it is just text, and any text editor can be used to compose it. A non-XML-aware text editor, though, has no reliable way to recognize the character encoding of a file that lacks a BOM. None of the filesystems I know of supports extended attributes that can identify the character encoding of a text file, and every one of them is saddled with a non-Unicode legacy for character encoding (ASCII, ISO-8859-1, whatever). If we preclude the presence of a BOM in an XML entity, we undermine the usefulness of such general-purpose text editors for composing XML. I think that use case is a strong argument for keeping the BOM, in spite of the complications it poses for current XML parsers that don't support it.

Unfortunately, I don't think Sun's Crimson parser supports the BOM. I remember running into this problem with JAXP 1.0. I'll try to recreate a test case this afternoon and pass on the results (I have to prepare for a meeting right now).

> -----Original Message-----
> From: Richard Tobin [mailto:richard@c...]
> Sent: Thursday, June 14, 2001 4:24 AM
> To: xml-dev@l...
> Subject: UTF-8 BOM
>
>
> The W3C XML Core WG is considering the question of whether a UTF-8
> byte-order mark (BOM) is allowed at the start of an XML entity. This
> question was raised a few weeks ago in a thread on comp.text.xml
> starting at article
>
> <180520011620538217%andreas.prilop@a...>
>
> We would like to determine how existing parsers handle the byte
> sequence #xEF #xBB #xBF when it appears at the start of an XML
> document or other entity. Is it treated as a BOM (and not part
> of the text of the entity) or as a zero-width non-breaking space
> character?
>
> We have placed a number of test cases at
>
> http://www.cogsci.ed.ac.uk/~richard/bomtest/
>
> and would be grateful for feedback on how parsers handle them. Please
> post results here in xml-dev to avoid unnecessary duplication.
>
> We would also like to know of any editors (or similar tools) that
> generate XML documents starting with a UTF-8 BOM.
>
> -- Richard (on behalf of the XML Core WG)
>
> ------------------------------------------------------------------
> The xml-dev list is sponsored by XML.org, an initiative of OASIS
> <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: xml-dev-request@l...
>
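To make the two interpretations in the Core WG's question concrete, here is a minimal sketch in Python (chosen only for brevity). The function names are hypothetical and do not reflect how Crimson, JAXP, or any other parser actually behaves. One decoder treats the leading bytes EF BB BF as an encoding signature and drops them; the other keeps them as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE). A third function shows the only clue a non-XML-aware editor like Notepad has to go on: whether the file starts with a byte-order mark at all.

```python
# Sketch only: hypothetical helpers illustrating two readings of a leading
# UTF-8 BOM. They are not the API of any XML parser.

UTF8_BOM = b"\xef\xbb\xbf"


def decode_bom_as_signature(data: bytes) -> str:
    """Treat EF BB BF as an encoding signature, not part of the text."""
    # Python's "utf-8-sig" codec drops one leading BOM if present.
    return data.decode("utf-8-sig")


def decode_bom_as_text(data: bytes) -> str:
    """Treat EF BB BF as the character U+FEFF at the start of the text."""
    # The plain "utf-8" codec keeps the BOM as U+FEFF.
    return data.decode("utf-8")


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding from a byte-order mark alone, as an editor might."""
    if data.startswith(UTF8_BOM):
        return "UTF-8"
    if data.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if data.startswith(b"\xff\xfe"):
        # UTF-32 is not considered in this sketch, so a UTF-32LE BOM
        # (FF FE 00 00) would also be reported as UTF-16LE.
        return "UTF-16LE"
    # No BOM: without extended attributes there is nothing reliable to go on.
    return "unknown"


if __name__ == "__main__":
    doc = UTF8_BOM + b"<?xml version='1.0'?><doc/>"
    print(repr(decode_bom_as_signature(doc)[:5]))  # '<?xml'
    print(repr(decode_bom_as_text(doc)[:1]))       # '\ufeff'
    print(sniff_encoding(doc))                     # UTF-8
```

Under the second reading, the U+FEFF ends up in front of the XML declaration, which is why a parser that does not recognize the BOM may reject a Notepad-saved file that a BOM-aware parser accepts.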