Re: An XML document is not well-formed if encoding="..."does n
Roger wrote:

> I would advocate using UTF-8 exclusively

That's what I do with my own files, and what I advocate whenever I have any input to design decisions, but as Liam and others have said, it's not practical to expect everyone to adopt this convention.

What I really want to know is, when can we start freely using BOMs in UTF-8? I really like this idea, because it is a simple, easy way for a text file to "declare" that it is in UTF-8 and eliminate the ambiguity when text files are passed around. Unfortunately, a lot of software, especially on Linux, still chokes on these.

On a slightly different topic (UTF-16), this discussion reminded me of something else I read a while back, a technical note from the Unicode Consortium advocating the use of UTF-16 for internal processing (as opposed to file interchange): http://unicode.org/notes/tn12/tn12-1.html. On the other hand, I just found, from a Google search, this recent thread on StackExchange, where several people argue that UTF-16 should be considered harmful: http://programmers.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful.

I guess the debate will rage on, but interoperability, on the whole, does seem to be getting better.

Chris

On Sat, Dec 29, 2012 at 2:36 PM, Costello, Roger L. <costello@mitre.org> wrote:
> Hi Folks,
>
> I spoke with George Cristian Bina from oXygen XML and he gave me the scoop on how things work inside oXygen.
>
> George told me to do this:
>
> 1. Create an iso-8859-1 encoded XML file.
>
> 2. Using a hex editor, change encoding="iso-8859-1" to encoding="utf-8"
>
> 3. Drag and drop the file into oXygen.
>
> 4. oXygen will generate an encoding exception:
>
>     Cannot open the specified file. Got a character
>     encoding exception [snip]
>
> Next, here is something George told me.
> It is mind-blowing:
>
>     If you have an iso-8859-1 encoded XML file loaded into oXygen
>     and change encoding="iso-8859-1" to encoding="utf-8" then
>     oXygen will automatically change the encoding of every character
>     in the document to UTF-8.
>
> Wow!
>
> That is so fantastic, I jumped out of my chair when I read it.
>
> I just received this additional information from George:
>
>     Please note that the encoding is important only when the file is loaded
>     and saved. When the file is loaded the bytes are converted to characters
>     and then the application works only with characters. When the file is
>     saved then those characters need to be converted to bytes and the
>     encoding used will be determined from the XML header with a default to
>     UTF-8 if no encoding can be detected.
>
> /Roger
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
> subscribe: xml-dev-subscribe@lists.xml.org
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
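The two scenarios in the quoted message can be reproduced outside any editor. The following is a minimal Python sketch, not oXygen's actual implementation: the sample document and the helper name `reencode` are assumptions made for illustration. It shows that changing only the declared encoding leaves ISO-8859-1 bytes that fail to decode as UTF-8, whereas decoding on load and encoding on save converts every character.

```python
def reencode(data: bytes, old: str, new: str) -> bytes:
    """Hypothetical helper: load with the old encoding, save with the new one."""
    text = data.decode(old)  # load: bytes -> characters
    # Update the encoding label in the XML declaration (first occurrence only).
    text = text.replace(f'encoding="{old}"', f'encoding="{new}"', 1)
    return text.encode(new)  # save: characters -> bytes


# Assumed sample document; in ISO-8859-1 the character "é" is the single byte 0xE9.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>José</name>'
).encode("iso-8859-1")

# Hex-editor scenario: only the label changes, the content bytes stay ISO-8859-1.
mislabeled = latin1_doc.replace(b'encoding="iso-8859-1"', b'encoding="utf-8"')
try:
    mislabeled.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte UTF-8 sequence, but "<" is not a continuation byte.
    decode_failed = True

# Editor scenario: the characters are kept and re-encoded when the file is saved.
converted = reencode(latin1_doc, "iso-8859-1", "utf-8")

print("mislabeled file fails to decode as UTF-8:", decode_failed)
print("converted bytes:", converted)
```

In the converted bytes, "é" appears as the two-byte UTF-8 sequence 0xC3 0xA9, which is the kind of per-character change Roger describes.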