
Re: Feeler for SML (Simple Markup Language)

  • From: David Brownell <david-b@p...>
  • To: Tim Bray <tbray@t...>
  • Date: Mon, 15 Nov 1999 14:57:48 -0800

Tim Bray wrote:
> 
> At 01:08 PM 11/15/99 -0800, David Brownell wrote:
> >> The UTF-*'s are logically equivalent to most users, in that they share
> >> the property that almost no real-world data objects are encoded in either.
> >
> >Quite true, from what I know, if you don't consider all the documents
> >encoded in ASCII (which is a subset of UTF-8).  Many of them aren't
> >tagged as to encoding; assert they're UTF-8 not ASCII, and disproof is
> >often going to be impossible!
> 
> I used to think so too, but actually, if you look closely, the proportion
> of "ascii" that's actually pure US-ASCII is not that high. 

Well, ASCII is ASCII -- if it's not pure, it's not ASCII (and
hence it's not usable as UTF-8 either).  ASCII uses only seven
bits; always has (modulo parity), and I can't see that changing.

But while that's key to what I was saying (if it _really_ is
ASCII, it's also UTF-8, and there's lots of real ASCII), I
suspect that was likely not what you were getting at there.
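The subset claim can be checked mechanically. A minimal sketch in Python (my choice of language, not anything from this thread): encode all 128 seven-bit code points both ways and compare the bytes.

```python
# Every 7-bit code point should encode to the same single byte
# in ASCII and in UTF-8, which is what "ASCII is a subset of UTF-8" means.
all_ascii = "".join(chr(cp) for cp in range(128))

ascii_bytes = all_ascii.encode("ascii")
utf8_bytes = all_ascii.encode("utf-8")

same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)
```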


> The prevalence
> of é's and õ's and so on these days is in my experience really growing,
> which means that the number of documents which are ideally ISO-8859-1
> but in fact some Microsoft codepage is really immense.  -T.

Those characters are actually in ISO-8859-1, but I understand that
Microsoft does cause real problems by its use of many characters
that are reserved in 8859-1 ... look at the number of web pages
with strange characters where you should have &ldquo; or &rdquo;
(but hmm, not all browsers accept those entities anyway).
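To illustrate the quote problem, here is a small Python sketch. It assumes the Microsoft codepage in question is Windows-1252 (Python's "cp1252" codec), and the sample bytes are made up. Python's "iso-8859-1" codec maps the 0x80-0x9F range to C1 control codes, which is one common reading of "reserved".

```python
import unicodedata

# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252,
# but fall in the 0x80-0x9F range that ISO-8859-1 does not use for printable characters.
raw = b"\x93quoted\x94"

as_cp1252 = raw.decode("cp1252")      # left/right double quotation marks
as_latin1 = raw.decode("iso-8859-1")  # C1 control codes at both ends

# Unicode general category "Cc" means "control character".
first_is_control = unicodedata.category(as_latin1[0]) == "Cc"
print(repr(as_cp1252), repr(as_latin1), first_is_control)
```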

Assert that one of those documents is ASCII, and disproof is trivial:
some character has the eighth bit set.  (When was the last time you
saw a document using it for parity?  A LONG time ago, for me!)  Since
it's not ASCII, you clearly can't read it as UTF-8.
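The eighth-bit test is easy to sketch in Python. The function names and the sample string are hypothetical, chosen only for illustration; the sample stands in for the ISO-8859-1 style documents discussed above, where a lone accented byte is not a valid UTF-8 sequence.

```python
def has_eighth_bit(data: bytes) -> bool:
    """True if any byte has the high bit set, so the data cannot be 7-bit ASCII."""
    return any(b & 0x80 for b in data)


def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" in ISO-8859-1 is b'caf\xe9': one byte with the eighth bit set.
sample = "café".encode("iso-8859-1")

print(has_eighth_bit(sample), decodes_as_utf8(sample))
print(has_eighth_bit(b"plain"), decodes_as_utf8(b"plain"))
```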

- Dave
