
RE: Re: URIs, concrete (was Re: Un-ask the question)


> From: Amelia A Lewis [mailto:amyzing@t...]
> Sent: Saturday, August 03, 2002 11:00 PM
> To: Uche Ogbuji
> Cc: xml-dev@l...
> Subject: Re:  Re: URIs, concrete (was Re:  Un-ask the question)
>
>
> I'm going to be an irritating little git, Uche.  Sorry.
>
> On Sat, 2002-08-03 at 15:30, Uche Ogbuji wrote:
> > [Amy wrote:]
> > > Sorry, do we have any escaping rules?  I don't recall seeing such a
> > > thing in the Namespaces rec (I'm not considering the anyURI type in W3C
> > > XML Schema; does that have escaping rules?  Or interesting rules for
> > > comparison?  *sigh*  Guess I'll go look ...).
> >
> > Yes we do.  For example:
> >
> > http://bête.com
> >
> > Is an invalid URI, and thus an invalid namespace name.  It must be escaped to
> >
> > http://b%eate.com
> >
> > One thing I don't know is how this URI restriction interacts with the recent
> > opening up of DNS to i18n.
>
> I can't actually find a justification for this.  It isn't in the
> Namespaces recommendation, which is fairly silent on what a URI is.

And that's A Good Thing.

> Instead, the recommendation points at RFC 2396.  Section 2 of RFC 2396
> discusses representations of URIs, and the generalized escape mechanism.

Yes.

> It is important to note, however, that the RFC delegates *all* authority
> over which characters are reserved for which components to the component
> ... that is, to the URI registration specification subsection dealing
> with that particular part of that particular URI scheme.

I disagree. Section 2:

<quote>
URI consist of a restricted set of characters, primarily chosen to aid
transcribability and usability both in computer systems and in non-computer
communications. Characters used conventionally as delimiters around URI were
excluded. The restricted set of characters consists of digits, letters, and
a few graphic symbols were chosen from those common to most of the character
encodings and input facilities available to Internet users.

uric          = reserved | unreserved | escaped

Within a URI, characters are either used as delimiters, or to represent
strings of data (octets) within the delimited portions. Octets are either
represented directly by a character (using the US-ASCII character for that
octet [ASCII]) or by an escape encoding. This representation is elaborated
below.
</quote>

So a URI by definition consists only of US-ASCII characters, independently
of the scheme.
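
To make the character-set point concrete, here is a minimal Python sketch
that tests whether a string uses only the "uric" characters from RFC 2396,
section 2. The function name is_uric_only is made up for illustration, and
the check covers the character repertoire only, not the full URI grammar or
any scheme-specific rules.

```python
import string

# Character classes from RFC 2396, section 2. All of them are US-ASCII.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")
HEX_DIGITS = set(string.hexdigits)


def is_uric_only(text):
    """Return True if text is built solely from reserved, unreserved,
    or escaped ("%" followed by two hex digits) characters."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in RESERVED or ch in UNRESERVED:
            i += 1
        elif (
            ch == "%"
            and len(text) - i >= 3
            and text[i + 1] in HEX_DIGITS
            and text[i + 2] in HEX_DIGITS
        ):
            i += 3  # skip the whole escape triplet
        else:
            return False  # non-ASCII, excluded, or malformed escape
    return True


print(is_uric_only("http://bête.com"))    # unescaped form from the thread
print(is_uric_only("http://b%eate.com"))  # escaped form from the thread
```

Under this check the unescaped example is rejected because of the "ê", while
the escaped form passes.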

> Or in other, other words, you may well have a requirement that URIs be
> legal and valid, per the scheme's constraints, before it is transformed
> into a namespace name.  Once it has been so transformed, it is not
> possible to unescape it.  Since the escape mechanism happens before a
> namespace name can be used, and there is no valid unescape mechanism,
> then it does not make sense to speak of an escape mechanism.  What you
> have, instead, is just a string of characters.  This string should
> follow the rules to create a valid URI in some scheme, encoded for
> computer-based transmission, but it doesn't matter, because the
> namespace recommendation says you can't modify it, or interpret it, in
> any useful fashion.
>
> Note that your example, above, is an invalid URI for computer
> transmission, but would be allowed, pretty explicitly, by RFC 2396.  So

Nope. There's no distinction between a "URI" and a "URI for computer
transmission". There is no such thing as an "unescaped" URI. After unescaping
URI-reserved characters, it stops being a URI.

> blame the mess on TimBL, maybe.  But it seems fairly clear that there is
> no two-way activity happening.  If you get something that contains
> %61%6d%79, you are *not* allowed to read it as 'amy'.  The namespaces
> recommendation gives you no permission to unescape the encoded
> characters.

Indeed.
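
A short Python sketch of both halves of this: escaping is a one-way step
taken before the string is used as a namespace name, and afterwards names
are compared character for character, as the Namespaces recommendation
specifies. The helpers escape_label and same_namespace and the example.org
names are hypothetical. Encoding as Latin-1 is an assumption, chosen only to
reproduce the %ea in the example quoted earlier; RFC 2396 itself does not
fix a character encoding.

```python
from urllib.parse import quote, unquote


def escape_label(text, encoding="latin-1"):
    """Percent-escape everything outside quote()'s always-safe set
    (ASCII letters, digits, and "_.-~")."""
    return quote(text, safe="", encoding=encoding)


def same_namespace(a, b):
    """Compare namespace names literally: no unescaping, no case folding."""
    return a == b


escaped = escape_label("bête")  # 'b%EAte' under the Latin-1 assumption

ns1 = "http://example.org/%61%6d%79"
ns2 = "http://example.org/amy"

print(same_namespace(ns1, ns2))  # False: different character strings
print(unquote(ns1) == ns2)       # True, but only after unescaping

# Even the case of the hex digits matters in a literal comparison.
print(same_namespace("http://b%EAte.com", "http://b%eate.com"))  # False
```

The unquote call is there only to show what a namespace processor must not
do: the two names become equal only after a step the recommendation never
licenses.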


Julian

