Re: xml:base and fragments

  • From: "Andrew S. Townley" <ast@atownley.org>
  • To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
  • Date: Thu, 11 May 2017 17:06:57 +0200

Home stretch, or not….

> On May 11, 2017, at 5:45 AM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
>> On May 10, 2017, at 12:29 PM, Andrew S. Townley <ast@atownley.org> wrote:
>> 
>> 
>>> On May 10, 2017, at 2:28 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>> 
>> 
>> And I said this because you repeatedly referred to the URI used to retrieve the entity resource as the base URI in spite of content-level, explicit specifications to the contrary.
> 
> I don’t think so.  
> 
> I have repeatedly said that the relative reference is defined as identifying a
> fragment within the current entity, and thus (I infer) identifies the same
> fragment as a URI constructed from the URI used to retrieve the current
> document, without fragment identifier, plus the fragment component
> given in the relative reference in question.

The above paragraph is true *except* in the case of a base URI defined within the content of the “current document.”  In that case, the precedence rules defined by the RFC apply, and your paragraph is no longer accurate.

> I do not believe I have ever described the document-retrieval URI as
> the base URI; if you wish to persuade me that I have done so not once
> but repeatedly, I think I will have to ask you to provide some evidence of
> your claim.

How about these words:  "(I infer) identifies the same fragment as a URI constructed from the URI used to retrieve the current document, without fragment identifier, plus the fragment component given in the relative reference in question."

While you have not used the term “base URI” in the above sentence, you are describing what happens according to the RFC in relation to Section 5 which does use the term “base URI.”

The only conclusion I can draw from this is that, although you have not said so explicitly, you are in fact treating the document-retrieval URI as the base URI of a primary resource: you are using that URI as the basis for resolving references to any secondary resources it may contain, despite the explicit presence of a base URI in the content of the “current document.”

> If you have misunderstood my statements about what the relative reference 
> identifies in the various examples as claims that the relative reference
> has more than one base URI, or one different from that specified in xml:base,
> then that would explain some of the course of this discussion.  But no,
> I do not believe and have never claimed that the document entity in
> the examples is the base URI to be used in absolutizing the relative 
> references; I have merely observed that the absolute form of the 
> relative reference and a URI constructed from the document URI seem,
> on the account offered in RFC 3986, to identify the same thing.

I’m assuming “absolutizing” means “resolve relative references per the RFC.”

Since about response 3, when you got all over me about sloppy terminology references, I’ve made an effort to stick to the terminology used by the RFC.  It would be easier to understand the above if you were to try and do the same.

I don’t know what “document URI” means in this context.  Is it the URI used to retrieve the entity, as I would presume?

Either way, the point of your last sentence includes the phrase “a URI constructed from the document URI seem, on the account offered in RFC 3986, to identify the same thing.”

How is this different from what you claim in the previous sentence?

According to the RFC, you cannot construct an absolute URI reference for a secondary resource identifier (fragment) using the URI used to retrieve the entity UNLESS no base URI has yet been defined by the content or any encapsulating entity within the entity being examined.  It is, in fact, prohibited by the wording of the precedence rules in the specification, Section 5, paragraph 2, sentence 1.
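
To make the precedence concrete, here is a minimal sketch in Python.  The function name select_base_uri and its parameter names are made up for illustration and are not defined by the RFC; the ordering simply follows Section 5.1 (5.1.1 base URI embedded in content, 5.1.2 encapsulating entity, 5.1.3 retrieval URI, 5.1.4 application default).  The URIs are the ones used in the examples elsewhere in this thread.

```python
from typing import Optional


def select_base_uri(
    embedded: Optional[str],
    encapsulating: Optional[str],
    retrieval: Optional[str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first available base URI, in RFC 3986 Section 5.1 order."""
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate is not None:
            return candidate
    return None


# Content declares a base URI (e.g. via xml:base): it takes precedence.
with_content_base = select_base_uri(
    embedded="http://www.example.com/stat/blarg",
    encapsulating=None,
    retrieval="http://www.example.com/stat/doc.html",
)

# No content-level base URI: the retrieval URI is used instead.
without_content_base = select_base_uri(
    embedded=None,
    encapsulating=None,
    retrieval="http://www.example.com/stat/doc.html",
)

print(with_content_base)
print(without_content_base)
```

On this reading, the retrieval URI is consulted only when the first two sources yield nothing.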

I have re-read the above paragraphs about 6 times – slowly and carefully – however, as a native English speaker with what I would consider reasonable fluency and comprehension, I am left with the impression that you say you are not doing something and then immediately state the opposite.

And yes, this is probably the primary reason that this discussion has continued as long as it has.

It seems that no matter how many different ways I try and highlight how my interpretation of the specification – using the words of the specification itself – precludes the possibility of the very thing you imply in the paragraph quoted immediately above, you maintain your position that it is still possible according to the RFC.

That’s fine.  The specification isn’t going to judge either one of us correct.  We, and others, need to make that decision ourselves.

At this stage, my objective isn’t to persuade you per se.  Rather, my objective is to make sure that I can state my position in a way that is consistent with the specification, so that others may also be inclined to agree with my proposition, now or in the future.

>> 
>> If not, then this question is closed, but also given your “many camps” description above, perhaps it is still not addressed satisfactorily.
>> 
>> At this stage, my inclination is to leave it to the pundits to make their own judgments, but that’s how the thread started in the first place.  
> 
> What is the antecedent of “that” here? 

“leaving it to the [RFC] pundits to make their own judgements”

> 
> You appear to be saying that this thread began with my claiming that under
> certain circumstances the URI from which a document was retrieved should
> be used as the base URI for resolving relative references in preference to
> the base URI specified by the use of HTML ‘base’ or the xml:base attribute.
> If so, then I respectfully disagree.  The thread began with email from 
> John McCaskey in which he posed a question which had been discussed
> on the TEI-L list and asked readers of xml-dev what they thought of it. 

That may be where the thread started, but, as I said previously, due to list delivery issues, the first message in the thread I saw was yours from 4th of May, and the point that prompted me to join the discussion originally was this one:

>> I think the question can be paraphrased using defined terms as “Does
>> ‘#apple’ identify the resource 
>> 
>> (1)  …/example.xml#apple
>> 
>> (where … denotes whatever absolute context was present in the URI
>> used to retrieve the document in the first place) or the resource 
>> 
>> (2)  http://www.dictionary.com/a.html#apple
>> 
>> ?
>> 
>> The answer, as I read RFC 3986, is “both”.  
>> 
>> If the creator of a document is not happy with that answer, then caution 
>> should be taken in the use of xml:base and fragment-only identifiers.

What I hadn’t realized at the time was this was going to re-open a more than 13-year debate on the interpretation of the RFC.  If I had, perhaps I would’ve resisted the urge to participate given the number of things on my plate at the moment and the amount of hours consumed by this thread at this stage… ;)

[snip]

>> 
>> 
>> Given that there’s a sequence that takes place during the retrieval process, while the user (or developer) may have the impression that http://example.com/stat/doc.html#foo and http://www.example/com/stat/blargh#foo are the same, and, except from a technical perspective relating to HTTP and the RFC, they are as far as most users may be concerned.
> 
> I apologize, but I do not know how to parse the sentence just quoted.

Given it was after midnight my time and probably because I got interrupted by a crying baby, you are correct.  It was not a complete sentence.

Without the crying baby distraction, I’ll try and make it simpler – but requiring more text – by setting the stage with a quote from David Bohm:

	“all theories are insights, which are neither true nor false, but rather, clear in certain domains,
	and unclear when extended beyond those domains."

Given that we’re speaking of specifications rather than theories, I will boldly treat “clear in certain domains” as functionally equivalent to “true” and “unclear [in certain domains]” as functionally equivalent to “false” for the rest of this discussion.

Consider the end-user domain, or the domain of someone unfamiliar with the details, rules and inner workings of RFC 3986.  Suppose a user sees a URI pasted on a billboard or the side of a bus, consisting of the character sequence “http://example.com/stat/doc.html#foo”, and rushes home to their web browser or diligently keys the character sequence into their smartphone browser.  If a blinking, pink neon paragraph scrolls into view, they may rightly assume, by some vague insight into HTML, that the neon marquee paragraph had been given an anchor of “foo”, and that this is why “The World’s Only 1,000% Effective Cure for Toenail Fungus” dominates the display of their smartphone.

Data: “http://example.com/stat/doc.html#foo”
Action: enter the URI into my browser
Result: Health Remedy Claim

And, within their domain of reference and knowledge, this would be true, because a) they observed it, and b) it was repeatable.

This is the equivalent to another Bohm quote: “that a rapidly spinning bicycle wheel gives the impression of a solid disc, rather than of a sequence of rotating spokes.”

In their domain, it is true because they have personally-verified experience that it is true, and, in fact, they don’t care if there’s another explanation because they care about the result, not the mechanics.

Alternatively, in the “I am a diligent student of RFC 3986” domain, this explanation can’t be good enough, because we are concerned not only with the result but also with how and why typing the given character sequence into the address bar of a browser reliably and repeatably gives the observed result.

In order to do this, we must enter the domain of RFC 3986, base URIs, relative URIs, primary and secondary resources, relative URI resolution, media types, fragments and client user agents, all of which are concepts alien and potentially scary to members of our previously described end-user or non-RFC 3986-aware domain.

In our domain, because we understand the rules, constraints and behavior required by RFC 3986, we realize that the wheel is not a solid disc at all but rather a collection of individual spokes that, when rotated at a certain velocity, provide the impression of a solid disc.  Therefore, we know that the conclusions drawn from the user domain, while valid and reasonable with a certain level of knowledge and understanding, are actually false within the domain of RFC 3986.

So, apparently, finally, I gave you the answer you were after—but probably not for the reasons you wanted.

Below, in the previous response, I attempt to illustrate why I believe something true in the user domain is actually false in the RFC 3986 domain, but we’ll revisit that part somewhere below.

> 
>> 
>> However, since we’re talking in the land of RFC’s and specifications, these distinctions are material, so I will attempt to illustrate my perspective.
>> 
>> 1. A URI dereference via retrieval action is requested for “http://www.example.com/stat/doc.html#foo” assuming this is not within the scope of 4.4 Same Document reference semantics.
>> 
>> 2. The URI is parsed into its scheme representation according to Section 4.3:
>> 
>>>  Scheme specifications will not define
>>>  fragment identifier syntax or usage, regardless of its applicability
>>>  to resources identifiable via that scheme, as fragment identification
>>>  is orthogonal to scheme definition.
>> 
>> This means that the actual resource dereference retrieval action by the user agent uses the URI "http://www.example.com/stat/doc.html”
>> 
>> 3. The octet stream and meta data is processed by the user agent to assemble a representation based on understanding the structure of the resource’s media type.
>> 
>> 4. The resource is interpreted according to Section 5.1 to identify an appropriate Base URI for relative URI resolution according to Section 5.  The result of this interpretation (in this case), results in assigning the value of "http://www.example.com/stat/blarg” as the base URI for the loaded resource.
>> 
>> 5. The original request URI is parsed for any fragment identifier that needs to be resolved as a relative secondary resource identifier within the primary resource loaded by the user agent having the base URI of http://www.example.com/stat/blarg”
>> 
>> 6. Resolution of the ‘#foo’ fragment identifier takes place according to the rules of the RFC and the browser does not initiate a new retrieval action, doing whatever is appropriate to display the secondary resource to the user as defined by the media type specification
>> 
>> According to this sequence the original URI with fragment identifier http://www.example.com/stat/doc.html#foo is not dereferencable until the primary resource has been loaded.  
> 
> OK, I think I’m more or less following your account so far.  

Good, because this is the other relevant part of Section 3.5 that explains why this is the case:

>    Fragment identifiers have a special role in information retrieval
>    systems as the primary form of client-side indirect referencing,
>    allowing an author to specifically identify aspects of an existing
>    resource that are only indirectly provided by the resource owner.  As
>    such, the fragment identifier is not used in the scheme-specific
>    processing of a URI; instead, the fragment identifier is separated
>    from the rest of the URI prior to a dereference, and thus the
>    identifying information within the fragment itself is dereferenced
>    solely by the user agent, regardless of the URI scheme.

Aaaaannnnnddddd, here’s where things get non-Newtonian:

> 
>> The very process of dereferencing that primary resource defines a different base URI for resolution of the fragment portion of the URI than the URI from which the resource was loaded, so, technically, and I do mean, technically, the ‘#foo” secondary resource only exists as a secondary resource of the primary resource identified by the URI specified in the content as the base URI for fragment resolution according to Section 4.4.
> 
> Why “only”?
> 
> The adverb “only” would follow, I think, from an assumption that no 
> fragment can be part of (or a secondary resource of) more than one 
> primary resource.  Are you making that assumption?  If so, can you
> explain what you believe justifies the claim?  If not, is there another
> reason for the “only” in the sentence above?

“Only” because of this quite crucial part of Section 3.5, paragraph 6, sentence 2:

  “the fragment identifier is not used in the scheme-specific
   processing of a URI; instead, the fragment identifier is separated
   from the rest of the URI prior to a dereference, and thus the
   identifying information within the fragment itself is dereferenced
   solely by the user agent, regardless of the URI scheme.”

That means the user agent must first load the result of the URI dereference to understand how it is supposed to resolve the secondary resource (fragment) within the context of the primary resource (the result of dereferencing the original URI, “http://www.example.com/stat/doc.html”).
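
As a small illustration of that separation step, the sketch below uses Python's standard urllib.parse.urldefrag to split the requested URI into the part handed to the retrieval action and the fragment kept back for the user agent.  The variable names are just for illustration; nothing here performs an actual retrieval.

```python
from urllib.parse import urldefrag

requested = "http://www.example.com/stat/doc.html#foo"

# Separate the fragment before any dereference; only retrieval_uri would be
# passed to the scheme-specific (here, HTTP) processing.
retrieval_uri, fragment = urldefrag(requested)

print(retrieval_uri)
print(fragment)
```

The fragment is then left for the user agent to resolve once a representation of the primary resource has been loaded.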

And I should also be quite clear that “only” is not a global case.  It is only true in the presence of a base URI that is identified according to Section 5’s precedence rules 1 or 2.  In all other cases, what you say is correct because the primary resource URI would be the base URI derived from precedence rule 3: the URI used to retrieve the entity.

>> 
>> There is no way, except accidentally, to interpret that the http://www.example.com/stat/doc.html#foo secondary resource, when dereferenced, actually exists as part of the primary resource http://www.example.com/stat/doc.html because the act of dereferencing the http://www.example.com/stat/doc.html URI hides the existence of this URI from the fragment resolution mechanism defined within the RFC itself.
> 
> I don’t think I follow your logic here.  

My reference to “accidentally” relates to my shiny new “spinning spokes” example within the end-user domain of “uri -> action -> result” described above.

Meanwhile, in our domain…

Secondary resources have no identity outside the context of the primary resource to which they are bound.  In dereferencing a URI known to the user agent to contain a fragment reference to a secondary resource, a different primary resource URI was found due to the existence of a content-specific base URI definition.  That means that since the client user agent has the sole job of resolving the secondary resource location, the only value for a primary resource URI available to it according to RFC 3986 *must be* the content-specified base URI.

Therefore, following the rules of the RFC, the only possible choice of a primary resource URI for resolving the secondary resource URI fragment is “http://www.example.com/stat/blarg”, resulting in an absolute secondary resource URI of “http://www.example.com/stat/blarg#foo”.

The preceding statement is true *if and only if* a content-specific base URI is specified.  In the event that no content-specific base URI is defined, base URI precedence rule #3 of Section 5 applies, and the correct primary resource URI against which the secondary resource fragment URI ‘#foo’ would be resolved is the URI used to retrieve the entity, namely “http://www.example.com/stat/doc.html”, resulting in an absolute secondary resource URI of “http://www.example.com/stat/doc.html#foo”.
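
The two cases can be sketched with Python's standard urllib.parse.urljoin, which performs generic reference resolution against whatever base URI it is given.  This shows only the mechanical result of resolving ‘#foo’ against each candidate base; it knows nothing about same-document reference handling, and the variable names are illustrative.

```python
from urllib.parse import urljoin

retrieval_uri = "http://www.example.com/stat/doc.html"
content_base = "http://www.example.com/stat/blarg"

# Case 1: the content specifies a base URI, so '#foo' resolves against it.
with_content_base = urljoin(content_base, "#foo")

# Case 2: no content-specific base URI, so the retrieval URI is the base.
without_content_base = urljoin(retrieval_uri, "#foo")

print(with_content_base)
print(without_content_base)
```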

> I think I agree that until we have found the appropriate fragment, 
> we don’t know that a given resource has a fragment with a given
> name.  In the example, though, this seems to me to be equally
> true of http://www.example.com/stat/doc.html#foo and of 
> http://www.example.com/stat/blarg#foo and I don’t see what
> hiding has to do with anything.  
> 
> Sorry; completely lost here.

Whether the fragment *exists* and we can find it is secondary to the question of what the resolved (absolute) URI of the secondary resource identified by the fragment should be.  The author defines the secondary resource by including a fragment in the original URI or in a relative URI reference within the content of an entity obtained by a retrieval action against a URI.

Even when available outside of a primary resource URI, resolution of the secondary resource URI is ALWAYS done by the client user agent, and, if compliant with RFC 3986, that resolution must be in relation to a single base URI established according to the precedence rules of Section 5.

In the immortal words of Connor “The Highlander” MacLeod, “There can be only one!”

>> 
>> So, from the *user* perspective, the http://www.example.com/stat/doc.html#foo secondary resource does, in fact, exist because the user sees it associated with this URI which they may see in their browser.  However, technically, and from the perspective of the wording of the RFC itself, it does not – it cannot – exist, because it is never possible to resolve the secondary URI fragment in relation to the primary URI from which the resource was originally loaded.
> 
> I don’t think that’s given at all in the examples offered so far.   Perhaps
> I am misunderstanding your claim.  Consider the example offered by Paul 
> Grosso at [1], with the following document at 
> http://www.example.org/doc.xml:
> 
> <doc xml:base="http://www.example.com/stat/blargh">
>  <para href="#foo">xxx</para>
>  <para id="foo">yyy</para>
> </doc>
> 
> Are you saying that the URI reference http://www.example.org/doc.xml#foo
> does not point to the paragraph with id=“foo” ?  Why on earth not?
> 
> [1] https://lists.w3.org/Archives/Public/uri/2004Jan/0007.html

The phrase “points to” is an expression of reality within the “solid disc” end-user domain defined above.  In this domain, the answer is yes, of course.

However, if you’re trying to actually resolve the URI fragment “#foo” in a manner conformant to RFC 3986, the answer is “eventually, but the resulting URI reference to the paragraph with id=‘foo’ would be ‘http://www.example.com/stat/blargh#foo’ and not ‘http://www.example.org/doc.xml#foo’, as the media type resolver within the client user agent would have no reference to ‘http://www.example.org/doc.xml’ due to the base URI associated with the doc element via the xml:base attribute.”
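
For anyone who wants to try the quoted example, here is a rough sketch using Python's standard xml.etree.ElementTree and urllib.parse.  It assumes the document was retrieved from http://www.example.org/doc.xml as stated in the quoted message, reads xml:base from the root element only (a real processor would have to walk ancestors and handle a relative xml:base value), and resolves the href mechanically against that base.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

retrieval_uri = "http://www.example.org/doc.xml"
document = """<doc xml:base="http://www.example.com/stat/blargh">
 <para href="#foo">xxx</para>
 <para id="foo">yyy</para>
</doc>"""

root = ET.fromstring(document)

# Use the content-specified base URI if present; otherwise fall back to
# the URI used to retrieve the entity.
base_uri = root.get(XML_BASE, retrieval_uri)

href = root.find("para[@href]").get("href")
resolved = urljoin(base_uri, href)

print(base_uri)
print(resolved)
```

Removing the xml:base attribute from the sample makes the same code resolve the reference against the retrieval URI instead.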

>>> 
>>> Since you began by suggesting that RFC 3986 was irrelevant to the case,
>>> I had then the impression that you thought what it said was of no concern
>>> in the example in question.
>>> 
>>> I’m happy to learn that I misunderstood your position.  
>> 
>> Fantastic.
>> 
>> To restate:
>> 
>> RFC 3986 defines the concept of a URI, a base URI, a relative URI and a mechanism for resolving relative URI references against a base URI.  The RFC also defines a future-focused extension point that allows content formats a first opportunity to specify what the value of a base URI should be when resolving relative references to any degree of granularity possible to specify in the definition of the content format itself.
>> 
>> XML Base and HTML, as content formats, both provide a mechanism for identifying the base URI to be used for relative URI resolution according to RFC 3986.  However, the mechanisms they choose to use to define the value of the base URI to be used by RFC 3986 complaint software is totally and completely orthogonal and independent to RFC 3986 except for two things:
>> 
>> 1) said content specification must make a normative reference that it is able to provide base URIs according to the requirements specified in RFC 3986, and
>> 2) said content specification must define the rules and scope in which the character sequences it provides according to this interface are to be used by RFC 3986.
>> 
>> I’d also like to state that the majority of the discussion seems to have been not on the identification of the appropriate base URI that should be used but the consequences and mechanics of using that base URI in conformance with RFC 3986.
> 
> Yes, I think that’s true.  I think that’s because there was never any serious question of
> what the base URI is or how to apply it.

…and yet… Today is February 2nd….again.

>> 
>> The part of the discussion that I joined was originally focused on how the value of a base URI was to be established using xml:base specifically rather than how that value was used according to RFC 3986.  
> 
> That’s an interesting and to me unexpected characterization of the
> discussion.  It’s a funny old world, innit?

Indeed it is.  Indeed it is.

Cheers,

ast
--
Andrew S. Townley <ast@atownley.org>
http://atownley.org


