Re: xml:base and fragments

  • From: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
  • To: "Andrew S. Townley" <ast@atownley.org>
  • Date: Wed, 10 May 2017 21:45:39 -0600

> On May 10, 2017, at 12:29 PM, Andrew S. Townley <ast@atownley.org> wrote:
>> On May 10, 2017, at 2:28 PM, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>> On May 10, 2017, at 5:49 AM, Andrew S. Townley <ast@atownley.org> wrote:
>>> ...
>>> The question still remains, apparently, what is the correct character sequence that should be the value of the base URI within the scope of the element e.
>> ?!  
> And I said this because you repeatedly referred to the URI used to retrieve the entity resource as the base URI in spite of content-level, explicit specifications to the contrary.

I don’t think so.  

I have repeatedly said that the relative reference is defined as identifying a
fragment within the current entity, and thus (I infer) identifies the same
fragment as a URI constructed from the URI used to retrieve the current
document, without fragment identifier, plus the fragment component
given in the relative reference in question.

I do not believe I have ever described the document-retrieval URI as
the base URI; if you wish to persuade me that I have done so not once
but repeatedly, I think I will have to ask you to provide some evidence of
your claim.

If you have misunderstood my statements about what the relative reference 
identifies in the various examples as claims that the relative reference
has more than one base URI, or one different from that specified in xml:base,
then that would explain some of the course of this discussion.  But no,
I do not believe and have never claimed that the document entity in
the examples is the base URI to be used in absolutizing the relative 
references; I have merely observed that the absolute form of the 
relative reference and a URI constructed from the document URI seem,
on the account offered in RFC 3986, to identify the same thing.
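
To make the distinction concrete, here is a minimal sketch using Python's
standard urllib.parse. The URI values are assumptions borrowed from the
examples in this thread, and the variable names are illustrative only. It
computes the absolute form of the relative reference against the declared
base URI, and separately builds a URI from the retrieval URI plus the same
fragment. The two strings differ; the observation above is only that both
appear to identify the same fragment of the current entity.

```python
from urllib.parse import urljoin, urldefrag

# Assumed values, taken from the examples discussed in this thread.
retrieval_uri = "http://www.example.com/stat/doc.html"  # where the entity was fetched
base_uri = "http://www.example.com/stat/blarg"          # declared in the content
reference = "#foo"                                      # fragment-only relative reference

# Resolution of the relative reference against the declared base URI
# (the algorithm of RFC 3986, Section 5, as implemented by urljoin).
absolute_form = urljoin(base_uri, reference)

# A URI constructed from the retrieval URI, without any fragment
# identifier, plus the fragment given in the relative reference.
built_from_retrieval = urldefrag(retrieval_uri).url + reference

print(absolute_form)
print(built_from_retrieval)
```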

> If not, then this question is closed, but also given your “many camps” description above, perhaps it is still not addressed satisfactorily.
> At this stage, my inclination is to leave it to the pundits to make their own judgments, but that’s how the thread started in the first place.  

What is the antecedent of “that” here?  

You appear to be saying that this thread began with my claiming that under
certain circumstances the URI from which a document was retrieved should
be used as the base URI for resolving relative references in preference to
the base URI specified by the use of HTML ‘base’ or the xml:base attribute.
If so, then I respectfully disagree.  The thread began with email from 
John McCaskey in which he posed a question which had been discussed
on the TEI-L list and asked readers of xml-dev what they thought of it. 

> ...
> The only thing I was “denying” was that there was an additional required dereference retrieval action against the base URI discovered in the content so that Section 4.4’s “same document” semantics applied.

Thanks for the clarification; your denial (or “denial”) would perhaps have 
been more obvious and more clearly relevant if anyone in the discussion 
had suggested that dereferencing the reference “#apple” in John
McCaskey’s original example required a fresh retrieval of the
document containing the reference. Did anyone make that suggestion?

> As stated 4? times now, that assumption was incorrect and the processing user agent must consider the base URI as relating to the current resource, regardless if said base URI has been dereferenced or not.

Sorry; I’m completely lost here.  I do not know what you are talking about.

> ...
> Given that there’s a sequence that takes place during the retrieval process, while the user (or developer) may have the impression that http://example.com/stat/doc.html#foo and http://www.example.com/stat/blargh#foo are the same, and, except from a technical perspective relating to HTTP and the RFC, they are as far as most users may be concerned.

I apologize, but I do not know how to parse the sentence just quoted.

> However, since we’re talking in the land of RFCs and specifications, these distinctions are material, so I will attempt to illustrate my perspective.
> 1. A URI dereference via retrieval action is requested for “http://www.example.com/stat/doc.html#foo” assuming this is not within the scope of 4.4 Same Document reference semantics.
> 2. The URI is parsed into its scheme representation according to Section 4.3:
>>   Scheme specifications will not define
>>   fragment identifier syntax or usage, regardless of its applicability
>>   to resources identifiable via that scheme, as fragment identification
>>   is orthogonal to scheme definition.
> This means that the actual resource dereference retrieval action by the user agent uses the URI “http://www.example.com/stat/doc.html”.
> 3. The octet stream and metadata are processed by the user agent to assemble a representation based on understanding the structure of the resource’s media type.
> 4. The resource is interpreted according to Section 5.1 to identify an appropriate base URI for relative URI resolution according to Section 5.  In this case, the interpretation assigns the value “http://www.example.com/stat/blarg” as the base URI for the loaded resource.
> 5. The original request URI is parsed for any fragment identifier that needs to be resolved as a relative secondary resource identifier within the primary resource loaded by the user agent, which has the base URI “http://www.example.com/stat/blarg”.
> 6. Resolution of the ‘#foo’ fragment identifier takes place according to the rules of the RFC and the browser does not initiate a new retrieval action, doing whatever is appropriate to display the secondary resource to the user as defined by the media type specification.
> According to this sequence, the original URI with fragment identifier http://www.example.com/stat/doc.html#foo is not dereferenceable until the primary resource has been loaded.

OK, I think I’m more or less following your account so far.  
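
The six quoted steps can be sketched roughly as follows. This is only an
illustration under stated assumptions: plan_dereference is a hypothetical
helper, it performs no network I/O, and the declared base URI is passed in
as a parameter rather than extracted from a real representation.

```python
from urllib.parse import urldefrag, urljoin

def plan_dereference(request_uri, declared_base=None):
    """Hypothetical helper: trace the quoted steps without any network I/O."""
    # Steps 1-2: split off the fragment; only the remainder is used
    # for the retrieval action.
    retrieval_uri, fragment = urldefrag(request_uri)
    # Steps 3-4: once the representation is parsed, the content may
    # declare its own base URI (xml:base, HTML base); if it does not,
    # the retrieval URI serves as the base.
    base_uri = declared_base or retrieval_uri
    # Steps 5-6: the fragment is resolved within the representation
    # already loaded, so exactly one retrieval is planned.
    return {
        "retrieval_uri": retrieval_uri,
        "base_uri": base_uri,
        "fragment": fragment,
        "fragment_relative_to_base": urljoin(base_uri, "#" + fragment),
        "retrievals": 1,
    }

plan = plan_dereference(
    "http://www.example.com/stat/doc.html#foo",
    declared_base="http://www.example.com/stat/blarg",
)
for key, value in plan.items():
    print(key, "=", value)
```

Note that the sketch records both the retrieval URI and the declared base
URI; nothing in it requires choosing one of them as the sole owner of the
fragment.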

> The very process of dereferencing that primary resource defines a different base URI for resolution of the fragment portion of the URI than the URI from which the resource was loaded, so, technically, and I do mean technically, the ‘#foo’ secondary resource only exists as a secondary resource of the primary resource identified by the URI specified in the content as the base URI for fragment resolution according to Section 4.4.

Why “only”?

The adverb “only” would follow, I think, from an assumption that no 
fragment can be part of (or a secondary resource of) more than one 
primary resource.  Are you making that assumption?  If so, can you
explain what you believe justifies the claim?  If not, is there another
reason for the “only” in the sentence above?

> There is no way, except accidentally, to interpret that the http://www.example.com/stat/doc.html#foo secondary resource, when dereferenced, actually exists as part of the primary resource http://www.example.com/stat/doc.html because the act of dereferencing the http://www.example.com/stat/doc.html URI hides the existence of this URI from the fragment resolution mechanism defined within the RFC itself.

I don’t think I follow your logic here.  

I think I agree that until we have found the appropriate fragment, 
we don’t know that a given resource has a fragment with a given
name.  In the example, though, this seems to me to be equally
true of http://www.example.com/stat/doc.html#foo and of 
http://www.example.com/stat/blarg#foo and I don’t see what
hiding has to do with anything.  

Sorry; completely lost here.

> So, from the *user* perspective, the http://www.example.com/stat/doc.html#foo secondary resource does, in fact, exist because the user sees it associated with this URI which they may see in their browser.  However, technically, and from the perspective of the wording of the RFC itself, it does not – it cannot – exist, because it is never possible to resolve the secondary URI fragment in relation to the primary URI from which the resource was originally loaded.

I don’t think that’s given at all in the examples offered so far.   Perhaps
I am misunderstanding your claim.  Consider the example offered by Paul 
Grosso at [1], with the following document at 

<doc xml:base="http://www.example.com/stat/blargh">
  <para href="#foo">xxx</para>
  <para id="foo">yyy</para>
</doc>

Are you saying that the URI reference http://www.example.org/doc.xml#foo
does not point to the paragraph with id="foo"?  Why on earth not?

[1] https://lists.w3.org/Archives/Public/uri/2004Jan/0007.html
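
As a rough illustration of the question, the following sketch parses the
example document with Python's xml.etree.ElementTree and looks up the
fragment name. Two assumptions are made here: a closing </doc> tag is
supplied so that the text parses, and a bare id attribute is treated as
the fragment target, which in practice depends on the media type.
find_by_fragment is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# The example document, with a closing tag added so that it parses.
source = """<doc xml:base="http://www.example.com/stat/blargh">
  <para href="#foo">xxx</para>
  <para id="foo">yyy</para>
</doc>"""

root = ET.fromstring(source)

# ElementTree exposes xml:base under the reserved XML namespace.
declared_base = root.get("{%s}base" % XML_NS)

def find_by_fragment(tree_root, fragment):
    """Hypothetical lookup: treat the fragment as a bare 'id' attribute value."""
    for element in tree_root.iter():
        if element.get("id") == fragment:
            return element
    return None

target = find_by_fragment(root, "foo")
print(declared_base)
print(target.tag, target.text)
```

The lookup takes only the parsed document and the fragment name as input;
neither the retrieval URI nor the declared base URI enters into it.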

>> Since you began by suggesting that RFC 3986 was irrelevant to the case,
>> I had then the impression that you thought what it said was of no concern
>> in the example in question.
>> I’m happy to learn that I misunderstood your position.  
> Fantastic.
> To restate:
> RFC 3986 defines the concept of a URI, a base URI, a relative URI and a mechanism for resolving relative URI references against a base URI.  The RFC also defines a future-focused extension point that allows content formats a first opportunity to specify what the value of a base URI should be when resolving relative references to any degree of granularity possible to specify in the definition of the content format itself.
> XML Base and HTML, as content formats, both provide a mechanism for identifying the base URI to be used for relative URI resolution according to RFC 3986.  However, the mechanisms they choose to use to define the value of the base URI to be used by RFC 3986 compliant software are totally and completely orthogonal to and independent of RFC 3986 except for two things:
> 1) said content specification must make a normative reference that it is able to provide base URIs according to the requirements specified in RFC 3986, and
> 2) said content specification must define the rules and scope in which the character sequences it provides according to this interface are to be used by RFC 3986.
> I’d also like to state that the majority of the discussion seems to have been not on the identification of the appropriate base URI that should be used but the consequences and mechanics of using that base URI in conformance with RFC 3986.

Yes, I think that’s true.  I think that’s because there was never any serious question of
what the base URI is or how to apply it.  

> The part of the discussion that I joined was originally focused on how the value of a base URI was to be established using xml:base specifically rather than how that value was used according to RFC 3986.  

That’s an interesting and to me unexpected characterization of the
discussion.  It’s a funny old world, innit?

C. M. Sperberg-McQueen
Black Mesa Technologies LLC
