
RE: Identity

  • From: "Hunter, David" <dhunter@M...>
  • To: "'xml-dev@i...'" <xml-dev@i...>
  • Date: Wed, 23 Jun 1999 12:22:38 -0400

Lars Marius Garshol [mailto:larsga@i...] writes:
> | In another post on this thread, Lars Marius Garshol asked if the
> | following two URLs denote the same resource:
> | 
> | <URL: http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html>
> | <URL: http://birk105.studby.uio.no/linker/XMLtools.html>
> | 
> | My question is, does it matter?  Is there a case where we need an
> | application to know or think that these two URLs are the same? 
> 
> Definitely! When people do a search for 'Free XML software' on Google
> I want them to get a result more or less like:
> 
>   <li><a href="http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html">
>       Free XML software</a> (<a href="http://birk105.../">alternative</a>)
> 
> and not to see these as two completely unrelated sites.

But in this case the search engine isn't treating them as "the same thing";
it is treating them as two distinct "things", which are in some way
<em>related</em>.  ("ThingB" is a mirror of "ThingA".)  Having two "things"
which are related is a much different kettle of fish than having two
"things" and trying to figure out if in fact they are the same "thing".  (If
they were, in fact, "the same", then there would be no need to have a link
to the second "thing".)

> | OTOH, are THESE two URLs the same:
> | 
> | <URL:  http://a.server.com/dir/page.asp>
> | <URL:  http://a.server.com/dir/page.asp?param1=5&param2=6>
> | 
> | This, in my [small] mind, is a much more difficult question to
> | answer, but again, is there a case where we need an application to
> | know or think that these refer to the same thing?
> 
> Sure! Lots! Some examples:
> 
>  - a server log analyzer that provides a referral report should merge
>    references from these two

But to the web server itself, i.e. a.server.com, there really would never be
such a "thing" as "page.asp?param1=5&param2=6"; there would only be a
"page.asp", and anything else is just a parameter to the one "thing".  (This
is strictly when talking about ASP; if we talk about CGI I would be in over
my head, not having dealt with it, but I have a feeling that it would be
similar:  to the web server, there would only be one [executable?] which
would be our "thing", and anything else would be parameters.)

OTOH, if we move our point of reference to an external computer somewhere,
which I guess is where I've been talking from, if it is "merging" references
from the two, then it is treating both as different "things".  (If they're
both the same "thing", then there's nothing to merge.)
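As a rough illustration of what such "merging" might look like from the outside, here is a minimal Python sketch. It assumes that two URLs belong in the same bucket when they differ only in their query string or fragment; that rule, and the names strip_query and merge_referrals, are made up for this example and are not how any particular log analyzer works.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def merge_referrals(urls):
    """Count referrals per URL, treating the query string as a parameter.

    Assumption for this sketch: URLs that differ only in their query
    string are reported under a single entry.
    """
    return Counter(strip_query(u) for u in urls)


referrals = [
    "http://a.server.com/dir/page.asp",
    "http://a.server.com/dir/page.asp?param1=5&param2=6",
]
merged = merge_referrals(referrals)
```

Note that the analyzer still has to look at both strings before it can decide to fold them together, so it starts out treating them as two "things".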

>  - a search engine should know whether they are the same, just as with
>    my example above

See the point I'm about to make below...

>  - software that builds an offline copy of a web site should know
>    whether to make separate copies for these two URLs
> 
>  and so on...
> 
> And, BTW, it's by no means obvious that those two URLs really refer to
> the same thing. I'm sure you'll agree that these two URLs refer to
> different resources, for example:
> 
> <URL: http://www.80s.com/cgi-bin/valley.cgi?url=http%3A%2F%2F208.206.40.209%2Fmyfamily%2Froad.html>
> <URL: http://www.80s.com/cgi-bin/valley.cgi?url=http%3A%2F%2F207.200.30.120%2F%47over%6Eor%2F%42ush.html>

> --Lars M.

Right, but this is kind of my point.  If two URLs (or URIs) are
character-for-character identical, then they're the same thing.  If they're
different <em>in any way</em>, then perhaps they should be treated as
different resources, or perhaps "different but related" resources.  For example,

<URL:  http://a.server.com/dir/page.asp>
is the same as
<URL:  http://a.server.com/dir/page.asp>

and is different from
<URL:  http://a.server.com/dir/page2.asp>

and is different from, but related to,
<URL:  http://a.server.com/dir/page.asp?param1=5>

(I readily admit that this may be a gross over-simplification.)
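The three cases above could be sketched in Python roughly as follows. This is only an illustration of the simplification: the function name relate and the rule that "related" means "same scheme, host and path, but a different query string or fragment" are assumptions made for the example, not an established definition.

```python
from urllib.parse import urlsplit


def relate(a: str, b: str) -> str:
    """Classify two URL strings as 'same', 'related', or 'different'."""
    # Character-for-character identical: the same thing.
    if a == b:
        return "same"
    pa, pb = urlsplit(a), urlsplit(b)
    # Assumption: differing only in query string or fragment means
    # "different but related".
    if (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path):
        return "related"
    # Anything else is treated as a different resource.
    return "different"


base = "http://a.server.com/dir/page.asp"
results = [
    relate(base, "http://a.server.com/dir/page.asp"),
    relate(base, "http://a.server.com/dir/page2.asp"),
    relate(base, "http://a.server.com/dir/page.asp?param1=5"),
]
```

A mirror on another host, like the two XMLtools.html URLs earlier in the thread, would come out as "different" under this rule, so any relationship between mirrors would have to be declared some other way.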

(And I heartily wish that I could remember how this discussion got started,
so that my examples could be more relevant.  Did it start with namespaces?
Or Schemas, and their use of namespaces?  Or something completely unrelated?
Even the very first "Identity" email was in reference to ANOTHER thread, so
I can't even trace it back...)

David Hunter
david.hunter@m...
MediaServ Information Architects
http://www.MediaServ.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


