
Re: A processing instruction for robots

  • From: Lars Marius Garshol <larsga@g...>
  • To: xml-dev@i...
  • Date: 06 Dec 1999 10:25:50 +0100


* Walter Underwood
|
| Comments are welcome.

First thought: this is fine for very simple uses, but for more complex
uses something along the lines of the robots.txt file would be very
nice. How about a variant PI that can point to a robots.rdf resource?


Second thought: "and the index attribute must be first". This is nice
for implementors, but it is likely to clash with the expectations of
users, and the cost of allowing any order is very low for implementors.

Why not follow the <URL: http://www.w3.org/TR/xml-stylesheet/ > style
of specifying PI pseudo-attributes?
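
As a rough sketch of how little order-independence costs an
implementor, here is one way to read the pseudo-attributes of a PI into
a dictionary. The function name and the regular expression are my own
assumptions, not anything from the robots PI draft, and the sketch
ignores the character and predefined entity references that the
xml-stylesheet convention allows in values.

```python
import re

# Matches name="value" or name='value' (hypothetical, simplified pattern).
_PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.\-]*)\s*=\s*("[^"]*"|'[^']*')""")


def parse_pseudo_attributes(pi_data):
    """Return the pseudo-attributes in the PI data as a dict, in any order."""
    attrs = {}
    for name, quoted in _PSEUDO_ATTR.findall(pi_data):
        attrs[name] = quoted[1:-1]  # strip the surrounding quotes
    return attrs


# The lookup does not care which pseudo-attribute comes first.
settings = parse_pseudo_attributes('follow="no" index="yes"')
print(settings.get("index"), settings.get("follow"))
```

With something like this the robot looks values up by name, so a
"must be first" rule buys the implementor nothing.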


Also: The robot PI, says the spec, "should be in the internal subset
(not in an external DTD or parameter entity). Since robots may be
non-validating, a robots PI in the external subset might not be seen
by the robot."

I think this is misleading, since "the internal subset" is usually
short for "the internal DTD subset". A better way of putting it might
be "It should be in the document entity (not in an external entity,
including the external DTD subset and external parameter entities).
Since robots may skip external entities, PIs in external entities
might not be seen by the robot."

However, I don't think this will do either. Entities are what the
storage structure of SGML/XML documents is composed of, and I think
this spec needs to take some sort of stand as to how entities map to
WWW resources, and which entities the PI is really talking about.
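
To make the "robots may skip external entities" case concrete, here is
a minimal sketch using Python's xml.sax with the loading of external
general entities switched off. The class and function names are
hypothetical, and I am assuming the PI target is spelled "robots". A
robot built this way only ever sees PIs in the entities it actually
reads.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class RobotsPICollector(ContentHandler):
    """Collects the data of every PI whose target is 'robots' (assumed name)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def processingInstruction(self, target, data):
        if target == "robots":
            self.found.append(data)


def robots_pis_seen(xml_bytes):
    """Return the data of the robots PIs the parser reports for this input."""
    parser = xml.sax.make_parser()
    # Ask the parser not to load external general entities, which is how
    # a robot that skips external entities would behave.
    parser.setFeature(feature_external_ges, False)
    handler = RobotsPICollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.found


print(robots_pis_seen(b'<?robots index="no" follow="yes"?><doc/>'))
```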

One way is to say that every resource is an entity, and every
web-accessible entity is a resource. Then one might say that the
robots PI refers to

 a) the entity in which it is found

 b) the entity in which it is found and all entities included by this
 entity via entity references, regardless of any robots PIs in these
 included entities

 c) the entity in which it is found, and if "follow" is set to yes,
 all entities included by this entity via entity references,
 regardless of any robots PIs in these included entities

 d) the entity in which it is found, and if "sub-entities" is set to
 yes, all entities included by this entity via entity references,
 regardless of any robots PIs in these included entities

Once one agrees on a policy I think this is worth a subsection in the
spec, regardless of the choice made. b) is probably the easiest to
implement, since many APIs do not expose entity structure. It might
not be the best choice, though.
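
A small sketch of how a) and b) differ, on a toy model of the entity
structure. Everything here is an assumption made for illustration: the
Entity class, the URIs, and the default of "yes" when no robots PI is
present, which the draft may define differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    uri: str
    index: Optional[str] = None  # "index" value of this entity's own robots PI
    includes: List["Entity"] = field(default_factory=list)


def effective_index(root: Entity, policy: str, default: str = "yes") -> Dict[str, str]:
    """Map each entity URI to the index setting that applies under 'a' or 'b'."""
    result: Dict[str, str] = {}

    def walk(entity: Entity, inherited: Optional[str]) -> None:
        if inherited is not None:
            value = inherited        # b): an including entity already decided
        elif entity.index is not None:
            value = entity.index     # the entity's own robots PI
        else:
            value = default          # assumed default, no PI anywhere above
        result[entity.uri] = value
        # Under a) nothing is passed down; under b) a setting, once found,
        # overrides any robots PIs in the included entities.
        decided = inherited is not None or entity.index is not None
        passed = value if (policy == "b" and decided) else None
        for child in entity.includes:
            walk(child, passed)

    walk(root, None)
    return result


chapter = Entity("chapter.xml", index="yes")
doc = Entity("doc.xml", index="no", includes=[chapter])
print(effective_index(doc, "a"))
print(effective_index(doc, "b"))
```

Under a) chapter.xml keeps its own "yes"; under b) it is covered by the
"no" of doc.xml. c) and d) would be b) switched on by a pseudo-attribute.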

--Lars M.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


