RE: Updates (was Re: best practice for providing new
Joshua Allen suggested that "new" items in RSS feeds could be identified by doing:

> a diff, comparing file hashes, or whatever.

Well, that doesn't work very well with RSS as used today. RSS feeds generated by http://quicktopic.com provide an excellent example of why diffs, hashes, etc. don't help when working with RSS feeds. This example should clarify the urgent need for the combination of unique entry id and date that Atom will provide.

QuickTopic RSS feeds are dynamically generated on demand. Additionally, QuickTopic modifies all hrefs in the content of its feeds so that they indirect through a "link.cgi" program. Presumably, this allows them to track how frequently people follow links to other sites. But the real problem is that they add unique identifiers to the rewritten links. Those unique identifiers change for every version of the file generated. Thus, any RSS item found in an RSS file generated by QuickTopic will be different *every time* it is fetched if it contains an external link.

For instance, at 5:54 this evening I fetched http://www.quicktopic.com/7/H/rhSrjkWgjnvRq.rss

The first item contains a link to an external site. It is:

href="/cgi-bin/link.cgi?link=http%3A%2F%2Fwww.hyperorg.com%2Fbackissues%2Fjoho-jun17-01.html&x=215221622.4"

At 5:56 this evening, I fetched the same RSS file and the link had changed to:

href="/cgi-bin/link.cgi?link=http%3A%2F%2Fwww.hyperorg.com%2Fbackissues%2Fjoho-jun17-01.html&x=215221643.6"

Note: The difference is in the "x=" parameter at the end of the two hrefs. If you hash or diff these two entries, they will be different even though the entry itself is over 7 months old!

If this were an Atom feed, and if QuickTopic were "following the rules", then the entry in question here would have a unique id and a date. Rather than doing hashes or diffs of the contents of the entry, we would be able to check that id and the modified or issued date to determine if this was a "new" entry.
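To make the contrast concrete, here is a minimal Python sketch, not anything QuickTopic or an Atom reader actually runs. The two hrefs are the ones quoted above (with the line-wrap space in "backissues" removed). The entry id, the modified date, and the helper names `content_hash` and `is_new_or_updated` are hypothetical, invented only for illustration.

```python
import hashlib

# The two hrefs quoted above, fetched two minutes apart.
# Only the trailing "x=" tracking parameter differs.
HREF_AT_5_54 = (
    "/cgi-bin/link.cgi?link=http%3A%2F%2Fwww.hyperorg.com"
    "%2Fbackissues%2Fjoho-jun17-01.html&x=215221622.4"
)
HREF_AT_5_56 = (
    "/cgi-bin/link.cgi?link=http%3A%2F%2Fwww.hyperorg.com"
    "%2Fbackissues%2Fjoho-jun17-01.html&x=215221643.6"
)

# Hypothetical values: the original feed supplies neither of these.
ENTRY_ID = "tag:example.org,2003:entry-1"
MODIFIED = "2003-01-01T00:00:00Z"


def content_hash(text: str) -> str:
    """Hash the item content, as a hash-based 'is it new?' check would."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_new_or_updated(seen: dict, entry_id: str, modified: str) -> bool:
    """Return True if the id is unseen or its modified date has changed.

    `seen` maps entry id -> last modified date observed, and is updated
    in place.
    """
    previous = seen.get(entry_id)
    seen[entry_id] = modified
    return previous != modified


# Hash-based check: the same old entry looks different on every fetch.
hashes_differ = content_hash(HREF_AT_5_54) != content_hash(HREF_AT_5_56)

# Id-and-date check: the entry is new once, then recognized as a repeat,
# no matter how the link inside it was rewritten.
seen = {}
first_fetch = is_new_or_updated(seen, ENTRY_ID, MODIFIED)
second_fetch = is_new_or_updated(seen, ENTRY_ID, MODIFIED)

print(hashes_differ, first_fetch, second_fetch)  # True True False
```

The id-and-date check never looks at the entry content, so a rewritten link or a rotated ad cannot make an old entry look new. Only a changed modified date would.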
But with RSS, which has no usable mechanism for providing unique ids (I've pointed out in other messages why GUID is useless) and no explicit indication of "modified time", we're stuck believing that this and many other messages from QuickTopic are "new" every time we read them.

Problems of "ever-changing items" also occur on sites like InfoWorld that insert ads into their RSS feeds. Whenever the ad changes, any hashing-based solution is going to think the item has changed.

My concern with this is not some "arrogant" "technology" push. Customers complain that they are seeing the same item in their feeds multiple times. We need Atom to prevent flooding them with duplicate entries.

My belief is that the failings of RSS are so great, and the quality of service we'll be able to provide with Atom feeds is so much greater than what we can currently provide, that RSS use will fall off rapidly once Atom becomes established. Users will demand it.

bob wyman