RE: Postel's law, exceptions
Sometimes, it is hard to follow the law... As it turns out, even though I've been arguing for insisting that people generate valid data, my site (http://weblogs.pubsub.com/) has been accused of generating invalid RSS files. If we don't "fix" this, we're going to get put on the Syndic8 list of feeds needing "repair"... I'd appreciate some guidance on how to fix this problem. The answer isn't intuitively obvious.

What we do at PubSub.com is generate custom, synthetic RSS feeds. We scan about 100K feeds continuously and let people "subscribe" to items in those feeds. (Thus, if you want to know every time "(RSS OR ATOM) AND (BLOG OR FEED)" is mentioned in an RSS feed, we can help you...) When we find a match, we insert it into a custom RSS file being maintained for the subscriber. (In the future, we'll support other kinds of "delivery": email, SOAP, XMLRPC, etc.)

The issue with our feeds is that we don't put <language> tags in them. These tags are defined as optional in RSS V2.0, but there is no question that having them improves the utility of a feed significantly, and some people consider their absence to constitute a "broken feed."

Our dilemma is that RSS appears to have been defined with the assumption that all items in a feed would share a common language. This is a good assumption when RSS is being used to syndicate the content of a blog maintained by a single person; however, it doesn't work well when the feed is composed of items sourced from thousands of other feeds. What we need is a <language> tag on items -- not a single tag for the whole RSS file. Unfortunately, RSS V2.0 doesn't define item-level <language> tags...

Now, clearly, we could define some new namespace and create an item-level <language> tag of our own, like "<ps:language>". The difficulty with doing so is that this private tag wouldn't achieve much more than wasting bandwidth, since no known news aggregator knows what to do with it.
This is the case, of course, with many "extensions" to XML formats... They work within small groups, but are simply noise when the scope of usage expands, since no one supports them.

It has been suggested that we should do a scan of the generated feed and determine what language is most commonly used in the various items that have been collected. However, I don't think this gets us to any place useful. The problem is that while this might mean that the channel-level language tag is right for many items, it will still be wrong for many other items. Also, this means that the <language> for one of our RSS channels could be changing from minute to minute as content of one language or another ebbs and flows into the generated feed.

Our interface allows people to create subscriptions that restrict the content that is scanned for them to only those items that are marked as being in some specific language. Potentially, we could insert <language> tags into such single-language feeds, but we are then still left with the issue of what we should do for subscriptions that specify "any language" as the content source...

One approach to solving this problem would be to simply use a newly defined language code that indicates ambiguity of language. Thus, I might use "x-mixed" or "x-unknown". (Until "i-mixed" and "i-unknown" are registered with IANA to join the "i-default" which is already registered.) RSS V2.0 defines its language codes via W3C as compliant with RFC1766, which provides for new language tags to be defined. We would use one of these tags on feeds which were not language specific.

But, is this the right thing to do? It solves my problem of needing to have a language tag and of needing to be explicit about what I'm transmitting; however, it will probably be some time before news aggregators actually know what to do with such a tag... Also, registering these tags with IANA could result in other people using them, with potentially negative impacts in other XML files, etc...
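
To make the trade-off concrete, here is a minimal Python sketch of the two channel-level options discussed above. It is an illustration only, not PubSub.com's code: the function names are hypothetical, and "x-mixed" and "x-unknown" are the proposed private-use codes from this message, not registered tags. It assumes each collected item carries the language tag of its source feed, or None when the source declared none.

```python
from collections import Counter


def majority_language(item_languages):
    """Return (most common tag, share of all items carrying that tag).

    Sketch of the "scan the generated feed" suggestion. Items with no
    known language (None) count toward the total but never win.
    """
    known = [lang.lower() for lang in item_languages if lang]
    if not known:
        return None, 0.0
    lang, count = Counter(known).most_common(1)[0]
    return lang, count / len(item_languages)


def channel_language(item_languages):
    """Pick a channel-level <language> value for a synthetic feed.

    Uses the real tag only when every tagged item agrees; otherwise
    falls back to the hypothetical "x-mixed" / "x-unknown" codes.
    """
    known = {lang.lower() for lang in item_languages if lang}
    if not known:
        return "x-unknown"   # assumption: proposed code, not registered
    if len(known) == 1:
        return next(iter(known))
    return "x-mixed"         # assumption: proposed code, not registered


items = ["en", "en", "fr", "de"]
print(majority_language(items))  # the majority tag is wrong for half the items
print(channel_language(items))   # stable answer regardless of the item mix
```

With the majority approach the channel value can flip whenever the mix of items shifts, while the fallback approach stays constant for any mixed-language feed, at the cost of using a code that aggregators do not yet recognize.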
But, perhaps there is some obvious solution that I haven't considered... Please consider offering some guidance on this issue and have a look at our site at http://weblogs.pubsub.com/. How do I keep my feeds off the "broken feeds" list?

bob wyman