Re: XML-DEV JEWELS (was: XML-DEV on Groves)

  • From: Arjun Ray <aray@q...>
  • To: xml-dev@x...
  • Date: Sat, 12 Feb 2000 15:45:31 -0500 (EST)


On 12 Feb 2000, Thierry Bezecourt wrote:

> From what I have learned [...] groves are about hierarchical data
> structures and addressing nodes in these structures, so they seem
> ideal for a mailing list archive.

Having tackled the problem before in various lo-tech ways, I haven't
found much hierarchic structure that was useful, as opposed to merely
organizational.  Unlike usenet posts, mail messages lack a References:
header (the In-Reply-To:, when it's there, is much too bogotically
variable) so you don't get the benefit of threading.
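To make the threading problem concrete, here is a minimal sketch (Python, standard library only) of pulling candidate parent Message-IDs out of whatever headers a message happens to carry. The function name `parent_ids` and the Message-ID regular expression are illustrative assumptions, not part of any existing archive tool; the point is only that In-Reply-To has to be mined with a pattern rather than trusted as a clean field.

```python
import re
from email import message_from_string

# Assumption: Message-IDs look like <local@domain>. In-Reply-To often
# buries them in free text such as "Your message of <date> <id>".
MSGID = re.compile(r"<[^<>\s]+@[^<>\s]+>")

def parent_ids(raw_message: str) -> list[str]:
    """Return candidate parent Message-IDs, most likely parent first."""
    msg = message_from_string(raw_message)
    refs = MSGID.findall(msg.get("References", ""))
    reply = MSGID.findall(msg.get("In-Reply-To", ""))
    # By convention the last References entry is the direct parent,
    # so walk References backwards before falling back to In-Reply-To.
    candidates = list(reversed(refs)) + reply
    seen, ordered = set(), []
    for mid in candidates:
        if mid not in seen:
            seen.add(mid)
            ordered.append(mid)
    return ordered
```

A message with neither header yields an empty list, which is exactly the case where human feedback on threading links would have to fill the gap.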

[However, it might be a basis for a collaborative effort, where the
grove is "grown" over time with feedback on threading links - say a
forms-based adjunct to a Hypermail/MHonArc-style interface, driven by
a grove-aware engine doing smarter things than just spitting out the 
contents of an overview.fmt database.]

> To do that, if I'm correct, we would have to define a property set
> for mailing lists, where articles would be nodes, header fields
> would be properties of these nodes, the "References" header field
> would be used for links to other articles, and the "contents"
> would be the body of the article, which could contain links.

My limited understanding of groves tells me that the key is the grove
plan - which basically determines the amount of analytic granularity
one wants or needs to work with.  (E.g. an article would be a node,
but how "high" or "low" in the hierarchy?)  Maximum flexibility needs
an exhaustive/detailed property set as the basis.
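As a rough illustration of that relationship (a sketch only, not the HyTime or DSSSL formalism), the property set can be modelled as the exhaustive list of properties per node class, and a grove plan as a selection from it. The names `Node`, `PROPERTY_SET` and `apply_plan`, and the particular properties listed, are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A grove-like node: a class name plus named properties."""
    cls: str
    props: dict = field(default_factory=dict)

# Hypothetical exhaustive property set: node class -> all known properties.
PROPERTY_SET = {
    "article": ["message-id", "from", "date", "subject", "references", "body"],
    "body": ["paragraphs", "quotes", "signature"],
}

def apply_plan(node: Node, plan: dict) -> Node:
    """Keep only the properties that the plan selects for this node's class."""
    wanted = plan.get(node.cls, [])
    unknown = [p for p in wanted if p not in PROPERTY_SET.get(node.cls, [])]
    if unknown:
        # A plan can only select from the property set, never extend it.
        raise ValueError(f"plan names properties outside the property set: {unknown}")
    return Node(node.cls, {k: v for k, v in node.props.items() if k in wanted})
```

A coarse plan such as `{"article": ["from", "subject"]}` gives a shallow view of each article; a finer plan is only possible if the underlying property set already names the finer properties.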

> It does not seem very difficult.

Well, it has been my experience that reliably extracting fine-grained
material from mail messages is very difficult.  (Just think of the
variety of quoting habits/conventions.)  For comparison, look at:

 (1) the monthly aggregations of messages, in UNIX mbox format, from
     the majordomo bot at IC where this list was hosted.
 (2) Erik Naggum's old archive of usenet posts to comp.text.sgml,
     already preprocessed into an SGML format at
        ftp://ftp.ifi.uio.no/pub/SGML/comp.text.sgml
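
To show why the quoting conventions make extraction unreliable, here is a deliberately naive sketch that separates quoted lines from new text in a message body. The prefix pattern (a ">" or "|" optionally preceded by up to four initials, as in "TB>") is an assumption covering only a few common habits; real archives contain many more, and this heuristic will misclassify some of them.

```python
import re

# Assumed quote prefixes: ">", "|", or short initials such as "TB>".
QUOTE_PREFIX = re.compile(r"^\s*(?:[A-Za-z]{1,4})?[>|]")

def split_quoted(body: str):
    """Split a message body into (quoted_lines, fresh_lines)."""
    quoted, fresh = [], []
    for line in body.splitlines():
        # Attribution lines ("On ..., X wrote:") carry no prefix,
        # so they end up among the fresh lines.
        (quoted if QUOTE_PREFIX.match(line) else fresh).append(line)
    return quoted, fresh
```

For the mbox aggregations, Python's standard `mailbox.mbox` class could supply the messages to feed into such a function; the hard part remains deciding what counts as a quote.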

Care to develop a good property set? :)


Arjun


