
re: data smushing

  • From: David Megginson <david@m...>
  • To: xml-dev@l..., www-rdf-interest@w...
  • Date: Sat, 30 Dec 2000 07:21:33 -0500 (EST)

Seth Russell writes:

 > So true, the Semantic Web doesn't work without "data smushing"!  I
 > think we should even apply "data smushing" to nodes with URIs,
 > cause there gonna be people misapplying URIs.  My question is: has
 > anybody come up with some good algorithms for "data smushing" ?  (I
 > love that term, I've used it 3 times now.)  Maybe we should come up
 > with a schema for expressing smushing rules in RDF ... any hint of
 > that being done yet?

There are two separate problems here:

1. combining data from two different sources; and

2. pruning redundant entities.

It may be the case that the different sources use the same URI to
identify the same entity; likewise, a single source with a large
database might end up with many duplicate versions of the same entity
shadowing each other.

Outside the research lab, #2 is extremely difficult.  For #1, however,
all we have to do is extend the (oversimplified version of the) RDF
logical model to include one more member:

  {predicate, subject, object, source}

where source is a URI representing the source of the information
(probably, but not necessarily, the URL of an RDF document; it could
also be a URI representing a news wire, for example).  Now, query
operations, searches, etc. can take into account where the information
came from, and can distinguish, say, two "name" properties provided by
the same source from two "name" properties provided by two different
sources.
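
A minimal sketch of the four-member model, assuming Python and an
in-memory list of statements. The class name SourcedStatement, the
helper values_by_source, and all example.org URIs are hypothetical
names chosen for illustration; they are not part of any RDF
specification or library.

```python
from collections import defaultdict
from typing import NamedTuple


class SourcedStatement(NamedTuple):
    """One {predicate, subject, object, source} statement."""
    predicate: str
    subject: str
    object: str
    source: str  # URI identifying where the statement came from


def values_by_source(statements, subject, predicate):
    """Group the objects asserted for (subject, predicate) by source URI."""
    grouped = defaultdict(list)
    for s in statements:
        if s.subject == subject and s.predicate == predicate:
            grouped[s.source].append(s.object)
    return dict(grouped)


# Hypothetical data: two sources describing the same subject.
PERSON = "http://example.org/people/1"
SRC_A = "http://example.org/a.rdf"
SRC_B = "http://example.org/b.rdf"

statements = [
    SourcedStatement("name", PERSON, "Alice Smith", SRC_A),
    SourcedStatement("name", PERSON, "A. Smith", SRC_A),
    SourcedStatement("name", PERSON, "Alice Smith", SRC_B),
]

names = values_by_source(statements, PERSON, "name")
```

Here names[SRC_A] holds two values from one source, while the value
under SRC_B comes from a different source, so a query can tell the
two cases apart without deciding whether the entities are duplicates.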


<rant>

As I've mentioned many times before, the published RDF logical model
needs to be extended anyway because it does not distinguish specific
subjects from open-ended subject patterns (rdf:aboutEachPrefix), it
does not distinguish literal objects from resource objects, and it
does not allow for xml:lang (which the RDF spec states is significant
in RDF processing).  A logical model that takes all of this into
account would look something like

  {predicate, subject, subjectType, object, objectType, lang}

or, with the source information

  {predicate, subject, subjectType, object, objectType, lang, source}

You could argue that subjectType is an internal trait of the subject,
and that objectType and lang are internal traits of the object, but
then the grammar needs to be elaborated properly:

  statement: predicate, subject, object

  predicate: URI

  subject: URI, subjectType

  subjectType: ("uri" | "pattern")

  object: URI, objectType, lang

  objectType: ("literal" | "resource")

  lang: LITERAL

It's still not all that bad, but the

  {predicate, subject, object}

thing was always bogus.
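
The elaborated grammar above could be sketched as nested records,
again assuming Python. The class and field names are hypothetical;
in particular, the object slot is called "value" here on the
assumption that it holds either a URI or a literal string, and the
optional source field is carried over from the extended model
earlier in this message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubjectType(Enum):
    URI = "uri"
    PATTERN = "pattern"  # open-ended pattern, e.g. rdf:aboutEachPrefix


class ObjectType(Enum):
    LITERAL = "literal"
    RESOURCE = "resource"


@dataclass(frozen=True)
class Subject:
    uri: str
    type: SubjectType = SubjectType.URI


@dataclass(frozen=True)
class Object:
    value: str                  # assumption: a URI or a literal string
    type: ObjectType
    lang: Optional[str] = None  # xml:lang value, if any


@dataclass(frozen=True)
class Statement:
    predicate: str
    subject: Subject
    object: Object
    source: Optional[str] = None


def flatten(stmt):
    """Return the flat 7-member form of a nested statement."""
    return (stmt.predicate, stmt.subject.uri, stmt.subject.type.value,
            stmt.object.value, stmt.object.type.value, stmt.object.lang,
            stmt.source)


# Hypothetical example: the same string as a literal and as a resource.
doc = Subject("http://example.org/doc")
as_literal = Statement("seeAlso", doc,
                       Object("http://example.org/", ObjectType.LITERAL, "en"))
as_resource = Statement("seeAlso", doc,
                        Object("http://example.org/", ObjectType.RESOURCE))
```

The two statements compare unequal even though a bare
{predicate, subject, object} triple would make them look identical.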

</rant>


All the best,


David

-- 
David Megginson                 david@m...
           http://www.megginson.com/
