Short Essay: Squeezing RDF into a Java Object Model
The more I work with RDF, the more I find it fascinating in the abstract but annoying in the concrete. The biggest problem is that RDF claims an extremely simple data model (a statement is a subject, a predicate, and an object), but that model does not even come close to describing the information that actually appears in an RDF statement.

Let's start with the most naive mapping into a Java object model:

  public interface RDFStatement
  {
    public abstract String getSubject ();
    public abstract String getPredicate ();
    public abstract String getObject ();
  }

This will work fine for something like the following:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://www.purl.org/dc#">
    <rdf:Description about="http://www.megginson.com/">
      <dc:Title>Megginson Technologies</dc:Title>
    </rdf:Description>
  </rdf:RDF>

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Title"
  statement.getObject() => "Megginson Technologies"

However, it falls apart quickly when the value of the property is a resource:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://www.purl.org/dc#">
    <rdf:Description about="http://www.megginson.com/">
      <dc:Creator rdf:resource="http://home.sprynet.com/sprynet/dmeggins/"/>
    </rdf:Description>
  </rdf:RDF>

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Creator"
  statement.getObject() => "http://home.sprynet.com/sprynet/dmeggins/"

In the first case the object is a literal, and in the second case the object is a resource; however, the naive interface does not make this information available.
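To make the gap concrete, here is a minimal sketch of the naive interface with a throwaway implementation. SimpleStatement, NaiveStatementDemo, and looksLikeResource are hypothetical names invented for this illustration, not part of any RDF library. The only way a consumer can tell a literal from a resource here is to guess from the shape of the string, and the guess fails as soon as a literal happens to look like a URI.

```java
// The naive interface from above (modifiers that are implicit in an interface are omitted).
interface RDFStatement {
    String getSubject();
    String getPredicate();
    String getObject();
}

// Hypothetical value class, for illustration only.
final class SimpleStatement implements RDFStatement {
    private final String subject;
    private final String predicate;
    private final String object;

    SimpleStatement(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    public String getSubject() { return subject; }
    public String getPredicate() { return predicate; }
    public String getObject() { return object; }
}

final class NaiveStatementDemo {
    // Hypothetical heuristic: treat anything that starts like an HTTP URI as a resource.
    // This is guesswork; the interface carries no real information about the object's kind.
    static boolean looksLikeResource(RDFStatement s) {
        return s.getObject().startsWith("http://");
    }

    public static void main(String[] args) {
        RDFStatement title = new SimpleStatement(
            "http://www.megginson.com/", "http://www.purl.org/dc#Title",
            "Megginson Technologies");
        RDFStatement creator = new SimpleStatement(
            "http://www.megginson.com/", "http://www.purl.org/dc#Creator",
            "http://home.sprynet.com/sprynet/dmeggins/");
        // A literal title whose text happens to be a URL (made-up case).
        RDFStatement urlAsTitle = new SimpleStatement(
            "http://www.megginson.com/", "http://www.purl.org/dc#Title",
            "http://www.megginson.com/");

        System.out.println(looksLikeResource(title));      // false
        System.out.println(looksLikeResource(creator));    // true
        System.out.println(looksLikeResource(urlAsTitle)); // true, although it was a literal
    }
}
```

The third case is the one that matters: the parser knew it had seen a literal, but that knowledge was dropped on the way into the object model.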
The only solution is to add a new property to the Java interface:

  public interface RDFStatement
  {
    public abstract String getSubject ();
    public abstract String getPredicate ();
    public abstract String getObject ();
    public abstract boolean objectIsResource ();
  }

Now, for the first example, we have

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Title"
  statement.getObject() => "Megginson Technologies"
  statement.objectIsResource() => false

and for the second example, we have

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Creator"
  statement.getObject() => "http://home.sprynet.com/sprynet/dmeggins/"
  statement.objectIsResource() => true

Unfortunately, we're not nearly through yet. The next nasty bit comes from the aboutEachPrefix attribute. For example, here's a modified version of the first example:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://www.purl.org/dc#">
    <rdf:Description aboutEachPrefix="http://www.megginson.com/">
      <dc:Title>Megginson Technologies</dc:Title>
    </rdf:Description>
  </rdf:RDF>

Now, this description no longer applies just to http://www.megginson.com/, but to *all* resources whose URIs begin with http://www.megginson.com/ (a constantly-changing set, and, in the case of CGIs or Servlets, potentially infinite).
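Since that set cannot be listed in advance, about the only thing software can do with a prefix is test candidate URIs against it on demand. A minimal sketch follows, assuming that "begin with" means a plain character-by-character prefix comparison with no URI normalization (an assumption on my part). PrefixMatch and describes are hypothetical names, and the page URIs are made up.

```java
final class PrefixMatch {
    // True if a description attached to 'prefix' applies to 'resourceUri'.
    // Assumes plain string-prefix comparison; no case folding or escaping is applied.
    static boolean describes(String prefix, String resourceUri) {
        return resourceUri.startsWith(prefix);
    }

    public static void main(String[] args) {
        String prefix = "http://www.megginson.com/";

        // Made-up URIs, including one that a servlet might generate on the fly.
        System.out.println(describes(prefix, "http://www.megginson.com/"));               // true
        System.out.println(describes(prefix, "http://www.megginson.com/cgi-bin/q?id=42")); // true
        System.out.println(describes(prefix, "http://home.sprynet.com/sprynet/dmeggins/")); // false
    }
}
```

The point of the sketch is that one description element stands for a test over URIs rather than for a fixed list of statements.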
Because the subject is now a prefix rather than a single resource, the following information is no longer sufficient:

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Title"
  statement.getObject() => "Megginson Technologies"
  statement.objectIsResource() => false

We need to modify the interface once again:

  public interface RDFStatement
  {
    public abstract String getSubject ();
    public abstract String getPredicate ();
    public abstract String getObject ();
    public abstract boolean subjectIsPrefix ();
    public abstract boolean objectIsResource ();
  }

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Title"
  statement.getObject() => "Megginson Technologies"
  statement.subjectIsPrefix() => true
  statement.objectIsResource() => false

But wait -- there's more. The RDF spec states that the 'xml:lang' attribute does not modify the data model but is, rather, a property of the (underspecified) literal. Consider the following (RDF purists would prefer to use an rdf:Alt, but let's keep things simple):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://www.purl.org/dc#">
    <rdf:Description aboutEachPrefix="http://www.megginson.com/">
      <dc:Subject xml:lang="en">markup</dc:Subject>
      <dc:Subject xml:lang="fr">balisage</dc:Subject>
    </rdf:Description>
  </rdf:RDF>

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Subject"
  statement.getObject() => "markup"
  statement.subjectIsPrefix() => true
  statement.objectIsResource() => false

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Subject"
  statement.getObject() => "balisage"
  statement.subjectIsPrefix() => true
  statement.objectIsResource() => false

The language distinction is missing from our model, so we have to add yet another property to the Java interface:

  public interface RDFStatement
  {
    public abstract String getSubject ();
    public abstract String getPredicate ();
    public abstract String getObject ();
    public abstract boolean subjectIsPrefix ();
    public abstract boolean objectIsResource ();
    public abstract String getObjectLang ();
  }

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Subject"
  statement.getObject() => "markup"
  statement.subjectIsPrefix() => true
  statement.objectIsResource() => false
  statement.getObjectLang() => "en"

  statement.getSubject() => "http://www.megginson.com/"
  statement.getPredicate() => "http://www.purl.org/dc#Subject"
  statement.getObject() => "balisage"
  statement.subjectIsPrefix() => true
  statement.objectIsResource() => false
  statement.getObjectLang() => "fr"

We're still not done. Take a look at the following:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:megg="http://www.megginson.com/ns#">
    <rdf:Description aboutEachPrefix="http://www.megginson.com/">
      <megg:poem rdf:parseType="Literal">
        <poem>
          <line>Roses are red,</line>
          <line>Violets are blue</line>
          <line>Sugar is sweet,</line>
          <line>And I love you.</line>
        </poem>
      </megg:poem>
    </rdf:Description>
  </rdf:RDF>

Since the <megg:poem> element sets the 'rdf:parseType' attribute to "Literal", the contents of the element will not be interpreted as RDF markup. As a result, the value of this statement is a literal string:

  statement.getObject() => "
        <poem>
          <line>Roses are red,</line>
          <line>Violets are blue</line>
          <line>Sugar is sweet,</line>
          <line>And I love you.</line>
        </poem>
      "
  statement.objectIsResource() => false

If I were to round-trip this back to XML, however, how would I know that it was meant to be XML markup?
My software might just as easily generate the following:

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:megg="http://www.megginson.com/ns#">
    <rdf:Description aboutEachPrefix="http://www.megginson.com/">
      <megg:poem rdf:parseType="Literal">
        &lt;poem&gt;
          &lt;line&gt;Roses are red,&lt;/line&gt;
          &lt;line&gt;Violets are blue&lt;/line&gt;
          &lt;line&gt;Sugar is sweet,&lt;/line&gt;
          &lt;line&gt;And I love you.&lt;/line&gt;
        &lt;/poem&gt;
      </megg:poem>
    </rdf:Description>
  </rdf:RDF>

This probably isn't what I want. As a result, I have to add more information to my Java interface to note whether the literal value is meant to be read as XML markup:

  public interface RDFStatement
  {
    public abstract String getSubject ();
    public abstract String getPredicate ();
    public abstract String getObject ();
    public abstract boolean subjectIsPrefix ();
    public abstract boolean objectIsResource ();
    public abstract boolean objectIsXML ();
    public abstract String getObjectLang ();
  }

At this point, it might make sense to split this out into different classes:

  public interface RDFComponent
  {
    public abstract String getValue ();
  }

  public interface RDFSubject extends RDFComponent
  {
    public abstract boolean isPrefix ();
  }

  public interface RDFPredicate extends RDFComponent
  {
  }

  public interface RDFObject extends RDFComponent
  {
    public abstract boolean isResource ();
    public abstract boolean isXML ();
  }

  public interface RDFStatement
  {
    public abstract RDFSubject getSubject ();
    public abstract RDFPredicate getPredicate ();
    public abstract RDFObject getObject ();
  }

Obviously, there's a much more complex model underlying RDF than the spec lets on, and that model affects not only the ease or difficulty of implementing an object model, but also the difficulty of many standard operations like queries against a collection of RDF statements and storage in a relational database.

I'd love to hear from others on this list who've worked with RDF.
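As a rough illustration of how these extra properties leak into even a simple query, here is a sketch of a lookup over a list of statements using the split interfaces. Several things in it are assumptions rather than anything settled above: getLang() is added to RDFObject (the split interfaces as written do not carry the language), prefix matching is a plain startsWith test, and Statements, StatementQuery, of, and find are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

interface RDFComponent { String getValue(); }
interface RDFSubject extends RDFComponent { boolean isPrefix(); }
interface RDFPredicate extends RDFComponent { }
interface RDFObject extends RDFComponent {
    boolean isResource();
    boolean isXML();
    String getLang(); // assumption: added for this sketch; null when no xml:lang applies
}
interface RDFStatement {
    RDFSubject getSubject();
    RDFPredicate getPredicate();
    RDFObject getObject();
}

// Hypothetical factory that builds immutable statements from plain values.
final class Statements {
    static RDFStatement of(String subject, boolean prefix, String predicate,
                           String object, boolean resource, boolean xml, String lang) {
        RDFSubject s = new RDFSubject() {
            public String getValue() { return subject; }
            public boolean isPrefix() { return prefix; }
        };
        RDFPredicate p = new RDFPredicate() {
            public String getValue() { return predicate; }
        };
        RDFObject o = new RDFObject() {
            public String getValue() { return object; }
            public boolean isResource() { return resource; }
            public boolean isXML() { return xml; }
            public String getLang() { return lang; }
        };
        return new RDFStatement() {
            public RDFSubject getSubject() { return s; }
            public RDFPredicate getPredicate() { return p; }
            public RDFObject getObject() { return o; }
        };
    }
}

final class StatementQuery {
    // Returns the statements in 'all' that describe 'uri' with the given predicate.
    // A prefix subject matches every URI that starts with it; an ordinary subject
    // must match exactly, so plain equality on the subject is not enough.
    static List<RDFStatement> find(List<RDFStatement> all, String uri, String predicate) {
        List<RDFStatement> hits = new ArrayList<>();
        for (RDFStatement st : all) {
            RDFSubject subj = st.getSubject();
            boolean subjectMatches = subj.isPrefix()
                ? uri.startsWith(subj.getValue())
                : uri.equals(subj.getValue());
            if (subjectMatches && predicate.equals(st.getPredicate().getValue())) {
                hits.add(st);
            }
        }
        return hits;
    }
}
```

A lookup such as StatementQuery.find(all, uri, "http://www.purl.org/dc#Subject") has to branch on isPrefix() for every statement. A relational store would presumably hit the same problem, since a prefix subject cannot be matched by an equality test on a subject column, which is part of what makes RDF harder to store and query than its three-part model suggests.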
RDF is full of some very good ideas, but I'm afraid that the underlying (and hidden) conceptual complexity might stunt any serious implementation.

All the best,

David

--
David Megginson                 david@m...
http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)