Re: Expert's advice needed about XML Schema and defining some kind of relation

To: xml-dev@l...
Subject: Re: Expert's advice needed about XML Schema and defining some kind of relation
From: Henrik Martensson <henrik.martensson@b...>
Date: Fri, 05 Dec 2003 00:20:45 +0100
In-reply-to: <000001c3ba66$a1f27d80$a9f2abc1@justice>
References: <000001c3ba66$a1f27d80$a9f2abc1@justice>

On Thu, 2003-12-04 at 14:00, Peter Glantschnig wrote:
<snip>
> I will try to explain the main problem. Let's say you have two XML
> files. One stores publications and the other one stores some names of
> persons. Now each person is responsible for a couple of publications.
> Now I want to make sure that this relation is always true by using XML
> Schema. So when you enter a new publication, you should not be able to
> assign a person to that publication, which can not be found in the
> persons XML file. So at least when you validate the publications XML
> file you should get an error.

If I had this problem, I would probably solve it using XLink. With
XLink, or some other linking approach, there is no need to change the
publications schema every time a person is added to, or removed from,
the persons document.

One author may be associated with several publications, and I assume
the reverse is also true. While it is possible to create the links
using Simple XLink, Extended XLink would seem to offer a more natural
solution in this case, because Extended XLink supports both multi-ended
and out-of-line links.

The links could either be inline, i.e. the links are inside the two
files, or out of line, which means the links would be defined in a
separate document. (Topic maps use the latter approach.)
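To make the extended-link idea a bit more concrete, here is a rough sketch of an out-of-line link document tying one person to two publications, together with a few lines of Perl (using XML::LibXML) that list its locators. The element names (authorships, authorship, person, pub, arc), the labels, the file names and the IDs are all hypothetical. Note that a single arc from the "author" label to the "work" label connects the person to both publications; that is what makes the link multi-ended.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical out-of-line extended link: one person, two publications.
# Element names, labels, file names and IDs are invented for illustration.
my $links_xml = <<'XML';
<authorships xmlns:xlink="http://www.w3.org/1999/xlink">
  <authorship xlink:type="extended">
    <person xlink:type="locator" xlink:href="persons.xml#p42" xlink:label="author"/>
    <pub xlink:type="locator" xlink:href="publications.xml#pub7" xlink:label="work"/>
    <pub xlink:type="locator" xlink:href="publications.xml#pub9" xlink:label="work"/>
    <arc xlink:type="arc" xlink:from="author" xlink:to="work"/>
  </authorship>
</authorships>
XML

my $XLINK_NS = 'http://www.w3.org/1999/xlink';

# Return the href of every locator element, in document order.
sub locator_hrefs {
    my ($doc) = @_;
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(xlink => $XLINK_NS);   # bind the prefix for XPath
    return map { $_->getAttributeNS($XLINK_NS, 'href') }
           $xpc->findnodes('//*[@xlink:type="locator"]');
}

my $doc = XML::LibXML->load_xml(string => $links_xml);
print "$_\n" for locator_hrefs($doc);
```

With inline links, the same locators would instead live in the persons and publications documents themselves.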

Personally, I prefer going with inline links when it is possible to edit
the source document at will. It is usually (but not always) a bit easier
to implement applications that way.

Instead of validating against a schema, you would have to check the
links. This is fairly easy, though: walk through the link elements, yank
the URIs, and see if there is anything at the other end. Given Perl and
XML::LibXML, you could do something like this:

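use strict;
use warnings;
use LWP::Simple qw(get);
use URI;
use XML::LibXML;
# ...

# Return every element whose xlink:href points at a missing target.
sub find_broken_locators {
  my $doc_element = shift;
  my $xlink_ns = 'http://www.w3.org/1999/xlink';
  # The xlink prefix has to be bound before it can be used in XPath.
  my $xpc = XML::LibXML::XPathContext->new($doc_element);
  $xpc->registerNs(xlink => $xlink_ns);
  my @broken_locators;
  foreach my $locator ($xpc->findnodes('//*[@xlink:href]')) {
    my $uri = $locator->getAttributeNS($xlink_ns, 'href');
    find_link_end($uri) or push @broken_locators, $locator;
  }
  return @broken_locators;
}

# Parsed target documents, keyed by URI without fragment. A stored undef
# marks a document that could not be fetched or parsed.
my %document_cache;
sub find_link_end {
  my $uri = URI->new(shift);
  my $id = $uri->fragment();
  # Drop the fragment to get the URI of the target document itself.
  my $base = $uri->clone();
  $base->fragment(undef);
  my $base_uri = $base->as_string();
  unless (exists $document_cache{$base_uri}) {
    my $document_string = get($base_uri);
    $document_cache{$base_uri} = defined $document_string
      ? eval { XML::LibXML->new()->parse_string($document_string) }
      : undef;
  }
  my $document = $document_cache{$base_uri} or return;
  return unless defined $id;
  # id() only matches attributes declared as type ID (in a DTD, or xml:id).
  my ($target_element) = $document->findnodes("id('$id')");
  return $target_element;
}
# ...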

Quite some time since I did something like this, so I'm sure you can
find a bug or three. Also, it would be necessary to add a bit more error
handling in real life. I hope the principle is clear though.

find_broken_locators() iterates over every element that carries an
xlink:href attribute and calls find_link_end() for each one. If a link
target is _not_ found, the link element is added to a list of broken
links. After checking all links, the function returns that list.
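If you want to try the principle without a web server, one option is to parse both documents from strings and check each fragment against the IDs in the persons document. This is a sketch, not the code above: it ignores the document part of each href, looks targets up with XML::LibXML's getElementById() rather than an XPath id() call, and the markup, the broken_locators() name and the IDs are hypothetical. It uses xml:id because XML::LibXML treats xml:id attributes as IDs without needing a DTD.

```perl
use strict;
use warnings;
use XML::LibXML;

my $XLINK_NS = 'http://www.w3.org/1999/xlink';

# Hypothetical persons document; xml:id makes each id a real ID attribute.
my $persons = XML::LibXML->load_xml(string => <<'XML');
<persons>
  <person xml:id="p42">Ada</person>
  <person xml:id="p43">Grace</person>
</persons>
XML

# Hypothetical publications document with one good and one dangling link.
my $publications = XML::LibXML->load_xml(string => <<'XML');
<publications xmlns:xlink="http://www.w3.org/1999/xlink">
  <publication><author xlink:href="persons.xml#p42"/></publication>
  <publication><author xlink:href="persons.xml#p99"/></publication>
</publications>
XML

# Return every element whose xlink:href fragment has no matching ID in
# $target_doc. Only the fragment is checked; the part before '#' is ignored.
sub broken_locators {
    my ($doc, $target_doc) = @_;
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(xlink => $XLINK_NS);
    my @broken;
    for my $locator ($xpc->findnodes('//*[@xlink:href]')) {
        my $href = $locator->getAttributeNS($XLINK_NS, 'href');
        my ($id) = $href =~ /#(.+)\z/;
        push @broken, $locator
            unless defined $id && $target_doc->getElementById($id);
    }
    return @broken;
}

for my $broken (broken_locators($publications, $persons)) {
    print 'broken: ', $broken->getAttributeNS($XLINK_NS, 'href'), "\n";
}
```

Run as is, it should report only the locator pointing at p99.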

find_link_end() extracts the base URI (well, URL, really) and downloads
the target document. Since we will want to check the same target
document many times, it is cached. This saves a lot of wear and tear on
get() and the parser. If we didn't find a target document, or could not
parse it, the function returns undef. If we did parse the target
document successfully, we get the fragment identifier, assumed to be the
value of an ID attribute, locate the element node with that ID value,
and return the element node. Note that the id() lookup only works if the
attribute really is of type ID, i.e. declared as such in a DTD or
written as xml:id; an attribute that is merely named "id" is not enough.
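The listing assumes every href is an absolute URI. If the hrefs are relative, for example persons.xml#p42, they have to be resolved against the URI of the linking document before the fragment is split off. A minimal sketch using the URI module's new_abs(), clone() and fragment() methods; the split_href() name and the example.com URIs are hypothetical:

```perl
use strict;
use warnings;
use URI;

# Split an href into the document URI (what gets fetched and cached) and
# the fragment (assumed to name an ID). Relative hrefs are first resolved
# against $base, the URI of the document that contains the link.
sub split_href {
    my ($href, $base) = @_;
    my $uri = URI->new_abs($href, $base);
    my $fragment = $uri->fragment;        # undef if there is no '#...'
    my $document_uri = $uri->clone;
    $document_uri->fragment(undef);       # remove the fragment
    return ($document_uri->as_string, $fragment);
}

my ($doc_uri, $id) =
    split_href('persons.xml#p42', 'http://example.com/data/publications.xml');
print "$doc_uri\n$id\n";
```

Here the relative href resolves to http://example.com/data/persons.xml with the fragment p42.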

In real life I would probably go for a more object-oriented solution
(still Perl though, if I have a choice). I would not try to implement
link checking in XSLT, even though it can be done.

/Henrik

References:
- Expert's advice needed about XML Schema and defining some kind of relation
  - From: "Peter Glantschnig" <justice@s...>
