Re: Expert's advice needed about XML Schema and defining some kind of relation

To: Robert Koberg <rob@k...>
Subject: Re: Expert's advice needed about XML Schema and defining some kind of relation
From: Henrik Martensson <henrik.martensson@b...>
Date: Sun, 07 Dec 2003 21:28:42 +0100
Cc: "'xml-dev@l...'" <xml-dev@l...>
In-reply-to: <3FD205E0.2030506@k...>
References: <531C0947F1E0844FBDDB065F2EA5329501953CB1@u...> <91EF69A8-2724-11D8-860C-000A95CCC59E@x...> <3FD09B1C.6000300@k...> <1070712933.3656.204.camel@l...> <3FD205E0.2030506@k...>

On Sat, 2003-12-06 at 17:37, Robert Koberg wrote:
> Henrik Martensson wrote:
> 
> Hi and thanks for your comments. [more inline]
<snip>
> 
> As I mention below the idref is *not* defined in the content schema as 
> an xs:IDREF. It gets /validated/ by using XSL which basically renders a 
> report of 'broken ' links (which should never happen if strictly using 
> our tool). When indexing an xml document for search the link/@page_idref 

OOPS! Now I feel really stupid. (Lucky thing I am used to it.) I missed
the word "not". I am so used to thinking of CDATA locators as having an
"href" attribute that I assumed "idref" was of the type IDREF, even
though you, as you point out, very clearly stated that it was not.

> is set as a field and a search is performed in the app before a site.xml 
> 'page' can be deleted; if linked, the link must be removed/changed
> before the page can be deleted. If that somehow fails, the rendering XSL 
> simply does not render it as a link but as inline text.
> 
> The idref is used in rendering to create an 'always valid' html:a/@href 
> by traveling back up from the ref'd page in the site.xml hierarchy. This 
> means when a page or folder is moved and the site is regenerated the 
> html:a link is always valid for internal pages.
> 
> As for reuse, there is the option of putting a site_idref, but I 
> understand what you are writing and I do agree that it is a limitation. 
> I have not hit it as our current clients have no cross-site needs, but I 
> assume I eventually will. I just haven't found a better way to do it and 
> keeping the advantages.

Well, you have something that works now, and can figure out how to
implement something more powerful when you need it, I am sure.

It is far more dangerous to overdesign a system, and never get it off
the ground. Once I worked with a company that had been discussing a
sophisticated linking system for several years. They never got it to
fly. The system was born as HyTime and died as XLink. If they had been
less intent on covering every eventuality, they could probably have
built something that would have worked fairly well in a very short time.

> >>If the above is understandable (:-o), am I following bad or good practices?
> > 
> > 
> > Yes it is understandable. It is not the way I would have designed it,
> > but it works and it isn't too complex.
> 
> There is a little more to it :)

I am sure there is, but believe me, I have seen far, far worse.

> 
> > 
> > What I have against the design is that it locks out standard tools and
> > techniques. 
> 
> it can most definitely use standard tools. The client can pick up all 
> their assets (and generated HTML) and edit/generate their site on their 
> own using a number of different techniques -- it is very flexible. We 
> can provide an XSL that transforms the site.xml/topics.xml into a 
> Jakarta Ant build file which is used to generate the site. (I don't mind 
> sharing this, but have not had the time to create a sourceforge project 
> or the like. I would love it if it would become a standard :)

Yes. It was because of the IDREF misunderstanding I thought you would
have trouble validating documents. Sorry!

> 
>  > How does an XML editor validate an article?
> 
> The content pieces can be validated individually, the link validation is 
> separate. Any XML Schema validating editor can be used (even MSWord now 
> :). In our gui, we use MSXML in the browser for validation and its SOM 
> to give editors valid options on the client and occasional validation on 
> the server (xerces). I usually use Oxygen -- a really nice tool (even 
> supports RNG).

I haven't tried it yet, but Oxygen sounds interesting. A friend told me
it is a bit slow, but then again, it may be time for a computer upgrade
at work.

The clients I work with usually want Epic, which isn't my favourite
editor. I have strong leanings towards XMetaL, though the changes in the
development environment made in the latest version have prompted me to
have a look at other alternatives.

> 
> > For that matter,
> > how does an authoring application validate an IDREF link when the target
> > is in another document? 
> 
> as I mention above, it is done in a few ways: by using XSL and putting 
> it in a search index. I am definitely open to better ways to do this.

There are a few XLink validators out there. They might be useful. I
haven't investigated them too closely though. I work in large projects,
and there are always plenty of people around who love to mess with
standards. Since people can't even agree on what a URL looks like, it is
often necessary to write special purpose software to traverse links.
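
As a rough illustration of what such special purpose link checking can
look like, here is a minimal sketch in Python. It assumes the targets
are page elements in a site.xml and that the content uses
link/@page_idref, as described earlier in the thread. The "id"
attribute name on page, the sample documents and the function name
broken_links are assumptions made up for the example, not part of
anybody's actual system.

```python
import xml.etree.ElementTree as ET

# Hypothetical site structure; the "id" attribute name is an assumption.
SITE_XML = """
<site>
  <folder name="products">
    <page id="p1"/>
    <page id="p2"/>
  </folder>
</site>
"""

# Hypothetical content piece using link/@page_idref.
CONTENT_XML = """
<article>
  <para>See <link page_idref="p1">the overview</link> and
  <link page_idref="p9">a page that no longer exists</link>.</para>
</article>
"""


def broken_links(site_text, content_text):
    """Return the page_idref values in the content that match no page id."""
    site = ET.fromstring(site_text)
    known_ids = {page.get("id") for page in site.iter("page")}
    broken = []
    for link in ET.fromstring(content_text).iter("link"):
        idref = link.get("page_idref")
        # Links without a page_idref attribute are ignored here.
        if idref is not None and idref not in known_ids:
            broken.append(idref)
    return broken
```

Calling broken_links(SITE_XML, CONTENT_XML) reports the references
that have no matching page, which is the same information a report of
broken links would carry.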

> 
> The UI presents the user with a grouped (by things like /folder1/folder2 
> etc) dropdown of available site.xml pages. So if using our gui, they
> cannot put in an invalid internal link.
> 
> > How do you reuse content that is currently
> > published on this web site somewhere else, where the publishing
> > mechanism is entirely different? 
> 
> If the content piece has an internal link, and that link should not 
> point to the original site's page, then yes, it is a problem. I don't 
> know how to solve it and keep the benefits I get from doing it this way.

URNs might be part of the answer. With URNs you can name target
resources, instead of pointing to their location, as you do with a URL.

At some point you need to map the URN to a physical location. The
"traditional" way to do that is with a catalog file. For a complex
system, storing the mapping table in a database would probably be
better.

When a link changes, only the mapping table has to be updated.
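
A minimal sketch of such a mapping table in Python could look like
this. The dictionary stands in for the catalog file or database, and
the name URN_MAP, the function resolve and the target path are all
hypothetical. The lookup is a plain exact string match, which is a
simplification of how URNs should really be compared.

```python
# Hypothetical mapping table: URN -> physical location for one site.
# In practice this could be loaded from a catalog file or a database.
URN_MAP = {
    "urn:isbn:0-321-15078-3": "/reviews/0-321-15078-3.html",
}


def resolve(urn, mapping):
    """Return the location mapped to urn, or None if it is not mapped."""
    # Exact string match only; real URN comparison rules are more involved.
    return mapping.get(urn)
```

If the target moves, only the entry in URN_MAP changes. The documents
that use the URN stay untouched.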

This is one way to do it:

1. A document is authored using URNs as resource identifiers. Omitting
   several help attributes, a link in a paragraph could look something
   like this:

   <p><locator xlink:href="urn:isbn:0-321-15078-3"/> provides valuable
   insight into the theory behind the Agile methodologies.</p>

   (In a real application, you would probably have an xlink:title
   attribute that contains the title so that the link is easy to
   format in an editor.)

2. Before actually using the link it must of course be transformed into
   something a browser (or a formatting system) will understand.
   Assuming that the document will be stored on a web site, there are
   two alternatives:
   a. Process the document before storing it on the web site, replacing
      the URNs with appropriate URLs.
   b. Process the document on the fly, when it is served by the web
      server.
   In either case, an OO language, such as Java or Perl, is probably
   more suitable for doing this remapping than an XSLT stylesheet.
   Of course, if there are any additional transformations that are
   better handled by XSLT, it is easy enough to invoke an XSLT
   transformation from that language. (BTW, it is far better
   to invoke an XSLT stylesheet from the OO language than to do it
   the other way around. If you invoke the OO language from XSLT,
   you become dependent on extension mechanisms that may or may not
   be supported by other XSLT processors. It is better to keep the XSLT
   pure, and thus more portable.)

On one web site, the link could be mapped to a local URL containing a
review. On another website, the same link could be mapped to Amazon, or
some other bookstore. Nothing in the document changes. The document
could be served by completely different web applications, or by the same
one on different sites. It doesn't matter. Only the mapping table
changes.
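
The remapping in alternative 2a, with one mapping table per site, could
be sketched roughly like this in Python. The XLink namespace URI is the
standard one. The two mapping tables, the local review path, the
bookstore.example URL and the function name remap_links are assumptions
made up for the illustration. Links that are not in the table are left
as they are, which is just one possible policy.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = "{%s}href" % XLINK_NS

# Keep the conventional "xlink" prefix when the document is serialized.
ET.register_namespace("xlink", XLINK_NS)

DOC = (
    '<p xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<locator xlink:href="urn:isbn:0-321-15078-3"/> provides valuable '
    'insight into the theory behind the Agile methodologies.</p>'
)

# Hypothetical per-site mapping tables for the same URN.
SITE_A = {"urn:isbn:0-321-15078-3": "/reviews/0-321-15078-3.html"}
SITE_B = {"urn:isbn:0-321-15078-3": "https://bookstore.example/isbn/0321150783"}


def remap_links(xml_text, mapping):
    """Return xml_text with every mapped xlink:href URN replaced by its URL."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        href = element.get(XLINK_HREF)
        # Unmapped links are left unchanged.
        if href is not None and href in mapping:
            element.set(XLINK_HREF, mapping[href])
    return ET.tostring(root, encoding="unicode")
```

remap_links(DOC, SITE_A) and remap_links(DOC, SITE_B) produce two
different outputs from the same source document. Only the table passed
in differs.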

Of course, the above description is a little bit simplified... :-)

> 
> > Again, these considerations may not be
> > relevant for everyone, but they certainly are to me and the systems I
> > work with.
> > 
> > Most of the differences in our respective approaches, is probably due to
> > the fact that we work with different things. I am mainly concerned with
> > content creation, and you (it seems to me) with publishing. Also, we
> > deal with different kinds of information, a brochure and the technical
> > documents I work with are very different. It is only natural if we have
> > different perspectives and come up with widely different solutions to
> > similar problems.
> 
> We are trying to provide a solution to small/medium-size 
> businesses/departments which do not (in my experience) have a great need 
> for content reuse outside of their site. Their needs are usually pretty 
> basic and generally don't even know that they are using XML. They 

I am working mostly with fairly large industrial companies, such as car
manufacturers and telecommunications companies. Content reuse is the
bread and butter of their systems. For example, a car manufacturer may
want to provide a customized manual for every car that is built. Most of
the information remains the same from car to car, but some of it
changes. Keeping track of it all can be quite a challenge.

> occasionally login (we are an ASP CMS) to add some content, wordsmith or 
> add a folder/page. I tend to think of our business model similar to that 
> of a health club -- in that there is usually a flurry of activity
> initially, then it tapers off to popping in once a month or so.

Well, the projects I am involved in frequently last more than a year.
The health club metaphor does not quite fit them. I wish it did...

> 
> When I need to change the system (which I do rather frequently), I can 
> transform the config/content XML, perhaps transform the XSL, make the 
> appropriate modifications to the schema and validate everything to 
> ensure it will work properly. It allows us (my wife and I) to manage a 
> large number of projects very easily. I have found (over the last 3 
> years) using this approach to be very efficient.
> 

At my end, projects aren't managed as much as they are survived (or
not). There is ongoing work to make document production systems more
manageable, but the industry is slow to catch on. The tendency is to
adopt new technology, and then use it to do old things. Still, XML (and
SGML before that) has brought some great benefits in terms of lowered
cost, and sometimes even better quality.

/Henrik

Follow-Ups:
- RE: Expert's advice needed about XML Schema and defining some kind of relation
  - From: "A. Belkin" <arkbel@c...>

References:
- RE: Expert's advice needed about XML Schema and defining some kind of relation
  - From: Barwell Jonathan <Jonathan.Barwell@a...>
- Re: Expert's advice needed about XML Schema and defining some kind of relation
  - From: Michael Champion <mc@x...>
- Re: Expert's advice needed about XML Schema and defining some kind of relation
  - From: Robert Koberg <rob@k...>
- Re: Expert's advice needed about XML Schema and defining some kind of relation
  - From: Henrik Martensson <henrik.martensson@b...>
- Re: Expert's advice needed about XML Schema and defining some kind of relation
  - From: Robert Koberg <rob@k...>
