Re: creating tags around a string

Subject: Re: creating tags around a string
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Tue, 18 Apr 2006 12:38:06 -0400
Troy,

Listen to Jon. (Except when he says listen to me. Everything I say should be taken with salt. I have been known to crack jokes.)

Since INX is an XML-based format, you can and should consider your handling of it to fall well within the range of garden-variety XML processing. Thus no "tag-writing" techniques are needed. You can and should go with the straight stuff.

However, as Jon says, that doesn't really solve your problem. That's because you are dealing with what we call an "upconversion", namely a transformation that goes "up hill", adding information to the source that was not there to begin with.

That is, in order to go from

<AUTHOR>Al Stick, Tom She, Dick Burg, and Harry Ward</AUTHOR>

to something richer and more useful like

<AUTHOR>
  <name><fname>Al</fname> <lname>Stick</lname></name>,
  <name><fname>Tom</fname> <lname>She</lname></name>,
  <name><fname>Dick</fname> <lname>Burg</lname></name>, and
  <name><fname>Harry</fname> <lname>Ward</lname></name>
</AUTHOR>

or even to something intermediate like

<AUTHOR>
  <fname>Al</fname> <lname>Stick</lname>,
  <fname>Tom</fname> <lname>She</lname>,
  <fname>Dick</fname> <lname>Burg</lname>, and
  <fname>Harry</fname> <lname>Ward</lname>
</AUTHOR>

your process has to be able to do more than split up strings and wrap the substrings in tags (or more properly, insert them in elements). Ultimately, it has to be able to recognize what's a name, what's an "fname" and what's an "lname".

These are non-trivial operations, which is why thinking up the realistic and all-too-common complex cases is an important part of this task. Jon suggested "Anne Marie Scott", which takes a form you'll see in almost any list of names. Then there's "Mishima Yukio" (Japanese, like many other languages, places the family name first) or "George Noel Gordon, Lord Byron" (not two names but one, and you'll have to specify how it should be tagged).

XSLT 1.0 was not designed for upconversion, so you'll find even straightforward string-wrapping operations (which I see now was the essence of your original question) to be rather gnarly and difficult. It is a common problem, though, and can therefore be handled using publicly available code.
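
To give a sense of the gnarliness, here is a minimal XSLT 1.0 sketch of the string-wrapping part alone. It assumes the names are separated by ", " with an optional "and " before the last one, as in the sample; the template name "wrap-names" and its "list" parameter are invented for illustration. It wraps each item in a <name> element and does not attempt fname/lname at all.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="AUTHOR">
    <AUTHOR>
      <xsl:call-template name="wrap-names">
        <xsl:with-param name="list" select="normalize-space(.)"/>
      </xsl:call-template>
    </AUTHOR>
  </xsl:template>

  <!-- Hypothetical helper: peels one comma-separated item off the
       front of $list per call, then recurses on the remainder. -->
  <xsl:template name="wrap-names">
    <xsl:param name="list"/>
    <xsl:choose>
      <!-- More items follow: wrap the first, keep the separator, recurse -->
      <xsl:when test="contains($list, ', ')">
        <name><xsl:value-of select="substring-before($list, ', ')"/></name>
        <xsl:text>, </xsl:text>
        <xsl:call-template name="wrap-names">
          <xsl:with-param name="list" select="substring-after($list, ', ')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Last item, introduced by "and" -->
      <xsl:when test="starts-with($list, 'and ')">
        <xsl:text>and </xsl:text>
        <name><xsl:value-of select="substring-after($list, 'and ')"/></name>
      </xsl:when>
      <!-- Last (or only) item -->
      <xsl:otherwise>
        <name><xsl:value-of select="$list"/></name>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Recursion stands in for the loop and the tokenizer that XSLT 1.0 lacks. Note that a list without commas, such as "A and B", falls through to the last branch and comes out as a single name.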

XSLT 2.0 is much better at this, and since you already appear to be using XPath 2.0 constructs, I'd recommend you look further into the tokenize() function along with XSLT 2.0 regular expressions.
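
As a rough sketch of the mechanics only, assuming an XSLT 2.0 processor and the same ", " / "and" separators as the sample, xsl:analyze-string can keep the separators while tokenize() splits each name on whitespace. The rule used here (last word is the lname, everything before it is the fname) is an assumption made for illustration, not a recommendation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="AUTHOR">
    <AUTHOR>
      <!-- The regex matches the separators: a comma with an optional
           "and" after it, or a bare " and " between two names. -->
      <xsl:analyze-string select="normalize-space(.)"
                          regex=",\s*(and\s+)?|\s+and\s+">
        <xsl:matching-substring>
          <!-- Copy the separator through unchanged -->
          <xsl:value-of select="."/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Everything between separators is treated as one name -->
          <xsl:variable name="parts" select="tokenize(., '\s+')"/>
          <name>
            <xsl:if test="count($parts) gt 1">
              <fname>
                <xsl:value-of select="$parts[position() lt last()]"
                              separator=" "/>
              </fname>
              <xsl:text> </xsl:text>
            </xsl:if>
            <lname><xsl:value-of select="$parts[last()]"/></lname>
          </name>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </AUTHOR>
  </xsl:template>
</xsl:stylesheet>
```

On the sample AUTHOR element this yields the richer form shown above. On the hard cases it simply guesses: "Anne Marie Scott" gets "Anne Marie" as its fname, "Mishima Yukio" is tagged backwards, and "George Noel Gordon, Lord Byron" is split into two names.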

But due to the deeper issues, which have to do not with the mechanics of string-wrapping but with semantic inferencing (getting the processor to discriminate between the parts of your complex cases and label them correctly), my feeling is that this is not wise even to attempt without a clear-eyed assessment of the difficulties and limitations. This is one of those cases where half a solution is often worse than none, since it creates expectations that are then bound to be disappointed.

This is probably why Jon also urges that you push the problem back upstream. The people creating this data are in a much better position to tag it fully and correctly to begin with. Short of that, you may find a manual or semi-automated method is less painful than a broken automated process that only creates bad code, which must then be corrected by hand.

Good luck,
Wendell

At 11:53 AM 4/18/2006, Jon wrote:
On 4/18/06, TGolshan@xxxxxxxxxxxx <TGolshan@xxxxxxxxxxxx> wrote:
> Wendell,
>
> Thanks for the insight. Perhaps I need to explain myself a little more.

I'd recommend paying attention to Wendell.  He addressed at least one
of your problems.  You need to think about generating elements, not
"tags".  The code is a bit clearer when you do:

<fname><xsl:value-of select="." /></fname>

instead of

<xsl:text>&lt;fname&gt;</xsl:text>
<xsl:value-of select="."/>
<xsl:text>&lt;/fname&gt;</xsl:text>



>I am taking an InDesign inx file and trying to build some structure (ie an
> XML document) that I can then use later. I am working with an army of
> editors who will not style first or last name in InDesign. They will
> however style every name as author, so my inx file looks like this:
>
> <AUTHOR>Al Stick, Tom She, Dick Burg, and Harry Ward</AUTHOR>
>
> and I want to add <fname> and <lname> elements to the mix.
>
> What is the best way to do this? I wrote the below function but realize
> that this is difficult at best.

The reason you're not necessarily getting a ton of help on your
question is that it's a lot deeper and more complex than any simple
trick with XSLT.  This mailing list is concerned with XSLT, while your
problem is more a fundamental problem with markup systems and
publishing....


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
