
RE: Multi-lingual experiment - a call for action

  • From: "Laurent Bossavit" <laurent@m...>
  • To: Xml-Dev <xml-dev@x...>
  • Date: Mon, 17 Apr 2000 17:44:47 +0200

Didier wrote :

> It makes sense to have a DTD in each language so that people can experiment
> translating from one language to another. Do not forget that the experiment
> is about the result of a database. 

I suspect that we have a slight disagreement though about what we 
mean by 'translation'. It seems perfectly natural to me to want to 
translate a document's character content, that content being a 
necessary and crucial part of its meaning.

But as long as we restrict ourselves to a DTD, rather than using - 
say - Schema, isn't "translating" the DTD more a matter of 
*structure* than one of content ? Documents with different DTDs will 
necessarily be of different types; we can *transform* one such type 
to another, but can we say that in doing so we have performed a 
translation ?
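
To make the distinction concrete, here is a minimal Python sketch of such a transformation. The FR_TO_EN table and the rename_elements helper are hypothetical names invented for illustration; the point is only that renaming elements changes the document's type while leaving the character content untranslated.

```python
import xml.etree.ElementTree as ET

# Hypothetical French-to-English vocabulary map (an assumption for this
# sketch); a DTD itself offers no place to record such a mapping.
FR_TO_EN = {"objet": "item", "titre": "title"}

def rename_elements(root, mapping):
    """Return a copy of the tree with element names replaced per `mapping`."""
    renamed = ET.Element(mapping.get(root.tag, root.tag), dict(root.attrib))
    renamed.text = root.text
    renamed.tail = root.tail
    for child in root:
        renamed.append(rename_elements(child, mapping))
    return renamed

french = ET.fromstring("<objet><titre>Introduction à STL</titre></objet>")
english_shape = rename_elements(french, FR_TO_EN)

# The structure now uses the English vocabulary, but the content is still French.
print(ET.tostring(english_shape, encoding="unicode"))
# prints: <item><title>Introduction à STL</title></item>
```

The result would validate against an English DTD, yet nothing in it records that "titre" and "title" mean the same thing.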

My answer is no - because the essence of 'translating' is not only to 
map "STL Tutorial" to "Une introduction à STL" but also to say that a 
"title" in English is the same thing as a "titre" in French. This 
can't be done, as far as I can tell, with a DTD since it does not 
have the means of expressing equivalence between structural 
vocabularies. This is why I think this problem would make a perfect 
test case for a more sophisticated schema language such as XML Schema.

> a) Example: translate from an XML document encoded with a French DTD into a
> new XML document encoded in German for trading. Should I mention here that
> this matter of fact will happen with a high probability mainly for exchange
> and trade within the European community.

I would argue that an important requirement here would be that either 
the French or the German version of such a document should pass 
validation by the same parser.

If you're hinting at a sort of "folder" of documents where one 
"multilingual" element could be the parent of a number of subelements 
each representing a different language version of the "same" content, 
then it seems to me that it would be desirable that each such 
subelement be, structurally, *equivalent* to any other, even if 
element names should differ.

Example :
   <versions>
      <objet xml:lang="fr">
         <titre>Introduction à STL</titre>
      </objet>
      <item xml:lang="en">
         <title>STL Tutorial</title>
      </item>
   </versions>

With a (very partial) Schema as follows (if I understand Schema at 
all, that is, which might be far from the case...) :
<schema targetNamespace="http://yo.com/polyglot">
  <element name="T" type="T" abstract="true"/>
  <element name="titre" equivClass="T"/>
  <element name="title" equivClass="T"/>
  <element name="O" type="O" abstract="true"/>
  <element name="objet" equivClass="O"/>
  <element name="item" equivClass="O"/>
</schema>

In this case a single XSL transform expressed with (say) French 
element names could be used to output the French version of any 
<objet> contained within a <versions> folder, even if this <objet> is 
in fact an <item>... A rose by any other name, etc. (An interesting 
question is how the equivalence classes themselves should be named; 
maybe Esperanto...)
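
The same idea can be sketched outside of Schema and XSL. The Python below is only an illustration under assumptions: the EQUIV_CLASS table stands in for the equivalence classes T and O, and the shape and title_of helpers are hypothetical names. It checks that two language versions are structurally equivalent and reads the title of either through one piece of code.

```python
import xml.etree.ElementTree as ET

# Hypothetical equivalence classes, mirroring T and O in the schema sketch.
EQUIV_CLASS = {"titre": "T", "title": "T", "objet": "O", "item": "O"}

def shape(element):
    """Reduce an element to its structure, using class names instead of tag names."""
    return (EQUIV_CLASS.get(element.tag, element.tag), [shape(c) for c in element])

def title_of(obj):
    """Return the text of the first child in class T, whatever its tag is called."""
    for child in obj:
        if EQUIV_CLASS.get(child.tag) == "T":
            return child.text
    return None

folder = ET.fromstring(
    "<versions>"
    "<objet xml:lang='fr'><titre>Introduction à STL</titre></objet>"
    "<item xml:lang='en'><title>STL Tutorial</title></item>"
    "</versions>"
)
french, english = list(folder)

print(shape(french) == shape(english))   # prints: True
print([title_of(v) for v in folder])     # prints: ['Introduction à STL', 'STL Tutorial']
```

Here title_of never mentions "titre" or "title" directly, which is the property the single transform above relies on.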

> Please, use the accents since French includes accents. If I show you a
> Japanese DTD (unfortunately most mails won't be able to decode UTF-8
> Japanese characters) you'll notice that the elements are full Japanese words
> _not_ cut back ones. So please, include the accents so that it is French not
> a language between two chairs. If we speak of multi-lingual let's be
> multi-lingual. Anyway, don't bother, I'll add them.

Yeah, accents seem to be allowed - looks like I read the spec wrong. 
Excerpted from the XML 1.0 spec:

[45]  elementdecl ::=  '<!ELEMENT' S Name S contentspec S? '>' 
[5]  Name ::=  (Letter | '_' | ':') (NameChar)* 
[84]  Letter ::=  BaseChar | Ideographic 
[4]  NameChar ::=  Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender 

Then again, does the above mean an element name can't *start* with a 
diacritic ? That would rule out "éditeur"... It's all spelled out in 
the spec but I haven't gotten around to learning Unicode yet - I know 
I should !
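
Going by the productions excerpted above, a Name must begin with a Letter, '_' or ':', and CombiningChar appears only in NameChar. The precomposed é (U+00E9) falls inside a BaseChar range in Appendix B of the XML 1.0 spec, so it counts as a Letter; a bare combining acute accent (U+0301) is a CombiningChar and could not come first. The sketch below checks this in Python against a partial, Latin-only excerpt of those tables; the helper names are hypothetical and the ranges are deliberately incomplete.

```python
import unicodedata

# Partial, Latin-only excerpts of the XML 1.0 Appendix B character classes.
# The full BaseChar and CombiningChar tables are much longer.
BASECHAR_LATIN = [(0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF)]
COMBINING_DIACRITICS = [(0x0300, 0x0345)]

def in_ranges(ch, ranges):
    """True if the code point of `ch` lies in any (low, high) range."""
    return any(low <= ord(ch) <= high for low, high in ranges)

def can_start_name(ch):
    """True if `ch` may begin a Name, judged against the partial tables above."""
    return ch in "_:" or in_ranges(ch, BASECHAR_LATIN)

precomposed = unicodedata.normalize("NFC", "éditeur")  # starts with U+00E9
decomposed = unicodedata.normalize("NFD", "éditeur")   # starts with 'e' + U+0301

print(can_start_name(precomposed[0]))                  # prints: True
print(can_start_name(decomposed[0]))                   # prints: True
print(in_ranges(decomposed[1], COMBINING_DIACRITICS))  # prints: True
print(can_start_name("\u0301"))                        # prints: False
```

So "éditeur" is acceptable as an element name in either form: the accent is either part of a precomposed Letter or a combining character that follows the initial 'e'.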

Laurent Bossavit     -     Ingénieur R&D
>>>        laurent@m...        <<<
>>            ICQ#39281367            <<
MultiMania     http://www.multimania.fr/

This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/

