
RE: plea for help...

Subject: RE: plea for help...
From: Mike Ferrando <mikeferrando@xxxxxxxxx>
Date: Thu, 9 Mar 2006 14:31:08 -0800 (PST)
Wendell,
I attended.

It was very well done. A great help for beginners as well as good
insights for those with lots of battle scars.

Thanks,
Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454

--- Wendell Piez <wapiez@xxxxxxxxxxxxxxxx> wrote:

> Walter,
> 
> At Mulberry we recently gave a seminar on the topic of converting 
> HTML to XML, so the issues are fresh in my mind.
> 
> You're facing a fairly complex set of problems, but they can be 
> simplified (as you are discovering) by distinguishing between
> 
> A. The syntactic conversion of HTML to XML
> B. The "semantic" conversion from HTML display-oriented tagging to a
>    stronger form of tagging in XML.
> 
> Other contributors have posted links to tools that help you with job
> A -- Tidy and its ilk -- and it appears you've got a handle on that.
> This work can be largely or entirely automated. Of course, what you
> get out the other end is still HTML tagging, albeit in XML syntax
> (it'll be either valid XHTML or a similar XML-compliant HTML), so, as
> you're finding, it's not good to go for everything you might do with
> well-designed XML markup. But having it in XML syntax is already a
> big step, because you can then use more and better tools on it to
> take it the rest of the way -- including (which is why the question
> isn't totally off topic here) XSLT.
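Job A can be approximated in a few lines, although Tidy or a similar tool is the right choice in practice. The following is a minimal sketch using only Python's standard `html.parser`, assuming the only goal is well-formed output. It quotes every attribute, self-closes void elements such as `br` and `img`, escapes text, drops stray end tags, and closes anything still open. Comments and DOCTYPEs are simply discarded. It does not know HTML's implicit-closing rules (a second `li` or `p` should close the first), so the result is XML-conformant but can still be structurally wrong. The names `WellFormer` and `to_xml` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never have content; XML needs them self-closed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class WellFormer(HTMLParser):
    """Re-serialize tag soup as well-formed XML (a crude stand-in for Tidy)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open (non-void) elements

    def _start(self, tag, attrs, empty):
        # Quote every attribute; a minimized one like <td nowrap> becomes nowrap="nowrap".
        parts = [tag] + [f'{k}="{escape(v if v is not None else k)}"' for k, v in attrs]
        self.out.append("<" + " ".join(parts) + ("/>" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self._start(tag, attrs, True)
        else:
            self._start(tag, attrs, False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag (or </br>) with no open element: drop it
        # Close any elements left open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape &, < and >.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_xml(html_text):
    """Return a well-formed (but not necessarily sensible) XML serialization."""
    parser = WellFormer()
    parser.feed(html_text)
    return parser.close()
```

For example, `to_xml('<ul><li>a<li>b</ul></div>')` returns `<ul><li>a<li>b</li></li></ul>`: well-formed, yet the second item ends up nested inside the first, which is the kind of messy tagging that survives a purely syntactic cleanup.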
> 
> To do conversion B, however, is an entirely different kettle of fish
> -- and it is beyond the scope of this list, I'm afraid.
> 
> As long as I'm already on it, however, I am willing to comment that
> the scope and difficulty of conversion B are directly related both to
> the quality of tagging in your source (HTML can be "clean" or
> "dirty", consistent or messy, even after it's made XML-conformant in
> its syntax) and, most dramatically, to the nature of your target tag
> set and to the feasibility of mapping from the HTML you have to this
> target.
> 
> Sometimes this conversion can be automated; sometimes it can be
> mostly automated; often it requires a good measure of attention from
> human beings to determine how things should be converted in any
> given case.
> 
> The design of that target markup, however, is critical; this factor
> alone can make or break your project. There is an infinity of things
> potentially expressible in XML that a machine, even one programmed
> with very sophisticated heuristics, will not know how to tag
> correctly, even when it's starting with some kind of HTML tagging.
> 
> Accordingly, successful efforts at this kind of conversion generally
> include both designing that format up front and controlling its
> design carefully. Design it to concrete requirements, not just to
> what you think might be useful or fun to have some day, and don't be
> over-ambitious. You can't convert to a target you can't see. But if
> you have a design, the places where conversion is easy or difficult
> will fairly quickly come to light, and you can figure out how to
> deal with them.
> 
> I think someone suggested earlier that you prototype this before
> attempting the full conversion. That's very good advice.
> 
> There are also professionals who will gladly share their experience
> in this area, if you are in a position to save money over the long
> term by investing it intelligently in the near term.
> 
> Good luck,
> Wendell
> 
> At 11:52 AM 3/9/2006, you wrote:
> 
> >On Wed, March 8, 2006 5:28 pm, Florent Georges wrote:
> > > Walter Torres wrote:
> > >
> > >
> > >> 1) convert HTML into well-formed HTML (many are not)
> > >> 2) convert well-formed HTML into XHTML
> > >>
> > >
> > > Tidy HTML will give you XHTML from HTML.
> >
> >Yes, just found it late last night. Been playing with it all morning.
> >
> >Getting it to work in PHP5 is what I'm focusing on now.
> >
> >
> > >> 3) convert xHTML into XML
> > >>
> > >
> > > An XHTML instance is already an XML instance.
> >
> >Yes, I understand that.
> >
> >But I'm trying to get this to "pure" XML, with no display-oriented
> >markup whatsoever!
> >
> >The idea here is to have as "raw/naked" a file as possible; that way
> >any system can display it as it sees fit.
> >
> >
> > > If you want to translate the instance from XHTML to another XML
> > > document type, XSLT may be of great help.
> >
> >Sure, that way I can create a look for website A that is different
> >from website B, then create text-only or RTF output, e-mail text or
> >HTML, or even output for a web phone.
> >
> >This is why I was asking how different folks handle this kind of
> >content: what kind of markup it contains, etc.
> >
> >
> > >> 4) create XSLTs to transpose XML back to HTML for page display
> > >
> > > Here again, XSLT may be of great help.
> >
> >Right.
> >
> >Thanks
> >
> >Walter
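Walter's step 4 is the single-source idea: one neutral document, several renderings. The thread recommends XSLT for this. The version here uses Python's standard `xml.etree.ElementTree` instead, only to show the shape of the idea. It assumes a hypothetical neutral vocabulary (`doc`, `title`, `para`, `emphasis`, `list`, `item`) and hypothetical function names. Each output format is just a different set of rules applied to the same tree, so website A, website B and a plain-text e-mail edition differ only in their rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical presentation rules for one site: neutral element -> HTML element.
HTML_RULES = {"doc": "div", "title": "h1", "para": "p",
              "emphasis": "em", "list": "ul", "item": "li"}


def to_html(elem):
    """Build an HTML element tree from the neutral XML, rule by rule."""
    # Elements without a rule fall back to a generic span.
    out = ET.Element(HTML_RULES.get(elem.tag, "span"))
    out.text = elem.text
    for child in elem:
        converted = to_html(child)
        converted.tail = child.tail  # keep the text that follows the child
        out.append(converted)
    return out


def to_text(doc):
    """Render the same XML as plain text: one line per top-level block."""
    lines = []
    for block in doc:
        text = "".join(block.itertext()).strip()
        lines.append(text.upper() if block.tag == "title" else text)
    return "\n".join(lines)
```

Given `<doc><title>News</title><para>Hello <emphasis>world</emphasis>.</para></doc>`, `to_html` produces `<div><h1>News</h1><p>Hello <em>world</em>.</p></div>`, while `to_text` produces "NEWS" and "Hello world." on two lines.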
> 
> 
>
> ======================================================================
> Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
> Mulberry Technologies, Inc.                http://www.mulberrytech.com
> 17 West Jefferson Street                    Direct Phone: 301/315-9635
> Suite 207                                          Phone: 301/315-9631
> Rockville, MD  20850                                 Fax: 301/315-8285
> ----------------------------------------------------------------------
>    Mulberry Technologies: A Consultancy Specializing in SGML and XML
> ======================================================================
> 
> 


