
Re: text to xml conversion

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: ycao5@scs.carleton.ca
  • Date: Tue, 02 Jun 2009 15:12:10 +1000

ycao5@scs.carleton.ca wrote:
>
> Hello everyone,
>
>     I want to ask a question about converting text to an XML file. Is
> there any way to attach a schema to a text document and parse it into
> XML according to the rules defined in the schema? Can I find such a
> tool? Otherwise I plan to write one myself. Please give me some
> references. Thanks.
There is one: SP, an open-source SGML parser from James Clark.

It parses data files using SGML configuration files and a schema, and
is suitable when the file contains wiki-style markup, CSV, or other
formats with explicit delimiters, but much less so for free-form data.
It is probably only worth using if you will have to do this kind of
thing many times.

See http://www.xml.com/lpt/a/1377 for an overview of this approach. SP
is industrial strength.

You could convert your XML Schema to an XML DTD, then decorate it with
the extra information that turns it into an SGML DTD, stating:

 1) Which delimiters in your text should be substituted for which tags
 2) In which contexts this recognition takes place
 3) Which tags won't have corresponding delimiters in your file and are 
allowed to be implied

The output is XML. SGML has many gotchas for new players, but if you
already know HTML, XML, and DTDs or XSD, they will be much easier to
cope with. (SGML, XML's precursor, got a bad reputation because people
needed to learn the equivalent of XML, bits of HTML, and this kind of
text-parsing system all at the same time.)
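To make points 1 to 3 concrete, here is a toy Python sketch of the same idea, roughly what SGML calls short reference maps plus tag omission. It is not SP and does not read an SGML DTD: the element names (table, row, cell), the function convert, and the two dictionaries SHORT_REFS (delimiter to tag, per context) and IMPLIED (tags with no delimiter of their own) are hypothetical, chosen here to turn simple comma-separated text into XML. Real SGML rules are much richer.

```python
from xml.sax.saxutils import escape

# (1) + (2), hypothetical: which delimiter maps to which tag, per context.
# Inside a row, "," ends the current cell and starts a new one;
# inside the table, a newline ends the current row.
SHORT_REFS = {
    "row": {",": "cell"},
    "table": {"\n": None},   # None: only close elements, start nothing
}

# (3), hypothetical: tags implied with no delimiter in the input.
# Text directly in a table implies a row; text in a row implies a cell.
IMPLIED = {"table": "row", "row": "cell"}


def convert(text: str, root: str = "table") -> str:
    """Turn delimited text into XML, driven only by the two tables above."""
    out, stack = [], []

    def start(name):
        out.append(f"<{name}>")
        stack.append(name)

    def end_until(name):
        # Emit omitted end tags until `name` is the innermost open element.
        while stack[-1] != name:
            out.append(f"</{stack.pop()}>")

    def find_context(ch):
        # Innermost open element whose map recognizes `ch`, or None.
        return next((el for el in reversed(stack)
                     if ch in SHORT_REFS.get(el, {})), None)

    start(root)
    for ch in text:
        context = find_context(ch)
        if context is None:
            # Open implied elements (e.g. <row><cell>) and look again:
            # a leading "," only becomes a delimiter once a row is open.
            while stack[-1] in IMPLIED:
                start(IMPLIED[stack[-1]])
            context = find_context(ch)
        if context is None:
            out.append(escape(ch))           # ordinary data character
        else:
            end_until(context)               # delimiter: close down to its context
            if SHORT_REFS[context][ch] is not None:
                start(SHORT_REFS[context][ch])
    end_until(root)
    out.append(f"</{root}>")
    return "".join(out)


print(convert("a,b\nc,d\n"))
# <table><row><cell>a</cell><cell>b</cell></row><row><cell>c</cell><cell>d</cell></row></table>
```

Keeping the delimiters and implied tags as data, rather than hard-coding CSV rules, is what makes the approach schema-driven: a different text format only needs different tables. Like SP, it still depends on explicit delimiters.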

I also wrote some software for this task that wasn't based on grammars:
it was called Psyche, in Java, and Micah Dubinko also made an
implementation of it (for .NET?), but we never released them. If there
is interest I could drag it out again; it also requires delimiters.

Cheers
Rick Jelliffe

