
  • From: David Carlisle <davidc@n...>
  • To: jon@n...
  • Date: Sat, 6 Oct 2007 21:30:13 +0100


> Several of us involved with Distributed Proofreaders and Project
> Gutenberg are analyzing a number of TEI documents representing PG
> etexts.

If you know that the documents are all valid against the full TEI DTD,
then you are in a much stronger position than if you were trying to infer
a DTD from a set of instances alone. You can essentially use a simple
XPath expression (or probably just Perl) to get a list of the element and
attribute names used in your instances, then take the TEI DTD and delete
any references to elements not used in your instances. You may have to
make a few small manual changes to keep the resulting grammar
deterministic, but basically you are done.
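A minimal sketch of the name-collecting step follows. It uses Python's standard ElementTree rather than XPath or Perl. It assumes the instances are well-formed XML files that use only the predefined entities or numeric character references; ElementTree does not read the external TEI DTD, so entities declared there (for example named character entities) would make parsing fail. The helper names `collect_usage` and `local_name` are hypothetical, chosen for illustration. For each element name, the sketch records the attribute names seen on it, and it strips any namespace so that un-namespaced TEI P4 instances and namespaced P5 instances yield comparable names.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical helpers; assumes well-formed XML without DTD-defined entities.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def local_name(name):
    """Strip a '{namespace}' prefix, keeping 'xml:' for the XML namespace."""
    if name.startswith(XML_NS):
        return "xml:" + name[len(XML_NS):]
    if name.startswith("{"):
        return name.split("}", 1)[1]
    return name


def collect_usage(sources):
    """Map each element name used in the sources to the attribute names seen on it.

    Each source may be a file path or a readable file-like object.
    """
    usage = defaultdict(set)
    for source in sources:
        for elem in ET.parse(source).iter():
            if not isinstance(elem.tag, str):
                continue  # defensive: skip comment/PI nodes if a parser keeps them
            attrs = usage[local_name(elem.tag)]
            attrs.update(local_name(a) for a in elem.attrib)
    return usage


if __name__ == "__main__":
    # Print one line per element: the element name, then its attributes.
    for name, attrs in sorted(collect_usage(sys.argv[1:]).items()):
        print(name, " ".join(sorted(attrs)))
```

Run it as, say, `python collect_names.py *.xml` (the script name is hypothetical). Each output line lists an element followed by the attributes used on it, and anything declared in the TEI DTD but absent from this list is a candidate for deletion.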

Alternatively, perhaps you could just get the TEI Pizza Chef to bake you
a small DTD:
http://www.tei-c.org/pizza.html

David


________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.

________________________________________________________________________



