
Re: Who needs XHTML Namespace?

  • From: Walter Underwood <wunder@i...>
  • To: Paul Prescod <paul@p...>, xml-dev <xml-dev@i...>
  • Date: Wed, 01 Sep 1999 10:52:42 -0700

At 07:31 AM 9/1/99 -0400, Paul Prescod wrote:
>David Megginson wrote:
>> 
>> Paul Prescod writes:
>> 
>>  > What is the virtue in discovering XHTML data in an arbitrary
>>  > document if there are *no rules* about what that information will
>>  > look like? Are you really going to write processors that do not
>>  > care whether images occur within titles or tables within images?
>> 
>> Sure -- a search engine is a very good example of one.
>
>Really? Search engines don't care whether <title>s have images in them?
>Or whether <h1>'s have <table>'s in them? I'm sure that there are some
>that don't but I'm equally sure that there are some that do.

Ours doesn't. It recognizes some tags as a place to break sentences
for natural language processing, and it looks for the first undecorated
text in the document to use as a summary. It also saves text from
inside an <a> tag to index with the referenced document (no, Google
didn't do it first).

But it doesn't care whether <title> has an image, or which kind of
sentence-breaking tag is used (<p>, <blockquote>, <td>, ...).
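
The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual indexer: the tag sets, the reading of "undecorated" as "not inside a title, heading, emphasis, or anchor tag", and the names PageScanner and scan_page are all hypothetical.

```python
from html.parser import HTMLParser

# Assumption: the post names only <p>, <blockquote>, <td>; <title> and <h1>
# are added here so headings do not run into the following text.
BREAK_TAGS = {"p", "blockquote", "td", "title", "h1"}
# Assumption: one possible reading of "decorated" text.
DECORATION_TAGS = {"title", "h1", "h2", "h3", "b", "i", "a"}


class PageScanner(HTMLParser):
    """Collects sentence chunks, a summary, and anchor text from one page."""

    def __init__(self):
        super().__init__()
        self.sentences = []     # text chunks split at sentence-breaking tags
        self.summary = None     # first text found outside any decoration tag
        self.anchors = []       # (href, link text) pairs for the target's index
        self._current = []      # text collected since the last break
        self._decorated = 0     # how many decoration tags are currently open
        self._href = None       # href of the <a> being read, if any
        self._anchor_text = []

    def _flush(self):
        text = " ".join(" ".join(self._current).split())
        if text:
            self.sentences.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self._flush()
        if tag in DECORATION_TAGS:
            self._decorated += 1
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self._flush()
        if tag in DECORATION_TAGS and self._decorated > 0:
            self._decorated -= 1
        if tag == "a" and self._href is not None:
            link_text = " ".join(" ".join(self._anchor_text).split())
            self.anchors.append((self._href, link_text))
            self._href = None

    def handle_data(self, data):
        if not data.strip():
            return
        self._current.append(data)
        if self._href is not None:
            self._anchor_text.append(data)
        if self.summary is None and self._decorated == 0:
            self.summary = " ".join(data.split())


def scan_page(html):
    """Parse one HTML string and return the populated scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    scanner._flush()  # keep any text left after the last break
    return scanner
```

Nothing in the sketch checks what a tag contains: an <img> inside <title> is simply ignored, and every tag in BREAK_TAGS is treated alike.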

Hmm, the "strict" variant makes looking for undecorated text
more difficult. I doubt that we'll interpret stylesheets in
order to index text. So anybody who wants to use "strict" had
better be ready to put in "description" meta tags.
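
As a rough illustration of that fallback, the Python sketch below reads a "description" meta tag and uses it only when no summary could be taken from the body text. The names DescriptionFinder and pick_summary, and the choice to prefer body text over the meta tag, are assumptions made for the example, not a description of how any particular indexer works.

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Records the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        # Attribute names arrive lowercased; the value of "name" may not be.
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def pick_summary(html, body_summary):
    """Return body_summary if there is one, else the meta description (or None)."""
    if body_summary:
        return body_summary
    finder = DescriptionFinder()
    finder.feed(html)
    finder.close()
    return finder.description
```

A page author using "strict" would then supply something like <meta name="description" content="..."> in the document head.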

wunder

--
Walter R. Underwood
wunder@i...
wunder@b... (home)
http://software.infoseek.com/cce/ (my product)
http://www.best.com/~wunder/
1-408-543-6946



