Re: collection() and uncommon file extensions
You were right, I had a typo in addition to the file type issue. It
works fine if I switch file extensions to .xml.
Thanks indeed,
Martin
On 2018-11-15 2:52 p.m., Martin Holmes gtxxgm-xsl-list-2@xxxxxxxxxxx wrote:
I'm actually encountering the same problem if I change the extensions
from .hocr to .xml, so there's definitely something odd going on here.
The files are definitely well-formed and appear to be valid. They start
like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
....
Do you see anything that would prevent Saxon 9.9 HE from parsing these
as XML? I've come across another situation where files were recognized
as XML if they had the XML declaration but not if they didn't (those
were SVG), but these fail even with the declaration.
Cheers,
Martin
On 2018-11-15 12:59 p.m., Michael Kay mike@xxxxxxxxxxxx wrote:
Everything about the collection() function is very
implementation-specific, so this is really a Saxon question rather
than an XSLT question. (And no, there are no plans to define standards
in this area, though it would be nice.)
The way you are going about it looks right to me. It's probably
failing because of some detail that you didn't realise was important.
I know it's difficult to put together a repro for this kind of problem
but that's really what we need.
Around 40 years ago I worked with an operating system that knew the
content type of each file. Shame the idea didn't catch on.
Michael Kay
Saxonica
On 15 Nov 2018, at 19:32, Martin Holmes gtxxgm-xsl-list-2@xxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi all,
The recent changes to XPath
(https://www.w3.org/TR/xpath-functions-31/#func-collection) have
introduced the capability for the collection() function to retrieve
non-XML documents as well as XML documents. However, that has broken
some processes I have where XML documents with different extensions
are being retrieved. For instance, where this:
collection('dir/?*.hocr')
used to happily retrieve and parse HOCR files (which are actually
XHTML), Saxon now treats these files as xs:base64Binary items, and
won't parse them, even though they have XML declarations.
I know that the recommended approach to dealing with this is to use a
Saxon configuration file to register the file extension -- which I
presume would be done like this:
<resources>
  <fileExtension extension="hocr" mediaType="text/xml"/>
</resources>
However, this doesn't seem to work for me -- do I have that syntax
wrong?
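For context, a fragment like the one above only takes effect as part of a complete configuration file. The following is a minimal sketch of such a file, not a confirmed fix: it assumes the Saxon configuration namespace http://saxon.sf.net/ns/configuration and the HE edition, and the file name saxon-config.xml is hypothetical. The media type is kept as text/xml from the fragment above; whether application/xml would behave differently is not established in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file name: saxon-config.xml -->
<!-- ASSUMPTION: edition="HE" matches the Saxon 9.9 HE mentioned in this thread -->
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="HE">
  <resources>
    <!-- Maps the .hocr extension to an XML media type so that
         collection() can treat matching files as XML documents -->
    <fileExtension extension="hocr" mediaType="text/xml"/>
  </resources>
</configuration>
```

A file like this would normally be passed to Saxon on the command line with the -config:saxon-config.xml option. If the file is not actually being loaded, the fileExtension mapping has no effect, which is one thing worth checking when the mapping appears not to work.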
Also, the conf file approach isn't easily portable, so I'm wondering
if there are any plans to enable the media type to be specified on
the collection() function itself, or to be registered in an XSLT
document somehow?
Cheers,
Martin
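On the portability question, one possibility is to state the media type in the collection URI itself rather than in a configuration file. The stylesheet below is only a sketch under assumptions: it assumes Saxon accepts select and content-type as query parameters on a directory collection URI, separated by a semicolon, which should be checked against the documentation for the Saxon release in use. The directory name dir/ comes from the example above; the messages are purely illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point for running without a source document (e.g. with -it) -->
  <xsl:template name="xsl:initial-template">
    <!-- ASSUMPTION: 'select' and 'content-type' are Saxon-specific
         query parameters; they are not defined by the XPath specification -->
    <xsl:for-each
        select="collection('dir/?select=*.hocr;content-type=application/xml')">
      <!-- Report whether each item came back as a parsed document
           or as something else (such as xs:base64Binary) -->
      <xsl:message select="
          if (. instance of document-node())
          then 'Parsed as XML: ' || base-uri(.)
          else 'Not parsed as XML'"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Reporting the item type this way also gives a small, self-contained repro: it shows directly whether the files are being returned as document nodes or as binary items.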