
  • From: "Gavin Nicol" <gtn@r...>
  • To: "Arjun Ray" <arayq2@g...>, xml-dev <xml-dev@l...>
  • Date: Tue, 20 Jul 2021 14:32:57 -0400

FWIW, the general trend is away from ETL toward ELT or LET, with type projection being part of the 'T'... certainly in a lot of the more 'loose' integrations. This is a use case where JSON is actually very convenient (vs. XML, for example): a common pattern is to stream small chunks of JSON into a 'database' and then do ad-hoc extraction/transformation as part of report generation.
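
To make the load-then-transform pattern concrete, here is a minimal sketch in Python. It assumes SQLite stands in for the 'database'; the table name `raw_events`, the field names, and the sample records are made up for illustration. The chunks are stored verbatim as text, and types are only projected onto them when the report runs.

```python
import json
import sqlite3
from datetime import date

# Load: store each JSON chunk verbatim, with no schema applied up front.
# (raw_events and the sample fields below are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (body TEXT NOT NULL)")

chunks = [
    '{"id": "a1", "amount": "12.50", "when": "2021-07-20"}',
    '{"id": "a2", "amount": "7.25", "when": "2021-07-21"}',
]
conn.executemany("INSERT INTO raw_events (body) VALUES (?)", [(c,) for c in chunks])


def report(conn):
    """Transform at read time: project types onto the stored text."""
    rows = []
    for (body,) in conn.execute("SELECT body FROM raw_events ORDER BY rowid"):
        doc = json.loads(body)
        rows.append({
            "id": doc["id"],
            "amount": float(doc["amount"]),           # string -> float
            "when": date.fromisoformat(doc["when"]),  # string -> date
        })
    return rows


rows = report(conn)
total = sum(r["amount"] for r in rows)
```

The point of the design is that a different report could project different types onto the same stored chunks without reloading anything.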

On Tue, Jul 20, 2021, at 12:03 PM, Arjun Ray wrote:
On Wed, 21 Jul 2021 00:45:31 +1000, Rick Jelliffe
| Arjun wrote:
|  *Or maybe I'm not getting the point here?*

| Automatic data binding.  For a datatype to be attached to the parse tree
| (DOM etc) as a primitive type (a la C), something has to be told to take
| the text value and convert it: it could be a program, it could be a schema,
| or it could be from instance syntax (i.e. delimiters and lexical patterns).

Isn't that the job of the ETL subsystem responsible for loading a DOM
that is - or should be! - fit for purpose? (But yes, working with
generic DOMs would place that burden on the application.)  

| So let's say I have an XSLT script which decorates an incoming document of a
| standard format with an ISO8601 date in an attribute @D.

Or a LPD?  (Nah. Too bad ISO8879 bollixed the definition of LINK.)

| Contrast this with a richer syntax where the parser (or transducer is the
| right CS term?)  can do those steps automatically with no configuration or
| coding of that on the server side.

The trouble with that is the universe of such useful auto-conversions
is unbounded.  Why stop at ISO8601 dates?  (How about Roverdates[*],
or DbaseIIdates, which are YYYYMMDD, 32-bit ints in C parlance?)

Customizing the ETL layer seems wiser, from a system design POV.

[*] After Rover, Salomon Brothers' hoary database (back then, some
well-known databases on Wall Street had names like Spot and Fido...)
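
As a rough illustration of what customizing the ETL layer could look like, here is a sketch in Python. The registry `CONVERTERS`, the format labels, and the function names are all hypothetical; the only assumption taken from the discussion is that one format is an ISO 8601 date string and another is a YYYYMMDD integer. Each deployment registers the conversions it actually needs, so the parser itself never has to know about any of them.

```python
from datetime import date


def from_iso8601(text):
    """Convert an ISO 8601 calendar date such as '2021-07-20'."""
    return date.fromisoformat(text)


def from_yyyymmdd_int(value):
    """Convert a YYYYMMDD integer such as 20210720."""
    n = int(value)
    return date(n // 10000, (n // 100) % 100, n % 100)


# Hypothetical registry: the ETL layer, not the parser, decides which
# conversions exist. Adding another scheme is one more entry here.
CONVERTERS = {
    "iso8601": from_iso8601,
    "yyyymmdd": from_yyyymmdd_int,
}


def load_date(value, fmt):
    """Look up the converter for fmt and apply it to value."""
    try:
        convert = CONVERTERS[fmt]
    except KeyError:
        raise ValueError(f"no converter registered for {fmt!r}") from None
    return convert(value)
```

Both `load_date("2021-07-20", "iso8601")` and `load_date(20210720, "yyyymmdd")` yield the same `date` value, and an unregistered format fails loudly instead of being guessed at.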

| The thing is, it is ridiculous (IMHO) to claim that an ISO 8601 date is
| something that we really need freedom to allow clients to interpret
| differently

I always thought the argument was to leave to the client the decision
to use ISO8601 at all, as opposed to some other scheme.
 
| Instead of datatype, it might be good for SGML-ers to consider it in 
| terms of NOTATION. 

Now there's an idea! (And bring in data attributes as well?)

| But SGML did not provide a way to declare the NOTATION of an
| attribute value, doubly not providing it for DTD-less documents.

Actually, Annex K in the WebSGML TC did, but they could have added
that for elements as well.


(That went nowhere, of course, and now that Google has fubar-ed its
Dejanews takeover, the CTS archive is inaccessible altogether.) 

| I think it is entirely reasonable and SGML-ish to want to specify the 
| notation used for some attributes.

The case for DTD-less documents is harder, and I think intractable
within the confines of the spartan syntax of XML.  There are other
punctuation characters that could be put to good use.  And simple
elements, with #PCDATA content models, are very suitable candidates
for content notations and shorthand based on the old NET style:

    <myItem someAtt="someVal" /data content here/>

This can be extended with other content delimiters:

    <myItem format="iso8601" /@2021-07-20/>

And so on.  I don't think we can get by with just the current syntax.
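
To show that the shorthand above is at least mechanically recognizable, here is a small Python sketch. It is not an XML or SGML parser, and the syntax it accepts is only the hypothetical form used in the two examples: a name, optional double-quoted attributes, then content between `/` and `/>`. The function name `parse_net_element` is made up, and the sketch assumes the content itself contains no `/`.

```python
import re

# Hypothetical shorthand: <name attr="value" ... /content/>
NET_ELEMENT = re.compile(
    r'<(?P<name>[A-Za-z_][\w.-]*)'          # element name
    r'(?P<attrs>(?:\s+[\w.-]+="[^"]*")*)'   # zero or more name="value" pairs
    r'\s*/(?P<content>[^/]*)/>'             # /content/> with no "/" inside
)
ATTR = re.compile(r'([\w.-]+)="([^"]*)"')


def parse_net_element(text):
    """Split one shorthand element into (name, attributes, content)."""
    m = NET_ELEMENT.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a NET-style element: {text!r}")
    return m["name"], dict(ATTR.findall(m["attrs"])), m["content"]


plain = parse_net_element('<myItem someAtt="someVal" /data content here/>')
dated = parse_net_element('<myItem format="iso8601" /@2021-07-20/>')
```

The sketch leaves the leading `@` in the content untouched; what a given delimiter means would be for the notation, not the tokenizer, to decide.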
 

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...




