This is a note about simplicity and power, and associated
tradeoffs for future XML development.
Objectives
Simplicity is an easy objective for all to agree
upon. However, simplicity lies
largely in the eye of the beholder.
Some find COBOL simple; others LISP. Albert Einstein found simplicity in
General Relativity Theory.
A view of simplicity can be more precisely defined in
terms of what is to be accomplished.
If the goal is to deal with data content alone, particularly if it is
simply text, then, as some would suggest, an XML subset can be very simple
indeed. JSON is somewhat more
complicated than that, but it is a step towards considerably more power.
To deal with specifications about data content, such as
XHTML, more is required.
Finally, to find a way of bringing the power of many
existing XML capabilities to a wide audience of application developers, a still
different view of simplicity is needed.
This note addresses a much larger question: not just the base XML
language, but also the integration of consistent support for all the useful
standards that derive from it.
So, I contend that any approach to "simplification" is
meaningful only in the context of some goal to be supported.
Approach
In simplest terms here, I would judge simplicity by
readability, and power by capabilities to develop applications for the Web.
Why an application focus? Simple: application developers are the
users of XML, along with its piecemeal solutions to issues related to data
presentation, data declaration and validation, data content, data manipulation,
syntax, and, fundamentally, modularity. And the current capabilities come far
short of the realizable potential for fundamental enhancement of the application
development paradigm.
Of course, the stakeholders who cannot be ignored are
the commercial browser developers.
It's hard to tell what motivates them, other than a reluctance to
pursue new ventures when something else is going well for them. However, it would seem that, of the
possible motivations, a foundation that integrates and fosters
application development should be of interest.
The approach that I am advocating is to view XML as a
specification language for models, and thus as a powerful tool for Web application
development. This approach needs to
analyze carefully the capabilities that have been developed over the last
decade, to abstract from them, and to understand basic simplicities. These notions can then be
generalized to provide even more powerful capabilities with no added cost in
complexity. And current XML
standards are full of such starting points.
That said, I also believe that much of what is
being debated on this list for minimal XML foundations can be a significant part
of this effort, especially for the data-content-focused crowd.
Although this might seem to be a "new XML", its focus is largely on a consolidation
and systemization of what exists already, thereby laying foundations for
more rapid growth of further extensions and capabilities.
In particular, however, I tend to be extremely suspicious
of incremental attempts to graft simplicity on top of complexity. These can
appear seductively useful in limited contexts, but in the long run they
generally tend to produce redundancy and inconsistency with the broader
base. (To illustrate this, see the
notes on comments in this list.)
Some Observations
XML started as a markup language for text, with an angle
bracket notation useful for text markup.
·
In terms of syntax -
embedding markup in text with an
angle bracket syntax is natural, while embedding text in an angle bracket syntax
is, perhaps we can agree, at least awkward.
·
In terms of semantics -
XML has evolved, unsystematically and perhaps even chaotically, to provide an
impressive and powerful set of models that support various application
development capabilities; it is just that they are hard to understand and
use.
If the focus is to be on the user, i.e., the
application developer, then what is needed is a language for specification of
models, especially for physical and logical data structures, for presentation, for
control and for communication.
This suggests what would appear at first to be a somewhat
radical notion; i.e., a recognition that even the terminology derived from
concepts of a markup language has long been obsolete, and it should be
reinvented to reflect foundations for a language for models. For instance,
·
"document" is at best
an awkward reference to the more general concept of an identifiable resource
that can serve a data stream (not necessarily of text) that can be
parsed.
·
"attribute" and contained "element" have
arbitrary distinctions that are more specific and constrained than the
fundamental concepts of "object", "property" and "behavior".
·
"namespaces" should imply no more than a
modular scope of unique names that can be referenced with some simple
extensions, such as a "using" statement.
(See more below.)
·
"schema" should
be dropped as such, and replaced with other constructs such as "metadata", "declarations and constraints" needed
for parsing and validation, and sets of "properties" for processing
(such as presentation).
Secondly, any development needs to provide a clear
separation of concerns:
·
Syntax and semantics
are separate issues, particularly if it is recognized that infosets that work as
objects can be a middle ground.
·
Data content and
specifications for the use of data content are different.
In particular, the above two points combine with the observation that
there is no single syntax possible for data content.
This is partly for compatibility reasons. But fundamentally, data exists on the
Web in many forms and representations, and all need to be accessible. A simple example is data that results
from a SQL query. More general is
the possibility for support for application specific parsers that can extract useful data from complex
documents. And as advocated in this
list, there are contexts where a minimal subset is useful.
·
Applications rely
on models for logical and physical data
structures, for presentation, for communication, and for control.
Specification models, as evidenced by HTML for
presentation, provide impressive capabilities for application development
without the need, or with minimal needs, for procedural code.
These models are clearly separate but
interdependent. What is also
separate is what they have in common. This implies an approach to generalize
from these and other models, and thereby to discover the fundamental
capabilities of a language with support for all of them.
·
Modularization
support in XML standards has
evolved in strange and wonderful ways.
The fundamental capability is for a set of specifications
that can be easily used and integrated with each other. Notes below suggest that this can be
accomplished in a more complete and straightforward manner than through existing
specifications for namespaces, CSS,
XLink, etc.
·
A specification
language is useful for programmers and also for those who have no programming
experience. This implies that any
specification should have a "primer" that describes a complete set of capabilities that are
easy to use and understand, leaving more powerful capabilities to other
documentation.
Thirdly, extensibility is fundamental. If for no other reason than
compatibility, the new must be extensible so that it can be easily used and
integrated with existing data and specifications. Several starting points for extensibility
include:
·
Given basic support of
fundamentals in a new language, old syntax and semantics can be largely
convertible to it.
·
A module can specify
its own parser, a semantic analyzer and a processor (e.g., intelligent
"CDATA").
·
An element can have a
related executable library to support properties and behavior.
This, along with reasonable "constraint" expressions, can
allow many new standards to evolve without dependencies on new browser support.
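As a rough illustration of a module specifying its own parser, the sketch below keeps a registry that maps a module name to a parsing function. The names PARSERS, register_parser and parse_with, and the two example modules, are hypothetical; nothing here is taken from an existing XML standard.

```python
import csv
import io
import json

# Hypothetical registry: module name -> function that turns raw text into data.
PARSERS = {}

def register_parser(module_name):
    """Decorator that records a parser under a module name."""
    def decorator(func):
        PARSERS[module_name] = func
        return func
    return decorator

@register_parser("json_data")
def parse_json(text):
    return json.loads(text)

@register_parser("csv_data")
def parse_csv(text):
    # Each row becomes a dict keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))

def parse_with(module_name, text):
    """Look up the parser declared by a module and apply it."""
    if module_name not in PARSERS:
        raise KeyError(f"no parser registered for module {module_name!r}")
    return PARSERS[module_name](text)
```

Under these assumptions, a new data syntax only needs a new registry entry; the code that consumes the parsed data does not change.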
Starting Points
Some starting
points that both simplify and provide power:
·
Basic names need to be
simple, uniform and uncluttered with punctuation.
Thus they can consist of alpha characters, numeric digits and the ever
popular underscore. Other
characters such as & : â â and
. can be used to create name expressions in specific
contexts.
Some conventions, such as "camel notation", are useful
for applications and can be consistently used in specifications.
Some restrictions, such as a leading underscore only for
"key words", might be necessary.
Existing names can be escaped, typically with quotes, or,
more generally, with some construct such as - &Name( existing name ) .
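The naming rules above might be checked along the following lines. This is only a sketch: it assumes that "alpha characters" means ASCII letters, and the function names is_basic_name, is_keyword_style and escape_name are invented for illustration.

```python
import re

# Assumption: "alpha characters" means ASCII letters here.
BASIC_NAME = re.compile(r"[A-Za-z0-9_]+")

def is_basic_name(name):
    """True if the name uses only letters, digits and underscores."""
    return BASIC_NAME.fullmatch(name) is not None

def is_keyword_style(name):
    """True if a basic name has the leading underscore reserved for key words."""
    return is_basic_name(name) and name.startswith("_")

def escape_name(name):
    """Wrap a name that is not basic in the &Name( ... ) form from the text."""
    return name if is_basic_name(name) else f"&Name( {name} )"
```

With this, an existing name such as xsl:template would be written as &Name( xsl:template ), while orderTotal would pass through unchanged.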
·
Standard base data
types from a variety of sources need to be consolidated.
·
A fundamental data type
for reference, which can be specialized in
a variety of ways, is critical.
In
particular a reference can be a name (including a URI), a link, an
expression, such as path or query, a function that returns a reference,
etc.
The syntax needs to distinguish the reference from the
referent.
·
Parameters are typed
values which can derive from either the specification context or referenced data
values or both.
Parameters can be generally substituted for any syntactic
unit.
·
Adaptation support,
especially by non-programmers.
Examples range from parameters in configuration files, which
allow specifications to be easily adapted to particular environments or users,
to skeletons such as interactive tables that mimic what was once called
"query by example", to "wizards" that prompt users to complete a specification.
·
Basic expressions,
including arithmetic, comparison and boolean expressions, need to be consolidated.
Extended expressions are useful for selection, query,
etc.
Reference expressions can look somewhat like
    name.node . . . node
where node is
    name | link | name(qualifier) | name(qualifier)(selected_contents) |
    join(condition) |
    node(function) | node(pre-function, post-function) |
    function // (that returns a node)
The pre and post functions above support navigation with
node entry and exit functions for the traversal.
The result of the above can be viewed as a hierarchy or a
table join.
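A small sketch may help make the reference expressions concrete. It covers only two of the forms above: a plain dotted path of names, and a traversal with node entry and exit functions. Representing nodes as dicts with "name" and "children" keys, and the function names select and walk, are assumptions made for this example.

```python
def select(root, path):
    """Follow a dotted path such as 'catalog.book.title'; return matching nodes."""
    names = path.split(".")
    if root["name"] != names[0]:
        return []
    current = [root]
    for name in names[1:]:
        # Keep every child, of every node reached so far, that has this name.
        current = [child
                   for node in current
                   for child in node.get("children", [])
                   if child["name"] == name]
    return current

def walk(node, pre=None, post=None):
    """Depth-first traversal calling pre on entry to and post on exit from a node."""
    if pre:
        pre(node)
    for child in node.get("children", []):
        walk(child, pre, post)
    if post:
        post(node)

tree = {"name": "catalog", "children": [
    {"name": "book", "children": [{"name": "title", "text": "A"}]},
    {"name": "book", "children": [{"name": "title", "text": "B"}]},
]}

titles = [n["text"] for n in select(tree, "catalog.book.title")]
events = []
walk(tree,
     pre=lambda n: events.append("enter " + n["name"]),
     post=lambda n: events.append("exit " + n["name"]))
```

Here titles collects the text of both title nodes, and events records each node entry and exit in traversal order. Qualifiers, joins and function-valued nodes are left out of the sketch.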
·
Templates are syntactic
units that can be parameterized with substitution and selection.
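As a minimal sketch of substitution and selection, the example below uses Python's standard string.Template class. The template texts, the TEMPLATES table and the render function are hypothetical.

```python
from string import Template

# Hypothetical template variants: selection picks one, substitution fills it in.
TEMPLATES = {
    "short": Template("$name: $value"),
    "long": Template("The property $name has the value $value."),
}

def render(kind, **params):
    """Select a template by kind and substitute the given parameters."""
    return TEMPLATES[kind].substitute(**params)
```

For example, render("short", name="width", value=10) gives the text "width: 10".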
·
Namespaces need not
exist as such.
"Namespace" simply names a property of constructs, such as data
types and packages, that requires the names in their context to be unique, unless
explicitly overridden. Context
names include those that are inherited and those that are explicitly included.
Namespaces have names and aliases that allow themselves
and their context names to be referenced.
These names can be imported and aliased with "using" statements.
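One way to picture this is a named scope that rejects duplicate names unless an override is requested, plus a "using" operation that imports a name under an alias. The class and method names below are assumptions for illustration, not a proposed syntax.

```python
class Namespace:
    """A named scope in which declared names must be unique."""

    def __init__(self, name):
        self.name = name
        self.names = {}

    def declare(self, name, value, override=False):
        # Uniqueness is enforced unless the caller explicitly overrides it.
        if name in self.names and not override:
            raise ValueError(f"{name!r} is already declared in {self.name!r}")
        self.names[name] = value

    def using(self, other, name, alias=None):
        """Import one name from another namespace, optionally under an alias."""
        self.declare(alias or name, other.names[name])

    def lookup(self, name):
        return self.names[name]

geometry = Namespace("geometry")
geometry.declare("point", "geometry point type")
ui = Namespace("ui")
ui.declare("point", "font size unit")
ui.using(geometry, "point", alias="geoPoint")
```

After this, ui holds both its own point and the imported geoPoint, and importing point again without an alias would be rejected as a clash.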
·
A module is a set of
specifications that can be referenced and
used in a variety of ways.
Modules are namespaces. Modules include data types, data
structures, and packages which contain data and/or specifications.
Modules can be extended globally (such as for metadata)
or within a local context (such as to provide presentation
properties).
Modules can be referenced, specialized, parameterized,
nested, created, merged, extended, restricted, extracted from, transformed,
etc.
·
Packages are sets of
specifications.
A large application would probably have libraries of
packages of similar types, such as data types, available resources, data units,
metadata, presentation structures, validity specifications, etc. Then these would be integrated with sets of
packages that use them to specify particular functional capabilities. Finally, there would be packages that
combine functional units into applications that can be parameterized for
particular environments and users.
Also packages could be organized into hierarchical models
from conceptual (standardization level)
to abstract (application models) to concrete (with implementation
details).
·
Data types specify
fundamental properties and behavior, which define the application concept
they model.
Data types also specify general properties and behavior,
which are used to adapt them to specific environments, such as messages,
storage, and presentation.
Properties and behavior can be extended explicitly to
support inheritance which allows polymorphism.
Properties and behavior can have restrictions. This creates "twins" that can be
substituted for their parents but not for each other (e.g., a circle is a
restricted ellipse, but both are
shapes).
Fundamental properties and behavior can be implemented in executable
libraries. Behavior can include
operators for expressions.
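The circle and ellipse example can be approximated with ordinary subclassing plus a constraint check, as in the sketch below. It is only an approximation under stated assumptions: the class names and the semi-axis fields a and b are invented here, and the sketch does not capture the full substitution rule for "twins".

```python
import math
from dataclasses import dataclass

class Shape:
    """Parent type: anything with an area."""
    def area(self):
        raise NotImplementedError

@dataclass(frozen=True)
class Ellipse(Shape):
    a: float  # semi-axis lengths (assumed representation)
    b: float

    def area(self):
        return math.pi * self.a * self.b

@dataclass(frozen=True)
class Circle(Ellipse):
    """An ellipse restricted to equal semi-axes."""
    def __post_init__(self):
        if self.a != self.b:
            raise ValueError("a circle requires equal semi-axes")
```

Both types can stand in wherever a Shape is expected, and the restriction is checked once, when a Circle is created.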
·
Application models
support a Model / View / Control Paradigm.
Presentation models include HTML, Open Office, etc., in an
integrated framework.
Data models support generic operations on physical
and logical data structures and elements for create, insert, remove, copy,
index, sort, compare, query, delete, etc.
Augmented operations might include transform or execute.
Control models provide execution frameworks, for instance
pipe, work flow, and state machines.
Communication models support generic protocols to allow interactions and
collaboration.
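To suggest how a control model can be driven by a specification rather than by procedural code, the sketch below runs a state machine from a plain transition table. The states, events and the run function are hypothetical examples.

```python
# Hypothetical specification: (state, event) -> next state.
TRANSITIONS = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}

def run(transitions, start, events):
    """Apply events in order; an event with no matching transition is an error."""
    state = start
    for event in events:
        key = (state, event)
        if key not in transitions:
            raise ValueError(f"no transition for {event!r} in state {state!r}")
        state = transitions[key]
    return state
```

The table is plain data, so it could equally be loaded from a specification file; only the small generic run function is code.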
·
A Process is an
abstraction that performs an action (i.e. response to an event, with
specifications or scripts), either synchronously or asynchronously, in a
specified (and possibly restricted) environment.
·
Syntax can be
considerably enhanced based on parameterized templates (to reduce redundancy),
JSON- or Java-like structure (to improve readability), and through simplification
of existing constructs such as namespaces.
·
Tools are always an
issue for new XML capabilities.
Basically, tools provide
disparate and, by their very nature, limited support for various XML
capabilities. Thus, they can be a
constraint on introduction and use of new capabilities.
However, the above suggests that fundamental tools can be
a direct extension of the XML base.
In particular, specifications are data and, for a tool, specifications
are the data content. Given an
interactive environment with a native capability to
manipulate graphic structures, it should be easy to map sets of specifications
to diagrams for review, edit, and test.
Tool makers, however, do not get left out as they can
still support total application integration and process integration. They just get more to work with.
Summary
It is the contention here that very powerful capabilities
can be developed in a significantly simpler context than existing capabilities,
that this requires new language features, and that these features extend and can
be made compatible with what exists (or is proposed and lacks
implementation).