Round 2: Identifying Data for Interchange
Hi Folks,

Thanks again for all your input. It has been most enlightening!

I have taken all your comments, assimilated them, and I believe that I have found a new perspective on this issue that may alter things quite a bit. Before I present this "new perspective" it is important that I recap what was discussed.

RECAP

Issue: What are the defining characteristics of data that is "suitable" for being interchanged? Conversely, what are the defining characteristics of data that is "not suitable" for being interchanged?

To provide a basis for discussion, I present two examples:

1. AIRCRAFT EXAMPLE: onboard an aircraft is a system that periodically transmits to a ground station some information. Included in this information is:
   - Distance to Destination Airport
   - Distance to Navigation Aid
   - Distance to Emergency Airport

2. FINANCIAL INSTITUTION EXAMPLE: a financial institution provides recommendations to its clients on what to do with their stock shares. The information sent to its clients is a recommendation:
   - Buy, Sell, Hold

Let's look at each of these examples.

AIRCRAFT EXAMPLE

What data should be sent to the ground station? That is, what data is "suitable" for interchange?

Your first instinct might be to send "Distance-to-X" data. For example:

    <distance to="destination airport">590</distance>
    <distance to="navigation aid">140</distance>
    <distance to="emergency airport">75</distance>

Let's consider the advantages and disadvantages of interchanging distance data.

Advantages:
- The sender and receiver will have the same value for the distance. There is no ambiguity due to miscalculation.
- If we think of the aircraft as a "service" then it is providing a service by taking its raw position data and calculating the distance. A client may not have the ability to compute the distance from raw data, so being able to receive the distance data is valuable to the client.

Disadvantages:
- Distance data may be important to the client, but it has limited utility.
  It discards much of the information that recipients of the data could potentially want. For example, it doesn't allow recipients to compute things like heading, location relative to another aircraft, etc. Thus, the client has paid for the service with a loss of flexibility in the data.

Rather than interchanging Distance-to-X data the aircraft system could send more fundamental data - position data.

Let's consider the advantages and disadvantages of interchanging position data.

Advantages:
- With position data a recipient could not only know the distance of the aircraft, but could also determine the aircraft heading, time to fly-over, and relationship to other aircraft. Also, it could plug the position data into other contexts, such as a map application.

Disadvantages:
- It is possible that a recipient may calculate a distance that is different than the distance the aircraft system calculates (due to differing algorithms, rounding errors, etc). Thus, there is the possibility for ambiguity and misinterpretation between the sender and the receiver.
- The recipient may not have the computational power to take the raw position data and calculate the distance.
- The aircraft "service" is not providing much service to the client. Instead of the service doing value-add, the recipient is expected to do value-add.

Some things to note from this example:

a. Where is "value-add" done? ... server-side or client-side?
   - sending distance data implies doing value-add on the server-side.
   - sending position data implies doing value-add on the client-side.
b. Value-adding on the server-side makes it easier on the client, but restricts the utility of the data.
c. Value-adding on the client-side allows the data to be used in more ways, but burdens the client with more work.

At this point, based upon the above example, I would like to introduce some terminology:

DERIVED DATA: Distance represents data that is the output from doing a calculation on raw position data.
The distance data is called derived data.

FUNDAMENTAL DATA: Position is the raw, basic data. Position data is called fundamental data.

FINANCIAL INSTITUTION EXAMPLE

A service that financial institutions provide is to give their clients advice on when to buy, sell, or hold a stock. The algorithms used to determine whether to buy, sell, or hold a stock are based upon some fundamental data (plus other data not shown):
- performance of the stock over the past few months
- performance of the entire stock market
- performance of other stock markets

Typically financial institutions send to their clients just the recommendation - "buy", or "sell", or "hold". The advantages of this are pretty clear-cut:

Advantages:
- The factors that a financial institution takes into account to generate its "recommendation" are proprietary. It has no wish to reveal all the data that it uses, nor the algorithm that it employs.
- Typically clients are of the mindset, "just tell me what to do". Thus, the financial institution is providing them a quick, easy service.

The alternative to sending the derived recommendation data is for the financial institution to send the fundamental data (the performance data shown above). However, the disadvantages to this are also clear-cut:

Disadvantages:
- A client typically has neither all the data nor the algorithms necessary to generate a good recommendation.

In this example, the advantages of sending derived data are overwhelming.

SUMMARY

When the data being interchanged is data from a Service then most likely the data should be "derived data". Otherwise, the "Service" is not providing much of a "service".

...

A DIFFERENT PERSPECTIVE

The above examples focused on data interchange from the perspective of the sender providing a "Service" to clients. Now I'd like to shift perspective to this:

TWO SYSTEMS SHARING DATA

That is, System B needs data from System A. System A wishes to share its data with System B.

Let's take the above two examples again.
This time we will look at them from the perspective of sharing data with another system.

AIRCRAFT EXAMPLE

Let's suppose that both System A and System B have requirements for aircraft distance data. They would like to share the data:

    System A <----------------> System B
                  distance

That is, if System A has aircraft distance data then it would like to be able to share it with System B, or vice versa. (System interoperability.)

Let's suppose that both systems are using XML to organize their data. Amazingly, they both use the same element and attribute names! For the above example, here's what the data looks like in both systems:

System A:

    <distance to="destination airport">590</distance>

System B:

    <distance to="destination airport">545</distance>

The careful reader will have noted that, while the element and attribute names are identical, the "data" differs slightly (590 vs 545). Let's examine why this is the case.

System A keeps a record of the "line-of-sight distance" between the aircraft and the ground station. System B keeps a record of the "ground distance" between the aircraft and the ground station. Thus, the two systems have slightly differing "semantics" with regard to the "distance".

How do the two systems "bridge the semantic gap"?

One approach would be for one system to "give in" and adopt the other system's semantics. That is highly unlikely. Both systems will fight like "cats and dogs" to keep their semantics. And there is good reason for this - they have invested a lot of time and money to build applications that process the data with those semantics.

With both systems the distance data is derived data - the data is derived from the fundamental position data. If we recognize this then lots of grief and arguing can be avoided by agreeing to interchange the fundamental position data. Thus, we bridge the semantic gap by simply side-stepping it!
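
As a rough illustration of the point above, here is a minimal Python sketch in which both systems receive the same fundamental position data and each derives its own notion of "distance". Everything in it is an assumption made for illustration: the `Position` type, the function names, the spherical-Earth model, and the sample coordinates are hypothetical and are not taken from either system.

```python
import math
from dataclasses import dataclass

# Assumption: a spherical Earth with the mean radius, good enough for a sketch.
EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Position:
    """Fundamental data: where something is (hypothetical type)."""
    lat_deg: float
    lon_deg: float
    alt_km: float = 0.0


def ground_distance_km(a: Position, b: Position) -> float:
    """System B's semantics: distance along the surface, ignoring altitude."""
    lat1, lon1 = math.radians(a.lat_deg), math.radians(a.lon_deg)
    lat2, lon2 = math.radians(b.lat_deg), math.radians(b.lon_deg)
    # Haversine formula for the great-circle distance.
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def _to_xyz(p: Position) -> tuple:
    """Convert a position to Cartesian coordinates on the spherical model."""
    r = EARTH_RADIUS_KM + p.alt_km
    lat, lon = math.radians(p.lat_deg), math.radians(p.lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def line_of_sight_distance_km(a: Position, b: Position) -> float:
    """System A's semantics: straight-line distance, including altitude."""
    return math.dist(_to_xyz(a), _to_xyz(b))


if __name__ == "__main__":
    aircraft = Position(lat_deg=40.0, lon_deg=-75.0, alt_km=10.0)  # made-up values
    station = Position(lat_deg=41.0, lon_deg=-75.0)                # made-up values
    print("line-of-sight:", line_of_sight_distance_km(aircraft, station))
    print("ground:       ", ground_distance_km(aircraft, station))
```

Neither system has to give up its semantics here: each applies its own derivation to the shared position data, and the two results legitimately differ.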

FINANCIAL INSTITUTION EXAMPLE

Let's suppose that Institution A and Institution B are partners and both have requirements to compute a recommendation (buy, sell, hold) for their clients. They would like to be able to share their data:

    Institution A <----------------> Institution B
                      what data?

What data should they share? One possibility is for them to share their recommendation:

    Institution A <----------------> Institution B
                    recommendation

Let's even suppose that both systems are using XML to organize their data. Amazingly, they both use the same element name! Here's what their data looks like:

Institution A:

    <recommendation>Sell</recommendation>

Institution B:

    <recommendation>Hold</recommendation>

The careful reader will have noted that, while the element names are identical, the "data" differs (Sell vs Hold). Let's examine why this is the case.

Institution A uses a different algorithm for computing its recommendation than does Institution B. Thus, the two Institutions have differing "semantics" with regard to the "recommendation".

How do the two Institutions "bridge the semantic gap"?

One approach would be for one Institution to "give in" and adopt the other Institution's semantics (i.e., the other Institution's algorithm). That is highly unlikely. Both Institutions will fight like "cats and dogs" to keep their semantics. And there is good reason for this - they have invested a lot of time and money to create their recommendation-producing algorithms.

Both Institutions' recommendation data is derived data - the data is derived from the fundamental market performance data. If we recognize this then lots of grief and arguing can be avoided by agreeing to interchange the fundamental performance data:

    Institution A <----------------> Institution B
                   performance data

SUMMARY

When the data being interchanged is for the purpose of "sharing data" then it seems most prudent to interchange fundamental data. This will enable you to gracefully avoid the "semantic mismatch" problem.
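
To make the Sell-vs-Hold mismatch concrete, here is a small Python sketch of two partners applying different algorithms to the same shared performance data. The `Performance` fields mirror the three performance factors listed earlier. The two recommendation rules, their thresholds, and the sample numbers are invented purely for illustration and do not represent how any institution actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """Fundamental data shared between the partners (hypothetical fields)."""
    stock_return_pct: float          # stock over the past few months
    market_return_pct: float         # the entire stock market
    other_markets_return_pct: float  # other stock markets


def recommend_a(p: Performance) -> str:
    """Hypothetical Institution A rule: judge the stock against its own market."""
    if p.stock_return_pct < p.market_return_pct:
        return "Sell"
    if p.stock_return_pct > p.market_return_pct + 5.0:
        return "Buy"
    return "Hold"


def recommend_b(p: Performance) -> str:
    """Hypothetical Institution B rule: judge against an average of all markets,
    with a tolerance band of 3 percentage points either way."""
    benchmark = (p.market_return_pct + p.other_markets_return_pct) / 2
    diff = p.stock_return_pct - benchmark
    if diff > 3.0:
        return "Buy"
    if diff < -3.0:
        return "Sell"
    return "Hold"


if __name__ == "__main__":
    shared = Performance(stock_return_pct=2.0,
                         market_return_pct=4.0,
                         other_markets_return_pct=1.0)  # made-up values
    print("Institution A:", recommend_a(shared))  # Sell
    print("Institution B:", recommend_b(shared))  # Hold
```

Both partners keep their own algorithm. Only the `Performance` record needs an agreed meaning, which is the argument for interchanging the fundamental data.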

** GRAND SUMMARY **

We need to distinguish between:
- data interchange for the purpose of Services,
- data interchange for the purpose of sharing data.

For Service data interchange --> use derived data.
For data sharing --> use fundamental data.

QUESTIONS

1. Schema Implications: suppose that a System/Institution deploys a Web service, and also shares data with a partner. Does it create two schemas?
   - one that describes the interchange data with Web service clients,
   - one that describes the interchange data for data sharing.

   Are two schemas needed? Perhaps there should always be just one schema - one that describes the fundamental data. After all, derived data is "application generated data". Here's my argument: Applications come and go, but fundamental data endures. Thus, "application (derived) data" should not be part of a schema. So, I will argue in favor of creating schemas just for fundamental data. What do you think?

2. Data Sharing = Service? You can certainly think of data sharing as a very simple form of Service. And yet, with the former it is best to interchange fundamental data, whereas with the latter it is best to interchange derived data. At what point does "switchover" occur? That is, at what point does it stop being a data sharing situation using fundamental data and become a Service using derived data?

I eagerly look forward to your thoughts and comments.

/Roger