Robert Cary Leif, Ph.D.

This schema is derived from data-types that are copyrighted: http://medical.nema.org/dicom/2004.html

This schema includes some of the common Digital Imaging and Communications in Medicine (DICOM) data-types that are imported by other schemas.

This is a DRAFT proposal that has not been formally tested to comply with the W3C XML schema version 1.0 specification. No position is taken with respect to whether particular software implementing this schema works according to medical or other regulations.

Draft

Attention is called to the possibility that implementation of this specification may require use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any patent rights in connection therewith. Newport Instruments shall not be responsible for identifying patents or patent applications for which a license may be required to implement a standard or for conducting inquiries into the legal validity or scope of those patents that are brought to its attention.

Copyright 2002-2008 Newport Instruments. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, or duplication of any material in this document for a fee or for commercial purposes requires permission of the copyright holder. For all other uses, please contact Robert C. Leif, Ph.D. at rleif@rleif.com

Since this is, at present, a research project, this schema should not be used in a medical device.

Supported by Newport Instruments internal development funds. The content is solely the responsibility of the authors and does not necessarily represent the official views of any governmental or private organization.

All types, elements and attributes have id attributes.
This will hopefully facilitate working with RDF.

The Registry of DICOM data elements in the Data Dictionary is a table with four fields: Name, Tag, Value Representation (VR), and Value Multiplicity (VM). The DICOM name maps to the XML element name. The Tag and VR are attributes, each with a fixed (constant) value. The Value Multiplicity (VM) is handled by the XML minOccurs and maxOccurs constructs.

PS 3.6-2004, Page 6, 3.4 DICOM DATA DICTIONARY: "Tag: A unique identifier for an element of information composed of an ordered pair of numbers (a Group Number followed by an Element Number), which is used to identify Attributes and corresponding Data Elements." The Tag is the unique value that identifies every DICOM data element.

From PS 3.5-2008, Table 6.2-1, DICOM VALUE REPRESENTATIONS: AT = Attribute Tag is an ordered pair of 16-bit unsigned integers that is the value of a Data Element Tag. Its length is 4 bytes fixed. Since it will be used as an XML attribute, Tag_Type must be an XML simple type. The latest draft of DICOM Supplement 118 has eliminated the lower case version of the hexadecimal numbers from the XML interface to DICOM (26Aug09).

The DICOM Value Representations Table has the following fields: VR, Name, Definition, Character Repertoire, and Length of Value. The Value Representation is an enumerated type which includes a two letter abbreviation. It is the data type or class of the DICOM element. Most of the Value Representations are enumerated below. In the future, more can be added as needed. The OB VR has been included, although it appears not to be defined in DICOM in a form that is understandable to someone other than a DICOM expert. I would guess that it is involved in the definition of the endian (RCL).

Create DICOM string types as bounded strings 1 to maxLength. The Token type, which is derived from string, was used, as it is appropriate for descriptions of one word to one paragraph. "The value space of token is the set of strings that do not contain the line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces."

The prefix Bd is an abbreviation for bounded. Bd#_Types are strings with the number of characters ranging from minLength to maxLength.

This is an example. Please notice that the addition of the VR_Type causes the use of a complexType.

Short Text (ST) is a character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH “\” in ISO-IR 6) may be used.

Long Text (LT) is a character string that may contain one or more paragraphs. It may contain the Graphic Character set and the Control Characters, CR, LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but leading spaces are considered to be significant. Data Elements with this VR shall not be multi-valued and therefore character code 5CH (the BACKSLASH “\” in ISO-IR 6) may be used.

From PS 3.5-2006 and PS 3.5-2008, Table 6.2-1, DICOM VALUE REPRESENTATIONS. Definition: A string of characters that identifies an Application Entity with leading and trailing spaces (20H) being non-significant. A value consisting solely of spaces shall not be used. Character Repertoire: Default Character Repertoire excluding character code 5CH (the BACKSLASH “\” in ISO-IR 6), and control characters LF, FF, CR and ESC. This is slightly more restrictive than above. Pianykh recommends making it all capitals in order to be effectively case insensitive.

(0008,0100) Code Value: VR = SH, value multiplicity = 1.
PS 3.3, 8.1 CODE VALUE: "The Code Value (0008,0100) is an identifier that is unambiguous within the Coding Scheme denoted by Coding Scheme Designator (0008,0102) and Coding Scheme Version (0008,0103). Note: The Code Value is typically not a natural language string, e.g. “T-04000”."

(0008,0104) Code Meaning: VR = LO, value multiplicity = 1. PS 3.3, 8.3 CODE MEANING: "The Code Meaning (0008,0104) is text which has meaning to a human and which conveys the meaning of the term defined by the combination of Code Value and Coding Scheme Designator. Though such a meaning can be “looked up” in the dictionary for the coding scheme, it is encoded for the convenience of applications that do not have access to such a dictionary."

PS 3.5, Table 6.2-1, DICOM VALUE REPRESENTATIONS: A string of characters with leading or trailing spaces (20H) being non-significant. Character Repertoire: uppercase characters, “0”-“9”, the SPACE character, and underscore “_”, of the Default Character Repertoire.

This data type is described in PS 3.5-2004, Digital Imaging and Communications in Medicine (DICOM) Part 5: Data Structures and Encoding, Table 6.2-1, DICOM VALUE REPRESENTATIONS. The actual definition is based on the one from the HR-XML Consortium. The description of the DICOM Person Name includes, "The five components in their order of occurrence are: family name complex, given name complex, middle name, name prefix, name suffix. Any of the five components may be an empty string." The correspondence between the DICOM data types and the HR-XML elements is as follows: family name complex => Family_Name; given name complex => Preferred_Given_Name; middle name => Middle_Name; name prefix => Prefix; name suffix => the concatenation of generation and qualification. The HR-XML elements Formatted_Name, Legal_Name, and Given_Name do not have counterparts in DICOM.

A character string encoded using a 5 component convention.
The character code 5CH (the BACKSLASH “\” in ISO-IR 6) shall not be present, as it is used as the delimiter between values in multiple valued data elements. The string may be padded with trailing spaces. The five components in their order of occurrence are: family name complex, given name complex, middle name, name prefix, name suffix. Any of the five components may be an empty string. The component delimiter shall be the caret “^” character (5EH). Delimiters are required for interior null components. Trailing null components and their delimiters may be omitted. Multiple entries are permitted in each component and are encoded as natural text strings, in the format preferred by the named person. This conforms to the ANSI HISPP MSDS Person Name common data type.

This group of five components is referred to as a Person Name component group. For the purpose of writing names in ideographic characters and in phonetic characters, up to 3 groups of components (see Annex H examples 1 and 2) may be used. The delimiter for component groups shall be the equals character “=” (3DH). The three component groups of components in their order of occurrence are: a single-byte character representation, an ideographic representation, and a phonetic representation. Any component group may be absent, including the first component group. In this case, the person name may start with one or more “=” delimiters. Delimiters are required for interior null component groups. Trailing null component groups and their delimiters may be omitted. Precise semantics are defined for each component group. See section 6.2.1.

Example: Rev. John Robert Quincy Adams, B.A. M.Div. “Adams^John Robert Quincy^^Rev.^B.A. M.Div.” [One family name; three given names; no middle name; one prefix; two suffixes.]

This is obsolete. It is better to split the name into a DICOM string part and a standard XML Person Name. Please see persons.xsd.

An identifier of the order for the Study.
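As an illustration of the Person Name convention quoted above (component groups separated by “=”, components separated by “^”), the following Python sketch splits a PN string into its parts. It is only a sketch under stated assumptions and is not part of this schema or of persons.xsd: the function names and the dictionary keys are hypothetical, chosen here for readability.

```python
from typing import Dict

# Hypothetical key names for the five components and three component groups.
COMPONENTS = ("family_name", "given_name", "middle_name", "prefix", "suffix")
GROUPS = ("single_byte", "ideographic", "phonetic")


def parse_component_group(group: str) -> Dict[str, str]:
    """Split one component group on the caret delimiter (5EH)."""
    parts = group.split("^")
    if len(parts) > len(COMPONENTS):
        raise ValueError("a component group has at most five components")
    # Trailing null components and their delimiters may be omitted.
    parts += [""] * (len(COMPONENTS) - len(parts))
    return dict(zip(COMPONENTS, parts))


def parse_person_name(value: str) -> Dict[str, Dict[str, str]]:
    """Split a DICOM PN string into up to three component groups."""
    if "\\" in value:
        raise ValueError("backslash (5CH) is reserved as the value delimiter")
    # Trailing space padding is not significant; groups are separated by "=" (3DH).
    groups = value.rstrip(" ").split("=")
    if len(groups) > len(GROUPS):
        raise ValueError("a person name has at most three component groups")
    # Trailing null component groups and their delimiters may be omitted.
    groups += [""] * (len(GROUPS) - len(groups))
    return {name: parse_component_group(g) for name, g in zip(GROUPS, groups)}


name = parse_person_name("Adams^John Robert Quincy^^Rev.^B.A. M.Div.")
print(name["single_byte"])
```

For the example string, the interior empty component between the two adjacent carets becomes an empty middle name, and the two omitted component groups come back with all five components empty.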
From Table C.7-3, GENERAL STUDY MODULE ATTRIBUTES, in the draft Supplement 122.

Source: PS 3.5-2008, 3.10 DICOM DATA STRUCTURES AND ENCODING DEFINITIONS, Page 15. UNIQUE IDENTIFIER (UID): A string of characters that uniquely identifies a wide variety of items; guaranteeing uniqueness across multiple countries, sites, vendors and equipment.

Source: PS 3.5-2008, Page 55 - Standard - Section 9 Unique Identifiers (UIDs). The UID identification scheme is based on the OSI Object Identification (numeric form) as defined by the ISO 8824 standard. All Unique Identifiers, used within the context of the DICOM Standard, are registered values as defined by ISO 9834-3 to ensure global uniqueness. The uses of such UIDs are defined in the various Parts of the DICOM Standard.

Each UID is composed of two parts, an (org_root) and a (suffix): UID = (org_root).(suffix). The org_root portion of the UID uniquely identifies an organization (i.e., manufacturer, research organization, NEMA, etc.), and is composed of a number of numeric components as defined by ISO 8824. The suffix portion of the UID is also composed of a number of numeric components, and shall be unique within the scope of the org_root. This implies that the organization identified in the org_root is responsible for guaranteeing suffix uniqueness by providing registration policies. These policies shall guarantee suffix uniqueness for all UID's created by that organization. Unlike the org_root, which may be common for UID's in an organization, the suffix shall take different unique values between different UID's that identify different objects.

9.1 UID ENCODING RULES. The DICOM UID encoding rules are defined as follows: Each component of a UID is a number and shall consist of one or more digits. The first digit of each component shall not be zero unless the component is a single digit. Note: Registration authorities may distribute components with non-significant leading zeroes.
The leading zeroes should be ignored when being encoded (i.e., “00029” would be encoded “29”). Each component numeric value shall be encoded using the characters 0-9 of the Basic G0 Set of the International Reference Version of ISO 646:1990 (the DICOM default character repertoire). Components shall be separated by the character "." (2EH).

PS 3.5-2008, Page 56 - Standard: If ending on an odd byte boundary, except when used for network negotiation (See PS 3.8), one trailing NULL (00H), as a padding character, shall follow the last component in order to align the UID on an even byte boundary. (The NULL is not used in WADO. RCL) UID's shall not exceed 64 total characters, including the digits of each component, separators between components, and the NULL (00H) padding character if needed.

PS 3.5-2008, Table 6.2-1, DICOM VALUE REPRESENTATIONS: A character string containing a UID that is used to uniquely identify a wide variety of items. The UID is a series of numeric components separated by the period "." character. If a Value Field containing one or more UIDs is an odd number of bytes in length, the Value Field shall be padded with a single trailing NULL (00H) character to ensure that the Value Field is an even number of bytes in length.

DICOM Unique Identifier (UID). The value, name, and use are loosely based on the headings of Table A-1, UID VALUES, which are UID Value, UID NAME, UID TYPE, and Part. Part refers to the volume and section of the DICOM standard. This permits a UID to be pointed to as a UID or URI or both.

From Network Working Group Request for Comments: 4122. Category: Standards Track. Authors: P. Leach, Microsoft; M. Mealling, Refactored Networks, LLC; R. Salz, DataPower Technology, Inc. Date: July 2005. http://www.webdav.org/specs/rfc4122.html Abstract: "This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier).
A UUID is 128 bits long, and can guarantee uniqueness across space and time. UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms."

A 128-bit UUID should contain 16 binary bytes. “Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first. The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.” Since the final representation of a UUID uses lower case characters, the simpler definition should be used.

DICOM PS 3.3-2008, Section 4, Symbols and abbreviations, defines UUID as Universal Unique Identifier (ISO/IEC 11578). Table C.12-1, SOP COMMON MODULE ATTRIBUTES, as well as other tables, describes it: Attribute Name = HL7 Instance Identifier; Tag = (0040,E001); Type = 1; Attribute Description = Instance Identifier of the referenced HL7 Structured Document, encoded as a UID (OID or UUID), concatenated with a caret (“^”) and Extension value (if Extension is present in Instance Identifier).

PS 3.6: (0040,E001) HL7 Instance Identifier, VR = ST (Short Text), Value Multiplicity = 1.
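
The UID encoding rules from Section 9.1 and the lower case UUID string form quoted above can be illustrated with a short Python sketch. It is not the schema's own definition. The function and pattern names are hypothetical, and the sketch assumes that any trailing NULL padding has already been removed, as in the XML and WADO cases. It checks only the rules quoted in this document, not registration of the org_root.

```python
import re
from typing import Optional, Tuple

# Numeric components separated by "."; no leading zero unless the
# component is a single digit (Section 9.1 as quoted above).
UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")

# Lower case string form of a UUID: 8-4-4-4-12 hexadecimal digits.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_dicom_uid(value: str) -> bool:
    """True if value follows the quoted UID rules and is at most 64 characters."""
    return len(value) <= 64 and UID_PATTERN.fullmatch(value) is not None


def is_lowercase_uuid(value: str) -> bool:
    """True if value is a UUID written with lower case hexadecimal digits."""
    return UUID_PATTERN.fullmatch(value) is not None


def split_hl7_instance_identifier(value: str) -> Tuple[str, Optional[str]]:
    """Split 'root^extension' into (root, extension); extension may be absent."""
    root, caret, extension = value.partition("^")
    if not (is_dicom_uid(root) or is_lowercase_uuid(root)):
        raise ValueError("root must be a UID (OID) or a lower case UUID")
    return root, (extension if caret else None)


print(is_dicom_uid("1.2.840.10008.1.2"))
print(is_dicom_uid("1.2.00029"))
print(split_hl7_instance_identifier("1.2.3^doc42"))
```

The component "00029" is rejected because of its leading zeroes, which matches the note that such zeroes are dropped before encoding. Accepting only lower case UUIDs follows the "simpler definition" preferred above; a more permissive reader could lower-case its input first.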