Robert Cary Leif, Ph.D.
This schema is derived from data types that are copyrighted; see http://medical.nema.org/dicom/2004.html
This schema includes some of the common Digital Imaging and Communications in Medicine (DICOM) data types that are imported by other schemas.
This is a DRAFT proposal that has not been formally
tested for compliance with the W3C XML Schema version 1.0 specification. No position
is taken with respect to whether any particular software implementing this schema
works according to medical or other regulations.
Draft
Attention is called to the possibility that implementation of this
specification may require use of subject matter covered by patent rights.
By publication of this standard, no position is taken with respect to the existence
or validity of any patent rights in connection therewith.
Newport Instruments shall not be responsible for identifying patents
or patent applications for which a license may be required to implement a standard
or for conducting inquiries into the legal
validity or scope of those patents that are brought to its attention.
Copyright 2002-2008 Newport Instruments. One print or electronic copy may be made
for personal use only. Systematic or multiple reproduction, distribution to multiple
locations via electronic or other means, or duplication of any material in this document
for a fee or for commercial purposes require permission of the copyright holder.
For all other uses, please contact Robert C. Leif, Ph.D. at rleif@rleif.com
Since this is, at present, a research project, this schema should not be used in
a medical device.
Supported by Newport Instruments internal development funds.
The content is solely the responsibility of the authors and
does not necessarily represent the official views of any governmental or private organization.
All types, elements, and attributes have id attributes.
This should facilitate working with RDF.
The Registry of DICOM data elements in the
Data Dictionary is a table with four fields:
Name, Tag, Value Representation (VR), and Value Multiplicity (VM).
The DICOM name maps to the XML element name.
The Tag and VR are attributes each with a fixed (constant) value.
The Value Multiplicity (VM) is handled by the XML minOccurs and maxOccurs
constructs.
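The mapping from Value Multiplicity to occurrence constraints can be sketched as follows. This is a minimal illustration, not the schema author's tooling: the function name `vm_to_occurs` is hypothetical, and the sketch assumes VM strings of the forms "1", "1-3", "1-n", and "2-2n". A VM such as "2-2n" (multiples of two) cannot be fully expressed with minOccurs and maxOccurs, so it is approximated here as unbounded.

```python
import re


def vm_to_occurs(vm: str) -> tuple:
    """Map a DICOM VM string to a (minOccurs, maxOccurs) pair of strings.

    Hypothetical helper; the accepted VM forms are an assumption.
    """
    match = re.fullmatch(r"(\d+)(?:-(\d*n|\d+))?", vm.strip())
    if match is None:
        raise ValueError(f"unrecognized VM: {vm!r}")
    low, high = match.group(1), match.group(2)
    if high is None:
        # A single number means exactly that many values.
        return low, low
    if high.endswith("n"):
        # "n" and "2n" style upper limits have no fixed maximum.
        return low, "unbounded"
    return low, high


print(vm_to_occurs("1"))     # fixed multiplicity
print(vm_to_occurs("1-n"))   # open-ended multiplicity
```

The returned strings can be placed directly into the minOccurs and maxOccurs attributes of the corresponding element declaration.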
PS 3.6-2004, Page 6, 3.4 DICOM DATA DICTIONARY, "Tag: A unique identifier
for an element of information composed of an ordered pair of numbers
(a Group Number followed by an Element Number), which is used to identify
Attributes and corresponding Data Elements."
The Tag is the unique value that identifies every DICOM data element.
From: PS 3.5-2008 Table 6.2-1, DICOM VALUE REPRESENTATIONS
AT = Attribute Tag is an ordered pair of 16-bit unsigned integers that is
the value of a Data Element Tag. Its length is fixed at 4 bytes.
Since it will be used as an XML attribute, Tag_Type must be an XML simple type.
The latest draft of DICOM Supplement 118 (26 Aug 2009) has eliminated the lowercase version of the hexadecimal digits from the XML interface to DICOM.
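As an illustration of the Tag as an ordered pair of 16-bit unsigned integers written in uppercase hexadecimal, here is a minimal sketch. The parenthesized lexical form "(gggg,eeee)" and the function names `parse_tag` and `format_tag` are assumptions made for this example; the actual lexical form of Tag_Type is defined by the schema itself.

```python
import re

# Assumed lexical form: "(gggg,eeee)" with uppercase hexadecimal digits only.
_TAG_PATTERN = re.compile(r"\(([0-9A-F]{4}),([0-9A-F]{4})\)")


def parse_tag(text: str) -> tuple:
    """Return (group, element) as integers, or raise ValueError."""
    match = _TAG_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not an uppercase hexadecimal tag: {text!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def format_tag(group: int, element: int) -> str:
    """Format two 16-bit unsigned integers as an uppercase tag string."""
    for number in (group, element):
        if not 0 <= number <= 0xFFFF:
            raise ValueError("group and element must fit in 16 bits")
    return f"({group:04X},{element:04X})"


print(parse_tag("(0008,0100)"))
print(format_tag(0x0040, 0xE001))
```

Rejecting lowercase digits at parse time mirrors the Supplement 118 decision and gives every tag exactly one textual form.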
The DICOM Value Representations Table has the following fields:
VR, Name, Definition, Character Repertoire, and Length of Value.
The Value Representation is an enumerated type which includes a two letter abbreviation.
It is the data type or class of the DICOM element. Most of the Value Representations
are enumerated below. In the future, more can be added as needed.
The OB VR has been included, although it appears not to be defined in DICOM in a form that is
understandable to someone other than a DICOM expert. I would guess that it is involved in the definition
of the endianness. (RCL)
Create DICOM string types as bounded strings 1 to maxLength.
The token type, which is derived from string, was used because it is appropriate
for descriptions of one word to one paragraph. "The value space of token is
the set of strings that do not contain the line feed (#xA) nor tab (#x9) characters,
that have no leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces." The prefix Bd is an abbreviation for bounded.
Bd#_Types are strings with the number of characters ranging from minLength to maxLength.
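The bounded token constraint can be sketched as a value-space check. This is a minimal illustration with hypothetical function names. It checks the value after whitespace processing; an XML Schema validator would first collapse whitespace in the lexical form. Carriage return is also rejected here as a conservative assumption, although the definition quoted above names only line feed and tab.

```python
def is_token(text: str) -> bool:
    """Check the token value-space rules quoted above."""
    if any(ch in text for ch in ("\n", "\t", "\r")):
        return False  # "\r" is rejected as a conservative assumption
    if text != text.strip(" "):
        return False  # leading or trailing spaces
    if "  " in text:
        return False  # internal run of two or more spaces
    return True


def is_bounded_token(text: str, min_length: int, max_length: int) -> bool:
    """Check a Bd-style type: a token with minLength <= length <= maxLength."""
    return is_token(text) and min_length <= len(text) <= max_length


print(is_bounded_token("Short description", 1, 64))
print(is_bounded_token("two  spaces", 1, 64))
```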
This is an example. Please notice that the addition of the VR_Type requires the use of a complexType.
Short Text (ST) is a character string that may contain one or more
paragraphs. It may contain the Graphic Character set and the Control Characters, CR,
LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but
leading spaces are considered to be significant. Data Elements with this VR shall not be multivalued
and therefore character code 5CH (the BACKSLASH “\” in ISO-IR 6) may be used.
Long Text (LT) is a character string that may contain one or
more paragraphs. It may contain the Graphic Character set and the Control Characters, CR,
LF, FF, and ESC. It may be padded with trailing spaces, which may be ignored, but
leading spaces are considered to be significant. Data Elements with this VR shall
not be multi-valued and therefore character code 5CH (the BACKSLASH “\” in ISO-IR 6)
may be used. From PS 3.5-2006,
PS 3.5-2008, Table 6.2-1, DICOM VALUE REPRESENTATIONS
Definition: A string of characters that identifies an Application Entity with leading and trailing
spaces (20H) being non-significant. A value consisting solely of spaces shall not be used.
Default Character Repertoire excluding character code 5CH (the BACKSLASH “\” in ISO-IR 6), and
control characters LF, FF, CR and ESC.
This is slightly more restrictive than above. Pianykh recommends making it all capitals in order to be effectively case insensitive.
(0008,0100) Code Value VR = SH, value multiplicity = 1.
PS 3.3, 8.1 "CODE VALUE: The Code Value (0008,0100) is an identifier that is
unambiguous within the Coding Scheme denoted by Coding Scheme Designator
(0008,0102) and Coding Scheme Version (0008,0103).
Note: The Code Value is typically not a natural language string, e.g. “T-04000”."
(0008,0104) Code Meaning VR = LO, value multiplicity = 1.
PS 3.3, 8.3 CODE MEANING
"The Code Meaning (0008,0104) is text which has meaning to a human
and which conveys the meaning of the term defined by the combination
of Code Value and Coding Scheme Designator. Though such a meaning
can be “looked up” in the dictionary for the coding scheme, it is
encoded for the convenience of applications that do not have access
to such a dictionary."
PS 3.5, Table 6.2-1 DICOM VALUE REPRESENTATIONS
A string of characters with leading or trailing spaces (20H) being
non-significant. Uppercase characters, “0”-“9”, the SPACE character,
and underscore “_”, of the Default Character Repertoire.
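The character repertoire described above can be checked with a short sketch. The function name `normalize_code_string` is hypothetical. The 16-character limit is an assumption taken from the Length of Value column of Table 6.2-1 and is not quoted in the text above; it is applied here after the non-significant spaces are removed.

```python
import re

# Uppercase letters, digits, SPACE, and underscore only.
_CODE_STRING = re.compile(r"[A-Z0-9_ ]*")


def normalize_code_string(value: str, max_length: int = 16) -> str:
    """Strip non-significant spaces and validate the remaining characters.

    max_length=16 is an assumption; see the note above.
    """
    stripped = value.strip(" ")
    if len(stripped) > max_length:
        raise ValueError("code string is too long")
    if _CODE_STRING.fullmatch(stripped) is None:
        raise ValueError(f"illegal character in code string: {value!r}")
    return stripped


print(normalize_code_string(" ORIGINAL "))
```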
This data type is described in PS 3.5-2004,
Digital Imaging and Communications in Medicine (DICOM)
Part 5: Data Structures and Encoding, Table 6.2-1.
DICOM VALUE REPRESENTATIONS.
The actual definition is based on the one from The HR-XML Consortium.
The description of the DICOM Person Name includes, "The five components in
their order of occurrence are: family name complex, given name complex,
middle name, name prefix, name suffix. Any of the five components may be an
empty string." The correspondence between the DICOM data types and HR-XML
elements is as follows:
family name complex => Family_Name;
given name complex => Preferred_Given_Name;
middle name => Middle_Name;
name prefix => Prefix;
name suffix => The concatenation of generation and qualification.
The HR-XML elements: Formatted_Name, Legal_Name, and Given_Name do not have
counterparts in DICOM.
A character string encoded using a 5 component convention. The character code
5CH (the BACKSLASH “\” in ISO-IR 6) shall not be present, as it is used as the delimiter
between values in multiple valued data elements. The string may be padded with
trailing spaces. The five components in their order of occurrence are: family name complex,
given name complex, middle name, name prefix, name suffix. Any of the five components
may be an empty string. The component delimiter shall be the caret “^” character (5EH).
Delimiters are required for interior null components. Trailing null components and
their delimiters may be omitted. Multiple entries are permitted in each component and
are encoded as natural text strings, in the format preferred by the named person. This
conforms to the ANSI HISPP MSDS Person Name common data type.
This group of five components is referred to as a Person Name component group.
For the purpose of writing names in ideographic characters and in phonetic
characters, up to 3 groups of components (see Annex H examples 1 and 2) may be used. The
delimiter for component groups shall be the equals character “=” (3DH). The three
component groups of components in their order of occurrence are: a single-byte
character representation, an ideographic representation, and a phonetic representation.
Any component group may be absent, including the first component group. In this
case, the person name may start with one or more “=” delimiters. Delimiters are required for
interior null component groups. Trailing null component groups and their delimiters may be
omitted.
Precise semantics are defined for each component group. See section 6.2.1.
Example: Rev. John Robert Quincy Adams, B.A. M.Div.
“Adams^John Robert Quincy^^Rev.^B.A. M.Div.”
[One family name; three given names; no middle name; one prefix; two suffixes.]
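The delimiter rules described above (“=” between component groups, “^” between components, with trailing null components omitted) can be sketched as a small parser. The class and function names are hypothetical, and the sketch does not attempt the character-set handling needed for ideographic or phonetic groups.

```python
from typing import List, NamedTuple


class PersonNameGroup(NamedTuple):
    """One Person Name component group; omitted components default to empty."""
    family_name: str = ""
    given_name: str = ""
    middle_name: str = ""
    prefix: str = ""
    suffix: str = ""


def parse_person_name(value: str) -> List[PersonNameGroup]:
    """Split a PN value into up to three groups of up to five components."""
    if "\\" in value:
        raise ValueError("backslash is reserved as the value delimiter")
    groups = value.rstrip(" ").split("=")  # trailing padding is dropped
    if len(groups) > 3:
        raise ValueError("at most three component groups are allowed")
    parsed = []
    for group in groups:
        components = group.split("^")
        if len(components) > 5:
            raise ValueError("at most five components per group")
        parsed.append(PersonNameGroup(*components))
    return parsed


name = parse_person_name("Adams^John Robert Quincy^^Rev.^B.A. M.Div.")[0]
print(name.family_name, "|", name.given_name, "|", name.prefix, "|", name.suffix)
```

Returning named fields makes the correspondence with the HR-XML elements listed earlier easy to apply, for example family_name to Family_Name.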
This is obsolete. It is better to split the name into a DICOM string part
and a standard XML Person Name. Please see persons.xsd
An identifier of the order for the Study. From Table C.7-3
GENERAL STUDY MODULE ATTRIBUTES in the draft Supplement 122.
Source: PS 3.5-2008, 3.10 DICOM DATA STRUCTURES AND ENCODING DEFINITIONS, Page 15, UNIQUE IDENTIFIER (UID):
A string of characters that uniquely identifies a wide variety of items;
guaranteeing uniqueness across multiple countries, sites, vendors and equipment.
Source=PS 3.5-2008, Page 55 - Standard -
Section 9 Unique Identifiers (UIDs)
The UID identification scheme is based on the OSI Object Identification (numeric form) as defined by the
ISO 8824 standard. All Unique Identifiers, used within the context of the DICOM Standard, are registered
values as defined by ISO 9834-3 to ensure global uniqueness. The uses of such UIDs are defined in the
various Parts of the DICOM Standard.
Each UID is composed of two parts, an (org_root) and a (suffix):
UID = (org root).(suffix)
The org_root portion of the UID uniquely identifies an organization, (i.e., manufacturer, research
organization, NEMA, etc.), and is composed of a number of numeric components as defined by ISO 8824.
The suffix portion of the UID is also composed of a number of numeric components, and shall be
unique within the scope of the org_root. This implies that the organization identified in the org_root is
responsible for guaranteeing suffix uniqueness by providing registration policies. These policies shall
guarantee suffix uniqueness for all UID's created by that organization. Unlike the org_root, which may
be common for UID's in an organization, the suffix shall take different unique values between different
UID's that identify different objects.
9.1 UID ENCODING RULES
The DICOM UID encoding rules are defined as follows:
Each component of a UID is a number and shall consist of one or more digits. The first digit of
each component shall not be zero unless the component is a single digit.
Note: Registration authorities may distribute components with non-significant leading zeroes. The leading
zeroes should be ignored when being encoded (ie. “00029” would be encoded “29”).
Each component numeric value shall be encoded using the characters 0-9 of the Basic G0 Set
of the International Reference Version of ISO 646:1990 (the DICOM default character
repertoire).
Components shall be separated by the character "." (2EH).
If ending on an odd byte boundary, except when used for network negotiation (See PS 3.8),
one trailing NULL (00H), as a padding character, shall follow the last component in order to
align the UID on an even byte boundary. (The NULL is not used in WADO. RCL)
UID's, shall not exceed 64 total characters, including the digits of each component, separators
between components, and the NULL (00H) padding character if needed.
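The encoding rules above can be sketched as a single check. The function name `is_valid_uid` is hypothetical. The sketch assumes the textual form without the trailing NULL padding (as in WADO), so a padded value would need the NULL removed before calling it.

```python
import re

# Each component is "0" or a digit string without a leading zero;
# components are separated by ".".
_UID_PATTERN = re.compile(r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))*")


def is_valid_uid(uid: str) -> bool:
    """Check the component, separator, and 64-character length rules."""
    return len(uid) <= 64 and _UID_PATTERN.fullmatch(uid) is not None


print(is_valid_uid("1.2.840.10008.1.2"))
print(is_valid_uid("1.2.029"))  # leading zero in the last component
```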
PS 3.5-2008, Table 6.2-1 DICOM VALUE REPRESENTATIONS
A character string containing a UID that is used to uniquely identify a wide variety of items. The
UID is a series of numeric components separated by the period "." character. If a
Value Field containing one or more UIDs is an odd number of bytes in length, the Value Field
shall be padded with a single trailing NULL (00H) character to ensure that the Value Field
is an even number of bytes in length.
DICOM Unique Identifier (UID). The value, name, and use are loosely based on the headings of Table A-1 UID VALUES,
which are UID Value, UID NAME, UID TYPE, and Part. Part refers to the volume and section of the DICOM standard.
This permits a UID to be pointed to as a UID or URI or both.
From Network Working Group Request for Comments: 4122 Category: Standards Track
Authors: P. Leach, Microsoft; M. Mealling, Refactored Networks, LLC; R. Salz, DataPower Technology, Inc.
Date: July 2005
http://www.webdav.org/specs/rfc4122.html
Abstract:
"This specification defines a Uniform Resource Name namespace for
UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
Unique IDentifier). A UUID is 128 bits long, and can guarantee
uniqueness across space and time. UUIDs were originally used in the
Apollo Network Computing System and later in the Open Software
Foundation's (OSF) Distributed Computing Environment (DCE), and then
in Microsoft Windows platforms."
A 128 bit long UUID should contain 16 binary bytes.
“Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit
string with the most significant digit first. The hexadecimal values "a" through "f" are output
as lower case characters and are case insensitive on input.”
Since the final representation of a UUID uses lowercase characters, the simpler definition should be used.
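The lowercase-only representation can be illustrated with a short sketch using the Python standard library `uuid` module. The function names are hypothetical, and the pattern is one possible reading of "the simpler definition", not the pattern used in the schema. The sample value is the example UUID from RFC 4122.

```python
import re
import uuid

# Lowercase hexadecimal digits in the 8-4-4-4-12 grouping.
_LOWERCASE_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_lowercase_uuid(text: str) -> bool:
    """Accept only the lowercase output form."""
    return _LOWERCASE_UUID.fullmatch(text) is not None


def to_lowercase_uuid(text: str) -> str:
    """Normalize case-insensitive input to the lowercase output form."""
    return str(uuid.UUID(text))


sample = "F81D4FAE-7DEC-11D0-A765-00A0C91E6BF6"
print(is_lowercase_uuid(sample))
print(to_lowercase_uuid(sample))
print(len(uuid.UUID(sample).bytes))  # number of binary bytes
```

Normalizing on input and validating only the lowercase form keeps the schema pattern simple while still tolerating uppercase input from other systems.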
DICOM PS 3.3 - 2008 Section 4 Symbols and abbreviations
defines UUID as Universal Unique Identifier (ISO/IEC 11578). Table C.12-1, SOP COMMON MODULE ATTRIBUTES, as well as
other tables, describes it as follows: Attribute Name = HL7 Instance Identifier; Tag = (0040,E001); Type = 1;
Attribute Description = Instance Identifier of the referenced HL7 Structured Document, encoded as a UID (OID or
UUID), concatenated with a caret (“^”) and Extension value (if Extension is present in Instance Identifier).
PS 3.6 (0040,E001) HL7 Instance Identifier VR=ST (Short Text), Value Multiplicity=1.
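The caret-delimited form of the HL7 Instance Identifier described above can be split with a short sketch. The function name and the sample root and extension values are hypothetical, and the sketch assumes that the first caret separates the UID (or UUID) from the Extension.

```python
from typing import Optional, Tuple


def split_instance_identifier(value: str) -> Tuple[str, Optional[str]]:
    """Split "root^extension" into (root, extension).

    The extension is None when no caret is present.
    """
    root, caret, extension = value.partition("^")
    return root, (extension if caret else None)


# Hypothetical sample values, for illustration only.
print(split_instance_identifier("2.16.840.1.113883.19.5^12345"))
print(split_instance_identifier("2.16.840.1.113883.19.5"))
```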