Skeleton Schema Language description

Prolog

XML is at heart a set of lexical rules that provide a means to present data in a structured format. It is both the simplicity of the rules, and the richness of the structures that they enable, that have facilitated the great success of the XML paradigm.

A schema language is used to tame the potential complexity-via-combinatorics that raw XML presents. A schema document imposes rigid lexical constraints on what is permissible in an XML document. Typically, these constraints are so strict that they eliminate the extensibility built into XML's very name: eXtensible Markup Language.

This last point deserves some comment, as the Skeleton Schema approach avoids this limitation. When a document is submitted for validation against a conventional schema, the validation does two things. First, it confirms that all the structural rules defined by the schema are adhered to. Second, it enforces that no extra-schema content is present. It is this second feature that destroys the extensibility of XML when schemas are applied, and it is on this point that the Skeleton Schema approach differs. The structure that a Skeleton Schema defines is limited in scope to the instance document content that matches the schema artifacts. In other words, when an element or attribute in an instance document matches a QName from the schema in the correct location, the data it contains must conform to the schema rules. However, any extra elements or attributes that appear in the instance document are treated as not existing at all with respect to schema validation. So the Skeleton Schema formalism enforces what is defined, and permits all that is not defined.

Of course, there are situations when it is desired to prevent extra content from being included in an instance document. The Skeleton Schema does provide a means to lock up the content model in such circumstances. However, it is important to understand that this is not the default behavior.
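The open-content rule above, combined with the s:rigid attribute defined under Structural Information, can be sketched as a check that reports only the undeclared content that falls inside a rigid region. This is a minimal Python sketch, not a reference validator. It assumes the s: prefix is bound to a hypothetical URI, http://SkeletonSchema.info/Structure/ (the document never states one). It also matches schema and instance elements by QName only and ignores multiplicities, ordering and data types.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical URI for the "s:" namespace; not given in the text.
S_NS = "http://SkeletonSchema.info/Structure/"
RIGID = f"{{{S_NS}}}rigid"


def rigid_violations(schema_elem: ET.Element, inst_elem: ET.Element,
                     inherited: bool = False, path: str = "") -> list[str]:
    """List undeclared attributes/elements found inside an s:rigid="T" region.

    Outside rigid regions, undeclared content is treated as if it did not exist.
    """
    flag = schema_elem.get(RIGID)
    rigid = inherited if flag is None else flag == "T"  # inherited until overridden
    here = f"{path}/{inst_elem.tag}"
    problems: list[str] = []

    # Attributes on the schema element, minus the s: artifacts, are the declared ones.
    declared_attrs = {a for a in schema_elem.attrib if not a.startswith(f"{{{S_NS}}}")}
    if rigid:
        problems += [f"{here}/@{a}" for a in inst_elem.attrib if a not in declared_attrs]

    declared_children: dict[str, ET.Element] = {}
    for child in schema_elem:
        declared_children.setdefault(child.tag, child)

    for child in inst_elem:
        match = declared_children.get(child.tag)
        if match is not None:
            problems += rigid_violations(match, child, rigid, here)
        elif rigid:
            problems.append(f"{here}/{child.tag}")
        # else: extra content outside a rigid region is simply ignored
    return problems
```

With a schema that marks only an inner element as rigid, extra content on the outer element passes silently, while extras inside the rigid element (and its descendants, which inherit the flag) are reported.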

Justification

The two prevalent schema languages are DTD (Document Type Definition) and XSDL (XML Schema Definition Language). Each has its advantages and disadvantages. Perhaps the largest disadvantage they share is that it is difficult to discern the structure of the instance documents that a given schema describes. For example, a DTD isn't even written as an XML document. An XSDL schema is written in XML, but its structure differs drastically from that of the associated instance documents.

When working with these schemas, it is a common practice to attempt to manually generate an instance document to get a better sense of the structure. It is also common, when a schema designer distributes the schema to people charged with the generation of instance documents, to include a number of example instance documents to assist in communicating the intent of the schema. Since the schema itself should be, and technically is, sufficient to describe the structure of instance documents, clearly these common practices indicate that these particular schema languages are inadequate - at least in communicating structure to a human audience.

So the goal of the Skeleton Schema language is to be people-friendly. The design goal is to provide a schema that looks very close to an instance document. All the schema artifacts are collected into a small set of namespaces so that they stand out. The number of artifacts is purposely kept small to limit the complexity of the schema language. While the schema language must be complete so that it can, at least in principle, be used to validate instance documents, the syntax has several shortcuts intended to ease comprehension for human readers. The human reader is the primary target of the Skeleton Schema language: the chief design goal is to permit a human reader, unfamiliar with the schema language, to understand the meaning of a schema and discern the structure of instance documents on first reading.

There are typically two main types of XML documents. The first is a data document, intended for long-term storage of data. Traditional schema languages are quite good at describing data documents. In the Skeleton Schema language these documents are described by a schema carrying the /DataDocument/ namespace.

The other type of XML document is used for making requests to, and sending responses from, application service providers. These are referred to here as protocol documents. These instance documents are usually composed from a common set of XML artifacts, but the multiplicities of these artifacts vary greatly among the different request and response types. This variability in multiplicities causes traditional schema languages to fail. Two approaches can be taken with these schema languages. The first is to make most artifacts optional. This is a severe problem, as it forces the application developer to validate the multiplicity constraints himself. The second is to create a separate schema for each request and response type, which forces the application developer to devise his own method for determining which schema to apply during validation. The Skeleton Schema language handles protocol XML documents within a single schema, and provides a way to declare the rule for matching instance documents with the appropriate schema artifacts. In the Skeleton Schema language these documents are described by a schema carrying the /Protocol/ namespace.

Different schema languages have different strengths and weaknesses. It is not a bad idea to create schemas for the same xml domain using different schema languages. There is no claim that the Skeleton Schema language is a replacement for the alternatives. For the interested reader, there is a discussion of the relative strengths and weaknesses of the different schema languages here.

/Protocol/ namespace

A /Protocol/ schema describes the packets that are used in a communication protocol.

The root element of the schema is named p:Protocol and lives in the /Protocol/ namespace. Each protocol document type is declared as a child of the p:Protocol element. If protocol packets are to be validated against the skeleton schema, then each child of the p:Protocol element must declare the @p:selector. This is used by a validator to locate the correct protocol packet type from the schema to use for validation. On the other hand, if the skeleton schema is not going to be used for validation, then the @p:selector is not required.

When a protocol packet is validated against the schema, the validator steps through each child of the schema's p:Protocol element. If a schema child element has the same QName as the root element of the protocol packet, then the schema element's @p:selector is examined. Its value is an XPath expression, which is evaluated against the protocol packet with the root element as the context node. If the expression evaluates as false() (or an empty node set), the validator continues searching for a matching schema element. If it evaluates as true() (or a non-empty node set), the schema element and its content are taken as the schema to use when validating the protocol packet. From then on, validation proceeds as it would against a /DataDocument/ schema.
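The selection procedure just described can be sketched in Python with the standard library's xml.etree.ElementTree. This is an illustration under stated assumptions. The URI used for the /Protocol/ namespace is hypothetical. ElementTree also implements only a subset of XPath 1.0 and always returns node sets, so here a selector counts as true() exactly when it yields a non-empty node set. A real validator would need a full XPath engine.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ASSUMPTION: hypothetical URI for the /Protocol/ namespace (by analogy with the
# Catalog namespace); the document does not state it.
P_NS = "http://SkeletonSchema.info/Protocol/"
SELECTOR = f"{{{P_NS}}}selector"


def select_packet_schema(protocol_root: ET.Element,
                         packet_root: ET.Element) -> Optional[ET.Element]:
    """Return the first p:Protocol child that matches the packet, or None."""
    for candidate in protocol_root:
        if candidate.tag != packet_root.tag:
            continue                      # different QName: keep searching
        selector = candidate.get(SELECTOR)
        if not selector:
            continue                      # without @p:selector it cannot be chosen
        # ElementTree evaluates only a subset of XPath 1.0 and always returns a
        # node set, so true() is modelled here as "non-empty node set".
        if packet_root.findall(selector):
            return candidate
    return None
```

A caller that receives None can report that no packet type matched. When a schema element is found, its @p:name is the natural label to use in any validation messages that follow.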

In addition to @p:selector, each packet-root element can also carry the optional attribute @p:name. This can be used to give a user-friendly name to each protocol packet. It is anticipated that any tools implemented to work with skeleton schemas will use @p:name when interacting with users. For example, a validator may report the @p:name as part of a validation error message, which lets human investigators locate the source of the error more easily.

/Catalog/ namespace

This namespace is used in the definition of a separate (optional) Catalog document. The purpose of a Catalog document is to provide a way for a Skeleton Schema validator to locate the schemas to use when validating an xml instance document. There are two reasons why this may be necessary.

The first is when a schema references namespaces defined elsewhere. Since a Skeleton Schema models the structure of the instance document, the external schema isn't required for validating structure. Data types are another matter: it shouldn't be necessary to redeclare all external data types within a schema in order to use them. A Catalog file permits mapping an external namespace to a schema location. A Skeleton Schema validator can then locate the external schema and extract the relevant data type declarations. If externally declared data types appear in a Skeleton Schema document and the validator cannot locate the declaring schema, the undetermined types are treated as s:string for validation purposes. The exceptions are the types in the XSDL namespace http://www.w3.org/2001/XMLSchema, which are well known and which Skeleton Schema validators are required to recognize intrinsically.

The second use for a Catalog file is to provide the validator with a mechanism to handle version changes in an xml instance document. It is quite common for the definition of xml documents within an application to evolve over time. As they evolve their structures are enriched and/or modified. These changes are reflected in changes to their descriptive schemas. Such changes are typically flagged with a version identifier embedded within the instance document. This should permit a validator to select the matching schema version.

The structure of a Skeleton Schema Catalog file will be defined at the end of this document.

Versioning

There are at least three version numbers that are relevant when working with Skeleton Schemas. The first is the version of the Skeleton Schema language itself, which allows the language to evolve without having to maintain absolute backward compatibility. The second is the version of a given schema instance. XML schemas tend to evolve over time, so it is useful to capture versioning. One approach is to include the version in the namespace, but that can become unwieldy over time, and it would mean using a different mechanism from the one used for other version information. The third (and possibly further) version numbers arise from the application domain being modeled. The Skeleton Schema language recommends handling all of these version numbers in a uniform fashion, with attributes of a particular data type. For the two version numbers associated with schemas, two attributes are defined on the document element of the schema:

s:schema_version := The version of the schema document, of data type s:version (see below)
s:language_version := The version of the Skeleton Schema language in which the schema is written, also of data type s:version. This document describes s:language_version="201003"
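Because s:version values are fixed-width dates, they can be compared by parsing them into tuples. The following Python sketch does that. Treating a missing .dd suffix as day 0, so that 201003 sorts before 201003.15, is an assumption of this example rather than a rule stated in this document.

```python
import re
from typing import NamedTuple

# yyyyMM with an optional .dd suffix, as defined for s:version.
VERSION_RE = re.compile(r"(\d{4})(\d{2})(?:\.(\d{2}))?")


class Version(NamedTuple):
    year: int
    month: int
    day: int  # ASSUMPTION: 0 when the optional ".dd" part is absent


def parse_version(text: str) -> Version:
    """Parse an s:version value such as "201003" or "201003.15"."""
    m = VERSION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an s:version value: {text!r}")
    year, month = int(m.group(1)), int(m.group(2))
    day = int(m.group(3)) if m.group(3) else 0
    if not 1 <= month <= 12 or (m.group(3) and not 1 <= day <= 31):
        raise ValueError(f"month or day out of range: {text!r}")
    return Version(year, month, day)
```

Since Version is a NamedTuple, ordinary comparison and sorting work field by field (year, then month, then day), so a list of schema versions can be ordered with sorted(versions, key=parse_version).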

There is yet another version that can come into play. Consider the scenario where a standards body develops a schema. Many entities will use this schema as the cornerstone of their communication needs. Ideally, the standard schema would suffice for all needs, but in practice this doesn't always work out perfectly. For a given pair of communication partners, there may arise a need to deviate from the schema or add to it. A typical example would be the need to require certain data items whereas the standard schema has set them as optional. In the case of adding to the schema (i.e., data items that weren't anticipated by the authors of the standard schema) it becomes necessary to add new elements and/or attributes to the standard schema. In order to satisfy these needs, the skeleton schema language provides for two more attributes that can be attached to the schema's root element. They are:

s:brand := An s:uri that indicates authorship of the 'branded' schema
s:brand_version := An s:version indicating the version of the branded schema. This can only be present when s:brand is also defined and populated

It should be noted that these branding attributes are for documentation only. They play no role in the workings of a skeleton schema validator. There is also no way to explicitly declare where and how a branded schema deviates from the standard. If an entity is going to work with multiple brandings of the same standard schema, then there will need to be an indicator within the instance documents themselves to permit a validator to select the correct branded schema via a Catalog file.

Structural Information

The entities described below give the meta-information that defines the structure of the instance documents.

s:mult := multiplicity: 1 (default), ?, +, *, {fixed count}, {min, max}
s:default := When data is optional, especially in a protocol packet, a default value may be assigned. In this case it is a good idea to document the default within the schema so there is no doubt about what value will ultimately be assigned. This is expected to be especially useful in branding scenarios, where an implementer of a standard schema may want to clarify the specific default values chosen, choices that are likely to vary between implementers.
s:ordered := whether subelements must appear in the order given in the schema, or can appear in any order. Takes values T|F, with T as the default value.
s:type := data type (s:string is the default; other types are selected from some namespace)
s:mixed := whether an element's content can contain text nodes mixed in with the declared subelements. Takes values T|F, with F as the default value.
s:cond := XPath expression that must evaluate as true() in order for the indicated XML artifact to be present
s:vals := used in attribute declarations to declare the value space of the attribute's value, i.e. a constant or an enumeration.
s:template_id := Marks an element with an identifier that can be referenced later in the schema. Referencing this element is shorthand for indicating that the referencing element has the same structure as the referenced one. In other words, this provides a simple way to avoid repeating structures in the schema.
s:template_idref := Marks an element as using the same structure as an earlier element marked with the matching s:template_id.
s:choice_group_id := When a group of elements is mutually exclusive, mark them with the same group identifier. This is not limited to schema-sibling elements. Note that these identifiers are not of type s:id, whose values must be unique within a document: the type s:id refers to instance documents, whereas the values of s:choice_group_id appear in the skeleton schema. There is no uniqueness constraint on them; given their meaning, there cannot be.
It is possible to include an element in multiple exclusion relationships, although this is probably not a good idea and likely indicates a poor design. At any rate, the way to handle this is to give multiple identifiers separated by spaces (similar to how IDREFS works in a DTD). The exclusion rules for each identifier are enforced independently of each other.
s:choice_bind_id := This is used in conjunction with s:choice_group_id. If none of the elements with the same s:choice_group_id has an s:choice_bind_id, then only one of these elements may appear in an instance document. A refinement of this concept is to have sets of elements that appear together but are mutually exclusive of other element sets. In this scenario, all elements involved are marked with the same s:choice_group_id, and the elements within a set are bound together by giving them the same s:choice_bind_id.
Just as with s:choice_group_id, s:choice_bind_id can contain multiple values. However, this raises the question of which group identifiers correspond to which binding identifiers. The rule is that the number of identifiers must match, and they correspond by their order within the s:choice_group_id and s:choice_bind_id schema attributes. In practice, this means that any element belonging to multiple choice groups MUST define a corresponding s:choice_bind_id schema attribute.
s:rigid := Marks an element as only containing content explicitly mentioned in the schema. Takes values T|F, with F as the default. The value is inherited by all subelements unless and until it is explicitly overridden.
s:ANY_ELEM := This is used as an element name in the schema and serves as a placeholder for any element from any namespace. It is used when the schema defines a set of attributes that can be applied to arbitrary documents. For example, it can be used to create a skeleton schema for the XLink specification.
s:elem_def_type := This can be placed on the root element of the schema or on any other element of the schema. The value is inherited by all sub-elements. It defines the default data type for element content, overriding the usual default type of s:string. This feature is useful in three scenarios. The first is when the grammar is attribute-centric, so elements carry no data content; in this case the default type can be defined as s:none. The second is when string data is expected, but the application wants to limit the range of characters that can be passed to it, e.g., for security reasons; in this case a regular expression can be used to enforce the character-set limitation. The third is when all or most of the data is expected to consist of non-empty strings; in this case the type s:vstring can be used.
s:att_def_type := Similar to s:elem_def_type, this defines the default data type for attributes. This can be useful if most attributes take the same non-string type. It can also be useful for the security scenario described above, in attribute-centric documents.
s:elem_def_mult := Similar to s:elem_def_type, this defines the default multiplicity for the element and its sub-elements. For some types of schemas this can significantly clean up the markup. However, multiplicities aren't normally as readily apparent to human readers as data types are, which suggests that this feature be used with caution to avoid confusion.
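Two of the structural artifacts above lend themselves to a short illustration. The first is turning an s:mult indicator into occurrence bounds. The second is enforcing s:choice_group_id / s:choice_bind_id exclusion among the elements that actually occur. The Python sketch below is hypothetical. It accepts s:posinf as an upper bound, since the defined constants are described as useful for unbounded ranges. It checks only mutual exclusion between sets, not whether every member of a bound set is present.

```python
import math
import re
from dataclasses import dataclass

_FIXED = {"1": (1, 1), "?": (0, 1), "+": (1, math.inf), "*": (0, math.inf)}


def parse_mult(indicator: str = "1") -> tuple[int, float]:
    """Translate an s:mult indicator into (min, max) occurrence bounds."""
    indicator = indicator.strip()
    if indicator in _FIXED:
        return _FIXED[indicator]
    m = re.fullmatch(r"\{\s*(\d+)\s*\}", indicator)                       # {fixed count}
    if m:
        return int(m.group(1)), int(m.group(1))
    m = re.fullmatch(r"\{\s*(\d+)\s*,\s*(\d+|s:posinf)\s*\}", indicator)  # {min, max}
    if m:
        low = int(m.group(1))
        high = math.inf if m.group(2) == "s:posinf" else int(m.group(2))
        if high < low:
            raise ValueError(f"max < min in {indicator!r}")
        return low, high
    raise ValueError(f"unrecognized s:mult indicator: {indicator!r}")


def count_ok(indicator: str, occurrences: int) -> bool:
    """True if the number of occurrences satisfies the s:mult indicator."""
    low, high = parse_mult(indicator)
    return low <= occurrences <= high


@dataclass
class ChoiceDecl:
    """Choice-group data for one schema element (values are space separated)."""
    name: str
    group_ids: str
    bind_ids: str = ""


def choice_violations(present: list[ChoiceDecl]) -> list[str]:
    """Return the choice groups in which more than one set of elements occurs."""
    sets_in_group: dict[str, set[str]] = {}
    for decl in present:
        groups, binds = decl.group_ids.split(), decl.bind_ids.split()
        # Bind ids pair with group ids by position; multiple groups require them.
        if (binds or len(groups) > 1) and len(binds) != len(groups):
            raise ValueError(f"{decl.name}: group and bind identifiers must pair up")
        for i, group in enumerate(groups):
            # An element without a bind id forms a set on its own.
            key = binds[i] if binds else f"element:{decl.name}"
            sets_in_group.setdefault(group, set()).add(key)
    return sorted(g for g, keys in sets_in_group.items() if len(keys) > 1)
```

For example, two unbound elements in the same group conflict, while a card number and an expiry date bound with the same s:choice_bind_id may appear together. A card number bound to one set and a check number bound to another still conflict.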

For elements, these artifacts are attributes declared on the element. If the element is to take a constant value, the corresponding schema element takes that value. If the element is to take a value from an enumeration, the corresponding schema element takes the full enumeration, as described under Pre-defined Data Types below. In the case of s:type, a notational shortcut is to declare the type as the element's value. This works as long as the element's value isn't a constant or an enumeration. However, since the s:type keyword doesn't appear in this case, the declared type must include the namespace prefix.

   To clarify the last point, you can define an element's value to be of type s:int in any of the following ways:

   <ElementName s:type="int"/>
   or
   <ElementName s:type="s:int"/>
   or
   <ElementName>s:int</ElementName>
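The three equivalent forms above, together with the inheritance of s:elem_def_type, suggest a simple resolution order. An explicit s:type comes first, then a prefixed type written as the element's value, then the nearest inherited s:elem_def_type, and finally s:string. The Python sketch below follows that order. The namespace URI bound to s: and the regular expression used to recognize a prefixed type are assumptions of this example. It also assumes that s:elem_def_type applies to the element carrying it as well as to its descendants.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# ASSUMPTION: hypothetical URI for the "s:" namespace; the text does not give one.
S_NS = "http://SkeletonSchema.info/Structure/"
TYPE = f"{{{S_NS}}}type"
ELEM_DEF_TYPE = f"{{{S_NS}}}elem_def_type"

# Heuristic: "prefix:name" optionally followed by "(...)", e.g. s:int or s:string(1, 10).
PREFIXED_TYPE = re.compile(r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*(\(.*\))?")
HAS_PREFIX = re.compile(r"[A-Za-z_][\w.-]*:")


def element_type(elem: ET.Element, inherited: str) -> Optional[str]:
    """Resolve the data type of one schema element; None means constant/enumeration."""
    explicit = elem.get(TYPE)
    if explicit is not None:
        # s:type="int" and s:type="s:int" are equivalent.
        return explicit if HAS_PREFIX.match(explicit) else f"s:{explicit}"
    text = (elem.text or "").strip()
    if PREFIXED_TYPE.fullmatch(text):
        return text          # shorthand: <ElementName>s:int</ElementName>
    if text:
        return None          # a constant value or an enumeration such as A|B|C
    return inherited         # nearest s:elem_def_type, else s:string


def resolve_types(elem: ET.Element, inherited: str = "s:string",
                  path: str = "") -> dict[str, Optional[str]]:
    """Map each element path in a schema to its resolved data type."""
    default = elem.get(ELEM_DEF_TYPE, inherited)
    here = f"{path}/{elem.tag.split('}')[-1]}"
    found = {here: element_type(elem, default)}
    for child in elem:
        found.update(resolve_types(child, default, here))
    return found
```

Keying the result by element path is only for illustration. Repeated sibling names would collide, and a real tool would keep the schema nodes themselves.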

For attributes, the full structural information must be included within the value of the attribute. The full format is:
@attribute_name="s:mult='mult-indicator';s:type='type-indicator';s:cond='cond-indicator';s:vals='valuespace-indicator'"
However, this is usually overkill. It is only necessary to include the artifacts whose values differ from their defaults, and in these cases a shorthand syntax can be used.

The four shorthand scenarios are:
@attribute_name="" or @attribute_name="?" to express s:mult, which can only be 1 or ? for an attribute
@attribute_name="s:int" to express s:type
@attribute_name="the constant value" to express that s:vals is a constant value of s:type='string'
@attribute_name="A|B|C|s:space|s:empty" to express that s:vals is an enumeration (here consisting of 5 possible values)

When the shorthand is used to indicate s:vals, the s:type is usually implicit in the format. However, ambiguity is always possible, since the type s:string can take any value and other types may have overlapping value spaces. In such cases the type can be stated explicitly by prefixing it in braces:
@attribute_name="{type-indicator}valuespace-indicator".

For example, to declare an enumerated set of prices one could have
@price="{s:money}1.99|2.99|3.99"

A similar shorthand can be used to express s:mult, so if the previous attribute is optional, you could have
@price="{?}{s:money}1.99|2.99|3.99"
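The attribute notation, both the full format and the shorthands, can be decoded mechanically. The Python sketch below is one reading of the rules above, not a normative grammar. It assumes the full format uses single-quoted values that contain no further single quotes. It treats a lone prefixed token (such as s:int) as a type unless it is one of the defined constants (s:space, s:empty, s:posinf, s:neginf, s:null), which are treated as values.

```python
import re
from dataclasses import dataclass
from typing import Optional

CONSTANTS = {"s:space", "s:empty", "s:posinf", "s:neginf", "s:null"}
FULL_ITEM = re.compile(r"s:(mult|type|cond|vals)='([^']*)'")
BRACED = re.compile(r"\{([^{}]*)\}")
TYPE_TOKEN = re.compile(r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*(\(.*\))?")


@dataclass
class AttrDecl:
    mult: str = "1"
    type: str = "s:string"
    cond: Optional[str] = None
    vals: Optional[list[str]] = None   # None: any value of the type is allowed


def parse_attr_decl(value: str) -> AttrDecl:
    """Decode an attribute declaration written in full or shorthand form."""
    decl = AttrDecl()
    if FULL_ITEM.match(value):                        # full format
        for key, val in FULL_ITEM.findall(value):
            if key == "vals":
                decl.vals = val.split("|")
            else:
                setattr(decl, key, val)
        return decl
    rest = value
    while (m := BRACED.match(rest)):                  # {?} and {type} prefixes
        token = m.group(1).strip()
        if token == "?":
            decl.mult = "?"
        else:
            decl.type = token
        rest = rest[m.end():]
    if rest == "?":
        decl.mult = "?"
    elif rest == "":
        pass                                          # "" means multiplicity 1
    elif "|" not in rest and rest not in CONSTANTS and TYPE_TOKEN.fullmatch(rest):
        decl.type = rest
    else:
        decl.vals = rest.split("|")                   # constant or enumeration
    return decl
```

For instance, the optional price declaration above decodes to mult "?", type "s:money" and the three listed values, while "the constant value" decodes to a one-element value list of the default type s:string.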

Defined Constants

s:space := ASCII character 0x20, a typographic space
s:empty := The empty string
s:posinf := Positive infinity
s:neginf := Negative infinity
s:null := A NULL value

An s:space is useful as it is often difficult to discern the presence of a typographic space. Also, xml parsers may eliminate spaces as irrelevant white space.

An s:empty is useful to indicate that an element or attribute can take an empty value.

Both s:posinf and s:neginf are useful in defining unbounded ranges.

An s:null is useful for declaring a NULL value, which has special meaning in certain application domains. One example is serializing the contents of a relational database table for transport. Another is when the XML models containment relationships between object instances in an OOP system. Of course, an instance document will already have a mechanism for expressing NULL values that is traditional for its application domain. The s:null is a means to handle NULL values uniformly in schemas across application domains. It is similar to xsi:nil="true" from the namespace xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", only much easier to work with.

A final comment on the defined constants is to stress that they are used within schema documents, not in instance documents. In fact, all artifacts defined within the Skeleton Schema namespace are to be understood as appearing in schema documents only. This is what distinguishes s:null from xsi:nil="true". The former is for the skeleton schema and would map to a domain-specific representation of NULL in instance documents. The xsi:nil attribute is intended to appear in instance documents, providing a universal representation of NULL across all domains. The Skeleton Schema language recommends the use of s:null in schema documents and xsi:nil="true" in all instance documents.

Pre-defined Data Types

The following pre-defined data types can be used in Skeleton Schemas:

s:string := The default type. It is used explicitly when length constraints are imposed, as in s:string(fixed length) or s:string(min, max). It is also used when the default type is overridden with s:elem_def_type or s:att_def_type and a particular data item requires the full s:string type.
s:vstring := A valued string, i.e. a string whose minimum length is 1. Used without a length constraint it is the same as s:string(1, s:posinf). It can also be used with a length constraint to give a maximum length, as in s:vstring(max). It is often the case that a datum takes a string value but cannot be empty; s:vstring expresses that with a simpler syntax than the rather pedantic s:string(1, s:posinf).
s:int := An integer. It may be constrained to a fixed number of digits, as in s:int(5), or to a range, as in s:int(min, max)
s:float := A floating-point number, which may carry an indicator of the number of decimal places, as in s:float(places), a range, as in s:float(min, max), or both, as in s:float(places; min, max)
s:money := money in US currency, always expressed as if s:float(2) [other currencies require the declaration of an s:currency]
s:currency := A currency specifier from the ISO-4217 specification, using the alphabetic codes. The ISO-4217 code determines the lexical space of data of type s:money, which isn't always s:float(2) as it is for USD. For example, values in Japanese yen (JPY) would be s:int. Other currencies imply s:money values equivalent to s:float(1) or s:float(3).
Only attributes can be of type s:currency. The currency declared in the instance document then applies to all values of type s:money within the scope of the element on which the s:currency attribute is defined. Of course, the s:currency can be redefined for a contained element, with its effect limited to the scope of that element.
s:date := ISO-8601 date in canonical form
s:datetime := ISO-8601 datetime, with or without time zone, in canonical form
s:duration := ISO-8601 timespan in canonical form
s:timestamp := ISO-8601 datetime in UTC (canonical form), with millisecond accuracy [format is yyyy-MM-ddTHH:mm:ss.fffZ]
s:version := yyyyMM or yyyyMM.dd, where yyyy should be a 4-digit year, MM a 2-digit month and dd a 2-digit day
s:regex := s:string conforming to a regular expression; always expressed as s:regex(pattern)
s:guid := a GUID in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
s:base64 := base64-encoded binary data
s:mimetype := a MIME type
s:email := an email address
s:id := equivalent to a DTD ID type, although validation will not restrict it to the rules that apply to an NCName (which DTD validators impose). One great advantage is that GUIDs can be used as IDs, a very natural choice that the NCName rules exclude (a GUID may begin with a digit, which an NCName cannot).
s:idref := equivalent to a DTD IDREF type
s:idrefs := equivalent to a DTD IDREFS type
s:uri := a uniform resource identifier
s:xpath := an XPath expression
s:any := Limited to elements; declares that an element can have arbitrary content.
s:none := Limited to elements; declares that an element cannot have data (this is necessary to override the default data type of s:string).
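Several of the pre-defined types have lexical spaces simple enough to check with regular expressions and numeric comparisons. The Python sketch below covers a subset: s:string, s:vstring, s:int, s:float, s:money, s:version, s:timestamp, s:guid and s:regex. Its interpretations are assumptions. s:float(places) is read as exactly that many decimal digits, and Python's re syntax stands in for whatever regular-expression dialect s:regex intends. For s:money, only a small excerpt of ISO-4217 minor units is included.

```python
import math
import re

SPECIAL = {"s:posinf": math.inf, "s:neginf": -math.inf}
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}  # small excerpt of ISO 4217
PATTERNS = {
    "s:version": r"\d{4}(0[1-9]|1[0-2])(\.(0[1-9]|[12]\d|3[01]))?",
    "s:timestamp": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z",
    "s:guid": r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}",
}
TYPE_EXPR = re.compile(r"(?P<name>[\w.-]+:[\w.-]+)(?:\((?P<args>.*)\))?")


def _bound(token: str) -> float:
    token = token.strip()
    return SPECIAL[token] if token in SPECIAL else float(token)


def _in_range(number: float, args: str) -> bool:
    low, high = args.split(",")
    return _bound(low) <= number <= _bound(high)


def check_lexical(value: str, type_expr: str, currency: str = "USD") -> bool:
    """Return True if value lies in the lexical space of type_expr."""
    m = TYPE_EXPR.fullmatch(type_expr.strip())
    if m is None:
        raise ValueError(f"malformed type expression: {type_expr!r}")
    name, args = m.group("name"), m.group("args")
    if name in PATTERNS:
        return re.fullmatch(PATTERNS[name], value) is not None
    if name == "s:regex":
        if args is None:
            raise ValueError("s:regex requires a pattern")
        return re.fullmatch(args, value) is not None
    if name in ("s:string", "s:vstring"):
        if name == "s:vstring" and not value:
            return False
        if args is None:
            return True
        if "," in args:
            return _in_range(len(value), args)
        limit = int(args)  # s:string(fixed length) versus s:vstring(max)
        return len(value) == limit if name == "s:string" else len(value) <= limit
    if name == "s:int":
        if re.fullmatch(r"[+-]?\d+", value) is None:
            return False
        if args is None:
            return True
        if "," in args:
            return _in_range(int(value), args)
        return len(value.lstrip("+-")) == int(args)  # fixed number of digits
    if name in ("s:float", "s:money"):
        if name == "s:money":
            args = str(MINOR_UNITS[currency])  # money behaves as s:float(minor units)
        num = re.fullmatch(r"[+-]?\d+(?:\.(\d+))?", value)
        if num is None:
            return False
        places, _, rng = (args or "").partition(";")
        if "," in places:                     # s:float(min, max) without places
            places, rng = "", places
        if places.strip() and len(num.group(1) or "") != int(places):
            return False
        return _in_range(float(value), rng) if rng.strip() else True
    raise NotImplementedError(f"{name} is not covered by this sketch")
```

The currency parameter stands in for the s:currency attribute in scope. A caller would pass whatever currency the nearest enclosing s:currency attribute declares, falling back to USD.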

A constant value can be declared by using the value. An enumeration can be declared by listing the values separated by '|', for example A|B|C.

It is possible to apply the not() operator to most types. When validating the instance document, the validator determines whether a datum conforms to the type specified in the schema, and the answer is a logical TRUE or FALSE. If the not() operator is applied to the type in the schema, the validator checks the datum against the contained type, and the datum passes when that test is FALSE (because not() converts it to TRUE). A particularly useful example is the type expression not(s:empty). It declares that a datum can take any value except the empty one (this example is equivalent to using the data type s:vstring). An example where the not() operator cannot be used is not(s:string): since all XML data is lexically a string, the condition not(s:string) can never be satisfied.
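The behaviour of not() can be illustrated with a tiny evaluator over type expressions. The Python sketch below is deliberately minimal and hypothetical. It understands only the constants s:empty and s:space, the types s:string and s:vstring, literal constants, '|' enumerations and nested not(). It raises an error for any other s: type.

```python
CONSTANTS = {"s:empty": "", "s:space": " "}


def _split_top(expr: str) -> list[str]:
    """Split an enumeration on '|' characters that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append(expr[start:i])
            start = i + 1
    parts.append(expr[start:])
    return parts


def matches(value: str, expr: str) -> bool:
    """Evaluate a (small) type expression against a datum."""
    expr = expr.strip()
    alternatives = _split_top(expr)
    if len(alternatives) > 1:                          # enumeration: any alternative
        return any(matches(value, alt) for alt in alternatives)
    if expr.startswith("not(") and expr.endswith(")"):
        return not matches(value, expr[4:-1])          # not() inverts the inner test
    if expr in CONSTANTS:
        return value == CONSTANTS[expr]
    if expr == "s:string":
        return True                                    # every XML datum is a string
    if expr == "s:vstring":
        return value != ""
    if expr.startswith("s:"):
        raise NotImplementedError(f"{expr} is not covered by this sketch")
    return value == expr                               # a literal constant
```

Under this evaluator not(s:empty) and s:vstring agree on every input, and not(s:string) returns False for every value, which is exactly why it can never be satisfied.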

An interesting question is how a validator should deal with a declared data type when the XML artifact is not required. For example, consider an attribute declared as att_name="{?}s:int". How would validation treat an instance document that has att_name=""? There are two possible answers. The first is that, since the attribute is optional, an empty value should be acceptable. The second is to note that if a blank value were acceptable, the schema should read att_name="{?}s:int|s:empty". To resolve the issue, one must remember that the design goal is to be user-friendly. It is common practice for developers to create XML instance documents (especially protocol packets) by starting from a template string containing all the elements and attributes with empty values, then populating those values for which there are data. This practice forces the viewpoint that an empty value must be acceptable for any optional artifact. Of course, for required artifacts an empty value is only acceptable if the schema specifically allows it.

For attributes the above discussion is sufficient. An optional element, however, can contain a value, attributes with their own values, and subelements. The rule for an optional element is that it may be passed with empty data if and only if all of its data is empty, except for constants, which must be populated. This holds even when the optional element contains a required attribute and/or subelement: that data must also be empty for the empty-value allowance to apply. Again, this is motivated by the common practice of starting from a template XML document in which all non-constant data is empty.
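The empty-value allowance for optional artifacts can be sketched as follows. This Python example is an assumption-laden illustration. Constants are identified by name in a hypothetical constants mapping, with attribute names prefixed by '@'. The caller supplies the type check, and namespace handling is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def attribute_ok(value: str, optional: bool, type_ok: Callable[[str], bool]) -> bool:
    """An optional attribute may always be empty; otherwise its type decides."""
    return (optional and value == "") or type_ok(value)


def optional_element_blank(elem: ET.Element, constants: dict[str, str]) -> bool:
    """True if all non-constant data under elem is empty and each constant
    that appears carries its declared value."""
    for node in elem.iter():
        text = (node.text or "").strip()
        expected = constants.get(node.tag)
        if expected is not None:
            if text != expected:
                return False
        elif text:
            return False
        for name, val in node.attrib.items():
            expected = constants.get(f"@{name}")
            if expected is not None:
                if val != expected:
                    return False
            elif val != "":
                return False
    return True
```

An optional element that fails this check is not rejected. It simply has to pass full validation against its declared types and multiplicities instead.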

Other Data Types

Other types can be used by including a reference to their namespace – for example, the types familiar from XSDL can be used. If using types that aren't native to XSDL or the /Structure/ namespace, one can reference the defining schema for those types using a Catalog file, as described below.

Alternatively, new types can be defined by including the schema element s:DataTypeDefinitions as the last child (or children) of the document element. This element must declare @s:types_namespace to provide the namespace within which the data types are defined; one s:DataTypeDefinitions is required for each namespace. Each subelement of s:DataTypeDefinitions is an xsd:simpleType and uses the syntax of XSDL. Of course, the xsd:simpleType must have a non-empty @name so that it can be referenced from within the body of the schema.

These defined types should declare unique value spaces. It is considered an abuse of the Skeleton Schema philosophy to use a defined type to provide an alias for an existing type. In particular, one should not attempt to provide semantically meaningful types on the back of simple value-space types. Semantics should be confined to the application realm. While it is true that semantically meaningful names are typically used in designing xml, this is really somewhat of an illusion. The true semantic meaning only holds when the content of an xml document is consumed. Also, these names appear in the instance documents, whereas the names given to schema types do not – they only appear within schema documents. One should maintain clarity on this point and confine semantic names to elements and attributes and leave data types at the lower level of lexical value spaces, which means the permissible combinations of typographic characters.

As an example, consider attempting to define a type t:zipcode to be 5 digits. The schema would include
<Zipcode s:type="t:zipcode"/>.
This is incorrect because it attempts to place semantic information in the type when it belongs in the element name. The correct schema would be:
<Zipcode s:type="int(5)"/>.

On the other hand, an application may need a full Zip+4 value. In that case it would be permissible to define a new lexical type t:zip4, which should look like s:int(5)-s:int(4). While this can be handled via s:regex, it is considered good form to define a distinct lexical type; s:regex is a catch-all type and isn't very user-friendly.
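To make the s:DataTypeDefinitions mechanism concrete, the Python sketch below collects the xsd:simpleType definitions from a schema. It then checks a value against a type whose single xsd:restriction uses xsd:pattern facets, taking the Zip+4 type above as the example. The s: namespace URI and the urn:example:types namespace are hypothetical. Only pattern facets are interpreted, and Python's re is used as an approximation of XSD regular expressions. XSD patterns are implicitly anchored, hence fullmatch, and several pattern facets in one restriction are alternatives.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical URI for the "s:" namespace.
S_NS = "http://SkeletonSchema.info/Structure/"
XSD = "http://www.w3.org/2001/XMLSchema"
DEFS_TAG = f"{{{S_NS}}}DataTypeDefinitions"


def collect_defined_types(schema_root: ET.Element) -> dict[tuple[str, str], ET.Element]:
    """Map (types_namespace, name) to each xsd:simpleType in s:DataTypeDefinitions."""
    registry: dict[tuple[str, str], ET.Element] = {}
    seen_defs = False
    for child in schema_root:
        if child.tag != DEFS_TAG:
            if seen_defs:
                raise ValueError("s:DataTypeDefinitions must be the last children")
            continue
        seen_defs = True
        namespace = child.get(f"{{{S_NS}}}types_namespace")
        if not namespace:
            raise ValueError("s:DataTypeDefinitions requires @s:types_namespace")
        for simple in child:
            name = simple.get("name")
            if simple.tag != f"{{{XSD}}}simpleType" or not name:
                raise ValueError("each definition must be a named xsd:simpleType")
            registry[(namespace, name)] = simple
    return registry


def matches_patterns(simple_type: ET.Element, value: str) -> bool:
    """Check the xsd:pattern facets of a single-restriction simpleType."""
    patterns = [p.get("value") for p in simple_type.iter(f"{{{XSD}}}pattern")]
    if not patterns:
        return True  # other facets are ignored in this sketch
    # Within one xsd:restriction, multiple pattern facets are alternatives (ORed).
    return any(re.fullmatch(p, value) for p in patterns)
```

Nothing in this sketch would stop someone from registering a t:zipcode alias for a plain 5-digit value. Avoiding such semantic aliases remains a matter of schema-writing discipline, as argued above.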

Protocol Example

I want to present a real world example. I'm going to use an existing DTD schema and translate it to the Skeleton Schema language. The DTD in question is from the Mismo organization that is charged with defining standards for communicating between various entities involved in the mortgage industry. I've taken their merge-only credit report request schema and placed it here. Notice that the DTD defines a protocol, and as such nearly all of the defined entities are optional. However, in an actual Mismo request packet of a given request type, most entities are required. I've placed the corresponding Skeleton schema here. As you can see, the multiplicities are specified accurately because a /Protocol/ schema is possible. The better support of data types is obvious, but that is only because the Mismo schema is written in DTD. Mismo has recently moved to an XSDL schema, so the data types are handled better. The lack of support for protocols is still an obvious limitation in the new Mismo schemas, which are still forced to define nearly all entities as optional.

One last point about the Mismo skeleton schema: notice the attributes @s:brand and @s:brand_version on the root element. Since Mismo is a standards body, it cannot anticipate every need of all the entities that have adopted its standard. Accordingly, translating the DTD into a skeleton schema involved some arbitrary choices, though these are typical of actual practice. There is therefore a need to 'brand' the schema as deviating from the standard according to the particular needs of the communicating partners. These needs may also change over time while the standard schema remains fixed. If we pretend that @s:version was set by the Mismo organization, then the evolving variations from the standard are marked with @s:brand_version.

Structure of Catalog files

A Skeleton Schema Catalog file has the root element c:SchemaCatalog, where the prefix c is bound as xmlns:c="http://SkeletonSchema.info/Catalog/". The structure is defined by this Skeleton Schema.

Each schema in the catalog is described by a Schema element carrying an @instance_namespace attribute and an optional set of Selector elements.

When a validator uses the catalog to locate a data type definition, the namespace is the one to which the data type belongs, and no Selector elements are needed. If no Schema element has a matching namespace, the type is treated as equivalent to s:string. If multiple Schema elements have a matching namespace, each schema is searched, in order, for a definition of the data type in question, and the first definition found is the one used.

When a validator uses the catalog to locate a schema, the namespace is that of the document element of the instance document. If more than one Schema element has the same namespace, the Selector elements are used to identify the correct schema: the first Schema whose Selector elements all evaluate to true() (or to a non-empty node set) is the chosen schema. If no set of Selectors evaluates to true(), the first Schema with the correct namespace is chosen. If no Schema element has a matching namespace, the validator must return an error, since this most likely results from a mistake on the part of either the schema author or the author of the instance document.

In all scenarios the Location element gives the location where the schema document can be found.
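The two lookup procedures above can be summarized in a short Python sketch. It is only an illustration under stated assumptions: catalog entries are modeled in memory (SchemaEntry, find_type_schema and select_schema are hypothetical names, not part of any published API), each Selector's XPath expression is replaced by a Python predicate on the instance document's root element, and a schema's type definitions are supplied as a plain mapping rather than read from the schema document.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Hypothetical stand-in for evaluating a Selector's XPath expression against
# the instance document: True means true() or a non-empty node set.
Selector = Callable[[ET.Element], bool]


@dataclass
class SchemaEntry:
    """In-memory model of one catalog Schema element (illustrative only)."""
    instance_namespace: str
    location: str  # text of the Location element
    selectors: List[Selector] = field(default_factory=list)


def find_type_schema(catalog: List[SchemaEntry], type_ns: str, type_name: str,
                     defined_types: Dict[str, Set[str]]) -> Optional[str]:
    """Locate the schema that defines a data type.

    Returns the schema location, or None when no entry matches the type's
    namespace (the type is then treated as s:string). defined_types maps a
    location to the type names that schema defines; a real validator would
    load and inspect each schema document instead.
    """
    matches = [e for e in catalog if e.instance_namespace == type_ns]
    if not matches:
        return None
    for entry in matches:  # searched in catalog order
        if type_name in defined_types.get(entry.location, set()):
            return entry.location  # the first definition found is used
    # The text does not cover this case; raising here is an assumption.
    raise LookupError(f"{type_name!r} not defined in any schema for {type_ns!r}")


def namespace_of(elem: ET.Element) -> str:
    """Namespace URI of an element, taken from ElementTree's '{uri}local' tag."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def select_schema(catalog: List[SchemaEntry], root: ET.Element) -> str:
    """Choose the schema location for an instance document's root element."""
    ns = namespace_of(root)
    matches = [e for e in catalog if e.instance_namespace == ns]
    if not matches:
        raise LookupError(f"no catalog schema for namespace {ns!r}")
    if len(matches) == 1:
        return matches[0].location
    for entry in matches:
        # Assumption: an entry with no Selectors is not a candidate here.
        if entry.selectors and all(sel(root) for sel in entry.selectors):
            return entry.location
    return matches[0].location  # no Selector set matched: first entry wins


if __name__ == "__main__":
    ns = "http://example.com/orders/"  # hypothetical namespace
    catalog = [
        SchemaEntry(ns, "orders-general.xml"),
        SchemaEntry(ns, "orders-rush.xml", [lambda r: r.get("priority") == "rush"]),
    ]
    rush = ET.fromstring(f'<o:Order xmlns:o="{ns}" priority="rush"/>')
    plain = ET.fromstring(f'<o:Order xmlns:o="{ns}"/>')
    print(select_schema(catalog, rush))   # orders-rush.xml
    print(select_schema(catalog, plain))  # orders-general.xml
```

Two points the text leaves open are flagged in the comments: what happens when namespaces match but no schema defines the type, and whether a Schema element with no Selectors counts as a matching Selector set.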

Namespace Aliases

You will frequently use elements and attributes from well-known namespaces within your own XML grammars. When doing so, you may find that the tool you use to write the schema is aware of these well-known namespaces. In that case, your skeleton schema may be flagged as erroneous wherever these artifacts appear. For example, consider the attribute xml:id. Its value must match the DTD type ID, which means it must be an NCName. If you were to include it as an optional attribute in a skeleton schema, you might enter xml:id="?". This is an error, since any XML parser that is aware of xml:id knows that "?" is not a valid value because it is not an NCName. There are many well-known namespaces, and the number is growing all the time. As XML tools are updated they will become aware of this growing list, so the Skeleton Schema language faces a challenge.

The Skeleton Schema Language offers a pseudo-namespace, http://SkeletonSchema.info/NamespaceAlias/. It is used to map well-known namespaces into the skeleton schema arena, where XML tools are not prepared to enforce any rules. Let's look at a simple example:

<d:DataDocument xmlns:d="http://SkeletonSchema.info/DataDocument/"
    xmlns:s="http://SkeletonSchema.info/Structure/"
    xmlns:_xml="http://SkeletonSchema.info/NamespaceAlias/xml"
    xmlns:_xlink="http://SkeletonSchema.info/NamespaceAlias/xlink">
  <ElementExample xmlns:xlink="http://www.w3.org/1999/xlink"
      _xml:id="?" _xlink:type="simple" _xlink:href="?"/>
</d:DataDocument>

and an associated instance document:

<ElementExample xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="id_001132"
    xlink:type="simple" xlink:href="http://mydomain.com/image.gif">
  This is an example document.
</ElementExample>

The schema uses the attribute named 'id' in the namespace http://SkeletonSchema.info/NamespaceAlias/xml. This is perfectly legal XML, and any tool that parses it will do so without imposing any interpretation on the attribute. A Skeleton Schema validator recognizes the pseudo-namespace, extracts the terminating 'xml', and then applies the rules to _xml:id as if it were xml:id. In this case the namespace prefix 'xml' is well-known (it is bound by definition), so no actual namespace declaration is required. In general this isn't the case, so the appropriate namespace declaration is required as well. Notice that the example contains a similar usage for xlink, and for that the real xlink namespace is declared in the schema even though no artifacts from that namespace appear in the schema itself. In the instance document, the correct prefixes are used.

Just as normal namespace declarations have a scope, so do namespace aliases. This must be true, since they appear in a Skeleton Schema document, and in that context they are genuine namespace declarations. Accordingly, they can be placed on any element within the schema, and the usual scoping rules apply. A Skeleton Schema parser simply extracts the terminating characters of the alias namespace and treats them as a namespace prefix. When a namespace alias is referenced via a namespace prefix, a matching true namespace prefix must be in scope, or the parser will report an error. It is probably a good idea, though, to limit namespace alias declarations to the root element of the schema. A namespace alias declaration takes up a lot of space and would be a distraction if embedded within an element that is part of the defined grammar, all the more so because a namespace alias declaration should never appear in an instance document. Since well-known namespaces typically have customary prefixes, each one will usually need only a single alias prefix, which permits a single namespace alias declaration for it on the schema's root element.
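The resolution and scoping rules described in this section can be sketched with Python's standard ElementTree module. This is a minimal illustration, not the Skeleton Schema validator: it tracks in-scope prefix declarations with a stack, rewrites every attribute name in the alias pseudo-namespace to the real namespace bound to the extracted prefix (the xml prefix is bound by definition), and raises an error when no matching true prefix is in scope. The function names are hypothetical, and only attributes are handled; aliased element names are left out for brevity. The usage at the end checks that the resolved attribute names of the example schema match those of the example instance document.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Tuple

ALIAS_BASE = "http://SkeletonSchema.info/NamespaceAlias/"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # 'xml' is bound by definition


def resolve(name: str, scope: Dict[str, str]) -> str:
    """Rewrite an aliased '{alias-uri}local' name to '{real-uri}local'."""
    if not name.startswith("{" + ALIAS_BASE):
        return name  # not an alias: leave unchanged
    uri, local = name[1:].split("}", 1)
    prefix = uri[len(ALIAS_BASE):]  # terminating characters, e.g. 'xlink'
    if prefix == "xml":
        return "{%s}%s" % (XML_NS, local)
    if prefix not in scope:
        raise ValueError(f"no in-scope namespace declaration for prefix {prefix!r}")
    return "{%s}%s" % (scope[prefix], local)


def resolved_attributes(schema_text: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (element tag, resolved attribute names) in document order,
    tracking the scope of namespace declarations with a stack of maps."""
    stack: List[Dict[str, str]] = [{}]
    pending: Dict[str, str] = {}
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item
            pending[prefix] = uri  # belongs to the next element to start
        elif event == "start":
            scope = {**stack[-1], **pending}
            pending = {}
            stack.append(scope)
            yield item.tag, [resolve(a, scope) for a in item.attrib]
        else:  # "end": this element's declarations go out of scope
            stack.pop()


SCHEMA = """<d:DataDocument xmlns:d="http://SkeletonSchema.info/DataDocument/"
    xmlns:s="http://SkeletonSchema.info/Structure/"
    xmlns:_xml="http://SkeletonSchema.info/NamespaceAlias/xml"
    xmlns:_xlink="http://SkeletonSchema.info/NamespaceAlias/xlink">
  <ElementExample xmlns:xlink="http://www.w3.org/1999/xlink"
      _xml:id="?" _xlink:type="simple" _xlink:href="?"/>
</d:DataDocument>"""

INSTANCE = """<ElementExample xmlns:xlink="http://www.w3.org/1999/xlink"
    xml:id="id_001132" xlink:type="simple"
    xlink:href="http://mydomain.com/image.gif">This is an example document.</ElementExample>"""

if __name__ == "__main__":
    schema_attrs = dict(resolved_attributes(SCHEMA))["ElementExample"]
    instance_attrs = ET.fromstring(INSTANCE).attrib
    print(set(schema_attrs) == set(instance_attrs))  # True
```

Tracking scope on the parser's "start" and "end" events, rather than on "end-ns", keeps each element's declarations tied to exactly that element and its descendants, which mirrors ordinary namespace scoping.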

Welcome to the Skeleton Schema Page