The UCDs (Unified Content Descriptors) were initially developed at CDS to describe the contents of the VizieR tables homogeneously (Ortiz et al., 1999). This first set of terms (hereafter UCD1) seemed to be a very good starting point for a standard description of astronomical quantities, but had some drawbacks (lack of flexibility, missing concepts) that hampered broader use in the emerging VO projects.
Some tools have already been built to demonstrate possible applications of UCDs in the VO context (Derriere et al., 2003). To generalize the use of UCDs across the different VO components, it has been decided to build a new set of rules and standard terms, named UCD2. This new set will be validated by a committee at the IVOA level. The list of UCD2 terms will become the reference for use by any VO application.
Defining a controlled vocabulary and reaching common agreement on it is a nearly impossible task, mainly because different people have different views of the same things (e.g. the FITS keyword NAXIS1 can be described as the "number of pixels along the image axis", or as the "size of the detector").
The main role of UCDs is to describe quantities that are used in practice (and represented as numbers, character strings, ...) at a level between the fuzziness of natural language and the accuracy of data model attributes. The objective is to ensure interoperability between services in the VO through a controlled vocabulary that can be interpreted by machines while remaining readable (and writable) by humans.
We have tried to reconcile a bottom-up approach (describe what is found in existing datasets) with the ontology-related vision (cf. Section 4). The resulting vocabulary is a compromise, intended to be flexible enough yet less ambiguous than natural language. Of course, the resulting list of words, like the outcome of any standardization process, is somewhat subjective (for example, only one word is kept when a concept has several synonyms). However, the words are created to describe the quantities used in practice, taking into account UCD1 (i.e. the contents of astronomical tables), FITS keywords, etc., so they aim to cover the semantic field.
The syntax of UCD1 reflected the underlying hierarchical organization, with different levels separated by underscore characters (e.g. POS_EQ_RA_MAIN).
In UCD2, this would be written
There are three reserved characters in UCD2:
: is used to separate the optional namespace from words;
; is used to separate composed words;
. is used to concatenate atoms to build composed words.
The colon is reserved for possible future usage of an optional namespace.
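The role of the three reserved characters can be illustrated with a minimal sketch in Python. The function name `parse_ucd2` and the example string are assumptions made for illustration only (the word `em.optical` is hypothetical and is not taken from the standard list); the sketch does not check words against any reference vocabulary.

```python
def parse_ucd2(ucd):
    """Split a UCD2 string into (namespace, list of composed words).

    Each composed word is returned as its list of atoms.
    Hypothetical helper: it only applies the three reserved characters
    and performs no validation against a list of standard terms.
    """
    # ':' separates the optional namespace from the words
    namespace, sep, rest = ucd.partition(":")
    if not sep:
        namespace, rest = None, ucd
    # ';' separates composed words, '.' separates atoms inside a word
    words = [word.split(".") for word in rest.split(";")]
    return namespace, words


# Illustrative string only; 'em.optical' is an assumed word
namespace, words = parse_ucd2("ivoa:phot.mag;em.optical")
print(namespace)  # ivoa
print(words)      # [['phot', 'mag'], ['em', 'optical']]
```

The first element of the returned list corresponds to the first composed word of the string; a string written without a namespace yields `None` as namespace.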
The use of a leading namespace should be avoided as far as possible. Standard UCD2 (defined by the IVOA board) can be written with the ivoa: namespace, but it is recommended not to write it (e.g.
A UCD2 is composed of one or more composed words. The first one, called the primary word, is the most important and carries most of the meaning. The primary word describes a property; it is a first-order description of the quantity. The following words, if present, give additional precision, either about the concept to which the property refers or about the context in which it was measured.
The composed words are made of atoms separated by periods. They are arranged, only for convenience, in a hierarchical tree. This structure does not imply the existence of an underlying model.
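As a minimal sketch of this arrangement, composed words can be grouped into a nested tree keyed by their atoms. The function name `build_tree` is an assumption made for illustration; in line with the remark above, the nesting is only a convenient way to organize the list and does not encode a model.

```python
def build_tree(composed_words):
    """Arrange composed words in a nested dict keyed by atoms.

    Hypothetical helper: the tree is a display and lookup convenience,
    not a data model.
    """
    tree = {}
    for word in composed_words:
        node = tree
        # Descend one level per atom, creating branches as needed
        for atom in word.split("."):
            node = node.setdefault(atom, {})
    return tree


tree = build_tree(["phot.mag", "phot.flux", "phot.count"])
print(tree)  # {'phot': {'mag': {}, 'flux': {}, 'count': {}}}
```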
Currently, UCDs are not an ontology. In the future, a project called UCD3 will apply knowledge representation methods developed outside astronomy to the controlled vocabulary.
However, to ease the transition between UCD2 and UCD3, we have tried in the definition of UCD2 to take into account the ideas of concepts, properties, classes and instances.
The rule of thumb is that primary words refer to properties. For example:
The recommendations for the use of UCD2 and the list of standard terms will be made available on the IVOA web site.
The list of proposed roots for the tree of standard composed words currently contains: meta (for metadata-related quantities), instr (instrument), obs (observation), phot (photometry), src (source), stat (statistical), em (electromagnetic)...
The em branch is a special case: it has been created mainly to indicate in which part of the electromagnetic spectrum a measurement is made.
Specific words have been created to identify the frequently used bandpasses and filters. The electromagnetic spectrum has been divided into eight domains (radio, sub-mm, IR, optical, UV, EUV, X-ray, gamma), in agreement with other IVOA representations. Further divisions are made to define the large bands classically used in the different domains.
For photometric quantities, the property that is measured is phot.mag, phot.flux, or phot.count, and possible valid UCD2 could be
A committee will be in charge of studying suggestions for adding new terms to the list of standard UCDs.
A set of tools for UCD2 will progressively be made available (as was done for UCD1):
It is important to note that data providers do not need to change the internal description of their existing databases to use UCDs. Interoperability only requires a translation layer able to associate UCDs with the parameters used internally. This layer converts between UCDs and the internal description in both directions, as shown in Fig. 1 for the case of two services exchanging data.
This translation layer only needs to be built once, and the assignment tool can be used to facilitate this step.
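A minimal sketch of such a translation layer is given below, assuming that each service keeps a simple table from its internal parameter names to UCDs. The service tables, the internal names (`Vmag`, `Fx`, `magnitude`, `flux`) and the function names are all hypothetical; only the words phot.mag and phot.flux come from the text above.

```python
# Hypothetical mapping tables: internal parameter name -> UCD
SERVICE_A = {"Vmag": "phot.mag", "Fx": "phot.flux"}
SERVICE_B = {"magnitude": "phot.mag", "flux": "phot.flux"}


def to_ucds(record, mapping):
    """Re-key a record from internal names to UCDs (unmapped names are dropped)."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def from_ucds(described, mapping):
    """Re-key a UCD-described record to a service's internal names."""
    reverse = {ucd: name for name, ucd in mapping.items()}
    return {reverse[ucd]: value for ucd, value in described.items() if ucd in reverse}


def exchange(record, sender, receiver):
    """Pass a record from one service to another through the UCD layer."""
    return from_ucds(to_ucds(record, sender), receiver)


record_a = {"Vmag": 12.3, "Fx": 4.5e-13, "note": "internal only"}
print(exchange(record_a, SERVICE_A, SERVICE_B))
# {'magnitude': 12.3, 'flux': 4.5e-13}
```

In this sketch neither service changes its internal names; each one only maintains its own table, and parameters without an assigned UCD are simply not exchanged.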
Derriere, S. et al. 2003, in ASP Conf. Ser., Vol. 295, Astronomical Data Analysis Software and Systems XII, ed. H. E. Payne, R. I. Jedrzejewski, & R. N. Hook (San Francisco: ASP), 69
Ortiz, P. et al. 1999, in ASP Conf. Ser., Vol. 172, Astronomical Data Analysis Software and Systems VIII, ed. D. M. Mehringer, R. L. Plante, & D. A. Roberts (San Francisco: ASP), 379