So far this discussion has assumed that the intention is access to a single database. Today, one of the main challenges is to support simultaneous access to many geographically distributed databases. In fact, distributed database systems have existed for many years but, as with 4GLs and other DBMS components, most have relied on vendor-specific components or on special tools designed to link products from different vendors. In all cases, these systems have relied on having a complete knowledge of the schemas, or internal data models, used by each component database. In the web environment, the problem is to be able to search many disparate data sources without the need to understand each individual schema. To put it another way, we need a way of searching dynamic web resources comparable with the indexes of static web pages provided by conventional search engines.
The core problem is one of semantic interoperability. Even if the names of tables and columns in a database are known, how can we discover whether they are comparable with those used in another? In one database there might be an entity called Finds with dimension attributes length, width and height whilst another might have Artefacts with size attributes length, width and depth. A human user could guess that these were directly equivalent, although they would need extra information about the dimensional units employed before attempting any direct comparisons. A machine, however, is not equipped to make such judgements so there must be some method of resolving semantic problems.
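The Finds and Artefacts example can be made concrete with a minimal sketch. It assumes that someone has already decided the two sets of attributes are equivalent and has recorded the units used. The mapping table, the common field names and the choice of millimetres and centimetres are all hypothetical and are not taken from any real database.

```python
# Hypothetical mappings: local field name -> (common field name, factor to millimetres).
# Assumption: Finds records are in millimetres, Artefacts records in centimetres,
# and "depth" in Artefacts means the same thing as "height" in Finds.
FIELD_MAPS = {
    "Finds": {
        "length": ("length_mm", 1.0),
        "width": ("width_mm", 1.0),
        "height": ("height_mm", 1.0),
    },
    "Artefacts": {
        "length": ("length_mm", 10.0),
        "width": ("width_mm", 10.0),
        "depth": ("height_mm", 10.0),
    },
}


def to_common(entity, record):
    """Rename fields and convert units so records from either entity are comparable."""
    mapping = FIELD_MAPS[entity]
    common = {}
    for field, value in record.items():
        if field in mapping:
            name, factor = mapping[field]
            common[name] = value * factor
    return common


find = to_common("Finds", {"length": 120, "width": 45, "height": 30})
artefact = to_common("Artefacts", {"length": 12, "width": 4.5, "depth": 3})
same_size = find == artefact  # only comparable after renaming and unit conversion
```

The judgement that depth and height are equivalent still has to be made by a person; the code only records that decision in a form a machine can apply.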
One approach might be to insist that all databases use standard field names and comply with standard thesauri of terms. While such standards exist for some classes of information (e.g. the EH monument type names), they would need to be agreed for all possible fields that might be used in searching the databases. Even if this could be done, there would still be a large number of 'legacy systems' that had been developed before the standards were agreed and so could not be included in a distributed resource.
A more realistic approach is to employ metadata standards to provide a common language within which existing resources can be described. Metadata (data about data) provides a common descriptive format for diverse resources. The Dublin Core provides a simple metadata framework for the outline description of a wide range of resource types in terms of fifteen basic elements such as title, subject, and spatial and temporal coverage. At the other end of the complexity spectrum, the FGDC [8] spatial metadata standard and the more recent OpenGIS specifications have sufficient scope to cover the smallest details of almost any variety of spatial data.
Interoperability between different databases can be achieved by allowing the user to pose queries in terms of these metadata descriptions. The system then uses a mapping between these and the actual stored data to translate the query to suit each distinct resource. The Z39.50 protocol [9], originally developed for interoperability between library catalogue systems, provides detailed support for such mapping.
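The translation step can be illustrated with a short sketch. It is not an implementation of Z39.50; it only shows the general idea of rewriting a query posed in metadata terms into the field names of each resource. The resource labels, the local field names and the element names used as keys are hypothetical.

```python
# Hypothetical mappings from common metadata elements to each resource's own field names.
MAPPINGS = {
    "resource_a": {
        "subject": "monument_type",
        "coverage_temporal": "period",
        "creator": "excavator",
    },
    "resource_b": {
        "subject": "site_class",
        "coverage_temporal": "era",
    },
}


def translate(query, resource):
    """Rewrite a metadata-level query into one resource's local field names.

    Returns the translated query and a list of elements the resource cannot answer.
    """
    mapping = MAPPINGS[resource]
    translated = {}
    unsupported = []
    for element, value in query.items():
        if element in mapping:
            translated[mapping[element]] = value
        else:
            unsupported.append(element)
    return translated, unsupported


query = {"subject": "fort", "coverage_temporal": "Roman", "creator": "Smith"}
local_a, missing_a = translate(query, "resource_a")
local_b, missing_b = translate(query, "resource_b")
```

Reporting the unsupported elements, rather than silently dropping them, lets the caller decide whether a partial match from that resource is still useful.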
A recent example of this approach is a joint project between the Computing Laboratory at the University of Kent at Canterbury and the Archaeology Data Service (ADS) at the University of York, which enables a number of geographically remote datasets to be searched (Austin et al. 2002). Other partners in this project include the Portable Antiquities Scheme (PAS), the Royal Commission on the Ancient and Historic Monuments of Scotland (RCAHMS) and the Scottish Cultural Resource Access Network (SCRAN) who, along with the ADS, will act as targets for a Historic Environment portal. It is expected eventually to embrace a wider community, and expressions of interest have come from Europe and the USA. The outline architecture of this 4-tier system, which employs many of the technologies discussed above, is shown in Figure 7.
Figure 7: Outline architecture of the ADS Historic Environment Portal
The project allows the holdings of the partner organisations to be searched as if they were a single resource. Searches can combine, in any way, Who (creator), What (subject), When (coverage), Where (coverage) and co-ordinate-defined geographic areas. Thus a user might cross-search the ADS and RCAHMS (CANMORE) databases for references to Roman (when) forts (what) in the border area between England and Scotland (user-defined co-ordinates).
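A cross-search of this kind might be sketched as follows. This is an illustration only and not the portal's actual code: the record structure, the sample records and the co-ordinates are invented, the targets are plain in-memory lists rather than remote databases, and "where" is reduced to a bounding box.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str   # which target the record came from
    creator: str  # Who
    subject: str  # What
    period: str   # When
    x: float      # easting (arbitrary units in this sketch)
    y: float      # northing


def matches(rec, who=None, what=None, when=None, where=None):
    """Apply only the criteria the user supplied; omitted criteria match everything."""
    if who is not None and who.lower() not in rec.creator.lower():
        return False
    if what is not None and what.lower() != rec.subject.lower():
        return False
    if when is not None and when.lower() != rec.period.lower():
        return False
    if where is not None:
        min_x, min_y, max_x, max_y = where
        if not (min_x <= rec.x <= max_x and min_y <= rec.y <= max_y):
            return False
    return True


def cross_search(targets, **criteria):
    """Run the same query against every target and merge the hits into one list."""
    hits = []
    for records in targets.values():
        hits.extend(r for r in records if matches(r, **criteria))
    return hits


# Invented sample data standing in for two partner datasets.
targets = {
    "ADS": [
        Record("ADS", "A. Example", "fort", "Roman", 350.0, 570.0),
        Record("ADS", "B. Example", "villa", "Roman", 400.0, 200.0),
    ],
    "CANMORE": [
        Record("CANMORE", "C. Example", "fort", "Roman", 330.0, 590.0),
        Record("CANMORE", "D. Example", "fort", "Medieval", 340.0, 600.0),
    ],
}

hits = cross_search(targets, what="fort", when="Roman",
                    where=(300.0, 550.0, 400.0, 650.0))
```

Because every criterion is optional, the same function covers any combination of Who, What, When and Where without a separate code path for each.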
© Internet Archaeology
URL: http://intarch.ac.uk/journal/issue15/8/nr8.html
Last updated: Wed 28 Jan 2004