Internet Archaeol 1. Beardah and Baxter. 5 Examples of the use of bivariate KDEs

5 Examples of the use of bivariate KDEs

5.1 Introduction to examples of bivariate KDEs

In this section we will illustrate some possible applications of bivariate KDEs to archaeological data. Baxter et al. (1996) also deal with this topic. One of their examples illustrated the use of KDEs with spatial data, and the following examples develop this in much greater detail than was possible there, partly to illustrate some presentational options. The background to this and other examples is now described.

5.2 The Mask Site data

The data used here were originally discussed by Binford (1978) and were taken from Blankholm (1991), who used them to test a variety of approaches to intrasite spatial analysis. The data we use consist of the coordinates of the locations of 276 bone splinters. They come from an ethnoarchaeological study, and the locations of other features of interest such as hearths, activity areas and rocks are known. In Baxter et al. (1996) we argue that the results obtained compare favourably with the different analyses of Blankholm (1991). Here the intention is to use the data to investigate a range of presentational options and some issues they raise.

5.3 Initial analysis of the Mask Site data

Figure 11: Scatter plot of the Mask Site data

Figure 11 shows the distribution of the 276 bone splinters, and it is evident that there are a number of spatial concentrations. "Objective" approaches to determining the number of concentrations (Blankholm, 1991; Baxter et al., 1996) have suggested between 3 and 5 main concentrations. Two views of a KDE of these data are shown in Figure 12 and Figure 13.

Figure 12: Two dimensional KDE - Mask Site data. This view has been chosen to hide one of the main peaks.

Figure 13: Two dimensional KDE - Mask Site data. This is the same KDE as Figure 12, but a more useful viewpoint has been selected.

Figure 14: Animation (Netscape 2 or similar needed)

Figure 12 is deliberately chosen to be unhelpful in the sense that one major concentration is hidden. Figure 13 highlights four main concentrations. The animation (for which Netscape Navigator 2 or a similar browser is necessary) provides a way of viewing the KDE without commitment to a particular view. Each concentration appears to be associated with a hearth. Two of these hearths, associated with the two smaller peaks to the right of Figure 13, are adjacent. It is worth noting that other methodologies have not suggested the two separate concentrations as the KDE does.

5.4 Contouring using the Mask Site data

After determining a KDE, each point in the plane is associated with an estimated 'height' that can be used for contouring in various ways. In conventional contouring the contours are drawn at equal height intervals. For the Mask Site data and the STE estimate of Figure 12 and Figure 13, Figure 15 shows the data contoured in this way. Three main concentrations are suggested, with that at the bottom right possibly sub-divided.
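Conventional equal-interval contouring can be sketched as follows. This is a minimal illustration rather than the procedure used to produce the figures: it assumes a product normal kernel, and the function names, the grid size and the example coordinates are all hypothetical. The levels are placed at equal steps between zero and the highest grid value, excluding both ends.

```python
import math

def kde2d(x, y, data, h1, h2):
    """Product normal-kernel density estimate at the point (x, y)."""
    total = 0.0
    for xi, yi in data:
        u = (x - xi) / h1
        v = (y - yi) / h2
        total += math.exp(-0.5 * (u * u + v * v))
    return total / (len(data) * 2.0 * math.pi * h1 * h2)

def equal_interval_levels(data, h1, h2, n_levels=5, grid=40):
    """Evaluate the KDE on a regular grid; return equally spaced contour heights."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    x0, x1 = min(xs) - 3 * h1, max(xs) + 3 * h1
    y0, y1 = min(ys) - 3 * h2, max(ys) + 3 * h2
    peak = 0.0
    for i in range(grid):
        for j in range(grid):
            gx = x0 + (x1 - x0) * i / (grid - 1)
            gy = y0 + (y1 - y0) * j / (grid - 1)
            peak = max(peak, kde2d(gx, gy, data, h1, h2))
    step = peak / (n_levels + 1)
    return [step * k for k in range(1, n_levels + 1)]

# Hypothetical coordinates, for illustration only (not the Mask Site data).
example = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
levels = equal_interval_levels(example, 1.0, 1.0, n_levels=4, grid=20)
```

The returned heights can be passed to any contouring routine that accepts explicit levels.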

Figure 15: Contouring - Mask Site data

An alternative and possibly more useful approach exists that is particularly well suited to identifying modes in KDEs (Bowman and Foster, 1993). Each data point is ranked according to the height of its density estimate, and from this it is easy to determine which are the x% most dense points for any value of x. Percentage inclusion contours may then be drawn to identify the densest concentrations of data at any chosen level. This is illustrated for the Mask Site data in Figure 16 (a variant of a figure in Baxter et al., 1996), where 25, 50, 75 and 100% inclusion contours are used.
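The ranking step can be sketched in a few lines. This is an illustrative reading of the description above, not the authors' code: the product normal kernel, the function names and the example coordinates are assumptions. The function returns the density height at which a contour would enclose the x% densest data points.

```python
import math

def kde_heights(data, h1, h2):
    """Product normal-kernel density estimate evaluated at each data point."""
    norm = len(data) * 2.0 * math.pi * h1 * h2
    heights = []
    for x, y in data:
        s = sum(math.exp(-0.5 * (((x - xi) / h1) ** 2 + ((y - yi) / h2) ** 2))
                for xi, yi in data)
        heights.append(s / norm)
    return heights

def inclusion_level(heights, pct):
    """Height of the contour enclosing the pct% densest data points."""
    ranked = sorted(heights, reverse=True)          # densest first
    k = max(1, math.ceil(len(ranked) * pct / 100.0))
    return ranked[k - 1]

# Hypothetical coordinates, for illustration only (not the Mask Site data).
example = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.2, 5.0), (9.0, 0.0), (0.0, 9.0)]
heights = kde_heights(example, 0.5, 0.5)
levels = [inclusion_level(heights, p) for p in (25, 50, 75, 100)]
```

Contouring the KDE surface at these heights gives the 25, 50, 75 and 100% inclusion contours. Calling `inclusion_level` for 10, 20, 30% and so on gives a nested sequence of contours.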

Figure 16: Percentage contouring - Mask Site data

Figure 17: Animation (Netscape 2 or similar needed)

Three main concentrations are evident, with that at the bottom right sub-divided into two parts, as discussed above. The isolated contour in the upper right is, possibly fortuitously, associated with the fifth hearth.

The choice of both the level of inclusion to use, and number of levels, is subjective. One approach is to contour at, say, the 10% level and successively add to this 20%, 30% contours etc. This is informative in itself, and may assist in the selection of a single figure for publication purposes. The process is illustrated in the animation (for which Netscape Navigator 2 or a similar browser is needed).

This kind of contouring may be extended to take account of known groups in the data. This is illustrated in Figure 20 in 5.6.

5.5 The Southampton Glass data

Background to the data used here was given in 3.5, where they were used to illustrate aspects of the use of univariate KDEs. For 271 specimens of Early Medieval glass from excavations at Southampton, the chemical composition was determined with respect to 11 oxides (Heyworth, 1991). A principal component analysis was undertaken using standardised values of these oxides, after removing seven multivariate outliers, and a bivariate plot of scores on the first two components was used to investigate structure in the data.
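For readers unfamiliar with the procedure, a principal component analysis of standardised values can be sketched as below. This is a generic illustration and not the software used for the analysis reported here: it finds the first two components of the correlation matrix by power iteration with deflation, omits the outlier-removal step, and uses hypothetical function names and made-up numbers.

```python
import math

def standardise(rows):
    """Centre each column and scale it to unit standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def leading_eigvec(mat, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    p = len(mat)
    v = [1.0 / (j + 1) for j in range(p)]   # asymmetric starting vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def pca_scores(rows, n_comp=2):
    """Scores on the first n_comp principal components of standardised data."""
    z = standardise(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]
    vecs = []
    for _ in range(n_comp):
        v = leading_eigvec(corr)
        lam = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
        vecs.append(v)
        # Deflate: remove the component just found before finding the next.
        corr = [[corr[i][j] - lam * v[i] * v[j] for j in range(p)]
                for i in range(p)]
    return [[sum(r[j] * v[j] for j in range(p)) for v in vecs] for r in z]

# Made-up measurements on three variables, for illustration only.
example = [[1, 2, 1], [2, 4, 0], [3, 6, 2], [4, 8, 1], [5, 10, 3], [6, 12, 0]]
scores = pca_scores(example)
```

Each row of `scores` holds one specimen's coordinates on the first two components, which is the input needed for a bivariate KDE.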

The analysis in 5.6 follows that of 5.3 and 5.4, and readers may find it helpful to review the last of these if they have not looked at it. The new feature of 5.6 is that there are known colour groups in the data, and contouring these separately provides a useful aid to interpreting the principal component analysis.

5.6 Bivariate KDEs of glass compositions

Background to the analyses reported here is given in the introduction to bivariate KDEs, and 3.5.

Figure 18: Bivariate KDE of PCA scores - Southampton glass data

Figure 18 shows the bivariate KDE for the first two component scores presented in a similar manner to Figure 15. The window-widths h1 and h2 (see introduction to bivariate KDEs) were chosen separately using the univariate normal scale rule (see choice of window-width). This possibly over-smooths the data, but there are two clear modes evident, one somewhat smaller than the other.
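As a reminder of what choosing the window-widths separately involves, the sketch below applies a univariate normal scale rule to each coordinate in turn. It assumes the common form for a normal kernel, h = (4/3)^(1/5) s n^(-1/5), or roughly 1.06 s n^(-1/5), where s is the sample standard deviation. The exact variant used for Figure 18 is the one given in the section on choice of window-width and may differ. The function names and example points are hypothetical.

```python
import math

def normal_scale_width(values):
    """Univariate normal scale rule for a normal kernel: about 1.06 * s * n**(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (4.0 / 3.0) ** 0.2 * sd * n ** (-0.2)

def window_widths(points):
    """Apply the rule separately to each coordinate, giving (h1, h2)."""
    h1 = normal_scale_width([p[0] for p in points])
    h2 = normal_scale_width([p[1] for p in points])
    return h1, h2

# Hypothetical component scores, for illustration only.
example = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
h1, h2 = window_widths(example)
```

Because each width depends only on the spread of its own coordinate, a component with larger variance receives a proportionally larger window-width.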

Figure 19: Contouring - Southampton glass data

Figure 19 shows an alternative view of the data, using 25, 50 and 75% inclusion contours (see 5.4). Again, the bimodality is evident. For these data the glass colour was recorded and a scatter-plot of the first two components labelled by colour (not shown here) suggested that the two modes were associated with light-green (small mode) and light-blue (large mode). These are the dominant colours in the assemblage. To check and present this it is possible to contour each group separately. This is an idea not used in the other bivariate examples (though 3.5 and Figure 7 show the univariate equivalent) and is based on material in Bowman and Foster (1993).

Figure 20: Contouring by glass colour - Southampton glass data

Figure 20 shows the outcome. To obtain this, the density heights are extracted for the light-green glass only. These are ranked and used as the basis for defining inclusion contours, as described in 5.4. The figure shows 25, 50 and 75% inclusion contours. The same exercise is repeated for the light-blue glass. Although there is some overlap at the 75% level of inclusion, it is also clear that the two colours tend to be separate on the plot. This makes sense and can be readily explained in terms of glass chemistry.
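One possible reading of this group-wise procedure is sketched below. It assumes that density heights have already been computed at every data point, and it ranks them within each colour group to obtain that group's inclusion level. Whether the heights should come from a KDE of all the data or of each group alone is not settled here. The function name, the heights and the labels are hypothetical.

```python
import math

def group_inclusion_levels(heights, labels, pct):
    """For each label, the height enclosing the pct% densest points of that group."""
    by_group = {}
    for h, lab in zip(heights, labels):
        by_group.setdefault(lab, []).append(h)
    levels = {}
    for lab, hs in by_group.items():
        ranked = sorted(hs, reverse=True)           # densest first
        k = max(1, math.ceil(len(ranked) * pct / 100.0))
        levels[lab] = ranked[k - 1]
    return levels

# Made-up density heights and colour labels, for illustration only.
heights = [0.9, 0.7, 0.4, 0.2, 0.8, 0.5, 0.3, 0.1]
labels = ["light-green"] * 4 + ["light-blue"] * 4
levels_50 = group_inclusion_levels(heights, labels, 50)
```

Repeating the call for 25, 50 and 75% gives one set of contour heights per colour, which can then be drawn on the same plot.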

Two more general points are worth making here. The first is that a reasonably large sample size is needed to detect the structure present here, which may be characterised as overlapping clusters. Some experimental work, still in progress, suggests that with this kind of structure the bimodality may be missed with sample sizes of about 50 or less. Many compositional studies that use multivariate methods operate with smaller sample sizes than this, and do not necessarily record colour or related information. It is likely that underlying structure would be missed in these cases. The second point is that with large data sets it can be difficult to see the detail when labelling points, because of their density, and contouring as in Figures 19 or 20 forms a valuable alternative method of display.
