Saturday, November 22, 2008

Spatial Statistics. Part I.

Spatial Statistics. Part I.

Ripley's K(r) function

Spatial statistics are becoming an important tool for population ecologists. These techniques allow researchers to answer questions regarding the spatial arrangements of individuals within a population and the causal factors that may be influencing spatial distributions. I will detail some basic spatial statistics using the spatstat library.
library(spatstat)
The first approach to a planar point pattern (ppp) is to describe the size of the plot, the number of individuals (i.e. points) and the density of such occurrences or instances. In spatial statistics density is usually referred to as intensity. We will be using the bei data-set included in the spatstat library. From the help page: «The dataset bei gives the positions of 3605 trees of the species Beilschmiedia pendula (Lauraceae) in a 1000 by 500 metre rectangular sampling region in the tropical rainforest of Barro Colorado Island.» The following code invokes the bei data-set and requests some summary statistics:
> data(bei) > summary(bei)
Planar point pattern: 3604 points Average intensity 0.00721 points per square metre Window: rectangle = [0, 1000] x [0, 500] metres Window area = 5e+05 square metres Unit of length: 1 metre
> plot(bei)
The previous results show a total of 3604 points in a 50 ha plot. The average intensity (i.e. density) is 0.00721 points per square meter or 72.1 trees per hectare. The position of the trees is shown in the following graph produced by the plot(bei) command:


The plot evidently shows that this species --B. pendula -- is located more commonly in certain areas of the plot, while absent in others. This suggests an aggregated spatial distribution, which is commonplace for tropical trees. To assess the spatial distribution of these points, we will use Ripley's K(r) function. This method, also known as Ripley's second moment reduced function, estimates the expected number of random points within a distance r of a randomly chosen point within a plot (Ripley 1976). Ripley's K(r) function is generally transformed as follows:
The L(r) function transforms the theoretically expected value for a random distribution into a horizontal line intersecting the P(0,0) point, thus it is more easily interpreted than the exponential K(r) function. Ripley's K(r) function is produced by the Kest() command in R. Given the large number of points in the bei data-set, calculations may take a while in computers with slow processors. The L(r) transformation is performed on-the-fly by the plot() command.
> a1 <- Kest(bei, correction="isotropic", nlarge=Inf) > plot(a1, sqrt(./pi)-r~r, ylab="L(r)")
The previous code request Ripley's K(r) function using the bie data-set. We specifically request the isotropic correction for edge effects. Spatstat's Kest() function has a restriction for data-sets larger than 3000 points. In order to circumvent this restriction we must include the nlarge=Inf option in the command. Results are stored in the a1 object.

Ripley's L(r) function is shown in the previous graph. The x-axis show the automatically selected radii (r) for which abundances are calculated, while the y-axis shows the L(r) function. The dotted red line is the expected value for a random distribution and the black solid line is the observed count. If observed values lie above the zero line (i.e. random expectation) one should suspect an aggregated distibution. Nevertheless, we need to assess if this deviation is large enough to reject the null hipothesis of a random spatial distribution or CSR (complete spatial randomness). To answer this question we need to create confidence intervals for the null hypothesis using the envelope() command. These calculations require a lot of computer resources given the large number of points in the bei data-set, therefore I reduced the number of simulations to 50 from the default nsim=99:
> sobre <- envelope(bei, Kest, nlarge=Inf, nsim=50)
Generating 50 simulations of CSR ... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50.
Done. > sobre
Pointwise critical envelopes for K(r) Obtained from 50 simulations of simulations of CSR Significance level of pointwise Monte Carlo test: 2/51 = 0.0392156862745098 Data: bei Function value object (class 'fv') for the function r -> K(r) Entries: id     label     description --      -----     ----------- r      r      distance argument r obs      obs(r)      function value for data pattern theo      theo(r)      theoretical value for CSR lo      lo(r)      lower pointwise envelope of simulations hi      hi(r)      upper pointwise envelope of simulations -------------------------------------- Default plot formula: . ~ r Recommended range of argument r: [0, 125] Unit of length: 1 metre
> plot(sobre, sqrt(./pi)-r~r, lty=c(1,2,3,3), col=c(1,2,3,3), ylab="L(t)")
The resulting graph is as follows:

Given that the observed count (black solid line) is above the confidence interval (green doted lines) we can conclude that the spatial distribution of B. pendula trees significantly deviates from random expectations.

Conclusion

As expected for a tropical tree (Condit, 2000), Ripley's K(t) shows that B. pendula is spatially aggregated. References:
Condit R, Ashton PS, Baker P, Bunyavejchewin S, Gunatilleke S, Gunatilleke N, Hubbell SP, Foster RB, Itoh A, LaFrankie JV, Lee HS, Losos E, Manokaran N, Sukumar R, Yamakura T (2000) Spatial patterns in the distribution of tropical tree species. Science 288:1414―1418.
Ripley BD (1976) The second-order analysis of stationary point processes. Journal of Applied Probability 13:255-256.
Ripley BD (1977) Modeling spatial patterns. Journal of the Royal Statistical Society, B. 39:172-212.

No comments: