Environ Monit Assess (2007) 132:1–13

DOI 10.1007/s10661-006-9497-x

Application of Multivariate Statistical Methods to Water Quality Assessment of the Watercourses in Northwestern New Territories, Hong Kong
Feng Zhou & Yong Liu & Huaicheng Guo

Received: 23 May 2006 / Accepted: 15 September 2006 / Published online: 14 December 2006
© Springer Science + Business Media B.V. 2006

Abstract Multivariate statistical methods, i.e., cluster analysis (CA) and discriminant analysis (DA), were used to assess temporal and spatial variations in the water quality of the watercourses in the Northwestern New Territories, Hong Kong, over a period of five years (2000–2004) using 23 parameters at 23 different sites (31,740 observations). Hierarchical CA grouped the 12 months into two periods (the first and second periods) and classified the 23 monitoring sites into three groups (group A, group B, and group C) based on similarities of water quality characteristics. DA provided better results with great discriminatory ability for both temporal and spatial analysis. DA also provided an important data reduction because it used only six parameters (pH, temperature, five-day biochemical oxygen demand, fecal coliforms, Fe, and Ni) for temporal analysis, affording about 84% correct assignations, and seven parameters (pH, ammonia–nitrogen, nitrate nitrogen, fecal coliforms, Fe, Ni, and Zn) for spatial analysis, affording more than 90% correct assignations. Therefore, DA allowed a reduction in the dimensionality of the large data set and indicated a few significant parameters that were responsible for most of the variations in water quality. Thus, this study demonstrated that multivariate statistical methods are useful for interpreting complex data sets in the analysis of temporal and spatial variations in water quality and in the optimization of regional water quality monitoring networks.

Keywords Cluster analysis · Discriminant analysis · Hong Kong · Northwestern New Territories · Temporal and spatial variations · Water quality

F. Zhou · Y. Liu · H. Guo (*)
College of Environmental Sciences, Peking University, Beijing 100871, People’s Republic of China
e-mail:

F. Zhou
e-mail:

1 Introduction

The ecosystem services of watercourses such as rivers and lakes directly or indirectly contribute to both human welfare and aquatic ecosystems (Costanza et al., 1997). Rivers also play an important role in the assimilation and transport of domestic and industrial wastewater, which represent constant pollution sources, and agricultural runoff, which is temporal and commonly affected by climate (Singh, Malik, Mohan, & Sinha, 2004; Vega, Pardo, Barrado, & Deban, 1998). Rivers are highly vulnerable to pollution; therefore, it is important to control water pollution, monitor water quality in river basins (Simeonov et al., 2003), and interpret the temporal and spatial variations in water quality (Dixon & Chiswell, 1996; Singh et al., 2004). Various multivariate statistical methods, such as cluster analysis (CA) and discriminant analysis (DA), help in the interpretation of complex data sets, such as those created by long-term water quality monitoring programs, allowing a better
understanding of the temporal and spatial variations in water quality, and in the identification of discriminant parameters that are of use in optimizing monitoring networks (Shrestha & Kazama, 2007; Simeonov et al., 2003). In the last decade, multivariate statistical methods have been applied to characterize and evaluate freshwater (Astel, Biziuk, Przyjazny, & Namiesnik, 2006; Kowalkowski, Zbytniewski, Szpejna, & Buszewski, 2006; Papatheodorou, Demopoulou, & Lambrakis, 2006; Shin & Fong, 1999; Shrestha & Kazama, 2007; Simeonov et al., 2003; Simeonov, Einax, Stanimirova, & Kraft, 2002; Simeonova, Simeonov, & Andreev, 2003; Singh et al., 2004; Vega et al., 1998; Wunderlin et al., 2001), groundwater (Adams, Titus, Pietesen, Tredoux, & Harris, 2001; Helena et al., 2000; Lambrakis, Antonakos, & Panagopoulos, 2004; K. P. Singh, Malik, V. K. Singh, Mohan, & Sinha, 2005; Suk & Lee, 1999) and seawater (Reghunath, Murthy, & Raghavan, 2002; Yeung, 1999; Yung, Wong, Yau, & Qian, 2001). Previous research has shown multivariate statistical methods to be useful tools for extracting meaningful information from data sets. For example, Simeonov et al. (2003) and Astel et al. (2006) applied CA to delineate monitoring sites, and Singh et al. (2005) and Shrestha and Kazama (2007) used CA and DA to identify the significant parameters and optimize the monitoring network. However, CA and DA have not been comprehensively applied in the analysis of surface water quality in Hong Kong. In the present study, large data sets, obtained during a five-year (2000–2004) monitoring program, were subjected to CA and DA to extract latent information about the similarities or dissimilarities among the monitoring periods or sites, to identify water quality variables responsible for temporal and spatial variations in water quality, and to test the validity of the results in temporal and spatial DA.

2 Monitoring Area and Methods

2.1 Monitoring area and sampling

The Northwestern New Territories (Figure 1) is one of the most polluted regions in Hong Kong. It covers the entire Deep Bay Water Control Zone (WCZ) and includes the North and Yuen Long Districts. Except for the town centers of Sheung Shui, Fan Ling, major watercourses drain most of the untreated sewage from the rural areas and receive domestic wastewater and agricultural runoff (Hong Kong Environmental Protection Department [HKEPD], 2004); it is therefore essential to assess the temporal and spatial variations of water quality in this region.

Figure 1 Study area and its water quality monitoring

The Northwestern New Territories includes 13 inland watercourses (rivers, creeks, nullahs, and streams):
three major rivers in the North District, namely the Indus, Beas, and Ganges Rivers, join the Shenzhen River; four other major watercourses in the Yuen Long Basin, including the Shenzhen River, flow into the inner Deep Bay. The Yuen Long Creek, the Kam Tin River, the Tin Shui Wai Nullah, the Fairview Park Nullah, and six minor streams (the Ngau Hom Sha, Ha Pak Nai, Tai Shui Hang, Pak Nai, Sheung Pak Nai, and Tsang Kok streams) near Lau Fau Shan drain into the outer Deep Bay.

The Indus River is one of the largest rivers in the area, with a total length of about 49 km, covering an area of 43 km². The Beas River, a major branch of the Indus, flows from Lam Tsuen Country Park and covers an area of 20 km². The Ganges River originates in Wo Ken Shan and has a smaller catchment of 10 km². Yuen Long Creek is around 60 km long and covers an area of 27 km². With an area of 44.3 km², the 50-km-long Kam Tin River passes through the urban areas of Kam Tin and Yuen Long. All four watercourses in the Yuen Long Basin flow into the inner Deep Bay via concrete channels. The catchments of the six streams in Lau Fau Shan are very small and vary between 1.5 and 6 km² (HKEPD, 2004).

HKEPD has collected water quality data from 24 monitoring sites, 23 of which were selected for the present study’s water quality monitoring network, covering a wide range of the 13 inland watercourses. The Tsang Kok Stream data were excluded because of missing data.

2.2 Monitored parameters and analytical methods

The data for 23 water quality monitoring sites, consisting of 48 water quality parameters monitored monthly over five years (2000–2004), were obtained from HKEPD (2001, 2002, 2003, 2004, 2005). Only 25 of the 48 parameters, selected based on their sampling continuity at all the selected monitoring sites, were used in the present analysis. The selected parameters included electrical conductivity (EC), pH, dissolved oxygen (DO), temperature (TEMP), chemical oxygen demand (COD), five-day biochemical oxygen demand (BOD₅), ammonia–nitrogen (NH₄⁺–N), total Kjeldahl nitrogen (TKN), nitrate nitrogen (NO₃⁻–N), total phosphorus (TP), Escherichia coli (E. coli), fecal coliforms (F. coli), total solids (TS), total suspended solids (TSS), sulphide (SO₄²⁻), fluoride (F), arsenic (As), aluminum (Al), iron (Fe), copper (Cu), chromium (Cr), manganese (Mn), lead (Pb), nickel (Ni), and zinc (Zn). All the water quality parameters are expressed in milligram/liter, except pH, EC (μS·cm⁻¹), TEMP (°C), E. coli (cfu/100 ml), and F. coli (cfu/100 ml). The sampling, preservation, transportation, and analysis of the water samples were performed according to standard methods (APHA, 1998; ASTM, 2001). The basic statistics of the five-year data set (31,740 observations) on river water quality are summarized in Table I.

2.3 Data treatment

Most multivariate statistical methods require variables to conform to the normal distribution. The normality of the distribution of each variable was therefore checked with kurtosis and skewness statistical tests before multivariate statistical analysis (Johnson & Wichern, 1992; Lattin, Carroll, & Green, 2003; Papatheodorou et al., 2006). The original data showed kurtosis values ranging from −0.648 to 599 and skewness values ranging from −0.295 to 27.311, indicating that the distributions were far from normal with 95% confidence. Since most of the values of kurtosis or skewness were greater than zero, the original data were transformed in the form x′ = log₁₀(x) (Kowalkowski et al., 2006; Papatheodorou et al., 2006). After log-transformation, the kurtosis and skewness values ranged from −0.838 to 1.919 and from −1.32 to 1.376, respectively. The distributions of the log-transformed SO₄²⁻ and Cr remained non-normal, so these two parameters were excluded from the subsequent analysis. In the case of CA, all log-transformed variables were also z-scale standardized (the mean and variance were set to zero and one, respectively) to minimize the effects of different units and variance of variables and to render the data dimensionless (Liu, Lin, & Kuo, 2003; Singh et al., 2004).

In this study, temporal variations in water quality parameters were primarily evaluated using Spearman’s R coefficient, a non-parametric measure often used to evaluate the correlation structure between water quality parameters with non-normal distributions (Singh et al., 2004; Shrestha & Kazama, 2007; Wunderlin et al., 2001). The water quality parameters were grouped into different periods based on temporal CA, and each period was assigned a numerical value.
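The screening and transformation steps described for the data treatment can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes moment-based skewness and excess kurtosis (the text does not say which estimators were used), and the function names and the example concentrations are hypothetical.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def log_transform(values):
    """x' = log10(x); only defined for strictly positive measurements."""
    return [math.log10(v) for v in values]


def z_scale(values):
    """Standardize to zero mean and unit sample variance."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical right-skewed concentrations (not taken from the data set).
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = log_transform(raw)        # symmetric after the transform
standardized = z_scale(logged)     # dimensionless input for CA
```

Comparing `skewness(raw)` with `skewness(logged)` shows why a log transform is a natural choice for strongly right-skewed concentration data.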
Table I Statistical descriptives of water quality

Parameters  Mean        SD          SE        Minimum   Maximum
EC          607.640     1,524.98    41.07     18        16,316
pH          7.353       0.53        0.01      6         10
DO          6.637       2.51        0.07      0.7       16.1
TEMP        25.460      4.57        0.12      11.9      37
COD         34.832      51.48       1.39      2         700
BOD₅        19.681      36.52       0.98      1         500
NH₄⁺–N      7.124       12.56       0.34      0.005     120
TKN         9.190       14.99       0.40      0.05      130
NO₃⁻–N      0.720       0.83        0.02      0.002     12
TP          1.918       3.01        0.08      0.02      27
E. coli     485,356     3,308,716   89,099    1         110,000,000
F. coli     917,923     4,309,442   116,048   2         130,000,000
TS          465.381     1,151.73    31.01     37        20,000
TSS         57.608      171.58      4.62      0.5       5,100
SO₄²⁻       0.054       0.182       0.005     0.02      4.5
F           0.322       0.14        0.00      0.2       1.4
As          5.357       7.26        0.20      1         58
Al          247.230     352.00      9.48      50        6,300
Fe          1.156       1.294       0.0348    0.050     21
Cu          0.011       0.016       0.000     0.001     0.150
Cr          0.002       0.003       0.000     0.001     0.059
Mn          0.268       0.379       0.010     0.010     3.300
Pb          0.006       0.013       0.000     0.001     0.280
Ni          0.004       0.005       0.000     0.001     0.067
Zn          0.1609      0.9255      0.0249    0.01      24

2.4 Cluster analysis

CA is an unsupervised pattern recognition method that divides a large group of cases into smaller groups or clusters of relatively similar cases that are dissimilar to other groups. Hierarchical CA, the most common approach, starts with each case in a separate cluster and joins the clusters together step by step until only one cluster remains (Lattin et al., 2003; McKenna, 2003). The Euclidean distance usually gives the similarity between two samples, and a distance can be represented by the difference between transformed values of the samples (Otto, 1998). In this study, hierarchical CA was performed on the standardized data using Ward’s method with squared Euclidean distances as a measure of similarity. Ward’s method uses an analysis of variance (ANOVA) approach to evaluate the distances between clusters, attempting to minimize the sum of squares of any two clusters that could be formed at each step. Both temporal and spatial variations in water quality were determined from hierarchical CA using the linkage distance. A similar approach has been successfully applied to the assessment of water quality in Wunderlin et al. (2001), Simeonov et al. (2003), Singh et al. (2004, 2005), Shrestha and Kazama (2007), Kowalkowski et al. (2006), and Astel et al. (2006).

2.5 Discriminant analysis

DA is a method of analyzing dependence that is a special case of canonical correlation, and one of its objectives is to determine the significance of different variables, which can allow the separation of two or more naturally occurring groups. DA operates on original data, and the method constructs a discriminant function for each group (Johnson & Wichern, 1992; Lattin et al., 2003; Wunderlin et al., 2001) as follows:

$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\, p_{ij} \qquad (1)$$

where i denotes the group (G), k_i is the constant inherent to each group, n is the number of parameters
used to classify a set of data into a given group, and w_ij is the weight coefficient assigned by DA to a given parameter (p_ij).

In this study, DA was performed on original data using the standard, forward stepwise, and backward stepwise modes to evaluate both the temporal and spatial variations in water quality. The best discriminant functions for each mode were constructed considering the quality of the classification matrix and the number of parameters. The monitoring sites and periods were the grouping variables and the measured parameters were the independent variables.

3 Results and Discussion

3.1 Temporal similarity and period grouping

An initial exploratory approach involved the use of hierarchical CA on standardized log-transformed data sets sorted by season. CA generated a dendrogram (Figure 2), grouping the 12 months into two clusters at (D_link/D_max) × 100 < 25, and the difference between the clusters was significant. Cluster 1 (the first period) included January, February, March, April, and December, approximately corresponding to the dry season in Hong Kong (October to March; Yeung, 1999). Cluster 2 (the second period) included the remaining months (May, June, July, August, September, October, and November), closely corresponding to the wet season (April to September). However, if the 12 months had been empirically divided into spring (March to May), summer (June to September), autumn (October to December), and winter (January to February), or into dry/wet seasons, a mistake in grouping would have been made. In fact, Figure 2 shows that the temporal patterns of water quality were not purely consistent with the four seasons or the dry/wet seasons.

3.2 Spatial similarity and site grouping

Building on the experience obtained from temporal-CA, spatial-CA was also used to identify similar monitoring sites. The influence of temporal differences on spatial-CA was taken into account: spatial similarity analysis was carried out both for each temporal cluster and for the integrated clusters (the first and second periods), and the results were very similar. Therefore, only the latter result is discussed.

Figure 2 Dendrogram showing clustering of monitoring periods (branches labeled the 1st period and the 2nd period).
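The clustering step behind a dendrogram of this kind can be sketched in a few lines. The code below is an illustrative, unoptimized implementation of Ward's method on squared Euclidean distances (using the Lance-Williams update), followed by a cut at a relative linkage distance (D_link/D_max) × 100. The function names and the toy points are hypothetical, and the paper does not state which software produced its dendrograms.

```python
def ward_merges(points):
    """Agglomerate with Ward's method; return [(members_a, members_b, distance), ...]."""
    members = {i: [i] for i in range(len(points))}
    dist = {}
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            # Squared Euclidean distance between the two cases.
            dist[(a, b)] = sum((x - y) ** 2 for x, y in zip(points[a], points[b]))
    merges = []
    next_id = len(points)
    while len(members) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        merges.append((members[a], members[b], d_ab))
        n_a, n_b = len(members[a]), len(members[b])
        updated = {}
        for k in members:
            if k in (a, b):
                continue
            n_k = len(members[k])
            d_ka = dist[(min(k, a), max(k, a))]
            d_kb = dist[(min(k, b), max(k, b))]
            # Lance-Williams update for Ward's method.
            updated[(k, next_id)] = (
                (n_a + n_k) * d_ka + (n_b + n_k) * d_kb - n_k * d_ab
            ) / (n_a + n_b + n_k)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        dist.update(updated)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


def cut_clusters(points, threshold_percent):
    """Keep only merges with (D_link / D_max) * 100 below the threshold."""
    merges = ward_merges(points)
    d_max = max(d for _, _, d in merges)  # assumes the points are not all identical
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for left, right, d in merges:
        if d / d_max * 100 < threshold_percent:
            parent[find(right[0])] = find(left[0])
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical standardized observations forming two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_groups = cut_clusters(toy_points, 25)
```

In practice each "point" would be the vector of standardized, log-transformed parameter values for one month (temporal CA) or one site (spatial CA), and a library routine would normally replace this hand-written loop.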
Spatial-CA produced a dendrogram, shown in Figure 3, with three groups at (D_link/D_max) × 100 < 35. Group A consisted of DB1, DB2, DB3, and DB4. Group B consisted of DB5, GR3, IN3, RB1, RB2, and TSR2, and group C consisted of GR1, GR2, IN1, IN2, KT1, KT2, TSR1, FVR1, YL1, YL2, YL3, and YL4. The group classifications varied with significance level, because the sites in these groups had similar features and natural backgrounds that were affected by similar sources. In group A, four sites were located in the Ha Pak Nai, Tai Shui Hang, Pak Nai, and Sheung Pak Nai streams, which are free from major point and non-point pollution sources. Moreover, based on the Hong Kong Annual River Water Quality Report, the water quality of these streams remained pristine over the five years of this study (2000–2004). Group B corresponded to relatively moderately polluted sites, except DB5; most sites in this group were upstream where the major pollution sources were discharges from unsewered villages and livestock farms. Group C corresponded to highly polluted sites that received pollution from point and non-point sources, i.e., unsewered areas, agricultural farms, and surface runoff from the Yuen Long and Tin Shui Wai town centers. Compliance with the Water Quality Objectives (WQO) was in steady decline from 2000 to 2004 (HKEPD, 2001, 2002, 2003, 2004, 2005).

Hierarchical CA provided a useful classification of the surface watercourses in the study area that can be used to design an optimal future spatial monitoring network with lower cost (Simeonov et al., 2003; Singh et al., 2004). According to the above results, the monitoring frequency might be decreased, with sampling months selected only from the first and second periods; likewise, the number of monitoring sites could be reduced, with sites chosen only from groups A, B, and C.

Figure 3 Dendrogram showing clustering of monitoring sites (branches labeled Group C, Group B, and Group A).

3.3 Temporal variations in water quality

Temporal variations in water quality parameters (Table I) were evaluated using a period–parameter correlation matrix, which showed that most analyzed parameters were significantly correlated (p < 0.05) with period, except DO, NO₃⁻–N, F. coli, Pb, and Zn. Temperature had the highest correlation coefficient (Spearman’s R > 0.67), followed by NH₄⁺–N (R = −0.23), TKN (R = −0.22), BOD₅ (R = 0.22), Ni (R = 0.19), and COD (R = −0.17). These parameters accounted for the major
temporal variation in water quality, but the large difference among periods in temperature was responsible for much of the temporal variation in water quality. The absence of a significant correlation of DO, NO₃⁻–N, F. coli, Pb, and Zn with period indicated the contribution of anthropogenic sources to pollution in the Northwestern New Territories.

Temporal variation in water quality was further evaluated using DA. Before running the temporal-DA, the number of clusters needed to be decided, so the clusters based on temporal-CA were applied. The objectives of DA in this study were (1) to test the significance of discriminant functions and (2) to determine the most significant variables associated with the differences between the clusters. As shown in Table II, the values of Wilks’ lambda for each discriminant function were quite small (0.466, 0.468, and 0.481 for the three modes, respectively) and the chi-square values were rather high, which suggested that the temporal-DA in this study was valid and effective.

Table II Wilks’ lambda and chi-square test of DA of temporal variation of water quality

Modes       Fun. (s)    R           Wilks’ lambda   Chi-square  p level
Standard    1           0.731       0.466           1,044.549   0.000
Forward     1           0.730       0.468           1,042.314   0.000
Backward    1           0.721       0.481           1,007.021   0.000

Discriminant functions (DFs) and classification matrices (CMs) obtained from the standard, forward stepwise, and backward stepwise modes of DA are shown in Tables III and IV. In the forward stepwise mode, variables were included step by step, beginning with the most significant, until no significant changes were obtained; in the backward stepwise mode, variables were removed step by step, beginning with the least significant, until no significant changes were obtained. The standard DA mode constructed DFs including all parameters, and the coefficient of E.

Table III Classification functions coefficients for DA of temporal variation

Parameters  Standard mode           Forward stepwise mode   Backward stepwise mode
            First       Second      First       Second      First       Second
            periodᵃ     periodᵃ     periodᵃ     periodᵃ     periodᵃ     periodᵃ
EC          −55.014     −55.422
pH          1,311.644   1,292.785   1,212.191   1,194.475   958.746     938.491
DO          −38.209     −37.555
TEMP        293.750     326.151     262.199     294.485     241.564     273.234
COD         9.040       9.392
BOD5        −8.800      −9.579      −5.964      −6.695      −13.602     −15.057
NH3–N       6.630       6.855
TKN         −11.493     −13.533     −9.010      −10.691
NO3–N       −4.407      −4.225      −7.604      −7.379
TP          −6.533      −5.596      −4.039      −2.918
E. coli     −0.859      −0.610
F. coli     6.860       7.556       7.370       8.246       1.631       2.452
TS          91.653      92.032
TSS         −19.286     −19.337
F           −154.845    −156.803    −116.980    −119.330
As          −21.337     −21.027
Al          49.607      49.094
Fe          −0.906      −2.836      −1.925      −3.950      26.704      25.758
Cu          −3.855      −3.657
Mn          41.615      42.569      32.531      33.657
Pb          −52.519     −51.694     −11.644     −11.096
Ni          10.389      9.587       23.811      22.901      −7.352      −8.782
Zn          14.193      14.007
Constant    −932.066    −960.612    −782.883    −812.636    −613.271    −639.099

ᵃ Coefficients for different monitoring periods correspond to w_ij as defined in Eq. 1.
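To make the use of these coefficients concrete, the sketch below evaluates the classification function of Eq. 1 for each period with the backward stepwise coefficients of Table III and assigns a sample to the period with the larger score. It is only an illustration under stated assumptions: the inputs are assumed here to be log10-transformed values (if the coefficients were fitted to untransformed data, the transformation step should be dropped), and the two samples and all function names are hypothetical.

```python
import math

# Backward stepwise coefficients and constants copied from Table III.
TEMPORAL_FUNCTIONS = {
    "first period": {
        "constant": -613.271,
        "weights": {"pH": 958.746, "TEMP": 241.564, "BOD5": -13.602,
                    "F. coli": 1.631, "Fe": 26.704, "Ni": -7.352},
    },
    "second period": {
        "constant": -639.099,
        "weights": {"pH": 938.491, "TEMP": 273.234, "BOD5": -15.057,
                    "F. coli": 2.452, "Fe": 25.758, "Ni": -8.782},
    },
}


def classification_scores(sample, functions):
    """Evaluate f(G_i) = k_i + sum_j w_ij * p_ij for every group i."""
    return {
        group: spec["constant"]
        + sum(weight * sample[name] for name, weight in spec["weights"].items())
        for group, spec in functions.items()
    }


def assign_group(sample, functions):
    """Assign the sample to the group with the highest classification score."""
    scores = classification_scores(sample, functions)
    return max(scores, key=scores.get)


def log10_sample(raw):
    """Assumed preprocessing: log10 of every measured value."""
    return {name: math.log10(value) for name, value in raw.items()}


# Hypothetical measurements that differ only in water temperature (deg C).
cool = log10_sample({"pH": 7.4, "TEMP": 12.0, "BOD5": 10.0,
                     "F. coli": 1e5, "Fe": 1.0, "Ni": 0.004})
warm = log10_sample({"pH": 7.4, "TEMP": 30.0, "BOD5": 10.0,
                     "F. coli": 1e5, "Fe": 1.0, "Ni": 0.004})
```

Under these assumptions the cool sample scores higher on the first-period function and the warm sample on the second-period function, which matches the dominant role of temperature reported in the text.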
Table IV Classification matrix for DA of temporal

Monitoring periods        Percent correct   Period assigned by DAᵃ
                                            First Period   Second Period
Standard mode
  First Period            84.35             485            90
  Second Period           85.71             115            690
  Total                   85.14             600            780
Forward stepwise mode
  First Period            85.04             489            86
  Second Period           85.47             117            688
  Total                   85.29             606            774
Backward stepwise mode
  First Period            82.78             476            99
  Second Period           84.60             124            681
  Total                   83.84             600            780

ᵃ Checked by cross-validation method
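The percentages in a classification matrix follow directly from its counts. As a small worked check, the sketch below recomputes the standard-mode figures of Table IV, reading each row as the true period and each column as the period assigned by DA; the function name is hypothetical.

```python
def percent_correct(matrix):
    """matrix[i][j] = number of cases of true group i assigned to group j."""
    per_group = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = 100.0 * correct / sum(sum(row) for row in matrix)
    return per_group, total


# Standard-mode counts from Table IV (rows: true first / second period).
standard_mode = [[485, 90], [115, 690]]
per_group, total = percent_correct(standard_mode)
# per_group rounds to 84.35 and 85.71; total rounds to 85.14
```

The same function applies unchanged to a three-group matrix, since it only relies on the diagonal holding the correctly assigned cases.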

coli was near zero. Both the standard and forward stepwise mode DFs, using 23 and 12 discriminant variables, respectively, yielded CMs assigning more than 85% of the cases correctly (Table IV). However, in the backward stepwise mode, DA produced a CM with close to 84% correct assignations using only six discriminant parameters, a result similar to those from the former two modes but obtained with far fewer parameters. Thus, the temporal-DA results suggest that pH, TEMP, BOD₅, F. coli, Fe, and Ni were the most significant parameters for discriminating between the first period and the second period and that they account for most of the expected temporal variation in water quality.

Box and whisker plots of the discriminant parameters recognized by DA as being related to the temporal trend are given in Figure 4. The pH (Figure 4a) was slightly higher in the first period than in the second period. The average temperature (Figure 4b) was clearly higher in the second period than in the first period and showed a clear-cut temporal effect. A clear inverse relationship between temperature and BOD₅ (Figure 4c) was also observed and contributed to the period effect. The average concentration of F. coli (Figure 4d) was lower in the first period than in the second period. The average concentrations of Fe and Ni (Figure 4e–f), following the same pattern, were higher in the first period than in the second period.

3.4 Spatial variations in water quality

The test of significance in spatial-DA was calculated in the same way as for temporal-DA, as shown in Table V. The values of Wilks’ lambda and the chi-square for each discriminant function varied from 0.073 to 0.601 and 849.252 to 3,583.688, respectively, and the p level (0.000) was below 0.05, indicating that the spatial-DA in this study had a greater discriminatory ability and was credible and effective.

Spatial-DA was performed using the original data set of 23 parameters after classification into the three major groups, A, B, and C, obtained through CA. The sites were the dependent variables and the measured parameters constituted the independent variables.

As in temporal-DA, DFs and CMs obtained from the standard, forward stepwise, and backward stepwise modes of DA are shown in Tables VI and VII. The standard DA mode constructed DFs using 23 parameters. The standard and forward stepwise mode DFs, using 23 and 13 discriminant parameters, yielded CMs correctly assigning 92.0% and 91.45% of the cases, respectively (Tables VI and VII); however, the backward stepwise DA showed that pH, NH₄⁺–N, NO₃⁻–N, F. coli, Fe, Ni, and Zn were the discriminant parameters in spatial variation, with correct assignations of 90.65% for the three groups of sites (Tables VI and VII). Thus, the spatial-DA results suggested that only seven parameters, i.e., pH, NH₄⁺–N, NO₃⁻–N, F. coli, Fe, Ni, and Zn, were needed to account for most of the expected spatial variations in water quality.

Box and whisker plots of discriminating parameters identified by spatial DA (backward stepwise mode) were constructed to evaluate different patterns associated with spatial variations in water quality

Figure 4 Temporal variation: pH, TEMP, BOD5, F. coli, Fe and Ni.

Table V Wilks’ lambda and chi-square test of DA of spatial variation of water quality

Modes       Test of fun. (s)   R           Wilks’ lambda   Chi-square  p level
Standard    1                  0.928       0.073           3,583.688   0.000
            2                  0.689       0.525           880.554     0.000
Forward     1                  0.923       0.080           3,462.774   0.000
            2                  0.680       0.538           849.252     0.000
Backward    1                  0.915       0.097           3,199.391   0.000
            2                  0.631       0.601           698.939     0.000
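The columns of Table V are linked by standard relations that can be checked numerically. The sketch below assumes the usual definitions, which the paper does not spell out: Wilks' lambda for a set of functions is the product of (1 − R²) over their canonical correlations R, and the chi-square follows Bartlett's approximation with N = 1,380 cases (23 sites × 12 months × 5 years), 23 variables, and 3 groups for the standard mode. Function names are hypothetical, and small deviations from the table are expected because R is rounded to three decimals.

```python
import math


def wilks_lambda(canonical_correlations):
    """Wilks' lambda for testing the listed functions jointly: prod(1 - R_k**2)."""
    result = 1.0
    for r in canonical_correlations:
        result *= 1.0 - r ** 2
    return result


def bartlett_chi_square(lam, n_cases, n_variables, n_groups):
    """Bartlett's approximation: -(N - 1 - (p + g) / 2) * ln(lambda)."""
    return -(n_cases - 1 - (n_variables + n_groups) / 2.0) * math.log(lam)


# Standard mode of Table V: canonical correlations of functions 1 and 2.
lam_1_through_2 = wilks_lambda([0.928, 0.689])   # test of functions 1 through 2
lam_2 = wilks_lambda([0.689])                    # test of function 2 alone
chi2_1_through_2 = bartlett_chi_square(lam_1_through_2, 1380, 23, 3)
chi2_2 = bartlett_chi_square(lam_2, 1380, 23, 3)
```

With these inputs the computed lambdas round to the tabulated 0.073 and 0.525, and the chi-square values land within about one percent of the tabulated 3,583.688 and 880.554.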
Table VI Classification functions coefficients for DA of spatial variation

Parameters  Standard mode                       Forward stepwise mode               Backward stepwise mode
            coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ     coeff.ᵃ
EC          −51.58      −47.94      −44.93
pH          1,541.00    1,597.66    1,568.93    1,532.747   1,591.182   1,570.373   1,220.445   1,276.984   1,260.341
DO          −20.28      −12.97      −14.84      −32.839     −27.247     −29.848
TEMP        213.09      218.89      222.03
COD         12.95       16.27       17.56
BOD5        −11.10      −12.78      −12.41
NH3–N       18.90       25.88       27.21       9.535       14.228      14.516      −7.261      −2.219      −0.456
TKN         −13.49      −18.86      −21.15
NO3–N       −0.93       1.52        2.29        2.424       5.236       5.955       0.240       3.185       3.576
TP          −9.18       −5.85       −1.92       −20.456     −19.310     −16.110
E. coli     5.03        6.90        5.61        −0.810      0.696       −0.469
F. coli     5.72        8.33        10.97       15.799      18.751      21.388      10.754      14.146      15.949
TS          85.16       80.08       77.01
TSS         −19.90      −20.59      −20.99
F           −156.21     −154.12     −148.12     −124.589    −123.634    −116.351
As          −17.46      −15.94      −16.65
Al          45.63       45.55       48.25
Fe          32.73       42.86       39.63       60.215      72.095      69.227      61.337      71.775      68.806
Cu          −5.55       −8.17       −10.58      −1.996      −4.134      −5.971
Mn          42.03       43.86       44.42
Pb          −57.17      −58.76      −59.31      −24.079     −27.837     −28.099
Ni          −3.90       −8.82       −5.88       −7.017      −12.028     −8.922      −23.636     −28.728     −24.288
Zn          11.34       13.06       16.84       −1.106      −0.105      3.458       −5.281      −5.883      −3.472
Constant    −1,008.33   −1,106.22   −1,090.94   −776.371    −870.179    −850.577    −610.217    −691.824    −683.738

ᵃ Coefficients for different monitoring sites correspond to w_ij as defined in Eq. 1.

Table VII Classification matrix for DA of spatial

Monitoring sites          Percent correct   Period assigned by DAᵃ
Standard mode
  Group A                 96.7              232     8       0
  Group B                 89.7              9       322     29
  Group C                 91.8              1       63      716
  Total                   92.0              242     393     745
Forward stepwise mode
  Group A                 96.25             231     9       0
  Group B                 89.17             7       321     32
  Group C                 91.03             1       69      710
  Total                   91.45             239     399     742
Backward stepwise mode
  Group A                 96.25             231     9       0
  Group B                 87.78             7       316     37
  Group C                 90.26             1       75      704
  Total                   90.65             239     400     741

ᵃ Checked by cross-validation method
Figure 5 Spatial variation: pH, NH₄⁺–N, NO₃⁻–N, F. coli, Fe, Ni and Zn.

(Figure 5). The average pH (Figure 5a) was higher in group B than in groups A and C. The trends for NH₄⁺–N, NO₃⁻–N, F. coli, Fe, Ni, and Zn (Figure 5b–g) suggested that the average concentration in group C was the highest, while that in group A was the lowest. Within group C, most of the watercourses and their monitoring sites were located downstream or near urban areas or unsewered villages, and the discharge was easily influenced by wastewater from agricultural irrigation and households upstream. In contrast, sites in groups A and B were relatively far from pollution sources. The result of spatial-CA also supported the trends of discriminant parameters in water quality.

Based on the above results, backward stepwise DA proved to be a valuable tool for recognizing the discriminant parameters in temporal and spatial variations of surface water quality. Additionally, it is essential to strengthen the monitoring accuracy of pH, TEMP, BOD₅, F. coli, NH₄⁺–N, NO₃⁻–N, Fe, Ni, and Zn to clearly identify variations in the future. Furthermore, compared with the other two groups, the pollution of group C was relatively serious and should be controlled.

4 Conclusions

In this case study, different multivariate statistical methods were used to assess temporal and spatial variations in water quality of watercourses in the Northwestern New Territories, Hong Kong. Hierarchical CA grouped the 12 months into two periods (the first and second periods) and classified 23 sampling sites into three groups (A, B, and C) based on the similarity of water quality characteristics. The temporal and spatial similarities and groupings could facilitate the design of an optimal future monitoring strategy that could decrease monitoring frequency, the number of sampling stations, and the corresponding costs for the Northwestern New Territories. Moreover, DA provided better results both temporally and spatially with great discriminatory ability, according to significance tests. DA rendered an important reduction in the required amount of data for the three groups of monitoring sites, because it used only six parameters (pH, temperature, BOD₅, F. coli, Fe, and Ni) for the temporal analysis and produced about 84% correct assignations, and seven parameters (pH, NH₄⁺–N, NO₃⁻–N, F. coli, Fe, Ni, and Zn) for the spatial analysis and produced more than 90% correct assignations. Therefore, DA allowed a reduction in the dimensionality of the large data set and indicated a few significant parameters responsible for large variations in water quality, which could reduce the number of sampling parameters. Hence, this study illustrates that multivariate statistical methods are an excellent exploratory tool for interpreting complex water quality data sets and for understanding temporal and spatial variations, which are useful and effective for water quality management.

Acknowledgements The authors sincerely thank the Hong Kong Environmental Protection Department for permission to use the data and two referees for their valuable comments. The opinions in this paper are those of the authors and do not reflect the views or policies of the Hong Kong Special Administrative Region Government. This paper was supported by the “National Basic Research (973) Program” Project (no. 2005CB724205) of the Ministry of Science and Technology of China and the “China Scholarship programs” Project (2006100766) of the Ministry of Education of China.

References

Adams, S., Titus, R., Pietesen, K., Tredoux, G., & Harris, C. (2001). Hydrochemical characteristic of aquifers near Sutherland in the Western Karoo, South Africa. Journal of Hydrology, 241, 91–103.
APHA (1998). Standard methods for the examination of water and wastewater. Washington: American Public Health Association.
Astel, A., Biziuk, M., Przyjazny, A., & Namiesnik, J. (2006). Chemometrics in monitoring spatial and temporal variations in drinking water quality. Water Research, 8, 1706–1716.
ASTM (2001). American society of testing and materials standards. New York.
Costanza, R., d’Arge, R., de Groot, R., Farber, S., Grasso, M., Hannon, B., et al. (1997). The value of the world’s ecosystem services and natural capital. Nature, 387, 253–260.
Dixon, W., & Chiswell, B. (1996). Review of aquatic monitoring program design. Water Research, 30, 1935–1948.
Helena, B., Pardo, R., Vega, M., Barrado, E., Fernandez, J. M., & Fernandez, L. (2000). Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Water Research, 34, 807–816.
HKEPD (2001). River water quality in Hong Kong in 2000. Hong Kong: Hong Kong Government Printer.
HKEPD (2002). River water quality in Hong Kong in 2001. Hong Kong: Hong Kong Government Printer.
HKEPD (2003). River water quality in Hong Kong in 2002. Hong Kong: Hong Kong Government Printer.
HKEPD (2004). River water quality in Hong Kong in 2003. Hong Kong: Hong Kong Government Printer.
Environ Monit Assess (2007) 132:1–13 13

HKEPD (2005). River water quality in Hong Kong in 2004. Simeonov, V., Stratis, J. A., Samara, C., Zachariadis, G.,
Hong Kong: Hong Kong Government Printer. Voutsa, D., Anthemidis, A., et al. (2003). Assessment of
Johnson, R. A., & Wichern, D. W. (1992). Applied multivariate the surface water quality in Northern Greece. Water
statistical analysis (5th edn.). New Jersey: Prentice-Hall. Research, 37, 4119–4124.
Kowalkowski, T., Zbytniewski, R., Szpejna, J., & Buszewski, Simeonov, V., Einax, J. W., Stanimirova, I., & Kraft, J. (2002).
B. (2006). Application chemometrics in river water Environmetric modeling and interpretation of river water
classification. Water Research, 40, 744–752. monitoring data. Analytical and Bioanalytical Chemistry,
Lambrakis, N., Antonakos, A., & Panagopoulos, G. (2004). 374, 305–898.
The use of multicomponent statistical analysis in hydro- Simeonova, P., Simeonov, V., & Andreev, G. (2003). Water
geological environmental research. Water Research, 38, quality study of the Struma River Basin, Bulgaria (1989–
1862–1872. 1998). Central European Journal of Chemistry, 1, 136–212.
Lattin, J., Carroll, D., & Green, P. (2003). Analyzing multivar- Singh, K. P., Malik, A., Mohan, D., & Sinha, S. (2004).
iate data. New York: Duxbury. Multivariate statistical techniques for the evaluation of spatial
Liu, C. W., Lin, K. H., & Kuo, Y. M. (2003). Application of and temporal variations in water quality of Gomti River
factor analysis in the assessment of groundwater quality in (India) – A case study. Water Research, 38, 3980–3992.
a blackfoot disease area in Taiwan. Science of the Total Singh, K. P., Malik, A., Singh, V. K., Mohan, D., & Sinha, S.
Environment, 313, 77–89. (2005). Chemometric analysis of groundwater quality data
McKenna, J. (2003). An enhanced cluster analysis program of alluvial aquifer of Gangetic Plain, North India.
with bootstrap significance testing for ecological commu- Analytica Chimica Acta, 550, 82–91.
nity analysis. Environmental Modelling and Software, 18, Suk, H., & Lee, K. (1999). Characterization of a ground water
205–220. hydrochemical system through multivariate analysis:
Otto, M. (1998). Multivariate methods. In: Kellner, R., Mermet, Clustering into ground water zones. Ground Water, 37,
J. M., Otto, M., & Widmer, H. M. (Eds.), Analytical 358–366.
chemistry. Weinheim: Wiley-VCH. Vega, M., Pardo, R., Barrado, E., & Deban, L. (1998).
Papatheodorou, G., Demopoulou, G., & Lambrakis, N. (2006). Assessment of seasonal and polluting effects on the
A long-term study of temporal hydrochemical data in a quality of river water by exploratory data analysis. Water
shallow lake using multivariate statistical techniques. Research, 32, 3581–3592.
Ecological Modelling, 193, 759–776. Wunderlin, D. A., Diaz, M. D. P., Ame, M. V., Pesce, S. F.,
Reghunath, R., Murthy, T. R. S., & Raghavan, B. R. (2002). Hued, A. C., & Bistoni, M. D. (2001). Pattern recognition
The utility of multivariate statistical techniques in hydro- techniques for the evaluation of spatial and temporal
geochemical studies: an example from Karnataka, India. variations in water quality. A case study: Suquia River Basin
Water Research, 36, 2437–2442. (Cordoba Argentina). Water Research, 35, 2881–2894.
Shin, P. K. S., & Fong, K. Y. S. (1999). Multiple discriminant Yeung, I. M. H. (1999). Multivariate analysis of the Hong Kong
analysis of marine sediment data. Marine Pollution Victoria harbour water quality data. Environmental Mon-
Bulletin, 39, 285–294. itoring and Assessment, 593, 331–342.
Shrestha, S., & Kazama, F. (2007). Assessment of surface water Yung, Y. K., Wong, C. K., Yau, K., & Qian, P. Y. (2001). Long-
quality using multivariate statistical techniques: A case term changes in water quality and phytoplankton charac-
study of the Fuji river basin, Japan. Environmental teristics in Port Shelter, Hong Kong, from 1988–1998.
Modelling and Software, 22, 464–475. Marine Pollution Bulletin, 40, 981–992.
