ANOVA Primer
Overview
ANOVA stands for ANalysis Of VAriance.
It is a standard and widely applied statistical technique in biology used to
formally compare the effects of different kinds of treatments or categorical
factors on one or more measured quantitative variable(s). Typical applications
include the analysis of data from ecological or agricultural field experiments
and clinical drug trials. An example of the former would be the comparison of
wheat yield (a quantitative variable) between field plots where different types
or quantities of fertilizer (the treatment, factor or categorical variable)
were applied. An example of ANOVA usage in clinical applications would be the
assessment of whether a drug dosage regimen (the categorical factor) significantly
improved some measured aspect (the measured or response variable) of the health
of patients relative to a placebo control.
Conceptual Basis
Simply put, ANOVA examines in a formal statistical way how the variability of the response variable is distributed around the mean for each treatment type, and whether the treatments are separated or overlap enough to suggest a notable effect. Consider a simple graph on which one plots the average value for each treatment type together with its spread of values or "typical range" (i.e. a measure derived from the variance; more correctly, approximately two standard deviations either side of the mean). If the spreads for different treatment types overlap substantially, the effect of the treatments is unlikely to be statistically significant. This outcome results if the means are very similar and/or if the associated variances are large. Large variances arise if there is much natural variability or if there are large measurement errors in the quantitative observations. Conversely, if the means are well separated and variability is low, one can confidently conclude that the treatments have a statistically significant effect. This, in a nutshell, is the objective of ANOVA: to provide a statistical method for assessing whether treatments or factors differ significantly in their effect, given the observed variability in a quantitative measurement variable.
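The graphical idea described above can be sketched in a few lines of Python. This is only an informal illustration, not the ANOVA test itself: the yield figures are hypothetical, and the helper names `typical_range` and `ranges_overlap` are invented for this example.

```python
from statistics import mean, stdev

# Hypothetical wheat yields (tonnes per hectare) for two fertilizer treatments.
yields = {
    "low": [4.1, 3.9, 4.3, 4.0, 4.2],
    "high": [5.0, 5.2, 4.9, 5.1, 5.3],
}


def typical_range(values):
    """Return (mean - 2*SD, mean + 2*SD) as a rough 'typical range'."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return (m - 2 * s, m + 2 * s)


def ranges_overlap(a, b):
    """True if the two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


ranges = {name: typical_range(vals) for name, vals in yields.items()}
overlap = ranges_overlap(ranges["low"], ranges["high"])

for name, (low, high) in ranges.items():
    print(f"{name}: mean={mean(yields[name]):.2f}, range=({low:.2f}, {high:.2f})")
print("typical ranges overlap:", overlap)
```

With these made-up numbers the two ranges do not overlap, which is the situation in which one would expect a formal ANOVA to report a significant treatment effect.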
Methodological Framework
The above illustrates the basic principle of the method, but what happens
formally in an ANOVA is a decomposition of the total observed variance into
component parts: the variance attributable to each categorical variable or
treatment factor, and the residual variance within treatment groups. The result
is an F-ratio statistic (F-value), the ratio of the variance between treatment
groups to the variance within them. This is compared to probabilities (P, based
on the F-distribution) given the degrees of freedom, which depend on the sample
size (n) and the number of treatment levels involved. This in turn indicates at
what probability level the observed differences are significant; differences are
generally deemed significant if P<=5%. (Note: this means that if the treatments
in reality have no effect, there is only a 5% chance of incorrectly concluding
that differences are significant. The corollary is that even if the ANOVA shows
a significant difference, there is always a chance that this is a statistical
artifact of chance variation or errors in the data.) The lower the P threshold,
the lower the chance of concluding wrongly, but the larger the sample sizes
(number of data points) required to detect a given effect. Generally, the larger
the sample size on which the analysis is based, the greater the statistical
confidence in the outcome. Also, the more categorical factors an ANOVA comprises,
the more data hungry the analysis will be. If there are insufficient degrees of
freedom due to small sample sizes or limited replication, the ANOVA cannot be
undertaken.
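The variance decomposition and F-ratio described above can be written out directly for the one-factor case. The following is a minimal sketch using only the Python standard library; the yield data are hypothetical and the function name `one_way_anova` is invented for this example.

```python
from statistics import mean

# Hypothetical wheat yields for three fertilizer levels (illustrative data only).
groups = {
    "none": [3.8, 4.0, 4.1, 3.9, 4.2],
    "medium": [4.6, 4.4, 4.7, 4.5, 4.8],
    "high": [5.0, 5.2, 4.9, 5.1, 5.3],
}


def one_way_anova(groups):
    """Decompose total variation into between- and within-group parts."""
    all_values = [x for vals in groups.values() for x in vals]
    grand_mean = mean(all_values)
    k = len(groups)      # number of treatment levels
    n = len(all_values)  # total sample size

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - mean(vals)) ** 2 for vals in groups.values() for x in vals
    )
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = k - 1
    df_within = n - k
    if df_within <= 0:
        raise ValueError("insufficient degrees of freedom for ANOVA")

    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


result = one_way_anova(groups)
print(f"F({result['df_between']}, {result['df_within']}) = {result['f_ratio']:.2f}")
```

The P value would then be read from the F-distribution with the two degrees of freedom shown, either from published tables or, if SciPy happens to be installed, from `scipy.stats.f.sf(f_ratio, df_between, df_within)`.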
The examples of analysis of variance applications given above are of what is
referred to as One-Way ANOVA: the analysis of variance based on a single
categorical variable or factor examined in isolation (e.g. fertilizer
concentration treatment on wheat yield). However, in reality wheat yield may
also be influenced by additional factors (e.g. pesticide treatment, light
exposure, field slope etc.) that may operate singly or in complex synergistic
(combined) ways. Multifactor ANOVA (i.e. 2-way ANOVA and its multi-way extensions)
provides a generalized statistical methodology for extending the basic one-way
approach to examine simultaneously the effects of multiple categorical factors
on a measurement variable to assess whether they are having a significant impact
AND also whether possible interactions between factors are having significant
effects. For example, it is quite likely that both pesticide application and
higher light levels will promote better wheat yield. Additionally, however,
high light may also degrade pesticide chemicals and render them inactive, such
that indirect interactions between these factors will become important in influencing
observed crop yield under different treatment regimes. ANOVA essentially provides
a complete decomposition of observed variance amongst relevant categorical factors
in a manner that also explicitly accounts for possible interactions between
factors. Again, this decomposition and the assessment of factor and interaction
significance is based on calculation of F-ratio statistics and associated probability
levels. Such multifactor ANOVA is precisely what has been implemented in the
case of the Reef Check WRAS ANOVA system.
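To make the role of the interaction term concrete, the sketch below works through a two-way ANOVA for a balanced design (the same number of replicates in every combination of factor levels). It is an illustration under stated assumptions, not the code used by the Reef Check WRAS ANOVA system: the data are hypothetical, the function name `two_way_anova` is invented here, and unbalanced designs would need a different treatment.

```python
from statistics import mean

# Hypothetical wheat yields for each (pesticide, light) combination,
# with three replicate plots per combination.
cells = {
    ("no", "low"): [3.0, 3.2, 3.1],
    ("no", "high"): [4.0, 4.1, 3.9],
    ("yes", "low"): [4.5, 4.4, 4.6],
    ("yes", "high"): [4.6, 4.8, 4.7],
}


def two_way_anova(cells):
    """F-ratios for factor A, factor B and their interaction (balanced data)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    r = len(next(iter(cells.values())))  # replicates per cell

    if len(cells) != na * nb or any(len(v) != r for v in cells.values()):
        raise ValueError("this sketch only handles complete, balanced designs")
    if r < 2:
        raise ValueError("no replication: insufficient degrees of freedom")

    grand = mean([x for v in cells.values() for x in v])
    mean_a = {a: mean([x for (ai, _), v in cells.items() if ai == a for x in v])
              for a in a_levels}
    mean_b = {b: mean([x for (_, bi), v in cells.items() if bi == b for x in v])
              for b in b_levels}

    ss_a = nb * r * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = na * r * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = r * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # what the main effects leave unexplained
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = na - 1, nb - 1
    df_ab = df_a * df_b
    df_error = na * nb * (r - 1)
    ms_error = ss_error / df_error

    return {
        "f_a": (ss_a / df_a) / ms_error,
        "f_b": (ss_b / df_b) / ms_error,
        "f_ab": (ss_ab / df_ab) / ms_error,
        "df_error": df_error,
    }


table = two_way_anova(cells)
print(f"pesticide F = {table['f_a']:.2f}")
print(f"light F = {table['f_b']:.2f}")
print(f"interaction F = {table['f_ab']:.2f}")
```

In this made-up data set, higher light raises yield much more on plots without pesticide than on plots with it, so the interaction F-ratio is large alongside the two main effects.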
Finally, it is important to note that ANOVA is referred to as a parametric technique;
that is, it makes certain assumptions about the shape of the distribution of the data (i.e. that
variation around the group means follows a normal or bell-shaped distribution). Another assumption
is that the data are "homoscedastic", that is, that variances are similar across groups and there
are no trends or correlations between variances and means in the data; otherwise significant
biases will be introduced into the analysis. These assumptions are fine if one is
dealing with variables that are well behaved (normally distributed), such as the heights of
individuals in a population of people. Here, ANOVA can confidently be applied to the raw data
to assess whether alternative diet treatments had a significant impact on the heights of people
in populations fed different diets. In cases where these assumptions are not met, as is the
case for abundance estimates or other population census data, transformation of the raw
data is necessary to stabilize variances prior to analysis via ANOVA.
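The effect of a variance-stabilizing transformation can be seen with a small numerical example. The sketch below assumes a log10(x + 1) transformation, which is one common choice for count data but is not prescribed by this primer; the abundance counts and the helper name `variance_ratio` are hypothetical.

```python
from math import log10
from statistics import variance

# Hypothetical abundance counts at three sites. The spread of the counts
# grows with the mean, which violates the equal-variance assumption.
counts = {
    "site_a": [2, 4, 3, 5, 1],
    "site_b": [20, 45, 30, 60, 15],
    "site_c": [200, 420, 310, 650, 140],
}


def variance_ratio(groups):
    """Largest group variance divided by the smallest (1.0 = perfectly equal)."""
    variances = [variance(vals) for vals in groups.values()]
    return max(variances) / min(variances)


# Assumed transformation: log10(x + 1); the +1 keeps zero counts defined.
transformed = {
    site: [log10(x + 1) for x in vals] for site, vals in counts.items()
}

ratio_before = variance_ratio(counts)
ratio_after = variance_ratio(transformed)

print(f"variance ratio, raw counts: {ratio_before:.1f}")
print(f"variance ratio, log10(x + 1): {ratio_after:.2f}")
```

On the raw counts the group variances differ by several orders of magnitude, whereas after transformation they are of similar size, so the transformed values are a more suitable input for ANOVA.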
