Take-Home Exam #1 (McCarthy)

BIOS-870 Multivariate Stats, Spring 2007


 

1.  Using ISI Web of Knowledge, I uncovered over 17,000 references (from the last 5 years) in the primary biological literature using “cluster analysis” as the keyword.  Pick one paper that is closest to your disciplinary interests and that utilizes resemblance coefficients and cluster analysis. Provide a detailed critique.  More specifically, explain what was done biologically and experimentally. Examine how resemblance coefficients and cluster analysis were utilized and interpreted.  Was the coefficient selected the best choice?  Were there better alternatives?  What type of cluster analysis was performed and was it a good choice? Please provide me with a xerographic copy of the paper when you submit your exam (I will return it). [20 points]

 

 

2.  Recently, I collected stand data on trees growing in a Wisconsin forest. The general goal was to describe the vegetation of the forest. Thirty 500 m² permanent plots were placed throughout the stand. Within each plot, each tree (defined as a woody stem ≥ 10 cm diameter at breast height; DBH) was identified to species, given an ID tag, measured for DBH, and recorded. The raw data (ca. 2400 trees) are provided as an Excel spreadsheet (click here). Prior to analysis, DBH must first be converted to basal area by calculating the area of a circle from the diameter measurement provided; basal area must then be summed by species within each plot.  This will yield a value in cm² per 500 m². In order to be consistent with the rest of the vegetation literature, each species total must then be converted to m² ha⁻¹. Your resulting matrix should be on the order of 30 (plots) × 25 (species), with the data being basal area in m² ha⁻¹. Select an appropriate resemblance coefficient and cluster analysis procedure to evaluate the species relationships within this data set. NB: there is a separate tab at the bottom of the spreadsheet that provides an explanation of species acronyms and some basic ecological information. [40 points]
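The unit conversion described in question 2 can be sketched as follows. This is only an illustration of the arithmetic, not a required method: the record layout `(plot_id, species_code, dbh_cm)`, the function names, and the example species codes and DBH values are all assumptions and are not taken from the spreadsheet. For a 500 m² plot, the combined factor from cm² per plot to m² ha⁻¹ is (10,000 / 500) / 10,000 = 0.002.

```python
import math
from collections import defaultdict

CM2_PER_M2 = 10_000  # 1 m² = 10,000 cm²
M2_PER_HA = 10_000   # 1 ha = 10,000 m²


def basal_area_cm2(dbh_cm):
    """Area (cm²) of a circle whose diameter is the stem's DBH in cm."""
    return math.pi * (dbh_cm / 2.0) ** 2


def plot_by_species_matrix(trees, plot_area_m2=500.0):
    """Sum basal area by species within each plot, scaled to m² ha⁻¹.

    trees: iterable of (plot_id, species_code, dbh_cm) tuples.
           This layout is hypothetical; adapt it to the real spreadsheet.
    """
    # (plots per hectare) / (cm² per m²); equals 0.002 for 500 m² plots
    scale = (M2_PER_HA / plot_area_m2) / CM2_PER_M2
    totals = defaultdict(lambda: defaultdict(float))
    for plot_id, species, dbh_cm in trees:
        totals[plot_id][species] += basal_area_cm2(dbh_cm) * scale
    return {plot: dict(by_species) for plot, by_species in totals.items()}


# Made-up records for illustration only (not from the exam data).
example = [(1, "SP_A", 30.0), (1, "SP_A", 20.0), (1, "SP_B", 40.0), (2, "SP_A", 10.0)]
matrix = plot_by_species_matrix(example)
```

Species absent from a plot are simply missing from that plot's dictionary here; they would need to be filled with zeros before computing any resemblance coefficient.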

 

 

3.  In 2004, Dr. Albrecht collected a vegetation data set used to describe seedling regeneration at the Ohio Hills Site of the National Fire and Fire Surrogate research program. The Ohio Hills Site included data (click here) from three forests: REMA (R), Tar Hollow (T), and Zaleski (Z). Within each forest, four experimental stands were juxtaposed (each ca. 20 ha in size), such that one was burned (B), another thinned (T), the third was thinned and then burned (TB), and the final stand was used as an untreated control (C). Within each of these treatment stands, plots (10 per stand) were stratified by moisture condition such that some were xeric (X), some intermediate (I), and some mesic (M) [the exact number of replicates differs by forest due to availability]. The resulting matrix is roughly 120 groups × 22 species; data are number of individuals per 20 m². To be consistent with the rest of the literature, data should first be converted to density per 100 m². Select an appropriate resemblance coefficient, perform a cluster analysis, and then evaluate the results. Do clusters form as a function of forest, treatment, moisture, or none of these criteria? Comment on the results. For the purposes of this question, you need not worry about the species identity and the specifics of their ecology. [40 points]
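The density conversion in question 3 amounts to multiplying each count by 100 / 20 = 5. The sketch below shows that step, plus one possible way to tabulate cluster membership against a design factor once clusters have been obtained by whatever method is chosen. The function names, the cluster labels, and the forest codes assigned to each sample are made up for illustration and do not come from the data set.

```python
from collections import Counter


def to_density_per_100m2(count, quadrat_area_m2=20.0):
    """Scale a count per quadrat to individuals per 100 m²."""
    return count * (100.0 / quadrat_area_m2)


def crosstab(cluster_labels, factor_levels):
    """Count samples for each (cluster, factor level) pair.

    cluster_labels: one cluster ID per sample, from any clustering method.
    factor_levels:  the matching forest, treatment, or moisture code per sample.
    """
    return Counter(zip(cluster_labels, factor_levels))


# Made-up values for illustration only.
densities = [to_density_per_100m2(c) for c in [3, 0, 7]]
table = crosstab([1, 1, 2, 2], ["R", "R", "T", "Z"])
```

A cluster dominated by a single level of one factor (for example, one forest) would suggest that the factor structures the clusters; an even spread across levels would suggest it does not.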