R- Evolutionary relationships, non-independence and correlation…

In this blog entry, I would like to share my experience with R and data analysis. A few months ago I developed a keen interest in the analysis of NGS (next-generation sequencing) data, a relatively recent field. Subsequently, I decided to work on the evolution of specific features in bacterial genomes. As you may know, bacterial genomes are enormously variable in their structure and composition (e.g. codon usage, GC bias, and gene copy number).

During the course of this project I came across the issue of non-independence in the context of correlation between two traits (e.g. genome size and genomic GC content). I was using R, and I used its correlation function to determine whether these two traits were correlated. The correlation estimate between genome size and GC% was 0.6. However, I soon realized that because many of the species in my dataset share common ancestry, they are not independent data points: closely related species can resemble each other simply because they inherited their traits from a common ancestor. So, if I were to correlate any two traits, I would have to take that shared ancestry into account. The most widely used method for analyzing associations between continuous traits across species is phylogenetically independent contrasts (PIC; Felsenstein, 1985). PIC essentially removes the effect of shared ancestry by converting the trait values of species into differences (contrasts) between sister lineages on the phylogeny, each standardized by the branch lengths separating them. As with most things, R has a package for this: 'geiger' can be used to perform PIC. After taking the shared ancestry of species into account, I obtained a correlation estimate of 0.47 between the same two traits.
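To make the idea concrete, here is a minimal Python sketch of Felsenstein's contrasts algorithm. It is not the geiger/R code used in this project; the `Node` class, the function names, the three-species tree (A, B, C) and the trait values are all hypothetical toy inputs, not data from my bacterial dataset. It assumes a rooted, fully bifurcating tree with positive branch lengths.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical minimal node for a rooted, bifurcating tree."""
    name: Optional[str] = None      # tip label; None for internal nodes
    length: float = 0.0             # branch length leading to this node
    children: List["Node"] = field(default_factory=list)


def pic(node: Node, traits: Dict[str, float]) -> Tuple[float, float, List[float]]:
    """Felsenstein (1985) contrasts via post-order traversal.

    Returns (estimated node value, adjusted branch length, contrasts).
    Assumes positive branch lengths on the tips.
    """
    if not node.children:  # tip: observed trait value and its branch length
        return traits[node.name], node.length, []
    if len(node.children) != 2:
        raise ValueError("this sketch requires a bifurcating tree")
    xi, vi, ci = pic(node.children[0], traits)
    xj, vj, cj = pic(node.children[1], traits)
    # Standardized contrast: difference divided by its expected standard deviation.
    contrast = (xi - xj) / math.sqrt(vi + vj)
    # Ancestral estimate: children's values weighted by 1 / branch length.
    xk = (xi / vi + xj / vj) / (1.0 / vi + 1.0 / vj)
    # Lengthen this node's branch to reflect uncertainty in the estimate xk.
    vk = node.length + vi * vj / (vi + vj)
    return xk, vk, ci + cj + [contrast]


def contrast_correlation(cx: List[float], cy: List[float]) -> float:
    """Correlation of contrasts computed through the origin (no mean subtracted)."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tree ((A:1, B:1):1, C:2) with made-up trait values.
tree = Node(children=[
    Node(length=1.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 2.0),
])
genome_size = {"A": 4.0, "B": 5.0, "C": 2.0}   # illustrative values only
gc_percent = {"A": 50.0, "B": 60.0, "C": 40.0}  # illustrative values only

_, _, cx = pic(tree, genome_size)
_, _, cy = pic(tree, gc_percent)
r = contrast_correlation(cx, cy)
print(cx, cy, r)
```

A tree with n species yields n - 1 contrasts. The contrasts are correlated through the origin rather than around their means because the sign of each contrast depends only on the arbitrary order of the two children; what matters is that both traits are contrasted in the same direction at every node.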
Whitlock and Schluter cite a similar example: a dataset of 17 lily species in which closely related species tend to share the same flower type more often than more distantly related species do.

This entry was posted in Reflections.

One Response to R- Evolutionary relationships, non-independence and correlation…

  1. quynhq says:

    This is really interesting for me, as I'm doing some genetic analysis myself, but within a single lizard species. I was wondering whether the genome sizes of the species you were comparing varied widely, or whether species in the same genus (or with shared ancestors) have similar genome sizes and maybe even similar content. There are so many different things you can run a correlation test for with genetics!
