Having gone through the steps to build a phylogeny of the most common lichen photobiont, Trebouxia, in my last post, I will now go on to discuss the host association patterns that it reveals. Here is the Trebouxia ITS tree generated previously:
Trebouxia ITS phylogeny. Major clades are differentially coloured and named according to authentic strains.
I’ve coloured all of the taxa within clades according to the colours of the named strains, and I’ve assigned unique colours to each clade that does not contain named strains. I have not attempted to break up T. jamesii or T. impressa into sub-clades, though doing so would probably be justified. This will be a topic for a future post. I should also point out that T. jamesii is referred to as T. simplex in some papers.
In contrast to the Nostoc photobionts, where the FASTA headers were consistently labeled with the host information, these sequences are …not. I used a BioPerl wrapper to NCBI’s Eutils interface to download GenBank-format sequences and parsed them to extract host association information from the “host”, “note” and “isolation source” annotations. I also extracted information about the author of each sequence and where it was published while I was at it.
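The parsing step can be illustrated with a minimal sketch. It is written in Python with only the standard library, not the BioPerl actually used for this post, and it assumes the records are available as plain GenBank flat-file text. The function name, the qualifier list and the sample record are all hypothetical, invented for illustration. Note that GenBank spells the third qualifier `isolation_source`.

```python
import re

# Qualifiers searched for host information. This list is an assumption based
# on the annotations named in the post.
HOST_QUALIFIERS = ("host", "note", "isolation_source")

# Matches feature qualifiers of the form /name="value". The value may wrap
# across lines; doubled quotes inside values are not handled in this sketch.
QUALIFIER_RE = re.compile(r'/(\w+)="([^"]*)"')


def extract_host_info(genbank_text):
    """Return {qualifier: value} for the host-related qualifiers of one record."""
    info = {}
    for name, value in QUALIFIER_RE.findall(genbank_text):
        if name in HOST_QUALIFIERS and name not in info:
            # Collapse the line wrapping used in GenBank flat files.
            info[name] = " ".join(value.split())
    return info


# Hypothetical record fragment, invented for illustration only.
SAMPLE = '''     source          1..600
                     /organism="Trebouxia sp."
                     /mol_type="genomic DNA"
                     /host="Examplea hypothetica"
                     /isolation_source="lichen thallus collected from
                     bark"
'''

print(extract_host_info(SAMPLE))
```

Keeping the three qualifiers separate, instead of merging them into one host field, leaves the decision about which annotation to trust for a later step.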
Having beaten the phylogeny of symbiotic cyanobacteria into submission in my previous post, I am now tackling the green algae. My plan was to start with a big-picture analysis of 18S ribosomal RNA sequences, but my initial BLAST search returned over 10,000 454 reads from metagenomic projects, which was a lot more “environmental isolate XXX” than I felt like dealing with. Besides, I don’t know that I could add much to this recent overview. Therefore, I am going to focus on the most important lineage of lichenized algae: Trebouxia. A large number of studies have obtained photobiont ITS sequences from a variety of Trebouxia-associated lichens, so these are the data that I looked at.
**Post has been updated with some corrections to the host information in the first phylogeny**
Today I am finally going to take a detailed look at the Nostoc phylogeny that I have been working on. But before I can begin, I have to figure out a way to highlight interesting taxa in an automated way. To do this, I wrote a script that adds HTML color tags after taxon names according to various classifications. While I was at it, I converted the branch support values to a binary system (≥0.9 vs. <0.9), which I can display as black circles on significantly supported branches. Note that this script requires that the tree be in NEXUS format rather than the plain Newick that is produced by PhyML. Opening the tree file in FigTree and saving it converts it to NEXUS, or the conversion could be scripted using BioPerl.
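The two transformations described above can be sketched in a few lines. This is a Python illustration, not the script used for this post. It assumes that support values appear as numeric labels directly after a closing parenthesis in the tree string, and that colours are attached as FigTree-style `[&!color=#rrggbb]` annotations. The annotation syntax is an assumption worth checking against a file saved by your own copy of FigTree. All function names and example values are hypothetical.

```python
import re


def binarize_support(tree, threshold=0.9):
    """Replace each internal-node support value with 1 (>= threshold) or 0."""
    def repl(match):
        return ")" + ("1" if float(match.group(1)) >= threshold else "0")

    # Support values follow ")"; branch lengths follow ":" and are untouched.
    return re.sub(r"\)(\d+(?:\.\d+)?)", repl, tree)


def colour_tag(taxon, colour_by_taxon):
    """Append a colour annotation to a taxon label if a colour is defined."""
    colour = colour_by_taxon.get(taxon)
    return f"{taxon}[&!color={colour}]" if colour else taxon


# Hypothetical tree and colour scheme, invented for illustration only.
tree = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.1)0.42:0.2);"
colours = {"A": "#ff0000", "C": "#0000ff"}

print(binarize_support(tree))
print([colour_tag(name, colours) for name in "ABCD"])
```

The regular expression would misfire on taxon names that themselves contain a closing parenthesis followed by digits, so a real script should work on a parsed tree and not on the raw string.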
Looking through the tree produced in my last post, I noticed that several interesting sequences were missing. There are also fewer sequences in the tree than I get if I search for “Nostoc rbcX” in Entrez. It turns out that this is because BLAST+ limits the number of results returned to 500 by default. In retrospect, the fact that I ended up with exactly 500 sequences should have been a red flag. Fortunately, BLAST+ includes an option to exclude sequences from the results by GI number.
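A second search that skips the sequences already retrieved might be assembled as in the sketch below. It is written in Python and only builds the command line without running it. It assumes a nucleotide search with `blastn` against a local database, and it uses the BLAST+ options `-negative_gilist` (a file of GI numbers to exclude) and `-max_target_seqs` (to raise the default cap of 500). The function name, file names and GI numbers are hypothetical, and the exact command used for this post may have differed.

```python
import tempfile
from pathlib import Path


def build_blast_command(query, db, exclude_gis, workdir, max_hits=5000):
    """Write the GI exclusion list and return a blastn argument list."""
    gi_file = Path(workdir) / "exclude.gi"
    # BLAST+ expects one GI number per line.
    gi_file.write_text("".join(f"{gi}\n" for gi in exclude_gis))
    return [
        "blastn",
        "-query", str(query),
        "-db", str(db),
        "-negative_gilist", str(gi_file),    # skip sequences already retrieved
        "-max_target_seqs", str(max_hits),   # raise the default limit of 500
        "-outfmt", "6",                      # tabular output
        "-out", str(Path(workdir) / "hits.tsv"),
    ]


# Hypothetical inputs, invented for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    cmd = build_blast_command("reference.fasta", "nt", [111, 222, 333], tmp)
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` once BLAST+ and a suitable database are installed.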
Having obtained 496 Nostoc rbcX sequences (plus one outgroup) and used them to infer a reasonable phylogeny, all that is left is to assign host association information to the branches. This will require (a) parsing the sequence files to obtain host info for each sequence, (b) associating each non-redundant sequence in the tree with the host info for all identical sequences, and (c) displaying all of this information on the tree.
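Step (b) amounts to grouping records by sequence and pooling their host annotations. The sketch below shows one way to do this in Python. It assumes that "identical" means exact string identity after case normalisation, and that the first record seen for each sequence is the representative placed in the tree. Both are simplifying assumptions, and the function name and example records are hypothetical.

```python
from collections import defaultdict


def pool_hosts(records):
    """Map each representative ID to the sorted hosts of all identical sequences.

    records: iterable of (seq_id, sequence, host) tuples; host may be empty.
    """
    groups = defaultdict(list)
    for seq_id, sequence, host in records:
        groups[sequence.upper()].append((seq_id, host))

    pooled = {}
    for members in groups.values():
        representative = members[0][0]  # first record seen for this sequence
        pooled[representative] = sorted({host for _, host in members if host})
    return pooled


# Hypothetical records, invented for illustration only.
records = [
    ("seq1", "ACGT", "Host A"),
    ("seq2", "acgt", "Host B"),
    ("seq3", "ACGA", ""),
    ("seq4", "ACGT", "Host A"),
]

print(pool_hosts(records))
```

Sequences of different lengths that are identical over their overlap would not be merged by this approach, so a real pipeline needs a clustering step that handles partial sequences.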
Perhaps not surprisingly given my background, I will be starting with Nostoc photobionts. In my opinion, the most useful marker for this group is rbcX, so I will be starting there.
I have decided to use BLAST to obtain all sequences that are homologous to a reference sequence. This will allow me to catch sequences that have been mislabeled and/or sequences from organisms that have been misidentified. What better reference sequence to use than one from the original paper that used this marker? (All Perl scripts used below and data files produced are included in the PhotobiontDiversity repository).
As with most organisms, DNA sequencing has revolutionised our understanding of the genetic diversity and phylogenetic relationships of lichen photobionts over the last two decades. However, unlike most organisms, these insights have rarely been translated into formal taxonomic changes, and thus no comprehensive system exists to organize the diversity that has been uncovered. Studies focus on different taxonomic scales, sequence different markers and use different analyses. Even when the methods are consistent, comparisons are rarely made to all related sequences in the database. Studies that do attempt a comprehensive analysis, such as this one, are hopelessly out of date by the time they are published.
The goal of this blog is to provide a real-time snapshot of the current state of knowledge of genetic diversity in lichen photobionts and related organisms. Over the coming weeks, I will be populating a repository with as many photobiont DNA sequences as possible, along with associated metadata. I will post phylogenies derived from the sequences here and provide commentary as appropriate. I will be updating the phylogenies as new data become available and highlighting interesting findings from the associated studies.