Friday, 19 August 2011
Hot or not
My August analysis of nitrile hydratase sequences also revealed that 95% of nitrile hydratase alpha chain sequences come from mesophilic organisms. The remainder split pretty much equally between thermophiles and psychrophiles. There was nothing that could be termed a hyperthermophile.
Opisthokonts, choanoflagellata and nitrile hydratases
There is an ongoing project called "The Origins of Multicellularity" involving Gertraud Burger (University of Montreal, Canada), Michael W. Gray (Dalhousie University, Canada), Peter Holland (University of Oxford, UK), Nicole King (University of California Berkeley, USA), B. Franz Lang (University of Montreal, Canada), Andrew Roger (Dalhousie University, Canada) and Iñaki Ruiz-Trillo (University of Barcelona, Spain), which seeks to investigate commonalities and differences underlying multicellularity in animals and fungi (to use their phrase). As I understand it, the idea is to take a range of simple organisms which go by the name of “opisthokonts” (which sounds quite unsavoury if said in a Dublin Irish accent), use genomic analysis to track down features of a proposed eukaryotic ancestor, and propose how evolutionary pathways have led to fungi and animals. These opisthokonts are the sort of minuscule organisms whose existence is not exactly common knowledge, and hence they come with names that make bacterial monikers seem easy to remember. Here is a tree of the organisms this project is looking at- those in blue bold type are the ones whose genomes the project is working on.
One subset of the opisthokonts is the choanoflagellata, which contains both Monosiga brevicollis and Salpingoeca rosetta, both of which have a single chain NHase. The genome data from the project is going up online as it is produced, so NCBI searching works well for making BLASTp type comparisons. I have had a good go at working through the organisms listed on this tree, running BLASTp with both the beta/alpha single chain NHase from Monosiga brevicollis and the alpha chain from CGA009, and I cannot find anything meaningful in anything apart from Salpingoeca rosetta, which I found interesting. So there is nothing NHase-like in the fungi taxid, nothing in filasterea or ichthyosporea AND nothing in the genome for Monosiga ovata (which seems to have had a bit of a chequered history with impurity issues). That last fact seems weird to me, but I even tried using BLASTp with a section of the beta-like chain of Monosiga brevicollis (just in case there was more exon/intron messing about going on, as the authors of the original Monosiga brevicollis paper identified, that I hadn't picked up on) and still nothing of consequence. (Diagram below from this paper).
It makes me wonder why just these organisms carry something like a functional NHase… there must be some evolutionary reason.
Friday, 12 August 2011
Proper NCBI numbers for August
I have been looking for a figure for the actual number of proper NHase alpha subunit sequences on record. I have done this by really combing through the NCBI protein records using a BLASTp approach for similarity, and excluding sequences that are not an appropriate length or that lack the usual active site binding motif. I have ignored environmental samples, and dropped out data from PDB files.
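The length-and-motif filter described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the count. The length window (`MIN_LEN`, `MAX_LEN`) is a made-up placeholder, and the motif pattern assumes the cobalt-type "VCTLCSC" signature plus the serine-for-threonine version usually quoted for iron-type enzymes.

```python
import re

# Assumption: the metal-binding motif is C-[T/S]-L-C-S-C, where threonine is
# taken to indicate a cobalt-type and serine an iron-type enzyme.
MOTIF = re.compile(r"C[ST]LCSC")

# Hypothetical length window for an alpha subunit; adjust to taste.
MIN_LEN, MAX_LEN = 180, 250


def classify_alpha(seq):
    """Return 'cobalt', 'iron', or None if the sequence fails the filter."""
    seq = seq.upper()
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return None  # wrong length for an alpha subunit (under our assumption)
    match = MOTIF.search(seq)
    if match is None:
        return None  # no recognisable metal-binding motif
    # Second residue of the matched motif distinguishes the two metal types.
    return "cobalt" if match.group()[1] == "T" else "iron"
```

Running `classify_alpha` over a list of BLASTp hits and tallying the results would give the sort of iron/cobalt split quoted in this post, though anything the filter rejects would still deserve a look by eye.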
As of the start of this week (8th Aug 2011) I reckon there are about 190 NHase alpha sequences. That breaks down as about 25% iron and 75% cobalt centred. There are four from eukaryotic organisms (three marine and one plant), and the rest are... not.
Aureococcus anophagefferens- another eukaryote with a nitrile hydratase alpha chain
Whilst combing the data for new versions of nitrile hydratases which might be worthy of investigation, I came across a protein in the eukaryotic alga Aureococcus anophagefferens which looks hugely like the alpha chain of a cobalt centred nitrile hydratase. I haven't spotted the beta chain in the genome yet, but it is interesting to see that it doesn't seem to share the single subunit pattern of NHase seen in Monosiga brevicollis or Salpingoeca.
This phytoplankton is behind algal blooms which give "brown tides" off the eastern seaboard of the USA. (Image of organism below from here)
Monday, 1 August 2011
Nitrile hydratases in metagenomes: when is something too different?
Looking for variants of an enzyme in an exotically sourced metagenome is an obvious way to extend the diversity of that enzyme classification. There are a number of these metagenomes now fully searchable but it is a whole order of difficulty greater to do searching in them than in genome sequences (certainly for me as someone who is trained as a chemist, and is learning bioinformatics as I go along). The resource I have been playing with recently is MetaBioME which offers search tools (like this) for investigating almost 50 metagenomes. It doesn't have data which doesn't occur in NCBI resources but it is a different way to look through the data. Maybe there is a quicker way to analyze the data from it than the following, but if you put in a FASTA sequence such as the one for the alpha subunit of Rhodopseudomonas palustris CGA009, it will send you back matching contigs for one of the metagenomes.
Even if you interrogate with a protein sequence, it gives a nucleotide sequence back, so for someone like me who likes to "eyeball" their hits, this requires a translation tool like the one at ExPASy, which will give you an answer including all three reading frames for both the 5'3' and the 3'5' directions. I must admit I then use the rather low-tech tool of text searching within my browser window for fragments of the classic active site motif to find out whether there is a classic nitrile hydratase present. That is quite a long sequence of things to do, so I don't think I'll be doing this by hand with, for instance, the 213 hits from the Global Ocean Sampling Expedition metagenome it gave on default settings. However, there are metagenomes which sound interesting in the list- I thought "termite gut" sounded worth investigating, and after tweaking the settings (as its default search settings seem rather conservative for nitrile hydratase searching), you do get a top hit which aligns with amino acids 25-209 of CGA009 alpha with rather a nice correlation.
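For anyone who would rather not do the translate-then-text-search routine by hand, the same steps can be roughed out in Python. This is a minimal sketch rather than a replacement for the ExPASy tool. It assumes the standard genetic code, plain A/C/G/T (or U) input, and motif fragments of my own choosing (`CTLCSC` and `CSLCSC`); the function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_motif(dna, fragments=("CTLCSC", "CSLCSC")):
    """List (frame, fragment, position) for the first hit of each fragment."""
    hits = []
    for name, protein in six_frames(dna).items():
        for fragment in fragments:
            position = protein.find(fragment)
            if position != -1:
                hits.append((name, fragment, position))
    return hits
```

Feeding each returned contig through `find_motif` would flag which ones are worth eyeballing, which makes a list of a couple of hundred hits rather less daunting. Note that an exact-match search like this misses variant motifs unless you add them to `fragments`.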
The usual success indicators of asterisks, colons and full stops are all over the alignment... but look at the metal binding motif (third row, about a third along)- it all looks the same as a cobalt bound NHase, but instead of "VCTLCSC" it has "VCTQCSC". Glutamine in place of leucine- can that be right? Leucine isn't involved in binding, but that is quite a change. I wonder what the chances are that this is weird but right, or whether it is just a wrong bit of nucleotide identification/translation. After all, the triplets CUA and CUG are very similar to CAA and CAG.
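The codon point can be checked with a short bit of Python. The sketch below takes the six leucine codons and two glutamine codons of the standard genetic code and lists every pair that differs at exactly one base. The function name is my own, and it says nothing about whether the termite gut hit is a real variant or a sequencing slip, only how small the change at the nucleotide level would need to be.

```python
# Standard genetic code codons for leucine and glutamine (RNA alphabet).
LEU_CODONS = ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"]
GLN_CODONS = ["CAA", "CAG"]


def single_base_neighbours(codons_a, codons_b):
    """Return (codon_a, codon_b, position) for pairs differing at one base.

    Position is 1-based within the codon.
    """
    pairs = []
    for a in codons_a:
        for b in codons_b:
            diffs = [i for i in range(3) if a[i] != b[i]]
            if len(diffs) == 1:
                pairs.append((a, b, diffs[0] + 1))
    return pairs


print(single_base_neighbours(LEU_CODONS, GLN_CODONS))
```

Only two pairs come out, CUA/CAA and CUG/CAG, and both differ at the middle base, so a single U-to-A miscall (or genuine mutation) at that position is enough to turn leucine into glutamine.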