Monday 1 August 2011

Nitrile hydratases in metagenomes: when is something too different?

Looking for variants of an enzyme in an exotically sourced metagenome is an obvious way to extend the known diversity of that enzyme class. A number of these metagenomes are now fully searchable, but searching them is an order of magnitude harder than searching genome sequences (certainly for me, as someone who trained as a chemist and is learning bioinformatics as I go along). The resource I have been playing with recently is MetaBioME, which offers search tools (like this) for investigating almost 50 metagenomes. It doesn't hold any data that isn't already in the NCBI resources, but it is a different way to look through that data. Maybe there is a quicker way to analyze its output than the following, but if you put in a FASTA sequence such as the one for the nitrile hydratase alpha subunit of Rhodopseudomonas palustris CGA009, it will send you back matching contigs for one of the metagenomes.
Even if you interrogate with a protein sequence, it gives a nucleotide sequence back, so for someone like me who likes to "eyeball" their hits, this requires a translation tool like the one at ExPASy, which returns all three reading frames for both the 5'3' strand and the reverse (3'5') strand. I must admit I then use the rather low-tech tool of text searching within my browser window for fragments of the classic active site motif to see whether a classic nitrile hydratase is present. That is quite a long sequence of things to do, so I don't think I'll be doing this by hand with, for instance, the 213 hits it gave from the Global Ocean Sampling Expedition metagenome on default settings. However, there are metagenomes in the list which sound interesting. I thought "termite gut" sounded worth investigating, and after tweaking the settings (as the default search settings seem rather conservative for nitrile hydratase searching), you do get a top hit which aligns with amino acids 25-209 of CGA009 alpha with rather a nice correlation.
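For anyone who would rather not do the translate-then-text-search routine by hand, here is a minimal Python sketch of the same idea. It is not the ExPASy tool or anything MetaBioME provides: it assumes the standard genetic code, and the function names (`six_frames`, `find_motifs`) and the little test contig are my own inventions for illustration. The two motifs are the ones discussed in this post.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# Assumption: the contig uses the standard code (no alternative tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate a contig in all three frames on both strands."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_motifs(dna, motifs=("VCTLCSC", "VCTQCSC")):
    """List (frame, motif, protein position) for every motif occurrence."""
    hits = []
    for frame, protein in six_frames(dna).items():
        for motif in motifs:
            pos = protein.find(motif)
            while pos != -1:
                hits.append((frame, motif, pos))
                pos = protein.find(motif, pos + 1)
    return hits


# Hypothetical contig: one stray base followed by codons spelling VCTLCSC.
example_contig = "A" + "GTGTGCACCCTGTGCTCGTGC"
print(find_motifs(example_contig))
```

Looping something like this over all the contigs returned for a metagenome would at least flag which hits are worth eyeballing properly.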
The usual success indicators of asterisks, colons and full stops are all over the alignment... but look at the metal binding motif (third row, about a third along). It all looks the same as a cobalt-bound NHase, but instead of "VCTLCSC" it has "VCTQCSC". Glutamine in place of leucine: can that be right? The leucine isn't involved in metal binding, but that is quite a change. I wonder what the chances are that this is weird but right, or whether it is just a wrong bit of nucleotide identification/translation. After all, the leucine triplets CUA and CUG each differ by only a single base from the glutamine triplets CAA and CAG.
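To put the "very similar triplets" point on a firmer footing, this small sketch lists every leucine codon that is a single base change away from a glutamine codon. The codon lists come from the standard genetic code; the function name `single_base_neighbours` is just something I made up for the example. It says nothing about whether the base call in this particular contig is actually wrong, only which miscalls could produce the swap.

```python
# Codons from the standard genetic code (RNA alphabet).
LEU_CODONS = ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"]
GLN_CODONS = ["CAA", "CAG"]


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_base_neighbours(codons_a, codons_b):
    """Pairs of codons, one from each list, differing at exactly one base."""
    return [(a, b) for a in codons_a for b in codons_b if hamming(a, b) == 1]


for leu, gln in single_base_neighbours(LEU_CODONS, GLN_CODONS):
    print(f"{leu} (Leu) -> {gln} (Gln)")
```

Only two of the six leucine codons are one step from glutamine, and in both cases it is the middle base (U read as A) that would have to change.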