Find the distribution of amino acids in the given proteome. The one-letter codes for the 20 standard amino acids are ACEDGFIHKMLNQPSRTWVY. X sometimes denotes an unknown amino acid; you can ignore X and any other non-standard letters. Compare this distribution to that of the human proteome (at http://www.d.umn.edu/~mhampton/HomoProts.fa) and to the BLOSUM62 distribution, which is given on page 908 of the paper by Yu and Altschul (PubMed ID 15509610).
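A minimal sketch of the counting and comparison steps follows, in plain Python so it runs without extra packages. The helper names (read_fasta_sequences, aa_distribution, log2_ratios) are hypothetical, and Biopython's SeqIO.parse(handle, "fasta") could replace the hand-written FASTA reader. Residues outside the 20 standard codes are skipped, as the assignment allows. The log2 ratio of observed to reference frequency is only one possible way to compare two distributions, chosen here for illustration.

```python
import math
from collections import Counter

# The 20 standard one-letter codes; anything else (X, B, Z, U, ...) is ignored.
STANDARD = set("ACEDGFIHKMLNQPSRTWVY")


def read_fasta_sequences(lines):
    """Minimal FASTA reader: yield one sequence string per '>' record."""
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if parts:
                yield "".join(parts)
            parts = []
        elif line:
            parts.append(line)
    if parts:
        yield "".join(parts)


def aa_distribution(sequences):
    """Return {amino acid: relative frequency} over standard residues only."""
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in STANDARD)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in sorted(STANDARD)}
    return {aa: counts[aa] / total for aa in sorted(STANDARD)}


def log2_ratios(observed, reference):
    """log2(observed / reference) per amino acid; > 0 means enriched in observed."""
    return {
        aa: math.log2(observed[aa] / reference[aa])
        for aa in observed
        if observed[aa] > 0 and reference.get(aa, 0) > 0
    }
```

For example, with a hypothetical local file name, open("proteome.fa") and pass the file handle to read_fasta_sequences, then pass the result to aa_distribution; do the same for HomoProts.fa, and compare the two dictionaries with log2_ratios. The BLOSUM62 frequencies would have to be typed into a dictionary by hand from the paper.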
Does the high A+T content of the genome affect the amino acid distribution in proteins?
What are some ways you could investigate that relationship further?
When thinking about the relationship between the genome's A+T content and the amino acid composition, it may help to look at the map from codons to amino acids and its inverse. Biopython provides the forward map as part of its codon tables:
from Bio.Data import CodonTable
t = CodonTable.standard_dna_table
Now t.forward_table is a dictionary mapping codons (the keys) to amino acids (the values). Stop codons are not among its keys; they are listed separately in t.stop_codons. It may be useful to have the reverse map as well; one way is to build the following dictionary, whose keys are the amino acid letters and whose values are lists of codons:
inv_dict = {}
for codon, amino in t.forward_table.items():
    # dict.has_key() was removed in Python 3; test membership with "in"
    if amino in inv_dict:
        inv_dict[amino].append(codon)
    else:
        inv_dict[amino] = [codon]
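One way to make the codon map speak to the A+T question is to compute, for each amino acid, the average A+T fraction of the codons that encode it. The sketch below assumes every synonymous codon is equally likely; real codon usage is biased, so this is only a first approximation. So that it runs without Biopython, it rebuilds the standard code from the NCBI translation table 1 string. STANDARD_FORWARD, invert_table and at_fraction_by_amino are hypothetical names; invert_table performs the same inversion as inv_dict above, and passing t.forward_table instead of STANDARD_FORWARD should give the same result.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
# This stands in for t.forward_table so the sketch runs without Biopython.
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Drop stop codons ('*'), matching forward_table, which omits them.
STANDARD_FORWARD = {codon: aa for codon, aa in zip(CODONS, AMINOS) if aa != "*"}


def invert_table(forward_table):
    """Map each amino acid to the list of codons that encode it."""
    inv = {}
    for codon, amino in forward_table.items():
        inv.setdefault(amino, []).append(codon)
    return inv


def at_fraction_by_amino(forward_table):
    """Mean A+T fraction of each amino acid's codons, weighting codons equally."""
    result = {}
    for amino, codons in invert_table(forward_table).items():
        at_bases = sum(codon.count("A") + codon.count("T") for codon in codons)
        result[amino] = at_bases / (3 * len(codons))
    return result


if __name__ == "__main__":
    at = at_fraction_by_amino(STANDARD_FORWARD)
    # Print amino acids from most to least A+T-rich codons.
    for amino in sorted(at, key=at.get, reverse=True):
        print(amino, round(at[amino], 3))
```

Amino acids such as I, F, K, N and Y are encoded only by A+T-rich codons, while G, A and P use G+C-rich ones, so this ranking can be set against the observed amino acid frequencies of the proteome.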