Abstract
Random mutations and natural selection have driven the evolution of the protein
amino acid sequences that we observe in nature today. Which of the two is the
dominant force in protein evolution still lacks an unambiguous answer. Random
mutations tend to randomize protein sequences, whereas selection mechanisms are
expected to impose rigid constraints on amino acid sequences in order to
preserve correct functionality. Moreover, the space of all possible amino acid
sequences is so astonishingly large that a well-tuned amino acid sequence could
reasonably be indistinguishable from a random one.
To study whether random and natural amino acid sequences can be discriminated,
we introduce different measures of association between pairs of amino acids in
a sequence and apply them to a dataset of 1,047 natural protein sequences and
10,470 random sequences, carefully generated to preserve the relative length
and amino acid distribution of the natural proteins. We analyze the
multidimensional measures with machine learning techniques and show that, to a
reasonable extent, natural protein sequences can be differentiated from random
ones.
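
The abstract does not specify how the random sequences were generated or which association measures were used. The following is only a minimal sketch under stated assumptions: random sequences are obtained by shuffling a natural sequence (which keeps its length and amino acid counts exactly), and association is illustrated with an observed/expected ratio for ordered residue pairs at a fixed gap. The function names, the `gap` parameter, and the example sequence are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def shuffled_copy(seq, rng):
    """Return a random permutation of seq (same length, same residue counts).

    Assumption: shuffling is one simple way to preserve length and
    amino acid distribution; the paper may use a different procedure.
    """
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def pair_counts(seq, gap=1):
    """Count ordered pairs (a, b) where b occurs `gap` positions after a."""
    return Counter(zip(seq, seq[gap:]))


def pair_association(seq, gap=1):
    """Observed/expected frequency ratio for each ordered pair at `gap`.

    The expected frequency assumes independent positions drawn from the
    sequence's own residue frequencies (an illustrative measure only).
    """
    n_pairs = len(seq) - gap
    if n_pairs <= 0:
        return {}
    freq = Counter(seq)
    total = len(seq)
    return {
        (a, b): (count / n_pairs) / ((freq[a] / total) * (freq[b] / total))
        for (a, b), count in pair_counts(seq, gap).items()
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
natural = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical example sequence
randoms = [shuffled_copy(natural, rng) for _ in range(10)]
natural_features = pair_association(natural, gap=1)
```

Feature vectors built this way for natural and shuffled sequences could then be passed to any standard classifier. Ten shuffles per natural sequence is chosen here only because it matches the 1,047 to 10,470 ratio in the dataset.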
Year
2015
IAC Authors
Publication type
Other Authors
D. Santoni, G. Felici, D. Vergni
Publisher
Academic Press
Journal
Journal of Theoretical Biology