Max-Planck-Institut für Physik komplexer
Systeme
International Workshop on
Biological Evolution and Statistical
Physics
May 10-14, 2000
Similarity Detection by Sequence Alignment
Terence Hwa
Physics Department, University
of California at San Diego
9500 Gilman Drive
La Jolla, CA 92093-0319
hwa@matisse.ucsd.edu
Sequence alignment algorithms are ubiquitously used by biologists
to identify unknown proteins and reconstruct evolutionary relationships
among homologous proteins. Previous studies of alignment (e.g., the
theory of Karlin and Altschul) focused on the statistics of the null model.
In this talk, I will describe the properties of the alignment of mutually
correlated sequences. Sequence correlations are generated by a toy mutation
model mimicking the divergence of a daughter sequence from an ancestor
sequence. Generally, the optimal alignment of the daugher and ancestor
sequences can recover only a fraction of the
correlated sequence pairs. The dependence of this detected fraction
(also known as the ``alignment fidelity'') for different alignment scoring
parameters and sequence correlations can be characterized theoretically
by exploiting the mapping of the alignment problem to the well-studied
statistical physics problem of a ``directed polymer'' in random potential.
It is found that the alignment fidelity degrades smoothly for weak sequence
correlations or poor parameter choices. High-quality alignments demand
appropriate choices of scoring parameters for each given type of sequence
correlation. The same phenomenology is encountered in the retrieval of
encoded RNA secondary structures, as will be discussed in a later talk
by Ralf Bundschuh.
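
As an illustration only, the notion of alignment fidelity can be sketched numerically. The abstract does not specify the mutation model or the alignment algorithm, so everything below is an assumption: the function names (mutate, align, fidelity), the mutation probabilities (p_sub, p_del, p_ins), the four-letter alphabet, and the use of a standard global alignment with linear gap cost are hypothetical choices, not the model used in the talk. The daughter is generated from the ancestor while recording which element pairs were inherited unchanged, and the fidelity is taken to be the fraction of those pairs that the optimal alignment matches.

```python
import random


def mutate(ancestor, p_sub, p_del, p_ins, alphabet="ACGT", rng=random):
    """Toy mutation model (an assumption, not the model of the talk).

    Returns (daughter, pairs), where pairs lists (i, j) such that
    daughter[j] was inherited unchanged from ancestor[i].
    """
    daughter, pairs = [], []
    for i, letter in enumerate(ancestor):
        # Insert random letters before this ancestor element.
        while rng.random() < p_ins:
            daughter.append(rng.choice(alphabet))
        # Delete the ancestor element.
        if rng.random() < p_del:
            continue
        if rng.random() < p_sub:
            # Replace by a random letter; the correlation is treated as lost.
            daughter.append(rng.choice(alphabet))
        else:
            pairs.append((i, len(daughter)))
            daughter.append(letter)
    return "".join(daughter), pairs


def align(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment by dynamic programming with a linear gap cost.

    Returns (score, aligned), where aligned is the set of index pairs
    (i, j) placed on top of each other in one optimal alignment.
    """
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back one optimal path, preferring the diagonal step on ties.
    aligned = set()
    i, j = n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + s:
            aligned.add((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return S[n][m], aligned


def fidelity(true_pairs, aligned_pairs):
    """Fraction of the inherited pairs that the alignment recovers."""
    if not true_pairs:
        return 0.0
    hits = sum(1 for pair in true_pairs if pair in aligned_pairs)
    return hits / len(true_pairs)
```

Under these assumptions, one could generate a random ancestor, call mutate with several mutation rates, and compare fidelity(pairs, align(ancestor, daughter, ...)[1]) across different match, mismatch, and gap scores to see how the recovered fraction depends on the scoring parameters.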