Biological Evolution and Statistical Physics

Max-Planck-Institut für Physik komplexer Systeme

International Workshop on
Biological Evolution and Statistical Physics
May 10-14, 2000

Similarity Detection by Sequence Alignment
       Terence Hwa
        Physics Department, University of California at San Diego
        9500 Gilman Drive
        La Jolla, CA 92093-0319
        hwa@matisse.ucsd.edu

Sequence alignment algorithms are used ubiquitously by biologists to identify unknown proteins and to reconstruct evolutionary relationships among homologous proteins. Previous studies of alignment (e.g., the theory of Karlin and Altschul) focused on the statistics of the null model, i.e., the alignment of unrelated sequences. In this talk, I will describe the properties of the alignment of mutually correlated sequences. Sequence correlations are generated by a toy mutation model that mimics the divergence of a daughter sequence from an ancestor sequence. Generally, the optimal alignment of the daughter and ancestor sequences recovers only a fraction of the correlated sequence pairs. The dependence of this detected fraction (also known as the "alignment fidelity") on the alignment scoring parameters and on the sequence correlations can be characterized theoretically by exploiting the mapping of the alignment problem to the well-studied statistical physics problem of a "directed polymer" in a random potential. The alignment fidelity is found to degrade smoothly for weak sequence correlations or poor parameter choices. High-quality alignments require scoring parameters chosen appropriately for each given type of sequence correlation. The same phenomenology is encountered in the retrieval of encoded RNA secondary structures, which will be discussed in a later talk by Ralf Bundschuh.
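To make the quantities above concrete, here is a minimal Python sketch. It is not the model or scoring scheme used in the talk, and every name and parameter in it is a hypothetical choice for illustration: a four-letter alphabet, a toy mutation model with per-site substitution probability `p_sub` and insertion/deletion probability `p_indel`, global (Needleman-Wunsch-style) alignment scoring +1 per match, -`mu` per mismatch and -`delta` per gap, and fidelity measured as the fraction of true ancestor/daughter pairs that the optimal alignment reproduces. The score table `S[i][j]` illustrates the directed-polymer picture at zero temperature: each alignment is a directed path on the (i, j) lattice, and the recursion selects the highest-scoring path.

```python
import random

# Assumed 4-letter alphabet (hypothetical choice for illustration).
ALPHABET = "ACGU"


def mutate(ancestor, p_sub, p_indel, rng):
    """Hypothetical toy mutation model.

    Each ancestor letter is deleted with probability p_indel; otherwise it is
    copied, after being replaced by a uniformly random letter (possibly the
    same one) with probability p_sub. After each copied letter, a random
    letter is inserted with probability p_indel. Returns the daughter sequence
    and the list of true (ancestor_index, daughter_index) pairs.
    """
    daughter, true_pairs = [], []
    for i, a in enumerate(ancestor):
        if rng.random() < p_indel:
            continue  # deletion: ancestor letter i has no partner
        if rng.random() < p_sub:
            a = rng.choice(ALPHABET)  # substitution
        true_pairs.append((i, len(daughter)))
        daughter.append(a)
        if rng.random() < p_indel:
            daughter.append(rng.choice(ALPHABET))  # insertion, unpaired
    return "".join(daughter), true_pairs


def align(x, y, mu, delta):
    """Global alignment: +1 per match, -mu per mismatch, -delta per gap.

    S[i][j] is the best score of a directed lattice path from (0, 0) to
    (i, j), i.e. a zero-temperature directed polymer. Returns the optimal
    score and the set of aligned (i, j) pairs along one optimal path.
    """
    n, m = len(x), len(y)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -delta * i
    for j in range(1, m + 1):
        S[0][j] = -delta * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 1.0 if x[i - 1] == y[j - 1] else -mu
            S[i][j] = max(S[i - 1][j - 1] + s,
                          S[i - 1][j] - delta,
                          S[i][j - 1] - delta)
    # Traceback from (n, m), preferring diagonal steps on ties.
    pairs, i, j = set(), n, m
    while i > 0 and j > 0:
        s = 1.0 if x[i - 1] == y[j - 1] else -mu
        if S[i][j] == S[i - 1][j - 1] + s:
            pairs.add((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] - delta:
            i -= 1
        else:
            j -= 1
    return S[n][m], pairs


def fidelity(true_pairs, aligned_pairs):
    """Fraction of true ancestor/daughter pairs reproduced by the alignment."""
    if not true_pairs:
        return 1.0  # nothing to recover
    return sum(p in aligned_pairs for p in true_pairs) / len(true_pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    ancestor = "".join(rng.choice(ALPHABET) for _ in range(200))
    daughter, true_pairs = mutate(ancestor, p_sub=0.3, p_indel=0.05, rng=rng)
    for mu, delta in [(0.5, 1.0), (1.0, 2.0), (3.0, 4.0)]:
        _, pairs = align(ancestor, daughter, mu, delta)
        print(f"mu={mu}, delta={delta}: fidelity={fidelity(true_pairs, pairs):.3f}")
```

Scanning `mu` and `delta` at fixed `p_sub` and `p_indel` gives a crude numerical view of how the fidelity depends on the scoring parameters; the sequence length, seed, and parameter values here are arbitrary.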


Max-Planck-Institut für Physik komplexer Systeme
Nöthnitzer Str. 38, D-01187 Dresden, Germany
Tel.: +49-351-871-2105 Fax: +49-351-871-2199
evolutio@mpipks-dresden.mpg.de
http://www.mpipks-dresden.mpg.de/~evolutio