Max-Planck-Institut für Physik komplexer Systeme

International Workshop on 
Biological Evolution and Statistical Physics
May 10-14, 2000 

 

Similarity Detection by Sequence Alignment
       Terence Hwa 
        Physics Department, University of California at San Diego 
        9500 Gilman Drive 
        La Jolla, CA 92093-0319 
        hwa@matisse.ucsd.edu

Sequence alignment algorithms are ubiquitously used by biologists to identify unknown proteins and to reconstruct evolutionary relationships among homologous proteins. Previous studies of alignment (e.g., the theory of Karlin and Altschul) focused on the statistics of the null model. In this talk, I will describe the properties of the alignment of mutually correlated sequences. Sequence correlations are generated by a toy mutation model mimicking the divergence of a daughter sequence from an ancestor sequence. Generally, the optimal alignment of the daughter and ancestor sequences can recover only a fraction of the correlated sequence pairs. The dependence of this detected fraction (also known as the "alignment fidelity") on the alignment scoring parameters and sequence correlations can be characterized theoretically by exploiting the mapping of the alignment problem to the well-studied statistical physics problem of a "directed polymer" in a random potential. It is found that the alignment fidelity degrades smoothly for weak sequence correlations or poor parameter choices. High-quality alignments demand appropriate choices of scoring parameters for each given type of sequence correlation. The same phenomenology is encountered in the retrieval of encoded RNA secondary structures, as will be discussed in a later talk by Ralf Bundschuh.
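
As a rough illustration of what "alignment fidelity" measures, the following Python sketch generates a daughter sequence from an ancestor, aligns the two, and reports the fraction of inherited position pairs that the optimal alignment recovers. Everything specific here is an assumption made for the example and is not taken from the talk: the mutation rates (p_sub, p_del, p_ins), the four-letter alphabet, the choice to count every inherited position (substituted or not) as a correlated pair, and the use of a plain Needleman-Wunsch global alignment with a linear gap cost.

```python
import random


def mutate(ancestor, p_sub, p_del, p_ins, alphabet="ACGT", rng=random):
    """Hypothetical toy mutation model.

    Returns the daughter sequence and the set of (i, j) pairs linking
    ancestor position i to the daughter position j inherited from it.
    """
    daughter = []
    pairs = set()
    for i, letter in enumerate(ancestor):
        # Random insertions before the current ancestor letter.
        while rng.random() < p_ins:
            daughter.append(rng.choice(alphabet))
        # Deletion: the ancestor letter leaves no descendant.
        if rng.random() < p_del:
            continue
        # Point substitution: the descendant gets a random letter.
        if rng.random() < p_sub:
            letter = rng.choice(alphabet)
        daughter.append(letter)
        pairs.add((i, len(daughter) - 1))
    return "".join(daughter), pairs


def align(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment (Needleman-Wunsch, linear gap cost).

    Returns the set of (i, j) position pairs matched by one optimal alignment.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the corner, preferring the diagonal move on ties.
    aligned = set()
    i, j = n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            aligned.add((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return aligned


def fidelity(true_pairs, aligned_pairs):
    """Fraction of the true inherited pairs recovered by the alignment."""
    if not true_pairs:
        return 0.0
    return len(true_pairs & aligned_pairs) / len(true_pairs)


if __name__ == "__main__":
    rng = random.Random(1)
    ancestor = "".join(rng.choice("ACGT") for _ in range(200))
    daughter, true_pairs = mutate(ancestor, 0.3, 0.05, 0.05, rng=rng)
    # Scan the gap cost to see how the scoring choice affects the fidelity.
    for gap in (-0.5, -1.0, -2.0, -4.0):
        f = fidelity(true_pairs, align(ancestor, daughter, gap=gap))
        print(f"gap={gap:5.1f}  fidelity={f:.2f}")
```

Scanning the gap cost for a fixed set of mutation rates, as in the last few lines, is one simple way to observe that the detected fraction depends on the scoring parameters; the numbers from such a small experiment are illustrative only and say nothing about the theoretical results of the talk.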
       

 Max-Planck-Institut für Physik komplexer Systeme
 Nöthnitzer Str. 38, D-01187 Dresden, Germany
 Tel.: +49-351-871-2105 Fax: +49-351-871-2199
 evolutio@mpipks-dresden.mpg.de
 http://www.mpipks-dresden.mpg.de/~evolutio