Medical Science Monitor Basic Research


eISSN: 1643-3750

A tabular approach to the sequence-to-structure relation in proteins (tetrapeptide representation) for de novo protein design.

Jan Meus, Michał Brylinski, Monika Piwowar, Zdzisław Wiśniowski, Justyna Stefaniak, Leszek Konieczny, Grzegorz Surówka, Irena Roterman

Med Sci Monit 2006; 12(6): BR208-214

ID: 451246

Available online: 2006-06-01

Published: 2006-06-01

BACKGROUND: Experimental observations classify the protein-folding process as a multi-step event. The backbone conformation has been experimentally recognized as responsible for the early-stage structural forms of a polypeptide. The sequence-to-structure and structure-to-sequence relations are critical for predicting protein structure. A contingency table representing these relations for tetrapeptides in their early-stage structural forms is presented. This correlation appears to be essential in protein-folding simulation.

MATERIAL/METHODS: The polypeptide chains of all the proteins in the Protein Data Bank were transformed into their early-stage structural forms. The tetrapeptide was selected as the structural unit. Tetrapeptide sequences and structures were expressed by letter codes. A contingency table of any size (here: 160,000 x 2401) was transformed into a 2x2 table for each non-zero cell of the original table, which allowed calculation of the rho-coefficient measuring the strength of the relation.

RESULTS: High values of the rho-coefficient identified sequences of strong structural determinability and structures of high sequence selectivity. A web-based program that calculates the rho-coefficient ranking list was constructed so that the method can be applied to any problem of contingency table analysis.

CONCLUSIONS: The results revealed sequence-to-structure (and vice versa) correlation in early-stage folding. Surprisingly, the irregular structural forms of loops and bends appeared to be highly determined. Comparison of these results with another method based on information entropy showed close agreement. The method, which is oriented toward interpreting large contingency tables, appears particularly useful for large-scale microarray analysis, a very popular technique in the post-genomic era.
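
The per-cell reduction of a large contingency table to a 2x2 table can be sketched as follows. The abstract does not give the formula for the rho-coefficient, so this sketch substitutes the standard phi coefficient of a 2x2 table as a stand-in; it is an assumption, not the authors' definition. The function names (`collapse_to_2x2`, `phi_coefficient`, `rank_cells`) and the toy table are likewise hypothetical and serve only as illustration.

```python
import math


def collapse_to_2x2(table, i, j):
    """Collapse an R x C count table around cell (i, j).

    Returns (a, b, c, d), where
      a = count in cell (i, j)
      b = rest of row i
      c = rest of column j
      d = everything else
    """
    total = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    a = table[i][j]
    b = row_total - a
    c = col_total - a
    d = total - a - b - c
    return a, b, c, d


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2x2 table (stand-in for the rho-coefficient)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # degenerate margins: no measurable association
    return (a * d - b * c) / denom


def rank_cells(table):
    """Score every non-zero cell and return (score, row, col), best first."""
    ranked = []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                score = phi_coefficient(*collapse_to_2x2(table, i, j))
                ranked.append((score, i, j))
    ranked.sort(reverse=True)
    return ranked


# Toy table: rows stand for sequences, columns for structure codes.
toy = [
    [8, 2],
    [1, 9],
]
ranking = rank_cells(toy)
```

In a table of the size quoted above, most cells would be empty, so a sparse representation (for example a dictionary keyed by row and column index with precomputed marginal totals) would be a more practical choice than the nested lists used here.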

Keywords: Databases, Protein, Amino Acid Sequence, Molecular Sequence Data, Peptides - chemistry, Protein Conformation, Protein Folding, Proteins - chemistry