| Ulla: a program for calculating environment-specific amino acid substitution tables. | |
| | |
MedLine Citation:
|
PMID: 19417059 Owner: NLM Status: MEDLINE |
Abstract/OtherAbstract:
|
SUMMARY: Amino acid residues are under various kinds of local environmental restraints, which influence substitution patterns. Ulla,(1) a program for calculating environment-specific substitution tables, reads protein sequence alignments and local environment annotations. The program produces a substitution table for every possible combination of environment features. Sparse data is handled using an entropy-based smoothing procedure to estimate robust substitution probabilities. AVAILABILITY: The Ruby source code is available under a Creative Commons Attribution-Noncommercial License along with additional documentation from http://www-cryst.bioc.cam.ac.uk/ulla. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
| | |
Authors:
|
Semin Lee; Tom L Blundell |
Publication Detail:
|
Type: Journal Article; Research Support, Non-U.S. Gov't Date: 2009-05-05 |
Journal Detail:
|
Title: Bioinformatics (Oxford, England) Volume: 25 ISSN: 1367-4811 ISO Abbreviation: Bioinformatics Publication Date: 2009 Aug |
Date Detail:
|
Created Date: 2009-07-20 Completed Date: 2010-01-11 Revised Date: - |
Medline Journal Info:
|
Nlm Unique ID: 9808944 Medline TA: Bioinformatics Country: England |
Other Details:
|
Languages: eng Pagination: 1976-7 Citation Subset: IM |
Affiliation:
|
Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, Cambridge CB2 1GA, UK. semin@cryst.bioc.cam.ac.uk |
| MeSH Terms | |
Descriptor/Qualifier:
|
Amino Acid Substitution* Computational Biology / methods* Databases, Protein Proteins / chemistry* Sequence Alignment Sequence Homology, Amino Acid Software* |
| Grant Support | |
ID/Acronym/Agency:
|
//Wellcome Trust |
| Chemical | |
Reg. No./Substance:
|
0/Proteins |
| Full Text | |
|
Journal Information Journal ID (nlm-ta): Bioinformatics Journal ID (publisher-id): bioinformatics Journal ID (hwp): bioinfo ISSN: 1367-4803 ISSN: 1460-2059 Publisher: Oxford University Press |
Article Information © 2009 The Author(s) creative-commons: Received Day: 17 Month: 3 Year: 2009 Revision Received Day: 18 Month: 4 Year: 2009 Accepted Day: 28 Month: 4 Year: 2009 Print publication date: Day: 1 Month: 8 Year: 2009 Electronic publication date: Day: 5 Month: 5 Year: 2009 pmc-release publication date: Day: 5 Month: 5 Year: 2009 Volume: 25 Issue: 15 First Page: 1976 Last Page: 1977 ID: 2712337 PubMed Id: 19417059 DOI: 10.1093/bioinformatics/btp300 Publisher Id: btp300 |
| Ulla: a program for calculating environment-specific amino acid substitution tables | |
| Semin Lee* | |
| Tom L. Blundell | |
| Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Old Addenbrooke's Site, Cambridge CB2 1GA, UK |
|
| Correspondence: * To whom correspondence should be addressed. Associate Editor: Anna Tramontano |
|
In the evolution of proteins, individual amino acid residues are under various kinds of local environmental restraints, such as secondary structure type, solvent accessibility and hydrogen bonding patterns. A previous study of amino acid substitutions as a function of local environment showed that substitution patterns differ clearly under different environmental restraints (Overington et al., 1992). These environment-specific substitution patterns have been successfully exploited to predict the stability of protein mutants (Topham et al., 1993), to identify potential interaction sites (Chelliah et al., 2004; Gong and Blundell, 2008) and to detect remote sequence-structure homology (Chelliah et al., 2005).
However, estimating amino acid substitution probabilities is not trivial, especially when only a very small number of observations is available for a specific combination of environments. To cope with this sparse data problem, Sali (1991) developed an algorithm as an extension of the method used by Sippl (1990) to derive robust potentials of mean force. Several variants of the generalized procedure, such as Makesub (Topham et al., 1993) and SUBST (Mizuguchi, unpublished results), have subsequently been implemented for smoothing substitution probabilities. Nevertheless, each lacks crucial features implemented in the other, and they use slightly different smoothing procedures, which may lead to very different amino acid substitution matrices.
To overcome these problems, we developed Ulla, a program for calculating environment-specific substitution tables (ESSTs), to unify all the major features of the previously developed programs and to provide additional functionalities. The program also generates heat maps from substitution tables to visualize the degree of conservation of amino acids under the environmental restraints.
Ulla reads multiple sequence alignments and annotations for local environments in JOY template format (Mizuguchi et al., 1998a). Users can provide their own definitions of environment features, and an environment feature can be constrained so that substitutions are counted only when the environment of the residues is conserved. Ulla also supports restricting the percent identity (PID) range of the sequence pairs to be considered, and uses a BLOSUM-like weighting scheme (Henikoff and Henikoff, 1992) to minimize sampling bias from highly similar sequences.
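The PID filter and the BLOSUM-like weighting can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Ulla's Ruby implementation: the function names, the single-linkage clustering, the clustering `threshold`, the gap-excluding PID convention and the counting of each aligned pair in both directions are all choices made here for clarity.

```python
from itertools import combinations

GAP = "-"

def percent_identity(a, b):
    """PID over aligned columns where neither sequence has a gap (one common convention)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def cluster_by_identity(seqs, threshold):
    """Single-linkage clustering: sequences with PID >= threshold share a cluster label."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if percent_identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]

def weighted_pair_counts(seqs, threshold=60.0, pid_min=0.0, pid_max=100.0):
    """Count substitutions between clusters, down-weighting large clusters.

    A pair of sequences from clusters of sizes n_i and n_j contributes
    1 / (n_i * n_j) per aligned column, so each cluster acts like one sequence.
    Pairs whose PID falls outside [pid_min, pid_max] are skipped.
    """
    labels = cluster_by_identity(seqs, threshold)
    sizes = {label: labels.count(label) for label in set(labels)}
    counts = {}
    for i, j in combinations(range(len(seqs)), 2):
        if labels[i] == labels[j]:
            continue  # pairs inside one cluster are not counted
        pid = percent_identity(seqs[i], seqs[j])
        if not (pid_min <= pid <= pid_max):
            continue
        weight = 1.0 / (sizes[labels[i]] * sizes[labels[j]])
        for x, y in zip(seqs[i], seqs[j]):
            if x == GAP or y == GAP:
                continue
            # Assumption: every aligned pair is counted in both directions.
            counts[(x, y)] = counts.get((x, y), 0.0) + weight
            counts[(y, x)] = counts.get((y, x), 0.0) + weight
    return counts

if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDE", "ACDF", "GHIK"]
    print(weighted_pair_counts(toy_alignment, threshold=90.0, pid_min=50.0))
```

In this toy alignment the two identical sequences fall into one cluster, so together they contribute as much as a single sequence would, which is the sampling-bias correction the weighting is meant to provide.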
Ulla uses entropy-based smoothing procedures to reduce the problems caused by sparse data. Smoothing is an iterative procedure that estimates a probability distribution by perturbing the previous estimate with each successive measurement (Sali, 1991; Sippl, 1990). Hence, starting from a uniform frequency distribution, the probability distribution estimated at each step serves as an approximation for the next one (see Supplementary Material for details).
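The iterative idea can be sketched in Python as follows. The exact entropy-based weights used by Ulla are given in the Supplementary Material and are not reproduced here; this sketch instead assumes a simpler count-based mixing weight in the style of Sippl (1990), and the names `smooth_step`, `smooth` and the parameter `sigma` (with an arbitrary default) are hypothetical.

```python
def smooth_step(prior, counts, sigma=0.1):
    """Perturb the prior distribution with one set of observed counts.

    Assumption: the weight given to the data is n*sigma / (1 + n*sigma),
    so it grows with the number of observations n. This count-based weight
    stands in for Ulla's entropy-based weight, which is not shown here.
    """
    n = sum(counts)
    if n == 0:
        return list(prior)  # no observations: keep the previous estimate
    w = n * sigma / (1.0 + n * sigma)
    return [(1.0 - w) * p + w * c / n for p, c in zip(prior, counts)]

def smooth(levels, n_states, sigma=0.1):
    """Run the procedure from a uniform start over successive measurements.

    `levels` holds count vectors ordered from the most pooled data set
    (many observations) to the most specific environment (few observations).
    """
    estimate = [1.0 / n_states] * n_states
    for counts in levels:
        estimate = smooth_step(estimate, counts, sigma)
    return estimate

if __name__ == "__main__":
    # Toy example with three states instead of the 20 amino acids.
    pooled = [50, 30, 20]   # counts with all environments pooled
    specific = [5, 1, 0]    # sparse counts for one specific environment
    print(smooth([pooled, specific], n_states=3))
```

Under these assumptions, the state never observed in the specific environment still receives a non-zero probability inherited from the pooled estimate, which illustrates why such smoothing gives more robust tables from sparse data.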
As an illustration, we generate ESSTs from HOMSTRAD alignments (Mizuguchi et al., 1998b) with environment feature definitions of secondary structure type and solvent accessibility (Fig. 1a):
- # name of feature (string);\\
- # values adopted in .tem (alignment) file (string);\\
- # class labels assigned for each value (string);\\
- # constrained or not (T or F);\\
- # silent (used as masks)? (T or F)
- secondary structure and phi angle;HEPC;HEPC;F;F
- solvent accessibility;TF;Aa;F;F
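To make the five semicolon-separated fields concrete, the following Python sketch parses such a definition and enumerates the environment combinations for which a table would be produced. It is an illustration, not Ulla's own parser: the names `Feature`, `parse_features` and `environment_labels` are hypothetical, and leaving silent features out of the combinations is an assumption made here.

```python
from itertools import product
from typing import NamedTuple

class Feature(NamedTuple):
    name: str          # name of feature
    values: str        # values adopted in the .tem (alignment) file
    labels: str        # class label assigned to each value
    constrained: bool  # count substitutions only when the environment is conserved
    silent: bool       # used as a mask

def parse_features(text):
    """Parse 'name;values;labels;constrained;silent' lines, skipping '#' comments."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, values, labels, constrained, silent = line.split(";")
        if len(values) != len(labels):
            raise ValueError(f"values and labels differ in length for {name!r}")
        features.append(Feature(name, values, labels, constrained == "T", silent == "T"))
    return features

def environment_labels(features):
    """Enumerate every combination of class labels, one per environment table.

    Assumption: silent features do not contribute to the combinations.
    """
    label_sets = [list(dict.fromkeys(f.labels)) for f in features if not f.silent]
    return ["".join(combo) for combo in product(*label_sets)]

DEFINITION = """\
# name;values;labels;constrained;silent
secondary structure and phi angle;HEPC;HEPC;F;F
solvent accessibility;TF;Aa;F;F
"""

if __name__ == "__main__":
    features = parse_features(DEFINITION)
    print(environment_labels(features))
```

With the two features defined above, four secondary structure classes and two accessibility classes give eight environment combinations.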
JOY (Mizuguchi et al., 1998a) is useful for annotating alignments with structural environments, but Ulla recognizes any environment feature definition that conforms to the format above. The paths of an environment definition file and of a file listing the environment-annotated alignments are given to Ulla as input:
- $ ulla -c feature.def -l alignments.lst
Ulla generates ESSTs from a sparse data set using entropy-based smoothing procedures. It allows us to conduct analyses of amino acid substitution patterns under various environmental restraints. The resultant ESSTs can be exploited in many ways such as binding site prediction, remote homology detection, and protein stability estimation.
Ulla is publicly available on the web site http://github.com/semin/ulla, where the code is maintained in a Git repository, and its pre-built RubyGems package can be obtained from http://rubyforge.org/projects/ulla.
Notes
(1) Ulla is a traditional Korean percussion instrument.
We thank Juok Cho for statistical advice; Dan Bolser and Duangrudee Tanramluk for review of the manuscript; Richard Bickerton and Bernardo Ochoa for thorough beta testing.
Funding: Mogam Science Scholarship Foundation (to S.L., partial); The Wellcome Trust (to T.L.B.)
Conflict of Interest: none declared.
REFERENCES
| Chelliah V, et al. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J. Mol. Biol. 2004;342:1487-1504. PMID: 15364576 | |
| Chelliah V, et al. Functional restraints on the patterns of amino acid substitutions: application to sequence-structure homology recognition. Proteins 2005;61:722-731. PMID: 16193489 | |
| Gong S, Blundell TL. Discarding functional residues from the substitution table improves predictions of active sites within three-dimensional structures. PLoS Comput. Biol. 2008;4:e1000179. PMID: 18833291 | |
| Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 1992;89:10915-10919. PMID: 1438297 | |
| Mizuguchi K, et al. JOY: protein sequence-structure representation and analysis. Bioinformatics 1998a;14:617-623. PMID: 9730927 | |
| Mizuguchi K, et al. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 1998b;7:2469-2471. PMID: 9828015 | |
| Overington J, et al. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. 1992;1:216-226. PMID: 1304904 | |
| Sali A. Modelling three-dimensional structure of proteins from their sequence of amino acid residues. PhD Thesis. London: University of London; 1991. | |
| Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 1990;213:859-883. PMID: 2359125 | |
| Topham CM, et al. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J. Mol. Biol. 1993;229:194-220. PMID: 8421300 |
