Local lazy regression: making use of the neighborhood to improve QSAR predictions.
MedLine Citation:
PMID:  16859315     Owner:  NLM     Status:  MEDLINE    
Abstract/OtherAbstract:
Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
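The idea of predicting a query molecule from its local neighborhood can be illustrated with a minimal sketch. The abstract does not specify how the neighborhood is chosen or which local model is fitted, so everything below is an assumption made for illustration: the neighborhood is the k nearest training molecules by Euclidean distance in descriptor space, the local model is ordinary least squares, and the names `llr_predict`, `rmse`, and `k` are hypothetical rather than taken from the paper.

```python
import numpy as np


def llr_predict(X_train, y_train, x_query, k=5):
    """Predict the activity of one query molecule from its k nearest neighbors.

    Assumed setup (not from the paper): Euclidean distance in descriptor
    space and an ordinary least-squares linear model fitted at query time.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Distance from the query to every training molecule.
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # Indices of the k closest training molecules (the local neighborhood).
    idx = np.argsort(dists)[:k]

    # Local design matrix with an intercept column.
    A = np.hstack([np.ones((len(idx), 1)), X_train[idx]])

    # Least-squares fit on the neighborhood only; lstsq returns the
    # minimum-norm solution if the local system is rank deficient.
    coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)

    # Evaluate the local model at the query point.
    return float(np.concatenate(([1.0], x_query)) @ coef)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted activities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy data with two groups that follow different local trends.
X_toy = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y_toy = [0.0, 2.0, 4.0, 30.0, 27.0, 24.0]
pred_low = llr_predict(X_toy, y_toy, [1.5], k=3)
pred_high = llr_predict(X_toy, y_toy, [11.5], k=3)
```

Because the model is fitted only when a query arrives, nothing is trained in advance, which matches the abstract's remark that no a priori model need be built. In the toy data a single global line would fit neither group well, while each local fit follows the trend of its own neighborhood.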
Authors:
Rajarshi Guha; Debojyoti Dutta; Peter C Jurs; Ting Chen
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  Journal of chemical information and modeling     Volume:  46     ISSN:  1549-9596     ISO Abbreviation:  -     Publication Date:    2006 Jul-Aug
Date Detail:
Created Date:  2006-07-24     Completed Date:  2006-09-20     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  101230060     Medline TA:  J Chem Inf Model     Country:  United States    
Other Details:
Languages:  eng     Pagination:  1836-47     Citation Subset:  IM    
Affiliation:
Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA. rxg218@psu.edu
MeSH Terms
Descriptor/Qualifier:
Models, Chemical*
Quantitative Structure-Activity Relationship*
Regression Analysis

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine

