A QSAR Study of Matrix Metalloproteinases Type 2 (MMP2) Inhibitors with Cinnamoyl Pyrrolidine Derivatives.  
MedLine Citation:

PMID: 22896815 Owner: NLM Status: PubMed-not-MEDLINE
Abstract/OtherAbstract:

A multivariate PLS-QSAR study with a data set of 31 cinnamoyl pyrrolidine derivatives described as type 2 matrix metalloproteinase (MMP2) inhibitors is presented in this paper. Variable selection was performed with the Ordered Predictors Selection (OPS) algorithm. The PLS model contains six descriptors and three latent variables (LV) that accumulate 71.845% of the variance. Leave-N-out (LNO) cross-validation and y-randomization tests showed that the model is robust and free of chance correlation, respectively. The descriptors indicate that MMP2 inhibition depends mainly on the electronic properties of the compounds. The model can be useful as a support tool in the design of new MMP2 inhibitors.
Authors:

Eduardo Borges de Melo 
Publication Detail:

Type: Journal Article Date: 2012-01-31
Journal Detail:

Title: Scientia pharmaceutica Volume: 80 ISSN: 2218-0532 ISO Abbreviation: Sci Pharm Publication Date: 2012 Jun
Date Detail:

Created Date: 2012-08-16 Completed Date: 2012-10-02 Revised Date: 2013-05-30
Medline Journal Info:

Nlm Unique ID: 0026251 Medline TA: Sci Pharm Country: Austria
Other Details:

Languages: eng Pagination: 265-81 Citation Subset:
Affiliation:

Theoretical Medicinal and Environmental Chemistry Laboratory (LQMAT), Department of Pharmacy, Western Paraná State University (Unioeste), 2069 Universitária St, 8519110, Cascavel, PR, Brazil.
Journal Information Journal ID (nlmta): Sci Pharm Journal ID (isoabbrev): Sci Pharm Journal ID (publisherid): Scientia Pharmaceutica ISSN: 0036-8709 ISSN: 2218-0532 Publisher: Österreichische Apotheker-Verlagsgesellschaft
Article Information © de Melo; licensee Österreichische Apotheker-Verlagsgesellschaft m. b. H., Vienna, Austria. Received: 27 December 2011 Accepted: 31 January 2012 Electronic publication date: 31 January 2012 Print publication date: June 2012 Collection publication date: 2012 Volume: 80 Issue: 2 First Page: 265 Last Page: 281 ID: 3383210 PubMed Id: 22896815 DOI: 10.3797/scipharm.111221 Publisher Id: scipharm201280265

Correspondence: Email: eduardo.melo@unioeste.br 
The matrix metalloproteinases (MMPs) are a family of enzymes that are intimately involved in tissue remodeling. These zinc-containing endopeptidases consist of subsets of enzymes, and they are involved in the degradation of the extracellular matrix (ECM) that forms the connective material between cells and around tissues. In pathologic conditions an increase of MMP activity occurs, leading to tissue degradation [^{1}].
Currently, about 27 MMPs are known. Their overexpression is associated with several diseases: cancer, cardiovascular diseases (including congestive heart failure), osteoarthritis, rheumatoid arthritis, chronic obstructive pulmonary disease, psoriasis, dermatitis, Alzheimer’s disease and periodontitis, among others [^{1}, ^{2}]. Thus, MMPs are currently an interesting target for drug design. However, despite the great amount of research, the tetracycline doxycycline (Fig. 1) is the only MMP inhibitor in clinical use. This longer-acting antibiotic is also a weak inhibitor of collagenases (MMP1, MMP8 and MMP13), and it is currently marketed for the clinical treatment of chronic periodontal disease [^{3}–^{5}].
Among the MMPs, MMP2 and MMP9 are named gelatinases. These enzymes are able to degrade a broad range of matrix substrates, including gelatin, type IV collagen of basal laminae, as well as other nonhelical collagen domains and proteins, such as fibronectin and laminin, that constitute cellular connective tissue and are strongly involved in both normal and pathological tissue remodeling [^{1}, ^{6}]. The overexpression of this subclass, especially MMP2, is strongly correlated with an aggressive malignant phenotype and is associated with poor prognosis in several types of aggressive cancer, such as ovarian, lung, breast, bladder and gastric cancers [^{6}–^{8}]. Thus, MMP2 inhibitors have been studied as candidates for anticancer drug design.
Quantitative structure-activity relationship (QSAR) describes how a given biological activity can vary as a function of molecular descriptors derived from the chemical structure of a set of molecules. A model containing those calculated descriptors can be used to predict responses from new compounds, constituting an important tool to support the synthesis of new drugs [^{9}, ^{10}]. Thus, considering the continuous need for new anticancer drugs, a QSAR study based on 31 cinnamoyl pyrrolidine derivatives (Table 1) synthesized and assayed by Zhang et al. [^{8}] was carried out. The dataset was obtained through a hybridization approach combining the L-hydroxyproline scaffold, the MMP substrate; cinnamic acid, an inhibitor of the A5491 human lung gland cancer; and caffeic acid, an MMP2 inhibitor (Fig. 2). The aim was to obtain a mathematical model that could be used to predict the inhibitory potency of new cinnamoyl pyrrolidine derivatives against MMP2.
The study was carried out using the QSAR Modeling software [^{11}]. Variable selection with the Ordered Predictors Selection (OPS) algorithm [^{12}–^{15}] generated a model based on three latent variables (LV) that accumulate 71.845% of the variance (LV1: 18.043%; LV2: 31.298%; LV3: 22.504%). These LV are derived from six selected descriptors: SOFT (softness), EEig02r (eigenvalue 02 from the edge adjacency matrix weighted by resonance integrals), α_{xx} (the component of the overall polarizability along the x-axis), q10NBO (partial charge of atom #10 calculated through the Natural Bond Orbital approach), q2NBO (partial charge of atom #2 calculated through the Natural Bond Orbital approach) and SsssN(oth) (E-state index for an amino group attached to functional groups that are neither aliphatic nor aromatic). The values of each descriptor are available in the Supporting Information, Table S1. The standardized regression coefficients are −0.549 for EEig02r, 0.545 for SOFT, 0.377 for α_{xx}, 0.238 for q10NBO, 0.250 for q2NBO, and −0.314 for SsssN(oth). According to Wold [^{16}], absolute regression coefficients larger than about half the maximum absolute regression coefficient indicate that the descriptor is significant for the PLS-QSAR model. Thus, the reference value is 0.274 (half of 0.549). The coefficients of q2NBO and q10NBO are lower than this value, but removing them decreases the statistical quality of the model. In addition, the largest shortfall is only 0.036 units (0.274 − 0.238), which is very low. Thus, both descriptors were maintained in the model.
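The threshold check described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the coefficient values are the ones quoted in the text, while the function names (`wold_threshold`, `below_threshold`) and the key `alpha_xx` are hypothetical labels introduced here.

```python
# Standardized regression coefficients quoted in the text for the auxiliary model.
std_coefs = {
    "EEig02r": -0.549,
    "SOFT": 0.545,
    "alpha_xx": 0.377,
    "q10NBO": 0.238,
    "q2NBO": 0.250,
    "SsssN(oth)": -0.314,
}


def wold_threshold(coefs):
    """Half of the largest absolute standardized coefficient."""
    return max(abs(c) for c in coefs.values()) / 2.0


def below_threshold(coefs):
    """Map each descriptor under the threshold to its shortfall."""
    limit = wold_threshold(coefs)
    return {
        name: limit - abs(c)
        for name, c in coefs.items()
        if abs(c) < limit
    }


threshold = wold_threshold(std_coefs)
flagged = below_threshold(std_coefs)
print(f"threshold = {threshold:.4f}")
for name, gap in flagged.items():
    print(f"{name}: {gap:.4f} below the threshold")
```

Only q10NBO and q2NBO fall under the threshold, which matches the discussion in the text; the unrounded threshold (0.2745) differs slightly from the truncated 0.274 quoted there.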
Fig. 3 shows the plot of studentized residuals (σ) versus sample leverage, which was used for the identification of outliers. No compound presented residuals higher than 2.5σ. Only one compound presented leverage higher than the leverage cutoff line, but it can be considered acceptable [^{17}]. Therefore, the model can be considered free of outliers, which ensures the maximum possible representation, in terms of structure and range of inhibitory activity, of the dataset under study.
The model (Eq. 1) explains 78.324% (R^{2}=0.783) and predicts 61.844% (Q^{2}_{LOO}=0.618) of the variance. The predicted values in the cross-validation step and the residuals are available in the Supporting Information, Table S2. The difference between the values of R^{2} and Q^{2}_{LOO} was 0.165 units. A difference between R^{2} and Q^{2}_{LOO} exceeding 0.2–0.3 is a clear indication that the model suffers from overfitting [^{18}]. Thus, this difference may be considered acceptable. The F value (32.521) was higher than the corresponding tabled value (p=3 and n−p−1=27) at the 95% confidence level (α=0.05). The value of PRESS_{val} was smaller than SS_{y}, another indicator of the statistical significance of the prediction [^{16}].
Eq. 1.
pIC_{50} = 0.394(SOFT) − 2.198(EEig02r) + 0.014(α_{xx}) + 80.105(q10NBO) + 11.339(q2NBO) − 9.218(SsssN(oth)) + 64.222
n = 31; R^{2} = 0.783; SEC = 0.276; F_{(3,27)} = 32.521 (cF = 2.960); Q^{2}_{LOO} = 0.618; SEV = 0.342; PRESS_{val} = 3.621 (SS_{y} = 9.491).
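As a consistency check, the statistics reported with Eq. 1 can be recomputed from R^{2}, PRESS_{val}, SS_{y}, n and p. The formulas below are commonly used definitions and are an assumption here (the text does not spell them out); they happen to reproduce the reported values to within rounding. The function name `internal_stats` is hypothetical.

```python
import math


def internal_stats(r2, press, ss_y, n, p):
    """Recompute Q2, SEC, SEV and F from summary quantities.

    Assumed definitions (not stated explicitly in the text):
      Q2  = 1 - PRESS / SSy
      SEC = sqrt(SS_residual / (n - p - 1)), with SS_residual = (1 - R2) * SSy
      SEV = sqrt(PRESS / n)
      F   = (R2 / p) / ((1 - R2) / (n - p - 1))
    """
    dof = n - p - 1
    ss_res = (1.0 - r2) * ss_y
    return {
        "Q2_LOO": 1.0 - press / ss_y,
        "SEC": math.sqrt(ss_res / dof),
        "SEV": math.sqrt(press / n),
        "F": (r2 / p) / ((1.0 - r2) / dof),
    }


# Values reported in the text for the auxiliary model (n = 31, three LV).
stats = internal_stats(r2=0.78324, press=3.621, ss_y=9.491, n=31, p=3)
for name, value in stats.items():
    print(f"{name} = {value:.3f}")
```

The small residual gap in F comes from using R^{2} rounded to five decimals.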
The results obtained from the y-randomization [^{19}] analysis and LNO cross-validation [^{20}] are available in Figs. 4 and 5. The y-randomization test helps to verify whether the explained and predicted variances are due to chance correlation [^{19}]. All randomized models are of poor quality compared with the original model, and the intercepts are within the limits recommended in the literature, i.e., below 0.3 (Fig. 4A) and 0.05 (Fig. 4B). These results indicate that the variance explained by the model was not due to chance correlation.
LNO cross-validation (Fig. 5) employs smaller training sets than LOO cross-validation, and it can be repeated several times because of the large number of combinations that arise when more than one compound is left out of the training set at a time. A QSAR model can be considered robust when the average values of Q^{2}_{LNO} are relatively high and close to Q^{2}_{LOO} [^{19}]. The model obtained in this study has an average Q^{2}_{LNO} of 0.604, only 0.014 units lower than Q^{2}_{LOO}. The standard deviation for each “N” value (performed in hexaplicate) is small, with a maximum of 0.055 for Q^{2}_{L4O}.
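The LNO procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the study: a one-descriptor least-squares line stands in for the PLS model, the data are synthetic, and all function names are hypothetical. The `repeats=6` default mirrors the hexaplicate mentioned above.

```python
import random


def fit_line(x, y):
    """Least-squares line for one predictor (a stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def q2_lno(x, y, n_out, rng):
    """Q2 obtained when shuffled blocks of n_out samples are left out in turn."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    press = 0.0
    for start in range(0, len(idx), n_out):
        held = set(idx[start:start + n_out])
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in held)
    mean_y = sum(y) / len(y)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_y


def lno_summary(x, y, max_n=7, repeats=6, seed=0):
    """Mean and standard deviation of Q2_LNO for N = 1..max_n."""
    rng = random.Random(seed)
    summary = {}
    for n_out in range(1, max_n + 1):
        values = [q2_lno(x, y, n_out, rng) for _ in range(repeats)]
        mean = sum(values) / repeats
        sd = (sum((v - mean) ** 2 for v in values) / (repeats - 1)) ** 0.5
        summary[n_out] = (mean, sd)
    return summary


# Synthetic data only: 31 samples, one descriptor, Gaussian noise.
data_rng = random.Random(42)
x = [i / 10 for i in range(31)]
y = [6.2 + 0.6 * xi + data_rng.gauss(0, 0.2) for xi in x]

summary = lno_summary(x, y)
for n_out, (mean, sd) in summary.items():
    print(f"N={n_out}: mean Q2 = {mean:.3f}, sd = {sd:.3f}")
```

For N=1 the result does not depend on the shuffling, so the standard deviation is zero; for larger N the spread reflects how sensitive the model is to the composition of the training set.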
Some studies show that only externally validated models may be considered realistic and applicable for drug design [^{21}–^{24}]. The real model (II) was obtained after splitting the data into training (n=26) and test (n=5) sets. The standardized regression coefficients of each descriptor are −0.579 for EEig01x, 0.599 for SOFT, 0.362 for α_{xx}, 0.149 for q10NBO, 0.322 for q2NBO, and −0.278 for SsssN(oth). Model II has statistical parameters similar to those of the auxiliary model (i.e., Eq. 1). Therefore, the two models can be considered equivalent, and model II can be used in the external validation.
Eq. 2.
pIC_{50} = 0.450(SOFT) − 2.293(EEig01x) + 0.013(α_{xx}) + 61.930(q10NBO) + 14.508(q2NBO) − 8.637(SsssN(oth)) + 55.156
n = 26; R^{2} = 0.809; SEC = 0.264; F_{(3,22)} = 31.089 (cF = 3.049); Q^{2}_{LOO} = 0.626; SEV = 0.340; PRESS_{val} = 3.000 (SS_{y} = 8.026).
Results obtained for the external validation (Table 2) show that the model has high external predictive power, considering the proposed limits. R^{2}_{pred}, used as a measure of the model’s external predictive power, was higher than the adopted threshold (R^{2}_{pred} = 0.641 > 0.5), and the associated error (SEP) may be considered low. The Golbraikh-Tropsha statistics [^{25}, ^{26}] help to confirm the predictive power of the model. Both values of k and k’ and the difference between R^{2}_{0} and R’^{2}_{0} are within acceptable ranges (0.85 ≤ x ≤ 1.15, where x = k or k’, and |R^{2}_{0}−R’^{2}_{0}| < 0.3).
It can be observed that the obtained model has reasonable internal and external quality. However, it is always desirable to obtain a model that is able to relate the physicochemical properties represented by the selected molecular descriptors to the action mechanism of the system under study [^{27}]. Zhang et al. [^{8}] described the experimental structure-activity relationships of the data set, highlighting the importance of heteroatoms (especially the hydroxyl group) to form hydrogen bonds, of π electrons to facilitate interactions with hydrophobic regions of the receptor, and a slight decrease in inhibitory potency with the addition of methoxyl to R_{1} and R_{2}. Furthermore, a docking study indicated that the ester carbonyl (atom #20) could bind to the zinc located in the active site, that the side chain represented in this paper by R_{3} binds to the S1’ cavity, and that the side chain attached to the nitrogen binds to the S1 cavity. A representation of the metalloproteinase active site [^{28}, ^{29}] is presented in Fig. 6.
SOFT, a quantum chemical descriptor, was calculated using the relation SOFT=1/GAP, where GAP is the difference between the energies (calculated at the B3LYP/6-311G(d,p) level of theory) of the lowest unoccupied molecular orbital and the highest occupied molecular orbital (E_{LUMO}−E_{HOMO}). These molecular descriptors are known to be related to molecular reactivity. Generally, softer molecules are more reactive [^{26}, ^{30}]. As the SOFT coefficient is positive, derivatives with high values of this descriptor, which react more easily, are expected to be more potent. The histogram presented in Fig. 7 shows exactly this trend: considering the 16 most active compounds, only four (A2, A3, A7, and A0) have SOFT < 5. The most active compounds tend to have many heteroatoms (oxygen and chlorine) and π electrons in the substituent R_{3}, in agreement with the experimental structure-activity relationships discussed by Zhang et al. [^{8}], probably because these features facilitate the interaction with the enzyme via hydrogen bonds and hydrophobic interactions. Thus, similar to what was proposed by Liu et al. for a set of α-glucosidase inhibitors [^{30}], the inhibitory activity would be expected to improve with the introduction of more heteroatoms and π electrons in the structure of new derivatives.
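The SOFT=1/GAP relation is simple enough to express directly. In the sketch below the orbital energies are purely illustrative placeholders (the text does not list the energies or state their unit), and the function name `softness` is hypothetical.

```python
def softness(e_homo, e_lumo):
    """SOFT = 1 / GAP, with GAP = E_LUMO - E_HOMO (same energy unit for both)."""
    gap = e_lumo - e_homo
    if gap <= 0:
        raise ValueError("E_LUMO must be higher than E_HOMO")
    return 1.0 / gap


# Placeholder frontier orbital energies, not taken from the paper.
soft = softness(e_homo=-0.22, e_lumo=-0.03)
print(f"GAP = 0.19, SOFT = {soft:.3f}")
```

A smaller HOMO-LUMO gap gives a larger SOFT value, which is why the descriptor grows with the reactivity of the molecule.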
EEig02r, which has a negative coefficient, is an edge adjacency index, a topological descriptor derived from the edge adjacency matrix, also called the bond matrix, which encodes the connectivity between graph edges [^{26}, ^{31}]. In this approach, as in many other graph-theoretical representations of chemical structures, the vertices of the molecular graph represent atoms and the edges represent bonds. This class of descriptors can be weighted by several different atomic properties. The most interesting aspect of the presence of a resonance-weighted index in the model is that this weighting scheme makes the descriptor more sensitive to the presence of heteroatoms and multiple bonds in the molecule [^{26}]. So, its selection by the OPS algorithm may be, again, related to the importance of heteroatoms and π electrons in the R_{3} substituent.
The descriptor α_{xx}, calculated in Marvin 4.1.8 [^{32}] through a method based on the empirical model proposed by Miller and Savchik [^{33}], describes the ability of a molecule to be polarized along the X Cartesian axis. The sign of the coefficient is positive, indicating that increased polarization along this axis is favorable to the activity. In Fig. 8 it is possible to see that the x-axis always crosses the frontal region of the structures. The size of the R_{3} substituent causes a slight shift in the position of the axis, as can be seen in the compounds C0 (low potency) and C10 (high potency). This information can be related to the interpretation proposed for SOFT, since the presence of a greater number of heteroatoms and π electrons in R_{3} increases the polarization along this Cartesian axis.
The q2NBO and q10NBO are atomic charge descriptors calculated using the Natural Bond Orbital (NBO) theory. The charges measure the extent of electronic density localization in a molecule. Negative q_{n} values mean that there is excess electronic charge on the atom, while positive values mean that the atom is electron-deficient [^{26}]. It is possible to observe that the charge of atom #2 undergoes a slight increase in electron density (see Supporting Information, Table S1) in subsets B and C, probably due to an electron donor effect resulting from the insertion of the methoxyl at positions R_{1} and R_{2}. This effect was more pronounced in subset B (only R_{2} substituent) than in subset C (substituents at R_{1} and R_{2}). Interestingly, the compounds of subset A are generally more potent than their counterparts in subsets B and C, which have, in general, higher electron densities on atom #2. It can be proposed, since the sign of its coefficient is positive, that an electron donor effect caused by the insertion of the methoxyl in the aromatic ring decreases its electron density, hampering the interaction of this group with the S1 site of MMP2. This same effect can be observed, in a less pronounced manner, for atom #10, the nitrogen of the pyrrolidine ring, since the descriptor q10NBO also has a positive coefficient.
SsssN(oth) is an atom-type E-state (electrotopological state) index, and it also corresponds to the nitrogen of the pyrrolidine ring. The E-state formalism considers that each atom or bond has an intrinsic state, which is disturbed by every other atom or bond in the molecule. This state encodes information about the electronic distribution (as a variation caused by all other atoms) and topological aspects (major/minor accessibility of atoms and bonds to the external environment), and how such information can influence intermolecular interactions [^{26}, ^{34}]. Since this descriptor is also related to atom #10, this indicates that, although the most important point of structural variation for the activity is the R_{3} substituent, other parts of the molecule also influence the activity. The pyrrolidine nitrogen, for example, is close to the ester carbonyl side chain, the binding point with the zinc atom located in the active site of MMP2. The negative coefficient indicates that a decrease of this descriptor is favorable to the activity. Within the dataset, the lowest SsssN(oth) values are in the A subset (Supporting Information, Table S1). This subset has no substituents in R_{1} and R_{2} (Table 1). Thus, it may indicate that these substitutions also affect the intrinsic value of the nitrogen, as well as the partial charge descriptor q10NBO, influencing the interactions that this part of the molecule can have with the binding site of MMP2.
Interestingly, the three most important descriptors (EEig02r, SOFT and α_{xx}), considering the standardized coefficients of the real model (Eq. 2), are related exactly to the R_{3} substituents, the main point of structural variation in the dataset, which is therefore primarily responsible for the variation in inhibitory potency. This result strengthens the importance of hydrogen bonds and hydrophobic interactions with the S1' binding site of MMP2, and demonstrates how the manipulation of this characteristic in structurally related compounds can be useful in the design of new cinnamoyl pyrrolidine derivatives able to inhibit MMP2.
The model obtained using OPS, an algorithm for variable selection, showed statistically significant internal and external predictive power. In addition, the LNO cross-validation shows that the model is robust, and the y-randomization test shows that it is free of chance correlation. The selected descriptors suggest that the presence of heteroatoms, especially, and π electrons in the R_{3} substituent can be important for the binding of compounds to the S1’ region of the binding site of MMP2, but the handling of the electronic distribution in the side chain attached to the pyrrolidine nitrogen, which binds to the S1 site, can also be exploited for the design of new active derivatives. The manipulation of these features can assist in obtaining new lead compounds that can be useful for developing new drugs for the chemotherapy of aggressive cancers.
Three-dimensional structures were built using HyperChem 7 [^{35}] from the structure ZINC40405643, obtained from the ZINC Database (http://zinc.docking.org) [^{36}]. MM+ force field calculations were carried out using the same software. The most stable conformations obtained were further optimized with the AM1 semiempirical quantum mechanical method, followed by the Hartree-Fock level (HF/6-31G(d)) and the Density Functional Theory (DFT) level (B3LYP/6-311G(d,p)) using Gaussian 09 [^{37}]. DFT/B3LYP was chosen as the method for obtaining the geometries and electronic properties because it leads to quite satisfactory results in analyses with such aims [^{9}, ^{10}].
The SMILES strings [^{38}] of each compound were used to obtain E-state indices in Parameter Client [^{39}]. The optimized geometries were used to obtain, in the Dragon 3.0 Web Version [^{31}], the following classes of descriptors: constitutional descriptors, functional group counts, charge descriptors, molecular properties, walk and path counts, information indices, edge adjacency indices, topological charge indices, topological descriptors, connectivity indices, 2D autocorrelations, Burden eigenvalues, and eigenvalue-based indices. The optimized geometries were also used to obtain the electronic descriptors in Gauss View 5 [^{40}]. Partial charges of the basic structure were calculated by means of two approaches: Mulliken charges and Natural Bond Orbitals [^{41}]. The molecular polarizability (α) and its respective vectorial components (α_{xx}, α_{yy} and α_{zz}) were obtained in Marvin 4.1.8 [^{32}]. After removal of missing, invariant, and quasi-invariant descriptors calculated in Dragon 3.0, a total of 439 molecular descriptors were available for use.
Partial least squares (PLS), a classical chemometric method, was employed to explore the quantitative relationships between the descriptors of the training set and MMP2 inhibition. In this calibration method, LV are obtained by including the dependent variable (in this case, pIC_{50}) in the analysis in such a way that the covariance between the projection of the samples in the new axis system (also orthogonal) and the dependent variable is maximized [^{42}, ^{43}]. For this, descriptors should be preprocessed using the autoscaling scheme (column-wise mean-centered and scaled to unit variance). Thus, they can be compared to each other on the same scale.
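The autoscaling step mentioned above can be illustrated as follows. This is a minimal sketch with hypothetical descriptor values and a hypothetical function name (`autoscale`); the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def autoscale(columns):
    """Mean-center each descriptor column and scale it to unit variance."""
    scaled = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample standard deviation (n - 1)
        if sd == 0:
            raise ValueError(f"descriptor {name!r} is invariant")
        scaled[name] = [(v - mean) / sd for v in values]
    return scaled


# Hypothetical descriptor values on very different scales.
raw = {
    "SOFT": [4.6, 5.1, 5.4, 4.9],
    "alpha_xx": [310.0, 295.0, 342.0, 328.0],
}
scaled = autoscale(raw)
for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```

After scaling, both columns have zero mean and unit variance, so a descriptor with large numerical values no longer dominates the latent variables.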
Variable selection in a QSAR study is a way to identify reduced subsets of descriptors that in fact reproduce the observed values of a biological activity, i.e. those that are the most useful for obtaining a more accurate prediction model. A good variable selection method helps to find the subset that leads to an optimal mathematical equation for predicting the activity under study and, therefore, to simple, robust, and more easily interpretable models [^{44}, ^{45}]. In this study, a two-step procedure was employed: (i) the 439 original descriptors were reduced to 81 by eliminating those whose Pearson correlation coefficient (r) with pIC_{50} had an absolute value lower than 0.3; and (ii) the ordered predictors selection (OPS) algorithm [^{12}–^{15}] was used to select the most important descriptors. OPS builds PLS models by rearranging the columns of the matrix in such a way that the most important descriptors, classified according to an informative vector (available options: correlation vector, regression vector, and an element-wise product of both), are placed in the first columns. Then, successive PLS regressions are performed with an increasing number of descriptors to find the best model. In this work, the three informative vectors were used. The best models were ranked in descending order of statistical quality according to their coefficient of determination of leave-one-out cross-validation (Q^{2}_{LOO}) or standard error of cross-validation (SEV) values. OPS is implemented in QSAR Modeling [^{11}], a free Java-based software developed and made available by the research group of the Theoretical and Applied Chemometrics Laboratory (http://lqta.iqm.unicamp.br).
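Step (i) and the idea of ordering descriptors by an informative vector can be sketched as below. This is not the OPS implementation from QSAR Modeling: it only applies the |r| ≥ 0.3 pre-filter and then sorts the survivors by the correlation vector, leaving out the successive PLS regressions. The data and all names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prefilter_and_rank(descriptors, activity, cutoff=0.3):
    """Drop descriptors with |r| < cutoff, then order the rest by |r|."""
    scores = {
        name: abs(pearson_r(column, activity))
        for name, column in descriptors.items()
    }
    kept = [name for name, score in scores.items() if score >= cutoff]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


# Toy data: six compounds, three made-up descriptors.
activity = [6.3, 6.6, 6.9, 7.2, 7.5, 8.0]
descriptors = {
    "d_strong": [1.0, 1.4, 2.1, 2.4, 3.1, 3.9],
    "d_inverse": [5.0, 4.1, 4.4, 3.0, 3.2, 2.0],
    "d_noise": [2.0, 1.0, 1.0, 2.0, 2.0, 1.0],
}
ranked = prefilter_and_rank(descriptors, activity)
print(ranked)
```

In the full procedure, PLS models would then be built on the first k columns of this ordering for increasing k, and compared by Q^{2}_{LOO} or SEV.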
Several statistical tools (see Supporting Information) are suggested in the literature for the validation of QSAR models. For the internal quality, the adopted parameters were the coefficient of multiple determination of calibration (R^{2}), standard error of calibration (SEC), F-ratio test at the 95% confidence level (F, α=0.05), Q^{2}_{LOO}, SEV and predictive residual sum of squares of validation (PRESS_{val}) [^{18}]. The adopted limits are R^{2} > 0.6 and Q^{2}_{LOO} > 0.5. SEC and SEV values should be as low as possible. PRESS_{val} values should be lower than the sum of squares of the response values (SS_{y}) [^{19}]. The F-test value should be higher than the tabled F value (F_{p, n−p−1}, where n is the number of compounds and p is the number of LV), and the larger the difference between them, the more statistically significant the model is [^{46}].
The robustness of the model was examined through leave-N-out (LNO) cross-validation, with N=1 to 7. This test was repeated three times for each “N” value. All rows from the data matrix and respective y values were randomized in each step of the LNO process. It is expected that the average value of each Q^{2}_{LNO} would be close to Q^{2}_{LOO} (coefficient of multiple determination of leave-one-out cross-validation) with standard deviations close to zero [^{21}]. The possibility of chance correlation was tested using the y-randomization test, where only the y vector (pIC_{50}) was scrambled 10 times. The approach suggested by Eriksson et al. [^{20}], based on the r between the original vector y and the randomized vectors y, was used to quantify chance correlation. In this approach, two regression lines are built using these correlation coefficients (x-axis) and the R^{2} and Q^{2}_{LOO} values (y-axis). The intercepts of the equations obtained in the linear regression should be lower than 0.3 for R^{2} and 0.05 for Q^{2}_{LOO}.
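The intercept criterion of the y-randomization test can be sketched as follows. Only the regression step is shown: the ten (|r|, R^{2}, Q^{2}_{LOO}) triples for the scrambled models are hypothetical numbers chosen for illustration, and only the last entry (the original model, r = 1, R^{2} = 0.783, Q^{2}_{LOO} = 0.618) comes from the text. The function name `line_intercept` is also hypothetical.

```python
def line_intercept(x, y):
    """Intercept of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    return my - slope * mx


# |r| between the original y and each scrambled y; last entry is the original model.
abs_r = [0.05, 0.12, 0.20, 0.08, 0.31, 0.15, 0.02, 0.25, 0.10, 0.18, 1.00]
# Hypothetical R2 and Q2_LOO of the scrambled models, plus the reported values.
r2 = [0.21, 0.25, 0.30, 0.18, 0.35, 0.27, 0.20, 0.33, 0.22, 0.28, 0.783]
q2 = [-0.40, -0.31, -0.22, -0.45, -0.10, -0.28, -0.42, -0.15, -0.36, -0.25, 0.618]

intercept_r2 = line_intercept(abs_r, r2)
intercept_q2 = line_intercept(abs_r, q2)
print(f"R2 intercept = {intercept_r2:.3f} (limit 0.3)")
print(f"Q2 intercept = {intercept_q2:.3f} (limit 0.05)")
no_chance_correlation = intercept_r2 < 0.3 and intercept_q2 < 0.05
```

Low intercepts mean that a model built on a y vector uncorrelated with the real one would explain and predict little, which is the behaviour expected in the absence of chance correlation.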
Once the model was internally validated, the data set was split into a training set (n=26) and a test set (n=5), generating the real model [^{18}]. The test set was selected manually, in such a way that the entire range of pIC_{50} (6.25 to 8.208, 1.958 logarithmic units) and the structural variations of the data set were well represented. A dendrogram obtained for the complete data set by Hierarchical Cluster Analysis (HCA) [^{47}] (Supporting Information, Fig. S1) helps to confirm that the selected compounds are suitable as a test set. Thus, a structurally representative test set could be formed by the compounds B2 (pIC_{50}=6.553), C4 (pIC_{50}=6.696), C5 (pIC_{50}=6.952), C9 (pIC_{50}=7.542), and A0 (pIC_{50}=7.951). The HCA was performed in Pirouette 4 [^{48}].
The coefficient of multiple determination of prediction (R^{2}_{pred}) and the standard error of external prediction (SEP) were used as measures of the predictive power of the QSAR model. The recommended limit is R^{2}_{pred} > 0.5 [^{49}], and SEP values should also be as low as possible. However, this is not enough to guarantee that the model is really predictive. It is also recommended to check: (i) the slopes k or k’ of the linear regression lines between the observed activity (y_{i}) and the predicted activity in the external validation (ŷ_{ei}), where the slopes should be 0.85 ≤ x ≤ 1.15 (x = k or k’); and (ii) the absolute value of the difference between the coefficients of multiple determination, R^{2}_{0} and R’^{2}_{0}, which should be smaller than 0.3 [^{26}, ^{27}].
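The external validation criteria listed above can be sketched in Python. The observed pIC_{50} values are those of the five test compounds named in the text; the predicted values and the training-set mean are hypothetical placeholders (Table 2 is not reproduced here). The formulas follow one common convention for the Golbraikh-Tropsha statistics, which is an assumption, and the function name `external_stats` is hypothetical.

```python
def external_stats(y_obs, y_pred, y_train_mean):
    """R2_pred plus slopes and R2 of the regressions through the origin."""
    n = len(y_obs)
    sum_op = sum(o * p for o, p in zip(y_obs, y_pred))
    k = sum_op / sum(p * p for p in y_pred)        # y_obs  ~ k  * y_pred
    k_prime = sum_op / sum(o * o for o in y_obs)   # y_pred ~ k' * y_obs
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    r2_0 = 1.0 - (
        sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
        / sum((o - mean_obs) ** 2 for o in y_obs)
    )
    r2_0_prime = 1.0 - (
        sum((p - k_prime * o) ** 2 for o, p in zip(y_obs, y_pred))
        / sum((p - mean_pred) ** 2 for p in y_pred)
    )
    r2_pred = 1.0 - (
        sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
        / sum((o - y_train_mean) ** 2 for o in y_obs)
    )
    return {"k": k, "k_prime": k_prime, "R2_0": r2_0,
            "R2_0_prime": r2_0_prime, "R2_pred": r2_pred}


# Observed pIC50 of the test set: B2, C4, C5, C9, A0 (from the text).
y_obs = [6.553, 6.696, 6.952, 7.542, 7.951]
# Hypothetical predictions and training-set mean, for illustration only.
y_pred = [6.70, 6.60, 7.10, 7.40, 7.75]
result = external_stats(y_obs, y_pred, y_train_mean=7.1)

accepted = (
    result["R2_pred"] > 0.5
    and 0.85 <= result["k"] <= 1.15
    and 0.85 <= result["k_prime"] <= 1.15
    and abs(result["R2_0"] - result["R2_0_prime"]) < 0.3
)
print(result, accepted)
```

Because the predictions here are invented, the resulting numbers say nothing about the actual model; the sketch only shows how the acceptance limits are applied.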
Notes
This article is available from: http://dx.doi.org/10.3797/scipharm.111221
The MCT/CNPq/Fundação Araucária (www.fundacaoaraucaria.org.br) is acknowledged for financial support (under Protocol 2010/7354).
Values of selected descriptors for each compound are available in Table S1. The results of leave-one-out cross-validation are available in Table S2. The dendrogram used to aid in the selection of the test set is available in Figure S1. Statistical parameters and adopted limits for the evaluation of the quality of the QSAR model are also available as supporting information. These documents are available in the online version (Format: PDF, Size: < 0.1 MB): http://dx.doi.org/10.3797/scipharm.111221.
The author declares no conflict of interest.
References
[1]. Kontogiorgis CA, Papaioannou P, Hadjipavlou-Litina DJ. Matrix metalloproteinase inhibitors: a review on pharmacophore mapping and (Q)SARs results. Curr Med Chem. 2005; 12: 339–355. http://www.ncbi.nlm.nih.gov/pubmed/15723623. PMID: 15723623
[2]. Pirard B. Insight into the structural determinants for selective inhibition of matrix metalloproteinases. Drug Discov Today. 2007; 12: 640–646. http://dx.doi.org/10.1016/j.drudis.2007.06.003. PMID: 17706545
[3]. Tu G, Xu W, Huang H, Li S. Progress in the development of matrix metalloproteinase inhibitors. Curr Med Chem. 2008; 15: 1388–1395. http://dx.doi.org/10.2174/092986708784567680. PMID: 18537616
[4]. Griffin MO, Ceballos G, Villarreal FJ. Tetracycline compounds with non-antimicrobial organ protective properties: Possible mechanisms of action. Pharmacol Res. 2011; 63: 102–107. http://dx.doi.org/10.1016/j.phrs.2010.10.004. PMID: 20951211
[5]. Patrick GL. An Introduction to Medicinal Chemistry. 4th ed. Oxford: Oxford University Press; 2009: 752.
[6]. Al-Quntar AA, Baum O, Reich R, Srebnik M. Recently synthesized class of vinylphosphonates as potent matrix metalloproteinase (MMP2) inhibitors. Arch Pharm. 2004; 337: 76–80. http://dx.doi.org/10.1002/ardp.200300828.
[7]. Li X, Li J. Recent advances in the development of MMPIs and APNIs based on the pyrrolidine platforms. Mini Rev Med Chem. 2010; 10: 794–805. http://dx.doi.org/10.2174/138955710791608334. PMID: 20482497
[8]. Zhang L, Zhang J, Fang H, Wang Q, Xu W. Design, synthesis and preliminary evaluation of new cinnamoyl pyrrolidine derivatives as potent gelatinase inhibitors. Bioorg Med Chem. 2006; 14: 8286–8294. http://dx.doi.org/10.1016/j.bmc.2006.09.015. PMID: 17008101
[9]. Ribeiro FAL, Ferreira MMC. QSPR models of boiling point, octanol–water partition coefficient and retention time index of polycyclic aromatic hydrocarbons. J Mol Struct Theochem. 2003; 663: 109–126. http://dx.doi.org/10.1016/j.theochem.2003.08.107.
[10]. Molfetta FA, Bruni AT, Rosseli FP, Silva ABF. A partial least squares and principal component regression study of quinone compounds with trypanocidal activity. Struct Chem. 2007; 18: 49–57. http://dx.doi.org/10.1007/s1122400691203.
[11]. QSAR Modeling, version 2.0. Theoretical and Applied Chemometrics Laboratory, State University of Campinas, Brazil. http://lqta.iqm.unicamp.br.
[12]. Teófilo RF, Martins JP, Ferreira MMC. Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression. J Chemometrics. 2009; 23: 32–48. http://dx.doi.org/10.1002/cem.1192.
[13]. Hernández N, Kiralj R, Ferreira MMC, Talavera I. Critical comparative analysis, validation and interpretation of SVM and PLS regression models in a QSAR study on HIV-1 protease inhibitors. Chemometr Intell Lab. 98: 65–77. http://dx.doi.org/10.1016/j.chemolab.2009.04.012.
[14]. Melo EB. Multivariate SAR/QSAR of 3-aryl-4-hydroxyquinolin-2(1H)-one derivatives as type I fatty acid synthase (FAS) inhibitors. Eur J Med Chem. 2010; 45: 5817–5826. http://dx.doi.org/10.1016/j.ejmech.2010.09.044. PMID: 20965618
[15]. Melo EB. A new quantitative structure–property relationship model to predict bioconcentration factors of polychlorinated biphenyls (PCBs) in fishes using E-state index and topological descriptors. Ecotoxicol Environ Saf. 2012; 75: 213–222. http://dx.doi.org/10.1016/j.ecoenv.2011.08.026. PMID: 21959189
[16]. van de Waterbeemd H. PLS for multivariate linear modeling. Chemometric Methods in Molecular Design. Weinheim: Wiley-VCH; 1998: 195–218.
[17]. Gramatica P. Principles of QSAR models validation: internal and external. QSAR Comb Chem. 2007; 26: 694–701. http://dx.doi.org/10.1002/qsar.200610151.
[18]. Kiralj R, Ferreira MMC. Basic validation procedures for regression models in QSAR and QSPR studies: theory and application. J Braz Chem Soc. 2009; 20: 770–787. http://dx.doi.org/10.1590/S010350532009000400021.
[19]. Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P. Methods for reliability and uncertainty assessment and for applicability evaluations of classification and regression-based QSARs. Environ Health Perspect. 2003; 111: 1361–1375. http://dx.doi.org/10.1289/ehp.5758. PMID: 12896860
[20]. Melagraki G, Afantitis A, Sarimveis H, Koutentis PA, Markopolus J, Igglessi-Markopoulou O. Optimization of biaryl piperidine and 4-amino-2-biarylurea MCH1 receptor antagonists using QSAR modeling, classification techniques and virtual screening. J Comput Aided Mol Des. 2007; 21: 251–267. http://dx.doi.org/10.1007/s1082200791124. PMID: 17377847
[21]. van de Waterbeemd H. Statistical validation of QSAR results. Chemometric Methods in Molecular Design. Weinheim: Wiley-VCH; 1998: 309–318.
[22]. Golbraikh A, Tropsha A. Beware of q2! J Mol Graph Model. 2002; 20: 269–276. http://dx.doi.org/10.1016/S10933263(01)001231. PMID: 11858635
[23]. Aptula AO, Jeliazkova NG, Schultz TW, Cronin MTD. The better predictive model: high q2 for the training set or low root mean square error of prediction for the test set? QSAR Comb Chem. 2005; 24: 385–396. http://dx.doi.org/10.1002/qsar.200430909.
[24]. Consonni V, Ballabio D, Todeschini R. Evaluation of model predictive ability by external validation techniques. J Chemometrics. 2010; 24: 194–201. http://dx.doi.org/10.1002/cem.1290.
[25]. Golbraikh A, Shen M, Xiao Z, Xiao Y, Lee K, Tropsha A. Rational selection of training and test set for the development of validated QSAR models. QSAR Comb Chem. 2003; 17: 241–253. http://dx.doi.org/10.1023/A:1025386326946.
[26].  Todeschini R,Consonni V. Molecular Descriptors for Chemoinformatics2th ed1 alphabetical listing. WeinheimWileyVCHYear: 2009967 
[27].  Organization for Economic CoOperation and Development (OECD)Guidance document on the validation of (quantitative) structureactivity relationship [(Q)SAR] models http://www.oecd.org/dataoecd/33/37/37849783.pdf. 
[28].  Cheng M,De B,Almstead NG,Pikul S,Dowty ME,Dietsch CR,Dunaway CM,Gu F,Hsieh LC,Janusz MJ,Taiwo YO,Natchus MG,Hudlicky T,Mandel M. Design, synthesis, and biological evaluation of matrix metalloproteinase inhibitors derived from a modified proline scaffoldJ Med ChemYear: 19994254265436 http://dx.doi.org/10.1021/jm9904699. 10639284 
[29].  Discovery Studio Visualizer, version 2.5.5.9350. Accelrys Software Inc, www.accelrys.com 
[30].  Liu Y,Ke Z,Cui J,Chen W,Ma L,Wang B. Synthesis, inhibitory activities, and QSAR study of xanthone derivatives as αglucosidase inhibitorsBioorg Med ChemYear: 20081671857192 http://dx.doi.org/10.1016/j.bmc.2008.06.043. 18632275 
[31].  Dragon, version web 3.0. Talete srl, www.talete.mi.it 
[32].  Marvin, version 4.1.8. ChemAxon Inc. www.chemaxon.com/marvin 
[33].  Miller KJ,Savchik JA. A new empirical method to calculate average molecular polarizabilitiesJ Am Chem SocYear: 197910172067213 http://dx.doi.org/10.1021/ja00518a014. 
[34].  Devillers J,Balaban ATTopological Indices and Related Descriptors in QSAR and QSPRLondonGordon and BreachYear: 1999491562 
[35].  Hyperchem, version 7.1. Hyper Co. www.hyper.com 
[36].  Irwin JJ,Shoichet BK. ZINC  a free database of commercially available compounds for virtual screeningJ Chem Inf ModelYear: 200545177182 http://dx.doi.org/10.1021/ci049714+. 15667143 
[37].  Gaussian, version 09. Gaussian Inc, www.gaussian.com 
[38].  Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rulesJ Chem Inf Comput SciYear: 1988283148 http://dx.doi.org/10.1021/ci00057a005. 
[39].  Parameter Client. Virtual Computational Chemistry Laboratory. www.vcclab.org/lab/pclient 
[40].  Gauss View Gauss View, version 05. Gaussian Inc, www.gaussian.com 
[41].  Young DC. Computational chemistry: a practical guide for applying techniques to realworld problemsNew YorkWileyInterscienceYear: 2001369 
[42].  Wold S,Sjöström M,Eriksson L. PLSregression: a basic tool of chemometricsChemometr Intell LabYear: 200158109130 http://dx.doi.org/10.1016/S01697439(01)001551. 
[43].  Roy PP,Roy K. On Some Aspects of Variable Selection for Partial Least Squares Regression ModelsQSAR Comb SciYear: 200827302313 http://dx.doi.org/10.1002/qsar.200710043. 
[44].  Ferreira MMC,Montanari CA,Gaudio AC. Variable selection in QSARQuím NovaYear: 200225439448 http://dx.doi.org/10.1590/S010040422002000300017. 
[45].  González MP,Terán C,SaízUrra L,Teijeira M. Variable selection methods in QSAR: an overviewCurr Top Med ChemYear: 2008816061627 http://dx.doi.org/10.2174/156802608786786552. 19075770 
[46].  Gaudio AC,Zandonade E. Proposition, validation and analysis of QSAR modelsQuím NovaYear: 200124658671 http://dx.doi.org/10.1590/S010040422001000500013. 
[47].  Beebe KR,Pell RJ,Seasholtz MB. Chemometrics: a practical guideWileyNew YorkYear: 1998360 
[48].  Pirouette, version 4. Infometrix Inc. www.infometrix.com 
[49].  Roy PP,Leonard JT,Roy K. Exploring the impact of size of training sets for the development of predictive QSAR modelsChemometr Intell LabYear: 2008903142 http://dx.doi.org/10.1016/j.chemolab.2007.07.004. 
Article Categories:
Keywords: Matrix metalloproteinases, MMP2, Gelatinases, Cancer, QSAR, OPS. 