A nonparametric regression-based linkage scan of rheumatoid factor-IgM using sib-pair squared sums and differences.
MedLine Citation:

PMID: 18466603 Owner: NLM Status: PubMed-not-MEDLINE
Abstract/OtherAbstract:

Parametric linkage methods for quantitative trait locus mapping require explicit specification of the probability model of the quantitative trait and hence can lead to misleading linkage inferences when the model assumptions are not valid. Ghosh and Majumder developed a nonparametric regression method based on kernel smoothing for linkage mapping of quantitative trait loci using squared differences in trait values of independent sib pairs; this method is more robust than parametric methods to violations of distributional assumptions. In this study, we modify this nonparametric regression method by using local linear polynomials instead of the Nadaraya-Watson estimator, and squared sums of sib-pair trait values in addition to squared differences, to perform a genome-wide scan of rheumatoid factor-IgM (RF-IgM) levels on sib pairs in the Genetic Analysis Workshop 15 simulated data set. We obtain significant evidence of linkage very close to the quantitative trait locus controlling RF-IgM. We find that the simultaneous use of squared differences and squared sums increases the power to detect linkage compared to using only squared differences. However, because all the sib pairs are selected for rheumatoid arthritis, the variance of RF-IgM values is reduced, and the empirical power to detect linkage is not very high. We also compare the performance of our method with two linear regression approaches: the classical Haseman-Elston method using squared sib-pair trait differences and its extension proposed by Elston et al. using mean-corrected sib-pair cross-products. We find that the proposed nonparametric method yields more power than the linear regression approaches.
Authors:

Saurabh Ghosh; P Samba Siva Rao; Gourab De; Partha P Majumder 
Publication Detail:

Type: Journal Article Date: 2007-12-18
Journal Detail:

Title: BMC proceedings Volume: 1 Suppl 1 ISSN: 1753-6561 ISO Abbreviation: BMC Proc Publication Date: 2007
Date Detail:

Created Date: 2008-05-09 Completed Date: 2009-12-15 Revised Date:
Medline Journal Info:

Nlm Unique ID: 101316936 Medline TA: BMC Proc Country: England 
Other Details:

Languages: eng Pagination: S99 Citation Subset:  
Affiliation:

Human Genetics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700 108, India. saurabh@isical.ac.in
Journal Information: Journal ID (nlm-ta): BMC Proc; ISSN: 1753-6561; Publisher: BioMed Central
Article Information: Copyright © 2007 Ghosh et al; licensee BioMed Central Ltd. Open access: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Collection publication date: 2007. Electronic publication date: 18 December 2007. Volume: 1; Issue: Suppl 1; First Page: S99; Last Page: S99. ID: 2359867. PubMed Id: 18466603. Publisher Id: 1753-6561-1-S1-S99
Saurabh Ghosh1  Email: saurabh@isical.ac.in 
P Samba Siva Rao1  Email: srao113@yahoo.com 
Gourab De1  Email: bst0210@isical.ac.in 
Partha P Majumder1  Email: ppm@isical.ac.in 
1Human Genetics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700 108, India 
Heritable quantitative characters are precursors of a clinical endpoint trait. Because endpoint traits are usually binary in nature (affected/unaffected) and hence contain minimal information on variation within trait genotypes, it may be statistically more powerful to use a correlated quantitative phenotype for identifying genes for the underlying complex trait. Unlike qualitative or binary traits, which can be characterized completely by allele frequencies and genotypic penetrances, quantitative traits require a further layer of modeling: the probability distribution of the underlying trait. Thus, compared to likelihood-based approaches such as variance components [^{1},^{2}], which require explicit specification of the probability distribution of the quantitative trait, nonparametric methods for quantitative trait locus (QTL) mapping are more robust to deviations in distributional assumptions. Ghosh and Majumder [^{3}] developed a nonparametric regression method based on Nadaraya-Watson kernel smoothing [^{4},^{5}] for linkage mapping of QTLs using squared differences in quantitative trait values of independent sib pairs. However, studies have shown that information on linkage can be increased by using squared sib-pair sums in addition to squared differences [^{6},^{7}]. Moreover, local linear polynomials provide better nonparametric regression fits [^{8},^{9}] than the Nadaraya-Watson estimator. In this study, we modify the nonparametric regression method of Ghosh and Majumder to incorporate squared sums in conjunction with squared differences and use local linear polynomials instead of the Nadaraya-Watson estimator to perform a genome-wide linkage scan of rheumatoid factor (RF)-IgM, a quantitative phenotype correlated with rheumatoid arthritis affection status in the simulated data of Genetic Analysis Workshop 15 (GAW15). We evaluate the gain in power from using squared sib-pair sums in addition to squared differences.
We also compare the performance of our nonparametric method with the classical Haseman-Elston linear regression method [^{10}] using sib-pair squared differences as well as its extension proposed by Elston et al. [^{7}] using sib-pair mean-corrected cross-products, which can be expressed as a linear combination of squared differences and mean-corrected squared sums.
For our analyses, we used data on RF-IgM levels and genome-wide information on 730 microsatellite marker loci distributed over the 22 autosomal chromosomes. Our method utilizes marker genotype data on 1500 independent sib pairs and their parents for identity-by-descent (IBD) computations. The nonparametric regressions for the linkage scan are based on the RF-IgM and IBD data. We performed our analyses on all 100 available replicates.
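The linear-combination statement about the mean-corrected cross-product can be verified with a one-line identity. In the sketch below, the symbol $\mu$ for the trait mean is notation assumed here for illustration; it does not appear in the text.

```latex
% With a = y_{i1} - \mu and b = y_{i2} - \mu, use (a+b)^2 - (a-b)^2 = 4ab:
\[
(y_{i1} - \mu)(y_{i2} - \mu)
  = \tfrac{1}{4}\left[(y_{i1} + y_{i2} - 2\mu)^2 - (y_{i1} - y_{i2})^2\right]
\]
% The first term is the mean-corrected squared sum, the second the squared
% difference, so the cross-product weights them by +1/4 and -1/4 respectively.
```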
Suppose $y_{ij}$ denotes the RF-IgM level of the $j$th sib in the $i$th family, $i = 1, 2, \ldots, 1500$; $j = 1, 2$; and $\hat{\pi}_{ip}$ denotes the estimated IBD score for the $i$th sib pair at an arbitrary point $p$ on the genome. We define $U_i = (y_{i1} - y_{i2})^2$ and $V_i = (y_{i1} + y_{i2})^2$. The classical Haseman-Elston method [^{10}] and its extensions [^{6},^{7}], which involve a linear regression of squared differences (or suitable alternative functions) of sib-pair trait values ($U_i$ values) on estimated marker IBD scores ($\hat{\pi}_{ip}$ values), are adversely affected by increasing dominance at the QTL, since the assumed linear relationship then no longer holds. Thus, a more robust strategy is to estimate empirically the nature of the functional relationship between the two variables.
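As a concrete illustration of these definitions, the following minimal sketch computes the squared differences and squared sums for a handful of sib pairs and fits the ordinary least squares slope used in a Haseman-Elston type regression. The function names and the toy trait and IBD values are made up for illustration and are not taken from the GAW15 data.

```python
def squared_diffs_and_sums(pairs):
    """Return (U, V): squared differences and squared sums per sib pair."""
    u = [(y1 - y2) ** 2 for y1, y2 in pairs]
    v = [(y1 + y2) ** 2 for y1, y2 in pairs]
    return u, v


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy data (illustrative only): trait values of four sib pairs and
# their estimated IBD scores at one genomic position.
pairs = [(1.0, 3.0), (2.0, 2.5), (0.5, 0.4), (1.5, 1.5)]
ibd = [0.0, 0.5, 1.0, 1.0]

u, v = squared_diffs_and_sums(pairs)
slope_u = ols_slope(ibd, u)  # a negative slope is the direction expected under linkage
```

In this toy example the pairs sharing more alleles IBD have more similar trait values, so the slope of the squared differences on the IBD scores comes out negative.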
Following Ghosh and Majumder [^{3}], we assume a nonparametric regression model:
$$U_i = \psi(\hat{\pi}_{ip}) + e_i,$$
where $\psi$ is a real-valued function and the $e_i$ values are random errors. The functional form of $\psi$ is estimated using a kernel smoothing technique [^{6}] with kernel function:
$$k(x) = \begin{cases} \tfrac{3}{4}(1 - x^2), & |x| < 1; \\ 0, & \text{otherwise.} \end{cases}$$
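The kernel above is the one commonly known as the Epanechnikov kernel. A minimal sketch of it follows, together with a numerical check that it integrates to one; the function name and the step size used in the check are choices made here for illustration.

```python
def epanechnikov(x):
    """Kernel k(x) = 3/4 (1 - x^2) for |x| < 1 and 0 otherwise."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1.0 else 0.0


# Midpoint-rule approximation of the integral of k over [-1, 1].
step = 0.001
area = sum(epanechnikov(-1.0 + (k + 0.5) * step) * step for k in range(2000))
```

Because the kernel vanishes outside the unit interval, only sib pairs whose IBD scores lie within one window length of the target score contribute to a local fit.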
Ghosh and Majumder [^{3}] used the Nadaraya-Watson estimator to predict the $U_i$ values. There is now increasing evidence that local polynomials have lower prediction errors [^{6},^{7}] than the Nadaraya-Watson estimator. We used a local linear polynomial to predict $U_i$ as follows:
$$U_i = \hat{\psi}(\hat{\pi}_{ip}) = \frac{\sum_j k\left(\frac{\hat{\pi}_{ip} - \hat{\pi}_{jp}}{h}\right)\{\beta_0 + \beta_1 \hat{\pi}_{jp}\}}{\sum_j k\left(\frac{\hat{\pi}_{ip} - \hat{\pi}_{jp}}{h}\right)},$$
where $h$ is the "optimal" window length in the kernel smoothing procedure obtained using "leave-one-out" cross-validation, and $\beta_0$ and $\beta_1$ are the weighted least squares estimators of the local linear regression of $U_j$ on $\hat{\pi}_{jp}$ with weights
$$\frac{k\left(\frac{\hat{\pi}_{ip} - \hat{\pi}_{jp}}{h}\right)}{\sum_i k\left(\frac{\hat{\pi}_{ip} - \hat{\pi}_{jp}}{h}\right)}.$$
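The smoothing step and the choice of window length can be sketched as follows. This is the standard textbook form of a local linear smoother, in which a kernel-weighted least squares line is fitted around each target IBD score and evaluated at that score; whether it matches the authors' program in every detail is an assumption. All function names, the candidate window lengths, and the toy data are hypothetical.

```python
def epanechnikov(x):
    """Kernel k(x) = 3/4 (1 - x^2) for |x| < 1 and 0 otherwise."""
    return 0.75 * (1.0 - x * x) if abs(x) < 1.0 else 0.0


def local_linear_predict(x0, xs, ys, h, skip=None):
    """Predict the response at x0 from a kernel-weighted straight-line fit.

    skip is the index of an observation to leave out (used for cross-validation).
    Returns None when no observation falls inside the window.
    """
    w = [0.0 if j == skip else epanechnikov((x0 - xj) / h)
         for j, xj in enumerate(xs)]
    total = sum(w)
    if total == 0.0:
        return None
    mean_x = sum(wj * xj for wj, xj in zip(w, xs)) / total
    mean_y = sum(wj * yj for wj, yj in zip(w, ys)) / total
    sxx = sum(wj * (xj - mean_x) ** 2 for wj, xj in zip(w, xs))
    if sxx < 1e-12:
        # All weighted points share one x value: fall back to the local mean.
        return mean_y
    sxy = sum(wj * (xj - mean_x) * (yj - mean_y)
              for wj, xj, yj in zip(w, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x0


def loo_cv_bandwidth(xs, ys, candidates):
    """Pick the window length with the smallest leave-one-out squared error."""
    best_h, best_err = None, float("inf")
    for h in candidates:
        err, usable = 0.0, True
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            pred = local_linear_predict(xi, xs, ys, h, skip=i)
            if pred is None:  # this window leaves some point without neighbours
                usable = False
                break
            err += (yi - pred) ** 2
        if usable and err < best_err:
            best_h, best_err = h, err
    return best_h


# Toy check on exactly linear data: a local linear fit reproduces the line.
xs = [0.0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9, 1.0]
ys = [2.0 - 1.5 * x for x in xs]
pred = local_linear_predict(0.3, xs, ys, h=0.5)
h_opt = loo_cv_bandwidth(xs, ys, candidates=(0.2, 0.5, 1.0))
```

The fallback to the local weighted mean matters for IBD data, where many sib pairs can share the same estimated score and the local slope is then undefined.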
To assess the significance of our regression, we used a diagnostic measure [^{11}]:
$$\Delta = 1 - \frac{\sum_{i=1}^{1500}\{U_i - \hat{\psi}(\hat{\pi}_{ip})\}^2}{\sum_{i=1}^{1500}(U_i - \bar{U})^2}.$$
We note that the proposed measure $\Delta$ is an analog of $R^2$, the square of the correlation coefficient between the response variable and the explanatory variable, which is used in linear regression as a measure of the proportion of variance of the response variable explained by the explanatory variable. One can evaluate the significance of the observed $\Delta$ empirically by generating random IBD scores under the null hypothesis of no linkage, while preserving the actual RF-IgM values.
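The diagnostic and its empirical significance can be sketched as below. Two assumptions are made here and are not stated in the text: the null IBD scores are generated by shuffling the observed scores across sib pairs, and the p-value is the fraction of null replicates whose diagnostic is at least as large as the observed one. The `fit` argument stands for any smoother returning fitted values; `group_mean_fit` is a hypothetical stand-in used only to keep the example short, not the local linear estimator.

```python
import random


def delta(observed, fitted):
    """Diagnostic 1 - RSS/TSS, an analog of R^2 for a nonparametric fit."""
    mean = sum(observed) / len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    tss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - rss / tss


def group_mean_fit(x, y):
    """Stand-in smoother (assumption): mean response within each distinct x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    means = {g: sum(vals) / len(vals) for g, vals in groups.items()}
    return [means[xi] for xi in x]


def empirical_pvalue(ibd, u, fit, n_rep=1000, seed=0):
    """Fraction of null replicates with a diagnostic >= the observed one."""
    rng = random.Random(seed)
    observed = delta(u, fit(ibd, u))
    shuffled = list(ibd)
    hits = 0
    for _ in range(n_rep):
        rng.shuffle(shuffled)  # breaks any IBD-trait link, keeps trait values
        if delta(u, fit(shuffled, u)) >= observed:
            hits += 1
    return hits / n_rep


# Toy data (illustrative only) with a strong decreasing relationship.
ibd = [0.0] * 10 + [0.5] * 10 + [1.0] * 10
u = ([4.0 + 0.1 * k for k in range(10)]
     + [2.0 + 0.1 * k for k in range(10)]
     + [0.5 + 0.1 * k for k in range(10)])
p_value = empirical_pvalue(ibd, u, group_mean_fit, n_rep=200)
```

With 1000 null replicates and this definition, a p-value below 0.001 means that no null replicate reached the observed diagnostic.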
There have been suggestions that using squared differences in conjunction with squared sums of sib-pair trait values may be a more powerful linkage strategy compared to using squared differences only [^{6},^{7}]. In order to explore this hypothesis, we developed a nonparametric regression strategy combining the two phenotypic functions. For this purpose, we performed an additional nonparametric regression of $V_i$ values on $\hat{\pi}_{ip}$ values using the local linear polynomial estimator as described earlier. In this case, our diagnostic $\Delta$ is defined as
$$\Delta = 1 - \frac{\sum_{i=1}^{1500}\left[\{U_i - \hat{\psi}_1(\hat{\pi}_{ip})\}^2 + \{V_i - \hat{\psi}_2(\hat{\pi}_{ip})\}^2\right]}{\sum_{i=1}^{1500}\{(U_i - \bar{U})^2 + (V_i - \bar{V})^2\}},$$
where $\psi_1$ and $\psi_2$ are the unknown regression functions of $\hat{\pi}_{ip}$ corresponding to $U_i$ and $V_i$, respectively.
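The combined diagnostic pools the residual and total sums of squares of the two regressions before taking the ratio. A minimal sketch, taking the fitted values from any smoother as inputs (the function name and the toy numbers are illustrative):

```python
def combined_delta(u, fitted_u, v, fitted_v):
    """Pooled 1 - RSS/TSS over the squared-difference and squared-sum fits."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    rss = sum((a - fa) ** 2 + (b - fb) ** 2
              for a, fa, b, fb in zip(u, fitted_u, v, fitted_v))
    tss = sum((a - mean_u) ** 2 + (b - mean_v) ** 2 for a, b in zip(u, v))
    return 1.0 - rss / tss


# Toy values: a perfect fit for U, and a fit for V that is just its mean.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 9.0]
value = combined_delta(u, u, v, [5.0, 5.0, 5.0])
```

In this toy case the squared sums contribute no explained variation, so the pooled measure reduces to the share of the total variation that belongs to the squared differences.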
Because the proposed $\Delta$ statistic does not consider the direction of the relationship between the squared sib-pair trait difference and the estimated marker IBD scores, there may be concern about an inflated false-positive error rate due to a random positive relationship between the variables under the null hypothesis of no linkage. To circumvent this problem, we ensured that the correlation between the variables is negative for each of the marker positions showing significant evidence of linkage. When we considered the squared differences in conjunction with the squared sums, we additionally verified that the correlation between the squared sums and the estimated IBD score is positive at each of the significant markers.
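The directional safeguard can be expressed as a simple filter on the sign of the correlation. The sketch below assumes that the Pearson correlation is the one meant, which the text does not specify; the function names and toy values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def passes_direction_check(ibd, u, v=None):
    """Squared differences must fall with IBD; squared sums, if used, must rise."""
    ok = pearson(ibd, u) < 0
    if v is not None:
        ok = ok and pearson(ibd, v) > 0
    return ok


# Toy values (illustrative only).
ibd = [0.0, 0.5, 1.0, 1.0]
u = [4.0, 2.0, 1.0, 0.5]
v = [1.0, 2.0, 3.0, 5.0]
keep = passes_direction_check(ibd, u, v)
```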
We performed our nonparametric regression analyses on all 22 autosomal chromosomes for all 100 available replicates. We compared the results of the nonparametric regression with those of the classical Haseman-Elston regression using squared differences [^{10}] and its extension proposed by Elston et al. [^{7}] using mean-corrected cross-products. Because the data involved independent sib pairs, the generalized least squares method of Elston et al. [^{7}] reduced to an ordinary least squares analysis. The RF-IgM levels were corrected for age, sex, and smoking status using linear regression. The IBD computations were performed using the statistical software MERLIN [^{12}]. The nonparametric regressions were performed at all the marker positions separately, first using the squared sib-pair trait differences only and then combining the squared differences and the squared sib-pair trait sums as discussed in the preceding section. We set a p-value threshold of < 0.001 (based on 1000 Monte Carlo replications under the null hypothesis) to consider a linkage finding statistically significant. Since the "answers" were available to us, we considered a linkage peak to be a true positive only if both of the following criteria were satisfied: it was within a 20-cM window (10 cM on either side) of the true position of a QTL, and all other markers within this window provided significant evidence of linkage. We assessed the empirical power and the false-positive error rate based on the proportion of replicates yielding significant linkage peaks.
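The true-positive rule and the resulting empirical power can be sketched as follows. The sketch takes the peak to be the marker with the smallest p-value on the chromosome, which is an assumption made here. The marker positions and the QTL position of 115 cM are those reported for chromosome 11, but the p-values are invented purely for illustration.

```python
def is_true_positive(pvalues_by_cm, qtl_cm, alpha=0.001, half_window_cm=10.0):
    """Apply both criteria: the peak lies within the window around the QTL,
    and every marker inside the window is significant at level alpha."""
    peak_cm = min(pvalues_by_cm, key=pvalues_by_cm.get)
    if abs(peak_cm - qtl_cm) > half_window_cm:
        return False
    return all(p < alpha
               for cm, p in pvalues_by_cm.items()
               if abs(cm - qtl_cm) <= half_window_cm)


def empirical_power(replicates, qtl_cm, alpha=0.001):
    """Proportion of replicates that yield a true-positive linkage peak."""
    hits = sum(is_true_positive(rep, qtl_cm, alpha) for rep in replicates)
    return hits / len(replicates)


# Two made-up replicates: marker position in cM -> empirical p-value.
replicates = [
    {110: 0.0, 113: 0.0, 117: 0.0005, 124: 0.0},  # all window markers significant
    {110: 0.0, 113: 0.0, 117: 0.2, 124: 0.0},     # one window marker fails
]
power = empirical_power(replicates, qtl_cm=115.0)
```

Requiring every marker in the window to be significant makes the criterion conservative, since a single non-significant neighbouring marker disqualifies the replicate.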
Based on the proposed nonparametric regression, we obtained a linkage peak (with nominal p-value < 0.05) at marker STRP11_22 (113 cM) on chromosome 11 in 17 replicates using squared differences only and in 31 replicates using both the squared differences and the squared sums. All of the other markers within the 20-cM window around the position of the QTL also provided evidence of linkage at level 0.05 in all these replicates. With a threshold of p < 0.001 for a linkage peak to be statistically significant, the empirical power was only 0.10 when squared differences alone were used and 0.23 when both squared differences and squared sums were used; nevertheless, we note that the linkage peak is close to Locus F (115 cM), the QTL controlling RF-IgM. Moreover, the major premise of the study, that is, the belief that the combined use of squared differences and squared sums is more powerful than using only the squared differences, is validated by our results. We also found that no other marker provided statistical evidence of linkage at level 0.05 in more than three replicates.
When we used the two linear regression approaches [^{7},^{10}] for comparison with the nonparametric method, we found that the linkage peak was also at marker STRP11_22 (113 cM) on chromosome 11 for most of the replicates, both with squared differences and with mean-corrected cross-products. However, the number of replicates giving significant linkage evidence at a nominal level of 0.05 was only 11 for squared differences and 18 for mean-corrected cross-products. At a nominal level of 0.001, the corresponding figures were 6 and 13, respectively. Thus, the nonparametric method was more powerful than the linear regression approach both when only squared differences were used and when squared sums were combined with squared differences. A summary of the linkage findings on chromosome 11 using the various methods is provided in Table 1.
Our proposed nonparametric method was able to detect linkage near the QTL controlling RF-IgM level in multiple replicates. The use of the squared sib-pair trait sums in conjunction with the squared differences yielded more power to detect linkage than the squared differences alone. We also find that the nonparametric regression, which estimates empirically the nature of the local relationship between the phenotypic and genotypic variables, is more powerful than the classical Haseman-Elston regression using squared differences [^{10}] and its extension using mean-corrected cross-products [^{7}], both of which assume a linear relationship between the regression variables. However, for the GAW15 data, the empirical power of the nonparametric regression method at level 0.001 was less than 0.25 even when both the squared differences and sums were used. This may be partially explained by the fact that the RF-IgM levels were simulated under a model with high polygenic and nonshared environmental variances. Moreover, all the sib pairs were affected with rheumatoid arthritis. Thus, the analyses of RF-IgM were performed on a selected sample with reduced variance, resulting in loss of power. However, the fact that the nonparametric method provided more power than the linear regression method suggests that the nonparametric regression is more robust to selected sampling than the linear regression. This is intuitively expected because the nonparametric regression method does not assume any functional form for the relationship between the variables and hence implicitly uses the selected nature of the sample in estimating the functional relationship. We are currently carrying out extensive simulations under different degrees of selection to evaluate the loss of power of the nonparametric regression under selected sampling.
Current methods typically use LOD scores as a diagnostic to evaluate the significance of linkage peaks. Because our proposed kernel-smoothing method is nonparametric, a direct comparison with likelihood-based LOD scores is not possible. However, from the p-values of our linkage peaks, we can obtain the LOD scores that would theoretically yield these p-values. For example, a p-value < 0.0001 corresponds to a LOD score greater than 3.29, while a p-value < 0.001 corresponds to a LOD score greater than 2.35. We are currently carrying out extensive simulations to compare the performance of the proposed procedure with existing distribution-based methods.
Finally, we emphasize that a major advantage of our method is that it does not assume any probability distribution for RF-IgM levels or any specific functional form of dependence between the regression variables and thus is robust to violations of underlying model assumptions.
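The text does not state how p-values were converted to LOD scores. The two quoted thresholds are reproduced if one assumes that 2 ln(10) times the LOD score follows a chi-square distribution with 1 degree of freedom under the null hypothesis and that the tail probability is not halved. The sketch below implements that assumed conversion; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def lod_for_pvalue(p):
    """LOD score whose chi-square (1 df) tail probability equals p.

    Assumes 2 * ln(10) * LOD is chi-square with 1 df under the null.
    """
    z = NormalDist().inv_cdf(1.0 - p / 2.0)  # chi-square(1) quantile is z**2
    return z * z / (2.0 * math.log(10.0))


def pvalue_for_lod(lod):
    """Inverse conversion: chi-square (1 df) tail probability of a LOD score."""
    chi_square = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(chi_square / 2.0))


lod_001 = lod_for_pvalue(0.001)    # about 2.35
lod_0001 = lod_for_pvalue(0.0001)  # about 3.29
```

Under a one-sided convention, in which the tail probability is halved, the same LOD scores would map to p-values half as large, so the choice of convention should be checked before comparing with other studies.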
The author(s) declare that they have no competing interests.
This work was supported by the Fogarty International Center, National Institutes of Health through R01 grant TW00660403. The authors are also grateful to Nidhan Kumar Biswas, who helped in implementing some of the computer programs.
This article has been published as part of BMC Proceedings Volume 1 Supplement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at .
References
1. Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 1994;54:535–543. [pmid: 8116623]
2. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 1998;62:1198–1211. [pmid: 9545414] [doi: 10.1086/301844]
3. Ghosh S, Majumder PP. A two-stage variable-stringency semiparametric method for mapping quantitative-trait loci with the use of genomewide-scan data on sib pairs. Am J Hum Genet 2000;66:1046–1061. [pmid: 10712217] [doi: 10.1086/302815]
4. Nadaraya EA. On estimating regression. Theory Probability Appl 1964;10:186–190. [doi: 10.1137/1110024]
5. Watson GS. Smooth regression analysis. Sankhya Ser A 1964;26:359–372.
6. Drigalenko E. How sib pairs reveal linkage. Am J Hum Genet 1998;63:1243–1245. [doi: 10.1086/302055]
7. Elston RC, Buxbaum S, Jacobs KB, Olson JM. Haseman and Elston revisited. Genet Epidemiol 2000;19:1–17. [pmid: 10861893] [doi: 10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E]
8. Silverman BW. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall; 1986.
9. Kundu D, Basu A. Statistical Computing: Existing Methods and Recent Developments. New Delhi: Narosa Publishing House; 2004.
10. Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 1972;2:3–19. [pmid: 4157472] [doi: 10.1007/BF01066731]
11. Ghosh S, Begleiter H, Porjesz B, Chorlian DB, Edenberg HJ, Foroud T, Goate A, Reich T. Linkage mapping of beta 2 EEG waves via nonparametric regression. Am J Med Genet 2003;118:66–71. [doi: 10.1002/ajmg.b.10057]
12. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 2002;30:97–101. [pmid: 11731797] [doi: 10.1038/ng786]
Tables
Table 1. Empirical powers at markers near the QTL for RF-IgM on chromosome 11 at level 0.001
Marker     Position  NPD^{a}  NPSD^{b}  HED^{c}  ECP^{d}
STRP11_21  110 cM    0.10     0.23      0.06     0.11
STRP11_22  113 cM    0.10     0.23      0.06     0.13
STRP11_23  117 cM    0.10     0.23      0.06     0.13
STRP11_24  124 cM    0.09     0.21      0.05     0.09
^{a}NPD, nonparametric regression using squared differences only
^{b}NPSD, nonparametric regression using both squared sums and squared differences
^{c}HED, Haseman-Elston regression using squared differences [10]
^{d}ECP, Elston et al. regression using mean-corrected cross-products [7]
Article Categories:
Conference: Genetic Analysis Workshop 15. St. Pete Beach, Florida, USA. 11-15 November 2006.