Clustering with exclusion zones: genomic applications.
MedLine Citation:
PMID:  21051753     Owner:  NLM     Status:  In-Data-Review    
Abstract/OtherAbstract:
Methods for formally evaluating the clustering of events in space or time, notably the scan statistic, have been richly developed and widely applied. To use the scan statistic and related approaches, it is necessary to know the extent of the spatial or temporal domains wherein the events arise. Implicit in their usage is that these domains have no "holes" (hereafter "exclusion zones"): regions in which events a priori cannot occur. However, in many contexts, this requirement is not met. When the exclusion zones are known, it is straightforward to correct the scan statistic for their occurrence by simply adjusting the extent of the domain. Here, we tackle the more ambitious objective of formally evaluating clustering in the presence of "unknown" exclusion zones. We develop an algorithm for estimating total exclusion zone extent, the quantity needed to correct scan statistic-based inference, using distributional properties of "spacings," and show how bias correction for this estimator can be effected. Performance of the algorithm is assessed via simulation study. We showcase applications to genomic settings for differing marker (event) types (binding sites, housekeeping genes, and microRNAs) wherein exclusion zones can arise through a variety of mechanisms. In several instances, accommodating exclusion zones dramatically changes the inference obtained from unadjusted analyses.
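The known-exclusion-zone case mentioned in the abstract (correcting the scan statistic by adjusting the domain extent) can be illustrated with a minimal Python sketch. This is not the authors' algorithm for unknown exclusion zones. It assumes a one-dimensional domain starting at 0, a fixed window width, a uniform null distribution, and a Monte Carlo p-value; the function names `scan_statistic`, `collapse`, and `scan_pvalue`, and the example data, are hypothetical.

```python
import random


def scan_statistic(points, w):
    """Largest number of points falling in any window of width w."""
    pts = sorted(points)
    best, lo = 0, 0
    for hi in range(len(pts)):
        # Advance the left edge until the window [pts[lo], pts[hi]] fits in w.
        while pts[hi] - pts[lo] > w:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


def collapse(points, zones):
    """Shift positions left so that known exclusion zones are cut out.

    Assumes zones are disjoint (start, end) pairs and that no point
    lies inside a zone.
    """
    return [x - sum(e - s for s, e in zones if e <= x) for x in points]


def scan_pvalue(points, w, extent, n_sim=2000, seed=0):
    """Monte Carlo p-value under a uniform null on [0, extent]."""
    rng = random.Random(seed)
    observed = scan_statistic(points, w)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, extent) for _ in range(len(points))]
        if scan_statistic(sim, w) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical data: domain [0, 100] with a known exclusion zone (20, 80).
    events = [5.0, 5.1, 5.2, 5.3, 15.0, 85.0, 95.0]
    zones = [(20.0, 80.0)]
    excluded = sum(e - s for s, e in zones)
    print("unadjusted:", scan_pvalue(events, 1.0, 100.0))
    print("adjusted:  ", scan_pvalue(collapse(events, zones), 1.0, 100.0 - excluded))
```

Ignoring the exclusion zone overstates the domain extent, which makes a given cluster look more surprising than it is. Collapsing the zone and shrinking the extent to match removes that overstatement in this toy setting.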
Authors:
Mark R Segal; Yuanyuan Xiao; Fred W Huffer
Publication Detail:
Type:  Journal Article     Date:  2010-11-04
Journal Detail:
Title:  Biostatistics (Oxford, England)     Volume:  12     ISSN:  1468-4357     ISO Abbreviation:  Biostatistics     Publication Date:  2011 Apr 
Date Detail:
Created Date:  2011-03-23     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  100897327     Medline TA:  Biostatistics     Country:  England    
Other Details:
Languages:  eng     Pagination:  234-46     Citation Subset:  IM    
Affiliation:
Division of Biostatistics and Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, CA 94107, USA. mark@biostat.ucsf.edu
From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine