Document Detail


The k Partition-Distance Problem.
MedLine Citation:
PMID:  22468708     Owner:  NLM     Status:  In-Data-Review    
Abstract/OtherAbstract:
Abstract Many applications of data partitioning (clustering) have been well studied in bioinformatics. Consider, for instance, a set N of organisms (elements) based on DNA marker data. A partition divides all elements in N into two or more disjoint clusters that cover all elements, where a cluster contains a non-empty subset of N. Different partitioning algorithms may produce different partitions. To compute the distance and find the consensus partition (also called consensus clustering) between two or more partitions are important and interesting problems that arise frequently in bioinformatics and data mining, in which different distance functions may be considered in different partition algorithms. In this article, we discuss the k partition-distance problem. Given a set of elements N with k partitions of N, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. This problem is NP-complete for general k > 2 partitions, and no algorithms are known at present. We design the first known heuristic and approximation algorithms with performance ratios 2 to solve the k partition-distance problem in O(k · ρ · |N|) time, where ρ is the maximum number of clusters of these k partitions and |N| is the number of elements in N. We also present the first known exact algorithm in O(ℓ · 2(ℓ)·k(2) · |N|(2)) time, where ℓ is the partition-distance of the optimal solution for this problem. Performances of our exact and approximation algorithms in testing the random data with actual sets of organisms based on DNA markers are compared and discussed. Experimental results reveal that our algorithms can improve the computational speed of the exact algorithm for the two partition-distance problem in practice if the maximum number of elements per cluster is less than ρ. From both theoretical and computational points of view, our solutions are at most twice the partition-distance of the optimal solution. A website offering the interactive service of solving the k partition-distance problem using our and previous algorithms is available (see http://mail.tmue.edu.tw/∼yhchen/KPDP.html ).
Authors:
Yen Hung Chen
Related Documents :
18824198 - Antibodies to myelin oligodendrocyte glycoprotein are not involved in the severity of c...
22753168 - Malfunction of the siaretron 1000 iper™ hyperbaric ventilator.
21309818 - A closer look at self-reported suicide attempts: false positives and false negatives.
23638018 - Invasion of the red seaweed heterosiphonia japonica spans biogeographic provinces in th...
3083518 - Bilharzia risk in lichtenburg, tvl.
19760448 - Seasonal and episodic lake mixing stimulate differential planktonic bacterial dynamics.
Publication Detail:
Type:  Journal Article    
Journal Detail:
Title:  Journal of computational biology : a journal of computational molecular cell biology     Volume:  19     ISSN:  1557-8666     ISO Abbreviation:  J. Comput. Biol.     Publication Date:  2012 Apr 
Date Detail:
Created Date:  2012-04-03     Completed Date:  -     Revised Date:  -    
Medline Journal Info:
Nlm Unique ID:  9433358     Medline TA:  J Comput Biol     Country:  United States    
Other Details:
Languages:  eng     Pagination:  404-17     Citation Subset:  IM    
Affiliation:
Department of Computer Science, Taipei Municipal University of Education , Taipei, Taiwan, R.O.C .
Export Citation:
APA/MLA Format     Download EndNote     Download BibTex
MeSH Terms
Descriptor/Qualifier:

From MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine


Previous Document:  Structural alignment of RNA with triple helix structure.
Next Document:  Algorithm for Haplotype Inference via Galled-Tree Networks with Simple Galls.