Search Results
Results 351 - 400 of 1823
Zhou Hui - - 2009
Clustering is one of the most useful tools for high-dimensional analysis, e.g., for microarray data. It becomes challenging in the presence of a large number of noise variables, which may mask underlying clustering structures. Therefore, noise removal through variable selection is necessary. One effective way is regularization for simultaneous parameter estimation ...
Rodríguez-Ramilo Silvia T - - 2009
BACKGROUND: The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics. METHODS: In this study, a new method to infer the number of clusters and to assign individuals to the inferred ...
Milagre Selma T - - 2009
Exploratory data analysis is often necessary to evaluate potential hypotheses for subsequent studies such as grouping the data in clusters. In real data sets the occurrence of incompleteness is very common. We propose a method that tolerates missing values for fuzzy clustering using resampling (bootstrapping) and cluster stability analysis. The ...
Park Jinho - - 2009
BACKGROUND: Atrial fibrillation (AFib) is one of the prominent causes of stroke, and its risk increases with age. We need to detect AFib correctly as early as possible to avoid medical disaster because it is likely to proceed into a more serious form in a short time. If we can make ...
Cecchi Guillermo A - - 2009
Most fMRI studies use voxel-wise statistics to carry out intrasubject as well as inter-subject analysis. We show that statistics derived from voxel-wise comparisons are likely to be noisy and error prone, especially for inter-subject comparisons. In this paper we propose a novel metric called weighted cluster coverage to compare two ...
Wan Raymond - - 2009
Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g. microarray) data set. Some existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) for microarray clustering. Although MST-based clustering ...
Kim Eun-Youn - - 2009
BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect ...
Yiannakoulias N - - 2009
We examined the geographic variability of information generated from different case definitions of childhood asthma derived from administrative health data used in Alberta, Canada. Our objective was to determine if analyses based on different case ascertainment algorithms identify geographic clusters in the same region of the study area. Our study ...
Ghanbari Yasser - - 2009
Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a ...
Liu Sandra S - - 2009
This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical ...
Gerlinger Christoph - - 2009
BACKGROUND: The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. METHODS: We used the four cluster analysis methods single, average and complete linkage as well as the ...
Yu Lei - - 2009
Feature selection is effective in selecting predictive gene sets for microarray classification. However, the large number of predictive gene sets and the disparity among them presents a challenge for identifying potential biomarkers. To facilitate biomarker identification, we present a new data mining task, feature cluster selection, which selects from a ...
Zahoránszky László A - - 2009
BACKGROUND: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new ...
Lacey Michelle R - - 2009
Changes in cytosine methylation at CpG nucleotides are observed in many cancers and offer great potential for translational research. Diseases such as ovarian cancer that are especially challenging to diagnose and treat are of particular interest, and abnormal methylation in the tandem repeats Sat2 and NBL2 has been observed in ...
Xia Jing - - 2009
In this paper, we propose a model-based clustering method for functional magnetic resonance imaging (fMRI) data to detect the functional connectivity network. The Potts model, which represents spatial interactions of neighboring voxels, is introduced to integrate the temporal mixture regression modeling into one single unified model. The estimation of ...
Menor S A - - 2009
Observations, experiments and simulations often generate large numbers of snapshots of configurations of complex many-body systems. It is important to find methods of extracting useful information from these ensembles of snapshots in order to document the motion as the system evolves in time. Some of the most interesting information is ...
Sinnott-Armstrong Nicholas A - - 2009
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where ...
Li Ai - - 2009
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene ...
Novotny Vladimir - - 2009
Advanced computerized methods and models of retrieving knowledge from large multiparameter data bases were used to analyze data on fish and macroinvertebrate composition (metrics), habitat, land use and water quality. The research focused on the north central and northeastern United States and involved thousands of sites monitored by the state ...
Focardi Stefano - - 2009
BACKGROUND: Lévy flights are random walks, the step lengths of which come from probability distributions with heavy power-law tails, such that clusters of short steps are connected by rare long steps. Lévy walks maximise search efficiency of mobile foragers. Recently, several studies raised some concerns about the reliability of the ...
Salimi-Khorshidi Gholamreza - - 2009
In neuroimaging, cluster-based inference has generally been found to be more powerful than voxel-wise inference. However, standard cluster-based methods assume stationarity (constant smoothness), while under nonstationarity clusters are larger in smooth regions just by chance, making false positive risk spatially variant. Hayasaka et al. proposed a Random Field Theory (RFT) ...
Chung François - - 2009
Model-based image segmentation requires prior information about the appearance of a structure in the image. Instead of relying on Principal Component Analysis such as in Statistical Appearance Models, we propose a method based on a regional clustering of intensity profiles that does not rely on an accurate pointwise registration. Our ...
Zhang Mingrui - - 2009
One of the major challenges in unsupervised clustering is the lack of consistent means for assessing the quality of clusters. In this paper, we evaluate several validity measures in fuzzy clustering and develop a new measure for a fuzzy c-means algorithm which uses a Pearson correlation in its distance metrics. ...
Karpievitch Yuliya V - - 2009
Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data ...
Wen Shihua - - 2009
A semiparametric density ratio method which borrows strength from two or more samples can be applied to a moving window of variable size in cluster detection. The method requires neither prior knowledge of the underlying distribution nor the number of cases before scanning. In this paper, the semiparametric cluster detection ...
Ballesteros Sébastien - - 2009
The recurrence of influenza A epidemics has originally been explained by a "continuous antigenic drift" scenario. Recently, it has been shown that if genetic drift is gradual, the evolution of influenza A main antigen, the haemagglutinin, is punctuated. As a consequence, it has been suggested that influenza A dynamics at ...
Kulldorff Martin - - 2009
Temporal, spatial and space-time scan statistics are commonly used to detect and evaluate the statistical significance of temporal and/or geographical disease clusters, without any prior assumptions on the location, time period or size of those clusters. Scan statistics are mostly used for count data, such as disease incidence or mortality. ...
Gurry Thomas - - 2009
Ras GTPases are lipid-anchored G proteins, which play a fundamental role in cell signaling processes. Electron micrographs of immunogold-labeled Ras have shown that membrane-bound Ras molecules segregate into nanocluster domains. Several models have been developed in attempts to obtain quantitative descriptions of nanocluster formation, but all have relied on assumptions ...
Dai Xiaofeng - - 2009
Cluster analysis has become a standard computational method for gene function discovery as well as for more general explanatory data analysis. A number of different approaches have been proposed for that purpose, out of which different mixture models provide a principled probabilistic framework. Cluster analysis is increasingly often supplemented with ...
Kaever Alexander - - 2009
A central goal of experimental studies in systems biology is to identify meaningful markers that are hidden within a diffuse background of data originating from large-scale analytical intensity measurements as obtained from metabolomic experiments. Intensity-based clustering is an unsupervised approach to the identification of metabolic markers based on the grouping ...
Duan Weisi - - 2009
We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The ...
Xing Jian - - 2009
The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies ...
Göker Markus - - 2009
Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often ...
Marttinen Pekka - - 2009
The versatility of DNA copy number amplifications for profiling and categorization of various tissue samples has been widely acknowledged in the biomedical literature. For instance, this type of measurement technique provides possibilities for exploring sets of cancerous tissues to identify novel subtypes. The previously utilized statistical approaches to various kinds ...
Savage Richard S - - 2009
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering ...
Coory Michael D - - 2009
The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a ...
Ma Jinhui - - 2009
Cluster randomized trials (CRTs) are increasingly used to assess the effectiveness of interventions to improve health outcomes or prevent diseases. However, the efficiency and consistency of using different analytical methods in the analysis of binary outcomes have received little attention. We described and compared various statistical approaches in the analysis ...
Arshadi Niloofar - - 2009
In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data for predicting the case-control status. QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification ...
Wang Xiaogang X Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. - - 2009
In this paper, we propose a new nonparametric Bayesian framework to cluster white matter fiber tracts into bundles using a hierarchical Dirichlet processes mixture (HDPM) model. The number of clusters is automatically learnt from data with a Dirichlet process (DP) prior instead of being manually specified. After the models of ...
Ganesalingam Jeban J Department of Clinical Neuroscience, MRC Centre for Neurodegeneration Research, King's College London, Institute of Psychiatry, London, United - - 2009
Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for ...
Manor Alon - - 2008
Recent studies of cluster distribution in various ecosystems revealed Pareto statistics for the size of spatial colonies. These results were supported by cellular automata simulations that yield robust criticality for endogenous pattern formation based on positive feedback. We show that these patch statistics are a manifestation of the law of ...
Elgmati E - - 2008
Recurrent incidence of infant diarrhoea is studied, using daily data collected in Salvador, Brazil, from 754 children over 455 days. Aalen's additive intensity model is taken as the basis of the modelling strategy and a frailty extension is proposed. The idea is to estimate the frailty dynamically as time proceeds ...
Li Kaiming - - 2009
Since the mid-1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing attention from neuroscientists and computer scientists, since it opens a new window to explore the functional networks of the human brain with relatively high resolution. A variety of methods for fcMRI study have been proposed. This paper intends ...
Fournier A C - - 2009
The sol-gel transformation of aqueous solutions of aluminium ions into aluminium (oxy)hydroxides induced by the addition of a 'soft base'-'Tris-buffer' (pK(a)=8.2) has been investigated using monotonous single-batch titrations and a combination of four complementary techniques for monitoring pH, conductivity, viscosity and ultrasound parameters (velocity and attenuation). The multi-probe monitoring of ...
Chu Chia-Wei - - 2008
Standardization is used to ensure that the variables in a similarity calculation make an equal contribution to the computed similarity value. This paper compares the use of seven different methods that have been suggested previously for the standardization of integer-valued or real-valued data, comparing the results with unstandardized data. Sets ...
Hong Yi - - 2009
The sensitivity of the constrained K-means clustering algorithm (Cop-Kmeans) to the assignment order of instances is studied, and a novel assignment order learning method for Cop-Kmeans, termed as clustering Uncertainty-based Assignment order Learning Algorithm (UALA), is proposed in this paper. The main idea of UALA is to rank all instances ...
Priestle John P - - 2009
This report describes a computer program for clustering docking poses based on their 3-dimensional (3D) coordinates as well as on their chemical structures. This is chiefly intended for reducing a set of hits coming from high throughput docking, since the capacity to prepare and biologically test such molecules is generally ...
Bunting, Peter
In mixed-species forests of complex structure, the delineation of tree crowns is problematic because of their varying dimensions and reflectance characteristics, the existence of several layers of canopy (including understorey), and shadowing within and between crowns. To overcome this problem, an algorithm for delineating tree crowns has been developed using ...
Xiong Hui - - 2009
K-means is a well-known and widely used partitional clustering method. While there are considerable research efforts to characterize the key features of the K-means clustering algorithm, further investigation is needed to understand how data distributions can have impact on the performance of K-means clustering. To that end, in this paper, ...
Andreopoulos Bill - - 2009
A challenge involved in applying density-based clustering to categorical biomedical data is that the "cube" of attribute values has no ordering defined, making the search for dense subspaces slow. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data, and a complementary index for searching for dense subspaces ...