Model-based approaches to analyzing and exploring gene expression data
Prof. Alejandro Murua
Department of Mathematics and Statistics,
University of Montreal, Canada
In this work we present model-based approaches to analyzing microarray gene
expression data. We start by describing a flexible Markov random field approach
to modeling cDNA microarray images. This model allows for the simultaneous
estimation of hybridization and background intensities. An algorithm similar to
iterated conditional modes (ICM) is used to estimate the parameters of the model.
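To illustrate the flavor of an ICM-style update, the following is a minimal sketch and not the model used in this work. It assumes a generic two-class labelling problem (say, background versus spot) with a Potts-type smoothness prior, and it treats the class means, the noise variance and the smoothing weight `beta` as fixed, hypothetical inputs, whereas the approach described above estimates the intensities themselves.

```python
def icm_segment(image, means, noise_var=1.0, beta=1.0, n_sweeps=5):
    """Label each pixel by iterated conditional modes (ICM).

    Assumed toy model: pixel value ~ Normal(means[label], noise_var),
    with a Potts-type prior rewarding agreement with the 4 nearest neighbours.
    """
    rows, cols = len(image), len(image[0])
    classes = range(len(means))
    # Initialise every pixel with the class whose mean is closest.
    labels = [[min(classes, key=lambda k: (image[r][c] - means[k]) ** 2)
               for c in range(cols)] for r in range(rows)]
    for _ in range(n_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]

                def energy(k):
                    # Negative log conditional probability, up to a constant.
                    data_term = (image[r][c] - means[k]) ** 2 / (2.0 * noise_var)
                    prior_term = -beta * sum(1 for n in neighbours if n == k)
                    return data_term + prior_term

                best = min(classes, key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # a full sweep with no change: local mode reached
            break
    return labels


# Toy 3x3 "image": dim background with one bright, isolated centre pixel.
toy_image = [[0.1, 0.0, 0.2],
             [0.1, 0.9, 0.0],
             [0.0, 0.1, 0.1]]
toy_labels = icm_segment(toy_image, means=[0.0, 1.0])
```

Each pixel is updated to the label that is most probable given its current neighbours, so the procedure converges to a local mode rather than a global optimum.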
Once the intensities are estimated, exploratory analysis of the group structure
in the microarray data may help explain the function of unknown genes by
relating them to known genes in the same group, or help diagnose a patient according
to the pattern observed in the corresponding gene expression data. We describe
the application of model-based clustering with Gaussian mixtures for the first
task, and of Potts model clustering for both tasks. This
latter model was first proposed by Blatt, Wiseman and Domany (1996) as a general
clustering method. We build on their work and show that Potts model clustering
is linked to the kernel K-means and MNCut methods, and hence shares their
good performance. We also show that slightly modified versions of Potts
model clustering and kernel K-means (a penalized Potts model clustering and
a weighted kernel K-means, respectively) solve the same problem, and introduce
an algorithm, a penalized version of the Wolff algorithm, to uncover the cluster
structure. We also note the link between kernel-based methods and non-parametric
kernel density estimation, and use it to propose several estimates of the kernel
bandwidths that improve the performance of the algorithms.
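As a point of reference for the kernel-based methods mentioned above, here is a minimal sketch of plain (unweighted) kernel K-means with a Gaussian kernel. It is not the penalized Potts model clustering or the weighted kernel K-means discussed in this work. The function names, the toy points and the fixed bandwidth are all assumptions made for illustration; in particular, the bandwidth is set by hand rather than estimated from the data.

```python
import math


def gaussian_kernel_matrix(points, bandwidth):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return K


def kernel_kmeans(K, initial_labels, n_clusters, max_iter=100):
    """Lloyd-style kernel K-means using only the kernel matrix K."""
    n = len(K)
    labels = list(initial_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(n_clusters)]
        # Squared norm of each cluster centroid in feature space.
        centroid_norm = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                         for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:  # skip empty clusters
                    continue
                cross = sum(K[i][j] for j in m) / len(m)
                # Squared feature-space distance from point i to centroid c.
                dist = K[i][i] - 2.0 * cross + centroid_norm[c]
                if dist < best_d:
                    best_c, best_d = c, dist
            new_labels.append(best_c)
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
    return labels


# Two well-separated toy groups, started from a deliberately mixed labelling.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_K = gaussian_kernel_matrix(toy_points, bandwidth=1.0)
toy_clusters = kernel_kmeans(toy_K, initial_labels=[0, 1, 0, 1, 0, 1], n_clusters=2)
```

The algorithm only ever touches the kernel matrix, which is why the choice of bandwidth matters so much: it determines which points look similar before any clustering takes place.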
The advantages of using probabilistic models for exploring group structure in
this kind of data are numerous. Among the most important is access to the
distribution of the cluster labels, and hence the possibility of drawing
statistical inferences about the cluster structure of the data.
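The point about having a distribution over cluster labels can be made concrete with a small sketch. The code below fits a two-component, one-dimensional Gaussian mixture by EM and returns, for each observation, the posterior probability of belonging to each component. The data, the initialization and the function names are hypothetical, and the example is far simpler than the multivariate models used for gene expression data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_labels(data, weights, means, variances):
    """P(component k | x) for every observation x (the E-step)."""
    posteriors = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        posteriors.append([d / total for d in dens])
    return posteriors


def fit_two_component_mixture(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (crude min/max initialization)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    for _ in range(n_iter):
        resp = posterior_labels(data, weights, means, variances)  # E-step
        for k in range(2):                                        # M-step
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a degenerate component
    return weights, means, variances


# Toy data: two well-separated groups of four observations each.
toy_data = [-0.2, 0.0, 0.1, 0.3, 4.8, 5.0, 5.1, 5.3]
toy_weights, toy_means, toy_vars = fit_two_component_mixture(toy_data)
toy_posteriors = posterior_labels(toy_data, toy_weights, toy_means, toy_vars)
```

Each row of `toy_posteriors` is a probability distribution over labels for one observation, so uncertainty about an assignment can be reported directly instead of being hidden behind a hard partition.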
The work on cDNA microarray images was done in collaboration with R. Gottardo,
J. Besag and M. Stephens. The work on Gaussian model-based clustering was done
in collaboration with K. Y. Yeung, C. Fraley, A. E. Raftery and W. L. Ruzzo. The
work on Potts model clustering was done in collaboration with L. Stanberry and
W. Stuetzle.