Send email to user phuggins, at andrew[dotter]cmu[dotter]edu.
Office: Mellon Inst 401
Research Interests
During my PhD I worked on algebraic statistics and computational geometry,
with applications in sequence evolution. More recently I've been
thinking about statistics and machine learning in computational biology:
Automated (active) choice of experiments
Learning of metabolic pathways and cis-regulatory modules often requires performing many experiments of the same type, where each experiment tests a single gene/metabolite/etc.
The sequence of experiments that yields a particular logical deduction (or statistical inference) is typically not unique.
Ideally we'd like to use the most cost-efficient sequence of experiments to yield each desired discovery. But a priori we don't know what the discoveries will be.
Actively choosing experiments is like playing the popular game MasterMind. To choose which experiment to perform next, we consider the results of previous experiments, so that each new experiment carries as little redundant information as possible. This effectively reduces the number of experiments needed to make discoveries.
Automated active choice of experiments uses statistical machine learning (specifically, active learning) to choose each experiment. As more complicated pathways and motif modules are studied, automated choice of experiments promises to greatly reduce the time and materials needed to make discoveries. Ultimately, active learning machines might even formulate and test hypotheses which are too complicated to be easily understood by humans.
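The MasterMind analogy can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the method used in my work: the "experiment" is a MasterMind guess, the hidden "truth" is a secret code, and the next experiment is chosen greedily as the candidate whose outcome distribution has the highest Shannon entropy over the hypotheses still consistent with past results. The names `feedback`, `outcome_entropy`, `choose_experiment` and `run_experiments`, and the toy sizes (4 colors, 3 pegs), are all hypothetical.

```python
import itertools
import math
from collections import Counter


def feedback(guess, secret):
    """MasterMind result: (right color and position, right color only)."""
    exact = sum(g == s for g, s in zip(guess, secret))
    common = sum((Counter(guess) & Counter(secret)).values())
    return exact, common - exact


def outcome_entropy(guess, candidates):
    """Entropy (bits) of the result of `guess`, treating every remaining
    candidate as equally likely to be the truth (an assumption)."""
    counts = Counter(feedback(guess, c) for c in candidates)
    n = len(candidates)
    return -sum(k / n * math.log2(k / n) for k in counts.values())


def choose_experiment(candidates):
    """Greedy rule: pick the candidate whose result is least predictable,
    i.e. least redundant with what earlier experiments already told us."""
    return max(candidates, key=lambda g: outcome_entropy(g, candidates))


def run_experiments(secret, colors=4, pegs=3):
    """Repeat choose -> observe -> prune until the secret is identified."""
    candidates = list(itertools.product(range(colors), repeat=pegs))
    history = []
    while True:
        guess = choose_experiment(candidates)
        result = feedback(guess, secret)
        history.append((guess, result))
        if result == (pegs, 0):
            return history
        # Keep only hypotheses consistent with the observed result.
        candidates = [c for c in candidates if feedback(guess, c) == result]
```

Calling `run_experiments((2, 0, 3))` returns the list of (guess, result) pairs that were tried, ending with the secret itself. The greedy entropy rule is one common heuristic; real experiments would also need costs and noisy outcomes, which this sketch ignores.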
Multiple hypothesis testing for SNP and microarray analysis
Multiple hypothesis testing (MHT) procedures test collections of null hypotheses about marginal distributions, and maintain Type I error control without requiring any assumptions on the alternative hypotheses. I am currently applying transfer learning techniques to learn more powerful (application-dependent) rejection regions for MHT procedures in biology. In effect, the learned rejection regions capture suspected structure in alternative hypotheses -- but the suspected structure is not needed to maintain Type I error control.
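A textbook procedure with the same flavor is the weighted Bonferroni test, sketched below. It is not the transfer learning approach described above, only a simple stand-in: each hypothesis gets its own rejection threshold, and the family-wise Type I error rate stays at most `alpha` no matter how well the weights match the true alternatives, provided the weights are fixed without looking at the p-values being tested. The function name and the example numbers are hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Reject H_i when p_i <= alpha * w_i / sum(w).

    The per-hypothesis thresholds sum to alpha, so by the union bound the
    probability of any false rejection is at most alpha. The weights are
    assumed to come from prior knowledge or independent data, never from
    the p-values themselves.
    """
    if len(p_values) != len(weights):
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [p <= alpha * w / total for p, w in zip(p_values, weights)]


# Made-up p-values for four hypotheses.
p = [0.001, 0.02, 0.04, 0.5]

# Equal weights reduce to the ordinary Bonferroni correction.
plain = weighted_bonferroni(p, [1, 1, 1, 1])

# Up-weighting the second hypothesis (suspected structure) enlarges its
# rejection region and shrinks the others; error control is unchanged.
informed = weighted_bonferroni(p, [1, 5, 1, 1])
```

With these numbers the equal-weight call rejects only the first hypothesis, while the informed call also rejects the second. If the suspected structure is wrong, power is lost but Type I error control is not.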
Combined analysis of heterogeneous biological data
Problems such as transcription factor binding site (TFBS) motif discovery and
clustering of genes based on time series are properly viewed as interdependent: knowledge of TFBS motifs can improve clustering of genes, and vice versa. An even more classical example of interdependent biological problems is the pair of phylogenetic reconstruction and multiple sequence alignment. I am particularly interested in combining sequence, expression, phylogeny, and regulatory network data in multiple species, to simultaneously understand evolution of systems, and improve inferences in individual species.
Articles
14. Cross species analysis of microarray expression data
Bioinformatics 2009; doi: 10.1093/bioinformatics/btp247
(joint work with Y. Lu and Z. Bar-Joseph)
13. Parametric k-best alignment in preparation
12. Parametric analysis of alignment and phylogenetic uncertainty submitted
(joint work with N. Eriksson and A. Malaspinas)
11. First steps toward the geometry of cophylogeny submitted
(joint work with M. Owen and R. Yoshida)
10. Selecting universities: personal preference and rankings submitted
(joint work with L. Pachter)
9. On the optimality of the neighbor-joining algorithm Algorithms for Molecular Biology, Volume 3 (2008)
(joint work with K. Eickmeyer, L. Pachter and R. Yoshida)
8. Towards the human genotope Bulletin of Mathematical Biology, Volume 69, Number 8, (2007), p 2723--2725.
(joint work with L. Pachter, B. Sturmfels)
7. iB4e: A Software Framework for Parametrizing Specialized LP Problems A Iglesias, N Takayama (Eds.): Mathematical Software - ICMS 2006, Second International Congress on Mathematical Software, Castro Urdiales, Spain, September 1-3, 2006, Proceedings. Lecture Notes in Computer Science 4151 Springer 2006, ISBN 3-540-38084-1 (pp. 245-247)
6. Parametric alignment of Drosophila genomes PLoS Computational Biology, Volume 2, Number 6 (2006) p e73.
(joint work with C. Dewey, L. Pachter, B. Sturmfels, and K. Woods)
5. The hyperdeterminant and triangulations of the 4-cube Mathematics of Computation, in press.
math.CO/0602149.
(joint work with B. Sturmfels, J. Yu, and D. Yuster)
4. Fairground game computations Significance, Letters, Volume 2, Issue 2, (June 2005) p92.
(joint work with J. B. Kadane and R. Yoshida)
3. A computational study of integer programming algorithms based on Barvinok's rational functions Discrete Optimization, Volume 2, Issue 2, 30 June 2005, Pages 135-144
(joint work with J.A. De Loera, D. Haws, R. Hemmecke, and R. Yoshida)
2. Three kinds of integer programming algorithms based on Barvinok's rational functions Integer Programming and Combinatorial Optimization: 10th International IPCO Conference. Lecture Notes in Computer Science, Volume 3064, Jan 2004, Pages 244-255
(joint work with J.A. De Loera, D. Haws, R. Hemmecke, and R. Yoshida)
1. Short rational functions for toric algebra and applications Journal of Symbolic Computation, Volume 38, Issue 2, August 2004, Pages 959-973
(joint work with J.A. De Loera, D. Haws, R. Hemmecke, B. Sturmfels, and R. Yoshida)
From CMU
My current advisor, Ziv Bar-Joseph
My fellow Lane Fellows, Arvind Rao and Le Song
Jay Kadane
The Cal days
My PhD advisor, Lior Pachter
My PhD advisor, Bernd Sturmfels
The Fellowship of the Prelim, Nick Bray and Lynn Scow
Dear friend and coauthor, Ruriko Yoshida
Dear friend and coauthor, Anna-Sapfo Malaspinas
Nick Eriksson
Colin Dewey
Kevin Woods
Josephine Yu
Debbie Yuster
The UC Davis days
The graphics guy I'll want to be my best man some day, Tom Slankard
My undergraduate advisor, Jesus De Loera
The other two thirds of the Three Sombreros, Aaron Balog and John Hamilton
Dave Haws
Raymond Hemmecke