BLOG
- 9/28/07
Fall semester begins, and the race is on to finish papers and my thesis.
I've realized that a good way to look at a sequence alignment, an HMM, or a
general graphical model is as a polynomial function F whose variables
are the parameters that give the transition probabilities. F
always evaluates to 1 for valid choices of transition probabilities,
since it sums the probabilities of all possible outcomes, but I'm actually
interested in the structure of the polynomial itself.
For example, estimates on parameters induce a term order on F, and
Taylor polynomials of F can be computed quickly via dynamic programming and its
variants. I think these Taylor polynomials could be quite useful in practice.
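To make the "F is a polynomial" view concrete, here is a minimal sketch under a deliberately simple assumption: a hypothetical two-state Markov chain that always starts in state 0 and has no emissions. F sums, over all n-step state paths, the product of the transition variables t_ij along the path, and it is built with a forward-algorithm-style DP whose cells hold polynomials instead of numbers. The names (path_polynomial, t_ij, the dict-of-exponents representation) are illustrative, not taken from any existing code.

```python
from fractions import Fraction

# Hypothetical toy model: 2-state Markov chain, always started in state 0.
# Variable t_ij (stored at index i * NUM_STATES + j) is the probability of i -> j.
NUM_STATES = 2
NUM_VARS = NUM_STATES * NUM_STATES

# A polynomial is a dict mapping exponent tuples (one entry per variable)
# to integer coefficients.

def var_index(i, j):
    return i * NUM_STATES + j

def poly_add(p, q):
    out = dict(p)
    for mono, coeff in q.items():
        out[mono] = out.get(mono, 0) + coeff
    return {mono: c for mono, c in out.items() if c != 0}

def poly_times_var(p, k):
    # Multiply every monomial by variable k (bump its exponent by one).
    out = {}
    for mono, coeff in p.items():
        exps = list(mono)
        exps[k] += 1
        out[tuple(exps)] = coeff
    return out

def path_polynomial(n_steps, start=0):
    """F = sum over all n-step paths of the product of their transition variables.

    alpha[j] holds the polynomial summing over all paths that currently end in j.
    """
    alpha = [dict() for _ in range(NUM_STATES)]
    alpha[start] = {tuple([0] * NUM_VARS): 1}
    for _ in range(n_steps):
        new_alpha = [dict() for _ in range(NUM_STATES)]
        for i in range(NUM_STATES):
            for j in range(NUM_STATES):
                step = poly_times_var(alpha[i], var_index(i, j))
                new_alpha[j] = poly_add(new_alpha[j], step)
        alpha = new_alpha
    F = {}
    for poly in alpha:
        F = poly_add(F, poly)
    return F

def evaluate(poly, values):
    # Exact evaluation with Fractions, so "equals 1" can be checked exactly.
    total = Fraction(0)
    for mono, coeff in poly.items():
        term = Fraction(coeff)
        for value, exp in zip(values, mono):
            term *= Fraction(value) ** exp
        total += term
    return total
```

For example, with t00 = 1/3, t01 = 2/3, t10 = 1/4, t11 = 3/4 (each row sums to 1), evaluate(path_polynomial(3), ...) comes out to exactly 1, while setting every variable to 1 just counts the 2^3 = 8 paths.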
- 5/10/07
Professor Rine floored me with the possibility that heritable
chemical landscapes might influence phenotype during development,
even in the second generation,
and that maybe, just maybe, this provides mechanisms (under selection)
that allow for temporary adaptations. As a comp bio guy,
I think I'm going
to follow the bio labs much more closely from now on.
- 4/25/07
I've started a collaboration with Nick Eriksson and
Anna-Sapfo Malaspinas (from the Slatkin lab),
applying polytopes to parametric analysis of multiple sequence alignment
and inferred phylogenetic trees. That got me thinking.
An optimal alignment is (basically) a MAP inference, which is the answer you should give at gunpoint if you have one chance to guess the entire alignment correctly.
Polytopes allow a geometric
study of how summary stats of this best guess will change, as scoring parameters vary.
It's a good message:
"Always practice safe MAP inference, and make sure you use a polytope every time."
On the other hand, who cares that the MAP was A (or perhaps A')
if A and A' each had probability 0.1%?? Intuitively we like
optimal alignments because we hope they're "close" to the
true alignment X somehow, even though we can never know what the true
alignment X was. So why not cut to the chase and start finding statements
about X that we can honestly say we're 95% sure about, in some rigorous sense?
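One way to put a number on the 0.1% worry is sketched below. It assumes, for illustration only, a Gibbs-style distribution over global alignments, P(A) proportional to exp(score(A)), rather than any particular pair-HMM. Under that assumption the same alignment DP run with max gives the MAP score, run with log-sum-exp gives the log partition function log Z, and the MAP alignment's probability is exp(best - log Z). The scores (match 1, mismatch -1, linear gap -2) and function names are arbitrary, hypothetical choices.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def alignment_dp(x, y, combine, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global alignment DP over all alignments of x and y.

    combine=max gives the optimal (MAP) score; combine=logsumexp gives log Z,
    the log of the sum of exp(score) over every alignment.
    """
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = combine([D[i - 1][j - 1] + s,
                               D[i - 1][j] + gap,
                               D[i][j - 1] + gap])
    return D[n][m]

def map_alignment_probability(x, y, **scores):
    # Probability of a single best-scoring alignment (if several tie for
    # the best score, each of them has this probability).
    best = alignment_dp(x, y, max, **scores)
    log_z = alignment_dp(x, y, logsumexp, **scores)
    return math.exp(best - log_z)
```

Even for identical one-letter sequences, the MAP alignment (a single match) gets probability 1/(1 + 2e^-5), roughly 0.987, because two gapped alignments compete with it; for longer, less similar sequences the mass spreads over many more alignments.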
- 4/1/07
New projects now
abound, looking at the polyhedral sides of phylogenetics.
- 3/11/07
I am now officially a
computer scientist out of the closet. And I'm proud!
- 3/9/07
Back from the IMA conference on applications of algebraic geometry
in biology, dynamics and statistics! Met a lot of people and scribbled
in a lot of notebooks.
Promising results from ad hoc
use of invariants for testing statistical models. I think it's time
for statistics to serve the algebra now. How many invariants,
and which invariants should
we use to maximize sensitivity/specificity? How do we find the good invariants
quickly? What are the running times? What are the possible trade-offs among
these considerations? Are invariant methods
actually superior to existing methods?
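For readers new to the idea, here is a toy illustration of what "using an invariant" means, assuming the simplest textbook case rather than the phylogenetic invariants discussed at the conference: for two independent binary variables, every joint distribution satisfies p00*p11 - p01*p10 = 0, so the size of that expression on empirical frequencies is a crude statistic against independence. Turning it into a calibrated test (null distribution, which invariants to combine) is exactly the kind of open question listed above. Function names are hypothetical.

```python
from fractions import Fraction

def independence_invariant(p):
    """Evaluate the 2x2 determinant invariant p00*p11 - p01*p10.

    It vanishes on every joint distribution of two independent binary
    variables, so a large value on data is evidence against independence.
    """
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]

def empirical_table(counts):
    # Turn a 2x2 table of counts into exact empirical frequencies.
    total = sum(sum(row) for row in counts)
    return [[Fraction(c, total) for c in row] for row in counts]
```

For instance, the perfectly correlated counts [[10, 0], [0, 10]] give an invariant of 1/4, while any exact product distribution gives 0.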
- 2/26/07
I've decided to start investigating algorithms for computing Gröbner
bases. Sugar (one lump, double lump) looks very nice, and I still
have some more papers to read. I wonder
if there's something Tarjan-crazy that can be done.
- 2/22/07
I've decided to put up some unpublished notes from class projects.
- 2/18/07
How the heck did a whole month go by with no journal entries... hmm... well,
I'm now into the swing of things as a Bio 1a math consultant. My submission
to Integer Points... got rejected. Hmpf.
- 1/19/07
A day that will live in infamy: hanuman.math went down, and I lost a bunch of code.
I won't miss many of the little utilities, but iB4e has been set back to its original release, version 0.1.
- 1/9/07
The joint meetings in New Orleans were interesting; I chummed around with some guys from Caltech who were there for the interview convention. Talked to Guiterrez and Hurdal from Florida State, each of whose work I might try to apply to our group's comp bio projects.
- 12/15/06
Finished BioE project on DNA corpus compression and information theory,
and administered and graded the final exam for Math 1b. Phew!
- 12/5/06
I've started working with J Yu and B Sturmfels on tropical stuffs
(implicitization and complete intersections) and the resulting polytope
and symbolic algebra computations.
- 12/1/06
Finished a write-up on extracting lattice points from approximate
rational function encodings, for the Snowbird Integer Points...
proceedings.
- 11/26/06
Found out that multiple restarts can find modestly
better archetypes for Jane's breast cancer data (project with Lior and Jane).
Many full-length restarts take too long, so I coded an effective hybrid approach:
- Use lax stopping criteria (big epsilons) and restart many times.
- Choose the best approximate optimum found in step 1,
and use it as the starting point for one long run with stringent stopping criteria.
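The two-step recipe above can be sketched as follows. This is a minimal sketch, not the archetype code itself: plain 1-D gradient descent and a hypothetical double-well objective stand in for the actual archetype-fitting routine and data. Only the restart logic (many cheap runs with a big epsilon, then one long run from the best candidate with a small epsilon) mirrors the list; the tolerances, step size, and sampling range are illustrative assumptions.

```python
import random

def gradient_descent(f, grad, x, step, tol, max_iter):
    """Plain 1-D gradient descent; stops once f changes by less than tol."""
    fx = f(x)
    for _ in range(max_iter):
        x_new = x - step * grad(x)
        f_new = f(x_new)
        if abs(fx - f_new) < tol:
            return x_new
        x, fx = x_new, f_new
    return x

def hybrid_restarts(f, grad, sample_start, n_restarts, lax_tol, strict_tol,
                    step=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: many cheap runs with a lax stopping criterion (big epsilon).
    candidates = [gradient_descent(f, grad, sample_start(rng), step, lax_tol, 1_000)
                  for _ in range(n_restarts)]
    # Step 2: one long run from the best approximate optimum, stringent criterion.
    best = min(candidates, key=f)
    return gradient_descent(f, grad, best, step, strict_tol, 1_000_000)

# Hypothetical double-well objective standing in for the archetype fit:
# global minimum near x = -1.30, worse local minimum near x = 1.13.
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

x_star = hybrid_restarts(f, grad_f, lambda rng: rng.uniform(-2.0, 2.0),
                         n_restarts=20, lax_tol=1e-3, strict_tol=1e-12)
```

The design choice is simply to spend the expensive, precise convergence on one start instead of twenty; the lax runs only need to be good enough to rank basins.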
- 11/22/06
Finished reading the first half of MacKay's Information Theory, Inference, and Learning Algorithms.
Great stuff. Armed with the information theory, I'm now tackling my BioE project,
measuring the information content of genome databases and asking whether it can all
fit on a personal hard drive.
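A first-pass version of that measurement is sketched below, as an assumption about where one might start rather than what the project actually computed: the order-0 empirical entropy gives the bits per base an ideal coder would need if bases were independent draws from their observed frequencies. With a 4-letter alphabet this is at most 2 bits per base; real compressors exploit longer-range structure and can do better. The function names are hypothetical, and the test strings are made up, not genome data.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(seq):
    """Order-0 (single-letter) empirical entropy of seq, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def estimated_size_bytes(num_bases, bits_per_base):
    """Storage needed for num_bases symbols at bits_per_base bits each."""
    return num_bases * bits_per_base / 8
```

For example, entropy_bits_per_symbol("ACGT" * 10) is 2.0 (all four bases equally frequent), and a string of a single repeated base has entropy 0.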
- 11/18/06
Created a supplemental webpage for the Human genotope paper.