Machine Learning in Quantum Chemistry
This project uses new forms of computational thinking to develop algorithmically tractable approaches to modeling chemical reactions in complex environments
and generalizes these techniques to related scientific computing challenges. Fundamental advances in predictive modeling can have far-reaching
impacts on fields such as biology and nanotechnology. For instance, it seems likely that we will soon be able to predict protein structure
from genetic sequence, opening up the next great challenge of predicting function from structure. Biological function often involves chemical reactions. For
instance, computing the function of an enzyme requires generating the energy along the various plausible pathways through which the enzyme can catalyze a biological
reaction. Such modeling requires computing the energy associated with a particular arrangement of atoms in a molecule. Quantum chemistry has developed a large
set of algorithms for computing this energy, but with a computational cost that increases rapidly with the size of the system. Fortunately, reactions typically occur
in a small locus, the reaction center, of a larger system. The surroundings of the reaction center are important, but primarily through establishment of an environment
for the reactive atoms. This makes it possible to use quantum mechanics (QM) to describe a handful of atoms in the reaction center and molecular mechanics (MM) for the
thousands of atoms in the remainder of the system. QM models the motion of the electrons and is required in the reaction center since breaking and forming chemical
bonds corresponds to rearranging the electronic structure. Molecular mechanics does not include electronic motion explicitly in the computation and instead uses a ball-and-spring model,
in which atoms are charged spheres and chemical bonds are springs. Although QM is used only in the reaction center, it is often the computational bottleneck in hybrid QM/MM computations
because of its high cost. This project seeks to reduce that cost substantially, thereby extending the reach of QM/MM models.
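The ball-and-spring picture of molecular mechanics described above can be made concrete with a minimal sketch. It assumes only harmonic bond springs and Coulomb interactions between non-bonded charged spheres; production force fields also include angle, torsion, and van der Waals terms, which are omitted here. The function name `mm_energy`, the layout of the `bonds` dictionary, and the unit convention (kcal/mol, Angstrom, elementary charges) are illustrative assumptions, not part of the project.

```python
import math

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
# Force fields differ slightly in the value they use (assumption).
COULOMB_K = 332.06


def mm_energy(coords, charges, bonds):
    """Toy MM energy: harmonic springs for bonds, Coulomb for other pairs.

    coords  : list of (x, y, z) positions in Angstrom
    charges : list of partial charges in units of e
    bonds   : dict mapping (i, j) -> (spring constant, equilibrium length)
    """
    energy = 0.0
    bonded = set()

    # Chemical bonds are springs: E = 1/2 * k * (r - r_eq)^2
    for (i, j), (k_spring, r_eq) in bonds.items():
        r = math.dist(coords[i], coords[j])
        energy += 0.5 * k_spring * (r - r_eq) ** 2
        bonded.add(frozenset((i, j)))

    # Atoms are charged spheres: E = K * q_i * q_j / r for non-bonded pairs
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded:
                continue
            r = math.dist(coords[i], coords[j])
            energy += COULOMB_K * charges[i] * charges[j] / r

    return energy
```

Each term is a closed-form function of interatomic distances, which is why the MM part stays cheap even for thousands of atoms, while the QM part requires solving for the electronic structure.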
An additional challenge of modeling reactions in complex environments is the need to average over the many configurations the system may adopt at non-zero temperature,
to estimate the important contribution of entropy to the free energy. This is typically done through molecular dynamics (MD), which generates a trajectory that gives
the structure of the system as a function of time. Obtaining meaningful free energy profiles requires MD trajectories with millions of time steps. At each of these time steps,
the QM/MM algorithm must be called to generate estimates of the atomic forces.
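To illustrate why the per-step force call dominates the cost, here is a minimal sketch of an MD loop using the velocity Verlet integrator, a common choice that the text does not itself specify. The one-dimensional setting, the `CountingForce` class, and the harmonic force are hypothetical stand-ins for a real QM/MM force evaluation.

```python
def velocity_verlet(x, v, mass, force_fn, dt, n_steps):
    """Integrate 1-D motion; force_fn stands in for the QM/MM force call."""
    trajectory = [x]
    f = force_fn(x)  # initial force evaluation
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # one force evaluation per time step
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append(x)
    return trajectory, v


class CountingForce:
    """Harmonic force F = -k*x that records how often it is evaluated."""

    def __init__(self, k=1.0):
        self.k = k
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return -self.k * x


force = CountingForce()
trajectory, v_final = velocity_verlet(1.0, 0.0, 1.0, force, dt=0.01, n_steps=1000)
```

After the run, `force.calls` equals the number of time steps plus one. With millions of steps, any cost in the force routine is multiplied accordingly, which is the motivation for replacing the high-level QM call with a cheaper model.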
Quantum chemistry has made great strides in the past few decades and has essentially solved the electronic structure problem for small molecules.
A wide variety of methods are now available with differing reliability and with CPU times ranging from seconds to days. The required level of computation varies widely,
both with system and with position along the reaction coordinate. The energy and nature of the transition state (TS), where bonds are partially broken and formed,
is the key feature that establishes the properties of the chemical reaction. Unfortunately, the TS region also requires quantum chemical methods with the highest computational cost,
the most reliable of which are much too costly to invoke at each time step in an MD calculation.
We propose a machine-learning based alternative to calling a high-level quantum chemical algorithm at each step in the MD calculation.
Our approach will first generate detailed quantum chemical data on the electronic structure of the reaction center in a variety of configurations and electrostatic environments
that span those expected to arise during the MD calculation. Machine learning will then be used to extract a low-cost model of the electronic structure of the reactive atoms,
so that this model can be used in a full simulation of the biomolecular reaction. The development and training of this model will rely on multiple levels of quantum chemical theory that vary in both reliability and cost.
Orchestrating these various sources of data is the type of challenge that machine learning is meant to address. Our first paper, "Using molecular similarity to develop reliable models of chemical reactions in complex environments," recently appeared in JCTC.
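One common way to combine a cheap and an expensive level of theory is to learn only the difference between them. The sketch below illustrates that general idea with Gaussian-kernel ridge regression on a one-dimensional toy coordinate. It is not the method of the paper cited above; the functions `low_level` and `high_level` are made-up stand-ins for quantum chemical energies, and the kernel, length scale, and ridge value are arbitrary assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)


def fit_delta_model(x_train, e_low, e_high, length_scale=0.5, ridge=1e-6):
    """Fit a kernel ridge model to the correction e_high - e_low."""
    k = gaussian_kernel(x_train, x_train, length_scale)
    alpha = np.linalg.solve(k + ridge * np.eye(len(x_train)), e_high - e_low)

    def predict(x_new, e_low_new):
        # Cheap low-level energy plus the learned correction
        return e_low_new + gaussian_kernel(x_new, x_train, length_scale) @ alpha

    return predict


# Hypothetical stand-ins for two levels of theory along a reaction coordinate
def low_level(x):
    return (x ** 2 - 1.0) ** 2


def high_level(x):
    return (x ** 2 - 1.0) ** 2 + 0.3 * np.sin(2.0 * x)


# A small set of "expensive" training points, then prediction elsewhere
x_train = np.linspace(-1.5, 1.5, 12)
predict = fit_delta_model(x_train, low_level(x_train), high_level(x_train))

x_test = np.linspace(-1.4, 1.4, 50)
max_error = np.max(np.abs(predict(x_test, low_level(x_test)) - high_level(x_test)))
```

In this toy setting the expensive function is evaluated only at the training points; every later prediction needs just the cheap function and one kernel evaluation. Whether such a correction is smooth enough to learn for a real reaction center is an empirical question that the sketch does not address.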