Our research interests involve theoretical chemistry, particularly
as it applies to biopolymers, macromolecules, condensed phases,
and disordered systems. We are developing computational methods
for understanding and designing molecular systems having many
physical and chemical degrees of freedom. In addition,
we use molecular simulation techniques both to study chemical
systems in molecular detail and to test and illustrate our theories.
Most of our research involves applications of statistical mechanics.

Theories for Sequence Ensembles: proteins, protein design,
combinatorial synthesis, and sequence variability

Protein folding spans biology, physics, and chemistry and has
applications to biomedicine and biomaterials. Since proteins are
the direct products of genes, folding is fundamental to the expression
of genetic information in the cell. Experimentally determining
the structures of proteins, however, which is an important part
of understanding their function, remains a time-intensive task.
A quantitative and predictive understanding of protein folding
will accelerate the interpretation of genomic information. Folding
is also of fundamental physical interest, since it involves spontaneous
ordering at the molecular scale. With few exceptions, proteins
fold reversibly to unique structures. The three-dimensional folded
structure of a protein is encoded in its sequence of amino acids.
Thus we may be able to predict structure from sequence alone and
to design desired folded structures through careful choice of
sequence. Important goals include determining structure from gene
sequence, re-engineering existing proteins, and crafting new ones.
Using synthetic sequences, features important in protein stability
and folding kinetics may be probed via selective mutations. Once
particular structures can be successfully designed, novel functional
proteins can be crafted. Already these ideas are being expanded
beyond the naturally occurring biopolymers and being applied to
nonbiological "foldamers." Folding polymers, both biological
and synthetic, can provide new types of structures and properties
and lead to novel pharmaceuticals, catalysts, and materials.

By synthesizing large numbers of peptide sequences, researchers
can not only enhance their chance of discovering sequences that
fold to a particular structure but also stand to learn more concerning
the properties that foldable sequences share. Recently, combinatorial
methods have become available that allow researchers to synthesize
and keep track of very large numbers of sequences (> 10^6).
Researchers have found peptides having protein-like properties
in libraries of partially random sequences. From the viewpoint
of designing new proteins and understanding mutational variability,
it would be helpful to know beforehand the number of sequences that
are likely to fold to a given structure and what those sequences
might be, at least in some average sense. Given the exponentially
large number of possible sequences of even a small protein having
only 100 residues, about 10^130 sequences, obtaining
an understanding of the library of all possible sequences seems
at first glance to be impossible. However, counting large numbers
of configurations is a task well-suited to statistical mechanics.
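
As a quick check on the size of this sequence space, here is a minimal sketch of the arithmetic. It assumes each of the 100 positions can independently be any of the 20 naturally occurring amino acids; the variable names are illustrative only.

```python
import math

NUM_AMINO_ACIDS = 20   # assumption: the 20 naturally occurring amino acids
CHAIN_LENGTH = 100     # residues, as in the example in the text

# Exact count, using Python's arbitrary-precision integers.
num_sequences = NUM_AMINO_ACIDS ** CHAIN_LENGTH

# Order of magnitude: log10(20^100) = 100 * log10(20).
log10_count = CHAIN_LENGTH * math.log10(NUM_AMINO_ACIDS)

# Fraction of the space covered by a library of 10^6 sequences.
log10_fraction = 6 - log10_count

print(f"log10(number of sequences) = {log10_count:.1f}")
print(f"log10(fraction sampled by a 10^6 library) = {log10_fraction:.1f}")
```

Even a library of a million sequences samples a vanishingly small fraction of the space, which is why explicit enumeration is not an option.
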
Using theory and simulation, our group is studying molecular
folding in proteins and polymers. We are developing tools that
estimate not only the number of sequences at different energies
but also the probability that each position in the chain is a
particular amino acid. The theory provides a convenient means
to evaluate combinatorial design strategies and probe the "designability"
of chosen target structures. This statistical formalism has a
structure very similar to statistical thermodynamics and draws
upon contemporary molecular modeling techniques to estimate the
number and composition of sequences that are likely to fold to
a given three-dimensional structure. The theory uses an entropy
formalism: just as constraints reduce the entropy in thermodynamics,
constraints can be introduced in our theory to focus
combinatorial libraries. Such constraints can be physical, e.g.,
the overall energy of sequences, or synthetic, e.g., the patterning
of amino acid properties. This theory yields the number and composition
of sequences likely to be compatible with a particular structure.
The theory takes as input a given target structure and a many-body
energy (or scoring) function. Because explicit enumeration is
avoided, the properties of an exponentially large number of possible
protein sequences can be addressed. We are using such statistical
methods to investigate folding in simple models of proteins. We
are also using the theory as a guide to understanding the variability
of naturally occurring protein sequences that fold to a common
structure. Via our collaborations with experimental groups, we
are also involved in the design of particular protein architectures
or nonbiological folding polymers.
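
The entropy idea can be illustrated with a deliberately simplified sketch. It is not the theory described above, which takes a many-body energy function as input. Instead it assumes each position contributes an independent energy, so that maximizing the sequence entropy at a fixed mean energy gives Boltzmann-like site probabilities w_i(a) proportional to exp(-beta * e_i(a)), with beta the Lagrange multiplier for the energy constraint. The three-letter alphabet, the toy energies, and all function names are hypothetical.

```python
import math

# Hypothetical toy inputs (not from the theory itself): a 3-letter alphabet
# and made-up per-site energies for a 4-residue chain, in arbitrary units.
ALPHABET = "HPG"
TOY_ENERGIES = [
    [0.0, 1.0, 2.0],
    [1.5, 0.0, 0.5],
    [0.3, 0.3, 0.0],
    [2.0, 1.0, 0.0],
]


def site_probabilities(site_energies, beta):
    """Maximum-entropy probabilities at one position: w(a) ~ exp(-beta * e(a))."""
    lowest = min(site_energies)  # shift energies to keep the exponentials tame
    weights = [math.exp(-beta * (e - lowest)) for e in site_energies]
    norm = sum(weights)
    return [w / norm for w in weights]


def sequence_entropy(energies, beta):
    """Total entropy S = -sum_i sum_a w_i(a) ln w_i(a) over all positions."""
    total = 0.0
    for site_energies in energies:
        probs = site_probabilities(site_energies, beta)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total


def most_probable_sequence(energies, beta):
    """Pick the highest-probability letter independently at each position."""
    letters = []
    for site_energies in energies:
        probs = site_probabilities(site_energies, beta)
        best = max(range(len(probs)), key=probs.__getitem__)
        letters.append(ALPHABET[best])
    return "".join(letters)


for beta in (0.0, 1.0, 2.0):
    s = sequence_entropy(TOY_ENERGIES, beta)
    print(f"beta={beta:.1f}  entropy={s:.3f}  effective sequences={math.exp(s):.1f}")
print("most probable sequence at beta=2:", most_probable_sequence(TOY_ENERGIES, 2.0))
```

With beta = 0 there is no energy constraint, and exp(S) recovers the full count of 3^4 = 81 sequences. Increasing beta tightens the constraint toward lower energies and shrinks the effective library, which is the sense in which constraints focus a combinatorial library.
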
Most probable amino acids in protein L as determined from
statistical theory. More conserved positions (higher probability)
are in red, less conserved in blue.