Computational Protein Design
The focus of the lab has been the coupling of theoretical, computational, and experimental approaches for the study of structural biology. In particular, we have placed a major emphasis on developing quantitative methods for protein design, with the goal of a fully systematic design strategy that we call "protein design automation." Our design approach is implemented in a suite of software programs called ORBIT (Optimization of Rotamers By Iterative Techniques), which has been applied to problems ranging from protein fold stabilization to enzyme design.
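Rotamer optimization of the kind ORBIT performs is commonly built on dead-end elimination (DEE), which is the subject of one of the selected publications. Below is a minimal sketch of the classic Goldstein singles DEE criterion followed by enumeration of the surviving rotamers. It runs on a made-up toy energy table and is not ORBIT's code. The names `pair`, `goldstein_dee`, `total_energy`, `gmec`, `E1` and `E2` are hypothetical.

```python
import itertools

def pair(E2, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j.
    E2 is keyed by (i, j) with i < j; this lookup works in either order."""
    return E2[(i, j)][r][s] if i < j else E2[(j, i)][s][r]

def goldstein_dee(E1, E2):
    """Iteratively remove rotamers that satisfy the Goldstein singles criterion.

    Rotamer r at position i is dead-ending if some other live rotamer t at i gives
        E1[i][r] - E1[i][t] + sum_{j != i} min_s [E(i_r, j_s) - E(i_t, j_s)] > 0,
    i.e. swapping r for t lowers the energy whatever the other positions choose,
    so r cannot be part of the global minimum energy conformation (GMEC).
    """
    n = len(E1)
    alive = [set(range(len(rots))) for rots in E1]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    gap = E1[i][r] - E1[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair(E2, i, r, j, s) - pair(E2, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def total_energy(E1, E2, choice):
    """Energy of one full rotamer assignment: self terms plus all pair terms."""
    e = sum(E1[i][r] for i, r in enumerate(choice))
    for i, j in itertools.combinations(range(len(choice)), 2):
        e += pair(E2, i, choice[i], j, choice[j])
    return e

def gmec(E1, E2):
    """Prune with DEE, then exhaustively search the smaller surviving rotamer space."""
    alive = goldstein_dee(E1, E2)
    space = itertools.product(*[sorted(a) for a in alive])
    best = min(space, key=lambda c: total_energy(E1, E2, c))
    return best, total_energy(E1, E2, best), alive

# Hypothetical toy energies (arbitrary units): 3 positions with 2, 3 and 2 rotamers.
E1 = [[0.0, 5.0], [1.0, 0.0, 3.0], [0.5, 0.0]]
E2 = {(0, 1): [[0.0, -1.0, 0.0], [0.0, 0.0, 0.0]],
      (0, 2): [[0.0, 0.0], [0.0, 0.0]],
      (1, 2): [[-2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]}

print(gmec(E1, E2))  # ((0, 1, 1), -1.0, [{0}, {0, 1}, {0, 1}])
```

On this toy table, DEE removes two of the seven rotamers. That shrinks the search from 12 combinations to 4, and the minimum found is the same one exhaustive search gives. The payoff grows rapidly with the number of positions, which is why pruning precedes any enumeration.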
Enzyme Design
A prominent goal of protein design is the generation of proteins with novel functions, including catalysis: the large rate enhancements of chemical reactions at which natural enzymes excel. The ability to design an enzyme for a given chemical reaction would have considerable practical application in industry and medicine. Significant progress has been made in enhancing the catalytic properties of existing enzymes; however, the design of proteins with novel catalytic properties has met with relatively limited success. We have developed and implemented a general computational approach for the design of enzyme-like proteins with novel catalytic activities. In addition to generating new catalysts, these methods will allow us to explore the mechanistic basis of enzymatic activity.
Recently we have been interested in creating a completely novel catalyst for the Claisen rearrangement of chorismate to prephenate. This reaction is naturally catalyzed by the chorismate mutases, and it has many features that make it a good early test of enzyme design methods. It is a first-order, [3,3]-sigmatropic rearrangement of a single substrate, with no intermediate steps and no involvement of catalytic groups such as general acids or bases. The reaction has been extensively studied in many contexts: as a rare enzyme-catalyzed pericyclic process, as an essential step in the biosynthesis of aromatic compounds, and as an example of a reaction that proceeds through the same mechanism enzymatically and in solution. Our enzyme design method identifies amino acid sequences likely to bind the transition-state structure of the chorismate-prephenate rearrangement. As part of this process, we are testing the ability of our method to predict mutations that enhance the activity of the naturally occurring E. coli chorismate mutase. The computationally designed Ala32Ser mutation yields an enzyme with measurably enhanced activity.
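Designing for transition-state binding rests on transition-state theory. Suppose a catalyst lowers the activation free energy by ΔΔG‡ relative to the uncatalyzed reaction. If both reactions share the same Eyring prefactor, the rate ratio is exp(ΔΔG‡/RT). The short sketch below makes that arithmetic concrete. It is a textbook idealization, not a model of the designed enzymes, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def rate_enhancement(ddg_kcal, temp_k=298.15):
    """Fold increase in rate (k_cat / k_uncat) when the activation free energy
    is lowered by ddg_kcal kcal/mol, assuming the same k_B*T/h prefactor for both."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

def barrier_lowering(fold, temp_k=298.15):
    """Inverse relation: kcal/mol of transition-state stabilization needed for a given fold increase."""
    return R_KCAL * temp_k * math.log(fold)

print(round(barrier_lowering(10), 2))  # 1.36 (kcal/mol per factor of 10 at 25 °C)
```

The exponential dependence explains why even a modest extra stabilization of the transition state, such as that contributed by a single designed mutation, can produce a measurable change in activity.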
Protein Sequence Evolution
In protein evolution, mutations in genes are subject to selection based on the proteins those genes encode. Although many different protein sequences fold into the same structure, the mechanism by which natural selection generates these varied sequences remains an open question. It is widely believed that one sequence can evolve from another through a series of single amino acid mutations, each of which preserves the overall fold. In this view, each 3D structure is associated with a network of sequences connected to one another by energetically neutral point mutations. This neutral network hypothesis is supported by experimental data for folded RNA structures, but direct evidence for the neutral evolution of proteins remains elusive. A method for finding neutral pathways between sequences that fold to the same structure could provide information about the evolutionary relationships between proteins. Furthermore, determining neutral trajectories through mutation space may shed light on the biophysical consequences of specific mutations and might suggest improvements to existing protein design strategies. We have developed a computational procedure that finds energetically favorable pathways between two proteins that have similar structures and differ by a fixed set of amino acid mutations. Our program randomly generates orderings of the mutations that lead from one sequence to the other. It evaluates the energy of each resulting sequence using a fast side-chain placement calculation and a physical force field with continuum solvation. We are currently applying this procedure to protein G and protein L. These are two immunoglobulin-binding proteins displayed on the cell surfaces of certain infectious bacteria to evade immune recognition by the host. The two proteins share a common fold topology but have less than 20% sequence identity.
Our calculations indicate that a large number of potential neutral trajectories connect these two proteins. We are expressing several proteins along one particular trajectory to assess how well theory and experiment agree.
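The random pathway search described above can be sketched as Monte Carlo sampling over orderings of a fixed mutation set. This is a minimal sketch under stated assumptions. `toy_energy` is a hypothetical stand-in for the side-chain placement and force-field calculation. The "neutral" criterion, that no intermediate rises more than `max_increase` above the worse endpoint, is an assumed definition for illustration. All names here are hypothetical.

```python
import random

def mutation_path(seq_a, seq_b, order):
    """Sequences visited when the mutations at positions `order` are applied one at a time."""
    current = list(seq_a)
    path = [seq_a]
    for pos in order:
        current[pos] = seq_b[pos]
        path.append("".join(current))
    return path

def sample_neutral_paths(seq_a, seq_b, energy, max_increase, n_trials, seed=0):
    """Randomly order the point mutations separating seq_a and seq_b and keep
    orderings whose every intermediate scores at most max_increase above the
    worse of the two endpoints."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    sites = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    cutoff = max(energy(seq_a), energy(seq_b)) + max_increase
    neutral = set()
    for _ in range(n_trials):
        order = sites[:]
        rng.shuffle(order)
        if all(energy(s) <= cutoff for s in mutation_path(seq_a, seq_b, order)):
            neutral.add(tuple(order))
    return neutral

def toy_energy(seq):
    """Hypothetical stand-in energy: +2 for every 'W' immediately followed by 'A'."""
    return 2.0 * sum(1 for x, y in zip(seq, seq[1:]) if x == "W" and y == "A")

print(sample_neutral_paths("AAA", "WWW", toy_energy, max_increase=1.0, n_trials=200))
```

Random sampling stands in for exhaustive enumeration because n mutations admit n! orderings, far too many to score one by one once n is more than a handful.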
Protein-Protein Recognition
Biologically functional proteins often act by interacting with other components of the cell, so protein-protein association plays a central role. Proteins can bind directly to their targets to carry out a function, or they can bind specifically to copies of themselves, forming higher-order structures to perform their duties. We are interested in learning how proteins use their surface residues to interact with other proteins, and how protein backbone geometry influences complex formation. Previous efforts in designing protein-protein interfaces have focused on altering binding specificities. These methods fall short, however, when applied to the design of novel binding sites, because protein backbones are difficult to model accurately. Our short-term goal is to create novel dimers from monomeric proteins. We developed a docking algorithm that positions the protein subunits in plausible configurations relative to each other, using parameters derived from the structures of known protein complexes. The docking procedure treats the proteins as rigid bodies. It uses the Fourier correlation theorem and the fast Fourier transform to efficiently search for dimers with the highest interfacial surface complementarity. We used the docked structures as scaffolds for protein design and employed hydrophobic surface residues to drive dimer formation. This produced two successful designs, a heterodimer and a homodimer, starting from protein G and the engrailed homeodomain, respectively. The computationally designed dimers were synthesized and characterized by circular dichroism, nuclear magnetic resonance, analytical ultracentrifugation, and X-ray crystallography. These results suggest that this strategy can be used to address the protein recognition problem and is generally applicable to creating novel binding sites with compatible binding partners.
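The FFT search mentioned above relies on the correlation theorem. The overlap score between two grids, for every cyclic translation at once, equals the inverse FFT of one grid's FFT multiplied by the complex conjugate of the other's. The sketch below uses a simplified Katchalski-Katzir-style 2D encoding: receptor surface cells score +1, and buried cells carry a large negative weight. The `-15.0` weight and all function names are hypothetical. The sketch omits the lab's complex-derived docking parameters and the outer loop over ligand rotations. The brute-force version is included only to check the FFT result.

```python
import numpy as np

def receptor_grid(mask, interior_weight=-15.0):
    """Simplified Katchalski-Katzir-style receptor encoding on a 2D grid.

    Occupied cells with at least one empty 4-neighbour form the surface layer (+1);
    buried occupied cells get a large negative weight that penalizes interpenetration.
    The grid is treated as periodic, so leave empty padding around the molecule.
    """
    mask = np.asarray(mask, dtype=bool)
    neighbours = [np.roll(mask, shift, axis=ax) for ax in (0, 1) for shift in (1, -1)]
    buried = mask & np.logical_and.reduce(neighbours)
    surface = mask & ~buried
    return surface * 1.0 + buried * interior_weight

def translational_scores(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[x - t] for every cyclic translation t,
    computed at once as IFFT(FFT(receptor) * conj(FFT(ligand)))."""
    corr = np.fft.ifftn(np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand)))
    return corr.real

def brute_force_scores(receptor, ligand):
    """The same scores by explicitly translating the ligand; O(N^2) work per translation."""
    scores = np.zeros(receptor.shape)
    axes = tuple(range(ligand.ndim))
    for t in np.ndindex(*receptor.shape):
        scores[t] = np.sum(receptor * np.roll(ligand, t, axis=axes))
    return scores
```

The FFT route costs O(N² log N) for all N² translations of an N×N grid, versus O(N⁴) for the explicit loop. In a full rigid-body search, this scan is repeated for each sampled ligand rotation. The best translation for a given rotation can be read off with `np.unravel_index(np.argmax(scores), scores.shape)`.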
Design of Calcium-deficient Calmodulin
Interactions of the calcium (Ca2+) sensor protein calmodulin (CaM) with Ca2+/calmodulin-dependent protein kinase II (CaMKII) are central to the Ca2+ signaling pathways implicated in learning and memory. CaM, which can bind up to four Ca2+ ions, senses Ca2+ signals of different magnitude and duration. Ca2+ binding induces a conformational change in CaM that is essential for recognition and activation of many CaM-regulated proteins, including CaMKII. Once activated by Ca2+/CaM, CaMKII phosphorylates a number of downstream protein targets in synapses. The binding of all four Ca2+ ions to CaM is generally believed to be a prerequisite for CaM-induced activation of CaMKII. However, the Ca2+ concentrations observed during Ca2+ influx into the postsynaptic spine are too low to be consistent with this hypothesis. To investigate whether CaM can activate CaMKII with only two bound Ca2+ ions, we designed two CaM mutants: one that binds Ca2+ only at the C-terminal domain (NMUTCWT) and one that binds Ca2+ only at the N-terminal domain (NWTCMUT). In each mutant, the inactivated domain was designed by stabilizing it in the closed, Ca2+-free conformation, while the other domain was left intact. Ionization mass spectrometry confirmed a 2:1 Ca2+:CaM stoichiometry for the designed mutants. NMUTCWT could activate CaMKII at the low Ca2+ concentrations believed to occur in the postsynaptic density of spines. Our findings indicate that differential activation of signaling enzymes by partially saturated CaM may contribute to the sensitivity of synaptic plasticity to the timing and magnitude of postsynaptic Ca2+ flux. They also suggest that the sensitivity of other postsynaptic signaling enzymes to CaM with fewer than four bound Ca2+ ions should be reevaluated.
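The argument above turns on how much more likely a half-loaded CaM is than a fully loaded one at low Ca2+. A deliberately crude sketch makes this concrete. It treats the four sites as independent and non-cooperative, whereas real CaM binds Ca2+ cooperatively within each lobe. The dissociation constants below are illustrative placeholders, not measured values, and the function names are hypothetical.

```python
import math

def bound_count_distribution(ca_uM, kds_uM):
    """P(exactly k Ca2+ bound), k = 0..n, for independent, non-cooperative sites."""
    dist = [1.0]
    for kd in kds_uM:
        p = ca_uM / (ca_uM + kd)          # single-site occupancy
        new = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            new[k] += w * (1.0 - p)       # this site empty
            new[k + 1] += w * p           # this site occupied
        dist = new
    return dist

def all_sites_bound(ca_uM, kds_uM):
    """Probability that every listed site is occupied."""
    return math.prod(ca_uM / (ca_uM + kd) for kd in kds_uM)

# Illustrative placeholder dissociation constants (uM), NOT measured values:
C_LOBE = [1.0, 1.0]    # hypothetical higher-affinity C-terminal sites
N_LOBE = [10.0, 10.0]  # hypothetical lower-affinity N-terminal sites

ca = 1.0  # uM, illustrative
print(all_sites_bound(ca, C_LOBE), all_sites_bound(ca, C_LOBE + N_LOBE))
```

With these placeholder numbers, a filled C-terminal lobe is over a hundred times more likely than full occupancy. This is the kind of gap that makes two-ion activation physiologically relevant.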
Continuum Electrostatic Solvation for Protein Design
Protein design is an exceptionally difficult problem with complications of its own. Necessary restrictions such as a fixed protein backbone and discrete side-chain conformations (rotamers) require structure-energy relationships to be treated differently than in other fields of protein simulation. This structure-energy relationship has been a long-standing focus of our research. We seek to identify the forces that lead to protein stability and to determine their relative strengths. To date, damped Coulombic potentials, together with empirical surface-area and volume scaling functions, have been used to include electrostatic solvation energy in computational protein design calculations. These methods have enabled the successful design of stable proteins. However, they have been a limiting factor in the rational design of enzymatic activity and molecular recognition, for which polar and charged amino acids are key. To meet these challenges, we are investigating more sophisticated continuum models of electrostatic solvation. Two related obstacles stand in the way of better electrostatic solvation energy functions. First, the combinatorial explosion in protein design requires energy scores for many side chains and pairs of side chains, and therefore a very fast energy solver. Second, the energies must be calculated as one-body (single side-chain) and two-body (side-chain pair) terms without any knowledge of the rest of the structure. Our first approach uses fast perturbation methods for the two-body terms. This makes the otherwise lengthy numerical solution of the Poisson-Boltzmann equation practical for a large number of side-chain pairs. We are also testing the speed and accuracy of various analytical Generalized Born methods.
Coupled with strategies for approximating the molecular surface during the design calculation, both approaches allow us to describe more accurately the energy of a protein's charge distribution in the context of its molecular geometry and the surrounding solvent. Such improvements to the electrostatic solvation energy model for protein design should have a significant impact on enzyme design and molecular recognition.
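Analytical Generalized Born methods such as those mentioned above commonly use the Still et al. functional form. In that form, the polarization energy is -½(1/ε_in - 1/ε_out) Σ_i Σ_j q_i q_j / f_GB(r_ij). Once Born radii are fixed, this double sum splits directly into one-body self terms and two-body pair terms, the decomposition that design calculations need. The minimal sketch below treats Born radii as given inputs; in practice they depend on the whole structure, which is exactly what makes strict pairwise decomposition approximate. The dielectric defaults and function names are assumptions, and this is not the lab's implementation.

```python
import math

COULOMB_KCAL = 332.06  # e^2/(4*pi*eps0) in kcal*Angstrom/(mol*e^2)

def f_gb(r, ri, rj):
    """Still et al. effective distance; equals the Born radius when i == j (r = 0)."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))

def gb_terms(charges, coords, born_radii, eps_in=1.0, eps_out=80.0):
    """Split the GB polarization energy (kcal/mol) into one-body and two-body terms.
    Born radii are taken as precomputed inputs (an assumption of this sketch)."""
    tau = 1.0 / eps_in - 1.0 / eps_out
    n = len(charges)
    # i == j terms of the double sum: the Born self-energy of each charge
    self_terms = [-0.5 * COULOMB_KCAL * tau * charges[i] ** 2 / born_radii[i]
                  for i in range(n)]
    pair_terms = {}
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            # (i, j) and (j, i) appear once each in the double sum, cancelling the 1/2
            pair_terms[(i, j)] = (-COULOMB_KCAL * tau * charges[i] * charges[j]
                                  / f_gb(r, born_radii[i], born_radii[j]))
    return self_terms, pair_terms

def gb_energy(charges, coords, born_radii, **kw):
    """Total GB polarization energy: sum of all one-body and two-body terms."""
    self_terms, pair_terms = gb_terms(charges, coords, born_radii, **kw)
    return sum(self_terms) + sum(pair_terms.values())
```

For a single ion the total reduces to the Born formula, which is a convenient sanity check. For an ion pair of opposite charge, the two-body term comes out positive, reflecting solvent screening of the attraction.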
Acknowledgement of Support: This work was supported by the Ralph M. Parsons Foundation, the Defense Advanced Research Projects Agency (DARPA), an IBM Shared University Research Grant, and the Institute for Collaborative Biotechnologies (ICB).
Selected Publications
A. Marshall, C. L. Vizcarra, and S. L. Mayo, "One and Two Body Decomposable Poisson-Boltzmann Methods for Protein Design Calculations," Protein Sci., 14, 1293-1304, 2005.
M. Shifman and S. L. Mayo, "Exploring the Origins of Binding Specificity Through the Computational Redesign of Calmodulin," Proc. Natl. Acad. Sci. USA, 100, 13274-13279, 2003.
N. Bolon and S. L. Mayo, "Enzyme-like Proteins by Computational Design," Proc. Natl. Acad. Sci. USA, 98, 14274-14279, 2001.
A. Voigt, S. L. Mayo, F. H. Arnold, and Z. Wang, "Computational Method to Reduce the Search Space for Directed Protein Evolution," Proc. Natl. Acad. Sci. USA, 98, 3778-3783, 2001.
A. Marshall and S. L. Mayo, "Achieving Stability and Conformational Specificity in Designed Proteins Via Binary Patterning," J. Mol. Biol., 305, 619-631, 2001.
Shimaoka, J. M. Shifman, H. Jing, J. Takagi, S. L. Mayo, and T. A. Springer, "Computational Design of an Integrin I Domain Stabilized in the Open High Affinity Conformation," Nat. Struct. Biol., 7, 674-678, 2000.
A. Pierce, J. A. Spriet, J. Desmet, and S. L. Mayo, "Conformational Splitting: A More Powerful Criterion for Dead-End Elimination," J. Comput. Chem., 21, 999-1009, 2000.
M. Malakauskas and S. L. Mayo, "Design, Structure and Stability of a Hyperthermophilic Protein Variant," Nat. Struct. Biol., 5, 470-475, 1998.
I. Dahiyat and S. L. Mayo, "De Novo Protein Design: Fully Automated Sequence Selection," Science, 278, 82-87, 1997.