Lessons from other fields: How methods in computational protein design can be applied to neuroscience

Below is an essay adapted from a lecture about computational protein design. Even though the lecture is not related to neuroscience, there is much to be learned from listening to other fields. Here I will give a short summary of the lecture, and then discuss how ideas used in the protein design field can be applied to neuroscience. The key is to always have an open mind and to think outside the box. There is always something to be learned from others if you're willing to listen.

Summary 


Dr. Mou presented a very interesting study. He used computational simulations, called molecular dynamics [1-3], to design specialized proteins with specific structures and functions. Dr. Mou was able to build a homodimer (two copies of the same protein monomer that bind together to form a doublet) that was capable of binding to DNA [1]. By specifying the DNA binding site, he was able to create a DNA nanowire. These nanowires are constructed from alternating segments of homodimers and DNA fragments. The homodimers are designed to bind only to a specific DNA base-pair sequence (TAATTT) [1]. By repeating this sequence twice (TAATTTAATTT), each DNA fragment can bind to two homodimers. Likewise, each homodimer can bind to two DNA fragments, one fragment for each monomer. Thus, given a large quantity of DNA fragments and homodimers, they will self-organize into long filamentous DNA nanowires. These wires were observed to be around 200 nm long [1].
    However, in order to construct these proteins, Dr. Mou had to complete several difficult tasks, and at each phase of the design process he relied on computer simulations to aid his protein design. The first hurdle was finding a suitable protein as a starting point. He chose the Drosophila Engrailed homeodomain (ENH), as it was able to bind both DNA and other proteins. He then used a simulator to modify this protein's amino acids until it would bind to itself, thus creating the mutant protein E23P-YFP [3].
    Unfortunately, just modifying a protein to create homodimers is not sufficient. As often occurs in protein design, these proteins are not stable at room temperature (23 °C). Therefore the protein needs to be modified in order to stabilize it at room temperature, and ideally at temperatures far above room temperature. However, this modification process is incredibly complex, requiring the testing of roughly 10^10 different protein configurations. This sort of analysis is only possible using simulations, a specially designed algorithm called FASTER [4], and a molecular simulator known as GROMACS [5].
    Roughly, the algorithm works by changing a single amino acid and calculating the protein's native state (the state the protein will naturally fold into). Finding the native state is akin to finding the minimum of the conformational free energy. Thus one can take a simulator such as GROMACS, initialize the amino acid sequence, and measure the final state of the simulation, assuming the protein has settled into the conformational free-energy minimum. However, because there are 20 amino acids and many possible positions to change within the protein's sequence, the number of possible combinations balloons exponentially; allowing all 20 amino acids at just eight positions, for example, already gives 20^8 ≈ 2.6 × 10^10 combinations. Thus the FASTER algorithm [2-4] becomes necessary, as it allows one to reject most possible side-chain configurations, called rotamers [5,6], by noting that the vast majority of rotamers will not lower the energy and thus will not raise the stabilization temperature [6].
The FASTER algorithm also does not check every possible combination of amino acids; rather, it randomly makes one change and tests it, keeping the change if it improves the energy and rejecting it and trying again if it does not. This process resembles simulated annealing [7], a very important optimization algorithm that uses randomness to sample a large space of combinations. After using these techniques, Dr. Mou was able to increase the stabilization temperature of E23P-YFP from 49 °C to 62 °C [2].
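To illustrate that mutate-and-test loop (this is a toy sketch of my own, not Dr. Mou's actual FASTER implementation), one could imagine something like the following, where the energy function is an arbitrary placeholder standing in for the structural scoring that a real simulator such as GROMACS would provide.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
random.seed(0)

def toy_energy(sequence):
    """Placeholder 'energy': a real design pipeline would instead score the
    folded structure (e.g., with a molecular simulator such as GROMACS)."""
    return sum(ord(a) for a in sequence) % 97  # arbitrary stand-in value

def greedy_mutation_search(sequence, n_trials=10_000):
    """Randomly mutate one position at a time, keeping only mutations
    that lower the energy, as described in the summary above."""
    seq = list(sequence)
    energy = toy_energy(seq)
    for _ in range(n_trials):
        pos = random.randrange(len(seq))
        old = seq[pos]
        seq[pos] = random.choice(AMINO_ACIDS)  # propose a single substitution
        new_energy = toy_energy(seq)
        if new_energy < energy:
            energy = new_energy                # keep the improvement
        else:
            seq[pos] = old                     # reject and try again
    return "".join(seq), energy
```

The real algorithm must also cope with the combinatorics of rotamers and with an energy function that is expensive to evaluate, which is exactly why specialized methods such as FASTER are needed.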
Thus, through this process of computer-aided design, Dr. Mou was able to construct proteins and DNA fragments that will self-assemble into a DNA nanowire.

Discussion

How can this process of protein design be useful in my research, which focuses primarily on the design of neural circuits? Obviously, the dynamics of protein folding are no longer of primary interest. However, the algorithms used in the simulations to minimize the search of a vast parameter space are of key interest to my research. Specifically, the FASTER algorithm [4] that Dr. Mou made significant use of [1-3] is of prime interest.
One of the key issues in designing a mathematical model of a biological system is determining what parameters will reproduce an observed experimental effect. Rarely is direct measurement possible, as it was in the landmark study of the giant squid axon and the now-famous Hodgkin-Huxley model [8]. However, the averaging techniques that they used can sometimes fail spectacularly, resulting in models that are useless or biologically unrealistic [9]. Thus, as modelers we are forced to search a large, unwieldy parameter space [10-12].
In order to deal with the large parameter space, algorithms like FASTER [4] are employed. FASTER makes use of rotamer rejection [7], and neural optimization obviously cannot, as neurons have no rotamers; still, the two approaches share two underlying ingredients: a function to minimize and an algorithm to minimize it. First, both are minimizing a function. In protein design one minimizes the free energy of the system, while in fitting neural parameters we minimize the distance between the observed experimental variable and the model's corresponding variable [11]. There are multiple choices of what to minimize, but common ones include the root mean square (RMS) error of the voltage trace [11] or of multiple voltage traces [13]. Other options include the RMS difference in the phase space of voltage and its time derivative [14], or spike statistics such as the firing rate and the interspike interval [15,16]. I personally prefer optimizing over spike statistics, as they tend to reflect the functionality of the network, rather than matching the shape of the spike and hoping that the input-output behavior of the neuron is the same. Still, almost any sensible metric one chooses to optimize will usually capture a majority of the dynamics of the experiment.
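As a concrete sketch of these two kinds of metric, the functions below compute an RMS voltage error and a simple spike-statistic distance. They assume the model and experimental traces are sampled on the same time grid; the threshold-crossing spike detector and the way the two terms are combined are simplifications of my own, not taken from the cited papers.

```python
import numpy as np

def rms_voltage_error(v_model, v_exp):
    """Root-mean-square difference between two voltage traces that are
    assumed to be sampled on the same time grid."""
    v_model, v_exp = np.asarray(v_model), np.asarray(v_exp)
    return np.sqrt(np.mean((v_model - v_exp) ** 2))

def spike_times(v, t, threshold=0.0):
    """Crude spike detection: times of upward threshold crossings."""
    above = np.asarray(v) > threshold
    idx = np.where(~above[:-1] & above[1:])[0] + 1
    return np.asarray(t)[idx]

def spike_statistic_error(v_model, v_exp, t, threshold=0.0):
    """Distance between model and experiment in terms of firing rate and
    mean interspike interval (ISI); in practice each term would be
    normalized and weighted before being summed."""
    tm = spike_times(v_model, t, threshold)
    te = spike_times(v_exp, t, threshold)
    duration = t[-1] - t[0]
    rate_err = abs(len(tm) - len(te)) / duration
    if len(tm) > 1 and len(te) > 1:
        isi_err = abs(np.mean(np.diff(tm)) - np.mean(np.diff(te)))
    else:
        isi_err = 0.0  # not enough spikes in one trace to define an ISI
    return rate_err + isi_err
```

In a fitting loop, either function (or a weighted combination of both) is simply handed to the optimizer as the objective to minimize.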
Once a metric to optimize has been chosen, one needs an optimization algorithm. Dr. Mou's chosen algorithm, FASTER [4], makes use of simulated annealing [7]. Simulated annealing [7,17] starts with one or more candidate solutions and computes the value of the objective function for each. It then generates a new, randomly perturbed model and evaluates its objective value [7,17]. If the new value is better, the new model is kept; if it is worse, it is still occasionally accepted, with a probability that shrinks over time [7,17]. At first the algorithm takes large steps in parameter space, but as time goes on it takes smaller and smaller ones [7,17]. Ideally, this allows it to first sample large regions of parameter space and then slowly relax into the region with the smallest value, which with luck is the global minimum [7,17].
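To make this concrete, here is a minimal sketch of such an annealing loop for fitting a handful of neuron-model parameters. The cost function, step sizes, and cooling schedule are placeholders of my own choosing rather than anything used in the cited work, and production implementations differ in many details.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(cost, x0, step0=1.0, t0=1.0, cooling=0.99, n_iter=5000):
    """Minimal simulated annealing loop.

    cost    : function mapping a parameter vector to a scalar to minimize
              (e.g., an RMS voltage error or a spike-statistic distance).
    x0      : initial parameter vector.
    step0   : initial perturbation size (shrinks as the 'temperature' drops).
    t0      : initial temperature controlling acceptance of worse moves.
    cooling : multiplicative cooling factor applied each iteration.
    """
    x, e = np.asarray(x0, float), cost(x0)
    best_x, best_e = x.copy(), e
    temp, step = t0, step0
    for _ in range(n_iter):
        # Propose a random perturbation; early on (high temperature) the
        # steps are large, later they shrink along with the temperature.
        candidate = x + step * rng.normal(size=x.shape)
        e_new = cost(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that decays as the temperature is lowered.
        if e_new < e or rng.random() < np.exp(-(e_new - e) / temp):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x.copy(), e
        temp *= cooling
        step *= cooling
    return best_x, best_e

# Toy usage: recover the minimum of a simple quadratic "cost landscape".
best, err = simulated_annealing(lambda p: np.sum((p - np.array([2.0, -1.0])) ** 2),
                                x0=[0.0, 0.0])
```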
However, there are other optimization algorithms that can be used as well. One of the simplest is gradient descent [18]. Here, no randomness is required: one uses the current point to calculate the gradient of the objective and then moves some step size in the downhill direction, repeating until the search settles into a local minimum. However, this process can sometimes be slow and can get stuck in local minima [19].
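A bare-bones sketch of gradient descent is given below; the finite-difference gradient, fixed learning rate, and toy quadratic cost are illustrative choices of mine rather than a recipe from the literature.

```python
import numpy as np

def numerical_gradient(cost, x, eps=1e-6):
    """Approximate the gradient of `cost` at `x` by central differences."""
    x = np.asarray(x, float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        grad[i] = (cost(x + dx) - cost(x - dx)) / (2 * eps)
    return grad

def gradient_descent(cost, x0, lr=0.1, n_iter=1000, tol=1e-8):
    """Plain gradient descent: step downhill until the gradient is tiny."""
    x = np.asarray(x0, float)
    for _ in range(n_iter):
        g = numerical_gradient(cost, x)
        if np.linalg.norm(g) < tol:
            break                      # settled into a (local) minimum
        x = x - lr * g                 # move a fixed step against the gradient
    return x

# Toy usage: the same quadratic landscape converges to (2, -1).
x_min = gradient_descent(lambda p: np.sum((p - np.array([2.0, -1.0])) ** 2),
                         x0=[0.0, 0.0])
```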
Another great class of algorithms, inspired by biology, is the evolutionary algorithms [20,21]. Here, a population of initial parameter sets is allowed to "breed", which is often represented by having each offspring either inherit a value from one of its two parents or average the two parents together [21]. Mutations are also injected into the population in order to allow it to keep exploring parameter space. In spirit, this class of algorithms is very similar to simulated annealing.
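The sketch below shows one toy version of this idea; the population size, mutation scale, and selection rule are arbitrary choices of mine for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def evolutionary_search(cost, bounds, pop_size=50, n_gen=200, mut_scale=0.1):
    """Toy evolutionary algorithm over real-valued parameters.

    bounds : sequence of (low, high) pairs, one per parameter, used to seed
             the initial population and to scale mutations.
    """
    bounds = np.asarray(bounds, float)
    low, high = bounds[:, 0], bounds[:, 1]
    # Random initial population within the given bounds.
    pop = rng.uniform(low, high, size=(pop_size, len(bounds)))
    for _ in range(n_gen):
        fitness = np.array([cost(ind) for ind in pop])
        # Selection: keep the better half of the population as parents.
        parents = pop[np.argsort(fitness)[: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            pa, pb = parents[rng.integers(len(parents), size=2)]
            child = 0.5 * (pa + pb)                                   # "breeding" by averaging
            child += mut_scale * (high - low) * rng.normal(size=child.shape)  # mutation
            children.append(np.clip(child, low, high))
        pop = np.vstack([parents, children])
    fitness = np.array([cost(ind) for ind in pop])
    return pop[np.argmin(fitness)]

# Toy usage: again recovers roughly (2, -1) on the quadratic landscape.
best = evolutionary_search(lambda p: np.sum((p - np.array([2.0, -1.0])) ** 2),
                           bounds=[(-5, 5), (-5, 5)])
```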
Still, this raises the question: why are optimization algorithms useful in biological modeling, and ultimately in biology? As Dr. Mou elegantly showed, with optimization algorithms one can design a protein to perform special functions and, ideally, aid in drug development. By direct analogy, brain-machine interfaces use similar optimization techniques to train a machine to interpret neural ensembles and move robotic arms [22-24]. They do this by simulating a neural network inside a computer, and this network is optimized using algorithms like backpropagation (a form of gradient descent) [25] or simulated annealing [26].
Other projects, like the Blue Brain Project, aim to simulate an entire microcolumn of rat neocortex [27]. To do this, they rely heavily on automated parameter optimization using algorithms like those detailed above [28-30]. These models can then be used to generate new hypotheses to test in actual rats. Successful examples include classifying inhibitory neurons by their effect on the dynamics inside the rat neocortical column [31].
Thus, to briefly summarize: while Dr. Mou's research is not directly relevant to neuroscience research, the algorithms he uses are, as is his style of thinking through projects. By understanding how to implement optimization algorithms and how to use them in a biological workflow, I hope that we computational biologists can deepen our understanding of biology and develop new treatments.

Word Count: 1452


Author: Alex White


References
1. Mou, Y.*; Yu, J. Y.; Wannier, T. M.; Guo C. L.; Mayo, S. L.*; “Computational design of co-assembling protein-DNA nanowires.” Nature. 2015, 525, 230-233. (*these authors are co-corresponding authors.)

2. Mou, Y.; Huang, P. S.; Hsu F. C.; Huang S. J.; Mayo, S. L*.; “Computational design and experimental verification of a symmetric homodimer” Proc. Natl. Acad. Sci. U.S.A. 2015, 112(34), 10714-10719.

3. Mou, Y.; Huang, P. S.; Thomas, L. M.; Mayo, S. L.*; “Using molecular dynamics simulations as an aid in the prediction of domain swapping of computationally designed protein variants” J. Mol. Biol. 2015, 427, 2697-2706.

4. Allen, BD; Mayo, SL (July 30, 2006). "Dramatic performance enhancements for the FASTER optimization algorithm". Journal of Computational Chemistry. 27 (10): 1071–5.

5.  Hess, B;  Kutzner, C; van der Spoel, D;  Lindahl E (2008) “GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation”. J Chem Theory Comput, 4, pp. 435-447

6. Donald, Bruce R. (2011). Algorithms in Structural Molecular Biology. Cambridge, MA: MIT Press.

7. Samish, I; MacDermaid, CM; Perez-Aguilar, JM; Saven, JG (2011). "Theoretical and computational protein design". Annual Review of Physical Chemistry. 62: 129–49.

8. Hodgkin AL, Huxley AF (April 1952). "Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo". The Journal of Physiology. 116 (4): 449–472.

9. Golowasch J, Goldman MS, Abbott LF, Marder E (2002) Failure of averaging in the construction of a conductance-based neuron model. J Neurophysiol 87(2): 1129-1131.

10. Achard P, De Schutter E (2006) Complex parameter landscape for a complex neuron model. PLoS Comput Biol 2(7): e94.

11. Achard P, Van Geit W, LeMasson G (in press) Parameter searching. In: Computational modeling methods for neuroscientists, De Schutter E, ed. Cambridge: MIT Press.

12. Bhalla US, Bower JM (1993) Exploring parameter space in detailed single neuron models: Simulations of the mitral and granule cells of the olfactory bulb. J Neurophysiol 69(6): 1948-1965.

13. Keren N, Peled N, Korngreen A (2005) Constraining compartmental models using multiple voltage recordings and genetic algorithms. J Neurophysiol 94: 3730-3742.

14. LeMasson G, Maex R (2001) Introduction to equation solving and parameter fitting. In: Computational neuroscience: Realistic modeling for experimentalists, De Schutter E, ed. London: CRC Press.

15. Prinz AA, Billimoria CP, Marder E (2003) Alternative to hand-tuning conductance-based models: construction and analysis of databases of model neurons. J Neurophysiol 90: 3998-4015.

16. Prinz AA, Bucher D, Marder E (2004) Similar network activity from disparate circuit parameters. Nat Neurosci 7(2): 1345-1352.

17. Kirkpatrick, S.; Gelatt Jr, C. D.; Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science. 220 (4598): 671–680.

18. Dimitri P. Bertsekas, Nonlinear Programming, Athena Scientific 1999, 2nd edition, pp. 187.

19. Kiwiel, Krzysztof C. (2001). "Convergence and efficiency of subgradient methods for quasiconvex minimization". Mathematical Programming, Series A. 90 (1). Berlin, Heidelberg: Springer. pp. 1–25.

20. Vikhar, P. A. "Evolutionary algorithms: A critical review and its future prospects". Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC). Jalgaon, 2016, pp. 261-265.

21. Ferreira, C., 2001. "Gene Expression Programming: A New Adaptive Algorithm for Solving Problems". Complex Systems, Vol. 13, issue 2: 87–129.

22. Lebedev, MA; Nicolelis, MA (2006). "Brain-machine interfaces: past, present and future" (PDF). Trends in Neurosciences. 29 (9): 536–46.

23. Stanley, GB; Li, FF; Dan, Y (1999). "Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus" (PDF). Journal of Neuroscience. 19 (18): 8036–42.

24. Nicolelis, Miguel A. L.; Wessberg, Johan; Stambaugh, Christopher R.; Kralik, Jerald D.; Beck, Pamela D.; Laubach, Mark; Chapin, John K.; Kim, Jung; Biggs, S. James; et al. (2000). "Real-time prediction of hand trajectory by ensembles of cortical neurons in primates". Nature. 408 (6810): 361–5.

25. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.5 Back-Propagation and Other Differentiation Algorithms". Deep Learning. MIT Press. pp. 200–220.

26. Da, Y.; Xiurun, G. (July 2005). T. Villmann (ed.). An improved PSO-based ANN with simulated annealing technique. New Aspects in Neurocomputing: 11th European Symposium on Artificial Neural Networks. Elsevier. doi:10.1016/j.neucom.2004.07.002

27. Markram H (2006) The Blue Brain Project. Nat Rev Neurosci 7: 153-160.

28. Druckmann S, Banitt Y, Gidon A, Schürmann F, Markram H, Segev I (2007) A novel multiple objective optimization framework for constraining conductance-based neuron models by experimental data. Frontiers in Neuroscience 1(1).

29. Druckmann S, Berger T, Hill S, Schürmann F, Markram H, Segev I (2008) Evaluating automated parameter constraining procedures of neuron models by experimental and surrogate data. Biol Cybern 99(4-5): 371-379.

30. Reimann M, Muller E, Ramaswamy S, Markram H (2015) An algorithm to predict the connectome of neural microcircuits. Frontiers in Neural Circuits 9: 28.

31. Keller, D., Meystre, J., Veettil, R.V., Burri, O., Guiet, R., Schürmann, F., and Markram, H. (2019). A derived positional mapping of inhibitory subtypes in the somatosensory cortex. Front. Neuroanat. 13, 78.
