Early Habitable Environments and the Evolution of Complexity
Principal Investigator: David J. Des Marais

Origins of Functional Proteins and the Early Evolution of Metabolism

Andrew Pohorille, Lead Co-Investigator

Co-Investigators: Michael Wilson, Burckhard Seelig
Researcher: Chenyu Wei
Collaborator: James Lake

OBJECTIVE 5: Characterize the emergence of catalytic functionality of macromolecules from random polymer sequences (as a critical requirement of emerging biological systems) and investigate the coupled early evolution of catalytic functions and metabolism in order to capture and elaborate fundamental processes that lead to the origins of life from a landscape of possibilities.

Objectives, Expected Significance, and Extending the State of Knowledge

We address Objective 5 through a series of investigations devoted to the emergence of catalytic functionality of macromolecules from random polymer sequences and the early, coupled evolution of catalytic functions and metabolism. By doing so, we aim to identify a number of critical requirements for emerging biological systems that, so far, have been only sparingly explored. Our investigations will be conducted using novel molecular biology and computer simulation methods.

In contrast to the bulk of previous studies on the origins of life, which concentrated on prebiotic chemistry and information molecules, we focus specifically on the rise of complexity associated with metabolism (a network of chemical reactions that supports self-maintenance, growth, reproduction and evolution of cells) and with the macromolecules that catalyze these reactions. This focus is motivated by the fact that metabolism can be considered as the essence of life. Both metabolism and macromolecular catalysts possess the traits of complex, emergent systems, and thus it is fruitful to consider them from this perspective. Although the complexity of a biological macromolecule or system can be described in a number of different ways, such as genetic or structural information, we will concentrate on complexity associated with functions of protobiological macromolecules and systems. The concept of functional biocomplexity, in comparison to other types of complexity, was recently discussed by Hazen et al. (2007).

To examine functional complexity, we focus on proteins, which are the main structural and functional agents in the cell. Although today's biology is dominated by proteins, it is generally believed that this form of life was preceded by an "RNA World" in which RNA enzymes (ribozymes) facilitated all necessary protobiological metabolic processes. A plausible scenario for the transition from the RNA World to the protein-dominated world involves the evolution of ribozymes that were able to synthesize proteins in a non-coded fashion, followed by the coded evolution of proteins of progressively increasing length. An alternative hypothesis posits that amino acids assembled into the first proteins without the help of RNA, prior to genetically encoded protein synthesis.

Both scenarios rely on the emergence of proteins with evolutionarily advantageous properties from a population of polymers with random amino acid sequences. This immediately raises a number of fundamental questions. Can functionality emerge from random sequences of proteins? How does the initial repertoire of functional proteins diversify to facilitate new functions? Does this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Can the same reaction be catalyzed by multiple proto-enzymes of independent origins and, if so, what are the criteria for evolutionary selection among these proteins? Did protein evolution start from a pool of proteins defined by a "frozen accident", so that other collections of proteins could have started a different evolutionary pathway? What is the relationship between randomness at the molecular level and emergent biochemical properties? To what extent was the emerging early metabolism constrained by the laws of chemistry? How do the concepts of biological evolution apply to protobiological systems?

So far, many of these questions have been in the realm of educated guesses and indirect reasoning. For the first time, we plan to tackle a number of them experimentally. To do so, we will use a technique of in vitro protein evolution (Roberts and Szostak, 1997; Keefe and Szostak, 2001; Seelig and Szostak, 2007). This will be complemented with advanced computational modeling of biochemically plausible, early metabolic systems. Our research plan consists of four closely related investigations that address the following:

Investigation 5.1

The emergence of catalytic macromolecules from populations of polymers with random sequences

Investigation 5.2

Existence of multiple, structurally unrelated proto-enzymes capable of catalyzing the same chemical reaction

Investigation 5.3

The capability of early proteins to acquire new functions during evolution through a small number of mutations in their sequence

Investigation 5.4

Early, coupled evolution of metabolic networks and proto-enzymes

These investigations and the specific problems that they address bear directly on perhaps the most fundamental question about the origins of life: are the processes that led to the emergence of life predictable? One view holds that the origin of life is an event governed by chance, and the result of so many random events is unpredictable. This view was eloquently expressed by Monod (1971) in his book "Chance and Necessity", in which he argued that life is a product of "nature's roulette." If one accepts this view, understanding the emergence of biocomplexity on Earth and identifying habitable environments elsewhere in the universe are impossible goals. In an alternative view, the origin of life is considered a deterministic event. Its details need not be deterministic in every respect, but the overall behavior is predictable. An elegant exposition of this view can be found in Morowitz (1992). In our proposal we take a perspective that bridges these two apparently disparate views. The processes underlying the emergence of life are stochastic and, therefore, can be described only in probabilistic terms. However, their outcome is predictable, although not in full detail (Pohorille, 2008).

Primordial Catalytic Proteins from Random Protein Sequences
(Investigation 5.1)

Aim and significance

We address the fundamental question of whether enzymatic activity can arise from completely random sequences of amino acids. For the first time, we will carry out the selection of specific enzymes from such a population and explore the probability of finding an enzyme in a mixture of random peptides. We will compare this probability with the likelihood of finding ribozymes performing the same functions in random nucleic acid libraries. We will also compare the efficiencies of the protein and RNA enzymes. These investigations are essential to understanding the emergence of complexity in protobiological systems. In most theories it is assumed that polymers with random sequences were the original source of macromolecular catalysts in biological systems. This implies that the probability of finding a catalyst among such polymers, compared to the number of polymers that might have existed in the primordial environment, is not negligibly small. Otherwise one would have to assume the existence of as yet unknown mechanisms and corresponding environmental conditions that would bias the initial population of macromolecules towards those that were functionally active. Comparisons between protein and RNA enzymes are highly informative because previous assessments suggest that the frequency of finding ribozymes among random RNA sequences was sufficient to seed the RNA world (Paul and Joyce, 2004). Such comparisons will also shed light on the hypothesis that the emergence of the dual-polymer world was due to the high efficiency of even very simple protein enzymes compared to their RNA counterparts.

Experimental design

We have recently developed a method for isolating enzymes from mixtures containing a vast number of random peptides. This technique allows us to select specific enzymes from a collection of more than 10^12 individual protein sequences in a single test tube (Seelig and Szostak, 2007). In this mRNA display technology, based on an in vitro selection method developed by Roberts and Szostak (1997), polypeptides are covalently linked to their encoding mRNA. The approach has been used to evolve de novo proteins that can bind a specific ligand from both random-sequence libraries (Keefe and Szostak, 2001) and libraries based on a known protein structure (scaffold) (Cho and Szostak, 2006). Recently, we extended the mRNA display method from a selection of ligand-binding proteins to a selection of enzymes. A general scheme for this process is shown in Figure 1. The method relies purely on the very high diversity of the protein library and uses product formation as the sole selection criterion. We used this approach to isolate novel enzymes (RNA ligases), capable of joining two fragments of RNA (oligonucleotides) into a single RNA chain, from a library based on a protein scaffold containing two "zinc finger" structural motifs with randomized loop regions (Seelig and Szostak, 2007).

We will use the same approach to select new RNA ligase enzymes, this time from a fully random library. First, we will transcribe and translate the respective synthetic DNA library to generate mRNA-displayed proteins, which we will then reverse transcribe with a primer joined to one of the two RNA substrates to be ligated. We will incubate the library of mRNA-displayed proteins with the biotinylated second RNA substrate and the complementary splint oligonucleotide that aligns the two substrate oligonucleotides. Proteins that catalyze the ligation of the two substrates will covalently attach the biotin moiety to their own cDNA, and will be captured on streptavidin-coated agarose beads. We will amplify the cDNA by polymerase chain reaction (PCR) and use it as input for the next round of selection and amplification. Over several rounds, the fraction of the input library immobilized on the streptavidin beads should increase significantly. We can further increase the catalytic activity through in vitro evolution by introducing mutations during the PCR amplification step. At the end of this procedure, we will clone and sequence the obtained cDNA and thereby isolate individual ligase enzymes.
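The expected growth of the captured fraction over successive rounds can be illustrated with a minimal sketch. It assumes an idealized selection in which every active sequence is retained and the inactive background is reduced by a fixed factor in each round. The starting frequency (1e-11) and the per-round enrichment factor (100) are hypothetical values chosen only for illustration, not measurements, and the function name `enrich` is ours.

```python
def enrich(active_fraction, enrichment_factor, rounds):
    """Return the active fraction before selection and after each round.

    Idealized assumption: each round keeps all active sequences and only
    1/enrichment_factor of the inactive background.
    """
    history = [active_fraction]
    f = active_fraction
    for _ in range(rounds):
        kept_active = f
        kept_background = (1.0 - f) / enrichment_factor
        # Renormalize, since PCR amplification restores the pool size.
        f = kept_active / (kept_active + kept_background)
        history.append(f)
    return history


if __name__ == "__main__":
    # Hypothetical inputs: 1 active sequence in 1e11, 100-fold enrichment.
    for round_number, fraction in enumerate(enrich(1e-11, 100.0, 8)):
        print(f"round {round_number}: active fraction = {fraction:.3e}")
```

Under these assumptions the odds of drawing an active sequence are multiplied by the enrichment factor in every round, so active sequences stay rare for the first several rounds and then take over the pool within one or two further rounds.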

As preliminary work, we have already synthesized the pool of DNA molecules for this selection, and transcribed and translated this pool into a library of mRNA-displayed proteins. We have optimized our selection protocol for best performance with this library. Our random protein library is based on a library that contains 80 random amino acid positions and that has already given rise to novel ATP-binding proteins (Keefe and Szostak, 2001). We have changed the 5'- and 3'-constant regions of this library so that it is compatible with technological improvements of the mRNA-display method and to avoid cross-contamination of the random library by previously selected scaffold-derived ligases.

Once we isolate the new random-derived ligase enzymes, we will express them in E. coli to produce sufficient quantities for biochemical characterizations. We will measure their kinetic characteristics and compare their catalytic efficiencies with those of previously selected ribozymes catalyzing the same reaction (Bartel and Szostak, 1992). From the sequencing data we will determine the frequency of ligase enzymes in the random peptide library, which we will then also compare to the frequency of ribozymes in random RNA libraries.

Multiple Origins of Enzymatic Function
(Investigation 5.2)


We have previously demonstrated that RNA-ligase enzymes can be selected from a protein library based on a zinc finger scaffold (Seelig and Szostak, 2007). The new selection of enzymes from random sequences will provide us with another class or, more likely, several classes of enzymes for the same reaction. We will compare the scaffold-derived and the random-derived enzymes that have emerged from completely independent origins, yet possess the same enzymatic function. By doing so, we will provide the first demonstration and characterization of multiple origins of a single enzymatic function. To date, the only protobiologically relevant examples of multiple proteins that perform the same function are four families of proteins with unrelated amino acid sequences capable of binding adenosine triphosphate (ATP), which were selected from random libraries in the previous period of our NAI-funded grant (Keefe and Szostak, 2001).

A possible existence of multiple enzymes capable of catalyzing the same chemical reaction distinguishes protobiological systems from contemporary cells. In contemporary cells, a given function is generally catalyzed by a single enzyme or a family of closely related enzymes. The initial variety of proteins in protobiological systems was likely to be evolutionarily advantageous because it provided a broader repertoire of protein structures. This repertoire, in turn, enabled the selection of enzymes with an improved primary function or even the acquisition of new functions. Among these proto-enzymes, only those that were the best "fit" survived. However, it is not clear what the main criteria for fitness were. This investigation is aimed at identifying some of these criteria.

Experimental design

The initial structural characterization of the scaffold-derived ligases yielded unexpected results. Although the starting library was based on a scaffold containing two zinc fingers, sequence analysis of the selected ligases revealed that the original scaffold had been destroyed during the selection and evolution process. Up to half of all cysteines coordinating the two zinc ions in the original scaffold were lost and, in some ligases, entire helices were deleted. These findings demonstrate the power of our selection technology and suggest that ligases can be evolved without a scaffold, using a totally random library.

Experiments to determine the structure of the scaffold-derived ligases are underway. Preliminary biophysical studies suggest that the ligase possesses a folded structure. For further characterization we chose a ligase that exhibits high solubility (up to 0.3 mM) over an extended period of time. Circular dichroism (CD) spectroscopy revealed an α-helical component of the secondary structure, and thermal denaturation indicated cooperative thermal unfolding. We are currently working on obtaining crystals for X-ray crystallography. At the same time, the high solubility allows us to pursue high-resolution structural investigations by nuclear magnetic resonance (NMR). The two-dimensional 1H-15N heteronuclear single-quantum coherence (HSQC) NMR spectrum obtained for the ligase shows that a significant portion of the protein is well folded into a compact, three-dimensional structure. A similar HSQC experiment with selectively 15N-cysteine-labeled protein suggests that all six cysteines of this protein are well structured. At the end of this investigation we expect to determine the complete three-dimensional structure of the selected protein. This was already accomplished for the ATP-binding protein using both X-ray crystallography (Lo Surdo et al., 2004) and NMR (Mansy et al., 2007).

To characterize the structure of the random-derived ligase enzymes we will carry out experiments similar to those described above for the scaffold-derived ligases. If the stability or solubility of the selected ligase enzymes is not sufficient for a comprehensive structure determination, presumably due to the low thermodynamic stability of the folded state, we will further evolve the proteins towards higher stability. This will be achieved through carrying out a selection under denaturing conditions such as the presence of a denaturing agent (guanidinium hydrochloride) or at elevated temperatures. A selection in the presence of guanidinium hydrochloride has already been used successfully to yield variants of the ATP-binding protein with increased stability and solubility (Chaput and Szostak, 2004). We have already synthesized a modified set of RNA substrates that will allow us to select ligases under denaturing conditions. We have shown that those new substrates are compatible with the selection method and a selection for higher stability of the scaffold-derived ligases is underway.

Anticipated results and their significance

Once we gain sufficient structural information about the proteins selected from different sources, we will compare their biochemical properties, such as efficiency, selectivity, reaction mechanisms and the involvement of metal ions in the structure. A key issue that we intend to address is whether the presumably different protein structures imply different mechanisms of action or, alternatively, the mechanism of catalysis remains the same even though the reaction is carried out by structurally and evolutionarily unrelated macromolecules. These two possibilities have vastly different interpretations in terms of the early evolution of biological complexity. The former means that selection for the most fit reaction mechanisms is a likely criterion during evolution. The latter implies that mechanisms are defined and constant, subject to thermodynamic and kinetic constraints, and only structures that realize these mechanisms undergo evolution, e.g., towards higher efficiency and selectivity. Consequently, depending on which of these possibilities is correct, fitness of proto-enzymes should be evaluated either on the basis of the efficiencies of the different mechanisms that they implement, or on the degree of adaptation to carry out a given, fixed mechanism.

Evolvability of Early Proteins
(Investigation 5.3)


As the metabolic complexity of protobiological systems increased, so did the functional diversity of macromolecules carrying out metabolic reactions. Novel catalysts might have emerged from random sequences, as discussed above, but other, more efficient mechanisms might have been at play. In particular, evolutionary flexibility may have allowed those early proteins to improve their catalytic efficiency, or to alter their substrate specificity in response to a small number of mutations. This mechanism has been demonstrated for a number of highly evolved enzymes from contemporary cells (Aharoni et al., 2005, Yoshikuni et al., 2006, Khersonsky et al., 2006). We propose that the same mechanism applied to primitive functional proteins, thus providing a powerful fitness criterion. Proteins lacking evolutionary flexibility would not have been very useful during natural selection.

Experimental and computational design

To test the evolvability of model primordial proteins we will study a protein previously selected from a fully randomized protein library (Keefe and Szostak, 2001). This protein is highly selective towards ATP and does not bind its close analog, guanosine triphosphate (GTP). Considering the high evolutionary potential of proteins, mutating an ATP-binding protein such that it would bind GTP appears to be a simple task, especially considering that a similar change of specificity was already accomplished for a protein from a contemporary organism (Tucker et al., 1998). The latter only required replacing the amino acids that form hydrogen bonds with N1, N3 and N6 of adenine with residues capable of forming hydrogen bonds with N1, N2, N3 and O6 of guanine. However, several attempts to achieve the required change in specificity of our ATP-binding protein failed. The recently obtained high-resolution structures of two variants of this protein (Lo Surdo et al., 2004; Mansy et al., 2007) and our preliminary computer simulations explain this failure. Adenine and guanine in ATP and GTP, respectively, form hydrogen bonds with the protein backbone atoms lining the binding pocket rather than with amino acid side chains. This in turn means that point mutations will not change hydrogen bonding in the pocket and, therefore, will not have a significant effect on the ATP/GTP selectivity.

Knowledge of the protein structure guides more advanced efforts to mutate our protein. Amino acids that have the potential to form hydrogen bonds with either adenine or guanine are located in a short loop between the zinc-binding site and a β-strand that contributes to the binding pocket (Figure 2). This loop is highly strained and points in the wrong direction: instead of side chains, the backbone faces the bound ligand. This strain can be relieved and the orientation of the loop can be changed without global reorganization of the protein structure only by insertion of one or two amino acids into the loop. This will initially be done in computer modeling studies using a novel loop prediction algorithm (Jacobson et al., 2004). The identity of the additional residues will be chosen to stabilize the desired conformation of the loop and provide sites that can form hydrogen bonds with guanine. If these efforts succeed we will confirm the stability of the redesigned protein by molecular dynamics computer simulations. We will further calculate the difference in the free energies of binding ATP and GTP to the protein using the free energy perturbation method (Chipot and Pohorille, 2007). The binding free energy of each ligand is defined as the difference between the free energy of the protein-ligand complex in solution and the free energies of the protein and the ligand separately in solution.
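The quantities involved can be written out explicitly. The notation below is ours and is a sketch of the standard thermodynamic-cycle formulation, not a description of the specific simulation protocol: P denotes the protein, L a ligand, U_0 and U_1 the potential energies of the initial and final states of an alchemical transformation, and the angle brackets an ensemble average over the initial state.

```latex
\begin{align}
\Delta G_{\mathrm{bind}}(L) &= G(P{\cdot}L) - G(P) - G(L),
    \qquad L \in \{\mathrm{ATP}, \mathrm{GTP}\} \\
\Delta\Delta G_{\mathrm{bind}}
  &= \Delta G_{\mathrm{bind}}(\mathrm{GTP}) - \Delta G_{\mathrm{bind}}(\mathrm{ATP}) \\
  &= \bigl[ G(P{\cdot}\mathrm{GTP}) - G(P{\cdot}\mathrm{ATP}) \bigr]
   - \bigl[ G(\mathrm{GTP}) - G(\mathrm{ATP}) \bigr] \\
\Delta G_{0 \to 1}
  &= -k_{B} T \, \ln \left\langle
     \exp\!\left[ -\frac{U_{1} - U_{0}}{k_{B} T} \right]
     \right\rangle_{0}
\end{align}
```

Because the free energy of the isolated protein cancels, the difference in binding free energies reduces to two ATP-to-GTP transformations, one in the bound complex and one free in solution, each of which can be estimated with the perturbation formula in the last line.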

Once the computational phase is completed, we will synthesize the proteins and test them for GTP binding in solution. The resulting protein variants will then be further evolved for improved binding properties by our in vitro selection method. If the computational design approach does not result in measurable GTP binding capability, we will use the modeling results to suggest "hot spot" amino acid positions that are involved in ATP binding. These positions will then be randomized to generate a protein library that is likely to yield the desired GTP binding, if it is feasible at all.

Anticipated results and their significance

The results will provide clues to identifying selection criteria in protobiological evolution. If our efforts fail, it might explain why the structural motif of the ATP-binding protein is not found in contemporary proteins. Even if this protein were present at the early stages of evolution, it would have been eliminated through evolutionary pruning because it lacked sufficient evolutionary potential. In contrast, if our efforts are successful, our results will indicate that the protein is not represented among contemporary structural motifs for reasons other than insufficient stability, functionality or evolvability. One possible explanation is that the protein was never present in the original pool of functional proteins. This would lend the first concrete support to the "evolutionary accident" hypothesis of the origin of proteins. This hypothesis states that only a small subset of the many different macromolecular structures was sampled on the early Earth, but that small subset was sufficient to facilitate evolution to modern organisms. Other suites of structures are also possible and could have been encountered along different evolutionary pathways on different habitable planets.

Evolution of Complexity in Metabolic Networks
(Investigation 5.4)


In contemporary cells, proteins enable metabolism, which is a network of catalyzed chemical reactions that support the self-maintenance, growth, reproduction and evolution of cells. How did this metabolism emerge and how was this process facilitated by proto-enzymes? How did the diversification of the population of proto-enzymes influence the evolution of the early metabolism? To what extent is the outcome of this evolution predictable? We will address these questions by means of computer modeling of biochemically plausible systems.

A number of previous theoretical studies addressed the emergence and evolution of early metabolism (see e.g. Kauffman, 1993, Dyson, 1999, Morowitz, 1992, 2000, Segrè and Lancet, 1999, Bagley and Farmer, 1991, Ikehara, 2002, Nir and Lahav, 1997, Lahav et al., 2001, Ma et al., 2007). In particular, models developed by Dyson (1999), Kauffman (1993) and Lancet (Segrè et al. 1998, 2000) describe possible transitions between initial, poorly organized catalytic networks and evolved, well-connected catalytic systems. This work differs from those studies not only in mathematical framework but also in several basic assumptions. In particular, instead of assuming massive, random reaction sets, we consider only a suite of protobiologically plausible reactions catalyzed under biochemically realistic conditions. This is in accord with a more recent view that biochemical reactions at the origins of life could only form sparse networks (Morowitz et al., 2000, Smith and Morowitz, 2004). Furthermore, our model involves, for the first time, simultaneous evolution of metabolic networks and enzymes that catalyze the reactions in these networks.

Biochemical model

The starting point for the proposed mechanism of protocellular reproduction and evolution is the emergence of proto-enzymes. It is assumed that initially proto-enzymes catalyzed only a small number of reactions with relatively poor efficiencies and selectivities. During the course of evolution, however, their catalytic properties improved and the repertoire of reactions that they mediated increased through diversification. Examples of relevant protein-catalyzed reactions are those participating in pathways and cycles that lead to activation of reactants with high-energy groups, synthesis of amino acids, membrane-forming amphiphiles and nucleic acids, and metabolism of small molecules.

It is often assumed that the set of all possible reactions is prohibitively large. This might lead to a "diversity catastrophe", whereby all resources are expended in reactions that offer little or no benefit to a protobiological system. In reality, however, carbon-based chemistry is subject to strong thermodynamic (free energy and redox potential) and kinetic constraints. Weber (2002) described all organic transformations underlying the origin of life in a systematic way as electron pair transfer processes. This approach allows the free energies of organic reactions to be calculated from carbon group reduction potentials, and by doing so provides an understanding of how the energy constraints govern the direction and extent of these reactions. Briefly, the free energies of organic reactions are determined by the direction of electron pair transfer and the oxidation state of the participating carbon groups. The favorable direction of irreversible reactions corresponds to electron pair transfer from groups at a higher oxidation state to groups at a lower oxidation state. Reversible reactions involve transfer between groups at the same oxidation state. In addition, a systematic review of kinetic parameters of organic reactions revealed that carbonyl groups (aldehyde or ketone) must be located at or near the reaction centers to accelerate reactions sufficiently that the synthetic processes are sustained at rates needed to counterbalance continual chemical degradation (Weber, 2004).
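The link between reduction potentials and reaction free energies can be illustrated with the standard electrochemical relation ΔG = -nFΔE, with n = 2 for an electron pair. The sketch below implements only that textbook relation and is not Weber's specific procedure; the function name and the potentials used in the example are hypothetical placeholders.

```python
FARADAY = 96485.33212  # Faraday constant, C/mol


def pair_transfer_free_energy(e_donor, e_acceptor, n_electrons=2):
    """Free energy change (kJ/mol) for moving electrons from a donor
    couple to an acceptor couple, given their reduction potentials in
    volts. Negative values indicate a favorable transfer.
    """
    delta_e = e_acceptor - e_donor
    return -n_electrons * FARADAY * delta_e / 1000.0


if __name__ == "__main__":
    # Hypothetical potentials (volts), chosen only to show the sign convention.
    forward = pair_transfer_free_energy(e_donor=-0.30, e_acceptor=-0.20)
    reverse = pair_transfer_free_energy(e_donor=-0.20, e_acceptor=-0.30)
    print(f"forward: {forward:.2f} kJ/mol, reverse: {reverse:.2f} kJ/mol")
```

A transfer between couples with equal potentials gives zero free energy change in this relation, which corresponds to the reversible case described above.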

Although the rules developed by Weber and recently extended to nitrogen-containing compounds (Weber, unpublished) apply directly to spontaneous reactions, they can be profitably extended to chemistry catalyzed by proto-enzymes. Even in contemporary biochemistry, energetically unfavorable reactions occur only as steps in overall favorable pathways or in pathways requiring an external energy supply. Highly unfavorable reactions are observed rarely. Similarly, reactions with high energy barriers are not common and require specialized, highly evolved enzymes. In the absence of such enzymes, and at the low level of biochemical organization characteristic of early metabolism, it is expected that there was a strong bias towards reactions that were neither energetically nor kinetically demanding. These universal rules of biochemistry and previous work on reconstructing the earliest metabolism from these rules (Weber, 1999; Weber, 2001; Weber and Pizzarello, 2006) or from modern metabolic networks (Caetano-Anollés et al., 2007; Smith and Morowitz, 2004; Copley et al., 2007) will serve as the basis for building the inventory of initial, catalyzed reactions.

A novel feature of our planned model is the introduction of balance between constructive and destructive processes. In particular, proteins that do not adopt a well-defined three-dimensional structure, and therefore are most likely non-functional, either precipitate from solution or are preferentially cleaved by protease proto-enzymes. This provides an important mechanism for preventing diversity catastrophe. Considering that balance between constructive and destructive processes is a universal phenomenon acting at different levels on all living systems, it is likely that it also played an important role in the beginnings of life.

Mathematical model

The computational model that we will use is an extension of a previously developed model that included only a very limited number of functions (New and Pohorille, 2000). Neither model is aimed at biochemical accuracy; this would be a futile task. Instead, both are aimed at biochemical plausibility: they should be in accord with the principles of chemistry and biochemistry. The identities of the amino acids forming peptides are not considered. An attempt to relate directly the sequence of a protein to its function would be equivalent to solving the folding and the structure-function relationship problems, which is currently not possible. Instead, the key quantity in the model is the probability distribution of finding a protein with a given efficiency of catalyzing a desired function, irrespective of its sequence. The functional form describing probability distributions must be both biochemically plausible and mathematically tractable. The simplest choice would be normal distributions with variable mean values and variances. However, asymmetric, lognormal distributions are likely to be more realistic because they better capture the biochemical feature that most randomly chosen protein sequences exhibit no or only minimal catalytic activity. This probabilistic approach is conceptually related to "functional information" introduced recently as a measure of biocomplexity (Szostak, 2003; Hazen et al., 2007). For an enzymatic reaction, this measure is defined as the negative base-2 logarithm of the probability of finding an enzyme capable of catalyzing this reaction with efficiency above a predefined threshold. The approach is also consistent with our view that multiple proto-enzymes can catalyze a single reaction and, therefore, the focus should be on functions rather than on specific protein structures.
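As an illustration of how a lognormal efficiency distribution connects to functional information, the following minimal sketch computes the fraction of sequences whose efficiency exceeds a threshold and converts it to bits, following the definition of Hazen et al. (2007) as the negative base-2 logarithm of that fraction. The parameters `mu` and `sigma` (mean and standard deviation of the logarithm of efficiency), the function names, and the example values are assumptions made for illustration; they are not parameters of the proposed model.

```python
import math


def fraction_above(threshold, mu, sigma):
    """Fraction of sequences with efficiency above `threshold`, assuming
    efficiencies are lognormally distributed with log-mean `mu` and
    log-standard-deviation `sigma` (upper tail of the lognormal).
    """
    z = (math.log(threshold) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def functional_information(threshold, mu, sigma):
    """Functional information in bits: -log2 of the fraction above threshold."""
    return -math.log2(fraction_above(threshold, mu, sigma))


if __name__ == "__main__":
    # Hypothetical distribution: median efficiency 1 (mu = 0), sigma = 1.
    for threshold in (1.0, 10.0, 100.0):
        bits = functional_information(threshold, mu=0.0, sigma=1.0)
        print(f"threshold {threshold:6.1f}: {bits:.2f} bits")
```

Raising the threshold shrinks the qualifying fraction and so raises the functional information, which is the behavior the rules in the model are meant to reproduce. For thresholds far in the tail the fraction underflows to zero in double precision, so a production version would need to work with logarithms directly.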

Rules for the probabilities of catalyzing biochemical reactions used in our model allow for both the emergence of novel enzymes and increases in the efficiency and specificity of existing enzymes. For example, the probability of finding an efficient catalyst of a given reaction decreases as efficiency, specificity or catalytic power (i.e., the reduction of the energy barrier) increases. Similarly, biochemical considerations dictate that the efficiencies of short amino acid sequences increase only slightly with the length of the polymer. Only when these sequences reach lengths sufficient for them to adopt an ordered, three-dimensional structure can the average efficiencies increase markedly with length. In addition, the model fulfills a number of conditions required by mathematical consistency. For example, during the evolution of the modeled system, consecutive populations of enzymes are linked through conditional probabilities that must obey the appropriate detailed balance conditions.

Solving the model

The problem to be solved can be defined as follows: given a suite of substrates for, and catalysts of, chemical reactions, both of which evolve with time, predict the distribution of reaction products. This requires simulating the time evolution of a system with many species and many reaction channels. Commonly, one assumes that the number of molecules of each species is sufficiently large that it can be replaced by a continuous variable (concentration) that varies deterministically over time. This leads to a coupled system of differential equations that is solved numerically. The basic assumptions of this approach, however, are not well supported in our protobiological systems, in which some species may be present in only a small number of copies. We will therefore use the Next Reaction Method, an exact and efficient stochastic algorithm for simulating coupled chemical reactions (Gillespie, 1977; Gibson and Bruck, 2000). For a system in a given state, a probabilistic algorithm is defined to answer two questions: (1) which reaction occurs next, and (2) when does it occur? The method is not limited to conventional chemical reactions but also allows for incorporating other cellular processes, such as transport of nutrients across cell walls and cell growth and division. This approach can be seamlessly connected with our stochastic model of a protein system. For each simulation, the initial inventory of substrates and proto-enzymes is defined, and time evolution is carried out in parallel for a large number of separate systems, which can be considered as representing individual "protocells". In general, protocells will differ in their metabolic capabilities and in their inventories of proto-enzymes, which enables Darwinian competition between them. At the end of a simulation, the probability distribution of solutions will be analyzed.
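The two questions above (which reaction occurs next, and when) can be made concrete with a short sketch. For brevity, the example below implements Gillespie's direct method rather than the Next Reaction Method; both sample the same underlying stochastic process, and the Next Reaction Method is a more efficient formulation of it for large networks. The data layout (dictionaries of molecule counts, reactions as tuples of rate constant, reactants and products) and the example reaction are assumptions made for illustration only.

```python
import math
import random


def gillespie_direct(counts, reactions, t_end, rng):
    """Exact stochastic simulation (Gillespie direct method).

    counts    : dict mapping species name -> number of molecules
    reactions : list of (rate_constant, reactants, products), where
                reactants and products map species -> stoichiometry
    t_end     : simulation stops before exceeding this time
    rng       : random.Random instance
    Returns (time of last event, final counts).
    """
    counts = dict(counts)  # do not modify the caller's inventory
    t = 0.0
    while True:
        # Propensity of each channel: rate constant times the number of
        # distinct reactant combinations currently available.
        propensities = []
        for k, reactants, _ in reactions:
            a = k
            for species, n in reactants.items():
                a *= math.comb(counts[species], n)
            propensities.append(a)
        total = sum(propensities)
        if total == 0.0:
            break  # no reaction can occur any more

        # Question (2): when does the next reaction occur?
        dt = rng.expovariate(total)
        if t + dt > t_end:
            break
        t += dt

        # Question (1): which reaction occurs next?
        r = rng.random() * total
        acc = 0.0
        for (k, reactants, products), a in zip(reactions, propensities):
            acc += a
            if r < acc:
                break

        # Apply the chosen reaction to the inventory.
        for species, n in reactants.items():
            counts[species] -= n
        for species, n in products.items():
            counts[species] = counts.get(species, 0) + n
    return t, counts


if __name__ == "__main__":
    # Hypothetical protocell: substrate S converted to product P by a
    # proto-enzyme E that is regenerated (S + E -> P + E).
    inventory = {"S": 200, "E": 5, "P": 0}
    channels = [(0.01, {"S": 1, "E": 1}, {"P": 1, "E": 1})]
    print(gillespie_direct(inventory, channels, 50.0, random.Random(1)))
```

Running the function many times with different random seeds, each run representing one protocell, yields a distribution of outcomes rather than a single trajectory.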

Anticipated results and their significance

Once we have developed and tested our software tools, we will perform a series of simulations aimed at determining the conditions necessary for the evolution of a population of proteins in increasingly complicated systems. The results of these simulations should help answer such questions as:

Can we observe self-organized pathways and autocatalytic cycles, and what is the degree of complexity of the systems in which they emerge? How do they relate to contemporary biochemical pathways?

How do different types of networks, which can be interpreted as different "species", evolve? Do they compete with each other in a population, driving some of them to extinction?

How broad are the probability distributions of solutions under specified conditions and, therefore, to what extent can the outcomes be interpreted as deterministic? At the other extreme, does the system experience a "diversity catastrophe"? If not, which is the expected result, what mechanisms prevent a diversity catastrophe from happening?

How robust are the results with respect to changes in the initial inventory of reactants, proto-enzymes and the parameters in the model? How do populations of protocells respond to changes that represent changes in the environment (changes in fitness)?

The results will shed light on the potential of protobiological systems to increase their complexity by self-organizing into metabolic networks in the presence of proto-enzymes. In the absence of macromolecular catalysts, this self-organizing potential has been hotly debated (Morowitz et al., 2000; Orgel, 2000). We will also learn to what extent the formation of protocellular metabolism was deterministic, even though it was driven by highly stochastic processes. Finally, we will learn whether such concepts as speciation and fitness to the environment, which are familiar in the context of cellular and organismal evolution, also apply to protobiological systems. It would be desirable to test conceptual results emerging from our simulations in laboratory experiments. Although experiments on networks of simple, catalyzed reactions are very difficult to conduct, first steps in this direction have already been made (Lee et al., 1997; Yao et al., 1998). We anticipate that our results will guide future experiments in this area.

