Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome

D.G. Gibson et al. reported, in Science Magazine, the “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome”. Now, I used to be a biologist and studied this particular type of biology for a number of years before leaving the field, mostly for financial reasons, for a career in computer science. I’m also a certifiable geek, as I think most of the readers of this blog are, so I thought I’d explain what this means, in geek terms. First, let me start with an analogy: say you buy a PC that comes with an OS pre-installed. What Gibson et al. did is the equivalent of reading the source code of that OS, re-writing it with a few (minor) changes and installing it on the PC, replacing the old OS. People around the world do this every day on their computers, but it’s a first when the computer is actually a single-cell micro-organism.

In this analogy, the genetic material of a micro-organism is compared with the OS of a PC. This is arguably a poor analogy, and I’ll explain why (which will also explain a bit of the difficulty that scientists have in understanding the genome). First of all, the genome does not contain instructions that are executed by something exterior to it: it does contain instructions, but it also has its own structure and the instructions it contains are more like blueprints than like statements in C or C++. In fact, the genome is more like VHDL (VHSIC Hardware Description Language) than it is like C. However, an FPGA (field-programmable gate array) doesn’t normally carry its VHDL code around in VHDL: the VHDL code goes through a couple of phases before it can program an FPGA. The genome’s instructions go through such phases as well: the genome is made up of DNA, which is transcribed to RNA, which is then translated at the ribosome (which itself consists of RNA and proteins) to create proteins - so if the DNA is the VHDL, the proteins are the gates of the FPGA. Note, though, that the transcription from DNA to RNA and the translation from RNA to protein are done by proteins which, themselves, were made in the same way.
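For the programmers among you, the DNA to RNA to protein pipeline can be sketched in a few lines of Python. This is a toy, not how a cell (or any bioinformatics library) does it: the function names and the 15-letter "gene" are made up for illustration, and the table is the standard genetic code written out in the usual T, C, A, G order.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> messenger RNA: the coding strand's letters, with T written as U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read three letters at a time until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTAAATGGTAA"  # hypothetical mini-gene, invented for this example
mrna = transcribe(gene)
print(mrna)             # AUGGCUAAAUGGUAA
print(translate(mrna))  # MAKW
```

Each letter of the result stands for one amino acid, the building block of a protein.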

So, let’s come up with a closer analogy: say you have a pre-programmed FPGA that contains the VHDL that was used to program it and, encoded in the VHDL (and therefore present in the FPGA) a VHDL interpreter. What Gibson et al. did was to extract the VHDL, make a copy of it, make some modifications to it, put it back on the FPGA and let the FPGA reprogram itself using that new VHDL. Note that the VHDL encodes the VHDL interpreter: DNA has a way of modulating its own interpretation, though in bacteria that modulation is arguably far more limited than it is in multi-cellular organisms (like humans).

Now, it may seem a bit banal to download code from a microchip, copy it, tweak it a bit and put it back, but the process is quite a bit more involved than it would be for VHDL. First of all, DNA is a molecule, which is a lot more tangible than a file but a lot less than a pencil - and a lot more difficult to handle than either of these. DNA is actually a polymer, meaning it is built up of a lot of smaller molecules, each of which, in the case of DNA, is called a nucleotide. In DNA (as in RNA) there are four of them, named after the bases they carry: Guanine (G), Cytosine (C), Adenine (A) and Thymine (T) (in RNA, Thymine is replaced by Uracil (U)). You’ll have noticed that with the names of each of these molecules I put the first letter in parentheses afterwards. We (biologists and former biologists) usually refer to these molecules only by that first letter. So DNA is a polymer of these four basic molecules.
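In geek terms, a four-letter alphabet means each position in the polymer carries two bits. The sketch below packs a DNA string four letters to a byte; the particular bit assignment (A=0, C=1, G=2, T=3) and the function names are my own choice for illustration, not any standard file format.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # arbitrary 2-bit assignment
LETTER = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        value = 0
        for base in chunk:
            value = (value << 2) | CODE[base]
        value <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(value)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); length is needed to drop the padding bits."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(LETTER[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "GATTACA"
packed = pack(seq)
print(len(packed))               # 2
print(unpack(packed, len(seq)))  # GATTACA
print(1_000_000 * 2 // 8)        # 250000
```

The last line is the point: a sequence of a million letters fits in 250,000 bytes, less than a floppy disk.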

If you’re as much of a geek as I think you are, you probably already knew part of this and you also know DNA is generally referred to as a double helix, and its picture will be familiar to you. The genome is quite a bit more complex than “just” that double helix, though: it is wrapped around itself several times over. In fact, if it weren’t, it would be pretty difficult to carry around in each and every one of our cells. The human genome, which is about 3.4 billion base pairs, is about two meters long (for all your cells together that amounts to over 182 billion kilometers (113 billion miles) - which is the distance from the earth to the sun more than 1200 times over). All that material is packed into the nucleus of each and every one of the cells of your body (of which you have between 50 and 100 trillion, depending on who you believe - in comparison, the US debt at the time of this writing is 8.6 trillion USD - by the time you read this it may have reached 9 trillion, but it’s still far less than the number of cells you have in your body). So, two meters of DNA inside a cell that is too small to see with even a pretty good magnifying glass - neat eh?

Bacterial DNA is significantly smaller in size than human DNA: in the case of Mycoplasma mycoides, it’s only about a million base-pairs, which is about a third of a millimeter long when stretched out. Still pretty big for a single molecule, but pretty small nonetheless.
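If you want to check the arithmetic on these lengths yourself, here is a small sketch. It assumes the textbook figure of about 0.34 nanometers of helix per base pair (that number is my assumption, it is not from the article), and the function name is made up.

```python
RISE_PER_BP_NM = 0.34  # assumed: textbook rise per base pair for the double helix


def contour_length_m(base_pairs: float) -> float:
    """Stretched-out length, in meters, of a double helix of this many base pairs."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


bacterial = contour_length_m(1_000_000)   # roughly a Mycoplasma genome
human_one_copy = contour_length_m(3.4e9)  # the human figure used above

print(f"{bacterial * 1e3:.2f} mm")        # 0.34 mm
print(f"{human_one_copy:.2f} m")          # 1.16 m
print(f"{2 * human_one_copy:.2f} m")      # two copies per cell: 2.31 m
```

The "two meters per cell" comes from the fact that most of your cells carry two copies of the genome, one from each parent.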

Reading DNA requires a few steps: first you isolate it from the organism it’s in. I’ve done this many, many times with Escherichia coli (probably the most commonly used bacterium for genetic engineering) as well as some plants (Phaseolus vulgaris, the common green bean, and P. coccineus, its cousin) and fungi. For some bacteria, there are commercial kits of which you just follow the instructions to get the DNA out (actually, you don’t really need a kit: E. coli is pretty easy to separate from its DNA once you get a hang of it.); for plants and fungi, it’s usually a bit more involved than that.

After that, you have to sequence it. I won’t go into the details on how to do that, but it involves a process called the polymerase chain reaction (PCR), which basically allows you to copy strands of DNA, of which you can then read the sequence of base-pairs through the magic of fluorescence. Fun to do, especially now that there are machines that will read the gel (on which the DNA is separated in an electric field - I’ll explain that a bit later) and translate that to (often colored) sequences of A, C, G, and T. Sequencing short strands of DNA doesn’t take very long; sequencing a bacterial genome takes a few days.

So, now you have the sequence of the DNA. Step three is to synthesize new DNA with that sequence (perhaps after modifying the sequence a bit to suit your tastes). I have to admit I’m on somewhat shakier ground for this part, because I’ve never actually synthesized DNA: I’ve only used PCR to copy DNA, the difference being that I already had DNA with the proper sequence and just needed to make a copy. Gibson et al. synthesized, meaning they didn’t start from a copy. Once they had their first few copies, they put them in a vector, which was copied by Saccharomyces cerevisiae - common yeast. The reason to do this in yeast is pretty simple: although yeast is a single-celled organism, it is a yeast (duh!), which is a single-celled fungus (more or less - bear with me). That means it has a much, much bigger genome than bacteria do and it has all kinds of mechanisms we can hijack for genetic engineering. Also, the impact of having a large chunk of foreign DNA in an organism that has many times that amount of DNA is far smaller than having that same chunk of DNA in an organism of which the genome is, itself, of about the same size. That doesn’t mean there isn’t any impact, but it does contribute to the fact that yeast is an excellent little “DNA factory”. Finally, yeast has a different set of tools for expressing genes than bacteria do, so the bacterial DNA will only be copied by the yeast, but the yeast won’t try to use it for anything.

If you followed me up to here, you now have an idea of the kind of work involved in making a recombinant genome. This is several years of work, and builds upon several decades of research. To quote Gibson et al.: “[they] developed a strategy for assembling viral sized pieces [which they synthesized] to produce large DNA molecules that enabled [them] to assemble a synthetic M. genitalium genome in four stages from chemically synthesized DNA cassettes averaging about 6 kb in size. This was accomplished through a combination of in vitro enzymatic methods and in vivo recombination in Saccharomyces cerevisiae. The whole synthetic genome (582,970 bp) was stably grown as a yeast centromeric plasmid (YCp)”. A YCp is basically a large circular chunk of DNA that yeast can copy, and copy, and copy, …, and copy again ad infinitum. (In the earlier stages of assembly, E. coli was used to reproduce the smaller chunks.)
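To give an idea of what "assembling in stages" means, here is a toy version in Python. It assumes that neighboring pieces share a short identical stretch at their ends and are joined on that overlap; that mechanism, the sizes (50-letter cassettes, 10-letter overlaps, groups of five) and all function names are my simplifications for illustration, not the protocol from the paper.

```python
import random


def join(left: str, right: str, overlap: int) -> str:
    """Join two pieces whose ends share `overlap` identical letters."""
    if left[-overlap:] != right[:overlap]:
        raise ValueError("ends do not overlap")
    return left + right[overlap:]


def assemble_stage(pieces, group_size, overlap):
    """Merge each run of `group_size` neighboring pieces into one larger piece."""
    merged = []
    for start in range(0, len(pieces), group_size):
        group = pieces[start:start + group_size]
        current = group[0]
        for nxt in group[1:]:
            current = join(current, nxt, overlap)
        merged.append(current)
    return merged


def cut_into_cassettes(target, cassette_len, overlap):
    """Stand-in for chemical synthesis: overlapping pieces of the designed sequence."""
    step = cassette_len - overlap
    return [target[i:i + cassette_len] for i in range(0, len(target) - overlap, step)]


random.seed(1)
target = "".join(random.choice("ACGT") for _ in range(1000))  # the "design"
pieces = cut_into_cassettes(target, cassette_len=50, overlap=10)

stage = 0
while len(pieces) > 1:
    pieces = assemble_stage(pieces, group_size=5, overlap=10)
    stage += 1
    print(f"stage {stage}: {len(pieces)} piece(s)")

print(pieces[0] == target)  # True
```

The toy works on a linear string; the real genome is circular, and the real joining is done by enzymes and by yeast rather than by string comparison.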

So, once they had all that DNA replicated, they transplanted it to another bacterial species and went home for a week-end. Note the important part here: “another bacterial species”: the DNA they originally took was from Mycoplasma mycoides ss. capri, and they transplanted their regenerated (and redesigned) DNA into Mycoplasma capricolum ss. capricolum (OK, these two are close relatives, but hey).

If you followed me up to here, the rest should not be a problem to understand: the genome they actually synthesized in the experiment they reported on wasn’t one they just copied - it was one they designed, based on two laboratory strains of the bacteria they were working with. Because they based their designed sequence on early versions of the discovered sequences of those two genomes, there were 19 errors in the designed sequence w.r.t. the DNA they based it on. Because those errors were harmless, they kept them - this also allowed them to recognize their designed DNA vs. the natural version.

Of course, 19 random mutations among over a million base pairs (which is what they ended up with) are hardly enough to identify that genome, so they replaced four larger chunks of DNA with DNA that they designed themselves, but which did not encode any functionality. If you take a look at the figure S1 you can see what those “watermark” sequences look like. Note the parts they’ve highlighted in green: TTAACTAGCTAA. This sequence contains a stop codon in each of the six possible reading frames (three on each strand), so whichever way the cell reads into a watermark, it stops translating there. To find the watermarks again later, you use primers: short sequences that allow the polymerase chain reaction (PCR) to start the copying process. They are intentionally short enough to easily synthesize but long enough to both bind to the DNA easily and be a unique sequence, not found in the DNA too many times.
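One property of that highlighted TTAACTAGCTAA sequence is easy to check for yourself: read it in any of the six possible reading frames (three offsets on each of the two strands) and you hit a stop codon. The sketch below does that check; the function names are mine, and the stop codons are those of the standard genetic code.

```python
# Stop codons of the standard genetic code. (Mycoplasmas read TGA as an
# amino acid rather than a stop, but no TGA turns up here anyway.)
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The same stretch of DNA as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def stops_per_frame(seq: str) -> dict:
    """Map each of the six reading frames to the stop codons found in it."""
    result = {}
    for label, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            result[f"{label}{offset + 1}"] = [c for c in codons if c in STOP_CODONS]
    return result


marker = "TTAACTAGCTAA"
for frame, stops in stops_per_frame(marker).items():
    print(frame, stops)
# +1 ['TAA']
# +2 ['TAA']
# +3 ['TAG']
# -1 ['TAA']
# -2 ['TAG']
# -3 ['TAG']
```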

The details of how they assembled the genome (in four steps, using larger and larger chunks) aren’t all that interesting - at least to the layman. When I read it, I particularly liked the way they got rid of the linear yeast DNA once they had assembled the (circular) bacterial genome - using physical rather than chemical processes. Another interesting detail is that, once they arrived at 100 kb (~100,000 base pairs) chunks, they tested to make sure that there were no significant mutations in those 100 kb chunks. They did so by replacing the corresponding chunk in the natural genome and selecting the colonies that survived. Using this technique, they found an error in one of the fragments that had to be corrected. This same error also served as a negative control for the genome transplantation experiments: now that they had a way to produce a genome that was sure to kill recipient bacteria, they could verify that the transplantation worked in three independent ways:

bacteria that received a natural genome from a donor subspecies would contain the known genetic markers of that subspecies (in our FPGA analogy, this means the FPGA that received the VHDL of the other model would start behaving like that other model)
bacteria that received a synthetic genome would have the markers of that genome (in our FPGA analogy, the receiving FPGA would have all the known gates that were in the new VHDL and none of the ones that were supposedly replaced)
bacteria that received the known-buggy genome would all die (in our FPGA analogy, they’d probably short out - there are no built-in safeties in biology)

Note that the second of these three is perhaps the most important: the four “watermark” sequences that were designed into the synthetic genome (the ones with the special TTAACTAGCTAA sequence I talked about earlier) are visible on an agarose gel after PCR amplification. This is when you isolate the DNA from your bacteria and run it through a PCR, which allows the watermark sequences to be copied many times over, then put it in an agarose gel (a gel made of a substance called agarose) and put that in an electric field (which is why this part of the process is called “electrophoresis”). The DNA is pulled, by the electric field, toward the positive pole (DNA is negatively charged). Depending on the size of the DNA strands, they are slowed down by the gel, so if you keep the electric field there a while, chunks of different sizes get separated from each other. After that, you can see the DNA strands of different sizes as separate bands on the gel, under UV light. You can even use that possibility to cut the gel into pieces and take just the DNA of a particular size you’re interested in. You have to do that quickly though, because the UV light damages the DNA.

Like I said, though: in order to know that your transplantation worked, you need to have both the markers of your wanted genome to be present, and the markers of your unwanted genome to be absent. Nature (or God) wasn’t kind enough to equip bacteria with markers we can easily replicate using PCR that we would also not need in the actual bacteria (so you could remove them from the synthetic bacteria and be done with it). However, He was kind enough to equip us with enzymes that can cut DNA at specific sequences. In biology, we call this “digesting” DNA. You don’t want to completely digest your DNA though, because you basically wouldn’t have any left if you do. However, if you digest the DNA partially - i.e. you let the digestion process go on for a limited amount of time - you get chunks of DNA that, on the gel I just talked about, give you a distinctive pattern. These patterns are different from one genome to another (sufficiently different to distinguish one person from another using a similar technique), so you can use them as a test to verify that no unwanted genomes are present in your recipient bacteria. To do that, you take some unwanted DNA, some wanted DNA and the DNA you’re testing and you digest them with the same amount of digesting enzymes for the same amount of time, then put them next to each other on a gel. If all is well, you’ll see two identical lanes on your gel: the transformed bacteria’s DNA and the wanted DNA. If something is wrong, you’ll see three different patterns: the unwanted DNA, the wanted DNA, and some odd mixture of the two. When you see that, you could start over - but as you’ve probably tried many, many transformations at the same time, you’re likely to be lucky if your method works at all.
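The idea of comparing cutting patterns can be sketched in Python as well. To keep it short this toy cuts at every site (a complete digest, where a partial digest would add bands made of uncut neighbors), treats the DNA as a linear string, and uses the recognition site of the enzyme EcoRI, which cuts GAATTC after the G. The three sequences are invented for the example.

```python
def digest(dna: str, site: str, cut_offset: int) -> list:
    """Cut a linear DNA string at every occurrence of `site`.

    `cut_offset` is how many letters into the site the cut falls.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def band_pattern(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Fragment sizes, largest first: roughly what a lane on the gel shows."""
    return sorted((len(f) for f in digest(dna, site, cut_offset)), reverse=True)


# Hypothetical toy "genomes" with the cutting site in different places.
wanted = "ACGT" * 3 + "GAATTC" + "ACGT" * 5 + "GAATTC" + "ACGT" * 2
unwanted = "ACGT" * 8 + "GAATTC" + "ACGT" * 3
tested = wanted  # what you hope to find in your transplanted bacteria

print(band_pattern(wanted))    # [26, 13, 13]
print(band_pattern(unwanted))  # [33, 17]
print(band_pattern(tested) == band_pattern(wanted))  # True
```

On a real gel the comparison is done by eye: lanes with the same bands at the same heights came from DNA that cuts the same way.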

Once they performed those tests and had results consistent with what they expected, they sequenced the genome of one of the colonies, which they called Mycoplasma mycoides JCVI-syn1.0. Why do all those other tests first? Well, first of all, if you do an experiment like that you are likely to have a large number of untransformed and badly transformed bacteria that you would not want to sequence the genome of: sequencing is a costly business compared to partial digestion and PCR amplification of watermark genes - both of which you can do many times a day if you want to. Sequencing a genome is a far more arduous affair and takes a lot more time, and money.

What they found in Mycoplasma mycoides JCVI-syn1.0 confirmed that it was a successfully transplanted bacterium with the synthetic genome: the sequence didn’t reveal any bits of the receiving bacterium’s own genome (i.e. “[they] did not find any sequences in the synthetic genome that could be identified as belonging to M. capricolum”), and matched the synthetic genome with a few minor mishaps. Notably, one of the genes in the genome (apparently not a vital one) was interrupted by a piece of DNA from E. coli that shouldn’t have been there and hadn’t been detected at an earlier stage, there were eight new single-nucleotide “polymorphisms” (single-letter changes in the DNA sequence that don’t actually have it encode anything different from what was there before due to the fact that the “DNA code” is partially redundant) and a bit of DNA that was duplicated (i.e. repeated). So the copy wasn’t perfect, but it was certainly good enough. They performed a whole slew of other tests on the newly minted Mycoplasma mycoides JCVI-syn1.0 bacteria which showed that it was different from the two other bacteria involved in the process (three if you count E. coli), that some genes were broken but their corresponding DNA was still present (those are bugs, not features) and that they grow just like normal, natural bacteria would.
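The redundancy of the "DNA code" is easy to see in code: there are 64 three-letter codons but far fewer amino acids, so several codons mean the same thing. The sketch below uses the standard genetic code (mycoplasmas differ from it in one codon, TGA) and a made-up helper, `is_silent`, to test whether a single-letter change leaves the encoded amino acid untouched.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if changing one letter of the codon keeps its meaning."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[mutated] == CODON_TABLE[codon]


print(is_silent("GCT", 2, "C"))  # True: GCT and GCC both mean alanine
print(is_silent("GCT", 0, "A"))  # False: ACT means threonine

# Count silent changes over every codon and every possible single-letter change.
silent = total = 0
for codon in CODON_TABLE:
    for position in range(3):
        for new_base in BASES:
            if new_base != codon[position]:
                total += 1
                silent += is_silent(codon, position, new_base)
print(f"{silent} of {total} single-letter changes are silent")
```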

From the article:

"In 1995, the quality standard for sequencing was considered to be one error in 10,000 bp and the sequencing of a microbial genome required months. Today, the accuracy is substantially higher. Genome coverage of 30-50X is not unusual, and sequencing only requires a few days. However, obtaining an error-free genome that could be transplanted into a recipient cell to create a new cell controlled only by the synthetic genome was complicated and required many quality control steps. Our success was thwarted for many weeks by a single base pair deletion in the essential gene dnaA. One wrong base out of over one million in an essential gene rendered the genome inactive, while major genome insertions and deletions in non-essential parts of the genome had no observable impact on viability. The demonstration that our synthetic genome gives rise to transplants with the characteristics of M. mycoides cells implies that the DNA sequence upon which it is based is accurate enough to specify a living cell with the appropriate properties."

In English, this means that DNA sequencing technology has come a long way in the last fifteen years, but while large chunks of DNA may not be vital to the organism (i.e. it can live with errors in them), a single base-pair deletion rendered the genome non-viable and had to be fixed - which delayed the whole process for several weeks.
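Why would one missing letter be so much worse than a big insertion somewhere unimportant? Because genes are read three letters at a time, so a deletion shifts every codon that comes after it. A toy illustration, with an invented 21-letter gene (nothing to do with the real dnaA sequence) and the standard genetic code:

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Read the DNA three letters at a time until a stop codon ('*')."""
    protein = ""
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def delete_base(dna: str, index: int) -> str:
    """Return the sequence with the single letter at `index` removed."""
    return dna[:index] + dna[index + 1:]


gene = "ATGGCTAAATGGGATCCGTAA"  # hypothetical gene, made up for this example
broken = delete_base(gene, 4)   # lose one letter near the start

print(translate(gene))    # MAKWDP
print(translate(broken))  # MVNGIR
```

After the deletion only the first amino acid survives; everything downstream is garbage, and the stop signal at the end is no longer read as one.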

Another important note from the article - and this is where any controversy will come from:

"This work provides a proof of principle for producing cells based upon genome sequences designed in the computer. DNA sequencing of a cellular genome allows storage of the genetic instructions for life as a digital file. The synthetic genome described in this paper has only limited modifications from the naturally occurring M. mycoides genome. However, the approach we have developed should be applicable to the synthesis and transplantation of more novel genomes as genome design progresses."

Does this mean that we can now design any genome on a computer from scratch, synthesize the DNA of that genome with a little help from E. coli and S. cerevisiae and put it in a cell to give that cell any properties we could possibly want? Well… no. For one thing, there is a limitation in our knowledge, at the moment, of the purpose of a large part of the genes in a genome. In fact, there are quite a few parts of DNA of which we don’t really know what they’re for - even if we do have the sequence. Let’s say that we are novices at VHDL: we can copy and paste bits of code, try some relatively minor modifications and upload it to an FPGA, but that is still a long way from being an expert VHDL artist who’d be able to make the FPGA do whatever your desktop PC would be able to do. Aside from that, there are intrinsic limitations in the receiving bacteria: the genomic DNA transplanted into these bacteria needs to be sufficiently similar to what was there before for the cell to integrate the DNA in question. That doesn’t mean, though, that the “synthetic cells” are no different after transplantation than they were before: after the several thousand cell multiplications those cells have to go through in order to form a colony, there are practically no proteins of the original proteome left in the cell. It does mean, however, that there is a limit to the extent to which the genome can be radically different from the one that was there in the original cell. If one wanted to create a truly radically different cell, it would arguably be possible by re-iterating the process several times - which I would call “evolution by design” (something we do in computer science every day: debugging, modifying, incrementally adding new features, etc.).

Does this mean a living organism was created in the laboratory? No. The cells were there, as were all of the processes that made the cell alive. The fact that its genome was replaced with another genome, although it does significantly change the nature of the cell, does not create life where it did not exist before.

What could this new technology be used for? Recombinant DNA, genetic engineering, etc. are already used in a wide variety of applications, in research, in medicine and in industry. This new technology would eventually allow researchers to come up with new ways to produce a variety of things that micro-organisms can produce.

What are the theological implications of this? As far as I can see, there are none. The fact that it is now possible to replace the genome of a micro-organism with another, synthetic, genome does not mean anything w.r.t. the veracity of any holy book I know (though arguably I only know the Bible to any extent that might be notable) and I personally don’t see this as the researchers playing God. This doesn’t, in fact, say anything about either evolution or Creation (in fact, the word “evolution” is not mentioned even once in the article): it “just” describes a new way of playing with what either one of these processes (evolution and Creation) could have produced. Perhaps this is another way that Man co-creates.

Would this technology be applicable to humans? I.e. would it be possible to replace the genome of a human being with a synthetically created genome? Not with this technology, no. There would be enormous ethical implications in even conceiving of such a thing, but those implications aside, this technology requires the reproduction of the entire synthetic genome in a host organism (yeast, in this case) before transplanting that assembly into the receiving cells. As I explained above, this is only possible if the synthetic genome is sufficiently smaller than the genome of the host organism that is used to copy that genome and produce enough DNA to attempt transplantation (with any rate of success). Though humans by no means have the largest genome of all living organisms (if I recall correctly, that honor goes to water lilies), our genome is too large for that kind of experimental amplification. The other way of amplifying DNA sequences, PCR, is not accurate enough for this kind of application either. That does not mean that a technology that would allow a synthetically created humanoid genome is inconceivable (although it would be ethically inconceivable in my opinion) but it does mean we do not have the know-how to devise it, and probably won’t for decades to come.