
NGS Leaders Blog

The Road to the $1000 Genome

Kevin Davies, Editor-in-Chief, Bio-IT World: The term next-generation sequencing (NGS) has been around for so long it has become almost meaningless. We use “NGS” to describe platforms so well established they are almost institutions, while future (3rd-, 4th-, or whatever) generations promise to do for terrestrial triage what Mr Spock’s Tricorder did for intergalactic health care. But as the costs of consumables keep falling, turning the data-generation aspect of NGS increasingly into a commodity, the all-important problems of data analysis, storage, and medical interpretation loom ever larger.

 “There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data,” says Canadian cancer researcher John McPherson, feeling the pain of NGS neophytes left to negotiate “a bewildering maze of base calling, alignment, assembly, and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs. Bridging this gap is essential, or the coveted $1,000 genome will come with a $20,000 analysis price tag.”

 “The cost of DNA sequencing might not matter in a few years,” says the Broad Institute’s Chad Nusbaum. “People are saying they’ll be able to sequence the human genome for $100 or less. That’s lovely, but it still could cost you $2,500 to store the data, so the cost of storage ultimately becomes the limiting factor, not the cost of sequencing. We can quibble about the dollars and cents, but you can’t argue about the trends at all.”

 But these issues look relatively trivial compared to the challenge of mining a personal genome sequence for medically actionable benefit. Stanford’s chair of bioengineering, Russ Altman, points out that not only is the cost of sequencing “essentially free,” but the computational cost of dealing with the data is also trivial. “I mean, we might need a big computer, but big computers exist, they can be amortized, and it’s not a big deal. But the interpretation of the data will be keeping us busy for the next 50 years.” Or as Bruce Korf, the president of the American College of Medical Genetics, puts it: “We are close to having a $1,000 genome sequence, but this may be accompanied by a $1,000,000 interpretation.”

 Arbimagical Goal
 The “$1,000 genome” is, in the view of Infinity Pharmaceuticals’ Keith Robison, an “arbimagical goal”—an arbitrary target that has nevertheless acquired a magical notoriety through repetition. The catchphrase was first coined in 2001, although by whom isn’t entirely clear. The University of Wisconsin’s David Schwartz insists he proposed the term during a National Human Genome Research Institute (NHGRI) retreat in 2001. During a breakout session, he said that NHGRI needed a new technology to complete a human genome sequence in a day. Asked to price that, Schwartz paused: “I thought for a moment and responded, ‘$1,000.’” However, NHGRI officials say they had already coined the term.

 The $1,000 genome caught on a year later, when Craig Venter and Gerry Rubin hosted a major symposium in Boston (see, “Wanted: The $1000 Genome,” Bio•IT World, Nov 2002). Venter invited George Church and five other hopefuls to present new sequencing technologies, none more riveting than that of U.S. Genomics founder Eugene Chan, who described an ingenious technology to unfurl DNA molecules that would soon sequence a human genome in an hour. (The company abandoned its sequencing program a year later.)

 Another of those hopefuls was 454 Life Sciences, which in 2007 sequenced the first personal genome using NGS, that of Jim Watson, at a cost of about $1 million. Since then, the cost of sequencing has plummeted to less than $10,000 in 2010. Much of that has been fueled by the competition between Illumina and Applied Biosystems (ABI). When Illumina said its HiSeq 2000 could sequence a human genome for $10,000, ABI countered with a $6,000 genome, dropping to $3,000 at 99.99% accuracy.

 Earlier this year, Complete Genomics reported its first full human genomes in Science. One of those belonged to George Church, whose genome was sequenced for about $1,500. CEO Cliff Reid told us that Complete Genomics now routinely sequences human genomes at 30x coverage for less than $1,000 in reagent costs.

 The ever-quotable Clive Brown, formerly a central figure at Solexa and now VP development and informatics for Oxford Nanopore, a 3rd-generation sequencing company, says: “I like to think of the Gen 2 systems as giant fighting dinosaurs, ‘[gigabases] per run—grr—arggh’ etc., a volcano of data spewing behind them in a Jurassic landscape—Sequanosaurus Rex. Meanwhile, in the undergrowth, the Gen 3 ‘mammals’ are quietly getting on with evolving and adapting to the imminent climate change... smaller, faster, more agile, and more intelligent.”

 Nearly all the 2nd-generation platforms have placed bets on 3rd-gen technologies. Illumina has partnered with Oxford Nanopore; Life Technologies has countered by acquiring Ion Torrent Systems; and Roche is teaming up with IBM. PacBio has talked about a “15-minute” genome by 2014, Halcyon Molecular promises a “$100 genome,” while a Harvard start-up called GnuBio has placed a bet on a mere $30 genome.

 David Dooling of The Genome Center at Washington University points out that the widely debated cost of the Human Genome Project included everything—the instruments, personnel, overhead, consumables, and IT. But the $1,000 genome—or in 2010 numbers, the $10,000 genome—only refers to flow cells and reagents. Clearly, the true cost of a genome sequence is much higher (see, “The Grand Illusion”). In fact, Dooling estimates the true cost of a “$10,000 genome” as closer to $30,000, by the time one has considered instrument depreciation and sample prep, personnel and IT, informatics and validation, management and overheads.
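Dooling's back-of-the-envelope accounting can be sketched in a few lines of Python. Only the $10,000 reagent figure comes from the article; the remaining line items are hypothetical placeholders, chosen purely to illustrate how hidden costs can triple the advertised price to reach his ~$30,000 estimate.

```python
# Illustrative (hypothetical) breakdown of the true cost of a "$10,000 genome".
# Only the flow-cell/reagent figure is from the article; the other amounts are
# invented placeholders showing how the ~$30,000 estimate could be assembled.
costs = {
    "flow cells and reagents": 10_000,   # the advertised "$10,000 genome"
    "instrument depreciation and sample prep": 8_000,
    "personnel and IT": 6_000,
    "informatics and validation": 4_000,
    "management and overheads": 2_000,
}
true_cost = sum(costs.values())
print(f"Advertised: ${costs['flow cells and reagents']:,}  True: ${true_cost:,}")
```

The point is not the individual numbers but the ratio: everything beyond the reagent line roughly doubles the bill again.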

 “If you are just costing reagents, most of the vendors could claim a $1,000 genome right now,” says Brown. “A more interesting question is: ‘$1,000 genome—so what?’ It’s an odd goal because the closer you get to it the less relevant it becomes.”

This article also appeared in the September-October 2010 issue of  Bio-IT World. 

Genome Studies for Drug Safety

Ernie Bush, VP and Scientific Director, Cambridge Healthtech Associates: The following is an interview that Ernie Bush conducted with Paul Watkins in September 2010.

Paul Watkins, Professor of Medicine and Pharmacy at the University of North Carolina, Chapel Hill, and director of the Hamner-UNC Institute for Drug Safety Sciences, spoke with me about the Hamner Institute’s efforts to leverage information from the human genome to explore issues around drug-induced liver injury.

EB: Could you describe the work you’re doing at the Hamner Institute?

PW: Rare idiosyncratic drug-induced liver injury (DILI) is the major drug toxicity that terminates a new molecular entity in clinical development, and it’s also the major organ toxicity that leads to regulatory actions on drugs post-approval. And there’s really been very little insight to date into the mechanism of these adverse events, i.e., why a drug that’s safe for the vast majority of people can cause life-threatening liver injury in a rare person, say 1 in 10,000 or 1 in 50,000. And a very exciting endeavor that’s been ongoing now for five years and funded by the NIH is the Drug-Induced Liver Injury Network, or DILIN.

I chair the steering committee and I also chair the genetics sub-committee for this initiative. It involves finding people around the country who’ve actually had these rare reactions and recovered, in some cases after a liver transplant. And one of the first areas we’re starting on is the genetics. All of these subjects are going to undergo the million-SNP chip analysis for a genome-wide association study (GWAS). About half of the 850 subjects have already had this GWAS and there have been some successes just doing this technique.

To extend this assessment, what we’re now embarking on, and this is in collaboration with David Goldstein’s group at Duke, is whole exome sequencing with cases due to certain drugs where we have a relatively large number of well-characterized subjects... There’s also a parallel effort in Europe...

All these studies in the U.S. and Europe are, of course, generating a huge amount of data. Therefore, the real challenge now is: How do we mine that large data set to get mechanistic insight into why an individual would be susceptible to DILI drug reactions? So here at the Hamner, we’ve begun several research initiatives designed to synergize and capitalize on these other resources.

One is that we’re looking at panels of inbred mice that are genetically very well characterized but where each strain is significantly different from the others. We use these strains as a model for the genetic variation that exists in people. We were fortunate enough to get an NIH grant—that’s myself and David Threadgill, who’s the chair of genetics at NC State University—to start giving drugs to these mice to see if we could find one particular strain or a couple of strains that would manifest the toxicity, just as it occurs in people. Because the genetics is very well worked out and established in these mice, we can immediately see candidate genes that account for the susceptibility in mice and then compare that to the human genetic data in the DILIN and SAE consortium gene banks to test these hypotheses.

The other approach is to study the effects of these drugs, the ones implicated as causing liver injury, in primary cultures of human hepatocytes with additional cells that are present in the liver, particularly Kupffer cells, to see if we can define the pathways that these drugs perturb as you go from physiologic concentrations to toxic concentrations. These studies may generate a list of suspect genes that we can then take over to the human genetic data and actually look very closely at these genes to see if there’s any variation that might cause a susceptibility to DILI.

Likewise, as David Goldstein’s group comes up with certain genetic variations that they say statistically are different between the cases and controls, we can go to our hepatocytes and look to see if that makes sense mechanistically.

And then lastly, we’re putting all this information together in an in silico model called DILI-SIM. This work is a collaboration between the Food and Drug Administration and a Bay Area company called Entelos where we’ve already begun modeling the major pathways taken by drugs and the major perturbations that drugs cause that can lead to liver toxicity.

EB: At the World Pharmaceutical Congress, you presented preliminary data that suggest not so much a metabolic difference in these individuals, but an immune system difference, is that correct?

PW: That’s right. One of the surprises in the GWAS analysis, both done in the DILIN and the SAE consortium, was that for some drugs the major susceptibility factor that we could identify was within what’s called the HLA region of the genome. So this is involved in the immune response—what’s called the acquired immune response, suggesting that these severe events may in some cases be due to the body’s immune system attacking the liver.

In addition to that, we found suggestions for other types of genes including drug metabolism and transporter genes. But the associations so far have been weak and we’re actively working to improve and expand these findings.

This article also appeared in the September-October 2010 issue of Bio-IT World.

I-Study: Genomic Interpretation - Who Will Pay?
During this webinar, members of the study review team present preliminary findings of the I-Study, conducted at the Harvard Medical School's 2011 Personalized Medicine Conference.