KHIT Blog

Sunday, November 11, 2012

"Big Data"? How about REALLY Big Data? "Personalized" to EACH of us?

Will "Meaningful Use" get us to The Promised Land of Affordable Personalized Medicine? (I've previously blogged on the topic here.) Will even Stage Three ONC Certified HIT be up to the effective real-time analytical data delivery task?

Imagine my dubiety.

Which is by no means to argue that we can't get there. Just that the breakthroughs are likely to come from other directions as yet off the Beltway radar.

I've had an interesting few days of reflection on the path forward: my own, and, more generally, the likely paths of improvement in medical science. First, this arrived in my newsfeed.

Freely distributed, absent the customary commercial firewall. I printed a copy and grabbed my yellow highlighter. A great read.
A common hypothesis is that advances in human genomics will reduce disparities by identifying genetic causes of disparities. In support of this hypothesis, racial and ethnic differences in genetic variant frequency have been demonstrated for many diseases. However, translating this evidence into reductions in disparities has proven challenging for several reasons. First, many variants identified have a small attributable risk and explain little of the disease burden in any group, either because of a weak association between variant and disease or because the variant is rare in the population. Second, far more genetic variation occurs within racial or ethnic groups than between groups, and disease-associated variation has no apparent predilection for the 4% to 8% of variation that can be linked to race or ethnicity. Thus, if genomic variation explains a minority of most diseases and is unlikely to be linked to a racial or ethnic group, it becomes unlikely that genomic variation between groups will be a substantial cause of disparities in most common diseases. Third, developing interventions based on this information is challenging. Although prenatal or even premarital genetic screening can reduce the burden of severe diseases if screening influences reproductive decision making, lack of acceptance of these approaches has limited their effectiveness.
It gets better straight away.
Another pathway by which genomics may reduce racial disparities that has received considerably less attention is its effect on clinical uncertainty and statistical discrimination. The need to make decisions under conditions of uncertainty is one of the hallmarks of medicine. This uncertainty arises on 2 levels. For many decisions, there is no credible and consistent evidence about risks and benefits of different interventions. Moreover, even when evidence exists, uncertainty arises about the effect of that evidence on the individual patient. The gap between the average effect in a population and the effect in a specific patient can be substantial, in part because of differences between patients in practice and trial participants and in part because the average effect in a trial masks substantial variation among trial participants.
Under conditions of uncertainty, 2 situations may lead to racial disparities in care. First, clinical decisions become dependent on heuristics, stereotypes, and biases. Although heuristics, or decision shortcuts, can lead to cognitive errors, the real risk of disparities arises from stereotypes and bias. Stereotypes assign characteristics to an individual based on assumptions about group affiliation. Minority stereotypes in the United States may have negative connotations, including beliefs that minorities are less adherent with treatment, less interested in numerical data, or less willing to travel for care.[1] When these negative stereotypes influence decisions, disparities arise. For example, a study of diabetic treatment found that disparities by race and ethnicity were explained in part by differences in clinician beliefs about patient self-management abilities and family competence.
Indeed. I have no choice but to reflexively channel Lawrence and Lincoln Weed's "Medicine in Denial" (cited here as well). Dr. Armstrong continues:
Second, even in the absence of bias and stereotypes, clinical uncertainty can lead to disparities in health care through a phenomenon termed statistical discrimination. Although one form of statistical discrimination arises from assigning an individual the characteristic of the group, another form arises from greater uncertainty about one group than another. In health care, poor communication between physicians and minority patients may lead to greater uncertainty about the probability that a minority patient has a certain diagnosis or will respond to a certain treatment. In this setting, physicians are less able to “match” treatment to a patient's specific situation, and the patient is less likely to receive appropriate treatment. If the treatment is risky or has a limited benefit, clinicians become less certain that a minority patient meets the treatment threshold and are less likely to recommend treatment.
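One way to make the statistical-discrimination argument concrete is a textbook normal-normal Bayesian model. This is my own sketch, not Dr. Armstrong's: assume true benefit follows the same normal prior in both groups, the clinician observes a noisy signal of each patient's benefit, and treats when the posterior expected benefit clears a threshold set above the group average (a risky or marginal treatment). Noisier signals, standing in for poorer communication, pull every estimate toward the group mean, so fewer patients clear the bar even though the groups are identical. All parameter values are invented for illustration.

```python
from statistics import NormalDist

def share_treated(mu: float, tau: float, sigma: float, threshold: float) -> float:
    """Fraction of patients whose posterior expected benefit clears the threshold.

    mu, tau: mean and SD of true benefit in the patient's group (the prior).
    sigma: SD of the noise in what the clinician observes (assumed stand-in
           for communication quality).
    """
    w = tau**2 / (tau**2 + sigma**2)   # weight the posterior puts on the observed signal
    sd_post_mean = tau * w**0.5        # spread of posterior means across patients
    return 1 - NormalDist(mu, sd_post_mean).cdf(threshold)

# Same true distribution of benefit in both groups; only the signal noise differs.
MU, TAU, THRESHOLD = 0.30, 0.10, 0.40
clear = share_treated(MU, TAU, sigma=0.05, threshold=THRESHOLD)
noisy = share_treated(MU, TAU, sigma=0.20, threshold=THRESHOLD)
print(f"clear signal: {clear:.1%} treated; noisy signal: {noisy:.1%} treated")
```

With the threshold above the group mean, the noisier group's estimates bunch near the mean and rarely cross it; if the threshold sat below the mean, the same shrinkage would push treatment rates the other way.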
Money shot:
Reducing clinical uncertainty is an important focus for efforts to reduce disparities. For the first level of uncertainty, this effort requires gathering evidence about clinical effectiveness and translating that evidence into population guidelines. The recent reductions in racial disparities in influenza vaccination and cervical cancer screening have coincided with the widespread acceptance of population-based guidelines for these low-risk interventions. However, for most decisions, information is also needed to address the second level of uncertainty, translating evidence of “average” effectiveness to the individual patient. It is this level of uncertainty for which genomics may have the greatest effects on disparities...

Over the last decade, the relationship between genomics and disparities has become a national research endeavor. Although genetic variation among racial and ethnic groups has been widely demonstrated, the most effective approach for harnessing genomics to address racial disparities may come from focusing outside the race question. Advances in genomics offer the ability to improve clinical decision making, particularly in settings where uncertainty is high and statistical discrimination, including the use of stereotype and bias, is likely to occur.
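Armstrong's "second level" of uncertainty, translating average effectiveness to the individual patient, can be illustrated with a toy calculation. In the sketch below (hypothetical subgroups and invented absolute risk reductions, ARR), the trial-wide average looks favorable even though nearly a third of participants are harmed; the average alone cannot tell the clinician which kind of patient is in the exam room.

```python
# Hypothetical trial population split into two subgroups with different
# true effects. ARR = absolute risk reduction (positive = benefit, negative = harm).
SUBGROUPS = {
    "responders": {"share": 0.70, "arr": 0.10},
    "non_responders": {"share": 0.30, "arr": -0.05},
}

def average_arr(subgroups: dict) -> float:
    """Trial-average ARR: each subgroup's effect weighted by its population share."""
    return sum(g["share"] * g["arr"] for g in subgroups.values())

def share_harmed(subgroups: dict) -> float:
    """Fraction of the population whose subgroup effect is negative."""
    return sum(g["share"] for g in subgroups.values() if g["arr"] < 0)

print(f"average ARR: {average_arr(SUBGROUPS):+.3f}")
print(f"share of patients harmed: {share_harmed(SUBGROUPS):.0%}")
```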
 OK, fine, all very cool. Close on the heels of that came yesterday's FDL Book Salon:

FDL Book Salon Welcomes Sheldon Krimsky and Tania Simoncelli, Genetic Justice: DNA Data Banks, Criminal Investigations, and Civil Liberties
...[The book] covers the obvious and not-so-obvious privacy implications of DNA databanks, and the danger that racial disparities in the criminal justice system will be amplified through these databanks. It also explodes the myth of DNA infallibility, and explores just how effective these databanks are at detecting and deterring crime. Finally, it ends with a series of basic principles, which, if followed as a matter of both law and policy, would go a long way toward the responsible use of DNA in law enforcement.

Genetic Justice provides an accessible, yet exhaustive, review of this vital public policy issue. Many of us fail to appreciate that every time we discard a coffee cup, use a napkin, eat with a fork and spoon or otherwise interact with our environment, we leave a piece of ourselves behind. And that piece of ourselves—that DNA—can be used not just to discern our identity, but to provide clues on whether we’re likely to develop a particular disease, what we look like and where we come from. The physical trail of DNA can also be used to track our movements, and legal theories that permit the authorities to freely collect this “abandoned” DNA could theoretically make the warrant requirement and other checks on law enforcement abuse obsolete...
I bought the Kindle edition immediately, and have much to review. I will be reaching out to these authors shortly after I've had time to digest this work.

While "Genetic Justice" focuses principally on the criminal and sociopolitical concerns raised by broadly deployed genetic assays, it doesn't take much imagination to conjure up the health policy implications (notably employment, insurance, and credit discrimination).

Given my history, I have never been one to take the accuracy and precision of commercial DNA assays (nor of their "forensic" superiors) as a given. Moreover, I have serious doubts about the expertise of the average physician where therapeutic genetic interpretation and decision-making are concerned. One hopes that the current crop of med school students will be accorded more curriculum time with the subject, but I don't take that as a given either.

The concept of individual uniqueness
The dilemma faced by practitioners and researchers is that known patterns are rough generalizations about large populations, and as such are usually an imperfect fit with unique individuals. Every individual is a unique combination of myriad similarities to and differences from other individuals. What constitutes a similarity or difference depends on the particular diagnostic or therapeutic context. The similarities mean that different individuals can be medically classified together in the same category— a trait or set of traits in common with other individuals. The differences mean that various individuals classified in the same category are nevertheless different from each other in various respects that may provide different keys to solving the medical problem they seem to have in common.
The similarities and differences arise initially from each individual’s unique genetic heritage and unique developmental history. Each individual is a recombination of pre-existing biological elements, which are built into an enormously complex set of interconnected structures and interacting processes. The recombination of elements is not static but continuously evolving, subject to both internal and external forces. An important internal force is the human body’s extraordinary capacity for self-regulation (known as homeostasis) and self repair. As a result, the normal physiology of healthy persons become increasingly differentiated over time.

This complexity increases by orders of magnitude when normal physiology is disrupted by pathophysiologic processes, psychological processes, the physical environment, the social environment and medical interventions. Some aspects, such as newly evolved pathogens or unidentified disease processes, may be unknown to medical science. Thus a person’s total medical condition can be regarded as a single, aggregate, new disease entity, described by Tolstoy as that person’s “own peculiar, personal, novel, complicated disease, unknown to medicine.”...

In short, each person’s illness will be a unique course of events, never precisely reproduced in any other person. Chronic illness in particular becomes highly personalized in this way. Consequently, when different individuals are labeled with the same illness, their medical condition and therapeutic needs may in fact differ radically. Diagnosis and treatment of each person’s illness must take into account the myriad resemblances to and differences from many other persons’ experiences of the “same” illness. Doing so far exceeds the capacity of the human mind. [Medicine in Denial, pp 181 - 182]
Hence the need for accurate, just-in-time HIT: HIT orders of magnitude more robust than what is required to calculate BMI, send patient reminders, or export encrypted HL7 "clinical push messaging" to another EP.


Apropos: you won't find this on the ONC website or the HITRC:

New Method Uses Unstructured EMR Text and Genetic Data to Link Diseases, Cluster Patients
A Danish research team has published a method that integrates information mined from free text in electronic patient records with protein and genetic information in order to uncover patterns of disease co-occurrence and help with patient stratification.

The team used text-mining techniques to extract clinically relevant terms from hospital staff notes in patient records and then mapped them to disease codes in the World Health Organization's International Classification of Disease Ontology.

Team leader Søren Brunak, a professor of bioinformatics and disease systems biology at the Technical University of Denmark and the University of Copenhagen, respectively, said in a statement that when he and his colleagues applied their approach to electronic patient records at a local hospital, they were able to identify ten times more medical terms that characterized each patient than were manually entered by the hospital staff.

This additional detail is important, he added, because the terms used by healthcare providers in medical records "are heavily biased by local practice and billing purposes," which limits the ability to choose personalized treatment options.

Brunak said that the project began with his team's interest in characterizing phenotypes. "We actually started by looking at clinical disease descriptions from OMIM ... but there is no individuality to it ... there is nothing that can be used to classify individuals," he explained to BioInform. "We were interested in a more fine-grained characterization of patients and [that’s] why we turned to the patient record because there you have completely individualized information."

With the additional information, the researchers not only achieved the "fine-grained clinical characterization of each patient" they hoped for, but they were also able to find links between diseases and genes, and to stratify patients based on similar profiles...
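To give a feel for the kind of pipeline described, here is a deliberately tiny sketch. It is not the Danish team's method; it assumes a hand-made synonym dictionary mapping free-text phrases to ICD-10 categories, a crude negation check, pairwise code co-occurrence counts, and nearest-neighbor matching of patients by Jaccard overlap of their code sets. The dictionary, notes, and function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical synonym dictionary: free-text phrase -> ICD-10 category.
TERM_TO_ICD10 = {
    "type 2 diabetes": "E11",
    "t2dm": "E11",
    "hypertension": "I10",
    "high blood pressure": "I10",
    "chronic kidney disease": "N18",
    "asthma": "J45",
    "hay fever": "J30",
}
# Crude negation cues; real clinical NLP needs far more than this.
NEGATION_CUES = ("no ", "denies ", "without ")

def extract_codes(note: str) -> set[str]:
    """Map un-negated dictionary terms in a free-text note to ICD-10 categories."""
    codes = set()
    for sentence in re.split(r"[.;\n]", note.lower()):
        for term, code in TERM_TO_ICD10.items():
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                preceding = sentence[: match.start()]
                if not any(cue in preceding for cue in NEGATION_CUES):
                    codes.add(code)
    return codes

def cooccurrence(profiles: dict[str, set[str]]) -> Counter:
    """Count how often each pair of codes appears in the same patient."""
    counts = Counter()
    for codes in profiles.values():
        counts.update(combinations(sorted(codes), 2))
    return counts

def jaccard(a: set[str], b: set[str]) -> float:
    """Profile overlap: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(pid: str, profiles: dict[str, set[str]]) -> str:
    """Return the other patient whose code profile overlaps most with pid's."""
    others = [q for q in profiles if q != pid]
    return max(others, key=lambda q: jaccard(profiles[pid], profiles[q]))

# Invented notes for four hypothetical patients.
notes = {
    "p1": "T2DM, high blood pressure. Chronic kidney disease stage 3.",
    "p2": "Hypertension; type 2 diabetes on metformin.",
    "p3": "Asthma since childhood. Hay fever in spring.",
    "p4": "Hay fever. Asthma, well controlled. Denies hypertension.",
}
profiles = {pid: extract_codes(text) for pid, text in notes.items()}
print(profiles)
print(cooccurrence(profiles).most_common(2))
print({pid: most_similar(pid, profiles) for pid in profiles})
```

Even this toy shows why free text adds detail that billing codes miss, and why negation ("Denies hypertension") must be handled before any stratification is trusted.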
Indeed. But if you think your ONC Certified EHR is nasty now, while you're trying to hit a dad-gummed $81 99213 in 20 minutes or less, look here:

Genetic data screen shot. "Cognitive load," anyone?
See also
Genomic Data Resources: Challenges and Promises
By: Warren C. Lathe III (OpenHelix), Jennifer M. Williams (OpenHelix), Mary E. Mangan (OpenHelix) & Donna Karolchik (University of California, Santa Cruz Genome Bioinformatics Group) © 2008 Nature Education

Standardized Genome Database Tools: GMOD
Because of historical, biological, and practical reasons, data are not completely consistent between species genomes and research projects. Model organism databases often have unique schemas that are not easily comparable to those used in databases for other species. Indeed, the terminology, analysis techniques, and importance attached to different sequence elements and annotations can be quite different across databases. The genome browsers mentioned above are one solution to this problem. However, another option has emerged to provide deeper and broader data for individual species' genomes, as well as increased standardization that allows for better cross-species comparisons and greater ease of use.

In particular, the consortium has worked on the development of an open-source standard database and set of visualization tools to make querying, browsing, and using genome databases similar for all species. GMOD has collaboratively developed a set of tools and database schema that include an annotation editor (Apollo), a genome browser (GBrowse), pathway tools, an advanced search capability (BioMart), a biological database schema (Chado), and additional resources that allow species research communities to develop databases that are standard and compatible across genomes. The goal of these efforts is to facilitate research and comparative studies.

Many species- and taxa-specific genome databases have made use of this standard, open-source set of database tools. The RGD, TAIR, Gramene, FlyBase, MGI, SGD, and WormBase databases are just a few. Sometimes, these tools are the main foundation of a database; in other cases, they supplement existing databases. Organisms with smaller research communities can also use these handy tools to create the annotation, visualization, and query options they need. As the rate of genome sequencing continues to increase thanks to new technologies, the GMOD tools may prove to be a boon for researchers who need to better explore their sequences of interest. Note that these tools can support many data types. In fact, there are many diverse and creative examples of ways in which the GMOD tools can be used, such as the (HGSDD), human variation data, and even personal genomics in the form of Watson's genome (Cheung & Estivill, 2003; International HapMap Consortium, 2003; Wheeler et al., 2008a). You can find a large and varied list of resources that use these tools by accessing the GMOD website.

Subject-Specific Databases
In addition to species- or genome-oriented databases, there are also databases organized by almost any biological data category one can imagine. For example, there are databases specifically for protein domain information (Pfam) and protein structure information (PDB). There are also repositories and databases of expression data, such as NCBI's Gene Expression Omnibus (GEO) and EBI's ArrayExpress (Berman & Westbrook, 2000; Barrett et al., 2006; Parkinson et al., 2007). In addition, GWAS databases such as dbGaP and HuGE Navigator are emerging (Yu et al., 2008). The list of subject-specific databases is quite large—as mentioned earlier, there are over 3,000 such resources—and the variety of these "focused" databases is as unlimited in scope as the data they contain. Nonetheless, this large number of species- and subject-specific databases, though extremely useful, can lead to its own issues of redundancy and lack of integration.

Solutions to the Current Challenges of Accuracy and Curation

All of the aforementioned resources, from the repositories to the genome databases and subject-specific databases, are increasingly faced with the challenge of ensuring accurate data and efficiently managing and curating that data. Recently, several solutions have been proposed (Waldrop, 2008; Howe et al., 2008). These solutions range from a greater focus on the education of database biocurators in learning institutions and the standardized inclusion of sequence data and references in publications to "community curation." One community curation solution envisions a sort of "wikification" of data update and curation, in which research communities curate their databases themselves. This has been proposed for repositories, specifically GenBank, as well as for focused resources, such as model organism databases (Pennisi, 2008; Salzberg, 2007).

GenBank has resisted this "wikification" proposal for various reasons, feeling that the current system allows an authoritative repository and a database of record and that community editing might diminish this strength. Additionally, there are already programs and efforts aimed at correcting and curating the sequence data, such as RefSeq (mentioned earlier in this article). As for model organism databases, there are currently several relatively successful efforts at community curation and annotation, including the Daphnia Genomics Consortium wiki and several other extensive undertakings. However, these efforts are hampered by factors such as the reliability of curation, the lack of incentives for researchers to contribute, and more. As the authors of a recent paper in Nature suggest, "To date, not much of the research community is rolling up its sleeves to annotate" (Howe et al., 2008).

Discussion and Future Challenges
Various efforts at building archives, databases, and analysis tools have proven successful at facilitating a better understanding of the genomes of multiple species. They have offered researchers authoritative repositories, contextual information, and curated data as a method of handling the exponentially growing amount of sequence data. Although these resources have been useful and have solved many issues, they will continue to face new types and ever-growing amounts of data that will exacerbate the challenges with which the research community is already faced. For example, genome-wide association studies will generate an enormous amount of data that will provide insight into the multifactorial genetic origins of disease, evolution, and more.

These data have also created new challenges related to the development of methods for visualizing and searching information. Recently, unforeseen privacy issues have required that large datasets be removed from public databases (Couzin, 2008; Zerhouni & Nabel, 2008) because personal genetic information could be associated to individuals. Metagenome sequencing projects, which analyze communities of genomes instead of individual genomes, are also creating large, complicated data sets that require unique tools and databases (Markowitz et al., 2008a; National Research Council Committee on Metagenomics, 2007).

Lastly, but importantly, the growing number of genome databases, analysis tools, and other resources available on the web has made it daunting for researchers to use these resources effectively. Even with efforts toward standardization and documentation, researchers continue to find it difficult to locate and learn to use these resources (Collins & Green, 2003). Solutions involving advanced, life-long training on the use and access of specific resources must be found. Next-generation sequencing and personal genomics will further burden efforts in this arena.

In spite of the challenges that have arisen with the growth of data and databases, the rewards and opportunities provided by this information have proven fruitful. Today, there is a wealth of data that was undreamed of just a couple of decades ago, enabling new discoveries and uncovering new relationships between different disciplines. The authors often joke to their students that if these resources had been available when they were in graduate school just 15 years ago, it would have taken them months—not years—to complete their degrees. With the growth of available data and resources in the next few years, amazing discoveries will continue to be made if the scientific community can meet the challenge...


Understanding the human brain is one of the greatest challenges facing 21st century science. If we can rise to the challenge, we can gain fundamental insights into what it means to be human, develop new treatments for brain diseases and build revolutionary new Information and Communications Technologies (ICT). In this report, we argue that the convergence between ICT and biology has reached a point at which it can turn this dream into reality. It was this realisation that motivated the authors to launch the Human Brain Project – Preparatory Study (HBP-PS) – a one-year EU-funded Coordinating Action in which nearly three hundred experts in neuroscience, medicine and computing came together to develop a new “ICT-accelerated” vision for brain research and its applications. Here, we present the conclusions of our work.

We find that the major obstacle that hinders our understanding of the brain is the fragmentation of brain research and the data it produces. Our most urgent need is thus a concerted international effort that can integrate this data in a unified picture of the brain as a single multi-level system. To reach this goal, we propose to build on and transform emerging ICT technologies.
In neuroscience, neuroinformatics and brain simulation can collect and integrate our experimental data, identifying and filling gaps in our knowledge, prioritizing and enormously increasing the value we can extract from future experiments.

In medicine, medical informatics can identify biological signatures of brain disease, allowing diagnosis at an early stage, before the disease has done irreversible damage, and enabling personalised treatment, adapted to the needs of individual patients. Better diagnosis, combined with disease and drug simulation, can accelerate the discovery of new treatments, speeding up and drastically lowering the cost of drug discovery.

In computing, new techniques of interactive supercomputing, driven by the needs of brain simulation, can impact a vast range of industries, while devices and systems, modelled after the brain, can overcome fundamental limits on the energy-efficiency, reliability and programmability of current technologies, clearing the road for systems with brain-like intelligence.

From the soul-crushing Socialist Dystopia of western Europe (pdf).
Personalised medicine
Disease progression and responses to treatment vary enormously among individuals. The HBP would collect clinical data from many different individuals, from different ethnic groups, subject to different environmental and epigenetic influences. This would make it possible to compare data for individual patients. In cancer, similar methods have been used to predict the effectiveness of specific treatments in individual patients [206], contributing to the development of personalised medicine. The discovery of reliable biological signatures for psychiatric and neurological disorders would make it possible to adopt similar strategies for the treatment of these disorders, improving the effectiveness of treatment. As with other aspects of the HBP’s contribution to medicine, even small improvements would have a very large impact. [pg 88]

Fascinating, all of it. Yeah, some of it could be "creepy." But overall, it's pretty exciting.


Here I go again, Exceeding My Scope. But hey, this is, after all, the core point of digital HIT, is it not?

MORE ON "DATA," BIG AND SMALL (and "The Haze of Bayes")

“Moneyball,” the 2012 election, and science- and evidence-based medicine
Published by David Gorski under Clinical Trials, Politics and Regulation, Science and Medicine, Science and the Media

...[D]octors are not baseball managers or ideologically-driven political pundits. Or, at least, so we would like to think. However, we are subject to the same sorts of biases as anyone else, and, unfortunately, many of us put more stock in our impressions than we do in data. Overcoming that tendency is the key challenge physicians face in embracing EBM, much less SBM. It doesn’t help that many of us are a bit too enamored of our own ability to analyze observations. As I’ve pointed out time and time again, personal clinical experience, no matter how much it might be touted by misguided physicians like, for example, Dr. Jay Gordon, who thinks that his own personal observations that lead him to believe that vaccines cause autism trump the weight of multiple epidemiological studies that do not. The same sort of dynamic occurs when it comes to “alternative” medicine (or “complementary and alternative medicine” or “integrative medicine” or whatever CAM proponents like to call it these days). At the individual level, placebo effects, regression to the mean, confirmation bias, observation bias, confusing correlation with causation, and a number of other factors can easily mislead one...

...[O]ne of the biggest impediments to data-driven approaches to almost anything, be it baseball, politics, or medicine, is the perception that such approaches take away the “human touch” or “human judgment.” The problem, of course, is that human judgment is often not that reliable, given how we are so prone to cognitive quirks that lead us astray. However, as Phillips et al point out, data-driven approaches need not be in conflict with recognizing the importance of contextualized judgment. After all, data-driven approaches depend on the assumptions behind the models, and we’ll never be able to take judgment out of developing the assumptions that shape them. What the “moneyball” revolution has shown us, at least in baseball and politics, is that the opinions of experts can no longer be viewed as sacrosanct, given how often they conflict with evidence. The same is likely to be true in medicine.
From the referenced NEJM article:
The true relevance of moneyball to medicine, however, lies not just in the quantification of performance but in the appreciation of value. Numerical records have been kept for both baseball and medicine for well over a century; what has changed recently are the methods of finding the diamonds in the rough, of discovering true (and truly underappreciated) value. This innovative use of numbers to discover and invest in hidden value links both fields to the tradition of value-based investing pioneered by Benjamin Graham and David Dodd in the 1930s and subsequently popularized by Warren Buffett. It's no accident that the first teams to employ statisticians in baseball were among the poorest: you don't need to crunch the numbers when you can afford to pay top dollar for proven stars. Conversely, in health care, we have been spending as if we had the budget of the Yankees — while all signs suggest we'll soon be operating more like the Athletics. Collaborations among leaders in health services research, management sciences, and health care organizations have yielded new models for putting the value framework to work in medicine — as has already happened in baseball. And yet, cost-effectiveness modeling will always depend on the data and assumptions that are built into the models.

The recent deployment of the accountable care organization model in health care delivery represents an important test of moneyball medicine in practice. If such organizations can demonstrate the delivery of high-value care at lower costs, that would indeed hold promise for a moneyball revolution in medicine...
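The NEJM authors' caveat, that cost-effectiveness modeling depends on the data and assumptions built into it, shows up even in the simplest such calculation, the incremental cost-effectiveness ratio (ICER): the extra cost of a new intervention divided by its extra health effect, commonly expressed per quality-adjusted life-year (QALY). A minimal sketch with invented numbers: holding costs fixed and changing only the assumed QALY gain moves the ratio fivefold.

```python
def icer(cost_new: float, cost_old: float, effect_new: float, effect_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect (e.g. QALY)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("No incremental effect; ICER is undefined.")
    return (cost_new - cost_old) / delta_effect

# Hypothetical inputs: identical costs, two different assumed QALY gains.
for assumed_gain in (0.50, 0.10):
    ratio = icer(cost_new=30_000, cost_old=10_000,
                 effect_new=5.0 + assumed_gain, effect_old=5.0)
    print(f"assumed QALY gain {assumed_gain:.2f}: ${ratio:,.0f} per QALY")
```

Whether the new intervention looks like a bargain or a luxury hinges entirely on an effect estimate that someone had to assume or measure.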

(large pdf)


We shall see, across the next few years and decades...
