
Friday, March 18, 2016

0.946 µg/L, my latest favorite "small data" result

We'll discuss some aspects of knowledge and "Big Data," but first,

"0.946 µg/L."

Well, that was a relief to learn. My latest PSA result post-Calypso radiation tx. I finished up radiation treatment on November 10th. When I began tx, my PSA was "10.74 µg/L." (See my "Shards" posts.) A month after finishing I had my first follow-up blood draw, and the result came back "3.65 µg/L," down considerably, yet probably a bit premature for drawing any calming conclusions. The IMRT literature tells us to expect an ongoing PSA decline for two years post-tx. The RadOnco NP who called me to discuss this result affirmed that, saying she would expect my next assay in June to show a further decline.

Relieved. But I joked with her that "I don't know that the '0.04' and '0.006' parts of that number tell me anything." She agreed.

Why not just round it to "0.9"? That is what we call "significant figures rounding." In the lab in Oak Ridge where I worked a quarter century ago as a programmer and QC analyst, you had to demonstrate both to regulators and your clients your bench-level empirical capability of discriminating between, say, "0.005" and "0.007" or you could not report the "0.006." Similarly for the "0.04" fraction of the datum (including that of my last 2015 pre-tx result of "10.74 µg/L").
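As a rough illustration of that rounding, here is a minimal Python sketch. The helper name `round_sig` is my own invention for this example, not any lab system's function, and it simply keeps a chosen number of significant figures.

```python
import math

def round_sig(value, sig_figs):
    """Round value to the given number of significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit: -1 for 0.946, 1 for 10.74
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)

# My latest result at one and two significant figures
print(round_sig(0.946, 1))
print(round_sig(0.946, 2))
# The pre-tx result needs three significant figures to keep one decimal place
print(round_sig(10.74, 3))
```

Note that "one decimal place" and "one significant figure" coincide for 0.946 but not for 10.74, which is why the last call asks for three figures.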

Round 'em up, pardner.

I looked around quickly for some PSA assay tech literature.
"The long-term precision (CV) for the PSA test in our laboratory, determined with commercial quality-control sera at mean values of 2.7, 0.97, and 0.81 µg/L, was 4.7% (n = 396), 8.3% (n = 259), and 8.2% (n = 487), respectively. These data suggest that the long-term precision of the PSA test in our laboratory would meet the fixed-limit criteria based on the biological variation data. A review of the data from a recent College of American Pathologists’ proficiency-testing survey indicates that most laboratories would achieve acceptable performance if the biological-variation-based fixed limits were used. In that survey, with 322 laboratories reporting, a CV of 9% at a mean value of 2.28 µg/L was achieved. A calculated fixed-limit range based on a CV of 5% is 2.04-2.48 µg/L (2 SD) and 1.92-2.60 µg/L (3 SD). This compares well with the peer 3 SD range of 1.68-2.88 µg/L. Finally, a generally accepted empirical criterion for assessing clinically significant changes of serial tumor-marker values is a change of 25%. The CV calculated from the 25% change at a mean value of 4.0 µg/L is 12.5%, which agrees with the 10.0% CV biological variation at that decision point."
OK, quickly...

"Precision" differs from "accuracy." The former goes to replicability/reproducibility whereas the latter goes to "hitting the bull's eye, dead-center of the target." Should your "shots" (e.g., lab sample runs) cluster tightly around dead-center, you can claim both "accuracy" and "precision," which are obviously what you strive for.

But, one can be quite precise, and quite precisely wrong. The shooter depicted above was biased high.

The "CV" is the "Coefficient of Variation," also conventionally known in the lab as the "RSD" -- "Relative Standard Deviation." The RSD is simply the ratio of the Standard Deviation (aka "SD" or "Sigma") to the Mean: an expression of relative variability around an average value. (Note: the SD is the "Root Mean Squared" deviation from the mean; in plain English, the "expected average variation around an average.") If, for example, you have a mean of 100 and a SD of 10, your "CV" is 10%. Ten percent expected variability in repeated testing (1 SD, that is).
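For the arithmetically inclined, a minimal sketch of the CV calculation in Python. The three "replicate" values are made up to reproduce the mean-of-100, SD-of-10 example. I am assuming the sample SD (n-1 in the denominator), which is what `statistics.stdev` computes; a lab's QC software may define it differently.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV (aka RSD) as a percentage: 100 * SD / mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, an assumption of this sketch
    return 100.0 * sd / mean

# Hypothetical replicate results with mean 100 and sample SD 10
replicates = [90, 100, 110]
print(f"CV = {coefficient_of_variation(replicates):.1f}%")
```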

In a (mythical) perfectly "normally distributed" set of results (smooth bell curve), "99.73%" of the data will lie within ± 3 Sigma of the mean, tapering off symmetrically on either side.
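That "99.73%" figure can be checked directly. For an ideal normal distribution, the fraction of results within ± k Sigma of the mean is erf(k/√2). A small sketch (the function name `normal_coverage` is mine):

```python
import math

def normal_coverage(k):
    """Fraction of an ideal normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} Sigma: {100 * normal_coverage(k):.2f}%")
```

Real lab data rarely behave this politely, so treat these percentages as the textbook ideal rather than a promise.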

A few points relating to the foregoing. [1] RSD typically increases as the concentration of the target analyte decreases. [2] "Outliers" (extreme ± values) inflate the RSD, and [3] all of it assumes distributional "normality" (the Bell Curve thing). [4] In the lab, there are no "true/known" values where patient specimens are concerned. Any single lab result is a point-estimate of an average concentration. Split my blood draw into, say, nine equal volume "aliquots" and send them in blinded triplicates to three different labs, and you will likely get nine differing reported results. Perhaps the variations will be trivial. Perhaps not. The "most accurate" result will not be directly knowable.

The best you can do is "spike" your samples with a certified NIST "standard" analyte of "known" concentration and infer that your production runs agree closely in relative terms to those of your "matrix spikes" (blood, urine, etc. are "matrices"). Finally, as Theranos has learned the hard way, [5] analyte precision and accuracy are to a significant degree a function of sample size -- e.g., my results are reported in "micrograms per liter (µg/L)." They didn't draw a liter of my blood. They are extrapolating up. Extrapolating up from a single drop of blood (Theranos methodology) would be even dicier.

In our lab, the guiding phrase was "you get what you inspect, not what you expect" (I had some revelatory fun at my own expense 20 years ago on that point). Lab QC remains both difficult and expensive. We patients and our clinicians, however, typically just take our reported results at face value. We assume them to be "actionable."

Notwithstanding a world wherein people now increasingly cite their academic GPAs to four decimal places, I would be perfectly fine with a reported PSA result of "0.9 µg/L." Useful to keep in mind that "0.9 µg/L" is nine-tenths of a millionth (µ="micro") of a gram in a liter of serum, a very small quantity in its own right. Gilding the lily with smaller decimal fractions simply because computers make it easy does little beyond adding to evaluative noise (what does one ten-thousandth of an ordinal scale GPA metric tell me? Nothing). In the overall context of my post-treatment prognosis given the trend of my PSA assays, my oncologist will come to the same assessment conclusion in the event he sees either "0.9," "0.95," or "0.946" in the Labcorp report.
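To put a rough number on that, here is a back-of-the-envelope sketch. It borrows the 8.3% long-term CV that the quoted paper reported at a mean of 0.97 µg/L and applies it to my result. That is purely an assumption for illustration: I have no idea what the actual CV is for the assay Labcorp ran on my specimen.

```python
def one_sd_band(value, cv_fraction):
    """Return (low, high) bounds one SD either side of value, given a CV as a fraction."""
    sd = value * cv_fraction
    return value - sd, value + sd

reported_psa = 0.946  # µg/L, my reported result
assumed_cv = 0.083    # 8.3%, borrowed from the quoted paper (an assumption)

low, high = one_sd_band(reported_psa, assumed_cv)
print(f"1 SD band: {low:.3f} to {high:.3f} µg/L")

# All three candidate ways of reporting the result fall inside that band
for candidate in (0.9, 0.95, 0.946):
    print(candidate, low <= candidate <= high)
```

Under that assumed CV, the expected run-to-run wobble is already in the second decimal place, so the third decimal carries no usable information.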

Another document I found (University of Washington Medical Center Department of Laboratory Medicine Immunology Division).

    Results are reported to the nearest tenth (0.1). The lowest reportable PSA result is 0.1 ng/mL. The assay does not have a maximum reportable limit since off-line dilutions can be made to bring the concentration within the working range of the assay. Estimates of imprecision can be generated from long-term quality control pool results.


A Jill Lepore New Yorker article I read yesterday caused me to jump a new book to the head of my endless reading queue.
The era of the fact is coming to an end: the place once held by “facts” is being taken over by “data.”
Imagine a society where smartphones are miniaturized and hooked directly into a person’s brain. With a single mental command, those who have this technology— let’s call it neuromedia— can access information on any subject. Want to know the capital of Bulgaria or the average flight velocity of a swallow? It’s right there. Users of neuromedia can take pictures with a literal blink of the eye, do complex calculations instantly, and access, by thought alone, the contact information for anyone they’ve ever met. If you are part of this society, there is no need to remember the name of the person you were introduced to last night at the dinner party; a sub-cellular computing device does it for you. 

For the people of this society, it is as if the world is in their heads. It is a connected world, one where knowledge can be instantly shared with everyone in an extremely intimate way. From the inside, accessing the collective wisdom of the ages is as simple as accessing one’s own memory. Knowledge is not only easy; everyone knows so much more. 

Of course, as some fusspots might point out, not all the information neuromedia allows its users to mentally access is really “knowledge.” Moreover, they might claim, technological windows are two-way. A device that gives you a world of information also gives the world huge amounts of information about you, and that might seem like a threat to privacy. Others might fret about fragmentation— that neuromedia encourages people to share more information with those who already share their worldview, but less with those who don’t. They would worry that this would make us less autonomous, more dependent on our particular hive-mind— less human. 

But we can imagine that many in the society see these potential drawbacks as a price worth paying for immediate and unlimited access to so much information. New kinds of art and experiences are available, and people can communicate and share their selves in ways never before possible. The users of neuromedia are not only free from the burden of memorization, they are free from having to fumble with their smartphone, since thoughts can be uploaded to the cloud or shared at will. With neuromedia, you have the answer to almost any question immediately without effort— and even if your answers aren’t always right, they are right most of the time. Activities that require successful coordination between many people— bridge building, medicine, scientific inquiry, wars— are all made easier by such pooled shared “knowledge.” You can download your full medical history to a doctor in an emergency room by allowing her access to your own internal files. And of course, some people will become immensely wealthy providing and upgrading the neural transplants that make neuromedia possible. All in all, we can imagine, many people see neuromedia as a net gain. 

Now imagine that an environmental disaster strikes our invented society after several generations have enjoyed the fruits of neuromedia. The electronic communication grid that allows neuromedia to function is destroyed. Suddenly no one can access the shared cloud of information by thought alone. Perhaps backup systems preserved the information and knowledge that people had accumulated, and they can still access that information in other ways: personal computers, even books can be dusted off. But for the inhabitants of the society, losing neuromedia is an immensely unsettling experience; it’s like a normally sighted person going blind. They have lost a way of accessing information on which they’ve come to rely. And that, while terrible, also reveals a certain truth. Just as overreliance on one sense can weaken the others, so overdependence on neuromedia might atrophy the ability to access information in other ways, ways that are less easy and require more creative effort. 

While neuromedia is currently still in the realm of science fiction, it may not be as far off as you think. The migration of technology into our bodies— the cyborging of the human— is no longer just fantasy. And it shouldn’t surprise anyone that the possibilities are not lost on companies such as Google: “When you think about something and don’t really know much about it, you will automatically get information,” Google CEO Larry Page is quoted as saying in Steven Levy’s recent book In the Plex. “Eventually you’ll have an implant, where if you think about a fact, it will just tell you the answer.” But as Larry Page’s remark suggests, the deeper question is about information and knowledge itself. How is information technology affecting what we know and how we know it?...

My hypothesis is that information technology, while expanding our ability to know in one way, is actually impeding our ability to know in other, more complex ways; ways that require 1) taking responsibility for our own beliefs and 2) working creatively to grasp and reason how information fits together. Put differently, information technologies, for all their amazing uses, are obscuring a simple yet crucial fact: greater knowledge doesn’t always bring with it greater understanding...

The Internet of Things is made possible by— and is also producing— big data. The term “big data” has no fixed definition, but rather three connected uses. First, it names the ever-expanding volume of data that surrounds us. You’ve heard some of the statistics. As long ago as 2009, there were already 260 million page views per month on Facebook; in 2012, there were 2.7 billion likes per day. An estimated 130 million blogs exist; there are around 500 million tweets per day; and billions of video views on YouTube. By some estimates, the amount of data in the world in 2013 was already something around 1,200 exabytes; now it is in the zetabytes. That’s hard to get your mind around. As Viktor Mayer-Schönberger and Kenneth Cukier estimate in their recent book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, if you placed that much information on CD-ROMs (remember them?) it would stretch to the moon five times. It would be like giving every single person on the earth 320 times as much information as was stored in the ancient library of Alexandria.

And by the time you are reading this, the numbers will be even bigger. So, one use of the term “big data” refers to the massive amount of data making up our digital form of life. In a second sense, it can be used to talk about the analytic techniques used to extract useful information from that data. Over the last several decades, our analytic methods for information extraction have increased in sophistication along with the increasing size of the data sets we have to work with. And these techniques have been put to a mind-boggling assortment of uses, from Wall Street to science of all sorts. A simple example is the data “exhaust” you are leaving as you read these very words on your Kindle or iPad. How much of this book you read, the digital notes you take on it, is commercially available information, extracted from the trail of data you leave behind as you access it in the cloud...

Lynch, Michael P. (2016-03-21). The Internet of Us: Knowing More and Understanding Less in the Age of Big Data (pp. 3-9). Liveright. Kindle Edition.
Interesting read thus far. Stay tuned. Yeah, "0.946 µg/L" is now part of my "data exhaust."
"You can download your full medical history to a doctor in an emergency room by allowing her access to your own internal files."
LOL. Talk about "Interoperability!" Merle Buskin?

I'll have to triangulate this new book with earlier stuff I've read and written about regarding "AI/IA" and other issues pertaining to "structured data," as well as Jo Marchant's new book "Cure."


I finished the Lynch book (all the way through the end notes). It is excellent. Will have much more to say about it, things relevant to the health care and health IT space. Among other things, will have to triangulate it with thoughts in other earlier posts of mine. See, e.g.,
Ms. Lepore, in The New Yorker:
Then came the Internet. The era of the fact is coming to an end: the place once held by “facts” is being taken over by “data.” This is making for more epistemological mayhem, not least because the collection and weighing of facts require investigation, discernment, and judgment, while the collection and analysis of data are outsourced to machines. “Most knowing now is Google-knowing—knowledge acquired online,” Lynch writes in “The Internet of Us” (his title is a riff on the ballyhooed and bewildering “Internet of Things”). We now only rarely discover facts, Lynch observes; instead, we download them. Of course, we also upload them: with each click and keystroke, we hack off tiny bits of ourselves and glom them on to a data Leviathan...
Yeah. Pretty interesting, all of it. Again, I will have much more to say. For one thing, Dr. Lynch's chapter on "privacy" reminds me of some of my own various posts on the topic, outside of the health care space. See, e.g., my post "Clapp Trap" and its links to antecedent reflections of mine.

Also, as it goes to discussions of science and "truth," two of my long-favorite reads come to mind and are recommended.

Apropos of the book on the right, see my August 2015 post "On Donald Trump®"

From "On Truth":
We are all aware that our society perennially sustains enormous infusions—some deliberate, some merely incidental—of bullshit, lies, and other forms of misrepresentation and deceit. It is apparent, however, that this burden has somehow failed—at least, so far—to cripple our civilization. Some people may perhaps take this complacently to show that truth is not so important after all, and that there is no particularly strong reason for us to care much about it. In my opinion, that would be a deplorable mistake...
...even those who profess to deny the validity or the objective reality of the true-false distinction continue to maintain without apparent embarrassment that this denial is a position that they do truly endorse. The statement that they reject the distinction between true and false is, they insist, an unqualifiedly true statement about their beliefs, not a false one. This prima facie incoherence in the articulation of their doctrine makes it uncertain precisely how to construe what it is that they propose to deny. It is also enough to make us wonder just how seriously we need to take their claim that there is no objectively meaningful or worthwhile distinction to be made between what is true and what is false [pp. 6-9].

Jerome Carter, MD
Clinical Concepts vs. Chart Concepts – How Do They Differ?
I was goaded into mulling over this question by the same concerns that led to my writing, Is the Electronic Health Record Defunct? I am worried about being trapped in a paper-oriented thinking mode when designing clinical apps. That trap is easy to fall into because patient information is essential to patient care. All clinical work requires access to patient information, so it is easy to jump straight to thinking about what information to store and how to store it. Once snared, taking the next step of borrowing design ideas from paper charts is hard to resist. How does one separate clinical care concepts from chart concepts when creating software???

Let’s start with medication and problem lists. What concepts do these lists embody or convey? Understanding why I ask this question requires a bit of a philosophical diversion. A word, sentence, or phrase has a meaning (semantics). For example, the word aspirin has a meaning—it is a non-steroidal pain medicine and anti-inflammatory agent. It also may denote something specific (i.e., an enteric coated tablet in my hand). However, it is possible for a word or phrase to have semantic import (meaning) and yet denote nothing. Bertrand Russell illustrated this point with his storied example: “The present king of France…” Semantically speaking, anyone who understands English knows what this phrase means. The problem, of course, is that there is no king of France. Semantically, the sentence is fine; however, it denotes nothing. Now, back to clinical vs. charts concepts.

When a clinician uses a nomenclature such as RxNorm to add a drug to a patient’s current medication list, we can be sure that the semantic properties of the term are intact. However, if the patient never receives, refuses, or stops taking the drug, that RxNorm term now listed in the medication list no longer denotes a real fact about the patient. Here, we have a clinical care concept – the clinician deciding/intending to give the drug – rendered in the patient’s medication list; however, the term does not correspond to the patient’s actual situation. A drug being in a patient’s medication list has a meaning, but not necessarily any denotation. Now, here is my question to all of my philosophically-inclined readers: When building clinical care systems, should we design systems to avoid semantic/denotational mismatches? Stated another way, should we record the intention to give a medication in a different way than the way we record that a patient is definitely taking the medication?
Semantic/denotational mismatches are patient safety issues because when they occur, they convey false information about the patient. Medication reconciliation helps to address this with medication issues, but what about problems/diagnoses?

A problem/diagnosis listed in the patient record implies that the patient has that problem. By extension, the absence of a problem implies it isn’t present in the patient.  Every incorrect entry in a patient’s problem list is a semantic/denotational mismatch. What havoc might result for decision support users? How can a system’s design reconcile the clinician’s belief/intent with the actual state of the patient?...
Yeah, and it's perhaps worth citing again that which I posted on HIMSS16 Day One:

"OK, one core thing I will be looking for this week, apropos of my recent review of Dr. Nortin Hadler's book, are affirmative responses to this Hadler cite:

"If there is a role for computers in decision making, it is to facilitate dialogue between patient and physician, not to supplant the input from either or to cut health-care costs."

Now, obviously, the physician and her patient are not the only "stakeholders" in the health care space, but if all of our cherubic talk of "patient-centered care" is more than just Suit talk, then I'm looking for affirmative evidence of the foregoing. The salient question: "how does what you do/sell 'facilitate physician-patient dialogue'?"


More Lynch:
The End of Theory?
In 2008, Chris Anderson, then editor of Wired, wrote a controversial and widely cited editorial called “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Anderson claimed that what we are now calling big data analytics was overthrowing traditional ways of doing science:

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves ... Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
Traditional scientific theorizing aims at model construction. Collecting data is just a first step; to do good science, you must explain the data by constructing a model of how and why the phenomenon in question occurred as it did. Anderson’s point was that the traditional view assumes that the data is always limited. That, he says, is the assumption big data is overthrowing... [Lynch, op cit, pp. 156-157].
"Who cares?" Seriously, dude? Anderson was simply wrong on this point, IMO. Thinking that you can do sound science absent outset null hypotheses ("models") is naive. Give me any set of "big data" and I will come back with myriad "significant correlations" that tell us nothing efficaciously actionable (particularly in the clinical space).

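The "myriad significant correlations" point is easy to demonstrate with pure noise. The sketch below is a toy simulation, not anyone's actual analysis: it generates 40 unrelated random variables with 30 observations each and counts how many pairs clear the usual two-tailed p < 0.05 bar. The cutoff of |r| > 0.361 is the approximate critical value for n = 30, and the variable counts and seed are arbitrary choices of mine.

```python
import itertools
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_spurious(n_vars=40, n_obs=30, r_cutoff=0.361, seed=2016):
    """Count pairs of independent noise variables whose |r| exceeds the cutoff."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(columns, 2))
    hits = sum(1 for a, b in pairs if abs(pearson_r(a, b)) > r_cutoff)
    return hits, len(pairs)

hits, total = count_spurious()
print(f"{hits} of {total} pure-noise pairs look 'significant'")
```

With 780 pairs and a 5% false-positive rate, one should expect a few dozen "findings" in data that by construction contain nothing at all. Scale that up to petabytes and the numbers do not speak for themselves.
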
Moreover, what about "data quality?" See my Nov 2013 post "(404)^n, the upshot of dirty data."
In many commercial analytic pursuits, you can be mostly wrong yet remain handsomely profitable.

The stakes in areas such as health care are quite another matter.

More to come...
