Sunday, February 25, 2018

#AI and health diagnostics. "Reproducibility," anyone?

From Ars Technica:

AI trained to spot heart disease risks using retina scan
The blood vessels in the eye reflect the state of the whole circulatory system.


The idea behind using a neural network for image recognition is that you don't have to tell it what to look for in an image. You don't even need to care about what it looks for. With enough training, the neural network should be able to pick out details that allow it to make accurate identifications.

For things like figuring out whether there's a cat in an image, neural networks don't provide much, if any, advantages over the actual neurons in our visual system. But where they can potentially shine are cases where we don't know what to look for. There are cases where images may provide subtle information that a human doesn't understand how to read, but a neural network could pick up on with the appropriate training.

Now, researchers have done just that, getting a deep-learning algorithm to identify risks of heart disease using an image of a patient's retina.

The idea isn't quite as nuts as it might sound. The retina has a rich collection of blood vessels, and it's possible to detect issues in those that also affect the circulatory system as a whole; things like high levels of cholesterol or elevated blood pressure leave a mark on the eye. So, a research team consisting of people at Google and Verily Life Sciences decided to see just how well a deep-learning network could do at figuring those out from retinal images.

To train the network, they used a total of nearly 300,000 patient images tagged with information relevant to heart disease like age, smoking status, blood pressure, and BMI. Once trained, the system was set loose on another 13,000 images to see how it did.
Simply by looking at the retinal images, the algorithm was typically able to get within 3.5 years of a patient's actual age. It also did well at estimating the patient's blood pressure and body mass index. Given those successes, the team then trained a similar network to use the images to estimate the risk of a major cardiac problem within the next five years. It ended up having similar performance to a calculation that used many of the factors mentioned above to estimate cardiac risk—but the algorithm did it all from an image, rather than some tests and a detailed questionnaire.

The neat thing about this work is that the algorithm was set up so it could report back what it was focusing on in order to make its diagnoses. For things like age, smoking status, and blood pressure, the software focused on features of the blood vessels. Training it to predict gender ended up causing it to focus on specific features scattered throughout the eye, while body mass index ended up without any obvious focus, suggesting there are signals of BMI spread throughout the retina…
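For the curious, the basic recipe is sketchable in a few dozen lines: train a convolutional network to regress risk factors straight from retinal images, then ask it which pixels mattered. Below is a minimal toy sketch in PyTorch, assuming dummy data and an invented tiny network (this is emphatically not the Google/Verily model, and I've substituted a simple gradient saliency map for the paper's soft-attention technique):

```python
# Toy sketch only: a tiny CNN regressing cardiac risk factors (age, systolic BP,
# BMI) from images, plus a gradient saliency map. Dummy random data stands in
# for the ~300,000 labeled retinal photographs used in the actual study.
import torch
import torch.nn as nn

class RetinaRegressor(nn.Module):
    def __init__(self, n_targets: int = 3):       # age, systolic BP, BMI
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_targets)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

images  = torch.rand(256, 3, 64, 64)               # stand-in "retinal images"
targets = torch.rand(256, 3)                       # stand-in risk-factor labels

model   = RetinaRegressor()
optim   = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                              # mean absolute error, e.g. "within 3.5 years" for age

for epoch in range(5):                             # a real model trains far longer
    optim.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optim.step()

# Gradient saliency: which pixels most influence the "age" output?
test_img = torch.rand(1, 3, 64, 64, requires_grad=True)
model(test_img)[0, 0].backward()                   # output index 0 = age in this sketch
saliency = test_img.grad.abs().max(dim=1).values   # (1, 64, 64) heat map over the image
print("training MAE on dummy data:", loss.item())
print("saliency map shape:", tuple(saliency.shape))
```
The study's actual attention technique is more sophisticated than a raw gradient map, but the idea that a network can be made to show where it is looking is the same.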
OK, I'm all for reliable, accurate tech dx assistance. But, from my latest (paywalled) issue of Science Magazine:

Last year, computer scientists at the University of Montreal (U of M) in Canada were eager to show off a new speech recognition algorithm, and they wanted to compare it to a benchmark, an algorithm from a well-known scientist. The only problem: The benchmark's source code wasn't published. The researchers had to recreate it from the published description. But they couldn't get their version to match the benchmark's claimed performance, says Nan Rosemary Ke, a Ph.D. student in the U of M lab. “We tried for 2 months and we couldn't get anywhere close.”

The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade. AI researchers have found it difficult to reproduce many key results, and that is leading to a new conscientiousness about research methods and publication protocols. “I think people outside the field might assume that because we have code, reproducibility is kind of guaranteed,” says Nicolas Rougier, a computational neuroscientist at France's National Institute for Research in Computer Science and Automation in Bordeaux. “Far from it.” Last week, at a meeting of the Association for the Advancement of Artificial Intelligence (AAAI) in New Orleans, Louisiana, reproducibility was on the agenda, with some teams diagnosing the problem—and one laying out tools to mitigate it.


The most basic problem is that researchers often don't share their source code. At the AAAI meeting, Odd Erik Gundersen, a computer scientist at the Norwegian University of Science and Technology in Trondheim, reported the results of a survey of 400 algorithms presented in papers at two top AI conferences in the past few years. He found that only 6% of the presenters shared the algorithm's code. Only a third shared the data they tested their algorithms on, and just half shared “pseudocode”—a limited summary of an algorithm. (In many cases, code is also absent from AI papers published in journals, including Science and Nature.)


Researchers say there are many reasons for the missing details: The code might be a work in progress, owned by a company, or held tightly by a researcher eager to stay ahead of the competition. It might be dependent on other code, itself unpublished. Or it might be that the code is simply lost, on a crashed disk or stolen laptop—what Rougier calls the “my dog ate my program” problem.


Assuming you can get and run the original code, it still might not do what you expect. In the area of AI called machine learning, in which computers derive expertise from experience, the training data for an algorithm can influence its performance. Ke suspects that not knowing the training for the speech-recognition benchmark was what tripped up her group. “There's randomness from one run to another,” she says. You can get “really, really lucky and have one run with a really good number,” she adds. “That's usually what people report.”…
Issues of "proprietary code," "intellectual property," etc? Morever, there's the additional problem, cited in the Science article, that AI applications, by virtue of their "learning" functions, are not strictly "algorithmic." There's a "random walk" aspect, no? Moreover, accuracy of AI results assumes accuracy of the training data. Otherwise, the AI software learns our mistakes.

Years ago, when I was Chair of the ASQ Las Vegas Section, we once had a presentation on the "software life cycle QA" of military fighter jets' avionics at nearby Nellis AFB. That stuff was tightly algorithmic, and was managed with an obsessive beginning-to-end focus on accuracy and reliability.

Reproducibility.

Update: of relevance, from Science Based Medicine: 
Replication is the cornerstone of quality control in science, and so failure to replicate studies is definitely a concern. How big a problem is replication, and what can and should be done about it?

As a technical point, there is a difference between the terms “replication” and “reproduction” although I often see the terms used interchangeably (and I probably have myself). Results are said to be reproducible if you analyse the same data again and get the same results. Results are replicable when you repeat the study to obtain fresh data and get the same results.

There are also different kinds of replication. An exact replication, as the name implies, is an effort to exactly repeat the original study in every detail. But scientists acknowledge that “exact” replications are always approximate. There are always going to be slight differences in the materials used and the methodology...
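A toy way to keep the two terms straight (my example, not theirs): reproduction is re-running the same analysis on the same data and getting the identical number; replication is drawing fresh data and getting a consistent one.

```python
# Reproduce vs. replicate, in miniature. The "analysis" here is just a mean;
# a real pipeline would be longer, but the distinction is the same.
import numpy as np

def analysis(data: np.ndarray) -> float:
    return float(np.mean(data))

population_mean, n = 5.0, 200
original = np.random.default_rng(seed=1).normal(population_mean, 2.0, n)

# Reproduction: same data, same code, identical result.
assert analysis(original) == analysis(original)

# Replication: fresh data from the same population, a similar but not identical result.
fresh = np.random.default_rng(seed=2).normal(population_mean, 2.0, n)
print(f"original estimate   : {analysis(original):.3f}")
print(f"replication estimate: {analysis(fresh):.3f}")
```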
From the lab methodology chapter of my 1998 grad school thesis:
The terms “accuracy” and “precision” are not synonyms. The former refers to closeness of agreement with agreed-upon reference standards, while the latter has to do with the extent of variability in repeated measurements. One can be quite precise, and quite precisely wrong. Precision, in a sense, is a necessary but insufficient prerequisite for the demonstration of “accuracy.” Do you hit the “bull’s eye” red center of the target all the time, or are your shots scattered all over? Are they tightly clustered lower left (high precision, poor accuracy), or widely scattered lower left (poor precision, poor accuracy)? In an analytical laboratory, the “accuracy” of production results cannot be directly determined; it is necessarily inferred from the results of quality control (“QC”) data. If the lab does not keep ongoing, meticulous (and expensive) QC records of the performance histories of all instruments and operators, determination of accuracy and precision is not possible….

A “spike” is a sample containing a “known” concentration of an analyte derived from an “NIST-traceable” reference source of established and optimal purity (NIST is the National Institute of Standards and Technology, official source of all U.S. measurement reference standards). A “matrix blank” is an actual sample specimen “known” to not contain any target analytes. Such quality control samples should be run through the lab production process “blind,” i.e., posing as normal client specimens. Blind testing is the preferred method of quality control assessment, simple in principle but difficult to administer in practice, as lab managers and technicians are usually adept at sniffing out inadequately concealed blinds, which subsequently receive special scrutiny. This is particularly true at certification or contract award time; staffs are typically put on “red alert” when Performance Evaluation samples are certain to arrive in advance of license approvals or contract competitions. Such costly vigilance may be difficult to maintain once the license is on the wall and the contracts signed and filed away…
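The target-shooting analogy translates directly into numbers. A small simulated illustration (the reference value, error magnitudes, and "readings" below are all invented for the example; NIST traceability is a property of the physical standard, not of anything in code):

```python
# Toy illustration of the accuracy/precision distinction and of spike recovery,
# using simulated instrument readings against a "known" reference value.
import numpy as np

rng = np.random.default_rng(seed=42)
reference = 10.00          # pretend NIST-traceable spike concentration, e.g. mg/L

def summarize(name: str, readings: np.ndarray) -> None:
    bias = readings.mean() - reference               # accuracy: closeness to the reference
    spread = readings.std(ddof=1)                    # precision: repeatability of the "shots"
    recovery = 100.0 * readings.mean() / reference   # percent recovery on the spike
    print(f"{name:30s} bias={bias:+.2f}  sd={spread:.2f}  recovery={recovery:.1f}%")

# Precisely wrong: a tight cluster well off the bull's-eye.
summarize("high precision, poor accuracy", rng.normal(8.0, 0.05, 20))
# Scattered and off-target.
summarize("poor precision, poor accuracy", rng.normal(8.0, 1.50, 20))
# What an ongoing QC program is supposed to demonstrate.
summarize("high precision, high accuracy", rng.normal(10.0, 0.05, 20))
```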
___

#AI developers, take note. Particularly in the health care space. If someone doesn't get their pizza delivery because of AI errors, that's trivial. Miss an exigent clinical dx, and that's another matter entirely.

Related Science Mag article (same issue, Feb. 15th, 2018): "Missing data hinder replication of artificial intelligence studies."

Also tangentially apropos, my November post "Artificial Intelligence and Ethics." And, "Digitech AI news updates."

UPDATE

Also of relevance. A nice long read:


The Coming Software Apocalypse
A small group of programmers wants to change how we code — before catastrophe strikes.


…It’s been said that software is “eating the world.” More and more, critical systems that were once controlled mechanically, or by people, are coming to depend on code. This was perhaps never clearer than in the summer of 2015, when on a single day, United Airlines grounded its fleet because of a problem with its departure-management system; trading was suspended on the New York Stock Exchange after an upgrade; the front page of The Wall Street Journal’s website crashed; and Seattle’s 911 system went down again, this time because a different router failed. The simultaneous failure of so many software systems smelled at first of a coordinated cyberattack. Almost more frightening was the realization, late in the day, that it was just a coincidence.

“When we had electromechanical systems, we used to be able to test them exhaustively,” says Nancy Leveson, a professor of aeronautics and astronautics at the Massachusetts Institute of Technology who has been studying software safety for 35 years. She became known for her report on the Therac-25, a radiation-therapy machine that killed six patients because of a software error. “We used to be able to think through all the things it could do, all the states it could get into.” The electromechanical interlockings that controlled train movements at railroad crossings, for instance, only had so many configurations; a few sheets of paper could describe the whole system, and you could run physical trains against each configuration to see how it would behave. Once you’d built and tested it, you knew exactly what you were dealing with.

Software is different. Just by editing the text in a file somewhere, the same hunk of silicon can become an autopilot or an inventory-control system. This flexibility is software’s miracle, and its curse. Because it can be changed cheaply, software is constantly changed; and because it’s unmoored from anything physical — a program that is a thousand times more complex than another takes up the same actual space — it tends to grow without bound. “The problem,” Leveson wrote in a book, “is that we are attempting to build systems that are beyond our ability to intellectually manage.”…
Read all of it.
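Leveson's "beyond our ability to intellectually manage" point can be made with a few lines of arithmetic. A toy sketch (my model, not hers): a relay-based interlocking with a handful of two-state components has a state space you can enumerate and test exhaustively, while every variable added to a software system multiplies the count until exhaustive testing stops being a plan.

```python
# State-space growth in miniature: n two-state components yield 2**n
# configurations. Small electromechanical systems can be enumerated and tested
# exhaustively; software adds state far faster than testers can keep up.
from itertools import product

def count_states(n_components: int) -> int:
    # Explicitly enumerate every combination of on/off settings.
    return sum(1 for _ in product((0, 1), repeat=n_components))

for n in (4, 10, 20):
    print(f"{n:2d} two-state components -> {count_states(n):,} configurations")

# Past a few dozen components, enumeration is hopeless:
print(f"64 two-state components -> {2**64:,} configurations")
```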
__

OPEN SOURCE TO THE RESCUE?

OpenAI's mission is to build safe AGI, and ensure AGI's benefits are as widely and evenly distributed as possible. We expect AI technologies to be hugely impactful in the short term, but their impact will be outstripped by that of the first AGIs.

We're a non-profit research company. Our full-time staff of 60 researchers and engineers is dedicated to working towards our mission regardless of the opportunities for selfish gain which arise along the way...
Lots of ongoing "Open AI" news here. They're on Twitter here.

UPDATE

From Wired:
Why Artificial Intelligence Researchers Should Be More Paranoid
LIFE HAS GOTTEN more convenient since 2012, when breakthroughs in machine learning triggered the ongoing frenzy of investment in artificial intelligence. Speech recognition works most of the time, for example, and you can unlock the new iPhone with your face.

People with the skills to build such systems have reaped great benefits—they’ve become the most prized of tech workers. But a new report on the downsides of progress in AI warns they need to pay more attention to the heavy moral burdens created by their work.

The 99-page document unspools an unpleasant and sometimes lurid laundry list of malicious uses of artificial-intelligence technology. It calls for urgent and active discussion of how AI technology could be misused. Example scenarios given include cleaning robots being repurposed to assassinate politicians, or criminals launching automated and highly personalized phishing campaigns.

One proposed defense against such scenarios: AI researchers becoming more paranoid, and less open. The report says people and companies working on AI need to think about building safeguards against criminals or attackers into their technology—and even to withhold certain ideas or tools from public release…
We all need to closely read both the article and the 99-page report, starting with its Executive Summary.


I assume that many of you watched the 2018 Winter Olympics. The opening and closing ceremonies, featuring dynamic choreographed drone light shows, were beautiful, amazing.


Now, imagine a huge hostile swarm of small drones, each armed with explosives, target-enabled with the GPS coordinates of the White House and/or Capitol Hill, AI-assisted, remotely "launched" by controllers halfway around the world.

From a distance they might well resemble a large flock of birds. They wouldn't all have to get through.

'eh?

_____________

More to come...
