
Monday, July 9, 2012

Analytics - SAS, R, SQL, EHR database schema. An old school data miner's ramble involving the intersections of workflow, audit logging, and CER, etc.

Above, a tiny close-up snip from the original "RDBMS schema" (pdf) of "OpenEMR," an open-source, ONC MU certified complete-system ambulatory EHR. When health care IT and clinical people give you that "yeah, but we're different" beg-off, it's by no means baseless.

The full schema depicts a data dictionary layout of 54 linked tables and about 20 more reference tables. I haven't yet tallied up the total number of variables. Many hundreds, minimally.

Much more to come on all of this. I've been too busy playin' this weekend. A musical reflection on the SCOTUS ruling on the PPACA. (There's a tangential connection here, which I will get to.)


OpenEMR pretty much rocks! Browser agnostic, too. It's PHP scripting embedded in HTML pages, pulling in JavaScript and CSS (Cascading Style Sheets) as well. End user data are saved to and retrieved from SQL tables.

The data dictionary now comprises 118 relational tables.

Because it's open source, you can modify/extend everything and anything. Below, selected data table edit "masks" (you can modify/extend those as well).

I would change some of those "open text" data items to "forced choice format."
I crashed around in the demo for a while, and then dumped an audit log. Exported (.xls) snippet below.

Chronological user sort excerpt, from the time I logged in (that's our HealthInsight Las Vegas IP address at the top). Eleven "events" across 01:43 (about 10 seconds apiece on average). A quick bit of Excel "lag variable" function code off that date/time variable, and I have essentially an EHR information "workflow" record (which I could then illustrate in one way or another). The "comment" column / "field" is the one with the variable-length PHI "action(s) taken" piece (per §170.210.b). It will require some study (for granularity, among other things) and extractive analytic coding.
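For anyone who would rather not do the "lag variable" step in Excel, here is a minimal Python sketch of the same idea: sort the events by time and subtract each timestamp from the one after it. The rows, event names, and the `add_lag_seconds` helper are all invented for illustration; they are not taken from the actual OpenEMR export, and a real log would also need grouping by user.

```python
from datetime import datetime

# Hypothetical audit-log rows: (timestamp, user, event).
# Values are invented for illustration, not copied from the OpenEMR export.
AUDIT_ROWS = [
    ("2012-07-09 10:15:00", "demo", "login"),
    ("2012-07-09 10:15:12", "demo", "view chart"),
    ("2012-07-09 10:15:20", "demo", "edit chart"),
    ("2012-07-09 10:15:45", "demo", "logout"),
]


def add_lag_seconds(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Sort rows by time and attach the seconds elapsed since the prior event."""
    parsed = sorted(
        (datetime.strptime(ts, fmt), user, event) for ts, user, event in rows
    )
    out = []
    previous = None
    for stamp, user, event in parsed:
        # The first event has no predecessor, so its lag is left as None.
        lag = None if previous is None else (stamp - previous).total_seconds()
        out.append({"time": stamp, "user": user, "event": event, "lag_seconds": lag})
        previous = stamp
    return out


workflow = add_lag_seconds(AUDIT_ROWS)
lags = [row["lag_seconds"] for row in workflow if row["lag_seconds"] is not None]
mean_lag = sum(lags) / len(lags)
```

Each row of `workflow` now carries the elapsed time since the previous event, which is the raw material for a time-based workflow picture.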


I just signed up for this. I've been a CS3 user for some time now, so I qualified for the $29.95 a month plan: ~a buck a day (for year one; there's always a catch, but I'll cross that bridge when I come to it). Simply upgrading to the core CS6 Suite from my old CS3 release would have been about $600. There are 17 total apps in this "cloud" suite, and I'm installing them this evening. (You get to install the various component executables locally. I assume you then have to be logged in to run them. We shall see. No biggie.)

Ought to be interesting.


Hmmm... for the other side of HIT data, eh? Touted as providing SAS / SPSS / Stata level stats computing power.

I may have to dust off my old software coder hat. Open source as well. (No way can I afford a SAS seat, not even JMP).

"R" is an offshoot of the Bell Labs "S programming language." I used "S Plus" while working in credit risk management, mainly for "CART" analytics, a.k.a. "Binary Recursive Partitioning": a cool, non-parametric (gotta love it) way of parsing the "goods" from the "bads" in terms of lending outcomes for subsequent scorecard modeling.
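To make "binary recursive partitioning" a bit more concrete, here is a minimal Python sketch of a single partitioning step: try every cut point on one numeric predictor and keep the one that best separates "goods" from "bads," scored here by Gini impurity. This is a toy under stated assumptions, not the S Plus or R implementation; the data, the debt-to-income predictor, and the function names are invented. A full tree would repeat the same search within each resulting half.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = "good", 1 = "bad")."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_bad = sum(labels) / n
    return 2.0 * p_bad * (1.0 - p_bad)


def best_split(values, labels):
    """Return (threshold, weighted Gini) for the best cut on one predictor."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # Start from "no split": threshold None, impurity of the whole sample.
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point exists between identical values
        threshold = (lo + hi) / 2.0
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented toy data: debt-to-income ratio and loan outcome (1 = "bad").
DTI = [0.10, 0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.60]
OUTCOME = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_split(DTI, OUTCOME)
```

No distributional assumptions are needed anywhere in the search, which is the "non-parametric" appeal.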

Here's what I'm seeing, conceptually.

(a 15 minute Keynote slide export)
I am all over this. So much new stuff to learn and apply.


My latest weekend recreational reading just arrived via UPS.

We shall see. All looks very interesting. As touted on one blog:
Data Mining with R - The Rattle Package

R is one of the most exciting free data mining software projects of these last years. Its popularity is absolutely justified (see Kdnuggets Polls - Data Mining/ Analytic Tools Used - 2011). Among the reasons which explain this success, we distinguish two very interesting characteristics: (1) we can extend almost indefinitely the features of the tool with the packages; (2) we have a programming language which allows to perform easily sequences of complex operations...
A couple of Rattle interface shots below:

Apropos of health data analytics, courtesy of Lion Data Systems:

All very encouraging. Time to Get My Geek On.



FDA Spied On Scientist Emails
By Amir Khan | July 15, 2012 9:58 PM EDT
The U.S. Food and Drug Administration operated a so-called "enemy list" of disgruntled scientists and spied on their emails using keylogging software, according to a report by the New York Times. The operation began as an investigation into the possibility of leaked confidential emails, but grew into a surveillance program into critics of the FDA.

The agency used software intended for employers to monitor workers to capture screen images, keystrokes, emails and documents line by line on the scientists' government laptops. The FDA admitted to the New York Times to monitoring five scientists, but said it was only to ensure that no information was improperly used.

The FDA did not immediately return a request for comment.

The product used, sold by the company SpectorSoft, cost as little as $99.95 for individual use, according to the Times. On the website, the company advertises that employers can follow all of their employees' moves online.

"Monitor everything they do," the website says. "Catch them red-handed by receiving instant alerts when keywords or phrases are typed or are contained in an e-mail, chat, instant message or Web site."...
Hey: Keywords "EHR usability and safety oversight," lol...

OK apropos of my diversionary GarageBand fun last week, in the context of "workflow."

OK, nominally "obtuse" though I may seem, I will elaborate shortly and tie off the foregoing threads of thought.

Hint: "time-based information workflows" - they're right there for the culling and investigation for process improvement, right within the EHR audit logs.

More to come...

