OpenMRS rocks Google Summer of Code 2011

With the following worthy projects:

Very impressive lineup. If this were all OpenMRS did this year, it would be an impressive list, but these are just the student projects sponsored by Google.


Hacking data: showing patterns in kids’ health

Here is my submission for the Local Children’s Data Health 2.0 developer challenge. The challenge was to make data available through come alive.

Generally, the red circles correspond to the percentage of child allergy sufferers who had -seen- a doctor, but had no specific plan to address their condition. The red tags are healthcare providers from the NPI database that are listed as experts in kids allergies… the top of the field for asthma treatment. We are using these “super experts” as a proxy for the availability of specialist care for allergies generally. Notice the under-served areas… The specialists are clustering in the high-population areas. Hopefully this map will inspire an expert to move to Eureka, or Santa Maria.

Here was my process for this hack:

  • I would only use Open Source software or Open APIs. The idea here is to show just how powerful FOSS tools can be in health data analysis.
  • I have just created the best API to the National Provider Identifier database at, so I have this rich datasource that previously has not been available as an API.
  • I wanted to target something from that was directly related to the availability of healthcare, something that you can measure geographically using the API.
  • I chose Asthma, because this is something that clearly responds to treatment.
  • I wanted to document my process to show how easy this kind of analysis is with the right tools.

Ok here’s what I did…

  1. First, I browsed for asthma information. That leads you straight to this analysis of asthma hospitalizations for young children over the last few years.
  2. Then I started digging for source data. It looks like the California Health Interview Survey was a substantial source of the data.
  3. They offer Public Use Files of the original survey data. I signed in, and the terms of use for the data were reasonable, and not contrary to my purposes or Open Source. So I signed up and went to download the data.
  4. Sadly, the data was only available in three proprietary data formats: Stata, SPSS and SAS. This was obviously designed for academics who think using proprietary software is ethical and normal. Thankfully there are other options. The R project is where I usually turn first for stats help, but I actually found that there was an Open Source SPSS alternative called PSPP. Using PSPP I was able to open the SPSS data file. Victory for Open Source! It would be nice if organizations like CHIS would release in simple XML or CSV, which is much friendlier to hackers and people who believe in software freedom.
  5. My feeling of elation was short lived. The data had no geo-coded information. Which makes sense, that would make re-identification much easier. There had to be another way to get geo-coded data.
  6. And there was. AskCHIS is a powerful data reporting tool that allowed for xls data download. Again, I am amazed that CHIS would choose to run with a proprietary format without an open alternative. They used a lot of advanced xls layout options that meant that an export to CSV would never work. An API would be even better, but at least CSV would allow me to actually parse a file instead of cutting and pasting, which is what I ended up doing.
  7. But I had access to lots of data. I could see several different measures of asthma that I could have used in my mashup. This included lots of stuff like missed school days, emergency room visits, diagnosis of asthma, symptoms in the last twelve months… etc etc. If CHIS had given this data up using an API, I would have been able to merge the various asthma measures into an overall asthma status score… but it would have taken a week of cutting and pasting to do that manually.
  8. So I had to choose one data point and run with it. I chose “Health professional ever provided asthma management plan“. This was asked to parents whose kids already had a doctor who was “treating” the asthma. I thought this was an interesting question because it seemed to correlate strongly with doctor-availability, something that I had good geo-coded data on.
  9. Now what provider data should I compare it to? Using I can easily grab a list of all/most of the doctors in California who specialize in treating allergies in children. I decided to use that as a proxy for “available allergy specialists”. Of course, I had a serious advantage here, because I had already done the work of changing the NPI database into something I could access using an API (that is the idea behind This easily saved me 30 hours of work on this project alone.
  10. So now I have the data I want… but what now? I had addresses for the doctors and clinics from the NPI database, but the asthma data was coded by county. No problem, I just needed to geocode the counties into longitude and latitude. If I had a rich data source from CHIS, it would have been worth writing a script to do this, but since I was using cut-and-paste data, with about 75 rows, it was much simpler to just manually geocode everything. Which is what I did. More cut-and-paste.
  11. But now I have geo-coded data for both data sources.
  12. I needed a method to graphically display geo-coded scoring. This is pretty easy to do using proprietary GIS tools, even costless tools like Google Earth. But I wanted to keep things simple and Open Source at the same time. Enter the EInsert extension to Google Maps API v2. This allowed me to overlay png circle graphics on a Google Map, and size them in accordance with their percentage (bigger is worse, it means more of the kids did not have asthma plans).
  13. Then something tickled my brain. Using circles to represent scaled data is problematic. There is solid research indicating that humans have trouble estimating the area of circles in relation to each other… So I used the ratio suggested by James Flannery to counter this effect. Now the circles are sized in a way that indicates their relative meanings in a somewhat more appropriate way.
  14. Now I had a Google Map that displayed data regarding the frequency of plans as meaningfully sized circles over the state of California. This data shows some predictable effects. First, the worst areas are either very urban or very rural. Exactly the places that have trouble attracting medical talent. That means that on this map, Eureka and Los Angeles urban counties have similarly sized circles.
  15. Now all I needed to do was overlay the doctor data on this map. This turned out to be pretty simple. I already have a link to provide a Google Map display of any small search on For instance, here is the link for the map for the search on allergists in California. All I needed to do was copy the html and javascript for the doctor map and integrate the map with the Asthma data map I had already made.
  16. So far, that map looks pretty good. However, there is no easy way to tell which county, specifically, a given circle represents. I decided that the simplest way to address this was to dynamically rewrite the PNG using the GD library in PHP. I would pass the PHP script a label, and it would generate a circle with a label on it. This would allow me to label all of the circles on the map. As usual, Stack Overflow provided a quick and dirty solution. (update 4-20) I realized that the label should show both the name of the county, and the percentage without a plan… now it does.
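The circle sizing from step 13 can be sketched in a few lines. This is only an illustration, not the code behind the map: the exponent 0.5716 is the value commonly attributed to Flannery's perceptual scaling and is an assumption here (the post does not say which ratio was used), and the function name and the 40 pixel maximum are made up for the example.

```python
# Exponent commonly attributed to Flannery's perceptual scaling.
# Assumption: the post does not state the exact ratio it used.
FLANNERY_EXPONENT = 0.5716


def circle_radius(value, max_value, max_radius_px, perceptual=True):
    """Return a circle radius in pixels for `value` relative to `max_value`.

    With perceptual=False the circle AREA is proportional to the value
    (radius grows with the square root). With perceptual=True the slightly
    larger exponent widens the gap between small and large circles, to
    offset the tendency to underestimate the bigger ones.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value < 0:
        raise ValueError("value must not be negative")
    exponent = FLANNERY_EXPONENT if perceptual else 0.5
    return max_radius_px * (value / max_value) ** exponent


if __name__ == "__main__":
    # Placeholder percentages for illustration only, not CHIS figures.
    for pct in (10, 25, 50):
        plain = circle_radius(pct, 50, 40, perceptual=False)
        scaled = circle_radius(pct, 50, 40)
        print(f"{pct}%: area-true {plain:.1f}px, perceptual {scaled:.1f}px")
```

The largest value always gets the full `max_radius_px`; everything else shrinks a little more than plain area scaling would suggest, which is the same thing as exaggerating the big circles.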

Take a look at the final result.

Notice that the shapes scale automatically as you zoom in. Try zooming in to Los Angeles or San Francisco to compare the compacted counties more closely. Also note that you can actually get the name of a particular doctor who specializes in the treatment of asthma directly from the map. If you click the link you can get all of the contact information from

Which brings us to the point of this exercise.  A better view of the data can prompt change.

If you are a parent of a child with Asthma in one of the “big circles” you need to know that the long term treatment of Asthma requires a plan. If you do not have a plan, the reason might be that there are not enough doctors around you to provide the help you need. This map can put you in touch with the nearest expert.

If you are a doctor who specializes in childhood allergy treatment, this is an opportunity map for you. Eureka is much smaller than LA or San Francisco, but you would have a near monopoly on a population that needs help with asthma. These people do not have the same access to specialized care and that might be a business opportunity for you. Moreover, a doctor who chose to focus on the urban areas in the larger cities might also be able to gain patients and profit. The data here shows that while there are lots of experts -around- the densely urban areas, they are not meeting the demand for care. If a doctor could find a way to make money on a Medicare/Medicaid population in these urban areas, this might also be an opportunity.

Seeing the health data in a new way can provoke change. I hope you think my application is cool and sexy, but frankly I do not give a damn about that. I want to make a difference, not toys.

People remember Florence Nightingale as the mother of modern nursing. But she once made a diagram that changed the way people thought about war. It was that diagram that gave her much of the political clout she needed to create the field of professional nursing that we know today.

I have made the NPI data more liquid with Organizations like CHIS need to do a much better job of making their data accessible. If I had been able to access the data from AskCHIS in a normalized and open format using an API, I would have been able to make a mapping system that would allow the overlay of -any- type of doctor with -any- health data measure that they survey.

So that leaves me with a call to action for three groups: Patients -> find better care near you. Doctors -> go where the patients need you. Researchers -> expose your data through APIs and open file formats.

Of course, I publish my source code under an Open Source license. Enjoy.


RPMS is certified

RPMS, the VistA cousin run by the Indian Health Service, has received ambulatory and inpatient meaningful use certification.

RPMS is substantially available under FOIA (there are some proprietary components required to emulate the certified stack, I believe) and is the first Open Source stack that I know of to be certified as both inpatient and ambulatory.

More as it develops.


Correcting Information Asymmetry for patients

Consumer Reports is an invaluable tool for the purchase of almost anything.

Anytime I am considering a major purchase like a car, or perhaps expensive electronics, I always buy temporary access to While the Consumer Reports magazine can be interesting to browse, the website is even more valuable. You can access any recent product review done for the magazine in an instant.

The problem that Consumer Reports addresses is “information asymmetry”.

Consider going to the car lot to buy a car and then comparing two similar car models. Both of the new cars cost about the same amount of money. Both of the cars have the same essential features. Which brand of car should I buy?

The problem here is that there is an asymmetry of information. The car salesman knows much more about the performance of these brands of cars than I do. So there is a danger that he will recommend the worse of the two cars, which he will have over-priced. If I trust the car salesman, I might be doing what is best for him, not best for me. Even if the salesman is honest, he might be making his recommendation based on the needs of the average car buyer. To the degree that I am different from the average car buyer, my needs might be different.

Consumer Reports helps to reduce this asymmetry. I can learn about how the cars perform from an objective source. I might end up taking the car salesman’s recommendation… I might not. My decision will be based on -my priorities- which can be very divergent from both a typical customer’s and from the salesman’s interests.

This kind of information asymmetry is even more pronounced in healthcare. I could learn what a car salesman knows about cars in about a month of diligent study. But to understand what a doctor does I would have to study for years. If I am trying to make a decision like “Should I have this surgery?” I am at the mercy of the doctor’s much greater information position. The surgeon might be recommending surgery because that would generate income. He also might be recommending surgery because he is assuming that my priorities are the same as the “typical patient”.

Rectifying this information deficit as a patient is much more difficult, because the resources available to patients are often problematic.

The information on WebMD is probably accurate as far as it goes, but it is dumbed-down. You can always spot information that might not go deep enough on the web, because it always ends with “ask your doctor about…”. That is the least helpful thing to say here. It means “This is actually a much more complicated issue, but we are not going to give you any more information, instead go ask the car salesman (the doctor)!”. It is the doctor that I am trying to evaluate here!

Wikipedia has much more accurate information that goes much deeper, but its articles are of sporadic quality (usually very high, sometimes very low… which one are you reading now?) and it may not be updated with the latest information on its more esoteric articles. It was never intended to be relied upon for medical information that changes very, very rapidly.

My boss and collaborator at the Cautious Patient Foundation, Dr. Cari Oliver, has just written a detailed blog post where she details how patients can use a service called to get around this problem. This service is intended for doctors, but they have recently allowed temporary access rates so that patients can access a topic or two and not pay the expensive yearly access fee. Of course, this service is aimed at doctors. It might be a little over your head. But it is better to have access to accurate, recent information about the risks and benefits of different procedures, from a disinterested third party authority, that is too complex than not to have it at all!

This type of recommendation excites me as a technologist passionate about social change! This is a classic example of using information to make patients more powerful!!


Two other Open Source EHRs Meaningful Use certified (partially)

I just found out that at least two other Open Source projects have been meaningful use certified.

OpenEMR has been partially certified.

Medsphere’s OpenVistA CareVue has been certified.

I hope to get more information about exactly what the partial certification means and what the meaningful use strategies of these organizations are, but this means that ClearHealth is no longer alone in certification. (Although from what I can tell, ClearHealth remains the only fully certified Open Source EHR.)

I will write more when I know more…

QR code stencils, the problem

I love QR codes.

I think the notion of simple graphical URLs is beautiful and elegant. If my wife were a graphical data object, I think she would be a 2D QR code.

Think of it, you can put links anywhere you want, in the real world!

You can put them on tshirts, coffee mugs, stickers, business cards… anything in the real world becomes a link to something in the virtual world. Awesome.

I have been playing with QR codes, with an eye towards gamification and behavior change, for quite some time. I love the fact that with Android and/or iPhones you can rely on the GPS coordinates that WebKit (the core of both browsers) will provide, which makes a QR code a token that can do different things in different places. Think of the possibilities!
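As a rough sketch of what “different things in different places” could mean on the server side: the QR code always points at the same short URL, the browser reports its coordinates, and the server picks a destination based on which zone the phone is in. Everything named here (the zone list, the example.org URLs, the function names) is hypothetical and only meant to show the idea.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for this purpose


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def resolve_target(lat, lon, zones, default_url):
    """Return the URL of the first zone that contains the point.

    `zones` is a list of (center_lat, center_lon, radius_km, url) tuples.
    Falls back to `default_url` when the scanner is outside every zone.
    """
    for zone_lat, zone_lon, radius_km, url in zones:
        if haversine_km(lat, lon, zone_lat, zone_lon) <= radius_km:
            return url
    return default_url


if __name__ == "__main__":
    # Hypothetical zones: one scanned code, two different destinations.
    zones = [
        (29.76, -95.37, 50.0, "http://example.org/houston-clue"),
        (37.77, -122.42, 50.0, "http://example.org/sf-clue"),
    ]
    print(resolve_target(29.70, -95.40, zones, "http://example.org/nothing-here"))
```

The same lookup would work for a geo-caching style game: the tag stays the same, only the zone table changes.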

You could make geo-caching much much more interesting…

But how do you make durable (or intentionally not durable) QR codes in a reproducible way? How do you manufacture large QR codes that can be scanned accurately at a distance?

The first approach is simply to print the QR codes on single sheets (A4 or US letter) and then clear paste them to some type of flat surface. You can use throw-away planks of wood from the hardware store to make durable QR code links. This basic idea can be taken pretty far; for instance, you can paste the printed QR codes onto ceramic tile or even bake them on, for a near-permanent tag. But what if you want to make a QR code on some permanent surface, like a wall or pavement?

The simplest solution would be to use a stencil with black spray paint. QR code scanners vary greatly in their ability to pick up contrast, but the color black, and some other color, will almost always pick up. This has an advantage over gluing paper, because you can tag objects that are not entirely smooth. Moreover, with spray paint that does not damage the surface (more later) you can create images that can be placed out in public, non-destructively.

But what is the problem with a QR code stencil? In a word, islands. In order to make a stencil with, say, photo paper (which would otherwise be a great technique), you need a way to address bits that the stencil needs to block, that are not physically connected to the rest of the stencil. It’s easier to show than explain. If you are spray painting black, for instance, and you want to make a stencil of the following QR code, you will have the following trouble spots:

A demo of the QR code islands that make stencils difficult

See the issue? Anything white that does not connect to something else white (even by a corner) is going to be an issue. You might be able to make something clever for the places where this happens in most/all QR codes, but each QR code is going to have random “islands” that are often just one pixel big… and in different spots each time. These are the real headache. Making a traditional stencil simply will not work.

Also, making a stencil is very, very slow. If you have to cut each pattern by hand… ouch… way too much time. We need something faster too!

My first approach to solving this problem was to try and find a programmatic solution. For a given URL, there are many different ways to encode it into a QR code. It might be possible to use an algorithm that detects this type of “island status” to find a QR code solution that did not happen to have any islands. You could make an application smarter by posting meaningless GET variables at the end of a URL until you found a version of the URL that would work. (Of course, I am focused on using URL shorteners like to ensure that you have a simple-as-possible QR code. The more characters in the URL, the more complex the QR code is and the harder it is to make a stencil. The shortener ensures that the QR code is manageable.)

I gave up on this technique after noting that there were islands in all of my test runs for various URLs, but the idea is sound.
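For anyone who wants to pick this idea back up, here is a minimal sketch of the island check and the “keep padding the URL” search. It is not the code I ran: the module matrix format (rows of booleans, True for a dark module), the `pad` query parameter, and the `encode` callback are all assumptions, and `encode` stands in for whatever QR library you use to turn a string into a module matrix.

```python
from collections import deque

# 8-connectivity: white modules that touch only at a corner still count as
# connected, matching the "even by a corner" rule described above.
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]


def count_islands(modules):
    """Count white regions that are cut off from the stencil frame.

    `modules` is a list of rows; True is a dark module (a hole you spray
    through) and False is a white module (stencil material left in place).
    """
    rows, cols = len(modules) + 2, len(modules[0]) + 2
    # Pad with a white border that stands in for the frame of the stencil.
    white = [[True] * cols for _ in range(rows)]
    for r, row in enumerate(modules):
        for c, dark in enumerate(row):
            white[r + 1][c + 1] = not dark
    seen = [[False] * cols for _ in range(rows)]

    def flood(start_r, start_c):
        queue = deque([(start_r, start_c)])
        seen[start_r][start_c] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and white[nr][nc] and not seen[nr][nc]):
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    flood(0, 0)  # everything reachable from the frame is held in place
    islands = 0
    for r in range(rows):
        for c in range(cols):
            if white[r][c] and not seen[r][c]:
                islands += 1  # a white region the frame cannot hold
                flood(r, c)
    return islands


def find_stencil_friendly(url, encode, max_islands=0, max_tries=100):
    """Append a meaningless GET variable until the island count is low enough.

    `encode` is a caller-supplied (hypothetical) function that maps a string
    to a module matrix. Returns the first URL variant that qualifies, or None.
    """
    separator = "&" if "?" in url else "?"
    for n in range(max_tries):
        candidate = url if n == 0 else f"{url}{separator}pad={n}"
        if count_islands(encode(candidate)) <= max_islands:
            return candidate
    return None
```

One caveat that probably explains my test runs: the three finder patterns in the corners of a QR code each contain a white ring fully enclosed by a dark square, so a real code will never come back with zero islands. That is what `max_islands` is for; you would handle the fixed patterns some other way (small bridges, for instance) and only search for a code with no random islands beyond those.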

Glen Tullman presents the Chewbacca defense

I have been meaning to write about this for a while.

Glen Tullman and I have pretty different opinions about Health IT. Glen is the CEO of Allscripts, which is the largest proprietary EHR vendor in the country. When ONC called for testimony for the definition of meaningful use, Glen and I sat on the same panel. I testified after him, and I painted a much different picture of the state of Health IT than he did. The summary of his testimony: “The future of EHRs is already here, we are doing meaningful use today”. The summary of my testimony: “There is a market failure in Health IT, no other industry needed to be paid to computerize”. He holds his own software company out as an example of the “right way”, whereas I generally hold up VA VistA, which was developed in an Open Source collaborative fashion, as the way forward.

Of course we are both financially biased in this regard. I am an upper-middle income software developer, and Glen got paid $4,072,270 last year. Given the kind of money I make on this Open Source stuff you should probably take everything I say with a grain of salt, and take everything he says with about 45 grains of salt… you know… based on the relative bias involved…

But Glen Tullman got an opportunity to testify again (without me this time), regarding VA VistA. (text, video)

In this testimony, I want to focus on one specific statement, that is particularly galling to me.

While the private sector has been moving forward in light of these incentives, the Government has been investing in their own proprietary systems for many years.  Billions of dollars have been spent to build and implement the VistA/CPRS system within the Veteran Health Administration and the AHLTA system within the Military Health System.

So VistA/CPRS is “proprietary”, while Glen’s own software is “private sector”. Wow. The Chewbacca defense at its best.

VistA/CPRS can be run for any purpose, the source code is available for anyone to download without cost, you can redistribute those copies of VistA/CPRS without cost, and you can also redistribute modified versions of the software. That means VistA/CPRS meets the definition of freedom-respecting software, which is the soul of Open Source. Moreover, it was and is developed in a collaborative fashion that is at the heart of every successful Open Source project. If you want to know more, you should read the What is VistA Really page that I edit for WorldVistA.

Then, Glen takes credit for accomplishments of Open Source technology:

For example, in Hartford, Connecticut, we have been partners in a project for almost two years that has not only led to widespread health IT adoption but successful implementation of open source health information exchange technologies.

What Glen meant by this is that there are some Allscripts nodes on an Open Source HIE created by MOSS, Misys Open Source Solutions. In short, Open Source -was- responsible for the exchange, and this had very, very little to do with Allscripts software.

He goes on to say:

the fact remains that VistA’s basic platform, which relies on the 25-year old technology called Mumps, cannot support the open, flexible approach needed by those providing care to our nation’s wounded servicemen and women. Rather, the demands of today’s military and veteran healthcare environment necessitate the use of technologies – such as those based on Microsoft’s architecture – that can support an open, shared approach that will not just be desirable, but a fundamental requirement in the near future.

It should be noted that -every- instance of VA VistA inside the VA is capable of communicating with every other instance of VistA inside the VA. The VA was the first and probably still the only large scale organization to achieve this kind of internal data fluidity, which has been happening for more than a decade. Interestingly, the other “large” vendor in Health IT is Epic, a proprietary EHR company that relies heavily on MUMPS. I can think of nothing that Allscripts software can do that either Epic, or VistA is not capable of. Holding out Microsoft technology as a source for peer-to-peer leadership is also pretty ironic, but whatever…

Glen is pretty used to speaking out of both sides of his mouth regarding Open Source, and this testimony is far from the only instance. First there was this article in Forbes, which originally claimed that Allscripts had an Open Source platform, but was then quickly redacted to its current “clearer” status, though not before it was completely flamed.

Most recently, Glen was interviewed in the January 2011 Edition (Vol. 19, No. 1) of HealthData Management Magazine:

And Tullman has spent those years (since 1997) being a relentless advocate of the use of open source architecture for health I.T. software and pushed his company to develop tool sets to connect its EHR software with virtually any device or software on the market.

This was, of course, published in time with the edition of the magazine that would be available during the 2011 HIMSS conference.

This is a very disturbing case of a proprietary EHR CEO being completely intellectually dishonest regarding Open Source. I am on speaking terms with several of the top CEOs of proprietary EHR systems. People like Jonathan Bush of Athenahealth and David Winn (formerly CEO of eMDs). I have advocated Open Source to these figures on a regular basis. But they remain proprietary companies because they believe that they will make more money as proprietary companies. I believe that Open Source has value that should be more important than profit, and have a friendly disagreement about this with most industry CEOs. They think my ideas are intriguing and have potential, but see no reason to “bet the farm” on Open Source.

But they also -never- hold themselves out as the “Open” or “Open Source” option. Nor do they malign technologies merely because they are other than those chosen by their own developers. Glen Tullman regularly does both of these. Hell, he did it in testimony to Congress.

Look, I know that not everyone agrees that Open Source is the way to go; this is not what I am arguing here. I am arguing that we need to have honest and sincere disagreements about licensing and technology issues in Health IT rather than listening to Glen Tullman and his Chewbacca defense.

ClearHealth, the first Open Source EHR Meaningful Use certified

I am happy to report (a little late) that ClearHealth is now the first commercial Open Source EHR product to be meaningful use certified.

This project holds a special place in my heart, since David Uhlman and I started it years ago as a next-generation PHP-based Open Source EHR. It is theoretically possible that some code that I wrote, all that time ago, has actually carried over into this certified version. A careful analysis of the source code tools would probably reveal that I can take credit for perhaps 5 or even 6 carriage returns in the current ClearHealth code-base.

In all seriousness, the ClearHealth project has grown by leaps and bounds. The PHP product from ClearHealth, Inc has leveraged many of the innovations from the WebVistA project, making it probably one of the most capable and robust web-based EHR systems in existence. Only OpenMRS competes in terms of complexity and scope at this stage. Unless Tolven, OpenEMR, PatientOS etc can get their act together, this certification will set ClearHealth aside as the “one project to rule them” in this space.

More importantly, ClearHealth is not ignoring the needs of its community users. They are developing a path to self-certification for ClearHealth users who rely on the community edition of the product. Pretty amazing stuff.


Direct gathers steam

Recently, the AAFP and Surescripts announced Physicians Direct, a secure messaging service for providers. But neither the article nor the signup page for Physicians Direct details the single most critical issue regarding the service. This is a very large deployment of the Direct Project. This is by far the most important part of the story, but it is buried deep within the FAQs.

That means that the service is compatible with other large adopters of the Direct Protocol. Most notably, HealthVault has just launched a beta deployment of Direct. Think of the implications of this. One of the largest PHR providers in the country is on the network, and one of the largest networks of doctors is on this network.

We are watching the birth of the Health Internet. It is truly wonderful to be involved in this work.

When I tell my grandkids what I did with my life, I hope the links to my early posts on the Security and Trust Working Group of the Direct Project are still up. “I was part of that from the beginning” I will say… My previous plan was to tell them that I invented bubble-gum ice cream, and then enjoy basking in their amazed adoration, until they discovered that Grampa’s stories are “unreliable”.

This will work out much better.

This is also a tremendous step for Surescripts away from being a proprietary network provider. For those who are unfamiliar with Health IT, Surescripts has a monopoly in e-prescribing after buying out its only competitor several years ago. If you e-prescribe in the United States, there is a 99% chance that the data crosses the Surescripts network. Surescripts is free to use for doctors, but the pharmacies pay for the privilege. But that business model will die as the Health Internet grows. Once the pharmacies realize that you can use the Health Internet to exchange prescriptions rather than the expensive Surescripts network, that business will dry up quickly. Moving into the Health Internet provider business is the only chance Surescripts has at long term survival. This is a very smart move for them.

Of course, this also has implications for meaningful use. Providers can use this exchange network, without making an expensive investment in EHR technology, and still qualify for part of the meaningful use dollars. $15 a month might seem expensive for glorified email, but it’s a whole lot cheaper than an EHR.