Hacking on the PubMed API

The PubMed API is pretty convoluted. Every time I use it, I have to relearn it from scratch.

Generally, I want to get JSON data about an article using its PubMed ID, and I want to do searches programmatically… These are pretty basic and pretty common goals…

The PubMed API is an old-school RESTish API that has hundreds of different purposes and options. Technically the PubMed API is part of the Entrez Programming Utilities (E-utilities), the interface to the Entrez database system, and instructions for using it begin and end with the Entrez Programming Utilities Help document. Here are the things you probably really wanted to know…

How to search for articles using the PubMed API

To search PubMed you need to use the eSearch API.

Here is the example they give…


The first thing we want to do is have this thing return JSON instead of XML. We do that by adding a GET variable: retmode=json. The new URL:


Ahh… that’s better… Now let’s get more ids in each batch of the results…


Breaking this down…


is kind of the entry point for the whole system.


is the actual function that you will be using…

This tells the API that you want to search pubmed.


Next you want to set the “return mode” so that JSON is returned.


And then you want to add retmax to get up to 1000 results at a time… The documentation says that you can get 100,000, but I get a 404 if I go over 1000.


The term argument


db and term are separated using the classic GET variable layout (the variables start after a ? and are then separated by &). If that sounds strange to you, I suggest you learn a little more about how GET variables work in practice.
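The example URLs did not survive in this copy of the post, so here is a rough sketch of how the pieces described above fit together. The base URL and the `esearch.fcgi` endpoint name are the ones NCBI documents for the E-utilities, not something quoted from this post, and the helper name `esearch_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed entry point: the base URL NCBI documents for the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, retmax=1000):
    """Build an eSearch URL (hypothetical helper, not part of any library)."""
    params = {
        "db": "pubmed",      # tells the API that you want to search PubMed
        "retmode": "json",   # ask for JSON instead of the default XML
        "retmax": retmax,    # how many ids to return per call
        "term": term,        # raw search string; urlencode() escapes it
    }
    # urlencode() adds the & separators; we add the leading ? ourselves.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("breast cancer"))
```

Putting term last keeps the long, ugly part of the URL at the end, which makes the rest easier to eyeball.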

Now about the “YOUR SEARCH TERMS HERE”. That is a URL-encoded string holding the search string for PubMed. URL encoding is (in a somewhat trivialized explanation) how you make sure that there are no spaces or other strangeness in a URL. Here is a handy way to get data into and out of URL encoding if you do not know what that is.
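If you would rather not paste things into a web form, Python's standard library can do the same round trip. This is just a sketch using `urllib.parse.quote` and `unquote`; the query string is a shortened piece of PubMed search syntax picked for illustration.

```python
from urllib.parse import quote, unquote

# A short piece of PubMed search syntax, spaces and brackets included.
query = '"breast cancer"[All Fields] AND Review[ptyp]'

# safe="" makes quote() escape every reserved character.
encoded = quote(query, safe="")
print(encoded)

# Decoding gets the original string back unchanged.
decoded = unquote(encoded)
print(decoded)
```

The encoded form has no spaces, quotes or brackets left in it, which is what makes it safe to paste onto the end of a URL.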

Thankfully the search terms are well defined, but not anywhere near the documentation for the API. The simplest way to understand the very advanced search functionality on PubMed is to use the PubMed advanced query builder, or you can do a simple search and then pay close attention to the box labeled “search details” on the right sidebar. For instance, I did a simple search for “Breast Cancer” and then enabled filters for Article Type of Review Articles and Journal Categories of “Core Clinical Journals”, which results in a search text that looks like this:

("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND (Review[ptyp] AND jsubsetaim[text])

Let’s break that apart into a readable syntax display…

("breast neoplasms"[MeSH Terms] 
  OR ("breast"[All Fields] 
        AND "neoplasms"[All Fields]) 
  OR "breast neoplasms"[All Fields] 
  OR ("breast"[All Fields] 
        AND "cancer"[All Fields]) 
  OR "breast cancer"[All Fields]) 
AND (Review[ptyp] 
  AND jsubsetaim[text])

How did I get this from such a simple search? PubMed is using MeSH terms to map my search to what I “really wanted”. MeSH stands for “Medical Subject Headings”, an ontology built specifically to make this task easier.

After that, it just tacked on the filter constraints that I manually set.

Now all I have to do is use my handy URL encoder to get the following URL-encoded version of my search parameters.


Let’s put the retmode=json ahead of the term= so that we can easily just paste this onto the back of the URL. We get the following result.


I wish that my css could handle these really long links better… but oh well. I know it looks silly, lets move on.

To save you (well, mostly me at some future date) the trouble of cutting and pasting, here is the trunk of the URL that is just missing the URL-encoded search term.


At the time of the writing, the PubMed GUI returns 2622 results for this search, and so does the API call… which is consistent and a good check to indicate that I am on the right track. Very satisfying.

The JSON that I get back has a section that looks like this:

    "esearchresult": {
        "count": "2622",
        "retmax": "20",
        "retstart": "0",
        "idlist": [

With this result it is easy to see why you want to set retmax… getting 20 at a time is pretty slow… But how do you page through the results to get the next 1000 results? Add the retstart variable.
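Here is a small sketch of the paging arithmetic, assuming a page size of 1000 and a response shaped like the excerpt above. The `page_offsets` helper is hypothetical, and the sample JSON is trimmed by hand (the idlist is left empty) rather than copied from a real response.

```python
import json


def page_offsets(count, retmax=1000):
    """Return the retstart value for each page needed to cover `count` ids."""
    return list(range(0, count, retmax))


# Hand-trimmed sample shaped like the esearchresult excerpt above.
sample = (
    '{"esearchresult": {"count": "2622", "retmax": "20", '
    '"retstart": "0", "idlist": []}}'
)
result = json.loads(sample)["esearchresult"]
total = int(result["count"])  # the API returns the count as a string

offsets = page_offsets(total)
for retstart in offsets:
    # Each of these would be appended to the eSearch URL for one call.
    print(f"&retmax=1000&retstart={retstart}")
```

For 2622 results that works out to three calls, with the last one returning a short page.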


If you need more help, here is the link to the full documentation for eSearch API again…


How to download data about specific articles using the PubMed API

There are two stages to downloading the specific articles. First, to get article meta-data you want to use the eSummary API… using the ids from the idlist json element above… you can call it like this:


This will return a lovely JSON summary of this article. Technically, you can get more than one id at a time by separating them with commas, like so…
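Since the example URLs are missing from this copy of the post, here is a sketch of what an eSummary call with a comma-separated id list might look like. The base URL and `esummary.fcgi` endpoint name come from NCBI's E-utilities documentation rather than from this post; the helper name and the ids are made-up placeholders.

```python
from urllib.parse import urlencode

# Assumed entry point: the base URL NCBI documents for the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esummary_url(pmids):
    """Build an eSummary URL for one or more PubMed ids (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "retmode": "json",
        "id": ",".join(str(p) for p in pmids),  # comma-separated id list
    }
    # safe="," keeps the commas readable instead of escaping them as %2C.
    return EUTILS_BASE + "esummary.fcgi?" + urlencode(params, safe=",")


# Placeholder ids for illustration; use ids from the eSearch idlist instead.
print(esummary_url(["11111111", "22222222"]))
```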


This summary is great, but it will not get the abstracts, if and when they are available (it will tell you if there is an abstract available, however…). In order to get the abstracts you need to use the eFetch API.


Unlike the other APIs, there is no JSON retmode. The default is XML, but you can get plaintext using retmode=text. So if you want structured data here, you must use XML. Why? Because. That’s why. This API will take a comma-separated id list too, but I cannot see how to separate the plaintext results easily, so if you are using the plaintext (which is fine for my current purposes) it is better to call it with a single id at a time.
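A sketch of the one-id-at-a-time approach, building one eFetch URL per id. The base URL and `efetch.fcgi` endpoint name are from NCBI's E-utilities documentation, and `rettype=abstract` is my assumption about how to ask for the abstract (it is not mentioned in this post). The helper name and ids are placeholders.

```python
from urllib.parse import urlencode

# Assumed entry point: the base URL NCBI documents for the E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_abstract_url(pmid):
    """Build an eFetch URL for a single PubMed id (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "id": pmid,
        "retmode": "text",      # plaintext instead of the default XML
        "rettype": "abstract",  # assumption: not taken from the post
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Placeholder ids; one URL per id keeps the plaintext results separate.
pmids = ["11111111", "22222222"]
urls = [efetch_abstract_url(p) for p in pmids]
for url in urls:
    print(url)
```

One request per id is slower, but it sidesteps the problem of splitting a blob of plaintext back into individual articles.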




Novice EHR Development is now unethical

The original Hippocratic Oath states:

I will not use the knife, not even on sufferers from stone, but will withdraw in favor of such men as are engaged in this work.

One modern version reads:

I will not be ashamed to say “I know not,” nor will I fail to call in my colleagues when the skills of another are needed for a patient’s recovery.

The idea here is that a doctor needs to recognize when another practitioner has a skill that they do not, and that they must refrain from “practice” when another person has demonstrable expertise in that area of practice.

It is now 2013. It is time for doctors to stop “writing their own EHR” from scratch. They need to bow out of this in favor of people who have developed expertise in the area.

I just found out about another doctor who has decided to write his own EHR, because he has not been able to find one that supports his new direct pay business model adequately. In the distant past I encountered a doctor who believed that his “Microsoft Word Templates” qualified as an EHR system. This is a letter to any doctor who feels like they are comfortable starting from-scratch software development for an EHR in 2013 or later.

You might believe yourself to be an EHR expert.

Are you sure about that? Are you sure that you are not just an EHR expert user?

This difference is not unlike your relationship with your favorite thoracic surgeon. Or for that matter, your relationship with the person who built your car. The fact that you are capable of expertly evaluating and using EHR products does not mean you are qualified to build one. Just like the fact that you are qualified to treat a patient who has recently had heart surgery or to discern when a patient might need heart surgery does not make you qualified to perform that heart surgery. Similarly, the fact that you can drive, or even repair your automobile, does not provide you with the expertise you need to build a car from scratch.

The ethical situation that you are putting yourself in by developing your own EHR is fairly tenuous. Performing heart surgery without being a heart surgeon, building and driving your own car without being an automotive engineer and a doctor coding their own EHR system from scratch all have the same fundamental problem: You might be smart enough to pull it off, but if you don’t you can really mess up another person’s life. Make no mistake, you can kill someone with a shoddy EHR just as easily as by performing medical procedures that you are not qualified for or by driving a car that is not road-safe.

It is not that heart surgeons, automotive engineers and EHR developers are not going to kill people with faulty performance. All experts are fallible. But they will kill far fewer people than you would, performing outside your expertise.

I can understand your feelings of frustration. You likely have totally different goals in mind than the average third-party-payer oriented EHR system has. You are right to be frustrated with the shackles that those systems have placed on you. But you are very wrong to presume that it is ethical for you to do “amateur hour” on your own.

You presume that because you can see the problems with EHR developer performance, you are qualified to build a better EHR. You are utterly and unequivocally wrong about this. Sometimes, EHRs have features that are designed for clinical CYA, basically over-documentation for the sake of unethical defensive medicine. Sometimes EHR systems are designed to be glorified practice management systems, designed mostly to ensure that doctors maximize their paycheck at the expense of patient care. Sometimes EHR design decisions have no rationale behind them at all… they are frequently the result of original design whims that are hard to correct in subsequent editions of an EHR product.

But sometimes a feature that frustrates you is precisely what makes that EHR safe for patients. I can promise you that you cannot tell the difference between flaws and features without looking carefully at both the internals of the EHR system and all of the clinical workflows it is exercised in. What you think of as a flaw might be a software crumple zone.

Happily, you get to have your cake and eat it too. There are several Open Source EHR systems that are already Meaningful Use certified. You can use these Open Source EHR systems for nothing, and for very little money you can even get Meaningful Use credit for using these systems. Given this, you have no excuse to continue to develop a new EHR.

Open Source gives you the right to change what you need to in order to get the functionality that you want. More importantly, it can connect you with experienced health IT developers, who can serve as a gut check for you as you consider how to implement the features that you need for whatever clinical variation you are interested in implementing.

This is very like the person who orders a “kit car” to build in their garage. They get to -feel- like they are building the car, and indeed they get lots of options normal car owners do not. But in the end, they are able to build a car safely because someone else, someone with specific expertise, has made sure that the design of the kit car is fundamentally sound. You can always shoot yourself in the foot with kit cars and Open Source, but you have the power you need without being in over your head.

The development of mature EHR systems has been very similar to the development of surgical methods. Primitive EHR systems and primitive surgical procedures were both deadly. In both cases, medical science has already sacrificed thousands of people to the “cause” of learning how to do these things right. In 1850, it would have been entirely appropriate for any doctor to “dabble” with creating their own surgical methods. Even as recently as 2000, it would have been appropriate for you to “dabble” with the creation of your own EHR system. (eMDs was started by a doctor dabbling in 1996. eClinicalWorks was started in a similar fashion in 1999). But those days are over.

A doctor developing a new EHR system from scratch, by themselves, without extensive Health IT programming experience is in over their head. If they continue to develop an EHR, even after being warned of the dangers here, then this is hubris.

Ask yourself: Are you absolutely sure that this action is not a fundamental violation of the oath that you took when you became a doctor?

I want to be clear: I have worked on or around the development of EHR systems for more than a decade, and I would not presume to write a new EHR system without a team of programmers and years of funding. It’s not that I think that “a doctor” is not qualified to undertake this task. No single person is.

I wrote a book designed to ensure that novice programmers had basic training in complex Health IT principles. Programmers can be guilty of hubris too, and I consistently advocate for a “clinical pair programming” approach. David Uhlman (my co-author) and I wrote the book because too many people assume that Health IT is easy, and they wonder why things in the industry are so “primitive”. The book is intended to teach clinicians and programmers alike humility when approaching clinical information systems, both as users and as developers. FSM knows that I have been dangerously arrogant regarding clinical information systems, and I have and will make serious mistakes. But there comes a point where making the same mistakes that others have made, and written about, becomes unethical. I think we have reached this point with EHR systems.

Some people took offense that I should link to my own book at the end of this article, so instead I have included some of the reference materials that I use frequently. This is a good sampling of the kind of context that really should be required of any modern Health IT developer.

Begin with Information and Medicine by Marsden S Blois. Then move on to Principles of Health Interoperability HL7 and SNOMED by Tim Benson and The CDA Book by Keith Boone. Finally, you should read about what can go wrong in Health IT by studying EHR generated errors with Clinical Information Systems, Overcoming Adverse Consequences by Dean E Sittig and Joan S. Ash.

These are the books that I refer to when I get stuck on something. I wish I could just hold it all in my head, and in many ways my book is just the cliff notes I need for myself. If you know of other books that should be on the “Health IT required reading list” please leave them in the comments…


Expert Healthcare Hackers

(This is a preview of a talk that I am going to give next week at Healthcare::Refactored, with Karen Herzog)

There are two definitions of the word “Hacker”. One is an original and authentic term that the geekdom uses with respect. This is a cherished label in the technical community, which might read something like:

“A person adept at solving technical problems in clever and delightful ways”

While the one portrayed by popular culture is what real hackers call “crackers”:

“Someone who breaks into other people’s computers and causes havoc on the Internet”

People who aspire to be hackers, like me, resent it when other people use the term in a demeaning and co-opted manner. Or at least, that is what I used to think. For years, I have had a growing unease about the “split” between these two definitions. The original Hackers at the MIT AI lab did spend time breaking into computer resources… it is not an accident that the word has come to mean two things. It is from observing e-patients, who I consider to be the hackers of the healthcare world, that I have come to understand a higher-level definition that encompasses both of these terms.

Hacking is the act of using clever and delightful technical workarounds to reject the morality embedded in the default settings of a given system.

This puts “Hacking” more on the footing with “Protesting”. This is why crackers give real Hackers a bad name. While crackers might technically be engaged in Hacking, they are doing so in a base and ethically bankrupt manner. Martin Luther King Jr. certainly deserves the moniker of “protester” and this is not made any less noble because Westboro Baptist Church members are labeled protesters too.

Like protesting, Hacking is all about taking a certain set of ethical issues that are important to you, and then performing an act whose central purpose is to restore ethical balance. People with screwed up ethical compasses will give good protesters and good Hackers a bad name.

I like this broader definition because it really shows that Hacking is not at all limited to technology. It relates to “systems”, as long as the “system” is complex enough to encode moral notions. This means that protesting is really just a special kind of Hacking, in fact we might rename it “public opinion hacking”.

Consider Richard Stallman. Stallman realized, when he couldn’t get access to printer control software because of a proprietary license, that the license itself was encoding something he had an ethical problem with. Rather than accept that embedded morality, he created a workaround (copyleft licenses) that offered an alternative with an embedded morality that he could live with. The system that Stallman was hacking was copyright and licensing, and the modern Open Source movement is the result of this hack.

The notion that technology and other complex systems can have moral notions embedded is neither new, nor mine and I recommend Lessig’s Code and Other Laws of Cyberspace for a full discussion.

I came to this conclusion as we renamed our “meaningful use” book to “Hacking Healthcare“. David Uhlman (my coauthor) and Andy Oram (my editor) seriously considered “Hacking Healthcare Software”, as an alternative title. But in our discussions it became apparent to us that David and I were really hoping to teach people how to use software to change the Healthcare system itself. The software was merely the type of hack that we were proposing, rather than the system being fixed with the hack.

Any efforts to hack healthcare should be embraced because the default settings on the Healthcare system really suck.

We have too many medical errors. We have overtreatment, undertreatment, fraud and disconnected care. Worse, until very recently, we had incentives that were virtually guaranteed to make these problems worse. These problems are merely symptoms of the wrong set of morals being encoded into the healthcare system.

Which leads me to introduce Karen Herzog to you. Karen makes my efforts to hack healthcare look somewhat childish. Like other, more famous e-patients such as e-patient Dave and Regina Holliday, Karen, along with her husband Richard Sachs, refused to accept the default settings of the healthcare system when their daughter Sophia was born with a rare genetic disorder. Shortly after Sophia’s birth, Karen and Richard were informed that their daughter’s disease was incurable and that she was dying.

The default settings for the healthcare system in these circumstances could not have been worse. Karen and Richard were offered occupational therapy, physical therapy, grief counseling and “when she turns blue let us know..” by their doctors in a manner that was obviously code for “we cannot help you, sorry for your situation, but get out of our hair”. Karen and Richard refused to accept this. They did go home, but rather than allow the healthcare system to “wash their hands” of Sophia they created a garden. This literal garden was the first step in creating a community of care that re-engaged their doctors, who were themselves feeling hopeless and overwhelmed, and gave them a safe environment to try to make Sophia’s life better and to seek a cure. Like all of the greatest “Hacks”, Karen and Richard repurposed a simple solution and made it apply to a problem that was regarded as unsolvable. They created a literal space that was so welcoming that it inspired collaboration in a group of clinicians that were not used to collaborating, and it worked beautifully. They found ways to make it obvious that Sophia’s space would not be a deathbed, but a different kind of space altogether.

Eventually Sophia died, but only after receiving care that was orders of magnitude better than what could have been accomplished if Sophia had been hospitalized full time. Hundreds of clinicians, friends and family came together to make Sophia’s garden into a success, in a collaboration that never could have occurred inside the walls of any given healthcare institution.

This success was hard-fought. Together, Sophia, Karen and Richard experienced just about every significant problem that patients and caregivers can have. For each hurdle, Karen and Richard continually refused to accept the “default settings” that the healthcare system offered, by responding with hack after hack.

I am humbled to be speaking opposite Karen. Since Sophia died, Karen and Richard have pivoted their design group into one of the preeminent “Patient UX” shops in the country. They have leveraged their troves of poor experiences with the healthcare system, and their methods of working around them, into a series of fundamental insights about how to improve patient experiences with technology and design. They are my default recommendation for design work in the healthcare space.

I have been watching what e-patients like Karen and Richard are able to accomplish for years and I have come to realize that in many ways, they are far more deserving of the honorific of “Hacker” than the bozos who deface websites to make political points. In much the same way, recognizing that MLK Jr. was a protester makes it embarrassing that we have to give the Westboro church members the same label.

Like the original Hackers who built the Internet and the first computers, e-patients are blazing a trail through the healthcare system. Decades from now we will look back on this class of patient and realize that they remade healthcare by simply refusing to accept the aspects of the healthcare system that typically suck. In the future, when the new norm for doctors is to respect patients enough to actually let them finish sentences, we will have this generation of e-patients to thank, in much the same way that we recognize that our iPhones and Androids would not be possible without the pioneering Hackers of the *nix community.

Karen and I will be doing a “dueling keynote” at Health::Refactored, asking each other difficult questions about the state of the art in design and technology in healthcare. I hope that the audience will learn some tidbits from me about how to work with software to help fix healthcare, but I think I have made my case that Karen will be the real healthcare Hacker on the stage.



How to change the world over the weekend

I love hackathons.

I love winning them. I love competing in them. I love winning them.  I love judging them. I also love not losing them.

This weekend, I am acting as a mentor to the first Health 2.0 hackathon in Houston, Texas. As far as I know (which is not that far, really) this is the first hackathon in Houston to be focused exclusively on healthcare. Serving as a mentor rather than having the opportunity to directly win might seem counterintuitive, given how competitive I am. But I have had complaints about being a “professional” Health IT expert entering these contests, and as one of the organizers of the event, I do not want to be seen as unfair. This was a hard decision for me because in most cases, if I have to choose between winning and being unfair, I choose winning… but my Houston Health 2.0 co-conspirators prevailed upon me this time…

I do well in hackathons because I know how to avoid the number one pitfall in healthcare hackathons: It is too tempting to make toys.

To really rock a Healthcare Hackathon you have to have a real strategy to build something that will make a difference, but something that you can still prototype in two days. Here are general thought strategies that have worked for me:

  • Have you carefully searched the web for someone implementing your first-blush idea? The Android and iPhone app stores? Your idea is probably not original.
  • Rather than focusing on original “ideas”, try to find “original problems”. Clinician partners on your team are critical for this perspective!
  • Seek problems where there is no money to be made solving them. Problems that already have money already have attention, and it is hard to do original work in those spaces!
  • Only a few doctors are enlightened enough to pay attention to the hacking approach. How can we multiply the impact of a very few doctors?
  • Most patients are not e-patients, they are reactive and unwilling or unable to change their own healthcare behaviors. How can we minimize what each patient must do, but still have an impact?
  • Are there patient pain points so strong that we can rely on at least a few highly motivated beta testers?
  • How can we leverage the cloud, even with HIPAA limitations?
  • How can we crowd-source effectively, ensuring that every participant is evenly and instantly rewarded for contributions? How can we make crowdsourcing fun?
  • How can we leverage pre-existing Open Source code or APIs? Stand on the shoulders of giants… Hello! Obvious!!
  • How can I flesh out my team at a hackathon by pitching to clinical, educational, design, art or video collaborators?
  • If a programming task is hard for me, can I find a geek that can do in a few minutes what it would take a whole week for me to learn?
  • Getting a good idea is easy. Getting a good idea that is small enough for me to finish in two days is hard. How do I trim all the fat?

Here are some ideas that I will be pitching to participants at this weekend’s hacking contest. If I can find geeks with the required programming skill sets and teams to ensure that they have the clinical and design backup that they need, I think these are all doable in two days.

Big Data on medical students:

Medical students are the only ones who understand the problems in medical school. I have designed a hack that will allow us to use big data on them directly to discover and fix the issues with our process for making doctors. I think this will require a team who can code in cross-platform Java… but a web-platform programmer could be tolerated in a pinch. SQLite experience is a plus.

Better medical wikis

Only Wikipedia has the critical mass to sustain itself, so the only way to make a medical Wikipedia is to do it inside Wikipedia. But how do we ensure that the medical parts of Wikipedia are accurate enough for clinicians and experts, but simple enough for the average patient to find them useful? I think I have found a way to use the Wikipedia APIs to dramatically improve the quality of Wikipedia articles on health issues, but I will need a team who knows how to build either a Chrome or Firefox module… or perhaps a super fancy JavaScript bookmarklet.

Cross the channels at health conferences

Every healthcare conference has a back channel, and in my experience at healthcare conferences, many of the real experts are in the crowd tweeting. Conversely, the people who line up to ask questions at a microphone are unvetted. A tragic portion of those who ask questions are actually pitching their own projects, or exercising an obsession, or asking a stupid question (and yes… there is such a thing as a stupid question… or at least there are many morons who feel comfortable wasting my time with questions). I am pretty sure it will require something like Node or Python’s Twisted, but I think we can use Twitter to hack health conference Q&A for the better…

The calculus of pain

In healthcare we have policies that help to ensure that “drug seekers” are unable to access excessive amounts of opioid pain killers. Assuming we define “denying a patient pain medications” as a positive, then these policies are “high sensitivity” (they have few false negatives). Said another way, they have been shown to reduce the number of deaths from medication overdoses in those states that apply them. But good policies are also “high specificity” (they have few false positives). In this case, a “false positive” is to deny a patient who has legitimate untreatable-without-opioid pain access to effective pain control. The debate here is mostly rhetoric, with law enforcement and organizations who represent pain patients both resorting to it because there is no way to accurately measure false positives. But what if we could create a dynamic visualization that estimated false positives from the data that we do have? Essentially, we could create a “calculus of pain” diagram that both sides could ‘agree’ on, but use differently. As you might expect, this ‘rhetoric negation GUI’ will require extensive D3/JavaScript expertise.
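To make the two terms concrete, here is a tiny worked example of sensitivity and specificity using the framing above, where a “positive” means the policy denies pain medication. Every count below is invented purely for illustration (the whole point of the project is that the real false positive number is not known), and the function names are my own.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual drug seekers that the policy correctly denies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of legitimate pain patients that the policy correctly treats."""
    return true_neg / (true_neg + false_pos)


# Invented counts, for illustration only.
true_pos = 90    # drug seekers who were denied
false_neg = 10   # drug seekers who still got medication
true_neg = 800   # legitimate pain patients who got medication
false_pos = 200  # legitimate pain patients who were denied

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up numbers the policy catches 90% of drug seekers but wrongly denies 20% of legitimate patients, and that second number is the one neither side can currently measure.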

Simple games for fitness

I am interested in creating tools that use Geocoding and QR codes together to motivate health. I need iOS and/or Android developers for this one.

Twitter plus epatients

Lastly, I am interested in the ways that e-patients tend to favor Twitter, and I might be interested in developing an e-patient-specific Twitter tool. You need to code in a web-friendly language.

Quantified Self device hacking tools

The QS community very clearly needs a specific tool that I have gotten a lot of requests for. You must know either hardware interfacing (usually C or C++ for USB drivers etc.) or web authentication (OAuth et al.).

Do something awesome using Natural Language Interfaces.

One of the API sponsors for this hackathon is Ask Ziggy, which is essentially a “Siri as an API” for app developers. It’s a clever idea and there are lots of possible uses here… no specific technical requirements other than to use this API.

Do something awesome with DocGraph

This is, of course, our own data set… and you can read about it at the main DocGraph site.

Do these sound vague enough?

I hope these are pretty vague ideas. I intentionally am leaving out the critical “how” part of each idea!

I hope this list is enough to spark some interest and get developers to attend this conference. I will not be the only one pitching ideas, and teams attending with pre-baked ideas typically do well at these kinds of events. Still, if you want to use my ideas, and hear me explain how to do them and why they will work, then you need to meet my specific criteria. First, you must be willing to develop in the open, and under Open Source licenses. I am giving you a hackathon-winning idea for no money (and I am fairly certain of that, given that I have judged more Health 2.0 contests than anyone else). Even if you do not win the contest, these ideas are so good that I will probably be able to make you fairly famous in the Health IT and Health 2.0 communities.

By working on my ideas you kind of hedge against losing at all. If you are able to pull off the projects, then I will give you credit publicly for your awesomeness, which is valuable to anyone looking to make a name. For this valuable insurance service, I need to be able to start from where you left off if you decide to abandon the project after the hackathon… That means GitHub and the FOSS license of your choice (I like the AGPL).

You also -must- have the skillset that I require for a given project for me to give you the details on a project. I cannot have my best ideas just “out there” for people to run off with!! I am pretty sure that I have at least one project for every kind of developer that I can think of listed above. If I could do all of these ideas myself with my programming skill set… guess what… I would have already done them or I would save them so that I could win some other hackathon! Each of these projects leverages a very specific hack of some kind: hacking hardware interfaces, user expectations, software design, data levers or something like that. After I describe the “how” of each project there will be an “aha/wow” moment, when you think “Why didn’t I think to do that?” (Note: I felt this way after seeing IFTTT for the first time.) If I am handing you a “wow” world-changing hack then I have to know that you will make us both look awesome when you pull off the hack. Don’t worry if you do not have a specific skillset I define here. I have lots of other ideas based on what you are good at! This especially applies to designers and other artistic types and to clinicians!! All of these projects could use clinical/design help!!

If you have not signed up yet, then I would get over to the signup page now. So far, every Houston Health 2.0 event has sold out, and we expect this one to as well. I have some pretty awesome project proposals but I can tell you now that these will just be a few of the awesome ideas that we are bringing to the table for this Hackathon. Most importantly, if you already have a project in mind, then you will be able to find a team to help you hack on your project! All you need is a lot of motivation, a little skill and a willingness to collaborate. Or even just one of those three would do…

Looking forward to seeing you there!!






QR code stencil upmanship

As far as I know, I was the first person to publish a generalizable method for creating a QR code stencil or to even clearly document why such a method was difficult.

However, since that time, I have realized that other approaches would ultimately be superior. The two I have been pursuing are automated embroidery of QR codes and improved stencils using laser cutting or 3D printing.

I will likely be abandoning the latter work, now that my early attempts have been eclipsed by Golan Levin at the Studio for Creative Inquiry at Carnegie Mellon. This work is being released from fffff.at, whose motto is: release early, often and with rap music.

Here is a link to his post on his new laser-cutter QR code stencil generation code, along with his first application, a remake of hobo-coding.

Golan gave me a shout-out on the post he made. In some ways, it is hardly justified, since his method obviously surpasses my chicken-wire method in several ways. In fact, the only outstanding benefit to my method now is that it is much cheaper, and you do not need access to a laser cutter. The codes generated from his method are cleaner, and could probably be made smaller than my methods, and do not require an hour of working with caulk.

In a private email, he mentioned the possibility of a GitHub release soon…

In any case, take a look at the wonderful photos of the stencil in action.

Programmable Self Reading List


I am preparing for my talk at Quantified Self about my work on Programmable Self. I was asked to make a “reading list” for the people who were interested in this subject so I wanted to create that here. Please add links in my comments section for titles that I have omitted!! Requirements for inclusion are simple: anything that applies to using technology to change your own behavior. I would also suggest that you get a Kindle from Amazon. Kindle will run as software on the iPhone, iPad, and Android, as well as OS X and Windows. So you do not need to buy a new device if you do not want to, but buying the Kindle edition of the following books will probably save you more than $100.

My goal here is to have something relevant no matter what your background. If you are a behavioral economist, then there is some cool gamification stuff here. If you are all about gamification, then there is some cool behavioral economics stuff here… Please help me make this list even better with comments!! (thanks to Lesath for pointing out a broken link!!)

While there are some “gamification” books here, most of this has to do with recent research into human motivation.

I would also take a moment to check out BJ Fogg’s work from the Persuasive Technology lab at Stanford. What I like about BJ’s work is that he seems very focused on making simple models for clear communication. BJ did a good job convincing me that most people have something different in their head when you say “behavior” or “engage” or “change”. Given that, you need a kind of simple vocabulary for talking about what behavior intervention you are discussing. So you should understand the following basic concepts.

I have decided that this is a good place to keep links to important videos after watching Jane McGonigal (Super Better above) give an awesome TED Talk.

If you feel like learning about programmable self from a conference, then I suggest a couple of options:

I will add more links as time goes on. What did I miss? Leave me a comment with your favorite resources for behavior change.


Hacking data: showing patterns in kids health

Here is my submission for the Local Children’s Data Health 2.0 developer challenge. The challenge was to make data available through kidsdata.org come alive.

Generally, the red circles correspond to the percentage of child allergy sufferers who had -seen- a doctor, but had no specific plan to address their condition. The red tags are healthcare providers from the NPI database that are listed as experts in kids allergies… the top of the field for asthma treatment. We are using these “super experts” as a proxy for the availability of specialist care for allergies generally. Notice the under-served areas… The specialists are clustering in the high-population areas. Hopefully this map will inspire an expert to move to Eureka, or Santa Maria..

Here was my process for this hack:

  • I would only use Open Source software or Open APIs. The idea here is to show just how powerful FOSS tools can be in health data analysis.
  • I have just created the best API to the National Provider Identifier database at docnpi.com, so I have this rich datasource that previously has not been available as an API.
  • I wanted to target something from kidsdata.org that was directly related to the availability of healthcare, something that you can measure geographically using the docnpi.com API.
  • I chose Asthma, because this is something that clearly responds to treatment.
  • I wanted to document my process to show how easy this kind of analysis is with the right tools.

Ok here’s what I did…

  1. First, I browsed kidsdata.org for asthma information. That leads you straight to this analysis of asthma hospitalizations for young children over the last few years.
  2. Then I started digging for source data. It looks like the California Health Interview Survey was a substantial source of the data.
  3. They offer Public Use Files of the original survey data. I signed in, and the terms of use for the data were reasonable, and not contrary to my purposes or Open Source. So I signed up and went to download the data.
  4. Sadly, the data was only available in three proprietary data formats: Stata, SPSS and SAS. This was obviously designed for academics that think using proprietary software is ethical and normal. Thankfully there are other options. The R project is where I usually turn first for stats help, but I actually found that there was an Open Source SPSS alternative called PSPP. Using PSPP I was able to open the SPSS data file. Victory for Open Source! It would be nice if organizations like CHIS would release in simple XML or CSV, which are much friendlier to hackers and people who believe in software freedom.
  5. My feeling of elation was short lived. The data had no geo-coded information. Which makes sense, that would make re-identification much easier. There had to be another way to get geo-coded data.
  6. And there was. AskCHIS is a powerful data reporting tool that allowed for xls data download. Again, I am amazed that CHIS would choose to run with a proprietary format without an open alternative. They used a lot of advanced xls layout options that meant that an export to CSV would never work. An API would be even better, but at least CSV would allow me to actually parse a file instead of cutting and pasting, which is what I ended up doing.
  7. But I had access to lots of data. I could see several different measures of asthma that I could have used in my mashup. This included lots of stuff like missed school days, emergency room visits, diagnosis of asthma, symptoms in the last twelve months… etc etc. If CHIS had given this data up using an API, I would have been able to merge the various asthma measures into an overall asthma status score… but it would have taken a week of cutting and pasting to do that manually.
  8. So I had to choose one data point and run with it. I chose “Health professional ever provided asthma management plan“. This was asked of parents whose kids already had a doctor who was “treating” the asthma. I thought this was an interesting question because it seemed to correlate strongly with doctor-availability, something that I had good geo-coded data on.
  9. Now what provider data should I compare it to? Using docnpi.com I can easily grab a list of all/most of the doctors in California who specialize in treating allergies in children. I decided to use that as a proxy for “available allergy specialists”. Of course, I had a serious advantage here, because I had already done the work of changing the NPI database into something I could access using an API (that is the idea behind docnpi.com). This easily saved me 30 hours of work on this project alone.
  10. So now I have the data I want… but what now? I had addresses for the doctors and clinics from the NPI database, but the asthma data was coded by county. No problem, I just needed to geocode the counties into longitude and latitude. If I had a rich data source from CHIS, it would have been worth writing a script to do this, but since I was using cut-and-paste data, with about 75 rows, it was much simpler to just manually geocode everything. Which is what I did. More cut-and-paste.
  11. But now I have geo-coded data for both data sources.
  12. I needed a method to graphically display geo-coded scoring. This is pretty easy to do using proprietary GIS tools, even costless tools like Google Earth. But I wanted to keep things simple and Open Source at the same time. Enter the EInsert extension to Google Maps API v2. This allowed me to overlay png circle graphics on a Google Map, and size them in accordance with their percentage (bigger is worse, it means more of the kids did not have asthma plans).
  13. Then something tickled my brain. Using circles to represent scaled data is problematic. There is solid research indicating that humans have trouble estimating the area of circles in relation to each other… So I used the ratio suggested by James Flannery to counter this effect. Now the circles are sized in a way that indicates their relative meanings in a somewhat more appropriate way.
  14. Now I had a Google Map that displayed data regarding the frequency of plans as meaningfully sized circles over the state of California. This data shows some predictable effects. First, the worst areas are either very urban or very rural. Exactly the places that have trouble attracting medical talent. That means that on this map, Eureka and the urban Los Angeles counties have similarly sized circles.
  15. Now all I needed to do was overlay the doctor data on this map. This turned out to be pretty simple. I already have a link to provide a Google Map display of any small search on docnpi.com. For instance, here is the link for the map for the search on allergists in California. All I needed to do was copy the html and javascript for the doctor map and integrate the map with the Asthma data map I had already made.
  16. So far, that map looks pretty good. However, there is no easy way to tell which county, specifically, a given circle represents. I decided that the simplest way to address this was to dynamically rewrite the png using the gd library of php. I would pass the php script a label, and it would generate a circle with a label on it. This would allow me to label all of the circles on the map. As usual, stackoverflow provided a quick and dirty solution. (update 4-20) I realized that the label should show both the name of the county, and the percentage without a plan… now it does.
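The circle sizing in steps 12 and 13 can be sketched in a few lines. This is not the code behind the map, just a minimal Python illustration. It assumes the "ratio suggested by James Flannery" is the exponent usually attributed to him (about 0.5716, instead of the 0.5 you get from plain area scaling). The function names and sample percentages are made up for the example.

```python
# Assumption: 0.5716 is the exponent commonly attributed to Flannery's
# "apparent magnitude" scaling. Plain area-proportional scaling uses 0.5.
FLANNERY_EXPONENT = 0.5716


def scaled_radius(value, reference_value, reference_radius, exponent):
    """Radius for `value`, relative to a reference circle of known radius."""
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be >= 0 and reference_value must be > 0")
    return reference_radius * (value / reference_value) ** exponent


def area_radius(value, reference_value, reference_radius):
    """Mathematically 'honest' scaling: circle area is proportional to value."""
    return scaled_radius(value, reference_value, reference_radius, 0.5)


def flannery_radius(value, reference_value, reference_radius):
    """Perceptual scaling: exaggerates bigger circles so they look right."""
    return scaled_radius(value, reference_value, reference_radius,
                         FLANNERY_EXPONENT)


if __name__ == "__main__":
    # Hypothetical percentages of kids without an asthma plan (not real data).
    for label, percent in [("County A", 20.0), ("County B", 40.0)]:
        print(label,
              round(area_radius(percent, 20.0, 10.0), 2),
              round(flannery_radius(percent, 20.0, 10.0), 2))
```

The only difference between the two functions is the exponent. With the smallest value as the reference circle, every larger circle comes out a little bigger than strict area scaling would make it, which is meant to offset the tendency to underestimate large circles.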

Take a look at the final result.

Notice that the shapes scale automatically as you zoom in. Try zooming in to Los Angeles or San Francisco to compare the compacted counties more closely. Also note that you can actually get the name of a particular doctor that specializes in the treatment of asthma directly from the map. If you click the link you can get all of the contact information from docnpi.com.

Which brings us to the point of this exercise. A better view of the data can prompt change.

If you are a parent of a child with Asthma in one of the “big circles” you need to know that the long term treatment of Asthma requires a plan. If you do not have a plan, the reason might be that there are not enough doctors around you to provide the help you need. This map can put you in touch with the nearest expert.

If you are a doctor, who specializes in childhood allergy treatment, this is an opportunity map for you. Eureka is much smaller than LA or San Francisco, but you would have a near monopoly on a population that needs help with asthma. These people do not have the same access to specialized care and that might be a business opportunity for you. Moreover, a doctor who chose to focus on the urban areas in the larger cities might also be able to gain patients and profit. The data here shows that while there are lots of experts -around- the densely urban areas they are not meeting the demand for care. If a doctor could find a way to make money on a Medicare/Medicaid population in these urban areas, this might also be an opportunity.

Seeing the health data in a new way can provoke change. I hope you think my application is cool and sexy, but frankly I do not give a damn about that. I want to make a difference, not toys.

People remember Florence Nightingale as the mother of modern nursing. But she once made a diagram that changed the way people thought about war. It was that diagram that gave her much of the political clout she needed to create the field of professional nursing that we know today.

I have made the NPI data more liquid with docnpi.com. Organizations like CHIS need to do a much better job of making their data accessible. If I had been able to access the data from AskCHIS in a normalized and open format using an API, I would have been able to make a mapping system that would allow the overlay of -any- type of doctor with -any- health data measure that they survey.

So that leaves me with a call to action for three groups: Patients -> find better care near you. Doctors -> go where the patients need you. Researchers -> expose your data in open formats using APIs and open file formats.

Of course, I publish my source code under an Open Source license. Enjoy.


QR code stencils, the problem

I love QR codes.

I think the notion of simple graphical URLs is beautiful and elegant. If my wife were a graphical data object, I think she would be a 2D QR code.

Think of it, you can put links anywhere you want, in the real world!

You can put them on tshirts, coffee mugs, stickers, business cards… anything in the real world becomes a link to something in the virtual world. Awesome.

I have been playing with QR codes, with an eye towards gamification and behavior change, for quite some time. I love the fact that with Android and/or iPhones you can rely on the GPS coordinates that WebKit (the core of both browsers) will provide, which makes a QR code a token that can do different things in different places. Think of the possibilities!
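To make the "different things in different places" idea concrete, here is a minimal Python sketch of what the server behind a QR code link might do once the browser has handed over GPS coordinates. Everything in it is an assumption for illustration: the `Zone` structure, the zone list, and the example.com URLs are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Zone:
    """A hypothetical circular area tied to its own destination URL."""
    name: str
    lat: float
    lon: float
    radius_km: float
    url: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def resolve_scan(lat, lon, zones, default_url):
    """Return the URL of the first zone containing the scanner's position."""
    for zone in zones:
        if haversine_km(lat, lon, zone.lat, zone.lon) <= zone.radius_km:
            return zone.url
    return default_url


# Hypothetical zones: the same printed code leads somewhere different
# depending on where it is scanned.
ZONES = [
    Zone("Houston", 29.76, -95.37, 50.0, "https://example.com/houston"),
    Zone("San Antonio", 29.42, -98.49, 50.0, "https://example.com/san-antonio"),
]

if __name__ == "__main__":
    print(resolve_scan(29.80, -95.40, ZONES, "https://example.com/elsewhere"))
```

The QR code itself never changes. It always encodes the same short URL, and the location logic lives entirely on the server.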

You could make geo-caching much much more interesting…

But how do you make durable (or intentionally not durable) QR codes in a reproducible way? How do you manufacture large QR codes that can be scanned accurately at a distance?

The first approach is simply to print the QR codes on single sheets (A4 or US letter) and then clear paste them to some type of flat surface. You can use throw-away planks of wood from the hardware store to make durable QR code links. This basic idea can be taken pretty far: for instance, you can paste the printed QR codes onto ceramic tile or even bake them on, for a near permanent tag. But what if you want to make a QR code on some permanent surface, like a wall or pavement?

The simplest solution would be to use a stencil with black spray paint. QR code scanners vary greatly in their ability to pick up contrast, but the color black, and some other color, will almost always pick up. This has an advantage over gluing paper, because you can tag objects that are not entirely smooth. Moreover, with spray paint that does not damage the surface (more later) you can create images that can be placed out in public, non-destructively.

But what is the problem with a QR code stencil? In a word, islands. In order to make a stencil with, say, photo paper (which would otherwise be a great technique), you need a way to address bits that the stencil needs to block, that are not physically connected to the rest of the stencil. It’s easier to show than explain. If you are spray painting black, for instance, and you want to make a stencil of the following QR code, you will have the following trouble spots:

A demo of the QR code islands that make stencils difficult

See the issue? Anything white that does not connect to something else white (even by a corner) is going to be an issue. You might be able to make something clever for the places where this happens in most/all QR codes, but each QR code is going to have random “islands” that are often just one pixel big… and in different spots each time. These are the real headache. Making a traditional stencil simply will not work.

Also, making a stencil is very very slow. If you have to cut each pattern by hand.. ouch… way too much time. We need something faster too!

My first approach to solving this problem was to try and find a programmatic solution. For a given URL, there are many different ways to encode into a QR code. It might be possible to use an algorithm that detects this type of “island status” to find a QR code solution that did not happen to have any islands. You could make an application smarter by posting meaningless GET variables at the end of a URL until you found a version of the URL that would work. (Of course, I am focused on using URL shorteners like bit.ly to ensure that you have a simple-as-possible QR code. The more characters in the URL, the more complex the QR code is and the harder it is to make a stencil. The shortener ensures that the QR code is manageable.)

I gave up on this technique after noting that there were islands in all of my test runs for various URLs, but the idea is sound.
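For anyone who wants to pick the idea back up, here is a minimal Python sketch of the island check and the retry loop. It is an illustration, not the code from my test runs. It assumes the QR code is already available as a grid of 1s (dark, cut out and painted) and 0s (light, stencil material), and that the stencil has a light frame around the code. The `encode` argument is a hypothetical stand-in for whatever QR library you use, and the `x=` padding variable is made up.

```python
from collections import deque


def find_islands(modules):
    """Return the light regions that are cut off from the stencil frame.

    `modules` is a list of rows: 1 = dark (painted), 0 = light (stencil).
    Light cells count as connected if they touch even by a corner.
    """
    rows, cols = len(modules), len(modules[0])
    # Pad with a light border to stand in for the frame / quiet zone.
    grid = ([[0] * (cols + 2)]
            + [[0] + list(row) + [0] for row in modules]
            + [[0] * (cols + 2)])
    seen = [[False] * (cols + 2) for _ in range(rows + 2)]

    def flood(start):
        """Collect every light cell reachable from `start` (8 directions)."""
        queue = deque([start])
        seen[start[0]][start[1]] = True
        cells = []
        while queue:
            r, c = queue.popleft()
            cells.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if ((dr or dc) and 0 <= nr < rows + 2
                            and 0 <= nc < cols + 2
                            and not seen[nr][nc] and grid[nr][nc] == 0):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
        return cells

    flood((0, 0))  # everything attached to the frame is safe
    islands = []
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            if grid[r][c] == 0 and not seen[r][c]:
                # Report coordinates in the caller's (unpadded) grid.
                islands.append(sorted((rr - 1, cc - 1)
                                      for rr, cc in flood((r, c))))
    return islands


def first_island_free_variant(base_url, encode, max_tries=100):
    """Append a meaningless GET variable until the encoding has no islands."""
    separator = "&" if "?" in base_url else "?"
    for n in range(max_tries):
        candidate = base_url if n == 0 else f"{base_url}{separator}x={n}"
        if not find_islands(encode(candidate)):
            return candidate
    return None  # no luck, which matches what I saw in my test runs
```

Treating a corner touch as "connected" follows the rule above, but a stencil held together by a single corner would be very fragile in practice, so a stricter check (edges only) may be the more realistic test.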

Facebook Places powers the first social election game

Today, Farrin Anne ‘Crane’ Gustafson, the manager of the social media strategy for the Clayton Trotter (my father) congressional campaign became the first person in history to use Facebook Places to check-in to a new kind of application: a social election game. She earned the “At the Voting” badge by checking-in using Facebook Places as she early voted today on the first social election game that I have been frantically coding for the last few weeks. The game concept is simple: it rewards real-world political activity with points and badges. There has been a lot of discussion about how Foursquare et al. might be used politically. This is especially true of Gowalla, which has been targeting politicians. There are also people who have talked of using a facebook game to energize supporters. But as far as I know, my application is unprecedented for the following reasons:

  1. The application is the first to allow its users to specifically earn badges for checking-in at polling stations during voting. This is much different than using the application to mark political rallies etc etc. Obviously, you do not have to vote to get the badge, you do not even need to be of voting age, or registered to vote in the state. All you have to do is add the application on facebook, check-in at a polling station during voting (even after hours) and you get credit for the badge. Of course -most- of the people who do this will be registered voters who want to essentially participate in -perfect- exit polling.
  2. The application is built directly into facebook. That means that a user’s “check-ins” are something they can share directly with their facebook friends. There is no longer any need for a third-party application, or the need to limit the reach of the application to the very very few users of geo-games like foursquare. This is an app for everyone on the largest single social network.
  3. You can check your friends in when you vote, and that counts too. So one iphone+facebook application can support several different users.
  4. The game does not just support check-ins. You can sign up for vote reminders, get credit for volunteering, and most importantly, use the application to provide a structured endorsement on your wall.
  5. Because it is powered by the facebook social network, you get full credit when your friends score. When your friends show up at the polling station or sign up for a vote reminder, you get credit too. You “win” by cooperating to get the candidate elected. Because there is a powerful proxy for detecting real votes (polling station check-ins), it will be easy to tell who the “vote influencers” were.
  6. The design of the application allows for a deep integration with the ability for the crowd to communicate back to the candidate. If my father is elected, he will be able to use the application to mine the facebook social grid and engage with his constituents in a fundamentally new way.
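The scoring idea in points 4 and 5 can be sketched roughly like this. This is a hypothetical Python illustration, not the game's actual code: the action names, point values, and user names are all invented, and "full credit" is assumed to mean that a friend's points are simply added to your own.

```python
# Hypothetical point values; the real game's numbers are not given here.
ACTION_POINTS = {
    "polling_station_check_in": 10,
    "vote_reminder": 3,
    "volunteering": 5,
    "endorsement": 5,
}


def own_points(user, actions):
    """Points a user earned through their own activity."""
    return sum(ACTION_POINTS[action] for action in actions.get(user, []))


def total_score(user, actions, friends):
    """Own points plus full credit for everything the user's friends did."""
    friend_points = sum(own_points(friend, actions)
                        for friend in friends.get(user, []))
    return own_points(user, actions) + friend_points


def check_in_influence(user, actions, friends):
    """How many of a user's friends checked in at a polling station.

    A rough proxy for spotting the 'vote influencers'.
    """
    return sum(1 for friend in friends.get(user, [])
               if "polling_station_check_in" in actions.get(friend, []))


if __name__ == "__main__":
    # Invented sample data.
    actions = {
        "alice": ["endorsement"],
        "bob": ["polling_station_check_in"],
        "carol": ["vote_reminder", "polling_station_check_in"],
    }
    friends = {"alice": ["bob", "carol"], "bob": ["alice"]}
    print(total_score("alice", actions, friends),
          check_in_influence("alice", actions, friends))
```

Because friends' points flow to you, the highest scorers are the people whose networks actually showed up, which is what makes cooperation the winning strategy.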

I believe that all of these elements together (and not just GEO apps or just political games) are the foundations for a new class of facebook game: For now I am calling them social election games. I believe they are the future of politics.

Up until now political power in the United States came from essentially two places: sources of money and sources of fanatical single-issue voters. Democrats cater to different types of unions. Republicans appeal to evangelical Christians. Democrats appeal to environmentalists. Republicans appeal to big business. Each small group would deliver either a small cache of extremely loyal voters, or expensive advertising, or both. People who were able to directly influence candidates and politicians were either donors, or the leaders of these extreme groups. In short, the people with political influence in this country have become those with agendas that are generally out of sync with anything remotely mainstream. I made it clear, in my endorsement of my father, that I do not agree with all of his extreme views. I support him primarily because I know he will be more careful with defense spending than his opponent has been, and that is a very important issue to me.

I feel out of sync with my father’s extremely conservative positions and I feel (slightly more) out of sync with his opponent’s extremely liberal policies. They have done well as candidates because they have appealed to the extremes. I know of no reasonable person who agrees with either candidate on all of their political stances. (I am aware, and intend, the implication that if I know you and you agree with my dad 100% that I think you are unreasonable; and that my father, in the sense that he obviously agrees with himself entirely, is also unreasonable. Given the Tea party energy, me saying that my father is unreasonably conservative, will do nothing but help him. I endorsed my father because he was -more- reasonable than his opponent, not because he was reasonable. Frankly, who thinks of their own parent as ‘reasonable’ in any case… I mean really…)

American politics as a whole suffers from the Myth of Polarization. We have turned politics into a kind of entertainment, something like pro-wrestling. Listen to any televised political commentator and tell me they do not sound like they are going to break out at any moment with “aaaarrree you ready to ruuuuuuuumble?” and then present the surprise cage fight…

Why do we have this kind of environment? Because that kind of low-brow drama gets people to vote. But what if we had a different way to get people to vote? What if we could have simple, polite conversations with our friends about who the next sheriff or Congressman should be? I think if those conversations were easy, if they were simple and if voting itself were a fun process, then we might see a trend back to center. A trend away from blood-sport politics. In this world, the wielders of influence would not be the arch-bishop, but the local priest, with 300 facebook friends who actually trusted him as a human being. Instead of caring about who the chief of police voted for, you would care about which candidate the policeman who lives down the street from you (with 354 facebook friends) endorsed. Instead of caring about who the national teachers unions endorsed for president, you would care more about your kid’s third-grade teacher (54 followers on Twitter). Instead of caring about some insane radio talk show host, you might care about the opinion of an intelligent college kid from South Dakota with a podcast followed by 300 people.

In this hopeful/hypothetical world, real-world trust relationships, enabled by virtual social networks, will become the new political currency. I want people like my father and his opponent to care much more about someone who has 1000 followers on facebook or twitter, and has shown that 730 of those followers take their endorsement seriously, than the person who can pay for a political ad for them for $100k.

The whole point of social media is that it is -not- a broadcast medium. It is an engagement medium. No matter who wins the election in the San Antonio ‘Alamo’ district in 2010, this application is a template for something much much bigger. The irony is that now that I have proven that it is possible, others will try to mine this for a profit. I will have none of that. After the election, I plan to Open Source the code. I plan to start a project to enable a whole slew of social election applications for different groups and for different interests. This open source project, (which is looking for a project manager) will keep the goal of bringing reasonableness back to politics as a central design goal.

P.S. Polls indicate that the election between Clayton Trotter (my father) and Charlie Gonzalez will be very very close. I honestly think this application might tip the scales in my father’s favor. How cool is that?

(Update 11-22-2010) P.P.S Sadly, my father lost to Congressman Gonzalez… oh well..

Happily it does look like this game might be on to something. It was featured on some of the top tech blogs:


Pretty cool!!