Direct should be in NPPES

For those wondering, the Direct Project is a secure email protocol based on SMTP and S/MIME for doctor-to-doctor and doctor-to-patient communication. It is all-but-required in Meaningful Use version 2, and it is intended to replace the fax machine for the transfer of health information in the United States. I had a hand in designing the protocol.

NPPES is the authoritative source of doctor contact information in this country. <shamelessplug> is probably the best way to actually search the NPPES data,  and we have an API and everything. </shamelessplug> But you can download the NPPES data yourself and almost every insurance company, clearinghouse, HIE vendor, etc etc does this on a regular basis, in order to ensure that they have updated contact information for doctors, hospitals and other organizations in the healthcare system.

The NPPES publishes the NPI, which is basically the “social security number” for doctors and hospitals as they conduct business. Anyone who is legitimately connected to healthcare can get an NPI and you should, just so you understand what the signup process looks like.

When you register for your NPI, you have the opportunity to insert your contact information. Once you have an NPI, CMS publishes that contact information. This is the list of every possible contact field in the NPI data:

  • Mailing Address Telephone Number
  • Practice Location Address Telephone Number
  • Authorized Official Telephone Number
  • Mailing Address Fax Number
  • Practice Location Address Fax Number
  • Mailing Address
  • Practice Location Address

This bundle of information is what a physician is required, under HIPAA (the parts no one pays attention to), to keep updated. Right there in the middle you can see two fax numbers. As long as the NPPES data does not have a Direct address listed in addition to fax numbers, the message from CMS is clear: “Use Fax for health information exchange, not Direct”.
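To make the gap concrete, here is a minimal Python sketch that checks which electronic contact routes a single provider record offers. The column names are assumptions: the fax names are shortened from the list above, and the "Direct Address" column is hypothetical, since no such field exists in NPPES today.

```python
import csv
import io

# Fax columns, named after the contact-field list above (assumed short names;
# the real NPPES download uses longer headers).
FAX_FIELDS = [
    "Mailing Address Fax Number",
    "Practice Location Address Fax Number",
]
# Hypothetical column: this is the field the post argues NPPES should add.
DIRECT_FIELD = "Direct Address"


def contact_summary(row):
    """Report which electronic contact routes one provider record offers."""
    faxes = [row[f] for f in FAX_FIELDS if (row.get(f) or "").strip()]
    direct = (row.get(DIRECT_FIELD) or "").strip()
    return {"has_fax": bool(faxes), "has_direct": bool(direct)}


# A made-up record: one fax number, no Direct address.
SAMPLE = (
    "NPI,Mailing Address Fax Number,"
    "Practice Location Address Fax Number,Direct Address\n"
    "1234567890,555-0100,,\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = contact_summary(rows[0])
print(summary)
```

Run against a real download, a check like this would show fax as the only electronic route for every provider, simply because there is nowhere to record anything else.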

Here are the reasons that NPPES is really the only place where a centralized Direct provider directory can be kept.

  • It is the only contact information that a physician has a legal obligation to keep updated.
  • The NPI is the basis for “HIPAA covered transactions” which could be conducted over Direct if there were a clean linkage
  • Direct is designed to handle certificate discovery, so there is no need for NPPES to bother with any kind of X.509 stuff.
  • The alternative is “competitive” directories from private industry which directly (and already) translates to balkanization.
  • At this stage, all CMS needs to do is start asking for the address as part of the NPI signup, as a non-required field. This would not need to be a mandate
  • But even having a space for it in the record would cause Direct adoption to explode
  • By publishing Direct Addresses in NPPES, it would be trivial to detect and call attention to “certificate balkanization” which is the biggest threat to Direct’s success
  • There might be complicated reasons for also doing some other provider directory solution. Which is fine, as long as it is additional to putting Direct addresses into NPPES.
  • There is literally no other, better way to get Direct into the consciousness of every doctor in the US. Until you start requiring Direct email for Meaningful Use attestation, but that is a post for another day.

Mostly, I just wanted to write this down as a brain dump so that others can easily email a link around as to why this is not a terrible idea.

I have proposed this several times, in person and, I believe, in some comment on Meaningful Use or something else. I am certain that I am not the only one, but I tend to be more vocal than average about Health IT policy implementation details. I cannot find what I have already written anywhere; it is probably buried in something longer I wrote. I am unfortunately given to ranting when people formally ask for my opinion. So I wanted to write a short post about why this is clearly the way forward for Direct Project adoption.

If you have anything to add to my bullet points, email me at fred dot trotter at that email service that google runs.






Hacking data: showing patterns in kids health

Here is my submission for the Local Children’s Data Health 2.0 developer challenge. The challenge was to make data available through come alive.

Generally, the red circles correspond to the percentage of child allergy sufferers who had -seen- a doctor, but had no specific plan to address their condition. The red tags are healthcare providers from the NPI database that are listed as experts in kids' allergies… the top of the field for asthma treatment. We are using these “super experts” as a proxy for the availability of specialist care for allergies generally. Notice the under-served areas… The specialists are clustering in the high-population areas. Hopefully this map will inspire an expert to move to Eureka, or Santa Maria.

Here was my process for this hack:

  • I would only use Open Source software or Open APIs. The idea here is to show just how powerful FOSS tools can be in health data analysis.
  • I have just created the best API to the National Provider Identifier database at, so I have this rich datasource that previously has not been available as an API.
  • I wanted to target something from that was directly related to the availability of healthcare, something that you can measure geographically using the API.
  • I chose Asthma, because this is something that clearly responds to treatment.
  • I wanted to document my process to show how easy this kind of analysis is with the right tools.

Ok here’s what I did…

  1. First, I browsed for asthma information. That leads you straight to this analysis of asthma hospitalizations for young children over the last few years.
  2. Then I started digging for source data. It looks like the California Health Interview Survey was a substantial source of the data.
  3. They offer Public Use Files of the original survey data. I signed in, and the terms of use for the data were reasonable, and not contrary to my purposes or Open Source. So I signed up and went to download the data.
  4. Sadly, the data was only available in three proprietary data formats: Stata, SPSS and SAS. This was obviously designed for academics who think using proprietary software is ethical and normal. Thankfully there are other options. The R project is where I usually turn first for stats help, but I actually found that there was an Open Source SPSS alternative called PSPP. Using PSPP I was able to open the SPSS data file. Victory for Open Source! It would be nice if organizations like CHIS would release in simple XML or CSV, which is much friendlier to hackers and people who believe in software freedom.
  5. My feeling of elation was short lived. The data had no geo-coded information. Which makes sense, that would make re-identification much easier. There had to be another way to get geo-coded data.
  6. And there was. AskCHIS is a powerful data reporting tool that allowed for xls data download. Again, I am amazed that CHIS would choose to run with a proprietary format without an open alternative. They used a lot of advanced xls layout options, which meant that an export to CSV would never work. An API would be even better, but at least CSV would allow me to actually parse a file instead of cutting and pasting, which is what I ended up doing.
  7. But I had access to lots of data. I could see several different measures of asthma that I could have used in my mashup. This included lots of stuff like missed school days, emergency room visits, diagnosis of asthma, symptoms in the last twelve months… etc. If CHIS had given this data up using an API, I would have been able to merge the various asthma measures into an overall asthma status score… but it would have taken a week of cutting and pasting to do that manually.
  8. So I had to choose one data point and run with it. I chose “Health professional ever provided asthma management plan“. This was asked to parents whose kids already had a doctor who was “treating” the asthma. I thought this was an interesting question because it seemed to correlate strongly with doctor-availability, something that I had good geo-coded data on.
  9. Now what provider data should I compare it to? Using my NPI API I can easily grab a list of all/most of the doctors in California who specialize in treating allergies in children. I decided to use that as a proxy for “available allergy specialists”. Of course, I had a serious advantage here, because I had already done the work of changing the NPI database into something I could access using an API. This easily saved me 30 hours of work on this project alone.
  10. So now I have the data I want… but what now? I had addresses for the doctors and clinics from the NPI database, but the asthma data was coded by county. No problem, I just needed to geocode the counties into longitude and latitude. If I had a rich data source from CHIS, it would have been worth writing a script to do this, but since I was using cut-and-paste data, with about 75 rows, it was much simpler to just manually geocode everything. Which is what I did. More cut-and-paste.
  11. But now I have geo-coded data for both data sources.
  12. I needed a method to graphically display geo-coded scoring. This is pretty easy to do using proprietary GIS tools, even costless tools like Google Earth. But I wanted to keep things simple and Open Source at the same time. Enter the EInsert extension to Google Maps API v2. This allowed me to overlay png circle graphics on a Google Map, and size them in accordance with their percentage (bigger is worse, it means more of the kids did not have asthma plans).
  13. Then something tickled my brain. Using circles to represent scaled data is problematic. There is solid research indicating that humans have trouble estimating the area of circles in relation to each other… So I used the ratio suggested by James Flannery to counter this effect. Now the circles are sized in a way that indicates their relative meanings in a somewhat more appropriate way.
  14. Now I had a Google Map that displayed data regarding the frequency of plans as meaningfully sized circles over the state of California. This data shows some predictable effects. First, the worst areas are either very urban or very rural. Exactly the places that have trouble attracting medical talent. That means that on this map, Eureka and the urban Los Angeles counties have similarly sized circles.
  15. Now all I needed to do was overlay the doctor data on this map. This turned out to be pretty simple. I already have a link to provide a Google Map display of any small search on For instance, here is the link for the map for the search on allergists in California. All I needed to do was copy the html and javascript for the doctor map and integrate the map with the Asthma data map I had already made.
  16. So far, that map looks pretty good. However, there is no easy way to tell which county, specifically, a given circle represents. I decided that the simplest way to address this was to dynamically rewrite the png using the gd library of php. I would pass the php script a label, and it would generate a circle with a label on it. This would allow me to label all of the circles on the map. As usual, stackoverflow provided a quick and dirty solution. (update 4-20) I realized that the label should show both the name of the county, and the percentage without a plan… now it does.
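The circle sizing from step 13 can be sketched in a few lines of Python. This is an illustration, not the code behind the map: it assumes the widely quoted form of Flannery's compensation, where the radius grows with the value raised to roughly 0.57 instead of the square root, and the county names and percentages are made up.

```python
import math


def sqrt_radius(value, min_value, min_radius=5.0):
    """Radius when circle *area* is strictly proportional to the value."""
    return min_radius * math.sqrt(value / min_value)


def flannery_radius(value, min_value, min_radius=5.0, exponent=0.5716):
    """Radius with perceptual compensation (assumed exponent of about 0.57).

    Larger circles are drawn slightly bigger than pure area scaling would
    make them, because readers tend to underestimate the area of big circles.
    """
    return min_radius * (value / min_value) ** exponent


# Made-up percentages of kids without an asthma plan, keyed by county.
percent_without_plan = {"County A": 10.0, "County B": 40.0}
smallest = min(percent_without_plan.values())

for county, pct in percent_without_plan.items():
    print(
        county,
        round(sqrt_radius(pct, smallest), 2),
        round(flannery_radius(pct, smallest), 2),
    )
```

The smallest value keeps the same radius under both schemes; every larger value gets a somewhat bigger circle under the compensated scheme.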

Take a look at the final result.

Notice that the shapes scale automatically as you zoom in. Try zooming in to Los Angeles or San Francisco to compare the compacted counties more closely. Also note that you can actually get the name of a particular doctor who specializes in the treatment of asthma directly from the map. If you click the link you can get all of the contact information from

Which brings us to the point of this exercise.  A better view of the data can prompt change.

If you are a parent of a child with Asthma in one of the “big circles” you need to know that the long term treatment of Asthma requires a plan. If you do not have a plan, the reason might be that there are not enough doctors around you to provide the help you need. This map can put you in touch with the nearest expert.
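Finding “the nearest expert” is a small calculation once both sides are geocoded. The sketch below uses the standard haversine great-circle formula; the provider names are invented and the coordinates are rough city-centre values used only for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_provider(lat, lon, providers):
    """Return the provider record closest to the given point."""
    return min(
        providers,
        key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]),
    )


# Hypothetical specialists with approximate coordinates.
providers = [
    {"name": "Example Allergist, San Francisco", "lat": 37.77, "lon": -122.42},
    {"name": "Example Allergist, Los Angeles", "lat": 34.05, "lon": -118.24},
]

# A family living near Eureka (approximate coordinates).
closest = nearest_provider(40.80, -124.16, providers)
print(closest["name"])
```

Straight-line distance ignores roads and travel time, so this is only a first approximation of how reachable a specialist really is.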

If you are a doctor, who specializes in childhood allergy treatment, this is an opportunity map for you. Eureka is much smaller than LA or San Francisco, but you would have a near monopoly on a population that needs help with asthma. These people do not have the same access to specialized care and that might be a business opportunity for you. Moreover, a doctor who chose to focus on the urban areas in the larger cities might also be able to gain patients and profit. The data here shows that while there are lots of experts -around- the densely urban areas they are not meeting the demand for care. If a doctor could find a way to make money on a Medicare/Medicaid population in these urban areas, this might also be an opportunity.

Seeing the health data in a new way can provoke change. I hope you think my application is cool and sexy, but frankly I do not give a damn about that. I want to make a difference, not toys.

People remember Florence Nightingale as the mother of modern nursing. But she once made a diagram that changed the way people thought about war. It was that diagram that gave her much of the political clout she needed to create the field of professional nursing that we know today.

I have made the NPI data more liquid. Organizations like CHIS need to do a much better job of making their data accessible. If I had been able to access the data from AskCHIS in a normalized and open format using an API, I would have been able to make a mapping system that would allow the overlay of -any- type of doctor with -any- health data measure that they survey.

So that leaves me with a call to action for three groups: Patients -> find better care near you. Doctors -> go where the patients need you. Researchers -> expose your data in open formats using APIs and open file formats.

Of course, I publish my source code under an Open Source license. Enjoy.


NPI data, the doctors social network

(Update Feb 18 2011: has moved to, I have adjusted links accordingly) I have been working, part time, on a project for nearly two years to dramatically improve the quality and depth of information that is available on the Internet from the NPI database. For those not familiar, the NPI, or National Provider Identifier, is a government-issued health provider enumeration system. Anyone who bills Medicare or prescribes medication now has to have an NPI record, which basically means that it is a comprehensive list of individual and organizational healthcare providers in the United States. You can download the entire NPI database as a csv file under FOIA. There are a little over three million records in that download.

Each healthcare provider provides both credentialing and taxonomy data for inclusion in the database. Healthcare provider taxonomy codes are a fancy way of detailing just what type of doctor you are. Because each provider -can- provide such rich data, there is a tremendous amount of un-used information in the database. NPPES does not do very much data checking, so there is a lot of fat finger data too. I have been working on scripts to improve the overall quality of the data as well as accelerate some obvious  datamining applications. I am happy to announce that after several years of development I am ready to beta launch a dramatically improved NPI search service.

Please visit to try it out.

I have recorded several videos that I will attempt to embed here to show you just what it does, but for those of you who prefer to read:

  • The NPPES search engine has a limit of 150 results, docnpi has no limit
  • NPPES does not allow you to search by type of provider or organization, docnpi allows you to search by both type and group of types.
  • NPPES only lists one taxonomy per provider and it is often over-general, docnpi lists all provider taxonomies in each result
  • NPPES pages the results, while all of the results are listed on one page at docnpi (which lets you use your browser's ctrl+f function to do quick sub-searches)
  • the results from any search you do are downloadable as json, xml, or excel/csv
  • No search, except a completely empty search, is too general for docnpi. If you can wait for us to process the data, we will do it for you.
  • Each NPI page automatically exposes the “social network” of any provider or organization by listing all other NPI records that share addresses, phone numbers or identifiers
  • Each NPI page displays a google map for the practice and mailing address listed in the NPI record.
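The “social network” idea can be illustrated with a short Python sketch: index every record by its normalized address and phone number, then link any records that land in the same bucket. This is not the code behind docnpi; the field names, the crude normalization, and the sample records are all assumptions.

```python
from collections import defaultdict


def normalize(value):
    """Crude normalization: upper-case, drop periods/commas, squeeze spaces."""
    cleaned = value.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def shared_contacts(records, fields=("address", "phone")):
    """Map each NPI to the set of other NPIs sharing any listed field value."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            key = normalize(rec.get(field, ""))
            if key:
                index[(field, key)].add(rec["npi"])

    network = defaultdict(set)
    for npis in index.values():
        for npi in npis:
            network[npi] |= npis - {npi}
    return dict(network)


# Made-up records: 1 and 2 share an address, 1 and 3 share a phone number.
records = [
    {"npi": "1", "address": "100 Main St., Suite 2", "phone": "555-0100"},
    {"npi": "2", "address": "100 main st suite 2", "phone": "555-0199"},
    {"npi": "3", "address": "9 Elm Ave", "phone": "555-0100"},
]

network = shared_contacts(records)
print(network)
```

Real address data is much messier than this (abbreviations, suite numbers, typos), which is why the fat-finger cleanup mentioned earlier matters so much.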

I have lots more features on the way, and I know I need to optimize the site. Loading a single NPI record takes too long, because I am doing several huge SQL queries across a very full database. Still, if you have some patience, you can give me some feedback on the site now. Here are the videos that demo specific searches and expose the data richness of the NPI dataset.



Most people have no idea how much information is truly available in the NPI database.

The one group of people who will probably immediately find the site helpful is medical billers, or insurance company employees who want to understand the relationships between different providers. They have been frustrated by the NPPES search tool for a long, long time.

But most people have no idea that this kind of information is even there! I cannot tell you how many people have no idea, for instance, that public health offices very often have an NPI record. Just looking at the taxonomy drop-down should be very enlightening. Using this search engine, you can get very specific and detailed information about the relationships between location and healthcare provider density. You can ask questions like “How many foot doctors does Denver have per-capita compared to New York?” Before, you had to download the data yourself and then run your own queries. But the data download is not normalized, and it is almost impossible to determine who shares an address unless you normalize across addresses. Even then, without database optimizations (I have learned so much more about MySQL optimization on this project…), complex queries could take hours to complete. The site will probably feel “slow” to you, because it can take a long time even to analyze the data for a single provider (30-50 seconds), but many of the matching provider data displays would have been impossible before. I hope to do more optimization and other improvements, and I would like to have your advice in doing so. Please click the red feedback tab and tell me how to improve the site!
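The “foot doctors per capita” question boils down to a count and a division. Here is a minimal Python sketch under loud assumptions: the record layout, the taxonomy label, and the population figures are all illustrative placeholders, not real NPI or census data.

```python
from collections import Counter


def providers_per_100k(records, taxonomy, populations):
    """Providers of one taxonomy per 100,000 residents, for each city."""
    counts = Counter(
        rec["city"].strip().upper()
        for rec in records
        if rec["taxonomy"] == taxonomy
    )
    return {
        city: 100_000 * counts[city] / population
        for city, population in populations.items()
    }


# Made-up provider records.
records = (
    [{"city": "Denver", "taxonomy": "Podiatrist"}] * 3
    + [{"city": "New York", "taxonomy": "Podiatrist"}] * 4
    + [{"city": "Denver", "taxonomy": "Dentist"}]
)

# Round placeholder populations, not census figures.
populations = {"DENVER": 600_000, "NEW YORK": 8_000_000}

density = providers_per_100k(records, "Podiatrist", populations)
print(density)
```

With these invented numbers the smaller city comes out ahead per capita even though it has fewer podiatrists in absolute terms, which is exactly the kind of comparison raw counts hide.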

Essentially this site will make the NPI data set far more accessible than it has been before. Stuff that is now easy to do was previously the domain of expensive data toolkits or data mining experts. This data should be usable by everyone… and now it is.

I should be frank: this site has to pay for itself. I have not decided how to charge, or what I should charge for, or even if I should charge. I will probably have to think about this once the number of searches starts to bring the server to its knees. Once that happens I will have to spend “real money” on a dedicated virtual server/cluster, and that will mean the site must be monetized somehow. I will probably end up limiting the number of searches that a given user can do until they pay $20 or something like that. That will let most people use the site without paying, but when people start to overuse my CPU cycles they can afford to pay a little. But until my server starts to choke, everything is free. Enjoy.

Computer Science should be required for Medical School


Currently, in Texas,  one is required to take Physics I and II and Calculus I (or equivalent stats class) to apply for Medical School. That is not all, of course, but they are requirements.

So far, I have never met a Medical Doctor who needed to use calculus. In fact the only ones that might need to really understand the subject are those who are doing high-level mathematical modeling for Bio-Informatics. For these researchers Calculus is not enough. Your average primary care doctor *never* uses calculus. They also *never* use Physics! Of course they have all kinds of systems that obey the laws of physics, including blood pressure, syringes etc. But they never treat a patient and think “hmm.. what was the relationship between volume and temperature of a gas….”. They think in higher level abstractions like “When blood pressure is high that could mean X, or Y or Z depending on….”

The real benefit of Physics and Calculus is that they introduce you to new ways of thinking. That new way of thinking makes it easier to understand higher level concepts that you *will* use everyday as a doctor.

I recently had a conversation with a clinician about some work that I am doing on the NPI database which lists every doctor (that prescribes medicine) in the United States. The conversation went like this.

Fred: “I need to munge the data, I need to process the data in a different way than it is listed (a flat CSV file), I need to turn it into a real normalized database before I am going to be able to use it effectively”

Clinician: “Wow… Ok… How long will that take you”

Fred: “Well you do not want me to work on this full-time do you? I only have about 5 hours a week I can work on this in my current schedule..”

Clinician: “Yea.. but your other projects are pretty important… given only 5 hours a week, how long would this take?”

Fred: “three months.”

Clinician: “Boy… that’s a long time… I know! Why don’t you just create a database with Texas doctors, instead of all the doctors in the United States! How long will that take?”

Fred: “three months.”

Clinician: “That makes no sense at all. How can that be possible?”

Fred: “Well making the database smaller does not really help me at all, that is the part of the problem that the computer takes care of, not me”

Clinician: “Cmon. A smaller database should mean a shorter time, this seems almost obvious to me. You should be able to do it faster than that.”

Fred: “Ok.”

Clinician: “Ok, so how long for just a Texas-only database?”

Fred: “three months”

and so on…

Now let's point out, first of all, that this Clinician is not stupid. He was ignorant of how computers work and, more importantly, of the process of programming them. This issue is worth discussing in detail because it really illustrates the thought gap between someone who knows how programming works, and someone who does not.

There are many NPI records, specifically in a recent (May 2008) release of the database there were 2,557,650  lines in the comma delimited file as revealed by “wc -l” (subtracting one to account for the first line in the file, which is full of labels… not a real NPI record)
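For readers without “wc -l” at hand, the same count (lines minus the one header line) is a few lines of Python. The tiny in-memory file below stands in for the real download, and its column names are placeholders.

```python
import io


def count_records(lines):
    """Count data records in a CSV-like file: every line except the header."""
    total_lines = sum(1 for _ in lines)
    return max(total_lines - 1, 0)


# Stand-in for the real file: one header line plus two records.
sample_file = io.StringIO("NPI,Name\n1111111111,A\n2222222222,B\n")
record_count = count_records(sample_file)
print(record_count)
```

Because the lines are consumed one at a time, the same function works on a multi-gigabyte file opened with `open()` without loading it all into memory.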

The changes that I need to make to fix the NPI database are pretty complex (fodder for another post) but for now, I will just say that it is a “Complex Reordering of each Record”. Here is how my process for approaching this problem looks:

First I need to work out how to process a single record. So I write a function to do that. For the sake of prose, I will call that function

complexReorderingOfEachRecord( $Record )

I will look at the one Record in the NPI database and then try to pass that Record into the function and see if it does the right thing. The complexReorderingOfEachRecord is a long function, it does lots of really complex things. So complex in fact that I really cannot keep all of its functionality in my head at one time. I use various ways of abstracting the problem so that I can think about the problem in useful chunks, and figure out if each chunk is working.

I am going to actually include some pseudocode in this post. Pseudocode is code that is not exact enough for a computer to execute, but is clear enough that a human can read it and understand what it does. Programs are like recipes; they are simply exact instructions that the computer will follow. I will use some basic programming elements in my examples (Note to programmers: this blog is also for clinicians… so you can safely skip this…)

  • Sequential execution – Each line of code is read and executed by the computer before moving down to the next line
  • Variables – a variable is a changing placeholder for information. Each time a program is run, it is possible that the variables will contain different values. I use php, so I mark my variables with dollar signs “$”. This ends up working a lot like the “x” in an algebra problem: it can have different values depending…
  • Functions – It is often useful to merge many simple lines of code into a single function. Later you can execute all of the code inside the function by calling the function name and passing data to the function by putting it inside the parens “()” after the function. It is basically a way to group useful bits of commands together.
  • If/Else statements – when the computer reaches an IF statement it looks at the contents of the parentheses “()” beside the if statement. If the contents are “true” then the code inside the bracket symbols “{}” following the if statement is run. If the statement in the parens “()” is “false” then the code in brackets “{}” following the ELSE statement is run.
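For anyone who wants to see the four elements actually run, here they are in a tiny Python example (Python rather than php, purely so it is easy to try; Python marks blocks with indentation instead of “{}” and does not use “$”). The 140 cutoff is an arbitrary number for illustration, not clinical guidance.

```python
# Function: a named group of instructions that takes data in the parens.
def describe_pressure(systolic):
    # If/Else: exactly one of the two indented blocks below is run.
    if systolic >= 140:
        label = "high"
    else:
        label = "not high"
    return label


# Sequential execution: these lines run one after another, top to bottom.
reading = 150                        # Variable: a placeholder for a value
result = describe_pressure(reading)  # Calling the function with the variable
print(result)
```

Change `reading` to 120 and run it again: the same code follows the ELSE branch instead.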

So the inside of complexReorderingOfEachRecord looks like this:

function complexReorderingOfEachRecord( $Record){

    reorderingStepOne($Record);

    reorderingStepTwo($Record);

    reorderingStepThree($Record);

}
(Note to Programmers: I am actually using an OOP design for my project, so in reality these would be function calls on objects, but I want to keep this on a procedural level to make my point)

complexReorderingOfEachRecord, reorderingStepOne, reorderingStepTwo and reorderingStepThree are all functions. The contents of reorderingStepOne and the others are not shown, but they are custom functions, meaning that I wrote them. $Record is a variable. There are no IF statements yet.

Ok.. I write this code, test it, debug it about 15 times before it works to import 1 record. But then, when I run the code on the first 10 records, my system blows up! Some NPI records are not for Doctors at all, they are for organizations that provide healthcare. Doh! I need to run the program differently for people vs organizations!

So I modify my function to look like this:

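function complexReorderingOfEachRecord( $Record){

    reorderingStepOne($Record);

    reorderingStepTwo($Record);

    $isAPerson = reorderingStepThree($Record);

    if($isAPerson){

        doOneThing($Record);

    }else{

        doAnother($Record);

    }

}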
Now the function has one IF statement that looks in the variable isAPerson and then executes either doOneThing or doAnother based on the contents of $isAPerson.

I have to code, test and debug this another 30 times to get it working. I have to test it more times because the new function calls doOneThing and doAnother do not work without modifications to reorderingStepOne and reorderingStepTwo. I have to switch between thinking about different part of the problem very quickly to make sure it works. To start, everything breaks, but as I discover why, by running the program again and again, I make small changes that eventually make the whole process work correctly. The shorthand for this process is code, test, debug, repeat.

As I am working I start to run the program on the first 100 records. I notice that often the person in the record is not an M.D.; there are also dentists and other clinicians in the database! But my work is focused only on M.D.s. So I modify the code again:

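function complexReorderingOfEachRecord( $Record){

    reorderingStepOne($Record);

    reorderingStepTwo($Record);

    $isAPerson = reorderingStepThree($Record);

    if($isAPerson){

        $isAnMD = doOneThing($Record);

        if($isAnMD){

            processMD($Record);

        }else{

            processNonMD($Record);

        }

    }else{

        doAnother($Record);

    }

}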
Now I have a “nested” IF statement, an IF statement that exists in another IF statement.

As before all of the other functions must be modified to make my two new functions processMD and processNonMD work correctly. This requires 50 repetitions of code, test and debug. Sometimes one code, test and debug cycle takes 30 seconds. Usually it takes about 5 minutes. Sometimes it takes as much as 15 hours.

Now I am testing against 1000 rows of the NPI database, and it works perfectly! I have put in about 40-50 hours (or about 3 months at 5 hours a week)

But now what! I have imported only 1000 rows of the database. Now I will explain how I ran the code on one row, 100 rows and then 1000 rows. I will introduce the WHILE statement to my simple pseudocode.

$i = 1

while($i <= 1000){

    $Record = getANewRecord()

    complexReorderingOfEachRecord($Record)

    $i = $i + 1

}
The “while” is just like an if statement, except that when the contents of the curly brackets “{}” are done, the contents in the parens “()” are re-evaluated. If they are still true, then the contents of the “{}” are run again. The $i variable starts at 1, and then grows by one every time the contents of the curly braces “{}” are run.
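Here is a runnable Python version of the same idea, for anyone who wants to watch a loop work. It is only an illustration: the three strings stand in for NPI records, upper-casing stands in for the real per-record work, and Python counts positions from 0 rather than 1.

```python
records = ["record one", "record two", "record three"]  # stand-in data
processed = []

i = 0
# The condition in the while line is re-checked before every pass.
while i < len(records):
    # Stand-in for the real per-record processing.
    processed.append(records[i].upper())
    # Move on to the next record; without this line the loop never ends.
    i = i + 1

print(processed)
```

Add a fourth string to `records` and nothing else needs to change, which is the whole point of the loop.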

So how do I import the whole NPI database? I change the code to look like this:

$i = 1

while($i <= 2557650){

    $Record = getANewRecord()

    complexReorderingOfEachRecord($Record)

    $i = $i + 1

}
Then I start the program and go to sleep. In the morning, all 2,557,650 records are correctly processed.

Once I had done the work to determine “How to change an NPI record” the computer simply repeated that process for as long as I wanted. Computers are so fast now, that even very very complex processes can be repeated very quickly.
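A compact Python sketch makes the same point: all of the effort lives in the per-record function, and the loop that applies it does not care how many rows the file has. The `reorder_record` body is a deliberately tiny stand-in for the real reordering, and the two-column sample file is an assumption (it relies on the NPPES convention that an entity type code of 1 means an individual and 2 an organization).

```python
import csv
import io


def reorder_record(row):
    """Stand-in for the complex per-record work described above."""
    kind = "person" if row["Entity Type Code"] == "1" else "organization"
    return {"npi": row["NPI"], "kind": kind}


def import_all(csv_file):
    """Apply the per-record function to every row, one row at a time."""
    for row in csv.DictReader(csv_file):
        yield reorder_record(row)


# Two made-up rows; the real file has millions, and this code stays the same.
SAMPLE = "NPI,Entity Type Code\n1111111111,1\n2222222222,2\n"
imported = list(import_all(io.StringIO(SAMPLE)))
print(imported)
```

A Texas-only file would run faster overnight, but writing and debugging `reorder_record` would take exactly as long.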

You see *I* never import any data. The computer does that part. *I* the programmer tell it *how* to import that data. Like doctors, when programmers have a simple concept with big implications, we create an important sounding word for it. The important word for *how* to do an information task is “algorithm“.

If you get an algorithm right, computers do just what you want. If you get the algorithm wrong… computers do other things. If you get the algorithm badly wrong… God help you.  This is why computers often seem to have a “mind of their own”; when programmers tell them to do the right thing, they do exactly that. When programmers tell the computer to do the wrong thing… they do exactly that.

Any programmer reading this is likely going blind with boredom. But someone who has not programmed might likely be asking “Wait… what’s a function?” This is actually a pretty terrible introduction to programming. For something more real, I suggest you start here.

My point is this: computers make some types of tasks really easy. Getting them to do those tasks, without making a serious mistake, is pretty difficult and time-consuming work. If you, as a clinician, do not understand what tasks are hard, and what tasks are easy, then it is almost impossible to evaluate the software you are using. I cannot tell you how many times a clinician has requested a “simple change” that has taken me three weeks of programming. On the other hand, I cannot tell you how many times I have seen clinicians (or more often clinicians' staff members) subject themselves to terrible software designs that would be trivial to fix.

To create an algorithm you need to understand two things:

  • What the computer can do
  • What the computer should do

There are some people, like my friend Ignacio Valdes, who have been extensively trained in Computer Science and Medicine. These people are amazing, because you can watch them switching back and forth between one part of healthcare IT (Clinical know-how) and the other (Computer Science know-how). But even these few gems (rare as hen's teeth) cannot actually hold the complexity of even a single clinical IT problem in their head at one time. That is just not the way that programming, clinical care or anything truly complex works! Programmers must ignore parts of a program to improve on any given part. Clinicians must ignore parts of a patient's body to address a problem with one part. (Most heart surgeons, for instance, remain unconcerned about the flaky skin problem while their patients are in open heart surgery.) Knowing what to ignore, and what deserves attention, is often the true test of expertise.

The only way to deal with Healthcare IT is to create teams of people to manage the complexity together. The problem with that is that for any given problem domain, there is a danger that the communication cost will grow exponentially in relation to the number of participants. It is common for the communication costs to totally destroy all productivity in a given group. But at the same time, it is simply not possible for a single person to correctly navigate the complexity of even a simple Health IT software project.
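One rough way to picture that communication cost is to count the possible two-person conversations in a team. This is only a sketch under an assumption of my choosing (every pair of people may need to talk), and the function name is made up for illustration. Under that assumption the count grows quadratically rather than strictly exponentially, but it still climbs fast enough to make the point.

```python
def pairwise_channels(team_size: int) -> int:
    """Number of distinct two-person conversations possible in a team.

    Assumes every pair of people may need to talk to each other.
    """
    return team_size * (team_size - 1) // 2


# Doubling the team far more than doubles the conversations to manage.
for size in (2, 4, 8, 16):
    print(size, "people:", pairwise_channels(size), "possible conversations")
```

A pair has exactly one conversation to manage, which is part of why small groups stay productive.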

The solution to this problem is found in the VA VistA development model. Here are the rules:

  1. You do not work on “the system”, you work on part of the system. VistA is actually hundreds of programs that work together.
  2. Whenever possible you work in pairs. Any more gets unmanageable.
  3. One person must understand everything they need to about the programming of the clinical issue. We can call this person the Programmer. (In the VA this is a Programmer or a CAC.)
  4. It helps if the Programmer has a basic healthcare vocabulary.
  5. Another person must understand everything they need to about the clinical problem itself. We can call this person the Clinician.
  6. It helps if the Clinician understands, basically, what is easy, what is hard, what is possible and what is impossible with computers.
  7. You rely on other pairs to address other clinical problems.
  8. You intentionally have redundant “programming pairs” so that you are forced to compete to make better solutions.
  9. When another pair makes a better solution to your problem, you celebrate that and adopt their code as the new starting point.

It is number 6 that this article is focused on. It would be really helpful if Physicians in particular were required to know what a “for loop” meant. As with calculus and physics, they will rarely, if ever, use that information. But for the time being, the fundamental lack of understanding of computer science in clinicians is holding healthcare back. Can you imagine speaking to your doctor if he or she had no idea what the word “pressure” meant in the phrase “blood pressure”? As it stands, most doctors do not really understand the implications of the word “Information” in the phrase “Health Information Systems”.
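For anyone who has never seen one, here is roughly what a “for loop” looks like in Python. Treat it as a minimal sketch: the readings are invented, the function name is hypothetical, and the threshold of 140 is just an illustrative cutoff, not clinical guidance.

```python
# Hypothetical systolic readings (mmHg), invented for illustration only.
systolic_readings = [118, 142, 135, 160]


def count_elevated(readings, threshold=140):
    """Count the readings at or above the (illustrative) threshold."""
    elevated = 0
    for reading in readings:  # the "for loop": run the body once per reading
        if reading >= threshold:
            elevated += 1
    return elevated


print(count_elevated(systolic_readings))
```

The loop does the same small check once for every item in the list. Doing that for four readings or four million is the kind of task that is easy for a computer. Deciding what the check should be is the part that needs a clinician.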

What scares the hell out of me is not that the clinician above did not know how the programming process worked. Ignorance has a simple cure: learning. What scares me is that he was willing to pressure me to speed up the schedule, even after I explained how things worked. Trying to force a programmer to take short-cuts to make a deadline is a very, very bad idea (see point number 4 here). Doctors, like military officers, often fail to recognize that “being in charge” is contextual. It does not matter if a Doctor is right about a clinical issue, if they are wrong about a software design issue. The resulting software will fail to perform, despite its clinical correctness. Doctors cannot “be in charge” in software design the way they can in an operating room or in clinical practice. That does not mean they are not vital, it just means they should not be in charge. The programmers should not be “in charge” either. The “Clinical Pair Programming” that I am describing above is a description of the peer thinking that is required to solve these problems. When someone is “the boss” (meaning they actively back only their own priorities), the system breaks.

The irony is that the few Doctors I know who are my peers with regard to computer science education are often more hesitant to challenge me regarding my information systems opinions. Do not get me wrong; they often disagree with me, but not more than any programmer would disagree with any other programmer.

This is why I support an undergraduate computer science prerequisite for medical school.


Announcing NPIdentify

A while ago I was contacted by the folks at Health IT Transition (now defunct) regarding some NPI development. We decided to collaborate. They turned me on to the intricacies of the NPI database, and I have been doing skunkworks on the NPI database ever since. Sadly, they have been waiting on me since then. Hopefully, when I come out of skunkworks mode, they (and you) will be pleased with the results. But those guys have been far from idle, and I am pleased to promote their announcement of

The site already does some pretty amazing things. It has a mechanism for viewing NPI taxonomy statistics and a tool that allows you to search through the NPI records for your state.

Soon, it will be the most advanced way to manage National Provider Identifier information on the web. Also see the new button on my side menu!!

(updated Feb 18, 2011) Note that my NPI work is now available at You can still download the same applications from if you want that type of interface. If you want really powerful National Provider Identifier search capabilities, is the best available!