QR Code as Art

Recently, I have come to realize how many issues I have hidden from by being a workaholic, especially after losing my mother and brother so close together. It was coming out in unhealthy ways, and I have been looking for better ways to process those events, rather than letting them fester as a background brain process.

I had two friends recently recommend “art as therapy”, and I loved the idea. It reminded me of how I have used tinkering/hacking as a kind of therapy in the past. I thought perhaps I could try both at once.

QR codes are going to have a small renaissance now that iOS 11 natively reads QR codes in the main camera application. This has renewed my obsession with QR codes, and I have been researching how QR codes, art, and cryptocurrencies play along. You can embed crypto keys into QR codes, and you can embed QR codes into art, which is an amazing concept that I got from the cryptoart booth at SXSW several years ago.

I am thinking about QR code stenciling again, as well as QR code tattoos.

But putting QR codes on art, or even spray painting them, is just scratching the surface. One of the most interesting things that you can do with a QR code is merge it with art. This merger can be done in several different ways. First, you can use a mosaic of some kind, and make the “boxes” of the QR code out of something interesting (Mosaic). Because of how QR code error correction works, you can typically fit things in the center of a QR code without interfering with the reading of the code (Overlay). The thing that QR code scanning apps typically care about is the contrast, which means that you can actually muck about with the contrast on a regular image and have the QR code and the image merge (Contrast). Lastly, you can actually change the layout of the blocks in the QR code so that the code itself has a visible pattern (Q-Art). This is my favorite because of the tremendous depth of the math involved, which honors both the QR code standard and the URL standard to do something pretty profoundly different. This method was first developed by Russ Cox. It is actually pretty difficult to find any single place where these four distinct methods are even listed out. Most of the good articles detail how to use one and only one method. So here we go.
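For a sense of scale on the Overlay method, here is a back-of-the-envelope sketch in Python. The recovery percentages come from the QR error-correction levels (L≈7%, M≈15%, Q≈25%, H≈30% of codewords recoverable); the safety factor is my own assumption, not part of any standard:

```python
# Approximate fraction of damaged codewords each QR error-correction
# level can recover (from the QR code specification).
EC_RECOVERY = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}

def max_overlay_fraction(level: str, safety: float = 0.5) -> float:
    """Rough fraction of a code's area you can cover with art.

    The safety factor is my own assumption: it leaves headroom for
    real-world damage like glare, curvature and print defects.
    """
    return EC_RECOVERY[level] * safety

# At level H, you can plausibly overlay about 15% of the symbol's area.
print(max_overlay_fraction("H"))
```

This is why artistic QR codes are usually generated at level H: you buy the biggest damage budget, at the cost of a denser code.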







Pretty sure this does not count as a “method” by itself, but here is a project that lets you map a QR code into a LEGO building plan, so that you can build your QR code out of LEGO bricks.

Someone took this a step further and made a QR code from the shadows of LEGO bricks.

The upside-down earth logo that you see featured here repeatedly is the work of my dear friend Richard Sachs for my Walking Gallery jacket, which was then adopted as the logo for my healthcare data journalism efforts, with The DocGraph Journal.

The Federal Government Recommends JSON

It is the policy of many governments to support transparency with the release of Open Data. But few understand how important it is that this Open Data be released in machine-readable, openly available formats. I have already written a lengthy blog post about how, most of the time, the CSV standard is the right data standard to use for releasing large open data sets. But really JSON, XML, HTML, RTF, TXT, TSV and PDF files, which are all open standard file formats, each have their place as appropriate data standards for governments to use as they release Open Data.

But it can be difficult to explain to someone inside a government or non-profit who is already releasing Open Data that CSV is a good standard, but XLSX (Microsoft Excel) is not. For many people, a CSV really is an Excel file, so there is no difference in their direct experience. But for those of us who want to parse, ETL or integrate that data automatically, there is a world of difference in the level of effort required between a clean CSV and a messy XLSX file (not to mention the cybersecurity implications).
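To make that “world of difference” concrete: a clean CSV can be parsed, and re-emitted as JSON for API consumers, with nothing but a scripting language’s standard library. A minimal sketch (the column names here are invented for illustration):

```python
import csv
import io
import json

# A clean CSV parses with nothing but the standard library: no Excel,
# no vendor SDK, no guessing at merged cells or stray formatting.
raw = "npi,name,state\n100,Alice,TX\n200,Bob,CA\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# And it converts losslessly to JSON, another open standard:
as_json = json.dumps(rows)
print(rows[0]["name"])  # Alice
```

Doing the same thing reliably against an arbitrary XLSX file means third-party libraries, format quirks, and a much larger attack surface.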

A few months ago (sorry, I get distracted), Project Open Data, which is a policy website maintained and governed jointly by the Office of Management and Budget and the Office of Science and Technology Policy of the US Federal Government, updated its website to include W3C and IETF as sources of Open Data Format Standards, by accepting a pull request that I made. As I had expected, not including IETF and W3C in the list of sources of Open Standards was an omission and not a conspiracy (sometimes I panic).

This is a very important resource for those of us who advocate for Open Data. It means that we can use a single URL link, specifically, this one:


To indicate that it is the policy of the United States Federal Government that it not only release Open Data, but that it do so using specific standards that are also open. Now that the W3C and IETF are added, the following data standards are by proxy included in the new policy regarding open data standards:

Obviously these four standards make up almost all of the machine-readable Open Data that is already easy to work with, and with a few exceptions represent the data formats that 95% (my guesstimate) of all Government data should be released in. In short, while there are certainly other good standards, and even cases where we must tolerate proprietary standards for data, most of the data that we need to release should be released in one of these four data formats.

For those of us who advocate for reasonableness in Open Data releases, this is a pretty big deal. We can now simply include a few links to publicly available policy documents rather than arguing independently for the underlying principles.

And because the entire Project Open Data website is so clear, concise and well-written, and because it comes with the implicit endorsement of the US Federal Government (OMB and OSTP), this is a wonderful new resource for advocating with National, State, City, Local and International governments for the release of Open Data using reasonable data formats. Hell, we might even be able to get some of the NGOs to consider releasing data correctly because of this. My hope is that this will make complaining about proprietary format data releases easier, and therefore more frequent, and help us to educate data releasers on how to make their data more useful, which in turn will make it easier for data scientists, data journalists, academics and other data wonks to create impact using the data.

My applause to the maintainers and contributors to Project Open Data.







Harris Health Ben Taub Refusal to Triage

This week I went to the Harris Health Ben Taub ED because I was passing a kidney stone. I was refused triage and told that I would have to wait 7 hours for treatment, behind a waiting room full of people who were obviously in the ED for issues that were not as acute as a kidney stone.

As an e-patient advocate, I am aware of the options available for patients who are denied emergency care. Harris Health is not a Joint Commission accredited hospital. It is instead accredited by DNV GL. I am sharing my experience with them using their patient complaint process. I chose not to be anonymous while submitting this form (obviously).

Description – please describe your complaint

In the early morning hours of 07-11-2017, I realized that I was passing my second kidney stone. I arranged for a ride to the ED quickly, knowing that crushing pain would soon begin. Indeed, by the time I arrived at the ED, I was unable to stand up straight and I was experiencing waves of nausea.

I walked into the ED, told them I was passing a kidney stone, and asked for help. Two security guards waved me around the corner, where a woman told me to sit. As soon as I sat down, I began vomiting. After vomiting, I told the woman that I was passing a kidney stone and that I was in terrible pain. She never asked what level of pain I was experiencing, or what symptoms I was having. I told her that I had not been previously treated at Ben Taub and that I had gone to the Memorial Hermann ED for treatment the last time.

She took my information and told me to proceed through the double doors, where I would find a trash can in the next room to dispose of my pan of vomit. I went through the double doors to find the waiting room, threw away my vomit, and immediately approached people about what I was supposed to do next. I spoke to at least 4 different people who were clearly staff in the waiting room, and all of them said the same thing: they were not the person I needed to talk to, and they did not know who I should talk to. Confused as to what I should do next, I returned through the double doors to the original woman and asked what I was supposed to do.

I was then informed that I was to wait in the waiting room for my name to be called and that the wait was going to be 7 hours. I said that I was passing a kidney stone and made it clear that this was not acceptable. I was given a “sorry”. I asked if I could leave and seek treatment elsewhere, and I was told that I could.

I left and went to an urgent care center. The urgent care center promptly treated my pain and confirmed a kidney stone with a CT scan. They also informed me that my blood sugar test indicated “first onset diabetes”. I was given insulin to immediately lower my blood sugar. I am a healthcare data scientist, and I am fully aware of what a diabetes diagnosis means for me long term. However, I still do not understand what it means in the short term and what steps I need to take. I remain confused and frightened.

I know that drug seeking is a difficult problem. I am sympathetic to the pressures that the ED clinicians are under, and I know that there were dozens of people seeking primary care in the ED. I was not one of them. I was having an acute event, for which opioid treatment is the only reasonable cure. The Harris Health Ben Taub ED simply did not triage me.

Despite vomiting multiple times in front of the only clinical person that I engaged with, I was never asked how I felt or what my symptoms were. I was asked “when this started”, and not much else.

After leaving Harris Health without having my symptoms considered and without treatment, I went to a Methodist Urgent Care center where I received prompt treatment. I did not instantly receive opioids there either, but the delay was commensurate with the time it takes to verify that I am not a frequent opioid user (i.e. not an addict, and not an abuser of opioids). Once that was verified, they treated my pain quickly and with compassion. They also performed a CT scan which confirmed the presence of a kidney stone, the results of which I am attaching to this submission.

Desired Outcome of the Complaint

I expect patients to be properly triaged by competent staff in the ED. There are digital mechanisms for checking prescription histories for opioid usage. I know this, because I designed and implemented at least a few of them.

My personal opioid usage history is a perfect example of someone who suffers from kidney stones. I have never sought opioid prescriptions outside of acute events. These events always correspond with follow-up visits to my primary care physician, who then prescribes appropriate, non-pain-related medications to treat kidney stones. Using only the SureScripts medication reconciliation process (which is required to be a part of any Meaningful Use Certified EHR), clinicians at Harris Health could have verified that I was not a “drug seeker” (which is, by the way, an insulting and pejorative term).

In short, the data shows clearly that I am exactly what I claim to be: a person with a history of kidney stones, passing a kidney stone.

Presumably, Harris Health and Ben Taub are very willing to use prescribing history against patients in their ED. They use this information to deny access to opioids to patients they suspect of being recreational opioid users or opioid addicts.

The desired outcome of my complaint is that Harris Health will actually perform triage on patients, ensuring that patients who are in pain receive timely pain treatment.

The reason that my drug history is available to Harris Health is so that they can better treat me when I arrive. If Harris Health has access to this data, but either ignores it, or merely uses it to deny care, then they have betrayed the intent of the information system. In my opinion, that makes them guilty of defrauding the public of the more than $2 million they received to install the Epic brand EHR system.

I am posting this to my personal website at http://fredtrotter.com


I will update this article with any information I receive back.

My political bias

I think, when one starts to write what could be politically explosive blog posts, it is good form to reveal one’s political biases and opinions up front. I am about to write several, so this is a nice preface that I can just link to, in order to explain my perspective on current political issues.

Let me assure you: I join you in your disgust for that other party, no matter which party you consider the other party and which party you consider “your” party.

I grew up conservative, in a household that had fought for the Reagan Administration in more ways than one. Modern “conservative” values look nothing like what I grew up with. I still find the core messages of the conservative ideal appealing. I do not want the government doing what corporations or non-profits should be doing, and I tend to prefer small government.

I also find mainstream liberal ideas persuasive: the notion that no one should be afraid of getting sick, and that sometimes the government needs to step in when corporations or criminals start to abuse people who are not in a position to defend themselves.

I am utterly dissatisfied with Obamacare, which is nevertheless much better than any current Trumpcare proposal. Obamacare is deeply problematic in many ways, but if it is in a “death spiral”, it is only to the degree that such a spiral can be caused by the current administration pulling the rug out from underneath it. The currently proposed Trumpcare options are orders of magnitude worse. So bad, in fact, that I think they are more likely straw-man negotiation tactics between the middle-right and far-right components of the Republican Party.

All of which is to say that if I have any biases, they are against every current political party. I chose not to vote in the last presidential election, because I felt strongly that I had no viable political options. One candidate had demonstrated that she was very willing to “hard wire” her victory by rigging the outcome of the Democratic Party primary. Everyone continues to emphasize how terrible the Russian hacking was, but the Democrats still wrote all the damning emails. And the only conservative things about Trump were that he was A. not Hillary Clinton and B. willing to pretend to be against abortion on demand… which for the religious right meant that he was tolerable, despite being the antithesis of everything else they believe in.

In short, I believe very strongly that it is basically all bullshit, and that both parties have completely betrayed the US citizenry by substantially betraying their own core values. This is likely the result of dark money in politics, and the only political donations I am currently willing to give are to organizations like RootStrikers.

I have no illusions that the United States is the “best” country in the world. Everyone who says that is loading up the word “best” with their own desires, and then brow-beating the rest of us into submission if we disagree. But we are a pretty badass country full of badass people who are talented, brilliant, assertive, clever and moral. So why are we being given such poor choices in our leadership?

As a healthcare data wonk, I will not mince words. As far as healthcare goes, here is the basic reality: it is necessarily very expensive, and everyone is pretending that if they were in charge, it would not be. Obamacare was at least a respectable try at solving this problem, and so far Trumpcare is not a respectable attempt. I hope that changes, because if it does not, things could get pretty bad, especially for poor people.

I hope Trump and Congress can fix this, because they have basically destroyed anything they could that Obamacare needed in order to succeed, so as to ensure that their “death spiral” criticism becomes valid. It is like shooting an animal and then saying: “See, this here animal is wounded and useless. No good at all. Sad.”


By undermining Obamacare without having a viable replacement plan in place, Trump is taking an awful risk with millions of lives. I hope his gamble pays off, for all our sakes.

Just wanted to be clear where I stood on things, since I think my readers have a right to know.


NDC search improves again

As I mentioned recently, the NDC search page improved to include more data and CSV downloads.

Now it is using tabular results, instead of the accordion method. I love the improvement.

The old results look like this (click to make the picture clearer):



The new results look like this (click to make the picture clearer):


Great to see the FDA continuing to improve its data browsing capacity!



Drones and healthcare. A brain dump.

Here are my thoughts about drones in healthcare.

  • Most people are not really aware of how blindingly fast small drones are. Most demonstrations have them moving at a snail’s pace. In reality they are incredibly quick and can cover large distances in the short time that their battery life allows.
  • This makes drones ideal for the delivery of small, lightweight packages. We can easily foresee a time when very small doses of medications are delivered each day by a drone. The notion that drug delivery could be “bursty” could have major impacts, specifically:
    • In the future, people will be able to call 911 and get an epi-pen delivered to them anywhere in a major city in a matter of minutes, much faster than an ambulance could arrive. People with serious allergies will have “panic buttons” on their cell phones that enable on-demand delivery of epinephrine via drone.
    • Daily delivery of meds will enable patients to adhere to a medication schedule much more cleanly, which will likely reveal the degree to which patients are actually making very different drug-taking choices than their providers think they are.
    • This could have both positive and negative impacts for patients who rely on opioids for pain control.
  • If we do start to deliver expensive medications via drone, then shooting them down will become a sport for criminals. This is a likely outcome for any drone-based delivery system. Of course, the cameras on drones should also make it very difficult for thieves to avoid being caught, especially if large groups of drones team together. Imagine the behavior of wasps or bees when one of their own is molested.
  • It is possible that groups of drones could be used instead of helicopters for airlifting patients. It is easier to show than describe. This might lead to “get outside so you can be gotten” being an important part of the instructions for handling strokes and/or heart attacks. Perhaps doors will become smart enough to allow teams of drones in for airlift purposes.
  • Drones are surprisingly capable of cooperating to accomplish complex goals. So the notion of multiple drones working together to airlift an unconscious person from inside a house all the way to either an emergency care facility or another (fully charged) group of emergency lift drones is not unreasonable.
  • We can imagine a feature of future luxury homes being a locally available set of “emergency airlift drones” that are capable of detecting the need for help, or responding to shouts, etc. This could lead to another layer of healthcare disparities.
  • This disparity might be easier to resolve by having apartment complexes maintain “shared” emergency drone pods that can respond to emergencies in the local area.
  • Drones can swim and fly. I expect that this will become a part of swimming pools, with one drone capable of rescuing a drowning person and handing them over to drones capable of doing further airlift. Drowning is a major source of childhood deaths, and I expect that in the same way that “rails” are advocated for swimming pools today, tomorrow AI drowning detection and drowned-person retrieval will become standard for pools, lakes and rivers.
  • In the interim, drones will be used to detect local health hazards. For instance, you can use a drone fly-over to detect swimming pools that do not have rails (an ironic tie-in), but you can also use them to search for mosquito breeding sources, like the massive puddles that currently form outside my apartment (a sore subject; I did say it was a brain dump).
  • Drones will become a source of hyper-accurate environmental data. Have questions about local air quality? That issue will soon be sampled at a rate hundreds of times finer than is currently possible. This will allow public health issues to become local enforcement issues. It could become accurate enough to sort out when a heavy smoker is impacting a local school yard or park. Of course, one might expect that witch-hunts around drone data could become common.
  • For instance, stalking and private hyper-tracking of individuals will become a problem. Now, rather than sitting outside a woman’s home, a disgruntled ex-boyfriend can just program a drone to track her every move. Or the public might choose to constantly monitor the movements of convicted sex predators. “Observation rights” are about to be a thing, and observation and scrutiny are known to have mental health implications.
  • The ability to deliver medications via drone will not be limited to legitimate sources. In fact, illegal drug delivery is already happening, because drug dealers do not give a shit about FAA regulations the way that Amazon does.
  • In general, automated delivery of every kind (groceries, etc.) will give people less reason to leave the house. This will reduce walking and make hyper-sedentary behaviors easier. Given the correlation between the health of urbanites, who are currently forced to walk, and suburbanites, who drive everywhere, this could create a third, even more sedentary population. That is going to be expensive.

That’s all for now. I expect I will add to this…


Self-driving cars and healthcare. A brain dump.

Here are my thoughts on how self-driving cars relate to healthcare, in no particular order.

  • First, it is very likely that self-driving technology is already well past the reliability and safety of any human driver. This makes for a classic engineering ethics debate: how long will our society tolerate a solution to a problem (human driving) that we know is much less safe than another solution (AI drivers), just because we are used to a particular paradigm?
  • Second, there will be a very strange middle stage when self-driving becomes available in new cars. AI drivers are likely to be vastly more cautious than normal drivers. Some are very concerned that this will cause a problem with humans “bullying” AI drivers. But there is also going to be a health disparity created here: people who can afford new cars could be nearly immune from car accidents, creating a new kind of haves/have-nots.
  • We can expect that the geographic structure of modern life will change substantially. The introduction of highways to the US caused a migration to suburbs by making a longer drive shorter. Soon it may become both affordable and popular to live in very rural areas, because long commutes will be easier to handle. If you can read, type and make phone calls safely while your car handles a two-hour commute to work, long drives will become much more tolerable. This could increase populations in very rural areas, making emergency services drive longer for more people, which could dramatically increase the need for automated drone-based life-flights that are capable of air-lifting people much more cheaply.
  • Similarly, very urban areas could entirely lose parking facilities. Instead of parking, cars would either drive themselves away from urban areas to use cheaper parking, or perhaps stay active in “uber-mode”, delivering other passengers to different destinations. Car ownership could become something that is done only by the very rich (who choose to afford their own self-driving car rather than subject themselves to the availability of a pool of unowned cars) or the very poor (who choose to drive themselves in older cars, and face the problem of vanishing parking).
  • As self-driving cars age, they will become subject to more mechanical problems and become more unreliable, increasing the safety delta between newer self-driving cars (which will be smart enough to seek their own maintenance) and aging human-driven cars.
  • It is entirely possible that humans will choose to unionize drivers. So even though a truck is capable of driving itself, a human “monitor” will be required to be present for “accountability” purposes (but really just so they stay employed). We are already seeing that “sitting is the new smoking”, and these “driver monitors” could lead to a hyper-sedentary lifestyle with very negative health impacts. Drivers are already subject to lots of unhealthy behaviors, which could be made much worse by disengaging them from any energy-expending processes at all.
  • The impact on policing cannot be overemphasized. Many small police departments rely entirely on traffic tickets for revenue; in fact, many small towns are entirely run on ticket revenue. Large police departments also rely on traffic violations for funding policing. In some ways, one might consider the current police presence as something enabled by the fact that roaming police are constantly able to gain revenue by ticketing traffic violations. Depending on how this issue is resolved, we could see police targeting “pedestrian violations” much more heavily, or we could see an uptick in violence as a result of lowered policing. When you consider the possibility of “drone police cars” that themselves lower the cost of police presence, it becomes very difficult to predict how policing, and as a result violent crime, will change in response to self-driving technology.
  • A large share of organ donations are made as the result of traffic accidents. This could lead to a critical shortage of donations. That shortage will likely cause the funding for organ printing to sky-rocket. This is similar to the “parity costs” for solar energy: there is a kind of parity cost for organ printing in healthcare, and dramatically reduced accident rates as the result of self-driving cars could change that calculus very quickly. However, we may face a decade(ish) worth of organ shortage before organ printing works fully, after accident-based organ donation trails off.
  • Accidents of some kind are generally estimated to be the fourth, fifth or sixth leading cause of death in the United States. Auto accidents are the most common cause of accidental death. If you eliminate this as a “way to die”, it will put greater pressure on heart disease, stroke and age-related disorders like Parkinson’s and Alzheimer’s disease. This is a good problem to have, of course, but it needs to be accounted for.

That’s all I can think of off the top of my head.


NoSQL and Technical Debt

So I just had a chance encounter with a former professor of mine, Dr. Hicks from Trinity University.

While I was at Trinity University, mumble mumble long time ago, Dr. Hicks assured me that I needed to take his database class, the course in the Trinity CS department that was the most focused on SQL. That class had a sterling reputation: it was regarded as difficult, but practical and comprehensive. Dr. Hicks is a good-natured and capable teacher, and approached the topic with sophistication, enthusiasm and affection. But databases and SQL are a complicated topic, and no amount of nice-professoring is going to make them easier. It was a hard class, and at the time, avoiding a hard class seemed like the right decision for me.

I did not take this class, which I profoundly regret. Not a moralizing regret, of course; more of a “that decision cost me” regret. Since leaving Trinity, I have had to teach myself database design, database querying, normalization, de-normalization, query optimization, data importing and in-SQL ETL processes.

Over the last 10 years, I estimate that I have spent at least 1000 hours teaching myself these methods outside of a single class. In terms of missed-opportunity dollars, as well as simple compensation, I am sure it has cost me upwards of $100k (this is what happens when you are an entrepreneur: when you take too long to do some work, you suffer as both the client and the programmer). I really wish someone had taken me through the basics, so that I would have only had to teach myself the advanced database topics that I apply more rarely. As it is, I have lots of legacy code that I am moderately embarrassed by. Not because it is bad code, but because it is code that I wrote to solve problems that are already well-solved in a generic manner inside almost all modern database engines.

Dr. Hicks also mentioned to me that he was deliberating how much he should consider including NoSQL technologies in his class. He indicated that students regarded the NoSQL topics as more modern and valuable, and regarded the SQL topics with some distaste.

This prompted the following in-person rant from me on Technical Debt, which I thought might be interesting to my readers (why are you here again?) and perhaps some of Dr. Hicks’ potential students.

His students made the understandable but dangerous error of seeing a new type of technology as exclusively a progression from an old one. NoSQL is not a replacement for SQL, it is solving a different problem in a different way. Both helicopters and airplanes fly, but they do so in different ways that are optimized to solve different problems. They have different benefits and handicaps.

The right way to think about NoSQL is as an excellent solution to the scaling problem, at the expense of supporting complex queries. NoSQL is very simple to query effectively, precisely because complex queries are frequently impossible at the database level. SQL is much harder to learn to query, because one must understand almost everything about the underlying data structures as well as the way the SQL language works, before query writing can even start.

Most of the time, the right way to think about any technology is:

  • What does this make easy?
  • What does this make hard?
  • What mistakes does this encourage?
  • What mistakes does it prevent?

Almost all modern programming languages are grappling with the “powerful tool I can use to shoot myself in the foot” problem. Computer Science shows us that any Turing-complete programming language can fully emulate any other programming language. So you can use Fortran to make web pages if you want, but PHP makes it easy to make web pages. You can use PHP to do data science, but R makes that easy. And of course, languages like Python seek to be “pretty good” at everything.

NoSQL tends to encourage a very specific and dangerous type of technical debt, because it enables a programmer to skip upfront data architecture in favor of “store it in a pile and sort it out later”. This is roughly equivalent to storing all of your clothes, clean and otherwise, in large piles on the floor. Adulthood usually means using closets and dressers, since ordering your clothes storage has roughly the same benefits as ordering your data storage.

SQL forces several good habits that avoid the “pile on the floor effect”. To use SQL you have to ask yourself, as an upfront task:

  • What does my data look like now?
  • What relationships are represented in my data?
  • How and why will I need to query this data in the future?
  • How much data of various types do I expect to get?
  • What will my data look like in the future?
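A sketch of what answering those questions up front looks like (the tables and fields here are hypothetical): the answers end up encoded directly in the schema, where the engine can enforce them.

```python
import sqlite3

# The questions above get answered in the schema itself: what the data
# looks like (column types), what relationships exist (foreign keys),
# and how it will be queried (indexes). Tables and fields are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE drugs (
        ndc  TEXT PRIMARY KEY,                    -- what the data looks like
        name TEXT NOT NULL
    );
    CREATE TABLE prescriptions (
        id        INTEGER PRIMARY KEY,
        ndc       TEXT NOT NULL REFERENCES drugs(ndc),  -- the relationship
        filled_on TEXT NOT NULL
    );
    CREATE INDEX idx_rx_date ON prescriptions(filled_on);  -- future queries
""")
conn.execute("INSERT INTO drugs VALUES ('0002-0800-01', 'Example Drug')")

# The engine now refuses data that violates the design:
try:
    conn.execute("INSERT INTO prescriptions VALUES (1, 'no-such-ndc', '2017-01-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("bad row rejected:", rejected)
```

The upfront cost is real, but so is the payoff: bad data is stopped at the door instead of being discovered mid-analysis.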

With NoSQL, you get to defer these decisions. With NoSQL you get to just throw the data on the pile, in a very generic manner, and figure out later how you want to use the data. Because of NoSQL’s underlying emphasis on scaling, you can be sure that you can defer these decisions without losing data. If all you need to do is CRUD, at scale, and data analysis is secondary, NoSQL can be ideal. Most of the time, however, data analysis is critical to the operation of an application. When you have to have both scaling and data analysis… well, that is a true data science topic… there is no bottom in that pond.

SQL is not the only querying language that enforces this discipline. Neo4J has created a query language called Cypher that has many of the same underlying benefits as the SQL language, but is designed for querying graph structures rather than tables. Unlike traditional NoSQL databases, Neo4J enforces upfront thinking about data structures, much like a SQL database; it just uses a different underlying data structure: a graph instead of a table. In fact, with time, having experience with both SQL and graph databases, I have started to understand when the data I am working with “wants” to be in a graph database versus a traditional tabular SQL database. (Hint: if everything is many-to-many and shortest path or similar questions matter, then you probably want a graph database.)
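One illustration of data that “wants” a graph (this is plain Python rather than Cypher, and the graph is invented): shortest path is a single breadth-first search over an adjacency structure, while the same question in tabular SQL requires recursive queries or repeated self-joins.

```python
from collections import deque

# Toy many-to-many data as an adjacency structure. In a graph model,
# "shortest path" is one breadth-first search; in a tabular model the
# same question needs recursive CTEs or repeated self-joins.
edges = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search returning one shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(edges, "A", "E"))  # ['A', 'B', 'D', 'E']
```

When your queries look like this function, that is the hint that the data wants a graph database.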

Indeed, it is not a requirement that you forgo the effort of creating careful data structures in NoSQL databases. NoSQL experts very quickly realize that using schemas for data is a good idea, even if the engine does not enforce them.
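A minimal sketch of that idea, with hypothetical field names: enforcing a schema in application code, even though the storage engine never asks for one.

```python
# Even in a schemaless store, you can enforce structure yourself.
# The field names and types below are hypothetical.
REQUIRED = {"ndc": str, "name": str, "price": float}

def validate(doc):
    """Reject documents missing fields or carrying the wrong types."""
    for field, ftype in REQUIRED.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    return doc

# A well-formed document passes through unchanged:
ok_doc = validate({"ndc": "0002-0800-01", "name": "Example", "price": 9.99})

# A malformed one is rejected before it ever lands on the pile:
try:
    validate({"ndc": "0002-0800-01"})
    caught = False
except ValueError:
    caught = True
```

Calling something like `validate()` on every write buys back some of the discipline that SQL would have imposed for free.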

The key underlying concept of the Technical Debt metaphor is that a programmer must consciously make decisions about how much debt to incur, in order to avoid the crisis of software that requires so much eventual maintenance that no further progress can be made on it. Essentially, there is something like “software design bankruptcy” that we should stay far away from.

Like financial debt, bankruptcy is not actually the worst state to reach with technical debt. The worst state, in both finances and technology, is poverty created and sustained by interest payments, what people sometimes call “debt slavery”. Another state to avoid is taking on no debt at all. Debt is a ready source of capital, and can be used to dramatically accelerate both technical and financial progress.

Also like real life, most individuals manage debt poorly, and the few individuals who learn to use debt wisely have a significant advantage.

But the first step to managing debt wisely is to recognize when you are taking debt on, and to ensure that it is done with intention and forethought. Make no mistake, forging ahead without designing your data structures is a kind of hedonism, not dissimilar from purchasing drinks you cannot afford on your credit card.

If you are looking forward to a career with data, not learning SQL is the technical debt equivalent of taking a payday loan. By learning SQL carefully, you will learn to forecast and plan your data strategy, which in many cases is at the heart of your application. Even if you later abandon SQL for another database with some other benefit, the habits you learn from careful data structure planning will always be valuable. Even if you never actually use a SQL database in your career.







Better NDC downloads from the FDA

Recently, the FDA Division of Drug Information, Center for Drug Evaluation and Research dramatically improved how their NDC search tool data downloads work in response to some complaints they received from… someone. Most notably they:

  • Added the NDC Package Code (the NDC-10 code with the dashes) to each row as a distinct field. This is the only field that is unique per row!
  • Added the ability to download the results in plain CSV. (Previously you could only get Microsoft Excel files, a proprietary format.)


NDC search and data download improvements

This makes the download functionality much more useful, and IMHO, that improvement makes the searching generally much more worthwhile.

Data hounds like me just download the entire NDC database, which is already available as open data. But these files use non-standard data formats and require special ETL processing to work with conveniently. Now, you can make useful subsets of the NDC data and then download those subsets in an open standard. Those CSV files will make working with the data in spreadsheets (other than Excel) and automatic import into databases much easier.

Especially given my recent rant about using simple download formats, I think it is really important to recognize the folks at the FDA who work every day to ensure that medication information is a little more useful to the public.

Thank you!



Open Data Frustrations

First, let me say that I applaud and salute anyone who releases open data about anything as relevant as healthcare data. It is a tough and thankless slog to properly build, format and document open data files. Really, if you work on this please know that I appreciate you. I value your time and your purpose in life.

But please get your shit together.

Get your shit together

Please do not make your own data format standards. Please use a standard that does not require me to buy expensive proprietary software to read it. The best open standards have RFCs. Choose one of those.

And most of all: if a comma-delimited file will work for your data, just use a CSV. If you are thinking, “but what if I have commas in my data?”… well, you are just wrong. CSV is an actual standard. It has ways to escape commas and, most importantly, you do not need to think about that. All you need to do is use the CSV export functionality of whatever you are working with. It will automatically do the right thing for you.
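Python's standard csv module is one example of this: it quotes and escapes automatically in both directions, so commas (and even quotes) inside the data round-trip without anyone thinking about them.

```python
import csv
import io

# The CSV format already handles commas and embedded quotes; the
# library does the escaping, so neither producer nor consumer has to.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["0002-0800-01", 'Insulin, Human "Regular"', "10 mL"])

# Read the same bytes back: the tricky field survives intact.
row = next(csv.reader(io.StringIO(buf.getvalue())))
print(row)  # ['0002-0800-01', 'Insulin, Human "Regular"', '10 mL']
```

Every mainstream language and every spreadsheet has an equivalent, which is exactly why CSV is the safe default.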

You are not doing yourself any favors creating a fixed length file structure. Soon, you will find that you did not really account for how long last names are. Or you will find an address that is longer than 40 characters. Or the people at the FDA will add another digit to sort out NDC codes… or whatever. CSV files mean that you do not have to think about how many characters your data fields use. More importantly, it means that I do not need to think about it either.
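A small sketch of why fixed-length formats bite (the field widths and names here are invented): a value longer than its column silently corrupts the record, and nothing raises an error.

```python
# Fixed-width parsing hard-codes assumptions that real data eventually
# breaks. The field widths and names below are made up for illustration.
LAST_NAME = slice(0, 20)
ADDRESS = slice(20, 60)

record = "Smith".ljust(20) + "123 Main St".ljust(40)
print(repr(record[LAST_NAME].rstrip()))   # 'Smith'

# A 30-character surname overflows its column: both fields are now
# silently wrong, and no error is raised anywhere.
record2 = "Featherstonehaugh-Cholmondeley".ljust(20) + "9 Elm St".ljust(40)
print(repr(record2[LAST_NAME]))           # 'Featherstonehaugh-Ch'
print(repr(record2[ADDRESS].rstrip()))    # 'olmondeley9 Elm St'
```

With CSV, the field boundary is the delimiter, not a character count, so this entire class of bug disappears.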

You might be thinking “We should use JSON for this!” or “XML is an open standard”. Yes, thank you for choosing other good open formats… but for very large data sets, you probably just want to use a CSV file. The people at CMS thought JSON would be a good standard to use for the Qualified Health Plan data, and they did in fact design the standard so you could keep the JSON files to a reasonable size. But the health insurance companies have no incentive to keep their JSON files a reasonable size, and so they produce multi-gigabyte JSON files. Those are hugely painful to download and a pain to parse.
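The size pain is structural: a standard JSON parser has to read the whole document before returning anything, while a CSV reader streams one row at a time. A toy comparison in Python (the data here is invented):

```python
import csv
import io
import json

# json.loads must materialize the entire structure before you can
# touch any of it; csv.DictReader yields one row at a time, so memory
# use stays constant no matter how big the file is.
json_blob = json.dumps([{"plan_id": str(i)} for i in range(1000)])
plans = json.loads(json_blob)          # the whole list in memory at once

csv_blob = "plan_id\n" + "\n".join(str(i) for i in range(1000)) + "\n"
count = 0
for row in csv.DictReader(io.StringIO(csv_blob)):
    count += 1                         # only one row held at a time
print(len(plans), count)  # 1000 1000
```

Streaming JSON parsers exist, but they are an extra dependency and an extra headache; CSV gives you streaming for free.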

Just use CSV.

I was recently working with the MAX Provider Characteristics files from Medicaid. Here are the issues I had.

  • They have one zip file from 2009 that empties into a directory with the same name as the zip file. That means the zip file will not open, because it tries to write to a directory with the same name as the original file. I have to admit, I am amazed that this mistake is even possible.
  • In 2009, the zip files made subdirectories. In 2010 and 2011 they dumped to the current directory, tar-bomb style. (Either way is fine; pick one.)
  • Sometimes the file names of the ‘txt’ files are ALL CAPS and sometimes not, even in the same year’s data.
  • Sometimes the state codes are upper case like ‘WI’ and ‘WV’, sometimes they are camel case ‘Wy’ and ‘Wa’, sometimes they are lowercase ‘ak’ and ‘al’. Of course, we also have ‘aZ’.
  • Usually the structure is StateCode.year.maxpc.txt, like GA.2010.maxpc.txt. Except for that one time when they wrote it FL.Y2010.MAXPC.TXT.
  • The actual data in the files is in fixed-length format. Each year, you have to confirm that all of the field lengths are the same to ensure that your parser will continue to work.
  • They included instructions for importing the data files in SAS, the single most expensive data processing tool available. Which is, of course, what they were using to export the data.
  • They did not include instructions for any of the most popular programming languages. SAS does not even make the top 20 list.
  • There are multiple zip files, each with multiple files inside. We can afford a download that is over 100 MB in size. Just make one. single. CSV file. Please.
  • Sometimes the files end in .txt. Other times they just end in a ‘.’ (period).
  • The files are not just text files; they have some cruft at the beginning that ensures they are interpreted as binary files.
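For what it is worth, the workaround for naming chaos like this is to match everything case-insensitively and normalize as you go. A sketch in Python, using file names drawn from the list above (the third one is a hypothetical composite of the quirks described):

```python
import re

# One case-insensitive pattern accepts every naming variant from the
# list above -- optional 'Y' before the year, optional 'txt' after the
# trailing period -- and state codes get normalized to uppercase.
PATTERN = re.compile(r"^([a-z]{2})\.y?(\d{4})\.maxpc\.(txt)?$", re.IGNORECASE)

names = ["GA.2010.maxpc.txt", "FL.Y2010.MAXPC.TXT", "aZ.2011.maxpc."]
normalized = []
for name in names:
    m = PATTERN.match(name)
    if m:
        normalized.append((m.group(1).upper(), m.group(2)))
print(normalized)  # [('GA', '2010'), ('FL', '2010'), ('AZ', '2011')]
```

But no consumer of open data should have to write this pattern in the first place; consistent naming is the publisher's job.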

Now how does that make me feel as someone trying to make use of these files? Pretty much like you might expect.

I love open data in healthcare. But please, please, start using easy to use and simple data standards. Get your shit together. I spend too much time hacking on ETL, I need to focus on things that change the world. And guess what… you need me to focus on those things too.

So if you are reading this, and you might very well be because I specifically referred you to this rant. Please do the right thing.

Soon, this advice will likely be formally compatible with the Open Data policies of the Federal Government.

  1. Use an open standard for your data
  2. Use CSV if you can
  3. Are you ABSOLUTELY SURE that you cannot use CSV?
  4. Use JSON if you cannot use CSV
  5. Use XML if you cannot use CSV or JSON
  6. Are you looking to compress and move massive amounts of data around at a furious rate, in an almost-binary compressed format? Perhaps try Protocol Buffers.
  7. Find the Protocol Buffers page confusing? It’s because you should be using CSV. So just use CSV.
  8. Make your data and file naming consistent, so that a machine can process it.

This way, we can have all of the wonderful tools for CSV data processing available to us. Joy!

Thank you.

Updated Mar 22 2017 (added protocol buffers and links)