Cascading Configuration Design Pattern

I have not been able to find simple, plain-language descriptions of what the Cascading Configuration Design Pattern is. Really, there are not that many complicated descriptions either.

Basically, this pattern is used to tame very complicated configuration problems by collapsing what would be a matrix of configuration decisions into a “cascade” of configurations using a hierarchy.

This is easier to understand in the context of the most commonly used implementation of this pattern: Cascading Style Sheets (CSS). This article is for people who would like to understand the general concept, but if you prefer, you can just study how CSS works and you will get the general idea.

The first step is to imagine a “configuration list”, where you have common items with choosable settings (this is a “configuration” design pattern, after all). Which means you have a bunch of items that can each have different things configured about them. Those items could be elements of a webpage, details in a software or data specification, firewall rules in an access control list, or a purchase list for a fleet of vehicles. All of these are examples of a “configuration matrix”. If we are lucky enough to have a configuration matrix that only requires 2 dimensions to display, we can look at it as a “configuration table”. For now, do not get intimidated when I say “matrix”; you can mentally short-cut that to “multi-dimensional table” or, perhaps even simpler, “really complex table”.

We are going to use an ordering sheet for a fleet of vehicles in our example configuration matrix:

Item: Vehicle    Item: Type  Config: Color  Config: Transmission  Config: Windows
Ford Mustang     Car         Red            Automatic             Automatic
Ford Focus       Car         Red            Automatic             Automatic
Ford Fiesta      Car         Red            Automatic             Automatic
Chevy Corvette   Car         Red            Automatic             Automatic
Chevy Volt       Car         Red            Automatic             Automatic
Honda Civic      Car         Red            Automatic             Automatic
Honda Accord     Car         Red            Automatic             Automatic
Ford F-350       Truck       Blue           Automatic             Automatic
Mack Granite     Truck       Blue           Manual                Automatic
Mack Titan       Truck       Blue           Manual                Automatic

Now, the above table can be defined as a data table, explicitly defining each individual configuration item. The items are always vehicles of some type, but you can choose the following settings:

  • Color
  • Transmission Type, either Automatic or Manual
  • Window Mechanism, either Automatic or Manual

Now, if you look at the table above, you notice something pretty quickly: there is not that much information there. In fact, rather than having this whole table, you could just tell your fleet manager some simple English sentences about how you configure your vehicles. You could say:

  • We never get manual windows. Who needs that noise?
  • All of our cars are red with automatic transmission
  • All of our trucks are blue.
  • All of our trucks have manual transmission, except for the Ford F-350, which is automatic

Now why is this set of rules better than the data table? If you imagine a fleet with 200 more models of cars, or 100 different types of semi trucks, you would have a data table with 300 different elements. But as long as the basic rules remained the same, the English version of the configuration would be much easier to work with. So you either have a configuration system with 4 rules, or 300 data points in a data table. When you consider that many configuration problems are multi-dimensional (i.e. an n-dimensional sparse matrix rather than a 2D table), the benefits of having a sparser configuration mechanism become obvious.

But we should note that the English rules have lost no information from the configuration matrix. That basically means they are the same thing, just in two different representations.

The only problem that we have to solve after this is how to ensure that the rules we have currently written in English are machine-readable. First there is the data encoding standard, and for that we might choose YAML, XML, JSON or any of the other standards that are good for generically encoding hierarchical data. You want a data standard that is both readable by humans and automatically processable by computers. But the concept is not limited to explicit use inside data formats. You can use this to structure configuration data in documents or spreadsheets. If you wanted the best of both worlds, you could use something like Markdown, which is halfway between a data language and a document system.

While you can implement things in very different ways, it is important to keep this rule solidly in mind.

To work, the configuration cascade should be convertible in one and only one way into its corresponding configuration matrix.

If you break that rule, the usefulness of a configuration cascade is completely lost.

So let’s recode the English rules above as a simple YAML cascading configuration:

Vehicle:
    Window: Automatic
    Color: Red

Vehicle-Type:
    Truck:
        Color: Blue
        Transmission: Manual

    Car:
        Transmission: Automatic

FordF350:
    Transmission: Automatic
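To make the resolution process concrete, here is a minimal sketch in Python (the dictionary and function names are my own, not part of any standard) of expanding the cascade above back into the full configuration matrix by merging settings from least to most specific:

```python
# The three levels of the cascade, from least to most specific.
VEHICLE_DEFAULTS = {"Window": "Automatic", "Color": "Red"}
BY_TYPE = {
    "Truck": {"Color": "Blue", "Transmission": "Manual"},
    "Car": {"Transmission": "Automatic"},
}
BY_ITEM = {"Ford F-350": {"Transmission": "Automatic"}}

def resolve(item, vehicle_type):
    """Merge settings from least to most specific; later layers win."""
    config = dict(VEHICLE_DEFAULTS)               # applies to every vehicle
    config.update(BY_TYPE.get(vehicle_type, {}))  # overrides by vehicle type
    config.update(BY_ITEM.get(item, {}))          # overrides for one item
    return config

print(resolve("Honda Civic", "Car"))   # red, automatic transmission and windows
print(resolve("Ford F-350", "Truck"))  # blue, but automatic transmission
```

Three dictionary merges replace the whole data table: adding 200 more car models costs nothing here, while the table version would grow by 200 rows.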

If you are familiar with CSS, you can see how close this is to just applying CSS to another data model. Which is basically the idea of this design pattern.

For everyone else, let’s talk a little about what is going on. Using this YAML file, you can recreate the configuration matrix above by applying the configurations by inheritance, in order of specificity.

Which is to say, the more specifically targeted a rule is, the higher the priority it is given. Everything on the configuration list is a vehicle, which means any configurations under the “Vehicle:” section will apply to the entire configuration. That is awesome because it lets us easily define the fact that we never want to have manual windows (or always want automatic windows). We also safely define, at this level, the fact that most of our vehicles are red.

All of the trucks are blue, and we have a rule that says so in the Color: setting under Vehicle-Type: Truck:. All of the car configuration items will inherit their color from the Vehicle level, but all trucks will inherit their color from the Vehicle-Type: Truck: Color: definition.

We do not need to define the color of the Car Vehicle type, because it will still be red.

Lastly, the Ford F-350 is the only truck that has an automatic transmission. So we need to target that specific item with the FordF350: line and then set its Transmission: option to Automatic. The Ford F-350 will inherit everything from Vehicle, unless it is overridden by the Vehicle-Type: Truck settings (which is what will make the Ford F-350 blue). And finally, because it is the most specifically set, the Transmission: value is going to apply to just that one item.

Note that it is easier to read these cascades when they are written with the most general definitions at the top and the more specific rules at the bottom. This eliminates the need to figure out whether rule ordering (i.e. what comes first) is more or less important than specificity (what targets the configuration item most specifically). There are some systems that look like cascades but are actually order-sensitive (this is how firewall configurations work). But rather than try to understand which takes precedence, it is better just to start with the most general rules at the top of configuration files and end with the most specific. This is the same best practice (BTW) as with order of operations in algebra and programming. You should never write 4 x 4 + 10, because that means the reader has to know which comes first, multiplication or addition. Instead, you should make it explicit by writing (4 x 4) + 10, because then you are saying the same thing in a way that is not ambiguous and does not rely on a set of unwritten (and poorly remembered) rules to resolve.
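The point that specificity, not file order, should decide precedence can be sketched the same way. In this hypothetical little rule engine (a sketch, not any real system), the rules are deliberately listed most-specific-first, and sorting by specificity before merging still produces the right answer:

```python
# Rules listed most-specific-first on purpose; file order does not matter
# because we sort by specificity before applying them.
RULES = [
    ("item", "Ford F-350", {"Transmission": "Automatic"}),         # most specific
    ("type", "Truck", {"Color": "Blue", "Transmission": "Manual"}),
    ("type", "Car", {"Transmission": "Automatic"}),
    ("all", None, {"Window": "Automatic", "Color": "Red"}),        # least specific
]
SPECIFICITY = {"all": 0, "type": 1, "item": 2}

def resolve(item, vehicle_type):
    """Apply matching rules from least to most specific; the last write wins."""
    config = {}
    for scope, target, settings in sorted(RULES, key=lambda r: SPECIFICITY[r[0]]):
        if (scope == "all"
                or (scope == "type" and target == vehicle_type)
                or (scope == "item" and target == item)):
            config.update(settings)
    return config

print(resolve("Mack Titan", "Truck"))  # blue with a manual transmission
```

An order-sensitive system (like a firewall rule chain) would skip the sort and take the first match instead, which is exactly why mixing the two conventions is confusing.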

So that is the Cascading Configuration Design Pattern. I hope you found this helpful.

 

QR Code as Art

Recently, I have come to realize how many issues I have hidden from by being a workaholic, especially losing my mother and brother so close together. It was coming out in unhealthy ways, and I have been looking for better ways to process those events, rather than letting them fester as a background brain process.

I had two friends recently recommend “art as therapy”. And I loved the idea; it reminded me of how I have used tinkering/hacking as a kind of therapy in the past. I thought perhaps I could try both at once.

QR codes are going to have a small renaissance now that iOS 11 natively reads QR codes in the main camera application. This has renewed my obsession with QR codes, and I have been researching how QR codes, art, and cryptocurrencies play along. You can embed crypto keys into QR codes, and you can embed QR codes into art, which is an amazing concept that I got from the cryptoart booth at SXSW several years ago.

I am thinking about QR code stenciling again, as well as QR code tattoos.

But putting QR codes on art, or even spray painting them, is just scratching the surface. One of the most interesting things that you can do with a QR code is merge it with art. This merger can be done in several different ways:

  • Mosaic: use a mosaic of some kind, and make the “boxes” of the QR code out of something interesting.
  • Overlay: because of how QR code error correction works, you can typically fit things in the center of a QR code without interfering with the reading of the QR code.
  • Contrast: the thing that QR code scanning apps typically care about is the contrast, which means that you can actually muck about with the contrast on a regular image and have the QR code and the image merge.
  • Q-Art: change the layout of the blocks in the QR code so that the QR code itself has a visible pattern. This is my favorite because of the tremendous depth of the maths involved, which honors both the QR code standard and the URL standard to do something pretty profoundly different. This method was first developed by Russ Cox.

It is actually pretty difficult to find any single place where these four distinct methods are even listed out. Most of the good articles detail how to use one and only one method. So here we go.

[Images: average, code_monkey_qr_code, docgraph.qr, docgraph.qart]

Pretty sure this does not count as a “method” by itself, but here is a project that lets you map a QR code into a Lego building plan, so that you can build your QR code out of Legos.

Someone took this a step farther and made a QR code from the shadows of Legos.

The upside-down earth logo that you see featured here repeatedly is the work of my dear friend Richard Sachs for my Walking Gallery jacket, which was then adopted as the logo for my healthcare data journalism efforts, with The DocGraph Journal.

The Federal Government Recommends JSON

It is the policy of many governments to support transparency with the release of Open Data. But few understand how important it is that this Open Data be released in machine-readable, openly available formats. I have already written a lengthy blog post about how, most of the time, the CSV standard is the right data standard to use for releasing large open data sets. But really JSON, XML, HTML, RTF, TXT, TSV and PDF files, which are all open standard file formats, each have their place as appropriate data standards for governments to use as they release Open Data.

But it can be difficult to explain to someone inside a government or non-profit who is already releasing Open Data that CSV is a good standard, but XLSX (Microsoft Excel) is not. For many people, a CSV really is an Excel file, so there is no difference in their direct experience. But for those of us who want to parse, ETL or integrate that data automatically, there is a world of difference in the level of effort required between a clean CSV and a messy XLSX file (not to mention the cybersecurity implications).

A few months ago (sorry, I get distracted), Project Open Data, which is a policy website maintained and governed jointly by the Office of Management and Budget and the Office of Science and Technology Policy of the US Federal Government, updated its website to include the W3C and IETF as sources of Open Data Format Standards, by accepting a pull request that I made. As I had expected, not including the IETF and W3C in the list of sources of Open Standards was an omission and not a conspiracy (sometimes I panic).

This is a very important resource for those of us who advocate for Open Data. It means that we can use a single URL link, specifically, this one:

https://project-open-data.cio.gov/open-standards/

To indicate that it is the policy of the United States Federal Government not only to release Open Data, but to do so using specific standards that are also open. Now that the W3C and IETF are added, the following data standards are by proxy included in the new policy regarding open data standards:

Obviously these four standards make up almost all of the machine readable Open Data that is already easy to work with, and with a few exceptions represents the data formats that 95% (my guesstimate) of all Government data should be released in. In short, while there are certainly other good standards, and even cases where we must tolerate proprietary standards for data, most of the data that we need to release should be released in one of these four data formats.

For those of us who advocate for reasonableness in Open Data releases.. this is a pretty big deal. We can now simply include a few links to publicly available policy documents rather than arguing independently for the underlying principles.

And because the entire Project Open Data website is so clear, concise and well-written, and because it comes with the implicit endorsement of the US Federal Government (OMB and OSTP), this is a wonderful new resource for advocating with National, State, City, Local and International governments for the release of Open Data using reasonable data formats. Hell, we might even be able to get some of the NGOs to consider releasing data correctly because of this. My hope is that this will make complaining about proprietary-format data releases easier, and therefore more frequent, and help us to educate data releasers on how to make their data more useful. Which in turn will make it easier for data scientists, data journalists, academics and other data wonks to create impact using the data.

My applause to the maintainers and contributors to Project Open Data.

-FT

Harris Health Ben Taub Refusal to Triage

This week I went to the Harris Health Ben Taub ED because I was passing a kidney stone. I was refused triage and told that I would have to wait 7 hours for treatment, behind a waiting room full of people who were obviously in the ED for urgent issues that were not as acute as a kidney stone.

As an e-patient advocate, I am aware of the options available to patients who are denied emergency care. Harris Health is not a Joint Commission accredited hospital. It is instead accredited by DNV GL. I am sharing my experience with them using their patient complaint process. I chose not to be anonymous while submitting this form (obviously).

Description – please describe your complaint

In the early morning hours of 07-11-2017 I realized that I was passing my second kidney stone. I arranged for a ride to the ED quickly, knowing that crushing pain would soon begin. Indeed, by the time I arrived at the ER, I was unable to stand up straight and I was experiencing waves of nausea.

I walked into the ED, told them I was passing a kidney stone, and asked for help. Two security guards waved me around the corner, where a woman told me to sit. As soon as I sat down, I began vomiting. After vomiting, I told the woman that I was passing a kidney stone and that I was in terrible pain. She never asked what level of pain I was experiencing, or what symptoms I was experiencing. I made it clear that I had a kidney stone and that I was in pain. I told her that I had not been previously treated at Ben Taub and that I had gone to Memorial Hermann ED for treatment the last time.

She took my information and told me to proceed through the double doors, and that I would find a trash can in the next room where I could deposit my pan of vomit. I went through the double doors to find the waiting room, threw away my vomit, and immediately approached 4 people about what I was supposed to do next. All of the people who were clearly staff in the waiting room said the same thing: they were not the person I needed to talk to, and they did not know who I should talk to. I spoke to at least 4 different people in at least 4 groups. Confused as to what I should do next, I returned through the double doors to the original woman and asked what I was supposed to do.

I was then informed that I was to wait in the waiting room for my name to be called and that the wait was going to be 7 hours.  I said that I was passing a kidney stone and made it clear that this was not acceptable. I was given a “sorry”. I asked if I could leave and seek treatment elsewhere, I was told that I could.

I left and went to an urgent care center. The urgent care center promptly treated my pain and confirmed a kidney stone with a CT scan. They also informed me that my blood sugar test qualified me for “first onset diabetes”. I was given insulin to immediately lower my blood sugar. I am a healthcare data scientist, and I am fully aware of what a Diabetes diagnosis means for me long term. However, I still do not understand what it means in the short term and what steps I need to take. I remain confused and frightened.

I know that drug seeking is a difficult problem. I am sympathetic to the pressures that the ED clinicians are under, and I know that there were dozens of people seeking primary care in the ED. I was not one of them. I was having an acute event, for which opioid treatment is the only reasonable cure. The Harris Health Ben Taub ED simply did not triage me.

Despite vomiting multiple times in front of the only clinical person that I engaged with, I was never asked how I felt or what my symptoms were. I was asked “when this started”, and not much else.

After leaving Harris Health without having my symptoms considered and without treatment, I went to a Methodist Urgent Care center where I received prompt treatment. I did not instantly receive opioids there either, but the delay was commensurate with the time it takes to verify that I am not a frequent opioid user (i.e. not an addict, and not an abuser of opioids). Then they treated my pain quickly and with compassion. They also performed a CT scan which confirmed the presence of a kidney stone, the results of which I am attaching to this submission.

Desired Outcome of the Complaint

I expect patients to be properly triaged by competent staff in the ED. There are digital mechanisms to determine scripts for opioid usage. I know this, because I designed and implemented at least a few of them.

My personal opioid usage history is a perfect example of someone who suffers from kidney stones. I have never sought opioid prescriptions outside of acute events. These events always correspond with follow-up visits to my primary care physician, who then prescribes appropriate, non-pain-related medications to treat kidney stones. Using only the SureScripts medication reconciliation process (which is required to be a part of any Meaningful Use Certified EHR), clinicians at Harris Health could have verified that I was not a “drug seeker” (which is, BTW, an insulting and pejorative term).

In short, the data shows clearly that I am exactly what I claim to be: a person with a history of kidney stones, passing a kidney stone.

Presumably, Harris Health and Ben Taub are very willing to use prescribing history against patients in their ED. They use this information to deny access to opioids to patients they suspect of being recreational opioid users or opioid addicts.

The desired outcome of my complaint is that Harris Health will actually perform triage on patients, ensuring that patients who are in pain receive timely pain treatment.

The reason that my drug history is available to Harris Health is so that they can better treat me when I arrive. If Harris Health has access to this data, but either ignores it, or merely uses it to deny care, then they have betrayed the intent of the information system. In my opinion, that makes them guilty of defrauding the public of the more than $2 million dollars they received to install the Epic brand EHR system.

I am posting this to my personal website at http://fredtrotter.com

Updates

I will update this article with any information I receive back.

My political bias

I think, when one starts to write what could be a politically explosive blog post, it is good form to reveal your political biases and opinions. I am about to write several such posts, so this is a nice preface that I can just link to, in order to explain my perspective on current political issues.

Let me assure you. I join you, in your disgust for that other party. No matter what party you consider the other party and what party you consider “your” party.

I grew up conservative, in a household that had fought for the Reagan Administration in more ways than one. Modern “conservative” values look nothing like what I grew up with. I still find the core messages of the conservative ideal appealing. I do not want the government doing what corporations, or non-profits should be doing and tend to prefer small government.

I also find mainstream liberal ideas persuasive: the notion that no one should be afraid of getting sick, and that sometimes the government needs to step in when corporations or criminals start to abuse people who are not in a position to defend themselves.

I am utterly dissatisfied with Obamacare, which is much better than any current Trumpcare proposal. Obamacare is deeply problematic in many ways. But if it is in a “death spiral” it is only to the degree that such a spiral can be caused by the current administration pulling the rug out from underneath it. The currently proposed TrumpCare options are orders of magnitude worse. So bad in fact that I think they are more likely straw-man negotiation tactics between the middle-right and far-right components of the Republican Party.

All of which is to say that if I have any biases, they are against every current political party. I did not choose to vote in the last presidential election, because I felt strongly that I had no viable political options. One candidate had demonstrated that she was very willing to “hard wire” her victory by rigging the outcome of the Democratic Party election. Everyone continues to emphasize how terrible the Russian hacking was, but the Democrats still wrote all the damning emails. And the only conservative things about Trump are that he is A. not Hillary Clinton and B. willing to pretend to be against abortion on demand… which for the religious right meant that he was tolerable, despite being the antithesis of everything else they believe in.

In short, I believe very strongly that it is basically all bullshit, and that both parties have completely betrayed the US citizenry by substantially betraying their own core values. This is likely the result of dark money in politics, and the only political donations I am currently willing to give are to organizations like RootStrikers.

I have no illusions that the United States is the “best” country in the world. Everyone who says that is loading up the word “best” with their own desires, and then brow-beating the rest of us into submission if we disagree. But we are a pretty badass country full of badass people who are talented, brilliant, assertive, clever and moral. Why are we being given such poor choices in our leadership?

As a healthcare data wonk, I will not mince words. As far as healthcare goes, here is the basic reality: it is necessarily very expensive, and everyone is pretending that if they were in charge then it would not be expensive. Obamacare at least was a respectable try at solving this problem, and so far Trumpcare is not a respectable attempt. I hope that changes, because if it does not, it could be pretty bad, especially for poor people.

I hope Trump and Congress can fix this, because they have basically destroyed anything they could that Obamacare needed to succeed, in order to ensure that their “death spiral” criticism is valid. It’s like shooting an animal and then saying: “See, this here animal is wounded and useless. No good at all. Sad.”

 

By undermining Obamacare without having a viable replacement plan in place, Trump is taking an awful risk with millions of lives. I hope his gamble pays off, for all our sakes.

Just wanted to be clear where I stood on things, since I think my readers have a right to know.

-FT

NDC search improves again

As I mentioned recently, the NDC search page improved to include more data and CSV downloads.

Now it is using tabular results, instead of the accordion method. I love the improvement.

The old results looked like this (click to make the picture clear):

ndc_improvements

 

The new results look like this (click to make the picture clear):

ndc_improvements_v2

Great to see the FDA continuing to improve its data browsing capacity!

-FT

 

Drones and healthcare. A brain dump.

Here are my thoughts about drones in healthcare.

  • Most people are not really aware of how blindingly fast small drones are. Most demonstrations have them moving at a snail’s pace. They are incredibly quick and can cover large distances in the short time that they have battery life.
  • This makes drones ideal for the delivery of small light-weight packages. We can easily foresee a time when very small doses of medications are delivered each day by a drone. This will help patients to adhere to a medication schedule much more cleanly… which will likely reveal the degree to which patients are actually making very different drug taking choices than what their providers think they are. This could have both positive and negative impacts for patients who rely on opioids for pain control.
  • If we do start to deliver expensive medications via drone, then shooting them down will become a sport for criminals. This is a likely outcome for any drone-based delivery system. Of course, the cameras on drones should also make it very difficult to avoid being caught. Especially if large groups of drones team together. Imagine the behaviors of wasps or bees when “one” of their own is molested.
  • It is possible that groups of drones could be used instead of helicopters for airlifting patients. It’s easier to show than describe. This might lead to “get outside so you can be gotten” being an important part of instructions for handling strokes and/or heart attacks. Perhaps doors will become smart enough to allow teams of drones in for airlift purposes.
  • Drones are surprisingly capable of cooperating to accomplish complex goals. So the notion of multiple drones working together to airlift an unconscious person from inside a house all the way either to emergency care or to another (fully charged) group of emergency lift drones is not unreasonable.
  • We can imagine a feature of future luxury homes being a locally available series of “emergency airlift drones” that are capable of detecting the need for help, or responding to shouts, etc. This could lead to another layer of healthcare disparities.
  • This disparity might be easier to resolve by having apartment complexes have “shared” emergency drone pods, that can respond to emergencies in the local area.
  • Drones can swim and fly.  I expect that this will become a part of swimming pools, with one drone capable of rescuing a drowning person, and handing them over to drones capable of doing further airlift. Drowning is a major source of childhood deaths, and I expect that in the same way that “rails” are advocated for swimming pools today, tomorrow AI drown detection and drowned person retrieval will become standard for pools, lakes and rivers.
  • In the interim, drones will be used to detect local health hazards. For instance, you can use a drone fly-over to detect swimming pools that do not have rails (an ironic tie in) but you can also use them to search for mosquito breeding sources, like the massive puddles that currently form outside my apartment (sore subject, I did say it was a brain dump).
  • Drones will become a source of hyper accurate environmental data. Have questions about local air quality? That issue will soon be sampled at a rate hundreds of times more accurately than currently known. This will lead to the ability for public health issues to become local enforcement issues. This could become accurate enough to sort out when a heavy smoker is impacting a local school yard or park. Of course, one might expect that witch-hunts around drone data might become common.
  • For instance, stalking and private hyper-tracking of individuals will become a problem. Now rather than sitting outside a woman’s home, a disgruntled ex-boyfriend can just program a drone to track her every move. Or the public might choose to constantly monitor the movements of convicted sex predators. “Observation rights” are about to be a thing. And observation and scrutiny are known to have mental health implications.
  • The ability to deliver medications via drone will not be limited to legitimate sources. In fact illegal drug delivery is already happening, because drug dealers do not give a shit about FAA regulations the way that Amazon does.
  • In general, automated delivery of every kind (groceries, etc.) will give people less reason to leave the house. This will reduce walking and make hyper-sedentary behaviors easier. Given the correlation between the health of urbanites who are currently forced to walk and suburbanites who drive everywhere, this could create a third, even more sedentary population. That is going to be expensive.

That’s all for now, I expect I will add to this…

-FT

Self-driving cars and healthcare. A brain dump.

Here are my thoughts on how self-driving cars relate to healthcare. In no particular order.

  • First, it is very likely that self-driving technology is already well past the reliability and safety of any human driver. This makes for a classic engineering ethics debate: how long will our society tolerate a solution to a problem (human drivers) that we know is much less safe than another solution (AI drivers), just because we are used to a particular paradigm?
  • Second, there will be a very strange middle stage when self-driving becomes available in new cars. AI drivers are likely to be vastly more cautious than normal drivers. Some are very concerned that this will cause a problem with humans “bullying” AI drivers. But there is also going to be a health disparity created here. People who can afford new cars could be nearly immune from car accidents, creating a new kind of haves/have nots.
  • We can expect that the geo structure of modern life will change substantially. The introduction of highways to the US caused a migration to suburbs by making a longer drive shorter. Soon it may become both affordable and popular to live in very rural areas, because long commutes will be easier to handle. If you can read, type and make phone calls safely as your car handles a two hour commute to work, long drives will become much more tolerable. This could increase populations in very rural areas, making emergency services drive longer for more people. This could dramatically increase the need for automated drone-based life-flights that are capable of air-lifting people much more cheaply.
  • Similarly, very urban areas could entirely lose parking facilities. Instead of parking, cars would either drive themselves away from urban areas to use cheaper parking, or perhaps stay active in “uber-mode”, delivering other passengers to different destinations. Car ownership could become something that is done only by the very rich (who choose to afford their own self-driving car rather than subject themselves to the availability of a pool of unowned cars) or the very poor (who choose to drive themselves in older cars, and face the problem of vanishing parking).
  • As self-driving cars age, they will become subject to more mechanical problems and more unreliable, widening the safety gap between newer self-driving cars (which will be smart enough to seek their own maintenance) and aging, human-maintained vehicles.
  • It is entirely possible that humans will choose to unionize drivers. So even though a truck is capable of driving itself, a human “monitor” will be required to be present for “accountability” purposes (but really just so they stay employed). We are already seeing that “sitting is the new smoking” and these “driver monitors” could lead to a hyper-sedentary lifestyle that could have very negative impacts. Drivers are already subject to lots of unhealthy behaviors, which could be made much worse by disengaging them from any energy-expending processes at all.
  • The impact on policing cannot be overemphasized. Many small police departments rely entirely on traffic tickets for revenue. In fact, many small towns are entirely run on ticket revenue. Large police departments also rely on traffic violations for funding policing. In some ways, one might consider the police presence as something that is enabled by the fact that roaming police are constantly able to gain revenue by ticketing traffic violations. Depending on how this issue is resolved, we could see police targeting “pedestrian violations” much more heavily, or we could see an uptick in violence as a result of lowered policing. When you consider the possibility of “drone police cars” that are themselves lowering the cost of police presence… it becomes very difficult to predict how policing, and as a result, violent crime, will change in response to self-driving technology.
  • Almost all organ donations are made as the result of traffic accidents. This could lead to a critical shortage of donations. This shortage will likely cause the funding for organ printing to skyrocket. This is similar to the “parity costs” for solar energy. There is a kind of “parity cost” for organ printing in healthcare, and dramatically reduced accident rates as the result of self-driving cars could change that calculus very quickly. However, we may face a decade(ish) worth of organ shortage (and corresponding shortages) before organ printing works fully, and after accident-based organ donation trails off.
  • Accidents of any kind are generally estimated to be the sixth, fifth or fourth leading cause of death in the United States. Auto accidents are the most common cause of accidental death. If you eliminate this as a “way to die” it will put greater pressure on heart disease, stroke and age-related disorders like Parkinson’s and Alzheimer’s disease. This is a good problem to have, of course, but it needs to be accounted for.

That’s all I can think of off the top of my head.

-FT

NoSQL and Technical Debt

So I just had a chance encounter with a former professor of mine, Dr. Hicks from Trinity University.

While I was at Trinity University, mumble mumble long time ago, Dr. Hicks assured me that I needed to take his database class, the course at the Trinity CS department that was the most focused on SQL. That class had a sterling reputation: it was regarded as difficult, but practical and comprehensive. Dr. Hicks is a good-natured and capable teacher, and approached the topic with sophistication, enthusiasm and affection. But databases and SQL are a complicated topic and no amount of nice-professoring is going to make it easier. It was a hard class and at the time, avoiding a hard class seemed like the right decision for me.

I did not take this class, which I profoundly regret. Not a moralizing regret, of course, more of a “that decision cost me” regret. Since leaving Trinity, I have had to teach myself database design, database querying, normalization, denormalization, query optimization, data importing and in-SQL ETL processes.

Over the last 10 years, I estimate that I have spent at least 1000 hours teaching myself these methods outside of a single class. In terms of missed opportunity dollars, as well as simple compensation, I am sure it has cost me upwards of $100k (this is what happens when you are an entrepreneur: when you take too long to do some work, you suffer as both the client and the programmer). I really wish someone had taken me through the basics, so that I would have only had to teach myself the advanced database topics that I apply more rarely. As it is, I have lots of legacy code that I am moderately embarrassed by. Not because it is bad code, but because it is code that I wrote to solve problems that are well-solved in a generic manner inside almost all modern database engines.

Dr. Hicks also mentioned to me that he was deliberating how much he should consider including NoSQL technologies in his class. He indicated that students regarded the NoSQL topics as more modern and valuable, and regarded the SQL topics with some distaste.

This prompted the following in-person rant from me on Technical Debt, which I thought might be interesting to my readers (why are you here again?) and perhaps some of Dr. Hicks’ potential students.

His students made the understandable but dangerous error of seeing a new type of technology as exclusively a progression from an old one. NoSQL is not a replacement for SQL, it is solving a different problem in a different way. Both helicopters and airplanes fly, but they do so in different ways that are optimized to solve different problems. They have different benefits and handicaps.

The right way to think about NoSQL is as an excellent solution to the scaling problem, at the expense of supporting complex queries. NoSQL is very simple to query effectively, precisely because complex queries are frequently impossible at the database level. SQL is much harder to learn to query, because one must understand almost everything about the underlying data structures as well as the way the SQL language works, before query writing can even start.
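To make that trade-off concrete, here is a small sketch (the table, column, and key names are invented for illustration): a key-value store answers “fetch me this record” trivially, while SQL can answer relational questions, but only once you understand the schema and the language.

```python
import sqlite3

# NoSQL-style: a key-value "pile" answers simple lookups instantly...
kv = {"order:1001": {"customer": "Ada", "total": 250.0}}
simple_answer = kv["order:1001"]["total"]  # trivial to query

# ...but a relational question needs SQL and a known schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO orders VALUES (1001, 1, 250.0)")

# "Total spent per customer" requires knowing how the tables relate.
row = db.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchone()
print(row)  # ('Ada', 250.0)
```

The join query is harder to write, but it is also a question the key-value pile simply cannot answer at the database level.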

Most of the time, the right way to think about any technology is:

  • What does this make easy?
  • What does this make hard?
  • What mistakes does this encourage?
  • What mistakes does it prevent?

Almost all modern programming languages are grappling with the “powerful tool I can use to shoot myself in the foot” problem. Computer Science shows us that any Turing-complete programming language can fully emulate any other programming language. So you can use Fortran to make web pages if you want, but PHP makes it easy to make web pages. You can use PHP to do data science, but R makes that easy. And of course, languages like Python seek to be “pretty good” at everything.

NoSQL tends to encourage a very specific and dangerous type of technical debt, because it enables a programmer to skip upfront data architecture, in favor of “store it in a pile and sort it out later”. This is roughly equivalent to storing all of your clothes, clean and otherwise, in large piles on the floor. Adulthood usually means using closets and dressers, since ordering your clothes storage has roughly the same benefit as arranging your data storage.

SQL forces several good habits that avoid the “pile on the floor effect”. To use SQL you have to ask yourself, as an upfront task:

  • What does my data look like now?
  • What relationships are represented in my data?
  • How and why will I need to query this data in the future?
  • How much data of various types do I expect to get?
  • What will my data look like in the future?
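Answering those questions concretely is exactly what schema design is. A minimal sketch (the domain, tables, and names here are invented): each statement below is one of those upfront answers written down, and the database then enforces it for you.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce relationships

# "What does my data look like now?" -> explicit columns and types.
db.execute("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )
""")

# "What relationships are represented?" -> a foreign key makes it explicit.
db.execute("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title     TEXT NOT NULL,
        posted_at TEXT NOT NULL
    )
""")

# "How will I need to query this in the future?" -> an index records that bet.
db.execute("CREATE INDEX idx_posts_author ON posts(author_id)")

# The schema now rejects "pile on the floor" data outright.
db.execute("INSERT INTO authors VALUES (1, 'FT')")
rejected = False
try:
    db.execute("INSERT INTO posts VALUES (1, 99, 'Orphaned post', '2016-01-01')")
except sqlite3.IntegrityError:
    rejected = True  # no author 99 exists, so the insert fails loudly
```

The point is not the particular syntax: it is that every line is a decision you were forced to make before the first byte of data arrived.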

With NoSQL, you get to defer these decisions. With NoSQL you get to just throw the data on the pile, in a very generic manner, and later you figure out how you want to use the data. Because the underlying emphasis of NoSQL is on scaling, you can be sure that you can defer these decisions without losing data. If all you need to do is CRUD, at scale, and data analysis is secondary, NoSQL can be ideal. Most of the time, however, data analysis is critical to the operation of an application. When you have to have both scaling and data analysis… well, that is a true data science topic… there is no bottom in that pond.
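Here is a tiny sketch of what that deferral costs (the field names are invented): every insert into the pile succeeds, and the inconsistency only surfaces later, at analysis time.

```python
# A schemaless "pile": every insert succeeds, no questions asked.
pile = []
pile.append({"user": "ada", "amount": 250})
pile.append({"username": "bob", "amt": "120"})   # renamed keys, stringly-typed number
pile.append({"user": "eve"})                     # missing amount entirely

# The deferred cost arrives at analysis time: every record shape ever
# written must now be handled by every reader, forever.
total = 0
for doc in pile:
    raw = doc.get("amount", doc.get("amt", 0))
    total += int(raw)
print(total)  # 370
```

A schema would have rejected the second and third records at write time; the pile happily accepted them and handed the problem to the future.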

SQL is not the only querying language that enforces this discipline. Neo4J has created a query language called Cypher, which has many of the same underlying benefits as the SQL language, but is designed for querying graph structures rather than tables. Unlike traditional NoSQL databases, Neo4J enforces upfront thinking about data structures, much like a SQL database; it just uses a different underlying data structure: a graph instead of a table. In fact, with time and experience with both SQL and graph databases, I have started to understand when the data I am working with “wants” to be in a graph database, vs a traditional tabular SQL database. (Hint: If everything is many-to-many and shortest path or similar things matter… then you probably want a graph database)
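A toy illustration of data that “wants” to be a graph (the network below is made up): shortest-path questions are painful to express over tabular joins, but fall out naturally from an adjacency structure, which is exactly the kind of query Cypher is built around.

```python
from collections import deque

# Many-to-many "follows" relationships as an adjacency list: a graph, not a table.
follows = {
    "ada": ["bob", "carol"],
    "bob": ["dan"],
    "carol": ["dan", "eve"],
    "dan": ["eve"],
    "eve": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: the bread-and-butter graph-database query."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route exists

print(shortest_path(follows, "ada", "eve"))  # ['ada', 'carol', 'eve']
```

In SQL, the same question requires recursive joins of unknown depth; in a graph database it is a one-line traversal.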

Indeed, it is not a requirement that you forgo the effort to create careful data structures in NoSQL databases. NoSQL experts very quickly realize that using schemas for data is a good idea, even if doing so is not enforced by the engine.

The key underlying concept of the Technical Debt metaphor is that a programmer must consciously make decisions about how much Debt to incur, in order to avoid the crisis of software that requires so much eventual maintenance that no further progress can be made on it. Essentially, there is something like “software design bankruptcy” that we should stay far away from.

Like financial debt, bankruptcy is not actually the worst state to reach with technical debt.  The worst state, in both finances and technology, is poverty created and sustained by interest payments. What people sometimes call “debt slavery”. Another state to avoid is taking on no debt at all. Debt is a ready source of capital, and can be used to dramatically accelerate both technical and financial progress.

Also like real life, most individuals manage debt poorly, and the few individuals who learn to use debt wisely have a significant advantage.

But the first step to managing debt wisely is to recognize when you are taking debt on, and to ensure that it is done with intention and forethought. Make no mistake, forging ahead without designing your data structures is a kind of hedonism, not dissimilar from those who choose to purchase drinks they cannot afford on their credit card.

If you are looking forward to a career with data, not learning SQL is the technical debt equivalent of taking a payday loan. By learning SQL carefully, you will learn to forecast and plan your data strategy, which in many cases is at the heart of your application. Even if you abandon SQL for the sake of another database with some other benefit later on, the habits you learn from careful data structure planning will always be valuable. Even if you never actually use a SQL database in your career.

HTH,

-FT


Better NDC downloads from the FDA

Recently, the FDA Division of Drug Information, Center for Drug Evaluation and Research dramatically improved how their NDC search tool data downloads work in response to some complaints they received from… someone. Most notably they:

  • Added the NDC Package Code (the NDC-10 code with the dashes) to each row as a distinct field. This is the only field that is unique per row!
  • Added the ability to download the results in plain CSV. (Previously you could only get Microsoft Excel files, which are a proprietary format)

 

NDC search and data download improvements

This makes the download functionality much more useful, and IMHO, that improvement makes the searching generally much more worthwhile.

Data hounds like me just download the entire NDC database, which is already available as open data. But these files use non-standard data formats and require special ETL processing to work with conveniently. Now, you can make useful subsets of the NDC data and then download those subsets in an open standard. Those CSV files will make working with the data in both spreadsheets (other than Excel) and automatic import into databases much easier.
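As an illustration of why plain CSV matters (the column headers below are invented stand-ins, not the FDA’s actual field names): the standard-library csv module reads such a download directly, with no Excel-specific ETL step in between.

```python
import csv
import io

# A stand-in for a downloaded NDC subset; real FDA column headers will differ.
download = io.StringIO(
    "ndc_package_code,proprietary_name,dosage_form\n"
    "1234-5678-90,Exampledrug,Tablet\n"
    "1234-5678-91,Exampledrug,Capsule\n"
)

# Plain CSV needs no special tooling: one reader, one dict per row.
rows = list(csv.DictReader(download))
codes = [r["ndc_package_code"] for r in rows]
print(codes)  # one unique NDC-10 package code per row
```

The same two lines of reading code work whether the file goes into a spreadsheet pipeline, a script, or a database bulk-import.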

Especially given my recent rant about using simple download formats, I think it is really important to recognize the folks at the FDA who work every day to ensure that medication information is a little more useful to the public.

Thank you!

-FT