This is an open letter to the tiger team from HIT Policy Committee as well as the committee generally. Recently a group from HITPC gave recommendations to the NHIN Direct project regarding which protocol it should choose. I realized as I heard the comments there, that this group was reading the NHIN Direct Security and Trust Working Groups latest consensus document. I am on that working group and I wrote a considerable portion of that document (most of the Intent section). I was both startled and flattered that the HITPC group was using that document as the basis for their evaluation of the protocol implementations. In fact, they eliminated the XMPP project from consideration because they felt that the SASL authentication that the XMPP implementation will use was incompatible with the following requirement from the consensus document:
2.1 Use of x.509 Certificates. The NHIN Direct protocol relies on agreement that possession of the private key of an x.509 certificate with a particular subject assures compliance of the bearer with a set of arbitrary policies as defined by the issuing authority of the certificate. For example, Verisign assures that bearers of their “extended validation” certificates have been validated according to their official “Certification Practice Statement.” Certificates can be used in many ways, but NHIN Direct relies on the embedded subject and issuing chain as indicated in the following points. Specific implementations may choose to go beyond these basic requirements.
The HITPC team felt that SASL, which does not typically use certs for authentication did not meet this requirement. As it turns out, the XMPP implementation team believes that SASL can be used with x.509 certs and therefore should not be excluded from consideration. That is a simple question of fact and I do not know the answer, but in reality it should not much matter. (will get into that later)
Even more troubling was the assessment of SMTP. The HITPC reviewers considered an all SMTP protocol network as problematic because it allowed for the use of clients which presented users with the option to make security mistakes. They felt that simpler tools should be used, that prevented these types of mistakes from being made.
None of these were unreasonable comments given the fact that they were reading all of the documents on the NHIN Direct site in parallel.
They also have a strong preference for simplicity. Of course, simplicity is very hard to define, and it is obvious that while everyone agrees that security issues are easier to manage with simpler systems, we disagree about what simplicity means.
As I listened to the call, hearing for the first time how others where seeing my work, and the work of the rest of the NHIN Direct S&T working group, I realized that there were some gaps. Ironically this is going to be primarily a discussion of what did not make into the final proposal. Most of the difficult debates that we held in the S&T group involved two divergent goals: Keep reasonable architecture options open to the implementation teams, and the consideration that security decisions that were reasonable 90% of the time were still unreasonable 10% of the time. We could not exclude end users (or implementation paths) by making technology decisions in ways that 10% of the users could not accept. 10% does not sound like much, but if you make 10 decisions and each of those decisions serves to exclude 10% of the end users… well that could be alot of exclusion. We went around and around and mostly, the result is that we settled on smaller and smaller set of things we -had- to have to make flexible trust architecture that would support lots of distributed requirements. This is not a “compromise” position, but a position of strength. Being able to define many valid sub-policies is critical for things like meeting state level legal requirements. To quote Sean Nolan:
“we’ve created an infrastructure that can with configuration easily not just fit in multiple policy environs, but in multiple policy environs SIMULTANEOUSLY.”
That is quite an achievement, but we should be clear about the options we are leaving open to others. I am particularly comfortable with the approach we are taking because it is strikingly similar to the model I had previously created in the HealthQuilt model. I like the term HealthQuilt because it acknowledges the basic elements of the problem. “Start out different, make connections where you can, end with a pleasing result”.
But we also assumed that someone else would be answering lots of questions that we did not. Most notably we could not agree on:
How to have many CA’s?
Our thinking was that you needed to have the tree structure offered by the CA model so that you could simplify trust decisions. We rejected notions of purely peer-to-peer trust (like gpg/pgp) because it would mean that end users would have to make frequent trust decisions increasing the probability that they would get one wrong. Instead if you trust the root cert of a CA, then you can then trust everyone who is obeying the policies of that CA. So X509 generally gave us the ability to make aggregated trust decisions, but we did not decide on what “valid CA architectures would look like”. Here are some different X509 worldviews that at least some of us thought might be valid models:
- The one-ring-to-rule-them CA model. There is one NHIN policy and one NHIN CA and to be on the NHIN you had to have some relationship with that CA. This is really simple, but it does not support serious policy disagreements. We doubt this would be adopted. The cost of certs becomes a HHS expense item.
- The browser model. The NHIN would choose the list of CA’s commonly distributed in web browsers and then people could import that list and get certs from any of those CA’s. This gives a big market of CA’s to buy from but these CA’s are frequently web-oriented. There is wide variance of costs for browser CA certificates.
- The no CA at all model. for people who knew they would be only trusting a small number of other end nodes, they could just choose to import their public certs directly. This would enable very limited communication but that might be exactly what some organizations want. Note that this also supports the use of self-signed certificates. This will only work in certain small environments, but it will be critical for certain paranoid users. This solution is free.
- The government endorsed CA’s. Some people feel that CA’s already approved by the ICAM Trust Framework should be used. This gives a very NISTy feel to the process, but the requirements for ICAM might exclude some solutions (i.e. CACert.org). ICAM certs are cheap (around $100 a year) assuming you only need a few of them.
- CACert.org peer to peer assurance CA. CACert.org is a CA that provides an unlimited number of certificates to assured individuals for no cost. Becoming assured means that other already assured individuals must meet you face to face and check you government ids. For full assurance at least three people must complete that process. This allows for an unlimited number of costless certs backed by a level of assurance that is otherwise extremely expensive. The CACert.org code is open source, and the processes to run CACert.org are open. This is essentially an “open” approach to the CA problem (I like this one best personally)
Individual vs group cert debate?
If you are going to go with any CA model other than “one ring to rule them” then you are saying that the trust relationships inside the CA’s will need to be managed by end users. Given that, some felt that we should be providing individual certs/keys to individual people. Others suggested that we should support one cert per organization. Other said that groups like “firstname.lastname@example.org” should be supported with a sub-group cert.
In the end we decided not to try and define this issue at all. That means that sometimes messages from an address like email@example.com could be signed with a cert that makes it clear that only John smith could have created the message, or by a cert that could have been used by anyone at ahosptial.com or by some subgroup of people at ahospital.com might have had access to the private key for signing.
Many of us felt that flexibility in cert to address mappings was a good thing, since it would allow us to move towards greater accountability as implementations became better and better at the notoriously difficult cert management problem, while allowing simpler models to work initially. However if you have a CA model where certs are expensive, then it will be difficult to move towards greater accountability as organizations choose single certificates for cost reasons.
Mutual TLS vs TLS vs Protocol encryption?
What we could agree on whether and how to mandate TLS/SSL. This is what we did say:
2.6 Encryption. NHIN Direct messages sent over unsecured channels must be protected by standard encryption techniques using key material from the recipient’s valid, non-expired, non-revoked public certificate inheriting up to a configured Anchor certificate per 2.2. Normally this will mean symmetric encryption with key exchange encrypted with PKI. Implementations must also be able to ensure that source and destination endpoint addresses used for routing purposes are not disclosed in transit.
We did this to enable flexiblity.The only thing we explicitly forbid was not using encryption to full protext the addressing component. So no message-only encryption leaving the addresses exposed.
This is a hugely complex issue. In an ideal world, we would have liked to enforce mutual TLS, where both the system initiating the connection and the system receiving it would need to provide certs. Mutual TLS would virtually eliminate spam/ddos attacks because to even initiate a connection you would need to “mutually trusted public certs”.
However, there are lots of several practical limitations to this. First TLS does not support virtual hosting (using more than one domain with only one IP) without the TLS-SNI extension. SNI is well-supported in servers but poorly supported in browsers and client TLS implementations.
Further, only one cert can be presented by the server side of the connection, or at least that is what we have been led to believe and I have not been able to create a “dual signed” public cert in my own testing. That means in order to have multiple certs per server you have to have multiple ports open.
SRV records address both the limitations with virtual hosting and the need to present multiple certs on the server side. This is because SRV DNS records allow you to define a whole series of port and host combinations for any given TCP service. However, MX records, which provide the same fail-over capability for SMTP does not allow you to specify which port. You can implement SMTP using SRV records, but that is a non-standard configuration and the argument for that protocol is generally that it is well-understood and easier to configure.
Ironically, only the XMPP protocol supports SRV out of the box and therefore enables a much higher level of default security in commonly understood configuration. With this high-level of TLS handshaking, you can argue that only message-content-encryption and message-content-signing require certs beyond the TLS, making the debate about SASL somewhat irrelevant. From a security perspective you actually rejected the protocol with the best combination of security+availability+simplicity.
No assumption of configuration?
You rejected SMTP-only because you assumed that end users would be able to configured their NHIN Direct mail clients directly. Ironically, we did not specifically forbid things like that, because we viewed it as a “policy” decision. But the fact that we did not cover it does not imply that the SMTP configuration should happen in a way that would allow for user security configuration. This is obviously a bad idea.
No one every assumed that the right model for the SMTP end deployment would mean that a doctor installed a cert in his current Microsoft Outlook and then selectively used that cert to send some messages over the NHIN Direct network.
We were assuming SMTP deployments that present the user with options that exclude frequent security decisions. This might be as simple as saying “when you click this shortcut outlook will open and you can send NHIN Direct messages, when you click this shortcut outlook will open and you can send email messages”. The user might try to send nhin direct messages with the email client or vice versa, but when they make that mistake (which is a mistake that -will- happen no matter what protocol or interfaces are chosen) the respective client will simply refuse to send to the wrong network.
There are 16 different ways to enforce this both from a technology and a policy perspective, but we did not try to do that, because we were leaving those decisions up to local policy makers, HHS, and you.
You assumed that there where security implications by choosing SMTP that are simply not there.
Lastly I would like to point out that your recommendation was actually problematically not simple. We in the S&T group spent lots of time looking at the problem of security architecture from the perspective of the four implementation groups. For each of them we focused only on the security of the core protocol. Not on the security of the “HISP-to-user” portion. We have carefully evaluated the implications of each of these protocols from that perspective. We have been assuming that the HISP to user connection might like to use lots and lots of reasonable authentication encryption and protocol combinations. Our responsibility was only to secure the connection between nodes.
With that limitation you have chosen just “REST” as the implementation choice, precisely because you see it as a “simple” way to develop the core. The REST team has done some good work, and I think that is a reasonable protocol option. But I am baffeled that you see that as “simple”.
If we choose REST we have no message exchange protocol, we have a software development protocol, we must build a message exchange protocol out of that development tool. With SMTP, XMPP and to a lesser extent IHE, you are configuring software that already exists to perform in an agreed upon secure fashion. There are distinct advantages to the “build it” approach, but from a security perspective, simplicity is not one of them. I think you are underestimating the complexity of messaging generally. You have to sort out things like
- store and forward,
- compatible availability schemes,
- message validity checking (spam handling),
- delivery status notifications,
- character set handling,
- bounce messages.
The REST implementation will have to either build that, or borrow it from SMTP implementations much the same way they now borrow S/MIME. I would encourage you to look at “related RFCs” for a small taste of all the messaging related problems that SMTP protocol has grown to serve. XMPP was originally designed to eclipse the SMTP standard, so it is similarly broad in scope and functionality. Both SMTP and XMPP have had extensive security analysis and multiple implementations have had vulnerabilities found and patched. IHE actually takes a more limited approach to what a message can be about and what it can do. It is not trying to be generalized messaging protocol and is arguable better at patient oriented messaging and worse at generalized messaging as a result.
But in all three cases, XMPP, SMTP and IHE, you are talking about configuring a secure messaging infrastructure instead of building one. The notion that REST is ‘faster to develop’ with is largely irrelevant. Its like saying “We have three options, Windows, Linux or writing a new operating system in python because python is simpler than C” When put that way you can see the deeply problematic notion of “simplicity” that you are putting forward.
All three of the other protocols, at least from the perspective of security, are easier to account for because the platforms are known-quantities. A REST implementation will be more difficult to secure because you are trying to secure totally new software implementing a totally new protocol.
I want to be clear, I am not arguing against REST as an implementation choice. The central advantage of a REST implementation is that you can focus the implementation on solving the specific use-cases of meaningful use. You can have a little less focus on messaging generally, simplifying the problem of a new protocol, and focus on features that directly address meaningful use. Its a smaller target and that could be valuable. Its like a midway point between the generalized messaging approach found in XMPP and SMTP and the too specific, single-patient oriented IHE messaging protocol.
But if you do choose REST, do not do so thinking that it is the “simple” protocol choice.
Beyond the security issues, there are good reasons to prefer any of the implementation protocols. I wanted to be clear that we are expecting your group to have things to say about the things we did not decide (or at least that you know what it means to say nothing), and to make certain that something that we wrote in the S&T group was not biasing you for or against any particular implemenation, all of which are basically compatible with what our group has done.