Why clinical models are essential to big data
I attended HIMSS 2014 in the mammoth convention centre in Orlando 10 days ago, and went to a session on ‘Clinical Decision Support – is progress being made?’. Despite this being the dead Thursday of HIMSS, around 50 people showed up.
The presentation was given by Cory Tate (senior research director, KLAS) and Adam Cherrington (research director, KLAS). KLAS is the organisation that performs research into the shape of the health IT industry and publishes the results. So when they say something, it is usually backed by data from across the US at least, rather than being just one person’s supposition.
What they said, in a nutshell, was: progress is being made; there are order set products (Elsevier, ProVation etc.) and some surveillance products, e.g. infection control rule sets, and these have some nice features. A discussion developed with the audience in which it became clear that both the presenters and others present identified the main blocker as the inability to connect the order sets and other CDS or analytics modules to the EMR products in use.
The funny thing is, that was the main blocker in 1996 or so when I first saw a decision support product in the UK. It has always been the main blocker.
The presenters in this session voiced the usual lament: well, we need the EMR vendors to step up and improve their products.
Had I not been suffering from jet-lag I would have pointed out that it won’t happen if it’s only up to the EMR vendors. Here’s why.
Let’s assume that the current talk about ‘big data’ really does reflect the current aspiration of the health information industry. That is, that vendors and providers have realised or are realising that their holy grail – clinical decision support (CDS), computable clinical practice guidelines (CPGs) and analytics – can only be reached by aggregating patient data from multiple sources, and computing on them. Wait. I left out a step. Between aggregation and analytics you need conversion to a common semantic target. A common semantic target means having semantic definitions, that is to say definitions of the content of the data. Then you need a way of writing queries against this data that is independent of its original or current physical representation, i.e. messages, documents or database. Why? Because no one can afford to write the queries needed for CDS or CPGs more than once, i.e. once again for each of N products or formats – the ‘x N’ problem. Here’s what we need:
- data standards: data representation and transmission standards
- detailed models of content: independent of the physical representation
- a portable query language: an ability to build queries based on the content models only
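To make the ‘write the query once’ idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the model paths, the two vendor layouts, the adapter functions and the threshold of 140 (which is not clinical guidance). It is not openEHR's query language or any real product's API; it only shows a query that refers to content-model paths and never to a source format.

```python
from typing import Any, Dict, Iterable, List

# A record expressed in terms of the content model: values keyed by model
# path, not by the field names of whichever system produced the data.
Record = Dict[str, Any]

# Hypothetical model paths for a blood pressure observation.
SYSTOLIC = "blood_pressure/systolic"
DIASTOLIC = "blood_pressure/diastolic"


def from_vendor_a(row: dict) -> Record:
    """Adapter for an invented flat, table-like layout."""
    return {SYSTOLIC: row["sys_bp"], DIASTOLIC: row["dia_bp"]}


def from_vendor_b(doc: dict) -> Record:
    """Adapter for an invented nested, document-like layout."""
    bp = doc["vitals"]["bp"]
    return {SYSTOLIC: bp["s"], DIASTOLIC: bp["d"]}


def raised_systolic(records: Iterable[Record], threshold: int = 140) -> List[Record]:
    """The query: written once, against model paths only."""
    return [r for r in records if r[SYSTOLIC] >= threshold]


source_a = [{"sys_bp": 150, "dia_bp": 95}, {"sys_bp": 118, "dia_bp": 76}]
source_b = [{"vitals": {"bp": {"s": 162, "d": 101}}}]

# Aggregate, converting each source to the common semantic target.
aggregated = [from_vendor_a(r) for r in source_a] + [from_vendor_b(d) for d in source_b]

hits = raised_systolic(aggregated)
print(len(hits))
```

Adding a third source means writing one more adapter; the query itself is untouched. Without the common target, every query would need a variant per source.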
When we think about the need to build software and interfaces that deal with the data, we also need:
- single-source modelling: a capability to generate software components, schemas and screen forms (largely) from the content models.
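As a rough illustration of single-source modelling, the sketch below derives both a JSON-Schema-style dictionary and a list of screen form labels from one small content model. The `Element` class, the body weight model and both generator functions are hypothetical; real modelling tools work from far richer formalisms than this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    """One data point in a (hypothetical) content model."""
    path: str
    label: str
    datatype: str  # "number" or "string" in this sketch
    units: str = ""


# The single source: an invented model of a body weight observation.
BODY_WEIGHT_MODEL: List[Element] = [
    Element("body_weight/weight", "Weight", "number", "kg"),
    Element("body_weight/state_of_dress", "State of dress", "string"),
]


def to_schema(model: List[Element]) -> Dict[str, object]:
    """Generate a JSON-Schema-style description from the model."""
    return {
        "type": "object",
        "properties": {e.path: {"type": e.datatype} for e in model},
        "required": [e.path for e in model],
    }


def to_form_labels(model: List[Element]) -> List[str]:
    """Generate screen form field labels from the same model."""
    return [f"{e.label} ({e.units})" if e.units else e.label for e in model]


schema = to_schema(BODY_WEIGHT_MODEL)
labels = to_form_labels(BODY_WEIGHT_MODEL)
print(labels)
```

Both artefacts change together when the model changes, which is the point: the model is edited in one place and the software pieces are regenerated from it.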
At the sociological level, we need content models to be completely independent of vendor products, and to be maintained instead by the clinical community, either in the form of professional entities, government institutions (like the NLM) or provider institutions collectively. They need to act as standards, but not as data standards. They need to be transformable to all forms required to enable their use in vendor products. It must be possible to create complex sets of queries based on these models. Ultimately, we need data that are format-neutral, but model-based. That means a microbiology result may be physically in an HL7v2 message right now, in a GP system format next year, and in some other format 3 years later. However, its semantic content must remain invariant.
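The microbiology example can be sketched as follows. The pipe-delimited string is only loosely modelled on an HL7v2 OBX segment and the parser is not a conformant one; the JSON layout stands in for an imagined GP system format; the `BLOOD-CULT` code and the two field names are made up. The sketch shows one thing: two physical forms reducing to the same model-level content.

```python
import json
from typing import Dict


def from_pipe_segment(segment: str) -> Dict[str, str]:
    """Read a simplified OBX-like segment (illustrative, not conformant)."""
    fields = segment.split("|")
    code = fields[3].split("^")[0]   # first component of the identifier field
    return {"test_code": code, "organism": fields[5]}


def from_gp_json(payload: str) -> Dict[str, str]:
    """Read the same result from an invented JSON layout."""
    doc = json.loads(payload)
    return {"test_code": doc["test"]["code"], "organism": doc["finding"]}


# The same result, in this year's format and in next year's format.
v2_like = "OBX|1|ST|BLOOD-CULT^Blood culture^LOCAL||Staphylococcus aureus"
gp_like = '{"test": {"code": "BLOOD-CULT"}, "finding": "Staphylococcus aureus"}'

content_now = from_pipe_segment(v2_like)
content_later = from_gp_json(gp_like)

# The physical representation changed; the semantic content did not.
same_content = content_now == content_later
print(same_content)
```

Anything computed on the model-level content, such as a query or a decision support rule, keeps working when the data migrate from one format to the next.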
If we don’t achieve this separation of content models from a) vendor products and b) implementation formats, we’ll never be able to make health data shareable and computable. If the models stay inside vendor products, CDS and CPG systems are uneconomic due to the x N problem. If the models are not separated from physical implementation standards, clinical people can’t and probably won’t work with them – clinical professionals don’t care about the details of physical data standards, and shouldn’t have to. Conversely, people who design efficient XML, binary REST interfaces and suchlike generally don’t know what a model of ante-natal examination information entails.
The architecture described here is the architecture of openEHR, and also of various incarnations of the Intermountain Healthcare environment. In both places, clinical models are worked on primarily by clinical people, and are not tied to concrete representation formats. In the openEHR architecture a portable query language is available for writing queries – once.
Getting it right is not easy. Technically, it requires a solid clinical modelling formalism and tools, including concepts of identification, versioning, specialisation, and re-use. It also requires a portable querying language. The latest version of openEHR’s archetype language, ADL 1.5, does the modelling part of this job.
In CIMI, we are working on bringing these threads together to create a universal health content model framework and repository based on knowledge and experience from openEHR, Intermountain, IHTSDO and many others. I hope to see HL7 FHIR carefully integrated into the ecosystem, so that it works with the models rather than being a separate silo.
And of course we need provider institutions and clinical bodies to get behind this ‘separation of powers’. In my view, there is no other route to realising widespread use of clinical analytics, decision support or guidelines. ‘Big data’ will remain ‘parochial data’.