A few months ago I posted on what makes a standard or set of standards in e-health investible. The headline requirements I can summarise as follows:
- platform-based: the standards must work together in a single coherent technical ecosystem, based on common information models, knowledge definitions, and interfaces;
- semantic scalability: there must be a sustainable way of dealing with both the massive domain diversity and change, and the massive local variability;
- implementability: the standards ecosystem must be available in a developer-friendly form;
- utility: the standards ecosystem must bring real value;
- responsive governance: the ecosystem, and its constituent standards, must have a maintenance pathway.
In the above, I use the word ‘standard’ to mean anything that is in wide use, as per this post.
Of these, #1 and #3–5 are about technical and management issues. They need to be well understood and carefully addressed, but they can be solved. Most importantly, they are of more or less ‘constant size’, if we agree that the relentless churn in software platforms essentially produces the same problems every time, solved in slightly different ways.
It is #2 that really matters – the question of semantic scalability. This is the one characteristic that directly reflects the domain subject matter itself: biomedical knowledge, clinical information, workflow, practices and processes.
In the ideal, we want IT systems in health to ‘know about’ these things – clinical workflows for example (as Dr Jerome Carter at EHR Science often points out); clinical data (the core of our work in openEHR, of many e-health standards, and of terminologies); and underlying ‘true facts’. How can the IT layer know about all this? Only if it can be formally expressed. There are numerous technical means of expression underpinning any sophisticated system, but in the end it primarily comes down to the following list. The right hand side in each case is the usual technical means of formalisation:
- facts and classifications => ontologies and terminologies
- data => information models
- processes and guidelines => computable guideline and workflow definitions
- business rules => formal rule bases
- programming interfaces => APIs
- UI => forms and widgets
Pretty much everything gets mapped through these types of things, with the main tools of the software engineer consisting of just information models, APIs and UI artefacts.
Higher order entities like clinical pathways often manifest implicitly in specific types of APIs and data; terminology turns up as special ‘data’ that helps smart reasoning engines under the hood do their inferencing work. Layers of APIs may encode actions in workflow.
The size of the work
The main question I want to ask here is: how big can all this get? And how big can its steady state rate of change be?
Let’s start with the ‘facts’ group. I doubt anyone has any idea of how many biomedical ontological statements could be made, but let’s just use SNOMED CT, ICD-11 and LOINC, which are intermediary terminologies, as proxies. Respectively, they consist of: 350k terms / 1m relationships; 375k classifiers; 70k terms. And this is largely without taking genomics or proteomics into account. So this category is large, and it is also growing and changing. If a ‘fully defined’ SNOMED CT were ever built, it might well have 1m concepts, and it seems reasonable to assume a few percent change per year due to new knowledge about known concepts, plus a larger rate of change due to new concepts – perhaps 5% per year overall.
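As a sanity check on what that churn rate implies in absolute terms (the concept count and rate are the assumptions stated above, not official SNOMED figures):

```python
# Back-of-envelope churn for a hypothetical 'fully defined' terminology.
# Both figures are assumptions taken from the estimate in the text.
concepts = 1_000_000        # a completed SNOMED CT-like terminology
annual_change_rate = 0.05   # ~5% of concepts touched per year

changed_per_year = int(concepts * annual_change_rate)
print(changed_per_year)     # 50000 concepts revised or added every year
```

Fifty thousand concept changes a year is itself a substantial editorial operation, before any downstream artefacts are touched.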
Next let’s look at information models. Here I primarily mean ‘content models’, i.e. models of domain content, rather than just generic things like ‘quantity’ etc. One proxy we have for this is the openEHR archetypes and the Intermountain Clinical Element Models. For the first, we can get some numbers from various repositories around the world. A statistical analysis of the openEHR.org CKM’s archetypes indicates: 430 archetypes with an average of 12 substantive data points each, giving roughly 5,000 data points (one such ‘data point’ has a coded name, data type, and possibly value range and cardinality). If we add in the non-overlapping archetypes from national and other CKMs (currently 5 countries), I would guess 600 altogether, i.e. 7,200 data points. The next question is: how much of medicine is covered by these archetypes? It’s hard to say, but I think 20% would be an upper limit right now. On that basis, we could expect 36,000 data points – for general medicine.
Now, at Intermountain, there are something like 6,500 Clinical Element Models. Most of these have one substantive data point each, because they are defined on a deliberately atomic basis. Something like two-thirds cover lab, with the rest sparsely covering other parts of clinical data. Given that not all of lab is covered yet, I’ll take a rough guess that 10,000 would do a pretty good job (that’s assuming we allow variations in specimen, analyte, site, etc; if we don’t, then the number is more or less the LOINC count, i.e. 70,000).
It’s very rough, but based on the above, let’s say that 36,000 + 10,000 ~= 50k. Now, modelling 50,000 substantive data points doesn’t mean just naming 50,000 things, it means carefully defining the elements, the structure, the data types, the value ranges, cardinalities, co-variance relationships, not to mention a whole pile of descriptive meta-data and references. And doing all that while keeping an eye on relationships with other such models (inheritance, uses, dealing with overlaps). This can only be done by domain experts – mostly healthcare professionals, with lab being doable by a mixture of the latter and lab experts / scientists.
50,000 clinical data points is roughly 5,000 archetypes in the openEHR style. It might turn out to be 15,000 smaller models in the emerging CIMI archetype repository. It doesn’t matter too much either way – it’s a large number, and each model takes a long time to get right. Based on some rough figures (see below) for the development of the blood pressure archetype, which has 27 data points, I’m going to use the figure of 10 hours of domain expert time per data point. That’s 500,000 hours, or approximately 250 person years!
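The whole chain of estimates above can be reproduced in a few lines; every input is an assumption from the text (the post rounds the ~46k result up to 50k, which pushes the headline figure to 250 person years):

```python
# Reproducing the build-effort estimate from the text.
# Every number here is one of the post's stated assumptions.
archetypes = 600            # archetypes across all CKMs worldwide (guess)
points_per_archetype = 12   # average substantive data points per archetype
coverage = 0.20             # assumed fraction of general medicine covered so far

openehr_points = archetypes * points_per_archetype    # 7,200
general_medicine = openehr_points / coverage          # 36,000
lab_points = 10_000                                   # Intermountain-style lab models
total_points = general_medicine + lab_points          # 46,000 – call it 50k

hours_per_point = 10        # from the blood pressure archetype figures below
person_year_hours = 2_000   # working hours in a person-year

hours = total_points * hours_per_point
print(round(hours), round(hours / person_year_hours))  # 460000 230
```

So the unrounded figure is ~460,000 hours, i.e. ~230 person years of domain expert time just to build the models once.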
That’s not all. We know that clinical practice, processes and guidelines change quite fast, over the top of the base rate of change due to new science. This means that content models, even if built up to cover all of general medicine to some level of detail, are never going to be finished. Quantifying this is guesswork, but it seems reasonable to expect that in a repository of say 10,000 models (where each has an average of 5 data points in some structure – note, some have one, some can have 70!), there will be changes that could affect 10% of models every year. If on average each change requires a total of 50 hours work (remembering that numerous people can be involved in one model), then we need to include 50,000 h / year ongoing work, or 25 person years.
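The steady-state maintenance load works out as follows, again using only the assumptions just stated:

```python
# Steady-state maintenance load, per the assumptions in the text.
models = 10_000          # models in a mature repository
change_fraction = 0.10   # fraction of models affected by change each year
hours_per_change = 50    # total expert hours per changed model (all reviewers)
person_year_hours = 2_000

annual_hours = models * change_fraction * hours_per_change
print(int(annual_hours), int(annual_hours / person_year_hours))  # 50000 25
```

That is a permanent staffing commitment, not a one-off project cost.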
Above, I didn’t try to estimate the work in hours for things like SNOMED CT, LOINC etc, but we can assume that it is in the hundreds of person years as well.
I am not even going to go into business rules or UI forms, but we know from general IT experience that they can easily number in the hundreds or thousands for a single site, such as a major hospital.
I also have no figures for clinical guidelines, but there are thousands, and the business of just converting a published guideline (i.e. a paper) to a computable form is highly time-consuming and only doable by clinical experts. If we assume O(100h) per computable guideline, then 1,000 guidelines mean 100,000 hours – another 50 person years…
APIs are a somewhat special case: they are either generic, i.e. of a style like run_query("query stmt"), in which case the semantics are in the data, or they mimic underlying content and other semantics, e.g. with special functions like ‘get_diagnoses’. The latter kind of API is the idea pursued by the Harvard SMART project and more recently HL7 FHIR. Sophisticated APIs often encode temporal logic and workflows, a topic too complex to get into here.
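The distinction between the two API styles can be sketched as follows. Everything here is hypothetical – the query syntax, the class, and both function signatures are invented for illustration, and neither is from a real e-health specification:

```python
# Two hypothetical API styles for the same question:
# "what are this patient's diagnoses?"

class FakeEHR:
    """Stand-in for an EHR service, so the example is runnable."""
    def __init__(self, diagnoses):
        self.diagnoses = diagnoses
    def execute(self, query):
        # Pretend query engine: understands exactly one query shape.
        if query.startswith("diagnoses for "):
            return self.diagnoses.get(query.removeprefix("diagnoses for "), [])
        raise ValueError("unsupported query")

# Style 1: generic API - the semantics live in the query string / data.
def run_query(ehr, query_stmt):
    return ehr.execute(query_stmt)

# Style 2: content-aware API - the semantics are baked into the interface,
# in the spirit of the SMART / FHIR resource-specific style.
def get_diagnoses(ehr, patient_id):
    return ehr.execute(f"diagnoses for {patient_id}")

ehr = FakeEHR({"p1": ["asthma", "hypertension"]})
print(run_query(ehr, "diagnoses for p1"))  # ['asthma', 'hypertension']
print(get_diagnoses(ehr, "p1"))            # ['asthma', 'hypertension']
```

In the first style the clinical semantics must be known by every caller; in the second they are fixed once, in the interface itself – which is exactly why content-aware APIs multiply with the content they expose.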
What to do about it
If we are up for numbers like the above, how are we going to obtain ‘semantic scalability’ in the overall health informatisation enterprise? There are two inescapable conclusions:
- we only want to do each model / part of model once
- we want to maximise the use of the models in the resulting IT solutions.
The first means that we should express our knowledge base, information models, workflows, guidelines and so on, in abstract formalisms that are highly re-usable. If they are not, we can’t achieve the second. The practical meaning of the second point is that we need to use the definitional models as single-source models for all downstream software artefacts – message definitions, content-based APIs, screen forms, document schemas and so on. That means creating and using smart code / artefact generating tools.
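A minimal sketch of the single-source idea: one content model, several generators deriving downstream artefacts from it. The model format and both generators are invented for this sketch – a real pipeline would start from ADL archetypes or CEMs, not a Python dict:

```python
# A toy 'single source' content model and two derived artefacts.
model = {
    "name": "blood_pressure",
    "data_points": [
        {"name": "systolic",  "type": "Quantity", "units": "mm[Hg]", "range": (0, 1000)},
        {"name": "diastolic", "type": "Quantity", "units": "mm[Hg]", "range": (0, 1000)},
    ],
}

def generate_json_schema(model):
    """Derive a JSON Schema fragment from the content model."""
    props = {dp["name"]: {"type": "number",
                          "minimum": dp["range"][0],
                          "maximum": dp["range"][1]}
             for dp in model["data_points"]}
    return {"title": model["name"], "type": "object", "properties": props}

def generate_api_stub(model):
    """Derive a content-aware API signature from the same model."""
    args = ", ".join(dp["name"] for dp in model["data_points"])
    return f"def record_{model['name']}(patient_id, {args}): ..."

print(generate_json_schema(model)["properties"]["systolic"]["maximum"])
print(generate_api_stub(model))
```

The point is not the toy generators but the topology: the clinical knowledge is stated once, and every change to the model flows automatically into the schema, the API, and whatever other artefacts have generators.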
Now most of these latter artefacts are actually what current e-health de jure standards standardise. They have their own organisations, meetings, preferred formalisms, communities and methods. At the moment, much of this is disconnected from the definition of semantic content, and thus, these communities create their own semantic definitions directly in the concrete formalism to hand. This is the entire history of message modelling, clinical document modelling, and it may be the future history of the (very welcome) API work like SMART and FHIR.
However, it’s not sustainable.
The consequence of this modus operandi is that each clinical information entity (say ‘microbiology result’) or clinical workflow ends up being expressed in every concrete type of message, document schema, form definition, API etc. As a manual task, it’s either going to take literally millions of hours, or it will be compromised e.g. by having few clinical experts involved, and reducing the model just to current use cases, to stop it taking so long.
Getting out of this bind isn’t simple: ‘code generation’ is easy to say, but code generation that really works in industrial contexts and deals properly with edge cases isn’t easy to get right. However, I think it’s the only way. So there is a big piece of e-health computer science to work on here.
There are good starting points. The above-mentioned SMART did some ontology => API code generation; Intermountain’s environment contains significant code generation based on CEMs; the MDHT project also has some very impressive class generation from clinical semantics; openEHR tools do various archetype => schema and API generation.
More importantly, we need to do something about the socio-political side of things – i.e. the problem of separated communities. I would say that at the moment there are only weak relationships between the ontology builders, terminology builders, clinical model builders, and the ‘downstream technology’ communities. And I think it’s reasonable to say that governments and e-health programmes are still a fair way from realising the size and nature of the challenge here.
There are some encouraging signs however…
Recently, an openEHR/FHIR joint review of the Adverse Reaction archetype/resource generated good results: lessons learned on both sides, and an improved adverse reaction model. However, the final state of affairs is still some distance from the ideal. There is no tool generation pathway between the openEHR archetype (one day it might be CIMI) and the FHIR resource, which is a very concrete artefact – they are still separate things. So if openEHR and FHIR maintainers make changes in the future, they will still be changing two models independently, and the two will inevitably diverge over time unless special efforts are made.
Now, this one review took some weeks, and presumably a reasonable amount of work time. I don’t have figures, but I’m sure it was (collectively) more like 100h than 10h. And that’s just one model. So here we are back to semantic scalability again. Imagine if a combined group had built an abstract formal model of Adverse Reaction and then generated a FHIR artefact. And also APIs in various programming languages. And an XML Schema. And some screen form artefacts. That’s starting to sound scalable…
I should point out that even if we model Adverse Reaction in openEHR as we do today, we are still missing out (badly) on proper use/reuse of available ontology and terminology elements, further upstream so to speak. The tooling and organisational connections are not there yet for this either. It’s another huge opportunity.
The question in my mind is: can we afford – economically – to keep manually modelling domain semantics in concrete artefacts? Can we afford not to have a modellers’ workbench that connects ontologies, terminologies and content models together?
The figures above tell me we can’t. We either give up on large scale informatisation of clinical semantics, and stick with simple things, or else we get serious about a platform-based eco-system driven by models and ontologies, with powerful code generators and tools.
Blood pressure archetype work effort guesstimate:
- 27 data points
- initial development: 2 experts x 2 weeks => 160h
- reviews: 30 experts x 3 reviews x 1h/reviewer/round => 90h
- total => 250h, or ~ 10h / data point.
I am 99% sure this is a substantial under-estimate.
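The arithmetic behind the guesstimate, assuming a 40-hour working week:

```python
# Checking the blood pressure archetype effort figures above.
initial = 2 * 2 * 40   # 2 experts x 2 weeks x 40 h/week = 160 h
reviews = 30 * 3 * 1   # 30 reviewers x 3 rounds x 1 h   = 90 h
total = initial + reviews
print(total, round(total / 27, 1))  # 250 9.3 -> ~10 h per data point
```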