A few months ago I posted on what makes a standard or set of standards in e-health investible. The headline requirements I can summarise as follows:
- platform-based: the standards must work together in a single coherent technical ecosystem, based on common information models, knowledge definitions, and interfaces;
- semantic scalability: there must be a sustainable way of dealing with both the massive domain diversity and change, and the massive local variability;
- implementability: the standards ecosystem must be available in a developer-friendly form;
- utility: the standards ecosystem must actually bring real value;
- responsive governance: the ecosystem, and its constituent standards, must have a maintenance pathway.
In the above, I use the word ‘standard’ to mean anything that is in wide use, as per this post.
Of these, #1 and #3–5 are about technical and management issues. They need to be well understood and carefully addressed, but they can be solved. Most importantly, they are of more or less ‘constant size’, if we agree that the relentless churn in software platforms essentially produces the same thing every time, solved in slightly different ways.
It is #2 that really matters – the question of semantic scalability. This is the one characteristic that directly reflects the domain subject matter itself: biomedical knowledge, clinical information, workflow, practices and processes.
In the ideal, we want IT systems in health to ‘know about’ these things – clinical workflows for example (as Dr Jerome Carter at EHR Science often points out); clinical data (the core of our work in openEHR, of many e-health standards, and of terminologies); and underlying ‘true facts’. How can the IT layer know about all this? Only if it can be formally expressed. There are numerous technical means of expression underpinning any sophisticated system, but in the end it primarily comes down to the following list. The right hand side in each case is the usual technical means of formalisation:
- facts and classifications => ontologies and terminologies
- data => information models
- process definitions => guidelines
- business rules => formal rule bases
- programming interfaces => APIs
- UI => forms and widgets
Pretty much everything gets mapped through these types of things, with the main tools of the software engineer consisting of just information models, APIs and UI artefacts.
Higher order entities like clinical pathways often manifest implicitly in specific types of APIs and data; terminology turns up as special ‘data’ that helps smart reasoning engines under the hood do their inferencing work. Layers of APIs may encode actions in workflow.
The size of the work
The main question I want to ask here is: how big can all this get? And how big can its steady state rate of change be?
Let’s start with the ‘facts’ group. I doubt if anyone has any idea of how many biomedical ontological statements could be made, but let’s just use SNOMED CT, ICD-11 and LOINC, which are intermediary terminologies, as proxies. Respectively, they consist of: 350k terms / 1m relationships; 375k classifiers; 70k terms. And this is largely without taking into account genomics or proteomics. So this category is large, and it’s also growing and changing. If we assume that a ‘fully defined’ SNOMED CT could ever be built, it might well have 1m concepts, and it seems reasonable to assume a few percent change per year due to new knowledge about known concepts, plus a larger percentage change per year due to new concepts – perhaps 5% change per year in total.
Next let’s look at information models. Here I primarily mean ‘content models’, i.e. models of domain content, rather than just generic things like ‘quantity’ etc. One proxy we have for this is the openEHR archetypes and the Intermountain Clinical Element Models. For the first, we can get some numbers from various repositories around the world. A statistical analysis of the openEHR.org CKM’s archetypes indicates: 430 archetypes with an average of 12 substantive data points each, giving roughly 5,000 data points (one such ‘data point’ has a coded name, data type, and possibly value range and cardinality). If we add in the non-overlapping archetypes from national and other CKMs (currently 5 countries), I would guess 600 altogether, i.e. 7,200 data points. The next question is: how much of medicine is covered by these archetypes? It’s hard to say, but I think 20% would be an upper limit right now. On that basis, we could expect 36,000 data points – for general medicine.
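The back-of-envelope arithmetic behind those figures can be laid out in a few lines. All inputs below are the rough guesses stated above, not measured values:

```python
# Rough estimate of clinical content 'data points' for general medicine.
# Every input here is a guess quoted in the text, not a measured value.

avg_data_points = 12       # average substantive data points per archetype
ckm_archetypes = 430       # archetypes in the openEHR.org CKM
all_archetypes = 600       # incl. non-overlapping national CKM archetypes
coverage = 0.20            # assumed upper-limit fraction of medicine covered

ckm_points = ckm_archetypes * avg_data_points        # ~5,160, i.e. roughly 5,000
current_points = all_archetypes * avg_data_points    # 7,200
projected_points = current_points / coverage         # 36,000

print(ckm_points, current_points, projected_points)
```

The coverage divisor does most of the work here: halving the assumed coverage doubles the projected total, so the 36,000 figure is only as good as the 20% guess.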
Now, at Intermountain, there are something like 6,500 Clinical Element Models. Most of these have one substantive data point each, because they are defined on a deliberately atomic basis. Something like 2/3 cover lab, with the rest sparsely covering other parts of clinical data. Given that not all of lab is covered yet, I’ll take a rough guess that 10,000 would do a pretty good job (that’s assuming we allow variations in specimen, analyte, site, etc; if we don’t, then the number is more or less the LOINC count, i.e. 70,000).
It’s very rough, but based on the above, let’s say that 36,000 + 10,000 ~= 50k. Now, modelling 50,000 substantive data points doesn’t mean just naming 50,000 things, it means carefully defining the elements, the structure, the data types, the value ranges, cardinalities, co-variance relationships, not to mention a whole pile of descriptive meta-data and references. And doing all that while keeping an eye on relationships with other such models (inheritance, uses, dealing with overlaps). This can only be done by domain experts – mostly healthcare professionals, with lab being doable by a mixture of the latter and lab experts / scientists.
50,000 clinical data points is roughly 5,000 archetypes in the openEHR style. It might turn out to be 15,000 smaller models in the emerging CIMI archetype repository. It doesn’t matter too much either way – it’s a large number, and each model takes a long time to get right. Based on some rough figures (see below) for the development of the blood pressure archetype, which has 27 data points, I’m going to use the figure of 10 hours of domain-expert time per data point. That’s 500,000 hours, or approximately 250 person years!
That’s not all. We know that clinical practice, processes and guidelines change quite fast, on top of the base rate of change due to new science. This means that content models, even if built up to cover all of general medicine to some level of detail, are never going to be finished. Quantifying this is guesswork, but it seems reasonable to expect that in a repository of say 10,000 models (where each has an average of 5 data points in some structure – note, some have one, some can have 70!), there will be changes that affect 10% of models every year. If on average each change requires a total of 50 hours of work (remembering that numerous people can be involved in one model), then we need to add 50,000 hours per year of ongoing work, or 25 person years annually.
Above, I didn’t try to estimate the work in hours for things like SNOMED CT, LOINC etc, but we can assume that it is in the hundreds of person years as well.
I am not even going to go into business rules or UI forms, but we know from general IT experience that they can easily number in the hundreds or thousands for just one site, such as a major hospital.
I also have no figures for clinical guidelines, but there are thousands, and just converting a published guideline (i.e. a paper) to a computable form is highly time-consuming and only doable by clinical experts. If we assume O(100h) per computable guideline, then 1,000 guidelines represent at least 100,000 hours – another 50 person years – and in practice the total could easily be several times that…
APIs are a somewhat special case: they are either generic, i.e. of a style like run_query("query stmt"), in which case the semantics are in the data, or they may mimic underlying content and other semantics, e.g. with special functions like get_diagnoses(). The latter kind of API is the approach pursued by the Harvard SMART project and more recently by HL7 FHIR. Sophisticated APIs often encode temporal logic and workflows, a topic too complex to get into here.
What to do about it
If we are up for numbers like the above, how are we going to obtain ‘semantic scalability’ in the overall health informatisation enterprise? There are two inescapable conclusions:
- we only want to do each model / part of model once
- we want to maximise the use of the models in the resulting IT solutions.
The first means that we should express our knowledge base, information models, workflows, guidelines and so on, in abstract formalisms that are highly re-usable. If they are not, we can’t achieve the second. The practical meaning of the second point is that we need to use the definitional models as single-source models for all downstream software artefacts – message definitions, content-based APIs, screen forms, document schemas and so on. That means creating and using smart code / artefact generating tools.
Now most of these latter artefacts are actually what current e-health de jure standards standardise. They have their own organisations, meetings, preferred formalisms, communities and methods. At the moment, much of this is disconnected from the definition of semantic content, and thus, these communities create their own semantic definitions directly in the concrete formalism to hand. This is the entire history of message modelling, clinical document modelling, and it may be the future history of the (very welcome) API work like SMART and FHIR.
However, it’s not sustainable.
The consequence of this modus operandi is that each clinical information entity (say ‘microbiology result’) or clinical workflow ends up being expressed in every concrete type of message, document schema, form definition, API etc. As a manual task, it’s either going to take literally millions of hours, or it will be compromised e.g. by having few clinical experts involved, and reducing the model just to current use cases, to stop it taking so long.
Getting out of this bind isn’t simple: ‘code generation’ is easy to say, but code generation that really works in industrial contexts, and deals with edge cases properly isn’t easy to get right. However, I think it’s the only way. So there is a big piece of e-health computer science to work on here.
There are good starting points. The above-mentioned SMART did some ontology => API code generation; Intermountain’s environment contains significant code generation based on CEMs; the MDHT project also has some very impressive class generation from clinical semantics; openEHR tools do various archetype => schema and API generation.
More importantly, we need to do something about the socio-political side of things – i.e. the problem of separated communities. I would say that at the moment there are only weak relationships between the ontology builders, terminology builders, clinical model builders, and the ‘downstream technology’ communities. And I think it’s reasonable to say that governments and e-health programmes are still a fair way from realising the size and nature of the challenge here.
There are some encouraging signs however…
Recently, an openEHR/FHIR joint review of the Adverse Reaction archetype/resource generated good results: lessons learned on both sides, and an improved adverse reaction model. However, the final state of affairs is still some distance from the ideal. There is no tool generation pathway between the openEHR (or one day it might be CIMI) archetype and the FHIR resource, which is a very concrete artefact – these are still separate things. So if openEHR and FHIR maintainers make changes in the future, they will still be changing two models independently, which will inevitably diverge slowly over time, unless special efforts are made.
Now, this one review took some weeks, and presumably a reasonable amount of work time. I don’t have figures, but I’m sure it was (collectively) more like 100h than 10h. And that’s just one model. So here we are back to semantic scalability again. Imagine if a combined group had built an abstract formal model of Adverse Reaction and then generated a FHIR artefact. And also APIs in various programming languages. And an XML Schema. And some screen form artefacts. That’s starting to sound scalable…
I should point out that even if we model Adverse Reaction in openEHR as we do today, we are still missing out (badly) on proper use/reuse of available ontology and terminology elements further upstream, so to speak. But the tooling and organisational connections are not there yet for this either. It’s another huge opportunity.
The question in my mind is: can we afford – economically – to keep doing manual modelling of domain semantics in concrete artefacts? Can we afford not to have a modellers workbench that connects ontologies, terminologies and content models together?
The figures above tell me we can’t. We either give up on large-scale informatisation of clinical semantics and stick with simple things, or else we get serious about a platform-based ecosystem driven by models and ontologies, with powerful code generators and tools.
Blood pressure archetype work effort guesstimate:
- 27 data points
- initial development: 2 experts x 2 weeks => 160h
- reviews: 30 experts x 3 reviews x 1h/reviewer/round => 90h
- total => 250h, or ~ 10h / data point.
I am 99% sure this is a substantial under-estimate.
Hello, interesting reading. I’m still kind of confused about the relationship between FHIR and openEHR. Isn’t it possible to build archetypes in openEHR in conformance with FHIR resources? That way you could use FHIR and openEHR together to send/record/retrieve information. I know there are repositories of openEHR archetypes, but isn’t it a better approach to model archetypes according to the FHIR standard instead of building mappings between them?
Hi – well, there are a few answers to this. Firstly, there is probably 100x more model content in the openEHR archetypes than in the FHIR resources. If you add the Intermountain CEM repository, make that 200x–500x. Those repositories took over a decade to create, with hundreds of clinicians involved (Intermountain’s took more like 20 years, slowly and carefully). And these two (and others, e.g. the VHA FHIM-based CDA templates) will be rationalised in the CIMI model repository.
FHIR is very new, and models perhaps 0.1% – 1% of this, as concrete XML-based resources, plus a generic REST framework.
One of the points of this post is to show that the only likely sustainable future for any kind of standard (including FHIR) that tries to manually model content in some concrete form (XML resources) is for that content to be largely forward-generated from the much larger and growing abstract model repositories. If we work like that, FHIR resources are just one of the things we can generate. We can also generate UI (we already do in openEHR), message XSDs, and much else.
None of this is a criticism of FHIR, it’s just that the numbers will beat us if we do things the wrong way round.
Thanks for your answer!
Maybe an alternative worth trying is to decouple these standards: use the FHIR specification for the REST API and data exchange, while archetypes already defined within openEHR repositories are used as FHIR profiles/resources. What do you think of this approach?
I’m still studying and trying to understand the standards ecosystem in e-health – sorry if my thoughts sound too abstract or idealistic; I’m sure there is more complexity to be aware of.
Well, we wouldn’t store health data as FHIR resources, and I doubt if anyone would (except in some kind of FHIR server cache). FHIR isn’t designed for EHR data persistence; it’s designed as a REST-servable resource that (hopefully) conveniently expresses information from any back-end, where it will be stored in all kinds of ways. Typical back-ends include every small EHR, Cerner, Allscripts, and all the other vendors you can think of. The persistence formats are normally optimised in each case for each vendor’s tools and applications. FHIR is a way for them to publish their data.
The other point I would make is that only some archetypes are openEHR archetypes. There are 13606 archetypes (i.e. based on the 13606 reference model), and a growing number of CIMI archetypes (see opencimi.org). So the question isn’t about openEHR per se, it’s about how to maximise the reuse of clinical models across the industry, and minimise the work required to create FHIR definitions that expose data created from those archetypes.
What is your ‘substantial under-estimate’ – the 10h/datapoint or the 99%? 😉
Thanks for sharing the calculations and reasoning.
As you probably know, I share many of your “scalability” concerns; however, I believe reasoning about semantic “sustainability” might trigger even more appropriate associations in people’s minds than “scalability” does.
Have a look at sustainability in sections 4.2-4.4 and 1.4 of my thesis.
Erik, I was remiss in not flagging your thesis as a must-read on this topic. It covers more ground than just scalability in modelling (I recommend readers download it – it’s a very nice read). I would say that ‘sustainability’ should be treated a bit like ‘quality’ in engineering – it’s the outermost thing you are trying to achieve. To attain sustainability, scalability of various elements is clearly necessary, but it is just one element. BTW the Bar-Yam paper you mention is also a must-read – available here: http://www.necsi.edu/projects/yaneer/ESOA04.pdf .
We need to publish more on these issues actually, since lack of understanding continues to cause huge amounts of taxpayers’ money to be wasted on endeavours that will never work.