Does anyone actually understand what terminology is for?

I really wonder sometimes. A few months ago, an international organisation that has been looking at how to meet the requirement for scalable, sustainable content modelling (of research data sets) trialled the use of archetypes. This worked fine as far as it went. I subsequently received an email about their next steps that contained the line

“There has also been talk in our senior management about using SNOMED for this type of requirement”.

More recently, a colleague from Norway posted on the openEHR list various quotes from a Gartner report commissioned by the Norwegian government. The most relevant one here is the following (from the Norwegian report):

“National ICT has chosen archetypes as a method for structuring EHR data. It is unclear whether other options have been considered, for example SNOMED-CT in combination with ICD-10 as used in many of the leading systems internationally.”

Where to start with this? It appears that the authors don’t know the difference between terminology and information models.

Erik Sundvall put it nicely on the list:

Whoever wrote that needs an update on the relationship between…
a) Concept Model or “Ontology” (Includes terminology systems like SNOMED-CT and ICD-10) and
b) Information Model (detailed information structures for example clinical ones based on archetypes).

A classic (somewhat informatics-nerdy but good) explanation of all this (plus decision support rules) is Alan Rector et al.’s paper “Models and Inference Methods for Clinical Systems: A Principled Approach” (available online). I have not heard anybody seriously question the main conclusions of that paper.

To create an EHR system you need information models (“b” above) to put the terminology codes (“a” above) in. Information models like openEHR support the use of both ICD-10 and SNOMED CT.

For those who know anything at all about philosophy, one easy way to remember the difference is that

  • terminology represents an ontological view, i.e. a formal description of general truths of a domain, e.g. relationships of diseases, biomedical entities in the patient etc; and
  • information/content models represent an epistemological view, i.e. a knowledge gathering activity, where the ‘knowledge’ is about individuals (patients, patient events etc in the medical domain).

These are two different things. The model of information corresponding to blood pressure measurements on Peter Smith isn’t (at all) the same as the ontological description of ‘blood pressure’. The information describing Peter’s symptoms in an examination is not the same as the ontological description of the possible diseases he may have. This point is so basic, it’s hard to understand how anyone working in e-health doesn’t get it.

Concretely, information models and terminologies (and/or formal ontologies) need a relationship that allows information about X to be associated with a description of X. This improves interoperability and enables inferencing about facts contained in a person’s health data. This is called terminology binding and, although it is not a completely solved problem, it has been a major area of activity for nearly 20 years.
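To make the distinction concrete, here is a minimal sketch of terminology binding in Python. Everything in it is a hypothetical simplification: the node paths and the `Concept`, `Observation` and `describe` names are invented for illustration and are not openEHR or SNOMED CT APIs. The SNOMED CT concept IDs are the commonly cited ones for blood pressure, but they should be checked against a current release. An `Observation` holds information about one individual (Peter Smith’s measurement). Each `Concept` is a general description that a model node is bound to.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SNOMED = "http://snomed.info/sct"


@dataclass(frozen=True)
class Concept:
    """General (ontological) description: what 'systolic blood pressure' is."""
    system: str
    code: str
    display: str


# Hypothetical binding of content-model node paths to SNOMED CT concepts.
# Check the concept IDs against a current SNOMED CT release before relying on them.
BP_BINDINGS = {
    "/blood_pressure": Concept(SNOMED, "75367002", "Blood pressure"),
    "/blood_pressure/systolic": Concept(SNOMED, "271649006", "Systolic blood pressure"),
    "/blood_pressure/diastolic": Concept(SNOMED, "271650006", "Diastolic blood pressure"),
}


@dataclass
class Observation:
    """Information about an individual: one measurement event on one patient."""
    patient_id: str
    recorded_at: str
    values: dict[str, float] = field(default_factory=dict)  # node path -> value in mm[Hg]


def describe(obs: Observation, bindings: dict[str, Concept]) -> list[str]:
    """Pair each recorded value with the concept its model node is bound to."""
    lines = []
    for path, value in obs.values.items():
        concept = bindings.get(path)
        label = f"{concept.display} ({concept.code})" if concept else "unbound"
        lines.append(f"{obs.patient_id} {path} = {value} mm[Hg] -> {label}")
    return lines


peter = Observation("peter-smith", "2015-03-01T10:00",
                    {"/blood_pressure/systolic": 128.0, "/blood_pressure/diastolic": 84.0})
for line in describe(peter, BP_BINDINGS):
    print(line)
```

This prints `peter-smith /blood_pressure/systolic = 128.0 mm[Hg] -> Systolic blood pressure (271649006)` and the equivalent diastolic line. The concept supplies the meaning of each node. Nothing in the concept itself says anything about Peter.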

Just to get it off my chest, I’ll quote further from an email I wrote on these kinds of misconceptions a while ago (this is specific to archetypes, so it doesn’t mention e.g. HL7 TermInfo, CDA terminology binding or anything like that).

There is a generalised misconception in some quarters that SNOMED CT or terminology in general will enable you to model content. It won’t. It enables you to model value sets for coded fields within content models, and to code some field names (or ‘nodes’ in a model). But if you want to build a model for a data set consisting of numerous data items, typically in some hierarchical structure, with some coded and some non-coded (e.g. quantitative, ordinal) nodes, then you need a content modelling approach.
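As a rough illustration of this division of labour, the sketch below defines a small hierarchical content model in which only one node is coded. It is plain Python, not ADL, and all names and codes are hypothetical. The terminology contributes the value set for the coded node. The structure, the ordinal and quantity nodes, and the nesting all come from the content model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a hypothetical content model (an illustration, not ADL)."""
    name: str
    kind: str                                  # "group", "coded", "ordinal" or "quantity"
    units: Optional[str] = None                # quantity nodes only
    value_set: frozenset[str] = frozenset()    # coded nodes only: supplied by terminology
    children: list[Node] = field(default_factory=list)  # group nodes only


def validate(node: Node, data: object, path: str = "") -> list[str]:
    """Check data against the model; return error messages (empty if valid)."""
    path = f"{path}/{node.name}"
    errors: list[str] = []
    if node.kind == "group":
        if not isinstance(data, dict):
            return [f"{path}: expected a group of child values"]
        for child in node.children:
            if child.name in data:
                errors += validate(child, data[child.name], path)
    elif node.kind == "coded" and data not in node.value_set:
        errors.append(f"{path}: {data!r} is not in the bound value set")
    elif node.kind == "ordinal" and not isinstance(data, int):
        errors.append(f"{path}: expected an integer score")
    elif node.kind == "quantity" and not isinstance(data, (int, float)):
        errors.append(f"{path}: expected a number in {node.units}")
    return errors


# Hypothetical pain assessment: one coded node, two non-coded nodes.
PAIN = Node("pain", "group", children=[
    Node("site", "coded", value_set=frozenset({"site:head", "site:chest", "site:abdomen"})),
    Node("severity", "ordinal"),
    Node("duration", "quantity", units="min"),
])

print(validate(PAIN, {"site": "site:chest", "severity": 7, "duration": 45}))
print(validate(PAIN, {"site": "site:knee", "severity": "bad"}))
```

The first call prints `[]`. The second reports two errors: one comes from the terminology-supplied value set, and the other comes from the model’s own ordinal constraint. No value set could have provided that second check.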

Two approaches that have been successfully used to do this are archetypes (Archetype Definition Language, an ISO standard) and Intermountain’s Clinical Element Models (CEMs). CEMs were originally expressed in ASN.1 and, more recently, in CDL and then CEML. Both of these efforts use terminology, and indeed rely on it. But the terminology won’t give you the content models; it just allows you to populate certain parts of them. If this job could be done just by using terminology, I guarantee that a) both openEHR and Intermountain would have done it that way and b) everyone would be awash with content models generated in some wonderful SNOMED tool. But we aren’t. In fact, the whole reason CIMI was set up was to create an internationally agreed approach to content modelling. Two years ago, CIMI settled on the Archetype Definition Language as its language of choice for content modelling. Note that IHTSDO is a CIMI member, and its representatives have been involved in CIMI from the start.

Some of this misconception appears to be due to hubris in years past from a few people in the terminology arena who don’t have a background in information systems, and don’t realise the complexity and sophistication of information modelling in IT. Senior managers hear this kind of thing, and of course it sounds reassuring, because it seems to imply that a) someone else is taking care of the problem and b) that all the money spent on terminology is going to solve everything in the end. Unfortunately, this is misguided, and little progress will be made under this thinking.

Another basic point is that, of the data fields in common content models that could in theory be coded, terminology today covers far less than 100%. For example, if 40% of the data fields in a data set could be coded, then typically only 30-50% of those have actual codes in real terminologies, i.e. at most 20% of the fields in the data set. It is very easy to find elements in the openEHR & 13606 archetypes for which no SNOMED code exists. Intermountain code all of their data elements using their own internal terminology, for which the mappings to external terminologies are only sparse.
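The estimate above is simply the product of two fractions. Here is a trivial sketch using the illustrative figures from that paragraph; they are examples, not measurements, and the function name is hypothetical:

```python
def max_coded_fraction(codeable: float, coded_in_terminology: float) -> float:
    """Share of all fields in a data set that can carry a code from a real terminology.

    codeable: fraction of the data set's fields that could in principle be coded
    coded_in_terminology: fraction of those fields that have a code in a real terminology
    """
    return codeable * coded_in_terminology


# Illustrative figures from the text: 40% codeable, of which 30-50% have real codes.
low = max_coded_fraction(0.40, 0.30)
high = max_coded_fraction(0.40, 0.50)
print(f"between {low:.0%} and {high:.0%} of all fields")  # between 12% and 20% of all fields
```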

I say all of this as someone who was an IHTSDO standing committee member for 4 years (2y on Technical Committee, 2y on Implementation and Innovation Committee), and whose company built a high-performance subsetting terminology service. And I’m a supporter of SNOMED CT and IHTSDO, and actively working with the latter on terminology binding.

But I’m not a supporter of delusional thinking.

About wolandscat

I work on semantic architectures for interoperability of information systems. Much of my time is spent studying biomedical knowledge using methods from philosophy, particularly ontology and epistemology.
This entry was posted in Health Informatics.

11 Responses to Does anyone actually understand what terminology is for?

  1. jpmccusker says:

    SNOMED won’t let you model the world. It’s a concept tree, best expressed in something like SKOS. OBO-based ontologies, SIO, PROV and similar are more formal ontologies that do let you model the world. There’s a spectrum of ontologies, and data models can fit along it just as easily as terminologies.

    • wolandscat says:

      Indeed. I didn’t talk about the difference between terminologies and ontologies here, since we are just discussing an even more basic point. BTW the main graphic on your reference is not displaying.

  2. wolandscat says:

    I love the batman cartoon. But I don’t agree with the spectrum at all, though possibly because of a mismatch in the meaning of the word ‘data’. Many (most?) data models and data dictionaries are created with no regard to proper ontological or even epistemological semantics, so they can’t usually be treated as coherent in a semantic sense. Unfortunately, many models in e-health are built this way, including some standards. So you can’t usually put these kinds of things on a spectrum with ontologies and vocabularies. In fact, many ‘vocabularies’ would also fail a test of semantic coherence.

    In principle, the spectrum should be a good idea, I just doubt it applies anywhere much in the (badly modelled/described) real IT world.

  3. Pingback: Tankar om Socialstyrelsens Gemensamma Informationsstuktur (Thoughts on the Swedish National Board of Health and Welfare’s Common Information Structure) | Oskar Thunman

  4. Grahame says:

    It’s not really different to the morons who say that XML will solve all teh problemz(TM). Comment here seems relevant too:

    • wolandscat says:

      yes indeed. We are always being told there is something that is the current solution to everything…

    • Kevin Coonan MD says:

      I thought you said JSON would solve all the problems.

      • wolandscat says:

        Ha! Not me, but I can guarantee that out there is an army of people who think so. Battalions who entered the field in earlier waves swore XML, and subsequently, REST, would solve all problems. I’ve even heard that HL7 FHIR will solve all problems….

  5. Pingback: Another good one – oldish but still true | openEHR New Zealand

  6. Kevin Coonan MD says:

    I think there is a major disconnect among designers of EHRS, clinical trial management/electronic data capture systems, etc. about what the relationship between clinical models and terminology needs to be. If anyone is considering whether to use ICD-10 vs. SNOMED-CT for an application, they need to consult w/ an expert, as they don’t have the fundamental knowledge to understand the answer.

    You CANNOT create a clinical model agnostic to terminology and still have it be useful at the degree of complexity needed to support a clinician. By support I mean improving data entry efficiency, enabling clinical decision support and automating workflow. If an EHRS cannot help me see more patients and make fewer mistakes, then it will only impede patient care.

    Several years ago, I used the HL7 v3 Clinical Statement to create the models that would be necessary to build detailed clinical models/archetypes. There is a huge gap between the Clinical Statement and usable scaffolding for doing this. Fortunately, it was pretty easy to use openEHR reference models to fill in the gaps. For example, HL7 didn’t have a standard model for how to represent a temporal series, so I “borrowed” (and tweaked) the openEHR pattern.

    I constantly ran into terminology issues even at high-level abstractions. One example is how to represent a clinical problem so that you can link it to the encounters where it was addressed, view its evolution over time, see the basis for a diagnosis, see the latest status/monitoring, and tie it to a plan.

    You cannot do this agnostic to the terminology being used to identify data elements (LOINC in this case) and coded data (SNOMED-CT). Each data element needed a value set which would be a logical subset of whatever abstract model it was refining. There is a need for a systematic way to consistently represent the same sort of “things”. Examples include how to indicate that some finding is currently present, whether it is at baseline for the patient, its severity, anatomic location and temporal course, and the certainty that it was present/absent. If you ever want to use the data that you are forcing someone to enter as coded concepts, you need to figure out HOW you are going to use SNOMED-CT.
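    One way to read the “systematic way” described above is as a single reusable pattern used to record every finding. The following is a hypothetical Python sketch of that idea, not the commenter’s actual model. The attribute names follow the list above. The small string value sets stand in for what would really be SNOMED-CT subsets.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical value sets; in a real system each would be a SNOMED-CT subset.
PRESENCE = frozenset({"present", "absent"})
CERTAINTY = frozenset({"confirmed", "probable", "possible"})
SEVERITY = frozenset({"mild", "moderate", "severe"})
COURSE = frozenset({"acute", "chronic", "intermittent"})


@dataclass(frozen=True)
class Finding:
    """One pattern used for every finding, so the same qualifiers mean the same thing."""
    finding_code: str                # what was found (hypothetical code)
    presence: str                    # currently present or absent
    certainty: str                   # how sure we are about presence/absence
    at_baseline: bool = False        # is this the patient's usual state?
    severity: Optional[str] = None
    site_code: Optional[str] = None  # anatomic location (hypothetical code)
    course: Optional[str] = None     # temporal course

    def __post_init__(self) -> None:
        # Reject any qualifier value that falls outside its bound value set.
        checks = [
            ("presence", self.presence, PRESENCE),
            ("certainty", self.certainty, CERTAINTY),
            ("severity", self.severity, SEVERITY),
            ("course", self.course, COURSE),
        ]
        for name, value, allowed in checks:
            if value is not None and value not in allowed:
                raise ValueError(f"{name}={value!r} is not in its value set")


cough = Finding("finding:cough", presence="present", certainty="confirmed",
                severity="mild", course="acute")
```

    Constructing `Finding("finding:cough", presence="present", certainty="confirmed", severity="very bad")` raises `ValueError`. Because every finding reuses the same qualifiers, queries such as “all severe, confirmed findings” stay meaningful across the whole record.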

    Assuming you are knowledgeable in SNOMED-CT, you will leverage the computable semantics in a way that other terminologies cannot. If I have said it once, I have said it a hundred times (likely more!), ICD-10 was not designed to be a clinical terminology and it is not suitable for use as one.

    If you know how to use SNOMED-CT, you also know how to fill in the gaps via local terminology extensions. These should usually be sent to your national SNOMED-CT coordinating body for consideration for addition to the national or international version of SNOMED-CT. It is important that you provide human-readable definitions of anything that could be the least bit ambiguous based on the term.

    As Stan Huff explained to me a decade and a half ago, when you design your system/models you are going to end up with your own internal vocabulary. If you do this based on SNOMED-CT, in a consistent fashion, you can have future-compatible information (i.e. the context is present). You can then potentially exchange it between facilities and/or applications with some hope of accuracy and fidelity to what really happened w/ the patient.

    Current EHRS are not up to this task. They are decades behind the state of the art in other fields. What we get instead is demonstration after non-interoperable demonstration of the potential that EHRS might SOMEDAY be a useful tool for clinicians. I am optimistic, but it will take a revolution among EHRS customers. They need to realize that they have sunk a lot of money into legacy systems which don’t support patient care, and from which it is nearly intractable (or very expensive) to pull even operational insight, let alone clinical understanding. It is embarrassing that we don’t get real-time clinical effectiveness analytics based on the salient features of our patient when we hit the order menu. It is unacceptable to have to click through endless drop-down boxes which lack the needed concepts and are agonizingly slow to complete.

    The key is that you need to (1) understand the underlying information model(s); (2) understand how clinicians conceptualize and use the information; (3) understand how SNOMED-CT works; and (4) have a vision of how the information you are gathering is going to be used, both now and in its often unforeseen future uses. It’s hard, I get it. You cannot do this w/ amateurs. It shows in the (lack of) usability of the commercial EHRS we are forced to use, in spite of the dangers to patient safety and the huge impediment to delivering and documenting quality care.

    Again, expanding on Tom’s point: you need both a hierarchical, systematic model and a terminology w/ computable semantics which you can tailor to your specifics. But SNOMED-CT needs to be deployed and used in a thoughtful and intentional fashion. Currently, I think trying to use other terminologies to represent coded clinical content is a fool’s errand. To say it again: you cannot build a detailed clinical model (e.g. represented as a deployable archetype) agnostic to your terminology, nor can you expect your terminology to work in a generic model.

    Now I have said too much, and need to get back to the emergency department (aka A&E).

    If anyone wants to see some UML2 diagrams of how I tried to take the HL7 Clinical Statement down to the level of detail in openEHR archetypes, I am happy to share them, as long as they are used only for advancing health information standards (those are the terms the folks who paid for the work gave me). Unfortunately, I cannot share the actual UML2 model, and I have no idea where the value set details are (in some MariaDB/MySQL database on a long-forgotten partition, I am sure). But the terminology is shown, often imperfectly, via the names of value sets. You just have to take my word that children of abstract models were all bound to a value set which was a true subset of the parent concept. All derived classes were created by constraint, similar to how HL7 CDA templates work. In fact, it was expected that the whole model could be implemented as templates on the current Clinical Statement in whatever formalism one wanted. Not all details are shown in the diagrams, so you do need the full set to understand all of the intended constraints/working parts. For example, most diagrams focusing on Acts don’t show all the participations, and the models focusing on Entities don’t show all of their uses. It was a reasonably complex model, and I am sorry I don’t have the XMI and value sets to share.
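    The rule described above, where a child’s value set must be a subset of its parent’s and derivation is done only by constraint, can be sketched as follows. This is a hypothetical illustration, not the commenter’s UML2 model. It uses explicit code sets and set containment, whereas a real implementation would typically test subsumption against the terminology itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundElement:
    """A model element whose coded value is bound to a value set (a set of codes)."""
    name: str
    value_set: frozenset[str]
    parent: Optional[BoundElement] = None

    def specialise(self, name: str, codes: set[str]) -> BoundElement:
        """Derive a child purely by constraint: its value set may only narrow."""
        narrowed = frozenset(codes)
        if not narrowed <= self.value_set:
            extra = sorted(narrowed - self.value_set)
            raise ValueError(f"{name}: codes {extra} are outside parent {self.name}")
        return BoundElement(name, narrowed, parent=self)


# Hypothetical codes standing in for real terminology concepts.
problem = BoundElement("Problem", frozenset({"dx:asthma", "dx:copd", "dx:diabetes"}))
respiratory = problem.specialise("RespiratoryProblem", {"dx:asthma", "dx:copd"})
asthma = respiratory.specialise("AsthmaProblem", {"dx:asthma"})
```

    Calling `respiratory.specialise("Bad", {"dx:diabetes"})` raises `ValueError`, because that code lies outside the respiratory parent’s value set. Constraint can only narrow what a parent allows; it can never add to it.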
