Does anyone actually understand what terminology is for?

I really wonder sometimes. A few months ago, an international organisation that has been looking at how to meet the requirement for scalable, sustainable content modelling (of research data sets) trialled the use of archetypes. This worked fine as far as it went. I subsequently received an email about what they would do next, which contained the line:

“There has also been talk in our senior management about using SNOMED for this type of requirement”.

More recently, a colleague from Norway posted on the openEHR list various quotes from a Gartner report commissioned by the Norwegian government. The one most relevant here (this comes from a Norwegian report) is:

“National ICT has chosen archetypes as a method for structuring EHR data. It is unclear whether other options have been considered, for example SNOMED-CT in combination with ICD-10 as used in many of the leading systems internationally.”

Where to start with this? It appears that the authors don’t know the difference between terminology and information models.

Erik Sundvall put it nicely on the list:

Whoever wrote that needs an update on the relationship between…
a) Concept Model or “Ontology” (Includes terminology systems like SNOMED-CT and ICD-10) and
b) Information Model (detailed information structures for example clinical ones based on archetypes).

A classic (somewhat informatics-nerdy but good) explanation of all this (plus decision support rules) is Alan Rector et al.’s paper “Models and Inference Methods for Clinical Systems: A Principled Approach”. I have not heard anybody seriously question the main conclusions of that paper.

To create an EHR system you need information models (“b” above) to put the terminology codes (“a” above) in. Information models like openEHR support the use of both ICD-10 and SNOMED CT.

For those who know anything at all about philosophy, one easy way to remember the difference is that

  • terminology represents an ontological view, i.e. a formal description of general truths of a domain, e.g. relationships between diseases, biomedical entities in the patient, etc.; and
  • information/content models represent an epistemological view, i.e. a knowledge-gathering activity, where the ‘knowledge’ is about individuals (patients, patient events, etc. in the medical domain).

These are two different things. The model of information corresponding to blood pressure measurements on Peter Smith isn’t (at all) the same as the ontological description of ‘blood pressure’. The information describing Peter’s symptoms in an examination is not the same as the ontological description of the possible diseases he may have. This point is so basic, it’s hard to understand how anyone working in e-health doesn’t get it.

Concretely, information models and terminologies (and/or formal ontologies) need a relationship that enables information about X to be associated with a description of X. This improves interoperability, and enables inferencing about facts contained in a person’s health data. This is called terminology binding, and although it is not a completely solved problem, it has been a major area of activity for nearly 20 years.
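To make terminology binding a little more concrete, here is a minimal Python sketch under loudly stated assumptions: the terminology codes (`T100` etc.), the node paths and the function names are all hypothetical, and real binding (e.g. of archetype nodes to SNOMED CT) is considerably richer. It keeps the two views apart: the terminology holds general is-a facts, the records hold facts about Peter Smith, and the binding is what lets a query phrased in terminology terms ("anything that is a blood pressure") find the right parts of the record.

```python
from dataclasses import dataclass

# Ontological side: a hypothetical terminology, code -> (preferred term, parent code).
TERMINOLOGY = {
    "T100": ("Vital sign", None),
    "T110": ("Blood pressure", "T100"),
    "T111": ("Systolic blood pressure", "T110"),
    "T112": ("Diastolic blood pressure", "T110"),
    "T120": ("Body temperature", "T100"),
}

def is_subsumed_by(code, ancestor):
    """True if `code` equals `ancestor` or lies below it in the is-a hierarchy."""
    while code is not None:
        if code == ancestor:
            return True
        code = TERMINOLOGY[code][1]
    return False

# Terminology binding: hypothetical information-model node paths -> terminology codes.
BINDINGS = {
    "/blood_pressure/systolic": "T111",
    "/blood_pressure/diastolic": "T112",
    "/body_temperature/temperature": "T120",
}

# Epistemological side: facts recorded about one individual.
@dataclass
class DataValue:
    patient: str
    path: str         # where the value sits in the information model
    magnitude: float
    units: str        # UCUM unit strings

RECORDS = [
    DataValue("Peter Smith", "/blood_pressure/systolic", 142, "mm[Hg]"),
    DataValue("Peter Smith", "/blood_pressure/diastolic", 91, "mm[Hg]"),
    DataValue("Peter Smith", "/body_temperature/temperature", 37.2, "Cel"),
]

def find(records, ancestor_code):
    """Select recorded values whose bound code is subsumed by `ancestor_code`."""
    return [r for r in records
            if r.path in BINDINGS and is_subsumed_by(BINDINGS[r.path], ancestor_code)]

bp_values = find(RECORDS, "T110")
print([(r.path, r.magnitude) for r in bp_values])
# [('/blood_pressure/systolic', 142), ('/blood_pressure/diastolic', 91)]
```

Asking for `T100` (vital sign) would return all three values, even though no record carries that code directly; that is the kind of inference the binding makes possible. Neither side could do this alone: the terminology knows nothing about Peter, and the record on its own knows nothing about what a blood pressure is.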

Just to get it off my chest, I’ll quote further from an email I wrote on these kinds of misconceptions a while ago (this is specific to archetypes, so it doesn’t mention e.g. HL7 TermInfo, CDA terminology binding or anything like that).

There is a generalised misconception in some quarters that SNOMED CT or terminology in general will enable you to model content. It won’t. It enables you to model value sets for coded fields within content models, and to code some field names (or ‘nodes’ in a model). But if you want to build a model for a data set consisting of numerous data items, typically in some hierarchical structure, with some coded and some non-coded (e.g. quantitative, ordinal) nodes, then you need a content modelling approach.
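A minimal Python sketch of what a content model has to express, assuming a toy node structure that is loosely archetype-like but is not ADL or any real archetype: a hierarchical data set mixing quantity, ordinal, coded and text nodes. The names `Node`, `leaves` and `validate` and the `W-...` codes are hypothetical. A terminology would supply only the value set for the coded node (and, optionally, codes for node names); the hierarchy, units, ordinal range and validation rules all belong to the content model.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node in a hypothetical content model."""
    name: str
    kind: str                           # "group", "quantity", "ordinal", "coded" or "text"
    units: Optional[str] = None         # quantity nodes only
    value_set: Optional[list] = None    # coded and ordinal nodes only
    children: list = field(default_factory=list)

def leaves(node):
    """Return the data-carrying nodes (those without children), depth first."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def validate(model, data):
    """Check a flat {leaf name: value} instance against the model; return error strings."""
    errors = []
    for leaf in leaves(model):
        if leaf.name not in data:
            continue  # every node is optional in this sketch
        value = data[leaf.name]
        if leaf.kind == "quantity":
            _, units = value
            if units != leaf.units:
                errors.append(f"{leaf.name}: expected units {leaf.units}, got {units}")
        elif leaf.kind in ("coded", "ordinal"):
            if value not in leaf.value_set:
                errors.append(f"{leaf.name}: {value!r} not in value set")
        elif leaf.kind == "text" and not isinstance(value, str):
            errors.append(f"{leaf.name}: expected text")
    return errors

MODEL = Node("post-operative observation", "group", children=[
    Node("vital signs", "group", children=[
        Node("systolic BP", "quantity", units="mm[Hg]"),
        Node("heart rate", "quantity", units="/min"),
    ]),
    Node("pain score", "ordinal", value_set=list(range(11))),
    # Hypothetical codes; in practice this value set would come from a terminology.
    Node("wound appearance", "coded", value_set=["W-healing", "W-inflamed", "W-discharging"]),
    Node("comment", "text"),
])

instance = {
    "systolic BP": (128, "mm[Hg]"),
    "heart rate": (84, "/min"),
    "pain score": 3,
    "wound appearance": "W-inflamed",
    "comment": "Patient mobilising well.",
}

print(validate(MODEL, instance))                          # []
print(sorted(Counter(n.kind for n in leaves(MODEL)).items()))
# [('coded', 1), ('ordinal', 1), ('quantity', 2), ('text', 1)]
```

Only one of the five data items draws its values from a terminology; everything else in the sketch is structure that a terminology, however good, does not provide.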

Two approaches that have been successfully used to do this are archetypes (Archetype Definition Language, an ISO standard) and Intermountain’s Clinical Element Models (CEMs), which were originally expressed in ASN.1 and more recently in CDL and then CEML. Both of these efforts use terminology, and indeed rely on it. But the terminology won’t give you the content models; it just allows you to populate certain parts of them. If this job could be done just by using terminology, I guarantee that a) both openEHR and Intermountain would have done it that way and b) everyone would be awash with content models generated in some wonderful SNOMED tool. But we aren’t, and in fact the whole reason CIMI was set up was to create an internationally agreed approach to content modelling – 2 years ago, CIMI settled on the Archetype Definition Language as its language of choice for content modelling. Note that IHTSDO is a CIMI member, and its representatives have been involved in CIMI from the start.

Some of this misconception appears to be due to hubris in past years from a few people in the terminology arena who lack a background in information systems, and don’t realise the complexity and sophistication of information modelling in IT. Senior managers hear this kind of thing, and of course it sounds reassuring, because it seems to imply a) that someone else is taking care of the problem and b) that all the money spent on terminology will solve everything in the end. Unfortunately, this is misguided, and little progress will be made under this thinking.

Another basic point is that terminology today covers far less than 100% of the data fields in common content models that could in theory be coded. For example, if 40% of the fields in a data set could be coded, typically only 30-50% of those have actual codes in real terminologies, i.e. at most 20% of the fields in the data set. It is very easy to find elements in the openEHR and ISO 13606 archetypes for which no SNOMED code exists. Intermountain code all of their data elements using their own internal terminology, which has only sparse mappings to external terminologies.
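The coverage estimate above is a simple product of two fractions. A small Python sketch of that arithmetic, using the post's illustrative figures (40% codable, 30-50% of those actually coded); the function name is hypothetical:

```python
def coded_fraction_range(codable, coverage_low, coverage_high):
    """Fraction of all fields in a data set that can carry a real terminology code.

    codable:       fraction of fields that could in principle be coded
    coverage_*:    fraction of those codable fields that real terminologies actually cover
    """
    return codable * coverage_low, codable * coverage_high

low, high = coded_fraction_range(0.40, 0.30, 0.50)
print(f"{low:.0%} to {high:.0%} of all fields")  # 12% to 20% of all fields
```

So even the optimistic end of the range leaves four fifths of the data set's fields to be handled by the content model rather than the terminology.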

I say all of this as someone who was an IHTSDO standing committee member for 4 years (2 years on the Technical Committee, 2 on the Implementation and Innovation Committee), and whose company built a high-performance subsetting terminology service. And I’m a supporter of SNOMED CT and IHTSDO, and am actively working with the latter on terminology binding.

But I’m not a supporter of delusional thinking.

About wolandscat

I currently work in e-health, and am senior architect of the specifications, designed for semantic interoperability of health information. I also designed the Archetype formalism and model used in openEHR. Outside of work, I am interested in guitar, travel, and philosophy.
This entry was posted in Health Informatics.

8 Responses to Does anyone actually understand what terminology is for?

  1. jpmccusker says:

    SNOMED won’t let you model the world. It’s a concept tree, best expressed in something like SKOS. OBO-based ontologies, SIO, PROV, and similar are more formal ontologies that do let you model the world. There’s a spectrum of ontologies, and data models can fit along it just as easily as terminologies.

    • wolandscat says:

      Indeed. I didn’t talk about the difference between terminologies and ontologies here, since we are just discussing an even more basic point. BTW the main graphic on your reference is not displaying.

  2. wolandscat says:

    I love the batman cartoon. But I don’t agree with the spectrum at all, possibly because of a mismatch in the meaning of the word ‘data’. Since many (most?) data models and data dictionaries are created with no regard to proper ontological or even epistemological semantics, they can’t usually be treated as coherent in a semantic sense. Unfortunately many models in e-health are built this way, including some standards. So you can’t usually put these kinds of things on a spectrum with ontologies and vocabularies. In fact, many ‘vocabularies’ would also fail a test of semantic coherence.

    In principle, the spectrum should be a good idea, I just doubt it applies anywhere much in the (badly modelled/described) real IT world.

  3. Pingback: Tankar om Socialstyrelsens Gemensamma Informationsstuktur | Oskar Thunman

  4. Grahame says:

    It’s not really different to the morons who say that XML will solve all teh problemz(TM). Comment here seems relevant too:

    • wolandscat says:

      yes indeed. We are always being told there is something that is the current solution to everything…

  5. Pingback: Another good one – oldish but still true | openEHR New Zealand
