Ontologies in health: ready for prime time? IAO versus openEHR

A lot of ontology work has been going on for some years under the loose umbrella of the BFO and OBO activities, and it stands to improve how computing in health is done. BFO is the Basic Formal Ontology, and OBO is the Open Biological and Biomedical Ontologies. Work from these efforts is currently being used to better structure the upper level of SNOMED CT, in cooperation with the IHTSDO, its owning organisation.

This week I had the opportunity to read a new paper by André Q Andrade, Maurício B Almeida and Stefan Schulz, entitled “Revisiting ontological foundations of the OpenEHR Entry Model” (PDF). This paper seeks to analyse the openEHR ‘clinical investigator ontology’ which Dr Sam Heard and I published in a MedInfo 2007 paper, using the Information Artefact Ontology (IAO) as the reference.

IAO ontology hierarchy

We need such a review, because openEHR archetypes are being created apace, and their classification and internal structure are not as disciplined as they need to be. This isn’t just an aesthetic concern: it may impact on computability.

The IAO tries to fill the void in realist ontologies by defining an ontology of information. It’s the right thing to compare the openEHR Entry ontology to.

You can see the openEHR ontology below.

Here, the upper-case blue labels such as OBSERVATION etc are the names of classes in the openEHR Entry package.

This ontology is defined to be about information, specifically recordings in health, so we used short names like ‘history’ and ‘observation’ for the categories, on the understanding that these could not be confused with realist categories.

On review, this paper is somewhat disappointing, primarily because the IAO, which I had expected to be in good shape, is in fact far from it. I think it will be of little use in health until it undergoes a complete review, preferably with some domain experts as participants. This isn’t a criticism of the IAO authors or of their intention, which is correct. The ontology just needs to be better designed.

Other than that, the paper is a pleasure to read simply for the reason that the authors understand ontologies properly and don’t make the uninformed errors pervasive in many health informatics papers.

A few comments on the paper.

From p. 3:
The proposed merged ontology can be seen in figures 2 and 3. While the rationale is quite different, the IAO proved capable of faithfully representing the meaning of each information type. Observation is a Data item resulting from the medical encounter, being a description of an entity, usually, the patient. By classifying the other classes according to their intended outcome, we merged the Proposal classes under Objective specification and Instruction classes under Plan specification. Finally, Action was represented as a special type of report, since it necessarily describes a process that has the patient as participant.

There are some misunderstandings of the openEHR ontology categories here (which may well be the fault of our original paper, as it was a 20-page paper squashed into the 5-page format demanded by MedInfo):

  • Observation: is significantly broader than presented here: it may be a description of a whole entity (patient), but much more usually of some part of the patient (e.g. auscultation of chest), a process of the patient (e.g. blood pressure measurement), something about a tissue or sample from the patient (e.g. microbiological assay), or a report of patient circumstances relevant to care, e.g. any ‘story’ of the patient (which might be about the job or home situation), facts about family, etc.
  • The openEHR Action type might describe a process (like surgery or drug administration) that has the patient as a participant, but it can describe any action performed by the healthcare system on behalf of the patient, including booking for surgery, dispensing a drug, etc. This is of critical importance in tracking processes that are for the patient, not just data about the patient.

In any case, IAO:data_item can include anything at all, since it has no epistemic sub-classifications. (See this note on the ontology: “…I think I might defer to Barry, or to Brian Cantwell Smith JAR: A data item is an approximately justified approximately true approximate belief, 2/2/2009”).

The siblings of IAO:data_item are not in any case correct in my view. There is a sibling IAO:measurement_item whose definition is apparently: “Examples of measurement data are the recoding of the weight of a mouse as {40,mass,"grams"}, the recording of an observation of the behavior of the mouse {,process,”agitated”;}, the recording of the expression level of a gene as measured through the process of microarray experiment {3.4,luminosity,}”.

Just because something is ‘measured’ doesn’t mean it isn’t ‘data’. Measurement is a means of data acquisition, not a definitional category.

Similarly, cartesian spatial coordinate datum is not an appropriate sibling of either of the above; it is some kind of data that happens to have a certain structure/format.

Lastly, the kind of information that is about ‘models’ / ‘ontologies’ is mixed up with these other types, in the class IAO: data about an ontology part. That’s really surprising. In a ‘normal’ document about real things, you will never see such a thing. In a document about models or ontologies, you will only see such things, and no factual instances whose structure and/or meaning the model is supposed to describe (other than by way of example, which would require a category of something like ‘ontology_instance_example’).

This last one tells us that there probably needs to be a sibling of information content entity such as ontology_content_entity, and possibly also model_content_entity, depending on whether we think that the latter is a kind of the former or vice versa.

So four things have been conflated here under IAO:data_item:

  • the essence of ‘data’, i.e. some possibly true justified belief about something to do with the topic of the information (here: care of the patient)
  • the means of obtaining it: measurement versus other means
  • the format or structure
  • information about models and ontologies
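To make the point concrete, here is a minimal Python sketch of how these four concerns might be kept apart rather than turned into sibling classes. Every class, field and enum value below is hypothetical and chosen only for illustration; none of them comes from IAO or from the openEHR specifications.

```python
from dataclasses import dataclass
from enum import Enum


class Acquisition(Enum):
    """How the datum was obtained (hypothetical values)."""
    MEASUREMENT = "measurement"
    PATIENT_REPORT = "patient_report"


class Structure(Enum):
    """Format or structure of the value (hypothetical values)."""
    QUANTITY = "quantity"
    CARTESIAN_COORDINATE = "cartesian_coordinate"
    FREE_TEXT = "free_text"


@dataclass(frozen=True)
class DataItem:
    """The essence of 'data': a belief about something, with orthogonal facets."""
    about: str                 # topic of the information, e.g. "body weight"
    acquisition: Acquisition   # the means of obtaining it
    structure: Structure       # the format of the value
    value: object


@dataclass(frozen=True)
class OntologyContentItem:
    """Information about a model or ontology, kept apart from DataItem."""
    ontology_part: str
    statement: str


# Illustrative instances with made-up values.
weight = DataItem("body weight", Acquisition.MEASUREMENT, Structure.QUANTITY, (40, "g"))
story = DataItem("home situation", Acquisition.PATIENT_REPORT, Structure.FREE_TEXT, "lives alone")
note = OntologyContentItem("data_item", "definition under discussion")

# 'Measured' is one facet value of a data item, not a separate kind of thing.
measured = [d for d in (weight, story) if d.acquisition is Acquisition.MEASUREMENT]
```

In this arrangement a measured weight and a patient’s story are both data items that differ only in their facet values, while statements about the ontology itself live in a separate class.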

Even worse, many of the children of information content entity (the parent of data_item!) are just ‘information in different forms’, such as diagrams and symbols.

There is evidently some work to go on getting IAO into shape before it could reasonably be used for this kind of comparison.

Also from p. 3:

However, several epistemic entities were not successfully modelled, as they are not properly representable in realist ontologies. As an example, consider the metadata “confounding factors”, defined as “Comment on and record other incidental factors that may be contributing to the blood pressure measurement. (…), level of anxiety or ‘white coat syndrome’; pain or fever; changes in atmospheric pressure etc.” Events such as pain and changes in atmospheric pressure have little or nothing in common that could map them to one category in an ontology. E.g. a confounding factor can be a process, a disposition, or a quality. Whether such “non-ontological” classes – characterized as “defined classes” by (Smith et al., 2006) – belong in an ontology at all, is contentious. However they can represented by logical definitions in an OWL model (Schulz et al., 2011).

The data points referred to here are in the correct place in the openEHR ontology, which is the ‘state’ property of the Observation type. ‘State’ here refers to patient state, and is part of a generalised data/state/protocol model of recording clinical information. Briefly:

  • ‘data’ – the focal datum being acquired, e.g. heart rate, blood pressure etc
  • ‘state’ – facts about the state of the patient as a whole organism that are required to interpret the data, e.g. exertion level, position, anxiety level.
  • ‘protocol’ – information about the methods, instruments etc of observation.

It has been argued (reasonably) e.g. by Dr Dipak Kalra that a fourth category ‘justification’ should be included here.
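The data/state/protocol split described above can be sketched as a simple record type. This is a deliberately simplified illustration, not the openEHR reference model: the Observation class, the dictionary keys and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Simplified, hypothetical stand-in for an Observation recording."""
    data: dict                                     # the focal datum being acquired
    state: dict = field(default_factory=dict)      # patient state needed to interpret the data
    protocol: dict = field(default_factory=dict)   # methods, instruments etc. of observation


# Made-up blood pressure recording; keys and values are illustrative only.
bp = Observation(
    data={"systolic_mmHg": 142, "diastolic_mmHg": 91},
    state={
        "position": "sitting",
        "exertion": "at rest",
        "confounding_factors": "anxious; 'white coat' effect suspected",
    },
    protocol={"instrument": "sphygmomanometer"},
)


def interpretive_context(obs: Observation) -> list:
    """Return the names of the state items a reader needs to interpret obs.data."""
    return sorted(obs.state)


print(interpretive_context(bp))
```

The point of the sketch is only that ‘confounding factors’ has an obvious home in the state part of the record, next to position and exertion, rather than being mixed in with the focal datum.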

It is not surprising that information like ‘confounding factors’ is not seen as a candidate for being in any current ontology: for that to be the case, the ontology would need to recognise something like the data/state/protocol model of measuring data on an organism, or at least an ontological entity such as ‘interpretative meta-data’, which as far as I know doesn’t exist in IAO.

So the statement ‘several epistemic entities were not successfully modelled’ really points to a need for changes to the IAO to include either new categories, or preferably properties on existing categories, e.g. a status like ‘interpretative meta-data’.

There are some criticisms of the openEHR BP archetype that appear reasonable (end of p. 3):

This is clearly shown by the lack of rigor in the distinction between the 4th and 5th sounds, which refer to perceptive capabilities of the actor, defined as (our emphasis) “phase IV, sounds become muffled and softer; and phase V, sounds disappear completely. The fifth phase is thus recorded as the last audible sound” (Pickering et al., 2005).

The descriptions used here are precisely those understood by working physicians using a sphygmomanometer. I don’t believe it is up to archetypes, as information modelling objects, to define 4th and 5th Korotkoff sounds in their factual sense: that is precisely the job of OGMS and/or other ontologies such as FMA. What the archetype needs (and lacks) is connections to these, although it is not currently clear to me if they would actually help in any practical computing sense.

In the section on Action archetypes, there are some misunderstandings (which again may be due to our own highly summarised paper, although the openEHR specifications do describe the Action type in detail). From p. 4:

We examined the Medication Action Archetype, which represents one of the most commonly described healthcare interventions. Its precise reconstruction was not straightforward, as it included states that contradict the existence of a process, e.g. Cancelled or Postponed states. In other words, a cancelled process is not a kind of process, since the process never actually took place. Therefore, a different treatment is required, as only plans about medication administration processes can be cancelled or postponed, not the processes themselves (Raufie et al., 2011; Schulz & Karlsson, 2011).

Here the authors are being too theoretical, and may have misunderstood what is going on. If we remember that all categories in the openEHR ontology are information about some real thing or process, this also goes for the Action category. Instances of the Action category indicate states of the process of the intervention which they document. So Actions that document the cancelling, postponement or suspension of a process don’t contradict the existence of the process; they simply document its being cancelled etc.

I would suggest that state models of real processes which are being documented by IAO or similar recording entities need to be taken account of in the IAO in some way. We did this in openEHR with state machine related attributes in the Action type.
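As a rough illustration of the idea, the sketch below treats each Action as a record that an intervention has entered a given state, and derives the current state by replaying those records through a small state machine. The state names and allowed transitions are simplified assumptions for this example; they are not the state machine defined in the openEHR specifications.

```python
from dataclasses import dataclass
from enum import Enum


class ProcessState(Enum):
    """Hypothetical, simplified states of a documented intervention."""
    PLANNED = "planned"
    ACTIVE = "active"
    POSTPONED = "postponed"
    SUSPENDED = "suspended"
    CANCELLED = "cancelled"
    COMPLETED = "completed"


# Allowed transitions (illustrative only).
TRANSITIONS = {
    ProcessState.PLANNED: {ProcessState.ACTIVE, ProcessState.POSTPONED, ProcessState.CANCELLED},
    ProcessState.POSTPONED: {ProcessState.PLANNED, ProcessState.CANCELLED},
    ProcessState.ACTIVE: {ProcessState.SUSPENDED, ProcessState.COMPLETED},
    ProcessState.SUSPENDED: {ProcessState.ACTIVE, ProcessState.CANCELLED},
    ProcessState.CANCELLED: set(),
    ProcessState.COMPLETED: set(),
}


@dataclass(frozen=True)
class Action:
    """A record documenting that the intervention entered a given state."""
    description: str
    resulting_state: ProcessState


def current_state(actions, initial=ProcessState.PLANNED):
    """Replay the recorded Actions in order and return the state they lead to."""
    state = initial
    for action in actions:
        if action.resulting_state not in TRANSITIONS[state]:
            raise ValueError(f"{state.name} -> {action.resulting_state.name} is not allowed")
        state = action.resulting_state
    return state


# Made-up log: the drug was never administered, yet both records are valid information.
log = [
    Action("Administration postponed: patient fasting", ProcessState.POSTPONED),
    Action("Order withdrawn by prescriber", ProcessState.CANCELLED),
]
print(current_state(log).name)  # CANCELLED
```

Nothing here claims that a cancelled administration is a kind of administration. The log contains two recordings, each of which is a perfectly real piece of information about what happened to the planned intervention.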

In conclusion, I welcome analyses such as the one given by the paper I have reviewed here, but I think they will be of limited value until far more substantial work is done on ontologies like IAO. I also think we need to consciously try to understand what utility such ontologies provide for real health computing. Right now, this has not been articulated. I can think of one example: if archetype data points were properly linked to ontologies that knew about things like data/state/protocol, we would be able to machine search and review such archetypes in ways we cannot today.

Right now it seems to me that the openEHR Entry ontology is far more useful in e-health than the IAO, for understanding, modelling and computing with real health information.


About wolandscat

I currently work in e-health, and am senior architect of the openEHR.org specifications, designed for semantic interoperability of health information. I also designed the Archetype formalism and model used in openEHR. Outside of work, I am interested in guitar, travel, and philosophy.

3 Responses to Ontologies in health: ready for prime time? IAO versus openEHR

  1. André Andrade says:

    First, thanks for reading the paper so thoroughly – it makes writing it worthwhile. Also, thanks for the criticisms, they are essential for improvement, even if I don’t agree with some of them. As one of the authors, let me make some notes:
    1) About IAO: the ontology is domain independent, and it should remain so – perhaps OGMS is a better place to put domain-related classes. However, there are indeed few users of the ontology in the EHR community. If there are important changes to be made, I am sure they will be considered as it is an eminently open process. That being said, IAO is currently being used in several projects, particularly the OBI (Ontology for Biomedical Investigations), quite successfully. There is, IMO, a pretty robust framework based on the notion of persistence – every information content entity is dependent on some artifact and is about some entity. “Data item” is actually superclass of “measurement datum”, not its sibling. I agree that the process that generates the information is important, but there is a difference between my height (quality) and my height as measured by a measurement tape (measurement datum). It is not conflating process and information, it is saying that 180cm is an information entity about my height (quality) which is the output of a height measuring with a measuring tape process. If you require that as a class (e.g. “length measurement datum”), this is the place. There is also a difference between the pattern of black and white in a paper, which can be figures or texts, and the fact they are an assertion that is a truthful statement (though I think your interpretation makes sense, and this aspect could be improved).
    2) Thanks for clarifying Observation and Action, the formal logical definition is indeed too restrictive.
    3) This paper is part of a research whose objective is to evaluate if it is possible to fit information models into an ontological framework represented in a logical language. Some considerations are indeed theoretical, and based on previous research on ontologies. For instance, realist ontologies require existence, therefore a canceled surgery is not a surgery, but a plan. OpenEHR action state has no sense in BFO ontologies as they merge different hierarchies. This is not intended as a criticism, just a different model. But the rationale for different hierarchies in BFO is very widely discussed, and there are cases where state is ambiguous when translated to a different model (I would refer to the paper “Bridging the semantics gap between terminologies, ontologies, and information models”). I am not arguing that openEHR is ambiguous within, but that parts of it only make sense if your whole world is OpenEHR based, and all archetype creators fully understand that. And I would insist that on the long term, allowing archetypes being created without a formal modeling framework is not sustainable – mapping inconsistent archetypes will be still very difficult.
    4) My goal was not to compare IAO and OpenEHR, and certainly not a “versus” comparison. The motivation of the research is that information models such as OpenEHR set a lot of context to represent medical statements, as medical language is the basis for modeling. E.g. if location is in a symptom archetype, it means body part. If it is in a demographic archetype, it means an address. Pain is a confounding factor, but exertion is not, simply because it is being considered in the archetype. This is fine when developing EHR systems. Ontologies are, according to Mathias Brockhausen, the art of teaching computers. It requires seeing the very trivial distinctions that humans ignore and make them explicit. In the CIR ontology, “Action” is connected through taxonomic links to “Recorded Information”, i.e. it is a kind of “Recorded Information”. Obviously, that is not true, but such a distinction is not really necessary for people. Computer science ontologies are intended to help integrate heterogeneous sources using logics, and we wanted to evaluate if ontologies can represent medical language, even if some epistemic distinctions are not captured. In fact, our final conclusion is that IAO is actually very helpful in solving these context-dependent ambiguities – not that it is better overall, it is just suited for other purposes. On the other hand, OpenEHR archetypes showed that it lacks very important entities, such as a causation relation, which should probably be introduced in application ontologies. If alignment is possible, it would bring advantages to both sides.

    • wolandscat says:

      André, the ‘openEHR versus IAO’ thing was for fun, since it’s a blog, not a journal, and people love reading about that kind of thing. I understand of course that your purpose was not a comparison as such.

      Clearly domain specific things belong in OGMS, FMA or wherever. However, concepts like those in the openEHR CIR ontology arguably fall in a space in between – they can apply to a very wide domain – all of science and medicine, if you agree to the idea of a rational problem-solving process as the paradigm of investigation and documentation. It is a question as to where IAO stops, and where domain-specific ontologies start.

      I still have problems with IAO’s sibling categories at both the information_content_entity and data_item levels. At both levels, I can’t see how the categories are mutually exclusive and/or exhaustive. In health they just wouldn’t work. Some children of the former category seem to be a kind of DTP model of content types that Word or FrameMaker or DITA would know about. That’s probably ok. But ‘narrative_object’ and ‘directive_information_entity’ & children seem to be categories relating to the content, not the form (i.e. symbol/figure etc).

      And then all these are siblings of ‘data_item’, which is a contextual role not a kind of content in most realistic DP contexts. And the categories measurement_datum and setting_datum seem an attempt to cover the epi. categories a posteriori and a priori. We thought that would be useful many years ago, but it turns out to not work very well in health – a very information intensive field!

      In any case, these latter categories are to do with how the information entities were created, not what they look like nor what their content is. So we have sibling categories at 2 levels based on at least 3 different, generally orthogonal criteria. Usually that is bad in an ontology…

      I think IAO needs to be tested against some real domains. Health is a good place to start.
