Information models, DCMs and Archetypes

I will be attending a ‘Fresh Look’ meeting in Washington next week. The idea is to make some progress on the topic of ‘detailed clinical models’ (DCMs). Some of the goals include setting up a repository of DCMs, establishing governance, and defining a roadmap for tooling. Underlying all this is a huge list of formalisms and models, including OWL, UML, ADL, HL7 MIF, XSD, LRA, RMIMs, CDA templates, greenCDA and so on.

Are models like Chinese food?

Some people in health informatics appear to believe that ‘models’ and ‘formalisms’ are just a detail to be worked out later by software developers, and that any combination of models can be interchanged, combined and converted, rather like the menu in a cheap Chinese restaurant. They could not be further from the truth.

Models are essentially formalised structural expressions – they are mathematical in nature. There are not only different models, but different kinds of models. Indeed, models cannot sensibly be discussed without talking about things like:

  • the overall modelling framework, in which different types of models serve distinct functions;
  • the underlying formalisms, each of which has its own philosophy and mathematical properties;
  • the quality of the individual models;
  • whether the models in a framework are mutually coherent.

For these reasons, we cannot compare apples with oranges, e.g. XSD (a data exchange schema definition language) with OWL (a description-logic based ontology language), nor can we just assume that ‘EHR models’ such as openEHR/13606, CDA and CCR are interchangeable.

We need at least three things to make models work:

  • a framework – i.e. a theoretical system of formalisms that will satisfy overall needs
  • an architecture – i.e. base models and patterns which provide the basis for modelling in a certain domain
  • models – the specific models for the problem at hand.

No conversation about ‘models’ can be sensible unless people are talking within the same framework (or understand the differences between frameworks), and within the same architecture (or they understand the differences across architectures).

A modelling framework for DCMs

What kind of thing can be used as a modelling framework for building content models in e-health? As discussed in a previous post, we have found in openEHR over the last few years that it consists of 3 irreducible levels of modelling, as well as two further key elements:

  • modelling level #1: the reference model – this defines information as persisted, shared etc, and is expressed as an object-oriented model;
  • modelling level #2: archetypes – data point / group definitions – e.g. define the possible items in a ‘systemic arterial BP recording’;
  • modelling level #3: templates – data sets defined by aggregating and refining archetypes – e.g. to define a clinical document, pathology report, or any other use-case specific data set;
  • terminology interface: a way of formally connecting elements in the three levels of information modelling to terminologies
  • query formalism: AQL – a query language based not on physical database schemas, but on the archetypes, enabling queries to be defined alongside clinical models.

These elements are shown below.

You can’t get rid of the reference model, because it defines the concrete form of the data (like Quantity, CodedText, Observation etc). You can’t get rid of archetypes because they enable you to define a library of clinical content elements like ‘blood pressure – systolic’ and groups like ‘diagnosis – occurrences’ once, and you can’t get rid of templates because they are where you put the archetype bits together to make real-world data sets, define messages, forms etc. If you get rid of archetype-based querying, you are stuck with SQL statements defined against some concrete schema. And without binding to terminology, you can’t state the relationship of terms and ref sets to information structures.
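The division of labour between the three levels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not openEHR code: the names `QuantityConstraint`, `BP_ARCHETYPE`, `make_template` and `validate`, and all numeric ranges, are hypothetical and chosen only to show a reference-model type, an archetype constraining it, and a template narrowing the archetype.

```python
from dataclasses import dataclass

# Level 1: reference model: the concrete form of the data.
@dataclass(frozen=True)
class Quantity:
    magnitude: float
    units: str

# Level 2: archetype: constraints on reference-model instances.
@dataclass(frozen=True)
class QuantityConstraint:
    units: str
    low: float
    high: float

    def accepts(self, q: Quantity) -> bool:
        return q.units == self.units and self.low <= q.magnitude <= self.high

# Hypothetical archetype for a blood pressure recording (illustrative ranges).
BP_ARCHETYPE = {
    "systolic": QuantityConstraint("mm[Hg]", 0, 1000),
    "diastolic": QuantityConstraint("mm[Hg]", 0, 1000),
}

# Level 3: template: selects and narrows archetype nodes for one use case.
def make_template(archetype, keep, narrowed=None):
    narrowed = narrowed or {}
    template = {}
    for name in keep:
        base = archetype[name]
        c = narrowed.get(name, base)
        # In this sketch a template may only tighten an archetype, never loosen it.
        if c.units != base.units or c.low < base.low or c.high > base.high:
            raise ValueError(f"template widens archetype node {name!r}")
        template[name] = c
    return template

def validate(template, data):
    """Return the names of nodes that are missing or violate the template."""
    return [name for name, c in template.items()
            if name not in data or not c.accepts(data[name])]

# Hypothetical template: systolic only, with a narrower range.
SCREENING = make_template(
    BP_ARCHETYPE, keep=["systolic"],
    narrowed={"systolic": QuantityConstraint("mm[Hg]", 0, 300)},
)
```

Calling `validate(SCREENING, {"systolic": Quantity(120, "mm[Hg]")})` returns an empty list, while a value outside the narrowed range is reported by node name. The point of the sketch is only that each level does a different job; real archetypes and templates are far richer than a dictionary of ranges.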

A realistic Modelling Architecture

To make the framework really work for us, we need to define various levels of model authoring, sharing, and add some tooling so that we can get useful software artefacts. The following picture shows a real world architecture as used in openEHR.

The key to this architecture is the computable use of templates to create an ‘operational template’ (as if you had built just a custom model on its own), from which software APIs, message definitions, XML schemas and so on can be generated. These downstream artefacts are what software developers work with. This presentation tells the story in a bit more detail.
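As a rough illustration of what ‘generated from an operational template’ can mean, the sketch below derives two downstream artefacts from one flattened definition. Everything here is an assumption made for the example: the `OPERATIONAL_TEMPLATE` structure, its node names, and the two generator functions are hypothetical and do not reflect the actual openEHR operational template format or tooling.

```python
from dataclasses import make_dataclass, fields

# Hypothetical operational template: a flat list of (field name, type, units).
# In a real tool chain this would be produced by flattening a template over
# its archetypes; here it is written by hand.
OPERATIONAL_TEMPLATE = {
    "name": "BloodPressureScreening",
    "nodes": [
        ("systolic", float, "mm[Hg]"),
        ("diastolic", float, "mm[Hg]"),
        ("position", str, None),
    ],
}

def generate_api_class(opt):
    """Generate a plain data class with one attribute per template node."""
    return make_dataclass(opt["name"], [(n, t) for n, t, _ in opt["nodes"]])

def generate_field_docs(opt):
    """Generate a human-readable field list, another downstream artefact."""
    return [f"{n}: {t.__name__}" + (f" [{u}]" if u else "")
            for n, t, u in opt["nodes"]]

# Developers work with the generated class, not with the template itself.
BloodPressureScreening = generate_api_class(OPERATIONAL_TEMPLATE)
record = BloodPressureScreening(systolic=120.0, diastolic=80.0, position="sitting")
```

The design choice being illustrated is that the template is the single source: if a node is added to it, both the generated class and the field list change together, without hand-editing either.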


Does this framework, and the openEHR architecture, actually work? We can say today that it does, since every element of it has been put into production in real systems, and it works as intended. Tools like the Archetype Workbench show how the internals of the model types relate to each other. Some of the outcomes:

  • all data in all openEHR systems anywhere in the world are instances of the one reference model – that means we can build and deploy an openEHR back-end system without knowing anything about the archetypes or templates it might one day handle; it also means data can be shared with impunity;
  • software developers have been able to work with downstream XML schemas and APIs fully generated from operational templates;
  • converters in and out of exchange formats like CDA (and soon ‘green CDA’) have been demonstrated, also based on operational templates;
  • the querying capability (Archetype Query Language) has turned out to be of central importance, underlying most screens and all reporting.
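The idea of querying against archetypes rather than a physical schema can be sketched as follows. This is not AQL and not how any openEHR product is implemented; the two storage layouts, the adapter functions, the archetype id `bp.v1` and the path `/data/systolic/magnitude` are all hypothetical, invented to show one logical query running unchanged over two different layouts.

```python
# Two hypothetical back-ends storing the same kind of data in different layouts.
ROWS_A = [
    {"ehr": "e1", "arch": "bp.v1", "sys": 150.0},
    {"ehr": "e2", "arch": "bp.v1", "sys": 118.0},
]
ROWS_B = [
    ("e3", "bp.v1", {"systolic": 162.0}),
    ("e4", "bp.v1", {"systolic": 110.0}),
]

# Each installation supplies an adapter from its own layout to logical paths.
def adapt_a(row):
    return {"ehr_id": row["ehr"], "archetype": row["arch"],
            "/data/systolic/magnitude": row["sys"]}

def adapt_b(row):
    ehr, arch, body = row
    return {"ehr_id": ehr, "archetype": arch,
            "/data/systolic/magnitude": body["systolic"]}

def run_query(rows, adapter, archetype, path, predicate):
    """Return EHR ids whose value at `path` in `archetype` satisfies `predicate`."""
    return [rec["ehr_id"]
            for rec in map(adapter, rows)
            if rec["archetype"] == archetype and predicate(rec[path])]

# The query is written once, in terms of the archetype and path only.
HIGH_SYSTOLIC = dict(archetype="bp.v1", path="/data/systolic/magnitude",
                     predicate=lambda v: v > 140)

result_a = run_query(ROWS_A, adapt_a, **HIGH_SYSTOLIC)
result_b = run_query(ROWS_B, adapt_b, **HIGH_SYSTOLIC)
```

Here `HIGH_SYSTOLIC` never mentions a column or table name, so it can be published and reused across installations; only the adapters know about the local layout.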

Is it all plain sailing? Certainly not. The downstream tool generators need to be more powerful, and more rule-driven, for better flexibility, for one thing. AQL needs to be better connected with terminologies – which requires standards for terminology services to mature. Many lessons learned on the way have created the need for ADL 1.5, the latest version of the archetype formalism, now well into testing.

But the overall evidence is of significant savings in development effort, and a quantum leap in flexibility, as well as ability to compute with health information.


In any discussion such as the ‘Fresh Look’ initiative, framework and architecture have to be discussed and understood. It may well be the case that other participants don’t agree with the above architecture. However, they will need to think about a framework and architecture in order to make any comparisons between models meaningful.

Next: what is the meaning of the ‘reference model’ (aka information model) in this framework?


About wolandscat

I currently work in e-health, and am senior architect of the specifications, designed for semantic interoperability of health information. I also designed the Archetype formalism and model used in openEHR. Outside of work, I am interested in guitar, travel, and philosophy.

7 Responses to Information models, DCMs and Archetypes

  1. Peter Jordan says:

    In the case of a RDBMS at the Persistence/Data Storage Layer, would I be correct in thinking that AQL runs at the Application Layer and, when querying data, uses an Adapter – as part of the Data Access Layer – to convert queries to the relevant SQL/XML dialect?

    • wolandscat says:

      AQL is interpreted at the service layer, and AQL queries are partly converted into SQL statements specific to the RDBMS installation; when the data are retrieved, the rest of the AQL query executes.

      Note that nearly every installation of any EHR database system can easily have its own RDBMS schema, even if it is the same product. This is because of the need for locally specific optimisations and usage patterns.

      Therefore, being able to write queries independently of this means: only writing such queries once; being able to publish them and reuse them; and being able to rely on them when writing decision support and business intelligence software on top of the EHR.

  2. Diego Bosca says:

    For the moment just some comments:
    -I still see the templates layer as the same layer as the archetypes layer. I agree with using the word ‘template’ to separate high-level concepts from ‘ground’ concepts, but they are still archetypes
    -Even if AQL is a great approach, it is not the only archetype query language already developed for openEHR (Zilics has also proposed A-Path). Correct me if I’m wrong, but I believe this has yet to be discussed on openEHR
    -This is not the first time I say this, but I don’t like the way openEHR handles terminology/coded texts. The prerequisite of a terminology service is hurting interoperability: if openEHR wants to be adopted, it has to be semantically interoperable with current systems, and having to rely on an external ‘do-what-you-want’ terminology service for every translation just feels like a bad design move. It is great for the definition of subsets, etc. but as it has yet to be normalized it is only useful at local level.
    -we cannot assume that EHR models are interchangeable, but as they are describing different parts of the same domain they are surely transformable.

    • wolandscat says:

      We need to talk at two levels: technical and ‘domain’, i.e. the level clinical modellers think at. At the technical level, ADL/AOM 1.5 templates and archetypes are essentially the one formalism (with some configurability). So for us people building software, your comment is true.

      From the viewpoint of clinical modellers, templates are quite different because they are built to represent some form or message, e.g. national discharge summary, with specific requirements (could even just be one hospital). And it may be software people building the templates. Archetypes on the other hand are thought of by clinical people as independent of context (well, some local specialised archetypes may be an exception).

      With respect to querying, you are of course right – see the spec page. I only mentioned AQL because most of the DCM people have heard of it. I do want to see a-path being more integrated, and in fact we are working on building it into ADL 1.5 as the ‘rules’ sub-formalism. We need more communication with the original authors to finish this.

      Re: terminology – are you saying that openEHR should define its own terminology service / repository? (Maybe discuss that on the technical list).

      Re: model transformability – I would say it depends heavily on the models. We tried doing HL7 RIM – openEHR/13606 years ago, but it has never been done, because of the difficult way the HL7 RIM is defined. A more recent attempt to do 13606/CDA conversion showed up some similar problems. Even more recently, we have managed openEHR/CDA transformations based on archetypes, but it is not trivial.

      I think the key issue is the ‘patterns’ in an underlying RM.

      • Diego Bosca says:

        I don’t know if we need to (should) define a terminology service (although defining a simple API would help a lot). I think that a more powerful coded_text data type could be enough. Current archetypes won’t need to be changed if we just put a new optional attribute for the qualifiers or translations/mappings

  3. William Goossen says:

    Well Thomas,

    to me the OpenEHR ADL world is the cheap Chinese restaurant approach: one framework, one model approach and one architecture and the whole world should eat that complete menu.

    I am afraid that many like a little better restaurant, or even a fine cuisine. It is the diversity that makes the world go around. Hence I see that the conceptual detailed clinical models, that closely represent the reality clinicians deal with, as the core information models. These can be represented in different formats, where in particular the logical modeling helps to achieve its implementation into different technical formats.

    I am quite sure that the whole world is not into cheap Chinese ADL for every day, but once in a while it is fine and easy. Others prefer the diversity and want to move from the logical DCM models into different feasible and useful implementations.

    I am afraid OpenEHR sees ADL too much as an Esperanto that only the happy few want to speak. Healthcare informatics would certainly benefit more from proper translations from and to different languages. There is or was a market for Android, Symbian, Unix, Windows xyz, Apple lions and tigers stuff, OS2, Linux and what else more we have and will get in the future. It just will not get to one speak in our brave new world.


    • wolandscat says:

      Hi William,
      my Chinese restaurant analogy may have missed the mark; the stereotype is that you can mix any meat with any sauce and any style of noodles etc.

      Anyway, this is not the main point. ADL/AOM is a generic formalism, like UML. I don’t think anyone is suggesting we don’t represent object models in UML, although there are a few other ways of doing it. In the same way, ADL/AOM is a formalism for representing constraints on an object model. It is completely general. It is clear that we need such a formalism, but there is currently none that does the job. Like UML, perhaps there could be a few alternatives in existence at the same time, but I don’t think the world is interested in many competing formalisms of this kind.

      ADL and archetypes are separate from the question of the reference model used for specific archetypes. Indeed this is a key attraction: using archetype technology does not commit anyone to just the openEHR archetypes. Other RMs can be used; and the openEHR RM can be modified over time. But we need a way of computing with constraint based models.
