The power of the openEHR archetype formalism – visualised

I made a new beta release of the ADL Workbench today, a tool whose core is a parser and 3-pass validator for archetypes written in the openEHR Archetype Definition Language. Today’s release includes visualisation that really shows how archetypes form a layer above standard information models. The basic idea of archetypes, for those who don’t know, is to be able to configure particular structures of reference model instances to represent specific domain content. For example, the following shows the Indirect Oximetry archetype. The tree column shows the information model (or, as we more often call it in e-health, the ‘reference model’) classes and properties in blue. So an Indirect Oximetry is a structure made from OBSERVATION, HISTORY, POINT_EVENT, etc., and a bunch of ELEMENTs, each having a specific meaning in the context of the oximetry Observation.

The meanings (‘SpO2’ etc.) are assigned by the simple association of codes with reference model classes in the archetype (this is hidden in this view, but the raw ADL shows it). The visualisation shows constraints on cardinality, occurrences and value ranges in red. It is only when you investigate numerous archetypes that you see the diversity of possibilities. For example, the following Barthel archetype shows coded ordinals for the ‘Eating’ and ‘Dressing/undressing’ items.
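To make the idea of ‘constraints over a generic model’ concrete, here is a minimal Python sketch. It is not ADL and not how the ADL Workbench is implemented; the class names Element and ElementConstraint, the validate function and the node code ‘at0006’ are all assumptions made up for illustration. The point is only that the generic ELEMENT knows nothing about oximetry, while the constraint carries the meaning, the occurrences and the value range.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Generic reference-model ELEMENT: carries a node code and a value only."""
    code: str
    value: float


@dataclass
class ElementConstraint:
    """Archetype-side constraint attached to one ELEMENT node (illustrative)."""
    code: str                                # node code the meaning is bound to
    meaning: str                             # e.g. "SpO2"
    value_range: Tuple[float, float]         # inclusive (min, max)
    occurrences: Tuple[int, Optional[int]]   # (min, max); None means unbounded


def validate(elements: List[Element],
             constraints: List[ElementConstraint]) -> List[str]:
    """Return human-readable violations; an empty list means the data conforms."""
    errors = []
    for c in constraints:
        matches = [e for e in elements if e.code == c.code]
        low, high = c.occurrences
        if len(matches) < low or (high is not None and len(matches) > high):
            upper = "*" if high is None else high
            errors.append(
                f"{c.meaning}: {len(matches)} occurrence(s), expected {low}..{upper}")
        vmin, vmax = c.value_range
        for e in matches:
            if not vmin <= e.value <= vmax:
                errors.append(f"{c.meaning}: value {e.value} outside {vmin}..{vmax}")
    return errors


# Hypothetical code "at0006" bound to the meaning "SpO2": a percentage, exactly once.
spo2 = ElementConstraint(code="at0006", meaning="SpO2",
                         value_range=(0.0, 100.0), occurrences=(1, 1))

print(validate([Element("at0006", 97.0)], [spo2]))    # conforming data
print(validate([Element("at0006", 140.0)], [spo2]))   # value out of range
```

A real archetype also constrains structure (which ELEMENTs sit under which HISTORY and EVENT nodes) and binds codes to terminology; the sketch leaves all of that out.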

The next interesting thing we can do, thanks to the new release, is to see where the rest of the reference model is. You can turn on the visibility of the reference model properties in stages using the ‘RM Visibility’ controls. The ‘Data level’ exposes properties classified as being ‘archetypable data attributes’ that happen not to be archetyped, but could be; I have highlighted a few of these.

The next level of RM visibility is ‘runtime properties’: data properties whose values can’t sensibly be constrained at design time, but are set at runtime. These include properties like HISTORY.origin, and the ‘feeder_audit’ property common to many RM types. It is useful for archetype authors to be able to see these properties, since they know they don’t have to create them by archetyping.

Finally, we can turn on the remaining Reference Model properties, classified here as ‘Infrastructure properties’, i.e. properties that do not carry data, but instead carry data management, protocol and similar information. These properties would never be archetyped, and are really only there for the sake of completeness. In the view below, we now see the totality of all the possible information model properties, archetyped or not, starting from the OBSERVATION class, as archetyped for ‘heart rate’.

The classification of properties into the three categories is done in the information model schema driving the tool, and is completely configurable. It can be done for any reference model, whether for health as shown here, or for manufacturing or finance.
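The staged visibility described above can be sketched as a small filter over a classified schema. This is a hedged illustration in Python, not the schema format the tool actually reads: the SCHEMA dictionary, the Category enum and the visible_properties function are hypothetical names. The classification of HISTORY.origin and feeder_audit as runtime properties follows the text; the categories given to the other properties are my assumptions for the example.

```python
from enum import IntEnum
from typing import Dict, List, Set


class Category(IntEnum):
    """The three property categories, ordered by the stage at which they appear."""
    DATA = 1            # archetypable data attributes
    RUNTIME = 2         # values set at runtime, not constrained at design time
    INFRASTRUCTURE = 3  # data management, protocol and similar properties


# Hypothetical, heavily trimmed schema: class name -> property name -> category.
SCHEMA: Dict[str, Dict[str, Category]] = {
    "OBSERVATION": {
        "data": Category.DATA,
        "state": Category.DATA,
        "protocol": Category.DATA,
        "feeder_audit": Category.RUNTIME,
        "uid": Category.INFRASTRUCTURE,   # assumed classification
    },
    "HISTORY": {
        "events": Category.DATA,
        "origin": Category.RUNTIME,
    },
}


def visible_properties(rm_class: str, archetyped: Set[str], level: int) -> List[str]:
    """List the properties shown at a given visibility level.

    Level 0 shows only archetyped properties; levels 1 to 3 add the DATA,
    RUNTIME and INFRASTRUCTURE categories in turn.
    """
    return [name for name, category in SCHEMA[rm_class].items()
            if name in archetyped or category <= level]


# An archetype that constrains only OBSERVATION.data.
for level in range(4):
    print(level, visible_properties("OBSERVATION", {"data"}, level))
```

Because the categories live in the schema rather than in the code, reclassifying a property, or swapping in a reference model from another industry, only means editing the dictionary.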

We can do more: the reference model properties here will keep unfolding according to the reference model definition. The following shows a ‘deep dive’ on the OBSERVATION.state property, not currently constrained in this archetype. In this example, the static types were overridden to choose specific descendants, here, ITEM_STRUCTURE -> ITEM_TREE, ITEM -> ELEMENT and so on. This reference model traversal, including dynamic binding, presages how new archetype nodes are created (and will be, using this tool, when editing capability is added).
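The unfolding with type overrides can be pictured with a short recursive walk. Again, this is only a Python sketch under stated assumptions: RM_PROPERTIES and RM_DESCENDANTS are tiny hand-written fragments of the reference model (OBSERVATION has many more properties than ‘state’), and the unfold function is a hypothetical name, not part of the ADL Workbench.

```python
from typing import Dict, List

# Trimmed reference-model fragment: class -> property -> static (declared) type.
RM_PROPERTIES: Dict[str, Dict[str, str]] = {
    "OBSERVATION": {"state": "ITEM_STRUCTURE"},
    "ITEM_TREE": {"items": "ITEM"},
    "ELEMENT": {"value": "DATA_VALUE"},
}

# Legal concrete descendants that may replace an abstract static type.
RM_DESCENDANTS: Dict[str, List[str]] = {
    "ITEM_STRUCTURE": ["ITEM_SINGLE", "ITEM_LIST", "ITEM_TABLE", "ITEM_TREE"],
    "ITEM": ["CLUSTER", "ELEMENT"],
}


def unfold(rm_class: str, overrides: Dict[str, str],
           depth: int = 0, max_depth: int = 6) -> List[str]:
    """Walk the model from rm_class, replacing static types per `overrides`.

    Returns one indented "property: type" line per property reached.
    """
    lines: List[str] = []
    if depth >= max_depth:          # guard against unbounded recursion
        return lines
    for prop, static_type in RM_PROPERTIES.get(rm_class, {}).items():
        chosen = overrides.get(static_type, static_type)
        if chosen != static_type and chosen not in RM_DESCENDANTS.get(static_type, []):
            raise ValueError(f"{chosen} is not a descendant of {static_type}")
        lines.append("  " * depth + f"{prop}: {chosen}")
        # Continue from the chosen (dynamic) type, not the static one.
        lines.extend(unfold(chosen, overrides, depth + 1, max_depth))
    return lines


# The overrides mentioned in the text: ITEM_STRUCTURE -> ITEM_TREE, ITEM -> ELEMENT.
for line in unfold("OBSERVATION", {"ITEM_STRUCTURE": "ITEM_TREE", "ITEM": "ELEMENT"}):
    print(line)
```

Without any overrides the walk stops at the abstract type, since ITEM_STRUCTURE itself declares no properties in this fragment. Choosing a descendant is what opens up the next level.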

Strategic Importance

The larger question is: why is this of any real interest? Well, in classical software construction, developers are told to build class models that represent the domain concepts they are implementing. This advice leads to huge unmanageable models that never keep up with reality. To escape from this, we need a way of defining an information model that makes only certain generic commitments, and a way of controlling how it is used. In this case, the openEHR Reference Model includes health data concepts like COMPOSITION, SECTION, OBSERVATION, EVALUATION, INSTRUCTION, ACTION, HISTORY, EVENT, ELEMENT and so on, but there is no blood pressure, oximetry, heart rate or any other domain class. This gives us a nice stable reference model, but we need a formal way to control its use so that we can make it represent real data such as heart rate observations. This is the purpose of archetypes.

Technically, once the models are built, they can be used to generate many downstream products, including, ultimately, software APIs and XML Schemas directly usable by software developers.

Archetypes can be thought of as a missing layer above UML: they provide the ability to define ‘information models’ above what is normally understood to be the information model layer. Moreover, with the right tools (more friendly ones than the ADL Workbench), non-technical domain experts can do the work of defining the models.

A new UML – AML

This situation, in which UML itself has no such layer, might be about to change, with the issuing by the OMG of a Request for Proposals (RfP) for ‘Archetype Modelling Language’ (AML), which would be a set of profiles above standard UML enabling the kind of modelling illustrated above. I worked on this RfP document with Robert Lario (OMG Board), Dave Carlson (XMLmodelling.com; heavily involved in multi-level modelling at the VHA) and other colleagues, because I thought it was worth getting this technology into the mainstream finally – and apparently others did too (including the CIMI effort), which is nice. In the ADL Workbench, all this is easy to do in one sense, because we have total control over all aspects of model representation, and are not affected by limitations of UML, Ecore, MOF or any other ‘given’. The challenge of writing an ‘AML’ standard will be to make it all work within the MOF infrastructure – which doesn’t natively have type/meaning associations, multi-lingual support or constraint semantics, all of which are native to ADL.

If the AML effort succeeds, it will be a minor revolution, because it will liberate information modellers from data models (or ‘content modellers’ from ‘information models’…). The textbook approach of building domain concepts directly into the class model doesn’t work for real software, and never has – all it does is give you a piece of software that embodies requirements that were out of date on day 2 of its deployment, and that quickly becomes too cumbersome to keep changing, especially when terabytes of data are implicated.

Back in openEHR-land, we continue to use the native archetype formalism, and make fast progress on the semantic capabilities of archetype modelling. Watch this space for some of what we have learned…