(With apologies to those who use international English and normally spell it as ‘flavour’; in this post, I will spell it properly in informal text, and in the US way when referring to the formal HL7 null flavour concept.)
Grahame Grieve has pointed out in a recent blog post that I am a major critic of HL7 ‘null flavours’. This is correct, but the reasons are probably misunderstood, so I will try to clarify here.
openEHR: interoperability or systems?
An initial comment I will make is that there is a notion that openEHR is ‘about defining systems’ whereas HL7 ‘is about interoperability’. This is incorrect. openEHR is primarily about solving the information interoperability problem in health, and it addresses all information, regardless of whether it is inside a system or in a message. (It does define some reference model semantics specific to the notion of ‘storing information’, mainly around versioning and auditing, but this has nothing to do with the main interoperability emphasis.)
To see that openEHR is about generalised interoperability, all that is needed is to consider a lab archetype such as Lipid studies in the Clinical Knowledge Manager. This archetype defines a possible structure of a Lipids lab test result, in terms of basic information model primitives (Entry, History, Cluster, Element etc). In the openEHR approach, we use this same model as the formal definition of this kind of information, whether it is in a message travelling ‘between systems’, in a database or on the screen within a ‘system’. This is one of the great benefits of openEHR: messages are not done differently from everything else. Neither is interoperability of data in messages between systems different from that of data between applications or other parts of a ‘system’.
This matters here, because in our view, ‘null flavours’ have to be modelled in a way that makes sense for all uses.
Null Flavours in HL7 – objection #1: ‘subtractive modelling’
The first problem is systemic in HL7, namely the inclusion of context-specific attributes in base classes. What I mean by this can best be illustrated by some simplified data type class definitions. Imagine that you want to define a data type ‘Quantity’ for use in various kinds of information systems – including messaging gateways, EHR systems, decision support – anything. The first basic rule of modelling is to define a core abstraction that all software developers are to share. Below is a partial class definition with three properties and one operation, in a Pascal-like pseudo-code. A realistic definition has much more of course.
    class Quantity
        magnitude: Real
            -- Numeric value of the quantity.
        units: String
            -- Stringified units, expressed in UCUM unit syntax, e.g. "kg/m2", "mm[Hg]", "ms-1", "km/h".
        precision: Integer
            -- Precision to which the value of the quantity is expressed.
        plus alias "+" (other: Quantity): Quantity
                -- Addition.
            pre
                other /= Null
            do
                create Result.make (magnitude + other.magnitude, units, precision)
            end
    invariant
        magnitude /= Null
        units /= Null
    end
Note the use of a pre-condition and class invariants, providing some of the semantics. In any realistic class definition, the properties, routines and invariants will be more numerous, and when coded, there will generally be interdependencies. Therefore clarity of thought in the abstraction being defined is important in getting it right.
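For readers who prefer something executable, the same core abstraction can be sketched in Python. This is only an illustrative translation of the pseudo-code, not a reference implementation: the default of 0 for precision and the use of ValueError to signal a violated pre-condition or invariant are my assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Core abstraction only: no acquisition, messaging or study context."""

    magnitude: float
    units: str          # UCUM-style unit string, e.g. "kg/m2"
    precision: int = 0  # assumed default; the pseudo-code does not give one

    def __post_init__(self) -> None:
        # Class invariants: magnitude /= Null, units /= Null
        if self.magnitude is None:
            raise ValueError("magnitude must not be None")
        if self.units is None:
            raise ValueError("units must not be None")

    def __add__(self, other: "Quantity") -> "Quantity":
        # Pre-condition: other /= Null
        if other is None:
            raise ValueError("other must not be None")
        # As in the pseudo-code, units and precision come from the left
        # operand; unit compatibility is not checked in this sketch.
        return Quantity(self.magnitude + other.magnitude, self.units, self.precision)


total = Quantity(70.0, "kg") + Quantity(2.5, "kg")
```

Nothing in this class knows about messages, studies or data acquisition, which is the point of keeping the abstraction independent of any particular use.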
The basic rule is to define the core abstraction independent of any particular use. So let’s assume that a group of people (perhaps in some standards organisation…) agree on what ‘Quantity’ means formally, then they could define the above class to express the core semantics. This class could then be used in software that:
- A – uses Quantities to do statistical analysis;
- B – represents Quantities in data captured from user application screens or devices in an EHR system;
- C – uses Quantities to represent lab test reference range data, as found in a typical lab test handbook.
Clearly in each of the above cases the real world context is likely to be quite different. In case A, the statistical values may be a) derived according to some specific probability model and b) may belong to some particular ‘study’. So a new class might be created within the information model to add these characteristics. This can be done in two ways: by inheritance and by encapsulation (i.e. ‘wrapping’). The inheritance approach makes sense if all Quantities in the statistical analysis application are going to be ‘StatisticalQuantities’. The wrapping approach is more flexible, and allows both ‘StatisticalQuantities’ and normal Quantities to co-exist, and be transformed into/created from each other.
Wrapping is done as follows:
    class StatisticalQuantity
        quantity: Quantity
        probabilityModel: String
        studyName: Identifier
    end
The quantity: Quantity property is the ‘wrapping’. Similar things are likely to be required for Quantities used in cases B and C above.
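The two options for case A can be put side by side in a short Python sketch. The class InheritedStatisticalQuantity is a name I have made up for the inheritance variant, the Quantity here is cut down to two fields, and a plain string stands in for the Identifier type of the pseudo-code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    magnitude: float
    units: str


# Option 1: inheritance. Every statistical value *is a* Quantity.
@dataclass(frozen=True)
class InheritedStatisticalQuantity(Quantity):
    probability_model: str
    study_name: str


# Option 2: wrapping. The statistical context *has a* plain Quantity.
@dataclass(frozen=True)
class StatisticalQuantity:
    quantity: Quantity
    probability_model: str
    study_name: str  # stand-in for the Identifier type in the pseudo-code


wrapped = StatisticalQuantity(Quantity(3.4, "mmol/L"), "normal", "study-001")
plain = wrapped.quantity  # unwrapping gives back an ordinary Quantity
inherited = InheritedStatisticalQuantity(3.4, "mmol/L", "normal", "study-001")
```

With wrapping, the plain Quantity carries no statistical attributes at all, so both kinds of value can co-exist in the same application.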
Null flavours is one of the things that is typically needed for case B, because this case involves ‘data acquisition’. Whenever data are being acquired by a computer system from the external environment, whether it be a transducer (e.g. a pressure monitor on a gas pipeline) or a human being, problems can arise with obtaining the values. In control systems engineering (see e.g. “SCADA” in google) ‘data quality’ markers are added to data items on incoming streams to indicate such problems, which might include inability to obtain value, stale value (existing value too old), corrupted value (e.g. a value was received but with a wrong checksum) and so on.
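The same wrapping idea can be applied to case B. The sketch below is hypothetical: the DataQuality values are simply the control-systems examples mentioned above, and AcquiredQuantity is an invented name, not a class from openEHR, HL7 or any SCADA product. It only shows that the quality marker can live on the acquisition wrapper and leave Quantity untouched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataQuality(Enum):
    # Illustrative markers only, taken from the examples in the text.
    GOOD = "good"
    NOT_OBTAINED = "not obtained"
    STALE = "stale"
    CORRUPTED = "corrupted"


@dataclass(frozen=True)
class Quantity:
    magnitude: float
    units: str


@dataclass(frozen=True)
class AcquiredQuantity:
    """Hypothetical case-B wrapper: the marker lives here, not on Quantity."""

    quality: DataQuality
    value: Optional[Quantity] = None

    def __post_init__(self) -> None:
        # A good reading must carry a value; a failed one must not.
        if self.quality is DataQuality.GOOD and self.value is None:
            raise ValueError("a GOOD reading needs a value")
        if self.quality is DataQuality.NOT_OBTAINED and self.value is not None:
            raise ValueError("a NOT_OBTAINED reading cannot carry a value")


reading = AcquiredQuantity(DataQuality.GOOD, Quantity(120.0, "mm[Hg]"))
missing = AcquiredQuantity(DataQuality.NOT_OBTAINED)
```

A stale or corrupted reading may still carry its last received value in this sketch, which is why only the two extreme cases are constrained.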
HL7 quite rightly recognised the need for something similar in health information – data are not always obtained from machines and humans as expected. From HL7’s point of view, however, the scope of a data types model is apparently case B only: the case in which null flavours (and other similar context-specific properties) are required is treated as covering all possible instances of all data types.
It should be immediately clear that a model targeted only at case B, and including all possible semantics in a single class (or in higher base classes, inherited even more widely) will not serve cases A or C, or any of the other multitude of cases that could be mentioned. Nevertheless, this is HL7’s approach. This might not matter if HL7 had called their quantity data type HL7MessageQuantity, so that everyone else would know what it really represented. However, they called it PQ (PhysicalQuantity), and have marketed it along with other data types as being essentially universal for health data interoperability. This causes a lot of problems.
Firstly, it is clear that even within HL7-only software development environments, statistical and lab reference range data (to take the case A and C examples above) are likely to be required somewhere in HL7-related software solutions. That means an HL7 developer who wants a ‘clean’ Quantity class with no ‘nullFlavor’ doesn’t have one; they have to have null flavours on every quantity instance.
Again, this might not have mattered if HL7 had agreed that its standard was for its own restricted scope. However, HL7 has managed to push its data types (with some reworking, which although bringing improvements, doesn’t really change the underlying problems) into an ISO standard 21090, entitled “Health informatics — Harmonized data types for information interchange”. The HL7 data types, designed for the narrow HL7 scope, clearly can’t easily be used in other kinds of models or software development that would nevertheless otherwise agree to the same core semantics of each data type. (Some would say they can: you just ‘ignore’ the nullFlavor attribute if you don’t need it. Multiply that approach by the number of other similar base class attributes in HL7, and you get… hacking.)
Whenever the issue is brought up, HL7 states that a ‘profile’ of 21090 can be created for other uses/users, by which they mean a variant specification in which certain base class properties are ‘removed’ or ‘nulled out’ with some kind of invariant statements – in other words, deriving other types is done by ‘subtraction’. This is nonsense. There is no getting around the fact that the modelling approach breaks basic principles of object modelling:
- it is not a clean abstraction, because it mixes multiple concepts in the same class (e.g. ‘quantity’ and ‘acquired data item’), preventing them being designed, reasoned about, and used separately;
- it breaks the extensible nature of normal object models, which requires properties to be added going down the inheritance hierarchy, and to the most specific class for which it makes sense;
- it breaks the ontological view of class hierarchy specialisation, which is that all attributes in a class should be valid possibilities for instances of all sub-classes. For example, we would not put ‘wingspan’ as an attribute of the class ‘Mammal’, because it can only possibly apply to winged mammals, not all mammals. Putting it on Mammal forces all instances of Horse to have a ‘wingspan’ data element;
- further, it makes models and software brittle, since applying this principle in the extreme requires all possible properties of all descendant types to be included in the base class. This can never be successfully done, since no one can predict all future subtypes needed in a model.
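What a ‘profile by subtraction’ amounts to in code can be sketched as follows. This is my own simplified illustration, not anything taken from ISO 21090 or an HL7 toolkit: AnyType, Quantity and ProfiledQuantity are hypothetical names, and the null flavour is reduced to an optional string.

```python
from typing import Optional


class AnyType:
    """Simplified base class carrying a context-specific attribute."""

    def __init__(self, null_flavor: Optional[str] = None) -> None:
        self.null_flavor = null_flavor


class Quantity(AnyType):
    def __init__(self, magnitude: float, units: str,
                 null_flavor: Optional[str] = None) -> None:
        super().__init__(null_flavor)
        self.magnitude = magnitude
        self.units = units


class ProfiledQuantity(Quantity):
    """'Profile' by subtraction: the attribute is forbidden, not removed."""

    def __init__(self, magnitude: float, units: str) -> None:
        super().__init__(magnitude, units, null_flavor=None)

    def __setattr__(self, name: str, value: object) -> None:
        # The invariant 'nulls out' the attribute instead of removing it.
        if name == "null_flavor" and value is not None:
            raise AttributeError("null_flavor is nulled out in this profile")
        super().__setattr__(name, value)


q = ProfiledQuantity(5.0, "kg")
still_present = hasattr(q, "null_flavor")
```

The attribute is still part of the type, so every consumer still has to know about it, and code written against Quantity can no longer assume that assigning a null flavour will work on every instance it is given.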
[For interest: an interesting site discussing OO design principles].
NullFlavor is just one example of this poor design thinking in HL7 – there are other attributes specific to HL7 messaging (a subset of ‘case B’ above) that are included in the ‘Any’ class of the data types model, forcing all data types to include them. The following is the ISO 21090 (similar to the HL7) definition of the QTY type. The unavoidably inherited attributes, italicised in the original, are the first seven, from validTimeLow to flavorId.
    type QTY = class (
        validTimeLow : characterstring,
        validTimeHigh : characterstring,
        controlInformationRoot : characterstring,
        controlInformationExtension : characterstring,
        nullFlavor : NullFlavor,
        updateMode : UpdateMode,
        flavorId : Set(characterstring),
        expression : ED,
        originalText : ED.TEXT,
        uncertainty : QTY,
        uncertaintyType : UncertaintyType,
        uncertainRange : IVL(QTY)
    )
For this class to be used in a non-HL7 system, something has to be done not just about the italicised attributes, but if the model is obtained as real software (as many implementers will do), about the related methods and implementation as well – which could all be inter-dependent.
The same subtractive modelling unfortunately abounds in the RIM as well. It causes endless problems, such as this ISO standard purporting to be for (and assumed to be for) any health data exchange situation. HL7 might like to pretend that the scope is narrow, but no one has an appetite for another data types standard at international level. Unfortunately, that is still what we need.
Null Flavours in HL7 – objection #2: ‘universal inclusion’
In the ISO 21090 QTY class text above, the non-italicised properties (expression through uncertainRange) are those newly defined in the QTY class. The class types (in upper case) are from the same set of ISO 21090 data types, and therefore inherit the same universal properties. That is to say, the property uncertainty of QTY is itself of type QTY, which as we know inherits from the ‘Any’ class, forcing it to have the nullFlavor property.
The effect of this is that not only is there a nullFlavor marker for the original QTY instance (which will of course be sensible in situations where data quality applies), but there is a nullFlavor marker for the uncertainty field, and for all other fields of types inheriting from the Data Types ‘Any’ class. Since this type inclusion carries on recursively, a single QTY instance carries numerous null flavour markers rather than the one that might be expected – if only first-level markers are considered, the total is 6. The following diagram, taken from the appendix of the openEHR data types specification illustrates the problem.
It seems pretty obvious that this situation complicates software: when is a data value really null? What if the top-level marker is not set, but one of the markers on a constituent field, e.g. uncertainty, is set?
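The ambiguity is easy to reproduce in a few lines of Python. The classes below are a heavily cut-down, hypothetical mirror of the structure described above (AnyType, ED and QTY with only three fields, with IVL and the other inherited attributes omitted), and null_flavor_markers is a helper invented for this sketch. "UNK" is used as an example null flavour code.

```python
from dataclasses import dataclass, fields
from typing import Iterator, Optional, Tuple


@dataclass
class AnyType:
    null_flavor: Optional[str] = None


@dataclass
class ED(AnyType):
    text: Optional[str] = None


@dataclass
class QTY(AnyType):
    expression: Optional[ED] = None
    original_text: Optional[ED] = None
    uncertainty: Optional["QTY"] = None


def null_flavor_markers(
    value: AnyType, path: str = "value"
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield every null flavour marker reachable from value, with its path."""
    yield path, value.null_flavor
    for f in fields(value):
        child = getattr(value, f.name)
        if isinstance(child, AnyType):
            # Every nested data value carries its own marker.
            yield from null_flavor_markers(child, f"{path}.{f.name}")


qty = QTY(expression=ED(), original_text=ED(), uncertainty=QTY(null_flavor="UNK"))
markers = dict(null_flavor_markers(qty))
```

Even in this reduced form, one quantity yields four markers, and in this instance the top-level marker is unset while the one on uncertainty is set, so the software has to decide for itself whether the value counts as null.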
The modelling problem here is really a consequence of the subtractive modelling approach described above, since it results from the inclusion of a context-specific property in the most general base class.