A FHIR Experience – the formalism

This post continues the review presented in the previous post, where I looked at the Administrative resources of FHIR. Here I take a look at the formalism used in FHIR, i.e. how the resources (and profiles) are formally expressed. FHIR resources are described in terms of a custom formalism expressed as hierarchical tables. The appearance of a resource, along with the elements of the ‘language’ is shown above.

It has to be said in passing that the FHIR website, with its various visualisations, linking etc., is a masterpiece of content-driven presentation.

FHIR resources contain extension elements that may be populated to express custom content. This is done within profiles, which are derivative models of base Resources.

The concrete source form of the resources appears to be the XML available from this page. A special resource called StructureDefinition appears to act as a schema for the XML definitions, and for any profile of a resource.

According to the documentation, FHIR resources are intended to be mapped to mainstream object-oriented software (which today is really object-oriented + functional programming, so where I use the acronym ‘OO’, OO+FP should be understood), as well as data formalisms like XML and RDF. As described here, OO is an additive paradigm, while XML / XML schema is a subtractive / mixed paradigm. RDF is an additive paradigm. FHIR uses a mixed-paradigm formalism, and suffers from a number of issues common to such approaches.

The open source HAPI FHIR library by James Agnew implements the resources (and much else from FHIR) in Java, which gives some insight into what the mapping from StructureDefinition to OO looks like. The resource classes themselves are generated from the FHIR resource definitions.

Polymorphism

In FHIR, polymorphic typing is an issue. Rather than use a static super-type in a resource, which might have an instance of a sub-type attached at execution time, two non-OO constructs are used, both shown in the screenshot below, of the onset and recorder attributes of the AllergyIntolerance resource.

This is how the first looks within the OO model environment of the ADL Workbench.

Choice Elements

The first construct is the FHIR ‘choice’ element, based on XML-schema’s choice construct (at least I can’t imagine what other inspiration there could be). This is visible in the example above, in which the type of the onset element has been set to a choice of dateTime | Age | Period | Range | string. To achieve this, concrete attributes with duplicate names have been created: onsetDateTime, onsetAge and so on. The FHIR specification also limits the cardinality of such attributes to 1.

This immediately creates potential for confusion in downstream implementation, and indeed the FHIR site itself seems to recognise this:

Note: In object-orientated implementations, this is naturally represented as a polymorphic property. However this is not necessary and the correct implementation varies according to the particular features of the language. In XML schema, these become an xs:choice of element.

FHIR Formats page

How it would be ‘naturally represented as a polymorphic property’ in an OO language is not made clear; a common super-type is needed for that to happen. Now, one could just put ‘Any’ as the type, and then… what? Well, OCL constraints could conceivably be defined in a UML environment, but how do they get translated to say Java? Presumably by being transcribed to a collection of extra functions and assert() statements. Doing this properly really requires a proper constraint-based computing environment.
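To make the difference concrete, here is a minimal Python sketch contrasting an ‘Any’-typed attribute guarded by a hand-written check with one typed by a common super-type. The class names (OnsetValue, PartialDateTime, AgeValue and the two Allergy classes) are my own inventions for illustration; none of them exist in FHIR or HAPI.

```python
from dataclasses import dataclass
from typing import Any, Optional


class OnsetValue:
    """Hypothetical common super-type for all onset representations."""


@dataclass
class PartialDateTime(OnsetValue):
    iso_value: str  # partial ISO 8601 value, e.g. "2015" or "2015-06"


@dataclass
class AgeValue(OnsetValue):
    years: int


@dataclass
class AllergyWithAny:
    onset: Any = None  # the declared type says nothing about what is allowed

    def check(self) -> None:
        # The real constraint lives in hand-written code, not in the type.
        allowed = (PartialDateTime, AgeValue, str)
        if self.onset is not None and not isinstance(self.onset, allowed):
            raise TypeError(f"unexpected onset type: {type(self.onset).__name__}")


@dataclass
class AllergyWithSuperType:
    # The allowed types are simply the descendants of OnsetValue.
    onset: Optional[OnsetValue] = None
```

In the first form, every consumer has to remember to call check() (or duplicate it); in the second, a static type checker can do the work, but only because a super-type exists.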

The use of this structure is by no means rare in FHIR. The above page provides a link to a maintained list of ‘choice elements’, indicating both how common it is (180 occurrences), and demonstrating the need for special measures to cope with ‘choice’ in implementation (in 25 years of software engineering, this is the first time I have ever seen such a list). The HAPI implementation of the AllergyIntolerance resource is correspondingly full of code that hides the difference between a single polymorphically typed attribute, and a multitude of attributes (presented as getters) corresponding to the ad hoc set of allowed types (this is no criticism of the HAPI code as such, it’s just a logical consequence of the underlying resource definitions).
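The kind of code this leads to can be sketched roughly as follows. This is not the HAPI code, just a simplified Python illustration under my own assumptions: only three of the onset variants are shown, the age is reduced to a number of years, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AllergyIntoleranceFlat:
    """Hypothetical flattened form: one optional field per allowed type."""
    onsetDateTime: Optional[str] = None
    onsetAge: Optional[int] = None  # simplified: age in whole years
    onsetString: Optional[str] = None

    def get_onset(self) -> Union[str, int, None]:
        """Return whichever variant is populated, enforcing 'at most one'."""
        candidates = (self.onsetDateTime, self.onsetAge, self.onsetString)
        populated = [v for v in candidates if v is not None]
        if len(populated) > 1:
            raise ValueError("only one onset[x] variant may be populated")
        return populated[0] if populated else None
```

Every choice element needs a facade of this sort, and each one has to be regenerated whenever its type list changes.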

The real problem here is that the choice construct is a reduction or subtractive constraint, in the sense described here, but mixed in among additive-style structural modelling elements. Its use confuses different dimensions of the problem such as:

what is the intended / preferred return type of ‘onset’ within a data structure representing AllergyIntolerance? The obvious answer is that it is a date/time, quite possibly partial, in the ISO 8601 sense (e.g. patient remembers only year, or only year and month) – NB, it is a separate question as to what an application wants to display on the screen;
the different ways such a data item might have been entered in source systems: as a date, an age (‘when I was 15’), a period (‘for the last 5 years’);
whether a computable representation is even available; the use of ‘string’ as a possible type, mixed in with the rest, implies this case;
whether the ad hoc list of types is known to be correct for all possible uses of the resource.

If FHIR were intended to make life easier for application programmers, this attribute would have been specified as an ISO 8601 DateTime (allowing partial values), with differing entry forms (age etc) being silently converted in a standard way in the service, rather than forcing every single application to figure out the conversions. The possibility of informal narrative data (a non-syntax string value) would be dealt with by other means, which should be standard throughout the entire model ecosystem, since it is a general issue.
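As a sketch of what such a service-side conversion might look like, the following Python functions turn an age or a ‘years ago’ statement into a year-only partial ISO 8601 value. The function names and the conversion rule are assumptions of mine, not anything FHIR defines; in particular, an age in whole years really spans two calendar years, so a real service would need an agreed convention.

```python
from datetime import date


def onset_from_age(birth_date: date, age_years: int) -> str:
    """Convert 'when I was N' into a year-only partial ISO 8601 date.

    Assumption: the onset year is taken as birth year + age, which is
    only accurate to within a year.
    """
    return f"{birth_date.year + age_years:04d}"


def onset_from_years_ago(recorded_on: date, years: int) -> str:
    """Convert 'for the last N years' into a year-only partial date."""
    return f"{recorded_on.year - years:04d}"
```

For example, onset_from_age(date(1980, 3, 1), 15) gives "1995", a partial date that applications can consume without knowing how the value was originally entered.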

As an aside, XSD choice is a known problem for mapping to OO type systems. This 2010 paper by Suad Alagic covers the many difficulties of mapping XSD to OO, and has this to say about ‘choice’:

XSD choice represents a major problem for OO interfaces to XML. Specifying a fixed number of subtypes of a type is contrary to the core features of the OO model. Because of the lack of a suitable representation for choice, some OO interfaces use the same representation for choice and sequence groups. This representation has nontrivial implications because these two types of groups have different semantics. In fact, widely known OO interfaces to XML do not have a suitable representation of XSD groups and its three subtypes (i.e., sequence, choice, and all groups). There are many more problems in mapping XSD schemas to OO schemas [7].

One has to also question how FHIR resource developers could know that they have gotten the choice of classes correct in any particular case. Consider this set of choices from the Contract.term.offer.answer.value data element (available here):

["boolean", "decimal", "integer", "date", "dateTime", "time", "string", "uri", "Attachment", "Coding", "Quantity", "Reference"],

One has to wonder why the type of this attribute isn’t simply ‘Any’.

It is very unlikely that these 180 choice type lists are all correct. Consider the following ones from the choice-elements list.

"Observation.value[x]": ["Quantity", "CodeableConcept", "string", "boolean", "integer", "Range", "Ratio", "SampledData", "time", "dateTime", "Period"],

  
"Questionnaire.item.enableWhen.answer[x]": ["boolean", "decimal", "integer", "date", "dateTime", "time", "string", "Coding", "Quantity", "Reference"],

For some reason, Observation.value cannot be a date, a decimal or a Reference, while Questionnaire.item.enableWhen.answer can, but the latter cannot be a Range, a Ratio, or a Period. It’s hard to see the logic here.
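The discrepancy is easy to compute from the two lists quoted above; a small Python sketch (the variable names are mine):

```python
# Type lists copied from the two choice-elements entries quoted above.
OBSERVATION_VALUE = {
    "Quantity", "CodeableConcept", "string", "boolean", "integer", "Range",
    "Ratio", "SampledData", "time", "dateTime", "Period",
}
ENABLE_WHEN_ANSWER = {
    "boolean", "decimal", "integer", "date", "dateTime", "time", "string",
    "Coding", "Quantity", "Reference",
}

# Types allowed for one element but not the other.
only_in_answer = sorted(ENABLE_WHEN_ANSWER - OBSERVATION_VALUE)
only_in_observation = sorted(OBSERVATION_VALUE - ENABLE_WHEN_ANSWER)

print(only_in_answer)       # ['Coding', 'Reference', 'date', 'decimal']
print(only_in_observation)  # ['CodeableConcept', 'Period', 'Range', 'Ratio', 'SampledData']
```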

Here is what Observation.value looks like:

In general, specific choices of dynamic (i.e. polymorphic) types for an attribute usually don’t relate to the owning type, but instead to specific instances that occur in specific situations. That means that such type lists should usually not be expressed in the main model base in which the primary types are defined, but instead in artefacts more specific to the local or otherwise specialised places where they apply. In this way, we are much more likely to find that in one situation, the Questionnaire answer types are just {boolean, string}, and in another, {date, dateTime, string}, and so on. This is likely to apply across the whole FHIR resource base.

In this approach, such type choices would be modelled within a separate constraint model layer, as happens with the HL7 CIMI archetypes, which would a) greatly reduce the FHIR service code and b) isolate changes to type lists from the underlying information model.
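A very rough Python sketch of the idea: the base model leaves the attribute open, and each local constraint artefact narrows it. Everything here is hypothetical, including the profile names, the path string and the use of Python built-in types as stand-ins for the FHIR data types; it is not how CIMI archetypes or FHIR profiles are actually expressed.

```python
from datetime import date
from typing import Any, Dict, Tuple

# Hypothetical local constraint layers: each narrows the same base attribute.
ANSWER_PATH = "Questionnaire.item.enableWhen.answer"
PROFILE_A: Dict[str, Tuple[type, ...]] = {ANSWER_PATH: (bool, str)}
PROFILE_B: Dict[str, Tuple[type, ...]] = {ANSWER_PATH: (date, str)}


def conforms(profile: Dict[str, Tuple[type, ...]], path: str, value: Any) -> bool:
    """True if the value's type is allowed at this path by the given profile.

    Paths the profile does not mention are left unconstrained.
    """
    allowed = profile.get(path)
    return allowed is None or isinstance(value, allowed)
```

Changing a type list then means editing one profile, with no change to the base model or to the generic code that processes it.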

Ad hoc Type choice as a Value set

The other method FHIR uses to handle ad hoc type choices is to list the type names as a Value set, as shown in the second highlighted attribute recorder, in the above screenshot of the AllergyIntolerance resource. Here the type is defined as:

Reference(Practitioner | PractitionerRole | Patient | RelatedPerson)

This kind of ad hoc list can only be used within the FHIR Reference type, and requires the strange device of a regex enumerating all the Resource type names to check it. Needless to say, such a regex has to be constantly maintained as the set of Resources changes.
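To give a feel for the maintenance burden, here is a deliberately simplified Python sketch of such a check. The list of type names is a tiny subset I chose for illustration, and the pattern is my own reduction, not the expression FHIR actually publishes.

```python
import re

# Assumption: a tiny illustrative subset; the real list names every Resource type.
RESOURCE_TYPES = ["Patient", "Practitioner", "PractitionerRole", "RelatedPerson"]

# Relative reference of the form "<ResourceType>/<id>".
REFERENCE_PATTERN = re.compile(
    r"(?:%s)/[A-Za-z0-9\-\.]{1,64}" % "|".join(map(re.escape, RESOURCE_TYPES))
)


def is_known_reference(ref: str) -> bool:
    """True if the reference names one of the enumerated resource types."""
    return REFERENCE_PATTERN.fullmatch(ref) is not None
```

A reference such as "Medication/1" fails here only because ‘Medication’ is missing from the enumerated list, which is exactly why the list has to track every change to the set of Resources.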

The type choice for each Reference is a kind of constraint, and likely to be locally specific. But since it is directly embedded within what is otherwise (or should be) an additive-logic information model, it creates a dependency of the rest of the structural model on a constraint which is likely to be volatile, in the same way as the first form of ‘choice’ described above.

In some cases, the type list for Reference implies something else: the lack of an appropriate super-type for classes that do in fact share some common semantics. As it happens, there is no inheritance across most of the FHIR resources, other than generically between each Resource and ‘infrastructure’ classes like DomainResource and Resource. There is none among the Administrative resources, where one would usually expect to see it. Consider that all of the types in the list above should correspond to entities that could be a ‘recorder’ (in the sense of an author) of the data. In most demographic models, these types would have a general parent such as Party. Indeed, introducing such inheritance is the only way an object-oriented representation can properly represent an appropriate type for the recorder attribute.
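In code, the idea might be sketched as follows. The Party class is hypothetical (FHIR defines no such type), the resources are reduced to a single name field, and Medication is included only as an example of something that should not be accepted as a recorder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    """Hypothetical common parent for anything that can author data."""
    name: str


class Practitioner(Party): pass
class PractitionerRole(Party): pass
class Patient(Party): pass
class RelatedPerson(Party): pass


@dataclass
class Medication:
    """Not a Party: should never be a recorder."""
    code: str


@dataclass
class AllergyIntolerance:
    recorder: Optional[Party] = None  # one type instead of an ad hoc list

    def set_recorder(self, candidate: Party) -> None:
        # Python does not enforce annotations at run time, so check explicitly.
        if not isinstance(candidate, Party):
            raise TypeError("recorder must be a Party")
        self.recorder = candidate
```

Adding a new kind of Party later requires no change to AllergyIntolerance, nor to any other resource that refers to Party.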

The lack of a type hierarchy among related types has the effect that numerous resources contain replicated copies, or near-copies of common semantics. The consequence for software is replication, reduced maintainability, and a lack of re-use. For downstream information-processing, it is an inability to know whether data structures that appear to be nearly the same can be treated the same way; the default with FHIR is that every Resource is its own thing.

Specific Choice Types

Some types of choice used in FHIR are an attempt to solve a common modelling problem, but fail to use the usual pattern. The MedicationDispense resource provides an example:

Here there are two instances of a choice of {CodeableConcept, Reference(x)}. This pattern intends to represent a property that either names a kind of thing using a coded term, or refers to an instance of that kind of thing, whose type will be specified by the same coded terms. There are various solutions to this, but they are all variations on a dedicated type of roughly the following form (using the FHIR data types):

class TypedReference
    type: CodeableConcept [0..1];
    reference: Reference <Any>[0..1];

    Invariants:
        type /= null or reference /= null
end

This kind of type should only be used in a static model for situations in which, at runtime, the values really could vary between a coded type, or a reference, or both. For situations where it is known at design time that only one is available, then only the relevant simple type should be used.
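For what it is worth, the class above translates fairly directly into a mainstream language. The following Python sketch uses drastically simplified stand-ins for CodeableConcept and Reference (a single string field each), which are my assumptions and not the real FHIR data types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodeableConcept:
    code: str  # simplified stand-in for the FHIR data type


@dataclass(frozen=True)
class Reference:
    target: str  # simplified stand-in, e.g. "Medication/123"


@dataclass(frozen=True)
class TypedReference:
    type: Optional[CodeableConcept] = None
    reference: Optional[Reference] = None

    def __post_init__(self) -> None:
        # Invariant: at least one of type and reference must be set.
        if self.type is None and self.reference is None:
            raise ValueError("TypedReference needs a type, a reference, or both")

    def is_resolvable(self) -> bool:
        """True when an actual instance is referenced, not just a kind."""
        return self.reference is not None
```

Logic such as is_resolvable() can then be written once and re-used everywhere the pattern occurs.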

A related choice type is MedicationDispense.performer, defined as follows:

This is essentially the same thing as above: a type and a reference to an instance. Since neither of these is actually modelled with a dedicated type, nor even follows the same naming within FHIR resources, there is no way to write software that embodies the necessary semantics, nor to achieve any re-use in the multiple locations where this pattern occurs. This could be fixed quite easily.

Conclusions

In summary, the various cases of choice-like construct in the FHIR resources could be replaced by one of the following in each case:

limited / local type constraints in a constraint layer (FHIR profiles), where the correspondence to actual use cases will be far better known; this has the benefit that 180 or so elements would be typed with a single abstract type (Any, Element, DomainResource etc), greatly simplifying downstream software implementations that handle them;
the addition of abstract types and inheritance across the resources, as illustrated by (but not limited to) the Administrative types; ad hoc type choices such as actor: Reference (Practitioner | PractitionerRole | Organization | Patient | Device | RelatedPerson) would then be replaced by Reference (Party) or similar;
dedicated types like the TypedReference class shown above, to replace particular patterns of type choices, like {X, Reference(X)}; this would allow specific logic to be written in software to handle the run-time situations arising from one or other being set.

The application of the above kinds of changes would have the following benefits:

reduce the size of the resources, due to removing (a lot of) replication;
reduce the dependency of globally used FHIR resource definitions on very use-case specific type lists;
reduce the work of resource authors, since they would no longer have to work out the ad hoc type list for every ‘choice’ and Reference() element;
enable the FHIR resources to map much more easily to modern OO languages;
increase software and data re-use, and simplify downstream processing.