FHIR Fixes – the choice construct part I

I have posted before on the FHIR ‘choice’ construct, particularly here, where I explained its problems (essentially: it’s an ad hoc constraint construct that subverts the type system and doesn’t belong in typed formalisms; none of them use it except XSD, which is well known as a poor formalism of little use for modelling). In this post I look into the details of the problem and at some possible solutions.

Much of what I report here is connected to discussions I have taken part in on the FHIR Zulip chat, in the methodology stream.

TL;DR: the main analysis is here.

As a starting point, let’s look at a typical use of choice in FHIR, the Observation resource.

There are actually two uses of choice (or ‘choice[x]’, as it is referred to in FHIR) in Observation. The first, effective[x], is a fairly typical usage. It offers a number of alternative data types to represent the clinically effective time period or time points for an observation, which in many cases is just a single time instant. The alternatives, in detail, are:

// a single point in time 
dateTime;

// represents an interval  
Period {
    start: dateTime;
    end: dateTime;
}

// a timing structure allowing repeats
Timing {
    event: dateTime [*];
    repeat: Repeat [0..1];
}

// ISO8601 string form date/time 
// YYYY-MM-DDThh:mm:ss.sss+zz:zz 
instant;

It is immediately obvious that the general case here is something like a series of date/time points, possibly with some specification of further repeats. Whether repeats really apply to Observations is debatable. In any case, one problem is that choice[x] forces the data receiver to handle any of the four types, rather than a single type that accommodates all possible forms of the data.
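To make the receiver's burden concrete, here is a minimal TypeScript sketch of the four alternatives as a discriminated union. The `kind` tag, the simplified field shapes and the `effectiveStart` function are assumptions made for illustration. They are not part of the FHIR specification: FHIR JSON actually signals the variant through the property name, e.g. effectiveDateTime or effectivePeriod.

```typescript
// Simplified stand-ins for the FHIR types; ISO 8601 values are kept as strings.
type DateTime = string; // FHIR dateTime, e.g. "2019-03-01T10:00:00Z"
type Instant = string; // FHIR instant, always a full timestamp

interface Period {
  start?: DateTime;
  end?: DateTime;
}

interface Timing {
  event: DateTime[];
  repeat?: unknown; // the Repeat structure is left opaque here
}

// Observation.effective[x] as a discriminated union; `kind` is an assumed tag.
type Effective =
  | { kind: "dateTime"; value: DateTime }
  | { kind: "Period"; value: Period }
  | { kind: "Timing"; value: Timing }
  | { kind: "instant"; value: Instant };

// Every consumer must branch on all four alternatives, even to answer
// a simple question such as "when did this start?".
function effectiveStart(e: Effective): DateTime | undefined {
  switch (e.kind) {
    case "dateTime":
    case "instant":
      return e.value;
    case "Period":
      return e.value.start;
    case "Timing":
      // Earliest listed event; assumes all events use the same format and
      // UTC offset, so that lexical order matches time order.
      return e.value.event.slice().sort()[0];
  }
}
```

Even this simple question needs four branches, and every consumer of the resource has to repeat them.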

The second occurrence of choice[x] is Observation.value. Here the alternatives cover nearly everything. Interestingly, though, some types such as instant and Duration are absent, for no obvious reason. I have already documented a better way to deal with Observation.value here.

Generally speaking, choice[x] is used to provide alternatives among data types, not constructed types. The intention is to cover the different ways in which systems represent the same logical thing, such as a ‘date’ or a ‘duration’.

The Problem with Choice

From the point of view of orthodox OO modelling, choice[x] isn’t a valid construct, since an ad hoc list of types is more or less the opposite of the formal concept of ‘typing’. It is also not supported in most modern programming languages (I documented the more theoretical reasons why it should be avoided here). The closest equivalents are the C/C++ union construct, a well-known and generally avoided typing black hole, and the XSD choice construct. As far as I know, XSD choice was the inspiration for FHIR’s choice[x]. I recently discovered that TypeScript allows union types of the form string | number. In a function signature, such a type can replace multiple overloaded declarations with isomorphic signatures by a single declaration. Overloaded functions are syntactic sugar for handling different formats of the same data, e.g. receiving an Integer value as a native int32 or as a string.
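For comparison, the TypeScript case can be sketched as follows; the function names `toInt` and `toIntUnion` are hypothetical. The overloaded version declares one signature per input format, while the union version declares a single `number | string` parameter. Both bodies still have to narrow the value with `typeof` before they can use it.

```typescript
// Overloaded declarations: one signature per accepted input format.
function toInt(value: number): number;
function toInt(value: string): number;
function toInt(value: number | string): number {
  return typeof value === "number" ? value : parseInt(value, 10);
}

// A single declaration using a union-typed parameter instead.
function toIntUnion(value: number | string): number {
  // The union must be narrowed before the value can be used.
  if (typeof value === "number") {
    return value;
  }
  return parseInt(value, 10);
}
```

One practical difference is worth noting. A caller holding a value already typed as `number | string` can call `toIntUnion`, but not the overloaded `toInt`. TypeScript checks such a call against each overload signature separately, and neither signature accepts the union.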

Mapping choice[x] to mainstream programming languages requires various workarounds, none of them clean. At a practical level, choice[x] is a brittle construct, since the type list might change at any time, instantly breaking all downstream software. At a modelling level it is also a problem. No ‘minimal type’ (normally an abstract type) is stated to say what is minimally needed for a given attribute. Instead, different committees repeat long discussions about which concrete types might apply in specific circumstances. Such discussions are unlikely to arrive at any definitive list of types. Finding all the types for a particular modelling intention depends on having people in the room who know of every representation in use, and even that does not guarantee that all possible representations are found.
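The brittleness is easy to see at the wire level. In FHIR JSON, a choice element such as effective[x] is serialised with the type name appended to the element name (effectiveDateTime, effectivePeriod, and so on). A receiver therefore typically hard-codes the list of suffixes it understands. The TypeScript sketch below uses a hypothetical helper, `findChoice`, to show what happens when the sender uses a type that was added to the list after the receiver was written.

```typescript
type Json = Record<string, unknown>;
type ChoiceResult = { type: string; value: unknown } | "absent" | "unrecognised";

// FHIR JSON capitalises the first letter of the type name: dateTime -> DateTime.
function capitalise(s: string): string {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// `knownTypes` is the receiver's hard-coded copy of the choice[x] type list.
function findChoice(resource: Json, baseName: string, knownTypes: string[]): ChoiceResult {
  for (const t of knownTypes) {
    const key = baseName + capitalise(t);
    if (key in resource) {
      return { type: t, value: resource[key] };
    }
  }
  // Heuristic: a key such as "effectiveTiming" that is not in knownTypes is
  // probably a variant this receiver was not built to handle.
  const hasUnknownVariant = Object.keys(resource).some(
    (k) => k.indexOf(baseName) === 0 && /^[A-Z]/.test(k.slice(baseName.length)),
  );
  return hasUnknownVariant ? "unrecognised" : "absent";
}
```

The check for unrecognised variants is only a heuristic, and that is itself a symptom of the problem. Nothing in the type system tells the receiver what the full set of alternatives is.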

… versus the FHIR need

The above formal considerations, although well understood in IT generally, do not fit simply with the various real requirements of the integration context (and thus of FHIR):

Many data elements, whose ideal type might be Duration, DateTime, or a List or Interval of the same, really do have variant representations in back-end systems;
The common need to represent a status and/or a related datum, for data points like ‘deceased’: sometimes a Boolean is available, and sometimes a datum such as a Date (say, of death) is available instead, from whose presence the Boolean status is inferred;
Various needs to represent the time of something in the past or future;
Code / Reference: the common need to allow a Coded term representing the type of something or a Reference to an instance of that kind of value;
Attachment / Reference: the need to allow an attached (inlined) value object or a Reference to the same object.
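The last two requirements share one shape: something is either given inline (a coded term, or an attached object) or pointed to by a Reference. Here is a minimal TypeScript sketch of how one parameterised type could replace the various two-element choice lists. It assumes a single hypothetical generic type, `ValueOrRef<T>`, and uses heavily simplified stand-ins for FHIR's Reference, CodeableConcept and Attachment:

```typescript
// Heavily simplified stand-ins for the FHIR data types.
interface Reference {
  reference: string; // e.g. "Medication/123"
  display?: string;
}
interface CodeableConcept {
  text?: string;
}
interface Attachment {
  contentType?: string;
  url?: string;
}

// Hypothetical generic type: the item is either inlined, or referenced.
type ValueOrRef<T> =
  | { kind: "value"; value: T }
  | { kind: "ref"; ref: Reference };

// Whatever T is, a receiver handles exactly two cases.
function summarise<T>(v: ValueOrRef<T>, show: (t: T) => string): string {
  return v.kind === "value" ? show(v.value) : "ref " + v.ref.reference;
}
```

The two-case structure is fixed by the type itself, so the list of alternatives cannot silently grow.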

Note that I have not included the Observation.value case here, because it is an easy fit for orthodox typed modelling – it just requires a supertype of all data types, e.g. DataType, or as we call it in openEHR, DataValue.
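A minimal TypeScript sketch of the supertype approach follows. The class names are hypothetical: `DataValue` borrows the openEHR name, and `Quantity`, `CodedText` and `Duration` are simplified stand-ins, not the openEHR or FHIR definitions. Observation.value is typed once, as the abstract supertype. Adding a data type, such as the Duration missing from FHIR's list, therefore does not change the attribute's declaration:

```typescript
// Abstract supertype of all data value types.
abstract class DataValue {
  abstract toText(): string;
}

class Quantity extends DataValue {
  readonly amount: number;
  readonly unit: string;
  constructor(amount: number, unit: string) {
    super();
    this.amount = amount;
    this.unit = unit;
  }
  toText(): string {
    return this.amount + " " + this.unit;
  }
}

class CodedText extends DataValue {
  readonly code: string;
  readonly display: string;
  constructor(code: string, display: string) {
    super();
    this.code = code;
    this.display = display;
  }
  toText(): string {
    return this.display;
  }
}

class Duration extends DataValue {
  readonly iso8601: string; // e.g. "PT30M"
  constructor(iso8601: string) {
    super();
    this.iso8601 = iso8601;
  }
  toText(): string {
    return this.iso8601;
  }
}

// The attribute is declared once; any current or future subtype is accepted.
interface Observation {
  code: string;
  value?: DataValue;
}
```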

Two instances of the second requirement are visible in the Patient resource: deceased[x] (boolean | dateTime) and multipleBirth[x] (boolean | integer).
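On the receiving side, Patient.deceased[x] arrives in FHIR JSON as either deceasedBoolean or deceasedDateTime. Here is a minimal TypeScript sketch, with hypothetical helper names, of deriving the Boolean status from whichever variant is present:

```typescript
// Only the two properties of interest; FHIR allows at most one of them.
interface PatientJson {
  deceasedBoolean?: boolean;
  deceasedDateTime?: string;
}

function isDeceased(p: PatientJson): boolean | undefined {
  if (p.deceasedDateTime !== undefined) return true; // status inferred from the date
  if (p.deceasedBoolean !== undefined) return p.deceasedBoolean;
  return undefined; // not stated either way
}

function dateOfDeath(p: PatientJson): string | undefined {
  return p.deceasedDateTime; // only the dateTime variant carries a date
}
```

The date, where available, carries strictly more information than the Boolean. A single modelled type could state that relationship directly.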

Instances of all of these requirements abound throughout the FHIR DSTU4 resources.

I have no disagreement with the above requirements – I have seen them all in the integration environment in EHR-land.

The question is: can FHIR do better than committee-by-committee ad hoc specification?

Understanding the problem in detail

The problem with the current ad hoc choice[x] approach is that it doesn’t systematise anything – every group has to rediscover some approximate combination of data types for numerous data elements. To get an idea of the scale of this in FHIR, see this page, which lists the 180 occurrences of choice[x] in DSTU4. Here are the first 20 or so entries, to give a feel for it:

"Annotation.author[x]": ["Reference", "string"],
"DataRequirement.subject[x]": ["CodeableConcept", "Reference"],
"DataRequirement.dateFilter.value[x]": ["dateTime", "Period", "Duration"],
"Dosage.asNeeded[x]": ["boolean", "CodeableConcept"],
"Dosage.doseAndRate.dose[x]": ["Range", "SimpleQuantity"],
"Dosage.doseAndRate.rate[x]": ["Ratio", "Range", "SimpleQuantity"],
"Population.age[x]": ["Range", "CodeableConcept"],
"SubstanceAmount.amount[x]": ["Quantity", "Range", "string"],
"Timing.repeat.bounds[x]": ["Duration", "Range", "Period"],
"TriggerDefinition.timing[x]": ["Timing", "Reference", "date", "dateTime"],
"UsageContext.value[x]": ["CodeableConcept", "Quantity", "Range", "Reference"],
"ActivityDefinition.subject[x]": ["CodeableConcept", "Reference"],
"ActivityDefinition.timing[x]": ["Timing", "dateTime", "Age", "Period", "Range", "Duration"],
"ActivityDefinition.product[x]": ["Reference", "CodeableConcept"],
"AllergyIntolerance.onset[x]": ["dateTime", "Age", "Period", "Range", "string"],
"AuditEvent.entity.detail.value[x]": ["string", "base64Binary"],
"BiologicallyDerivedProduct.collection.collected[x]": ["dateTime", "Period"],
"BiologicallyDerivedProduct.processing.time[x]": ["dateTime", "Period"],
"BiologicallyDerivedProduct.manipulation.time[x]": ["dateTime", "Period"],
"CarePlan.activity.detail.scheduled[x]": ["Timing", "Period", "string"],
"CarePlan.activity.detail.product[x]": ["CodeableConcept", "Reference"],
"ChargeItem.occurrence[x]": ["dateTime", "Period", "Timing"],
"ChargeItem.product[x]": ["Reference", "CodeableConcept"],
"Claim.supportingInfo.timing[x]": ["date", "Period"],

If one considers the full list carefully, patterns appear. For example, numerous fields that are logically some kind of date/time (e.g. onset, effective_date, occurrence) have a choice of ["dateTime", "Period"] or ["dateTime", "Age", "Period", "Range", "string"] or something similar. Other combinations, such as ["CodeableConcept", "Reference"], ["Attachment", "Reference"] and ["boolean", "Age", "Range", "date", "string"], or close approximations of them, also turn up regularly.
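These patterns can be surfaced mechanically by inverting the map, grouping element paths by their sorted combination of types. Here is a minimal TypeScript sketch using a few entries from the list above; `invert` is a hypothetical name, and this is not necessarily how the grouped list was produced:

```typescript
// element path -> allowed types, as in the listing above
type ChoiceMap = Record<string, string[]>;

// Invert to: type combination -> element paths using that combination.
function invert(choices: ChoiceMap): Record<string, string[]> {
  const byCombo: Record<string, string[]> = {};
  for (const path of Object.keys(choices)) {
    // Sort so that [dateTime, Period] and [Period, dateTime] share one key.
    const key = choices[path].slice().sort().join(", ");
    (byCombo[key] = byCombo[key] || []).push(path);
  }
  return byCombo;
}

const sample: ChoiceMap = {
  "BiologicallyDerivedProduct.collection.collected[x]": ["dateTime", "Period"],
  "BiologicallyDerivedProduct.processing.time[x]": ["dateTime", "Period"],
  "ChargeItem.product[x]": ["Reference", "CodeableConcept"],
  "ActivityDefinition.product[x]": ["Reference", "CodeableConcept"],
  "Annotation.author[x]": ["Reference", "string"],
};
const inverted = invert(sample);
```

The number of occurrences of each combination is then just the length of its path list, e.g. `inverted["CodeableConcept, Reference"].length`.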

During recent discussions on the ["CodeableConcept", "Reference"] combination, Grahame Grieve produced the list in reversed form, i.e. each choice[x] combination with its number of occurrences and where in the resources they occur. I sorted this into rough groups and formatted it a bit for readability. An abridged version is below.

// Reference, value or code
[Reference, CodeableConcept]:       (41)
[Reference, Attachment]:             (5)
[Reference, string]:                 (2)
[canonical, uri]:                    (3)
[Reference, Timing, date, dateTime]: (1)

// Boolean or value
[boolean, CodeableConcept]:          (3)
[boolean, canonical]:                (2)
[boolean, integer]:                  (1)
[boolean, dateTime]:                 (1)

// coded or value
[CodeableConcept, date]:                                (1)
[CodeableConcept, canonical, uri]:                      (1) 
[CodeableConcept, SimpleQuantity]:                      (1) 
[CodeableConcept, Range]:                               (1) 
[CodeableConcept, Quantity, Range]:                     (1)
[CodeableConcept, Quantity, Range, Reference]:          (1) 
[CodeableConcept, Quantity, Range, Reference, boolean]: (1) 
[CodeableConcept, Duration]:                            (1) 

// time-related
[               Period,         dateTime]:                 (15)
[               Period, Timing, dateTime]:                  (7)
[               Period,         date]:                      (8)
[Age,           Period,         dateTime, Range,  string]:  (4)
[               Period,         date,             string]:  (1)
[               Period, Timing,                   string]:  (1)
[               Period, Timing, dateTime,         instant]: (1)
[               Period,                   Range]:           (1)
[Age, Duration, Period, Timing, dateTime, Range]:           (3)
[     Duration, Period, Timing, dateTime]:                  (3)
[     Duration, Period,         dateTime]:                  (1)
[     Duration, Period,                   Range]:           (1)
[     Duration,                 date]:                      (1)
[Age,                           date,     Range,  string,  boolean]: (1) 
[Age,           Period,                   Range,  string]:  (1) 

// Quantities
[Range,        SimpleQuantity]:     (1)
[Range, Ratio, SimpleQuantity]:     (1)
[Range, Ratio, Quantity]:           (1)
[Range, Ratio, Quantity,  string]:  (1)

// Other
[Money, string, unsignedInt]:       (3)

Modelling the Mess of Reality

I began an analysis of the above; the work in progress is here on the openEHR wiki. The semantic categories of choice[x] use that I arrived at are the following:

Various kinds of time related choice[x]
- when an event occurred (past) – point or interval in time
- age at which an event occurred (past) – absolute point / interval in time | relative duration
- absolute point / interval in time | relative duration
- scheduled (future) time
Quantity patterns
- Money
- Numeric: integer | real | Range<T>
‘Any’ value patterns
References
- References proper
- Reference or Value

The approach from here, for which I have made some initial notes, is to ‘model the mess – cleanly’. In other words, look at the semantics the modellers are trying to achieve, and design types that implement them properly. For example, a type could be defined to represent each of the four items under the ‘time-related’ category in the list just above. If such modelling were done, committees would choose types based on semantic need, rather than just adding another type to an existing ad hoc list. The repeated addition of types to choice lists is almost guaranteed to quickly obscure any earlier attempt to arrive at a meaningful list.
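As an illustration only, here is a TypeScript sketch of what such types might look like for the time-related categories. All names and shapes are hypothetical. The second and third items in the list share the same shape, so one type (`EventAge`) stands in for both. The sketch should not be read as the modelled types intended for the following post.

```typescript
type IsoDateTime = string; // e.g. "2019-03-01T10:00:00Z"
type IsoDuration = string; // e.g. "P45Y"

// When an event occurred (past): a point or an interval in time.
type EventTime =
  | { kind: "point"; at: IsoDateTime }
  | { kind: "interval"; start: IsoDateTime; end?: IsoDateTime };

// Absolute point / interval in time, or a relative duration such as an age.
type EventAge =
  | { kind: "absolute"; time: EventTime }
  | { kind: "relative"; sinceBirth: IsoDuration };

// Scheduled (future) time: one or more points, optionally repeating.
interface ScheduledTime {
  events: IsoDateTime[];
  repeatEvery?: IsoDuration;
}

function eventTimeText(t: EventTime): string {
  return t.kind === "point" ? t.at : t.start + " to " + (t.end || "ongoing");
}

function eventAgeText(a: EventAge): string {
  return a.kind === "absolute" ? eventTimeText(a.time) : "age " + a.sinceBirth;
}
```

A committee modelling something like an allergy onset would then choose `EventAge` by its meaning, instead of assembling a list such as [dateTime, Age, Period, Range, string].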

The post following this one will look at modelled solutions, and how choice[x] could be largely removed from FHIR.

One thing I’ll say in conclusion is that although I think the use of choice[x] in FHIR is wrong (it subverts typing both at a technical level and at a design / thinking level), it’s there because it’s a concrete solution to a bunch of fairly annoying problems. We have solved some of these in openEHR. The timing and reference problems we have not solved cleanly, because they are genuinely hard. My proposal is: let’s solve these for the whole sector. I’d be happy to propose modelled types for all the above use cases in FHIR, which we would use in openEHR as well.