FHIR Quality Review Part 1 – Overview

Abstract

As part of the translation of the HL7 FHIR resources (DSTU4) to a computable model representation, a detailed review of model semantics was undertaken. Major quality problems were discovered within the resource definitions.

TBC

Motivations

The original motivation for the translation of the FHIR resources to the BMM computable representation used in openEHR, HL7 CIMI and other e-health environments was twofold. Firstly, it was to make FHIR resources available in a computable form within non-FHIR model environments, so as to increase the chances of integration with other artefact types e.g. those defined in the openEHR platform. A second motivation was to discover whether a BMM + ADL representation of FHIR and profiles could provide a more effective means of ‘profiling’, a model specialisation function currently represented by the HL7 FHIR StructureDefinition resource.

It had been assumed that the resources themselves would be relatively coherent, at least within tightly related sub-groups (‘Medication’ etc) and that basic good modelling practices would have been followed, enabling trouble-free representation in standard formalisms. Neither of these proved to be the case and the exercise exposed major quality problems in the FHIR resources, which are the primary topic of this review.

One of the problems historically in achieving intended goals in e-health, notably interoperability, computability of patient data, patient-centric health records and so on, has been the quality of the de jure standards, including those from HL7, ISO, IHTSDO and other sources. In other domains, most technical standards are conceived, developed and made economically viable by companies and then brought to standards organisations in working order, sometimes in competition with similar offerings from other companies. The work of standardisation is then one of either choosing between working options or facilitating industry to produce a new working compromise version.

In e-health, this method is not generally used, and standards are generally developed in a committee environment and issued with minimal testing or implementation (OMG standards are an exception to this rule). The problems with this design-by-committee approach are documented in previous blog posts. This author had assumed that, given the frequently advertised ‘hackathons’ and a demonstrable commitment to implementation, FHIR might be different. This review demonstrates otherwise, and indeed, the FHIR resources appear to be one of the strongest exemplars of design-by-committee yet encountered in e-health.

Since public health budgets are not limitless, it is clearly of great importance that official standards are of high quality, fit for purpose, economically implementable, scalable, and serve the long-term purposes of their ultimate users, in this case, patients, healthcare professionals, and provider institutions. Poor quality standards create unnecessary costs and long delays in bringing available technological benefits, and (as the past has shown), can derail an entire industry for a decade or more.

Secondly, since it appears that some e-health authorities, ministries of health and other jurisdictional entities have succumbed to the FHIR hype ([HealthcareITtoday, 2016], [SansoroHealth, 2018], [FHIR Dev Days 2018 report]) and uncritically adopted the FHIR standard without review, it is likely that many health systems will be stuck with it for some years to come. Therefore, in the interests of minimising problems and lost time, a proper critical review prepares the way for the development and publication of pro-active measures to correct or compensate for its shortcomings. The alternative would appear to be a long period during which numerous implementers around the world waste time and resources on isolated remediation attempts, and in the process, exacerbate the interoperability problem FHIR claims to address.

It is with these issues in mind that the current review was conducted.

Methods

The HL7 FHIR resources were converted to a formalism known as Basic Meta-Model (BMM), which is published as an open specification by the openEHR Foundation. BMM is an object-oriented formalism, conceptually similar to UML (minus the diagramming), but with a fully formal definition. It has been in use since 2009 within the openEHR ADL Workbench, since about 2011 in HL7 CIMI, and since about 2016 in openEHR Archie (ADL2/BMM libraries and tools) and commercial tools including Marand ADL-designer and Veratech LinkEHR.

The BMM file encoding the FHIR resources is available within the openEHR reference-models Git repo on GitHub. It compiles within the ADL Workbench, and the views of the translated model shown in this document are from this tool.

Scope

The scope of this review is limited to the structural and semantic qualities of the FHIR resources understood as ‘models’ in the usual sense. Other aspects of FHIR, such as the design of URIs, the use of REST, or terminology services, are not covered.

About Formalisms

At the outset a few basic formalism-related concepts need to be understood, because they come into play in the analysis of the FHIR ‘formalism’ and how it maps to software engineering formalisms.

Object-Oriented Languages

The type of formalism in use in mainstream software development today is the object-oriented (OO) language, with or without functional facilities (curried functions, lambdas etc). Programming languages such as Java, C#, Python, Ruby, TypeScript, C++, PHP as well as the UML and openEHR BMM fit this description. While supporting encapsulation of data and behaviour, like other module-based formalisms, the defining characteristic of an OO language is inheritance, which is a facility enabling the progressively specialised definition of classes down lineages, by inheriting from more basic ancestors. Genericity (template classes) may also be supported. Classes provide the definition of types, which are in turn templates for data instances.

Inheritance, and therefore OO, are additive paradigms in the sense that any class definition adds to and/or overrides elements of its inheritance ancestor(s). Any specialised class thus contains differential elements with respect to its ancestors. The effective definition of the type for a class is arrived at by flattening the definition elements (data, methods, constants etc) down an inheritance lineage leading to the class.

Inheritance also leads to polymorphism, which is the ability for dynamic attachment of instances to references of more general statically defined types, e.g. instances of Circle or Square to attach to an attribute shape of type Shape, where Circle and Square are classes inheriting from Shape.
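The Shape example can be sketched in a few lines of Python. This is only an illustration: the Drawing class, the area and describe methods and the attribute names are assumptions added here to make the example runnable, and are not taken from any of the models discussed in this review.

```python
from abc import ABC, abstractmethod
import math


class Shape(ABC):
    """General ancestor: holds only what every shape has in common."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def area(self) -> float:
        ...

    def describe(self) -> str:
        # Defined once here, inherited unchanged by every descendant.
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        super().__init__("circle")
        self.radius = radius  # additive: a new attribute in the descendant

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        super().__init__("square")
        self.side = side  # additive: a new attribute in the descendant

    def area(self) -> float:
        return self.side ** 2


class Drawing:
    """Hypothetical owner class with an attribute 'shape' of static type Shape."""

    def __init__(self, shape: Shape) -> None:
        self.shape: Shape = shape


# Polymorphism: instances of Circle or Square attach to the Shape-typed attribute.
drawings = [Drawing(Circle(1.0)), Drawing(Square(2.0))]
descriptions = [d.shape.describe() for d in drawings]

# The 'flattened' Circle: its inherited attribute plus its own.
flattened_circle = sorted(vars(Circle(1.0)))
```

The flattened view of Circle contains both name (from Shape) and radius (its own addition), which is the additive sense of inheritance described above.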

Some practical consequences of object-oriented modelling formalisms include:

  • ‘base’ classes, i.e. top-level and near top-level classes, are very general, and should contain few features (attributes and methods);
    • a well-known anti-pattern is the ‘god’ class, filled with attributes relating to more specialised types, e.g. a class Animal that has features like Wingspan, TuskLength and EggSize which should only belong to classes like FlyingAnimal etc. This blog post explains consequences for e-health standards.
  • where an attribute or function could return objects of multiple types at runtime, in the design-time model it must be specified as being of an abstract parent type of the intended concrete types allowed at runtime.
    • a well-known anti-pattern is for pseudo-parent types to be created that do not define any coherent common semantics, to enable runtime substitution of objects of arbitrary types.
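The ‘god’ class anti-pattern from the first point can be made concrete with a small Python sketch. The class names GodAnimal and TuskedAnimal and the helper field_names are hypothetical, introduced here purely for illustration; only Animal, FlyingAnimal and the three feature names come from the example above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GodAnimal:
    """Anti-pattern: one class carrying features of every specialisation."""
    name: str
    wingspan: Optional[float] = None
    tusk_length: Optional[float] = None
    egg_size: Optional[float] = None


@dataclass
class Animal:
    """Base class: very general, few features."""
    name: str


@dataclass
class FlyingAnimal(Animal):
    wingspan: float  # only flying animals have a wingspan


@dataclass
class TuskedAnimal(Animal):
    tusk_length: float  # only tusked animals have tusks


def field_names(cls) -> list:
    """Return the flattened attribute names of a dataclass."""
    return [f.name for f in fields(cls)]


# The god class cannot reject a semantically meaningless instance...
confused = GodAnimal("elephant", wingspan=2.0)

# ...whereas the specialised class insists on exactly its own features.
try:
    FlyingAnimal("eagle")  # wingspan missing
    rejected = False
except TypeError:
    rejected = True
```

With the specialised classes, the type itself states which features are meaningful; with the god class, that knowledge has to live somewhere else, usually in documentation or in ad hoc checks in each consuming application.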

Constraint Formalisms

Constraint formalisms are those that define statements or structures that apply to artefacts expressed in modelling formalisms, usually to reduce the instance space according to specific semantic or domain rules. The OMG’s Object Constraint Language (OCL) is one relatively well-known such formalism, and allows class invariants and routine pre- and post-conditions to be applied to UML model elements. W3C’s XQuery is another constraint formalism, which works by progressively applying constraints to a data set (XML content) in order to generate a final result set matching specific criteria.

openEHR’s ADL (original version 2002; adopted as ISO 13606-2 in 2008, 2019) is another constraint formalism that includes the equivalent of invariants, as well as structural and value-based constraining. It operates on UML class models, although concrete implementations today all use BMM-expressed models. ADL-expressed archetypes can be understood as something like classes or types at the domain level, in the sense that one object model class, say Observation can be constrained into archetypes for hundreds of specific kinds of observations such as vital signs, eye exam, lab tests and so on.

Constraint formalisms can be understood as reductive or subtractive, since they generate refined variants of a basic model concept by adding constraints which reduce the set of data instances that will match a definition. For example, only a few Observation instances in a database will conform to an ADL blood-sugar archetype, based on Observation.
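The subtractive effect of constraints can be sketched with plain Python predicates. This is not ADL, and nothing below is taken from an actual archetype: the Observation fields, the codes, the units and the sample data are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical, heavily simplified stand-in for an Observation class."""
    code: str
    value: float
    units: str


Constraint = Callable[[Observation], bool]


def is_blood_glucose(obs: Observation) -> bool:
    return obs.code == "blood_glucose"


def in_mmol_per_litre(obs: Observation) -> bool:
    return obs.units == "mmol/L"


def matching(observations: List[Observation],
             constraints: List[Constraint]) -> List[Observation]:
    """Keep only the instances that satisfy every constraint."""
    return [o for o in observations if all(c(o) for c in constraints)]


data = [
    Observation("blood_glucose", 5.4, "mmol/L"),
    Observation("heart_rate", 72.0, "/min"),
    Observation("blood_glucose", 98.0, "mg/dL"),
    Observation("body_temperature", 36.8, "Cel"),
]

# Each added constraint can only shrink (never grow) the matching set.
unconstrained = matching(data, [])
by_code = matching(data, [is_blood_glucose])
by_code_and_units = matching(data, [is_blood_glucose, in_mmol_per_litre])
```

The class definition of Observation never changes; only the set of conforming instances narrows as constraints are added.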

Relational Databases and Queries

Relational databases consist of tables, which are multiple rows of a typed tuple (the set of column name:type definitions). A row in a table contains data roughly equivalent to an instance of a class in an object model. Fields may be values or relations, specified as primary/foreign key pairs. The main characteristic we are interested in here is how querying works. The essential structure of an SQL query is given by the standard syntax:

SELECT cols FROM some_table WHERE value_constraints

The first two parts generate a view, which is a projection defined by the SELECTed columns (cols) from the total available columns, i.e. the original table definition. Imagine there is a table with 26 columns, each named with a letter of the alphabet, from ‘a’ to ‘z’; this is the ‘some_table’ argument in the SELECT / FROM / WHERE statement above. The SELECT part is a particular subset of columns, say a, c, h, i, j. The WHERE part determines what rows will be included in the result, but the important part is that the SELECT projection defines a view on the original table. This is the primary way new instance definitions (which we might think of as new ‘types’ in OO theory) are derived from existing ones (tables, or previously generated views). This is a subtractive paradigm: each new view is a reduced version of its predecessor.
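The 26-column example can be tried directly with Python's built-in sqlite3 module and an in-memory database. The view name some_view, the INTEGER column type and the row values are assumptions chosen only to make the sketch runnable.

```python
import sqlite3
import string

# Build the 26-column table 'a' .. 'z' described above.
columns = list(string.ascii_lowercase)
conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{c} INTEGER" for c in columns)
conn.execute(f"CREATE TABLE some_table ({column_defs})")

# Three rows; the values are arbitrary (row n holds n*100 + column index).
placeholders = ", ".join("?" * len(columns))
for n in range(3):
    conn.execute(
        f"INSERT INTO some_table VALUES ({placeholders})",
        [n * 100 + i for i in range(len(columns))],
    )

# The projection: a view exposing only 5 of the 26 columns.
conn.execute("CREATE VIEW some_view AS SELECT a, c, h, i, j FROM some_table")

# The WHERE clause then restricts which rows appear in the result.
cursor = conn.execute("SELECT * FROM some_view WHERE a >= 100 ORDER BY a")
view_columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
conn.close()
```

The view has strictly fewer columns than the table it was derived from, and any further view defined on some_view can only select from those five.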

A SQL query as a way of generating a new ‘type’ is thus a very different paradigm from object-orientation, which is additive, as described above. Confusion between relational thinking and object-orientation was one of the main reasons for the problems in the HL7v3 RIM, and the approach to generating dependent models, i.e. RMIMs and CMETs. Specifically, the RIM was modelled in UML and presented as an object-oriented model, but most of the RIM classes were ‘god classes’ and were treated like relational tables, as if they were a basis for generating both additive classes (RIM sub-classes) and subtractive views (the RMIM message definitions).

Mixed formalisms

Some formalisms contain a mixture of additive and subtractive semantics, which is usually a recipe for problems. Probably the best known is W3C XML-schema, which supports two kinds of inheritance, ‘restriction’ (subtractive logic) and ‘extension’ (additive logic). A series of specialised schemas may use both kinds, resulting in schemas that are hard to analyse in modelling environments as well as for run-time use.

Making things more difficult is the fact that the rules for inheritance of tag attributes are different from those for elements (sub-objects). For these reasons, XML-schema is not considered an OO formalism, nor generally used as a primary modelling formalism in the IT industry, but rather generated from other models as a way of concretely defining XML document contents.

XML schema also supports the notion of arbitrary type choice, which bypasses any concept of inheritance-based typing, by allowing the type of an element to be one of an arbitrary set. This kind of modelling results in significant extra complexity in software that deals with XML documents, in the form of if / then / elseif logic chains with minimal logic re-use via polymorphic invocation.
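The difference between handling an arbitrary type choice and using inheritance-based typing can be sketched as follows. The type names (DataValue, Quantity, CodedText, Period) and the render method are hypothetical and are not drawn from XML schema or from any FHIR definition; the point is only the shape of the consuming code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class DataValue(ABC):
    """Common parent with coherent shared semantics: every value can render itself."""

    @abstractmethod
    def render(self) -> str:
        ...


@dataclass
class Quantity(DataValue):
    value: float
    units: str

    def render(self) -> str:
        return f"{self.value} {self.units}"


@dataclass
class CodedText(DataValue):
    code: str
    text: str

    def render(self) -> str:
        return f"{self.text} ({self.code})"


@dataclass
class Period(DataValue):
    start: str
    end: str

    def render(self) -> str:
        return f"{self.start} to {self.end}"


def render_by_choice(value: object) -> str:
    """What a consumer must write when the types share no usable parent:
    an if / elif chain that has to be repeated, and kept up to date,
    in every function that touches the choice."""
    if isinstance(value, Quantity):
        return f"{value.value} {value.units}"
    elif isinstance(value, CodedText):
        return f"{value.text} ({value.code})"
    elif isinstance(value, Period):
        return f"{value.start} to {value.end}"
    else:
        raise TypeError(f"unhandled choice type: {type(value).__name__}")


values = [Quantity(5.4, "mmol/L"), CodedText("C01", "fasting"),
          Period("2018-01-01", "2018-02-01")]

chained = [render_by_choice(v) for v in values]   # type-switching logic
polymorphic = [v.render() for v in values]        # polymorphic invocation
```

Both approaches produce the same output here, but adding a fourth type requires editing every if / elif chain in the first style, and only adding one class in the second.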

Formalism Layering

Formalisms that are additive or subtractive along the lineage of definition specialisation (the inheritance lineage in the OO case) both have their uses, including potentially within the same model ecosystem. A general rule for success in using both is to separate their artefacts into different modelling layers or dimensions, so that representation and processing of any one dimension of a model uses only one kind of logic. Formalisms that enable the definition of entities with mixed additive and subtractive logic at the finest level will run into trouble in implementation.

In the UML framework, UML (additive OO logic) and its constraint counterpart OCL are clearly separated within UML models: the UML part can easily be processed on its own, with OCL statements being processed separately with respect to the UML structures they annotate, or even ignored (as happens in most UML tools).

In the openEHR model framework, information models are defined in UML and also represented for machine processing in BMM. A separate layer of models called archetypes is defined in the Archetype Definition Language (ADL), which is a constraint-logic based formalism. An archetype is a pure set of constraints applied to constructive type definitions from the underlying information model. This enables coherent tools to be written for each layer, and for the logic that applies to the different levels to be easy to understand by modellers concerned with each one.

FHIR: General Picture

The HL7 FHIR resources consist of over 200 definitions, apparently defined in a custom formalism from which other formal views (generated UML, XML, JSON etc) are computationally derived. They may be considered as something like classes in an object-oriented model. The issues discovered in the review fall into the following categories:

  • Part 2 – design-related: issues related to the lack of design methods or philosophy;
  • Part 3 – formalism-related: systemic issues as a consequence of the FHIR native formalism, leading to various systemic anti-patterns;
  • Part 4 – semantics: problems relating to use of terminology, mixing of semantic categories;
  • Part 5 – inconsistency: structures and semantics that are inconsistent across the resource base.

The overall impression from reviewing all of the resources as one would review a ‘model’ is one of major inconsistency and semantic incoherence. The resources cannot be said to constitute a ‘model’, since none of the usual inheritance, encapsulation of common elements or typing practices has been used. Indeed the FHIR resources appear to be the result of separate committees working with almost no cross-referencing, common semantic rules, or common modelling approach.

The following sections document the various issues under the above categories.
