The CDA ‘dual-content’ conundrum

In his recent blog post, Eric Browne highlights what may be a problem in the design of the Australian PCEHR, due to the well-known CDA feature allowing dual forms of content – text and structured, supposedly equivalent – to be stored in the one document. If Eric’s examples are representative of real data in the future PCEHR system, there is definitely a problem. In any case, there is a general problem, to do with common misuse of the CDA architecture, which itself should be changed to remove such possibilities.

I have to admit I have never been able to understand the logic of the CDA design. The general idea is that a CDA (Clinical Document, defined by the HL7 Clinical Document Architecture standard) must have ‘narrative’ sections containing text, and can optionally have ‘structured’ data sections containing an equivalent structured form of the data. Here it is in more detail, from [1]:

A CDA document section is wrapped by the <section> element. Each section can contain a single “narrative block” and any number of CDA entries and external references. The narrative block is a critical component of CDA and must contain the human readable content to be rendered. It is wrapped by the <text> element within the <section> element and contains XML markup that is similar to XHTML. The “originator” (defined as the application role responsible for creation of a conformant CDA document) must ensure that the attested portion of the document body is conveyed in narrative blocks such that a recipient, adhering to recipient rendering rules, will correctly render the document. This process ensures human readability and enables a recipient to receive a CDA document from anyone and faithfully render the attested content using a single style sheet.

Within a document section, the narrative block represents content to be rendered, whereas CDA entries represent structured content provided for further computer processing (e.g., decision-support applications). CDA entries typically encode content present in the narrative block of the same section… These entries are derived from classes in the RIM and enable formal representation of clinical statements in the narrative.

While the narrative blocks must always be present, the CDA entries are optional. An originator of a CDA document is not required to fully encode all narrative into CDA entries within the CDA body, nor is a recipient required to parse and interpret the complete set of CDA entries contained within the CDA body. Within an implementation, trading partners may ascribe additional originator and recipient responsibilities to create various entries and may create various templates and/or implementation guides that require the use of various entries. As a result, CDA R2 can be relatively simple to implement (i.e., just narrative blocks) or can be relatively detailed to implement (i.e., with the inclusion of many rich and expressive entries) and provides a migration pathway toward progressively richer computer-processable content.
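To make the quoted structure concrete, here is a minimal Python sketch that reads one section and reports its narrative block and the number of entries. The sample XML is a heavily simplified illustration, not a schema-valid CDA fragment, and the function name summarise_section is an assumption made for this example; only the <section>, <text> and <entry> element names and the urn:hl7-org:v3 namespace come from CDA itself.

```python
import xml.etree.ElementTree as ET

# HL7 v3 namespace used by CDA documents.
CDA_NS = {"cda": "urn:hl7-org:v3"}

# Simplified, illustrative section: one narrative block plus one entry.
# Real CDA entries carry coded values and many more attributes.
SAMPLE_SECTION = """
<section xmlns="urn:hl7-org:v3">
  <title>Results</title>
  <text>Haemoglobin 140 g/L</text>
  <entry>
    <observation classCode="OBS" moodCode="EVN">
      <code displayName="Haemoglobin"/>
      <value value="140" unit="g/L"/>
    </observation>
  </entry>
</section>
"""

# The same section with narrative only, which CDA also permits.
NARRATIVE_ONLY_SECTION = """
<section xmlns="urn:hl7-org:v3">
  <title>Results</title>
  <text>Haemoglobin 140 g/L</text>
</section>
"""


def summarise_section(xml_text):
    """Return the narrative text and the number of structured entries."""
    section = ET.fromstring(xml_text.strip())
    text = section.find("cda:text", CDA_NS)
    # itertext() flattens any inline markup inside the narrative block.
    narrative = "".join(text.itertext()).strip() if text is not None else None
    entries = section.findall("cda:entry", CDA_NS)
    return {"narrative": narrative, "entry_count": len(entries)}


if __name__ == "__main__":
    print(summarise_section(SAMPLE_SECTION))
    print(summarise_section(NARRATIVE_ONLY_SECTION))
```

Note that nothing in the parsing step can tell whether the entry and the narrative actually say the same thing; the two are simply stored side by side.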

There are various things to contemplate here. The most obvious is that CDA provides a persistent place for two representations of the same data. While this might be done as an optimisation in some health information database, it doesn’t make sense in an application-level information artefact like a CDA. One would normally have expected either (a) structured content that could be rendered by some algorithm into text (a commonplace feature of software applications in all industries), or (b) narrative content alone. To achieve this, all that is needed is a single information model that accommodates variable structuring of data (typically in a tree structure of name-value pairs). The simplest case will be a single element containing a potentially large amount of text (with or without some formatting, assuming such markup is allowed). More structuring just means more elements, most likely with the text either represented in native forms (e.g. quantities & units) or simply sliced up into smaller fragments (e.g. patient answers to separate questions).
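The single-model idea can be sketched in a few lines of Python. This is only an illustration under assumed names (Node and render are hypothetical, not part of CDA or any other standard): one tree type covers both the narrative-only case and the more structured case, and the text is always derived from the tree rather than stored beside it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element in a tree of name-value pairs (hypothetical model)."""
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def render(node, depth=0):
    """Render a node and its descendants as indented lines of text."""
    indent = "  " * depth
    if node.value is None:
        line = f"{indent}{node.name}"
    else:
        line = f"{indent}{node.name}: {node.value}"
    lines = [line]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Simplest case: a single element holding a block of narrative text.
narrative_only = Node("Discharge summary", "Patient recovered well.")

# More structure just means more elements in the same model.
structured = Node(
    "Full blood examination",
    children=[Node("Haemoglobin", "140 g/L")],
)

if __name__ == "__main__":
    print("\n".join(render(narrative_only)))
    print("\n".join(render(structured)))
```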

From the quote above, it is clear that the intention of the CDA design is that the (structured) entries ‘encode’ the narrative content. It is hard to see what this really means. What would make sense would be if the narrative block were a (reproducible) text rendering of the structured data. One reason you might want to do this is to ensure that what was rendered on the screen was guaranteed to be the same no matter where the CDA document was sent. Fair enough. In that case, the rules of CDA would have to be:

  1. Where there is structured data present, the narrative block must contain a faithful and standardised rendering of the structured part into an accepted HTML or XML form that everyone agrees to trust, generated by a published, standardised algorithm (the version of the algorithm probably should be included in the block).
  2. Where there is no structured data, the narrative block stands on its own (but see below)…
  3. Clinical sign-off is done on the narrative block, rendered to the screen (otherwise there is no purpose to the narrative block).

According to these rules, CDAs should be medico-legally safe. Note that a standardised algorithm is required for producing the narrative part. Without this, there is no guarantee that two sites producing the same structured content would generate the same narrative. There are other requirements of the algorithm: it must be ‘complete’ in the sense of rendering all the information present in the structured part to the screen, i.e. not hiding any of it. Further requirements would relate to the details of doing this properly. (Note that this is not the only way to render data and get sign-off – a common alternative is to render structured data in a near-to-native tree structure, with each atom being turned into text by a simple agreed transform. More on this below.)
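A rough Python sketch of what such a rule could look like in practice follows. Everything here is assumed for illustration: the version label, the one-line-per-item output format and the function names are hypothetical, and no such algorithm is defined by CDA. The point is only that a deterministic, versioned renderer lets any recipient regenerate the narrative from the structured part and compare it with what was stored.

```python
# Hypothetical version label; a real standard would define its own.
ALGORITHM_VERSION = "example-render/1.0"


def render_narrative(items):
    """Deterministically render (name, value) pairs, one per line.

    Every item is emitted, so nothing in the structured part is hidden,
    and the algorithm version is recorded in the first line.
    """
    lines = [f"[rendered-by {ALGORITHM_VERSION}]"]
    for name, value in items:
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


def narrative_is_faithful(items, stored_narrative):
    """Check a stored narrative against a fresh rendering of the data."""
    return stored_narrative == render_narrative(items)


if __name__ == "__main__":
    items = [("Haemoglobin", "140 g/L")]
    narrative = render_narrative(items)
    print(narrative)
    print(narrative_is_faithful(items, narrative))
    # A narrative that drifts from the structured data is detected.
    print(narrative_is_faithful(items, narrative.replace("140", "104")))
```

Because the check is an exact comparison, two sites running the same algorithm version on the same structured content necessarily produce the same narrative.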

But it didn’t have to be like this. A safer design for CDA would have been:

  • to have a structured part, in which the primary data are always placed, even if the data are just a single narrative block of text, contained in a single text atom.
  • if ‘standard rendering’ was necessary, the ‘narrative’ block (better to call it a ‘rendered block’ or similar) would contain the standard rendering of the structured section, generated using the published standard algorithm.
  • various exceptions to generating the narrative block would then be allowed:
    • if the structured content were just a single atom of text, the narrative could be omitted, and assumed to be the same as this already stored atom of text (but note: one has to be very careful about what a text field contains: it might be some funny XML, HTML, or even worse, some wiki markup, base64-encoded binary or who knows what – therefore ‘text’ would have to be carefully defined);
    • if all parties using the CDA were in possession of the standard rendering algorithm and appropriate software to use it (but this is difficult to know, since the CDA might be stored and used years later by unknown parties);
    • if all parties using the CDA agree that they would render the information in specific ways (this is not so dumb: getting safe signoff of clinical data doesn’t actually rely on the data being displayed in identical ways to all parties, but in the most natural way for the relevant speciality or individual).

Now, I happen to know that some of the key CDA designers are clinicians, and keenly aware of medico-legal and safety issues. I can only conclude that the committee-based standards process is responsible for the strange design of the CDA, which can clearly be easily abused by parties not following rules like the above. It may well be the case that someone in the CDA community has already formulated such rules. If they have, they should be published in an update of the CDA standard as soon as practically possible, including a standardised rendering algorithm.

For now, users of the CDA standard like NEHTA and other bodies around the world should create local policy based on the considerations above, and formulate a watertight set of rules guaranteeing safe data.

I personally don’t agree with storing the generated result of such an algorithm at all; this would only make sense if CDAs were to be stored in the very long term, with no assumptions made about future users. But CDA is not a very useful format for that purpose, and was not designed for it. Instead, CDAs should be converted to an EHR information architecture that accounts for longitudinal patient records, distributed versioning, and model-based semantic marking.

[1] HL7 Clinical Document Architecture, Release 2. Robert H. Dolin, MD, Liora Alschuler, Sandy Boyer, BSP, Calvin Beebe, Fred M. Behlen, PhD, Paul V. Biron, and Amnon Shabo (Shvo), PhD. J Am Med Inform Assoc. 2006 Jan-Feb; 13(1): 30–39.

About wolandscat

I currently work in e-health, and am senior architect of the specifications, designed for semantic interoperability of health information. I also designed the Archetype formalism and model used in openEHR. Outside of work, I am interested in guitar, travel, and philosophy.

3 Responses to The CDA ‘dual-content’ conundrum

  1. Eric Browne says:


    Your amendments to CDA seem eminently sensible and would go some way to improving the CDA standard.
    Of course, they would also need to be backed up by an effective conformance, compliance and accreditation regime, particularly for national programs like NEHTA’s current PCEHR implementation. I think that would take years to establish.

    Unfortunately, down here in Australia, we currently have a government, and a complicit NEHTA, seemingly hell-bent on starting to populate repositories around the country from July 1 this year with all manner of clinical documents based on HL7 CDA in its current guise.

    Your post alerted me to another potential problem with the current PCEHR design, which I don’t think has been taken into consideration, although the final design of the PCEHR has not been published. I think it is possible, even probable, that a clinician may be able to download a CDA document such as a hospital discharge summary, view the narrative, but not have access to one or more computable entries because they were never created in the first place. (This, after all, is a “feature” hailed by many advocates of CDA, since it lowers the bar to creating the document!) It is unlikely that the amount of “coded content” in any one document would be stored as metadata in the PCEHR indexing service, so it would be left to each and every document reading application to stumble on this for themselves. If this turns out to be the design for the PCEHR, it could lead to a pretty unhappy collection of clinicians in a year or two!

  2. Michael Osborne says:

    I can only speak for lab results but this dual reporting issue is not only true for CDA, it is true for all HL7 V2.x ORU messages. A common example would be a Cumulative (Serial) Full Blood Examination. HL7 V2 only allows you to send the current test values (eg. Hb 140 g/L), there is no place to put the previous values in an OBX segment. It is assumed by the lab that the previous values are already in the doctor’s database for trending and graphing. There may be 5 or more previous Hb results in the full text report. There may also be headings, white space or other formatting that are not in the atomic section of the HL7 message. The pathologists go to court on the full text report, not the atomic results – no doctor and few scientists (except in configuration and testing) ever look at the atomic results. I believe the purpose of the atomic data is different to the purpose of the display segment, which is to be rendered to the clinician as representative of the old fashioned paper report that s/he used to see.

    • wolandscat says:

      Hi Michael,

      I can’t answer for the v2 OBX segment uses specifically, but I think if we start from the concept that the abstract notion of a ‘message’ is an ‘update indicating a change of state (very often of knowledge about the patient)’, then it has to be reasonable for each message to be memoryless. It would be up to a receiving EHR system to retain multiple instances of a test result over time – i.e. the EHR has to be the longitudinal memory device, and the messages the notification-of-change mechanism.

      This clean model is no doubt blurred/confused by the legacy function (apparently) of some labs being expected to provide a history of values, not just the most recently ordered test result. In a hospital with standing orders on in-patients it clearly becomes a question of who is responsible for the longitudinal memory function: the lab, or some receiver system? Historically, the lab was/is computerised, and patient record still on paper or only recently becoming computerised – when an EMR is introduced, the owner of the memory function is unclear, and data clashes are inevitable.

      In some cases, the only clarity on whose job it is to retain longitudinal results might be on a local basis, even department by department. But if IT provision overall is not even conscious of this issue, there is a real problem indeed. I believe if we deviate from 1 message = 1 update on the most recent order, we are in trouble.
