In his recent blog post, Eric Browne highlights what may be a problem in the design of the Australian PCEHR, due to the well-known CDA feature allowing dual forms of content – text and structured, supposedly equivalent – to be stored in the one document. If Eric’s examples are representative of real data in the future PCEHR system, there is definitely a problem. In any case, there is a more general problem arising from common misuse of CDA, and the CDA architecture itself should be changed to remove the possibility of such misuse.
I have to admit I have never been able to understand the logic of the CDA design. The general idea is that a CDA (Clinical Document, defined by the HL7 Clinical Document Architecture standard) must have a ‘narrative’ block of text in each section, and can optionally have ‘structured’ entries containing an equivalent structured form of the same data. Here it is in more detail, from the CDA R2 specification:
A CDA document section is wrapped by the <section> element. Each section can contain a single “narrative block” and any number of CDA entries and external references. The narrative block is a critical component of CDA and must contain the human readable content to be rendered. It is wrapped by the <text> element within the <section> element and contains XML markup that is similar to XHTML. The “originator” (defined as the application role responsible for creation of a conformant CDA document) must ensure that the attested portion of the document body is conveyed in narrative blocks such that a recipient, adhering to recipient rendering rules, will correctly render the document. This process ensures human readability and enables a recipient to receive a CDA document from anyone and faithfully render the attested content using a single style sheet.
Within a document section, the narrative block represents content to be rendered, whereas CDA entries represent structured content provided for further computer processing (e.g., decision-support applications). CDA entries typically encode content present in the narrative block of the same section… These entries are derived from classes in the RIM and enable formal representation of clinical statements in the narrative.
While the narrative blocks must always be present, the CDA entries are optional. An originator of a CDA document is not required to fully encode all narrative into CDA entries within the CDA body, nor is a recipient required to parse and interpret the complete set of CDA entries contained within the CDA body. Within an implementation, trading partners may ascribe additional originator and recipient responsibilities to create various entries and may create various templates and/or implementation guides that require the use of various entries. As a result, CDA R2 can be relatively simple to implement (i.e., just narrative blocks) or can be relatively detailed to implement (i.e., with the inclusion of many rich and expressive entries) and provides a migration pathway toward progressively richer computer-processable content.
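To make the dual representation concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The section is an invented, heavily simplified CDA-style fragment (not schema-valid, and not taken from Eric's post): its narrative block and its coded entry state different doses, and nothing in the format itself detects the disagreement.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified CDA-style section (NOT schema-valid: a real
# substanceAdministration needs a consumable, codes, etc.). The narrative
# block says 250 mg; the structured entry says 500 mg.
SECTION = """
<section xmlns="urn:hl7-org:v3">
  <title>Medications</title>
  <text><paragraph>Amoxicillin 250 mg three times daily</paragraph></text>
  <entry>
    <substanceAdministration classCode="SBADM" moodCode="INT">
      <doseQuantity value="500" unit="mg"/>
    </substanceAdministration>
  </entry>
</section>
"""

NS = {"v3": "urn:hl7-org:v3"}

root = ET.fromstring(SECTION.strip())

# What a human sees: the flattened text of the narrative block.
narrative = "".join(root.find("v3:text", NS).itertext()).strip()

# What software (e.g. decision support) sees: the coded dose.
dose = root.find("v3:entry/v3:substanceAdministration/v3:doseQuantity", NS)
coded = f"{dose.get('value')} {dose.get('unit')}"

print("rendered to clinician:", narrative)
print("seen by software:     ", coded)
print("agree?", coded in narrative)  # False: the two forms contradict each other
```

A system displaying the narrative and a decision-support tool reading the entry would be working from different facts, and the document itself gives neither of them a way to notice.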
There are various things to contemplate here. The most obvious is that CDA provides a persistent place for two representations of the same data. While this might be done as an optimisation in some health information database, it doesn’t make sense in an application-level information artefact like a CDA. One would normally have expected either a) structured content that could be rendered into text by some algorithm (a commonplace feature of software applications in all industries), or b) just narrative content. To achieve this, all that is needed is a single information model that accommodates variable structuring of data (typically a tree structure of name-value pairs). The simplest case is a single element containing a potentially large amount of text (with or without some formatting, assuming such markup is allowed). More structuring just means more elements, most likely with the text either represented in native forms (e.g. quantities and units) or simply sliced up into smaller fragments (e.g. patient answers to separate questions).
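The single-model alternative can be sketched in a few lines of Python. The class names `Element`, `Cluster` and `Quantity` and the `render` function are hypothetical, chosen only for illustration; the point is that pure narrative and richly structured data are the same kind of thing, differing only in how many nodes the tree has, and that any display text beyond a single text atom is generated from the tree rather than stored alongside it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical node types for a single information model: a tree of
# name-value pairs. Nothing here is taken from CDA or any published standard.


@dataclass
class Quantity:
    """A value in native form: a magnitude plus units."""
    magnitude: float
    units: str


@dataclass
class Element:
    """A leaf (atom): a name plus a value, either text or a native quantity."""
    name: str
    value: Union[str, Quantity]


@dataclass
class Cluster:
    """An interior node: a name plus an ordered list of child nodes."""
    name: str
    items: list[Node] = field(default_factory=list)


Node = Union[Element, Cluster]


def render(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as indented text lines, one line per node."""
    indent = "  " * depth
    if isinstance(node, Element):
        value = node.value
        if isinstance(value, Quantity):
            text = f"{value.magnitude:g} {value.units}"
        else:
            text = value
        return [f"{indent}{node.name}: {text}"]
    lines = [f"{indent}{node.name}"]
    for child in node.items:
        lines.extend(render(child, depth + 1))
    return lines


# Simplest case: the whole section is one text atom.
narrative_only = Element("History", "Patient reports chest pain for three days.")

# More structure is just more elements, with some values in native form.
structured = Cluster("Vital signs", [
    Element("Systolic", Quantity(128, "mmHg")),
    Element("Diastolic", Quantity(82, "mmHg")),
    Element("Comment", "Taken seated, left arm."),
])

print("\n".join(render(narrative_only)))
print("\n".join(render(structured)))
```

Both cases go through the same model and the same rendering path, so there is never a second, independently authored copy of the data that could drift out of step.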
From the quote above, it is clear that the intention of the CDA design is that the (structured) entries ‘encode’ the narrative content. It is hard to see what this really means. What would make sense is for the narrative block to be a (reproducible) text rendering of the structured data. One reason you might want to do this is to ensure that what is rendered on the screen is guaranteed to be the same no matter where the CDA document is sent. Fair enough. In that case, the rules of CDA would have to be:
- Where there is structured data present, the narrative block must contain a faithful and standardised rendering of the structured part into an accepted HTML or XML form that everyone agrees to trust, generated by a published, standardised algorithm (the version of the algorithm probably should be included in the block).
- Where there is no structured data, the narrative block stands on its own (but see below)…
- Clinical sign-off is done on the narrative block, rendered to the screen (otherwise there is no purpose to the narrative block).
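A minimal Python sketch of checking a section against these three rules follows. The algorithm identifier `example-render/1.0`, the `data-render` attribute and the functions `standard_render`, `check_section` and `signoff_digest` are all hypothetical; no such standard algorithm is published. The structured part is modelled as a nested `(name, value)` tuple rather than as real CDA entries.

```python
from __future__ import annotations

import hashlib
import html
from typing import Optional

# Hypothetical identifier for a "published, standardised" rendering algorithm.
# No such standard exists; the name is invented for illustration.
RENDER_ALGORITHM = "example-render/1.0"

# Structured part: (name, value), where value is a text atom (str) or a list
# of child (name, value) pairs.


def standard_render(node: tuple) -> str:
    """Deterministically render a structured tree to XHTML-like markup,
    embedding the algorithm version in the output (rule 1)."""
    def item(n: tuple) -> str:
        name, value = n
        if isinstance(value, str):
            return f"<li>{html.escape(name)}: {html.escape(value)}</li>"
        children = "".join(item(child) for child in value)
        return f"<li>{html.escape(name)}<ul>{children}</ul></li>"
    return f'<div data-render="{RENDER_ALGORITHM}"><ul>{item(node)}</ul></div>'


def check_section(narrative: str, entries: Optional[tuple]) -> bool:
    """Rules 1 and 2: with structured data present, the narrative must be
    exactly the standard rendering; without it, the narrative stands alone."""
    if entries is None:
        return bool(narrative.strip())             # rule 2: narrative-only section
    return narrative == standard_render(entries)   # rule 1: exact match required


def signoff_digest(narrative: str) -> str:
    """Rule 3: the clinician attests the narrative as displayed; a hash of
    exactly those characters ties the attestation to what was seen."""
    return hashlib.sha256(narrative.encode("utf-8")).hexdigest()


entries = ("Medication", [("Drug", "amoxicillin"),
                          ("Dose", "500 mg three times daily")])
good = standard_render(entries)
tampered = good.replace("500 mg", "250 mg")

print(check_section(good, entries))            # True
print(check_section(tampered, entries))        # False
print(check_section("Free text only.", None))  # True
print(signoff_digest(good)[:12])
```

This sketch knows only one algorithm version; a real checker would read the version recorded in the narrative and re-render with that version, so that documents produced under older algorithms still validate.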
According to these rules, CDAs should be medico-legally safe. Note that a standardised algorithm is required for producing the narrative part. Without this, there is no guarantee that two sites producing the same structured content would generate the same narrative. There are other requirements of the algorithm: it must be ‘complete’ in the sense of rendering all the information present in the structured part to the screen, i.e. not hiding any of it. Further requirements would relate to the details of doing this properly. (Note that this is not the only way to render data and get sign-off – a common alternative is to render structured data in a near-to-native tree structure, with each atom being turned into text by a simple agreed transform. More on this below.)
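The alternative in the parenthesis, a near-to-native tree rendering in which each atom is turned into text by a simple agreed transform, can be sketched in Python together with the completeness requirement: given a narrative from some other source, report every node of the structured part that it fails to show. The transform conventions (ISO 8601 dates, `magnitude units` for quantities) and all function names are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from datetime import date

# Hypothetical tree: (name, value) where value is a list of children,
# a (magnitude, units) tuple, a date, or plain text.


def atom_to_text(atom) -> str:
    """Simple agreed transform per atom type (assumed conventions)."""
    if isinstance(atom, date):
        return atom.isoformat()            # ISO 8601, e.g. 2012-03-01
    if isinstance(atom, tuple):
        magnitude, units = atom
        return f"{magnitude} {units}"
    return str(atom)


def render_tree(node, depth: int = 0) -> list[str]:
    """Near-to-native rendering: one indented line per node, in tree order."""
    name, value = node
    pad = "  " * depth
    if isinstance(value, list):            # interior node
        lines = [f"{pad}{name}"]
        for child in value:
            lines += render_tree(child, depth + 1)
        return lines
    return [f"{pad}{name}: {atom_to_text(value)}"]


def missing_nodes(node, narrative: str) -> list[str]:
    """Completeness check: expected lines absent from the narrative.
    Counter handles repeated identical lines; an empty result means complete."""
    shown = Counter(narrative.splitlines())
    missing = []
    for line in render_tree(node):
        if shown[line] > 0:
            shown[line] -= 1
        else:
            missing.append(line)
    return missing


obs = ("Blood pressure", [
    ("Date", date(2012, 3, 1)),
    ("Systolic", (128, "mmHg")),
    ("Diastolic", (82, "mmHg")),
])

full = "\n".join(render_tree(obs))
partial = "Blood pressure\n  Systolic: 128 mmHg"

print(full)
print(missing_nodes(obs, full))     # []
print(missing_nodes(obs, partial))  # ['  Date: 2012-03-01', '  Diastolic: 82 mmHg']
```

Matching whole lines is deliberately strict: a narrative that paraphrases or reorders the data would be flagged, which is the safe failure mode for sign-off.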
But it didn’t have to be like this. A safer design for CDA would have been:
- to have a structured part, in which the primary data are always placed, even if the data are just a single narrative block of text, contained in a single text atom.
- if ‘standard rendering’ were necessary, the ‘narrative’ block (better called a ‘rendered block’ or similar) would contain the standard rendering of the structured section, generated using the published standard algorithm.
- various exceptions to generating the narrative block would then be allowed:
  - if the structured content were just a single atom of text, the narrative could be omitted and assumed to be the same as this already stored atom of text (but note: one has to be very careful about what a text field contains: it might be some funny XML, HTML, or even worse, some wiki markup, base64-encoded binary or who knows what, so ‘text’ would have to be carefully defined);
  - if all parties using the CDA were in possession of the standard rendering algorithm and appropriate software to use it (but this is difficult to know, since the CDA might be stored and used years later by unknown parties);
  - if all parties using the CDA agreed that they would render the information in specific ways (this is not so dumb: getting safe sign-off of clinical data doesn’t actually rely on the data being displayed in identical ways to all parties, but in the most natural way for the relevant speciality or individual).
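The safer design and its exceptions might look something like this in Python. `Section`, `standard_render`, `is_plain_text` and `text_to_display` are hypothetical names, and the definition of ‘plain text’ used here (printable characters, no `<` or `&`) is only a placeholder for the careful definition the first exception calls for; it would not, for example, catch wiki markup or base64.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Section:
    """Hypothetical section for the safer design: the structured part is
    mandatory, the rendered block is optional."""
    structured: Union[str, list]   # one text atom, or a list of (name, text) pairs
    rendered: Optional[str] = None


def standard_render(structured: Union[str, list]) -> str:
    """Stand-in for the published standard rendering algorithm (hypothetical)."""
    if isinstance(structured, str):
        return structured
    return "\n".join(f"{name}: {text}" for name, text in structured)


def is_plain_text(s: str) -> bool:
    """Placeholder definition of 'text': printable characters, tabs and line
    breaks only, and no XML/HTML-significant characters."""
    printable = all(ch.isprintable() or ch in "\n\t" for ch in s)
    return printable and "<" not in s and "&" not in s


def text_to_display(section: Section) -> str:
    """Return the text a clinician should see and sign off."""
    if section.rendered is not None:
        # A stored rendered block must be exactly the standard rendering.
        if section.rendered != standard_render(section.structured):
            raise ValueError("rendered block does not match structured data")
        return section.rendered
    if isinstance(section.structured, str):
        # Exception 1: a single text atom stands in for the narrative,
        # but only if it really is plain text.
        if not is_plain_text(section.structured):
            raise ValueError("text atom is not plain text; a rendering is required")
        return section.structured
    # Exceptions 2 and 3: the receiver is assumed to hold the standard
    # algorithm (or an agreed local rendering), so text is generated on receipt.
    return standard_render(section.structured)


print(text_to_display(Section("Patient well, no complaints.")))
print(text_to_display(Section([("Allergy", "penicillin"), ("Reaction", "rash")])))
```

Raising an error rather than guessing is a deliberate choice: whenever the safe display text cannot be determined, the sketch refuses instead of showing something that might differ from the data.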
Now, I happen to know that some of the key CDA designers are clinicians, and keenly aware of medico-legal and safety issues. I can only conclude that the committee-based standards process is responsible for the strange design of the CDA, which can easily be abused by parties not following rules like the above. It may well be the case that someone in the CDA community has already formulated such rules. If they have, they should be published in an update of the CDA standard as soon as practically possible, including a standardised rendering algorithm.
For now, users of the CDA standard like Nehta and other bodies around the world should create local policy based on the considerations above, and formulate a watertight set of rules guaranteeing safe data.
I personally don’t agree with storing the generated result of such an algorithm at all; this would only make sense if CDAs were to be stored in the very long term, with no assumptions made about future users. But CDA is not a very useful format for that purpose, and was not designed for it. Instead, CDAs should be converted to an EHR information architecture that accounts for longitudinal patient records, distributed versioning, and model-based semantic marking.