Analytic Framework for the EHR

[material copyright (c) 2014-2016 Thomas Beale]

Healthcare and related research areas are complicated domains to informatise. Sometimes it is hard to see the wood for the trees. Here I state what I consider to be a reasonable underlying analytical framework for health information systems and the EHR, from which system and information design should be derived. The motivation is to address recurring difficulties and edge cases not dealt with properly by any current HIS or EHR design framework or standard. What follows is an emerging new version of the design underpinning the original openEHR EHR architecture. The following abbreviations are used:

  • HCP = healthcare professional
  • HCF = health care facility (i.e. provider facility)
  • HIS = healthcare information system
  • EHR = electronic health record, in its broadest conception as an informational recording device to support clinical care. Here we use ‘EHR’ as a synonym for any kind of ‘EMR’, ‘EPR’, patient journal, dossier, etc.

Useful references:

Many of the ideas presented below are recapitulations of what appears in the above references, with adjustments due to recent evidence from the field. The ‘Applied Ontology’ reference in particular contains an exposition of most of the theoretical basics required to build a disciplined approach to health domain ontology and terminology. [Note: I have not included detailed citations where they would belong in the text below.]


The background to this framework is over a decade spent on the openEHR architecture, now implemented in various places around the world, as well as innumerable discussions with experts on every aspect of health information systems. As long ago as 2004, I attempted an epistemological analysis of EHR information, based on the idea that statements in the EHR represent beliefs and statements on the part of the author. This took Alan Rector’s dictum that the EHR is a ‘record of what was seen, thought and said’ and added some epistemological framework. The analysis evolved into the Clinical Investigator Ontology (MedInfo 2007), which included categories like ‘Observation’ (meaning: recording of what was observed), ‘Assessment’ (recording of clinical opinions) and ‘Instruction’ (recording of orders). These categories cover most health information adequately, but don’t deal properly with certain things:

  • Observations are more or less taken to be true statements about reality, while Assessments are opinions about reality. The former are ontological statements while the latter appear to be epistemic claims, but might also be ontological. For example, a differential diagnosis that includes coeliac disease may initially be considered no more than conjecture about the patient, whereas a confirmed diagnosis of the same is ontological in the sense that it is taken by healthcare professionals as well as the patient to mean that the patient really does have the auto-immune condition we call coeliac disease actively in them (normally backed up by lab evidence).
  • some types of information provided by the patient are not dealt with very well. Patient-provided information appears to have different epistemic and/or ontological status, depending on what it is. A patient-measured blood sugar (at least for a diabetic patient) is as good as any physician’s, and can normally be regarded as a direct reflection of reality. Patient statements about previous diagnoses might or might not be treated as ontological. A patient who claims to be diabetic, knows all about diabetic self-care and has insulin is undoubtedly diabetic. A patient’s claim never to have been diagnosed with depression or schizophrenia would by default not be treated as true in some cases, but probably would be in others. Patient statements about the subjective experience of pain or fatigue have an unknown relationship to reality, but are usually assumed to be true simply for lack of alternative sources of information.
  • some types of information like ‘patient consent’, ‘patient preferences’ and various kinds of so-called ‘administrative’ information don’t have a comfortable home in the current ontology underpinning openEHR.
  • things like ‘care plan’ and ‘concern’ need models of their own, rather than being seen simply as linkages or indexes of observations, diagnoses, orders etc.

Over some years of use of openEHR it has become clear that a more solid philosophical and ontological basis is needed to think about health information properly. Concretely, a systematic analysis of both healthcare semantics (in the real world) and health information semantics is critical for success, if we define success as the ability to do something like routinely use health data for inferencing. What philosophy provides is disciplined ways of understanding the cognitive sphere (where we talk, think, and record thoughts, for example in an EHR) as distinct from reality. In particular, a realist (and fallibilist) stance is required to properly tease out different types of statements about real things. The realist approach, applied to health, means understanding health information as information about entities in the real world of healthcare. How to design health information and HISs depends heavily on understanding the latter reality, which is where an ontological approach comes in. One source of difficulty is that healthcare has, as a central part of its own reality, the information that talks about other parts of healthcare reality, i.e. documented observations, diagnoses and so on. Otherwise stated, ‘clinical statements’ are real things too, and become part of the healthcare reality as it unfolds in time. The general answer to this situation is that an epistemological aspect to the analysis of health information is unavoidable. Thus, an overall ‘semantics of health information systems’ must comprehend at least scientific realism, an ontological descriptive approach to reality, and an epistemological stance on health data. This is what I outline below.

A Realist Basis

The first commitment made here is that we understand the world from a scientific realist viewpoint, meaning that we accept that there can exist items of health information that really do refer to things in reality (and of course, that there is a ‘reality’ apprehendable by us), in an acceptably faithful fashion. More precisely, it is assumed that there is information about ‘individuals’, i.e. specific patients, hearts, injuries etc, as well as ‘knowledge’ about classes of patients (e.g. diabetics), conditions (e.g. that we have a definition of ‘ischaemic heart disease’) and treatments. The former are found inside patient health records, the latter inside terminologies, ontologies and medical textbooks.

Secondly we assume a certain basic structure of the health domain: namely that the ‘healthcare’ domain is about providing ‘care’ to a subject of care, which we call the ‘patient’, and that the ‘health research’ domain is distinct and has as its interest extracting knowledge from data about subjects, obtained from various sources, e.g. point of care (EHRs), clinical trials, epidemiological monitoring. Although we will normally talk about human healthcare here, there is very little that would not apply to non-human healthcare, and much that would apply to other kinds of maintenance and monitoring, e.g. building management etc.

The general shape of healthcare reality consists of the following.

  • The patient ‘system’: the reality of the patient as a physical subject of care, including:
    • evidence of problems & issues:
      • subjective experiences
      • objectively observable signs and symptoms
    • goals and needs – the patient’s requirements that drive anything to be done about his/her issues
    • objectively existing processes in the patient – real disease and other processes such as insulin insufficiency or pregnancy that have a trajectory in time and consequences for the patient
  • The healthcare system: including entities such as:
    • requests by the patient
      • for care, either by appointments, presenting at A&E, or ongoing care
      • preferences on care provision
    • administrative / management / logistic events
      • booking, admission, discharge, and referral
      • obtaining of patient consent
    • clinical care events (including acts of carers and patient) e.g.
      • obtaining of patient history
      • determining current clinical need
      • observations of the visible signs of processes in the patient
      • assessments of what processes the observations imply are occurring
      • plans based on goals, needs, and assessments
      • clinical actions, e.g. drug administration
      • patient management activities
    • workflows and order processes within which the above occur
  • The health information system environment: the reality of HIS & EHR system(s):
    • committal of information to EHR
    • updating of information in EHR
    • deletion of information in EHR
    • query / report from EHR
  • The healthcare funding environment, which interacts with the healthcare delivery system, primarily in terms of a more or less fine-grained relationship between real events, actions and resources, and the funding that allows them to happen.
  • Health / medical research ‘systems’: where drug trials, public health operations, and other non-care delivery activities etc take place

Kinds of Entities in Reality

The above is a very broad description of reality in the clinical care domain. To be more precise computationally we need a way of talking about the kinds of entities there are. The Basic Formal Ontology (BFO) provides such a basis. It makes a top-level division of the world into ‘continuants’ and ‘occurrents’. A continuant is something whose identity and description do not rely specifically on time, including material entities such as a person, a person’s heart, blood, ‘the abdomen (region)’ and so on. Such entities are (more or less) the same from one instant to the next. An occurrent is any entity whose description is inherently time-related, and includes the following kinds of things:

  • temporal regions (time periods)
  • processes, including complex processes with multiple actors and material and other entities
  • events – understood as a change in the state of one or more continuant(s) in a temporal region

I would add ‘state’ to this, where a state is understood as the state of one or more continuants at a point or interval in time. Both occurrents and continuants are understood in BFO to have ‘properties’, including things like ‘qualities’ (‘severe’, ‘red’), roles, dispositions (innate capacity), function (overt adaptation for an intended purpose) and relationships with other occurrents and continuants (in part-of, participation and other relationships).
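The top-level split can be made concrete as a small class hierarchy. The sketch below is a hypothetical Python rendering, assuming that qualities are plain strings and that time is a bare number; the names `Entity`, `Continuant`, `Occurrent`, `Process`, `Event` and `State` are illustrative and are not taken from any published BFO implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    label: str
    qualities: List[str] = field(default_factory=list)  # e.g. 'severe', 'red'


@dataclass
class Continuant(Entity):
    """Identity and description do not depend on time, e.g. a person, a heart."""


@dataclass
class Occurrent(Entity):
    """Description is inherently time-related."""
    start: float = 0.0           # start of the temporal region, arbitrary units
    end: Optional[float] = None  # None is read here as 'instantaneous'
    participants: List[Continuant] = field(default_factory=list)

    def is_instantaneous(self) -> bool:
        return self.end is None or self.end == self.start


@dataclass
class Process(Occurrent):
    """E.g. pregnancy, or a procedure with several actors."""


@dataclass
class Event(Occurrent):
    """A change in the state of one or more continuants."""


@dataclass
class State(Occurrent):
    """The proposed addition: the state of continuants at a point or interval."""


heart = Continuant("John's heart")
mi = Event("myocardial infarction", qualities=["severe"], start=10.0, participants=[heart])
pregnancy = Process("pregnancy", start=0.0, end=280.0)
```

Treating `State` as a subclass of `Occurrent` follows the suggestion in the text that states belong alongside processes and events; an implementation aligned with BFO proper would need to check that placement.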

Beneath an ontology like BFO there are domain-general and domain-specific ontologies, whose entities relate to BFO entities via the is-a relationship as well as other relation types. Ontologies like OGMS, for example, provide generalised medical concepts like ‘disease’ that relate to the ‘process’ and ‘disposition’ categories in BFO via the is-a relationship.

[TBD: more detail here]

The Biomedical / Clinical Point of View

Within the healthcare reality, all kinds of phenomena may be observed, just as in any other part of reality. In recording any of it for healthcare use, we are positing a specific point of view, which entails specific aspects of the reality being of interest, and others being ignored. To see that this is so, imagine two other notional points of view of what goes on in, say, a hospital.

  • A ‘god’s-eye’ version of reality, in which every single physical and biochemical detail is recorded, right down to sub-atomic level. We can think of this as a 4-dimensional continuous sub-atomic CAT scan of the hospital. God’s disk drive would clearly need to be of epic proportions to contain all the data, and his search engine would need to be equally titanic, in order to find things of interest. Nevertheless, with this sort of data, a complete reconstruction of the reality in all its detail could in theory be created.
  • A socio-cultural version of events, where sociologists observe the goings on in the hospital and record what they find interesting. This ‘sociology record’ is likely to contain all kinds of facts and ideas about hierarchical relationships, how people talk to each other, hospital decor, what people are wearing, and so on, but probably not that much in the way of useful lab data, diagnoses or care plans.

A clinical record of healthcare reality is a distinct point of view. It can’t include the position and velocity of every quark as per the god’s-eye version, for practical reasons, and it will clearly be different from the socio-cultural record. It is oriented to serve the needs of healthcare delivery, and it thus chooses certain aspects of reality to record as information.

We need to understand the filter on reality implied by this point of view because it will colour the entire conception and realisation of information systems in healthcare. Similar statements can be made about information systems used in biomedical research. Any characterisation of this filter is based on the general idea that healthcare is about detecting, understanding and addressing ‘health concerns’ in patients. Thus the overarching motivation is a ‘problem-solving’ one, i.e. the activities of healthcare are about generating solutions to needs posed by patients. There is a corresponding overarching conceptual schema to the activities of healthcare, which we can understand as ‘case matching’, or more generally, ‘inductive inferencing’, within the context of prior knowledge (i.e. the canon of medical and clinical science to date).

The general flow of things in clinical medicine is as follows:

  • Request: the healthcare system receives a request from the patient (e.g. in a patient encounter at the local doctor’s surgery).
  • Goal-setting: although often implicit, there is a step in which a target situation for the patient is determined. If the patient has a bad cold, the goal is usually not stated, because both patient and doctor know what it is: to get back to ‘normal’. In many cases, goals are set explicitly, e.g. a target BP for a hypertensive patient. The goal is crucial in defining when/if intervention can cease, or assume a maintenance state.
    • Target-setting: in many cases, physicians distinguish between subjective goals of the patient (‘feeling less breathless’, ‘feeling healthier’) and objective targets that can be measured, that act as surrogates for the goals, e.g. body weight, and blood pressure. This is not strictly necessary, and certain kinds of healthcare intervention stop simply when the patient says ‘I feel great doc, thanks for the help’.
  • Assessment: physician(s) try to determine what the problem really is, which actually means trying to match the specifics of the problem to known case types, i.e. constellations of specifics (typically signs and symptoms) that have known treatment or management methods available.
    • Evidence-gathering: to do this, they use various methods, including questioning the patient, physical examination, ordering other kinds of investigation.
    • Positing of Candidates: early on, the physician usually has some idea of what is going on, since most patients have problems that resemble known problems of many other patients. At this stage, there may be a differential diagnosis (more than one candidate) or a ‘working diagnosis’ (a strong candidate).
    • Refinement: as more evidence becomes available, they may refine their search by choosing to look for specific kinds of evidence that appear to relate more closely to the case corresponding to the working diagnosis/es. They may also refine in terms of biological specificity, e.g. to determine the exact type of antibody, cancer or parasite. On the other hand, specificity may not matter beyond a certain point if all the cures are the same, regardless of e.g. the specific type of virus or parasite.
    • Diagnosis: at some point, the case-matching process produces a clear candidate which the physician attaches to the patient. Here we understand ‘diagnosis’ to mean attaching of any such label to the patient, even if it is just ‘rhinovirus’, or ‘stomach bug’, that is deemed good enough to act upon.
  • Planning: creation of a care plan relating to the condition, or modification of existing care plan(s) depending on whether the condition is considered major or minor.
  • Intervention: the next phase is to commence an intervention, based on some standard therapy associated with the case type of the diagnosis, normally adjusted to the patient. This may be anything from surgery, to patient education (stop eating sugary foods, start exercising).
    • Evaluation: the intervention continues until an evaluation(s) determines that the targets and goals have been met. For chronic diseases, Evaluation may be a life-long process.
    • Monitoring: to enable Evaluation, a monitoring period may be needed in which evidence of progress is gathered for use in the Evaluation step. This may also be life-long in the case of a chronic condition.
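The flow above can be read as a set of allowed transitions between phases, with back-edges for the loops that real practice introduces (refinement during assessment, restarting after a wrong diagnosis, long-term monitoring). The following is a minimal sketch, assuming the phase names from the list; the `Phase` enum, the `TRANSITIONS` table and `advance()` are hypothetical, and the particular transitions chosen are one plausible reading rather than a definitive workflow model.

```python
from enum import Enum, auto


class Phase(Enum):
    REQUEST = auto()
    GOAL_SETTING = auto()
    ASSESSMENT = auto()      # evidence-gathering, candidates, refinement, diagnosis
    PLANNING = auto()
    INTERVENTION = auto()
    MONITORING = auto()
    EVALUATION = auto()
    CLOSED = auto()


# Hypothetical transition table; back-edges model refinement, wrong diagnoses
# and long-term monitoring of chronic conditions.
TRANSITIONS = {
    Phase.REQUEST: {Phase.GOAL_SETTING, Phase.ASSESSMENT},  # goals are often implicit
    Phase.GOAL_SETTING: {Phase.ASSESSMENT},
    Phase.ASSESSMENT: {Phase.ASSESSMENT, Phase.PLANNING},
    Phase.PLANNING: {Phase.INTERVENTION},
    Phase.INTERVENTION: {Phase.MONITORING, Phase.EVALUATION, Phase.ASSESSMENT},
    Phase.MONITORING: {Phase.EVALUATION},
    Phase.EVALUATION: {Phase.MONITORING, Phase.ASSESSMENT, Phase.CLOSED},
    Phase.CLOSED: set(),
}


def advance(current: Phase, nxt: Phase) -> Phase:
    """Move to the next phase, rejecting transitions the table does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"{current.name} -> {nxt.name} is not an allowed transition")
    return nxt


# A bad cold: implicit goal, quick diagnosis, one evaluation, done.
path = [Phase.ASSESSMENT, Phase.ASSESSMENT, Phase.PLANNING,
        Phase.INTERVENTION, Phase.EVALUATION, Phase.CLOSED]
phase = Phase.REQUEST
for step in path:
    phase = advance(phase, step)
```

For a chronic condition the same table allows EVALUATION and MONITORING to alternate indefinitely without ever reaching CLOSED.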

We can informally distinguish a small number of categories of information needed to document the above process adequately:

  • observation statements – statements about state or events to do with the patient
  • opinion / inference statements
  • order statements – request to some party to perform some action
  • action statements – record of a clinical action having been performed
  • administrative statements – we can assume that a variety of peripheral information, such as bookings, consent forms and so on, is created during the process in order to make it happen.
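These informal categories can be captured as a simple tag on each statement, with a few consistency checks where a category clearly implies one. The sketch below assumes two hypothetical constraints suggested by the definitions: an order is addressed to some party, and an action record carries the time it was performed. `StatementKind`, `Statement`, `addressee` and `performed_at` are illustrative names only.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class StatementKind(Enum):
    OBSERVATION = "observation"        # state or events to do with the patient
    OPINION = "opinion"                # opinion / inference
    ORDER = "order"                    # request to some party to perform an action
    ACTION = "action"                  # record of an action having been performed
    ADMINISTRATIVE = "administrative"  # bookings, consent forms, ...


@dataclass
class Statement:
    kind: StatementKind
    text: str
    addressee: Optional[str] = None           # hypothetical: party asked to act
    performed_at: Optional[datetime] = None   # hypothetical: when the action happened

    def __post_init__(self) -> None:
        # Assumed minimal constraints, one per kind where the text implies one.
        if self.kind is StatementKind.ORDER and not self.addressee:
            raise ValueError("an order must be addressed to some party")
        if self.kind is StatementKind.ACTION and self.performed_at is None:
            raise ValueError("an action record needs the time it was performed")


order = Statement(StatementKind.ORDER, "FBC, U&E", addressee="haematology lab")
given = Statement(StatementKind.ACTION, "paracetamol 1 g PO",
                  performed_at=datetime(2016, 3, 1, 8, 0))
```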

A more precise classification of clinical statements is developed later.

Note that we don’t treat opinion statements in the same way as statements about state or event, even though, technically speaking, ‘opinion-forming’ is itself a documentable act (and therefore an event). However, the act of diagnosing, as opposed to the diagnosis, is generally of limited interest (assuming of course that the diagnosis itself includes its reasons and justifications).

This description is of course an abstract model of what really happens. Sometimes in real life, good assessments are hard to make (e.g. many mental health conditions, certain gastro-intestinal problems); in other cases, intervention goes wrong, and has to be stopped. Wrong diagnoses are often made, so the process may have to restart. Nevertheless, the above provides a model for understanding what kinds of activities are likely to occur, and therefore what kinds of information could possibly be recorded, which is our core aim.

The process as described relies completely on the availability of prior medical knowledge, and in particular, a repository of case descriptions that identify signs and symptoms of pathological conditions, and associate them with a) an underlying etiology (causal explanation) and b) known treatments.

This ‘repository’ also allows us to say something about non-clinical medicine, which is what creates it. In contrast to clinical medicine and its inductive inferencing approach, medical research proceeds using a more general scientific method. It is beyond this discussion to go into much detail, but we can say that the process involves a cycle of evidence-gathering, hypothesis formation, hypothesis testing and refinement. Philosophical debates still occur about what exactly is going on, and various models of doing science have been described, including things like the deductive-nomological model (Hempel/Popper), as well as more recent inductive models and statistical models. Many of these fail to deal properly with temporal-causal relations. A modern view appears to be converging on two points:

  • the need for both an ontological and an epistemological view of ‘knowledge’ as it is created during the scientific process;
  • a proper accounting of causal relations.

The important point here is that in both clinical medicine and biomedical research, there are strong models of process that heavily influence the aspects of reality seen as relevant to each endeavour. More precisely, many aspects of reality are not seen as relevant to the execution of a clinical process or research project. Which things are relevant is of crucial importance in characterising clinical information. Much of the remainder of this discussion will justify choices made in models of clinical information in terms of these process models.

In both clinical medicine and medical research, an overriding tension between the ontological and the epistemological is always present, since at any point in the formation of a theory (e.g. the cause of certain kinds of cancer), there is always a question of the quality of knowledge gathered so far. Any model of information created in these domains therefore cannot just make claims about reality; it needs to account for the epistemic status of these claims. In looking for models of information in medicine, there needs to be an ontologically based language that can express ‘facts’, and ways of indicating the epistemic status of such facts.
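One way to keep ontological content and epistemic status apart is to store the ‘fact’ once and wrap it in a claim that carries its status. The sketch assumes a three-level ordinal scale (conjectured, working, confirmed), loosely modelled on the coeliac disease example earlier; `Fact`, `Claim`, `EpistemicStatus` and `treat_as_true()` are hypothetical names, and a real scheme would need more levels as well as provenance.

```python
from dataclasses import dataclass
from enum import IntEnum


class EpistemicStatus(IntEnum):
    # Hypothetical ordinal scale; the ordering is an assumption for illustration.
    CONJECTURED = 1   # e.g. one candidate in a differential diagnosis
    WORKING = 2       # strong candidate
    CONFIRMED = 3     # e.g. backed by lab evidence


@dataclass(frozen=True)
class Fact:
    """Ontologically phrased content: what is claimed about reality."""
    subject: str
    predicate: str
    value: str


@dataclass(frozen=True)
class Claim:
    fact: Fact
    status: EpistemicStatus   # how strongly the fact is held


def treat_as_true(claim: Claim,
                  threshold: EpistemicStatus = EpistemicStatus.CONFIRMED) -> bool:
    """Only claims at or above the threshold are used as facts about reality."""
    return claim.status >= threshold


coeliac = Fact("patient", "has condition", "coeliac disease")
differential = Claim(coeliac, EpistemicStatus.CONJECTURED)
confirmed = Claim(coeliac, EpistemicStatus.CONFIRMED)
```

The same `Fact` appears in both claims; only the status differs, which is the separation the paragraph above argues for.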

A corollary of the need for a ‘point of view’ is that while biomedical ontologies may take some account of the general domain of interest (e.g. by describing the diseases of patients in care, but not the decor of the hospitals in which they are cared for), they will nevertheless not capture the same specificity of viewpoint as the information created in clinical and other biomedical areas of activity. This tells us that the models of information in these pursuits are not simply the ontologies themselves, but something different and formally related to them.

Health Information – a brief Analysis

The main task here is to develop a conceptualisation of the semantics of EHRs / HISs, such that real HISs can be built or modified to behave in the way we intend. Doing that means having an understanding of healthcare reality, and a distinct understanding of the health information created in it. The starting point for a formal characterisation of the EHR or other kind of HIS is the idea that health information (here called ‘clinical statements’) is about things in the health reality outlined above. This introduces an epistemic aspect: every statement in an EHR or other HIS has a certain correspondence with reality (generally the patient), i.e. it makes (implicitly or overtly) a claim about reality. We need to know in the EHR what this claim is, so that the meaning of the information is not mistaken, potentially in a catastrophic way.

The way I will proceed below is to start from the central content of clinical statements, and work outward through the various layers of ‘context’.

The Question of ‘Context’

The word ‘context’ is almost universally (ab)used in health informatics to refer to almost anything that can be said about the location, situation or other circumstances around a focal act, event or observation. The problem is that there is no standard meaning for ‘context’, and it encompasses radically different categories of things, including location in space/time, participants of procedures, different levels of causes or purposes (sometimes even outnumbering Aristotle’s four types of cause), audit information of data committed to the EHR, and many content-specific facts, e.g. position of patient during BP measurement, order workflows and so on. One of the key purposes here is to tease out this mess into clear categories or features of both healthcare reality and information system data. Accordingly I don’t use the word ‘context’ in anything other than a completely informal sense, and I prefer the term ‘interpretive context’ meaning: items of information peripheral to the focal items, needed to accompany the latter in order to constitute a whole statement.

What do the data say?

A clinical statement at its core could be understood as one or more assertions of either the form ‘entity predicate determiner’ or the form ‘entity relation entity’. Let’s think of each of these assertions at a practical level as ‘data points’. At the outset, we assume that more than one data point is needed to characterise the entity we want to describe, i.e. the general case is a ‘data group’. A proper discussion of this would take many pages, but the essential reason is that to characterise a complex entity – generally a state or event – such as the blood pressure in the systemic circuit or a colonic polyp, we choose a specific set of items that stand for a much more complete, fine-grained description of the entity in question.
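The two assertion forms and the data group that bundles them can be sketched as plain data types. This is a hypothetical rendering, assuming string-valued entities and determiners; `PropertyAssertion`, `RelationAssertion` and `DataGroup` are illustrative names, and the polyp values are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class PropertyAssertion:
    """entity predicate determiner, e.g. 'polyp' 'size' '8 mm'."""
    entity: str
    predicate: str
    determiner: str


@dataclass(frozen=True)
class RelationAssertion:
    """entity relation entity, e.g. 'polyp' 'located in' 'sigmoid colon'."""
    source: str
    relation: str
    target: str


DataPoint = Union[PropertyAssertion, RelationAssertion]


@dataclass
class DataGroup:
    """A chosen set of data points standing in for a fuller description of one entity."""
    about: str
    points: List[DataPoint]

    def entities(self) -> Set[str]:
        """All entities mentioned by the group's data points."""
        found: Set[str] = set()
        for p in self.points:
            if isinstance(p, PropertyAssertion):
                found.add(p.entity)
            else:
                found.update({p.source, p.target})
        return found


polyp = DataGroup("colonic polyp", [
    PropertyAssertion("polyp", "size", "8 mm"),
    PropertyAssertion("polyp", "morphology", "pedunculated"),
    RelationAssertion("polyp", "located in", "sigmoid colon"),
])
```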

In other words, clinical statements consist of something like the fewest representative data points that adequately stand for a god’s-eye view, right down to the sub-atomic level, of the entity being described. The particular choice of data points is often non-obvious, but well known to practising physicians as being items that:

  • are effective surrogates for the entity they want to describe, e.g. pulse is an effective surrogate for heart rate, most of the time
  • may be surrogates whose sensitivity to an underlying cause means they will obtain abnormal values early on, and act as indicators
  • are quicker to obtain than via other possible methods
  • are cheaper to obtain than via other possible methods
  • are less invasive or injurious to the patient

The core content of a clinical statement can thus be thought of as an optimised surrogate description of the entity of interest. This is a point of more importance than it might first appear, because the data points comprising actual clinical statements can easily be a non-obvious subset of the possible things that can be said about the entity, in the textbook sense. The practical outcome of this is that models of clinical statements are not generally the same as textbook or ontological descriptions of the same thing. Worse, the relationship does not appear to be a systematic one, since factors of patient safety, optimal speed and economy are in play. An important conclusion that we can reach at this point is that the ontologies of patient and clinical reality are generally unlikely to serve directly as models of clinical data, and that instead, a mapping or binding mechanism is needed between the two.

This holds even where the two happen to be more or less the same at some point in time: the ontological description may not change, but the observational approach can easily change to one based on different properties.

Statements about Continuants

Statements about continuants are statements that characterise continuants in a non-temporal way. These tend to include identifying and related information of the patient, healthcare professionals and provider institutions. Such information is typically collated under the ‘demographic’ rubric, and is stored in a demographic system, patient master index (PMI), provider registry or other such system. Contact and address information can be included here, since it is usually relatively non-volatile information.

Similarly, long-term relationships between continuants can usually be treated as properties of patients and HCPs.

Statements about Occurrents: States and Events

In the patient reality, the information of the healthcare process is mainly expressed in terms of statements about:

  • past or present patient state and events (including state and events to do with the physical and social environment of the patient);
  • past or present state of other entities (e.g. patient’s parents) that are predictors for the current patient’s state, e.g. family history of breast cancer.

The state of the patient, or a part thereof, takes the ontological form of assertions about the current values of properties of the organs, systems, general health and social or other situation. Statements about events take the form of assertions about changes in values of properties. Both kinds of information are recorded during the Assessment part of the clinical case-matching process, and we normally think of them as ‘observations’. They may be subjective (reported by the patient in subjective terms) or objective (supplied by patient or clinician via a repeatable measurement process). As part of our point-of-view filter, we can say that ‘observation’ information is primarily about what is observed, rather than how it is observed.
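If a state is taken to be a set of property values at one time, an event can be read off as the change between two such states. The sketch below is a deliberately simplified illustration, assuming states are flat property-to-value dictionaries; `events_between()` is a hypothetical helper, not part of any EHR API.

```python
from typing import Dict, List, Tuple

State = Dict[str, object]  # property name -> value at one point in time


def events_between(before: State, after: State) -> List[Tuple[str, object, object]]:
    """Return (property, old, new) for each property whose value changed.

    Properties present in only one state are reported with None for the
    missing side. Output is sorted by property name for determinism.
    """
    changes = []
    for prop in sorted(before.keys() | after.keys()):
        old, new = before.get(prop), after.get(prop)
        if old != new:
            changes.append((prop, old, new))
    return changes


morning = {"rash": "absent", "heart_rate": 72}
evening = {"rash": "present", "heart_rate": 72}
```

Here `events_between(morning, evening)` reports only the rash, since heart rate is unchanged.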

A faithful phenomenological view would talk of ‘acts of observing’ as a kind of ‘event’ (as described above). However this is often not useful, because an ‘observation’ in a health record is almost always understood as a recording (e.g. via instruments) about the state of something, rather than a report of the act of observing. To be sure, details of the observation process/act are often reported as well (e.g. the method, instrument etc), but generally in the role of attributes of the focal information. Where acts of observation, e.g. biopsies, need to be reported as first-class entities, we will consider them as ‘actions’, typically on the healthcare system side.

Examples of state and event information in the patient reality include:

  • blood pressure – the instantaneous or averaged state of the systemic circuit arterial blood pressure;
  • allergic reaction;
  • heart attack (that has occurred) – normally considered an instantaneous event for practical purposes of general medicine;
  • car accident – for practical purposes, an instantaneous event.

In the healthcare system reality, events also occur. These are generally intentional clinical acts, e.g. measuring of heart rate. Many such events occur within the context of a care episode or encounter, which is here denoted as a care situation. Ideally we would have a classification of events, as a sub-ontology of the general occurrent category. As noted above, it would include categories such as ‘process’, ‘structured process’, and instantaneous events. Typical examples relevant to health:

  • endoscopy – a structured procedure
  • biopsy procedure

As noted above, we could in theory routinely include ‘observation’ here, e.g. recording of a vital sign such as temperature, understood as an event. Instead, we limit records of the act of observation to non-trivial observational procedures like cancer biopsy, in which the event is considered a procedure rather than an observation.

In all cases, what is important is what is needed to characterise various kinds of occurrents. This includes the following:

  • the time instant(s) or interval at which the occurrent took place
  • the description of the event, act or process, i.e. the actual details
  • the location or place where the event occurred, or process was performed, e.g. surgery took place in theatre no 2
  • the participant(s) if any – actors performing some role in an intentional process
  • object(s) of the occurrent, in the case where it is an intentional process, e.g. some heart is the object of heart surgery; or in the case of a spontaneous event such as a heart attack
  • passive entities – resources used in a process (e.g. heart valve) or objects implicated in an event (e.g. telegraph pole in car accident)

Not all of these are necessary in every case; which are necessary depends on the particular category of occurrent, and would be described in an ontology of occurrent types.
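The facets listed above can be modelled as mostly optional fields, with a per-category table saying which facets must be present. The sketch below assumes a tiny, invented fragment of such an ‘ontology of occurrent types’; `OccurrentRecord`, `REQUIRED` and `missing_facets()` are hypothetical names, and the required-facet entries are illustrative rather than clinically authoritative.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class OccurrentRecord:
    category: str                        # e.g. 'surgical procedure', 'heart attack'
    description: str                     # the actual details
    start: datetime
    end: Optional[datetime] = None       # None: treated as instantaneous
    location: Optional[str] = None
    participants: Dict[str, str] = field(default_factory=dict)  # role -> actor
    objects: List[str] = field(default_factory=list)
    passive_entities: List[str] = field(default_factory=list)   # resources, implicated things


# Hypothetical fragment of an ontology of occurrent types: required facets per category.
REQUIRED: Dict[str, Set[str]] = {
    "surgical procedure": {"location", "participants", "objects"},
    "heart attack": {"objects"},
    "car accident": {"location"},
}


def missing_facets(rec: OccurrentRecord) -> Set[str]:
    """Names of required facets that are absent or empty for this record."""
    needed = REQUIRED.get(rec.category, set())
    return {name for name in needed if not getattr(rec, name)}


surgery = OccurrentRecord(
    "surgical procedure", "aortic valve replacement", datetime(2016, 5, 2, 9, 0),
    location="theatre no 2", participants={"surgeon": "Dr A"},
    objects=["patient's heart"], passive_entities=["heart valve"])
mi = OccurrentRecord("heart attack", "acute myocardial infarction",
                     datetime(2016, 5, 3, 14, 0))
```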

While many events and acts in healthcare implicate substantial entities, a central type of healthcare event – that of ‘assessment’ – is typically a mental event of a single ‘participant’, in which no substantial acts or entities are implicated. We can think of typical assessments (i.e. diagnoses and other kinds of opinions) as being cognitive acts in the minds of health professionals (or maybe even decision support systems), but at a practical level, they are usually understood as items of information. Thus when we talk of John’s ‘diagnosis’ of diabetes, we are talking about the statement ‘John has diabetes mellitus’, not the cognitive event in Dr Susan’s mind 2 years ago, when this diagnosis occurred. This is important, because clinical assessments are, for practical purposes, both part of the healthcare reality and part of the information in the EHR or HIS. Seen another way, a diagnosing physician is both the participant in the cognitive act of diagnosing, and the author of the diagnosis, as recorded for others to see. We need to be conscious of this seeming ambiguity in order to deal with assessments properly in health information.

An example of a clinical statement about an occurrent is:

  • systemic arterial BP measurement
    • time = 2001-08-19 09:30:00
    • core data
      • systolic blood pressure = 110 mmHg
      • diastolic blood pressure = 80 mmHg
    • patient state
      • position = sitting
      • exertion = not exercising
    • protocol
      • instrument = sphygmomanometer
      • cuff = standard
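
As a sketch of how the example above might be held in software, the Python below keeps core data, patient state and protocol as separate parts of one object. The class and field names are hypothetical illustrations, not openEHR archetype paths or any standard's API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BPMeasurement:
    """Hypothetical in-memory form of the BP statement above."""
    time: datetime
    # core data: the values the statement is about
    systolic_mmhg: int
    diastolic_mmhg: int
    # patient state: context needed to interpret the core data
    patient_state: dict = field(default_factory=dict)
    # protocol: how the values were obtained
    protocol: dict = field(default_factory=dict)

    def summary(self) -> str:
        return (f"{self.systolic_mmhg}/{self.diastolic_mmhg} mmHg "
                f"at {self.time.isoformat(sep=' ')}")


bp = BPMeasurement(
    time=datetime(2001, 8, 19, 9, 30, 0),
    systolic_mmhg=110,
    diastolic_mmhg=80,
    patient_state={"position": "sitting", "exertion": "not exercising"},
    protocol={"instrument": "sphygmomanometer", "cuff": "standard"},
)
print(bp.summary())  # 110/80 mmHg at 2001-08-19 09:30:00
```

Keeping patient state and protocol next to, but distinct from, the core data means any consumer of the 110/80 value can also see the conditions under which it was measured.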


Most clinical statements are about occurrents, and occurrents are not properly specified without reference to time. Only two temporal choices really exist: past and future. When we measure something (BP) or do something (administer a drug) now, the statement we can commit to an EHR is already about something in the past, and as time goes on, all such statements recede further into the past. Thus ‘past’ and ‘present’ are conceptually the same thing.

If we make a statement about a future state or event, things are somewhat more complicated, since we are talking about possibility.


Which particular things are the data about?

Health data found in typical systems are intended as statements ‘about’ specific things in the patient or healthcare reality, such as the patient’s left eye (a continuant), a tissue sample (a continuant), or a particular occurrence of ‘strep throat’ (an occurrent). Some of these things are uniquely identified in reality by dedicated systems, e.g. organ identifiers, specimen identifiers (and order ids, accession ids, result ids…), and prescription ids. Most are not, and the link between the reference and the referent (the entity in reality) is a mental one, reverse-engineered from concept ids / coded terms, dates, and other contextual data. So the only way to distinguish between, say, two occurrences of ‘strep throat’ in a typical patient record is by date. Many things in the patient’s body occur twice, due to symmetry, but are often specified in the record in non-systematic ways, e.g. ‘left femur, neck of’, ‘femur, left’ and so on, where the ‘femur’ part may be coded, but the laterality may not be.

The problem is greatly compounded by the fact that patient data are full of mentions of health events or states in the past and the future. A physician may want to refer to a previous occurrence of cancer in the notes on a current occurrence (or is it a recurrence?); a patient states a goal weight of 85 kg in one year’s time. Thus one occurrence of strep throat can be recorded 3 years ago, when it was current, and also referred to in the present as ‘my strep throat of 3y ago, that required penicillin’. Humans normally understand the relationship of these references to reality well enough, but health data rarely reflect it properly, and computers have no hope of unambiguously determining which exact set of real entities a patient health record refers to. One of the problems with non-systematic reference/referent linking is that N mentions of the same thing typically occur (often with retrospective or future time modifiers, as above), with no way to know whether they really refer to the same real instance.

Dealing with this properly requires an approach such as Referent Tracking (RT), which uses identifiers of real entities as references in data about those entities. RT is a non-trivial enterprise to implement globally in all health data, and it is probably not required except in a reasonably well-defined subset; for tracking major diagnoses, medications, procedures and so on, however, it is indispensable. The requirement here for EHR systems and the like is to support a standard approach to linking (a subset of) statements to their real-world referents.
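
A minimal sketch of why referent identifiers matter, assuming each mention may carry a hypothetical referent ID (this is an illustration, not the identifier scheme of any actual RT implementation). Mentions sharing an ID are counted as one real occurrence; untracked mentions cannot be safely merged, so each is counted separately.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Mention:
    code: str                   # coded term (illustrative string, not a real code)
    recorded_on: date           # when the statement was committed
    referent_id: Optional[str]  # hypothetical ID of the real-world instance, if tracked


def distinct_occurrences(mentions) -> int:
    """Count distinct real-world occurrences among mentions.

    Mentions sharing a referent ID are the same occurrence. Mentions
    without one cannot safely be merged, so each is counted on its own
    (a conservative assumption; a date-based guess could be wrong)."""
    tracked = {m.referent_id for m in mentions if m.referent_id is not None}
    untracked = [m for m in mentions if m.referent_id is None]
    return len(tracked) + len(untracked)


mentions = [
    Mention("strep-throat", date(2013, 3, 1), "occ-1"),
    Mention("strep-throat", date(2016, 3, 1), "occ-1"),  # 'my strep throat of 3y ago'
    Mention("strep-throat", date(2016, 5, 2), None),     # untracked mention
]
print(distinct_occurrences(mentions))  # 2
```

Without the shared ID, the 2016 retrospective mention would look like a third, separate occurrence.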

Patients, HISs and EHRs

A health information system has to enable a coherent picture of the patient and clinical care realities to be recorded so that healthcare delivery can operate, and patients can progress toward goals and the resolution of problems. There are many different kinds of HIS. For purposes of healthcare (as opposed to research, trials etc), we assume that the HIS is an EHR system of some kind, i.e. a system that treats the ‘patient’ as the basic organising principle, and records information relating to the patient and the clinical and care processes occurring around him/her. The ideal version of the EHR concept would be one EHR per patient, worldwide. This is unlikely ever to come to pass, and probably isn’t that useful in practice.

We might then posit an EHR per patient ‘within a given healthcare system’, for example the UK NHS, or ‘Sweden’. In highly privatised healthcare economies, what constitutes a ‘healthcare system’ may not be that clear. In the US, for example, Kaiser Permanente includes all the hospitals, clinics and professionals needed for its 10 million or so patients, and manages these patients in its own way, like a European country. Other hospitals (e.g. in Germany, the US, Brazil, Australia) may themselves be part of a state-level healthcare system. The question of what ‘healthcare system’ a patient is ‘in’ is usually resolvable at the local level, even if it cannot be abstractly described in a universal way. The answer may even be different depending on what kind of care is being obtained.

In any case, one EHR per patient within a healthcare system is still more of an ideal than a fact. In Kaiser’s environment, there is a 1:1 correspondence between patient and EHR, but for most other healthcare systems (even those of the Nordic countries we often imagine to be the most organised) there are multiple health records per patient, typically one at each HCF (this is improving: Denmark in 2014 has a national medication record). The most typical situation is that there is at least a ‘GP record’, an episodic record at each hospital the patient attends, and potentially other records maintained by specialists, social services and so on. One way to classify these systems is as ‘episodic’ and ‘longitudinal’, where the former only cover specific intervals of time in the life of the patient, but at a high level of detail, whereas the latter attempt to be lifelong, typically in less detail.

Regardless, the practical outcome is that there is usually more than one IT system in which clinical ‘facts’, claims and orders are recorded about the same patient. We therefore have to assume in general multiple health records per patient, which can contain competing descriptions of the patient and her clinical care, and which can also have information ‘holes’ that may or may not be made up for in other records for the same patient. One of the most common problems is that of competing GP and hospital medication lists. These descriptions don’t necessarily contradict each other; they may be the same in some cases. However, it is often not clear, even in the case of an apparently identical ‘fact’ (e.g. fracture of tibia), whether it refers to the same thing in reality (the same fracture, or genuinely different fracture occurrences?). Where there are competing descriptions of the same reality, at least the following mechanisms will be required in order to work with the information without confusion:

  • auditing: the stamping of every piece of information in an EHR with details of the EHR system to which it was committed, by whom, when, and why.
  • synchronisation: the provision of information by one HCF to another, to fill in gaps in the receiver’s EHR and/or augment existing information
  • reconciliation: where information from different sources purports to be about the same thing, e.g. patient allergies and medications, a conscious process of reconciling the distinct information with reality (often by asking the patient questions or by ringing other doctors) in order to create a ‘single source of truth’ version of the original information.
  • referent marking and tracking: (see above) a way of connecting statements about entities in reality to the actual entity they refer to, so that it can be determined whether two statements referring to apparently the same (kind of) thing do in fact refer to the same thing.
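
The reconciliation step can be sketched for the common case of competing GP and hospital medication lists. This Python is an illustration under simplifying assumptions (items matched by drug name only; the drugs and doses are made up): it merges nothing itself, but sorts items into agreements, conflicts and one-sided items for a clinician to resolve, while each item keeps the audit detail of which system it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedEntry:
    drug: str
    dose: str
    source_system: str  # audit detail: the EHR system the item was committed to


def reconcile(gp, hospital):
    """Compare two medication lists keyed by drug name.

    Returns (agreed, conflicts, one_sided) as sorted drug names. Nothing is
    merged automatically: conflicts and one-sided items are left for a
    clinician to resolve, e.g. by asking the patient."""
    gp_by_drug = {m.drug: m for m in gp}
    hosp_by_drug = {m.drug: m for m in hospital}
    agreed, conflicts, one_sided = [], [], []
    for drug in sorted(gp_by_drug.keys() | hosp_by_drug.keys()):
        a, b = gp_by_drug.get(drug), hosp_by_drug.get(drug)
        if a is not None and b is not None:
            (agreed if a.dose == b.dose else conflicts).append(drug)
        else:
            one_sided.append(drug)
    return agreed, conflicts, one_sided


gp = [MedEntry("metformin", "500 mg bd", "GP"),
      MedEntry("ramipril", "5 mg od", "GP")]
hospital = [MedEntry("metformin", "500 mg bd", "Hospital"),
            MedEntry("ramipril", "10 mg od", "Hospital"),
            MedEntry("ondansetron", "4 mg prn", "Hospital")]
print(reconcile(gp, hospital))  # (['metformin'], ['ramipril'], ['ondansetron'])
```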

Clinical Workflow Context


Problem / Issue / Concern Context


Content Semantics

Epistemic Categories

Different types of statements are committed to the EHR which we can classify epistemically, i.e. according to the type of claim they make about reality. A proper classification of epistemic categories for health is beyond this discussion (an attempt is shown here), but is not required to explain the principle. For the moment we simply note some of the major categories:

  • observational: statements that describe the actual state of something (e.g. a heart rate), or an event (heart attack) or act (cardiac resuscitation)
  • opinions about processes in the patient: including diagnosis (a claim to know what processes are really occurring in the patient) and prognosis (a prediction of the forward trajectory in time of the process in the patient)
  • opinions about what interventions should be performed by the health system to/on the patient, including plans: proposed courses of action
  • targets: required future state of an entity (e.g. a target BP)
  • orders: official requests for something to be done

Epistemic category is important because many items of information look the same in their structural detail, but differ in their claim about reality. For example, if a naive EHR doesn’t distinguish between a ‘BP of 120/80’ as an actual measurement and as a target, obvious errors will occur in querying. Similarly, opinions like ‘risk of X’, ‘no risk of X’ and ‘has X (diagnosed)’ should not be returned in a query looking for ‘family history of X’. In general, if such items are confused in querying and reporting, the result will be clinical errors, patient injuries and deaths.
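
A minimal sketch of a query that matches on epistemic status as well as concept. The four categories are a deliberately small, hypothetical subset for illustration, not a proposed classification, and the record contents are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Epistemic(Enum):
    """A deliberately small, hypothetical subset of epistemic categories."""
    ACTUAL = "actual"                  # observed or diagnosed in this patient
    TARGET = "target"                  # required future state
    RISK = "risk"                      # prediction that it may occur
    FAMILY_HISTORY = "family history"  # about relatives, not the patient


@dataclass(frozen=True)
class Statement:
    concept: str
    value: str
    status: Epistemic


def query(statements, concept, status):
    """Match on concept AND epistemic status; matching on concept alone
    would mix measured BPs with target BPs, diagnoses with risks, etc."""
    return [s for s in statements if s.concept == concept and s.status is status]


record = [
    Statement("blood pressure", "120/80 mmHg", Epistemic.TARGET),
    Statement("blood pressure", "145/95 mmHg", Epistemic.ACTUAL),
    Statement("diabetes mellitus", "high", Epistemic.RISK),
    Statement("diabetes mellitus", "mother", Epistemic.FAMILY_HISTORY),
]
print([s.value for s in query(record, "blood pressure", Epistemic.ACTUAL)])  # ['145/95 mmHg']
print(query(record, "diabetes mellitus", Epistemic.ACTUAL))  # []
```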

Information Reliability and Curation

If we accept this understanding of ‘epistemic status’ with respect to health information, we are agreeing that instances of healthcare information committed to a system at a point in time are approximations of the best possible knowledge of the entities they mention. One function of the clinical process over time is to improve the reliability and completeness of this knowledge. This can be understood as evolving the epistemic status of information items that start out as opinions (we can think of them as ‘epistemic’, i.e. ‘claims’) toward statements of truth, i.e. ‘ontological’ statements. Concretely, in an information system we therefore expect to see what may start out as ‘notes’ or ‘claims’ about things in the real world evolving into faithful descriptions of those same things, faithful enough that they are assumed to be true and can be acted upon. Reliability as used here corresponds to the ‘true, justified’ part of the notion of ‘true justified belief’ in standard philosophy. Examples of statements that may not initially be taken as reliable, but that may become so over time, are:

  • tentative diagnoses, differential diagnoses, and other suspicions recorded by HCPs about patient processes
  • subjective statements by the patient about pain and other experiences
  • statements by the patient about previous diagnoses (‘I have been diabetic for 5 years’) and allergies (‘I’m allergic to penicillin’)
  • assessments made by junior doctors, and in some cases even by experienced physicians (e.g. outside their specialty)
  • demographic details, which may be wrong due to improper data entry or other errors in the identification process.

On the other hand, many things committed to an EHR are usually assumed to be reliable from the outset, including most observations derived from instruments, physical examinations, and so on. Consequently the EHR at any point in time consists of information items at various levels of reliability. What does it mean to say that an information entity in an EHR is ‘reliable’? Although there is undoubtedly a continuum of levels of reliability, we need to be practical, since clinical medicine is about doing things, not endlessly theorising about possibilities. We assume the following definition:

  • an item of health information is treated as reliable when its users (HCP, potentially the patient) regard it as a sufficiently close approximation of reality to commit to acting upon it, rather than continuing to validate it or ignore it.

Accordingly, an abstract way of thinking about an EHR is as two pools of information: one consisting of ‘claims’ about reality, and one of ‘reliable statements’ about reality. One of the aims in the clinical process is to convert some of the claims (those considered relevant) to being reliable statements. As noted above, statements newly committed to an EHR might be added to either pool in the first instance. We therefore need to recognise the existence of a process to perform this conversion, denoted here as curation, and a way of marking EHR information as being reliable or not (= having been curated). Note that reliability and epistemic category are not the same thing. A diagnosis made by one physician may be a reliable reflection of reality, or it may turn out to be false.
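
The two-pool view can be sketched as a single list of items carrying a reliability mark, with curation as the act that moves an item from one pool to the other. This is a hypothetical illustration: reliability is reduced to a boolean here, whereas the text notes it is really a continuum.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    text: str
    reliable: bool = False              # False: still a 'claim'
    curated_by: Optional[str] = None
    curated_at: Optional[datetime] = None


@dataclass
class Record:
    items: List[Item] = field(default_factory=list)

    def claims(self) -> List[Item]:
        return [i for i in self.items if not i.reliable]

    def reliable_statements(self) -> List[Item]:
        return [i for i in self.items if i.reliable]

    def curate(self, item: Item, hcp: str, when: datetime) -> None:
        """Move a claim into the reliable pool, recording who curated it and when."""
        item.reliable = True
        item.curated_by = hcp
        item.curated_at = when


rec = Record([
    Item("patient reports penicillin allergy"),       # starts as a claim
    Item("serum sodium 140 mmol/L", reliable=True),   # instrument result, reliable on entry
])
rec.curate(rec.items[0], "Dr Susan", datetime(2016, 1, 5, 10, 0))
print(len(rec.claims()), len(rec.reliable_statements()))  # 0 2
```

Recording who curated an item, and when, keeps curation itself auditable.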

Information Currency

Reliability is only part of the picture. There may be some items of information not yet regarded as reliable, but regarded as relevant, which are still being worked on, typically diagnoses. Other items are not relevant, or no longer relevant. We can think of relevance as the information property often called currency. A blood pressure taken 10 years ago was probably reliable then, but is unlikely to be reliable now; an HCP will always take a new measurement to obtain the current blood pressure. This consideration leads us to understand the EHR as something like a continuous historical archive of the patient’s biological state, and of healthcare provision to the patient. Which EHR information items are current and which are not? An informal assessment of major categories suggests that the following are current:

  • A – observations about the patient state within a time window extending from the present instant to a prior timepoint, usually in the recent past
  • B – reliable statements about long-lived processes within the patient, i.e. diagnoses of things like diabetes
  • C – reliable statements about processes currently active in the patient, e.g. pregnancy
  • D – reliable statements about ongoing clinical processes, e.g. medication for managing chronic conditions.

These considerations suggest some kind of active partitioning and / or management of the EHR, so that current information can be relatively easily distinguished from irrelevant information. This does occur in some models of the EHR, and takes the form of:

  • managed problem list – partial coverage of B & C
  • managed medication list – partial coverage of D

Case A is routinely dealt with by applying a time window to queries, so as to ignore certain kinds of observations older than, say, 2 years. There is no obvious way to formalise this, since the appropriate period varies with the specific content. The managed medication and problem lists are partway to the referent tracking concept, in that they attempt to reference distinct problems and medications in the patient reality in a clear way. They don’t typically use an identification system, but they constitute a limited part of the EHR where one could sensibly be applied.
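
A minimal sketch of the time-window approach for case A, assuming a lookup of per-concept windows with a 2-year default. The window values are hypothetical placeholders, not clinical recommendations, which reflects the point that no general rule exists.

```python
from datetime import date, timedelta

# Hypothetical per-concept windows: there is no general rule, so these
# values are placeholders, not clinical recommendations.
CURRENCY_WINDOWS = {
    "blood pressure": timedelta(days=90),
    "serum potassium": timedelta(days=30),
}
DEFAULT_WINDOW = timedelta(days=2 * 365)  # the 'say 2 years' default


def is_current(concept: str, observed_on: date, today: date) -> bool:
    """Case A only: is an observation recent enough to count as current?"""
    window = CURRENCY_WINDOWS.get(concept, DEFAULT_WINDOW)
    return today - observed_on <= window


today = date(2016, 6, 1)
print(is_current("blood pressure", date(2016, 5, 1), today))  # True
print(is_current("blood pressure", date(2006, 6, 1), today))  # False
print(is_current("body weight", date(2015, 6, 1), today))     # True (default window)
```

Cases B to D are not time-based in this way, which is why they are handled by managed lists rather than by query windows.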

A Basis for Health Information System Semantics

Based on the above realist picture, it is proposed that any EHR system architecture needs to model its information taking account of the following. A patient EHR / EMR contains one or more:

  • Primary statements:
    • what – the specific statement being made about an event or state of an entity in reality
      • specific information about the state, act or event in adequate detail, e.g. BP measurement or drug admin; here we need structure, values, terminology
      • referent IDs as appropriate
    • event / state context:- the real world situation being captured
      • subject – who is the information about?
      • provider – who provided the information?
      • other participants – other actors who were part of the situation
      • where – location, e.g. physical, organisational
      • when – timing information
    • how – the protocol or method by which the information was arrived at, if relevant, e.g. following a specific guideline, using a certain kind of instrument
    • epistemic status – what kind of knowledge, fact or opinion is being stated.
    • reliability – a way of marking the information as reliable or not
    • audit – the audit details of:
      • EHR system
      • time
      • committing person or agent
      • reason (e.g. initial creation, error correction etc)
  • Indexes and linkages sufficient to:
    • re-construct a temporal picture of patient processes from disparate recordings in the primary data
    • re-construct a temporal picture of clinical care processes, e.g. orders and results, from disparate recordings in the primary data
  • Other meta-information sufficient to:
    • enable analysis and extraction of patterns from populations of patient-focussed care processes in order to create new knowledge about how to do healthcare
    • support clinical trial and other non-care delivery processes.

The above should not be understood as a model of health information directly, but as a basis or even checklist for creating such models. It is assumed that all concrete models of healthcare information can be situated somewhere in the above scheme.
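
Read as a checklist, the scheme can be applied mechanically to a candidate model: list which elements it has no slot for. The sketch below is hypothetical; the element names are shorthand for the items above, and treating ‘how’, referent IDs and other participants as optional is an assumption based on their ‘if relevant’ / ‘as appropriate’ wording.

```python
# Top-level elements of a primary statement, taken from the checklist above.
# 'how', referent IDs and other participants apply only when relevant, so
# they are treated as optional here (an assumption).
REQUIRED = {"what", "subject", "provider", "where", "when",
            "epistemic_status", "reliability", "audit"}
AUDIT_REQUIRED = {"ehr_system", "time", "committer", "reason"}


def checklist_gaps(model: dict) -> list:
    """List checklist elements a candidate model has no slot for.

    `model` maps element names to anything; its 'audit' entry, if present,
    should be a collection of audit field names."""
    gaps = sorted(REQUIRED - set(model))
    audit_fields = set(model.get("audit", ()))
    gaps += sorted("audit." + f for f in AUDIT_REQUIRED - audit_fields)
    return gaps


naive_model = {"what": None, "subject": None, "when": None,
               "audit": {"time", "committer"}}
print(checklist_gaps(naive_model))
# ['epistemic_status', 'provider', 'reliability', 'where', 'audit.ehr_system', 'audit.reason']
```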

Information Modelling

What kind of information models can we create that incorporate the above elements properly? I’ll assume some starting-point classes that are typically used in openEHR, ISO 13606, HL7 CDA and other similar models, namely Composition (= ‘Document’), Section (= ‘heading’), Entry, and the fine-grained elements Cluster and Element. The task here is to describe principles for building an information model that can be used practically to represent health data on the basis described above.

Unit of Committal

The first thing we need to consider is a container type for committing information to the health record. To this is attached audit information to do with the information author, time, reason for commit, and potentially other context like location, setting, and so on. If the system implements versioning, version ids will be created here. To be technically correct, we have to speak of committing changes to the record, not just adding information, since logical deletions and modifications are also types of change to the record. Following 13606 and openEHR, we call this container a Composition (although an arguably better name would be Transaction, which was the name in the precursor project to openEHR, GEHR.) Nothing can be committed to the health record that is not inside a Composition; the Composition guarantees auditing of all changes to the record; it is the unit of committal, and provides the medico-legal basis for shared information in a clinical context.
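
A minimal sketch of the Composition as unit of committal: the store accepts nothing but Compositions, each carrying its audit details, and keeps every version rather than overwriting. The class names follow the text; the `uid::n` version id format and all field names are hypothetical, not openEHR’s actual versioning scheme.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Audit:
    ehr_system: str
    committer: str
    time: datetime
    reason: str          # e.g. "creation", "error correction", "logical deletion"


@dataclass(frozen=True)
class Composition:
    content: tuple       # the Entries; plain strings suffice for this sketch
    audit: Audit


class EHR:
    """Append-only store in which commit(Composition) is the only way in."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Composition]] = {}

    def commit(self, uid: str, comp: Composition) -> str:
        if not isinstance(comp, Composition):
            raise TypeError("only Compositions can be committed")
        versions = self._versions.setdefault(uid, [])
        versions.append(comp)              # earlier versions are never overwritten
        return f"{uid}::{len(versions)}"   # hypothetical version id format

    def latest(self, uid: str) -> Composition:
        return self._versions[uid][-1]


ehr = EHR()
v1 = ehr.commit("c1", Composition(("diagnosis: diabetes mellitus",),
                Audit("hospital-ehr", "Dr Susan", datetime(2016, 1, 5), "creation")))
v2 = ehr.commit("c1", Composition(("diagnosis: diabetes mellitus type 2",),
                Audit("hospital-ehr", "Dr Susan", datetime(2016, 1, 6), "error correction")))
print(v1, v2, ehr.latest("c1").audit.reason)  # c1::1 c1::2 error correction
```

A logical deletion would be modelled the same way: a new version with empty content and reason "logical deletion", so the change itself stays audited.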

Clinical Statements

The next level of encapsulation we can identify is not initially so obvious. What is common about ‘observations’, ‘diagnoses’ and all of the myriad other things that can be said? It is that they are all ‘statements’ of some kind. To avoid ambiguity, I’ll call them ‘clinical statements’. The list at the bottom of this page gives an idea of how many types of statement could be made. We can say that any clinical statement can consist of the following:

  • Focal referent: it is about an ontological target, which we can think of as an ontological referent, which could be:
    • a continuant, e.g. the patient as a whole,
    • an occurrent, e.g. any kind of state or event or process in the patient, or event in the clinical system (e.g. the act of a nurse)
  • Focal data: it makes some substantive characterisations of the referent, i.e. it states values for some properties of the referent, e.g.
    • identification and relationship information
    • a general characterisation of a continuant, e.g. being diabetic, being a smoker etc
    • stating the pressure of the blood
    • characterising colon polyps
    • describing a breast biopsy procedure
    • a lab result such as serum sodium value, serum potassium value, etc
  • Interpretive referent(s): it may make substantive characterisations of related referents, required to correctly interpret the focal reference, e.g. if the focal referent is my blood pressure, and the focal data is 110/80 mmHg, interpretive data could include patient state, position etc. Similarly, a blood sugar value during a glucose tolerance test (focal datum) can’t be understood without knowing whether it is 30 mins, 60 mins or 90 mins after the glucose challenge
  • Interpretive data: the values of interpretive referents, e.g. ‘patient state = 90 minutes post 75 g glucose challenge’; patient position = sitting.
  • Method data: data expressing the method used to obtain the focal and interpretive data, e.g. the protocol, instrument, etc.
  • Epistemic status: it makes an epistemic claim about the focal referent, according to the possible types of claims mentioned earlier, e.g.:
    • for a state or event:
      • that it is like this currently – an observation of any occurrent at a point in time
      • that it was like this in the past – a report of an earlier observation of an occurrent
      • that it will become like this – a prediction about a process in future time
      • that it will not become like this – a prediction that a process will not occur, i.e. ‘no risk of’
    • for an action:
      • that it is recommended to be done
      • that it is scheduled to be done
      • that it was done
    • and so on
  • Spatio-temporal context: if the referent is an occurrent, it records
    • timing information which might be a single point in time, but is often more fine-grained
    • relevant participants and other involved entities
    • location, place, etc
  • Workflow context: if the recording occurs as part of a workflow in which various recordings are made representing observations, assessments, actions etc, then relevant details, e.g. workflow id are recorded.
  • Curation status / reliability: potentially some statement about how reliable the statement is considered to be. However, this may also be inferred from the type of epistemic status and/or timing information (e.g. a BP from 5 years ago is probably unreliable)

The above can be considered a proto-structure of a clinical statement, for which I will assume that the Entry type from openEHR and other models will be used. It is not possible to convert the above structure directly to e.g. a UML model, because the structure depends on content. For example, a simple past diagnosis of diabetes in patient X needs only a few data points, whereas a hospital measurement of ECG under an exercise challenge requires a more complex model. Nevertheless, we can do two things. Firstly, we can define a model of Entry containing the bare-bones elements of the above list, providing a ‘modelling skeleton’. This is what we did in openEHR by creating the Entry subtypes Observation, Evaluation, Instruction, Action and AdminEntry. I would expect to revise these classes into a new form based on the analysis presented here, and potentially to add one or two new ones. Secondly, we can use an approach like openEHR archetyping and terminology binding to flesh out the skeleton for each concrete type of clinical statement. For every element in the final structure, we should know which kind of thing from the above list it is.

Why do we need Entries in the health record? Why don’t we just write things into it, like a diary? The primary reason is to be able to separate out (computationally) different statements and encapsulate them; equivalently, to be able to determine the boundaries of a statement with respect to other statements. If the boundaries are not clear, no software or human being can reliably understand what the health record is actually saying. Concretely we can say:

  • an Entry (clinical statement) is about one Focal referent (which may be a group of related things; see ‘granularity’ below), not more than one, and reports Focal and Interpretive data on only that referent
    • in particular, if it is about an occurrent, an Entry cannot be about two unrelated occurrents, i.e. events, situations or processes that would normally be considered unrelated in time.
  • an Entry is limited to one epistemic claim about the Focal referent & its data, e.g. it can’t be a mixture of ‘observation’ and ‘risk assessment’ information
  • an Entry cannot ‘contain’ another Entry, since to do so would imply that any Entry could be about any number of (unrelated or related) events or situations, and also about any number of referents.
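
The three rules above can be enforced at construction time. In this hypothetical sketch, one focal referent and one epistemic claim are single string fields, and the data may hold only Elements, so an attempt to nest an Entry is rejected. It illustrates the constraints, not the actual openEHR Entry classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Element:
    name: str
    value: object


@dataclass(frozen=True)
class Entry:
    focal_referent: str            # one referent (may be a group, e.g. "colon")
    epistemic_status: str          # one epistemic claim, e.g. "observation"
    data: Tuple[Element, ...] = ()

    def __post_init__(self) -> None:
        if not isinstance(self.focal_referent, str) or not self.focal_referent:
            raise ValueError("an Entry is about exactly one focal referent")
        if not isinstance(self.epistemic_status, str):
            raise ValueError("an Entry makes exactly one epistemic claim")
        for item in self.data:
            if not isinstance(item, Element):  # rejects nested Entries too
                raise ValueError("an Entry may contain Elements, not other Entries")


bp = Entry("blood pressure", "observation",
           (Element("systolic", 110), Element("diastolic", 80)))
try:
    Entry("diabetes mellitus", "observation", (bp,))
except ValueError as err:
    print(err)  # an Entry may contain Elements, not other Entries
```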

As a corollary, the Entry taken as a whole acts as a dependable unit of information. Logically, taking only a piece of an Entry and presenting it, e.g. on a screen or to a decision support application, risks misrepresenting the statement.

Querying and Contextually Safe Information Use

The notional indivisibility of a clinical statement doesn’t mean that pieces of Entries cannot be processed on their own in a software environment, but it does mean that this can only be done safely in a computational context where the overall meaning of the Entry has already been obtained, displayed or otherwise communicated to the information consumer. Fine-grained querying by a screen form or report, where the main information of the Entry has already been posted, will be safe if designed properly. However, queries performed between systems, e.g. over a stateless REST service, will not be: they need to communicate at least whole Entries to be sure of conveying the originally intended meanings of the clinical statements contained therein. Entries might not be enough. If the information consumer also needs to know the medico-legal basis for the information, they will want the Composition as well, with its version and audit metadata. At this point, some will claim: aha, to be completely safe, we need to work with a unit of information that conveys all possible interpretive context. This might seem to imply that versioning and audit metadata, which are attached to Compositions in standards like 13606, openEHR and CDA, should also be on Entry. Life is not, however, so simple.
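
A minimal sketch of an inter-system query that returns whole Entries rather than bare values, optionally paired with their Composition so the audit metadata travels too. The dictionary shape is a hypothetical serialisation, not a real REST API or standard format.

```python
def query_entries(compositions, element, test, with_composition=False):
    """Return whole Entries (never bare values) whose `element` passes `test`.

    With with_composition=True each Entry is paired with its Composition,
    so the audit/version metadata travels with it."""
    results = []
    for comp in compositions:
        for entry in comp["entries"]:
            value = entry["elements"].get(element)
            if value is not None and test(value):
                results.append((comp, entry) if with_composition else entry)
    return results


compositions = [{
    "uid": "c1",
    "audit": {"committer": "Dr Susan", "time": "2016-01-05T10:00"},
    "entries": [
        {"referent": "blood pressure", "status": "target",
         "elements": {"systolic": 120}},
        {"referent": "blood pressure", "status": "observation",
         "elements": {"systolic": 145}, "state": {"position": "sitting"}},
    ],
}]
hits = query_entries(compositions, "systolic", lambda v: v > 130)
print(hits[0]["status"], hits[0]["state"])  # observation {'position': 'sitting'}
```

Returning the value 145 on its own would lose both its epistemic status and the patient state needed to interpret it.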

Other Factors Influencing EHR Models

Medico-legal Coherence

One aspect of information systems not yet mentioned is the concept of coherence of the database with respect to reality. It may be that a number of things occur or are done in reality that need to be reported together. For example, the doctor may record a diagnosis (pregnant), create/update a patient care plan (perinatal care plan), and at the same time create a prescription for a medication (e.g. an anti-nausea drug). This information represents at least three distinct clinical statements, and therefore Entries. It would be seen as reasonable, and arguably even mandatory, that all of these changes are committed to the patient record together, as a ‘change set’. This provides a basis for a higher-level container than the Entry.
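
A minimal in-memory sketch of committing a change set all-or-nothing: every Entry is checked before any is applied, so the record never holds half of a change set. The validity rule and entry shape are hypothetical; a real system would rely on its database’s transactions.

```python
class ChangeSetError(Exception):
    pass


class PatientRecord:
    def __init__(self):
        self.entries = []

    def commit_change_set(self, entries, is_valid):
        """All-or-nothing commit: check every Entry first, then apply them together."""
        invalid = [e for e in entries if not is_valid(e)]
        if invalid:
            raise ChangeSetError(f"{len(invalid)} invalid entries; nothing committed")
        self.entries.extend(entries)


def has_referent_and_status(entry):
    # hypothetical minimal validity rule
    return bool(entry.get("referent")) and bool(entry.get("status"))


record = PatientRecord()
record.commit_change_set([
    {"referent": "pregnancy", "status": "diagnosis"},
    {"referent": "perinatal care plan", "status": "plan"},
    {"referent": "anti-nausea medication", "status": "order"},
], has_referent_and_status)

try:
    record.commit_change_set([
        {"referent": "follow-up visit", "status": "order"},
        {"referent": "", "status": "observation"},  # invalid: no referent
    ], has_referent_and_status)
except ChangeSetError as err:
    print(err)  # 1 invalid entries; nothing committed

print(len(record.entries))  # 3
```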


Granularity

One thorny problem of health data is to do with what actually constitutes the focal referent(s) of a clinical statement. If the focus is ‘the colon’, then an entire endoscopy observation, with multiple sub-parts about the different parts of the colon, and in each, about polyps and lesions, might make sense as a clinical statement. However, as soon as one particular polyp or section of the colon appears to be of interest, that alone will be the subject of further clinical statements. A single polyp could be the focal referent of a clinical statement. This problem occurs primarily in spatially and temporally complex entities, i.e. entities made up of other entities in complex arrangements. An information modelling framework will need to allow Entries to be about focal referents at any level of granularity. However, health data is also full of ‘characterising information’, i.e. a record of properties or qualities of a target referent. For example, a model of heart sounds can include rate, rhythm, loudness, and so on. These are not self-standing entities, but ‘dependent properties’. It therefore doesn’t make sense to create different Entries to report the complex of related properties for a given target entity, such as heart sounds observed at a certain time. This tells us that an archetype for heart sounds should include the various properties together. Although it appears trivial, the simple policy of knowing which information elements represent continuants or occurrents, versus properties, provides a useful modelling rule, namely:

  • for any complex (i.e. consisting of parts) continuant or occurrent X, a clinical statement about X or any sub-part is reasonable
  • for any group of related properties about a continuant or occurrent X, the group probably does not make sense separately from X itself.

A model of heart sound properties that doesn’t say it is about heart sounds doesn’t make sense on its own.


Other data management factors also force certain modelling decisions. For example, ideally we expect that every Entry should indicate its referent, which, if it is about the patient, will implicate the patient himself and therefore his identity. However, in most of Europe it is illegal to include patient-identifying information in most clinical information. The ‘patient record’ thus becomes a loose container for all statements about the same patient, with the actual identity being obfuscated in various ways to satisfy privacy and legislative requirements.


A key consideration in the modelling world is re-use. This will also affect how models of clinical statements are ‘sliced and diced’. This subject deserves a treatment at least as complex as the treatment of the same topic in mainstream software engineering.


The concrete outcome of any modelling exercise based on the analysis presented here may well be coloured by these other kinds of considerations. However, I would claim that it should not be compromised, in the sense of losing the ability to know computationally what the information means. Therefore a new version of the openEHR reference model, for example, would need to document at what level and by what means certain kinds of interpretive context can be obtained, if they are not to be found on clinical statements, i.e. Entries.
