Are we in a Semantic Emergency?

The AI singularity

The LLM (Large Language Model) AI revolution has forced the hand of the IT industry on the question of semantics in data, because now meaning is now consumed by machines as well as humans. The industry has survived for decades by a messy combination of fragmented technologies for database and software development, poor problem space analysis, weak methods for semantic architecture, manual implementation / test / deploy, not to mention the Agile revolution, which led to the entire activity of ‘design’ being sidelined. The consequential endless consulting and churn of enormously expensive solutions has become normalised as expectations eroded to nothing over the years.

Procuring organisations labour under two constant fears: that changes to systems might compromise the existing data (often Terabytes or more, and covering significant population); and of having to replace an entire solution (typically, the EMR within a healthcare enterprise), which might be due to intolerable consequences of semantic drift of the incumbent system, but just as often, due to an organisational merger or other business event.

The advent of usable and affordable LLM AI has changed everything, notwithstanding its shortcomings and associated hype. The two essential effects are as follows.

  • The win: radical reduction of implementation effort to create any technical output – software, databases, websites, translations, summaries.
  • The risk: the use of LLM AI to surface insights (e.g. summaries, differential diagnoses etc.) via direct processing of data as language inserts a black box whose ‘reasoning’ is opaque (it is in fact statistical language prediction), and which does not compute on formal semantics, but lexical patterns.

An unsettling consequence of the first point is that much of the cost justification of highly priced ‘mega-suite’ solutions is likely to evaporate, since procuring organisations will realise that they can build their own solutions based on formally stated requirements and models, or at least hire far cheaper specialist companies to do so. The risk and difficulty of obsoleting an incumbent is dropping significantly. Not only is ‘build your own’ becoming possible, ‘rebuild your own’ (as many times as you like) comes almost for free.

In addition, AI-assisted migration (or in situ MCP translation) of terabytes of legacy data may become safe and economic in the near future, completely changing the equation of migrating to new technology.

Freedom is usually a double-edged sword, and this case is no exception. AI-based solution development brings with it the need to specify what to build, in a sufficiently structured way for AI coding tools to generate correct output. This is no small challenge for complex domains like healthcare.

The second point above reflects the increasing recognition that LLM AI on its own cannot establish truth, other than by statistical association (i.e. repetition of an assertion, and / or direct linguistic assertions of truth, e.g. in medical journals). Unfortunately, the presence of both garbage and of assertions once thought correct (e.g. ‘stomach ulcers are caused by stress’) in foundation models undermines even this mode of establishing truth. Additionally, the inability to access longer term context for any given piece of data consumed by an AI further limits its ability to generate correct outputs.

It is now understood that LLM AI can only be safely harnessed at an enterprise level via integration with ontologies and other formalised representations of domain truth – so-called hybrid systems.

In sum, both the potent promise of software for (nearly) free as well as net-new operational insight generation are achievable, but only with a common underpinning: sophisticated computable representation of domain semantics, via ontologies and advanced information modelling.

The Medieval State of Modelling

The current state of the art in semantic architecture is artisanal, which is to say while there are a few localised experts and even centres of excellence (just as there were a few 12th century bootmakers whose shoes were fit for the king), there is no generalised methodology for semantic architecture, and almost no comprehension of how to approach modelling of the domain.

The advent of modern AI makes this state of affairs untenable, since AI and agentic solutions need access to semantic definitions of data to work properly, plus ontologies to interpret it, but can’t obtain either. Not for lack of technical methods, but because the needed models and ontologies remain undefined.

What do such models look like? SPLASH proposes three levels of formalisation:

  • Master Ontology (MO): define the base categories and relationships that establish the semantic grounding for the framework;
  • Reference Model (RM): define the concrete representation of information instances based on the MO;
  • Domain Models (DMs): define the semantics of domain information, decision pathways and processes, expressed as RM-based utterances – models of data, decision structures, and plans.

In subsequent articles, I’ll provide an outline of each of these levels of representation, and show how they can take us from computing on text to computing with meaning.

Production note: readers will correctly guess that the images in this post were AI-generated (including the second, which was created new). I created all three in less than two minutes. It will be appreciated that producing such custom images 5 or more years ago would have been some hours’ work.

The text is however mine alone …

This entry was posted in Computing, Health Informatics, SPLASH and tagged , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *