The real reason most software fails

To my mind there is a problem in academia to do with where disciplines like ‘computer science’ (CS) and applications of computing sit. Pure computer science is the study of computational theory and applications. It develops things like data structures, algorithms, models of parallel computing and much else relating to computation as an object of study. Things like bio-informatics, financial computing and avionics, just to name a few, aren’t usually thought of as proper ‘science’, but as some sort of ‘applied’ form of pure CS. Somewhere in the middle is ‘software engineering’.

However, in my view, the offspring of CS were never properly conceived as disciplines, but rather artisanal pursuits. In this post I have a look at a few domains and try to show how disciplines derived from computing should be understood as both proper science and engineering. In doing so, I came up with the schema above, as a way of establishing the key concepts. There’s nothing on this diagram that can’t be found in a dozen books on philosophy of science. But … I like diagrams.

The general picture

For a discipline to be understood properly, we need to know its object or focus of enquiry. What in the real world is it interested in? Then, for science to occur, we need a programme of observation and analysis, in order to come up with some conceptualisation of what we see – models. Models are formal artefacts, and are expressed in one form or other of mathematics, and indeed, mathematics is often created or adapted specifically for the domain.

Doing science is largely about finding models (which are equivalent to hypotheses), and refining them via ongoing investigations. As the models develop, new mathematics often has to be created to formalise them. These models form part of the theories that emerge in the domain. Theories explain the phenomena and make predictions about what might be seen. Good theories have good ‘explanatory and predictive power’, as philosophers of science call it. Bad ones fail in the light of contradictory evidence, or conflicts with other theories.

A theory isn’t just a model (i.e. equations / formulae), it needs to be more. It’s a description of something, and to work well, it needs words, metaphors and images. This can be a real challenge, but many consider it crucial for purposes of human communication and teaching, as well as further development of the domain.

The respective viewpoints of Stephen Hawking and Roger Penrose on this are instructive: Hawking is a logical positivist – if the model (i.e. equations in Quantum Mechanics) fits the observations, it more or less is the theory. Penrose, a scientific realist, would demand that models fit into a theory that describes the connection of the models with reality in a holistically satisfying way. Personally I prefer the latter. Why? There have been many examples of models that fit all the known observations (e.g. ideal gas law, Newtonian mechanics) but turn out to be only approximations or special cases of a different model of reality when later observations are made.

Science usually takes as its goal successful theories, ones that either predict further deep truths about our world (satisfying human curiosity), or ones that can be applied reliably in some way to change the real world, for example to cure polio, build high-speed trains and create the internet.

This is where we encounter engineering, healthcare, and other applied disciplines that function like them – they systematise the making of artefacts (skyscrapers, mobile phone networks) and the management of aspects of nature we want to control (agriculture, human health).

In sum, we could say that (good) science uses mathematics and language to create models and descriptions of reality.

A Science Example – Quantum Physics

It’s useful to consider the domain of Quantum Physics, which has as its object of enquiry phenomena at the sub-atomic scale. Here is a version of the schema above for this domain:

I just chose a few semi-arbitrary examples of the relevant mathematics (Hilbert spaces), models (the wave function), theories (‘interpretations’, of which there are many), and some of the metaphorical language (probability clouds) and visualisations (Feynman diagrams) underpinning some interpretations.

What can we learn from Quantum Physics as a discipline? Firstly, the object of study is exceedingly difficult to understand, and remains so to this day, to the point where some scientists speak seriously of parallel realities to account for some quantum phenomena, while others think that the presence of human consciousness in experiments changes not only the experiment but the structure and meaning of theories. Nevertheless, a century of careful observation and thinking has produced a marvellous array of theories and models. L I Ponomarev’s ‘The Quantum Dice’ provides a wonderfully readable history, and Penrose’s ‘The Road to Reality’ arguably provides the definitive exposition of the field to date (at least from the realist viewpoint).

As is clear from these and many other publications in the field, there is a very strong relationship with mathematics. Existing mathematical concepts and methods are used, but in addition many have been adapted (e.g. quantum form of Lagrangian, Hamiltonian), and others newly created (e.g. the Dirac notation). The result is a rich mathematical language, specific to the Quantum Physics domain, used to express the ‘models’ of quantum theory.

As I noted above, however, equations on their own are not enough. Both of the above-mentioned publications and many others describe not just the mathematical models, but also go to some length to provide prose descriptions, illustrations and metaphors to make the models comprehensible. These appear to be indispensable to the progress of quantum physics and probably to human beings in general.

So here we have science at its best: a horribly difficult aspect of reality to try and understand, an extensive use of 1) mathematics, coupled with 2) language, images and metaphor to construct 3) models (key equations) and 4) theories (descriptions) of this reality, based around the equations. None of these 4 elements can be dispensed with without weakening or destroying Quantum Physics as a discipline. Even with the controversies in quantum theory (e.g. the debate about whether string theory is a blind alley), the discipline is set out in such a way that battles can occur in appropriate ways, i.e. by reference to the mathematics, and interpretive aspects of the theories.

An Engineering Example – Electrical Engineering

I’m going to use ‘elec eng’ as the next example, because it’s the primary discipline I graduated in some aeons ago. Electrical Engineering is principally about building electrical systems and devices, which come in major categories including: power systems, control systems, digital communications, analogue electronics, and computational machines & devices. Each of these is its own discipline. As an example consider the area of analogue filters, which pervades all of the above domains.

Here the aspect of reality under study is something called RC networks and filters. These can be real networks of resistors and capacitors that you would find in an analogue stereo amplifier or mixing desk, or they can be other objects (e.g. semi-conductors) that have resistance and capacitance, and can thus be analysed in a similar fashion. So there is science to be done here – RC filters are real things, and a maths book doesn’t describe them, nor do they appear in core physics. The science involved working out what mathematical models applied (heavily based on time- and frequency-domain analysis and therefore both kinds of calculus, plus Laplace and Fourier transforms), and a description of what happens, involving all kinds of interesting diagrams like pole-zero diagrams and so on.

Result: RC filters are studied; models and descriptions of their behaviour are developed; pretty soon new objects can be engineered using these principles, e.g. real filters for stereo equipment, stereo speakers and so on.
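
As a minimal sketch of what such a model looks like, assume the simplest case: a first-order low-pass RC filter, with one series resistor and the output taken across one capacitor. Its frequency response is H(jω) = 1 / (1 + jωRC), so the gain is 1 / sqrt(1 + (ωRC)^2) and the cutoff frequency is 1 / (2πRC). The function names and component values below are my own illustrative choices, not taken from any particular text or library.

```python
import math


def cutoff_frequency_hz(resistance_ohm: float, capacitance_farad: float) -> float:
    """Cutoff (-3 dB) frequency of a first-order RC low-pass: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)


def gain(frequency_hz: float, resistance_ohm: float, capacitance_farad: float) -> float:
    """Magnitude of H(jw) = 1 / (1 + jwRC) at the given frequency."""
    omega = 2.0 * math.pi * frequency_hz  # angular frequency in rad/s
    return 1.0 / math.sqrt(1.0 + (omega * resistance_ohm * capacitance_farad) ** 2)


def gain_db(frequency_hz: float, resistance_ohm: float, capacitance_farad: float) -> float:
    """The same gain expressed in decibels."""
    return 20.0 * math.log10(gain(frequency_hz, resistance_ohm, capacitance_farad))


if __name__ == "__main__":
    # Illustrative component values (an assumption): 1 kOhm and 1 uF.
    r, c = 1_000.0, 1e-6
    fc = cutoff_frequency_hz(r, c)
    print(f"cutoff frequency: {fc:.1f} Hz")
    for f in (10.0, fc, 10.0 * fc):
        print(f"{f:8.1f} Hz -> gain {gain(f, r, c):.3f} ({gain_db(f, r, c):.1f} dB)")
```

The point is that the model predicts behaviour before anything is built: with these values the cutoff is about 159 Hz, and at that frequency the gain is 1/√2, i.e. roughly -3 dB. A real filter can then be measured against the prediction.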

Can you do engineering without science?

Personally, I think the answer is no. Any engineering endeavour needs to understand two things: the properties of the things it intends to make (cars, heart pace-makers, …) and secondly, the context into which the thing is to be deployed (cities, human bodies…).

A poor understanding of the first leads to artefacts that fail in unexpected and sometimes catastrophic ways. The de Havilland Comet in the early 1950s started to break up mid-air, due to a lack of understanding of fatigue in fuselage materials. The Tacoma Narrows bridge on Puget Sound collapsed in 1940 through wind-induced oscillation (aeroelastic flutter, often loosely described as resonance) – a key property of suspension bridges that had not been hitherto understood.

A poor understanding of the deployment context can create unexpected and wide-ranging problems. Cars generally work well in terms of their design parameters, but their use creates huge traffic problems and road fatalities. Fossil fuel power stations make power, but also CO2 and sulphur dioxide, and may yet prove to be a central cause of environmental devastation.

In general, the first kind of failure seems relatively rare. This is because there is an unavoidable cycle of development and an unavoidable expense in manufacturing, and engineers are more or less forced by these constraints to understand, design and test concepts to death, and also to include ‘safety factors’, ‘redundancy’ and other fail-safe concepts – ‘before the concrete is poured’, as engineers say. This is why the first 747 flew and the Hoover dam is still there today.

Failures of the second kind are rarer when the context of deployment is very well defined, e.g. the human body into which a pacemaker will be inserted. When the context is sociological (think: cars and cities) then ‘failure’ of some kind is the norm, but is rarely catastrophic, usually coming under the heading of ‘unforeseen consequences’.

I think things are getting better, due to routine observations of the use of engineered objects in our lives over the last couple of centuries. Computer engineers now think about issues such as monitor radiation, noise, heating, as well as ergonomics of use. This is only happening because both the manufactured object and its deployment context are taken as serious objects of study. The most successful attempts are nearly always by companies who spend significant resources on ‘quality’, rather than sticking with ‘bare production’.

Computer science and software engineering

So far so good. What about computing and software development? In pure computer science, the object of study is typically a computing artefact, e.g. data structures, or a whole system of software. The models are (or should be) developed in mathematical logic, à la Knuth and Dijkstra. There are good theories to do with computability, data structure logic, parallel computing and so on. The above schema could be drawn in one or more reasonable ways for computer science.

But when it comes to software engineering, as performed in any domain application of computing, problems emerge. The schema appears to break down. This is primarily because what is being engineered – software – appears to be essentially the same as any ‘models’ that might be developed. Additionally, there is often a failure to realise that there really has to be a focus of enquiry (i.e. the problem domain) that must be understood properly. Instead, superficial inspections are made via methods such as ‘use case engineering’, as a precursor to building ‘models’ which are generally understood as precursors to the built software.

There is abundant evidence. I won’t provide a survey, but it is well known that a high proportion of IT projects fail, and the reasons are almost always a combination of:

- failure to understand the problem space
- lack of understanding of the properties of the proposed artefact to be constructed, including
  - (commonly) a complete lack of understanding of its interior complexity, leading to costs skyrocketing and missed deliveries
  - a lack of understanding of the behaviour of the installed system
- a lack of understanding of the operational context – user reaction, performance, security and many other factors.

Although it is decades since I did my own software engineering courses at university, I don’t have the impression (going by, for example, published books) that much has changed. Software engineering, in my view, largely fails to treat the following things scientifically: the problem space, the system or component to be built, and the operational context. Instead, a variety of weak methods such as ‘use-case engineering’ try to ‘gather requirements’, and ‘models’ are developed essentially in diagrammatic form (e.g. UML) and then thrown away.

A few well-understood formalisms are routinely deployed, such as relational database theory. Things that should be taken seriously in modern object-oriented software engineering, such as type systems, contracts, concurrency and systems theory are only weakly represented in the mainstream programming languages. Things of even greater importance – computable ontologies – are non-existent in mainstream software development.
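
To make ‘contracts’ concrete, here is a minimal sketch in Python, which has no built-in contract support. The `Account` class, the `ContractViolation` exception and the `require` / `ensure` helpers are hypothetical names invented for this illustration; they only imitate the precondition, postcondition and class invariant checks that a contract-aware language provides natively.

```python
class ContractViolation(Exception):
    """Raised when a precondition, postcondition or invariant does not hold."""


def require(condition: bool, message: str) -> None:
    """Precondition check: the caller's obligation."""
    if not condition:
        raise ContractViolation(f"precondition failed: {message}")


def ensure(condition: bool, message: str) -> None:
    """Postcondition or invariant check: the supplier's obligation."""
    if not condition:
        raise ContractViolation(f"postcondition failed: {message}")


class Account:
    """Toy account whose class invariant is: balance >= 0."""

    def __init__(self, balance: int = 0) -> None:
        require(balance >= 0, "opening balance must be non-negative")
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        # Checked before any state changes, so a violation leaves the object intact.
        require(0 < amount <= self.balance, "0 < amount <= balance")
        old_balance = self.balance
        self.balance -= amount
        ensure(self.balance == old_balance - amount, "balance reduced by amount")
        ensure(self.balance >= 0, "invariant: balance stays non-negative")


if __name__ == "__main__":
    account = Account(100)
    account.withdraw(30)
    print(account.balance)
    try:
        account.withdraw(1_000)
    except ContractViolation as error:
        print(error)
```

The contract states what the class promises independently of how it is coded, which is exactly the kind of formal statement about an artefact that hand-rolled helpers like these can only approximate.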

The collapse of the (applied) science function

This has all come about because of the conflation of the object of study, models, and operational (built) artefacts all being

the same thing (software),
digital and
having (today) almost no barrier to entry.

The consequence of the first is that the boundaries in the schema above disappear, and the science part generally doesn’t get done – everything just becomes ‘building software’. The result of the second fact is that the granular manufacturing cycle time constant for physical objects disappears – you can always deliver ‘today’s build’ of the software. Accordingly, the hard constraints that force designers of the 747 or 60-storey building to ‘get it right’ before production appear to be largely missing. ‘No barrier to entry’ means that literally anyone can start writing software, and in many cases, real systems are built by programmers with no formal knowledge of programming, let alone analysis or ‘science’; the result is often huge amounts of throw-away code. (Interestingly, some of the best developers are people with no CS degree, but with a good background in formal methods and/or science as a discipline).

Considered in this light, it’s amazing that any software project succeeds.

One exception is realtime control system software, where the consequences of failure are essentially the same as for bridges collapsing and planes breaking up mid-air. These kinds of systems are expensive to produce, and are built on a more traditional engineering cycle rather than today’s typical software build cycle.

The Agile response: capitulation

The ultimate proof in my mind of the failure of software development to take a scientific approach is the prevalence of ‘agile’ methods today. The agile manifesto describes a kind of developer-sensitive and customer-caring software engineering. Some of these ideas are good in and of themselves. But read in toto, it’s also impossible to see past what the manifesto is really evidence for: the fact that the agile community no longer treats software development as a tractable concern at all.

Instead, each software project is treated like a new adventure into an unknown land, and the only promises made are to call the customer with updates every couple of weeks, and to be nice to each other in the team. Agile fundamentalists appear not to believe in design at all; instead they claim that by constant refactoring, the appropriate design will emerge. This is essentially saying that software development is a mutation & natural selection process executed by people on program code, which will (hopefully) generate the right solution in the end. We could call the mammalian eye a ‘success’ in similar terms, but let’s not forget it took over 3 billion years to get to it from bacteria…

My aim here isn’t a polemic against agile methods. The point is that I think we have taken a fundamental wrong turn in applied computing, and a re-orientation is needed. In fact, I would say that no turn has been made at all; applied computing has been allowed to develop completely organically, with no oversight or interest in its structure.

I should point out that the serious computer scientists of the last 50 years – those who formulated classical notions of algorithms, software engineering, quality and semantics – did in fact create a worthy canon of learning. I am talking of people such as Donald Knuth, Edsger Dijkstra, Barry Boehm, Niklaus Wirth, Bertrand Meyer, David Parnas (Bertrand I know, and I had the pleasure of meeting Barry Boehm and David Parnas 2 years ago in Zurich). However, the go-fast world of the young software developer seems to be almost unaware of their existence or wisdom. We need a new generation of serious scientists like these. There is much work for them (see below).

How to fix things? Start doing science again…

It’s not that hard to see how we should do things properly. Since I have been working in the bio- & health- informatics domain for over 15 years, I’ll take that as an example. Currently, clinical computing systems – medical record systems, radiology applications, lab messaging systems and the like – are built by the standard methods, with the standard results. Most are poorly adapted to their deployment contexts (long-lived lab information systems being better than the others), and many big ticket projects such as the UK NHS National Programme for IT have been (in any honest accounting sense) massive failures.

Consider the specifics: to build any clinical computing system, we need to:

deeply understand the problem space, for example acute and community-based patient care;
deeply understand the thing to be built, e.g. a health record system;
develop formal models of relevant aspects of both;
describe some proper theories based on the models;
commit to the theories within the domain, and treat them in a scientific manner – i.e. constantly test them against evidence; determine their performance as a basis of design;
treat the building of health information systems as distinct from the models which underpin the theories on which such systems are based.

There are in fact many extant threads of work that could be tied together to make this happen:

biomedical ontologies, e.g. OBO, which aim to provide various levels of computable description of the biomedical and clinical domains;
terminology development e.g. IHTSDO;
computable guidelines, including development of languages and logic for representation of diagnostic decision graphs and therapeutic processes;
referent tracking theory;
clinical decision support research and formalisms;
formal models of the health record that incorporate a concept of ‘context’, e.g. openEHR;
formal models of domain content, e.g. archetypes;
and many more.

The essential problem isn’t that no one has thought deeply about aspects of the domain; it is that there is no coherent programme to put a canon of theory together that would function as a basis for thinking about health computing, or building health information systems.

Doing so would initially require an exercise in which the domain and its many sub-domains are mapped out and formally related. For example, a standard way is needed of connecting ontologies and terminologies to nearly every other item on the above list. This won’t happen while the work proceeds in splendid isolation. It would require much subsequent work in order to get layers of dependable theory working together, and make them usable for real world application as well as further study. I know for example that the openEHR EHR model (which I have worked on for 12 years) needs to be properly integrated with referent tracking and ontologies.

This is not currently occurring in an organised way. One piece of evidence, about which I have complained in the past, is the bizarre situation that people trying to develop standards for health information are actually trying to do the primary work (generally without the necessary formal background), due to lack of an accepted canon of theory and models.

My belief (after some 17 years in the health informatics domain) is that academia needs to treat applied computing as first class scientific and engineering disciplines, not backwater departments hanging on the coat-tails of other supposedly real disciplines; in the case of health / bio-informatics – electrical engineering, medicine or computer science.

Many of the endeavours mentioned above have excellent exemplars floating around in these under-appreciated academic departments, as well as in companies and hospital IT departments. It’s the connections that are missing.

What Computer Science departments need to do

One of the global weaknesses is the failure in CS education to make mathematical logic and language theory a central part of the curriculum. In my day it was an option, and I believe today it is still treated as a post-graduate option. The failure to take formalisms seriously has led to the absurd situation where the de facto ‘formalism’ for designing software systems is UML, which for 20 years has remained little more than a diagramming notation. To this day I know of no software engineer who uses it for anything more on any serious project. That’s because it is full of basic holes (typing, generics, unclear notion of inheritance). Its ‘formal counterpart’, XMI, is neither humanly readable nor writable, and exists in so many variants it’s not reliably computable either. This situation needs to be fixed.

Following that, the concept of building real world semantics into computer code (classes, database schemas) needs to be jettisoned. It’s the very reason why large systems can’t keep up with requirements. I would argue that there is no system – no airline booking system, no conference booking system, no banking system, and certainly no health record system (all textbook favourites) – that can be conceived of as a ‘single level model’. That means throwing out all those misleading object-oriented and relational ‘modelling’ textbooks.

Next, a multi-level modelling approach needs to be formulated. This is partly what we have done with archetypes, but that’s only an early start. With multi-level modelling principles in place, it is unavoidable that Computer Science has to engage with specific domains in a completely different way; it can’t deal with them via ‘use case analysis’ or other vague and superficial means. Instead, a proper methodology is needed for expressing domain processes and information so that domain specific models can be created that can be consumed by software instead of strangling it.
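
A minimal sketch of the two-level idea follows, under loud assumptions: the class names, the `validate` function and the blood pressure ranges are all invented for illustration. This is not the openEHR reference model, nor the archetype formalism, and the numeric limits are not clinical guidance. The first level is a small, stable information model compiled into the software; the second level holds the domain definitions as data, which domain experts could change without touching the code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Level 1: a small generic information model. This is what the software knows about.
@dataclass
class Element:
    name: str
    value: float
    units: str


@dataclass
class Entry:
    concept: str
    elements: List[Element] = field(default_factory=list)


# Level 2: domain definitions held as data, outside the compiled classes.
@dataclass
class ElementConstraint:
    units: str
    minimum: float
    maximum: float
    required: bool = True


@dataclass
class DomainModel:
    concept: str
    constraints: Dict[str, ElementConstraint]


def validate(entry: Entry, model: DomainModel) -> List[str]:
    """Return a list of problems; an empty list means the entry conforms."""
    problems: List[str] = []
    if entry.concept != model.concept:
        problems.append(f"concept is {entry.concept!r}, expected {model.concept!r}")
    by_name = {element.name: element for element in entry.elements}
    for name, constraint in model.constraints.items():
        element = by_name.get(name)
        if element is None:
            if constraint.required:
                problems.append(f"missing element: {name}")
        elif element.units != constraint.units:
            problems.append(f"{name}: units {element.units!r}, expected {constraint.units!r}")
        elif not constraint.minimum <= element.value <= constraint.maximum:
            problems.append(f"{name}: value {element.value} out of range")
    for name in by_name:
        if name not in model.constraints:
            problems.append(f"unexpected element: {name}")
    return problems


# Hypothetical domain definition; the limits are placeholders only.
BLOOD_PRESSURE = DomainModel(
    concept="blood_pressure",
    constraints={
        "systolic": ElementConstraint(units="mmHg", minimum=0.0, maximum=1000.0),
        "diastolic": ElementConstraint(units="mmHg", minimum=0.0, maximum=1000.0),
    },
)

if __name__ == "__main__":
    reading = Entry(
        "blood_pressure",
        [Element("systolic", 120.0, "mmHg"), Element("diastolic", 80.0, "mmHg")],
    )
    print(validate(reading, BLOOD_PRESSURE))
```

Adding a new clinical concept here means writing a new `DomainModel` value, not new classes or schemas. That separation is what lets domain semantics evolve without rewriting the software.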

In sum, Computer Science needs to recognise that each domain – health, finance, etc – requires its own science, its own formal models, and its own theories. And that to achieve this, proper (meta-)methods are required.

Applied Informatics

From the point of view of an applied computing department, such as health informatics or financial IT, the starting point is to use something like the schema shown at the top of this post to define the programme of research. Realistically, it would probably mean numerous universities developing a common model of applied informatics research for the domain in question, and teaching and signing up to that model.

In health it means e.g. examining how clinical medicine is executed, and coming up with a proper model of the patient / clinician interaction over time, of the course of disease, of decision-making processes, of ‘treatment’ versus diagnosis, and numerous other things. Proper models of these are needed, and the models need to be part of an overall body of theory.

Conclusion

This post is already far too long, so I’ll finish with a simple conclusion: I don’t think there is any serious domain of enquiry, no matter how narrow or seemingly ‘applied’, that doesn’t merit a proper scientific approach, nor any serious domain of development that doesn’t merit proper science and engineering relating to both the built artefact(s) and deployment context(s). All applied computing domains fall into one of these categories, but their pedagogical frameworks are woefully inadequate today.

8 Responses to The real reason most software fails

J Carter says:

04/10/2013 at 19:25

Thomas, thanks for such a thoughtful and insightful post! I have been involved in medical informatics since 1987, and I find the lack of formal methods and theories to be a major impediment to building robust clinical systems. Perhaps the introduction of discrete mathematics into training programs is a good place to start in training the next generation of biomedical informaticists.

Sets, logic, graphs, relations and functions provide many of the concepts required to express important aspects of clinical care and clinical systems. I look forward to reading future posts on this very important topic.

Jerome

- wolandscat says:
  
  10/11/2013 at 15:55
  
  Thanks Jerome. I think we need formal models of complex real world things like ‘shared medications list’, and ‘shared care plan’ and so on. Otherwise each software solution has its own private idea of these concepts, and it will continue to block sharing of information and workflow as it does today.
  
  - J Carter says:
    
    11/11/2013 at 19:44
    
    Thomas, it seems that we agree on the ultimate goal, but are considering somewhat different approaches. We do need formal models of informatics structures such as medication lists. The issue seems to be what exactly constitutes formal.
    
    In the post, you suggested a need for mathematical models, and I completely agree. The gist of my suggestion concerning discrete mathematics is that we need proven tools for analyzing the computational properties of informatics structures. A medication list can be rendered quite easily in ASCII and would be easily read by any system. More structure and computational utility could be attained by using XML or JSON. At some point, the issue becomes whether a medication list is simply a list or if it is a unique clinical information construct with specific computational/mathematical properties. If the former situation holds, then consensus is sufficient to address interoperability concerns; however, if the latter holds true, a different approach seems necessary. Thus, I am advocating an approach to formalization that focuses on analyzing the mathematical underpinnings of key clinical informatics concepts. It seems that your approach to formalization starts at a higher level of analysis. I think the two approaches are complementary and not in any way in opposition.
CW says:

29/12/2013 at 19:35

RE: “The Agile response: capitulation”

Insofar as this whole discussion is an extended lament on “the way things are” I think it neglects, at least in the United States, the role of contemporary business culture, and in particular, much of the venture capital community. Together these drive the attitudes and priorities of many young software developers and influence priorities in the training of software developers.

The overall effect I see is a tremendous amount of “churn”. From a certain point of view, everything is fine as long as projects are getting started, salaries are getting paid (for a while), and resumes are getting updated with new skills and project experience to take into subsequent jobs. Investors in particular are often ready to encourage a fair number of poorly thought-out and even reckless entrepreneurial experiments—with a lot of shallow imitation—on the grounds that a few of those experiments will pay off big and pay for all the failures. Nearly everyone is in a hurry, and the perceived urgency of problems and opportunities, e.g., American health care’s cost and quality problems, exacerbates the “fools rush in” dynamic.

The exception you mention, realtime control system software, is telling. In this field, the definition of success is clear, the consequences of failure are stark, and the stakeholders therefore tend to be very savvy, technically sophisticated, and intolerant of BS and broken promises. In health care (and government contracting, which often overlaps with it) the customers and clients are more resigned to just muddle through and work around the system failures—the show must go on.

As a parting shot, see the observations in this article (more of an op-ed) in the HIE Watch newsletter on the so-called mHealth market:

http://www.hiewatch.com/perspective/mobile-could-drive-hie

- wolandscat says:
  
  30/12/2013 at 20:25
  
  Your thoughts on HIE startups potentially solving the interop problem are very interesting. Although most people think of openEHR as ‘EHR’, it’s designed ultimately to fulfill the HIE remit (what we call ‘EHR’ in Europe / Aus is more like what you think of as the HIE data from the ‘EHR system’ in the US).
  
  But proper science and engineering still needs to be done, otherwise those HIE companies will end up in the same place as the EHR systems builders are today. To realise your proposal (which despite my comments, I think has real merit), you need a bucket of standardised key technologies for all those HIEs to feed off – they can’t all (re)do the science independently over and over. A possible list is in the main post (OBO, IHTSDO, openEHR, archetypes, referent tracking etc). The real challenge here would be to find VCs not just for all those competing HIE builders, but for the ‘technology bucket’. (It’s work that should be done by academia, but isn’t). Another commons problem…
  
  - CW says:
    
    22/01/2014 at 22:19
    
    Actually, I was trying to say that you were neglecting “the role of contemporary business culture” in creating the mess. I wasn’t trying to be encouraging about the role of HIE startups. In fact, I was suggesting that a lot of them are uninterested in doing the proper science and engineering. Their time horizons are too short, and they’re mainly in it for the money and the shallow frisson of doing something “innovative”. If the current situation seems fairly intractable, that is substantially the reason why, IMO. I’m afraid the “solution” will be a fairly brutal, drawn-out process of Darwinian selection and failure, as the opportunists and drive-by entrepreneurs gradually fall by the wayside or move on to other things.
    
    I know this sounds cynical, but that’s the way the situation looks to me right now. Of course, in the midst of it all there are people who are quietly doing solid work that will hold up over time.