The folly of the obsession with source code

My favourite topic these days is the phenomenon of fundamentalist thinking. You don’t need to go to Iraq to find it; it’s all around us…

Recently I chanced upon a post entitled ‘Coding is not the new literacy’ by Chris Granger, who as far as I can tell is one of the smart young generation of start-up developers creating interesting new ways of doing software. I suspect he is not yet 30, going by his ‘about’ page. Not a bad post for a young guy. It essentially says the following:

  • coding is ultimately the act of externalising our mental models into computer-understandable form
  • the main game is building and refining those mental models

In his view, ‘modelling is the new literacy’. He says:

Modeling is creating a representation of a system (or process) that can be explored or used.

I happen to agree with this, and I would go so far as to say that if you think that coding is the main activity of understanding or formalising a solution to a problem, you are profoundly wrong.

There are a few different ways to see this. One of them is described in my post ‘The reason most software fails’. The gist of it is that building a system or software ‘solution’ is actually a scientific process: you must start by understanding not just the problem domain but the deployment context, create some theories, refine those theories through testing and further observation, and iteratively arrive at a working realisation of the theory in the deployment context. The process never finishes. The ‘theories’ are expressed primarily as models. Coding expresses the models in one or more specific programming languages, but it doesn’t properly embody the models or the understanding of the problem domain; it is just an expression of the executable parts of the solution, according to conclusions drawn from the models. All ‘coding’ involves working around the semantic limitations of the programming language to obtain the originally intended semantics.

The following diagram was my attempt to illustrate the variety of cognitive activity going on here.

[Diagram: mmser-complete]

Here is what it looks like mapped to e-health.

[Diagram: mmser-health-software]

Most of what we do is modelling. In the approaches used by any of the serious health informatics efforts, very little of the domain information or workflow semantics is ever converted to source code; instead, it is expressed as terminology, clinical content models, and computable guidelines. A lot of system logic and user workflow is in code, of course, but this can’t be achieved without clear logical models of these workflows. Most of what is code is fairly generic stuff for moving data and notifications around in specific ways.
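To make that split concrete, here is a toy sketch in Python. The `Element` type, the `BLOOD_PRESSURE_MODEL` list and the `validate` function are hypothetical illustrations, not openEHR, ADL or any real health informatics API, and the numeric ranges are placeholders rather than clinical guidance. The only point it shows is that the domain knowledge sits in a model expressed as data, while the code that processes it stays generic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One constrained data point in a hypothetical content model."""
    name: str
    units: str
    minimum: float
    maximum: float


# Domain knowledge lives here, as data authored by modellers, not as code.
# Hypothetical model; the ranges are placeholders, not clinical limits.
BLOOD_PRESSURE_MODEL = [
    Element("systolic", "mm[Hg]", 0.0, 1000.0),
    Element("diastolic", "mm[Hg]", 0.0, 1000.0),
]


def validate(model: list[Element], record: dict[str, float]) -> list[str]:
    """Generic code: checks any record against any model, knowing nothing clinical."""
    errors = []
    for element in model:
        if element.name not in record:
            errors.append(f"missing {element.name}")
            continue
        value = record[element.name]
        if not (element.minimum <= value <= element.maximum):
            errors.append(
                f"{element.name}={value} outside "
                f"[{element.minimum}, {element.maximum}] {element.units}"
            )
    return errors


print(validate(BLOOD_PRESSURE_MODEL, {"systolic": 120.0, "diastolic": 80.0}))  # []
print(validate(BLOOD_PRESSURE_MODEL, {"systolic": 120.0}))  # ['missing diastolic']
```

Supporting, say, a heart-rate model would mean authoring another list of `Element` entries; `validate` itself would not change.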

In fact, in e-health we do a lot of modelling in languages like UML, OWL, ADL, BPEL and PROforma. We use a lot of different languages to address different intellectual problems. Unfortunately, neither companies nor public institutions put much value on this, and it is a poorly supported aspect of most IT projects.

Here’s another quote from Granger’s post that nicely encapsulates the problem with thinking code is the main game (my emphasis):

Coding requires us to break our systems down into actions that the computer understands, which represents a fundamental disconnect in intent. …. We are not trying to model how a computer does something. Instead, we are modeling human interaction, the weather, or spacecraft. From that angle, it’s like trying to paint using a welder’s torch. We are employing a set of tools designed to model how computers work, but we’re representing systems that are nothing like them.

Even in the case where we are talking specifically about how machines should behave, our tools aren’t really designed with the notion of modeling in mind. Our editors and debuggers, for example, make it difficult to pick out pieces at different depths of abstraction. Instead, we have to look at the system laid out in its entirety and try to make sense of where all the screws came from.

Most of the human activity and economic expense of generating good IT solutions for the real world goes into a) understanding the problem domain and b) creating and refining models of phenomena in that domain. This is where the hard work is, and where understanding is encapsulated. This is what can be reused, and what should be supported, especially in tough domains like e-health.

However, there is a prevailing idea today that ‘code is king’, with the implication that all understanding of the domain, of the system being built, and of the models and paradigms on which it is all based will be expressed primarily in the code, and will emerge through endless ‘refactoring’ of the code, à la the agile revolution. I’m not suggesting that agile ideas are useless (see my Amazon review of a recent book on the topic), or that messing around in code isn’t a useful way of doing some modelling or working on certain ideas. (Although, if you scratch the surface, the languages used to do this seriously tend not to be mainstream ones, but things like Lisp, Clojure and Haskell. You won’t create much new understanding in Java, unless you are studying why it’s so hard to do simple things in Java.)

The obsession with source code bypasses all these considerations, and makes all kinds of fundamental errors such as:

  • If I have the source code, I have the understanding of the problem, analysis, architecture and system dynamics
    • = if I have a copy of the human genome, I understand how cancer works.
  • My people can quickly get to work on your 5,000,000 lines of source code and make it better
    • = I don’t need a medical degree or training in oncology, I’ll just sort these nasty tumours out over a couple of weeks.
  • The semantics are in the source code.
    • = the genetic code is the cancer.
  • I’ll just keep refactoring this darn code. The design I originally wanted will appear sooner or later.
    • Try ‘later’.
  • The system I have is based on a common design paradigm, architectural approach, and coding styles.
    • Trust me, it isn’t, unless you built the entire thing yourself. It’s full of components, each designed well or badly in the design approach of its authors, and integrated into your environment using all kinds of glue created by your guys, who understood who knows how much or how little of the overall problem, the requirements, or the components you obtained.
  • The amount of code written on my project = the amount of economic value I have.
    • Wrong. If you don’t have re-usable models, designs and well-crafted descriptions of the problem space, you don’t actually have much value, you have a looming future maintenance debt, one that will probably be expensively serviced by waves of staff hacking at the code without the slightest understanding of the big picture – and nowhere to obtain such an understanding.
  • If we make the world ‘open source’, we’ve got instant shared understanding of everything, for free.
    • = we’ll just share these genome maps around for free, then cancer will be universally solved.

I would go so far as to say that the ‘code is king’ idea is a profound misunderstanding of where code sits with respect to intellectual value created in IT projects.

The result of this wrong-headedness is that private and public investment in IT projects typically measures success and output in code delivery milestones, and fails altogether to require that reusable intellectual material be created, particularly formal models. Consider that the UK NHS’s failed £9bn National Programme for IT not only generated few working systems for the money, but also left no lasting legacy of problem space descriptions, models, theories or designs, despite the fact that much useful intellectual activity of this sort, by world-class UK health informatics experts, did occur. That’s quite a non-achievement.

Don’t get me wrong. Source code is important in many ways, and open source is also a good thing. It’s just not the fundamental thing. Don’t forget, all source code will be thrown away at some point. It is conceptual models that live on.

I know that none of the above will be listened to by the vast majority, but if I could make one plea to funders, public and private, it would be this: if you are about to fund a development that might take some years, and whose result is meant to function for 20 years, then you are funding an intellectual journey, not a ‘thing’, and you need to take great care to properly support the core intellectual descriptions and models that underpin the code. If you are forced to restart a few times, as you will inevitably have to, it will be these models of understanding that save you, not the code.


About wolandscat

I work on semantic architectures for interoperability of information systems. Much of my time is spent studying data, ideas, and knowledge, and using methods from philosophy, particularly ontology and epistemology.
This entry was posted in Computing, Health Informatics, openehr, Philosophy.

7 Responses to The folly of the obsession with source code

  1. bertverhees says:

    Good blog, I agree 95%. Worth reading for people engaged in software development.
    Indeed there is a code idiocracy, and it works both ways. One way is being overly concerned about source leaks, and the other, as mentioned, is overestimating the value of source code in open source.

    I often regard code as a living creature: if you don’t feed it, it dies. I can never throw anything away, so I have code, C, C++, whatever, even from 1996. Worthless code; I never look at it. There are no living souls who can understand that code without the effort of rewriting it. It is dead code.

    Code which is not actively worked on is dead or it will die real soon.

    SourceForge is 80% filled with dead source code. GitHub is more alive.

    Living code is not something that is stored; as explained, it is a process, always evolving. I spend one hour a day on refactoring, always looking for duplicate code, bad names, poor documentation, functions to extract, that kind of easy target. Sometimes I do large refactoring jobs, changing thousands of lines in one or two days.

    To do that I need unit tests; without unit tests I cannot tell whether my refactoring was successful. After the unit tests I also do a build test, which is part of continuous delivery and is also needed, plus integration tests, etc.

    So testing brings me closer to the concepts the code is trying to achieve than the code itself does.

    That is why code itself does not bring us much.

    I vaguely remember that the source code of a Windows version leaked 15 years ago or so. At that time it was 15 million lines; I think nowadays it is 500 million lines.

    It was no problem at all. It shouldn’t be a security problem because security may never depend on obscurity.
    No one wants to compile Windows, and no one can use that code to build Linux or OS X. It is useless code; it is only useful to Microsoft, because that is where the processes around the code are working. That is where the discussions and the plans are, where the vision is. That is where the code lives. Outside, it is dead, almost instantly after leaking.

    For fun, now and then I read this:
    http://www.mit.edu/~xela/tao.html

    I can recommend it, it is a joke, with some small pearls in it.

    The wise programmer is told about Tao and follows it.
    The average programmer is told about Tao and searches for it.
    The foolish programmer is told about Tao and laughs at it.

    If it were not for laughter, there would be no Tao.

    The highest sounds are hardest to hear.
    Going forward is a way to retreat.
    Great talent shows itself late in life.
    Even a perfect program still has bugs.

    • wolandscat says:

      The Tao link is very worthwhile, I recommend it also. Funnily enough I had been going to use the Tao Te Ching instead of the genome/cancer metaphor in the original post. Ah, I feel the enlightenment coming….

  2. Nicely put! Code (at least in most of the forms I know it) can’t help but over-specify a solution, and therefore obfuscate the shape of the real solution. Research projects that hand over MATLAB or C++ code aren’t finished, IMO. Finishing requires a paper that explains both the whys and the wherefores. Maths also suffices: it can be sufficiently fungible. I haven’t encountered an executable programming language with the same malleability or frictionless abstraction. I have seen some noble tries, though. Perhaps we’re getting closer.

    • wolandscat says:

      C++ code is a great example of what all this is about. It’s quite far from the underlying intellectual ideas (e.g. some kind of data or signal transforms), but it embodies fairly closely one thing: your best understanding of how to make the logical computation run as fast as possible on the hardware. Seen like that, C++ is both a typical programming language, but also some kind of software performance optimisation design language.

  3. jpmccusker says:

    There’s a saying around: “data ages like wine, software ages like fish” (just remember that even the best wine must be drunk in its time). But what you’re referring to is a pendulum swing away from the days of treating software engineers like construction workers. Modeling structured knowledge/data/information is a key part of understanding the modern world, but I think we need both nouns (models) and verbs (software) before we can do anything.

  4. Peter Jordan says:

    If a successful software project is one that derives a working solution from well-understood business needs, then I’d argue that neither source code nor conceptual models are ‘king’. It’s the relationship between the requirements and the practical use of the runtime artifacts that defines a software system.

    Sadly, in many cases models may be an even poorer, and less durable, representation of the working solution than source code – particularly if the original design documentation is not updated during maintenance releases.

    However, that’s not to underplay the seriousness of the disconnect between modeling and building, particularly when they are performed by separate individuals or groups. A clear manifestation of this can be found in the Pragmatic Programmers’ ‘Practices of an Agile Developer’, which proclaims “don’t trust an architect who doesn’t code”! The model will never save anyone in that environment (i.e. where, at best, it will be treated as nothing more than a rough guide by the coder).

    Perhaps if it were possible to generate usable source code from standard modeling techniques, such as UML, King Coder might be dethroned. Unfortunately the main culprit, as Thomas suggests, is object orientation: a wonderful way of organising source code, but completely unsuitable for expressing business logic.

    Maybe domain-driven design and functional programming will afford better code-generation and hand-crafting that’s directly traceable to requirements?

  5. Pingback: Precepts for Clinical Software Design
