What’s wrong with UML

Unified Modelling Language (UML) was supposed to be what the name implies – the one language for modelling in IT. And in the sense of human authoring and consumption of architectural concepts for software development, it sort of is – most developers will gravitate to a whiteboard and draw some class diagrams. Occasionally someone will draw an Activity diagram or Object Interaction diagram.

And then they go and write the code.

Sometime later, they hit serious challenges, and need to go back to group discussion mode. That’s when they realise their code no longer matches those diagrams from 18 months ago (now long scrubbed off the whiteboards and only by luck saved on a few mobile phones).

Some organisations publish models using UML, as we do in openEHR. But those published models generally only get consumed by human eyes. A few intrepid types will download the published XMI (a horrible model serialisation format deserving of its own circle of hell) or native UML tool files (assuming they can get to them) and try forward generation to code and/or XML schemas. With a lot of pain, they’ll get something, which they will then work on for another year, leaving the models behind in the dust, as they adjust for the differences between UML and their chosen programming language.

It wasn’t meant to be this way.

UML was supposed to be fully computable, enabling forward generation into code, with ongoing round-tripping. We are so far from that….

There are now thousands of younger generation developers who have no use at all for UML – they bypass it completely, going from rough block diagrams to code, or maybe just to code. Starting with code is a terrible idea in my view, but that’s what people do these days, and we can’t stop them. But one has to admit: if UML had value to software development, which is what it was meant for, wouldn’t they use it?

I’ll explain everything I think is wrong with UML, with a few pictures. Since I have to use some tool to illustrate the problems, I’ll use the one I actually use: MagicDraw. I must point out that it is an excellent implementation of the UML 2.5 standard, and that the vendor, NoMagic, has likewise been excellent in its support, including of our specification work in openEHR. I suspect similar comments would apply to e.g. Rational Software Architect. So: this post has nothing to do with UML tools or vendors, it’s only about UML itself.

Formal Language and Serialisation

Wrong Semantics

Typing of Container Properties

Typing for container properties – anything that is a List<>, Hash<> etc – is a complete mess in UML. Consider the property forenames in the following class pseudo-code:

class PERSON_NAME
    forenames: List<String>
end

In UML, the usual way to do this is with a definition like:

[Figure: uml_person_name]

[Figure: uml_person_name_def]

So the signature is forenames: String, with a multiplicity of ‘*’. Any code- or document-generation software has to infer from this that a List is really intended. It is possible to help such generators by adding a UML constraint such as {ordered}, or {unique} (implying a Set<> rather than a List<>). But why make life so hard? All I want is to be able to state the type (not class) List<String>, as I would in a dozen mainstream programming languages.
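For comparison, here is a rough sketch of the same property in one mainstream language (Python, chosen only for illustration; the class name PersonName is my rendering of the pseudo-code above). The container type is written directly in the signature, and a generator can read it back without inferring anything from a multiplicity:

```python
from dataclasses import dataclass, field
from typing import get_type_hints


@dataclass
class PersonName:
    # The container type is stated directly in the signature; nothing
    # has to be inferred from a '*' multiplicity or an {ordered} constraint.
    forenames: list[str] = field(default_factory=list)


name = PersonName(forenames=["Ada", "Augusta"])

# A code or document generator can recover the declared type as-is.
hints = get_type_hints(PersonName)
```

Here hints["forenames"] is the type list[str] itself, which is exactly the piece of information the UML form forces a generator to guess.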

Modelling Hash / Map

And if I were to choose a different container, say a Hash<String, String>, how would I do that? Here’s one way:

[Figure: hash_string_example]

Now I could have used other_details: String, and added comments to the effect that I really wanted a String-keyed Hash of Strings. But we want the model to be computable. Notice here that the multiplicity is just 0..1 – I only want one possible Hash<>, not many. But earlier, I had to put ‘*’ to imply that there was a container of Strings.

To do the above, I had to first create – painfully – a ‘class’ called Hash<String, String> that contains template bindings of String, String to the K and V template parameters of the class Hash:

[Figure: hash_string_tpl_binding]

Note that Hash<String, String> isn’t even a class, it’s a type, generated by the combination of Hash<> and String classes. But UML doesn’t do types.
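To make the class/type distinction concrete, here is a minimal sketch in Python, used purely as an illustration. The owner class Party is hypothetical; only the attribute name other_details comes from the example above. Binding the type arguments produces a type, with no new class being defined, and the 0..1 multiplicity becomes an optional attribute:

```python
from dataclasses import dataclass
from typing import Optional, get_args, get_origin

# Binding type arguments yields a type, not a new class.
StringHash = dict[str, str]


@dataclass
class Party:  # hypothetical owner class, for illustration only
    # Multiplicity 0..1: at most one Hash, so the attribute is optional.
    other_details: Optional[dict[str, str]] = None


origin = get_origin(StringHash)  # the generic class being bound
args = get_args(StringHash)      # the bound type arguments
```

The origin is still the single generic class (dict here), and the arguments are the two String bindings for K and V. That is the relationship the UML template-binding 'class' is laboriously standing in for.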

Another way to model a Hash<> in UML is using the ‘qualified association’. For example:

[Figure: qualified_association]

The ‘properties’ attribute here is implied to be something like properties: Hash<P_BMM_PROPERTY, String>. Firstly, the graphical representation is unintuitive; secondly, we have to ask the question: how does this attribute look if it is converted from an association form to an attribute within the class? Hard to know – in MagicDraw at least, the usual option to refactor an association end into an attribute is not available.

If we go back to the forenames: String example, why didn’t I use forenames: List<String>? I certainly could have done – the method is just the same as for the Hash<String, String>. So we have:

  • two possible ways of representing the most common kinds of containers (List<>, Set<> etc) in UML, with differing signatures;
  • a confusion about what multiplicity actually means with respect to containers – does it refer to the container or the contents?
  • we are forced to define a ‘class’ to represent a bound generic type.

Another thing that is very hard to model in UML is any kind of nested Hash<> structures.
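By contrast, in a programming language a nested structure is usually a single line. A small sketch in Python, where the structure, the alias name Translations and the sample data are all invented for illustration:

```python
from typing import get_args

# Hypothetical nested structure: language code -> term id -> synonyms.
Translations = dict[str, dict[str, list[str]]]

translations: Translations = {
    "en": {"at0001": ["blood pressure", "BP"]},
}

# The value type of the outer Hash is itself a bound Hash type.
inner_type = get_args(Translations)[1]
```

In UML each level of nesting would need its own template-bound 'class' of the kind shown earlier.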

You sometimes hear arguments from UML experts that types like ‘Hash’ or ‘HashMap’ are just specifics from e.g. Java, and should not be in UML diagrams. This is nonsense. The logical ‘Hash / Map / Dictionary’ data structure that constitutes a keyed table of objects is one of the most common constructions in any modelling space, and can easily be mapped to any object-oriented programming language. What should be included in UML is:

  • a built-in Hash<K, V> type in the standard profile;
  • a fundamental change such that the type of any attribute or operation is actually a meta-type ‘type’, not a ‘class’;
  • the ability to create ‘types’ from generic and container classes, and use them as the type of any attribute, operation or operation argument.

Inheritance of Template types

Define TUPLE, then TUPLE1<A:Any>. Now define TUPLE2 as inheriting from TUPLE1. We should see that TUPLE2 is really TUPLE2<A:Any>. We want to add another template parameter to make it TUPLE2<A:Any, B:Any>.
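A sketch of what is being asked for, in Python for illustration only. It assumes TUPLE1 inherits from TUPLE, which the text above does not state, and the attribute names first and second are invented. An unconstrained type variable plays the role of A:Any:

```python
from typing import Generic, TypeVar

A = TypeVar("A")  # unconstrained, i.e. A:Any
B = TypeVar("B")


class Tuple:
    """Root of the tuple hierarchy (TUPLE)."""


class Tuple1(Tuple, Generic[A]):
    """TUPLE1<A:Any>; assumed here to inherit from TUPLE."""

    def __init__(self, first: A) -> None:
        self.first = first


class Tuple2(Tuple1[A], Generic[A, B]):
    """TUPLE2<A:Any, B:Any>: inherits parameter A and adds B."""

    def __init__(self, first: A, second: B) -> None:
        super().__init__(first)
        self.second = second


pair: Tuple2[str, int] = Tuple2("age", 42)
```

The inherited parameter and the new one sit side by side in one class header, so Tuple2 ends up with the parameter list (A, B).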

Weak Semantics

Generic (template) Types

Generic Type Binding

If I define a type X<T> or X<T->C> (‘C’ is a constrainer type), I can then use both the ‘open’ X<T> and ‘closed’ versions of X<T> such as X<A>, X<B> etc as concrete types in other classes in a class model. I can only use the open form if the class I use it in, say D, itself has an open type parameter T in its definition, i.e. D is D<T>, D<T, U>, or similar. All of this is possible in UML, but badly over-complicated, leading to a very complex meta-model (i.e. the underlying semantics, which are exposed in XMI serialisation of a model), and as a result, painful user interaction with UML tools to create something that can be written in one line by a programmer.
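As a rough illustration of how little a programmer has to write, here is the same arrangement in Python. X, D, A, B and C follow the names used above; the attribute names item, open_ref and closed_ref are invented, and A and B are assumed to conform to the constrainer C:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, get_args


class C:
    """The constrainer type."""


class A(C):
    pass


class B(C):
    pass


T = TypeVar("T", bound=C)  # X<T->C>: T must conform to C


@dataclass
class X(Generic[T]):
    item: T


@dataclass
class D(Generic[T]):
    open_ref: X[T]    # open form: legal only because D itself declares T
    closed_ref: X[A]  # closed form: usable in any class, generic or not


d = D[B](open_ref=X(B()), closed_ref=X(A()))
```

Each use of the open or closed form is one line. Note that Python checks the bound on T only in a static type checker, not at run time, so this shows the notation rather than enforcement.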

Enumerations

In UML, enumerations have no associated values, and no way to set them. But enum types in most languages do – for the simple reason that you want to be able to guarantee interoperability across software components produced by different groups or vendors. Most programming languages that support an ‘enum’ type directly enable the enum labels to be mapped to a set of values of a specific underlying type, usually Integer or String.
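For example, in Python (one language among many that does this) the labels are bound to explicit values of an underlying type. The enumerations Priority and Status below are invented for illustration:

```python
from enum import Enum, IntEnum


class Priority(IntEnum):  # labels mapped to Integer values
    LOW = 1
    NORMAL = 2
    HIGH = 3


class Status(Enum):  # labels mapped to String values
    DRAFT = "draft"
    FINAL = "final"


# The agreed value is what travels between components...
wire_value = Priority.HIGH.value

# ...and the receiving side maps it back to the label.
decoded = Status("final")
```

Two components built by different vendors only need to agree on the values; the label spellings can then differ without breaking anything.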

Missing Semantics

Uniform Reference for Computed Attributes

One of the worst failures in UML and most object-oriented languages is the simplest: the failure to get signatures right for computed attributes. A computed attribute is technically a function with no parameters. It should have a signature like:

attr: T

e.g.

age: Integer

but in UML we are forced to use:

age(): Integer

This error is repeated in many languages, with the concrete result that if I want to change a computed implementation of ‘age’ to a stored version, I am changing the syntax signature, and thus breaking all of my (or someone else’s) calling classes. This is just dumb.
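A few languages do get this right. As a minimal sketch, Python's property mechanism gives a stored and a computed 'age' the same calling syntax. The two class names and the describe function are invented for illustration, and 'today' is passed in explicitly only to keep the example deterministic:

```python
from datetime import date


class PersonStored:
    """'age' is a stored attribute."""

    def __init__(self, age: int) -> None:
        self.age = age


class PersonComputed:
    """'age' is computed, but is referenced exactly like a stored attribute."""

    def __init__(self, birth_date: date, today: date) -> None:
        self.birth_date = birth_date
        self.today = today

    @property
    def age(self) -> int:
        years = self.today.year - self.birth_date.year
        # Subtract one if this year's birthday has not happened yet.
        before_birthday = (self.today.month, self.today.day) < (
            self.birth_date.month,
            self.birth_date.day,
        )
        return years - int(before_birthday)


def describe(person) -> str:
    # The caller writes person.age either way; no 'age()' is needed.
    return f"{person.age} years old"
```

The calling code in describe is unaffected when the implementation switches between the two, which is the behaviour UML's age(): Integer signature rules out.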

Anchored Types

 

Poor Visualisation Rules

Containers & Generic types (again)

Redefinition

Closures / Lambdas / Agents