Review of OCL 2.0

Overview

This page is a short review of the OMG OCL submission to UML 2.0 – OCL. The reviewed document version is 1.6, submitted 6 Jan 2003 by Boldsoft, Rational and Iona. The purpose of this review was primarily to determine whether OCL can be used for the openEHR project on which I currently work. Any review is coloured by the experience of the reviewer, so I state my relevant experience here.

The OCL submission document is nicely written, and generally very clear to read, which has made this review quite easy. The organisation is not so good toward the end (section 5+), where it is clear that a number of other documents have been concatenated.

Global Issues

This section outlines a number of areas which appear to be deeply problematic in the submission.

Purpose of OCL

It is not clear what the purpose of OCL really is from the specification. There are at least two possible broad purposes that I can think of:

  1. a language for writing formal assertions in static class models, for the purpose of stating correctness conditions.
  2. a language for writing business rules

These two are often confused by developers and the industry in general. I don’t know of any normative definition of the difference, so I will offer my own.

Assertions: formal statements which define semantic correctness conditions for models. Three categories of assertion are usually recognised: pre-conditions, post-conditions, and class invariants (the details are too long to go into here, but here is a tutorial http://www.devhood.com/tutorials/tutorial_details.aspx?tutorial_id=595). Some other assertions are possible as well: check instructions (Java ‘assert’) and loop invariants. The first three are of most importance in OCL. If the model is implemented as software, then all assertions must hold true for all instances of all classes, at the appropriate times. Any violation of an assertion at runtime represents a bug in the software.

Business Rules: business rules are also formal statements, but rather than defining correctness of the model or software, they state correctness of things in the real world. Since the model may be a model of such things (e.g. bank accounts) then some business rules can be confused with model assertions. We can characterise the differences between the two as follows. A business rule is a statement that:

  • is not true for all instances of a class at all times, and therefore cannot be put into the semantic contract of a class at design/compile time
  • may be evaluated at any time, and does not necessarily conform to the structure of pre-conditions, post-conditions and invariants used for writing assertions
  • is usually written “from the outside looking in” at objects, whereas assertions are written as part of the class semantics
  • may be created and/or modified at runtime by a user; violation at runtime does not mean there is a software bug, it means that a business requirement, process rule or something else at the business level has been broken
  • often requires a special syntax, since from the business users’/analysts’ point of view, the way certain things are expressed or understood in the enterprise does not correspond closely to how they might be represented in software.

To build reliable software, assertions are definitely needed. Whether business rules are needed and what they look like depends completely on what the system is intended to do, and what the enterprise does.

In my own experience, I have been involved in systems which have both model assertions (contracts) and business rules. For one instance of the latter, we wrote a language, used yacc/lex tools to build the parser, built a GUI editor, and stored the rules and evaluated them at runtime. When one of these rules broke, we didn’t go looking for a bug in the software; we went to some person in the company who had done a trade outside the bounds defined by the rule for the fundholder.

None of the points above has to do with correctness of software. On the contrary, they have to do with using software to determine whether processes in the running of some business are working correctly.
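
The distinction can be sketched in code. The following is a minimal illustration in Python, not anything taken from the OCL submission; the names Trade, BusinessRule and broken_rules, and the example rule itself, are hypothetical. The invariant is part of the class and its violation is a bug; the business rule is data evaluated from outside, and its violation is merely reported.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trade:
    fund: str
    amount: float  # hypothetical field: value of the trade

    def __post_init__(self) -> None:
        # Assertion (class invariant): must hold for every Trade, always.
        # A failure here indicates a bug in the software that built the object.
        assert self.amount > 0, "invariant violated: amount must be positive"


@dataclass
class BusinessRule:
    description: str
    holds: Callable[[Trade], bool]  # evaluated "from the outside looking in"


def broken_rules(trade: Trade, rules: List[BusinessRule]) -> List[str]:
    """Return the descriptions of the rules this trade breaks; raise nothing."""
    return [r.description for r in rules if not r.holds(trade)]


# Rules are plain data, so a user could add or change them at runtime.
rules = [
    BusinessRule(
        "fund X trades must not exceed 1000",
        lambda t: t.fund != "X" or t.amount <= 1000,
    )
]

report = broken_rules(Trade("X", 5000.0), rules)
```

A non-empty report here sends someone to talk to the trader, not to the debugger, whereas constructing a Trade with a non-positive amount fails the assertion and points at faulty software.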

So, what is OCL for? Is it about static model (and hence software) correctness, or is it a language for writing business rules? Or both? What difference would it make? I would suggest:

  • for OCL to act as a language for writing assertions for model/software correctness, it must:
    • absolutely respect the semantics of the underlying model, which it does not quite seem to do, as noted below under the heading “Set/Single Association Confusion”. The inability to compare reference attributes to null or Void, and the failure to respect the principle of uniform reference, both make it harder for OCL to be used for this purpose.
    • define the semantics of assertions over inheritance, i.e. what is the relationship between pre- and post-conditions and assertions in a subclass to those in a parent class. This discussion appears to be missing from the submission.
  • for OCL to act as a language for writing business rules, it would have to be the perfect syntax for all business situations. Whether this is true remains to be seen.

My suggestion is that OCL should concentrate primarily on the first of these, and get that right, before trying to be a general purpose business rule syntax. If we cannot use OCL for adding software contracts to UML models, any other use seems somewhat superfluous.

Informal UML Diagrams

It is clear from the informal nature of many UML diagrams that making formal statements about them is going to be difficult if not impossible. What is often forgotten, no matter how often pointed out by well-known authors, is that UML can be used for conceptual modelling, and for design- and implementation-level modelling. These two are not the same thing: the latter requires fully formal specifications, and without them it leaves room for software errors. This comment is particularly true of standards specifications, which might be implemented by thousands of developers. Areas where UML allows ambiguity include:

  • missing role names – role names correspond to attribute names in the source class; a missing role name means the specification is incomplete;
  • non-directional association arrows. With no traversal direction indicated on an association line, there is no information as to whether either or both classes are intended to have an attribute/relation of the type at the other end of the line, so the specification is again incomplete;
  • association classes. While these might be attractive for conceptual modelling, these are useless for formal modelling. They need to be replaced by a formal relationship between one or both end classes, and the association class, with appropriate multiplicity and role names. See section 2.5.5 for the difficulties this creates.
  • cardinality without other semantics. Line ends which indicate multiple cardinality are insufficient for a formal specification. At a minimum, the intended abstract semantics of the containment are required, e.g. with UML constraints such as “{ordered, non-unique}” etc, or better, with a proper usage of generic types such as Set<> etc.

It is suggested that OCL should not try to address anything less than a “fully specified” UML model, at least not completely (i.e. there would be no hope of automatic compilation or validation). None of the above ambiguities should be present in such a model. This has clearly presented a problem to the OCL authors, e.g. in section 2.5.5.

Use in Design and Implementation

The OCL specification does not seem particularly oriented to use in design or implementation, since there are a number of departures from all major object formalisms, such as the mixing of Set and single-attribute semantics. There is also no clear discussion about the relationship of constraints over inheritance.
to be continued

Multiple Aims Compatible?

OCL is stated as trying to be at once a specification language and a query language. It is not at all clear that these are compatible aims, since query languages are usually about retrieving information which may be synthesised across “joins”, while a specification language is concerned with making statements about clearly identified single concepts such as attributes, classes, packages etc. The Set/single association mixup appears to be a result of these incompatible needs.

OCL 2.0 has added the concept of a Tuple to the language. It remains to be seen whether all the semantics of collections and query results (tuples) can be properly integrated with a language whose basic goal should be to define correctness constraints for models.
to be continued

Set/Single Association Confusion

In section 2.5.4, subsection “Navigation over Associations with Multiplicity Zero or One”, it is stated that “…a single object can be used as a set as well. It then behaves as if it is a Set containing the single object…”. I had to read this a few times to believe it. While there is clearly a drive to make everything somehow the same for the purpose of querying, there is no way a formal type system can be respected if such liberties are taken: the basis of correct specifications disappears. There is no way, for example, that PERSON.manager: PERSON can act as a Set<> if it has been specified as a PERSON. These are two distinct types, and have nothing to do with each other. Hence, any expression in which PERSON.manager appears assumes a Set<> object, even though it is declared as a PERSON, meaning that such expressions will be wrong as well. For example, in section 2.5.4 (sub-heading Navigation over Associations with Multiplicity Zero or One), the expression

context Company
inv: self.manager->size() = 1

appears, although manager is defined to be of type PERSON. Such expressions cannot easily be mapped to implementations, although one might argue that the use of arrows rather than dot notation will at least facilitate it. Further, it obscures the fact that the relationship COMPANY.manager is already declared in the UML diagram as being of cardinality 0..1, meaning an optional single valued attribute. What would be a much clearer statement is:

context Company
inv: self.manager <> null

meaning that COMPANY.manager is not allowed to be null. It is apparently impossible to write such a constraint in OCL, even though it is likely to be the most commonly used of all constraints in both invariants and preconditions.
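
To make the two readings concrete, here is a minimal sketch in Python, assuming a 0..1 association modelled as an optional reference. The helper as_set is hypothetical and only mimics the implicit object-to-Set conversion described in the submission; it is not part of any OCL tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Person:
    name: str


@dataclass
class Company:
    manager: Optional[Person] = None  # the 0..1 association end


def as_set(ref: Optional[Person]) -> FrozenSet[Person]:
    """Hypothetical helper mimicking the implicit object-to-Set conversion."""
    return frozenset() if ref is None else frozenset({ref})


def invariant_via_set(c: Company) -> bool:
    # Reading of: self.manager->size() = 1
    return len(as_set(c.manager)) == 1


def invariant_direct(c: Company) -> bool:
    # Reading of the direct "must not be null/Void" constraint
    return c.manager is not None
```

Both functions give the same answer for every Company, but only the second states the constraint in terms of the declared type; the first needs a conversion that the model never declared.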

The only interpretation that could be given to association links in UML diagrams with cardinalities of either 1..1 or 0..1 would be that (for some reason) no commitment is being made as to whether a container or a direct reference will be used, even though the cardinality implies the latter. Indeed, if there were a reason to use a container in such a context (for example, to make the type of a certain property the same as some other properties in other classes, already declared as containers, and therefore uniformly processable) then this should be stated explicitly in the UML using Set<T> or similar. However, this would be a very unusual interpretation of a UML diagram in this author’s experience.

Principle of Uniform Reference

One of the simplest and clearest things pointed out by B. Meyer (yet lacking in most languages) is the “principle of uniform access”. This says that the same syntax should be used for class properties with the same semantics. There are two examples:

  • functions with no arguments – in a specification such a “function” should be defined without parentheses, since the intention is to fully formally specify it, not to pre-dispose it to a particular kind of implementation, e.g. computed rather than stored. The problem with not respecting this principle in a specification is that the specification then predisposes certain attributes in implementations to be computed or stored, when this is no business of the specification. (See section 2.5.3 Properties: Operations).
  • dot and arrow referencing. There does not appear to be any reason for arrows to be used when referencing the properties of collections (see section 2.5.4 properties: Association Ends and Navigation) – they are normal types like any other.
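
The first point can be illustrated with a small sketch in Python, whose properties allow a stored and a computed feature to be read with the same syntax. The class names StoredAge and ComputedAge and the function is_adult are hypothetical, chosen only for this illustration.

```python
class StoredAge:
    def __init__(self, age: int) -> None:
        self.age = age  # stored attribute


class ComputedAge:
    def __init__(self, birth_year: int, current_year: int) -> None:
        self._birth_year = birth_year
        self._current_year = current_year

    @property
    def age(self) -> int:
        # Computed on demand, yet read without parentheses
        return self._current_year - self._birth_year


def is_adult(person) -> bool:
    # Client code (or a constraint) is identical for both implementations
    return person.age >= 18
```

A constraint written against age in this style does not commit an implementation to either choice, which is the property the principle is meant to protect.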

Constraints and Inheritance

There did not appear to be a discussion of the semantics of how OCL constraints work over inheritance, e.g. are they cumulative down class hierarchies, etc. Lack of semantics in this area would seem to make any OCL constraints on a class whose parent also has constraints, particularly pre- and post-conditions, ambiguous.
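
For comparison, one well-known answer is the one given by Eiffel's Design by Contract: a redefined routine may only weaken the precondition ("require else", combined with or) and only strengthen the postcondition ("ensure then", combined with and), while invariants accumulate with and. The sketch below expresses that rule in Python as an illustration of what such semantics could look like; it is not something the OCL submission defines, and the function names and example predicates are hypothetical.

```python
from typing import Callable

Predicate = Callable[[int], bool]


def effective_precondition(parent: Predicate, child: Predicate) -> Predicate:
    # "require else": parent OR child, so a subclass can only weaken
    return lambda x: parent(x) or child(x)


def effective_postcondition(parent: Predicate, child: Predicate) -> Predicate:
    # "ensure then": parent AND child, so a subclass can only strengthen
    return lambda r: parent(r) and child(r)


def effective_invariant(parent: Predicate, child: Predicate) -> Predicate:
    # Invariants accumulate down the hierarchy
    return lambda s: parent(s) and child(s)


# Parent requires x > 0; the subclass additionally accepts x == 0.
pre = effective_precondition(lambda x: x > 0, lambda x: x == 0)

# Parent ensures result >= 0; the subclass additionally ensures it is even.
post = effective_postcondition(lambda r: r >= 0, lambda r: r % 2 == 0)
```

Under this rule a client written against the parent class is never surprised by a subclass, which is exactly the guarantee that is left open when the inheritance semantics of constraints are unspecified.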

Detailed Issues

Section 2.5.9 Accessing overridden properties of supertypes

The intent here is reasonable, but the syntax is awful:

context B
inv: self.oclAsType(A)

A far cleaner syntax would have been something like that in Eiffel, which uses the precursor keyword. This needs no qualification in single inheritance, the vast majority of cases, whereas the OCL pseudo-function seems to invite errors.

Section 2.5.10 Predefined properties on all objects

Some of the predefined properties could be more intuitively named. The operator oclIsTypeOf(t: OclType) might have been better named hasDynamicType, while the operator oclIsKindOf(t: OclType) might be more obviously called isInstanceOf.
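
The difference between the two operators corresponds to a distinction most object-oriented languages already make. A minimal sketch in Python, using the names suggested above as hypothetical function names:

```python
class Animal:
    pass


class Dog(Animal):
    pass


def has_dynamic_type(obj: object, t: type) -> bool:
    # Counterpart of oclIsTypeOf: exact type only, no subtypes
    return type(obj) is t


def is_instance_of(obj: object, t: type) -> bool:
    # Counterpart of oclIsKindOf: the type or any of its subtypes
    return isinstance(obj, t)


rex = Dog()
```

Here rex is an instance of Animal in the second sense but does not have Animal as its dynamic type.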

Section 2.5.13 Collections of Collections

The following comments pertain to the earlier submission. This problem appears to have been fixed in the later submission.

This section states that all collections are flattened automatically, so that the two following expressions are the same:

Set { Set {1,2}, Set {3,4}, Set {5,6} }
Set { 1, 2, 3, 4, 5, 6 }

However, this is never the case in computing abstractions based on strong typing; indeed the first set above has the form of a “power” set, a common mathematical abstraction. In purely structural terms, Sets of Sets, Lists of Lists, Lists of Sets etc. are quite common. It is hard to see how OCL flattening helps specification of systems where nested containment often occurs.
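
The point can be checked in any language with typed or hashable sets. The following Python sketch builds both values; flatten is a hypothetical helper showing that flattening is an explicit operation with a different result, not an identity.

```python
# A set of three sets versus a set of six integers
nested = frozenset({frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})})
flat = frozenset({1, 2, 3, 4, 5, 6})


def flatten(sets):
    """Explicitly merge the member sets into a single set."""
    return frozenset(x for s in sets for x in s)
```

The nested value has three members, the flat one has six, and the two compare unequal; only an explicit call to flatten turns one into the other.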

Section 2.5.15 Previous Values in Postconditions

In this section, notation is described for referencing the values of variables as they were at entry into a function, within a post-condition, viz:

post: age = age@pre + 1

This is quite a nice notation, and seems intuitive. However, it starts to look a little strange when parentheses start being used, e.g.:

post: stockprice@pre() + 10

Here it looks as if the function is “pre”, not stockprice. Perhaps more use of OCL will make this part of the notation appear more natural! However, the notation stops making sense in the last part of this section, where it says “when the pre-value of a property evaluates to an object, all further properties that are accessed of this object are the new values…”. E.g. in:

post: ...a.b@pre.c...

the b@pre accesses the old value of b (as at entry into the function); but the .c then accesses the new value of the property c of what b@pre refers to. This seems totally counter-intuitive, since one would expect that if b@pre refers to the old b, then any reference from that point refers to the old values of properties as well. However, the OCL specification has strangely chosen to require repeated use of the @pre expression to achieve this.
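
The rule as described can be simulated by hand. In the Python sketch below, the classes A and B and the method operation are hypothetical; the variables captured before the call play the role of the @pre expressions.

```python
class B:
    def __init__(self, c: int) -> None:
        self.c = c


class A:
    def __init__(self, b: B) -> None:
        self.b = b

    def operation(self) -> None:
        self.b.c += 1    # mutate the object that b refers to at entry
        self.b = B(100)  # then re-point b at a different object


a = A(B(1))
b_at_pre = a.b    # a.b@pre: the reference held at entry
c_at_pre = a.b.c  # a.b@pre.c@pre: the value of c at entry
a.operation()

new_c_of_old_b = b_at_pre.c  # a.b@pre.c: new c of the old b
old_c_of_old_b = c_at_pre    # a.b@pre.c@pre: old c of the old b
new_c_of_new_b = a.b.c       # a.b.c: new c of the new b
```

After the call the three expressions yield 2, 1 and 100 respectively, so a.b@pre.c silently mixes an old reference with new state unless @pre is repeated.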

Section 2.6 Collection Operations

Operations such as select() and reject() are defined on collections. While this is quite nice for a querying language, it implies operations on sets which are in no way guaranteed to exist in target implementation formalisms. This seems to be an example of where the two aims of querying and specification do not appear to meet all that well. The operators exists and forAll are far more reasonable, since they are likely to be available in target formalisms, and are necessary semantics for any specification formalism based on a predicate calculus.

In general, the expressions which can be built using these operators appear quite readable to those used to predicate calculus.
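
As a rough guide to what these operators mean, the sketch below gives one possible rendering in Python, a language that happens to support all four directly; other target formalisms may only offer the quantifiers. The function names mirror the OCL operators, and the sample data is invented for illustration.

```python
from typing import Callable, Iterable, List


def select(coll: Iterable[int], pred: Callable[[int], bool]) -> List[int]:
    # Query-style: build a new collection of the matching elements
    return [x for x in coll if pred(x)]


def reject(coll: Iterable[int], pred: Callable[[int], bool]) -> List[int]:
    # Query-style: build a new collection of the non-matching elements
    return [x for x in coll if not pred(x)]


def for_all(coll: Iterable[int], pred: Callable[[int], bool]) -> bool:
    # Universal quantifier: a predicate over the collection
    return all(pred(x) for x in coll)


def exists(coll: Iterable[int], pred: Callable[[int], bool]) -> bool:
    # Existential quantifier: a predicate over the collection
    return any(pred(x) for x in coll)


ages = [12, 25, 40, 17]
adults = select(ages, lambda a: a >= 18)
minors = reject(ages, lambda a: a >= 18)
```

The first two return collections, which is what a query needs; the last two return truth values, which is what an assertion needs.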

Section 6.3 Primitive Type Specifications

The specifications appearing in this section are surprisingly nearly devoid of the pre- and post-conditions which one would expect to fully specify their semantics.