Style Guide for Writing Comparisons in Conditions

Florian Weimer

This posting describes a style for writing conditions in programming languages, and explains how it makes code more robust against certain security problems.

When writing a condition in a programming language, there are two equivalent formulations, a ≺ b or b ≻ a for some operator ≺. My personal rule for chosing between the two is that the quantity which varies fastest should be on the left hand side of the comparison operator, or if the amont of change is hard to determined, the quantity which we do know less about.

In this example, the loop counter i and the array element array[i] clearly change more often than the array length length (which is constant during the loop) and the constant '\0'.

for (unsigned i = 0; i < length; ++i) {
  if (array[i] == '\0') {
    break;
  }
}

This is rule is quite widely followed, not just for for loops (Kernighan and Pike employ it in The Practice of Programming, for instance), but I have not seen it spelled out explicitly anywhere.

No rule with an exception, though. If the conditional expression checks for interval membership, I try to mimic the mathematical notation a ≤ x < b (where x is being checked against the interval [a, b)) by writing such conditions as a <= x && x < b.

For if statements, there is another option: exchanging the if and else branches allows one to negate the condition. This can lead to simplifications. Usually, I write the most common case first, except when else branch can be omitted altogether if the condition is negated.

Some older C programming language textbooks recommend writing constant == a instead of a == constant because omitting one of the equal signs leads to a compiler error if the constant comes first (because you cannot assign to it). However, these days C compilers warn about assignments in conditional contexts, so this rule is no longer required, and following it would often obfuscate the intended meaning of the code.

Security implications arise if the value of the quantity is untrusted and needs checking. Comparisons where the quantity being checked does not stand by itself on one side of the comparison operator tend to deliver results which are not entirely reliable. Over the integers, there is no difference between a + b > c and a > c - b, but due to integer overflow or wrap-around, there is a difference when integer variables in most programming languages are involved. So the security-relevant rule is, untrusted quantities in comparisons should reside on the left hand side of a comparison operator, alone, and not be combined with other (trusted) quantities.

For example, let's assume that total_length is the total length of a protocol data unit, offset is the current read offset in the PDU (and is less than or equal to total_length), and sub_length is the length of a sub-structure at that offset, all mesured in the same units (probably bytes). If sub_length has been read from the PDU itself, it has to be checked against the overall PDU length for validity. This is a safe way to write this check:

if (sub_length > total_length - offset) {
  report_error();
  return;
}

Or, if sub_length is signed:

if (sub_length < 0 || sub_length > total_length - offset) {
  report_error();
  return;
}

In expression on the right hand side, total_length - offset, we know that offset is not larger than total_length, so there are no overflow or wrap-around issues. The C integer promition rules can be tricky, but in general, they will work out correctly here because both sides of the comparison are non-negative, so promotion from a signed to an unsigned type does not alter the value. This issue can be avoided if the variables are all of the same type.

On the other hand, if we had written offset + sub_length > total_length (which is another natural ordering, reflecting the relative order of the measured quantities of the PDU), sub_length could be so large that offset + sub_length wraps around and turns less than total_length. Depending on what happens next, this can lead to out-of-bounds array reads (and crashes), information disclosure vulnerabilities, or in some rare cases, write buffer overflows which are potentially exploitable for arbitrary code execution.

It seems that this issue is rather difficult to detect by current static analysis tools, so applying this little rule of thumb makes all the more sense.

Revisions

2012-12-26: published

Florian Weimer
Home Blog (DE) Blog (EN) RSS Feeds Impressum