How I classify software bugs

Previously I introduced what I think is a useful categorization of software implementation bugs. These are bugs where the software does not conform to the specification, and as such should be caught during verification, not validation.

These are:

business logic bugs
concurrency bugs
system interaction bugs
hardware interaction bugs

I’ll give a quick run-through of what I mean by each of these categories. The reason I categorize bugs in this way is that each type of bug is most effectively caught by a different type of verification.

Business logic bugs

These bugs are the most straightforward. Your code doesn’t handle a case correctly, or an algorithm has a mistake in its implementation. For some defined input, a part of your code does not give the correct output or cause the correct action.

Example: A state-machine implementation is missing a required transition.
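To make the state-machine example concrete, here is a minimal sketch in C. The states, events, and the function name next_state are made up for illustration and do not come from any real specification. The comment marks the transition that, if left out, would be exactly this kind of bug.

```c
/* Hypothetical states and events for a small controller (illustration only). */
typedef enum { STATE_IDLE, STATE_RUNNING, STATE_FAULT } state_t;
typedef enum { EVENT_START, EVENT_STOP, EVENT_ERROR, EVENT_RESET } event_t;

/* Returns the next state for a given state and event.
 * Combinations that are not listed leave the state unchanged. */
state_t next_state(state_t current, event_t event)
{
    switch (current) {
    case STATE_IDLE:
        if (event == EVENT_START) return STATE_RUNNING;
        if (event == EVENT_ERROR) return STATE_FAULT;
        break;
    case STATE_RUNNING:
        if (event == EVENT_STOP)  return STATE_IDLE;
        if (event == EVENT_ERROR) return STATE_FAULT;
        break;
    case STATE_FAULT:
        /* Assumed requirement: FAULT -> IDLE on EVENT_RESET.
         * Forgetting this line is a business logic bug: the machine
         * could never leave STATE_FAULT. */
        if (event == EVENT_RESET) return STATE_IDLE;
        break;
    }
    return current;
}
```

Because the function is pure (same input, same output, no hardware involved), every state and event pair can be checked exhaustively in a test that runs on the host.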

Hardware interaction bugs

These bugs occur when you have an imperfect understanding of how hardware components work. The software looks correct on paper, but when it interacts with a real, physical component, there’s a problem.

Example: Your I2C driver on your microcontroller does not support clock stretching, and a sensor on the I2C bus unexpectedly employs clock stretching.
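As a rough sketch of what supporting clock stretching means in a bit-banged I2C master: after releasing SCL, the driver has to read the line back and wait until it is actually high, because a slave may still be holding it low. The names below (scl_release, scl_is_high, i2c_wait_scl_high, sim_stretch_polls) are hypothetical, and the pin functions are host-side stand-ins that simulate a stretching slave, not real GPIO code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulation stand-in: number of polls for which the simulated slave
 * keeps SCL low after the master releases it. */
int sim_stretch_polls = 0;

/* Hypothetical pin hooks. On a real target these would access the
 * open-drain GPIO driving SCL. */
void scl_release(void) { /* stop driving SCL low; nothing to do in simulation */ }

bool scl_is_high(void)
{
    if (sim_stretch_polls > 0) {
        sim_stretch_polls--;
        return false;   /* slave is still stretching the clock */
    }
    return true;
}

/* Release SCL and wait until the line really goes high.
 * Returns false if SCL is still low after max_polls samples, so a
 * stuck bus shows up as an error instead of an endless loop. */
bool i2c_wait_scl_high(uint32_t max_polls)
{
    scl_release();
    for (uint32_t i = 0; i < max_polls; i++) {
        if (scl_is_high()) {
            return true;
        }
    }
    return false;
}
```

A driver without this wait would assume SCL is high right after releasing it and clock the next bit while the sensor is still holding the line low.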

System interaction bugs

When multiple systems interact, the number of states the combined system can be in is the product of the number of states of each individual system, so the possibilities explode quickly. It’s easy to make a mistake and not correctly foresee or handle all of these different states.

A system may work fine by itself, but bugs are revealed only in the interaction between two or more systems.

Example: a user-interface application gets out of sync with a real-time application because of a dropped communication packet, and the user interface starts displaying incorrect data.
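One common way to at least detect this situation is a sequence number in each packet, so the receiver notices a gap. The sketch below is a minimal illustration and not the protocol of any particular system. The type link_state_t, the function link_accept, and the 8-bit counter are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t expected_seq;  /* sequence number the receiver expects next */
    bool    in_sync;       /* cleared once a gap has been detected */
} link_state_t;

/* Assumes each packet carries an 8-bit counter that the sender
 * increments by one per packet (wrapping from 255 to 0).
 * Returns true while no packet has been missed. Once a gap is seen,
 * in_sync stays false until the caller re-synchronizes, for example
 * by requesting a full state refresh and resetting this struct. */
bool link_accept(link_state_t *link, uint8_t packet_seq)
{
    if (packet_seq != link->expected_seq) {
        link->in_sync = false;
    }
    link->expected_seq = (uint8_t)(packet_seq + 1u);
    return link->in_sync;
}
```

With a check like this, the user interface can mark its display as stale instead of silently showing incorrect data.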

Concurrency bugs

Multi-threaded systems bring a host of well-known issues such as:

race conditions
priority inversion
deadlocks
starvation
…and more

These bugs are dependent on timing. If an event (such as an interrupt) occurs at precisely the wrong time (often a very narrow window of time), the error occurs.

These bugs are often extremely hard to reproduce, since they can occur rarely and unpredictably.

Example: a FIFO buffer does not properly disable interrupts when adding or removing elements. An interrupt happens to occur between two CPU instructions and corrupts the buffer. The error may not cause a fault until sometime later in a very different part of the code.
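Here is a minimal sketch of a FIFO that protects its updates with a critical section. The hooks irq_disable and irq_enable are hypothetical stand-ins that only count nesting so the code can run on a host. On a real microcontroller they would wrap the platform's interrupt masking, and would typically save and restore the previous interrupt state rather than enabling unconditionally.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_CAPACITY 8u

/* Hypothetical interrupt-control hooks (host stand-ins, see above). */
int irq_disable_depth = 0;
void irq_disable(void) { irq_disable_depth++; }
void irq_enable(void)  { irq_disable_depth--; }

typedef struct {
    uint8_t data[FIFO_CAPACITY];
    volatile uint32_t head;   /* index of the next write */
    volatile uint32_t tail;   /* index of the next read */
    volatile uint32_t count;  /* number of stored elements */
} fifo_t;

/* Adds one byte. Returns false if the FIFO is full. */
bool fifo_push(fifo_t *f, uint8_t byte)
{
    bool ok = false;
    irq_disable();            /* head and count must change together */
    if (f->count < FIFO_CAPACITY) {
        f->data[f->head] = byte;
        f->head = (f->head + 1u) % FIFO_CAPACITY;
        f->count++;
        ok = true;
    }
    irq_enable();
    return ok;
}

/* Removes one byte into *out. Returns false if the FIFO is empty. */
bool fifo_pop(fifo_t *f, uint8_t *out)
{
    bool ok = false;
    irq_disable();            /* tail and count must change together */
    if (f->count > 0u) {
        *out = f->data[f->tail];
        f->tail = (f->tail + 1u) % FIFO_CAPACITY;
        f->count--;
        ok = true;
    }
    irq_enable();
    return ok;
}
```

Without the irq_disable and irq_enable calls, an interrupt handler that also pushes or pops could run between the update of an index and the update of count, leaving the two inconsistent.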

How to catch each type of bug?

Next time, I’ll go into more detail about which verification tactic is best suited for catching each type of bug. As a little preview, here are some different means of verifying software during the design output and verification phases:

design review
code review
static analysis
unit testing on host
unit testing in simulation
integration testing in simulation
subsystem testing on hardware
system testing on hardware

Happy developing!