Enthused: Enthused About: Model-Based Testing

For essentially the past 4 years, I have been doing research in the area of model-based testing. In this post, I outline what I consider motivation, foundations, challenges, and a brief perspective on the future of model-based testing (MBT).

Software testing is hard, but why? It has been my experience that the vast majority of difficulties in software testing can be boiled down to one statement:

For any program, the number of input value combinations which could require testing grows exponentially with every opportunity for the user to make a choice.

One corollary to this statement is that, for any non-trivial application, testing every possible scenario is nowhere close to feasible. The question then becomes, "How much testing is enough?" We've devised strategies as testers to help us cope with this exponential explosion of potential testing points, and the effectiveness of such approaches depends on the validity of assumptions we make. We can consider this domain of testing and assumptions more formally. Here is one formalism developed from the bottom up:

Testing a program P means verifying that the output of P matches expectations for all combinations of inputs to P.

This may work for programs of the form "Input -> Process -> Output", but how can we apply it to an application whose output is continuous?

Testing a program P means verifying that the state of P matches expectations for all combinations of inputs to P.

This seems better in that we can stop and test at any point. But, what about programs which have continuous input as well?

Testing a program P means verifying that the state of P matches expectations after each step of input for all combinations of inputs to P.

Even better. What does "the state of P" mean?

Testing a program P means verifying that all usable data from P matches expectations after each step of input for all combinations of inputs to P.

Now we're cooking. Can we add some formalism to the notions of input, output, step, etc.?

Testing a program P means verifying that A(P(I)) = E(P(I)) for all possible sequences I in D(P), where:

D(X) represents the input domain of a program X.

Note that "all possible sequences I" also includes subsequences of I. The subsequences of I cover the testing required "after each step of input."

P(x) represents a program P to which an input sequence x has been applied
A(X) represents the current ("actual") set of all usable data from a program state X
E(X) represents the expected set of all usable data from a program state X

Even in constructing this fairly generic definition, we've made some fairly broad assumptions (e.g., the assumption that all usable data can be reliably obtained and predicted given a program's current state) and we've left off some details, like the fact that any one "step" of input could actually be a set of inputs (e.g., the set of command-line arguments provided when a program is launched). But for the sake of our own sanity, let's roll with this.

Under this definition, testers are tasked with producing (I, E(I)) pairs for all input sequences I. Realizing that this is not possible for truly ALL sequences of I, we start making assumptions.

With the goal of clearly defining the assumptions we tend to make while testing a piece of software, let's re-visit this now clearly massive domain which we are trying to effectively test. We can represent our formal notion of testing as a directed graph, where each node represents the usable state of a program, and each edge from two nodes A and B represents an input which would cause the program to transition from the state represented by node A to the state represented by node B. Now we can talk about assumptions made during testing in terms of their impact on this graph:

The number of nodes in our graph depends largely on our notion of "usable data". For example, is data only usable from the UI, or are there other interfaces that need verification? More verification points = more distinct nodes.
If we assume that two types of input should be treated equally by the program between one state and another (i.e., that they are equivalence classes), we are combining edges.
We can similarly use equivalence classes to combine nodes, if we can assume, for example, that the program's usable data after application of one input will be synonymous with its usable data after application of another input.

Whatever graph remains after applying such reductions by making assumptions represents the domain which should be covered as part of performing "enough" testing of the application. Stated another way, our notion of "enough" testing is the product of the assumptions we are willing to make about an application's equivalent inputs and equivalent sets of useful data.

In general, I think model-based testing is important for software testers to understand because our brains work from abstractions. In one sense, we all do model-based testing to some extent. The most common errors that I make when testing software come from faulty assumptions in the three categories above: forgetting to consider an externally facing verification point, falsely assuming that two pieces of data should be treated equally when they are actually not, and falsely assuming that an application is in an equivalent state after two specific sequences of input when it is not. I suspect that these are common mistakes for other testers too, and maybe thinking about a model more formally (even if you don't feel the need to draw it out, etc.) can help you avoid these mistakes.

In the domain of model-based software testing, researchers develop techniques and tools for the consideration of models in software testing. For their outcomes to be useful, they often consider the validity of assumptions based on their ability to reduce the size of models-to-be-covered while still finding bugs.

One interesting example of model-based testing research related to my work is the GUITAR project, an open-source framework for model-based testing of GUI-based applications that we have developed and maintain at Maryland. In working with GUITAR, we often (but not necessarily) make a healthy bit of additional assumptions in order to keep our techniques tractable, such as:

The state of a GUI-based application's GUI represents a reasonable approximation of its larger program state, in terms of both inputs and useful data
A GUI-based application only receives input events through its one-time command-line arguments or its GUI
Some input events (e.g., events which appear to modify underlying, non-GUI program state) are more important than others

The first assumption is particularly helpful, as we can even reverse engineer a model of an application's GUI state with the proper tool support (we call our tool for this task the "GUI Ripper"), we can automatically generate test cases which cover this model (we call this our "Test Case Generator", and we have many such tools which generate test suites with various goals and coverage guarantees), and we can automatically replay test cases against the application (done by our "GUI Replayer").

What does the future hold for MBT? I think that model construction and model verification (e.g., by human input or integrity checks) are two major challenges for MBT, and I expect research in these areas to continue. Hopefully, the techniques and tools offered by MBT will become practical enough for application to common domains like Web and mobile testing. I think good old-fashioned manual interaction with an application, maybe even by end-users, will prove to be a key to effective model construction and overall test strategy as well.

I guess what I'm saying is ... I hope someone actually uses my research one day :)

Enthused

Pages

August 11, 2013

Enthused About: Model-Based Testing

1 comment: