Byte Code Dotnet Mutation Test Utility: Faultify

Daposto
Jan 20, 2021

For the past 10 weeks, a few others and I have been working on a mutation testing tool called ‘Faultify’. This tool deliberately introduces logical errors into a codebase to assess the quality of its tests.

Introduction

Whereas Stryker, so far the leading option for .NET mutation testing, mutates source code, Faultify performs mutation testing on byte code. We chose this approach for two reasons: 1) to explore whether byte-code mutations are a better alternative to source-code mutations, and 2) to provide an alternative mutation testing utility for the .NET ecosystem.

We love the work Stryker has done, and their work has helped us in the development process. Faultify and Stryker are trying to accomplish the same task from different perspectives, and therefore I think they complement each other well. This article gives an overview of the development and the challenges we encountered.

Contents

  • What is Mutation Testing
  • What Is a Mutation (pseudo infinite loops)
  • Mutation Technique (source-code and mutation switching/byte-code)
  • Optimization (memory-mapped files, ramdisk, parallel execution, bin packing, code coverage)
  • Mutation Coverage
  • Byte-code vs Source-code

What is Mutation Testing

Fault Insertion, Fault Injection, Mutation Testing, and Mutation Injection are terms that refer to the same subject. With mutation injection, one introduces changes to the logic of a codebase. If a logic mutation (change) is introduced and a test still succeeds, then the test may not be fully reliable. This fact can be used to determine the quality of the unit tests.

What Is a Mutation

A mutation is a change in operators, constant values, or variable declarations. Examples of possible mutations are:

  • Arithmetic (+, -, /, *, %) Operators.
  • Assignment Expressions (+=, -=, /=, *=, %=, --, ++).
  • Equivalence Operators (==, !=).
  • Logical Operators (&&, ||).
  • Bitwise Operators (^, |, &).
  • Branching Statements (if(condition), if(!condition)).
  • Variable Literals (true, false).
  • Mutate Constant Fields (string, number, boolean).

The operators above should be mutated to a variant that invalidates the logic. For example, a ‘+’ can be changed to a ‘-’, a ‘<’ to a ‘>’, ‘true’ to ‘false’, and so on. With some types of mutations, the mutated expression may produce the same value as the original, such as ‘2 + 2’ mutated to ‘2 * 2’. Therefore, for these mutations, all variants ‘+, -, *, /’ should be tried for the best result.

// Original
public int Add(int a, int b) {
    return a + b;
}
// Mutation 1: '+' replaced by '-'
public int Add(int a, int b) {
    return a - b;
}
// Mutation 2: '+' replaced by '*'
public int Add(int a, int b) {
    return a * b;
}
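
To make the idea of a surviving mutation concrete, here is a minimal sketch. All names in it (`MutantDemo`, `WeakTest`, `StrongTest`) are made up for illustration and are not part of Faultify. It models a unit test as a function that returns true when it passes, and runs it against the original and the two mutations above.

```csharp
using System;

public static class MutantDemo
{
    public static int Original(int a, int b) => a + b;
    public static int MutantSub(int a, int b) => a - b; // Mutation 1
    public static int MutantMul(int a, int b) => a * b; // Mutation 2

    // A "test" passes when the implementation returns the expected value.
    // 2 + 2 and 2 * 2 are both 4, so this test cannot detect Mutation 2.
    public static bool WeakTest(Func<int, int, int> add) => add(2, 2) == 4;

    // 2 + 3 differs from both 2 - 3 and 2 * 3, so this test detects both mutations.
    public static bool StrongTest(Func<int, int, int> add) => add(2, 3) == 5;
}
```

`WeakTest` still passes for `MutantMul`, so that mutation survives and points at a gap in the test. `StrongTest` fails for both mutations, which is the outcome a mutation testing tool hopes to see.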

Infinite Loop Mutation

A mutation that produces something like ‘while(true)’ can cause an infinite loop. This is a problem because the test session would then never finish. This can be tackled in several ways:

  1. An initial test run is performed to measure the run time of all unit tests without mutations; this measurement is then used as a process timeout value.
  2. Before the test run is performed, it is validated whether a mutation can cause an infinite loop. This is a faster solution because you do not have to interrupt a test process and start it again if an infinite loop mutation occurs. However, this can be very difficult to detect in advance for the following reasons:
    1) A loop can contain many conditions that can potentially interrupt the loop using a return or break. All conditions have to be checked in order to know whether a mutation causes an infinite loop.
    2) There can be various loops with different mutations. It is relatively easy to detect that a `while(false)` to `while(true)` mutation results in an infinite loop. However, this is way more difficult for a `while(a < b)` to a `while(a > b)` mutation.

Besides infinite loops, pseudo-infinite loops are an edge case that should be taken into account. A pseudo-infinite loop is finite but takes a very long time to run. The for-loop mutation ‘++’ to ‘--’ flips the iteration direction, so the loop can take a long time (usually on the order of int.MaxValue iterations) before it completes.
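
The first approach above (a timeout derived from a mutation-free run) could look roughly like the sketch below. It is an illustration only: `TimedTestRun`, `TestRunOutcome`, and the safety factor of 3 are assumptions of mine and not how Faultify is necessarily implemented. It relies on `Process.Kill(bool)`, which is available from .NET Core 3.0 onwards.

```csharp
using System;
using System.Diagnostics;

public enum TestRunOutcome { Completed, TimedOut }

public static class TimedTestRun
{
    // Derive a timeout from the mutation-free baseline run.
    // The default safety factor is an arbitrary assumption.
    public static TimeSpan TimeoutFor(TimeSpan baseline, double safetyFactor = 3.0)
        => TimeSpan.FromMilliseconds(baseline.TotalMilliseconds * safetyFactor);

    // Start an external test process and kill it if it exceeds the timeout.
    public static TestRunOutcome Run(string fileName, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments) { UseShellExecute = false };
        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start the test process.");

        if (process.WaitForExit((int)timeout.TotalMilliseconds))
            return TestRunOutcome.Completed;

        // Probably an infinite or pseudo-infinite loop: stop the whole process tree.
        process.Kill(entireProcessTree: true);
        process.WaitForExit();
        return TestRunOutcome.TimedOut;
    }
}
```

A timed-out run can then be reported separately from a passed or failed run, since the mutation did change behavior but the tests never got the chance to say so.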

Mutation Technique

There are two main ways to mutate code logic:

  1. At the source code level (modifying syntax trees and compiling the mutations)
  2. At the byte code level (modifying the byte code/CIL in the compiled assembly).

Mutation Switching

If mutations are performed at the source code level then the code must be compiled to a binary before the test can be run with those mutations. In a large codebase, there can easily be thousands of mutations. If the source code is to be recompiled for each mutation the process will become extremely slow.

Therefore, the source code technique is often used with ‘Mutation Switching’, also known as ‘Mutant Schemata’. In short, this means that all mutations are compiled into the binary. The test process then turns the mutations on or off with, for example, an environment variable. This is what mutating a ‘+’ to a ‘-’ and a ‘*’ looks like:

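public int Add(int a, int b) {
    // The test process selects the active mutant through an environment variable.
    switch (Environment.GetEnvironmentVariable("ActiveMutant")) {
        case "0": return a - b;  // mutant 0: '+' replaced by '-'
        case "1": return a * b;  // mutant 1: '+' replaced by '*'
        default:  return a + b;  // no mutant selected: original code
    }
}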

Thus, the test process can set the ‘ActiveMutant’ environment variable to ‘0’ to execute the ‘-’ mutation and to ‘1’ to execute the ‘*’ mutation.
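
On the side of the test process, selecting a mutant comes down to starting the test run with that environment variable set. The sketch below shows one way to do this with `ProcessStartInfo`; the `MutantRunner` class and its `CreateTestRun` method are hypothetical names, and only the ‘ActiveMutant’ variable comes from the example above.

```csharp
using System.Diagnostics;

public static class MutantRunner
{
    // Builds the start info for one 'dotnet test' run with a single mutant switched on.
    public static ProcessStartInfo CreateTestRun(string testProjectPath, int mutantId)
    {
        var startInfo = new ProcessStartInfo("dotnet", $"test \"{testProjectPath}\"")
        {
            // Environment variables can only be set per process when the shell is not used.
            UseShellExecute = false
        };
        startInfo.Environment["ActiveMutant"] = mutantId.ToString();
        return startInfo;
    }
}
```

Passing the result to `Process.Start` then runs the same compiled binary once per mutant, without any recompilation in between.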

Byte Code Mutation

The advantage of byte code manipulation is that the mutated source code does not need to be recompiled and that not all mutations need to be injected ahead of time. For Faultify we use ‘Mono.Cecil’. This is an excellent library for manipulating IL code (CIL). A major drawback I have encountered is that the library is very poorly documented, which makes it difficult to get started.

‘Mono.Cecil’ is made up of the following structure:

A module is a compiled assembly; its types are, for example, classes, and these classes have methods, properties, and fields. Mutations, as described earlier, can be performed at each of these levels.

Let’s mutate the following method:

public int Addition(int lhs, int rhs)
{
return lhs + rhs;
}

Code Example

In the left image, you can see the IL code of the ‘Addition’ method inspected with ‘ILSpy’.

Instruction IL_0003 has the ‘add’ opcode. If we change this to ‘sub’, the operation becomes a subtraction instead of an addition (dnSpy can be used for editing IL code manually). The following image shows the ‘add’ to ‘sub’ mutation with ‘Mono.Cecil’:

Here you can see that a ModuleDefinition has many TypeDefinitions, that each type has many MethodDefinitions, and that each method in turn has Instructions. We want to mutate the ‘add’ opcode to the ‘sub’ opcode.
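
In code, walking that hierarchy and swapping the opcode could look roughly like the sketch below. This is not the code Faultify uses; `AddToSubMutator` and `Mutate` are names I made up for illustration. It requires the Mono.Cecil NuGet package, only looks at top-level types, and writes the result to a separate file so the original assembly stays untouched.

```csharp
using System.Linq;
using Mono.Cecil;
using Mono.Cecil.Cil;

public static class AddToSubMutator
{
    // Replaces every 'add' with 'sub' in methods named methodName.
    // Returns the number of instructions that were changed.
    public static int Mutate(string inputPath, string outputPath, string methodName)
    {
        using var module = ModuleDefinition.ReadModule(inputPath);
        int mutated = 0;

        foreach (var type in module.Types)                       // TypeDefinitions
        foreach (var method in type.Methods                      // MethodDefinitions
                     .Where(m => m.HasBody && m.Name == methodName))
        foreach (var instruction in method.Body.Instructions)    // Instructions
        {
            if (instruction.OpCode.Code == Code.Add)
            {
                instruction.OpCode = OpCodes.Sub;
                mutated++;
            }
        }

        module.Write(outputPath);
        return mutated;
    }
}
```

Calling `AddToSubMutator.Mutate("Library.dll", "Library.mutated.dll", "Addition")` (the file names are placeholders) would produce an assembly in which ‘Addition’ subtracts, which can then be inspected with ILSpy to confirm the change.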

For an extensive view of the meaning and edge cases of opcodes, check out the Wikipedia and Microsoft pages.

Big Gotcha

Many roads lead to Rome, and the same holds for comparing values in IL code. This is demonstrated in the following list, which relates the branching comparison variants to the comparison-only variants (see Wikipedia for their meaning).

  1. blt: The effect is identical to performing a clt instruction followed by a brtrue branch to the specific target instruction.
  2. bgt: The effect is identical to performing a cgt instruction followed by a brtrue branch to the specific target instruction.
  3. bge: The effect is identical to performing a clt instruction (clt.un for floats) followed by a brfalse branch to the specific target instruction.
  4. beq: The effect is the same as performing a ceq instruction followed by a brtrue branch to the specific target instruction.

It turns out that the compiler will often optimize control flow by translating a comparison operator like ‘<’ directly into a branching instruction (such as blt, or its complement bge) instead of the comparison-only instruction (clt). Hence different compilers, or different compiler settings, can generate different IL code for the same source. My compiler always generates the comparison instruction (clt), but on another PC it might just as well use the branching variant (blt). This can be confusing when certain mutations don't seem to be working. This Microsoft article goes into the issue in more depth.
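
One way to cope with this is to let a single logical mutation cover every IL form the operator can take. The sketch below does that for mutating ‘<’ to ‘>’. It is an illustration under my own assumptions: the class name is hypothetical, opcodes are plain strings rather than Mono.Cecil values, and the unsigned and short forms (for example clt.un or blt.s) are left out for brevity.

```csharp
using System.Collections.Generic;

public static class LessThanToGreaterThan
{
    // Every IL form a '<' may compile to, mapped to the form a '>' would compile to.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            ["clt"] = "cgt", // comparison-only form
            ["blt"] = "bgt", // branching form: jump when a < b
            ["bge"] = "ble", // complement branch: skip the body when !(a < b)
        };

    // Returns the mutated opcode, or the input unchanged when it is not a '<' form.
    public static string Mutate(string opcode)
        => Replacements.TryGetValue(opcode, out var mutated) ? mutated : opcode;
}
```

With such a table, the mutation applies regardless of whether the compiler emitted the comparison or one of the branching variants.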

Optimization

‘dotnet test’ has a process startup/shutdown overhead of about 1 second. With 2000 mutations, for example, process management alone takes 2000 seconds (about 33 minutes). Therefore, optimizations are essential for a well-functioning mutation tool. There are several possibilities:

Run Tests from Memory

‘dotnet test’ is a wrapper over ‘vs-test-console’, and both are external processes. In an ideal scenario, you would want to be able to load unit tests into memory and then run them from code. This can save about 1 second per test run. As far as I was able to find out, vsconsole can only be used as an external process (`VsConsoleWrapper`). However, it might be possible to run NUnit directly from code, though that would limit the support for other test frameworks.

Memory-Mapped Files and/or Ramdisk

Memory-mapped files (a really amazing, undervalued technique) and/or a ramdisk make use of RAM, allowing for very fast file reading and writing. The ramdisk can be used to run the entire test process, and the memory-mapped files can be used for the assemblies under mutation.
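
As a minimal sketch of the memory-mapped part, the example below copies the bytes of an assembly into a non-persisted memory-mapped file and reads them back, without touching the disk. The `InMemoryAssemblyStore` class is a hypothetical name, and a real tool would keep the map alive for the duration of the test session rather than for a single round trip.

```csharp
using System.IO.MemoryMappedFiles;

public static class InMemoryAssemblyStore
{
    // Writes the bytes into an anonymous memory-mapped file and reads them back.
    // The array must not be empty, because a map needs a capacity greater than zero.
    public static byte[] RoundTrip(byte[] assemblyBytes)
    {
        // A null name creates a map that is not shared with other processes.
        using var map = MemoryMappedFile.CreateNew(null, assemblyBytes.Length);
        using var accessor = map.CreateViewAccessor(0, assemblyBytes.Length);

        accessor.WriteArray(0, assemblyBytes, 0, assemblyBytes.Length);

        var copy = new byte[assemblyBytes.Length];
        accessor.ReadArray(0, copy, 0, copy.Length);
        return copy;
    }
}
```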

Mutation Bin Packing

A mutation can cause a test to fail or to succeed, so two mutations that are covered by the same test cannot be active at the same time. If they were, there would be no way to tell which mutation caused the test to fail. This scenario is a perfect bin-packing problem. The test is the bin and the mutation is the packet. Each test can have only one mutation at a time, and each test runs once within a test session. Following that algorithm, multiple mutations can be active in the same test session, as long as no test covers more than one of them. In order to implement this algorithm, one has to measure code coverage.
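
A simple greedy version of this grouping is sketched below. It is an assumption about how such packing could be done, not a description of Faultify's actual algorithm, and the names (`MutationGrouping`, `Group`) are hypothetical. The input maps each mutation id to the tests that cover it; the output is a list of groups in which no two mutations share a test.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MutationGrouping
{
    // coverage: mutation id -> names of the tests that cover that mutation.
    // Returns groups of mutation ids that can be active in the same test session.
    public static List<List<int>> Group(IReadOnlyDictionary<int, string[]> coverage)
    {
        var groups = new List<(List<int> Mutations, HashSet<string> Tests)>();

        // Place the mutations with the most covering tests first (first-fit decreasing).
        foreach (var entry in coverage.OrderByDescending(c => c.Value.Length))
        {
            // Find the first group in which none of this mutation's tests is taken yet.
            var target = groups.FirstOrDefault(g => !g.Tests.Overlaps(entry.Value));
            if (target.Mutations == null)
            {
                target = (new List<int>(), new HashSet<string>());
                groups.Add(target);
            }
            target.Mutations.Add(entry.Key);
            target.Tests.UnionWith(entry.Value);
        }

        return groups.Select(g => g.Mutations).ToList();
    }
}
```

Each resulting group needs only one test session, so fewer groups means fewer process startups. The greedy approach does not guarantee the minimal number of groups, but it is cheap to compute.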

Run Mutations in Parallel

An assembly is mutated and the tests have a reference to this assembly. It is not possible to run multiple test sessions at the same time because they would mutate the same assembly. In addition, the CLR locks the assemblies of a test process. To use multiple processes, the tests and assemblies under test should be duplicated so that each test process has its own assembly files. Faultify duplicates the entire test project N times. A test run then acquires one of these copies and releases it when finished, so that other runs can use it.
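
The acquire and release behavior described above maps naturally onto a small pool. The sketch below is one possible shape for it, with hypothetical names (`TestProjectPool`, `Acquire`, `Release`); it assumes the N copies of the test project already exist on disk and only hands out their paths.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class TestProjectPool
{
    private readonly BlockingCollection<string> _free = new BlockingCollection<string>();

    // projectCopies: paths of the duplicated test project directories.
    public TestProjectPool(IEnumerable<string> projectCopies)
    {
        foreach (var path in projectCopies)
            _free.Add(path);
    }

    // Blocks until a copy of the test project is free, then hands it out.
    public string Acquire() => _free.Take();

    // Gives the copy back so that another test run can use it.
    public void Release(string projectCopy) => _free.Add(projectCopy);
}
```

A test run would call `Acquire`, mutate and test the assemblies in that directory, and call `Release` in a `finally` block so that a crashed run does not leak its copy.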

Mutation Coverage

Faultify measures code coverage by injecting a static register function in both the unit tests and all methods from the assembly under test. When the code coverage test run is performed, those static functions are called. First, the unit test that runs registers its name; after that, all methods called by this unit test register their ‘Entity Handle’. After this run, we can see exactly which unit tests covered which methods. Difficulties are:

  1. Since unit tests can cover code behind interfaces, it's impossible to use reflection to determine code coverage.
  2. To know whether a mutation is covered, we need to somehow identify this mutation. We ran into complications doing so; instead, we register only the method's entity handle. This means that test coverage is method-based rather than mutation-based, which implies that more mutations will be performed.
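
The register functions themselves can be very small. The sketch below shows what such a static registry might look like; it is not Faultify's actual code. The names are hypothetical, the entity handle is represented as a plain `int`, and it assumes tests run one at a time during the coverage run so that a single "current test" is enough.

```csharp
using System.Collections.Generic;

public static class CoverageRegistry
{
    private static readonly Dictionary<string, HashSet<int>> Covered =
        new Dictionary<string, HashSet<int>>();
    private static readonly object Gate = new object();
    private static string _currentTest = "";

    // A call to this is injected at the start of every unit test.
    public static void RegisterTest(string testName)
    {
        lock (Gate)
        {
            _currentTest = testName;
            if (!Covered.ContainsKey(testName))
                Covered[testName] = new HashSet<int>();
        }
    }

    // A call to this is injected at the start of every method under test.
    public static void RegisterMethod(int entityHandle)
    {
        lock (Gate)
        {
            if (Covered.TryGetValue(_currentTest, out var methods))
                methods.Add(entityHandle);
        }
    }

    // Returns a snapshot of the method handles covered by the given test.
    public static IReadOnlyCollection<int> MethodsCoveredBy(string testName)
    {
        lock (Gate)
        {
            return Covered.TryGetValue(testName, out var methods)
                ? new List<int>(methods)
                : new List<int>();
        }
    }
}
```

Because the calls run inside the real test process, methods reached through an interface are recorded just like any other call, which is what reflection alone cannot provide.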

Byte-code vs Source-code

I think that these two methods both have advantages and disadvantages, and that they are both valid ways to implement mutation testing. I ran some benchmarks and found the following:

  • Both Stryker and Faultify produce about the same number of mutations and the same score.
  • With this particular project, Stryker is faster when 1–2 test runners are configured.
  • Faultify becomes significantly faster when more than 2 test runners are configured. On a larger project with `259` mutations, it took Faultify 55 seconds, which is `0.21` seconds per mutation, whereas Stryker took about `150` seconds, which is `0.58` seconds per mutation. That is roughly 63% less time, or about 2.7 times as fast.

Source-code Mutations with Mutation Switching

Pros:

  • Recompilation is not required when using mutant switching, though this implies that for any new mutations or edits to current mutations the entire assembly must be recompiled.
  • The exact mutation location/line can be shown to users.
  • Mutation coverage can be easily calculated.
  • It's easier to run mutation test runs in parallel compared to byte-code.

Cons:

  • Control over individual mutations is limited since mutations cannot be injected at runtime without recompilation.
  • Mutations of constants, method names, and access modifiers are impossible.
  • With some mutations compile errors can be generated.

Byte-Code

Pros:

  • Recompilation is not required.
  • Integrates with all .NET languages working on CIL.
  • More flexibility and control over mutations since mutations can be injected at runtime without recompilation.
  • Mutations of constants, method names, and access modifiers are possible.
  • Detailed control since only the required mutations need to be injected; this is useful when inspecting the result with ‘ILSpy’ or ‘dnSpy’.
  • There is a lot of flexibility in having access to IL-code.

Cons:

  • It is more difficult (not impossible) to show the exact mutation location/line since IL code does not carry source line numbers.
  • Some mutations like array mutations require complex IL-structures.
  • It's more difficult (not impossible) to run mutation test runs in parallel compared to source-code.
  • Calculating code coverage for individual mutations is next to impossible, therefore something like method-based coverage is used in Faultify.

Summing Up

This project can be found on the Faultify GitHub page. It is still in an early phase, and there are great plans ahead to make Faultify faster and to have it support more advanced mutations. By doing so, we hope to provide a good alternative way to perform mutation testing in the Dotnet ecosystem.
