Debug Performance

PinDown is made for large test systems. It can handle large numbers of test failures and large teams making many commits to the revision control system.

PinDown can handle large numbers of test failures because it only debugs the fastest failing test per bug. There may be thousands of test failures which are due to only 20 bugs, in which case PinDown will only debug 20 tests, one from each bug group. The other tests in the same bug group can still be validated even though they are not debugged: if PinDown made the fastest failing test pass again by unrolling a specific bad commit, then all other tests associated with this bug should also pass with the same modification, which means they each need to run just once. On top of this it is possible to configure how many additional tests in a bug group should be validated. If 1000 tests are suspected of failing for the same reason, it may be good enough to validate this with, say, 10 randomly selected tests.
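
As an illustration of this triage strategy, the sketch below groups failing tests by a suspected bug signature, picks the fastest failing test in each group for debug, and selects a capped random sample of the rest for validation. The field names ("signature", "runtime") and the dict structure are assumptions made for the example, not PinDown's actual interface.

    import random

    VALIDATION_SAMPLE_SIZE = 10  # configurable: extra tests to validate per bug group

    def plan_debug(failing_tests):
        # failing_tests: list of dicts with "name", "signature" (suspected bug)
        # and "runtime" (seconds) keys
        groups = {}
        for test in failing_tests:
            groups.setdefault(test["signature"], []).append(test)

        plan = []
        for signature, tests in groups.items():
            tests.sort(key=lambda t: t["runtime"])  # fastest failing test first
            rest = tests[1:]
            plan.append({
                "bug": signature,
                "debug": tests[0],  # only this test is debugged
                "validate": random.sample(rest, min(VALIDATION_SAMPLE_SIZE, len(rest))),
            })
        return plan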

PinDown can handle large numbers of commits to the revision control system because it uses a parallel binary algorithm when exploring the revision history, which can cover a history of 1000 commits by testing only 30 different snapshots. On top of this PinDown typically runs 10 jobs in parallel on the local computer farm, so these 30 snapshots take the same time as rerunning the test in series on 3 snapshots. For example, if it takes 1 hour on average to check out, compile and run one test, then it will take PinDown 3 hours to find one potential bad commit amongst 1000. However, this extremely fast algorithm has one downside: in this example it has skipped 970 snapshots that were never tested. There is a chance that a test failure has been fixed and a newer failure has been introduced, which the binary algorithm would miss. PinDown has a trick to combine the extreme speed of the binary algorithm with full precision: validation. Whatever the binary algorithm finds are just candidates. The ultimate test for PinDown is whether it can make the failing test pass again by only removing the bad commit (or combination of bad commits) while keeping everything else the same as when the test failure was reported.
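
A serial form of this bisection is easy to sketch, assuming a hypothetical test_passes(snapshot) callback that checks out, compiles and runs the failing test on a given snapshot. Plain binary search needs only about log2(1000), roughly 10, probes to isolate a single bad commit; PinDown's parallel variant presumably probes several snapshots per round, which is how a budget of roughly 30 snapshots covers the same history at low wall-clock cost.

    def find_bad_commit(commits, test_passes):
        # commits: oldest-to-newest; the test is known to pass before commits[0]
        # and to fail at commits[-1]
        lo, hi = 0, len(commits) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if test_passes(commits[mid]):
                lo = mid + 1  # failure introduced after mid
            else:
                hi = mid      # failure already present at mid
        return commits[lo]    # a candidate only: validation must confirm it

Note that whatever this search returns is still only a candidate; as described above, PinDown validates it by removing the suspected commit and rerunning the failing test.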

In the performance demo below, 9550 tests are run, out of which 994 tests fail, and 1000 commits have recently been made by different developers. PinDown is able to find the bugs in a fraction of the time it took to run the test suite.

PinDown Performance Demo

Using the Farm Efficiently

PinDown uses 10 jobs on the farm during debug, which means it also consumes at most 10 simulator licenses during the debug session. This is normally much less than the customer's initial regression run, which may launch hundreds or thousands of test jobs on the computer farm. Bugs debugged by PinDown are fixed much faster, which means there is less chance that these issues still remain when the next regression run starts. The largest waste of computer farm resources is to have consecutive regression runs report the same failures over and over. It is more efficient to spend 10 jobs on debug and have the problems fixed, with much higher likelihood, before the next run starts.
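
A back-of-the-envelope comparison illustrates the point. All numbers below are assumptions chosen for the example, not measurements:

    REPEAT_FAILURES = 100  # failing jobs that recur because the bug is not yet fixed (assumed)
    EXTRA_RUNS = 2         # later regression runs that re-report the same failures (assumed)
    DEBUG_JOBS = 10        # job slots PinDown occupies for debug

    wasted_jobs = REPEAT_FAILURES * EXTRA_RUNS
    print("farm jobs wasted on repeat failures:", wasted_jobs)  # 200
    print("farm jobs spent on automated debug:", DEBUG_JOBS)    # 10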


Fig 1. Using the farm efficiently

Also note that PinDown does not debug a failure that it has already debugged in an earlier run, even if a new seed number detects the same issue that was found by another seed in the previous run. If a test failure is grouped together with a known open bug, then PinDown can immediately try to fix the new failing test and seed by unrolling the bad commit associated with the bug report. If that fixes the test, then PinDown knows, after having run just one test using just one of its 10 job slots, that this new failure is a known open bug. No debug is necessary in this case. This is why PinDown handles regression testing with random tests very efficiently: it does not get confused by the fact that each run has new seed numbers, and it does not waste any extra debug effort on known open bugs.
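
A minimal sketch of this triage step, assuming hypothetical unroll_commit and run_test helpers rather than PinDown's real interface:

    def triage_failure(new_failure, open_bugs, unroll_commit, run_test):
        # new_failure: has .signature, .snapshot, .test and .seed attributes (assumed)
        for bug in open_bugs:
            if bug.signature == new_failure.signature:
                # one test, one job slot: rerun with only the bad commit unrolled
                workspace = unroll_commit(new_failure.snapshot, bug.bad_commit)
                if run_test(workspace, new_failure.test, new_failure.seed):
                    return bug  # known open bug: no further debug needed
        return None             # genuinely new failure: full debug required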