PinDown managed to make the test pass again by undoing only the changes from two or more commits, while keeping all other commits, both younger and older, as they are. This proves that the specified list of commits contains the trigger that causes the test to fail. At least one of the commits contains either a bug or a change that requires a related update somewhere else in the design or the test environment.

Note: The confidence level for this type of bug report (validated true) is high. However, please note that PinDown only claims that there is at least one bad commit amongst the specified list of commits. The other commits may be correct, but tightly dependent on the bad commit. Alternatively, there may have been some IT issues that prevented PinDown from delving further into exactly which one of the commits contains the issue. In the bug reports PinDown calls each commit a "bug" in this scenario (see example below), but this should be interpreted as there being at least one bug (and possibly no more) in the complete list of the specified commits.

Example

[Image: images/bugreport_validated_multiple_commits_v5.png]

How Did PinDown Reach This Conclusion?

PinDown tested the following during debug to reach this conclusion (using the same test and seed as in the test phase):

  • The test fails when retested a second time on the same revision as during the test phase

  • The test fails when tested on each of the listed commits (including all older commits, but no newer commits)

  • The test passes on the revision just before the oldest of the listed commits (sometimes the code is modified to achieve this)

  • Most important: The test passes on the same revision as during the test phase with only the specified list of commits unrolled (see the sketch after this list).
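
The following sketch shows this validation sequence conceptually. It is not PinDown's actual implementation: the repository interface (checkout, parent_of), the workspace helpers (undo, run_test) and the Commit fields (revision, timestamp) are hypothetical stand-ins, and run_test is assumed to return True on a pass.

    # Conceptual sketch of the validation sequence described above.
    # All helpers and types are hypothetical, not PinDown's real API.

    def validate_bug_report(repo, failing_revision, suspect_commits, test, seed):
        """Return True if the suspect commits are validated as containing the trigger."""

        def fails(revision, unrolled=()):
            workspace = repo.checkout(revision)
            for commit in unrolled:
                workspace.undo(commit)                  # remove only these commits' changes
            return not workspace.run_test(test, seed)   # same test and seed as the test phase

        oldest = min(suspect_commits, key=lambda c: c.timestamp)

        # 1. The failure reproduces when retested on the original revision.
        if not fails(failing_revision):
            return False

        # 2. The test fails on each listed commit (older commits included, newer excluded).
        if not all(fails(commit.revision) for commit in suspect_commits):
            return False

        # 3. The test passes on the revision just before the oldest listed commit.
        if fails(repo.parent_of(oldest.revision)):
            return False

        # 4. Most important: the test passes on the original revision
        #    with only the suspect commits unrolled.
        return not fails(failing_revision, unrolled=suspect_commits)

Step three corresponds to the note above that the code is sometimes modified to get the test running on the older revision.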

Potential Stability Issues

In rare cases this type of bug report may be incorrect. If so, it is due to some kind of stability problem that makes the test results untrustworthy.

  • Random Instability: With constrained random testing, reproducing the same test scenario using the same seed number is not possible if the testbench is too different from the revision of the testbench that was used during the test phase. However, in this case PinDown managed to make the failing test pass again on the same revision of both the testbench and the design, except for the specified list of commits whose changes were unrolled. Consequently, the only possible source of random instability is the changes made in these commits. You can judge the level of risk for random instability by simply reading the bug reports: there is a risk only if any of the bug reports describe substantial changes to the testbench (design changes do not affect random stability). Please note that this is the same level of risk as when you fix a testbench problem and re-run the test just to make sure it passes before you commit the change: there is a risk the test may pass only due to random instability.

  • Uncaptured Randomness: With constrained random testing, it is important to capture all randomly generated inputs to the test. If there is a random generator which is not controlled by a seed, then this random generator needs to be turned off or captured in some other way (perhaps with a different seed); otherwise we are not retesting the same test (see the sketch after this list). A variant of this scenario is running in a very flaky environment: perhaps the hardware on which the tests are running differs significantly from machine to machine. This is not the case with computer farms, but when you are testing ASIC samples and PCB boards in an early development phase it may be. In this scenario it is important to re-run on exactly the same piece of hardware, again to minimise random behaviour. Some instability can be handled with the set_error_response command.

  • PinDown Setup Problem: If PinDown has been set up incorrectly, e.g. without the correct criteria for pass and fail, then PinDown will interpret the results incorrectly (see the Extraction User Guide).

  • Severe IT issues: If all failures are due to IT issues, both in the test phase and when re-running the tests during the debug phase, then there may not be any real failures at all, just intermittent IT issues. In this scenario you should increase the number of times a test is repeated in the debug phase, to make sure it is a real, deterministic failure before the debugging starts (see the set_error_response command).
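
As an illustration of the uncaptured-randomness point above, here is a minimal Python sketch of deriving every random input from a single master seed, so that re-running with the same seed reproduces exactly the same stimuli. The Stimulus class and its fields are hypothetical; constrained random testbenches (for example in SystemVerilog) achieve the equivalent through the simulation seed.

    import random

    # Minimal sketch: every random value in the test is derived from one master
    # seed, so re-running with the same seed reproduces exactly the same inputs.
    # The Stimulus class and its fields are hypothetical examples.

    class Stimulus:
        def __init__(self, rng: random.Random):
            # All fields are drawn from the seeded generator; nothing uses an
            # uncontrolled source such as time, process IDs or an unseeded global RNG.
            self.address = rng.randrange(0, 2**16)
            self.data = rng.randrange(0, 2**32)
            self.delay = rng.randint(1, 10)

    def generate_test_inputs(master_seed: int, n: int):
        rng = random.Random(master_seed)   # single, captured source of randomness
        return [Stimulus(rng) for _ in range(n)]

    # Re-running with the same seed yields identical stimuli, which is what
    # allows a failure to be reproduced during debug.
    a = generate_test_inputs(master_seed=42, n=5)
    b = generate_test_inputs(master_seed=42, n=5)
    assert all(x.__dict__ == y.__dict__ for x, y in zip(a, b))

Any random source that bypasses the master seed breaks this property, and the retested run is then no longer the same test.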