System testing

Testing verifies that a system behaves correctly and simulates its failure modes. It is essential. Research shows that:

“92% of catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signalled in software”
“In 58% of the catastrophic failures, the underlying faults could have easily been detected through simple testing of error handling code”
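
To make "simple testing of error handling code" concrete, the following is a minimal sketch in Python. The names `StoreUnavailable`, `FlakyStore` and `fetch_with_fallback` are hypothetical and invented for this illustration. The test forces a non-fatal error and checks that the handler degrades gracefully instead of letting the error escalate.

```python
class StoreUnavailable(Exception):
    """Non-fatal error signalled by a (hypothetical) storage layer."""


class FlakyStore:
    """Test double that always fails, to force the error-handling path."""

    def get(self, key):
        raise StoreUnavailable(key)


def fetch_with_fallback(store, key, default=None):
    """Return the stored value, or `default` if the store is unavailable."""
    try:
        return store.get(key)
    except StoreUnavailable:
        # Handle the non-fatal error explicitly instead of letting it escalate.
        return default


def test_fetch_falls_back_when_store_fails():
    # The error path is exercised on purpose; the happy path is not enough.
    assert fetch_with_fallback(FlakyStore(), "user:1", default="n/a") == "n/a"
```

The test function follows the naming convention that test runners such as pytest pick up, but it can also be called directly.
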

The limits of testing

As E. Dijkstra pointed out, testing can show the presence of bugs, not their absence. That’s because complex systems fail in complex ways, and no amount of testing can predict every type of failure.

The main limitations of testing are:

Tests validate the behaviour of the system against a specified set of inputs, so they can only surface the known unknowns.
Most testing still happens in heavily controlled pre-release, non-production environments, so the test conditions differ from those in production.

Post-release, production “testing” is what monitoring and observability are about.

Testing and deployment strategies

Several strategies can be used to deploy new applications (or new versions) to production while minimising the impact of defects and supporting testing. Each strategy has pros and cons.

Recreate: Version 1 is terminated before Version 2 is deployed.

Pros: Easy to set up; application state renewed in one go.
Cons: Downtime; 100% impact from a defective version.

Incremental: Version 2 is slowly rolled out, replacing Version 1.

Pros: Defects don’t propagate to all users; graceful data rebalancing.
Cons: Full rollout takes time and resources; supporting multiple versions.
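
As an illustration of the incremental strategy, here is a minimal sketch that upgrades a fleet in batches and halts as soon as a batch is unhealthy. The function `rolling_update`, the fleet dictionary and the `is_healthy` callback are assumptions made for this example; a real orchestrator would also drain traffic and wait for readiness.

```python
def rolling_update(instances, new_version, batch_size, is_healthy):
    """Upgrade `instances` (name -> version) in batches.

    Returns True if every batch became healthy. On the first unhealthy
    batch, that batch is reverted and the rollout stops, so earlier
    batches stay on the new version and later ones stay on the old one.
    """
    names = sorted(instances)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        previous = {name: instances[name] for name in batch}
        for name in batch:
            instances[name] = new_version
        if not all(is_healthy(name, new_version) for name in batch):
            instances.update(previous)  # revert only the failed batch
            return False
    return True


# Usage: a four-instance fleet upgraded two instances at a time.
fleet = {"a": "v1", "b": "v1", "c": "v1", "d": "v1"}
completed = rolling_update(fleet, "v2", batch_size=2,
                           is_healthy=lambda name, version: True)
```

While the rollout is in progress both versions serve traffic, which is the "supporting multiple versions" cost listed above.
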

Blue/Green: Version 2 is released alongside Version 1 and traffic is redirected.

Pros: Application state renewed in one go; instant rollout.
Cons: Expensive, double the resources; harder to transfer state.
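
The blue/green idea can be sketched as a router that holds both environments and a pointer to the active one. The class `BlueGreenRouter` and the two handler functions are hypothetical stand-ins for a load balancer and two full deployments.

```python
class BlueGreenRouter:
    """Keeps two environments running and sends all traffic to one of them."""

    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.active = "blue"

    def handle(self, request):
        return self.environments[self.active](request)

    def switch(self):
        """Redirect all traffic to the other environment in one step."""
        self.active = "green" if self.active == "blue" else "blue"


def version_1(request):
    return "v1:" + request


def version_2(request):
    return "v2:" + request


router = BlueGreenRouter(blue=version_1, green=version_2)
before = router.handle("ping")   # served by Version 1
router.switch()                  # cut over to Version 2
after = router.handle("ping")    # served by Version 2
```

Because the old environment keeps running, calling `switch()` again acts as a rollback. State held inside each environment is not moved by the switch, which is why transferring state is listed as a drawback.
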

Canary: Version 2 is released to a small subset of users before full rollout.

Pros: Application state renewed in one go; defects don’t propagate to all users; instant rollout for some users.
Cons: Slow rollout.
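
One common way to pick the canary subset is to hash a stable user identifier into a bucket and compare it with the rollout percentage. The sketch below assumes this approach; the function names and the `"v1"`/`"v2"` labels are illustrative only.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def pick_version(user_id: str, canary_percent: int) -> str:
    """Route roughly `canary_percent` percent of users to Version 2."""
    return "v2" if bucket(user_id) < canary_percent else "v1"
```

Hashing rather than choosing at random keeps each user on the same version across requests, and raising `canary_percent` only adds users to the canary group without moving existing canary users back to Version 1.
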

A/B testing: Version 2 is released to a small subset of users under specific conditions.

Pros: Instant rollout for some users; defects don’t propagate to all users.
Cons: Harder to troubleshoot sessions.
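
Unlike a canary, A/B routing selects users by explicit conditions. A minimal sketch, assuming requests are plain dictionaries and the conditions are attribute values that must all match (the attribute names `country` and `device` are examples, not taken from any particular system):

```python
def pick_variant(request, conditions):
    """Return "B" when the request matches every condition, else "A"."""
    if not conditions:
        return "A"  # no targeting rules defined: everyone stays on Version 1
    matches = all(request.get(key) == value for key, value in conditions.items())
    return "B" if matches else "A"


# Usage: only mobile users in one country see Version 2.
rules = {"country": "NL", "device": "mobile"}
variant = pick_variant({"country": "NL", "device": "mobile", "user": "u1"}, rules)
```

Recording the chosen variant alongside each session is one way to ease the troubleshooting difficulty noted above, since the same user may see different versions as their attributes change.
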

Shadowing: Version 2 is deployed alongside Version 1 and receives real traffic but doesn’t impact the response.

Pros: Performance testing in production; no impact to users.
Cons: Expensive, double the resources; complex to set up.
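
Shadowing can be sketched as a handler that answers from Version 1 while also sending the same request to Version 2 and recording any difference. The function `handle_with_shadow` and its arguments are hypothetical; the sketch calls the shadow synchronously for brevity, whereas a production setup would typically mirror traffic asynchronously so the shadow cannot add latency.

```python
def handle_with_shadow(request, primary, shadow, mismatches):
    """Serve the primary response; compare the shadow's result on the side."""
    response = primary(request)
    try:
        shadow_response = shadow(request)
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    except Exception as exc:
        # A failing shadow must never affect what the user receives.
        mismatches.append((request, response, exc))
    return response


def version_1(request):
    return request.upper()


def version_2(request):
    raise RuntimeError("bug in Version 2")


log = []
result = handle_with_shadow("hello", version_1, version_2, log)
```

Here `result` comes from Version 1 only, and the failure of Version 2 ends up in `log` for later analysis.
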

System testing

Ensure the system is periodically tested for performance and resilience.

Ensure the system employs appropriate testing strategies in the pre-release phase.

Ensure the system employs appropriate testing strategies in the deployment phase.

Ensure the system employs appropriate testing strategies in the release phase.

Ensure the system employs appropriate testing strategies in the post-release phase.
