Skip to content

1.4 Types of performance tests

1.4 – Types of performance tests

This chapter introduces the main categories of performance tests used in performance engineering.

Each type of performance test answers a different question about system behavior under load.

Taken together, they help evaluate system performance, stability, scalability, recovery, and capacity in a controlled and measurable way.

Table of Contents


1.4.1 Purpose of performance testing

Definition

Performance testing, as already discussed in the previous paragraphs, evaluates how a system behaves under controlled workload conditions.

It provides measurable data on:

  • latency
  • throughput
  • error rate
  • resource usage

(→ 1.2 Core metrics and formulas)

Performance testing is therefore not only a measurement activity, but also a validation activity.

It is used to compare expected behavior (defined in the NFRs) with observed behavior under defined workload conditions.


Role in performance engineering

Performance testing is not only about measuring results.

It is used to:

  • validate system behavior under expected conditions
  • reveal bottlenecks and limitations
  • support capacity planning
  • validate architectural decisions

It also provides a controlled framework for comparing:

  • versions of the same system
  • different configurations
  • infrastructure changes
  • tuning choices

Without controlled testing, performance discussions often remain based on assumptions rather than evidence.


Workload as a model

A test workload represents a simplified model of real usage.

It defines:

  • arrival rate (requests per second)
  • concurrency (number of active users or requests)
  • request patterns (distribution, mix of operations)

(→ 1.2.1 Little’s Law (system-level concurrency))

A workload is not the exact mirror of real production usage in itself.

It is a practical approximation of the most relevant usage patterns.

For this reason, the value of a performance test depends strongly on how realistic the workload model is.


Controlled conditions

A performance test is meaningful only if the execution conditions are well defined and controlled.

This includes:

  • workload definition
  • the duration of the test
  • the environment in which it runs
  • the metrics collected during execution

If these conditions are unclear, the results, although still numerical, will have little or no knowledge value or predictive value.

Control of the initial conditions is one of those parameters that transforms a test from a simple exercise into an indispensable engineering activity.


Performance testing is therefore the entry point to many of the concepts developed in the rest of this document.

As an overall testing practice, it reveals:

For this reason, test design should always be connected to deep and overall knowledge of the system.


Practical meaning

A good performance test does not answer only:

  • “How fast is the system?”

It also helps answer:

  • “Under which conditions does the system remain stable?”
  • “What changes as load increases?”
  • “Which limit is reached first?”
  • “What kind of degradation appears?”

These questions are essential in performance engineering because they connect measurement to interpretation.


Key idea

Performance tests are controlled experiments.

They are designed to observe system behavior under specific workload conditions.

Their value lies not only in the measurements they produce, but above all in the understanding they provide.


1.4.2 Load testing

Definition

Load testing evaluates system behavior under standard or typical workload.

It is the most common and most direct way to validate that a system behaves acceptably under normal operating conditions.


Objective

  • verify that the system satisfies performance requirements
  • validate latency and throughput targets
  • observe resource usage under normal conditions

Load testing answers the question of whether the system behaves correctly in the operating range it is expected to support.


Characteristics

  • workload is stable and controlled
  • system operates within its expected range
  • focus is on steady-state behavior

The purpose is not to bring the system to its limits, but to establish whether it behaves correctly under a (production) load it was designed for.


Example

A system designed for:

  • 200 requests per second
  • p95 latency < 300 ms

A load test verifies that these targets are met.

It may also verify that:

  • error rate remains low
  • throughput remains stable
  • resource utilization remains within acceptable bounds

Diagnostic value

Load testing provides a baseline:

  • normal latency distribution
  • typical resource utilization
  • expected throughput

This baseline is essential for comparison with the other tests.

Without a reliable baseline, it is difficult to determine whether behavior observed in stress, spike, soak, or capacity tests is abnormal or simply normal for the system under analysis.


Limits of load testing

Load testing alone does not determine:

  • the maximum system capacity
  • the system breaking points
  • the long-term stability of the runtime
  • the recovery behavior after abrupt changes in load

A system may pass a load test and still fail under overload, prolonged execution, or rapid bursts of traffic.

For this reason, load testing is necessary but not sufficient.


Practical interpretation

Load testing is the reference point for the rest of performance analysis.

It defines the system’s normal operating behavior and allows later tests to be interpreted in their context.

If the system already behaves poorly under standard load, there is little value in moving immediately to more advanced test types.


Key idea

Load testing answers: “Does the system behave correctly under expected load?”

It establishes the baseline against which all other performance tests can be interpreted.


1.4.3 Stress testing

Definition

Stress testing evaluates system behavior beyond its expected capacity.

It is used to observe what happens when the system is pushed outside its intended operating range.


Objective

  • identify system limits
  • observe behavior under overload
  • detect failure modes

Stress testing mainly concerns system behavior at the limit and the degradation of working capacities under load exceeding expected standards.


Characteristics

  • workload increases beyond normal levels
  • system approaches or reaches saturation

(→ 1.8 Resource-level performance)

The overload may be applied progressively or maintained at a clearly excessive level.

In both cases, the objective is to expose the way the system behaves when demand exceeds capacity.


Observable effects

  • latency increases rapidly
  • throughput plateaus or decreases
  • error rate increases

(→ 1.5.3 Non-linear degradation)
(→ 1.5.4 Throughput collapse)

Additional effects may include:

  • queue buildup
  • timeout amplification
  • pool exhaustion
  • unstable resource usage
  • retry-driven overload

Diagnostic value

Stress testing reveals:

  • bottlenecks
  • saturation points
  • system stability under pressure

It is particularly useful for understanding whether degradation is gradual, abrupt, recoverable, or unstable.

Two systems with similar load-test results may behave very differently under stress.


Failure behavior

An important aspect of stress testing is not only whether and when the system fails, but how it fails.

Relevant questions include:

  • Does latency increase before errors appear?
  • Do errors appear gradually or suddenly?
  • Does throughput flatten before it collapses?
  • Does the system recover when load is reduced?

These questions matter operationally because overload is a realistic scenario in production systems.


Distinction from capacity testing

Stress testing and capacity testing are related, but different.

  • stress testing focuses on overload behavior and failure modes
  • capacity testing focuses on the maximum sustainable load that still satisfies requirements

Stress testing therefore continues beyond the acceptable operating range in order to examine degradation and failure.


Practical interpretation

Stress testing is useful when the engineering question is not only:

  • “How much load can the system support?”

but also:

  • “What happens after it can no longer support the load?”
  • “Does it degrade gradually?”
  • “Can it recover cleanly?”

These are essential questions for resilience and operational robustness.


Key idea

Stress testing answers: “What happens when the system is pushed beyond its limits?”

It reveals how the system degrades, how it fails, and how much overload it can tolerate before becoming unstable.


1.4.4 Spike testing

Definition

Spike testing evaluates system behavior under sudden increases in load.

Unlike load testing or gradual stress testing, spike testing focuses on rapid transitions rather than stable operating conditions.


Objective

  • observe reaction to abrupt workload changes
  • evaluate elasticity and recovery
  • detect transient instability

Spike testing is particularly relevant for systems exposed to bursty traffic, campaign peaks, event-driven demand, or short-lived surges of activity.


Characteristics

  • workload increases rapidly and in a very short time
  • system must adapt quickly

The defining characteristic is not only the volume of load, but the speed at which load changes.

A system may handle a high load when it is reached gradually, but behave poorly when the same load arrives suddenly.


Observable effects

  • temporary latency spikes
  • queue buildup
  • potential errors during transition

(→ 1.5 System behavior under load)

Additional effects may include:

  • delayed scaling response
  • transient connection exhaustion
  • temporary timeout cascades
  • slow recovery after the burst

Diagnostic value

Spike testing reveals:

  • sensitivity to bursty traffic
  • queueing behavior under sudden load
  • recovery capability after the spike

This type of testing is valuable because many systems are optimized for steady-state conditions but remain fragile during abrupt transitions.


Recovery behavior

The most important part of spike testing is often what happens after the spike.

Relevant questions include:

  • Does the system return quickly to normal latency?
  • Do queues drain in a controlled way?
  • Are resources released correctly?
  • Does the system remain degraded after the spike has passed?

A system that survives the spike but recovers slowly may still be operationally weak.


Practical interpretation

Spike testing is particularly useful for systems that are:

  • externally exposed to bursty traffic
  • dependent on auto-scaling or elastic behavior
  • sensitive to queue buildup
  • subject to event-driven demand changes

In these cases, average load is often less important than short-term peaks and the system’s reaction to them.


Key idea

Spike testing answers: “How does the system react to sudden load changes?”

It evaluates not only resistance to bursts, but also the ability to recover cleanly after them.


1.4.5 Soak testing

Definition

Soak testing evaluates system behavior over an extended period under sustained load.

It is sometimes also called endurance testing.

Its purpose is to expose problems that do not appear in short-duration tests.


Objective

  • detect long-term issues
  • observe stability over time
  • identify gradual degradation

Soak testing is less concerned with peak performance and more with consistency, accumulation, and drift.


Characteristics

  • workload is constant or slowly varying
  • test duration is long (hours or days)

The key dimension is time.

Some systems behave correctly for minutes but degrade after hours because of accumulation effects.


Observable effects

  • memory growth
  • resource leaks
  • performance degradation over time

(→ 1.7 Runtime and memory model)

Additional long-duration symptoms may include:

  • thread accumulation
  • connection leakage
  • slowly increasing queues
  • GC overhead growth
  • cache imbalance or uncontrolled retention

Diagnostic value

Soak testing reveals:

  • slow memory leaks
  • resource exhaustion
  • long-term instability

It is often the only reliable way to validate whether the system remains healthy and operable during prolonged activity.

This is essential for production systems that must run continuously.


Time-dependent degradation

Soak testing is important because some breakdowns are not threshold-based, but time-based.

Examples include:

  • memory retained slowly over time
  • pools not fully released
  • background tasks accumulating drift
  • retry patterns slowly increasing pressure
  • caches growing without effective eviction

These issues may not appear in short-duration load tests or stress tests.


Operational value

A system that performs well for ten minutes but degrades after six hours is not stable.

Soak testing therefore contributes directly to:

  • validation for production deployment
  • confidence in the runtime
  • long-term reliability assessment
  • infrastructure and runtime sizing

It also helps validate that monitoring remains meaningful over long periods of operation.


Practical interpretation

Soak testing is particularly important for systems with:

  • long uptimes
  • background processing
  • memory-managed runtimes
  • connection-heavy architectures
  • resource pools that change slowly over time

In such systems, short-duration performance results are not sufficient to guarantee real stability.


Key idea

Soak testing answers: “Does the system remain stable over time?”

It validates long-duration behavior and reveals issues caused by accumulation, drift, and slow degradation.


1.4.6 Capacity testing

Definition

Capacity testing determines the maximum workload a system can handle while satisfying performance requirements.

It is used to identify the practical operating limit of the system under acceptable conditions.


Objective

  • identify the maximum sustainable throughput
  • determine safe operating limits
  • support capacity planning

Capacity testing is therefore directly linked to planning, sizing, forecasting, and operational decisions.


Method

  • possible unit tests for dimensional baseline
  • gradually increase workload
  • monitor latency, throughput, and errors
  • identify the point where performance degrades

The increase in load should be controlled and measurable.

This allows the system limit to be located more precisely than in a purely exploratory stress test.


Interpretation

The capacity limit is reached when:

  • latency exceeds acceptable thresholds
  • error rate increases
  • throughput no longer scales

(→ 1.2 Core metrics and formulas)
(→ 1.5 System behavior under load)

In practice, the limit is not always a single exact value.

It may be better understood as a range in which acceptable behavior begins to deteriorate.


What capacity testing reveals

Capacity testing reveals:

  • the highest sustainable load under defined acceptance criteria
  • the margin between expected load and maximum acceptable load
  • the relationship between increasing demand and degrading behavior
  • the point at which additional load no longer produces useful throughput

This information is essential for engineering and planning decisions.


Relationship with capacity planning

Capacity testing is one of the main inputs to capacity planning.

It helps answer questions such as:

  • How much traffic can the current system support?
  • How much headroom is available?
  • When will scaling be required?
  • Which component constrains capacity first?

This makes capacity testing particularly useful for forecasting and operational preparation.


Distinction from stress testing

Capacity testing does not consist in forcing failure for failure’s own sake.

It consists in identifying the highest load that still satisfies defined requirements.

  • capacity testing stops at or near the acceptable limit
  • stress testing continues beyond that limit to examine overload behavior

The distinction matters because many business and engineering decisions depend on safe operation, not on total failure.


Practical meaning

Capacity is not only a number.

It depends on:

  • workload mix
  • concurrency level
  • latency objectives
  • acceptable error rate
  • resource constraints

For this reason, every capacity value must always be interpreted in the context of the workload and the acceptance criteria used during the test.


Practical interpretation

Capacity testing is most useful when the engineering objective is to answer:

  • “What is the safe operating range?”
  • “How much headroom do we have?”
  • “When do we need to scale?”
  • “What constrains future growth?”

It is therefore one of the most decision-oriented forms of performance testing.


Key idea

Capacity testing answers: “How far can the system scale before it degrades?”

It identifies the maximum sustainable operating range, not only the point of failure.