Skip to content

1.11 Practical checklists

1.11 – Practical checklists

This chapter provides practical checklists for preparing, running, and analyzing performance tests.

Unlike the previous chapters, which explain concepts and mechanisms, this chapter focuses on operational discipline.

The goal is to reduce avoidable mistakes and ensure that performance tests produce results that are interpretable, reliable, and useful.

Table of Contents


1.11.1 Before running a test

Objectives

Clearly define what the test is intended to validate.

Typical objectives include:

  • latency targets
  • throughput goals
  • capacity limits

A test without a clear objective may still generate data, but that data will be difficult to evaluate.

The first question should always be:

  • what is this test supposed to prove, validate, or reveal?

Workload definition

Define the workload precisely:

  • request rate or concurrency
  • request mix
  • duration

(→ 1.4 Types of performance tests)

The workload must be specific enough to be reproducible and realistic enough to be meaningful.

A vague or artificial workload can produce technically correct results that are operationally irrelevant.


Environment consistency

Ensure that:

  • test environment is stable
  • configuration matches production assumptions
  • external dependencies are controlled

If the environment changes during testing, interpretation becomes uncertain.

Performance results are only comparable if the execution conditions remain sufficiently consistent.

This is especially important when evaluating:

  • configuration changes
  • code changes
  • infrastructure changes

Metrics setup

Verify that all required metrics are available:

  • latency percentiles
  • throughput
  • resource utilization
  • error rate

(→ 1.2 Core metrics and formulas)

It is also useful to ensure that supporting signals are available when relevant, such as:

  • queue lengths
  • dependency timings
  • GC activity
  • thread or pool states

The test should not begin before visibility is in place.


Readiness checks

Before running the test, confirm that:

  • the target system is in the expected state
  • monitoring is active
  • the workload generator is configured correctly
  • the test duration is appropriate for the chosen objective
  • success and failure criteria are known in advance

This avoids a common problem in performance testing: running a technically valid test that cannot later be interpreted with confidence.


Practical interpretation

Preparation is part of the test.

Most unreliable results are not caused by complex system behavior, but by poor test preparation:

  • unclear objectives
  • unrealistic workload
  • inconsistent environment
  • incomplete metrics

A well-prepared test makes later diagnostics far easier.


Key idea

A test is only meaningful if objectives, workload, and measurements are clearly defined.


1.11.2 During test execution

Monitoring

Observe system behavior in real time:

  • latency evolution
  • throughput stability
  • resource usage

Monitoring during execution is important because some issues are visible only while the test is running, especially:

  • sudden saturation
  • unexpected queueing
  • unstable recovery
  • dependency failures

Waiting until the end of the test may hide important time-dependent behavior.


Consistency checks

Ensure that:

  • workload is applied as expected
  • no external disturbances affect the test

This includes verifying that:

  • the intended request rate is actually being generated
  • the mix of operations remains consistent
  • no unrelated activity is distorting results
  • failures are caused by the test conditions rather than by external noise

A mismatch between intended workload and actual workload can invalidate the entire interpretation.


Early signals

Watch for:

  • rapid latency increase
  • unexpected errors
  • resource saturation

(→ 1.8 Resource-level performance)

These are often the first signs that the system is approaching a limit or that the workload is exposing an unanticipated bottleneck.

Early detection matters because it allows the test operator to:

  • capture relevant evidence
  • preserve useful context
  • avoid losing the most informative part of the run

Runtime observations

During execution, it is useful to observe not only absolute values, but also change over time.

Examples:

  • latency rising while throughput remains flat
  • queue lengths growing before CPU saturation
  • errors appearing only after a specific threshold
  • p95/p99 degrading before the average changes significantly

These patterns often reveal more than isolated snapshots.

They help distinguish:

  • transient instability
  • steady overload
  • slow degradation
  • sudden collapse

Intervention discipline

During a test, avoid changing parameters unless the change is part of the test plan.

Unplanned intervention makes results harder to interpret because it mixes multiple causes into the same observation window.

If intervention becomes necessary, it should be:

  • documented
  • timestamped
  • explicitly linked to the observed behavior

This preserves the diagnostic value of the run.


Practical interpretation

Execution is the phase where theoretical preparation meets real system behavior.

A well-designed test can still become misleading if the operator does not confirm that:

  • the workload is correct
  • the environment remains stable
  • the system is behaving as expected or, importantly, as unexpectedly as the test was intended to reveal

Key idea

Execution is not passive.

Continuous observation is required to detect anomalies early.


1.11.3 After test analysis

Data review

Analyze collected data:

  • latency distribution
  • throughput trends
  • resource utilization

Data review should focus not only on average values, but also on the shape of behavior over time.

For example:

  • when degradation began
  • whether throughput scaled as expected
  • whether tail latency widened before failures appeared

This makes the analysis more diagnostic and less descriptive.


Correlation

Relate signals:

  • latency vs CPU
  • latency vs I/O
  • errors vs load

(→ 1.10 Diagnostics and analysis)

Correlation helps identify which resource or mechanism is most likely associated with the observed degradation.

However, correlation should be treated as an analytical starting point, not a final conclusion.


Interpretation

Identify:

  • bottlenecks
  • scaling limits
  • abnormal patterns

Interpretation should answer questions such as:

  • what changed first?
  • what degraded next?
  • which constraint became dominant?
  • was the degradation gradual, abrupt, or time-dependent?

This is the point where raw measurements become system understanding.


Reporting

Summarize:

  • observed behavior
  • identified issues
  • recommendations

A useful report does more than list numbers.

It should explain:

  • what the system was expected to do
  • what it actually did
  • where it diverged from expectations
  • what evidence supports the conclusion

This makes the results actionable for engineering, operations, and future testing.


Next-step orientation

After analysis, define what should happen next.

This may include:

  • re-running the same test after changes
  • refining workload realism
  • collecting deeper diagnostics
  • isolating a suspected bottleneck
  • expanding to stress, soak, or capacity testing

Without a next-step decision, analysis remains informative but not operationally useful.


Practical interpretation

Post-test analysis is where performance engineering becomes decision-making.

The purpose is not only to state that a metric changed, but to explain:

  • why the change matters
  • what it implies about the system
  • what should be done next

Key idea

Analysis transforms raw data into actionable understanding.


1.11.4 Common pitfalls

Misinterpreting averages

  • averages hide tail latency
  • percentiles provide a clearer view

(→ 1.2.7 Percentiles)

A system can appear healthy on average while still producing unacceptable performance for a meaningful fraction of requests.

This is one of the most common mistakes in test interpretation.


Ignoring workload realism

  • unrealistic workloads produce misleading results
  • production patterns must be approximated

A synthetic workload may be easier to generate, but if it does not reflect real request mix, concurrency, and dependency behavior, conclusions may not transfer to production conditions.

Realism does not require perfect reproduction, but it does require credible approximation.


Confusing symptom and cause

  • high CPU is not always the root problem
  • latency must be analyzed in context

(→ 1.10 Diagnostics and analysis)

This pitfall often leads to ineffective optimization.

The visible symptom may be only the consequence of a deeper mechanism such as queueing, blocking, or dependency slowdown.


Overlooking bottlenecks

  • optimizing non-limiting resources has little effect
  • focus must remain on the dominant constraint

(→ 1.8 Resource-level performance)

This is a frequent source of wasted effort.

A system may contain many imperfections, but only some of them matter at the current operating point.


Running tests without acceptance criteria

A test is difficult to interpret if there is no prior definition of acceptable behavior.

Without explicit thresholds, it becomes unclear whether the result means:

  • success
  • failure
  • degradation
  • acceptable risk

Performance numbers are useful only when compared to defined expectations.


Treating one test as definitive

A single test run rarely captures the full behavior of a system.

Different runs may expose:

  • warm-up effects
  • dependency variability
  • long-term drift
  • threshold behavior under different load profiles

Reliable performance analysis usually requires comparison, repetition, and validation.


Ignoring time dimension

Some problems do not appear immediately.

A short test may miss:

  • slow memory growth
  • delayed queue buildup
  • gradual dependency degradation
  • runtime instability over time

This is why test duration must match the type of behavior being evaluated.


Practical interpretation

Most mistakes in performance testing are not caused by bad tools.

They are caused by:

  • weak assumptions
  • incomplete visibility
  • poor interpretation
  • lack of methodological discipline

Avoiding these pitfalls is often more valuable than adding more measurement detail.


Key idea

Incorrect assumptions lead to incorrect conclusions.

Avoiding common pitfalls is essential for reliable performance analysis.