Skip to content

1.3 – Work of a performance engineer

This section describes what performance engineering is, in practice, and how it is applied to real systems.

Table of Contents


1.3.1 What performance engineering is (in practice)

Definition

Performance engineering is the discipline that consists in understanding, measuring, and controlling the way a system behaves under load.

It is limited neither to performance testing, nor to specific tools or technologies.

It refers rather to an overall methodology for reasoning about systems under load or, possibly, under stress.

It focuses on the overall behavior of the system and not on isolated metrics or on individual components.


Performance and non-functional requirements

Performance engineering does not focus on a single property.

When a system is exercised under load, a subset of non-functional requirements (NFRs) becomes visible:

  • latency and throughput (performance)
  • scalability (vertical and horizontal)
  • stability and resilience under stress
  • resource usage and efficiency
  • capacity limits

These properties are not independent of one another.

They all emerge together as the system is brought to its limits.

Load acts as a forcing function that reveals the way the system behaves.

A system that appears perfectly balanced under low load may show completely different behavior when it is stressed.


What performance engineering actually observes

Under load, a system reveals:

  • how work passes through its components
  • how resources are consumed
  • where contentions appear
  • where queues form
  • which limits are reached first

This requires:

The objective, evidently, is not only to observe system behavior, but also to explain it.


Not just testing

Performance engineering is often reduced to load testing alone.

In practice, the testing phase is only one part of the work.

Tests are used to:

  • expose system behavior
  • validate assumptions
  • reproduce problems

But performance engineering also includes:

  • analyzing system design
  • investigating production issues
  • sizing resources (heap, pools, threads, connections)
  • explaining observed behavior

Testing without analysis produces data without understanding.


Practical perspective

In real scenarios, the work typically involves:

  • preparing and calibrating test environments
  • interpreting non-functional requirements (NFRs)
  • identifying and defining (meaningful) test scenarios with respect to NFRs
  • validating behavior with controlled use cases
  • applying load or stress to make problems emerge (often in white-box)
  • identifying and correcting bottlenecks
  • sizing system components (CPU, memory, pools, concurrency limits)
  • refining configurations and parameters (Tuning)
  • running benchmarks to establish reference points
  • running long-duration tests (soak / endurance) to validate stability over time

These activities are not isolated.

They are part of a continuous process aimed at understanding the system’s possibilities and limits.


Key idea

Performance engineering does not consist (only) in making a system faster and more performant.

It instead comprises a set of activities and tasks aimed at understanding how a system behaves under workload, and at ensuring that it remains:

  • predictable
  • stable
  • scalable

Most problems are not caused by a single “slow” operation, but by:

  • interactions between components
  • accumulation of waiting times
  • saturation of shared resources

Taken together, these mechanisms form the core of performance engineering.


1.3.2 Typical workflow

Performance engineering is an iterative process in which the system is progressively exercised, analyzed, stabilized, and understood under increasing levels of load.

The objective is not only to detect problems, but to build a reliable model of how the system behaves under realistic (and limit) production conditions.


1.3.2.1 Environment preparation and calibration

  • verify and align the test environment with production characteristics (as much as possible)
  • verify configurations (CPU, memory, pools, connections)
  • ensure observability (metrics, logs, traces)

Goal:

  • establish a reliable baseline
  • ensure repeatability of results

Without calibration, measurements are difficult (or impossible) to interpret, and comparisons become at the very least unreliable.


1.3.2.2 Use case definition and workload modeling

Before applying load to the system, the workload must be defined.

A system is not tested in isolation, but through the requests it processes.

This requires the precise identification of:

  • critical user and system paths
  • typical operations (read, write, batch, background job)
  • the relative frequency of each operation
  • concurrency patterns

A realistic workload includes:

  • a mix of use cases
  • a weighted distribution (e.g. traffic percentages)
  • different request types and different costs

Workload definition is one of the most critical steps and must be carried out in close collaboration with those who define the non-functional requirements (NFRs).

An incorrect workload leads to misleading conclusions, or even to conclusions that are entirely useless.


Non-functional requirements (NFR)

In parallel with workload definition, non-functional requirements must be clarified.

They define what is considered acceptable system behavior.

Typical examples:

  • throughput targets (e.g. 30 req/s)
  • concurrency levels (e.g. 500 concurrent users)
  • latency objectives (e.g. p95 < 200 ms)
  • error rate thresholds
  • resource usage constraints

NFRs may be:

  • explicitly defined by stakeholders
  • only partially defined
  • missing or inconsistent

In all cases, they must be:

  • reviewed
  • validated
  • made measurable

Practical implication

Workload and NFRs must be aligned.

For each use case:

  • the expected load must be defined
  • the acceptable behavior must be known

Otherwise:

  • results cannot be evaluated
  • tests cannot be considered either successful or failed

An incorrect workload definition or the absence of NFRs leads to results that are technically correct, but not actionable.


1.3.2.3 Initial load / stress testing (problem discovery)

The first “load test” phase aims to establish a reference baseline and possible major problems.

Typical goals:

  • identify obvious bottlenecks
  • detect functional errors under load
  • make instability emerge (timeouts, crashes, saturation)

This phase is often:

  • exploratory
  • iterative
  • partially white-box (using internal visibility)

The objective is discovery, not precision.


1.3.2.4 Analysis and bottleneck identification

Once problems have emerged, the system must be analyzed in detail.

This involves:

  • correlating metrics (latency, throughput, utilization)
  • identifying where time is spent
  • locating saturation points and queues

Typical questions:

  • which resource is saturated?
  • where does latency accumulate?
  • what limits throughput?

This step is based on:

1.1 Foundations
1.2 Core metrics and formulas


1.3.2.5 Fixes and iterative validation

After bottlenecks have been identified, corrective actions must be applied.

These may include:

  • code changes
  • configuration updates
  • resource adjustments (vertical/horizontal scalability)

Each fix must be validated by re-running the tests.

This creates an iterative loop:

  • TestAnalyzeFixTest again

The goal is to progressively stabilize the system.


1.3.2.6 Intermediate validation (stable baseline)

Before moving on to further and long-duration tests, the system must reach a stable baseline.

This means:

  • no critical errors under expected load
  • predictable behavior
  • latency and error rates under control

This phase ensures that:

  • major issues are resolved
  • results are reproducible

1.3.2.7 Long-duration validation (soak / endurance)

Once it has been ensured that the system is stable, it must be investigated over time.

This phase evaluates the system’s behavior under sustained workload over time.

Typical goals:

  • detect slow memory leaks
  • observe resource accumulation (threads, connections, buffers)
  • identify performance degradations over time
  • validate long-term stability

This phase is essential because some issues:

  • do not appear immediately
  • emerge only after prolonged exercise

The results of this phase have a direct impact on:

  • system sizing
  • capacity planning
  • runtime configuration

1.3.2.8 Dimensioning and capacity definition

Based on previous observations, and also starting from possible unit tests subsequent to the baseline stabilization phase, system components are sized.

This phase includes:

  • heap and memory configuration
  • thread pools and connection pools
  • concurrency limits
  • infrastructure sizing
  • clustering

The goal is to define:

  • how much load the system can handle
  • under which conditions it remains stable
  • which possible margins are required

Sizing must be based on observed behavior, not on assumptions.


1.3.2.9 Tuning

Once sizing has been defined, tuning refines system behavior.

Typical areas:

  • garbage collector parameters
  • thread scheduling and pool sizing
  • database and connection settings
  • caching strategies

Tuning aims to:

  • reduce latency
  • improve stability
  • optimize resource usage

It is often iterative and dependent on the specific context.


1.3.2.10 Verification and regression

After the tuning phase, the system must be validated again.

This includes:

  • re-running key scenarios
  • verifying that improvements are effective
  • ensuring that regressions are not introduced

This phase ensures consistency and reliability.


1.3.2.11 Benchmarking and reference points

Finally, benchmarks are established.

They provide:

  • reference performance metrics
  • comparison points between versions
  • validation against expectations

Benchmarks are not goals in themselves.

They are used to:

  • understand system behavior
  • track its evolution over time

Key idea

Performance engineering develops according to an iterative loop:

  • define the workloadtestanalyzefixvalidateoptimize

The objective is not only to improve performance, but to understand system limits and ensure predictable behavior under load.


1.3.3 Black-box vs white-box

Performance engineering can be approached from two complementary perspectives:

  • black-box (external observation)
  • white-box (internal observation)

Both are necessary to understand system behavior under workload.


1.3.3.1 Black-box approach

In a black-box approach, the system is observed from the outside.

Only externally visible behavior is measured:

  • response time
  • throughput
  • error rate

Internal implementation is not taken into consideration.


What it provides

Black-box observation allows:

  • validating system behavior from the user’s point of view
  • measuring end-to-end performance
  • detecting visible errors under load

It answers questions such as:

  • Is the system fast enough?
  • Does it handle the expected load?
  • Does it fail under stress?

Limitations

Black-box alone cannot explain:

  • where time is most often spent
  • which resource is saturated
  • why performance degrades

It shows symptoms, not causes.


1.3.3.2 White-box approach

In a white-box approach, the internal behavior of the system is observed.

This includes:

  • resource utilization (CPU, memory, disk, network)
  • thread pools and connection pools
  • internal queues
  • component-level timings

White-box observation provides a level of introspection into system execution.

In many cases, this includes visibility close to the code level:

  • method-level timings
  • call paths and execution flows
  • hotspots (slow or frequently executed methods)
  • allocation patterns and memory behavior
  • lock contention and synchronization points

What it provides

White-box observation allows:

  • identifying bottlenecks
  • understanding where time is spent
  • detecting contention and queueing
  • analyzing resource saturation

It answers questions such as:

  • Which component is slow?
  • Where does latency accumulate?
  • What limits throughput?
  • Which part of the execution is responsible for the slowdown?

Limitations

White-box alone does not guarantee:

  • correct end-to-end behavior
  • acceptable user experience

A system may appear internally efficient, while still failing under real workload conditions.


1.3.3.3 Observability and instrumentation

Observability provides the data required for white-box analysis.

It typically includes:

  • system and application metrics (e.g. CPU usage, latency, throughput)
  • logs (events, errors, state changes)
  • traces (request flow between components)
  • application performance monitoring (APM)

These sources provide continuous visibility into system behavior.


Diagnostic artifacts

In addition to continuous observability, deeper analysis is often based on diagnostic artifacts.

These are typically collected on demand and provide a snapshot of the system state.

Common examples include:

  • thread dumps (thread states, locks, contention)
  • heap dumps (memory usage, object retention, leaks)
  • profiling snapshots (CPU profiling and allocations)
  • core dumps (process-level failure analysis)

These artifacts allow:

  • inspecting the internal state of execution
  • identifying blocked threads and deadlocks
  • analyzing memory leaks and retention paths
  • investigating performance anomalies in detail

They are generally heavier and more intrusive than observability tools, and are used selectively during diagnostics.


1.3.3.4 Combining both approaches

Effective performance engineering requires combining both perspectives.

Typical workflow:

  • use black-box to detect problems
  • use white-box to explain them
  • validate improvements again with black-box

This creates a feedback loop:

  • observeanalyzefixvalidate

Key idea

Black-box observation reveals that a problem exists.

White-box observation explains why it exists.

Both are necessary to understand and control system behavior under load.


1.3.4 Load testing vs diagnostics

Load testing and diagnostics are often confused.

They serve different purposes and operate at different levels.

Both are necessary to understand system behavior under workload.


1.3.4.1 Load testing

Load testing applies a controlled workload to the system.

It is used to:

  • observe behavior under specific conditions
  • measure latency, throughput, and error rates
  • validate assumptions about capacity and scalability

Load testing operates primarily at the black-box level:

  • requests are generated externally
  • responses are measured externally

What it provides

Load testing answers questions such as:

  • Can the system handle the expected load?
  • What happens when load increases?
  • When does performance degrade?
  • What is the maximum sustainable throughput?

Limitations

Load testing alone does not explain:

  • why the system slows down
  • which component is responsible
  • how resources are used internally

It reveals behavior, but not causes.


1.3.4.2 Diagnostics

Diagnostics investigates the internal behavior of the system.

It is used to:

  • identify bottlenecks
  • understand execution paths
  • analyze resource usage
  • explain observed performance issues

Diagnostics operates at the white-box level:

  • internal metrics are analyzed
  • traces and execution paths are inspected
  • diagnostic artifacts may be collected

What it provides

Diagnostics answers questions such as:

  • Where is time spent?
  • Which resource is saturated?
  • Which component is responsible for latency?
  • What causes performance degradation?

Tools and techniques

Diagnostics typically relies on:

  • metrics, logs, and traces
  • application performance monitoring (APM)
  • thread dumps and heap dumps
  • profiling and execution analysis

Limitations

Diagnostics without load testing may fail to capture:

  • real workload conditions
  • interactions between components
  • behavior under stress

It can explain a problem, but not necessarily reproduce it.


1.3.4.3 Relationship between load testing and diagnostics

Load testing and diagnostics must be combined.

Typical workflow:

  • apply load to expose behavior
  • use diagnostics to analyze internal state
  • apply fixes
  • validate again with load testing

This creates a loop:

  • observe → explain → fix → validate

Key idea

Load testing reveals that a problem exists.

Diagnostics explains why it exists.

Neither of the two is sufficient on its own.

Understanding system behavior requires both.


1.3.5 What actually matters (and what does not)

Performance engineering involves an extensive set of tools, metrics, and techniques.

However, not all of them can have the same level of importance in heterogeneous contexts.

Understanding what matters is essential to avoid wasting effort and drawing incorrect conclusions.


What matters

The most important aspects are:

  • understanding system behavior under load
  • identifying bottlenecks and limiting factors
  • using realistic workloads and validated NFRs
  • reasoning about interactions between components
  • measuring and interpreting results correctly

Performance engineering mainly concerns:

  • building a mental model of the system
  • validating that model through observations
  • refining it through iteration

What does not matter (as much as it seems)

Some aspects are often excessively emphasized:

  • tools and frameworks
  • isolated metrics without context
  • synthetic or unrealistic test scenarios
  • micro-optimizations without system-level impact
  • results of a single test taken in isolation

These elements may be useful, but they are not sufficient.


Common misconceptions

Several misconceptions frequently appear:

  • “If I run a load test, I understand the system”
  • “If CPU is low, the system is healthy”
  • “If average latency is acceptable, the system is fine”
  • “More hardware will solve the problem”

These assumptions often lead to incorrect conclusions.


System-level thinking

Performance emerges from interactions:

  • between components
  • between workload and resources
  • between concurrency and queueing

Focusing on a single part of the system is rarely sufficient.

What is required is a global view.


Practical implication

Effective performance engineering requires:

  • asking the right questions
  • validating assumptions
  • correlating multiple signals
  • iterating on the basis of evidence

Tools, tests, and metrics support this process, but do not replace it.


Key idea

Performance engineering does not consist in collecting data.

It concerns understanding what the data means.

The goal is not to produce numbers, but to explain system behavior and make informed decisions.