1.3 – Work of a performance engineer

This section describes what performance engineering is, in practice, and how it is applied to real systems.

1.3.1 What performance engineering is (in practice)

Definition

Performance engineering is the discipline that consists in understanding, measuring, and controlling the way a system behaves under load.

It is limited neither to performance testing, nor to specific tools or technologies.

It refers rather to an overall methodology for reasoning about systems under load or, possibly, under stress.

It focuses on the overall behavior of the system and not on isolated metrics or on individual components.

Performance and non-functional requirements

Performance engineering does not focus on a single property.

When a system is exercised under load, a subset of non-functional requirements (NFRs) becomes visible:

latency and throughput (performance)
scalability (vertical and horizontal)
stability and resilience under stress
resource usage and efficiency
capacity limits

These properties are not independent of one another.

They all emerge together as the system is brought to its limits.

Load acts as a forcing function that reveals the way the system behaves.

A system that appears perfectly balanced under low load may show completely different behavior when it is stressed.

What performance engineering actually observes

Under load, a system reveals:

how work passes through its components
how resources are consumed
where contentions appear
where queues form
which limits are reached first

This requires:

understanding the system model (→ 1.1 Foundations)
measuring key metrics (→ 1.2 Core metrics and formulas)
identifying limiting factors

The objective, evidently, is not only to observe system behavior, but also to explain it.

Not just testing

Performance engineering is often reduced to load testing alone.

In practice, the testing phase is only one part of the work.

Tests are used to:

expose system behavior
validate assumptions
reproduce problems

But performance engineering also includes:

analyzing system design
investigating production issues
sizing resources (heap, pools, threads, connections)
explaining observed behavior

Testing without analysis produces data without understanding.

Practical perspective

In real scenarios, the work typically involves:

preparing and calibrating test environments
interpreting non-functional requirements (NFRs)
identifying and defining (meaningful) test scenarios with respect to NFRs
validating behavior with controlled use cases
applying load or stress to make problems emerge (often in white-box)
identifying and correcting bottlenecks
sizing system components (CPU, memory, pools, concurrency limits)
refining configurations and parameters (Tuning)
running benchmarks to establish reference points
running long-duration tests (soak / endurance) to validate stability over time

These activities are not isolated.

They are part of a continuous process aimed at understanding the system’s possibilities and limits.

Key idea

Performance engineering does not consist (only) in making a system faster and more performant.

It instead comprises a set of activities and tasks aimed at understanding how a system behaves under workload, and at ensuring that it remains:

predictable
stable
scalable

Most problems are not caused by a single “slow” operation, but by:

interactions between components
accumulation of waiting times
saturation of shared resources

Taken together, these mechanisms form the core of performance engineering.

1.3.2 Typical workflow

Performance engineering is an iterative process in which the system is progressively exercised, analyzed, stabilized, and understood under increasing levels of load.

The objective is not only to detect problems, but to build a reliable model of how the system behaves under realistic (and limit) production conditions.

1.3.2.1 Environment preparation and calibration

verify and align the test environment with production characteristics (as much as possible)
verify configurations (CPU, memory, pools, connections)
ensure observability (metrics, logs, traces)

Goal:

establish a reliable baseline
ensure repeatability of results

Without calibration, measurements are difficult (or impossible) to interpret, and comparisons become at the very least unreliable.

1.3.2.2 Use case definition and workload modeling

Before applying load to the system, the workload must be defined.

A system is not tested in isolation, but through the requests it processes.

This requires the precise identification of:

critical user and system paths
typical operations (read, write, batch, background job)
the relative frequency of each operation
concurrency patterns

A realistic workload includes:

a mix of use cases
a weighted distribution (e.g. traffic percentages)
different request types and different costs

Workload definition is one of the most critical steps and must be carried out in close collaboration with those who define the non-functional requirements (NFRs).

An incorrect workload leads to misleading conclusions, or even to conclusions that are entirely useless.

Non-functional requirements (NFR)

In parallel with workload definition, non-functional requirements must be clarified.

They define what is considered acceptable system behavior.

Typical examples:

throughput targets (e.g. 30 req/s)
concurrency levels (e.g. 500 concurrent users)
latency objectives (e.g. p95 < 200 ms)
error rate thresholds
resource usage constraints

NFRs may be:

explicitly defined by stakeholders
only partially defined
missing or inconsistent

In all cases, they must be:

reviewed
validated
made measurable

Practical implication

Workload and NFRs must be aligned.

For each use case:

the expected load must be defined
the acceptable behavior must be known

Otherwise:

results cannot be evaluated
tests cannot be considered either successful or failed

An incorrect workload definition or the absence of NFRs leads to results that are technically correct, but not actionable.

1.3.2.3 Initial load / stress testing (problem discovery)

The first “load test” phase aims to establish a reference baseline and possible major problems.

Typical goals:

identify obvious bottlenecks
detect functional errors under load
make instability emerge (timeouts, crashes, saturation)

This phase is often:

exploratory
iterative
partially white-box (using internal visibility)

The objective is discovery, not precision.

1.3.2.4 Analysis and bottleneck identification

Once problems have emerged, the system must be analyzed in detail.

This involves:

correlating metrics (latency, throughput, utilization)
identifying where time is spent
locating saturation points and queues

Typical questions:

which resource is saturated?
where does latency accumulate?
what limits throughput?

This step is based on:

→ 1.1 Foundations
→ 1.2 Core metrics and formulas

1.3.2.5 Fixes and iterative validation

After bottlenecks have been identified, corrective actions must be applied.

These may include:

code changes
configuration updates
resource adjustments (vertical/horizontal scalability)

Each fix must be validated by re-running the tests.

This creates an iterative loop:

Test → Analyze → Fix → Test again

The goal is to progressively stabilize the system.

1.3.2.6 Intermediate validation (stable baseline)

Before moving on to further and long-duration tests, the system must reach a stable baseline.

This means:

no critical errors under expected load
predictable behavior
latency and error rates under control

This phase ensures that:

major issues are resolved
results are reproducible

1.3.2.7 Long-duration validation (soak / endurance)

Once it has been ensured that the system is stable, it must be investigated over time.

This phase evaluates the system’s behavior under sustained workload over time.

Typical goals:

detect slow memory leaks
observe resource accumulation (threads, connections, buffers)
identify performance degradations over time
validate long-term stability

This phase is essential because some issues:

do not appear immediately
emerge only after prolonged exercise

The results of this phase have a direct impact on:

system sizing
capacity planning
runtime configuration

1.3.2.8 Dimensioning and capacity definition

Based on previous observations, and also starting from possible unit tests subsequent to the baseline stabilization phase, system components are sized.

This phase includes:

heap and memory configuration
thread pools and connection pools
concurrency limits
infrastructure sizing
clustering

The goal is to define:

how much load the system can handle
under which conditions it remains stable
which possible margins are required

Sizing must be based on observed behavior, not on assumptions.

1.3.2.9 Tuning

Once sizing has been defined, tuning refines system behavior.

Typical areas:

garbage collector parameters
thread scheduling and pool sizing
database and connection settings
caching strategies

Tuning aims to:

reduce latency
improve stability
optimize resource usage

It is often iterative and dependent on the specific context.

1.3.2.10 Verification and regression

After the tuning phase, the system must be validated again.

This includes:

re-running key scenarios
verifying that improvements are effective
ensuring that regressions are not introduced

This phase ensures consistency and reliability.

1.3.2.11 Benchmarking and reference points

Finally, benchmarks are established.

They provide:

reference performance metrics
comparison points between versions
validation against expectations

Benchmarks are not goals in themselves.

They are used to:

understand system behavior
track its evolution over time

Key idea

Performance engineering develops according to an iterative loop:

define the workload → test → analyze → fix → validate → optimize

The objective is not only to improve performance, but to understand system limits and ensure predictable behavior under load.

1.3.3 Black-box vs white-box

Performance engineering can be approached from two complementary perspectives:

black-box (external observation)
white-box (internal observation)

Both are necessary to understand system behavior under workload.

1.3.3.1 Black-box approach

In a black-box approach, the system is observed from the outside.

Only externally visible behavior is measured:

response time
throughput
error rate

Internal implementation is not taken into consideration.

What it provides

Black-box observation allows:

validating system behavior from the user’s point of view
measuring end-to-end performance
detecting visible errors under load

It answers questions such as:

Is the system fast enough?
Does it handle the expected load?
Does it fail under stress?

Limitations

Black-box alone cannot explain:

where time is most often spent
which resource is saturated
why performance degrades

It shows symptoms, not causes.

1.3.3.2 White-box approach

In a white-box approach, the internal behavior of the system is observed.

This includes:

resource utilization (CPU, memory, disk, network)
thread pools and connection pools
internal queues
component-level timings

White-box observation provides a level of introspection into system execution.

In many cases, this includes visibility close to the code level:

method-level timings
call paths and execution flows
hotspots (slow or frequently executed methods)
allocation patterns and memory behavior
lock contention and synchronization points

What it provides

White-box observation allows:

identifying bottlenecks
understanding where time is spent
detecting contention and queueing
analyzing resource saturation

It answers questions such as:

Which component is slow?
Where does latency accumulate?
What limits throughput?
Which part of the execution is responsible for the slowdown?

Limitations

White-box alone does not guarantee:

correct end-to-end behavior
acceptable user experience

A system may appear internally efficient, while still failing under real workload conditions.

1.3.3.3 Observability and instrumentation

Observability provides the data required for white-box analysis.

It typically includes:

system and application metrics (e.g. CPU usage, latency, throughput)
logs (events, errors, state changes)
traces (request flow between components)
application performance monitoring (APM)

These sources provide continuous visibility into system behavior.

Diagnostic artifacts

In addition to continuous observability, deeper analysis is often based on diagnostic artifacts.

These are typically collected on demand and provide a snapshot of the system state.

Common examples include:

thread dumps (thread states, locks, contention)
heap dumps (memory usage, object retention, leaks)
profiling snapshots (CPU profiling and allocations)
core dumps (process-level failure analysis)

These artifacts allow:

inspecting the internal state of execution
identifying blocked threads and deadlocks
analyzing memory leaks and retention paths
investigating performance anomalies in detail

They are generally heavier and more intrusive than observability tools, and are used selectively during diagnostics.

1.3.3.4 Combining both approaches

Effective performance engineering requires combining both perspectives.

Typical workflow:

use black-box to detect problems
use white-box to explain them
validate improvements again with black-box

This creates a feedback loop:

observe → analyze → fix → validate

Key idea

Black-box observation reveals that a problem exists.

White-box observation explains why it exists.

Both are necessary to understand and control system behavior under load.

1.3.4 Load testing vs diagnostics

Load testing and diagnostics are often confused.

They serve different purposes and operate at different levels.

Both are necessary to understand system behavior under workload.

1.3.4.1 Load testing

Load testing applies a controlled workload to the system.

It is used to:

observe behavior under specific conditions
measure latency, throughput, and error rates
validate assumptions about capacity and scalability

Load testing operates primarily at the black-box level:

requests are generated externally
responses are measured externally

What it provides

Load testing answers questions such as:

Can the system handle the expected load?
What happens when load increases?
When does performance degrade?
What is the maximum sustainable throughput?

Limitations

Load testing alone does not explain:

why the system slows down
which component is responsible
how resources are used internally

It reveals behavior, but not causes.

1.3.4.2 Diagnostics

Diagnostics investigates the internal behavior of the system.

It is used to:

identify bottlenecks
understand execution paths
analyze resource usage
explain observed performance issues

Diagnostics operates at the white-box level:

internal metrics are analyzed
traces and execution paths are inspected
diagnostic artifacts may be collected

What it provides

Diagnostics answers questions such as:

Where is time spent?
Which resource is saturated?
Which component is responsible for latency?
What causes performance degradation?

Tools and techniques

Diagnostics typically relies on:

metrics, logs, and traces
application performance monitoring (APM)
thread dumps and heap dumps
profiling and execution analysis

Limitations

Diagnostics without load testing may fail to capture:

real workload conditions
interactions between components
behavior under stress

It can explain a problem, but not necessarily reproduce it.

1.3.4.3 Relationship between load testing and diagnostics

Load testing and diagnostics must be combined.

Typical workflow:

apply load to expose behavior
use diagnostics to analyze internal state
apply fixes
validate again with load testing

This creates a loop:

observe → explain → fix → validate

Key idea

Load testing reveals that a problem exists.

Diagnostics explains why it exists.

Neither of the two is sufficient on its own.

Understanding system behavior requires both.

1.3.5 What actually matters (and what does not)

Performance engineering involves an extensive set of tools, metrics, and techniques.

However, not all of them can have the same level of importance in heterogeneous contexts.

Understanding what matters is essential to avoid wasting effort and drawing incorrect conclusions.

What matters

The most important aspects are:

understanding system behavior under load
identifying bottlenecks and limiting factors
using realistic workloads and validated NFRs
reasoning about interactions between components
measuring and interpreting results correctly

Performance engineering mainly concerns:

building a mental model of the system
validating that model through observations
refining it through iteration

What does not matter (as much as it seems)

Some aspects are often excessively emphasized:

tools and frameworks
isolated metrics without context
synthetic or unrealistic test scenarios
micro-optimizations without system-level impact
results of a single test taken in isolation

These elements may be useful, but they are not sufficient.

Common misconceptions

Several misconceptions frequently appear:

“If I run a load test, I understand the system”
“If CPU is low, the system is healthy”
“If average latency is acceptable, the system is fine”
“More hardware will solve the problem”

These assumptions often lead to incorrect conclusions.

System-level thinking

Performance emerges from interactions:

between components
between workload and resources
between concurrency and queueing

Focusing on a single part of the system is rarely sufficient.

What is required is a global view.

Practical implication

Effective performance engineering requires:

asking the right questions
validating assumptions
correlating multiple signals
iterating on the basis of evidence

Tools, tests, and metrics support this process, but do not replace it.

Key idea

Performance engineering does not consist in collecting data.

It concerns understanding what the data means.

The goal is not to produce numbers, but to explain system behavior and make informed decisions.

1.3 – Work of a performance engineer

Table of Contents

1.3.1 What performance engineering is (in practice)

Definition

Performance and non-functional requirements

What performance engineering actually observes

Not just testing

Practical perspective

Key idea

1.3.2 Typical workflow

1.3.2.1 Environment preparation and calibration

1.3.2.2 Use case definition and workload modeling

Non-functional requirements (NFR)

Practical implication

1.3.2.3 Initial load / stress testing (problem discovery)

1.3.2.4 Analysis and bottleneck identification

1.3.2.5 Fixes and iterative validation

1.3.2.6 Intermediate validation (stable baseline)

1.3.2.7 Long-duration validation (soak / endurance)

1.3.2.8 Dimensioning and capacity definition

1.3.2.9 Tuning

1.3.2.10 Verification and regression

1.3.2.11 Benchmarking and reference points

Key idea

1.3.3 Black-box vs white-box

1.3.3.1 Black-box approach

What it provides

Limitations

1.3.3.2 White-box approach

What it provides

Limitations

1.3.3.3 Observability and instrumentation

Diagnostic artifacts

1.3.3.4 Combining both approaches

Key idea

1.3.4 Load testing vs diagnostics

1.3.4.1 Load testing

What it provides

Limitations

1.3.4.2 Diagnostics

What it provides

Tools and techniques

Limitations

1.3.4.3 Relationship between load testing and diagnostics

Key idea

1.3.5 What actually matters (and what does not)

What matters

What does not matter (as much as it seems)

Common misconceptions

System-level thinking

Practical implication

Key idea