1.2 – Core metrics and formulas

This document presents a concise reference of the main formulas used in application + system performance engineering.

These formulas formalize the concepts introduced in:

→ 1.1 Foundations

They should be read as a complement to the conceptual model, not in isolation.

They provide the quantitative basis used to reason about system behavior, validate hypotheses, and interpret performance test results.

1.2.1 Little’s Law (system-level concurrency)
1.2.2 Utilization Law (resource-level busy time)
1.2.3 Service time vs response time (queueing)
1.2.4 Service Demand (visits × service time)
1.2.5 Throughput
1.2.6 Error rate
1.2.7 Percentiles (p50, p95, p99)
- 1.2.7.1 How to compute a percentile (ordered sample)
- 1.2.7.2 Interpretation vs average (why tails matter)
1.2.8 Empirical CDF (threshold → percentage)
1.2.9 Long-tail latency (what it is)
1.2.10 Quick checklist (what to measure in tests)

Notation (typical)

Symbol	Definition
`X` or `λ`	throughput / arrival rate (requests per second)
`R` or `W`	response time / time in system (seconds)
`S`	service time at a resource (seconds per request)
`U`	utilization of a resource (0–1)
`L`	average concurrency / in-flight requests (count)
`V`	average number of visits to a resource per request
`D`	service demand on a resource (seconds per request)

This notation is used consistently throughout the guide and allows formulas to be applied uniformly across different contexts.

1.2.1 Little’s Law (system-level concurrency)

Definition

This law relates average concurrency to throughput and time in system.

Formula

\[ L = \lambda \cdot W \]

Where

L = average number of requests in the system (in-flight / concurrency)
λ = arrival rate / throughput (requests/s)
W = average time in system (s) (often the average end-to-end response time)

Practical meaning

If throughput and average response time are known, the number of requests that are simultaneously “in flight” in the system can be estimated.

This makes Little’s Law one of the most useful tools for reasoning about system load and concurrency.

Example

If λ = 200 req/s and W = 0.15 s:

\[ L = 200 \cdot 0.15 = 30 \]

On average, there are about 30 requests in flight.

Practical interpretation

Little’s Law links three observable quantities:

throughput
latency
concurrency

This allows:

estimating concurrency from measurements
validating system behavior
detecting inconsistencies in metrics

This law is extensively used in performance engineering, capacity planning, and system diagnostics.

1.2.2 Utilization Law (resource-level busy time)

Definition

Utilization is the fraction of time during which a single resource is busy over a fixed time interval (typically 1 second).
It measures the “busy time percentage”.

Formula

\[ U = X \cdot S \]

Where

U = utilization (0–1)
X = throughput observed by that resource (req/s)
S = mean service time at that resource (s/req)

Resource

A single service unit, e.g. CPU core, thread/worker, DB connection, etc.

Example

A DB worker handles 50 req/s, each query takes 10 ms = 0.01 s:

\[ U = 50 \cdot 0.01 = 0.5 \Rightarrow 50\% \]

Interpretation: the resource is busy 0.5 seconds per second.

Practical interpretation

Utilization is a key indicator of resource saturation.

As utilization approaches 1:

queueing increases
latency grows non-linearly
system stability decreases

This makes it one of the most important signals when diagnosing bottlenecks.

1.2.3 Service time vs response time (queueing)

Definition

Response time at a resource includes:

service time (actual work)
queue time (waiting)

Formula

\[ R = S + W_q \]

Where

R = response time at the resource
S = service time
W_q = waiting time in queue

Practical meaning

As utilization approaches saturation, queueing grows non-linearly and dominates response time, causing long-tail latency.

Practical interpretation

This formula explains why systems slow down under load even when computation cost does not change.

In many real systems:

service time remains relatively stable
waiting time increases rapidly

As a consequence:

response time is dominated by queueing
latency becomes unpredictable

This is a key point in diagnosing performance issues.

1.2.4 Service Demand (visits × service time)

Definition

Total service required from a resource per request, taking multiple visits into account.

Formula

\[ D = V \cdot S \]

Where

D = service demand on the resource (s)
V = average visits to the resource per request
S = service time per visit (s)

Example

A request performs V = 3 DB queries, each taking S = 5 ms = 0.005 s:

\[ D = 3 \cdot 0.005 = 0.015 \text{ s} = 15 \text{ ms} \]

Practical interpretation

Service demand represents the total work required from a resource for each request.

It is particularly useful for:

identifying the most heavily used resources
estimating capacity limits
understanding scaling behavior

Reducing service demand is often more effective than increasing raw capacity.

1.2.5 Throughput

Definition

Requests completed per unit of time.

Formula

Formula: X = N / T

Where

N = number of completed requests
T = observation window (seconds)

Practical interpretation

Throughput is one of the primary indicators of system performance.

It reflects the system’s ability to process work.

However, throughput must always be interpreted together with:

latency
error rate
resource utilization

High throughput alone does not guarantee acceptable system behavior.

1.2.6 Error rate

Definition

Fraction of requests that fail (timeouts, 5xx, etc.).

Formula

Formula: ErrorRate = (N_err / N_total) × 100%

Practical interpretation

Error rate reflects system reliability under load.

An increase in error rate often indicates:

overload conditions
resource exhaustion
instability

Error rate should always be monitored together with latency and throughput.

1.2.7 Percentiles (p50, p95, p99)

Definition

The p-th percentile is the value below which p% of observations fall.

p50 ≈ median (“typical request”)
p95 = threshold for the slowest 5%
p99 = threshold for the slowest 1%

Percentiles capture distribution and tail behavior better than averages.

Practical interpretation

Percentiles are essential for understanding the user’s real experience.

In many systems:

average latency appears acceptable
tail latency (p95/p99) is significantly worse

This difference is critical for system evaluation and SLO definition.

1.2.7.1 How to compute a percentile (ordered sample)

Given N values sorted in ascending order:

\[ v_1 \le v_2 \le \dots \le v_N \]

Compute the theoretical position:

Formula: P = (p / 100) × (N + 1)

If P is an integer → percentile = v_P
Otherwise, let k = floor(P) and δ = P - k (fractional part), then interpolate:

\[ \text{Percentile}(p) \approx v_k + \delta \cdot (v_{k+1} - v_k) \]

Note: percentile definitions vary slightly across tools. This method is a commonly used approach.

1.2.7.2 Interpretation vs average (why tails matter)

If p50 is much lower than the mean, the distribution is right-skewed (a few slow requests inflate the mean).
If p95 or p99 is far above the mean, you have long-tail latency.

A typical pattern:

the mean looks “acceptable”
p95/p99 are bad

→ user experience is degraded for a non-negligible fraction of users and SLOs are at risk.

Practical interpretation

Percentiles highlight behaviors that averages hide.

They are essential for:

defining service level objectives (SLOs)
detecting tail latency issues
understanding worst-case behavior

Ignoring percentiles often leads to incorrect conclusions about system performance.

1.2.8 Empirical CDF (threshold → percentage)

Definition

Given a threshold t, the empirical cumulative distribution function (CDF) indicates the fraction of samples less than or equal to t.

Formula

Formula: F(t) = count(x_i ≤ t) / N

Practical meaning

The CDF answers the question: “If my SLO is 200 ms, what % of requests meet it?”

Percentiles answer the inverse question: “What threshold corresponds to 95% of requests?”

Practical interpretation

CDF and percentiles are complementary views of the same data.

CDF: given a threshold → what fraction meets it
Percentile: given a fraction → what threshold corresponds to it

Both are useful for performance analysis and SLO validation.

1.2.9 Long-tail latency (what it is)

Definition

A small fraction of requests (e.g. 5% or 1%) is much slower than the majority.

Why the tail “dominates”

SLOs are typically defined on p95/p99, so tails drive pass/fail.
In distributed systems, the slowest dependency often determines end-to-end latency.
Tail events are frequently driven by contention / queueing.

Common causes (high-level)

thread pool / connection pool saturation (queueing)
lock contention / synchronization hot spots
slow DB queries, missing indexes, lock waits
retries + timeouts amplifying tail latency
hot keys in caches / uneven shard load
GC pauses / memory pressure (stop-the-world)
network jitter / packet loss / retransmissions
disk I/O spikes, compactions, fsync/wal flush

Practical interpretation

Long-tail latency is one of the most critical aspects of system performance.

It explains why:

average metrics may appear acceptable
user experience is still degraded

Managing tail latency is often more important than improving average performance.

1.2.10 Quick checklist (what to measure in tests)

Latency: p50/p90/p95/p99
Throughput: RPS/TPS
Error rate: timeouts/5xx
Utilization: CPU, memory, DB, pools
Queue lengths: thread pools, connection pools, message backlogs
Dependency timings: DB/Redis/external APIs

Practical interpretation

These metrics constitute the minimum set required to understand system behavior during performance tests.

They allow:

identifying bottlenecks
detecting instability
correlating workload with system behavior

Measuring only a subset of these metrics often leads to incomplete or misleading analysis.

Key idea

Formulas are not isolated abstractions.

They are tools used to explain observed behavior and validate system models.

Evaluating them is an essential element of performance engineering.

1.2 – Core metrics and formulas

Table of Contents

Notation (typical)

1.2.1 Little’s Law (system-level concurrency)

Definition

Formula

Where

Practical meaning

Example

Practical interpretation

1.2.2 Utilization Law (resource-level busy time)

Definition

Formula

Where

Resource

Example

Practical interpretation

1.2.3 Service time vs response time (queueing)

Definition

Formula

Where

Practical meaning

Practical interpretation

1.2.4 Service Demand (visits × service time)

Definition

Formula

Where

Example

Practical interpretation

1.2.5 Throughput

Definition

Formula

Where

Practical interpretation

1.2.6 Error rate

Definition

Formula

Practical interpretation

1.2.7 Percentiles (p50, p95, p99)

Definition

Practical interpretation

1.2.7.1 How to compute a percentile (ordered sample)

1.2.7.2 Interpretation vs average (why tails matter)

Practical interpretation

1.2.8 Empirical CDF (threshold → percentage)

Definition

Formula

Practical meaning

Practical interpretation

1.2.9 Long-tail latency (what it is)

Definition

Why the tail “dominates”

Common causes (high-level)

Practical interpretation

1.2.10 Quick checklist (what to measure in tests)

Practical interpretation

Key idea