1.6 – Concurrency and parallelism

This chapter introduces concurrency and parallelism as fundamental concepts in performance engineering for systems and applications.

It covers how work is scheduled, how multiple tasks interact, and why coordination overhead, contention, and synchronization often become limiting factors under load.

Concurrency and parallelism are essential for scalability, but they also introduce complexity, overhead, and breaking points that directly influence latency, throughput, and system stability.

1.6.1 Concurrency vs parallelism

Definition

Concurrency and parallelism are related but distinct concepts.

They are often confused, but they describe different aspects of system behavior.

Understanding the distinction is essential because a system may manage many activities concurrently from a structural point of view without actually executing many activities simultaneously at the hardware level.

Concurrency

Concurrency refers to the ability of a system to handle multiple tasks during the same time interval.

These tasks:

may not be executed at exactly the same moment
may be “interleaved”
share system resources

Concurrency concerns:

structure
coordination
management of multiple “in flight” operations

It is therefore mainly concerned with how work is organized and scheduled.

Parallelism

Parallelism refers to the execution of multiple tasks at the same instant.

This requires:

multiple processing units (e.g. CPU cores)
true simultaneous execution

Parallelism concerns:

execution
hardware utilization
doing more work at the same instant

It is therefore mainly concerned with simultaneous execution.

Key difference

Concurrency = dealing with many tasks
Parallelism = executing many tasks simultaneously

A system may be:

concurrent but not parallel (single core, “interleaved” tasks)
parallel but not highly concurrent (few long-running tasks)

This distinction matters because the scalability properties of a system depend not only on how much work exists, but also on how that work is coordinated and scheduled.
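The difference can be made visible with a small sketch. The class and method names below (`OverlapDemo`, `peakOverlap`) are illustrative and not part of any library, and the 20 ms sleep is only a stand-in for real work. The sketch submits the same batch of tasks to pools of different sizes and records how many tasks were ever running at the same moment. Note that overlapping threads are only an upper bound on hardware parallelism: whether they really run at the same instant depends on the available cores.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class OverlapDemo {
    // Submits `tasks` short tasks to a pool of `workers` threads and returns
    // the largest number of tasks observed running at the same moment.
    static int peakOverlap(int workers, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // stand-in for real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // All 8 tasks are accepted in both runs; only the overlap differs.
        System.out.println("1 worker : peak overlap = " + peakOverlap(1, 8));
        System.out.println("4 workers: peak overlap = " + peakOverlap(4, 8));
    }
}
```

With one worker the peak is always 1: eight tasks are in flight, but they execute one after another. With four workers the peak can reach 4, depending on scheduling.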

Relationship with performance

Concurrency affects:

how many requests can be in progress at the same time
how resources are shared
how contention arises

Parallelism affects:

how quickly work can be executed
how effectively hardware is utilized

Both influence:

throughput
latency
scalability

In practice, adding concurrency without sufficient parallelism may increase waiting and contention, while adding parallelism without good concurrency control may waste resources or expose coordination problems.

Practical intuition

A concurrent system:

can accept many requests
may still process them sequentially or with limited parallelism

A parallel system:

can process multiple requests at the same time
but may still suffer from contention or coordination overhead

For this reason, concurrency and parallelism should not be treated as automatically beneficial.

Their value depends on how they interact with workload, shared resources, and execution constraints.

Link with previous concepts

Concurrency increases:

the number of in-flight requests (→ 1.2.1 Little’s Law)

This leads to:

resource sharing
potential queueing (→ 1.5.2 Saturation and queueing)

This is one of the main reasons why concurrency becomes a central topic in performance engineering and not only a programming concern.
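As a quick numeric illustration, Little's Law in its usual form states that the average number of in-flight requests equals the arrival rate multiplied by the average time each request spends in the system. The sketch below assumes that form; the parameter names are illustrative and may differ from the notation used in 1.2.1, and the figures are made up for the example.

```java
class LittlesLaw {
    // Average in-flight requests = arrival rate * average time in system.
    static double inFlight(double arrivalsPerSecond, double secondsInSystem) {
        return arrivalsPerSecond * secondsInSystem;
    }

    public static void main(String[] args) {
        System.out.println(inFlight(200, 0.25)); // 200 req/s, 250 ms each
        System.out.println(inFlight(200, 1.0));  // same rate, slower responses
    }
}
```

At the same arrival rate, a response time that grows from 250 ms to 1 s means four times as many requests are in flight, and therefore four times as many requests sharing the same resources.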

Practical interpretation

Concurrency is often necessary to support many simultaneous operations, especially in networked and I/O-driven systems.

However, concurrency also increases the probability of:

shared-state interactions
queue buildup
lock contention
coordination overhead

Parallelism may increase throughput, but only if useful work is actually being executed rather than blocked or serialized.

Key idea

Concurrency determines how many tasks are active.

Parallelism determines how many tasks are executed at the same time.

Performance depends on both, and on how they interact with system resources.

1.6.2 Threads and execution model

Definition

The execution model defines how work is executed within a system.

In most systems, work is performed by threads, which are executed within a process.

The execution model determines how requests are mapped onto execution units, how waiting is handled, and how system resources are consumed under load.

Processes and threads

A process is an isolated execution environment:

it has its own memory space
it contains resources (files, sockets, memory)

A thread is an execution unit within a process:

multiple threads share the same process memory
threads execute tasks concurrently

In most applications:

one process hosts multiple threads
threads handle incoming requests

This shared-memory model makes threads efficient for communication, but it also introduces the complexity of shared state.

Threads

A thread:

executes instructions
consumes CPU time
may block while waiting (e.g. I/O, locks)

Multiple threads allow a system to:

handle more requests
overlap computation and waiting
increase concurrency

However, threads are not free.

Each additional thread introduces memory overhead, scheduling overhead, and coordination complexity.

Thread lifecycle

A thread typically goes through different states:

running (actively executing)
runnable (ready to execute, waiting for CPU)
waiting / blocked (waiting for a resource or an event)

Performance is influenced by how threads move between these states.

A system with many threads in the “runnable” or “blocked” state may appear busy while making little useful progress.

Understanding thread states is therefore essential when diagnosing concurrency issues.
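In Java, these states can be observed through `Thread.getState()`. The sketch below is a minimal illustration with hypothetical names (`ThreadStateDemo`, `observeStates`). Java's `Thread.State` does not separate "running" from "runnable" (both are reported as `RUNNABLE`), and it adds `NEW` and `TERMINATED` for threads that have not started or have finished.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadStateDemo {
    // Records the worker's state before it starts, while it waits for a
    // lock held by the caller, and after it has finished.
    static List<Thread.State> observeStates() {
        List<Thread.State> seen = new ArrayList<>();
        Object lock = new Object();
        Thread worker = new Thread(() -> {
            synchronized (lock) {
                // nothing to do: the point is the wait to get in
            }
        });
        seen.add(worker.getState()); // not started yet
        synchronized (lock) {
            worker.start();
            // Spin until the worker is parked on the monitor we hold.
            while (worker.getState() != Thread.State.BLOCKED) {
                Thread.onSpinWait();
            }
            seen.add(worker.getState());
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        seen.add(worker.getState()); // finished
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observeStates());
    }
}
```

The same states appear in thread dumps, which is why a dump is often the first tool used when a system looks busy but makes little progress.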

Stack and memory

Each thread has its own stack:

it stores method calls and local variables
it grows and shrinks during execution

Implications:

more threads → greater memory usage (one stack per thread)
deep call chains → greater stack usage
stack exhaustion may lead to failures

This is particularly relevant in high-concurrency systems.

Thread count therefore affects not only scheduling, but also memory footprint and stability.
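A rough back-of-the-envelope estimate makes the memory effect concrete. The sketch below assumes 1 MiB of stack per thread purely for illustration: the real figure depends on the JVM, the platform, and settings such as `-Xss`, and it typically describes reserved address space rather than memory that is fully committed. The class name `StackBudget` is hypothetical.

```java
class StackBudget {
    // Upper-bound estimate of stack space reserved for a set of threads.
    static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return Math.multiplyExact((long) threads, stackBytesPerThread);
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L; // assumed per-thread stack size
        System.out.println(reservedStackBytes(200, oneMiB) / oneMiB + " MiB");
        System.out.println(reservedStackBytes(10_000, oneMiB) / oneMiB + " MiB");
    }
}
```

Under this assumption, 200 threads reserve about 200 MiB of stack space, while 10,000 threads reserve close to 10 GiB.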

Execution models

Different systems use different execution models.

Common models include:

One thread per request

Each request is handled by a dedicated thread.

Characteristics:

simple model
easy to understand
blocking operations are straightforward

Limitations:

high memory usage with many threads
limited scalability under high-concurrency conditions

This model is conceptually simple, but it often behaves poorly when concurrency becomes very high or when blocking is frequent.
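A minimal sketch of this model in Java is shown below. The names (`ThreadPerRequestDemo`, `serve`) are illustrative, and the "request" is reduced to incrementing a counter. The point to notice is that the number of threads created grows one-for-one with the number of requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadPerRequestDemo {
    // Handles each "request" on its own freshly created thread and returns
    // how many requests were handled.
    static int serve(int requests) {
        AtomicInteger handled = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            Thread t = new Thread(() -> {
                handled.incrementAndGet(); // stand-in for request handling
            });
            threads.add(t);
            t.start(); // one new thread, and one new stack, per request
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled = " + serve(50));
    }
}
```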

Thread pool

A fixed number of threads handles incoming requests.

Requests are queued and assigned to available threads.

Characteristics:

controlled concurrency
reduced overhead compared with unbounded threads

Limitations:

queueing when all threads are busy
potential saturation of the pool

This model is widely used because it provides controlled resource usage, but it introduces an explicit queue and therefore a visible capacity limit.
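The capacity limit can be made explicit with a bounded queue. The sketch below uses `ThreadPoolExecutor` directly with an `ArrayBlockingQueue` and the default rejection policy; the class name `BoundedPoolDemo`, the pool sizes, and the latch that keeps tasks "stuck" are assumptions chosen for the illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Submits `tasks` blocking tasks to a pool with `workers` threads and a
    // queue of `queueCapacity` slots.
    // Returns {worker threads started, tasks queued, tasks rejected}.
    static int[] submitBurst(int workers, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a task stuck on slow I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // the default policy rejects when pool and queue are full
            }
        }
        int started = pool.getPoolSize();
        int queued = pool.getQueue().size();
        release.countDown(); // let the stuck tasks finish
        pool.shutdown();
        return new int[] {started, queued, rejected};
    }

    public static void main(String[] args) {
        int[] r = submitBurst(2, 2, 5);
        System.out.println("started=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

With 2 workers and 2 queue slots, a burst of 5 blocked tasks leaves 2 running, 2 queued, and 1 rejected. Whether to reject, block the caller, or queue without bound is a design choice; each option moves the waiting to a different place.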

Event-driven / asynchronous model

Work is handled using non-blocking operations and event loops.

Characteristics:

a few threads can handle many concurrent requests
efficient for I/O-bound workloads

Limitations:

more complex programming model
requires careful handling of asynchronous flows

This model reduces the number of blocked threads, but shifts complexity to coordination, callbacks, state handling, and non-blocking design.
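The idea can be sketched with a single-threaded scheduler acting as a very small event loop. This is only an approximation under stated assumptions: a timer stands in for non-blocking I/O, the callback stands in for the continuation of a request, and the names (`EventLoopDemo`, `serve`) are illustrative. Real event-driven frameworks are considerably more elaborate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class EventLoopDemo {
    // Simulates `requests` requests that each wait `ioMillis` for I/O.
    // Instead of sleeping on a dedicated thread, each request registers a
    // callback that the single loop thread runs when the wait is over.
    static int serve(int requests, long ioMillis) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            loop.schedule(() -> {
                completed.incrementAndGet(); // "I/O finished" callback
                done.countDown();
            }, ioMillis, TimeUnit.MILLISECONDS);
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        loop.shutdownNow();
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + serve(100, 50));
    }
}
```

One thread completes all 100 simulated requests because no thread is held while a request waits. The cost is that each callback must be short: a slow or blocking callback would delay every other request on the same loop.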

Java perspective (example)

In Java, a common execution model uses thread pools.

For example:

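import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// 10 worker threads; tasks submitted while all are busy wait in the pool's queue
ExecutorService executor = Executors.newFixedThreadPool(10);

executor.submit(() -> {
    // task logic
});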

Requests are:

submitted to a queue
executed by a limited number of threads

If all threads are busy:

tasks wait in the queue
latency increases

For a detailed explanation of threads in Java, see:

→ https://ars-digitale.github.io/java-21-study-guide/en/module-07/threads/

This example is simple, but it highlights a key idea: limited execution resources naturally introduce queueing when demand exceeds immediate processing capacity.

Blocking vs non-blocking

Threads may:

block (wait for I/O, locks, external resources)
remain active (CPU-bound work)

Blocking reduces effective concurrency:

threads are occupied but do not progress
fewer threads are available for new work

Non-blocking approaches aim to:

reduce idle waiting
improve resource utilization

The distinction is important because a high thread count does not necessarily mean high throughput.

If threads spend most of their time waiting, concurrency is present, but productive execution is limited.

Practical implications

The execution model determines:

how concurrency is handled
how resources are used
how queueing appears

Typical effects include:

thread pool saturation → request queueing
blocking operations → reduced throughput
too many threads → context-switching overhead

The execution model also determines where bottlenecks become visible: in queues, in pools, in blocked threads, or in event loops.

Link with previous concepts

Thread behavior directly impacts:

queueing (→ 1.5.2 Saturation and queueing)
latency under load
effective capacity of the system

It also influences how quickly a system moves from stable behavior to saturation when concurrency increases.

Practical interpretation

Choosing an execution model is not only a programming decision.

It is a performance decision.

The model affects:

memory consumption
scheduling overhead
latency under waiting conditions
scalability under real workload

A design that is easy to implement may not be the design that behaves best under sustained load.

Key idea

The execution model defines how work is scheduled and processed.

Threads are not free.

How they are used determines:

how much work can be handled
how efficiently resources are utilized
how the system behaves under load

1.6.3 Contention and synchronization

Definition

Contention occurs when multiple threads compete for the same resource.

Synchronization is the mechanism used to coordinate access to shared resources.

These concepts are central to understanding performance degradation in concurrent systems.

They connect correctness and performance: the same mechanisms that protect shared state may also become the source of waiting and reduced scalability.

Shared resources

In concurrent systems, threads often share resources such as:

memory structures (objects, caches)
locks and monitors
thread pools and queues
database connections
I/O channels

When access is not coordinated, data corruption may occur.

When access is coordinated, contention may appear.

This makes synchronization necessary, but not free.

Synchronization

Synchronization guarantees that shared resources are accessed safely.

Common mechanisms include:

locks (mutexes, monitors)
synchronized sections
semaphores
atomic operations

Synchronization protects correctness, but it introduces overhead.

That overhead may derive from:

waiting
serialization of execution
additional memory barriers
coordination costs between threads

Contention

Contention arises when multiple threads attempt to access the same resource simultaneously.

When contention occurs:

threads may block or wait
execution is delayed
throughput is reduced

The more threads compete:

the greater the waiting time
the lower the effective parallelism

A highly concurrent system may therefore behave like a partially serialized system if much of its work depends on the same shared resources.

Lock contention

A common form of contention involves locks.

When a thread holds a lock:

other threads must wait
a queue of waiting threads may form

Effects include:

increased latency
reduced throughput
potential bottlenecks

Lock contention is especially problematic when critical sections are long, frequently accessed, or placed on hot execution paths.

Contention vs utilization

High contention may occur even when CPU utilization is moderate.

For example:

many threads are waiting on a lock
CPU is partially idle
the system appears underutilized but is actually constrained

This is a common source of misleading diagnostics.

It explains why low or moderate CPU usage does not necessarily mean that the system has available capacity.

Fine-grained vs coarse-grained synchronization

Synchronization may be:

coarse-grained (few locks, large critical sections)
fine-grained (many locks, smaller critical sections)

Trade-offs:

coarse-grained → simpler but higher contention
fine-grained → more scalable but more complex

Choosing between the two models depends on workload characteristics, access patterns, and the cost of added design complexity.
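The two styles can be compared on a simple set of per-key counters. The classes below (`CoarseCounters`, `FineCounters`, `GranularityDemo`) are hypothetical and exist only to show where the lock sits; they are not a recommendation over the ready-made concurrent collections in the JDK.

```java
import java.util.function.IntConsumer;

class CoarseCounters {
    private final long[] counts;
    CoarseCounters(int keys) { counts = new long[keys]; }
    // One lock (the object's monitor) guards every key.
    synchronized void increment(int key) { counts[key]++; }
    synchronized long get(int key) { return counts[key]; }
}

class FineCounters {
    private final long[] counts;
    private final Object[] locks;
    FineCounters(int keys) {
        counts = new long[keys];
        locks = new Object[keys];
        for (int i = 0; i < keys; i++) locks[i] = new Object();
    }
    // One lock per key: threads touching different keys do not wait on each other.
    void increment(int key) { synchronized (locks[key]) { counts[key]++; } }
    long get(int key) { synchronized (locks[key]) { return counts[key]; } }
}

class GranularityDemo {
    // Starts one thread per key; each thread increments its own key `perThread` times.
    static void hammer(int threads, int perThread, IntConsumer increment) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int key = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) increment.accept(key);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        CoarseCounters coarse = new CoarseCounters(4);
        FineCounters fine = new FineCounters(4);
        hammer(4, 100_000, coarse::increment);
        hammer(4, 100_000, fine::increment);
        System.out.println(coarse.get(0) + " " + fine.get(0));
    }
}
```

Both versions produce the same totals. The difference is that in the coarse-grained version the four threads queue on a single monitor even though they never touch the same key, while the fine-grained version pays for that independence with more locks to create and reason about.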

Java perspective (example)

In Java, synchronization may be implemented using synchronized blocks:

synchronized (lock) {
    // critical section
}

Or explicit locks:

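import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

Lock lock = new ReentrantLock();

lock.lock();
try {
    // critical section
} finally {
    // always release the lock, even if the critical section throws
    lock.unlock();
}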

If many threads attempt to enter the same critical section:

contention increases
threads block
performance degrades

This example highlights how a correctness mechanism may become a scalability constraint under load.

Symptoms of contention

Typical indicators include:

increasing response time under load
low CPU utilization with high latency
threads in blocked or waiting states
long queues on shared resources

These symptoms often appear before total saturation and may be mistaken for other resource problems if not analyzed carefully.
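One way to check for the "threads in blocked or waiting states" symptom from inside a JVM is to count live threads by state. The sketch below is a rough illustration with hypothetical names (`ThreadCensus`, `countByState`, `blockedWhileLockHeld`); in practice a thread dump or a profiler gives the same information with stack traces attached.

```java
import java.util.EnumMap;
import java.util.Map;

class ThreadCensus {
    // Counts the live threads of this JVM by Thread.State.
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    // Demo helper: makes `n` daemon threads wait on a lock held by the
    // caller, then reports how many BLOCKED threads the census sees.
    static int blockedWhileLockHeld(int n) {
        Object lock = new Object();
        Thread[] waiters = new Thread[n];
        synchronized (lock) {
            for (int i = 0; i < n; i++) {
                waiters[i] = new Thread(() -> {
                    synchronized (lock) {
                        // enters only after the caller releases the lock
                    }
                });
                waiters[i].setDaemon(true);
                waiters[i].start();
            }
            for (Thread w : waiters) {
                while (w.getState() != Thread.State.BLOCKED) {
                    Thread.onSpinWait();
                }
            }
            return countByState().getOrDefault(Thread.State.BLOCKED, 0);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked threads: " + blockedWhileLockHeld(3));
        System.out.println(countByState());
    }
}
```

A large and growing count of blocked threads alongside moderate CPU usage points toward contention rather than a shortage of processing capacity.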

Practical implications

Contention limits scalability.

Even with sufficient CPU and adequate memory, a system may fail to scale if threads spend most of their time waiting instead of executing.

Reducing contention often has a greater impact than optimizing individual operations.

This is especially true for systems whose performance is constrained by shared access rather than by pure computation.

Link with previous concepts

Contention contributes to:

queueing (→ 1.5.2 Saturation and queueing)
non-linear degradation (→ 1.5.3 Non-linear degradation)
throughput collapse (→ 1.5.4 Throughput collapse)

Contention is therefore both a local synchronization phenomenon and a system-level performance mechanism.

Practical interpretation

Concurrency increases opportunities for useful overlap, but it also increases competition for shared resources.

The practical challenge is not simply to add more threads, but to ensure that additional concurrency produces useful work rather than additional waiting.

Key idea

Concurrency introduces the need for synchronization.

Synchronization introduces contention.

Contention limits performance.

Understanding and controlling contention is essential for scalable systems.

1.6.4 Common concurrency issues

Concurrency introduces complexity.

When multiple threads interact, incorrect assumptions or poor coordination may lead to specific classes of problems.

These problems often appear under load and may severely affect performance and correctness.

Many of them are difficult to reproduce in superficial tests because they depend on timing, scheduling, or resource pressure.

1.6.4.1 Race conditions

Definition

A race condition occurs when multiple threads access shared data without adequate synchronization, and the result depends on timing.

The outcome is therefore not deterministic and may vary from one execution to another.

Example

Two threads update a shared counter:

Thread A reads value = 10
Thread B reads value = 10
Thread A writes 11
Thread B writes 11

Expected result: 12
Actual result: 11

The final value depends on the order in which unsynchronized operations are executed.
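Because a real race depends on timing, it cannot be reproduced on demand. The sketch below therefore replays the interleaving from the example step by step on a single thread, and then shows one possible fix using `AtomicInteger`. The class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    // Replays the schedule from the example deterministically.
    static int replayInterleaving() {
        int shared = 10;
        int readByA = shared;  // Thread A reads 10
        int readByB = shared;  // Thread B reads 10
        shared = readByA + 1;  // Thread A writes 11
        shared = readByB + 1;  // Thread B writes 11, overwriting A's update
        return shared;
    }

    // Two real threads, each performing one atomic read-modify-write.
    static int atomicIncrements() {
        AtomicInteger shared = new AtomicInteger(10);
        Thread a = new Thread(() -> { shared.incrementAndGet(); });
        Thread b = new Thread(() -> { shared.incrementAndGet(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println("replayed interleaving: " + replayInterleaving()); // 11
        System.out.println("atomic increments:     " + atomicIncrements());   // 12
    }
}
```

The atomic version is correct regardless of scheduling because the read and the write can no longer be separated by another thread's update.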

Impact

incorrect results
inconsistent system state
bugs that are difficult to reproduce

Race conditions may also corrupt internal assumptions in ways that appear only later under load.

Performance relevance

Race conditions do not always cause visible errors, but they still matter for performance:

fixing them usually requires additional synchronization
overly broad fixes may introduce contention

This is one of the reasons why correctness and performance cannot be treated as completely separate concerns in concurrent systems.

1.6.4.2 Deadlock

Definition

A deadlock occurs when two or more threads wait indefinitely for each other.

Each thread holds a resource and waits for another resource held by a different thread in the group.

As a consequence, the threads involved make no further progress.

Example

Thread A holds lock L1 and waits for L2
Thread B holds lock L2 and waits for L1

Neither can proceed any further.

This circular waiting pattern is the defining characteristic of deadlock.

Impact

the system stalls
requests are never completed
resources remain locked

Deadlocks are especially severe because they turn active resources into permanently blocked resources.

Detection

threads remain blocked
thread dumps show circular waiting

Deadlocks are often detected through thread analysis rather than through general performance metrics.
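The JVM can also report deadlocks programmatically through `ThreadMXBean.findDeadlockedThreads()`. The sketch below deliberately creates the L1/L2 cycle from the example and then asks the JVM to find it. The names (`DeadlockDetectionDemo`, `locker`) are hypothetical, and the sketch assumes a JVM that supports monitoring of `java.util.concurrent` locks, which is the case for common HotSpot-based JDKs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockDetectionDemo {
    // Takes `first`, waits until the other thread holds its own first lock,
    // then tries to take `second`.
    static Thread locker(String name, ReentrantLock first, ReentrantLock second,
                         CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();
                    second.lockInterruptibly(); // the circular wait starts here
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                // interrupted by the demo to break the deadlock
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    // Creates the cycle, waits until the JVM reports it, then breaks it.
    static int countDeadlockedThreads() {
        ReentrantLock l1 = new ReentrantLock();
        ReentrantLock l2 = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread a = locker("A", l1, l2, bothHoldFirst);
        Thread b = locker("B", l2, l1, bothHoldFirst);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        while (ids == null) {
            ids = mx.findDeadlockedThreads(); // null while no cycle exists
            Thread.onSpinWait();
        }
        a.interrupt();
        b.interrupt();
        return ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The demo can recover only because it uses `lockInterruptibly()`. Threads deadlocked on `synchronized` blocks cannot be interrupted out of the wait, which is one reason a consistent lock acquisition order is usually preferred over recovery.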

1.6.4.3 Livelock

Definition

A livelock occurs when threads are not blocked but continuously change state in response to one another without making progress.

Unlike deadlock, activity continues, but useful work does not.

Example

Two threads repeatedly retry an operation:

both detect a conflict
both retry at the same time
the conflict persists

The system remains active, but the conflicting behavior continues indefinitely.
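The pattern can be reproduced in a small single-threaded simulation. The model below is an assumption made for illustration: two workers compete for the same slot, and after each conflict each worker decides whether to move to the other slot. The names (`LivelockSim`, `roundsUntilResolved`) are hypothetical. The second run uses randomized decisions, a common mitigation that is shown here only for contrast.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

class LivelockSim {
    // Both workers start on slot 0. In every round each worker may move to
    // the other slot. Returns the round in which they end up on different
    // slots, or -1 if the conflict is still present after maxRounds.
    static int roundsUntilResolved(BooleanSupplier aMoves, BooleanSupplier bMoves,
                                   int maxRounds) {
        int a = 0;
        int b = 0;
        for (int round = 1; round <= maxRounds; round++) {
            if (aMoves.getAsBoolean()) a = 1 - a;
            if (bMoves.getAsBoolean()) b = 1 - b;
            if (a != b) return round;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Identical reactions: both always move, so they collide forever.
        System.out.println(roundsUntilResolved(() -> true, () -> true, 1000));

        // Randomized reactions break the symmetry sooner or later.
        Random rnd = new Random(42);
        System.out.println(roundsUntilResolved(rnd::nextBoolean, rnd::nextBoolean, 1000));
    }
}
```

In the first run every round involves activity, yet the result is -1: the workers keep reacting to each other without ever separating.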

Impact

CPU is used
no useful work is completed

Livelocks may therefore look like active processing even though effective progress is zero.

1.6.4.4 Starvation

Definition

Starvation occurs when some threads are unable to obtain resources for a prolonged period.

Other threads continue to execute while some are effectively ignored.

This means that the system is making progress, but not in a fair or predictable way for all work.

Causes

unfair scheduling
high-priority threads dominating execution
resource monopolization

Starvation is especially problematic when a subset of requests experiences extreme latency while the rest of the system appears functional.
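A minimal simulation shows how strict prioritization can starve low-priority work. The model is an assumption for illustration: a single worker serves one task per tick in strict priority order, one low-priority task is queued at the start, and a configurable number of high-priority tasks arrives on every tick. The names (`StarvationSim`, `tickLowPriorityServed`) are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class StarvationSim {
    // Returns the tick at which the low-priority task (priority 0) is
    // served, or -1 if it is still waiting after maxTicks.
    static int tickLowPriorityServed(int highPerTick, int maxTicks) {
        // Highest priority value is served first.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Comparator.reverseOrder());
        queue.add(0); // the low-priority task
        for (int tick = 1; tick <= maxTicks; tick++) {
            for (int i = 0; i < highPerTick; i++) {
                queue.add(1); // new high-priority arrivals
            }
            int served = queue.poll(); // the worker serves one task per tick
            if (served == 0) return tick;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(tickLowPriorityServed(0, 1000)); // no competition
        System.out.println(tickLowPriorityServed(1, 1000)); // steady high-priority load
    }
}
```

With no competing work the low-priority task is served on the first tick. With just one high-priority arrival per tick it is never served, even though the worker completes a task on every tick and overall throughput looks healthy.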

Impact

some requests experience very high latency
the system appears partially functional
tail latency increases

This makes starvation particularly relevant from both a performance and a user-experience perspective.

1.6.4.5 Thread pool exhaustion

Definition

Thread pool exhaustion occurs when all threads in a pool are busy and incoming tasks must wait.

This is one of the most common concurrency-related bottlenecks in real systems.

Causes

blocking operations within threads
insufficient pool size
long-running tasks

These causes may exist independently or reinforce each other under increasing load.

Effects

the request queue grows
latency increases
throughput may degrade

If saturation continues, thread pool exhaustion may also contribute to timeouts, retries, and instability in upstream components.

Link with previous concepts

Thread pool exhaustion is a direct example of:

saturation (→ 1.5.2 Saturation and queueing)
non-linear degradation (→ 1.5.3 Non-linear degradation)

It therefore constitutes one of the clearest practical expressions of the system behaviors introduced in the previous chapter.

Key idea

Concurrency issues are not only correctness problems.

They are also performance problems.

Many performance degradations are caused by:

contention
blocking
coordination failures

Understanding these issues is essential for diagnosing real systems.
