
1.6 – Concurrency and parallelism

This chapter introduces concurrency and parallelism as fundamental concepts in performance engineering for systems and applications.

It covers how work is scheduled, how multiple tasks interact, and why coordination overhead, contention, and synchronization often become limiting factors under load.

Concurrency and parallelism are essential for scalability, but they also introduce complexity, overhead, and breaking points that directly influence latency, throughput, and system stability.


1.6.1 Concurrency vs parallelism

Definition

Concurrency and parallelism are related but distinct concepts.

They are often confused, but they describe different aspects of system behavior.

Understanding the distinction is essential because a system may manage many activities concurrently from a structural point of view without actually executing many activities simultaneously at the hardware level.


Concurrency

Concurrency refers to the ability of a system to handle multiple tasks during the same time interval.

These tasks:

  • may not be executed at exactly the same moment
  • may be “interleaved”
  • share system resources

Concurrency concerns:

  • structure
  • coordination
  • management of multiple “in flight” operations

It is therefore mainly concerned with how work is organized and scheduled.


Parallelism

Parallelism refers to the execution of multiple tasks at the same instant.

This requires:

  • multiple processing units (e.g. CPU cores)
  • true simultaneous execution

Parallelism concerns:

  • execution
  • hardware utilization
  • doing more work at the same instant

It is therefore mainly concerned with simultaneous execution.
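Here is a minimal Java sketch of parallelism. It assumes a CPU-bound workload made of independent pieces. A parallel stream splits the range into chunks, and worker threads of the common fork-join pool may sum those chunks at the same instant on different cores. The class and method names are hypothetical. Whether the parallel version is actually faster depends on the number of available cores and on the size of the work.

```java
import java.util.stream.LongStream;

class ParallelSumDemo {
    // Sequential: one thread adds every number in order
    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Parallel: the range is split into chunks that worker threads of the
    // common ForkJoinPool may add simultaneously on different cores
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        System.out.println("cores available: " + Runtime.getRuntime().availableProcessors());
        System.out.println("sequential = " + sequentialSum(n));
        System.out.println("parallel   = " + parallelSum(n));
    }
}
```

Both methods return the same value. Parallelism changes how the work is executed, not what is computed.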


Key difference

  • Concurrency = dealing with many tasks
  • Parallelism = executing many tasks simultaneously

A system may be:

  • concurrent but not parallel (single core, “interleaved” tasks)
  • parallel but not highly concurrent (few long-running tasks)
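The "concurrent but not parallel" case can be sketched with a hypothetical cooperative scheduler. A single thread keeps several tasks in flight and executes one step of each in turn. This is a simplified model that assumes tasks yield after every step. Real operating-system schedulers preempt threads instead of waiting for them to yield.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class InterleavingDemo {
    // Each task is a sequence of steps; ONE thread runs one step at a time, round-robin.
    static List<String> runInterleaved(List<List<String>> tasks) {
        Deque<Iterator<String>> ready = new ArrayDeque<>();
        for (List<String> task : tasks) {
            ready.add(task.iterator());
        }
        List<String> trace = new ArrayList<>();
        while (!ready.isEmpty()) {
            Iterator<String> current = ready.poll();
            if (current.hasNext()) {
                trace.add(current.next()); // execute one step of this task
                ready.add(current);        // "yield": move the task to the back of the queue
            }
            // a task with no remaining steps is simply dropped (finished)
        }
        return trace;
    }

    public static void main(String[] args) {
        List<List<String>> tasks = List.of(List.of("A1", "A2", "A3"), List.of("B1", "B2"));
        System.out.println(runInterleaved(tasks)); // [A1, B1, A2, B2, A3]
    }
}
```

Both tasks are in progress during the same interval, but no two steps ever execute at the same instant.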

This distinction matters because the scalability properties of a system depend not only on how much work exists, but also on how that work is coordinated and scheduled.


Relationship with performance

Concurrency affects:

  • how many requests can be in execution
  • how resources are shared
  • how contention arises

Parallelism affects:

  • how quickly work can be executed
  • how effectively hardware is utilized

Both influence:

  • throughput
  • latency
  • scalability

In practice, adding concurrency without sufficient parallelism may increase waiting and contention, while adding parallelism without good concurrency control may waste resources or expose coordination problems.


Practical intuition

A concurrent system:

  • can accept many requests
  • may still process them sequentially or with limited parallelism

A parallel system:

  • can process multiple requests at the same time
  • but may still suffer from contention or coordination overhead

For this reason, concurrency and parallelism should not be treated as automatically beneficial.

Their value depends on how they interact with workload, shared resources, and execution constraints.


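Concurrency increases the number of tasks that interact with each other and compete for shared resources.

This leads to contention, queueing, and coordination overhead that grow with load.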

This is one of the main reasons why concurrency becomes a central topic in performance engineering and not only a programming concern.


Practical interpretation

Concurrency is often necessary to support many simultaneous operations, especially in networked and I/O-driven systems.

However, concurrency also increases the probability of:

  • shared-state interactions
  • queue buildup
  • lock contention
  • coordination overhead

Parallelism may increase throughput, but only if useful work is actually being executed rather than blocked or serialized.


Key idea

Concurrency determines how many tasks are active.

Parallelism determines how many tasks are executed at the same time.

Performance depends on both, and on how they interact with system resources.


1.6.2 Threads and execution model

Definition

The execution model defines how work is executed within a system.

In most systems, work is performed by threads, which are executed within a process.

The execution model determines how requests are mapped onto execution units, how waiting is handled, and how system resources are consumed under load.


Processes and threads

A process is an isolated execution environment:

  • it has its own memory space
  • it contains resources (files, sockets, memory)

A thread is an execution unit within a process:

  • multiple threads share the same process memory
  • threads execute tasks concurrently

In most applications:

  • one process hosts multiple threads
  • threads handle incoming requests

This shared-memory model makes threads efficient for communication, but it also introduces the complexity of shared state.


Threads

A thread:

  • executes instructions
  • consumes CPU time
  • may block while waiting (e.g. I/O, locks)

Multiple threads allow a system to:

  • handle more requests
  • overlap computation and waiting
  • increase concurrency

However, threads are not free.

Each additional thread introduces memory overhead, scheduling overhead, and coordination complexity.


Thread lifecycle

A thread typically goes through different states:

  • running (actively executing)
  • runnable (ready to execute, waiting for CPU)
  • waiting / blocked (waiting for a resource or an event)
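In Java, these states are exposed through the `Thread.State` enum: `NEW`, `RUNNABLE`, `BLOCKED`, `WAITING`, `TIMED_WAITING`, and `TERMINATED`. Java reports both "running" and "ready to run" as `RUNNABLE`. It splits "waiting / blocked" into `BLOCKED` (waiting to enter a monitor) and `WAITING`/`TIMED_WAITING`.

The sketch below uses hypothetical class and method names. It forces one thread into the `BLOCKED` state by holding a monitor that the thread needs.

```java
import java.util.concurrent.TimeUnit;

class ThreadStateDemo {
    // Starts a worker that tries to enter a monitor the caller already holds,
    // and returns the state the JVM reports for that worker.
    static Thread.State stateWhileWaitingForMonitor() {
        final Object monitor = new Object();
        Thread worker = new Thread(() -> {
            synchronized (monitor) {
                // reached only after the caller releases the monitor
            }
        });
        Thread.State observed;
        synchronized (monitor) {
            worker.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            // Poll (bounded) until the worker has reached the monitor and is parked on it
            while ((observed = worker.getState()) != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.onSpinWait();
            }
        }
        joinQuietly(worker); // monitor released: the worker can now finish
        return observed;
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        Thread idle = new Thread(() -> { });
        System.out.println(idle.getState());               // NEW: created but not started
        System.out.println(stateWhileWaitingForMonitor()); // BLOCKED: waiting for the monitor
        idle.start();
        joinQuietly(idle);
        System.out.println(idle.getState());               // TERMINATED
    }
}
```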

Performance is influenced by how threads move between these states.

A system with many threads in the runnable or blocked state may appear busy but achieve little useful progress.

Understanding thread states is therefore essential when diagnosing concurrency issues.


Stack and memory

Each thread has its own stack:

  • it stores method calls and local variables
  • it grows and shrinks during execution

Implications:

  • more threads → greater memory usage (one stack per thread)
  • deep call chains → greater stack usage
  • stack exhaustion may lead to failures

This is particularly relevant in high-concurrency systems.

Thread count therefore affects not only scheduling, but also memory footprint and stability.
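Here is a rough Java sketch of stack exhaustion, assuming HotSpot-like behavior. Unbounded recursion runs in a thread created with an explicit stack size until the JVM throws `StackOverflowError`.

A few caveats apply:

- The `stackSize` argument of the `Thread` constructor is only a hint, and some platforms ignore it.
- Exact frame counts vary with the JVM, JIT compilation, and frame size, so no specific numbers are claimed.
- `StackOverflowError` is caught here only to report the depth reached.

Class and method names are hypothetical.

```java
class StackDepthDemo {
    private static int depth; // written only by the single recursing thread

    private static void recurse() {
        depth++;   // one more frame on this thread's stack
        recurse(); // no base case: runs until the stack is exhausted
    }

    // Runs unbounded recursion in a new thread created with the given stack size
    // (a hint the JVM may ignore) and returns the depth reached before overflow.
    static int framesUntilOverflow(long stackSizeBytes) {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // the stack ran out after this many frames
            }
        }, "deep-recursion", stackSizeBytes);
        t.start();
        try {
            t.join(); // join also makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("256 KiB stack: " + framesUntilOverflow(256 * 1024));
        System.out.println("16 MiB stack:  " + framesUntilOverflow(16 * 1024 * 1024));
    }
}
```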


Execution models

Different systems use different execution models.

Common models include:


One thread per request

Each request is handled by a dedicated thread.

Characteristics:

  • simple model
  • easy to understand
  • blocking operations are straightforward

Limitations:

  • high memory usage with many threads
  • limited scalability under high-concurrency conditions

This model is conceptually simple, but it often behaves poorly when concurrency becomes very high or when blocking is frequent.
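Here is a minimal sketch of the one-thread-per-request model. A hypothetical `handle` method stands in for request processing. Every request gets a freshly created thread, and therefore its own stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ThreadPerRequestDemo {
    // Hypothetical request handler: it only records which thread served the request
    static void handle(int requestId, Set<String> threadsUsed) {
        threadsUsed.add(Thread.currentThread().getName());
    }

    // Starts one dedicated thread per request, waits for all of them,
    // and returns how many distinct threads served the requests
    static int serve(int requests) {
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            int requestId = i;
            Thread t = new Thread(() -> handle(requestId, threadsUsed), "request-" + i);
            threads.add(t);
            t.start(); // a new thread, with its own stack, for every request
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return threadsUsed.size();
    }

    public static void main(String[] args) {
        System.out.println(serve(50) + " threads for 50 requests"); // 50 threads for 50 requests
    }
}
```

The thread count grows linearly with the number of requests. With thousands of concurrent requests, thread creation and per-thread stacks become the dominant cost.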


Thread pool

A fixed number of threads handles incoming requests.

Requests are queued and assigned to available threads.

Characteristics:

  • controlled concurrency
  • reduced overhead compared with creating an unbounded number of threads

Limitations:

  • queueing when all threads are busy
  • potential saturation of the pool

This model is widely used because it provides controlled resource usage, but it introduces an explicit queue and therefore a visible capacity limit.
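The explicit queue and its capacity limit can be made visible with `java.util.concurrent.ThreadPoolExecutor` and a bounded `ArrayBlockingQueue`. The sizes (2 threads, 2 queue slots) are arbitrary assumptions. The tasks block on a latch so that the pool stays saturated. Under the default `AbortPolicy`, a task that finds all threads busy and the queue full is rejected with `RejectedExecutionException`. Class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Submits `tasks` blocking tasks to a pool of 2 threads with room for 2 queued tasks.
    // Returns {accepted, rejected}.
    static int[] submitAll(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2,                          // fixed size: 2 worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2));  // explicit, bounded queue
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> awaitQuietly(release)); // task holds its thread until released
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++; // threads busy and queue full: the capacity limit is reached
            }
        }
        release.countDown(); // let running and queued tasks finish
        pool.shutdown();
        return new int[] {accepted, rejected};
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] r = submitAll(5);
        System.out.println("accepted=" + r[0] + ", rejected=" + r[1]); // accepted=4, rejected=1
    }
}
```

With 5 tasks, 2 run, 2 wait in the queue, and the fifth is rejected.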


Event-driven / asynchronous model

Work is handled using non-blocking operations and event loops.

Characteristics:

  • a few threads can handle many concurrent requests
  • efficient for I/O-bound workloads

Limitations:

  • more complex programming model
  • requires careful handling of asynchronous flows

This model reduces the number of blocked threads, but shifts complexity to coordination, callbacks, state handling, and non-blocking design.
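Here is a hedged sketch of the asynchronous style using `CompletableFuture`. A single `ScheduledExecutorService` thread stands in for an event loop and completes 1,000 simulated I/O operations after a delay. Continuations attached with `thenApply` run when each result arrives. The simulated I/O and the class name are assumptions. Real non-blocking I/O would come from NIO or a networking library, not from a timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class AsyncDemo {
    // Starts `requests` simulated I/O operations and returns the sum of the processed results.
    static int handleRequests(int requests) {
        // One thread completes all simulated I/O operations (stand-in for an event loop)
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                int id = i;
                CompletableFuture<Integer> io = new CompletableFuture<>();
                // Simulated non-blocking I/O: the result "arrives" 50 ms later; no thread waits for it
                loop.schedule(() -> { io.complete(id); }, 50, TimeUnit.MILLISECONDS);
                // Continuation registered now, executed when the result arrives
                inFlight.add(io.thenApply(value -> value * 2));
            }
            // The caller blocks only once, at the end, to collect all results
            return inFlight.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequests(1_000)); // 999000
    }
}
```

While the 1,000 operations are in flight, no thread sits blocked on any individual request. The price is that the logic is split into callbacks instead of reading as one sequential flow.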


Java perspective (example)

In Java, a common execution model uses thread pools.

For example:

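import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A fixed pool of 10 threads; tasks submitted while all 10 are busy wait in the pool's queue
ExecutorService executor = Executors.newFixedThreadPool(10);
executor.submit(() -> {
    // task logic
});
executor.shutdown(); // stop accepting new tasks; already-submitted tasks still run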

Requests are:

  • submitted to a queue
  • executed by a limited number of threads

If all threads are busy:

  • tasks wait in the queue
  • latency increases

For a detailed explanation of threads in Java, see:

→ https://ars-digitale.github.io/java-21-study-guide/en/module-07/threads/

This example is simple, but it highlights a key idea: limited execution resources naturally introduce queueing when demand exceeds immediate processing capacity.


Blocking vs non-blocking

Threads may:

  • block (wait for I/O, locks, external resources)
  • remain active (CPU-bound work)

Blocking reduces effective concurrency:

  • threads are occupied but do not progress
  • fewer threads are available for new work

Non-blocking approaches aim to:

  • reduce idle waiting
  • improve resource utilization

The distinction is important because a high thread count does not necessarily mean high throughput.

If threads spend most of their time waiting, concurrency is present, but productive execution is limited.


Practical implications

The execution model determines:

  • how concurrency is handled
  • how resources are used
  • how queueing appears

Typical effects include:

  • thread pool saturation → request queueing
  • blocking operations → reduced throughput
  • too many threads → context-switching overhead

The execution model also determines where bottlenecks become visible: in queues, in pools, in blocked threads, or in event loops.


Thread behavior directly impacts latency, throughput, and resource utilization.

It also influences how quickly a system moves from stable behavior to saturation when concurrency increases.


Practical interpretation

Choosing an execution model is not only a programming decision.

It is a performance decision.

The model affects:

  • memory consumption
  • scheduling overhead
  • latency under waiting conditions
  • scalability under real workload

A design that is easy to implement may not be the design that behaves best under sustained load.


Key idea

The execution model defines how work is scheduled and processed.

Threads are not free.

How they are used determines:

  • how much work can be handled
  • how efficiently resources are utilized
  • how the system behaves under load

1.6.3 Contention and synchronization

Definition

Contention occurs when multiple threads compete for the same resource.

Synchronization is the mechanism used to coordinate access to shared resources.

These concepts are central to understanding performance degradation in concurrent systems.

They connect correctness and performance: the same mechanisms that protect shared state may also become the source of waiting and reduced scalability.


Shared resources

In concurrent systems, threads often share resources such as:

  • memory structures (objects, caches)
  • locks and monitors
  • thread pools and queues
  • database connections
  • I/O channels

When access is not coordinated, data corruption may occur.

When access is coordinated, contention may appear.

This makes synchronization necessary, but not free.


Synchronization

Synchronization coordinates threads so that shared resources are accessed safely.

Common mechanisms include:

  • locks (mutexes, monitors)
  • synchronized sections
  • semaphores
  • atomic operations
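As one example of these mechanisms, a `java.util.concurrent.Semaphore` can cap how many threads use a limited resource at once. In this sketch the 3 permits stand for hypothetical database connections. The method reports the highest number of threads observed inside the guarded section, and that peak can never exceed the number of permits. Class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

class SemaphoreDemo {
    // Runs `workers` tasks that each need one of `permits` (hypothetical) connections and
    // returns the highest number of tasks observed inside the guarded section at once.
    static int maxConcurrentUsers(int permits, int workers) {
        Semaphore connections = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                connections.acquireUninterruptibly(); // wait until a permit is free
                try {
                    peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                    LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10)); // simulated use
                } finally {
                    inside.decrementAndGet();
                    connections.release(); // hand the permit to the next waiting thread
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent users: " + maxConcurrentUsers(3, 10));
    }
}
```

The remaining threads wait for a permit. This is correctness and resource protection paid for with waiting.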

Synchronization guarantees correctness, but introduces overhead.

That overhead may come from:

  • waiting
  • serialization of execution
  • additional memory barriers
  • coordination costs between threads

Contention

Contention arises when multiple threads attempt to access the same resource simultaneously.

When contention occurs:

  • threads may block or wait
  • execution is delayed
  • throughput is reduced

The more threads compete:

  • the greater the waiting time
  • the lower the effective parallelism

A highly concurrent system may therefore behave like a partially serialized system if much of its work depends on the same shared resources.


Lock contention

A common form of contention involves locks.

When a thread holds a lock:

  • other threads must wait
  • a queue of waiting threads may form

Effects include:

  • increased latency
  • reduced throughput
  • potential bottlenecks

Lock contention is especially problematic when critical sections are long, frequently accessed, or placed on hot execution paths.


Contention vs utilization

High contention may occur even when CPU utilization is moderate.

For example:

  • many threads are waiting on a lock
  • CPU is partially idle
  • the system appears underutilized but is actually constrained

This is a common source of misleading diagnostics.

It explains why low or moderate CPU usage does not necessarily mean that the system has available capacity.
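This situation can be reproduced with a small Java sketch using hypothetical names. One thread holds a monitor while waiting on a latch, so it computes nothing. Several other threads then try to enter the same monitor, and they all end up in the `BLOCKED` state. Apart from the polling loop used to observe them, no thread is using CPU, yet none of the contenders can make progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class IdleButConstrainedDemo {
    // One thread holds a lock while idle; `contenders` threads then try to enter the same lock.
    // Returns how many of them were observed in the BLOCKED state.
    static int blockedWhileLockIsHeld(int contenders) {
        Object lock = new Object();
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockTaken.countDown();
                awaitQuietly(release); // holds the lock while waiting: no computation here
            }
        });
        holder.start();
        awaitQuietly(lockTaken);

        List<Thread> waiting = new ArrayList<>();
        for (int i = 0; i < contenders; i++) {
            Thread t = new Thread(() -> {
                synchronized (lock) {
                    // short critical section; the cost is the wait to get here
                }
            });
            waiting.add(t);
            t.start();
        }

        // Poll (bounded) until every contender is parked on the lock; this loop is the only busy thread
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        int blocked;
        while ((blocked = countBlocked(waiting)) < contenders && System.nanoTime() < deadline) {
            Thread.onSpinWait();
        }

        release.countDown(); // the holder leaves; contenders then pass one at a time
        joinQuietly(holder);
        waiting.forEach(IdleButConstrainedDemo::joinQuietly);
        return blocked;
    }

    private static int countBlocked(List<Thread> threads) {
        int n = 0;
        for (Thread t : threads) {
            if (t.getState() == Thread.State.BLOCKED) n++;
        }
        return n;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```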


Fine-grained vs coarse-grained synchronization

Synchronization may be:

  • coarse-grained (few locks, large critical sections)
  • fine-grained (many locks, smaller critical sections)

Trade-offs:

  • coarse-grained → simpler but higher contention
  • fine-grained → more scalable but more complex

Choosing between the two models depends on workload characteristics, access patterns, and the cost of added design complexity.
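A common fine-grained variant is lock striping. State is split into stripes, each protected by its own lock, so threads that touch different stripes do not contend. The `StripedCounter` class is a hypothetical sketch. With one stripe it behaves like a single coarse-grained lock. More stripes can reduce contention, but they add complexity: for example, `total()` is no longer an atomic snapshot while updates are running.

```java
import java.util.ArrayList;
import java.util.List;

class StripedCounter {
    private final Object[] locks;
    private final long[] counts;

    // stripes == 1 behaves like a single coarse-grained lock
    StripedCounter(int stripes) {
        locks = new Object[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    void increment(Object key) {
        int stripe = Math.floorMod(key.hashCode(), locks.length);
        synchronized (locks[stripe]) { // only threads that map to the same stripe contend here
            counts[stripe]++;
        }
    }

    // Locks one stripe at a time: correct once updates have stopped,
    // but not an atomic snapshot while other threads are still incrementing
    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }

    // Runs `threads` threads that each increment their own key `perThread` times.
    static long run(int stripes, int threads, int perThread) {
        StripedCounter counter = new StripedCounter(stripes);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            String key = "key-" + t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment(key);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.total();
    }
}
```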


Java perspective (example)

In Java, synchronization may be implemented using synchronized blocks:

// lock is any object shared by every thread that must be mutually excluded
synchronized (lock) {
    // critical section
}

Or explicit locks:

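import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// The same Lock instance must be shared by all threads (e.g. stored in a final field)
Lock lock = new ReentrantLock();

lock.lock();
try {
    // critical section
} finally {
    lock.unlock(); // always release, even if the critical section throws
}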

If many threads attempt to enter the same critical section:

  • contention increases
  • threads block
  • performance degrades

This example highlights how a correctness mechanism may become a scalability constraint under load.


Symptoms of contention

Typical indicators include:

  • increasing response time under load
  • low CPU utilization with high latency
  • threads in blocked or waiting states
  • long queues on shared resources

These symptoms often appear before total saturation and may be mistaken for other resource problems if not analyzed carefully.


Practical implications

Contention limits scalability.

Even with:

  • sufficient CPU
  • adequate memory

A system may fail to scale if:

  • threads spend time waiting instead of executing

Reducing contention often has a greater impact than optimizing individual operations.

This is especially true for systems whose performance is constrained by shared access rather than by pure computation.


Contention contributes to increased latency, reduced throughput, and queue buildup.

Contention is therefore both a local synchronization phenomenon and a system-level performance mechanism.


Practical interpretation

Concurrency increases opportunities for useful overlap, but it also increases competition for shared resources.

The practical challenge is not simply to add more threads, but to ensure that additional concurrency produces useful work rather than additional waiting.


Key idea

Concurrency introduces the need for synchronization.

Synchronization can introduce contention.

Contention limits performance.

Understanding and controlling contention is essential for scalable systems.


1.6.4 Common concurrency issues

Concurrency introduces complexity.

When multiple threads interact, incorrect assumptions or poor coordination may lead to specific classes of problems.

These problems often appear under load and may severely affect performance and correctness.

Many of them are difficult to reproduce in superficial tests because they depend on timing, scheduling, or resource pressure.


1.6.4.1 Race conditions

Definition

A race condition occurs when multiple threads access shared data without adequate synchronization, and the result depends on timing.

The outcome is therefore not deterministic and may vary from one execution to another.


Example

Two threads update a shared counter:

  • Thread A reads value = 10
  • Thread B reads value = 10
  • Thread A writes 11
  • Thread B writes 11

Expected result: 12
Actual result: 11

The final value depends on the order in which unsynchronized operations are executed.
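The lost update above can be reproduced in Java; the names in this sketch are hypothetical. Several threads increment two counters:

- a plain `int` field with `count++`, which is a separate read, add, and write;
- an `AtomicInteger` with `incrementAndGet()`, which performs the update as one atomic step.

The plain counter may end below the expected total, and its exact value varies between runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RaceConditionDemo {
    private static int unsafeCount; // plain field: count++ is read, add 1, write
    private static final AtomicInteger safeCount = new AtomicInteger();

    // Runs `threads` threads that each increment both counters `perThread` times.
    // Returns {unsafe result, atomic result}.
    static int[] run(int threads, int perThread) {
        unsafeCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafeCount++;               // unsynchronized: concurrent updates can be lost
                    safeCount.incrementAndGet(); // atomic: no update is lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] {unsafeCount, safeCount.get()};
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("expected 400000, unsafe = " + r[0] + ", atomic = " + r[1]);
    }
}
```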


Impact

  • incorrect results
  • inconsistent system state
  • bugs difficult to reproduce

Race conditions may also corrupt internal assumptions in ways that appear only later under load.


Performance relevance

Race conditions may not always cause visible errors, but:

  • they often require additional synchronization
  • improper fixes may introduce contention

This is one of the reasons why correctness and performance cannot be treated as completely separate concerns in concurrent systems.


1.6.4.2 Deadlock

Definition

A deadlock occurs when two or more threads wait indefinitely for each other.

Each thread holds a resource while waiting for another resource held by a different thread in the cycle.

As a consequence, progress stops completely.


Example

  • Thread A holds lock L1 and waits for L2
  • Thread B holds lock L2 and waits for L1

Neither can proceed any further.

This circular waiting pattern is the defining characteristic of deadlock.


Impact

  • the system stalls
  • requests are never completed
  • resources remain locked

Deadlocks are especially severe because they turn active resources into permanently blocked resources.


Detection

  • threads remain blocked
  • thread dumps show circular waiting

Deadlocks are often detected through thread analysis rather than through general performance metrics.
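Besides thread dumps, the JVM can report deadlocks programmatically through `ThreadMXBean.findDeadlockedThreads()` from `java.lang.management`. The method returns the IDs of deadlocked threads, or `null` if there are none.

The sketch below uses hypothetical names. It deliberately recreates the L1/L2 example and then detects it. The two threads are daemon threads, so the stuck pair does not keep the JVM alive.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

class DeadlockDetectionDemo {
    // Recreates the L1/L2 example, then polls the JVM's deadlock detector.
    // Returns the number of deadlocked threads reported (0 if none within 5 seconds).
    static int createAndDetectDeadlock() {
        Object l1 = new Object();
        Object l2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread a = new Thread(() -> lockBoth(l1, l2, bothHoldFirstLock), "thread-A");
        Thread b = new Thread(() -> lockBoth(l2, l1, bothHoldFirstLock), "thread-B");
        a.setDaemon(true); // the deadlocked pair never finishes; daemon threads don't block JVM exit
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(10)); // brief pause between checks
        }
        return 0;
    }

    // Takes `first`, waits until the other thread has taken its first lock, then requests `second`.
    private static void lockBoth(Object first, Object second, CountDownLatch ready) {
        synchronized (first) {
            ready.countDown();
            try {
                ready.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: each thread waits for the lock the other one holds
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + createAndDetectDeadlock());
    }
}
```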


1.6.4.3 Livelock

Definition

A livelock occurs when threads are not blocked but continuously change state in response to one another without making progress.

Unlike deadlock, activity continues, but useful work does not.


Example

Two threads repeatedly retry an operation:

  • both detect a conflict
  • both retry at the same time
  • the conflict persists

The system remains active, but the conflict may repeat indefinitely.


Impact

  • CPU time is consumed
  • no useful work is completed

Livelocks may therefore look like active processing even though effective progress is zero.
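A livelock is timing-dependent and hard to reproduce reliably with real threads. This sketch therefore uses a deterministic, single-threaded model in which all names are hypothetical. Two workers attempt an operation. Whenever both attempt it in the same tick, both detect the conflict and schedule a retry. With identical retry timing, the conflict repeats on every tick. The second call changes only the retry timing, to show that the lack of progress comes from the workers retrying in lockstep.

```java
class LivelockSimulation {
    // Two workers attempt an operation at tick 0 and, after each conflict, retry
    // `retryDelay` ticks later. A lone attempt succeeds; simultaneous attempts conflict.
    // Returns {conflicts, completedWorkers} after `ticks` ticks.
    static int[] simulate(int retryDelayA, int retryDelayB, int ticks) {
        int nextA = 0;
        int nextB = 0;
        boolean doneA = false;
        boolean doneB = false;
        int conflicts = 0;
        for (int t = 0; t < ticks; t++) {
            boolean aTries = !doneA && t == nextA;
            boolean bTries = !doneB && t == nextB;
            if (aTries && bTries) {
                conflicts++;             // both detect the conflict and back off...
                nextA = t + retryDelayA; // ...then schedule their next retry
                nextB = t + retryDelayB;
            } else if (aTries) {
                doneA = true;            // a lone attempt succeeds: useful work
            } else if (bTries) {
                doneB = true;
            }
        }
        int completed = (doneA ? 1 : 0) + (doneB ? 1 : 0);
        return new int[] {conflicts, completed};
    }

    public static void main(String[] args) {
        int[] lockstep = simulate(1, 1, 1_000);
        // same retry timing: conflicts=1000, completed=0
        System.out.println("same retry timing: conflicts=" + lockstep[0] + ", completed=" + lockstep[1]);
        int[] offset = simulate(1, 2, 1_000);
        // different timing: conflicts=1, completed=2
        System.out.println("different timing:  conflicts=" + offset[0] + ", completed=" + offset[1]);
    }
}
```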


1.6.4.4 Starvation

Definition

Starvation occurs when some threads are unable to obtain resources for a prolonged period.

Other threads continue to execute while some are effectively ignored.

This means that the system is making progress, but not in a fair or predictable way for all work.


Causes

  • unfair scheduling
  • high-priority threads dominating execution
  • resource monopolization

Starvation is especially problematic when a subset of requests experiences extreme latency while the rest of the system appears functional.
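The "high-priority threads dominating execution" cause can be modeled deterministically, with hypothetical names and no real threads. A strict-priority scheduler runs one task per tick. A steady stream of high-priority work means the single low-priority task is never selected, even though the scheduler is busy the whole time.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class StarvationSimulation {
    private static final int LOW = 1;
    private static final int HIGH = 10;

    // Strict-priority scheduler running exactly one task per tick.
    // Returns the tick at which the low-priority task runs, or -1 if it never runs.
    static int lowPriorityStartTick(int ticks, boolean steadyHighPriorityLoad) {
        // Each ready task is represented only by its priority; highest is served first
        PriorityQueue<Integer> ready = new PriorityQueue<>(Collections.reverseOrder());
        ready.add(LOW); // one low-priority task, ready from tick 0
        for (int tick = 0; tick < ticks; tick++) {
            if (steadyHighPriorityLoad) {
                ready.add(HIGH); // a new high-priority task arrives every tick
            }
            int next = ready.poll(); // always the highest-priority ready task
            if (next == LOW) {
                return tick; // the low-priority task finally runs
            }
        }
        return -1; // starved: never scheduled within the observed window
    }

    public static void main(String[] args) {
        System.out.println("without high-priority load: " + lowPriorityStartTick(1_000, false)); // 0
        System.out.println("with high-priority load:    " + lowPriorityStartTick(1_000, true));  // -1
    }
}
```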


Impact

  • some requests experience very high latency
  • the system appears partially functional
  • tail latency increases

This makes starvation particularly relevant both from a performance and a user-experience perspective.


1.6.4.5 Thread pool exhaustion

Definition

Thread pool exhaustion occurs when all threads in a pool are busy and incoming tasks must wait.

This is one of the most common concurrency-related bottlenecks in real systems.


Causes

  • blocking operations within threads
  • insufficient pool size
  • long-running tasks

These causes may exist independently or reinforce each other under increasing load.


Effects

  • the request queue grows
  • latency increases
  • throughput may degrade

If saturation continues, thread pool exhaustion may also contribute to timeouts, retries, and instability in upstream components.
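Here is a minimal sketch of exhaustion caused by blocking operations, using hypothetical names. Every task waits on a latch that stands in for a stalled downstream dependency. All pool threads are therefore occupied without making progress, and the remaining tasks accumulate in the queue. With `poolSize` threads and `tasks` submissions, `tasks - poolSize` tasks are left queued.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {
    // Submits `tasks` blocking tasks to a pool of `poolSize` threads and returns
    // how many tasks are waiting in the queue while every thread is blocked.
    static int queuedWhileBlocked(int poolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch slowDependency = new CountDownLatch(1); // hypothetical stalled dependency
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    slowDependency.await(); // the thread is occupied but makes no progress
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int queued = pool.getQueue().size(); // work that cannot start: every thread is blocked
        slowDependency.countDown();          // the dependency recovers; the backlog drains
        pool.shutdown();
        return queued;
    }

    public static void main(String[] args) {
        System.out.println("queued tasks: " + queuedWhileBlocked(4, 10)); // queued tasks: 6
    }
}
```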


Thread pool exhaustion is a direct example of queueing and saturation.

It therefore constitutes one of the clearest practical expressions of the system behaviors introduced in the previous chapter.


Key idea

Concurrency issues are not only correctness problems.

They are also performance problems.

Many performance degradations are caused by:

  • contention
  • blocking
  • coordination failures

Understanding these issues is essential for diagnosing real systems.