1.7 – Runtime and memory model

This chapter explains how managed runtimes organize memory, allocate objects, reclaim memory that is no longer used, and behave under memory pressure.

The focus is on runtime and memory mechanisms that directly influence latency, stability, and throughput under load.

Understanding these mechanisms is essential because many performance problems are not caused only by CPU or I/O limits, but by the way memory is allocated, maintained, and reclaimed over time.

1.7.1 Memory structure (heap, stack)

Memory management models

Different systems use different memory management strategies.

Two common approaches are:

  • manual memory management
    Memory is explicitly allocated and freed by the programmer (e.g. C, C++)

  • managed memory
    Memory is allocated automatically and reclaimed by the runtime (e.g. Java, .NET)

This guide focuses on managed memory systems, where:

  • objects are allocated dynamically
  • memory is reclaimed automatically by the runtime's garbage collector, typically running on one or more dedicated threads

This distinction is important because performance behavior changes significantly depending on whether the memory lifecycle is controlled directly by the programmer or indirectly by the runtime.


Definition

Memory is organized into different regions with clearly distinct roles.

The two most important areas for the performance discussion are:

  • heap
  • stack

These two regions support different aspects of program execution and have very different performance implications.


Heap

The heap is a shared memory area used for dynamic allocation.

In managed runtimes (such as Java):

  • objects are allocated on the heap
  • memory is managed by the runtime
  • garbage collection reclaims unused objects

Implications:

  • memory usage grows with the allocation rate
  • garbage collection impacts performance
  • shared access may introduce contention

The heap is therefore not only a storage area, but a central factor in runtime behavior under load.


Stack

Each thread has its own stack.

The stack stores:

  • method calls (call frames)
  • local variables
  • intermediate values

Characteristics:

  • private to each thread
  • grows and shrinks during execution
  • typically much smaller than the heap

Because the stack is private to the thread, access is simple and efficient, but the number of threads directly affects total stack memory usage.


Heap vs stack

| Aspect        | Heap                  | Stack                    |
|---------------|-----------------------|--------------------------|
| Scope         | Shared across threads | Private per thread       |
| Allocation    | Dynamic (objects)     | Automatic (method calls) |
| Lifetime      | Managed by runtime    | Tied to method execution |
| Performance   | More complex          | Very fast                |
| Memory impact | Global                | Per-thread               |

Interaction with threads

Each thread:

  • has its own stack
  • shares the heap

This creates a model in which:

  • execution is isolated per thread (stack)
  • data is shared across threads (heap)

This interaction is a source of:

  • contention (shared objects)
  • coordination overhead

It also explains why concurrency and memory behavior are closely related in managed runtimes.
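A minimal sketch of this model, assuming nothing beyond the standard library (the class and method names are illustrative): each worker counts in a local variable that lives on its own stack, then publishes the result to one shared object on the heap.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SharedHeapDemo {
    // Runs `threads` workers; each counts privately, then adds its result to a shared counter.
    static long run(int threads, int incrementsPerThread) {
        // One heap object, visible to every thread that holds a reference to it.
        AtomicLong shared = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Local variable: lives in this thread's own stack frame,
                // so updating it needs no coordination.
                long local = 0;
                for (int i = 0; i < incrementsPerThread; i++) {
                    local++;
                }
                // Shared state: the update must be coordinated across threads.
                shared.addAndGet(local);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.get();
    }
}
```

Only the single `addAndGet` call per worker touches shared state. Keeping most of the work in local variables is one common way to limit contention on heap objects.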


Performance implications

Heap:

  • excessive allocation → increased GC activity
  • large heap with many live objects → longer garbage collection cycles
  • shared access → potential contention

Stack:

  • many threads → higher total memory usage (one stack per thread)
  • deep call chains → increased stack usage
  • stack overflow → failure in extreme cases

These implications become particularly important when the system is under sustained or high-concurrency load.
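The stack-side effects can be illustrated with a small sketch. The names are hypothetical, and the figures used afterwards (200 threads, 1 MiB of stack per thread) are assumptions chosen for illustration; actual stack sizes depend on the JVM, the platform, and configuration.

```java
final class StackUsageSketch {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse(); // unbounded recursion: every call adds a frame to the current thread's stack
    }

    // Returns how many nested calls fit before the JVM throws StackOverflowError.
    static int probeMaxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: this thread's stack is exhausted. Other threads are unaffected.
        }
        return depth;
    }

    // Rough upper bound on the memory reserved for thread stacks.
    static long estimatedStackBytes(int threadCount, long stackBytesPerThread) {
        return threadCount * stackBytesPerThread;
    }
}
```

Under these assumptions, `estimatedStackBytes(200, 1L << 20)` gives 209,715,200 bytes (200 MiB), which is memory that does not appear in heap metrics. The depth returned by `probeMaxDepth()` varies between runs and JVMs, so it is best read as an order of magnitude.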


Practical interpretation

Heap and stack are not only implementation details.

They influence:

  • how data is shared
  • how work is executed
  • how memory grows under concurrency
  • where runtime overhead appears

A system with many threads and frequent allocations stresses both regions differently: the stack through thread count and call depth, the heap through object creation and retention.


Key idea

The heap stores shared data.

The stack supports execution.

Performance depends on how these two interact under load.


Memory behavior directly affects concurrency and overall system behavior.

For this reason, the runtime and memory model cannot be analyzed separately from them.


1.7.2 Allocation and object lifecycle

Definition

In managed memory systems, objects are created dynamically and live for a certain period of time before being reclaimed by the runtime.

The way objects are allocated and how long they live has a direct impact on performance.

Allocation behavior therefore is not only a memory issue, but also a latency and stability issue.


Allocation

Allocation is the process of creating new objects in memory.

In most managed runtimes:

  • allocation happens on the heap
  • it is designed to be fast and efficient
  • it occurs very frequently in typical applications

Examples of allocation:

  • creation of request objects
  • building data structures
  • processing intermediate results

In high-throughput systems, allocation is often continuous and closely tied to workload intensity.


Allocation rate

The allocation rate is the amount of memory allocated per unit of time.

It is a key performance factor.

A high allocation rate means:

  • more objects created
  • increased memory churn
  • increased pressure on the runtime

Even if each individual allocation is fast, a large volume of allocations adds up to measurable work for the runtime.

This is one of the reasons why “fast allocation” does not automatically mean “low memory overhead.”
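A back-of-the-envelope estimate makes the effect concrete. The sketch below is an illustration only: the helper names and all input figures are assumptions, not measurements.

```java
final class AllocationRateEstimate {
    // Bytes allocated per second = requests/s * objects per request * average bytes per object.
    static long bytesPerSecond(long requestsPerSecond, long objectsPerRequest, long avgBytesPerObject) {
        return requestsPerSecond * objectsPerRequest * avgBytesPerObject;
    }

    // Seconds until an allocation area of the given size is full at that rate.
    static double secondsToFill(long areaBytes, long bytesPerSecond) {
        return (double) areaBytes / bytesPerSecond;
    }
}
```

With 2,000 requests per second, 500 objects per request, and 64 bytes per object, the estimate is 64,000,000 bytes per second. At that rate a hypothetical 256,000,000-byte allocation area fills in 4 seconds, so reclamation work recurs every few seconds even though each allocation is cheap.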


Object lifecycle

Objects do not all live for the same duration.

Typical categories include:

  • short-lived objects
    created and discarded quickly (e.g. temporary request data)

  • medium-lived objects
    survive for some time during processing

  • long-lived objects
    remain in memory for extended periods (e.g. cache, shared state)

Understanding object lifetime is essential for reasoning about memory behavior.

This characteristic determines how much memory remains active over time and how the runtime must organize reclamation work.
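The difference between lifetimes can be sketched in a few lines. The class below is hypothetical and deliberately simplified (it is not thread-safe and its cache is unbounded); it only shows where each kind of object comes from.

```java
import java.util.HashMap;
import java.util.Map;

final class LifetimeDemo {
    // Long-lived: referenced from a static field, so the map and its entries
    // stay reachable for as long as the class is loaded.
    private static final Map<String, String> CACHE = new HashMap<>();

    static String handle(String user) {
        // Short-lived: the builder is referenced only from this call frame
        // and becomes unreachable as soon as the method returns.
        StringBuilder sb = new StringBuilder();
        sb.append("Hello, ").append(user).append('!');
        String greeting = sb.toString();

        // Storing the result in the cache makes this String long-lived.
        CACHE.put(user, greeting);
        return greeting;
    }

    static int cachedEntries() {
        return CACHE.size();
    }
}
```

The same request produces both kinds of object: the builder is garbage immediately, while the cached greeting adds to the memory that remains active.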


Allocation patterns

Real systems tend to exhibit patterns such as:

  • many short-lived objects per request
  • occasional long-lived objects
  • bursts of allocation under load

These patterns determine:

  • memory usage
  • garbage collection behavior
  • performance stability

Allocation patterns are often more informative than isolated allocation events, because the runtime reacts to aggregate behavior over time.


Impact on performance

Allocation itself is usually fast.

The main impact comes from:

  • increased memory usage
  • pressure on garbage collection

A high allocation rate can lead to:

  • more frequent garbage collection cycles
  • increased latency
  • unpredictable pauses

The important point is that memory cost is often indirect: the system pays not only for creating objects, but for managing the consequences of creating many objects.


Under load

As load increases:

  • more requests are processed
  • more objects are created
  • allocation rate increases

This amplifies:

  • memory pressure
  • garbage collection activity
  • latency variability

A system that is stable at low load may therefore become memory-sensitive as request volume rises, even if the logic of each request remains unchanged.


Interaction with concurrency

Allocation is often performed by multiple threads.

This can lead to:

  • contention on memory structures
  • increased coordination overhead
  • uneven memory usage patterns

In high-concurrency systems:

  • allocation rate grows with concurrency
  • memory becomes a shared bottleneck

This is one of the ways in which concurrency and memory behavior reinforce each other under load.


Practical implications

To reason about performance, it is important to consider:

  • how many objects are created per request
  • how long they live
  • how allocation rate changes under load

Understanding allocation is essential to:

  • explain latency behavior
  • identify bottlenecks
  • predict system limits

It also helps distinguish between problems caused by computation and problems caused by memory churn.


Practical interpretation

Allocation is often invisible at code level because it is easy to write and generally inexpensive per operation.

However, at system level, repeated allocation changes the runtime workload.

A design that creates large quantities of temporary objects may work correctly, but still impose significant pressure on the memory subsystem.
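As an illustration, the two hypothetical helpers below produce the same string but generate different amounts of temporary data. How large the difference is in practice depends on the JVM version and its optimizations, so this is a sketch of the pattern rather than a benchmark.

```java
final class TemporaryObjects {
    // Repeated concatenation: each += produces a new String,
    // so every earlier intermediate result becomes garbage.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // Same output, but appended into a single growable buffer.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }
}
```

Both versions are correct and read almost identically in code review, which is why this kind of churn tends to be noticed only when allocation is measured.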


Allocation and object lifetime directly influence:

  • garbage collection behavior (→ next section)
  • memory pressure
  • latency under load

They therefore constitute the causal basis of the runtime effects described in the rest of this chapter.


Key idea

Performance depends on how much memory is allocated and how long it is maintained.

Allocation patterns shape system behavior under load.


1.7.3 Garbage collection (conceptual)

Definition

Garbage collection (GC) is the process through which a managed runtime reclaims memory that is no longer in use.

Instead of requiring explicit deallocation, the runtime:

  • identifies unused objects
  • frees their memory
  • makes space available for new allocations

Garbage collection is one of the distinctive mechanisms of managed runtimes and one of the main ways in which memory behavior becomes visible in performance analysis.


Basic principle

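An object is eligible for collection when it is no longer reachable, that is, when no chain of references leads to it from the program's live roots (such as thread stacks and static fields). Objects that only reference each other, but cannot be reached from any root, are also eligible.

This means:

  • no reachable object or live variable references it
  • it cannot be accessed by the program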

The runtime periodically:

  • scans object references
  • identifies unreachable objects
  • reclaims their memory

This model allows automatic memory management, but also implies that reclamation work must be performed during program execution.
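The reachability rule can be modeled with a toy graph traversal. This is a conceptual sketch, not how any particular collector is implemented: objects are represented by hypothetical string names, and references by an adjacency map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ToyReachability {
    // Returns every node that can be reached from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> references, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (marked.add(node)) { // first visit: follow its outgoing references
                pending.addAll(references.getOrDefault(node, List.of()));
            }
        }
        return marked;
    }

    // Every node in the graph that was not marked is garbage.
    static Set<String> garbage(Map<String, List<String>> references, List<String> roots) {
        Set<String> all = new HashSet<>(references.keySet());
        all.removeAll(reachable(references, roots));
        return all;
    }
}
```

With the graph `root → a`, `a → b`, `b → a`, `x → y`, `y → x` and `root` as the only root, the garbage set is `{x, y}`: the two objects still reference each other, but nothing reachable refers to them.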


Allocation and reclamation cycle

Memory usage follows a cycle:

  1. objects are allocated
  2. objects become unused
  3. garbage collection reclaims memory

This cycle repeats continuously during execution.

The runtime therefore alternates between allocating new memory and reclaiming old memory, with overall behavior driven by allocation rate and retention patterns.


Java perspective (example)

In Java, object allocation is frequent and inexpensive.

For example:

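// Each iteration allocates a new String that nothing references afterwards.
for (int i = 0; i < 1_000_000; i++) {
    String s = new String("test"); // unreachable as soon as the iteration ends
}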

This code creates a large number of short-lived objects.

In a managed runtime:

  • these objects are allocated quickly on the heap
  • they become unreachable shortly after creation
  • garbage collection reclaims them

If such allocation patterns occur under load:

  • GC activity increases
  • memory pressure grows
  • latency may become unstable

The impact depends not on a single allocation, but on the allocation rate over time.

For this reason memory behavior must be analyzed as a pattern, not as an isolated operation.

Example: object retention

Objects that remain referenced are not collected.

List<String> cache = new ArrayList<>();

// Nothing is ever removed, so every element stays reachable through `cache`.
while (true) {
    cache.add(new String("data"));
}

In this case:

  • objects are continuously allocated
  • they are never released
  • memory usage grows over time

This leads to:

  • increased memory pressure
  • more expensive garbage collection cycles
  • potential system instability

This example illustrates the difference between temporary allocation churn and persistent retention.
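One common way to keep retention bounded is to cap the size of the structure that holds the references. The sketch below uses the standard `LinkedHashMap.removeEldestEntry` hook; the class name and the capacity are illustrative, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small LRU cache: once maxEntries is exceeded, the least recently used entry is dropped.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: eldest means least recently accessed
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a limit of 100, inserting 10,000 entries leaves exactly 100 in the map. Evicted entries lose their last reference and become eligible for collection, so memory usage stops growing.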

Cost of garbage collection

Garbage collection is not free.

It introduces overhead:

  • CPU time to analyze memory
  • pauses during collection (depending on GC strategy/policy)

The cost depends on:

  • allocation rate
  • number of active objects
  • memory size

In other words, GC cost depends not only on how much memory exists, but on how much memory is active and still reachable.
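On the JVM, this cost can be observed through the standard `java.lang.management` API. The helper below is a minimal sketch (the class name is illustrative). The reported time is an approximate accumulated figure and, depending on the collector, may include concurrent work rather than only pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCounters {
    // Total number of collections reported by all collectors since JVM start.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, summed over all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

Sampling both values at regular intervals and comparing the deltas shows how often collections occur and roughly how much time they consume during a given period.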


Stop-the-world effect

Some garbage collectors suspend application execution during certain phases of a collection.

During these pauses:

  • application threads are temporarily suspended
  • no application work is performed

Even short pauses can:

  • increase latency
  • affect tail response times (p95, p99)

This is one of the reasons why GC issues often appear first in percentile-based latency analysis rather than in averages.
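A small calculation shows why. The sketch below uses the nearest-rank percentile definition; the class name and the sample values are assumptions made for illustration.

```java
import java.util.Arrays;

final class LatencyStats {
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0.0);
    }

    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static long percentile(long[] samplesMs, int p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Take 100 requests where 98 complete in 10 ms and 2 are delayed to 200 ms by a hypothetical pause. The mean is 13.8 ms and the median stays at 10 ms, but p99 is 200 ms. The pause is barely visible in the average and dominates the tail.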


Generational behavior (conceptual)

Most modern runtimes use a generational approach.

Based on observation:

  • most objects are short-lived
  • few objects live for a long time

Memory is organized so that:

  • short-lived objects are collected frequently
  • long-lived objects are collected less often

This improves efficiency because reclaiming many short-lived objects is usually cheaper than repeatedly scanning long-lived data.
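The idea can be modeled with a toy simulation. This is a deliberately simplified sketch, not a description of a real collector: objects are represented only by the tick at which they stop being referenced, and every survivor of a minor collection is promoted immediately, whereas real runtimes typically apply more elaborate promotion rules.

```java
import java.util.ArrayList;
import java.util.List;

final class ToyGenerations {
    // Each entry is the tick at which that object stops being referenced.
    private final List<Long> young = new ArrayList<>();
    private final List<Long> old = new ArrayList<>();

    void allocate(long unreachableAtTick) {
        young.add(unreachableAtTick);
    }

    // Minor collection: examines only the young generation.
    // Dead objects are reclaimed; survivors are promoted to the old generation.
    int minorCollect(long now) {
        int reclaimed = 0;
        for (long deathTick : young) {
            if (deathTick <= now) {
                reclaimed++;
            } else {
                old.add(deathTick);
            }
        }
        young.clear();
        return reclaimed;
    }

    // Major collection: a minor collection plus a pass over the old generation.
    int majorCollect(long now) {
        int reclaimed = minorCollect(now);
        int before = old.size();
        old.removeIf(deathTick -> deathTick <= now);
        return reclaimed + (before - old.size());
    }

    int youngSize() {
        return young.size();
    }

    int oldSize() {
        return old.size();
    }
}
```

If nine objects die at tick 1 and one lives until tick 100, a minor collection at tick 2 reclaims nine objects and promotes one. Later minor collections never look at the promoted object again, which is where the saving comes from.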


Under load

As load increases:

  • allocation rate increases
  • garbage collection runs more frequently

This can lead to:

  • higher CPU usage
  • more frequent pauses
  • increased latency variability

Under significant load, GC may therefore shift from a background maintenance mechanism to a visible part of the system’s performance behavior.


Interaction with object lifecycle

Garbage collection behavior depends on:

  • how many objects are created
  • how long they live

Typical patterns:

  • many short-lived objects → frequent collections
  • many long-lived objects → heavier collections

For this reason allocation and retention must be analyzed together: object count alone is not sufficient.


Observable effects

Garbage collection issues often appear as:

  • latency spikes
  • tail latency degradation (p95/p99)
  • periodic pauses
  • increased CPU usage without clear cause

These symptoms are often intermittent, which makes GC-related problems difficult to diagnose without correlating memory and latency signals.


Practical implications

Performance analysis must consider:

  • allocation rate
  • object lifetime distribution
  • frequency and cost of GC cycles

Optimization typically focuses on:

  • understanding allocation patterns
  • reducing unnecessary object creation
  • controlling memory pressure

Collector tuning may help, but it is usually more effective to first understand why the runtime is under pressure.


Practical interpretation

Garbage collection is not a bug or an anomaly.

It is a necessary runtime mechanism.

The performance question is not whether GC exists, but whether its operating cost remains compatible with the workload and latency objectives of the system.


Garbage collection is directly linked to allocation rate, object lifetime, and memory pressure.

It is therefore both a runtime mechanism and a system-level contributor to performance variability.


Key idea

Garbage collection enables automatic memory management but introduces variability.

Performance depends on how efficiently memory is reclaimed.


1.7.4 Memory pressure and performance

Definition

Memory pressure refers to the stress placed on the memory system when allocation, retention, and reclamation interact under load.

It concerns not only how much memory is used, but how memory is managed and behaves over time.

Memory pressure is therefore a dynamic condition, not simply a static measure of heap occupancy.


What creates memory pressure

Memory pressure is driven by a combination of factors:

  • high allocation rate
  • large number of active objects
  • long object lifetimes
  • inefficient memory reclamation

These factors reinforce each other and determine how much work the runtime must perform to keep memory usable.


Allocation vs retention

Two different patterns can create pressure:

  • high allocation rate
    many objects are created and quickly discarded

  • high retention
    objects remain in memory for long periods

These patterns create pressure in different ways.

High allocation rate increases churn and collection frequency.

High retention increases the amount of memory that remains active and must be scanned or preserved.


Example: high allocation rate

// High churn: one million short-lived Strings, none of them retained.
for (int i = 0; i < 1_000_000; i++) {
    String s = new String("test");
}

Characteristics:

  • many short-lived objects
  • frequent allocation
  • frequent garbage collection

Effects:

  • increased GC activity
  • CPU overhead
  • potential latency spikes

This example highlights pressure driven by churn rather than by long-term retention.


Example: memory retention

List<String> cache = new ArrayList<>();

// Unbounded growth: the loop ends only when the JVM throws OutOfMemoryError.
while (true) {
    cache.add(new String("data"));
}

Characteristics:

  • objects are retained
  • memory usage continuously grows

Effects:

  • increased heap usage
  • heavier garbage collection cycles
  • eventual instability or failure

This example highlights pressure driven by retained memory rather than by the frequency of temporary allocation alone.


Under load

As system load increases:

  • more requests are processed
  • more objects are created
  • more objects are retained

This leads to:

  • increased allocation rate
  • increased memory usage
  • increased GC activity

Memory pressure amplifies:

  • latency variability
  • tail latency

For this reason memory-related degradation often becomes more visible when the system moves from moderate load to sustained high load.


Interaction with garbage collection

Garbage collection responds to memory pressure.

Under pressure:

  • collections become more frequent
  • pauses may increase
  • CPU usage grows

In extreme cases:

  • GC dominates execution
  • useful work decreases

When this happens, the runtime spends a significant share of its time on memory management rather than on application work.


Observable symptoms

Memory pressure often appears as:

  • latency spikes without a clear CPU bottleneck
  • tail latency degradation (p95, p99)
  • periodic pauses
  • increased GC frequency
  • growing memory usage over time

These symptoms are especially important because they can be mistaken for generic slowness unless memory behavior is examined directly.
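A simple way to start examining memory behavior directly on the JVM is to sample heap occupancy with the standard `Runtime` API. The helper below is a minimal sketch with an illustrative class name.

```java
final class HeapSnapshot {
    // Bytes currently occupied on the heap: live objects plus garbage not yet collected.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Used heap as a fraction of the maximum heap the JVM will attempt to use.
    static double usedFraction() {
        return (double) usedBytes() / Runtime.getRuntime().maxMemory();
    }
}
```

A single sample says little, because it includes garbage that has not been collected yet. The useful signal is the trend across many samples: a sawtooth that returns to the same baseline suggests churn, while a baseline that keeps rising suggests retention.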


Practical intuition

A system may appear:

  • lightly loaded (moderate CPU)
  • but still slow

This often indicates:

  • memory pressure
  • GC-related overhead

This is one of the main reasons why CPU alone is not sufficient to assess system health.


Simplified model

System behavior can be approximated as:

  • allocation rate ↑ → GC activity ↑
  • retention ↑ → memory usage ↑
  • GC activity ↑ → latency variability ↑

These relationships are not linear.

They depend on runtime strategy, workload shape, object lifetimes, and the amount of active data.


Practical implications

To manage memory pressure:

  • understand allocation patterns
  • identify long-lived objects
  • monitor GC behavior
  • correlate memory metrics with latency

Optimization should focus on:

  • reducing unnecessary allocations
  • controlling object lifetime
  • avoiding unbounded retention

In many cases, the most effective solution is not collector tuning, but reducing the memory work that the runtime is forced to perform.
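As one small example of reducing the memory work itself, the two hypothetical methods below compute the same sum. Whether the boxed version actually allocates on every iteration depends on the JVM and its optimizations, so the comments describe what can happen rather than what is guaranteed.

```java
final class BoxingDemo {
    // Boxed accumulator: each += unboxes, adds, and boxes the result again,
    // which can allocate a new Long per iteration once values leave the small-value cache.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: the running total is a plain local variable, not a heap object.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }
}
```

The results are identical, and the only difference is how many temporary objects the runtime may have to create and reclaim along the way.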


Memory pressure contributes to latency variability, tail latency degradation, and reduced throughput.

It is therefore a direct bridge between runtime internals and visible system behavior under load.


Practical interpretation

Memory pressure explains why a system may degrade even when it is not evidently CPU-bound or externally blocked.

A runtime under stress at the memory level may appear active, but produce increasing latency, reduced throughput, and unstable behavior.

This makes memory pressure one of the most important hidden causes of performance degradation in managed runtimes.


Key idea

Memory pressure derives from the interaction between allocation, retention, and garbage collection under load.

Understanding this interaction is essential to explain latency and stability problems in real systems.