Ryzen Under Pressure: How CPU Resource Contention Degrades PostgreSQL Performance

22/05/2026

Database performance is about far more than TPS numbers and low average response times. In real-world environments, it is often the latency spikes hidden beneath seemingly healthy metrics that destabilize applications, trigger difficult-to-diagnose issues, and ultimately reduce service reliability. In this article, we show why running PostgreSQL on the same host as the backend can become a costly architectural trap, and how the “Noisy Neighbor” effect impacts system stability under load. Based on real-world tests conducted on AMD Ryzen processors, we examine how PostgreSQL behaves when competing for CPU resources, exposing the limitations of commonly used monitoring metrics and demonstrating why isolating the data layer often delivers greater benefits than raw hardware power alone.

In the era of widespread virtualization and containerization, striving for maximum hardware utilization has become standard practice. Often, to save resources or simplify the architecture, we decide to run a database engine (e.g., PostgreSQL) on the same host where the backend application operates.

Theoretically, this is an ideal solution: we eliminate network latency, benefit from instant connections via Unix Sockets, and simplify the technology stack. In practice, however, we expose the database to a phenomenon known as the “Noisy Neighbor”.

What is a Noisy Neighbor?

It is a situation where one process (in our case, the backend application) aggressively consumes shared resources – CPU cycles, L3 cache, or memory bus bandwidth – negatively impacting the performance of other processes running on the same physical host.

For a database that requires predictable response times to handle transactions, being a “neighbor” to an active application can mean a drastic degradation of service quality, which is not visible in simple average CPU load statistics.

Purpose of this Article

In this article, based on tests conducted on AMD Ryzen processors, we will demonstrate:

  1. Why a local database connection with an application is a “performance trap” under load.
  2. How competition for the processor affects the stability of p99 latency.
  3. Why database isolation (horizontal scaling) is crucial for maintaining a high SLA standard, even if it involves using theoretically weaker hardware.

We will go through hard data, from an ideal idle state (0% Load) to extreme resource saturation, exposing the weaknesses of the most popular monitoring metrics along the way.

Test Environment and Methodology

To ensure the results are reliable and repeatable, the test environment was standardized for performance, despite differences in the hardware layer. The key element of the methodology was to create a situation where the computing power of both virtual machines was comparable in idle states.

Hardware Infrastructure

The tests were conducted on two nodes of a Proxmox cluster with different processor characteristics:

  • Node 1 (Faster): AMD Ryzen 7 8745HS (Zen 4 architecture, 4nm), 32 GB DDR5 RAM.
  • Node 2 (More Stable): AMD Ryzen 7 5825U (Zen 3 architecture, 7nm), 64 GB DDR4 RAM.

Both machines were connected via a 1 Gbps switch, simulating standard network conditions in small and medium-sized infrastructures.

Virtual Machine Configuration (Software)

Identical virtual machines running Rocky Linux 9.6 were launched on both nodes. The VM parameters were selected to minimize virtualization overhead:

  • CPU: 4 cores (type: host), to ensure direct access to processor instructions.
  • RAM: 8096 MB.
  • Disk: 32 GB SSD (local storage).
  • Database: PostgreSQL 17 (default configuration, without aggressive optimization, to highlight the impact of processor resources).

Computing Power Normalization

Proxmox mechanisms (cpulimit=1.8, cpuunits=1024) were applied to the faster node to equalize its performance with the weaker unit. Verification using sysbench cpu confirmed that nearly identical synthetic performance was achieved:

  • Machine 26: 10709 events per second.
  • Machine 27: 10685 events per second.
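
As a quick sanity check on the normalization, the two sysbench scores can be compared directly. This is a minimal sketch that uses only the numbers reported above; the function name is illustrative, and the reading of cpulimit as "cores of CPU time" is stated as an assumption about the Proxmox setting.

```python
def relative_gap_percent(a: float, b: float) -> float:
    """Gap between two benchmark scores, as a percentage of the lower one."""
    low, high = sorted((a, b))
    return (high - low) / low * 100.0


# sysbench cpu results reported above (events per second)
machine_26 = 10709
machine_27 = 10685

gap = relative_gap_percent(machine_26, machine_27)
print(f"Gap between nodes: {gap:.2f}%")

# Assumption: cpulimit is expressed in cores of CPU time, so 1.8 on a
# 4-vCPU VM caps the guest at a fraction of its nominal capacity.
cpulimit, vcpus = 1.8, 4
print(f"CPU time cap on the faster node: {cpulimit / vcpus:.0%}")
```

A gap well below one percent supports treating the two VMs as equivalent in the idle state.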

Load Scenarios (Baseline vs. Stress)

The main goal of the study was to examine how increasing CPU load from an external application (a “noisy neighbor”) affects database parameters. For this purpose, three levels of background load generated by the stress-ng tool were defined:

  • 0% Load: A database performance test with no additional processes loading the host. This allows for determining the theoretical maximum performance under ideal conditions.
  • 30% Load: Simulation of a standard backend application sharing resources with the database. The following command was used: stress-ng --cpu 4 --cpu-load 30 --vm 1 --vm-bytes 256M --vm-keep --vm-hang 0
  • 40% Load: A high-load scenario testing the system’s stability limits. The following command was used: stress-ng --cpu 4 --cpu-load 40 --vm 1 --vm-bytes 256M --vm-keep --vm-hang 0
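
The two stress-ng invocations differ only in the --cpu-load value, so the background load can be parameterized. The sketch below only builds the argument list quoted above; the helper name is hypothetical and nothing is executed unless you pass the list to subprocess yourself.

```python
import shlex


def stress_ng_command(cpu_load_percent: int, workers: int = 4,
                      vm_bytes: str = "256M") -> list[str]:
    """Build the stress-ng argument list used for one background load level."""
    return [
        "stress-ng",
        "--cpu", str(workers),            # one CPU stressor per VM core
        "--cpu-load", str(cpu_load_percent),
        "--vm", "1",                      # a single memory stressor
        "--vm-bytes", vm_bytes,
        "--vm-keep",
        "--vm-hang", "0",
    ]


for load in (30, 40):
    print(shlex.join(stress_ng_command(load)))
```

Keeping the command in one place makes it harder for the 30% and 40% runs to drift apart in anything other than the load level.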

Measurement Methodology and Connection Configurations:

All tests were automated using a custom script, executed with commands in the format ./test.sh "SCENARIO" 20 60 (where “20” represented the number of pgbench threads, and “60” the test duration in seconds). A duration of 60 seconds was considered fully representative after conducting earlier, 300-second validation tests that yielded convergent results. For each scenario, throughput was examined as a function of an increasing number of parallel clients (connections): 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, and 100.

Regardless of the scenario, the traffic generator (pgbench) and the load-generating application (stress-ng) were always run on machine .26.

  • A0 (Local Direct): The database was run on the same machine as the client (.26). Communication occurred locally, using the highly efficient Unix Sockets.
  • A2 (Remote Bouncer): The database was moved to a dedicated machine (.27). The client from machine .26 connected over the network to a pgbouncer process running in front of the database on machine .27.
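
The test.sh script itself is not reproduced here, so the following is only a sketch of how the client sweep could be expressed. The pgbench options used (-h, -p, -c, -j, -T) are standard; the socket directory, the host name db-host-27, port 6432 for pgbouncer, and the database name bench are assumptions made for illustration, not values from the tests.

```python
import shlex

# Client counts used in the tests
CLIENT_COUNTS = [2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100]

# Hypothetical connection targets; substitute the real values.
TARGETS = {
    "A0": {"host": "/var/run/postgresql", "port": 5432},  # Unix socket directory
    "A2": {"host": "db-host-27", "port": 6432},           # pgbouncer in front of PostgreSQL
}


def pgbench_command(scenario: str, clients: int, threads: int = 20,
                    seconds: int = 60, dbname: str = "bench") -> list[str]:
    """Build one pgbench invocation for a scenario and client count."""
    target = TARGETS[scenario]
    return [
        "pgbench",
        "-h", target["host"],
        "-p", str(target["port"]),
        "-c", str(clients),
        # Precaution: never ask for more worker threads than clients.
        "-j", str(min(threads, clients)),
        "-T", str(seconds),
        dbname,
    ]


for scenario in TARGETS:
    for clients in CLIENT_COUNTS:
        print(shlex.join(pgbench_command(scenario, clients)))
```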

Results: Transactions Per Second (TPS)

Database performance measured by the number of transactions per second (TPS) is often the first indicator that administrators look at. In our tests, with increasing CPU load from an external application, the TPS results reveal the non-linear nature of the throughput degradation.

Throughput Comparison (Baseline vs. Load)

Analyzing the results from the 0% Load, 30% Load, and 40% Load tests, we see drastic differences in the resilience of both scenarios to increasing background load.

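Scenario            | TPS (0% Load) | TPS (30% Load) | TPS (40% Load) | Decrease (0% -> 40%)
--------------------|---------------|----------------|----------------|---------------------
A0 (Local Direct)   | 5615.76       | 3785.25        | 3726.01        | -33.65%
A2 (Remote Bouncer) | 4442.78       | 2352.46        | 1708.40        | -61.55%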

Figure: TPS comparison. The chart shows TPS performance for different load levels. Notice how the remote model (A2) drastically loses throughput at 40% load, which is due to CPU saturation on the application host.
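
The "Decrease" column can be reproduced from the raw TPS values. A minimal sketch using only the figures from the TPS table above:

```python
# TPS per background load level (percent), copied from the table above
tps = {
    "A0 (Local Direct)": {0: 5615.76, 30: 3785.25, 40: 3726.01},
    "A2 (Remote Bouncer)": {0: 4442.78, 30: 2352.46, 40: 1708.40},
}


def percent_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


for scenario, by_load in tps.items():
    for load in (30, 40):
        change = percent_change(by_load[0], by_load[load])
        print(f"{scenario}: 0% -> {load}% load: {change:+.2f}%")
```

The same calculation for the 30% level shows that A0 loses most of its throughput in the first step and then flattens, while A2 keeps falling between 30% and 40%.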

Conclusions from the TPS Results Analysis

  1. Local Model (A0): Despite resource competition, it maintains a relatively high throughput (3726.01 TPS) thanks to the low communication overhead of the Unix Socket. However, as we will see in the next section, this comes at the cost of severe latency instability.
  2. Remote Model (A2): The drop to 1708.40 TPS is a critical finding. As the application host becomes CPU-saturated, the bottleneck moves from the database to the client: pgbench on host .26 is throttled by the “noisy neighbor” (stress-ng) and can no longer generate enough traffic to saturate the isolated database on host .27.
  3. Client-Side Bottleneck: The lower A2 throughput therefore reflects a shortage of free CPU cycles on the application host rather than a weakness of the database, which remains stable and ready for more load, provided the client can deliver it. Resource management on the application tier matters even when the database is remote.

The Average Latency Trap – p95 and p99 Analysis

Most monitoring systems focus on Average Latency. Our tests show that under resource contention, the average is an extremely misleading indicator because it “masks” the real problems faced by end-users.

Figure: Comparison of average latency. Although the local model (A0) appears faster, this is only part of the truth about the system.
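
To see how an average can hide tail problems, consider a purely synthetic sample (not measured data): 98% of queries answer in 5 ms and 2% stall for 500 ms. The sketch uses a simple nearest-rank percentile; the helper name is illustrative.

```python
from statistics import mean


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceiling division without floats
    return ordered[rank - 1]


# Synthetic sample, not measured data: 980 fast queries and 20 stalled ones.
latencies_ms = [5.0] * 980 + [500.0] * 20

print(f"avg = {mean(latencies_ms):.1f} ms")
print(f"p95 = {percentile(latencies_ms, 95):.1f} ms")
print(f"p99 = {percentile(latencies_ms, 99):.1f} ms")
```

The average and p95 both look healthy, yet one request in fifty takes half a second. Only the p99 line exposes it.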

Full Latency Results Breakdown (A0 vs A2)

Scenario            | Load [%] | Avg Latency [ms] | p95 Latency [ms] | p99 Latency [ms]
--------------------|----------|------------------|------------------|-----------------
A0 (Local Direct)   | 0%       | 17.20            | 71.54            | 98.34
A0 (Local Direct)   | 30%      | 25.53            | 89.78            | 164.56
A0 (Local Direct)   | 40%      | 25.82            | 93.30            | 173.94
A2 (Remote Bouncer) | 0%       | 22.50            | 70.52            | 97.23
A2 (Remote Bouncer) | 30%      | 42.50            | 91.63            | 103.58
A2 (Remote Bouncer) | 40%      | 58.48            | 88.73            | 167.53

Analysis of “Tail” Latency Stability

The most important conclusion from the above comparison is how p99 latency grows with load:

  1. Resilience to Spikes (0% -> 30% Load): In the test with a high number of connections (j=100), adding 30% application load raised p99 latency in the local model (A0) by 66.22 ms (from 98.34 ms to 164.56 ms). In the remote model (A2), p99 latency increased by only 6.35 ms (from 97.23 ms to 103.58 ms).
  2. Critical Point (40% Load): At 40% application host load, p99 latency in the remote model also rose sharply, reaching 167.53 ms, close to the 173.94 ms measured in the local model.

Figure: p95 latency comparison. The chart shows that even for 95% of all queries, the differences between the local and remote models begin to blur under high load.

Figure: p99 latency comparison (logarithmic scale). It clearly shows how the local model (A0) rapidly loses stability under high load (40% Load), while the remote model (A2) offers much more predictable response times.
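
The p99 deltas quoted in the analysis follow directly from the latency table. A minimal sketch with the table values (j=100) hard-coded; the dictionary layout and function name are illustrative:

```python
# (avg, p95, p99) latency in ms, keyed by (scenario, background load %),
# copied from the latency table above
latency = {
    ("A0", 0): (17.20, 71.54, 98.34),
    ("A0", 30): (25.53, 89.78, 164.56),
    ("A0", 40): (25.82, 93.30, 173.94),
    ("A2", 0): (22.50, 70.52, 97.23),
    ("A2", 30): (42.50, 91.63, 103.58),
    ("A2", 40): (58.48, 88.73, 167.53),
}


def p99_delta(scenario: str, from_load: int, to_load: int) -> float:
    """Growth of p99 latency (ms) between two background load levels."""
    return latency[(scenario, to_load)][2] - latency[(scenario, from_load)][2]


for scenario in ("A0", "A2"):
    for step in ((0, 30), (30, 40)):
        print(f"{scenario} p99 {step[0]}% -> {step[1]}%: {p99_delta(scenario, *step):+.2f} ms")
```

Splitting the growth into two steps shows where each model breaks: A0 takes almost its entire hit between 0% and 30%, while A2 takes it between 30% and 40%.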

CPU Utilization Analysis vs. Database Stability

Understanding the causes of p99 latency degradation requires looking at what happens directly inside the processor during the tests. A comparison of CPU performance for scenarios A0 and A2 at 30% and 40% load reveals critical differences in resource availability.

Scenario A0 (Local): Fighting for Every Cycle

In the local model (A0), the .26 machine must simultaneously handle the database, the stress-ng application, and the pgbench traffic generator.

Figure: CPU utilization on the .26 machine (A0) – 30% load. Note the almost complete lack of idle time (Idle < 3%). The system is operating at the edge of its capacity.

When the host load increases to 40% Load, the margin for error disappears completely:

Figure: CPU utilization on the .26 machine (A0) – 40% load. The processor is constantly saturated. Every database query must “fight” for processor time with the aggressive stress-ng process, resulting in a p99 latency of 173.94 ms.

Scenario A2 (Remote): Isolation as a Guarantee of Stability

In the remote model (A2), the load is distributed. The .26 machine handles the application, and the .27 machine is dedicated exclusively to the database.

However, the key to success lies in the load on the .27 machine (DB Host):

Figure: CPU utilization on the dedicated .27 machine (A2). The database has a significant portion of the resources for itself (large Idle area), which allows it to react instantly to queries and maintain low p99 latency (at the level of 3-10 ms for a low number of connections, which is impossible in scenario A0).

Conclusions from CPU Utilization Analysis

In our charts, p99 latency rises as Idle time disappears, and the relationship looks close to linear. Once Idle on the .26 machine drops below 5%, every process synchronization operation (locking) or context switch begins to generate large “tail” delays.
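
The Idle figure discussed here can be derived on any Linux host from two snapshots of the aggregate cpu line in /proc/stat. The sketch below is an assumption about how such a value could be computed, not the tooling used for the charts; the sample lines are invented for illustration.

```python
def idle_percent(before: str, after: str) -> float:
    """Idle share (percent) between two aggregate 'cpu' lines from /proc/stat."""
    def parse(line: str) -> tuple[int, int]:
        # Fields after the 'cpu' label: user nice system idle iowait irq softirq steal.
        # Guest time is already included in user, so the sum stops at steal.
        fields = [int(x) for x in line.split()[1:9]]
        return fields[3], sum(fields)

    idle_a, total_a = parse(before)
    idle_b, total_b = parse(after)
    return (idle_b - idle_a) / (total_b - total_a) * 100.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which aggregates all cores."""
    with open(path) as f:
        return f.readline()


# Invented snapshots, taken a moment apart on a nearly saturated host
before = "cpu  1000 0 500 8000 100 0 50 0 0 0"
after = "cpu  1900 0 560 8030 102 0 58 0 0 0"
print(f"Idle: {idle_percent(before, after):.1f}%")
```

On a live system, call read_cpu_line() twice with a short sleep in between and alert when the result stays below the 5% threshold mentioned above.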

Isolating the database on the .27 machine provides it with so-called Headroom – a reserve of computational power necessary to handle sudden latency spikes, which is impossible in the saturated local model A0. Even when the application host’s load increases to 40% Load, the dedicated machine still retains this power reserve, as illustrated by the chart below:

Figure: CPU utilization on the dedicated .27 machine (A2) with 40% application host load. The database, despite the increased query load, maintains an available Idle area, guaranteeing predictability and less response time degradation than in the shared scenario.

Comparative Analysis of Dedicated and Shared Resources

In database system optimization, processor selection is often based on raw performance benchmarks (CPU Mark). However, test data (30% Load) indicates significant differences in the behavior of the PostgreSQL engine depending on the degree of compute resource isolation, which redefines the concept of an “efficient host” in a production environment.

Results Comparison: Ryzen 8000 (Shared) vs. Ryzen 5000 (Dedicated)

Key conclusions come from comparing the A0 (Local Direct) and A2 (Remote Bouncer) scenarios under a constant load generated by an external application (stress-ng at 30%).

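Metric (j=2)     | A0 (Local – Ryzen 8000 + App) | A2 (Remote – Ryzen 5000 Dedicated)
-----------------|-------------------------------|-----------------------------------
p99 Latency      | 53.83 ms                      | 3.24 ms
Latency Variance | High (instability)            | Low (predictability)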

Despite using a newer architecture (Zen 4) in the A0 scenario, the p99 latency was over 16 times higher than on the older processor (Zen 3) operating in full isolation. This result indicates that network overhead (a physical 1 Gbps switch) is negligible compared to the overhead caused by unpredictable access to a shared processor.

Why Does a Local Socket Yield to Network Isolation? (Technical Analysis)

This phenomenon stems directly from the processor’s architecture and the way the operating system kernel manages processes.

“Cache Thrashing” in L3 Cache

Modern Ryzen processors base their performance on large and fast L3 caches. In the 8745HS model, this cache is a critical resource.

  • Problem: The stress-ng application, operating on large data structures, aggressively evicts PostgreSQL’s cache lines from the L3 cache.
  • Effect: After acquiring CPU cycles, the Postgres backend experiences a series of “cache misses.” The need to access RAM (latency on the order of 100 nanoseconds) instead of L3 (on the order of 10 nanoseconds) causes a sharp increase in the execution time of individual operations, which accumulates into high latency percentiles.
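
The effect of a falling L3 hit rate can be sketched with the usual weighted-average access time. The latencies below (10 ns for L3, 100 ns for DRAM) and both hit rates are round-number assumptions chosen for illustration, not values measured on the test hardware.

```python
def average_access_ns(l3_hit_rate: float, l3_ns: float = 10.0,
                      dram_ns: float = 100.0) -> float:
    """Expected latency of one memory access for a given L3 hit rate."""
    return l3_hit_rate * l3_ns + (1.0 - l3_hit_rate) * dram_ns


# Assumed hit rates: a quiet host versus one with a cache-hungry neighbor
quiet = average_access_ns(0.95)
noisy = average_access_ns(0.60)

print(f"quiet host: {quiet:.1f} ns per access")
print(f"noisy host: {noisy:.1f} ns per access ({noisy / quiet:.1f}x slower)")
```

Under these assumptions the same query code runs roughly three times slower per memory access, without any change in reported CPU utilization.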

Context Switching – Why Does the Processor Waste Time “Thinking About Work”?

From a developer’s perspective, PostgreSQL’s process model (one-process-per-connection) is very safe and clear – each database backend is a separate system process. However, at a large scale (in this case, 100 connections) and with a limited number of cores (4 cores in the VM), this model becomes a challenge for the operating system’s scheduler.

What Exactly is a Context Switch for a Postgres Process?

Imagine you are working on a difficult programming task (this is our Postgres process), but you only have one desk (a CPU core). Suddenly, the system decides it’s time for another application (stress-ng). To do this, the processor must perform a series of operations that provide no business value:

  1. Save Context: The processor must save the current content of all registers (including the stack pointer and instruction pointer) to memory. This freezes the state of the SQL query mid-execution.
  2. Clear the “Desk” (Potential TLB Flush): A particularly costly part of this process is the potential need to flush the Translation Lookaside Buffer (TLB). Because processes have isolated memory (Virtual Memory), switching from the database to the application often requires refreshing these memory management structures in the processor. It’s as if with every task change, you had to discard your personal address book (which stores the locations of your documents) and create a new one for the next task because the old addresses no longer apply.
  3. Restore Context: Loading the state of the stress-ng application and restoring its registers.

When stress-ng finishes its time slice, the entire process happens in reverse. With a 30% load and 100 connections, such operations occur thousands of times per second, and a powerful processor ends up spending a noticeable share of its time “shuffling papers”.
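
On Linux, the kernel exposes per-process context switch counters in /proc/<pid>/status, which makes this overhead observable for individual Postgres backends. The sketch below parses those two counters; the sample text is invented, and how you obtain backend PIDs is left open.

```python
def context_switches(status_text: str) -> dict[str, int]:
    """Extract the context switch counters from /proc/<pid>/status content."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counters[key] = int(value)
    return counters


def read_status(pid: int) -> str:
    """Read the status file of one process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()


# Invented excerpt of a status file for illustration
sample = "Name:\tpostgres\nvoluntary_ctxt_switches:\t1520\nnonvoluntary_ctxt_switches:\t9731\n"
print(context_switches(sample))
```

A backend whose nonvoluntary counter grows much faster than its voluntary one is being preempted by the scheduler rather than yielding on its own, which is the pattern described above.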

The “Lock Holder Preemption” Problem

This phenomenon best explains why p99 latency can jump by several tens of milliseconds with even the slightest load. In Postgres, processes often have to wait for each other, using lightweight locks (so-called LWLocks or Spinlocks) on structures in shared memory.

Disaster scenario:

  1. Backend A acquires a lock on an important data structure in shared memory. At that exact moment, the scheduler preempts Backend A and gives the core to the stress-ng application.
  2. Backend B (and 10 others) wants to acquire the same lock. But Backend A is “sleeping” in the processor queue, still holding the key to the lock.
  3. All other Postgres processes come to a halt. From the application’s perspective, the database “freezes” for several to several tens of milliseconds, waiting for Backend A to finally get its 100 microseconds of processor time to release the lock. This phenomenon helps explain why the p99 latency in the A0 model (30% Load) with 100 connections (j=100) is as high as 164.56 ms.
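
The scenario above can be turned into a toy model. It assumes the lock is handed over in FIFO order and that the holder is descheduled for 4 ms; both are simplifying assumptions for illustration, and only the 11 waiters and the 100 microsecond critical section come from the description.

```python
def waiter_delays_us(waiters: int, hold_us: int, preempted_us: int) -> list[int]:
    """Wait time (µs) of each queued backend, assuming FIFO lock handover."""
    # The first waiter gets the lock once the holder has been rescheduled
    # and finished its critical section; each later waiter adds one more hold.
    first_release = preempted_us + hold_us
    return [first_release + i * hold_us for i in range(waiters)]


# Hypothetical numbers: 11 waiting backends, a 100 µs critical section,
# and a holder that is descheduled for 4 ms.
undisturbed = waiter_delays_us(waiters=11, hold_us=100, preempted_us=0)
preempted = waiter_delays_us(waiters=11, hold_us=100, preempted_us=4000)

print(f"last waiter, holder running:   {undisturbed[-1] / 1000:.1f} ms")
print(f"last waiter, holder preempted: {preempted[-1] / 1000:.1f} ms")
```

Even in this simplified form, a single preemption adds its full duration to every queued backend, which is why the damage shows up in the tail percentiles rather than in the average.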

The “Tail Latency” Phenomenon

Sharing CPU resources directly translates to a so-called “long tail” of latency. While the average response time may seem acceptable, the p99 (representing the 1% slowest queries) exposes the system’s lack of predictability. In the A2 (Remote) scenario, thanks to dedicated cores, the system scheduler handles almost exclusively database processes, which allows for smooth lock releasing and maintaining low latency even under load.

Architectural Conclusions

Empirical data indicates that resource isolation is a more important factor for stabilizing a database than raw computing power. Moving the data layer to a separate host (A2):

  • Eliminated competition for the L3 cache.
  • Minimized the risk of backends being blocked by preempted processes (Lock Holder Preemption).
  • Enabled the maintenance of a stable p99 latency, which at low load (j=2) is almost 17 times lower than in the local model (3.24 ms vs. 53.83 ms), and at full saturation (j=100 and 30% background load) remains 60.98 ms lower than in the A0 scenario.
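
Both comparisons in the last point can be checked against the reported measurements. A short sketch using only values quoted in this article:

```python
# p99 latency in ms at 30% background load, as reported above
p99_j2 = {"A0": 53.83, "A2": 3.24}        # low load, j=2
p99_j100 = {"A0": 164.56, "A2": 103.58}   # full saturation, j=100

ratio_j2 = p99_j2["A0"] / p99_j2["A2"]
gap_j100 = p99_j100["A0"] - p99_j100["A2"]

print(f"j=2:   local p99 is {ratio_j2:.1f}x the remote p99")
print(f"j=100: local p99 is {gap_j100:.2f} ms above the remote p99")
```

The relative advantage of isolation is largest at low concurrency and shrinks to an absolute gap of a few tens of milliseconds once the client host itself is saturated.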

For system architects, the conclusion is clear: in scenarios dominated by contention for CPU resources, horizontal scaling (database isolation) offers a higher quality of service (QoS) and predictability than aggressive vertical scaling on a single host.

Summary and Recommendations for Architects

The conducted tests debunk the popular myth that adding raw computing power to a single host (Vertical Scaling) will always solve database performance problems. In mixed-workload environments (application + database), the real challenge is not the lack of CPU cycles, but the unpredictability of their allocation.

Main Conclusions from the Tests

  1. Isolation Trumps Raw Power: Even a theoretically weaker processor (AMD Ryzen 7 5825U) operating in isolation on a dedicated host offers significantly more stable response times (p99) than the latest unit (Ryzen 7 8745HS) sharing resources with the application.
  2. Average Latency is a Trap: The average response time (Average Latency) favors the local model but completely ignores drastic latency spikes (jitter), which in the A0 model reach 173.94 ms at 40% Load.
  3. Stability Breaking Point: At a moderate load (30% Load), the remote model (A2) experiences almost no increase in p99 latency (+6.35 ms), while the local model loses stability almost immediately (+66.22 ms).

When to Absolutely Separate Layers (App and DB)?

Based on the collected data, we recommend migrating the database to a separate host in the following cases:

  • When SLA stability is a priority: If a p99 latency above 100 ms is unacceptable for your application.
  • In environments with spiky CPU load: If the backend application performs periodic, heavy tasks (e.g., report generation, image processing) that destabilize the database’s operation.
  • With a high number of connections: Above 50-100 active connections (j), the overhead from process management (Context Switching) and the struggle for CPU cache become the dominant factors degrading the quality of service.

The Golden Architectural Rule

Despite the theoretical gains from the lack of network overhead, in a high-availability and high-quality-of-service (QoS) architecture, separation of the data layer is the foundation of predictability.

If your budget allows for a choice between one “powerful” server and two “medium” ones, and the main challenge is intensive and variable CPU load on the application side, our test data suggests choosing two smaller units. However, in systems limited by disk I/O or memory bandwidth, the architectural assessment must be carried out individually.
