Spanner PostgreSQL performance benchmarking

hengfeng · 01-15-2025 10:22 AM

Introduction

Spanner is Google’s always-on, virtually unlimited database that powers some of the most demanding operational workload behind brands, like Yahoo!, The Home Depot, Wayfair, and Pokémon Go as well as Gmail, YouTube, Photos within Google. Today Spanner handles over 4 billion queries per second at peak and more than 15 exabytes of data, with five 9s of multi-regional availability and global consistency.

Spanner’s PostgreSQL interface provides the scalability and reliability of Spanner that ops teams rely on with the familiarity and portability of PostgreSQL that developers love. It allows you to use PostgreSQL schemas, queries, and clients against Spanner’s distributed query processing, transactions, and storage. You can learn more in an earlier post, Understanding Spanner’s PostgreSQL Interface.

Here we’ll explain the latency profile of PostgreSQL interface databases, including PGAdapter, a lightweight proxy that allows applications to use off-the-shelf PostgreSQL drivers. We’ll address the two main questions we’ve heard from users building applications with Spanner:

What is the latency overhead of PGAdapter?
What is the latency difference between Spanner’s PostgreSQL and GoogleSQL dialects?

To answer these questions, we ran a series of benchmarks detailed below. To summarize our findings,

PGAdapter only adds minimal latency, especially as a percentage of the end-to-end latency of typical queries.
The PostgreSQL interface is as fast as GoogleSQL.

Background

TPC-C benchmarking. We chose TPC-C for our benchmarking, and modified Percona TPCC from Lua to Java for our setup. TPC-C uses random transactions that contain complex queries and updates, which provides a good approximation of a transaction processing workload.

Spanner. We benchmarked four approaches, three using various client drivers against a Spanner PostgreSQL interface database and a control against a GoogleSQL dialect database:

PGAdapter with the community PostgreSQL JDBC driver
Spanner’s JDBC driver
Spanner’s Java Client library
Spanner’s Java Client library against a GoogleSQL dialect database, as a control

We compare (1) and (2) to determine whether PGAdapter introduces any additional latency. Then, the comparison between (3) and (4) can be used to determine the performance differences between the same driver and queries running against PostgreSQL and GoogleSQL dialect databases.

Latency metrics. For benchmarking results, we focus on latency metrics. We rely on metrics detailed in Latency points in a Spanner request.

Below is an overview of our setup with PGAdapter and the PostgreSQL JDBC driver (PG JDBC) for evaluating whether PGAdapter introduces any additional latency:We check the following metrics:

query_latencies (6): the length of time it takes for a query to execute.
request_latencies (5): the difference between when the Spanner API Frontend receives the first byte of the request and when it sends the last byte of the response.
gfe_latencies (4): the length of time between when the Google Front End receives a remote call and receives the first byte of a response.
client_lib_latencies at PGAdapter (3): the time it takes between making a remote call to the client library and getting the response.
pgadapter’s roundtrip_latencies (2): the time it takes to process a PG JDBC query execution. This may involve multiple requests between PG JDBC and PGAdapter.
benchmarker’s roundtrip_latencies (1): the time it takes to execute a query and get its result at the client side.

Below is an overview of the benchmarking setup with Spanner JDBC (using PG dialect) for evaluating whether the PostgreSQL dialect introduces any additional latency compared to the GoogleSQL dialect:

For this approach, the latency metrics are very similar to the PGAdapter and PostgreSQL JDBC option. However, since we don't have PGAdapter, we record client_lib_latencies at Spanner JDBC. By using this metric, we can understand if the time from the benchmarker to the client library becomes higher after introducing PGAdapter.

If we use the client library directly (PostgreSQL or GoogleSQL), we measure the following metrics:

query_latencies
request_latencies
gfe_latencies
benchmarker’s roundtrip_latencies

Benchmark setup

Environment. We set up the benchmarking environment as follows:

Spanner cluster:

Name	Value
Spanner region	us-east4 (Northern Virginia)
Regional configuration	Single region
Nodes	3 nodes
Storage used	1 TB
Scaling mode	Manual
CPU utilization	Approximate 60%, 75%, and 90% CPU

Client VM:

Name	Value
VM instance type	n2-standard-16
vCPU	16
Memory in GB	64
Disk size in GB	10
OS	Debian GNU/Linux 11 (bullseye)
Platform	Intel Cascade Lake (x86/64)

TPCC:

Name	Value
Warehouses	10,000
Data size	~1TB
Threads	64, 96, and 128.

Benchmark results

Get benchmark results from Cloud Monitoring

We export all metrics to Cloud Monitoring to get benchmarking results. For example, we can get P50 latency data by running the following queries in Cloud Monitoring:

#1 PGAdapter does not contribute significantly to overall latency

We compare the benchmarker’s roundtrip_latencies of: (1) using PGAdapter (with the PG JDBC driver) and (2) using the Spanner JDBC driver, over different traffic loads - sending requests over 64, 96, and 128 client threads.

This diagram makes the comparison of approaches with and without PGAdapter. It checks P50 and P95 latencies on varying client connections from 64 to 128 threads. The y-axis is the end-to-end latency for a query or DML statement: using PGAdapter (with the PG JDBC driver) versus using the Spanner JDBC driver. Lower is better.

When the load is normal (64 threads), we can see the delta is very small (< 0.1ms). With increasing load (96 threads), the P50 difference is still small (0.11ms) and the P95 difference increases, which are caused by other factors explained below. With even more load (128 threads), PGAdapter has better latency compared to Spanner JDBC in the P50 difference, which indicates that PGAdapter is not a key factor of additional latencies and there are other significant factors.

Why does PGAdapter have better latency than Spanner JDBC on its own in some tests? The entire client stack only introduces a small amount of latency relative to the rest of the processing. As a result, these measures are susceptible to relatively small variations across tests, such as network delays between clients and servers.

In addition, the combination of PG JDBC and PGAdapter may have small advantages in specific scenarios. The PostgreSQL wire protocol supports named prepared statements, which means that a prepared SQL statement can be reused without having to send and parse the entire SQL string again. The Spanner JDBC driver can also largely use this, but needs to do one additional conversion that's not needed by PostgreSQL; converting JDB- style query parameters (?) to PostgreSQL-style query parameters ($1). This allows PGAdapter to use this cache more efficiently.

#2 PGAdapter latency is a very small percentage of overall latency

Queries, including updates, only spend a small amount of time in PGAdapter. The stacked bars below show the percentage of time that a query spends in each component of a round-trip latency. For example, the PGAdapter bar shows how much time PGAdapter occupies to process a statement and return a response, from receiving a request to calling the client library API and from receiving the client library’s response to returning to the PGAdapter’s caller. This equals pgadapter’s roundtrip_latencies minus client_lib_latencies. Similarly, the Spanner API Frontend bar represents request_latencies minus query_latencies for how much time a query/DML spends just in Spanner API Frontend.

The above diagram is collected from P50 latencies with 64 client threads. Compared to Spanner JDBC, PGAdapter contributes only 2.7% to the total latency. Other components like Client Library (19.5%), GFE (10.2%), Spanner API Frontend (31.5%) and Query (34.6%), contribute a much higher percentage. Those components can have variances due to network or other reasons which can impact the overall latency more than the overhead introduced by PGAdapter. PGAdapter only introduces a negligible amount of overhead because it runs in a single process locally with no complex query processing logic, which will not be affected by the unreliable network.

#3 Spanner PostgreSQL interface is as fast as GoogleSQL

In this section, we compare the results by running the same Spanner client library against equivalently configured PostgreSQL and GoogleSQL dialect databases. This test controls for the client to isolate the performance differences between the database dialects themselves. This test does not use PGAdapter. Similar to previous sections, we compare the differences of benchmarker’s roundtrip_latencies and the latency constitution.

In the above diagram, the y-axis shows the end-to-end latency for each dialect. Lower is better. The results show that there is no significant difference between the two dialects. The small variances can be the difference among different runs. In some cases, PostgreSQL performs better than GoogleSQL which shows that the query parsing only takes a small amount of time and other factors, such as the network delays between GFE and Spanner API Frontend or Spanner API Frontend and Spanner backend, can cause variances.

The following diagram shows the amount of time spent between each component (P50 latencies with 64 client threads). For example, Benchmarker represents the time of two trips: from the benchmarker starting a request to GFE receiving it, and from getting a GFE response to the benchmarker returning to its caller. GFE stands for the time between Google Frontend latency (measured in GFE) minus Spanner API Frontend latency (measured in Spanner API Frontend). The major portion of time is between when Spanner API Frontend receives the first request and returns a response which takes ~70% of the latency. The percentage of latency is very close between PostgreSQL and GoogleSQL in each component.

Overall, we can see that the PostgreSQL query parsing is not the major portion of latencies and does not add significant overhead.

How to reproduce the results

The TPCC benchmarking code is checked into the PGAdapter repository. To reproduce the results, we need to create a Spanner instance and run the following steps:

$ git clone git@github.com:GoogleCloudPlatform/pgadapter.git
$ cd benchmarks/tpcc
$ mvn clean spring-boot:run -Dspring-boot.run.arguments="
--tpcc.benchmark-duration=PT60s 
--tpcc.warehouses=1 
--tpcc.benchmark-threads=1 
--tpcc.load-data=true 
--tpcc.run-benchmark=true 
--tpcc.benchmark-runner=pgadapter 
--tpcc.use-read-only-transactions=true 
--spanner.project=my-project 
--spanner.instance=my-instance 
--spanner.database=tpcc 
--pgadapter.credentials=/path/to/credentials.json"

We can change tpcc.benchmark-runner to a different runner, such as, spanner_jdbc for Spanner JDBC (PG). More default configurations can be found in src/main/resources/application.properties.

A few tips for benchmarking large data sets:

Use a large number of Spanner nodes (e.g., 10 nodes) to load the data first. After the data loading, reduce it to a smaller size (e.g., 3 nodes) for benchmarking. After resizing, wait for a few hours before benchmarking.
Export the initial loaded database into Cloud Storage for backup. To perform new benchmarking, directly load the database from Cloud Storage using Dataflow. This is expected to be much faster.
Warm up the database before doing the actual benchmarking. Let the database be idle (no incoming requests) for a while between benchmark runs. For example, do a 30-min warm-up run, let the database be idle for 30 mins, and then perform a benchmark run (30 mins).

What’s next?

Spanner is a scale-out database that allows forward-looking organizations to modernize their data management. Spanner’s PostgreSQL interface provides the familiarity and portability of open-source PostgreSQL without having to compromise on scalability, availability, or performance, as we’ve demonstrated above.

Learn more about what makes Spanner unique and how it’s being used today. Or try it yourself for free for 90-days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive rearchitecture.