PostgreSQL 17 - A Major Step Forward in Performance, Logical Replication and More
After a successful third beta in August 2024, the PostgreSQL development group released the GA version of Postgres 17 on September 26th. Recently, I blogged about some of the key logical replication features that you'll see in PostgreSQL 17 (https://pgedges.com/blog/logical-replication-features-in-Postgres-17). In this blog I'll describe a couple of new performance features that you'll find in Postgres 17, as well as another important logical replication feature that I didn't cover in the earlier blog in this series.
PostgreSQL has grown remarkably over the years, and with each major release it has become a more robust, reliable, and responsive database for both mission-critical and non-mission-critical enterprise applications. The global and vibrant PostgreSQL community contributes to PostgreSQL's success, diligently ensuring that all changes are carefully scrutinized and reviewed before they are added to the project source code. It is also very encouraging to see big technology names like Microsoft, Google, and others investing in Postgres by developing in-house expertise and giving back to the open source community.
Improvements to logical replication are making it even more robust and reliable for enterprise use, while providing core capabilities that vendors like pgEdge can build on to deliver fully distributed PostgreSQL. Distributed PostgreSQL refers to the implementation of PostgreSQL in a distributed architecture, allowing for enhanced scalability, fault tolerance, and improved performance across multiple nodes. A pgEdge fully distributed PostgreSQL cluster already provides essential enterprise features like improved performance with low latency, ultra-high availability, data residency, and fault tolerance.
Now, without further ado, let's discuss some PostgreSQL 17 performance features:
Improved Query Performance with Materialized CTEs
Common Table Expressions (CTEs) in PostgreSQL are temporary result sets that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. They enhance the readability and organization of complex queries and can be recursive, making them particularly useful for hierarchical data. The basic syntax of a CTE query is as follows:
WITH cte_names AS
(
    -- query here
)
SELECT * FROM cte_names;
Include the WITH keyword in a query to create the CTE; the query that defines the result set follows the AS clause after the CTE name. After defining the CTE, you can refer to it by name to reference its result set and carry out further operations on that result set within the same query.
PostgreSQL 17 continues to enhance performance and capabilities around CTEs, including improvements in query planning and execution. Older versions of Postgres treated CTEs as optimization fences, meaning the planner could not push down predicates into them. From PostgreSQL 12 onward, however, non-recursive CTEs that are referenced only once are inlined into the parent query by default, which lets the planner produce more efficient execution plans. You should always analyze your queries and consider the execution plans when performance is critical.
Performance tip: If you will be referring to the same result set multiple times, create the CTE with the MATERIALIZED keyword. When you create a materialized CTE, Postgres computes and stores the result of the CTE's query once, so subsequent references to the CTE within the same statement aren't required to perform complex computations multiple times.
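For illustration, here is a minimal sketch of that pattern; the orders table and its columns are hypothetical:
-- Hypothetical table: orders(region text, amount numeric)
WITH regional_totals AS MATERIALIZED
(
    SELECT region, sum(amount) AS total
    FROM orders
    GROUP BY region
)
SELECT * FROM regional_totals WHERE total > 1000
UNION ALL
SELECT * FROM regional_totals WHERE total < 100;
-- Because the CTE is MATERIALIZED, the aggregation runs once and both
-- references reuse the stored result set.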
Extracting column statistics from CTE references: Postgres 17 improves materialized CTEs
A materialized CTE acts as an optimization fence, which means that the outer query won't influence the plan of the sub-query once that plan is chosen. The outer query does, however, have visibility into the estimated width and row counts of the CTE result set, so it makes sense to propagate the column statistics from the sub-query to the planner for the outer query. The outer query can then make use of whatever information is available: the column statistics propagate up to the outer query plan, but not down into the CTE plan.
This bug report to the community contains a simple test case that demonstrates the improvement and its effect on the query planner:
https://www.postgresql.org/message-id/flat/18466-1d296028273322e2%40postgresql.org
Example - Comparing Postgres 16 behavior to Postgres 17
First, we create our workspace in Postgres 16 (two tables, an index, and some sample data) and run ANALYZE against it:
postgres=# create table t1(a int);
CREATE TABLE
postgres=# create table t2(b int);
CREATE TABLE
postgres=# create index my_index on t1 using btree (a);
CREATE INDEX
postgres=# insert into t1 select generate_series(1, 100000) from generate_series(1, 3);
INSERT 0 300000
postgres=# insert into t2 select generate_series(1, 100) from generate_series(1, 10);
INSERT 0 1000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
Then, we run a query that uses a materialized CTE:
postgres=# explain analyze with my_cte as materialized (select b from t2)
select * from t1 where t1.a in (select b from my_cte);
The query plan from our Postgres 16 code sample contains:
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=37.92..856.50 rows=2966 width=4) (actual time=0.574..0.722 rows=300 loops=1)
CTE my_cte
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.038..0.161 rows=1000 loops=1)
-> HashAggregate (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
Group Key: my_cte.b
Batches: 1 Memory Usage: 40kB
-> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) (actual time=0.046..0.322 rows=1000 loops=1)
-> Index Only Scan using my_index on t1 (cost=0.42..4.06 rows=3 width=4) (actual time=0.002..0.002 rows=3 loops=100)
Index Cond: (a = my_cte.b)
Heap Fetches: 0
Planning Time: 1.242 ms
Execution Time: 1.051 ms
(12 rows)
As you can see in the query plan, the estimated row count of 200 for the sub-query is wrong (the actual row count is 100), which impacts the overall plan.
-> HashAggregate (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
Group Key: my_cte.b
Then, we test the same setup and query against PostgreSQL 17:
postgres=# explain analyze with my_cte as materialized (select b from t2)
select * from t1 where t1.a in (select b from my_cte);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=42.25..54.29 rows=302 width=4) (actual time=0.627..0.712 rows=300 loops=1)
Merge Cond: (t1.a = my_cte.b)
CTE my_cte
-> Seq Scan on t2 (cost=0.00..15.00 rows=1000 width=4) (actual time=0.031..0.134 rows=1000 loops=1)
-> Index Only Scan using my_index on t1 (cost=0.42..7800.42 rows=300000 width=4) (actual time=0.027..0.049 rows=301 loops=1)
Heap Fetches: 0
-> Sort (cost=26.82..27.07 rows=100 width=4) (actual time=0.598..0.604 rows=100 loops=1)
Sort Key: my_cte.b
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=22.50..23.50 rows=100 width=4) (actual time=0.484..0.494 rows=100 loops=1)
Group Key: my_cte.b
Batches: 1 Memory Usage: 24kB
-> CTE Scan on my_cte (cost=0.00..20.00 rows=1000 width=4) (actual time=0.033..0.324 rows=1000 loops=1)
Planning Time: 1.066 ms
Execution Time: 0.946 ms
(15 rows)
As you can see in the query plan for Postgres 17, the column statistics from the subquery are correctly propagating to the upper planner of the outer query. This helps PostgreSQL choose a better plan that improves the execution time of the query.
This is a simple query, but with bigger and more complex queries this change can make a major performance difference.
Propagating pathkeys from a CTE to an Outer Query
Another interesting improvement to CTE functionality in Postgres 17 is the propagation of pathkeys from the sub-query to the outer query. In PostgreSQL, pathkeys are a part of the query planning process used primarily for sorting and ordering rows in queries that require ordered results, such as queries with an ORDER BY clause, or when sorting is needed for other operations like merge joins.
Prior to Postgres 17, the sort order of a materialized CTE sub-query was not shared with the outer query, even when that order was guaranteed by an index scan node or a sort node. Without a guaranteed sort order, the PostgreSQL planner may fall back to a less efficient plan; with one, it is more likely to choose an optimized plan.
With PostgreSQL 17, if a CTE is materialized and has a specific sort order, the planner can reuse that information in the outer query, improving performance by avoiding redundant sorting or enabling more efficient join methods. As noted in the commit comments by Tom Lane, "The code for hoisting pathkeys into the outer query already exists for regular RTE_SUBQUERY subqueries, but it wasn't getting used for CTEs, possibly out of concern for maintaining an optimization fence between the CTE and the outer query."
This simple modification to the Postgres source code should result in performance improvements for queries involving complex CTEs, especially those where sorting or merge joins can be optimized based on the inherent order of CTE results.
Here is an example using the data from the PostgreSQL regression test suite:
postgres=# CREATE TABLE tenk1 (
postgres(# unique1 int4,
postgres(# unique2 int4,
postgres(# two int4,
postgres(# four int4,
postgres(# ten int4,
postgres(# twenty int4,
postgres(# hundred int4,
postgres(# thousand int4,
postgres(# twothousand int4,
postgres(# fivethous int4,
postgres(# tenthous int4,
postgres(# odd int4,
postgres(# even int4,
postgres(# stringu1 name,
postgres(# stringu2 name,
postgres(# string4 name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# VACUUM ANALYZE tenk1;
VACUUM
The query plan from our Postgres 16 code sample contains:
postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=764.29..764.30 rows=1 width=8) (actual time=21.592..21.593 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..306.29 rows=10000 width=4) (actual time=0.046..1.415 rows=10000 loops=1)
Heap Fetches: 0
-> Nested Loop (cost=225.28..445.50 rows=5000 width=0) (actual time=7.545..20.911 rows=10000 loops=1)
-> HashAggregate (cost=225.00..227.00 rows=200 width=4) (actual time=7.535..9.051 rows=10000 loops=1)
Group Key: x.unique1
Batches: 1 Memory Usage: 929kB
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.070..3.933 rows=10000 loops=1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..1.08 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=10000)
Index Cond: (unique1 = x.unique1)
Heap Fetches: 0
Planning Time: 0.806 ms
Execution Time: 21.890 ms
(14 rows)
The query plan from our Postgres 17 code sample contains:
postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=987.55..987.56 rows=1 width=8) (actual time=8.777..8.778 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..306.29 rows=10000 width=4) (actual time=0.010..1.095 rows=10000 loops=1)
Heap Fetches: 0
-> Merge Semi Join (cost=0.31..656.26 rows=10000 width=0) (actual time=0.037..8.024 rows=10000 loops=1)
Merge Cond: (a.unique1 = x.unique1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..306.29 rows=10000 width=4) (actual time=0.013..1.262 rows=10000 loops=1)
Heap Fetches: 0
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.016..3.678 rows=10000 loops=1)
Planning Time: 0.800 ms
Execution Time: 8.899 ms
(11 rows)
The query plans in Postgres 16 and Postgres 17 are significantly different due to this version 17 enhancement. This is a small example, but the performance gain can be significant in larger queries. Please note that this improvement is only effective if the CTE sub-query has an ORDER BY clause.
Faster B-tree index scans for scalar array operations
In PostgreSQL, ScalarArrayOpExpr is a node type in the execution plan that handles queries involving operations like IN or ANY with arrays or lists of values. It's particularly useful for queries where you compare a column against a set of values, such as:
SELECT * FROM table WHERE column = ANY(ARRAY[1, 2, 3]);
ScalarArrayOpExpr allows PostgreSQL to optimize queries that involve multiple comparisons that use IN or ANY. PostgreSQL 17 has introduced new performance enhancements to make these operations even faster.
In PostgreSQL 17, significant improvements have been made to B-tree index scans, which optimize performance, particularly for queries with large IN lists or ANY conditions. These enhancements reduce the number of index scans performed by the system, thereby decreasing CPU and buffer page contention and resulting in faster query execution.
One of the key improvements is in the handling of scalar array operation expressions (SAOPs), which allows more efficient traversal of B-tree indexes, particularly for multidimensional queries. For example, when you have multiple index columns (each with its own IN list), PostgreSQL 17 can now process these operations in a single index scan, rather than the multiple scans required in earlier versions. This can lead to performance gains of 20-30% in CPU-bound workloads where page accesses were previously a bottleneck.
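To illustrate the multi-column case, here is a minimal sketch; the orders table, its columns, and the index name are hypothetical, and whether a single index scan is used depends on the data and the plan the optimizer chooses:
-- Hypothetical composite index over two columns:
CREATE INDEX orders_status_region_idx ON orders USING btree (status, region);

-- Each index column has its own IN list; in Postgres 17 the B-tree scan can
-- step through all matching (status, region) combinations within one index
-- scan instead of starting a new descent for each combination:
SELECT * FROM orders
WHERE status IN ('shipped', 'pending')
  AND region IN ('us-east', 'us-west', 'eu-central');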
Additionally, PostgreSQL 17 introduces better management of internal locks, further enhancing performance for high-concurrency workloads, especially when scanning multiple dimensions within a B-tree index.
We can demonstrate this with a simple example. We'll use the same tenk1 table and data that we used in the previous example from the Postgres regression suite.
Our example, first run on Postgres 16:
postgres=# CREATE TABLE tenk1 (
postgres(# unique1 int4,
postgres(# unique2 int4,
postgres(# two int4,
postgres(# four int4,
postgres(# ten int4,
postgres(# twenty int4,
postgres(# hundred int4,
postgres(# thousand int4,
postgres(# twothousand int4,
postgres(# fivethous int4,
postgres(# tenthous int4,
postgres(# odd int4,
postgres(# even int4,
postgres(# stringu1 name,
postgres(# stringu2 name,
postgres(# string4 name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1 (cost=14.20..330.12 rows=176 width=244) (actual time=0.138..0.153 rows=3 loops=1)
Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=9
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..14.16 rows=176 width=0) (actual time=0.102..0.102 rows=3 loops=1)
Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Buffers: shared hit=6
Planning:
Buffers: shared hit=2
Planning Time: 0.900 ms
Execution Time: 0.242 ms
(11 rows)
postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
idx_scan | idx_tup_fetch
----------+---------------
3 | 3
(1 row)
In the previous query you can see that the shared buffer hits for the IN query totaled 9, and that it took 3 index scans to retrieve the results. In PostgreSQL, the term shared hit refers to a specific type of cache hit related to buffer management: a shared hit occurs when PostgreSQL reads a data block or page from the shared buffer pool rather than from disk, improving query performance.
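If you want to observe buffer hits outside of EXPLAIN, one way (a sketch, assuming default statistics settings) is to query the cumulative statistics views:
-- Cumulative buffer hits/reads for the table and its indexes
SELECT heap_blks_hit, heap_blks_read, idx_blks_hit, idx_blks_read
FROM pg_statio_user_tables
WHERE relname = 'tenk1';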
The same example, this time run on Postgres 17:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1 (cost=12.88..24.08 rows=3 width=244) (actual time=0.043..0.054 rows=3 loops=1)
Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=5
-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..12.88 rows=3 width=0) (actual time=0.026..0.026 rows=3 loops=1)
Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
Buffers: shared hit=2
Planning:
Buffers: shared hit=59
Planning Time: 0.479 ms
Execution Time: 0.116 ms
(11 rows)
postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
idx_scan | idx_tup_fetch
----------+---------------
1 | 3
(1 row)
As you can see, with Postgres 17 the shared buffer hits are reduced to 5, and most importantly only one index scan is performed (as opposed to 3 scans in Postgres 16). With this improvement in Postgres 17, the performance of scalar array operations is greatly improved, and Postgres can choose better optimized query plans.
Retention of logical replication slots and subscriptions during upgrade
The retention of logical replication slots and the migration of subscription dependencies during the major upgrade process is another logical replication feature added to PostgreSQL 17. Please note that this feature only helps when upgrading from PostgreSQL 17 to later versions; it is not supported for upgrades from versions prior to Postgres 17. Replication slots and replication origins are generated when building a logical replication environment, but this information is specific to the node (it records replication status, application status, and WAL transmission status), so it was not carried over as part of the upgrade process. Once the publisher node was upgraded, the user had to manually reconstruct these objects.
The pg_upgrade process is improved in PostgreSQL 17 to reference and rebuild these internal objects; this functionality enables replication to automatically resume when upgrading a node that has logical replication. Previously, when performing a major version upgrade, users had to drop logical replication slots, requiring them to re-synchronize data with the subscribers after the upgrade. This added complexity and increased downtime during upgrades.
You need to follow these steps when upgrading the publisher cluster (a sketch of the relevant commands appears below):

- Ensure any subscriptions to the publisher are temporarily disabled by performing ALTER SUBSCRIPTION ... DISABLE; they can be re-enabled after the upgrade process has completed.
- Set the new cluster's wal_level to logical.
- Set max_replication_slots on the new cluster to a value greater than or equal to the number of replication slots on the old cluster.
- Install the output plugins used by the slots in the new cluster.
- Make sure all the changes from the old cluster are already replicated to the target cluster prior to the upgrade.
- Make sure all slots on the old cluster are usable; you can verify this by checking the conflicting column in the pg_replication_slots view, which should be false for all slots on the old cluster.
- Make sure no slot in the new cluster has a value of false in the temporary column of the pg_replication_slots view; in other words, there should be no permanent logical replication slots in the new cluster.
pg_upgrade will report an error while upgrading the replication slots if any of the above prerequisites aren't met.
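Here is a minimal sketch of those checks and commands; the subscription name my_sub is hypothetical:
-- On the subscriber: disable subscriptions before upgrading the publisher
ALTER SUBSCRIPTION my_sub DISABLE;

-- On the old publisher: every logical slot should be usable
-- (conflicting should be false for all slots)
SELECT slot_name, conflicting
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- On the new cluster: there should be no permanent logical slots
-- (this query should return zero rows)
SELECT slot_name
FROM pg_replication_slots
WHERE slot_type = 'logical' AND NOT temporary;

-- After pg_upgrade completes: re-enable the subscriptions
ALTER SUBSCRIPTION my_sub ENABLE;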
Conclusion
With PostgreSQL 17, the community's focus continues to be on making PostgreSQL more performant, scalable, secure, and enterprise-ready. Postgres 17 also improves the developer experience by adding new features for compatibility and making existing features more powerful and robust.
The logical replication feature in PostgreSQL has rapidly grown and become more mature and robust over the last few years. pgEdge provides distributed PostgreSQL by adding significant new capabilities on top of the Postgres logical replication base; that base enables pgEdge to provide low latency, ultra-high availability, and data residency for enterprise applications. These are critical features that are a must-have for almost all enterprise applications today.
Beyond version 17, PostgreSQL will continue to grow, improve, and become more performant to cater to enterprise applications requiring more scalable databases. Scalability (both horizontal and vertical) has improved over the years, but there is definitely room to improve horizontal scalability by adding sharding capabilities to PostgreSQL. We will see more logical replication improvements, with more to come in the areas of DDL replication, replication of missing objects (like sequences), and better node management. The community also recognizes the need to make PostgreSQL more compatible with other database systems, hence the MERGE command improvements in Postgres 17 and plans for more compatibility features beyond Postgres 17.