[PATCH] Fix sequence value may rollback after CREATE DATABASE TEMPLATE with WAL_LOG strategy

Started by Daisuke Higuchi3 days ago1 messageshackers
Jump to latest
#1Daisuke Higuchi
higuchi.daisuke11@gmail.com

Hi

I found that "CREATE DATABASE ... TEMPLATE" with WAL_LOG strategy can
cause sequence values to rollback after a streaming replication failover.

When creating a new database with "CREATE DATABASE ... TEMPLATE" from
a template database that has sequences, calling nextval() on those
sequences in the new database causes the primary to have a higher
last_value than the standby temporarily.
When a failover occurs and the standby is promoted in this state, the
sequence value appears to have rollbacked on the new primary.
I found this issue in PostgreSQL version 15, but I've confirmed that it also
happens on the master branch.

I show the example as follows.
After creating a database with "CREATE DATABASE ... TEMPLATE",
the sequence initially has last_value = 5:

-- Initial state on both Primary and Standby:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
5 | 28 | t
(1 row)
=====

When I run nextval() 5 times, the values of primary and standby are
as follows. The values in the standby are not changed.

-- Primary:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
10 | 23 | t
(1 row)
=====

-- Standby:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
5 | 28 | t
(1 row)
=====

When I run nextval() in this state, it returns 6 even though 6 - 10
were already returned on the previous primary. It means that the
sequence rollback after a failover.
I understand that sequences cannot be used to obtain "gapless", but I
think it is an unintended behavior that sequence values rollback.
I attached the script to reproduce this issue and the patch to fix.

In the attached patch, during "CREATE DATABASE ... TEMPLATE" with
WAL_LOG strategy, the sequence's log_cnt is reset to 0 so that
SEQ_LOG_VALS (32) values are fetched in advance on the next nextval()
call. This ensures that the standby's last_value is greater than or
equal to the primary's last_value.
And, this patch opens sequence page using readBufferWithoutRelcache()
without relying on relcache entries and resets log_cnt. This follows
the same approach as ScanSourceDatabasePGClass().
Note that this issue does not occur with FILE_COPY strategy because
this performs a checkpoint internally and a WAL record to fetch
SEQ_LOG_VALS in advance is always emitted on the next nextval() call.
However, I could not apply the same approach because I understand the
WAL_LOG strategy is designed to avoid checkpoints.

Regards,
Daisuke, Higuchi

Attachments:

reproduce_script_sequence_rollback_issue.shapplication/x-sh; name=reproduce_script_sequence_rollback_issue.shDownload
v1_0002_fix_sequence_rollback_by_WAL_LOG_test.patchapplication/x-patch; name=v1_0002_fix_sequence_rollback_by_WAL_LOG_test.patchDownload+82-0
v1_0001_fix_sequence_rollback_by_WAL_LOG.patchapplication/x-patch; name=v1_0001_fix_sequence_rollback_by_WAL_LOG.patchDownload+61-0