[PATCH] Fix sequence value may rollback after CREATE DATABASE TEMPLATE with WAL_LOG strategy
Hi
I found that "CREATE DATABASE ... TEMPLATE" with WAL_LOG strategy can
cause sequence values to rollback after a streaming replication failover.
When creating a new database with "CREATE DATABASE ... TEMPLATE" from
a template database that has sequences, calling nextval() on those
sequences in the new database causes the primary to have a higher
last_value than the standby temporarily.
When a failover occurs and the standby is promoted in this state, the
sequence value appears to have rollbacked on the new primary.
I found this issue in PostgreSQL version 15, but I've confirmed that it also
happens on the master branch.
I show the example as follows.
After creating a database with "CREATE DATABASE ... TEMPLATE",
the sequence initially has last_value = 5:
-- Initial state on both Primary and Standby:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
5 | 28 | t
(1 row)
=====
When I run nextval() 5 times, the values of primary and standby are
as follows. The values in the standby are not changed.
-- Primary:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
10 | 23 | t
(1 row)
=====
-- Standby:
=====
test_db=# SELECT last_value, log_cnt, is_called FROM test_seq;
last_value | log_cnt | is_called
------------+---------+-----------
5 | 28 | t
(1 row)
=====
When I run nextval() in this state, it returns 6 even though 6 - 10
were already returned on the previous primary. It means that the
sequence rollback after a failover.
I understand that sequences cannot be used to obtain "gapless", but I
think it is an unintended behavior that sequence values rollback.
I attached the script to reproduce this issue and the patch to fix.
In the attached patch, during "CREATE DATABASE ... TEMPLATE" with
WAL_LOG strategy, the sequence's log_cnt is reset to 0 so that
SEQ_LOG_VALS (32) values are fetched in advance on the next nextval()
call. This ensures that the standby's last_value is greater than or
equal to the primary's last_value.
And, this patch opens sequence page using readBufferWithoutRelcache()
without relying on relcache entries and resets log_cnt. This follows
the same approach as ScanSourceDatabasePGClass().
Note that this issue does not occur with FILE_COPY strategy because
this performs a checkpoint internally and a WAL record to fetch
SEQ_LOG_VALS in advance is always emitted on the next nextval() call.
However, I could not apply the same approach because I understand the
WAL_LOG strategy is designed to avoid checkpoints.
Regards,
Daisuke, Higuchi