Logical Replication of sequences
In the past, we have discussed various approaches to replicate
sequences by decoding the sequence changes from WAL. However, we faced
several challenges to achieve the same, some of which are due to the
non-transactional nature of sequences. The major ones were: (a)
correctness of the decoding part, some of the problems were discussed
at [1]/messages/by-id/e4145f77-6f37-40e0-a770-aba359c50b93@enterprisedb.com[2]/messages/by-id/CAA4eK1Lxt+5a9fA-B7FRzfd1vns=EwZTF5z9_xO9Ms4wsqD88Q@mail.gmail.com[3]/messages/by-id/CAA4eK1KR4=yALKP0pOdVkqUwoUqD_v7oU3HzY-w0R_EBvgHL2w@mail.gmail.com (b) handling of sequences especially adding certain
sequences automatically (e.g. sequences backing SERIAL/BIGSERIAL
columns) for built-in logical replication is not considered in the
proposed work [1]/messages/by-id/e4145f77-6f37-40e0-a770-aba359c50b93@enterprisedb.com (c) there were some performance concerns in not so
frequent scenarios [4]/messages/by-id/12822961-b7de-9d59-dd27-2e3dc3980c7e@enterprisedb.com (see performance issues), we can probably deal
with this by making sequences optional for builtin logical replication
It could be possible that we can deal with these and any other issues
with more work but as the use case for this feature is primarily major
version upgrades it is not clear that we want to make such a big
change to the code or are there better alternatives to achieve the
same.
This time at pgconf.dev (https://2024.pgconf.dev/), we discussed
alternative approaches for this work which I would like to summarize.
The various methods we discussed are as follows:
1. Provide a tool to copy all the sequences from publisher to
subscriber. The major drawback is that users need to perform this as
an additional step during the upgrade which would be inconvenient and
probably not as useful as some built-in mechanism.
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.
Any other ideas?
I have added the members I remember that were part of the discussion
in the email. Please feel free to correct me if I have misunderstood
or missed any point we talked about.
Thoughts?
[1]: /messages/by-id/e4145f77-6f37-40e0-a770-aba359c50b93@enterprisedb.com
[2]: /messages/by-id/CAA4eK1Lxt+5a9fA-B7FRzfd1vns=EwZTF5z9_xO9Ms4wsqD88Q@mail.gmail.com
[3]: /messages/by-id/CAA4eK1KR4=yALKP0pOdVkqUwoUqD_v7oU3HzY-w0R_EBvgHL2w@mail.gmail.com
[4]: /messages/by-id/12822961-b7de-9d59-dd27-2e3dc3980c7e@enterprisedb.com
--
With Regards,
Amit Kapila.
Hi,
On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.
Thanks. IIUC, both of the above approaches decode the sequences during
only shutdown. I'm wondering, why not periodically decode and
replicate the published sequences so that the decoding at the shutdown
will not take that longer? I can imagine a case where there are tens
of thousands of sequences in a production server, and surely decoding
and sending them just during the shutdown can take a lot of time
hampering the overall server uptime.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jun 4, 2024 at 4:53 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.Thanks. IIUC, both of the above approaches decode the sequences during
only shutdown. I'm wondering, why not periodically decode and
replicate the published sequences so that the decoding at the shutdown
will not take that longer?
Even if we decode it periodically (say each time we decode the
checkpoint record) then also we need to send the entire set of
sequences at shutdown. This is because the sequences may have changed
from the last time we sent them.
I can imagine a case where there are tens
of thousands of sequences in a production server, and surely decoding
and sending them just during the shutdown can take a lot of time
hampering the overall server uptime.
It is possible but we will send only the sequences that belong to
publications for which walsender is supposed to send the required
data. Now, we can also imagine providing option 2 (Alter Subscription
... Replicate Sequences) so that users can replicate sequences before
shutdown and then disable the subscriptions so that there won't be a
corresponding walsender.
--
With Regards,
Amit Kapila.
On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.Any other ideas?
In case of primary crash the sequence won't get replicated. That is true
even with the previous approach in case walsender is shut down because of a
crash, but it is more serious with this approach. How about periodically
sending this information?
--
Best Wishes,
Ashutosh Bapat
On 6/4/24 06:57, Amit Kapila wrote:
1. Provide a tool to copy all the sequences from publisher to
subscriber. The major drawback is that users need to perform this as
an additional step during the upgrade which would be inconvenient and
probably not as useful as some built-in mechanism.
Agree, this requires additional steps. Not a preferred approach in my
opinion. When a large set of sequences are present, it will add
additional downtime for upgrade process.
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.
This is similar to option 1 except that it is a SQL command now. Still
not a preferred approach in my opinion. When a large set of sequences are
present, it will add additional downtime for upgrade process.
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.
At the time of shutdown a) most logical upgrades don't necessarily call
for shutdown b) it will still add to total downtime with large set of
sequences. Incremental option is better as it will not require a shutdown.
I do see a scenario where sequence of events can lead to loss of sequence
and generate duplicate sequence values, if subscriber starts consuming
sequences while publisher is also consuming them. In such cases, subscriber
shall not be allowed sequence consumption.
--
Kind Regards,
Yogesh Sharma
Open Source Enthusiast and Advocate
PostgreSQL Contributors Team @ RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jun 4, 2024 at 7:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.Any other ideas?
In case of primary crash the sequence won't get replicated. That is true even with the previous approach in case walsender is shut down because of a crash, but it is more serious with this approach.
Right, but if we just want to support a major version upgrade scenario
then this should be fine because upgrades require a clean shutdown.
How about periodically sending this information?
Now, if we want to support some sort of failover then probably this
will help. Do you have that use case in mind? If we want to send
periodically then we can do it when decoding checkpoint
(XLOG_CHECKPOINT_ONLINE) or some other periodic WAL record like
running_xacts (XLOG_RUNNING_XACTS).
--
With Regards,
Amit Kapila.
On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
<yogesh.sharma@catprosystems.com> wrote:
On 6/4/24 06:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.This is similar to option 1 except that it is a SQL command now.
Right, but I would still prefer a command as it provides clear steps
for the upgrade. Users need to perform (a) Replicate Sequences for a
particular subscription (b) Disable that subscription (c) Perform (a)
and (b) for all the subscriptions corresponding to the publisher we
want to shut down for upgrade.
I agree there are some manual steps involved here but it is advisable
for users to ensure that they have received the required data on the
subscriber before the upgrade of the publisher node, otherwise, they
may not be able to continue replication after the upgrade. For
example, see the "Prepare for publisher upgrades" step in pg_upgrade
docs [1]https://www.postgresql.org/docs/devel/pgupgrade.html.
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.At the time of shutdown a) most logical upgrades don't necessarily call
for shutdown
Won't the major version upgrade expect that the node is down? Refer to
step "Stop both servers" in [1]https://www.postgresql.org/docs/devel/pgupgrade.html.
b) it will still add to total downtime with large set of
sequences. Incremental option is better as it will not require a shutdown.
I do see a scenario where sequence of events can lead to loss of sequence
and generate duplicate sequence values, if subscriber starts consuming
sequences while publisher is also consuming them. In such cases, subscriber
shall not be allowed sequence consumption.
It would be fine to not allow subscribers to consume sequences that
are being logically replicated but what about the cases where we
haven't sent the latest values of sequences before the shutdown of the
publisher? In such a case, the publisher would have already consumed
some values that wouldn't have been sent to the subscriber and now
when the publisher is down then even if we re-allow the sequence
values to be consumed from the subscriber, it can lead to duplicate
values.
[1]: https://www.postgresql.org/docs/devel/pgupgrade.html
--
With Regards,
Amit Kapila.
On 04.06.24 12:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.
I would start with this. In any case, you're going to need to write
code to collect all the sequence values, send them over some protocol,
apply them on the subscriber. The easiest way to start is to trigger
that manually. Then later you can add other ways to trigger it, either
by timer or around shutdown, or whatever other ideas there might be.
On Wed, Jun 5, 2024 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
<yogesh.sharma@catprosystems.com> wrote:On 6/4/24 06:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.This is similar to option 1 except that it is a SQL command now.
Right, but I would still prefer a command as it provides clear steps
for the upgrade. Users need to perform (a) Replicate Sequences for a
particular subscription (b) Disable that subscription (c) Perform (a)
and (b) for all the subscriptions corresponding to the publisher we
want to shut down for upgrade.
Another advantage of this approach over just a plain tool to copy all
sequences before upgrade is that here we can have the facility to copy
just the required sequences. I mean the set sequences that the user
has specified as part of the publication.
--
With Regards,
Amit Kapila.
Hi,
On Tue, Jun 4, 2024 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Even if we decode it periodically (say each time we decode the
checkpoint record) then also we need to send the entire set of
sequences at shutdown. This is because the sequences may have changed
from the last time we sent them.
Agree. How about decoding and sending only the sequences that are
changed from the last time when they were sent? I know it requires a
bit of tracking and more work, but all I'm looking for is to reduce
the amount of work that walsenders need to do during the shutdown.
Having said that, I like the idea of letting the user sync the
sequences via ALTER SUBSCRIPTION command and not weave the logic into
the shutdown checkpoint path. As Peter Eisentraut said here
/messages/by-id/42e5cb35-4aeb-4f58-8091-90619c7c3ecc@eisentraut.org,
this can be a good starting point to get going.
I can imagine a case where there are tens
of thousands of sequences in a production server, and surely decoding
and sending them just during the shutdown can take a lot of time
hampering the overall server uptime.It is possible but we will send only the sequences that belong to
publications for which walsender is supposed to send the required
data.
Right, but what if all the publication tables can have tens of
thousands of sequences.
Now, we can also imagine providing option 2 (Alter Subscription
... Replicate Sequences) so that users can replicate sequences before
shutdown and then disable the subscriptions so that there won't be a
corresponding walsender.
As stated above, I like this idea to start with.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jun 5, 2024 at 12:51 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 04.06.24 12:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.I would start with this. In any case, you're going to need to write
code to collect all the sequence values, send them over some protocol,
apply them on the subscriber. The easiest way to start is to trigger
that manually. Then later you can add other ways to trigger it, either
by timer or around shutdown, or whatever other ideas there might be.
Agreed. To achieve this, we can allow sequences to be copied during
the initial CREATE SUBSCRIPTION command similar to what we do for
tables. And then later by new/existing command, we re-copy the already
existing sequences on the subscriber.
The options for the new command could be:
Alter Subscription ... Refresh Sequences
Alter Subscription ... Replicate Sequences
In the second option, we need to introduce a new keyword Replicate.
Can you think of any better option?
In addition to the above, the command Alter Subscription .. Refresh
Publication will fetch any missing sequences similar to what it does
for tables.
Thoughts?
--
With Regards,
Amit Kapila.
On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 4, 2024 at 7:40 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Tue, Jun 4, 2024 at 4:27 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.Any other ideas?
In case of primary crash the sequence won't get replicated. That is true
even with the previous approach in case walsender is shut down because of a
crash, but it is more serious with this approach.Right, but if we just want to support a major version upgrade scenario
then this should be fine because upgrades require a clean shutdown.How about periodically sending this information?
Now, if we want to support some sort of failover then probably this
will help. Do you have that use case in mind?
Regular failover was a goal for supporting logical replication of
sequences. That might be more common than major upgrade scenario.
If we want to send
periodically then we can do it when decoding checkpoint
(XLOG_CHECKPOINT_ONLINE) or some other periodic WAL record like
running_xacts (XLOG_RUNNING_XACTS).
Yeah. I am thinking along those lines.
It must be noted, however, that none of those optional make sure that the
replicated sequence's states are consistent with the replicated object
state which use those sequences. E.g. table t1 uses sequence s1. By last
sequence replication, as of time T1, let's say t1 had consumed values upto
vl1 from s1. But later, by time T2, it consumed values upto vl2 which were
not replicated but the changes to t1 by T2 were replicated. If failover
happens at that point, INSERTs on t1 would fail because of duplicate keys
(values between vl1 and vl2). Previous attempt to support logical sequence
replication solved this problem by replicating a future state of sequences
(current value +/- log count). Similarly, if the sequence was ALTERed
between T1 and T2, the state of sequence on replica would be inconsistent
with the state of t1. Failing over at this stage, might end t1 in an
inconsistent state.
--
Best Wishes,
Ashutosh Bapat
On Wed, Jun 5, 2024 at 6:01 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
How about periodically sending this information?
Now, if we want to support some sort of failover then probably this
will help. Do you have that use case in mind?Regular failover was a goal for supporting logical replication of sequences. That might be more common than major upgrade scenario.
We can't support regular failovers to subscribers unless we can
replicate/copy slots because the existing nodes connected to the
current publisher/primary would expect that. It should be primarily
useful for major version upgrades at this stage.
If we want to send
periodically then we can do it when decoding checkpoint
(XLOG_CHECKPOINT_ONLINE) or some other periodic WAL record like
running_xacts (XLOG_RUNNING_XACTS).Yeah. I am thinking along those lines.
It must be noted, however, that none of those optional make sure that the replicated sequence's states are consistent with the replicated object state which use those sequences.
Right, I feel as others are advocating, it seems better to support it
manually via command and then later we can extend it to do at shutdown
or at some regular intervals. If we do that then we should be able to
support major version upgrades and planned switchover.
--
With Regards,
Amit Kapila.
On Wed, Jun 5, 2024 at 12:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
<yogesh.sharma@catprosystems.com> wrote:On 6/4/24 06:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.This is similar to option 1 except that it is a SQL command now.
Right, but I would still prefer a command as it provides clear steps
for the upgrade. Users need to perform (a) Replicate Sequences for a
particular subscription (b) Disable that subscription (c) Perform (a)
and (b) for all the subscriptions corresponding to the publisher we
want to shut down for upgrade.I agree there are some manual steps involved here but it is advisable
for users to ensure that they have received the required data on the
subscriber before the upgrade of the publisher node, otherwise, they
may not be able to continue replication after the upgrade. For
example, see the "Prepare for publisher upgrades" step in pg_upgrade
docs [1].3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.At the time of shutdown a) most logical upgrades don't necessarily call
for shutdownWon't the major version upgrade expect that the node is down? Refer to
step "Stop both servers" in [1].
I think the idea is that the publisher is the old version and the
subscriber is the new version, and changes generated on the publisher
are replicated to the subscriber via logical replication. And at some
point, we change the application (or a router) settings so that no
more transactions come to the publisher, do the last upgrade
preparation work (e.g. copying the latest sequence values if
requried), and then change the application so that new transactions
come to the subscriber.
I remember the blog post about Knock doing a similar process to
upgrade the clusters with minimal downtime[1]https://knock.app/blog/zero-downtime-postgres-upgrades.
Regards,
[1]: https://knock.app/blog/zero-downtime-postgres-upgrades
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Wed, Jun 5, 2024 at 3:17 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Jun 4, 2024 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Even if we decode it periodically (say each time we decode the
checkpoint record) then also we need to send the entire set of
sequences at shutdown. This is because the sequences may have changed
from the last time we sent them.Agree. How about decoding and sending only the sequences that are
changed from the last time when they were sent? I know it requires a
bit of tracking and more work, but all I'm looking for is to reduce
the amount of work that walsenders need to do during the shutdown.
I see your point but going towards tracking the changed sequences
sounds like moving towards what we do for incremental backups unless
we can invent some other smart way.
Having said that, I like the idea of letting the user sync the
sequences via ALTER SUBSCRIPTION command and not weave the logic into
the shutdown checkpoint path. As Peter Eisentraut said here
/messages/by-id/42e5cb35-4aeb-4f58-8091-90619c7c3ecc@eisentraut.org,
this can be a good starting point to get going.
Agreed.
I can imagine a case where there are tens
of thousands of sequences in a production server, and surely decoding
and sending them just during the shutdown can take a lot of time
hampering the overall server uptime.It is possible but we will send only the sequences that belong to
publications for which walsender is supposed to send the required
data.Right, but what if all the publication tables can have tens of
thousands of sequences.
In such cases we have no option but to send all the sequences.
Now, we can also imagine providing option 2 (Alter Subscription
... Replicate Sequences) so that users can replicate sequences before
shutdown and then disable the subscriptions so that there won't be a
corresponding walsender.As stated above, I like this idea to start with.
+1.
--
With Regards,
Amit Kapila.
On Wed, Jun 5, 2024 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 5, 2024 at 12:51 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 04.06.24 12:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.I would start with this. In any case, you're going to need to write
code to collect all the sequence values, send them over some protocol,
apply them on the subscriber. The easiest way to start is to trigger
that manually. Then later you can add other ways to trigger it, either
by timer or around shutdown, or whatever other ideas there might be.Agreed.
+1
To achieve this, we can allow sequences to be copied during
the initial CREATE SUBSCRIPTION command similar to what we do for
tables. And then later by new/existing command, we re-copy the already
existing sequences on the subscriber.The options for the new command could be:
Alter Subscription ... Refresh Sequences
Alter Subscription ... Replicate SequencesIn the second option, we need to introduce a new keyword Replicate.
Can you think of any better option?
Another idea is doing that using options. For example,
For initial sequences synchronization:
CREATE SUBSCRIPTION ... WITH (copy_sequence = true);
For re-copy (or update) sequences:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_sequence = true);
In addition to the above, the command Alter Subscription .. Refresh
Publication will fetch any missing sequences similar to what it does
for tables.
On the subscriber side, do we need to track which sequences are
created via CREATE/ALTER SUBSCRIPTION?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Jun 6, 2024 at 9:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jun 5, 2024 at 12:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 4, 2024 at 8:56 PM Yogesh Sharma
<yogesh.sharma@catprosystems.com> wrote:On 6/4/24 06:57, Amit Kapila wrote:
2. Provide a command say Alter Subscription ... Replicate Sequences
(or something like that) which users can perform before shutdown of
the publisher node during upgrade. This will allow copying all the
sequences from the publisher node to the subscriber node directly.
Similar to previous approach, this could also be inconvenient for
users.This is similar to option 1 except that it is a SQL command now.
Right, but I would still prefer a command as it provides clear steps
for the upgrade. Users need to perform (a) Replicate Sequences for a
particular subscription (b) Disable that subscription (c) Perform (a)
and (b) for all the subscriptions corresponding to the publisher we
want to shut down for upgrade.I agree there are some manual steps involved here but it is advisable
for users to ensure that they have received the required data on the
subscriber before the upgrade of the publisher node, otherwise, they
may not be able to continue replication after the upgrade. For
example, see the "Prepare for publisher upgrades" step in pg_upgrade
docs [1].3. Replicate published sequences via walsender at the time of shutdown
or incrementally while decoding checkpoint record. The two ways to
achieve this are: (a) WAL log a special NOOP record just before
shutting down checkpointer. Then allow the WALsender to read the
sequence data and send it to the subscriber while decoding the new
NOOP record. (b) Similar to the previous idea but instead of WAL
logging a new record directly invokes a decoding callback after
walsender receives a request to shutdown which will allow pgoutput to
read and send required sequences. This approach has a drawback that we
are adding more work at the time of shutdown but note that we already
waits for all the WAL records to be decoded and sent before shutting
down the walsender during shutdown of the node.At the time of shutdown a) most logical upgrades don't necessarily call
for shutdownWon't the major version upgrade expect that the node is down? Refer to
step "Stop both servers" in [1].I think the idea is that the publisher is the old version and the
subscriber is the new version, and changes generated on the publisher
are replicated to the subscriber via logical replication. And at some
point, we change the application (or a router) settings so that no
more transactions come to the publisher, do the last upgrade
preparation work (e.g. copying the latest sequence values if
requried), and then change the application so that new transactions
come to the subscriber.
Okay, thanks for sharing the exact steps. If one has to follow that
path then sending incrementally (at checkpoint WAL or other times)
won't work because we want to ensure that the sequences are up-to-date
before the application starts using the new database. To do that in a
bullet-proof way, one has to copy/replicate sequences during the
requests to the new database are paused (Reference from the blog you
shared: For the first second after flipping the flag, our application
artificially paused any new database requests for one second.).
Currently, they are using some guesswork to replicate sequences that
require manual verification and more manual work for each sequence.
The new command (Alter Subscription ... Replicate Sequence) should
ease their procedure and can do things where they would require no or
very less verification.
I remember the blog post about Knock doing a similar process to
upgrade the clusters with minimal downtime[1].
Thanks for sharing the blog post.
--
With Regards,
Amit Kapila.
On Thu, Jun 6, 2024 at 11:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jun 5, 2024 at 9:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
To achieve this, we can allow sequences to be copied during
the initial CREATE SUBSCRIPTION command similar to what we do for
tables. And then later by new/existing command, we re-copy the already
existing sequences on the subscriber.The options for the new command could be:
Alter Subscription ... Refresh Sequences
Alter Subscription ... Replicate SequencesIn the second option, we need to introduce a new keyword Replicate.
Can you think of any better option?Another idea is doing that using options. For example,
For initial sequences synchronization:
CREATE SUBSCRIPTION ... WITH (copy_sequence = true);
How will it interact with the existing copy_data option? So copy_data
will become equivalent to copy_table_data, right?
For re-copy (or update) sequences:
ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_sequence = true);
Similar to the previous point it can be slightly confusing w.r.t
copy_data. And would copy_sequence here mean that it would copy
sequence values of both pre-existing and newly added sequences, if so,
that would make it behave differently than copy_data? The other
possibility in this direction would be to introduce an option like
replicate_all_sequences/copy_all_sequences which indicates a copy of
both pre-existing and new sequences, if any.
If we want to go in the direction of having an option such as
copy_(all)_sequences then do you think specifying that copy_data is
just for tables in the docs would be sufficient? I am afraid that it
would be confusing for users.
In addition to the above, the command Alter Subscription .. Refresh
Publication will fetch any missing sequences similar to what it does
for tables.On the subscriber side, do we need to track which sequences are
created via CREATE/ALTER SUBSCRIPTION?
I think so unless we find some other way to know at refresh
publication time which all new sequences need to be part of the
subscription. What should be the behavior w.r.t sequences when the
user performs ALTER SUBSCRIPTION ... REFRESH PUBLICATION? I was
thinking similar to tables, it should fetch any missing sequence
information from the publisher.
--
With Regards,
Amit Kapila.
On Thu, Jun 6, 2024 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 5, 2024 at 6:01 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
How about periodically sending this information?
Now, if we want to support some sort of failover then probably this
will help. Do you have that use case in mind?Regular failover was a goal for supporting logical replication of
sequences. That might be more common than major upgrade scenario.
We can't support regular failovers to subscribers unless we can
replicate/copy slots because the existing nodes connected to the
current publisher/primary would expect that. It should be primarily
useful for major version upgrades at this stage.
We don't want to design it in a way that requires major rework when we are
able to copy slots and then support regular failovers. That's when the
consistency between a sequence and the table using it would be a must. So
it's better that we take that into consideration now.
--
Best Wishes,
Ashutosh Bapat
On Thu, Jun 6, 2024 at 3:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
On Thu, Jun 6, 2024 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 5, 2024 at 6:01 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:On Wed, Jun 5, 2024 at 8:45 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
How about periodically sending this information?
Now, if we want to support some sort of failover then probably this
will help. Do you have that use case in mind?Regular failover was a goal for supporting logical replication of sequences. That might be more common than major upgrade scenario.
We can't support regular failovers to subscribers unless we can
replicate/copy slots because the existing nodes connected to the
current publisher/primary would expect that. It should be primarily
useful for major version upgrades at this stage.We don't want to design it in a way that requires major rework when we are able to copy slots and then support regular failover.
I don't think we can just copy slots like we do for standbys. The
slots would require WAL locations to continue, so not sure if we can
make it work for failover for subscribers.
That's when the consistency between a sequence and the table using it
would be a must. So it's better that we take that into consideration
now.
With the ideas being discussed here, I could only see the use case of
a major version upgrade or planned switchover to work. If we come up
with any other agreeable way that is better than this then we can
consider the same.
--
With Regards,
Amit Kapila.