Skipping logical replication transactions on subscriber side

Started by Masahiko Sawada over 4 years ago, 590 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com

Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

I’d like to propose a way to skip a particular transaction on the
subscriber side. As a first step, the transaction to skip can be
specified by its remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip applying that particular transaction.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

The user can identify which remote XID hit a problem while applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
about the last failed transaction.
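
As an illustration of how a user or a monitoring script could pull the remote XID out of such a message, here is a minimal Python sketch. It assumes the CONTEXT wording shown in the example above, which is only a proposal at this point; the regular expression and the function name parse_apply_context are hypothetical.

```python
import re

# Pattern for the CONTEXT line proposed above (assumed wording, not a released format).
CONTEXT_RE = re.compile(
    r'during apply of "(?P<action>[^"]+)" '
    r'for relation "(?P<relation>[^"]+)" '
    r'in transaction with xid (?P<xid>\d+)'
)


def parse_apply_context(message):
    """Return action, relation and remote XID from an apply error context, or None."""
    # The log line may be wrapped, so collapse all whitespace runs first.
    flat = " ".join(message.split())
    match = CONTEXT_RE.search(flat)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "relation": match.group("relation"),
        "xid": int(match.group("xid")),
    }


example = '''CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09'''
info = parse_apply_context(example)
```

If the final message wording differs, only the pattern needs to change.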

For (2), what I'm thinking is to add a new action to the ALTER
SUBSCRIPTION command, like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have an action to reset it: ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands set or
clear the XID in a new column of pg_subscription or in a new catalog,
and have the worker reread its subscription information. Once the worker
has skipped the specified transaction, it resets the transaction to skip
in the catalog. The syntax allows users to specify one remote XID to
skip. In the future, it might be good if users could also specify
multiple XIDs (a range of XIDs, a list of XIDs, etc.).
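
To make the intended skip-once behavior concrete, here is a minimal Python simulation. It is a sketch under assumptions, not PostgreSQL code: the Subscription and RemoteTransaction classes and the apply_transactions function are hypothetical stand-ins for a pg_subscription row, a replicated transaction, and the apply worker's loop.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subscription:
    """Hypothetical stand-in for a pg_subscription row."""
    name: str
    skip_xid: Optional[int] = None  # set by SET SKIP TRANSACTION, cleared by RESET


@dataclass
class RemoteTransaction:
    """Hypothetical stand-in for one replicated transaction."""
    xid: int
    changes: list = field(default_factory=list)


def apply_transactions(sub, transactions):
    """Apply transactions in order, skipping at most one by remote XID."""
    applied = []
    for txn in transactions:
        if sub.skip_xid is not None and txn.xid == sub.skip_xid:
            # Ignore every change of this transaction, then clear the
            # setting so the same XID is not skipped again later.
            sub.skip_xid = None
            continue
        applied.extend(txn.changes)
    return applied


sub = Subscription("test_sub")
sub.skip_xid = 590  # models: ALTER SUBSCRIPTION test_sub SET SKIP TRANSACTION 590
stream = [
    RemoteTransaction(589, ["INSERT 0"]),
    RemoteTransaction(590, ["INSERT 1"]),
    RemoteTransaction(591, ["INSERT 2"]),
]
result = apply_transactions(sub, stream)
```

Only the changes of XID 590 are dropped, and the setting is cleared as soon as the skip has happened.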

Feedback and comments are very welcome.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Skipping logical replication transactions on subscriber side

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

I’d like to propose a way to skip the particular transaction on the
subscriber side. As the first step, a transaction can be specified to
be skipped by specifying remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip the particular transaction to apply.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

In the above, the subscription name/id is not mentioned. I think you
need it for sub-feature-2.

The user can identify which remote XID has a problem during applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
of the last failure transaction.

It might be good to display at both places. Having subscriber-side
information in the view might be helpful in other ways as well like we
can use it to display the number of transactions processed by a
particular subscriber.

I think you need to consider a few more things here:
(a) Say the error occurs after applying some part of the changes; then
just skipping the remaining part won't be sufficient. We probably need
to somehow roll back the applied changes (by rolling back the
transaction or in some other way).
(b) How do you handle streamed transactions? It is possible that some
of the streams are successful and the error occurs after that, say
when writing to the stream file. Would you skip writing to the stream
file, or would you write it and then, during apply, skip the
entire transaction and remove the corresponding stream file?
(c) There is also a possibility that the error occurs while applying
the changes of some subtransaction (this is only possible for
streaming xacts). In such cases, do we allow users to roll back just
the subtransaction, or does the user have to roll back the entire
transaction? I am not sure, but maybe for very large transactions users
might just want to roll back the subtransaction.
(d) How about prepared transactions? Do we need to roll back the
prepared transaction if the user decides to skip such a transaction? We
already allow prepared transactions to be streamed to plugins and the
work for subscriber-side apply is in progress [1], so I think we need
to consider this case as well.
(e) Do we want to provide such a feature via output plugins as well?
If not, why?

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction,
or will we retry it after a restart of the worker? Now, say we give
that responsibility to the user and the user forgets to reset it; then
there is a possibility that after wraparound we will again skip a
transaction, which is not intended. And, if we want to retry it after
a restart of the worker, how will the worker remember the previous failure?

I think this will be a useful feature, but we need to consider a few more things.

[1]: /messages/by-id/CAHut+PsDysQA=JWXb6oGFr1npvqi1e7RzzXV-juCCxnbiwHvfA@mail.gmail.com

--
With Regards,
Amit Kapila.

#3 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Skipping logical replication transactions on subscriber side

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

Does it mean pg_replication_origin_advance() can't skip exactly one
txn? I'm not familiar with the function and have never used it; I was
just searching for "how to skip a single txn in postgres" and ended up
at [1]. Could you please give some more details on scenarios in which we
can't skip exactly one txn? Is there any other way to advance the LSN,
something like directly updating the pg_replication_slots catalog?

[1]: https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

I’d like to propose a way to skip the particular transaction on the
subscriber side. As the first step, a transaction can be specified to
be skipped by specifying remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip the particular transaction to apply.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

The user can identify which remote XID has a problem during applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
of the last failure transaction.

Agree with Amit on this. At times, it is difficult to look around in
the server logs, so it will be better to have it in both places.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog. The syntax allows users to specify one remote XID to skip. In
the future, it might be good if users can also specify multiple XIDs
(a range of XIDs or a list of XIDs, etc).

What does skipping a txn by its txn id actually do? Is the particular
txn forced to commit or abort, or does the apply worker just skip some
of its code? IIUC, the behavior of RESET SKIP TRANSACTION is just
to forget the txn id specified in SET SKIP TRANSACTION, right?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#4 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#2)
Re: Skipping logical replication transactions on subscriber side

On Mon, May 24, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

I’d like to propose a way to skip the particular transaction on the
subscriber side. As the first step, a transaction can be specified to
be skipped by specifying remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip the particular transaction to apply.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

In the above, the subscription name/id is not mentioned. I think you
need it for sub-feature-2.

Agreed.

The user can identify which remote XID has a problem during applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
of the last failure transaction.

It might be good to display at both places. Having subscriber-side
information in the view might be helpful in other ways as well like we
can use it to display the number of transactions processed by a
particular subscriber.

Yes. I think we can report that information to the stats collector. It
needs to live on even after the worker exits.

I think you need to consider few more things here:
(a) Say the error occurs after applying some part of changes, then
just skipping the remaining part won't be sufficient, we probably need
to someway rollback the applied changes (by rolling back the
transaction or in some other way).

After more thought, it might be better to require that the
subscription be disabled when setting or resetting the XID to skip.
This would not be a restriction for users, since logical replication is
likely to have already stopped (and possibly to be repeatedly restarting
and stopping) due to an error. Setting and resetting the XID modifies
the system catalog, so it's a crash-safe change and survives server
restarts. When a logical replication worker starts, it checks the XID.
If the worker receives changes associated with the transaction with the
specified XID, it can ignore the entire transaction.

(b) How do you handle streamed transactions? It is possible that some
of the streams are successful and the error occurs after that, say
when writing to the stream file. Now, would you skip writing to stream
file or will you write it, and then during apply, you will skip the
entire transaction and remove the corresponding stream file.

I think streamed transactions can be handled in the same way described in (a).

(c) There is also a possibility that the error occurs while applying
the changes of some subtransaction (this is only possible for
streaming xacts), so, in such cases, do we allow users to rollback the
subtransaction or user has to rollback the entire transaction. I am
not sure but maybe for very large transactions users might just want
to rollback the subtransaction.

If the user specifies the XID of a subtransaction, it would be better to
skip only the subtransaction. If the user specifies the top-level
transaction's XID, it would be better to skip the entire transaction.
What do you think?

(d) How about prepared transactions? Do we need to rollback the
prepared transaction if user decides to skip such a transaction? We
already allow prepared transactions to be streamed to plugins and the
work for subscriber-side apply is in progress [1], so I think we need
to consider this case as well.

If a transaction replicated from the publisher could be prepared on
the subscriber, it is guaranteed that it can be either
committed or rolled back. Given that this feature is to skip a problem
transaction, I think it should not do anything for transactions that
are already prepared on the subscriber.

(e) Do we want to provide such a feature via output plugins as well,
if not, why?

You mean specifying an XID to skip on the publisher side? Since I've
been considering this feature as a way to resume logical
replication that has hit a problem, I hadn't thought of that idea, but
it would be a good idea. Do you have any use cases? If we specified the
XID on the publisher, multiple subscribers would skip that
transaction.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction
or we will retry it after restart of worker? Now, say, we give such a
responsibility to the user and the user forgets to reset it then there
is a possibility that after wraparound we will again skip the
transaction which is not intended. And, if we want to retry it after
restart of worker, how will the worker remember the previous failure?

As described above, setting and resetting the XID to skip is implemented
as a normal system catalog change, so it's crash-safe and persisted. I
think that the worker can either remove the XID or mark it as done
once it has skipped the specified transaction, so that it won't skip the
same XID again after wraparound. Also, it might be better to reset
the XID when a subscription field such as subconninfo is changed,
because that could imply the worker will connect to another publisher
with a different XID space.

We also need to handle the cases where the user specifies an old XID
or an XID whose transaction is already prepared on the subscriber. I
think the worker can reset the XID with a warning when it finds
that the XID no longer seems valid or that it cannot skip the specified
XID. In the former case, for example, it can do that when the first
received transaction’s XID is newer than the specified XID. In the
latter case, it can do that when it receives the commit/rollback
prepared message for the specified XID.
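
Because XIDs are 32-bit counters that wrap around, the "newer than" test in the former case cannot be a plain integer comparison. The Python sketch below shows one way the check could look, using a modulo-2^32 comparison in the style PostgreSQL applies to normal XIDs. The function names xid_precedes and check_skip_xid are hypothetical, special XIDs are ignored for simplicity, and the warning text is made up for illustration.

```python
XID_MASK = 0xFFFFFFFF  # XIDs are unsigned 32-bit values


def xid_precedes(a, b):
    """True if XID a is logically older than XID b, modulo 2^32."""
    diff = (a - b) & XID_MASK
    # A difference in the upper half of the range corresponds to a
    # negative signed 32-bit difference, i.e. a is behind b.
    return diff >= 0x80000000


def check_skip_xid(skip_xid, first_received_xid):
    """Return (xid_to_keep, warning) for the stale-XID case described above."""
    if skip_xid is not None and xid_precedes(skip_xid, first_received_xid):
        warning = (
            f"skip XID {skip_xid} is older than first received XID "
            f"{first_received_xid}; resetting it"
        )
        return None, warning
    return skip_xid, None


stale = check_skip_xid(590, 600)   # specified XID is already in the past
valid = check_skip_xid(590, 590)   # specified XID is the one being received
wrapped = xid_precedes(4294967290, 5)  # 5 is newer once the counter wraps
```

With this comparison, an XID just below 2^32 is still treated as older than a small XID issued after wraparound.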

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#5 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bharath Rupireddy (#3)
Re: Skipping logical replication transactions on subscriber side

On Tue, May 25, 2021 at 2:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

Does it mean pg_replication_origin_advance() can't skip exactly one
txn? I'm not familiar with the function or never used it though, I was
just searching for "how to skip a single txn in postgres" and ended up
in [1]. Could you please give some more details on scenarios when we
can't skip exactly one txn? Is there any other way to advance the LSN,
something like directly updating the pg_replication_slots catalog?

Sorry, it's not impossible. Although the user may mistakenly skip more
than one transaction by specifying a wrong LSN, it's always possible to
skip exactly one transaction.

[1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

I’d like to propose a way to skip the particular transaction on the
subscriber side. As the first step, a transaction can be specified to
be skipped by specifying remote XID on the subscriber. This feature
would need two sub-features: (1) a sub-feature for users to identify
the problem subscription and the problem transaction’s XID, and (2) a
sub-feature to skip the particular transaction to apply.

For (1), I think the simplest way would be to put the details of the
change being applied in errcontext. For example, the following
errcontext shows the remote XID as well as the action name, the
relation name, and commit timestamp:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.test" in
transaction with xid 590 commit timestamp 2021-05-21
14:32:02.134273+09

The user can identify which remote XID has a problem during applying
the change (XID=590 in this case). As another idea, we can have a
statistics view for logical replication workers, showing information
of the last failure transaction.

Agree with Amit on this. At times, it is difficult to look around in
the server logs, so it will be better to have it in both places.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog. The syntax allows users to specify one remote XID to skip. In
the future, it might be good if users can also specify multiple XIDs
(a range of XIDs or a list of XIDs, etc).

What's it like skipping a txn with txn id? Is it that the particular
txn is forced to commit or abort or just skipping some of the code in
the apply worker?

What I'm thinking is to ignore the entire transaction with the
specified XID. IOW, logical replication workers don't even start the
transaction and ignore all changes associated with the XID.

IIUC, the behavior of RESET SKIP TRANSACTION is just
to forget the txn id specified in SET SKIP TRANSACTION right?

Right. I proposed this RESET command for users to cancel the skipping behavior.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#6 Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Masahiko Sawada (#5)
Re: Skipping logical replication transactions on subscriber side

On Tue, May 25, 2021 at 1:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 25, 2021 at 2:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

Does it mean pg_replication_origin_advance() can't skip exactly one
txn? I'm not familiar with the function or never used it though, I was
just searching for "how to skip a single txn in postgres" and ended up
in [1]. Could you please give some more details on scenarios when we
can't skip exactly one txn? Is there any other way to advance the LSN,
something like directly updating the pg_replication_slots catalog?

Sorry, it's not impossible. Although the user may mistakenly skip more
than one transaction by specifying a wrong LSN, it's always possible to
skip exactly one transaction.

IIUC, if the user specifies the "correct LSN", then it's possible to
skip exact txn for which the sync workers are unable to apply changes,
right?

How can the user get the LSN (which we call "correct LSN")? Is it from
pg_replication_slots? Or some other way?

If the user somehow can get the "correct LSN", can't the exact txn be
skipped using it with any of the existing ways, either using
pg_replication_origin_advance or any other ways?

If there's no way to get the "correct LSN", then why can't we just
print that LSN in the error context and/or in the new statistics view
for logical replication workers, so that any of the existing ways can
be used to skip exactly one txn?

IIUC, the feature proposed here guards against the users specifying
wrong LSN. If I'm right, what is the guarantee that users don't
specify the wrong txn id? Why can't we tell the users when a wrong LSN
is specified that "currently, an apply worker is failing to apply the
LSN XXXX, and you specified LSN YYYY, are you sure this is
intentional?"

Please correct me if I'm missing anything.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

#7 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: Skipping logical replication transactions on subscriber side

On Tue, May 25, 2021 at 7:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Tue, May 25, 2021 at 1:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 25, 2021 at 2:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi all,

If a logical replication worker cannot apply the change on the
subscriber for some reason (e.g., missing table or violating a
constraint, etc.), logical replication stops until the problem is
resolved. Ideally, we resolve the problem on the subscriber (e.g., by
creating the missing table or removing the conflicting data, etc.) but
occasionally a problem cannot be fixed and it may be necessary to skip
the entire transaction in question. Currently, we have two ways to
skip transactions: advancing the LSN of the replication origin on the
subscriber and advancing the LSN of the replication slot on the
publisher. But both ways might not be able to skip exactly one
transaction in question and end up skipping other transactions too.

Does it mean pg_replication_origin_advance() can't skip exactly one
txn? I'm not familiar with the function or never used it though, I was
just searching for "how to skip a single txn in postgres" and ended up
in [1]. Could you please give some more details on scenarios when we
can't skip exactly one txn? Is there any other way to advance the LSN,
something like directly updating the pg_replication_slots catalog?

Sorry, it's not impossible. Although the user may mistakenly skip more
than one transaction by specifying a wrong LSN, it's always possible to
skip exactly one transaction.

IIUC, if the user specifies the "correct LSN", then it's possible to
skip exact txn for which the sync workers are unable to apply changes,
right?

How can the user get the LSN (which we call "correct LSN")? Is it from
pg_replication_slots? Or some other way?

If the user somehow can get the "correct LSN", can't the exact txn be
skipped using it with any of the existing ways, either using
pg_replication_origin_advance or any other ways?

One possible way I know is to copy the logical replication slot used
by the subscriber and peek at the changes to identify the correct LSN
(maybe there is another, handier way, though). For example, suppose that
two transactions insert tuples as follows on the publisher:

TX-A: BEGIN;
TX-A: INSERT INTO test VALUES (1);
TX-B: BEGIN;
TX-B: INSERT INTO test VALUES (10);
TX-B: COMMIT;
TX-A: INSERT INTO test VALUES (2);
TX-A: COMMIT;

And suppose further that the insertion with value = 10 (by TX-B)
cannot be applied on the subscriber due to a unique constraint
violation. If we copy the slot with
pg_copy_logical_replication_slot('test_sub', 'copy_slot', true,
'test_decoding'), we can peek at those changes with their LSNs as follows:

=# select * from pg_logical_slot_peek_changes('copy_slot', null, null) order by lsn;
lsn | xid | data
-----------+-----+------------------------------------------
0/1911548 | 736 | BEGIN 736
0/1911548 | 736 | table public.hoge: INSERT: c[integer]:1
0/1911588 | 737 | BEGIN 737
0/1911588 | 737 | table public.hoge: INSERT: c[integer]:10
0/19115F8 | 737 | COMMIT 737
0/1911630 | 736 | table public.hoge: INSERT: c[integer]:2
0/19116A0 | 736 | COMMIT 736
(7 rows)

In this case, '0/19115F8' is the correct LSN to specify. We can
advance the replication origin to '0/19115F8' with
pg_replication_origin_advance() so that logical replication streams
transactions committed after '0/19115F8'. After logical
replication restarts, it skips the transaction with xid = 737 but
replicates the transaction with xid = 736.
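
The manual lookup above can be sketched in Python as follows. The rows are copied from the peek output shown earlier; the helper names parse_lsn and commit_lsn_for_xid are hypothetical, and the sketch only picks out the LSN. It does not call pg_replication_origin_advance() itself. An LSN written as 'hi/lo' is two hexadecimal numbers, and the 64-bit position is (hi << 32) | lo.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/19115F8' into a 64-bit integer position."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def commit_lsn_for_xid(rows, xid):
    """Return the LSN of the COMMIT record of the given XID, or None."""
    for lsn, row_xid, data in rows:
        if row_xid == xid and data.startswith("COMMIT"):
            return lsn
    return None


# (lsn, xid, data) rows as returned by pg_logical_slot_peek_changes above.
rows = [
    ("0/1911548", 736, "BEGIN 736"),
    ("0/1911548", 736, "table public.hoge: INSERT: c[integer]:1"),
    ("0/1911588", 737, "BEGIN 737"),
    ("0/1911588", 737, "table public.hoge: INSERT: c[integer]:10"),
    ("0/19115F8", 737, "COMMIT 737"),
    ("0/1911630", 736, "table public.hoge: INSERT: c[integer]:2"),
    ("0/19116A0", 736, "COMMIT 736"),
]

target = commit_lsn_for_xid(rows, 737)
# The transaction that must still be replicated commits after the target.
later_commit = commit_lsn_for_xid(rows, 736)
still_replicated = parse_lsn(later_commit) > parse_lsn(target)
```

The comparison at the end reflects why xid = 736 is still replicated: its commit record lies after the LSN the origin is advanced to.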

If there's no way to get the "correct LSN", then why can't we just
print that LSN in the error context and/or in the new statistics view
for logical replication workers, so that any of the existing ways can
be used to skip exactly one txn?

I think specifying XID to the subscription is more understandable for users.

IIUC, the feature proposed here guards against the users specifying
wrong LSN. If I'm right, what is the guarantee that users don't
specify the wrong txn id? Why can't we tell the users when a wrong LSN
is specified that "currently, an apply worker is failing to apply the
LSN XXXX, and you specified LSN YYYY, are you sure this is
intentional?"

With the initial idea, specifying the correct XID is the user's
responsibility. If they specify an old XID, the worker invalidates it and
raises a warning saying "the worker invalidated the specified XID as
it's too old". As a second idea, if we store the last failed XID
somewhere (e.g., a system catalog), the user can just specify to skip
that transaction. That is, instead of specifying the XID they could do
something like "ALTER SUBSCRIPTION test_sub RESOLVE CONFLICT BY SKIP".

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Skipping logical replication transactions on subscriber side

On Tue, May 25, 2021 at 12:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 24, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think you need to consider a few more things here:
(a) Say the error occurs after applying some part of the changes; then
just skipping the remaining part won't be sufficient. We probably need
to somehow roll back the applied changes (by rolling back the
transaction or in some other way).

After more thought, it might be better if setting and resetting
the XID to skip required disabling the subscription.

It might be better if it doesn't require disabling the subscription
because it would be more steps for the user to disable/enable it. It
is not clear to me what exactly you want to gain by disabling the
subscription in this case.

This would not be
a restriction for users since logical replication is likely to already
stop (and possibly repeating restarting and stopping) due to an error.
Setting and resetting the XID modifies the system catalog so it's a
crash-safe change and survives beyond the server restarts. When a
logical replication worker starts, it checks the XID. If the worker
receives changes associated with the transaction with the specified
XID, it can ignore the entire transaction.

(b) How do you handle streamed transactions? It is possible that some
of the streams are successful and the error occurs after that, say
when writing to the stream file. Now, would you skip writing to the
stream file, or would you write it and then, during apply, skip the
entire transaction and remove the corresponding stream file?

I think streamed transactions can be handled in the same way described in (a).

(c) There is also a possibility that the error occurs while applying
the changes of some subtransaction (this is only possible for
streaming xacts). In such cases, do we allow users to roll back the
subtransaction, or does the user have to roll back the entire
transaction? I am not sure, but maybe for very large transactions users
might just want to roll back the subtransaction.

If the user specifies the XID of a subtransaction, it would be better to
skip only the subtransaction. If the user specifies the top transaction
XID, it would be better to skip the entire transaction. What do you think?

Makes sense.

(d) How about prepared transactions? Do we need to roll back the
prepared transaction if the user decides to skip such a transaction? We
already allow prepared transactions to be streamed to plugins and the
work for subscriber-side apply is in progress [1], so I think we need
to consider this case as well.

If a transaction replicated from the publisher could be prepared on
the subscriber, it would be guaranteed to be able to be either
committed or rolled back. Given that this feature is to skip a problem
transaction, I think it should not do anything for transactions that
are already prepared on the subscriber.

Makes sense, but I think we need to reset the XID in such a case.

(e) Do we want to provide such a feature via output plugins as well,
if not, why?

You mean to specify an XID to skip on the publisher side? Since I've
been considering this feature as a way to resume logical
replication that has a problem, I've not thought of that idea, but it
would be a good idea. Do you have any use cases?

No. On thinking about this again, I think we can leave this for now.

If we specified the
XID on the publisher, multiple subscribers would skip that
transaction.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction
or we will retry it after restart of worker? Now, say, we give such a
responsibility to the user and the user forgets to reset it then there
is a possibility that after wraparound we will again skip the
transaction which is not intended. And, if we want to retry it after
restart of worker, how will the worker remember the previous failure?

As described above, setting and resetting the XID to skip is implemented
as a normal system catalog change, so it's crash-safe and persisted. I
think that the worker can either remove the XID or mark it as done
once it has skipped the specified transaction so that it won't skip the
same XID again after wraparound.
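
The wraparound concern can be illustrated with a small sketch. It is only a toy model: the `Subscription` class and `apply_transaction` function are hypothetical names, not part of any patch. It relies on XIDs being 32-bit counters whose values 0-2 are reserved.

```python
from typing import Optional

FIRST_NORMAL_XID = 3      # XIDs 0-2 are reserved in PostgreSQL
XID_MODULUS = 2 ** 32     # XIDs are 32-bit counters


def next_xid(xid: int) -> int:
    """Advance a 32-bit XID, stepping over the reserved values on wraparound."""
    xid = (xid + 1) % XID_MODULUS
    return xid if xid >= FIRST_NORMAL_XID else FIRST_NORMAL_XID


class Subscription:
    """Hypothetical stand-in for the catalog row that stores the XID to skip."""

    def __init__(self) -> None:
        self.skip_xid: Optional[int] = None


def apply_transaction(sub: Subscription, remote_xid: int,
                      clear_after_skip: bool) -> str:
    """Skip the transaction if its XID matches the stored one (toy model)."""
    if sub.skip_xid == remote_xid:
        if clear_after_skip:
            # Removing the XID prevents a second, unintended skip later.
            sub.skip_xid = None
        return "skipped"
    return "applied"
```

If the stored XID is never cleared, a later transaction that happens to reuse the same 32-bit value after wraparound is skipped as well, which is the unintended case raised above.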

It all depends on when exactly you want to update the catalog
information. Say after skipping commit of the XID, we do update the
corresponding LSN to be communicated as already processed to the
subscriber and then get the error while updating the catalog
information then next time we might not know whether to update the
catalog for skipped XID.

Also, it might be better if we reset
the XID also when a subscription field such as subconninfo is changed
because it could imply the worker will connect to another publisher
having a different XID space.

We also need to handle the cases where the user specifies an old XID
or XID whose transaction is already prepared on the subscriber. I
think the worker can reset the XID with a warning when it finds out
that the XID seems no longer valid or it cannot skip the specified
XID. For example in the former case, it can do that when the first
received transaction’s XID is newer than the specified XID.

But how can we guarantee that an older XID can't be received later? Is
there a guarantee that we receive the transactions on the subscriber in
XID order?

--
With Regards,
Amit Kapila.

#9Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#7)
Re: Skipping logical replication transactions on subscriber side

On Tue, May 25, 2021 at 6:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 25, 2021 at 7:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

If there's no way to get the "correct LSN", then why can't we just
print that LSN in the error context and/or in the new statistics view
for logical replication workers, so that any of the existing ways can
be used to skip exactly one txn?

I think specifying XID to the subscription is more understandable for users.

I agree with you that specifying XID could be easier and
understandable for users. I was thinking and studying a bit about what
other systems do in this regard. Why don't we try to provide conflict
resolution methods for users? The idea could be that either the
conflicts can be resolved automatically or manually. In the case of
manual resolution, users can use the existing methods or the XID stuff
you are proposing here and in case of automatic resolution, the
in-built or corresponding user-defined functions will be invoked for
conflict resolution. There are more details to figure out in the
automatic resolution scheme but I see a lot of value in doing the
same.

--
With Regards,
Amit Kapila.

#10Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#8)
Re: Skipping logical replication transactions on subscriber side

On Wed, May 26, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 25, 2021 at 12:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 24, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think you need to consider a few more things here:
(a) Say the error occurs after applying some part of the changes; then
just skipping the remaining part won't be sufficient. We probably need
to somehow roll back the applied changes (by rolling back the
transaction or in some other way).

After more thought, it might be better if setting and resetting
the XID to skip required disabling the subscription.

It might be better if it doesn't require disabling the subscription
because it would be more steps for the user to disable/enable it. It
is not clear to me what exactly you want to gain by disabling the
subscription in this case.

The situation I’m considering is where the user specifies the XID while
the worker is applying the changes of the transaction with that XID.
In this case, I think we need to somehow roll back the changes applied
so far. Perhaps we can either roll back the transaction and ignore the
remaining changes or restart and ignore the entire transaction from
the beginning. Also, we need to handle the case where the user resets
the XID after the worker has skipped writing some stream files. I thought
those parts could be complicated, but after more thought they might
not be.

This would not be
a restriction for users since logical replication is likely to already
stop (and possibly repeating restarting and stopping) due to an error.
Setting and resetting the XID modifies the system catalog so it's a
crash-safe change and survives beyond the server restarts. When a
logical replication worker starts, it checks the XID. If the worker
receives changes associated with the transaction with the specified
XID, it can ignore the entire transaction.

(b) How do you handle streamed transactions? It is possible that some
of the streams are successful and the error occurs after that, say
when writing to the stream file. Now, would you skip writing to the
stream file, or would you write it and then, during apply, skip the
entire transaction and remove the corresponding stream file?

I think streamed transactions can be handled in the same way described in (a).

If setting and resetting the XID can be performed while the worker is
running, we would need to write stream files even if we’re receiving
changes that are associated with the specified XID. Since it could
happen that the user resets the XID after we have processed some of the
streamed changes, we would need to decide whether or not to skip the
transaction when starting to apply changes.

(c) There is also a possibility that the error occurs while applying
the changes of some subtransaction (this is only possible for
streaming xacts). In such cases, do we allow users to roll back the
subtransaction, or does the user have to roll back the entire
transaction? I am not sure, but maybe for very large transactions users
might just want to roll back the subtransaction.

If the user specifies the XID of a subtransaction, it would be better to
skip only the subtransaction. If the user specifies the top transaction
XID, it would be better to skip the entire transaction. What do you think?

Makes sense.

(d) How about prepared transactions? Do we need to roll back the
prepared transaction if the user decides to skip such a transaction? We
already allow prepared transactions to be streamed to plugins and the
work for subscriber-side apply is in progress [1], so I think we need
to consider this case as well.

If a transaction replicated from the publisher could be prepared on
the subscriber, it would be guaranteed to be able to be either
committed or rolled back. Given that this feature is to skip a problem
transaction, I think it should not do anything for transactions that
are already prepared on the subscriber.

Makes sense, but I think we need to reset the XID in such a case.

Agreed.

(e) Do we want to provide such a feature via output plugins as well,
if not, why?

You mean to specify an XID to skip on the publisher side? Since I've
been considering this feature as a way to resume logical
replication that has a problem, I've not thought of that idea, but it
would be a good idea. Do you have any use cases?

No. On thinking about this again, I think we can leave this for now.

If we specified the
XID on the publisher, multiple subscribers would skip that
transaction.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction
or we will retry it after restart of worker? Now, say, we give such a
responsibility to the user and the user forgets to reset it then there
is a possibility that after wraparound we will again skip the
transaction which is not intended. And, if we want to retry it after
restart of worker, how will the worker remember the previous failure?

As described above, setting and resetting the XID to skip is implemented
as a normal system catalog change, so it's crash-safe and persisted. I
think that the worker can either remove the XID or mark it as done
once it has skipped the specified transaction so that it won't skip the
same XID again after wraparound.

It all depends on when exactly you want to update the catalog
information. Say after skipping commit of the XID, we do update the
corresponding LSN to be communicated as already processed to the
subscriber and then get the error while updating the catalog
information then next time we might not know whether to update the
catalog for skipped XID.

Also, it might be better if we reset
the XID also when a subscription field such as subconninfo is changed
because it could imply the worker will connect to another publisher
having a different XID space.

We also need to handle the cases where the user specifies an old XID
or XID whose transaction is already prepared on the subscriber. I
think the worker can reset the XID with a warning when it finds out
that the XID seems no longer valid or it cannot skip the specified
XID. For example in the former case, it can do that when the first
received transaction’s XID is newer than the specified XID.

But how can we guarantee that an older XID can't be received later? Is
there a guarantee that we receive the transactions on the subscriber in
XID order?

Considering the above two comments, it might be better to provide a
way to skip the transaction that is already known to be conflicted
rather than allowing users to specify the arbitrary XID.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#11Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#10)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 9:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 26, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 25, 2021 at 12:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 24, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think you need to consider a few more things here:
(a) Say the error occurs after applying some part of the changes; then
just skipping the remaining part won't be sufficient. We probably need
to somehow roll back the applied changes (by rolling back the
transaction or in some other way).

After more thought, it might be better if setting and resetting
the XID to skip required disabling the subscription.

It might be better if it doesn't require disabling the subscription
because it would be more steps for the user to disable/enable it. It
is not clear to me what exactly you want to gain by disabling the
subscription in this case.

The situation I’m considering is where the user specifies the XID while
the worker is applying the changes of the transaction with that XID.
In this case, I think we need to somehow roll back the changes applied
so far. Perhaps we can either roll back the transaction and ignore the
remaining changes or restart and ignore the entire transaction from
the beginning.

If we follow your suggestion of only allowing XIDs that have been
known to have conflicts then probably we don't need to worry about
rollbacks.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction
or we will retry it after restart of worker? Now, say, we give such a
responsibility to the user and the user forgets to reset it then there
is a possibility that after wraparound we will again skip the
transaction which is not intended. And, if we want to retry it after
restart of worker, how will the worker remember the previous failure?

As described above, setting and resetting the XID to skip is implemented
as a normal system catalog change, so it's crash-safe and persisted. I
think that the worker can either remove the XID or mark it as done
once it has skipped the specified transaction so that it won't skip the
same XID again after wraparound.

It all depends on when exactly you want to update the catalog
information. Say after skipping commit of the XID, we do update the
corresponding LSN to be communicated as already processed to the
subscriber and then get the error while updating the catalog
information then next time we might not know whether to update the
catalog for skipped XID.

Also, it might be better if we reset
the XID also when a subscription field such as subconninfo is changed
because it could imply the worker will connect to another publisher
having a different XID space.

We also need to handle the cases where the user specifies an old XID
or XID whose transaction is already prepared on the subscriber. I
think the worker can reset the XID with a warning when it finds out
that the XID seems no longer valid or it cannot skip the specified
XID. For example in the former case, it can do that when the first
received transaction’s XID is newer than the specified XID.

But how can we guarantee that an older XID can't be received later? Is
there a guarantee that we receive the transactions on the subscriber in
XID order?

Considering the above two comments, it might be better to provide a
way to skip the transaction that is already known to be conflicted
rather than allowing users to specify the arbitrary XID.

Okay, that makes sense, but I am still not sure how you will identify
whether we need to reset the XID if resetting it failed in the previous
attempt. Also, I am thinking that instead of a stats view, we may need
to consider having a system table (pg_replication_conflicts or
something like that) for this, because what if the stats information is
lost (say, due to a crash or UDP packet loss)? Can we rely
on a stats view for this?

--
With Regards,
Amit Kapila.

#12Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#9)
Re: Skipping logical replication transactions on subscriber side

On Wed, May 26, 2021 at 6:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 25, 2021 at 6:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 25, 2021 at 7:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

If there's no way to get the "correct LSN", then why can't we just
print that LSN in the error context and/or in the new statistics view
for logical replication workers, so that any of the existing ways can
be used to skip exactly one txn?

I think specifying XID to the subscription is more understandable for users.

I agree with you that specifying XID could be easier and
understandable for users. I was thinking and studying a bit about what
other systems do in this regard. Why don't we try to provide conflict
resolution methods for users? The idea could be that either the
conflicts can be resolved automatically or manually. In the case of
manual resolution, users can use the existing methods or the XID stuff
you are proposing here and in case of automatic resolution, the
in-built or corresponding user-defined functions will be invoked for
conflict resolution. There are more details to figure out in the
automatic resolution scheme but I see a lot of value in doing the
same.

Yeah, I also see a lot of value in automatic conflict resolution. But
maybe we can have both ways? For example, in a case where the user wants
to resolve conflicts in different ways, or for a conflict that cannot be
resolved by automatic resolution (not sure there is one in practice,
though), manual resolution would also have value.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#13Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#11)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 9:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 26, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 25, 2021 at 12:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, May 24, 2021 at 7:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, May 24, 2021 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think you need to consider a few more things here:
(a) Say the error occurs after applying some part of the changes; then
just skipping the remaining part won't be sufficient. We probably need
to somehow roll back the applied changes (by rolling back the
transaction or in some other way).

After more thought, it might be better if setting and resetting
the XID to skip required disabling the subscription.

It might be better if it doesn't require disabling the subscription
because it would be more steps for the user to disable/enable it. It
is not clear to me what exactly you want to gain by disabling the
subscription in this case.

The situation I’m considering is where the user specifies the XID while
the worker is applying the changes of the transaction with that XID.
In this case, I think we need to somehow roll back the changes applied
so far. Perhaps we can either roll back the transaction and ignore the
remaining changes or restart and ignore the entire transaction from
the beginning.

If we follow your suggestion of only allowing XIDs that have been
known to have conflicts then probably we don't need to worry about
rollbacks.

For (2), what I'm thinking is to add a new action to ALTER
SUBSCRIPTION command like ALTER SUBSCRIPTION test_sub SET SKIP
TRANSACTION 590. Also, we can have actions to reset it; ALTER
SUBSCRIPTION test_sub RESET SKIP TRANSACTION. Those commands add the
XID to a new column of pg_subscription or a new catalog, having the
worker reread its subscription information. Once the worker skipped
the specified transaction, it resets the transaction to skip on the
catalog.

What if we fail while updating the reset information in the catalog?
Will it be the responsibility of the user to reset such a transaction
or we will retry it after restart of worker? Now, say, we give such a
responsibility to the user and the user forgets to reset it then there
is a possibility that after wraparound we will again skip the
transaction which is not intended. And, if we want to retry it after
restart of worker, how will the worker remember the previous failure?

As described above, setting and resetting the XID to skip is implemented
as a normal system catalog change, so it's crash-safe and persisted. I
think that the worker can either remove the XID or mark it as done
once it has skipped the specified transaction so that it won't skip the
same XID again after wraparound.

It all depends on when exactly you want to update the catalog
information. Say after skipping commit of the XID, we do update the
corresponding LSN to be communicated as already processed to the
subscriber and then get the error while updating the catalog
information then next time we might not know whether to update the
catalog for skipped XID.

Also, it might be better if we reset
the XID also when a subscription field such as subconninfo is changed
because it could imply the worker will connect to another publisher
having a different XID space.

We also need to handle the cases where the user specifies an old XID
or XID whose transaction is already prepared on the subscriber. I
think the worker can reset the XID with a warning when it finds out
that the XID seems no longer valid or it cannot skip the specified
XID. For example in the former case, it can do that when the first
received transaction’s XID is newer than the specified XID.

But how can we guarantee that an older XID can't be received later? Is
there a guarantee that we receive the transactions on the subscriber in
XID order?

Considering the above two comments, it might be better to provide a
way to skip the transaction that is already known to be conflicted
rather than allowing users to specify the arbitrary XID.

Okay, that makes sense, but I am still not sure how you will identify
whether we need to reset the XID if resetting it failed in the previous
attempt.

It's just an idea, but could we record the failed transaction's XID as
well as its commit LSN? The sequence I'm thinking of is:

1. The worker records the XID and commit LSN of the failed transaction
to a catalog.
2. The user specifies how to resolve that conflicting transaction
(currently only 'skip' is supported) and writes it to the catalog.
3. The worker applies the resolution method according to the catalog. If
the worker hasn't started to apply those changes, it can skip the entire
transaction. If it has, it rolls back the transaction and ignores the
remaining changes.

The worker needs neither to reset the information about the last failed
transaction nor to mark the conflicting transaction as resolved. The
worker will ignore that information when checking the catalog if the
commit LSN has already been passed.
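
The three-step sequence above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: `ConflictRecord` and `should_skip` are hypothetical names, and the catalog row is modelled as a plain in-memory object. The only PostgreSQL-specific fact it relies on is that an LSN is printed as two hexadecimal halves of a 64-bit position.

```python
from dataclasses import dataclass
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hexadecimal to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class ConflictRecord:
    """Hypothetical catalog row describing the last failed transaction."""
    xid: int                           # step 1: recorded by the worker
    commit_lsn: int                    # step 1: recorded by the worker
    resolution: Optional[str] = None   # step 2: set by the user, e.g. "skip"


def should_skip(record: Optional[ConflictRecord], xid: int,
                applied_up_to: int) -> bool:
    """Step 3: decide whether the incoming transaction must be skipped."""
    if record is None or record.resolution != "skip":
        return False
    if applied_up_to >= record.commit_lsn:
        # The worker is already past the failed commit, so the record is
        # stale and is ignored without having to be reset.
        return False
    return xid == record.xid
```

Because a stale record is filtered out by the LSN comparison, nothing has to be cleared after the skip, which matches the point that the worker does not need to reset or mark anything.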

Also, I am thinking that instead of a stats view, we may need
to consider having a system table (pg_replication_conflicts or
something like that) for this, because what if the stats information is
lost (say, due to a crash or UDP packet loss)? Can we rely
on a stats view for this?

Yeah, it seems better to use a catalog.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#14Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#13)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 27, 2021 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, that makes sense, but I am still not sure how you will identify
whether we need to reset the XID if resetting it failed in the previous
attempt.

It's just an idea, but could we record the failed transaction's XID as
well as its commit LSN? The sequence I'm thinking of is:

1. The worker records the XID and commit LSN of the failed transaction
to a catalog.

When will you record this info? I am not sure we can update
this when an error has occurred. We can think of using try..catch in
the apply worker and then recording it in the catch block on error, but
would that be advisable? One random thought that occurred to me is that
the apply worker could notify the launcher (or maybe another
process) of such information, and that process would log it.

2. The user specifies how to resolve that conflicting transaction
(currently only 'skip' is supported) and writes it to the catalog.
3. The worker applies the resolution method according to the catalog. If
the worker hasn't started to apply those changes, it can skip the entire
transaction. If it has, it rolls back the transaction and ignores the
remaining changes.

The worker needs neither to reset the information about the last failed
transaction nor to mark the conflicting transaction as resolved. The
worker will ignore that information when checking the catalog if the
commit LSN has already been passed.

So won't this require us to check the required info in the catalog
before applying each transaction? If so, that might add overhead; maybe
we can build some cache of the highest commitLSN that can be consulted
rather than the catalog table. I think we need to think about when to
remove rows for which the conflict has been resolved, as we can't let that
information grow indefinitely.
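
One way to picture the cache idea is the sketch below. Everything in it is an assumption made for illustration: the catalog is modelled as a dictionary keyed by commit LSN, `ConflictCache` is a hypothetical name, and the commit LSN of the incoming transaction is assumed to be known when the check runs.

```python
from typing import Dict, Optional


class ConflictCache:
    """Hypothetical in-memory cache in front of a conflict catalog."""

    def __init__(self, catalog: Dict[int, str]) -> None:
        # catalog maps commit LSN -> requested resolution (e.g. "skip")
        self._catalog = catalog
        self.catalog_lookups = 0   # counts simulated catalog accesses
        self.refresh()

    def refresh(self) -> None:
        """Recompute the cached maximum, e.g. after the catalog changes."""
        self._highest: Optional[int] = max(self._catalog, default=None)

    def resolution_for(self, commit_lsn: int) -> Optional[str]:
        """Return the resolution for a transaction, if one is recorded."""
        if self._highest is None or commit_lsn > self._highest:
            return None            # fast path: no catalog access needed
        self.catalog_lookups += 1
        return self._catalog.get(commit_lsn)

    def mark_resolved(self, commit_lsn: int) -> None:
        """Drop a resolved row so the catalog does not grow without bound."""
        self._catalog.pop(commit_lsn, None)
        self.refresh()
```

In the common case, where the incoming commit LSN is beyond every recorded conflict, the check costs a single integer comparison and the catalog is not touched.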

Also, I am thinking that instead of a stats view, we may need
to consider having a system table (pg_replication_conflicts or
something like that) for this, because what if the stats information is
lost (say, due to a crash or UDP packet loss)? Can we rely
on a stats view for this?

Yeah, it seems better to use a catalog.

Okay.

--
With Regards,
Amit Kapila.

#15Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#12)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 26, 2021 at 6:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with you that specifying XID could be easier and
understandable for users. I was thinking and studying a bit about what
other systems do in this regard. Why don't we try to provide conflict
resolution methods for users? The idea could be that either the
conflicts can be resolved automatically or manually. In the case of
manual resolution, users can use the existing methods or the XID stuff
you are proposing here and in case of automatic resolution, the
in-built or corresponding user-defined functions will be invoked for
conflict resolution. There are more details to figure out in the
automatic resolution scheme but I see a lot of value in doing the
same.

Yeah, I also see a lot of value in automatic conflict resolution. But
maybe we can have both ways? For example, in a case where the user wants
to resolve conflicts in different ways, or for a conflict that cannot be
resolved by automatic resolution (not sure there is one in practice,
though), manual resolution would also have value.

Right, that is exactly what I was saying. So, even if both can be done
as separate patches, we should try to design the manual resolution in
a way that can be extended for an automatic resolution system. I think
we can try to have some initial idea/design/POC for an automatic
resolution as well to ensure that the manual resolution scheme can be
further extended.

--
With Regards,
Amit Kapila.

#16Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#15)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, May 26, 2021 at 6:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with you that specifying XID could be easier and
understandable for users. I was thinking and studying a bit about what
other systems do in this regard. Why don't we try to provide conflict
resolution methods for users? The idea could be that either the
conflicts can be resolved automatically or manually. In the case of
manual resolution, users can use the existing methods or the XID stuff
you are proposing here and in case of automatic resolution, the
in-built or corresponding user-defined functions will be invoked for
conflict resolution. There are more details to figure out in the
automatic resolution scheme but I see a lot of value in doing the
same.

Yeah, I also see a lot of value in automatic conflict resolution. But
maybe we can have both ways? For example, in a case where the user wants
to resolve conflicts in different ways, or for a conflict that cannot be
resolved by automatic resolution (not sure there is one in practice,
though), manual resolution would also have value.

Right, that is exactly what I was saying. So, even if both can be done
as separate patches, we should try to design the manual resolution in
a way that can be extended for an automatic resolution system. I think
we can try to have some initial idea/design/POC for an automatic
resolution as well to ensure that the manual resolution scheme can be
further extended.

Totally agreed.

But perhaps we might want to note that the conflict resolution we're
talking about is to resolve conflicts at the row or column level. It
doesn't necessarily raise an ERROR and the granularity of resolution
is per record or column. For example, if a DELETE and an UPDATE
process the same tuple (searched by PK), the UPDATE may not find the
tuple and be ignored due to the tuple having been already deleted. In
this case, no ERROR will occur (i.e., the UPDATE will be ignored), but the
user may want to do another conflict resolution. On the other hand,
the feature proposed here assumes that an error has already occurred
and logical replication has already been stopped, and it resolves the
situation by skipping the entire transaction.

IIUC the conflict resolution can be thought of as a combination of
types of conflicts and the resolution that can be applied to them. For
example, if there is a conflict between INSERT and INSERT and the
latter INSERT violates the unique constraint, an ERROR is raised, so
the DBA can resolve it manually. But there is another way to
automatically resolve it by selecting the tuple having a newer
timestamp. On the other hand, in the DELETE and UPDATE conflict
described above, it's possible to automatically ignore the fact that
the UPDATE could not update the tuple. Or we can even generate an
ERROR so that the DBA can resolve it manually. The DBA can manually
resolve the conflict in various ways: skipping the entire transaction
from the origin, choosing the tuple having a newer/older timestamp,
etc.
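
The "conflict type plus resolution" pairing described above can be sketched as a small policy table. This is only an illustration: the names Resolution, DEFAULT_POLICY, Change and resolve are hypothetical, and nothing here corresponds to existing PostgreSQL code or to a proposed catalog layout.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum, auto


class Resolution(Enum):
    RAISE_ERROR = auto()      # stop replication; the DBA resolves it manually
    IGNORE = auto()           # drop the incoming change
    NEWER_TIMESTAMP = auto()  # keep whichever tuple has the newer timestamp


# Hypothetical policy: (remote action, what the subscriber found) -> resolution.
DEFAULT_POLICY = {
    ("INSERT", "key_exists"): Resolution.RAISE_ERROR,
    ("UPDATE", "tuple_missing"): Resolution.IGNORE,
}


@dataclass
class Change:
    action: str
    commit_ts: datetime


def resolve(change, conflict, local_ts, policy=DEFAULT_POLICY):
    """Return 'apply' or 'ignore', or raise, according to the policy."""
    resolution = policy[(change.action, conflict)]
    if resolution is Resolution.RAISE_ERROR:
        raise RuntimeError(f"conflict on {change.action}: {conflict}")
    if resolution is Resolution.IGNORE:
        return "ignore"
    # NEWER_TIMESTAMP: the remote change wins only if it is strictly newer.
    if local_ts is None or change.commit_ts > local_ts:
        return "apply"
    return "ignore"
```

With a table like this, the transaction-skip feature would be one more manual option for the RAISE_ERROR cases.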

In that sense, we can think of the feature proposed here as a feature
that provides a way to resolve the conflict that would originally
cause an ERROR by skipping the entire transaction. If we add a
solution that raises an ERROR for conflicts that don't originally
raise an ERROR (like DELETE and UPDATE conflict) in the future, we
will be able to manually skip each transaction for all types of
conflicts.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#17Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#14)
Re: Skipping logical replication transactions on subscriber side

On Thu, May 27, 2021 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 27, 2021 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, that makes sense but still not sure how will you identify if we
need to reset XID in case of failure doing that in the previous
attempt.

It's just an idea, but could we record the XID of the failed
transaction as well as its commit LSN? The sequence I'm thinking of is:

1. the worker records the XID and commit LSN of the failed transaction
to a catalog.

When will you record this info? I am not sure if we can try to update
this when an error has occurred. We can think of using try..catch in
apply worker and then record it in catch on error but would that be
advisable? One random thought that occurred to me is that the apply
worker notifies such information to the launcher (or maybe another
process), which will log this information.

Yeah, I was concerned about that too and had the same idea. The
information could still be lost if the server crashes before the
launcher writes it, but I think that's acceptable.

2. the user specifies how to resolve that conflict transaction
(currently only 'skip' is supported) and writes to the catalog.
3. the worker applies the resolution method according to the catalog. If
the worker hasn't started to apply those changes, it can skip the entire
transaction. If it has, it rolls back the transaction and ignores the
remaining changes.

The worker needs neither to reset information of the last failed
transaction nor to mark the conflicted transaction as resolved. The
worker will ignore that information when checking the catalog if the
commit LSN is passed.

So won't this require us to check the required info in the catalog
before applying each transaction? If so, that might be overhead, maybe
we can build some cache of the highest commitLSN that can be consulted
rather than the catalog table.

I think workers can cache that information when they start, then
invalidate and reload the cache when the catalog gets updated.
Specifying an XID to skip will update the catalog, invalidating the cache.
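
As a rough illustration of the cache-and-invalidate idea, here is a minimal sketch. SkipInfoCache, should_skip and the load_from_catalog callback are hypothetical names; a real worker would presumably rely on the existing catalog cache invalidation machinery rather than anything like this.

```python
class SkipInfoCache:
    """Per-worker cache of the skip request kept in a (hypothetical) catalog."""

    def __init__(self, load_from_catalog):
        # load_from_catalog: callable returning (xid, commit_lsn) or None
        self._load = load_from_catalog
        self._valid = False
        self._entry = None
        self.loads = 0  # number of catalog reads, for illustration only

    def invalidate(self):
        # Called when the catalog row changes, e.g. the user sets a skip XID.
        self._valid = False

    def get(self):
        # Reload lazily, only after an invalidation.
        if not self._valid:
            self._entry = self._load()
            self._valid = True
            self.loads += 1
        return self._entry


def should_skip(cache, remote_xid):
    """True if the cached skip request names this remote transaction."""
    entry = cache.get()
    return entry is not None and entry[0] == remote_xid
```

The point of the sketch is that the per-transaction check reads only local memory; the catalog is consulted once at start and once after each invalidation.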

I think we need to think about when to
remove rows for which conflict has been resolved as we can't let that
information grow infinitely.

I guess we can update catalog tuples in place when the next conflict
happens. The catalog tuple should be fixed size. An already-resolved
conflict will have a commit LSN older than its replication origin's
LSN.
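
The in-place update and the "older than the origin's LSN" test could look roughly like this. ConflictRow, record_failure and is_resolved are hypothetical names, and LSNs are modelled as plain integers for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class ConflictRow:
    """One fixed-size row per subscription (hypothetical layout)."""
    xid: int = 0
    commit_lsn: int = 0


def record_failure(row, xid, commit_lsn):
    # Overwrite in place instead of appending, so the catalog cannot grow.
    row.xid = xid
    row.commit_lsn = commit_lsn


def is_resolved(row, origin_lsn):
    # Once the origin has advanced past the commit LSN, the row is stale
    # and the worker can ignore it.
    return row.commit_lsn < origin_lsn
```

Under this scheme nothing has to be reset or marked as resolved; the comparison against the origin's LSN is enough.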

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#18Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#17)
Re: Skipping logical replication transactions on subscriber side

On Sat, May 29, 2021 at 8:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 27, 2021 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

1. the worker records the XID and commit LSN of the failed transaction
to a catalog.

When will you record this info? I am not sure if we can try to update
this when an error has occurred. We can think of using try..catch in
apply worker and then record it in catch on error but would that be
advisable? One random thought that occurred to me is to that apply
worker notifies such information to the launcher (or maybe another
process) which will log this information.

Yeah, I was concerned about that too and had the same idea. The
information still could not be written if the server crashes before
the launcher writes it. But I think it's an acceptable.

True, because even if the launcher restarts, the apply worker will
error out again and resend the information. I guess we can have an
error queue where apply workers can add their information and the
launcher will then process those. If we do that, then we probably need
to define what we want to do if the queue gets full: either the apply
worker nudges the launcher and waits, or it can just throw an error and
continue. If you have any better ideas for sharing this information,
we can consider those as well.
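
To make the "queue is full" question concrete, here is a minimal single-process sketch. ErrorQueue and report_error are hypothetical names; a real implementation would live in shared memory and need locking, which this sketch omits. It combines the two options: nudge the launcher once, then give up if the queue is still full.

```python
from collections import deque


class ErrorQueue:
    """Bounded queue standing in for a shared-memory error queue."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity

    def try_push(self, entry):
        """Add entry; return False instead of blocking when the queue is full."""
        if len(self._items) >= self._capacity:
            return False
        self._items.append(entry)
        return True

    def drain(self):
        """Launcher side: remove and return everything currently queued."""
        items = list(self._items)
        self._items.clear()
        return items


def report_error(queue, entry, nudge_launcher):
    """Apply-worker side: returns True if the entry was queued."""
    if queue.try_push(entry):
        return True
    nudge_launcher()  # ask the launcher to drain, then retry once
    # If the queue is still full, give up. The worker will hit the same
    # error after it restarts and can report it again.
    return queue.try_push(entry)
```

Giving up is tolerable here for the reason stated above: the apply worker errors out again after a restart and resends the information.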

2. the user specifies how to resolve that conflict transaction
(currently only 'skip' is supported) and writes to the catalog.
3. the worker does the resolution method according to the catalog. If
the worker didn't start to apply those changes, it can skip the entire
transaction. If did, it rollbacks the transaction and ignores the
remaining.

The worker needs neither to reset information of the last failed
transaction nor to mark the conflicted transaction as resolved. The
worker will ignore that information when checking the catalog if the
commit LSN is passed.

So won't this require us to check the required info in the catalog
before applying each transaction? If so, that might be overhead, maybe
we can build some cache of the highest commitLSN that can be consulted
rather than the catalog table.

I think workers can cache that information when starts and invalidates
and reload the cache when the catalog gets updated. Specifying to
skip XID will update the catalog, invalidating the cache.

I think we need to think about when to
remove rows for which conflict has been resolved as we can't let that
information grow infinitely.

I guess we can update catalog tuples in place when another conflict
happens next time. The catalog tuple should be fixed size. The
already-resolved conflict will have the commit LSN older than its
replication origin's LSN.

Okay, but I have a slight concern that we will keep an XID in the
system which might no longer be valid. So, we will keep this info
about subscribers around until someone drops the subscription;
hopefully, that doesn't lead to too many rows. This will be okay as
per the current design, but say tomorrow we decide to parallelize the
apply for a subscription; then there could be multiple errors
corresponding to a subscription, and in that case such a design might
appear quite limiting. One possibility could be that when the launcher
is periodically checking for new error messages, it can clean up the
conflicts catalog as well, or maybe autovacuum does this periodically
as it does for stats (via pgstat_vacuum_stat).
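
A periodic cleanup pass of the kind suggested for the launcher might look like the following sketch. cleanup_conflicts and both dictionaries are hypothetical stand-ins for the conflicts catalog and the replication origin state; LSNs are plain integers.

```python
def cleanup_conflicts(rows, origin_lsn_by_sub):
    """Remove stale rows; rows maps subid -> (xid, commit_lsn).

    Returns the subscription ids whose rows were removed.
    """
    removed = []
    for subid, (_xid, commit_lsn) in list(rows.items()):
        origin_lsn = origin_lsn_by_sub.get(subid)
        # Drop rows for subscriptions that no longer exist, and rows whose
        # transaction the origin has already moved past.
        if origin_lsn is None or commit_lsn < origin_lsn:
            del rows[subid]
            removed.append(subid)
    return removed
```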

--
With Regards,
Amit Kapila.

#19Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#18)
Re: Skipping logical replication transactions on subscriber side

On Sat, May 29, 2021 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, May 29, 2021 at 8:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, May 27, 2021 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

1. the worker records the XID and commit LSN of the failed transaction
to a catalog.

When will you record this info? I am not sure if we can try to update
this when an error has occurred. We can think of using try..catch in
apply worker and then record it in catch on error but would that be
advisable? One random thought that occurred to me is to that apply
worker notifies such information to the launcher (or maybe another
process) which will log this information.

Yeah, I was concerned about that too and had the same idea. The
information still could not be written if the server crashes before
the launcher writes it. But I think it's an acceptable.

True, because even if the launcher restarts, the apply worker will
error out again and resend the information. I guess we can have an
error queue where apply workers can add their information and the
launcher will then process those. If we do that, then we need to
probably define what we want to do if the queue gets full, either
apply worker nudge launcher and wait or it can just throw an error and
continue. If you have any better ideas to share this information then
we can consider those as well.

+1 for using an error queue. Maybe we need to avoid queuing the same
error more than once, to keep the catalog from being updated
frequently?

2. the user specifies how to resolve that conflict transaction
(currently only 'skip' is supported) and writes to the catalog.
3. the worker does the resolution method according to the catalog. If
the worker didn't start to apply those changes, it can skip the entire
transaction. If did, it rollbacks the transaction and ignores the
remaining.

The worker needs neither to reset information of the last failed
transaction nor to mark the conflicted transaction as resolved. The
worker will ignore that information when checking the catalog if the
commit LSN is passed.

So won't this require us to check the required info in the catalog
before applying each transaction? If so, that might be overhead, maybe
we can build some cache of the highest commitLSN that can be consulted
rather than the catalog table.

I think workers can cache that information when starts and invalidates
and reload the cache when the catalog gets updated. Specifying to
skip XID will update the catalog, invalidating the cache.

I think we need to think about when to
remove rows for which conflict has been resolved as we can't let that
information grow infinitely.

I guess we can update catalog tuples in place when another conflict
happens next time. The catalog tuple should be fixed size. The
already-resolved conflict will have the commit LSN older than its
replication origin's LSN.

Okay, but I have a slight concern that we will keep xid in the system
which might have been no longer valid. So, we will keep this info
about subscribers around till one performs drop subscription,
hopefully, that doesn't lead to too many rows. This will be okay as
per the current design but say tomorrow we decide to parallelize the
apply for a subscription then there could be multiple errors
corresponding to a subscription and in that case, such a design might
appear quite limiting. One possibility could be that when the launcher
is periodically checking for new error messages, it can clean up the
conflicts catalog as well, or maybe autovacuum does this periodically
as it does for stats (via pgstat_vacuum_stat).

Yeah, it's better to have a way to clean up no-longer-valid entries in
the catalog in the case where the worker failed to remove them. I prefer
the former idea so far, so I'll implement it in a PoC patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#20Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#19)
Re: Skipping logical replication transactions on subscriber side

On Mon, May 31, 2021 at 12:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, May 29, 2021 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1. the worker records the XID and commit LSN of the failed transaction
to a catalog.

When will you record this info? I am not sure if we can try to update
this when an error has occurred. We can think of using try..catch in
apply worker and then record it in catch on error but would that be
advisable? One random thought that occurred to me is to that apply
worker notifies such information to the launcher (or maybe another
process) which will log this information.

Yeah, I was concerned about that too and had the same idea. The
information still could not be written if the server crashes before
the launcher writes it. But I think it's an acceptable.

True, because even if the launcher restarts, the apply worker will
error out again and resend the information. I guess we can have an
error queue where apply workers can add their information and the
launcher will then process those. If we do that, then we need to
probably define what we want to do if the queue gets full, either
apply worker nudge launcher and wait or it can just throw an error and
continue. If you have any better ideas to share this information then
we can consider those as well.

+1 for using error queue. Maybe we need to avoid queuing the same
error more than once to avoid the catalog from being updated
frequently?

Yes, I think it is important because, after logging, the subscription
may still error again unless the user does something to skip or
resolve the conflict. I guess you need to check for the existence of
the error in the system table and/or in the queue.
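
The existence check could be as simple as the sketch below. enqueue_once is a hypothetical helper, and errors are identified here by a (subscription id, XID) pair, which is an assumption rather than something settled in this thread.

```python
def enqueue_once(entry, catalog_rows, queue):
    """Queue an error only if neither the catalog nor the queue has it.

    entry is a (subid, xid) pair; returns True if it was queued.
    """
    if entry in catalog_rows or entry in queue:
        return False
    queue.append(entry)
    return True
```

A worker that keeps failing on the same transaction would then generate at most one catalog update.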

I guess we can update catalog tuples in place when another conflict
happens next time. The catalog tuple should be fixed size. The
already-resolved conflict will have the commit LSN older than its
replication origin's LSN.

Okay, but I have a slight concern that we will keep xid in the system
which might have been no longer valid. So, we will keep this info
about subscribers around till one performs drop subscription,
hopefully, that doesn't lead to too many rows. This will be okay as
per the current design but say tomorrow we decide to parallelize the
apply for a subscription then there could be multiple errors
corresponding to a subscription and in that case, such a design might
appear quite limiting. One possibility could be that when the launcher
is periodically checking for new error messages, it can clean up the
conflicts catalog as well, or maybe autovacuum does this periodically
as it does for stats (via pgstat_vacuum_stat).

Yeah, it's better to have a way to cleanup no longer valid entries in
the catalog in the case where the worker failed to remove it. I prefer
the former idea so far,

Which idea do you refer to here as former (cleaning up by launcher)?

so I'll implement it in a PoC patch.

Okay.

--
With Regards,
Amit Kapila.

#21Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#14)
Re: Skipping logical replication transactions on subscriber side

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it in shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

#22Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#21)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

Also, I think we can't assume after the restart we will get the same
error because the user can perform some operations after the restart
and before we try to apply the same transaction. It might be that the
user wants to see all the errors before setting the skip identifier
(and/or method).

I think the XID (or say another identifier like commitLSN) which we
want to use for skipping the transaction as specified by the user has
to be stored in the catalog because otherwise, after the restart we
won't remember it and the user won't know that they need to set it
again. Now, say we have multiple skip identifiers (XIDs, commitLSN,
...), isn't it better to store all conflict-related information in a
separate catalog like pg_subscription_conflict or something like that?
I think it might be also better to later extend it for auto conflict
resolution where the user can specify auto conflict resolution info
for a subscription. Is it better to store all such information in
pg_subscription or have a separate catalog? It is possible that even
if we have a separate catalog for conflict info, we might not want to
store error info there.

--
With Regards,
Amit Kapila.

#23Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#22)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error.

I had the same concern. In particular, the approach we have discussed
so far is to skip the transaction based on the information written
by the worker rather than requiring the user to specify the XID.
Therefore, we will always require the worker to process the same large
transaction after the restart in order to skip the transaction.

I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

Another possible benefit of writing it to a catalog is that we can
replicate it to the physical standbys. If we have failover slots in
the future, the physical standby server also can resolve the conflict
without processing a possibly large transaction.

I think the XID (or say another identifier like commitLSN) which we
want to use for skipping the transaction as specified by the user has
to be stored in the catalog because otherwise, after the restart we
won't remember it and the user won't know that he needs to set it
again. Now, say we have multiple skip identifiers (XIDs, commitLSN,
..), isn't it better to store all conflict-related information in a
separate catalog like pg_subscription_conflict or something like that.
I think it might be also better to later extend it for auto conflict
resolution where the user can specify auto conflict resolution info
for a subscription. Is it better to store all such information in
pg_subscription or have a separate catalog? It is possible that even
if we have a separate catalog for conflict info, we might not want to
store error info there.

Just to be clear, we need to store only the conflict-related
information that cannot be resolved without manual intervention,
right? That is, conflicts that cause an error and make the workers
exit. In general, replication conflicts also include conflicts that
don’t cause an error. I think that those conflicts don’t necessarily
need to be stored in the catalog and don’t require manual intervention.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#24Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#23)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 10:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jun 1, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error.

I had the same concern. Particularly, the approach we currently
discussed is to skip the transaction based on the information written
by the worker rather than require the user to specify the XID.

Yeah, but I was imagining that the user still needs to specify
something to indicate that we need to skip it; otherwise, we might try
to skip a transaction that the user wants to resolve themselves rather
than expecting us to skip it. Another point is that if we don't store
this information persistently, how will we prevent a user from
specifying some random XID that has not even errored after the restart?

Therefore, we will always require the worker to process the same large
transaction after the restart in order to skip the transaction.

I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

Another possible benefit of writing it to a catalog is that we can
replicate it to the physical standbys. If we have failover slots in
the future, the physical standby server also can resolve the conflict
without processing a possibly large transaction.

Makes sense.

I think the XID (or say another identifier like commitLSN) which we
want to use for skipping the transaction as specified by the user has
to be stored in the catalog because otherwise, after the restart we
won't remember it and the user won't know that he needs to set it
again. Now, say we have multiple skip identifiers (XIDs, commitLSN,
..), isn't it better to store all conflict-related information in a
separate catalog like pg_subscription_conflict or something like that.
I think it might be also better to later extend it for auto conflict
resolution where the user can specify auto conflict resolution info
for a subscription. Is it better to store all such information in
pg_subscription or have a separate catalog? It is possible that even
if we have a separate catalog for conflict info, we might not want to
store error info there.

Just to be clear, we need to store only the conflict-related
information that cannot be resolved without manual intervention,
right? That is, conflicts cause an error, exiting the workers. In
general, replication conflicts include also conflicts that don’t cause
an error. I think that those conflicts don’t necessarily need to be
stored in the catalog and don’t require manual intervention.

Yeah, I think we want to record the error cases, but which other
conflicts are you talking about here that don't lead to any sort of
error?

--
With Regards,
Amit Kapila.

#25Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#24)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 10:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jun 1, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error.

I had the same concern. Particularly, the approach we currently
discussed is to skip the transaction based on the information written
by the worker rather than require the user to specify the XID.

Yeah, but I was imagining that the user still needs to specify
something to indicate that we need to skip it, otherwise, we might try
to skip a transaction that the user wants to resolve by itself rather
than expecting us to skip it.

Yeah, currently what I'm thinking is that the worker writes the
conflict that caused an error somewhere. If the user wants to resolve
it manually, they can specify the resolution method for the stopped
subscription. Until the user specifies the method and the worker
resolves it, or some fields of the subscription such as subconninfo are
updated, the conflict is not resolved and the information persists.

I think the XID (or say another identifier like commitLSN) which we
want to use for skipping the transaction as specified by the user has
to be stored in the catalog because otherwise, after the restart we
won't remember it and the user won't know that he needs to set it
again. Now, say we have multiple skip identifiers (XIDs, commitLSN,
..), isn't it better to store all conflict-related information in a
separate catalog like pg_subscription_conflict or something like that.
I think it might be also better to later extend it for auto conflict
resolution where the user can specify auto conflict resolution info
for a subscription. Is it better to store all such information in
pg_subscription or have a separate catalog? It is possible that even
if we have a separate catalog for conflict info, we might not want to
store error info there.

Just to be clear, we need to store only the conflict-related
information that cannot be resolved without manual intervention,
right? That is, conflicts cause an error, exiting the workers. In
general, replication conflicts include also conflicts that don’t cause
an error. I think that those conflicts don’t necessarily need to be
stored in the catalog and don’t require manual intervention.

Yeah, I think we want to record the error cases but which other
conflicts you are talking about here which doesn't lead to any sort of
error?

For example, I think one type of replication conflict is when two
updates, arriving via logical replication or from a client, modify
the same record (e.g., one with the same primary key) at the same time.
In that case no error happens and we always choose the update
that arrived later. But there are other possible resolution methods,
such as choosing the one that arrived earlier, using the one with the
newer commit timestamp, using something like node priority, or
even raising an error so that the user resolves it manually.
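
The resolution methods listed above can be compared side by side in a small sketch. Update, RESOLVERS and resolve_update_conflict are hypothetical names used only to illustrate the options; none of them is an existing or proposed PostgreSQL interface.

```python
from dataclasses import dataclass


@dataclass
class Update:
    value: str
    arrival: int        # order in which the subscriber received the update
    commit_ts: float    # commit timestamp on the originating node
    node_priority: int  # higher number means higher priority


# Each resolver picks the winning update out of two conflicting ones.
RESOLVERS = {
    "last_arrived": lambda a, b: max(a, b, key=lambda u: u.arrival),
    "first_arrived": lambda a, b: min(a, b, key=lambda u: u.arrival),
    "newer_commit_ts": lambda a, b: max(a, b, key=lambda u: u.commit_ts),
    "node_priority": lambda a, b: max(a, b, key=lambda u: u.node_priority),
}


def resolve_update_conflict(a, b, method):
    """Return the winning update, or raise so the user can resolve manually."""
    if method == "error":
        raise RuntimeError("update/update conflict; manual resolution required")
    return RESOLVERS[method](a, b)
```

Here "last_arrived" corresponds to the current behaviour described above, and "error" is the case that would hand the conflict over to manual resolution such as skipping the transaction.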

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#26Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#25)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 1:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jun 1, 2021 at 2:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 10:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jun 1, 2021 at 1:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 12:55 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.05.21 12:04, Amit Kapila wrote:

Also, I am thinking that instead of a stat view, do we need
to consider having a system table (pg_replication_conflicts or
something like that) for this because what if stats information is
lost (say either due to crash or due to udp packet loss), can we rely
on stats view for this?

Yeah, it seems better to use a catalog.

Okay.

Could you store it shared memory? You don't need it to be crash safe,
since the subscription will just run into the same error again after
restart. You just don't want it to be lost, like with the statistics
collector.

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error.

I had the same concern. Particularly, the approach we currently
discussed is to skip the transaction based on the information written
by the worker rather than require the user to specify the XID.

Yeah, but I was imagining that the user still needs to specify
something to indicate that we need to skip it, otherwise, we might try
to skip a transaction that the user wants to resolve by itself rather
than expecting us to skip it.

Yeah, currently what I'm thinking is that the worker writes the
conflict that caused an error somewhere. If the user wants to resolve
it manually they can specify the resolution method to the stopped
subscription. Until the user specifies the method and the worker
resolves it or some fields of the subscription such as subconninfo are
updated, the conflict is not resolved and the information lasts.

I think we can work out such details, but tinkering with subconninfo
was not what I had in mind.

I think the XID (or say another identifier like commitLSN) which we
want to use for skipping the transaction as specified by the user has
to be stored in the catalog because otherwise, after the restart we
won't remember it and the user won't know that he needs to set it
again. Now, say we have multiple skip identifiers (XIDs, commitLSN,
..), isn't it better to store all conflict-related information in a
separate catalog like pg_subscription_conflict or something like that.
I think it might be also better to later extend it for auto conflict
resolution where the user can specify auto conflict resolution info
for a subscription. Is it better to store all such information in
pg_subscription or have a separate catalog? It is possible that even
if we have a separate catalog for conflict info, we might not want to
store error info there.

Just to be clear, we need to store only the conflict-related
information that cannot be resolved without manual intervention,
right? That is, conflicts cause an error, exiting the workers. In
general, replication conflicts include also conflicts that don’t cause
an error. I think that those conflicts don’t necessarily need to be
stored in the catalog and don’t require manual intervention.

Yeah, I think we want to record the error cases, but which other
conflicts are you talking about here that don't lead to any sort of
error?

For example, I think it's one type of replication conflict that two
updates that arrived via logical replication or from the client update
the same record (e.g., having the same primary key) at the same time.
In that case an error doesn't happen and we always choose the update
that arrived later.

I think we choose whichever is earlier as we first try to find the
tuple in local rel and if not found then we silently ignore the
update/delete operation.

But there are other possible resolution methods
such as choosing the one that arrived earlier, using the one having a
newer commit timestamp, using something like priority of the node, and
even raising an error so that the user manually resolves it.

Agreed. I think we need to log only the ones which lead to error.
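
To make the resolution methods listed above concrete, here is a minimal sketch of how they could be modeled. It is only an illustration of the idea, not anything from a patch: the names Policy, Update, ConflictError and resolve are hypothetical, as is the convention that a larger node_priority wins.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Policy(Enum):
    LATER_ARRIVAL_WINS = auto()    # keep the update that arrived later
    EARLIER_ARRIVAL_WINS = auto()  # keep the update that arrived first
    NEWER_COMMIT_TS_WINS = auto()  # keep the one with the newer commit timestamp
    NODE_PRIORITY_WINS = auto()    # keep the one from the higher-priority node
    RAISE_ERROR = auto()           # stop so that the user resolves it manually


@dataclass(frozen=True)
class Update:
    value: str
    commit_ts: float    # commit timestamp on the originating node
    node_priority: int  # hypothetical: a larger number means higher priority


class ConflictError(Exception):
    """Raised when the policy asks for manual resolution."""


def resolve(existing: Update, incoming: Update, policy: Policy) -> Update:
    """Pick the surviving update; `existing` arrived before `incoming`."""
    if policy is Policy.LATER_ARRIVAL_WINS:
        return incoming
    if policy is Policy.EARLIER_ARRIVAL_WINS:
        return existing
    if policy is Policy.NEWER_COMMIT_TS_WINS:
        return incoming if incoming.commit_ts > existing.commit_ts else existing
    if policy is Policy.NODE_PRIORITY_WINS:
        return incoming if incoming.node_priority > existing.node_priority else existing
    raise ConflictError(f"conflict between {existing.value!r} and {incoming.value!r}")
```

Only the last policy produces an error, which is the case the discussion proposes to record.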

--
With Regards,
Amit Kapila.

#27Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#22)
Re: Skipping logical replication transactions on subscriber side

On 01.06.21 06:01, Amit Kapila wrote:

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

At least in current practice, skipping parts of the logical replication
stream on the subscriber is a rare, emergency-level operation when
something that shouldn't have happened happened. So it doesn't really
matter how costly it is. It's not going to be more costly than the
error happening in the first place. All you'd need is one shared memory
slot per subscription to store a xid to skip.

We will also want some proper conflict handling at some point. But I
think what is being discussed here is meant to be a repair tool, not a
policy tool, and I'm afraid it might get over-engineered.

#28Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#27)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 01.06.21 06:01, Amit Kapila wrote:

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

At least in current practice, skipping parts of the logical replication
stream on the subscriber is a rare, emergency-level operation when
something that shouldn't have happened happened. So it doesn't really
matter how costly it is. It's not going to be more costly than the
error happening in the first place. All you'd need is one shared memory
slot per subscription to store a xid to skip.

Leaving aside the performance point, how can we do this by just storing
the skip identifier (XID/commitLSN) in shared memory? How will the apply
worker know about that information after a restart? Do you expect the
user to set it again? If so, I think users might not like that. Also,
how will we prevent users from giving an identifier other than that of a
failed transaction, and if users provide one, what should our
action be? Without that, if users provide the XID of some in-progress
transaction, we might need to do more work (rollback) than just
skipping it.

We will also want some proper conflict handling at some point. But I
think what is being discussed here is meant to be a repair tool, not a
policy tool, and I'm afraid it might get over-engineered.

I got your point, but I am also a bit concerned that handling all
boundary cases might become tricky if we go with a simple shared
memory technique. OTOH, if we can handle all such cases then it is
fine.

--
With Regards,
Amit Kapila.

#29Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#28)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 2, 2021 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 01.06.21 06:01, Amit Kapila wrote:

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

At least in current practice, skipping parts of the logical replication
stream on the subscriber is a rare, emergency-level operation when
something that shouldn't have happened happened. So it doesn't really
matter how costly it is. It's not going to be more costly than the
error happening in the first place. All you'd need is one shared memory
slot per subscription to store a xid to skip.

Leaving aside the performance point, how can we do by just storing
skip identifier (XID/commitLSN) in shared_memory? How will the apply
worker know about that information after restart? Do you expect the
user to set it again, if so, I think users might not like that? Also,
how will we prohibit users to give some identifier other than for
failed transactions, and if users provide that what should be our
action? Without that, if users provide XID of some in-progress
transaction, we might need to do more work (rollback) than just
skipping it.

I think the simplest solution would be to have a fixed-size array in
shared memory to store information about the transactions to skip on a
particular subscription. Given that this feature is meant to be a
repair tool for emergency cases, 32 or 64 entries seem enough. That
information should be visible to users via a system view, and each
entry is cleared once the worker has skipped the transaction. We
would also need to clear the entry if the meta information of the
subscription, such as conninfo or slot name, has been changed. The
worker reads that information at least when starting logical
replication. The worker receives changes from the publisher and
checks whether the transaction should be skipped when it starts to apply
those changes. If so, the worker skips applying all changes of the
transaction and removes the stream files, if any exist.
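
As a rough illustration of the fixed-size array idea, the following sketch models the bookkeeping only. It makes no attempt to model real shared memory or locking. SkipTable and its method names are hypothetical, and keying entries by a (subid, xid) pair is an assumption for the sake of the example.

```python
from typing import List, Optional, Tuple

MAX_SKIP_ENTRIES = 32  # the message suggests 32 or 64 entries would be enough


class SkipTable:
    """Fixed-size table standing in for the proposed shared-memory array."""

    def __init__(self, capacity: int = MAX_SKIP_ENTRIES) -> None:
        # Each slot is either None (free) or a (subid, xid) pair.
        self._slots: List[Optional[Tuple[int, int]]] = [None] * capacity

    def set_skip(self, subid: int, xid: int) -> bool:
        """Register a transaction to skip; return False if the table is full."""
        if (subid, xid) in self._slots:
            return True
        for i, slot in enumerate(self._slots):
            if slot is None:
                self._slots[i] = (subid, xid)
                return True
        return False

    def should_skip(self, subid: int, xid: int) -> bool:
        """Checked by the worker when it starts applying a transaction."""
        return (subid, xid) in self._slots

    def clear_entry(self, subid: int, xid: int) -> None:
        """Called once the worker has skipped the transaction."""
        self._slots = [None if s == (subid, xid) else s for s in self._slots]

    def clear_subscription(self, subid: int) -> None:
        """Called when conninfo or slot name of the subscription changes."""
        self._slots = [
            None if s is not None and s[0] == subid else s for s in self._slots
        ]
```

Because the table is fixed-size, set_skip has to report failure when no slot is free, which is one of the boundary cases a real implementation would need to surface to the user.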

Regarding how to check whether the XID specified by the user
is valid, I guess it’s not easy to do that since XIDs sent from the
publisher are in random order. Considering the use case of this tool,
the typical situation is that logical replication gets stuck due to a
problem transaction and the worker repeatedly restarts and raises an
error. So I guess it would also be a good idea to let the user
specify skipping the first transaction (or first N transactions) after
the subscription starts logical replication. It’s less flexible but
seems enough to resolve such a situation and avoids the problem
of validating the XID. If functionality that lets the
subscriber know the oldest XID that could possibly be sent is also
useful for other purposes, it would be a good idea to implement it, but
I’m not sure about other use cases.

Anyway, it seems to me that we need to consider the user interface
first, especially how the user specifies the transaction to skip and
what they specify. My current feeling is that specifying the XID is
intuitive and flexible, but the user needs two steps: check the XID and
then specify it, and there is a risk that the user mistakenly specifies
the wrong XID. On the other hand, the idea of specifying to skip the first
transaction doesn’t require the user to check and specify the XID but is
less flexible, and “the first” transaction might be ambiguous for the
user.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#30Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#29)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 15, 2021 at 6:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jun 2, 2021 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 01.06.21 06:01, Amit Kapila wrote:

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

At least in current practice, skipping parts of the logical replication
stream on the subscriber is a rare, emergency-level operation when
something that shouldn't have happened happened. So it doesn't really
matter how costly it is. It's not going to be more costly than the
error happening in the first place. All you'd need is one shared memory
slot per subscription to store a xid to skip.

Leaving aside the performance point, how can we do by just storing
skip identifier (XID/commitLSN) in shared_memory? How will the apply
worker know about that information after restart? Do you expect the
user to set it again, if so, I think users might not like that? Also,
how will we prohibit users to give some identifier other than for
failed transactions, and if users provide that what should be our
action? Without that, if users provide XID of some in-progress
transaction, we might need to do more work (rollback) than just
skipping it.

I think the simplest solution would be to have a fixed-size array on
the shared memory to store information of skipping transactions on the
particular subscription. Given that this feature is meant to be a
repair tool in emergency cases, 32 or 64 entries seem enough.

IIUC, here you are talking about xids specified by the user to skip?
If so, then how will you get that information after the restart, and
why do you need 32 or 64 entries for it?

Anyway, it seems to me that we need to consider the user interface
first, especially how and what the user specifies the transaction to
skip. My current feeling is that specifying XID is intuitive and
flexible but the user needs to have 2 steps: checks XID and then
specifies it, and there is a risk that the user mistakenly specifies a
wrong XID. On the other hand, the idea of specifying to skip the first
transaction doesn’t require the user to check and specify XID but is
less flexible, and “the first” transaction might be ambiguous for the
user.

I see your point in allowing the user to specify the first N
transactions, but OTOH I am slightly afraid that it might lead to
skipping some useful transactions, which would make the replica
out of sync. BTW, is there any
data point for the user to check how many transactions they can skip?
Normally, we won't be able to proceed till we resolve/skip the
transaction that is generating an error. One possibility could be that
we provide some *superuser* functions like
pg_logical_replication_skip_xact()/pg_logical_replication_reset_skip_xact()
which take the subscription name/id and xid as input parameters. Then, I
think we can store this information in ReplicationState and probably
try to map to originid from subscription name/id to retrieve that
info. We can probably document that the effects of these functions
won't last after the restart. Now, if these functions are used by
superusers, then we can probably trust that the XIDs they provide
are safe to skip, but OTOH restricting these
functions to superusers might limit the usage of this
repair tool.

--
With Regards,
Amit Kapila.

#31Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#30)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 16, 2021 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 15, 2021 at 6:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jun 2, 2021 at 3:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 1, 2021 at 9:05 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 01.06.21 06:01, Amit Kapila wrote:

But, won't that be costly in cases where we have errors in the
processing of very large transactions? Subscription has to process all
the data before it gets an error. I think we can even imagine this
feature to be extended to use commitLSN as a skip candidate in which
case we can even avoid getting the data of that transaction from the
publisher. So if this information is persistent, the user can even set
the skip identifier after the restart before the publisher can send
all the data.

At least in current practice, skipping parts of the logical replication
stream on the subscriber is a rare, emergency-level operation when
something that shouldn't have happened happened. So it doesn't really
matter how costly it is. It's not going to be more costly than the
error happening in the first place. All you'd need is one shared memory
slot per subscription to store a xid to skip.

Leaving aside the performance point, how can we do by just storing
skip identifier (XID/commitLSN) in shared_memory? How will the apply
worker know about that information after restart? Do you expect the
user to set it again, if so, I think users might not like that? Also,
how will we prohibit users to give some identifier other than for
failed transactions, and if users provide that what should be our
action? Without that, if users provide XID of some in-progress
transaction, we might need to do more work (rollback) than just
skipping it.

I think the simplest solution would be to have a fixed-size array on
the shared memory to store information of skipping transactions on the
particular subscription. Given that this feature is meant to be a
repair tool in emergency cases, 32 or 64 entries seem enough.

IIUC, here you are talking about xids specified by the user to skip?

Yes. I think we need to store pairs of subid and xid.

If so, then how will you get that information after the restart, and
why you need 32 or 64 entries for it?

That information doesn't last after the restart. I think the
situation in which a DBA uses this tool would be one where they fix the
subscription on the spot. Once the subscription has skipped the
transaction, the entry for that information is cleared. So I’m thinking
that we don’t need to hold many entries and it does not necessarily
need to be durable. I think your idea below of storing that information in
ReplicationState seems better to me.

Anyway, it seems to me that we need to consider the user interface
first, especially how and what the user specifies the transaction to
skip. My current feeling is that specifying XID is intuitive and
flexible but the user needs to have 2 steps: checks XID and then
specifies it, and there is a risk that the user mistakenly specifies a
wrong XID. On the other hand, the idea of specifying to skip the first
transaction doesn’t require the user to check and specify XID but is
less flexible, and “the first” transaction might be ambiguous for the
user.

I see your point in allowing to specify First N transactions but OTOH,
I am slightly afraid that it might lead to skipping some useful
transactions which will make replica out-of-sync.

Agreed.

It might be better to skip only the first transaction.

BTW, is there any
data point for the user to check how many transactions it can skip?
Normally, we won't be able to proceed till we resolve/skip the
transaction that is generating an error. One possibility could be that
we provide some *superuser* functions like
pg_logical_replication_skip_xact()/pg_logical_replication_reset_skip_xact()
which takes subscription name/id and xid as input parameters. Then, I
think we can store this information in ReplicationState and probably
try to map to originid from subscription name/id to retrieve that
info. We can probably document that the effects of these functions
won't last after the restart.

ReplicationState seems a reasonable place to store that information.

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can also allow the
owner of the subscription to do that operation?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#32Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#31)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can allow also the
owner of subscription to do that operation?

Ah, the owner of the subscription must be a superuser.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#33Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#32)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can allow also the
owner of subscription to do that operation?

Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify the XID for the subscription with a
command like ALTER SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The
implementation seems straightforward except for setting the origin
state. After skipping the transaction we have to update the session
origin state so that we can start streaming from the transaction next
to the one that we just skipped in case of a server crash or a restart
of the apply worker. We
set the origin state to the commit WAL record. However, since we skip all
changes we don’t write any WAL even if we call CommitTransaction() at
the end of the skipped transaction. So the patch sets the origin state
to the transaction that updates the pg_subscription system catalog to
reset the skip XID. I think we need a discussion of this part.

With 0002 and 0003 patches, we report the error information in server
logs and the stats view, respectively. 0002 patch adds errcontext for
messages that happened during applying the changes:

ERROR: duplicate key value violates unique constraint "hoge_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09
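
As a small illustration of how a user or a monitoring script might pull the remote XID out of such a message, here is a sketch that parses the CONTEXT text shown above. It assumes the exact wording used by this PoC patch, which may well change in later versions; parse_apply_context and APPLY_CONTEXT_RE are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches the errcontext wording of the PoC patch; the line break between
# "in" and "transaction" in the log sample is covered by \s+.
APPLY_CONTEXT_RE = re.compile(
    r'during apply of "(?P<action>[^"]+)"\s+'
    r'for relation "(?P<relation>[^"]+)"\s+in\s+'
    r'transaction with xid (?P<xid>\d+)'
)


def parse_apply_context(log_text: str) -> Optional[Dict[str, object]]:
    """Return action, relation and remote XID, or None if there is no match."""
    m = APPLY_CONTEXT_RE.search(log_text)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "relation": m.group("relation"),
        "xid": int(m.group("xid")),
    }


sample = (
    'CONTEXT: during apply of "INSERT" for relation "public.hoge" in\n'
    'transaction with xid 736 committs 2021-06-27 12:12:30.053887+09'
)
info = parse_apply_context(sample)
```

For the sample above, info["xid"] is the value one would pass to the skip command.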

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follows:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid |         last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Please note that those patches are meant to evaluate the concept we've
discussed so far. Those don't have the doc update yet.

Regards,

[1]: /messages/by-id/DB35438F-9356-4841-89A0-412709EBD3AB@enterprisedb.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v1-0003-Add-pg_stat_logical_replication_error-statistics-.patch (application/octet-stream)
v1-0002-Add-errcontext-to-errors-of-the-applying-logical-.patch (application/octet-stream)
v1-0001-Add-ALTER-SUBSCRIPTION-SET-SKIP-TRANSACTION.patch (application/octet-stream)
#34Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#33)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jun 28, 2021 at 10:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can allow also the
owner of subscription to do that operation?

Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify XID to the subscription by like ALTER
SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
seems straightforward except for setting origin state. After skipping
the transaction we have to update the session origin state so that we
can start streaming the transaction next to the one that we just
skipped in case of the server crash or restarting the apply worker. We
set origin state to the commit WAL record. However, since we skip all
changes we don’t write any WAL even if we call CommitTransaction() at
the end of the skipped transaction. So the patch sets the origin state
to the transaction that updates the pg_subscription system catalog to
reset the skip XID. I think we need a discussion of this part.

IIUC, for streaming transactions you are allowing the stream file to be
created and then removing it at stream_commit/stream_abort time, is that
right? If so, in which cases are you imagining the files to be
created? Is it in the case of a relation message
(LOGICAL_REP_MSG_RELATION)? Assuming the previous two statements are
correct, this will skip the relation message as well, as part of the
removal of stream files, which might lead to a problem because the
publisher won't know that we have skipped the relation message and it
won't send it again. This can cause problems while processing the next
messages.

With 0002 and 0003 patches, we report the error information in server
logs and the stats view, respectively. 0002 patch adds errcontext for
messages that happened during applying the changes:

ERROR: duplicate key value violates unique constraint "hoge_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follow:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid |         last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Isn't it better to add an error message if possible?

Please note that those patches are meant to evaluate the concept we've
discussed so far. Those don't have the doc update yet.

I think your patch is along the lines of what we have discussed. It would
be good if you can update the docs and add a few tests.

--
With Regards,
Amit Kapila.

#35Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#34)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 30, 2021 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 28, 2021 at 10:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follow:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid |         last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Isn't it better to add an error message if possible?

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

In the 0003 patch, if I am reading it correctly, the patch is not
doing anything for the tablesync worker. It is not clear to me at this
stage what exactly we want to do about it. Do we want to just ignore
errors from the tablesync worker and let the system behave as it does
without this feature? If we want to do anything, then I think the way
to skip the initial table sync would be to behave as if the user had
given the 'copy_data' option as false.

--
With Regards,
Amit Kapila.

#36Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#34)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 30, 2021 at 8:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 28, 2021 at 10:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can allow also the
owner of subscription to do that operation?

Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify XID to the subscription by like ALTER
SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
seems straightforward except for setting origin state. After skipping
the transaction we have to update the session origin state so that we
can start streaming the transaction next to the one that we just
skipped in case of the server crash or restarting the apply worker. We
set origin state to the commit WAL record. However, since we skip all
changes we don’t write any WAL even if we call CommitTransaction() at
the end of the skipped transaction. So the patch sets the origin state
to the transaction that updates the pg_subscription system catalog to
reset the skip XID. I think we need a discussion of this part.

IIUC, for streaming transactions you are allowing stream file to be
created and then remove it at stream_commit/stream_abort time, is that
right?

Right.

If so, in which cases are you imagining the files to be
created, is it in the case of relation message
(LOGICAL_REP_MSG_RELATION)? Assuming the previous two statements are
correct, this will skip the relation message as well as part of the
removal of stream files which might lead to a problem because the
publisher won't know that we have skipped the relation message and it
won't send it again. This can cause problems while processing the next
messages.

Good point. In the current patch, we skip all streamed changes at
stream_commit/abort but it should apply changes while skipping only
data-modification changes as we do for non-stream changes.
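
To illustrate the intended behavior, here is a toy model of the apply loop, not the patch's code. It assumes a simplified message shape of (kind, (relid, payload)); apply_transaction, relmap and applied are hypothetical names. The point is that relation messages are still processed while skipping, so that later transactions can resolve the relation even though the publisher does not resend it.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, Tuple[int, str]]  # (kind, (relid, payload)), simplified

DATA_CHANGES = {"INSERT", "UPDATE", "DELETE"}


def apply_transaction(
    xid: int,
    messages: List[Message],
    skip_xid: Optional[int],
    relmap: Dict[int, str],
    applied: List[Tuple[str, str, str]],
) -> bool:
    """Apply one remote transaction; return True if its changes were skipped."""
    skipping = skip_xid is not None and xid == skip_xid
    for kind, (relid, payload) in messages:
        if kind == "RELATION":
            # Always remember relation metadata, even while skipping,
            # because the publisher will not send it again.
            relmap[relid] = payload
        elif kind in DATA_CHANGES:
            if skipping:
                continue  # drop only the data-modification changes
            applied.append((kind, relmap[relid], payload))
    return skipping
```

If the relation message were dropped together with the data changes, the relmap lookup for a later transaction on the same relation would fail, which is the problem raised above.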

With 0002 and 0003 patches, we report the error information in server
logs and the stats view, respectively. 0002 patch adds errcontext for
messages that happened during applying the changes:

ERROR: duplicate key value violates unique constraint "hoge_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follow:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
 subname  | relid | action | xid |         last_failure
----------+-------+--------+-----+-------------------------------
 test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Isn't it better to add an error message if possible?

Please note that those patches are meant to evaluate the concept we've
discussed so far. Those don't have the doc update yet.

I think your patch is along the lines of what we have discussed. It would
be good if you can update the docs and add a few tests.

Okay. I'll incorporate the above suggestions in the next version of the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#37Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#36)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 1, 2021 at 1:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jun 30, 2021 at 8:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

If so, in which cases are you imagining the files to be
created, is it in the case of relation message
(LOGICAL_REP_MSG_RELATION)? Assuming the previous two statements are
correct, this will skip the relation message as well as part of the
removal of stream files which might lead to a problem because the
publisher won't know that we have skipped the relation message and it
won't send it again. This can cause problems while processing the next
messages.

Good point. In the current patch, we skip all streamed changes at
stream_commit/abort but it should apply changes while skipping only
data-modification changes as we do for non-stream changes.

Right.

--
With Regards,
Amit Kapila.

#38Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#35)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 30, 2021 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 28, 2021 at 10:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follows:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
subname | relid | action | xid | last_failure
----------+-------+--------+-----+-------------------------------
test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Isn't it better to add an error message if possible?

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

Yes, it needs to be cleared. In the 0003 patch, pgstat_vacuum_stat()
sends the message to clear the stats. I think it's better to have
pgstat_vacuum_stat() do that job, similar to what is done for
replication slot statistics, rather than relying on a single message
sent at DROP SUBSCRIPTION. I've considered doing both: sending the
message at DROP SUBSCRIPTION and periodic checking by
pgstat_vacuum_stat(). But dropping a subscription that has no
replication slot set can be rolled back, so we would need to send the
message only at commit time. Given that we don't necessarily need the
stats to be updated immediately, I think it's reasonable to rely on
pgstat_vacuum_stat() alone.
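
The periodic-cleanup idea can be sketched as a set difference between the subscriptions that have stats entries and the subscriptions that still exist. This is a toy Python model, not the pgstat code; `find_dead_subscription_stats` and the sample OIDs are hypothetical.

```python
def find_dead_subscription_stats(stats_entries, live_subids):
    """Return subscription OIDs that have stats entries but no longer exist."""
    return sorted(set(stats_entries) - set(live_subids))

# Hypothetical state: stats exist for two subscriptions, one of which was dropped.
stats_entries = {16394: {"xid": 736}, 16401: {"xid": 845}}
live_subids = {16394}  # what a scan of the committed catalog would report

dead = find_dead_subscription_stats(stats_entries, live_subids)
for subid in dead:
    # In PostgreSQL this step would be a message to the stats collector.
    del stats_entries[subid]

print(dead, stats_entries)
```

Because the check looks only at subscriptions that currently exist, a DROP SUBSCRIPTION that is later rolled back never causes its stats to be removed in this model.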

In the 0003 patch, if I am reading it correctly then the patch is not
doing anything for tablesync worker. It is not clear to me at this
stage what exactly we want to do about it? Do we want to just ignore
errors from tablesync worker and let the system behave as it is
without this feature? If we want to do anything then I think the way
to skip the initial table sync would be to behave like the user has
given 'copy_data' option as false.

It might be better to have sync workers report errors as well, even if
the SKIP TRANSACTION feature doesn't support anything for initial table
synchronization. From the user's perspective, the initial table
synchronization is also part of logical replication operations. If
we report only error information for applying logical changes, it could
confuse users.

But I’m not sure about the way to skip the initial table
synchronization. Once we set `copy_data` to false, all table
synchronizations are disabled. Some of them might have been able to
synchronize successfully. It might be useful if the user can disable
the table initialization for the particular tables.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#39Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#38)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 1, 2021 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

Yes, it needs to be cleared. In the 0003 patch, pgstat_vacuum_stat()
sends the message to clear the stats. I think it's better to have
pgstat_vacuum_stat() do that job similar to dropping replication slot
statistics rather than relying on the single message send at DROP
SUBSCRIPTION. I've considered doing both: sending the message at DROP
SUBSCRIPTION and periodical checking by pgstat_vacuum_stat(), but
dropping subscription not setting a replication slot is able to
rollback. So we need to send it only at commit time. Given that we
don’t necessarily need the stats to be updated immediately, I think
it’s reasonable to go with only a way of pgstat_vacuum_stat().

Okay, that makes sense. Can we consider sending the multiple ids in
one message as we do for relations or functions in
pgstat_vacuum_stat()? That will reduce some message traffic. BTW, do
we have some way to avoid wrapping around the OID before we clean up
via pgstat_vacuum_stat()?

In the 0003 patch, if I am reading it correctly then the patch is not
doing anything for tablesync worker. It is not clear to me at this
stage what exactly we want to do about it? Do we want to just ignore
errors from tablesync worker and let the system behave as it is
without this feature? If we want to do anything then I think the way
to skip the initial table sync would be to behave like the user has
given 'copy_data' option as false.

It might be better to have also sync workers report errors, even if
SKIP TRANSACTION feature doesn’t support anything for initial table
synchronization. From the user perspective, The initial table
synchronization is also the part of logical replication operations. If
we report only error information of applying logical changes, it could
confuse users.

But I’m not sure about the way to skip the initial table
synchronization. Once we set `copy_data` to false, all table
synchronizations are disabled. Some of them might have been able to
synchronize successfully. It might be useful if the user can disable
the table initialization for the particular tables.

True but I guess the user can wait for all the tablesyncs to either
finish or get an error corresponding to the table sync. After that, it
can use 'copy_data' as false. This is not a very good method but I
don't see any other option. I guess whatever is the case logging
errors from tablesyncs is anyway not a bad idea.

Instead of using the syntax "ALTER SUBSCRIPTION name SET SKIP
TRANSACTION Iconst", isn't it better to use it as a subscription
option like Mark has done for his patch (disable_on_error)?

I am slightly nervous about this way of allowing the user to skip the
errors because if it is not used carefully then it can easily lead to
inconsistent data on the subscriber. I agree that as only superusers
will be allowed to use this option and we can document clearly the
side-effects, the risk could be reduced but is that sufficient? It is
not that we don't have any other tool which allows users to make their
data inconsistent (one recent example is functions
(heap_force_kill/heap_force_freeze) in pg_surgery module) if not used
carefully but it might be better to not expose such tools.

OTOH, if we use the error infrastructure of this patch and allow users
to just disable the subscription on error as was proposed by Mark then
that can't lead to any inconsistency.

What do you think?

--
With Regards,
Amit Kapila.

#40Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#39)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 5, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 1, 2021 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Instead of using the syntax "ALTER SUBSCRIPTION name SET SKIP
TRANSACTION Iconst", isn't it better to use it as a subscription
option like Mark has done for his patch (disable_on_error)?

I am slightly nervous about this way of allowing the user to skip the
errors because if it is not used carefully then it can easily lead to
inconsistent data on the subscriber. I agree that as only superusers
will be allowed to use this option and we can document clearly the
side-effects, the risk could be reduced but is that sufficient?

I see that users can create a similar effect by using
pg_replication_origin_advance() and it is mentioned in the docs that
careless use of this function can lead to inconsistently replicated
data. So, this new way doesn't seem to be any more dangerous than what
we already have.

--
With Regards,
Amit Kapila.

#41Alexey Lesovsky
Alexey Lesovsky
lesovsky@gmail.com
In reply to: Masahiko Sawada (#33)
Re: Skipping logical replication transactions on subscriber side

Hi,
I have a few notes about pg_stat_logical_replication_error from the point
of view of a DBA (who will use this view in the future).
1. As I understand it, this view might contain many errors related to
different subscriptions. It would be better to name it
"pg_stat_logical_replication_errors", using the plural form (as is done
for the stat views for tables, indexes, functions). Also, I'd like to suggest
thinking twice about the view name (and the function used in the view DDL):
"pg_stat_logical_replication_error" contains the very general words "logical
replication", but the view contains errors related to subscriptions
only. In the future there could be other kinds of errors related to logical
replication, but not related to subscriptions - what will you do?
2. Add a field with the database name or id - it helps to quickly understand
which database the subscription belongs to.
3. Add a counter field with the total number of errors - it helps to calculate
error rates and aggregations (sum), and avoids losing information about
errors between view checks.
4. Add the text of the last error (if it will not be too expensive).
5. Rename the "action" field to "command"; as far as I know, this is the
correct term.

Finally, the view might look like this:

postgres(1:25250)=# select * from pg_stat_logical_replication_errors;
 subname | datid | relid | command | xid | total |         last_failure          |     last_failure_text
---------+-------+-------+---------+-----+-------+-------------------------------+---------------------------
 sub_1   | 12345 | 16384 | INSERT  | 736 |   145 | 2021-06-27 12:12:45.142675+09 | something goes wrong...
 sub_2   | 12346 | 16458 | UPDATE  | 845 |    12 | 2021-06-27 12:16:01.458752+09 | hmm, something goes wrong

Regards, Alexey

On Mon, Jul 5, 2021 at 2:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 6:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Now, if this function is used by super
users then we can probably trust that they provide the XIDs that we
can trust to be skipped but OTOH making a restriction to allow these
functions to be used by superusers might restrict the usage of this
repair tool.

If we specify the subscription id or name, maybe we can allow also the
owner of subscription to do that operation?

Ah, the owner of the subscription must be superuser.

I've attached PoC patches.

0001 patch introduces the ability to skip transactions on the
subscriber side. We can specify the XID to skip for a subscription with,
for example, ALTER
SUBSCRIPTION test_sub SET SKIP TRANSACTION 100. The implementation
seems straightforward except for setting origin state. After skipping
the transaction we have to update the session origin state so that we
can start streaming the transaction next to the one that we just
skipped in case of the server crash or restarting the apply worker. We
set origin state to the commit WAL record. However, since we skip all
changes we don’t write any WAL even if we call CommitTransaction() at
the end of the skipped transaction. So the patch sets the origin state
to the transaction that updates the pg_subscription system catalog to
reset the skip XID. I think we need a discussion of this part.

With 0002 and 0003 patches, we report the error information in server
logs and the stats view, respectively. 0002 patch adds errcontext for
messages that happened during applying the changes:

ERROR: duplicate key value violates unique constraint "hoge_pkey"
DETAIL: Key (c)=(1) already exists.
CONTEXT: during apply of "INSERT" for relation "public.hoge" in
transaction with xid 736 committs 2021-06-27 12:12:30.053887+09

0003 patch adds pg_stat_logical_replication_error statistics view
discussed on another thread[1]. The apply worker sends the error
information to the stats collector if an error happens during applying
changes. We can check those errors as follows:

postgres(1:25250)=# select * from pg_stat_logical_replication_error;
subname | relid | action | xid | last_failure
----------+-------+--------+-----+-------------------------------
test_sub | 16384 | INSERT | 736 | 2021-06-27 12:12:45.142675+09
(1 row)

I added only columns required for the skipping transaction feature to
the view for now.

Please note that those patches are meant to evaluate the concept we've
discussed so far. Those don't have the doc update yet.

Regards,

[1]
/messages/by-id/DB35438F-9356-4841-89A0-412709EBD3AB@enterprisedb.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

--
Best regards, Alexey V. Lesovsky

#42Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alexey Lesovsky (#41)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 5, 2021 at 7:33 PM Alexey Lesovsky <lesovsky@gmail.com> wrote:

Hi,
Have a few notes about pg_stat_logical_replication_error from the DBA point of view (which will use this view in the future).

Thank you for the comments!

1. As I understand it, this view might contain many errors related to different subscriptions. It is better to name "pg_stat_logical_replication_errors" using the plural form (like this done for stat views for tables, indexes, functions).

Agreed.

Also, I'd like to suggest thinking twice about the view name (and function used in view DDL) - "pg_stat_logical_replication_error" contains very common "logical replication" words, but the view contains errors related to subscriptions only. In the future there could be other kinds of errors related to logical replication, but not related to subscriptions - what will you do?

Is pg_stat_subscription_errors or
pg_stat_logical_replication_apply_errors better?

2. Add a field with database name or id - it helps to quickly understand to which database the subscription belongs.

Agreed.

3. Add a counter field with total number of errors - it helps to calculate errors rates and aggregations (sum), and don't lose information about errors between view checks.

Do you mean to increment the error count if the error (command, xid,
and relid) is the same as the previous one? or to have the total
number of errors per subscription? And what can we infer from the
error rates and aggregations?

4. Add text of last error (if it will not be too expensive).

Agreed.

5. Rename the "action" field to "command", as I know this is right from terminology point of view.

Okay.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#43Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#39)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 5, 2021 at 6:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 1, 2021 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

Yes, it needs to be cleared. In the 0003 patch, pgstat_vacuum_stat()
sends the message to clear the stats. I think it's better to have
pgstat_vacuum_stat() do that job similar to dropping replication slot
statistics rather than relying on the single message send at DROP
SUBSCRIPTION. I've considered doing both: sending the message at DROP
SUBSCRIPTION and periodical checking by pgstat_vacuum_stat(), but
dropping subscription not setting a replication slot is able to
rollback. So we need to send it only at commit time. Given that we
don’t necessarily need the stats to be updated immediately, I think
it’s reasonable to go with only a way of pgstat_vacuum_stat().

Okay, that makes sense. Can we consider sending the multiple ids in
one message as we do for relations or functions in
pgstat_vacuum_stat()? That will reduce some message traffic.

Yes. Since subscriptions are objects that are not frequently created
and dropped, I prioritized not adding a new message type. But if we
do that for subscriptions, is it better to do that for replication
slots as well? It seems to me that the lifetimes of subscriptions and
replication slots are similar.

BTW, do
we have some way to avoid wrapping around the OID before we clean up
via pgstat_vacuum_stat()?

As far as I know there is not.

In the 0003 patch, if I am reading it correctly then the patch is not
doing anything for tablesync worker. It is not clear to me at this
stage what exactly we want to do about it? Do we want to just ignore
errors from tablesync worker and let the system behave as it is
without this feature? If we want to do anything then I think the way
to skip the initial table sync would be to behave like the user has
given 'copy_data' option as false.

It might be better to have also sync workers report errors, even if
SKIP TRANSACTION feature doesn’t support anything for initial table
synchronization. From the user perspective, The initial table
synchronization is also the part of logical replication operations. If
we report only error information of applying logical changes, it could
confuse users.

But I’m not sure about the way to skip the initial table
synchronization. Once we set `copy_data` to false, all table
synchronizations are disabled. Some of them might have been able to
synchronize successfully. It might be useful if the user can disable
the table initialization for the particular tables.

True but I guess the user can wait for all the tablesyncs to either
finish or get an error corresponding to the table sync. After that, it
can use 'copy_data' as false. This is not a very good method but I
don't see any other option. I guess whatever is the case logging
errors from tablesyncs is anyway not a bad idea.

Instead of using the syntax "ALTER SUBSCRIPTION name SET SKIP
TRANSACTION Iconst", isn't it better to use it as a subscription
option like Mark has done for his patch (disable_on_error)?

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify only a subset of the parameters that can be specified by CREATE
SUBSCRIPTION. That makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION, whereas SKIP TRANSACTION
cannot. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

I am slightly nervous about this way of allowing the user to skip the
errors because if it is not used carefully then it can easily lead to
inconsistent data on the subscriber. I agree that as only superusers
will be allowed to use this option and we can document clearly the
side-effects, the risk could be reduced but is that sufficient? It is
not that we don't have any other tool which allows users to make their
data inconsistent (one recent example is functions
(heap_force_kill/heap_force_freeze) in pg_surgery module) if not used
carefully but it might be better to not expose such tools.

OTOH, if we use the error infrastructure of this patch and allow users
to just disable the subscription on error as was proposed by Mark then
that can't lead to any inconsistency.

What do you think?

As you mentioned in another mail, what we can do with this feature is
the same as with pg_replication_origin_advance(). Just as there is a
risk that the user specifies a wrong LSN to
pg_replication_origin_advance(), there is a similar risk with this
feature.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#44Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#42)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 6, 2021 at 11:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 5, 2021 at 7:33 PM Alexey Lesovsky <lesovsky@gmail.com> wrote:

Hi,
Have a few notes about pg_stat_logical_replication_error from the DBA point of view (which will use this view in the future).

Thank you for the comments!

1. As I understand it, this view might contain many errors related to different subscriptions. It is better to name "pg_stat_logical_replication_errors" using the plural form (like this done for stat views for tables, indexes, functions).

Agreed.

Also, I'd like to suggest thinking twice about the view name (and function used in view DDL) - "pg_stat_logical_replication_error" contains very common "logical replication" words, but the view contains errors related to subscriptions only. In the future there could be other kinds of errors related to logical replication, but not related to subscriptions - what will you do?

Is pg_stat_subscription_errors or
pg_stat_logical_replication_apply_errors better?

A few more to consider: pg_stat_apply_failures,
pg_stat_subscription_failures, pg_stat_apply_conflicts,
pg_stat_subscription_conflicts.

2. Add a field with database name or id - it helps to quickly understand to which database the subscription belongs.

Agreed.

3. Add a counter field with total number of errors - it helps to calculate errors rates and aggregations (sum), and don't lose information about errors between view checks.

Do you mean to increment the error count if the error (command, xid,
and relid) is the same as the previous one? or to have the total
number of errors per subscription?

I would prefer the total number of errors per subscription.

And what can we infer from the
error rates and aggregations?

Say we also add a column like failure_type/conflict_type; then one
might be interested in knowing how many conflicts are due to
primary key conflicts vs. update/delete conflicts.

You might want to consider keeping this view patch before the skip_xid
patch in your patch series, as this will be the base for the skip_xid
patch.

--
With Regards,
Amit Kapila.

#45Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#43)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 6, 2021 at 12:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 5, 2021 at 6:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 1, 2021 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

Yes, it needs to be cleared. In the 0003 patch, pgstat_vacuum_stat()
sends the message to clear the stats. I think it's better to have
pgstat_vacuum_stat() do that job similar to dropping replication slot
statistics rather than relying on the single message send at DROP
SUBSCRIPTION. I've considered doing both: sending the message at DROP
SUBSCRIPTION and periodical checking by pgstat_vacuum_stat(), but
dropping subscription not setting a replication slot is able to
rollback. So we need to send it only at commit time. Given that we
don’t necessarily need the stats to be updated immediately, I think
it’s reasonable to go with only a way of pgstat_vacuum_stat().

Okay, that makes sense. Can we consider sending the multiple ids in
one message as we do for relations or functions in
pgstat_vacuum_stat()? That will reduce some message traffic.

Yes. Since subscriptions are objects that are not frequently created
and dropped I prioritized not to increase the message type. But if we
do that for subscriptions, is it better to do that for replication
slots as well? It seems to me that the lifetime of subscriptions and
replication slots are similar.

Yeah, I think it makes sense to do it for both; we can work on the slots
patch separately. I don't see a reason why we shouldn't send a single
message for multiple clear/drop entries.
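
As a rough illustration of sending several ids per message, here is a minimal Python sketch that splits a list of ids into fixed-size batches. The `batch_ids` helper and the batch size are hypothetical; the real limit would depend on the stats message size.

```python
def batch_ids(ids, max_per_message):
    """Split ids into consecutive batches of at most max_per_message entries."""
    if max_per_message < 1:
        raise ValueError("max_per_message must be at least 1")
    ids = list(ids)
    return [ids[i:i + max_per_message] for i in range(0, len(ids), max_per_message)]

# Five dropped subscriptions, at most two ids per message: three messages instead of five.
messages = batch_ids([16394, 16401, 16410, 16422, 16430], max_per_message=2)
print(messages)
```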

True but I guess the user can wait for all the tablesyncs to either
finish or get an error corresponding to the table sync. After that, it
can use 'copy_data' as false. This is not a very good method but I
don't see any other option. I guess whatever is the case logging
errors from tablesyncs is anyway not a bad idea.

Instead of using the syntax "ALTER SUBSCRIPTION name SET SKIP
TRANSACTION Iconst", isn't it better to use it as a subscription
option like Mark has done for his patch (disable_on_error)?

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify a subset of parameters that can be specified by CREATE
SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION stuff
cannot be done. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

Both for additional syntax and consistency with disable_on_error.
Isn't it just a current implementation that Alter only allows to
change parameters supported by Create? Is there a reason why we can't
allow Alter to set/change some parameters not supported by Create?

--
With Regards,
Amit Kapila.

#46Alexey Lesovsky
Alexey Lesovsky
lesovsky@gmail.com
In reply to: Masahiko Sawada (#42)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, I'd like to suggest thinking twice about the view name (and
function used in view DDL) - "pg_stat_logical_replication_error" contains
very common "logical replication" words, but the view contains errors
related to subscriptions only. In the future there could be other kinds of
errors related to logical replication, but not related to subscriptions -
what will you do?

Is pg_stat_subscription_errors or
pg_stat_logical_replication_apply_errors better?

It seems to me 'pg_stat_subscription_conflicts' proposed by Amit Kapila is
the most suitable, because it directly refers to conflicts occurring on
the subscription side. The name 'pg_stat_subscription_errors' is also good,
especially for further extension if other similar kinds of errors are
tracked later.

3. Add a counter field with total number of errors - it helps to
calculate errors rates and aggregations (sum), and don't lose information
about errors between view checks.

Do you mean to increment the error count if the error (command, xid,
and relid) is the same as the previous one? or to have the total
number of errors per subscription? And what can we infer from the
error rates and aggregations?

To be honest, I was in a hurry when I wrote the first email, and had read
only about the stats view. Later, I read the starting email about the patch
and rethought this note.

As I understand it, when a conflict occurs, replication stops (until the
conflict is resolved) and an error appears in the stats view. No new
errors can then occur in the blocked subscription. Hence, a situation in
which many errors (like spikes) have occurred without the user seeing them
is impossible. If my assumption is correct, there is no need for counters.
They are necessary only when errors might occur very frequently (like
pg_stat_database.deadlocks). But if this is possible, I would prefer the
total number of errors per subscription, as also proposed by Amit.

By "error rates and aggregations" I also mean the case where a high
number of errors occurred in a short period of time. If a user can
read the "total errors" counter and keep this metric in their monitoring
system, they will be able to calculate rates over time using functions in the
monitoring system. This is extremely useful.
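
A monitoring system would typically derive such a rate from two samples of a cumulative counter. The following Python sketch shows one way to do that; the `error_rate` helper, the sample values, and the counter-reset handling are assumptions for illustration, not part of any proposed view.

```python
def error_rate(prev, curr):
    """Errors per second between two (timestamp_seconds, total_errors) samples."""
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # If the counter went backwards, assume the stats were reset in between
    # and count only the errors seen since the reset.
    delta = c1 - c0 if c1 >= c0 else c1
    return delta / (t1 - t0)

# Two samples of a hypothetical "total" column taken 60 seconds apart.
rate = error_rate((0, 145), (60, 157))
print(f"{rate:.2f} errors/s")
```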

I would also like to clarify: when the conflict is resolved, is the error
record cleared or kept in the view? If it is cleared, the error counter is
required (because we don't want to lose all history of errors). If it is
kept, a flag indicating that the error has been resolved is needed (or xid
could be set to NULL). I mean that when the user is watching the view, they
should be able to identify whether the error has already been resolved.

--
Regards, Alexey

--
Best regards, Alexey V. Lesovsky

#47Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#45)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jul 6, 2021 at 12:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 5, 2021 at 6:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 1, 2021 at 6:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 1, 2021 at 12:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we want to clear stats at drop subscription as well? We do drop
database stats in dropdb via pgstat_drop_database, so I think we need
to clear subscription stats at the time of drop subscription.

Yes, it needs to be cleared. In the 0003 patch, pgstat_vacuum_stat()
sends the message to clear the stats. I think it's better to have
pgstat_vacuum_stat() do that job similar to dropping replication slot
statistics rather than relying on the single message sent at DROP
SUBSCRIPTION. I've considered doing both: sending the message at DROP
SUBSCRIPTION and periodic checking by pgstat_vacuum_stat(), but
dropping a subscription that has no associated replication slot can be
rolled back. So we would need to send it only at commit time. Given that we
don’t necessarily need the stats to be updated immediately, I think
it’s reasonable to rely only on pgstat_vacuum_stat().

Okay, that makes sense. Can we consider sending the multiple ids in
one message as we do for relations or functions in
pgstat_vacuum_stat()? That will reduce some message traffic.

Yes. Since subscriptions are objects that are not frequently created
and dropped, I prioritized not adding a new message type. But if we
do that for subscriptions, is it better to do that for replication
slots as well? It seems to me that the lifetimes of subscriptions and
replication slots are similar.

Yeah, I think it makes sense to do it for both; we can work on the slots
patch separately. I don't see a reason why we shouldn't send a single
message for multiple clear/drop entries.

+1
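A minimal sketch of the batching idea discussed above, written as a Python model rather than actual PostgreSQL code. The function name and the per-message cap are hypothetical; the point is only that the OIDs of dropped subscriptions are grouped so that one message carries several of them instead of one message per OID.

```python
from typing import Iterable, Iterator, List

# Hypothetical cap on how many OIDs fit into one purge message.
MAX_OIDS_PER_MSG = 4


def batch_purge_messages(
    dead_oids: Iterable[int], max_per_msg: int = MAX_OIDS_PER_MSG
) -> Iterator[List[int]]:
    """Group OIDs into lists, each small enough for a single message."""
    batch: List[int] = []
    for oid in dead_oids:
        batch.append(oid)
        if len(batch) == max_per_msg:
            # The current message is full: emit it and start a new one.
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled message.
        yield batch
```

With five dropped subscriptions and a cap of two, this yields three messages instead of five.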

True but I guess the user can wait for all the tablesyncs to either
finish or get an error corresponding to the table sync. After that, it
can use 'copy_data' as false. This is not a very good method but I
don't see any other option. I guess whatever is the case logging
errors from tablesyncs is anyway not a bad idea.

Instead of using the syntax "ALTER SUBSCRIPTION name SET SKIP
TRANSACTION Iconst", isn't it better to use it as a subscription
option like Mark has done for his patch (disable_on_error)?

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify a subset of parameters that can be specified by CREATE
SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION stuff
cannot be done. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

Both for additional syntax and consistency with disable_on_error.
Isn't it just a current implementation that Alter only allows to
change parameters supported by Create? Is there a reason why we can't
allow Alter to set/change some parameters not supported by Create?

I think there is no reason for that but looking at ALTER TABLE I
thought there is such a policy. I thought the skipping transaction
feature is somewhat different from disable_on_error feature. The
former seems a feature to deal with a problem on the spot whereas the
latter seems a setting of a subscription. Anyway, if we use the
subscription option, we can reset the XID by setting 0? Or do we need
ALTER SUBSCRIPTION RESET?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#48Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#47)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jul 7, 2021 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify a subset of parameters that can be specified by CREATE
SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION stuff
cannot be done. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

Both for additional syntax and consistency with disable_on_error.
Isn't it just a current implementation that Alter only allows to
change parameters supported by Create? Is there a reason why we can't
allow Alter to set/change some parameters not supported by Create?

I think there is no reason for that but looking at ALTER TABLE I
thought there is such a policy.

If we are looking for precedent then I think we allow to set
configuration parameters via Alter Database but not via Create
Database. Does that address your concern?

I thought the skipping transaction
feature is somewhat different from disable_on_error feature. The
former seems a feature to deal with a problem on the spot whereas the
latter seems a setting of a subscription. Anyway, if we use the
subscription option, we can reset the XID by setting 0? Or do we need
ALTER SUBSCRIPTION RESET?

The other commands like Alter Table, Alter Database, etc., which
provide a way to Set some parameter/option, have a Reset variant. I
think it would be good to have it for Alter Subscription as well but
we might want to allow other parameters to be reset by that as well.

--
With Regards,
Amit Kapila.

#49Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#48)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 8, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 7, 2021 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify a subset of parameters that can be specified by CREATE
SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION stuff
cannot be done. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

Both for additional syntax and consistency with disable_on_error.
Isn't it just a current implementation that Alter only allows to
change parameters supported by Create? Is there a reason why we can't
allow Alter to set/change some parameters not supported by Create?

I think there is no reason for that but looking at ALTER TABLE I
thought there is such a policy.

If we are looking for precedent then I think we allow to set
configuration parameters via Alter Database but not via Create
Database. Does that address your concern?

Thank you for the info! But it seems like CREATE DATABASE doesn't
support SET in the first place. Also interestingly, ALTER SUBSCRIPTION
supports both ENABLE/DISABLE and SET (enabled = on/off). I’m not sure
from the point of view of consistency with other CREATE, ALTER
commands, and disable_on_error but it might be better to avoid adding
additional syntax.

I thought the skipping transaction
feature is somewhat different from disable_on_error feature. The
former seems a feature to deal with a problem on the spot whereas the
latter seems a setting of a subscription. Anyway, if we use the
subscription option, we can reset the XID by setting 0? Or do we need
ALTER SUBSCRIPTION RESET?

The other commands like Alter Table, Alter Database, etc., which
provide a way to Set some parameter/option, have a Reset variant. I
think it would be good to have it for Alter Subscription as well but
we might want to allow other parameters to be reset by that as well.

Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#50Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Alexey Lesovsky (#46)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 6, 2021 at 7:13 PM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, I'd like to suggest thinking twice about the view name (and function used in view DDL) - "pg_stat_logical_replication_error" contains very common "logical replication" words, but the view contains errors related to subscriptions only. In the future there could be other kinds of errors related to logical replication, but not related to subscriptions - what will you do?

Is pg_stat_subscription_errors or
pg_stat_logical_replication_apply_errors better?

It seems to me 'pg_stat_subscription_conflicts' proposed by Amit Kapila is the most suitable, because it refers directly to conflicts occurring on the subscription side. The name 'pg_stat_subscription_errors' is also good, especially for further extension if similar kinds of errors are tracked later.

I personally prefer pg_stat_subscription_errors since
pg_stat_subscription_conflicts could be used for conflict resolution
features in the future. This stats view I'm proposing is meant to
focus on errors that happened during applying logical changes. So
using the term 'errors' seems to make sense to me.

3. Add a counter field with total number of errors - it helps to calculate error rates and aggregations (sum), and avoids losing information about errors between view checks.

Do you mean to increment the error count if the error (command, xid,
and relid) is the same as the previous one? or to have the total
number of errors per subscription? And what can we infer from the
error rates and aggregations?

To be honest, I was in a hurry when I wrote the first email, and read only about the stats view. Later, I read the starting email about the patch and rethought this note.

As I understand, when the conflict occurs, replication stops (until the conflict is resolved) and an error appears in the stats view. Now, no new errors can occur in the blocked subscription. Hence, a situation where many errors (like spikes) have occurred and a user didn't see them is impossible. If I am correct in my assumption, there is no need for counters. They are necessary only when errors might occur too frequently (like pg_stat_database.deadlocks). But if this is possible, I would prefer the total number of errors per subscription, as also proposed by Amit.

Yeah, the total number of errors seems better.

By "error rates and aggregations" I also mean the case when a high number of errors occurred in a short period of time. If a user can read the "total errors" counter and keep this metric in his monitoring system, he will be able to calculate rates over time using functions in the monitoring system. This is extremely useful.

Thanks for your explanation. Agreed. But the rate depends on
wal_retrieve_retry_interval so is not likely to be high in practice.
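As a rough illustration of the rate calculation mentioned above, the following Python sketch derives an error rate from two samples of a cumulative "total errors" counter, and an upper bound on that rate under the assumption that a failing worker retries once per wal_retrieve_retry_interval and fails every time. The function names are hypothetical and not part of the patch.

```python
def error_rate(prev_count: int, prev_time: float,
               cur_count: int, cur_time: float) -> float:
    """Errors per second between two samples of a cumulative counter."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (cur_count - prev_count) / elapsed


def max_retry_rate(retry_interval_s: float) -> float:
    """Upper bound on errors per second for a single worker, assuming it
    retries once per interval and fails on every attempt."""
    if retry_interval_s <= 0:
        raise ValueError("retry interval must be positive")
    return 1.0 / retry_interval_s
```

For example, with an interval of 5 seconds a single worker cannot report more than 0.2 errors per second, which is why the rate stays low in practice.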

I also would like to clarify: when a conflict is resolved, is the error record cleared or kept in the view? If it is cleared, the error counter is required (because we don't want to lose all history of errors). If it is kept, a flag indicating that the error is resolved is needed (or set xid to NULL). I mean when the user is watching the view, he should be able to identify if the error has already been resolved or not.

With the current patch, once the conflict is resolved by skipping the
transaction in question, its entry on the stats view is cleared. As
you suggested, if we have the total error counts in that view, it
would be good to keep the count and clear other fields such as xid,
last_failure, and command etc.
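The behaviour described above (keep the cumulative count, clear the per-error details once the transaction is skipped) can be modelled as follows. This is only a Python sketch: the class and method names are hypothetical, and the fields are limited to those named in the discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubscriptionErrorEntry:
    """Model of one row of the proposed stats view (hypothetical layout)."""
    subid: int
    error_count: int = 0
    xid: Optional[int] = None
    command: Optional[str] = None
    last_failure: Optional[str] = None

    def report_error(self, xid: int, command: str, when: str) -> None:
        # Every reported failure bumps the cumulative counter.
        self.error_count += 1
        self.xid = xid
        self.command = command
        self.last_failure = when

    def clear_details(self) -> None:
        # Called once the problem transaction has been skipped:
        # the details go away, the history (count) stays.
        self.xid = None
        self.command = None
        self.last_failure = None
```

A monitoring tool could then treat a NULL xid as "no unresolved error" while still reading error_count as a monotonically increasing counter.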

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#51Alexey Lesovsky
Alexey Lesovsky
lesovsky@gmail.com
In reply to: Masahiko Sawada (#50)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 9, 2021 at 5:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 6, 2021 at 7:13 PM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Tue, Jul 6, 2021 at 10:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Also, I'd like to suggest thinking twice about the view name (and
function used in view DDL) - "pg_stat_logical_replication_error" contains
very common "logical replication" words, but the view contains errors
related to subscriptions only. In the future there could be other kinds of
errors related to logical replication, but not related to subscriptions -
what will you do?

Is pg_stat_subscription_errors or
pg_stat_logical_replication_apply_errors better?

It seems to me 'pg_stat_subscription_conflicts' proposed by Amit Kapila
is the most suitable, because it refers directly to conflicts occurring on
the subscription side. The name 'pg_stat_subscription_errors' is also good,
especially for further extension if similar kinds of errors are tracked
later.

I personally prefer pg_stat_subscription_errors since
pg_stat_subscription_conflicts could be used for conflict resolution
features in the future. This stats view I'm proposing is meant to
focus on errors that happened during applying logical changes. So
using the term 'errors' seems to make sense to me.

Agreed

3. Add a counter field with total number of errors - it helps to
calculate error rates and aggregations (sum), and avoids losing information
about errors between view checks.

Do you mean to increment the error count if the error (command, xid,
and relid) is the same as the previous one? or to have the total
number of errors per subscription? And what can we infer from the
error rates and aggregations?

To be honest, I was in a hurry when I wrote the first email, and read only
about the stats view. Later, I read the starting email about the patch and
rethought this note.

As I understand, when the conflict occurs, replication stops (until the
conflict is resolved) and an error appears in the stats view. Now, no new
errors can occur in the blocked subscription. Hence, a situation where
many errors (like spikes) have occurred and a user didn't see them is
impossible. If I am correct in my assumption, there is no need for counters.
They are necessary only when errors might occur too frequently (like
pg_stat_database.deadlocks). But if this is possible, I would prefer the
total number of errors per subscription, as also proposed by Amit.

Yeah, the total number of errors seems better.

Agreed

By "error rates and aggregations" I also mean the case when
a high number of errors occurred in a short period of time. If a user can
read the "total errors" counter and keep this metric in his monitoring
system, he will be able to calculate rates over time using functions in the
monitoring system. This is extremely useful.

Thanks for your explanation. Agreed. But the rate depends on
wal_retrieve_retry_interval so is not likely to be high in practice.

Agreed

I also would like to clarify: when a conflict is resolved, is the error
record cleared or kept in the view? If it is cleared, the error counter
is required (because we don't want to lose all history of errors). If it is
kept, a flag indicating that the error is resolved is needed (or set xid
to NULL). I mean when the user is watching the view, he should be able to
identify if the error has already been resolved or not.

With the current patch, once the conflict is resolved by skipping the
transaction in question, its entry on the stats view is cleared. As
you suggested, if we have the total error counts in that view, it
would be good to keep the count and clear other fields such as xid,
last_failure, and command etc.

Ok, looks nice. But I am curious how this will work in the case when there
are two (or more) errors in the same subscription, but different relations?
After resolution, are all these records kept, or will they be merged into a
single record (because the subscription was the same for all errors)?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

--
Regards, Alexey Lesovsky

#52Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#49)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 9, 2021 at 5:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 8, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 7, 2021 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jul 6, 2021 at 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

According to the doc, ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION. Therefore, we can
specify a subset of parameters that can be specified by CREATE
SUBSCRIPTION. It makes sense to me for 'disable_on_error' since it can
be specified by CREATE SUBSCRIPTION. Whereas SKIP TRANSACTION stuff
cannot be done. Are you concerned about adding a syntax to ALTER
SUBSCRIPTION?

Both for additional syntax and consistency with disable_on_error.
Isn't it just a current implementation that Alter only allows to
change parameters supported by Create? Is there a reason why we can't
allow Alter to set/change some parameters not supported by Create?

I think there is no reason for that but looking at ALTER TABLE I
thought there is such a policy.

If we are looking for precedent then I think we allow to set
configuration parameters via Alter Database but not via Create
Database. Does that address your concern?

Thank you for the info! But it seems like CREATE DATABASE doesn't
support SET in the first place. Also interestingly, ALTER SUBSCRIPTION
supports both ENABLE/DISABLE and SET (enabled = on/off).

I think that is redundant but not sure if there is any reason behind doing so.

I’m not sure
from the point of view of consistency with other CREATE, ALTER
commands, and disable_on_error but it might be better to avoid adding
additional syntax.

If we can avoid introducing new syntax that in itself is a good reason
to introduce it as an option.

--
With Regards,
Amit Kapila.

#53Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexey Lesovsky (#51)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 9, 2021 at 9:02 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Fri, Jul 9, 2021 at 5:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I also would like to clarify: when a conflict is resolved, is the error record cleared or kept in the view? If it is cleared, the error counter is required (because we don't want to lose all history of errors). If it is kept, a flag indicating that the error is resolved is needed (or set xid to NULL). I mean when the user is watching the view, he should be able to identify if the error has already been resolved or not.

With the current patch, once the conflict is resolved by skipping the
transaction in question, its entry on the stats view is cleared. As
you suggested, if we have the total error counts in that view, it
would be good to keep the count and clear other fields such as xid,
last_failure, and command etc.

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors. However, there is an
exception to it which is during initial table sync and I think the
view should have separate rows for each table sync.

--
With Regards,
Amit Kapila.

#54Alexey Lesovsky
Alexey Lesovsky
lesovsky@gmail.com
In reply to: Amit Kapila (#53)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when
there are two (or more) errors in the same subscription, but different
relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized
using parallel workers (with max_sync_workers_per_subscription > 1).

--
Regards, Alexey

#55Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Alexey Lesovsky (#54)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
I mentioned, and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

--
With Regards,
Amit Kapila.

#56Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#55)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
I mentioned, and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

Yeah, I agree to have separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at a time so the pair of subscription OID and relation OID will
be unique. I think that we can have a boolean column in the view,
indicating whether the error entry is reported by the table sync
worker or the apply worker, or maybe we also can have the action
column show "TABLE SYNC" if the error is reported by the table sync
worker.

When it comes to removing the subscription errors in
pgstat_vacuum_stat(), I think we need to seq scan on the hash table
and send the messages to purge the subscription error entries.
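A small Python model of the two points above, assuming (as stated) that the pair of subscription OID and relation OID identifies an entry and that cleanup is done by scanning the whole table. All names here are hypothetical; the real implementation would live in the stats collector and pgstat_vacuum_stat().

```python
from typing import Dict, List, Set, Tuple

# (subscription OID, relation OID) identifies one error entry.
ErrorKey = Tuple[int, int]


def vacuum_subscription_errors(
    errors: Dict[ErrorKey, dict], live_subids: Set[int]
) -> List[ErrorKey]:
    """Scan every entry and purge those whose subscription no longer exists.

    Returns the purged keys, i.e. what would be sent in purge messages.
    """
    # Full scan of the table: there is no per-subscription index to use.
    dead = [key for key in errors if key[0] not in live_subids]
    for key in dead:
        del errors[key]
    return dead
```

Usage: with entries for subscriptions 1 and 2 and only subscription 1 still present in the catalog, the scan removes exactly the entries whose first key component is 2.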

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#57Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#56)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
I mentioned, and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

Yeah, I agree to have separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at a time so the pair of subscription OID and relation OID will
be unique. I think that we can have a boolean column in the view,
indicating whether the error entry is reported by the table sync
worker or the apply worker, or maybe we also can have the action
column show "TABLE SYNC" if the error is reported by the table sync
worker.

Or similar to backend_type (text) in pg_stat_activity, we can have
something like error_source (text) which will display apply worker or
tablesync worker? I think if we have this column then even if there is
a chance that both apply and sync workers operate on the same
relation, we can identify it via this column.

--
With Regards,
Amit Kapila.

#58Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#57)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
I mentioned, and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

Yeah, I agree to have separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at a time so the pair of subscription OID and relation OID will
be unique. I think that we can have a boolean column in the view,
indicating whether the error entry is reported by the table sync
worker or the apply worker, or maybe we also can have the action
column show "TABLE SYNC" if the error is reported by the table sync
worker.

Or similar to backend_type (text) in pg_stat_activity, we can have
something like error_source (text) which will display apply worker or
tablesync worker? I think if we have this column then even if there is
a chance that both apply and sync workers operate on the same
relation, we can identify it via this column.

Sounds good. I'll incorporate this in the next version patch that I'm
planning to submit this week.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#59Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#58)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
I mentioned, and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

Yeah, I agree to have separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at a time so the pair of subscription OID and relation OID will
be unique. I think that we can have a boolean column in the view,
indicating whether the error entry is reported by the table sync
worker or the apply worker, or maybe we also can have the action
column show "TABLE SYNC" if the error is reported by the table sync
worker.

Or similar to backend_type (text) in pg_stat_activity, we can have
something like error_source (text) which will display apply worker or
tablesync worker? I think if we have this column then even if there is
a chance that both apply and sync workers operate on the same
relation, we can identify it via this column.

Sounds good. I'll incorporate this in the next version patch that I'm
planning to submit this week.

Sorry, I could not make it this week. I'll submit them early next week.
While updating the patch I thought we need to have more design
discussion on two points of clearing error details after the error is
resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure that those error details are cleared? For table sync workers’ errors, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
has completed table sync (i.e., checking if srsubstate = ‘r’). But there
is no such information for the apply workers and subscriptions. In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

2. Do we really want to leave the table sync worker error entries even after the
error is resolved and the table sync completes? Unlike the apply
worker error, the number of table sync worker errors could be very
large, for example, if a subscriber subscribes to many tables. If we
leave those errors in the stats view, it uses more memory space and
could affect writing and reading stats file performance. If such leftover
table sync error entries are not helpful in practice I think we can
remove them rather than clear some fields. What do you think?
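To make the table sync case concrete, here is a Python sketch of the periodic check described in point 1 combined with the removal proposed in point 2: entries reported by table sync workers are dropped once the corresponding pg_subscription_rel state is 'r' (ready). The dictionary layout, the "source" field, and the function name are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

ErrorKey = Tuple[int, int]  # (subscription OID, relation OID)


def purge_finished_tablesync_errors(
    errors: Dict[ErrorKey, dict], rel_states: Dict[ErrorKey, str]
) -> List[ErrorKey]:
    """Remove table sync error entries whose relation has reached state 'r'.

    `rel_states` models pg_subscription_rel: it maps each
    (subscription OID, relation OID) pair to its srsubstate character.
    Apply worker entries are never touched here.
    """
    removed: List[ErrorKey] = []
    for key in list(errors):  # copy the keys: we delete while iterating
        is_tablesync = errors[key].get("source") == "tablesync"
        if is_tablesync and rel_states.get(key) == "r":
            del errors[key]
            removed.append(key)
    return removed
```

Because the check only reads catalog state, it does not depend on any stats message having been delivered, which is the property the apply worker case lacks.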

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#60Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#59)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 16, 2021 at 8:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sounds good. I'll incorporate this in the next version patch that I'm
planning to submit this week.

Sorry, I could not make it this week. I'll submit them early next week.

No problem.

While updating the patch I thought we need to have more design
discussion on two points of clearing error details after the error is
resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure that those error details are cleared? For table sync workers’ errors, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
has completed table sync (i.e., checking if srsubstate = ‘r’). But there
is no such information for the apply workers and subscriptions.

But won't the corresponding subscription (pg_subscription) have the
XID as InvalidTransactionId once the xid is skipped or at least a
different XID than we would have in pg_stat view? Can we use that to
reset entry via vacuum?

In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

Yeah, such things should be a last resort.

2. Do we really want to leave the table sync worker errors even after
the error is resolved and the table sync completes? Unlike the apply
worker error, the number of table sync worker errors could be very
large, for example, if a subscriber subscribes to many tables. If we
leave those errors in the stats view, it uses more memory space and
could affect writing and reading stats file performance. If such left
table sync error entries are not helpful in practice I think we can
remove them rather than clear some fields. What do you think?

Sounds reasonable to me. One might think to update the subscription
error count by including table_sync errors but not sure if that is
helpful and even if that is helpful, we can extend it later.
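
The cleanup agreed on here (drop a table sync error entry once its table sync has completed) can be sketched roughly as follows. This is only an illustration: TableSyncErrorEntry and purge_ready_tablesync_errors() are hypothetical names rather than code from the patch, and the srsubstate field stands in for a lookup of pg_subscription_rel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the structures used by the patch. */
typedef unsigned int Oid;

typedef struct TableSyncErrorEntry
{
	Oid			subid;			/* subscription OID */
	Oid			relid;			/* relation OID */
	char		srsubstate;		/* copy of pg_subscription_rel.srsubstate */
	int			failure_count;	/* number of reported failures */
} TableSyncErrorEntry;

/*
 * Compact the array in place, dropping every entry whose table sync has
 * reached the ready state ('r'). Returns the number of entries kept.
 */
size_t
purge_ready_tablesync_errors(TableSyncErrorEntry *entries, size_t n)
{
	size_t		kept = 0;

	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (entries[i].srsubstate == 'r')
			continue;			/* sync finished: remove the entry */
		entries[kept++] = entries[i];
	}

	return kept;
}
```

Removing the whole entry, rather than clearing some of its fields, keeps the stats data from growing with the number of subscribed tables, which is the concern raised in point 2.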

--
With Regards,
Amit Kapila.

#61Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#60)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 19, 2021 at 2:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jul 16, 2021 at 8:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sounds good. I'll incorporate this in the next version patch that I'm
planning to submit this week.

Sorry, I could not make it this week. I'll submit them early next week.

No problem.

While updating the patch I thought we need to have more design
discussion on two points of clearing error details after the error is
resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure to clear those error details? For table sync workers’ error, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
completes table sync (i.e., checking if srsubstate = 'r'). But there
is no such information for the apply workers and subscriptions.

But won't the corresponding subscription (pg_subscription) have the
XID as InvalidTransactionId once the xid is skipped or at least a
different XID than we would have in pg_stat view? Can we use that to
reset entry via vacuum?

I think the XID is InvalidTransactionId until the user specifies it. So
I think we cannot know whether we're before skipping or after skipping
only by the transaction ID. No?

In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

Yeah, such things should be a last resort.

2. Do we really want to leave the table sync worker errors even after
the error is resolved and the table sync completes? Unlike the apply
worker error, the number of table sync worker errors could be very
large, for example, if a subscriber subscribes to many tables. If we
leave those errors in the stats view, it uses more memory space and
could affect writing and reading stats file performance. If such left
table sync error entries are not helpful in practice I think we can
remove them rather than clear some fields. What do you think?

Sounds reasonable to me. One might think to update the subscription
error count by including table_sync errors but not sure if that is
helpful and even if that is helpful, we can extend it later.

Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#62Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#59)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jul 17, 2021 at 12:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 12, 2021 at 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 12, 2021 at 9:37 AM Alexey Lesovsky <lesovsky@gmail.com> wrote:

On Mon, Jul 12, 2021 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Ok, looks nice. But I am curious how this will work in the case when there are two (or more) errors in the same subscription, but different relations?

We can't proceed unless the first error is resolved, so there
shouldn't be multiple unresolved errors.

Ok. I thought multiple errors are possible when many tables are initialized using parallel workers (with max_sync_workers_per_subscription > 1).

Yeah, that is possible but that is covered by the second condition
mentioned by me and in such cases I think we should have separate rows
for each tablesync. Is that right, Sawada-san or do you have something
else in mind?

Yeah, I agree to have separate rows for each table sync. The table
should not be processed by both the table sync worker and the apply
worker at a time so the pair of subscription OID and relation OID will
be unique. I think that we can have a boolean column in the view,
indicating whether the error entry is reported by the table sync
worker or the apply worker, or maybe we also can have the action
column show "TABLE SYNC" if the error is reported by the table sync
worker.

Or similar to backend_type (text) in pg_stat_activity, we can have
something like error_source (text) which will display apply worker or
tablesync worker? I think if we have this column then even if there is
a chance that both apply and sync worker operates on the same
relation, we can identify it via this column.

Sounds good. I'll incorporate this in the next version patch that I'm
planning to submit this week.

Sorry, I could not make it this week. I'll submit them early next week.
While updating the patch I thought we need to have more design
discussion on two points of clearing error details after the error is
resolved:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure to clear those error details? For table sync workers’ error, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
completes table sync (i.e., checking if srsubstate = 'r'). But there
is no such information for the apply workers and subscriptions. In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

I think that the motivation behind the idea of leaving error entries
and clearing some of their fields is that users can check if the error
is successfully resolved and the worker is working fine. But we can
check it also in another way, for example, checking
pg_stat_subscription view. So is it worth considering leaving the
apply worker errors as they are?

2. Do we really want to leave the table sync worker errors even after
the error is resolved and the table sync completes? Unlike the apply
worker error, the number of table sync worker errors could be very
large, for example, if a subscriber subscribes to many tables. If we
leave those errors in the stats view, it uses more memory space and
could affect writing and reading stats file performance. If such left
table sync error entries are not helpful in practice I think we can
remove them rather than clear some fields. What do you think?

I've attached the updated version patch that incorporated all comments
I got so far except for the clearing error details part I mentioned
above. After getting a consensus on those parts, I'll incorporate the
idea into the patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/x-patch)
v2-0001-Add-errcontext-to-errors-of-the-applying-logical-.patch (application/x-patch)
v2-0002-Add-pg_stat_logical_replication_error-statistics-.patch (application/x-patch)

#63Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#62)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 19, 2021 at 12:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jul 17, 2021 at 12:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure to clear those error details? For table sync workers’ error, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
completes table sync (i.e., checking if srsubstate = 'r'). But there
is no such information for the apply workers and subscriptions. In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

I think that the motivation behind the idea of leaving error entries
and clearing some of their fields is that users can check if the error
is successfully resolved and the worker is working fine. But we can
check it also in another way, for example, checking
pg_stat_subscription view. So is it worth considering leaving the
apply worker errors as they are?

I think so. Basically, we will send the clear message after skipping
the transaction but I think it is fine if that message is lost. At worst, it
will be displayed as the last error details. If there is another error
it will be overwritten or probably we should have a function *_reset()
which allows the user to reset a particular subscription's error info.
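
A minimal sketch of the behaviour described above: a later error overwrites the stale details, and a reset clears them on demand. The structure and function names are hypothetical, not taken from the patch, and keeping failure_count across a clear is an assumption borrowed from the earlier "clear its fields except for some fields such as failure counts" discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of one subscription's error entry. */
typedef unsigned int TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct SubscriptionErrorEntry
{
	TransactionId last_xid;		/* remote XID of the last failed transaction */
	char		last_message[128];	/* last error message */
	int			failure_count;	/* kept across clears (assumption) */
} SubscriptionErrorEntry;

/* A new error simply overwrites whatever details were left behind. */
void
report_error(SubscriptionErrorEntry *e, TransactionId xid, const char *msg)
{
	assert(e != NULL && msg != NULL);

	e->last_xid = xid;
	snprintf(e->last_message, sizeof(e->last_message), "%s", msg);
	e->failure_count++;
}

/*
 * Models both the clear message sent after a skip and a manual reset by
 * the user: the error details go away, the failure count stays.
 */
void
clear_error_details(SubscriptionErrorEntry *e)
{
	assert(e != NULL);

	e->last_xid = InvalidTransactionId;
	memset(e->last_message, 0, sizeof(e->last_message));
}
```

If the clear message is lost, the entry just keeps showing the last error until either report_error() overwrites it or the user invokes the reset path, so no periodic resend is needed.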

--
With Regards,
Amit Kapila.

#64houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#62)
RE: Skipping logical replication transactions on subscriber side

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all comments
I got so far except for the clearing error details part I mentioned
above. After getting a consensus on those parts, I'll incorporate the
idea into the patches.

Hi Sawada-san,

I am interested in this feature.
After having a look at the patch, I have a few questions about it.
(Sorry in advance if I missed something)

1) In 0002 patch, it introduces a new view called pg_stat_subscription_errors.
Since it won't be cleaned automatically after we resolve the conflict, do we
need a reset function to clean the statistics in it? Maybe something
similar to pg_stat_reset_replication_slot, which cleans
pg_stat_replication_slots.

2) For 0003 patch, when I am faced with a conflict, I set skip_xid = xxx, and
then I resolve the conflict. If I reset skip_xid after resolving the
conflict, will the change (which caused the conflict before) be applied again?

3) For 0003 patch, if the user sets skip_xid to a wrong xid which has not been
assigned yet, will the change be skipped when that xid is assigned in
the future even if it doesn't cause any conflicts?

Besides, it might be better to add some description of the patch in each patch's
commit message, which will make it easier for new reviewers to follow.

Best regards,
Houzj

#65Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#63)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 19, 2021 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jul 19, 2021 at 12:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jul 17, 2021 at 12:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

1. How to clear apply worker errors. IIUC we've discussed that once
the apply worker skipped the transaction we leave the error entry
itself but clear its fields except for some fields such as failure
counts. But given that the stats messages could be lost, how can we
ensure to clear those error details? For table sync workers’ error, we
can have autovacuum workers periodically check entries of
pg_subscription_rel and clear the error entry if the table sync worker
completes table sync (i.e., checking if srsubstate = 'r'). But there
is no such information for the apply workers and subscriptions. In
addition to sending the message clearing the error details just after
skipping the transaction, I thought that we can have apply workers
periodically send the message clearing the error details but it seems
not good.

I think that the motivation behind the idea of leaving error entries
and clearing some of their fields is that users can check if the error
is successfully resolved and the worker is working fine. But we can
check it also in another way, for example, checking
pg_stat_subscription view. So is it worth considering leaving the
apply worker errors as they are?

I think so. Basically, we will send the clear message after skipping
the transaction but I think it is fine if that message is lost. At worst, it
will be displayed as the last error details. If there is another error
it will be overwritten or probably we should have a function *_reset()
which allows the user to reset a particular subscription's error info.

That makes sense. I'll incorporate this idea in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#66Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#64)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all comments
I got so far except for the clearing error details part I mentioned
above. After getting a consensus on those parts, I'll incorporate the
idea into the patches.

Hi Sawada-san,

I am interested in this feature.
After having a look at the patch, I have a few questions about it.

Thank you for having a look at the patches!

1) In 0002 patch, it introduces a new view called pg_stat_subscription_errors.
Since it won't be cleaned automatically after we resolve the conflict, do we
need a reset function to clean the statistics in it? Maybe something
similar to pg_stat_reset_replication_slot, which cleans
pg_stat_replication_slots.

Agreed. As Amit also mentioned, providing a reset function to clean
the statistics seems a good idea. If the message clearing the stats
that is sent after skipping the transaction gets lost, the user is
able to reset those stats manually.

2) For 0003 patch, when I am faced with a conflict, I set skip_xid = xxx, and
then I resolve the conflict. If I reset skip_xid after resolving the
conflict, will the change (which caused the conflict before) be applied again?

The apply worker checks skip_xid when it reads the subscription.
Therefore, if you reset skip_xid before the apply worker restarts and
skips the transaction, the change is applied. But if you reset
skip_xid after the apply worker skips the transaction, the change is
already skipped and resetting skip_xid has no effect.

3) For 0003 patch, if the user sets skip_xid to a wrong xid which has not been
assigned yet, will the change be skipped when that xid is assigned in
the future even if it doesn't cause any conflicts?

Yes. Currently, setting a correct xid is the user's responsibility. I
think it would be better to disable it or emit WARNING/ERROR when the
user mistakenly set the wrong xid if we find out a convenient way to
detect that.

Besides, it might be better to add some description of the patch in each patch's
commit message, which will make it easier for new reviewers to follow.

I'll add commit messages in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#67Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#66)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jul 20, 2021 at 6:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
assigned, and then will the change be skipped when the xid is assigned in
the future even if it doesn't cause any conflicts ?

Yes. Currently, setting a correct xid is the user's responsibility. I
think it would be better to disable it or emit WARNING/ERROR when the
user mistakenly set the wrong xid if we find out a convenient way to
detect that.

I think in this regard we should clearly document how this can be
misused by users. I see that you have mentioned about skip_xid but
maybe we can add more on how it could lead to skipping a
non-conflicting XID and can lead to an inconsistent replica. As
discussed earlier as well, users can anyway do similar harm by using
pg_replication_slot_advance(). I think if possible we might want to
give some examples as well where it would be helpful for users to use
this functionality.

--
With Regards,
Amit Kapila.

#68houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#66)
RE: Skipping logical replication transactions on subscriber side

On July 20, 2021 9:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all
comments I got so far except for the clearing error details part I
mentioned above. After getting a consensus on those parts, I'll
incorporate the idea into the patches.

3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
assigned, and then will the change be skipped when the xid is assigned in
the future even if it doesn't cause any conflicts ?

Yes. Currently, setting a correct xid is the user's responsibility. I think it would
be better to disable it or emit WARNING/ERROR when the user mistakenly set
the wrong xid if we find out a convenient way to detect that.

Thanks for the explanation. As Amit suggested, it seems we can document the
risk of misusing skip_xid. Besides, I found some minor things in the patch.

1) In 0002 patch

+ */
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+	if (subscriptionErrHash != NULL)
+		return;
+
+static void
+pgstat_recv_subscription_error(PgStat_MsgSubscriptionErr *msg, int len)
+{

the second parameter "len" seems not used in the function
pgstat_recv_subscription_purge() and pgstat_recv_subscription_error().

2) in 0003 patch

  * Helper function for apply_handle_commit and apply_handle_stream_commit.
  */
 static void
-apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
+apply_handle_commit_internal(LogicalRepCommitData *commit_data)
 {

This looks like a separate change which removes an unused parameter in existing
code, maybe we can get this committed first ?

Best regards,
Houzj

#69Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#68)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 22, 2021 at 8:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 20, 2021 9:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all
comments I got so far except for the clearing error details part I
mentioned above. After getting a consensus on those parts, I'll
incorporate the idea into the patches.

3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
assigned, and then will the change be skipped when the xid is assigned in
the future even if it doesn't cause any conflicts ?

Yes. Currently, setting a correct xid is the user's responsibility. I think it would
be better to disable it or emit WARNING/ERROR when the user mistakenly set
the wrong xid if we find out a convenient way to detect that.

Thanks for the explanation. As Amit suggested, it seems we can document the
risk of misusing skip_xid. Besides, I found some minor things in the patch.

1) In 0002 patch

+ */
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+       if (subscriptionErrHash != NULL)
+               return;
+
+static void
+pgstat_recv_subscription_error(PgStat_MsgSubscriptionErr *msg, int len)
+{

the second parameter "len" seems not used in the function
pgstat_recv_subscription_purge() and pgstat_recv_subscription_error().

'len' is unused not only in the functions the patch adds but also in
the other pgstat_recv_* functions. Can we remove all of them in a
separate patch? 'len' in pgstat_recv_* functions has never been used
since the stats collector code was introduced. It seems that it
was mistakenly introduced in the first commit, and the pgstat_recv_*
functions added afterwards followed suit, defining 'len' without
using it.

2) in 0003 patch

* Helper function for apply_handle_commit and apply_handle_stream_commit.
*/
static void
-apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
+apply_handle_commit_internal(LogicalRepCommitData *commit_data)
{

This looks like a separate change which removes an unused parameter in existing
code, maybe we can get this committed first ?

Yeah, it seems to be introduced by commit 0926e96c493. I've attached
the patch for that.

Also, I've attached the updated version patches. This version patch
has pg_stat_reset_subscription_error() SQL function and sends a clear
message after skipping the transaction. 0004 patch includes the
skipping transaction feature and introducing RESET to ALTER
SUBSCRIPTION. It would be better to separate them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-Remove-unused-function-argument-in-apply_handle_comm.patch (application/octet-stream)
v3-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v3-0001-Add-errcontext-to-errors-of-the-applying-logical-.patch (application/octet-stream)
v3-0002-Add-pg_stat_logical_replication_error-statistics-.patch (application/octet-stream)

#70Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#69)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jul 26, 2021 at 11:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 22, 2021 at 8:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 20, 2021 9:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all
comments I got so far except for the clearing error details part I
mentioned above. After getting a consensus on those parts, I'll
incorporate the idea into the patches.

3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
assigned, and then will the change be skipped when the xid is assigned in
the future even if it doesn't cause any conflicts ?

Yes. Currently, setting a correct xid is the user's responsibility. I think it would
be better to disable it or emit WARNING/ERROR when the user mistakenly set
the wrong xid if we find out a convenient way to detect that.

Thanks for the explanation. As Amit suggested, it seems we can document the
risk of misusing skip_xid. Besides, I found some minor things in the patch.

1) In 0002 patch

+ */
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+       if (subscriptionErrHash != NULL)
+               return;
+
+static void
+pgstat_recv_subscription_error(PgStat_MsgSubscriptionErr *msg, int len)
+{

the second parameter "len" seems not used in the function
pgstat_recv_subscription_purge() and pgstat_recv_subscription_error().

'len' is unused not only in the functions the patch adds but also in
the other pgstat_recv_* functions. Can we remove all of them in a
separate patch? 'len' in pgstat_recv_* functions has never been used
since the stats collector code was introduced. It seems that it
was mistakenly introduced in the first commit, and the pgstat_recv_*
functions added afterwards followed suit, defining 'len' without
using it.

2) in 0003 patch

* Helper function for apply_handle_commit and apply_handle_stream_commit.
*/
static void
-apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
+apply_handle_commit_internal(LogicalRepCommitData *commit_data)
{

This looks like a separate change which removes an unused parameter in existing
code, maybe we can get this committed first ?

Yeah, it seems to be introduced by commit 0926e96c493. I've attached
the patch for that.

Also, I've attached the updated version patches. This version patch
has pg_stat_reset_subscription_error() SQL function and sends a clear
message after skipping the transaction. 0004 patch includes the
skipping transaction feature and introducing RESET to ALTER
SUBSCRIPTION. It would be better to separate them.

I've attached the new version patches that fix cfbot failure.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v4-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/x-patch)
v4-0001-Add-errcontext-to-errors-of-the-applying-logical-.patch (application/x-patch)
v4-0002-Add-pg_stat_logical_replication_error-statistics-.patch (application/x-patch)
0001-Remove-unused-function-argument-in-apply_handle_comm.patch (application/x-patch)

#71Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#70)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 26, 2021 at 11:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 22, 2021 at 8:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 20, 2021 9:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jul 19, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 19, 2021 2:40 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patch that incorporated all
comments I got so far except for the clearing error details part I
mentioned above. After getting a consensus on those parts, I'll
incorporate the idea into the patches.

3) For 0003 patch, if user set skip_xid to a wrong xid which have not been
assigned, and then will the change be skipped when the xid is assigned in
the future even if it doesn't cause any conflicts ?

Yes. Currently, setting a correct xid is the user's responsibility. I think it would
be better to disable it or emit WARNING/ERROR when the user mistakenly set
the wrong xid if we find out a convenient way to detect that.

Thanks for the explanation. As Amit suggested, it seems we can document the
risk of misusing skip_xid. Besides, I found some minor things in the patch.

1) In 0002 patch

+ */
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+       if (subscriptionErrHash != NULL)
+               return;
+
+static void
+pgstat_recv_subscription_error(PgStat_MsgSubscriptionErr *msg, int len)
+{

the second parameter "len" seems not used in the function
pgstat_recv_subscription_purge() and pgstat_recv_subscription_error().

'len' is unused not only in the functions the patch adds but also in
the other pgstat_recv_* functions. Can we remove all of them in a
separate patch? 'len' in pgstat_recv_* functions has never been used
since the stats collector code was introduced. It seems that it
was mistakenly introduced in the first commit, and the pgstat_recv_*
functions added afterwards followed suit, defining 'len' without
using it.

2) in 0003 patch

* Helper function for apply_handle_commit and apply_handle_stream_commit.
*/
static void
-apply_handle_commit_internal(StringInfo s, LogicalRepCommitData *commit_data)
+apply_handle_commit_internal(LogicalRepCommitData *commit_data)
{

This looks like a separate change which removes an unused parameter in existing
code, maybe we can get this committed first ?

Yeah, it seems to be introduced by commit 0926e96c493. I've attached
the patch for that.

Also, I've attached the updated version patches. This version patch
has pg_stat_reset_subscription_error() SQL function and sends a clear
message after skipping the transaction. 0004 patch includes the
skipping transaction feature and introducing RESET to ALTER
SUBSCRIPTION. It would be better to separate them.

I've attached the new version patches that fix cfbot failure.

Sorry I've attached wrong ones. Reattached the correct version patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v4-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v4-0002-Add-pg_stat_logical_replication_error-statistics-.patch (application/octet-stream)
0001-Remove-unused-function-argument-in-apply_handle_comm.patch (application/octet-stream)
v4-0001-Add-errcontext-to-errors-of-the-applying-logical-.patch (application/octet-stream)

#72Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#71)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, it seems to be introduced by commit 0926e96c493. I've attached
the patch for that.

Also, I've attached the updated version patches. This version patch
has pg_stat_reset_subscription_error() SQL function and sends a clear
message after skipping the transaction. 0004 patch includes the
skipping transaction feature and introducing RESET to ALTER
SUBSCRIPTION. It would be better to separate them.

+1, to separate out the reset part.

I've attached the new version patches that fix cfbot failure.

Sorry I've attached wrong ones. Reattached the correct version patches.

Pushed the 0001* patch that removes the unused parameter.

Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
===========================================================
1.
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -78,6 +78,7 @@
 #include "partitioning/partbounds.h"
 #include "partitioning/partdesc.h"
 #include "pgstat.h"
+#include "replication/logicalworker.h"
 #include "rewrite/rewriteDefine.h"
 #include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
@@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
  continue;
  }
+ /* Set logical replication error callback info if necessary */
+ set_logicalrep_error_context_rel(rel);
+
  /*
  * Build the lists of foreign tables belonging to each foreign server
  * and pass each list to the foreign data wrapper's callback function,
@@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
  pgstat_count_truncate(rel);
  }
+ /* Reset logical replication error callback info */
+ reset_logicalrep_error_context_rel();
+

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

2.
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+ LogicalRepMsgType command; /* 0 if invalid */
+
+ /* Local relation information */
+ char    *nspname; /* used for error context */
+ char    *relname; /* used for error context */
+
+ TransactionId remote_xid;
+ TimestampTz committs;
+} ApplyErrCallbackArg;
+static ApplyErrCallbackArg apply_error_callback_arg =
+{
+ .command = 0,
+ .relname = NULL,
+ .nspname = NULL,
+ .remote_xid = InvalidTransactionId,
+ .committs = 0,
+};
+

Better to have a space between the above two declarations.

3. commit message:
This commit adds the error context to errors happening during applying
logical replication changes, showing the command, the relation
relation, transaction ID, and commit timestamp in the server log.

'relation' is mentioned twice.

The patch is not getting applied probably due to yesterday's commit in
this area.

--
With Regards,
Amit Kapila.

#73houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#71)
RE: Skipping logical replication transactions on subscriber side

On July 29, 2021 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry I've attached wrong ones. Reattached the correct version patches.

Hi,

I had some comments on the new version patches.

1)

-       relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
-       relstate->relid = subrel->srrelid;
+       relstate = (SubscriptionRelState *) hash_search(htab, (void *) &subrel->srrelid,
+                                                       HASH_ENTER, NULL);

I noticed that the new version of the patch changes 'relstate' from a List to
a hash table. Will this bring a significant performance improvement?

2)
+ * PgStat_StatSubRelErrEntry represents a error happened during logical

a error => an error

3)
+CREATE VIEW pg_stat_subscription_errors AS
+    SELECT
+   d.datname,
+   sr.subid,
+   s.subname,

It seems the 'subid' column is not mentioned in the documentation of the
pg_stat_subscription_errors view.

4)
+
+                   if (fread(&nrels, 1, sizeof(long), fpin) != sizeof(long))
+                   {
    ...
+                   for (int i = 0; i < nrels; i++)

The type of 'i' (int) differs from the type of 'nrels' (long); it might be
better to use the same type.
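
As a standalone illustration of point 4) (a minimal sketch outside PostgreSQL, not the patch's code), the loop index below has the same type as the count it is compared against. The function name read_long_array() and its 'max' bound are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a count stored as a long, then read that many
 * long values into 'out'. Returns the number of values read, or -1 on a
 * short read or an out-of-range count.
 */
long read_long_array(FILE *fpin, long *out, long max)
{
    long nrels;

    if (fread(&nrels, 1, sizeof(long), fpin) != sizeof(long))
        return -1;
    if (nrels < 0 || nrels > max)
        return -1;

    /* The index is a long, like the bound, so the comparison never mixes widths. */
    for (long i = 0; i < nrels; i++)
    {
        if (fread(&out[i], 1, sizeof(long), fpin) != sizeof(long))
            return -1;
    }
    return nrels;
}
```

With an int index and a long count, the comparison still compiles, but on platforms where long is wider than int the index could overflow before reaching a very large count.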

Best regards,
houzj

#74Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#72)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, it seems to be introduced by commit 0926e96c493. I've attached
the patch for that.

Also, I've attached the updated version patches. This version patch
has pg_stat_reset_subscription_error() SQL function and sends a clear
message after skipping the transaction. 0004 patch includes the
skipping transaction feature and introducing RESET to ALTER
SUBSCRIPTION. It would be better to separate them.

+1, to separate out the reset part.

Okay, I'll do that.

I've attached the new version patches that fix cfbot failure.

Sorry I've attached wrong ones. Reattached the correct version patches.

Pushed the 0001* patch that removes the unused parameter.

Thanks!

Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
===========================================================

Thank you for the comments!

1.
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -78,6 +78,7 @@
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "pgstat.h"
+#include "replication/logicalworker.h"
#include "rewrite/rewriteDefine.h"
#include "rewrite/rewriteHandler.h"
#include "rewrite/rewriteManip.h"
@@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
continue;
}
+ /* Set logical replication error callback info if necessary */
+ set_logicalrep_error_context_rel(rel);
+
/*
* Build the lists of foreign tables belonging to each foreign server
* and pass each list to the foreign data wrapper's callback function,
@@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
pgstat_count_truncate(rel);
}
+ /* Reset logical replication error callback info */
+ reset_logicalrep_error_context_rel();
+

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

2.
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+ LogicalRepMsgType command; /* 0 if invalid */
+
+ /* Local relation information */
+ char    *nspname; /* used for error context */
+ char    *relname; /* used for error context */
+
+ TransactionId remote_xid;
+ TimestampTz committs;
+} ApplyErrCallbackArg;
+static ApplyErrCallbackArg apply_error_callback_arg =
+{
+ .command = 0,
+ .relname = NULL,
+ .nspname = NULL,
+ .remote_xid = InvalidTransactionId,
+ .committs = 0,
+};
+

Better to have a space between the above two declarations.

Will fix.

3. commit message:
This commit adds the error context to errors happening during applying
logical replication changes, showing the command, the relation
relation, transaction ID, and commit timestamp in the server log.

'relation' is mentioned twice.

Will fix.

The patch is not getting applied probably due to yesterday's commit in
this area.

Okay. I'll rebase the patches to the current HEAD.

I'm incorporating all comments from you and Houzj, and will submit the
new patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#75Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#73)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jul 30, 2021 at 3:47 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On July 29, 2021 1:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry I've attached wrong ones. Reattached the correct version patches.

Hi,

I had some comments on the new version patches.

Thank you for the comments!

1)

-       relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState));
-       relstate->relid = subrel->srrelid;
+       relstate = (SubscriptionRelState *) hash_search(htab, (void *) &subrel->srrelid,
+                                                       HASH_ENTER, NULL);

I noticed that the new version of the patch changes 'relstate' from a List to
a hash table. Will this bring a significant performance improvement?

For pgstat_vacuum_stat() purposes, I think it's better to use a hash
table to avoid an O(N) lookup. But it might not be good to change the
return type of GetSubscriptionNotReadyRelations(), since other
functions iterate over the returned value, and iterating over a list is
faster than iterating over a hash table. It would be better to have
pgstat_vacuum_stat() construct a hash table for its own purpose.

2)
+ * PgStat_StatSubRelErrEntry represents a error happened during logical

a error => an error

Will fix.

3)
+CREATE VIEW pg_stat_subscription_errors AS
+    SELECT
+   d.datname,
+   sr.subid,
+   s.subname,

It seems the 'subid' column is not mentioned in the documentation of the
pg_stat_subscription_errors view.

Will fix.

4)
+
+                   if (fread(&nrels, 1, sizeof(long), fpin) != sizeof(long))
+                   {
...
+                   for (int i = 0; i < nrels; i++)

The type of 'i' (int) differs from the type of 'nrels' (long); it might be
better to use the same type.

Will fix.
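
On point 1), the idea of leaving the list alone for iteration and building a private lookup table only where membership checks are needed can be sketched as follows. This is a standalone illustration with a tiny open-addressing set, not PostgreSQL's dynahash and not the patch's code; OidSet, oidset_build() and oidset_contains() are hypothetical names, and a plain array stands in for the List.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SET_CAPACITY 64         /* power of two; must exceed the element count */

typedef struct OidSet
{
    uint32_t keys[SET_CAPACITY];
    bool     used[SET_CAPACITY];
} OidSet;

/* Multiplicative hash, reduced to the table size. */
static size_t oidset_slot(uint32_t key)
{
    return (size_t) ((key * 2654435761u) & (SET_CAPACITY - 1));
}

/* Build the set once from a plain array (standing in for the list). */
void oidset_build(OidSet *set, const uint32_t *relids, size_t n)
{
    assert(n < SET_CAPACITY);   /* keeps at least one slot free for probing */

    for (size_t i = 0; i < SET_CAPACITY; i++)
        set->used[i] = false;

    for (size_t i = 0; i < n; i++)
    {
        size_t slot = oidset_slot(relids[i]);

        /* Linear probing: step to the next slot until a free or equal one. */
        while (set->used[slot] && set->keys[slot] != relids[i])
            slot = (slot + 1) & (SET_CAPACITY - 1);
        set->keys[slot] = relids[i];
        set->used[slot] = true;
    }
}

/* Membership test without scanning every element. */
bool oidset_contains(const OidSet *set, uint32_t key)
{
    size_t slot = oidset_slot(key);

    while (set->used[slot])
    {
        if (set->keys[slot] == key)
            return true;
        slot = (slot + 1) & (SET_CAPACITY - 1);
    }
    return false;
}
```

Callers that only iterate keep using the original array (or list), so they pay nothing for the extra structure; only the code that needs repeated lookups builds the set.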

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#76Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#74)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

Yeah, I also think at this stage error message is sufficient in such cases.

--
With Regards,
Amit Kapila.

#77Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#76)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 2, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

Yeah, I also think at this stage error message is sufficient in such cases.

I've attached new patches that incorporate all comments I got so far.
Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v5-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v5-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v5-0001-Add-errcontext-to-errors-happening-during-applyin.patch (application/octet-stream)
v5-0002-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)
#78vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#77)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 3, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 2, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

Yeah, I also think at this stage error message is sufficient in such cases.

I've attached new patches that incorporate all comments I got so far.
Please review them.

I had a look at the first patch; a couple of minor comments:
1) Should we include this in typedefs.lst?
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+       LogicalRepMsgType command;      /* 0 if invalid */
+
+       /* Local relation information */
+       char       *nspname;
2)  We can keep the case labels in the same order as the
LogicalRepMsgType enum; this makes it easy to spot an enum value
that has been missed.
+               case LOGICAL_REP_MSG_RELATION:
+                       return "RELATION";
+               case LOGICAL_REP_MSG_TYPE:
+                       return "TYPE";
+               case LOGICAL_REP_MSG_ORIGIN:
+                       return "ORIGIN";
+               case LOGICAL_REP_MSG_MESSAGE:
+                       return "MESSAGE";
+               case LOGICAL_REP_MSG_STREAM_START:
+                       return "STREAM START";
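
The ordering suggested in 2) can be illustrated with a small standalone sketch. The enum DemoMsgType and the function demo_msgtype_str() are hypothetical and only mirror the shape of the patch's code; they are not the real LogicalRepMsgType definition.

```c
#include <stddef.h>

/* Hypothetical message-type enum, declared in a fixed order. */
typedef enum DemoMsgType
{
    DEMO_MSG_BEGIN = 'B',
    DEMO_MSG_COMMIT = 'C',
    DEMO_MSG_INSERT = 'I',
    DEMO_MSG_RELATION = 'R'
} DemoMsgType;

/*
 * The case labels follow the enum's declaration order, so the two lists
 * can be compared line by line. There is deliberately no default label:
 * compilers that warn about unhandled enum values in a switch can then
 * flag a missing case.
 */
const char *demo_msgtype_str(DemoMsgType action)
{
    switch (action)
    {
        case DEMO_MSG_BEGIN:
            return "BEGIN";
        case DEMO_MSG_COMMIT:
            return "COMMIT";
        case DEMO_MSG_INSERT:
            return "INSERT";
        case DEMO_MSG_RELATION:
            return "RELATION";
    }

    /* Reached only for a value that is not a member of the enum. */
    return NULL;
}
```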

Regards,
Vignesh

#79houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#77)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, August 3, 2021 2:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached new patches that incorporate all comments I got so far.
Please review them.

Hi,

I had a few comments for the 0003 patch.

1).
-      This clause alters parameters originally set by
-      <xref linkend="sql-createsubscription"/>.  See there for more
-      information.  The parameters that can be altered
-      are <literal>slot_name</literal>,
-      <literal>synchronous_commit</literal>,
-      <literal>binary</literal>, and
-      <literal>streaming</literal>.
+      This clause sets or resets a subscription option. The parameters that can be
+      set are the parameters originally set by <xref linkend="sql-createsubscription"/>:
+      <literal>slot_name</literal>, <literal>synchronous_commit</literal>,
+      <literal>binary</literal>, <literal>streaming</literal>.
+     </para>
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.

Maybe the doc looks better like the following ?

+      This clause alters parameters originally set by
+      <xref linkend="sql-createsubscription"/>.  See there for more
+      information.  The parameters that can be set
+      are <literal>slot_name</literal>,
+      <literal>synchronous_commit</literal>,
+      <literal>binary</literal>, and
+      <literal>streaming</literal>.
+     </para>
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
2).
-           opts->create_slot = defGetBoolean(defel);
+           if (!is_reset)
+               opts->create_slot = defGetBoolean(defel);
        }

Since we only support RESET for streaming/binary/synchronous_commit, it
might be unnecessary to add the 'if (!is_reset)' check for the other
options.

3).
typedef struct AlterSubscriptionStmt
{
NodeTag type;
AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */

Since the patch removes the enum value
'ALTER_SUBSCRIPTION_OPTIONS', it'd be better to change the comment here
as well.
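
One possible reading of point 2), sketched standalone: if RESET rejects every option outside the supported set up front, the branches for the other options never run with is_reset set and need no guard of their own. This is an assumption about how the check could be structured, not the patch's code; demo_option_is_resettable() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three options the reviewed patch allows to be reset. */
static const char *const demo_resettable_options[] = {
    "streaming",
    "binary",
    "synchronous_commit",
};

/* Returns true if 'name' is one of the resettable options. */
bool demo_option_is_resettable(const char *name)
{
    size_t n = sizeof(demo_resettable_options) /
               sizeof(demo_resettable_options[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(name, demo_resettable_options[i]) == 0)
            return true;
    }
    return false;
}
```

A caller handling RESET would test each option name once with this function and report an error for anything else, before any per-option parsing happens.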

Best regards,
houzj

#80Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#78)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 3, 2021 at 7:54 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Aug 3, 2021 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 2, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 2, 2021 at 7:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Setting up logical rep error context in a generic function looks a bit
odd to me. Do we really need to set up error context here? I
understand we can't do this in caller but anyway I think we are not
sending this to logical replication view as well, so not sure we need
to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

Yeah, I also think at this stage error message is sufficient in such cases.

I've attached new patches that incorporate all comments I got so far.
Please review them.

I had a look at the first patch; a couple of minor comments:
1) Should we include this in typedefs.lst?
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+       LogicalRepMsgType command;      /* 0 if invalid */
+
+       /* Local relation information */
+       char       *nspname;
2)  We can keep the case labels in the same order as the
LogicalRepMsgType enum; this makes it easy to spot an enum value
that has been missed.
+               case LOGICAL_REP_MSG_RELATION:
+                       return "RELATION";
+               case LOGICAL_REP_MSG_TYPE:
+                       return "TYPE";
+               case LOGICAL_REP_MSG_ORIGIN:
+                       return "ORIGIN";
+               case LOGICAL_REP_MSG_MESSAGE:
+                       return "MESSAGE";
+               case LOGICAL_REP_MSG_STREAM_START:
+                       return "STREAM START";

Thank you for reviewing the patch!

I agree with all the comments and will address them in the next version of the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#81Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#79)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 4, 2021 at 1:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, August 3, 2021 2:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached new patches that incorporate all comments I got so far.
Please review them.

Hi,

I had a few comments for the 0003 patch.

Thanks for reviewing the patch!

1).
-      This clause alters parameters originally set by
-      <xref linkend="sql-createsubscription"/>.  See there for more
-      information.  The parameters that can be altered
-      are <literal>slot_name</literal>,
-      <literal>synchronous_commit</literal>,
-      <literal>binary</literal>, and
-      <literal>streaming</literal>.
+      This clause sets or resets a subscription option. The parameters that can be
+      set are the parameters originally set by <xref linkend="sql-createsubscription"/>:
+      <literal>slot_name</literal>, <literal>synchronous_commit</literal>,
+      <literal>binary</literal>, <literal>streaming</literal>.
+     </para>
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.

Maybe the doc looks better like the following ?

+      This clause alters parameters originally set by
+      <xref linkend="sql-createsubscription"/>.  See there for more
+      information.  The parameters that can be set
+      are <literal>slot_name</literal>,
+      <literal>synchronous_commit</literal>,
+      <literal>binary</literal>, and
+      <literal>streaming</literal>.
+     </para>
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.

Agreed.

2).
-           opts->create_slot = defGetBoolean(defel);
+           if (!is_reset)
+               opts->create_slot = defGetBoolean(defel);
}

Since we only support RESET for streaming/binary/synchronous_commit, it
might be unnecessary to add the 'if (!is_reset)' check for the other
options.

Good point.

3).
typedef struct AlterSubscriptionStmt
{
NodeTag type;
AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */

Since the patch removes the enum value
'ALTER_SUBSCRIPTION_OPTIONS', it'd be better to change the comment here
as well.

Agreed.

I'll incorporate those comments in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#82osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#77)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, August 3, 2021 3:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached new patches that incorporate all comments I got so far.
Please review them.

Hi, I had a chance to look at the patch-set during my other development.
Just let me share some minor cosmetic things.

[1] unnatural wording ? in v5-0002.
+ * create tells whether to create the new subscription entry if it is not
+ * create tells whether to create the new subscription relation entry if it is

I'm not sure if this wording is correct or not.
You meant just "tells whether to create ...." ?,
although we already have 1 other "create tells" in HEAD.

[2]: typo "kep" in v05-0002.

I think you meant "kept" in below sentence.

+/*
+ * Subscription error statistics kep in the stats collector.  One entry represents
+ * an error that happened during logical replication, reported by the apply worker
+ * (subrelid is InvalidOid) or by the table sync worker (subrelid is a valid OID).

[3]: typo "lotigcal" in the v05-0004 commit message.

If incoming change violates any constraint, lotigcal replication stops
until it's resolved. This commit introduces another way to skip the
transaction in question.

It should be "logical".

[4]: warning of doc build

I got the output below while running make html.
Could you please check this?

Link element has no content and no Endterm. Nothing to show in the link to monitoring-pg-stat-subscription-errors

Best Regards,
Takamichi Osumi

#83osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#81)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, August 4, 2021 8:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'll incorporate those comments in the next version patch.

Hi, when are you going to make and share the updated v6 ?

Best Regards,
Takamichi Osumi

#84Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#82)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 5, 2021 at 5:58 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, August 3, 2021 3:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached new patches that incorporate all comments I got so far.
Please review them.

Hi, I had a chance to look at the patch-set during my other development.
Just let me share some minor cosmetic things.

Thank you for reviewing the patches!

[1] unnatural wording ? in v5-0002.
+ * create tells whether to create the new subscription entry if it is not
+ * create tells whether to create the new subscription relation entry if it is

I'm not sure if this wording is correct or not.
You meant just "tells whether to create ...." ?,
although we already have 1 other "create tells" in HEAD.

create here means the function argument of
pgstat_get_subscription_entry() and
pgstat_get_subscription_error_entry(). That is, the function argument
'create' tells whether to create the new entry if not found. I
single-quoted the 'create' to avoid confusion.

[2] typo "kep" in v05-0002.

I think you meant "kept" in below sentence.

+/*
+ * Subscription error statistics kep in the stats collector.  One entry represents
+ * an error that happened during logical replication, reported by the apply worker
+ * (subrelid is InvalidOid) or by the table sync worker (subrelid is a valid OID).

Fixed.

[3] typo "lotigcal" in the v05-0004 commit message.

If incoming change violates any constraint, lotigcal replication stops
until it's resolved. This commit introduces another way to skip the
transaction in question.

It should be "logical".

Fixed.

[4] warning of doc build

I got the output below while running make html.
Could you please check this?

Link element has no content and no Endterm. Nothing to show in the link to monitoring-pg-stat-subscription-errors

Fixed.

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v6-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v6-0001-Add-errcontext-to-errors-happening-during-applyin.patch (application/octet-stream)
v6-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v6-0002-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)
#85Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#84)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD,
getting the below error:
patching file src/backend/replication/logical/worker.c
Hunk #11 succeeded at 1195 (offset 50 lines).
Hunk #12 succeeded at 1253 (offset 50 lines).
Hunk #13 succeeded at 1277 (offset 50 lines).
Hunk #14 succeeded at 1305 (offset 50 lines).
Hunk #15 succeeded at 1330 (offset 50 lines).
Hunk #16 succeeded at 1362 (offset 50 lines).
Hunk #17 succeeded at 1508 (offset 50 lines).
Hunk #18 succeeded at 1524 (offset 50 lines).
Hunk #19 succeeded at 1645 (offset 50 lines).
Hunk #20 succeeded at 1671 (offset 50 lines).
Hunk #21 succeeded at 1772 (offset 50 lines).
Hunk #22 succeeded at 1828 (offset 50 lines).
Hunk #23 succeeded at 1934 (offset 50 lines).
Hunk #24 succeeded at 1962 (offset 50 lines).
Hunk #25 succeeded at 2399 (offset 50 lines).
Hunk #26 FAILED at 2405.
Hunk #27 succeeded at 3730 (offset 54 lines).
1 out of 27 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej

--
With Regards,
Amit Kapila.

#86Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#85)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD,
getting the below error:

Few comments on v6-0001-Add-errcontext-to-errors-happening-during-applyin
==============================================================

1. While applying DML operations, we are setting up the error context
multiple times due to which the context information is not
appropriate. The first is set in apply_dispatch and then during
processing, we set another error callback slot_store_error_callback in
slot_store_data and slot_modify_data. When I forced one of the errors
in slot_store_data(), it displays the below information in CONTEXT
which doesn't make much sense.

2021-08-10 15:16:39.887 IST [6784] ERROR: incorrect binary data
format in logical replication column 1
2021-08-10 15:16:39.887 IST [6784] CONTEXT: processing remote data
for replication target relation "public.test1" column "id"
during apply of "INSERT" for relation "public.test1" in
transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30

2.
I think we can slightly change the new context information as below:
Before
during apply of "INSERT" for relation "public.test1" in transaction
with xid 740 committs 2021-08-10 14:44:38.058174+05:30
After
during apply of "INSERT" for relation "public.test1" in transaction id
740 with commit timestamp 2021-08-10 14:44:38.058174+05:30

3.
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+ LogicalRepMsgType command; /* 0 if invalid */
+
+ /* Local relation information */
+ char    *nspname;
+ char    *relname;

...
...

+
+static ApplyErrCallbackArg apply_error_callback_arg =
+{
+ .command = 0,
+ .relname = NULL,
+ .nspname = NULL,

Let's initialize the struct members in the order they are declared.
The order of relname and nspname should be the other way around.

4.
+
+ TransactionId remote_xid;
+ TimestampTz committs;
+} ApplyErrCallbackArg;

It might be better to add a comment like "remote xact information"
above these structure members.

5.
+static void
+apply_error_callback(void *arg)
+{
+ StringInfoData buf;
+
+ if (apply_error_callback_arg.command == 0)
+ return;
+
+ initStringInfo(&buf);

At the end of this call, it is better to free this (pfree(buf.data))

6. In the commit message, you might want to indicate that this
additional information can be used by the future patch to skip the
conflicting transaction.
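
To illustrate comment 5 above, here is a minimal standalone sketch of a callback that builds its message in a scratch buffer and releases it once the text has been handed over. It is not the patch's code: malloc()/free() stand in for initStringInfo()/pfree(), and ApplyErrCallbackArgSketch, report_context() and last_context are hypothetical names used only here. It assumes, as the stand-in sink does, that the reporting call copies the text before the buffer is freed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the patch's ApplyErrCallbackArg. */
typedef struct ApplyErrCallbackArgSketch
{
	char		command;		/* 0 if invalid */
	const char *nspname;
	const char *relname;
} ApplyErrCallbackArgSketch;

/* Stand-in for errcontext(): keeps a copy of the last reported text. */
static char last_context[256];

static void
report_context(const char *msg)
{
	snprintf(last_context, sizeof(last_context), "%s", msg);
}

static void
apply_error_callback_sketch(const ApplyErrCallbackArgSketch *arg)
{
	size_t		len;
	char	   *buf;

	if (arg->command == 0)
		return;					/* nothing is being applied */

	/* Room for both names plus the fixed text around them. */
	len = strlen(arg->nspname) + strlen(arg->relname) + 64;
	buf = malloc(len);
	if (buf == NULL)
		return;

	snprintf(buf, len, "during apply of \"%c\" for relation \"%s.%s\"",
			 arg->command, arg->nspname, arg->relname);
	report_context(buf);		/* the sink copies the text ... */
	free(buf);					/* ... so the scratch buffer can be released */
}
```

Every exit path after the allocation goes through free(), which is the point of the comment: the callback may run many times in a long-lived worker.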

--
With Regards,
Amit Kapila.

#87Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#85)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 3:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD,
getting the below error:
patching file src/backend/replication/logical/worker.c
Hunk #11 succeeded at 1195 (offset 50 lines).
Hunk #12 succeeded at 1253 (offset 50 lines).
Hunk #13 succeeded at 1277 (offset 50 lines).
Hunk #14 succeeded at 1305 (offset 50 lines).
Hunk #15 succeeded at 1330 (offset 50 lines).
Hunk #16 succeeded at 1362 (offset 50 lines).
Hunk #17 succeeded at 1508 (offset 50 lines).
Hunk #18 succeeded at 1524 (offset 50 lines).
Hunk #19 succeeded at 1645 (offset 50 lines).
Hunk #20 succeeded at 1671 (offset 50 lines).
Hunk #21 succeeded at 1772 (offset 50 lines).
Hunk #22 succeeded at 1828 (offset 50 lines).
Hunk #23 succeeded at 1934 (offset 50 lines).
Hunk #24 succeeded at 1962 (offset 50 lines).
Hunk #25 succeeded at 2399 (offset 50 lines).
Hunk #26 FAILED at 2405.
Hunk #27 succeeded at 3730 (offset 54 lines).
1 out of 27 hunks FAILED -- saving rejects to file
src/backend/replication/logical/worker.c.rej

Sorry, I forgot to rebase the patches onto the current HEAD. Since
stream_prepare has been introduced, I'll add some tests to the patches.
Tomorrow I'll submit new patches that also incorporate your comments
on the v6-0001 patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#88Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#84)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

Some initial review comments on the v6-0001 patch:

src/backend/replication/logical/proto.c:
(1)

+ TimestampTz committs;

I think it would look better to rename "committs" to "commit_ts"; that is
also more consistent with the naming of the other member, "remote_xid".

src/backend/replication/logical/worker.c:
(2)
To be consistent with all other function headers, the sentence should
start with a capital letter: "get" -> "Get"

+ * get string representing LogicalRepMsgType.

(3) It looks a bit cumbersome and repetitive to set/update the members
of apply_error_callback_arg in numerous places.

I suggest making the "set_apply_error_context..." and
"reset_apply_error_context..." functions as "static inline void"
functions (moving them to the top part of the source file, and
removing the existing function declarations for these).

Also, can add something similar to below:

static inline void
set_apply_error_callback_xid(TransactionId xid)
{
	apply_error_callback_arg.remote_xid = xid;
}

static inline void
set_apply_error_callback_xid_info(TransactionId xid, TimestampTz commit_ts)
{
	apply_error_callback_arg.remote_xid = xid;
	apply_error_callback_arg.commit_ts = commit_ts;
}

so that instances of, for example:

apply_error_callback_arg.remote_xid = prepare_data.xid;
apply_error_callback_arg.committs = prepare_data.commit_time;

can be:

set_apply_error_callback_xid_info(prepare_data.xid, prepare_data.commit_time);

(4) The apply_error_callback() function is missing a function header/comment.
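
As a rough standalone illustration of suggestion (3), the helpers could look like the sketch below. The typedefs are stand-ins so that it compiles outside the PostgreSQL tree, the struct is cut down to the two remote-transaction members, and reset_apply_error_callback_xid_info() is a hypothetical counterpart that is not named anywhere in the patch.

```c
#include <stdint.h>

/* Stand-in typedefs so the sketch compiles outside the PostgreSQL tree. */
typedef uint32_t TransactionId;
typedef int64_t TimestampTz;

typedef struct ApplyErrCallbackArg
{
	/* Remote transaction information */
	TransactionId remote_xid;
	TimestampTz commit_ts;
} ApplyErrCallbackArg;

/* Members are initialized in the order they are declared. */
static ApplyErrCallbackArg apply_error_callback_arg =
{
	.remote_xid = 0,
	.commit_ts = 0,
};

/* Record the remote transaction currently being applied. */
static inline void
set_apply_error_callback_xid_info(TransactionId xid, TimestampTz commit_ts)
{
	apply_error_callback_arg.remote_xid = xid;
	apply_error_callback_arg.commit_ts = commit_ts;
}

/* Hypothetical counterpart: forget the transaction once it is finished. */
static inline void
reset_apply_error_callback_xid_info(void)
{
	apply_error_callback_arg.remote_xid = 0;
	apply_error_callback_arg.commit_ts = 0;
}
```

Call sites then shrink to a single line such as set_apply_error_callback_xid_info(prepare_data.xid, prepare_data.commit_time), and the member names only appear in one place.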

Regards,
Greg Nancarrow
Fujitsu Australia

#89Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#86)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 10, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

I am not able to apply the latest patch
(v6-0001-Add-errcontext-to-errors-happening-during-applyin) on HEAD,
getting the below error:

Few comments on v6-0001-Add-errcontext-to-errors-happening-during-applyin

Thank you for the comments!

==============================================================

1. While applying DML operations, we are setting up the error context
multiple times due to which the context information is not
appropriate. The first is set in apply_dispatch and then during
processing, we set another error callback slot_store_error_callback in
slot_store_data and slot_modify_data. When I forced one of the errors
in slot_store_data(), it displays the below information in CONTEXT
which doesn't make much sense.

2021-08-10 15:16:39.887 IST [6784] ERROR: incorrect binary data
format in logical replication column 1
2021-08-10 15:16:39.887 IST [6784] CONTEXT: processing remote data
for replication target relation "public.test1" column "id"
during apply of "INSERT" for relation "public.test1" in
transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30

Yes, but we cannot change the error context message depending on other
error context messages. So it seems hard to construct a complete
sentence in the context message that is okay in terms of English
grammar. Is the following message better?

CONTEXT: processing remote data for replication target relation
"public.test1" column "id"
applying "INSERT" for relation "public.test1" in transaction
with xid 740 committs 2021-08-10 14:44:38.058174+05:30

2.
I think we can slightly change the new context information as below:
Before
during apply of "INSERT" for relation "public.test1" in transaction
with xid 740 committs 2021-08-10 14:44:38.058174+05:30
After
during apply of "INSERT" for relation "public.test1" in transaction id
740 with commit timestamp 2021-08-10 14:44:38.058174+05:30

Fixed.

3.
+/* Struct for saving and restoring apply information */
+typedef struct ApplyErrCallbackArg
+{
+ LogicalRepMsgType command; /* 0 if invalid */
+
+ /* Local relation information */
+ char    *nspname;
+ char    *relname;

...
...

+
+static ApplyErrCallbackArg apply_error_callback_arg =
+{
+ .command = 0,
+ .relname = NULL,
+ .nspname = NULL,

Let's initialize the struct members in the order they are declared.
The order of relname and nspname should be the other way around.

Fixed.

4.
+
+ TransactionId remote_xid;
+ TimestampTz committs;
+} ApplyErrCallbackArg;

It might be better to add a comment like "remote xact information"
above these structure members.

Fixed.

5.
+static void
+apply_error_callback(void *arg)
+{
+ StringInfoData buf;
+
+ if (apply_error_callback_arg.command == 0)
+ return;
+
+ initStringInfo(&buf);

At the end of this call, it is better to free this (pfree(buf.data))

Fixed.

6. In the commit message, you might want to indicate that this
additional information can be used by the future patch to skip the
conflicting transaction.

Fixed.

I've attached the new patches. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v7-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v7-0001-Add-errcontext-to-errors-happening-during-applyin.patch (application/octet-stream)
v7-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v7-0002-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)

#90Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#88)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 10, 2021 at 10:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Aug 10, 2021 at 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the latest patches that incorporated all comments I got
so far. Please review them.

Some initial review comments on the v6-0001 patch:

Thanks for reviewing the patch!

src/backend/replication/logical/proto.c:
(1)

+ TimestampTz committs;

I think it would look better to rename "committs" to "commit_ts"; that is
also more consistent with the naming of the other member, "remote_xid".

Fixed.

src/backend/replication/logical/worker.c:
(2)
To be consistent with all other function headers, the sentence should
start with a capital letter: "get" -> "Get"

+ * get string representing LogicalRepMsgType.

Fixed

(3) It looks a bit cumbersome and repetitive to set/update the members
of apply_error_callback_arg in numerous places.

I suggest making the "set_apply_error_context..." and
"reset_apply_error_context..." functions as "static inline void"
functions (moving them to the top part of the source file, and
removing the existing function declarations for these).

Also, can add something similar to below:

static inline void
set_apply_error_callback_xid(TransactionId xid)
{
	apply_error_callback_arg.remote_xid = xid;
}

static inline void
set_apply_error_callback_xid_info(TransactionId xid, TimestampTz commit_ts)
{
	apply_error_callback_arg.remote_xid = xid;
	apply_error_callback_arg.commit_ts = commit_ts;
}

so that instances of, for example:

apply_error_callback_arg.remote_xid = prepare_data.xid;
apply_error_callback_arg.committs = prepare_data.commit_time;

can be:

set_apply_error_callback_xid_info(prepare_data.xid, prepare_data.commit_time);

Okay. I've added a set_apply_error_callback_xact() function to set the
transaction information for the apply error callback. Also, I inlined
those helper functions since we call them for every change.

(4) The apply_error_callback() function is missing a function header/comment.

Added.

The fixes for the above comments are incorporated in the v7 patch I
just submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoALAq_0q_Zz2K0tO=kuUj8aBrDdMJXbey1P6t4w8snpQQ@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#91Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#89)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 11, 2021 at 11:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 10, 2021 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

==============================================================

1. While applying DML operations, we are setting up the error context
multiple times due to which the context information is not
appropriate. The first is set in apply_dispatch and then during
processing, we set another error callback slot_store_error_callback in
slot_store_data and slot_modify_data. When I forced one of the errors
in slot_store_data(), it displays the below information in CONTEXT
which doesn't make much sense.

2021-08-10 15:16:39.887 IST [6784] ERROR: incorrect binary data
format in logical replication column 1
2021-08-10 15:16:39.887 IST [6784] CONTEXT: processing remote data
for replication target relation "public.test1" column "id"
during apply of "INSERT" for relation "public.test1" in
transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30

Yes, but we cannot change the error context message depending on other
error context messages. So it seems hard to construct a complete
sentence in the context message that is okay in terms of English
grammar. Is the following message better?

CONTEXT: processing remote data for replication target relation
"public.test1" column "id"
applying "INSERT" for relation "public.test1" in transaction
with xid 740 committs 2021-08-10 14:44:38.058174+05:30

I don't like the proposed text. How about if we combine both and have
something like: "processing remote data during "UPDATE" for
replication target relation "public.test1" column "id" in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30"? For
this, I think we need to remove slot_store_error_callback and
add/change the ApplyErrCallbackArg to include the additional required
information in that callback.
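
A minimal standalone sketch of that single-callback idea, to make the proposed wording concrete. It assumes one struct carries everything the message needs, with the column name filled in only while column data is being processed. ApplyErrContextSketch, append_text() and format_apply_context() are hypothetical names, plain snprintf-style formatting stands in for StringInfo, and the timestamp is passed as already formatted text.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical merged callback state; not the patch's actual struct. */
typedef struct ApplyErrContextSketch
{
	const char *command;		/* NULL if no change is being applied */
	const char *nspname;
	const char *relname;		/* NULL if no relation is known */
	const char *colname;		/* NULL unless column data is processed */
	unsigned int remote_xid;	/* 0 if unknown */
	const char *commit_ts;		/* preformatted text, or NULL if unset */
} ApplyErrContextSketch;

/* Append formatted text to a NUL-terminated buffer, truncating if full. */
static void
append_text(char *out, size_t outlen, const char *fmt, ...)
{
	size_t		used = strlen(out);
	va_list		ap;

	if (used + 1 >= outlen)
		return;					/* no room left */
	va_start(ap, fmt);
	vsnprintf(out + used, outlen - used, fmt, ap);
	va_end(ap);
}

/* Build one context line from whichever pieces are currently known. */
static void
format_apply_context(const ApplyErrContextSketch *ctx, char *out, size_t outlen)
{
	if (outlen == 0)
		return;
	out[0] = '\0';
	if (ctx->command == NULL)
		return;

	append_text(out, outlen, "processing remote data during \"%s\"",
				ctx->command);
	if (ctx->relname != NULL)
		append_text(out, outlen, " for replication target relation \"%s.%s\"",
					ctx->nspname, ctx->relname);
	if (ctx->colname != NULL)
		append_text(out, outlen, " column \"%s\"", ctx->colname);
	if (ctx->remote_xid != 0)
		append_text(out, outlen,
					" in transaction id %u with commit timestamp %s",
					ctx->remote_xid,
					ctx->commit_ts != NULL ? ctx->commit_ts : "(unset)");
}
```

Because only one callback contributes to CONTEXT, the result reads as a single sentence regardless of whether the error is raised while converting a column value or later in the apply path.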

--
With Regards,
Amit Kapila.

#92Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#89)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 11, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the new patches. Please review them.

Please note that the newly added TAP tests fail due to a known assertion
failure in pgstats that I reported here[1].

Regards,

[1]: /messages/by-id/CAD21AoCCAa+J1-udHRo5-Hbtv=D38WdZDAaXZGDbQQ_Vg_d3bQ@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#93Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#91)
5 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 11, 2021 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 11, 2021 at 11:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 10, 2021 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

==============================================================

1. While applying DML operations, we are setting up the error context
multiple times due to which the context information is not
appropriate. The first is set in apply_dispatch and then during
processing, we set another error callback slot_store_error_callback in
slot_store_data and slot_modify_data. When I forced one of the errors
in slot_store_data(), it displays the below information in CONTEXT
which doesn't make much sense.

2021-08-10 15:16:39.887 IST [6784] ERROR: incorrect binary data
format in logical replication column 1
2021-08-10 15:16:39.887 IST [6784] CONTEXT: processing remote data
for replication target relation "public.test1" column "id"
during apply of "INSERT" for relation "public.test1" in
transaction with xid 740 committs 2021-08-10 14:44:38.058174+05:30

Yes, but we cannot change the error context message depending on other
error context messages. So it seems hard to construct a complete
sentence in the context message that is okay in terms of English
grammar. Is the following message better?

CONTEXT: processing remote data for replication target relation
"public.test1" column "id"
applying "INSERT" for relation "public.test1" in transaction
with xid 740 committs 2021-08-10 14:44:38.058174+05:30

I don't like the proposed text. How about if we combine both and have
something like: "processing remote data during "UPDATE" for
replication target relation "public.test1" column "id" in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30"? For
this, I think we need to remove slot_store_error_callback and
add/change the ApplyErrCallbackArg to include the additional required
information in that callback.

Oh, I've never thought about that. That's a good idea.

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v8-0005-Move-shared-fileset-cleanup-to-before_shmem_exit.patch (application/octet-stream)
v8-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v8-0001-Add-logical-changes-details-to-errcontext-of-appl.patch (application/octet-stream)
v8-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v8-0002-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)

#94Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#93)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

Regards,
Greg Nancarrow
Fujitsu Australia

#95Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#94)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 at 1:21 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

You have a point but I think in this case it might look a bit odd as
we have another field 'commit timestamp' after that which is
lowercase.

--
With Regards,
Amit Kapila.

#96Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#95)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

You have a point but I think in this case it might look a bit odd as
we have another field 'commit timestamp' after that which is
lowercase.

I did a quick search and I couldn't find any other messages in the
Postgres code that use "transaction id", but I could find some that
use "transaction ID" and "transaction identifier".

Regards,
Greg Nancarrow
Fujitsu Australia

#97Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#96)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 at 5:41 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 12, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

You have a point but I think in this case it might look a bit odd as
we have another field 'commit timestamp' after that which is
lowercase.

I did a quick search and I couldn't find any other messages in the
Postgres code that use "transaction id", but I could find some that
use "transaction ID" and "transaction identifier".

Okay, but that doesn't mean using it here is bad. I am personally fine
with a message containing something like "... in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30", but I
won't mind if you and/or others prefer some other wording. Any
opinions from others?

--
With Regards,
Amit Kapila.

#98Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#97)
Re: Skipping logical replication transactions on subscriber side

On Fri, Aug 13, 2021 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 12, 2021 at 5:41 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 12, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

You have a point but I think in this case it might look a bit odd as
we have another field 'commit timestamp' after that which is
lowercase.

I did a quick search and I couldn't find any other messages in the
Postgres code that use "transaction id", but I could find some that
use "transaction ID" and "transaction identifier".

Okay, but that doesn't mean using it here is bad. I am personally fine
with a message containing something like "... in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30", but I
won't mind if you and/or others prefer some other wording. Any
opinions from others?

Just to be clear, all I was saying is that I thought using uppercase
"ID" looked better in the message, and was more consistent with
existing logged messages, than using lowercase "id".
i.e. my suggestion was a trivial change:

BEFORE:
+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),
AFTER:
+ appendStringInfo(&buf, _(" in transaction ID %u with commit timestamp %s"),

But it was just a suggestion. Maybe others feel differently.

Regards,
Greg Nancarrow
Fujitsu Australia

#99Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#97)
Re: Skipping logical replication transactions on subscriber side

On Fri, Aug 13, 2021 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 12, 2021 at 5:41 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 12, 2021 at 9:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

A minor comment on the 0001 patch: In the message I think that using
"ID" would look better than lowercase "id" and AFAICS it's more
consistent with existing messages.

+ appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),

You have a point but I think in this case it might look a bit odd as
we have another field 'commit timestamp' after that which is
lowercase.

I did a quick search and I couldn't find any other messages in the
Postgres code that use "transaction id", but I could find some that
use "transaction ID" and "transaction identifier".

Okay, but that doesn't mean using it here is bad. I am personally fine
with a message containing something like "... in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30", but I
won't mind if you and/or others prefer some other wording. Any
opinions from others?

I don't have a strong opinion on this, but in terms of consistency, we
often use something like "transaction %u" in messages when showing an
XID value, rather than "transaction [id|ID|identifier]":

$ git grep -i "errmsg.*transaction %u" src/backend/
src/backend/access/transam/commit_ts.c: errmsg("cannot
retrieve commit timestamp for transaction %u", xid)));
src/backend/access/transam/slru.c: errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c: errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c: errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c: (errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c: errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c: (errmsg("could
not access status of transaction %u", xid),
src/backend/access/transam/slru.c: errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/slru.c: errmsg("could not
access status of transaction %u", xid),
src/backend/access/transam/twophase.c:
(errmsg("recovering prepared transaction %u from shared memory",
xid)));
src/backend/access/transam/twophase.c:
(errmsg("removing stale two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
(errmsg("removing stale two-phase state from memory for transaction
%u",
src/backend/access/transam/twophase.c:
(errmsg("removing future two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
(errmsg("removing future two-phase state from memory for transaction
%u",
src/backend/access/transam/twophase.c:
errmsg("corrupted two-phase state file for transaction %u",
src/backend/access/transam/twophase.c:
errmsg("corrupted two-phase state in memory for transaction %u",
src/backend/access/transam/xlog.c: (errmsg("recovery
stopping before commit of transaction %u, time %s",
src/backend/access/transam/xlog.c: (errmsg("recovery
stopping before abort of transaction %u, time %s",
src/backend/access/transam/xlog.c:
(errmsg("recovery stopping after commit of transaction %u, time %s",
src/backend/access/transam/xlog.c:
(errmsg("recovery stopping after abort of transaction %u, time %s",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",
src/backend/replication/logical/worker.c:
errmsg_internal("transaction %u not found in stream XID hash table",

$ git grep -i "errmsg.*transaction identifier" src/backend/
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is too long",
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is already in use",

$ git grep -i "errmsg.*transaction id" src/backend/
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is too long",
src/backend/access/transam/twophase.c:
errmsg("transaction identifier \"%s\" is already in use",
src/backend/access/transam/varsup.c:
(errmsg_internal("transaction ID wrap limit is %u, limited by database
with OID %u",
src/backend/access/transam/xlog.c: (errmsg_internal("next
transaction ID: " UINT64_FORMAT "; next OID: %u",
src/backend/access/transam/xlog.c: (errmsg_internal("oldest
unfrozen transaction ID: %u, in database %u",
src/backend/access/transam/xlog.c: (errmsg("invalid next
transaction ID")));
src/backend/replication/logical/snapbuild.c:
(errmsg_plural("exported logical decoding snapshot: \"%s\" with %u
transaction ID",
src/backend/replication/logical/worker.c:
errmsg_internal("invalid transaction ID in streamed replication
transaction")));
src/backend/replication/logical/worker.c:
errmsg_internal("invalid transaction ID in streamed replication
transaction")));
src/backend/replication/logical/worker.c:
errmsg_internal("invalid two-phase transaction ID")));
src/backend/utils/adt/xid8funcs.c: errmsg("transaction
ID %s is in the future",

Therefore, perhaps a message like "... in transaction 740 with commit
timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
consistency with other messages?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#100Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#99)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 6:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, perhaps a message like "... in transaction 740 with commit
timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
consistency with other messages?

Yes, I think that would be more consistent.

On another note, for the 0001 patch, the elog ERROR at the bottom of
the logicalrep_message_type() function seems to assume that the
unrecognized "action" is a printable character (with its use of %c)
and also that the character is meaningful to the user in some way.
But given that the compiler normally warns of an unhandled enum value
when switching on an enum, such an error would most likely be when
action is some int value that wouldn't be meaningful to the user (as
it wouldn't be one of the LogicalRepMsgType enum values).
I therefore think it would be better to use %d in that ERROR:

i.e.

+ elog(ERROR, "invalid logical replication message type %d", action);

Similar comments apply to the apply_dispatch() function (and I realise
it used %c before your patch).
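
To make the difference concrete, here is a small standalone sketch that formats the same message both ways. snprintf() stands in for elog(), and the two function names are hypothetical.

```c
#include <stdio.h>

/* Formats the unrecognized value as a character, as the patch does. */
static void
format_invalid_type_as_char(int action, char *out, size_t outlen)
{
	snprintf(out, outlen,
			 "invalid logical replication message type \"%c\"", action);
}

/* Formats the unrecognized value as an integer, as suggested above. */
static void
format_invalid_type_as_int(int action, char *out, size_t outlen)
{
	snprintf(out, outlen,
			 "invalid logical replication message type %d", action);
}
```

For a stray value such as 7, the %c variant embeds an unprintable control byte in the log line, while the %d variant prints the number 7, which can at least be compared against the enum definition.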

Regards,
Greg Nancarrow
Fujitsu Australia

#101Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#99)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 1:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Aug 13, 2021 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, but that doesn't mean using it here is bad. I am personally fine
with a message containing something like "... in transaction
id 740 with commit timestamp 2021-08-10 14:44:38.058174+05:30", but I
won't mind if you and/or others prefer some other wording. Any
opinions from others?

I don't have a strong opinion on this, but in terms of consistency, we
often use something like "transaction %u" in messages when showing an
XID value, rather than "transaction [id|ID|identifier]":

..

Therefore, perhaps a message like "... in transaction 740 with commit
timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
consistency with other messages?

+1.

--
With Regards,
Amit Kapila.

#102houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#93)
RE: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset cleanup to make
cfbot tests happy.

Hi,

Thanks for the new patches.
I have a few comments on the v8-0001 patch.

1)
+
+	if (TransactionIdIsNormal(errarg->remote_xid))
+		appendStringInfo(&buf, _(" in transaction id %u with commit timestamp %s"),
+						 errarg->remote_xid,
+						 errarg->commit_ts == 0
+						 ? "(unset)"
+						 : timestamptz_to_str(errarg->commit_ts));
+
+	errcontext("%s", buf.data);
I think we can output the timestamp in a separate check, which would be more
consistent with the rest of the code in apply_error_callback(),
i.e.:
+	if (errarg->commit_ts != 0)
+		appendStringInfo(&buf, _(" with commit timestamp %s"),
+						timestamptz_to_str(errarg->commit_ts));
2)
+/*
+ * Get string representing LogicalRepMsgType.
+ */
+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+
+	elog(ERROR, "invalid logical replication message type \"%c\"", action);
+}

Some old compilers might complain that the function doesn't have a return value
at the end of the function, maybe we can code like the following:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
+	switch (action)
+	{
+		case LOGICAL_REP_MSG_BEGIN:
+			return "BEGIN";
...
+		default:
+			elog(ERROR, "invalid logical replication message type \"%c\"", action);
+	}
+	return NULL;				/* keep compiler quiet */
+}

3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?
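
For comment 2), here is a small standalone sketch of the "default case plus trailing return" pattern. It is only an illustration under assumptions: DemoMsgType, demo_fatal and demo_message_type are hypothetical names, and demo_fatal stands in for elog(ERROR), which the compiler cannot know does not return.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subset of message types; values are the wire bytes. */
typedef enum DemoMsgType
{
	DEMO_MSG_BEGIN = 'B',
	DEMO_MSG_COMMIT = 'C',
	DEMO_MSG_INSERT = 'I'
} DemoMsgType;

/* Stand-in for elog(ERROR, ...): reports the problem and never returns. */
static void
demo_fatal(int action)
{
	fprintf(stderr, "invalid logical replication message type \"%c\"\n",
			action);
	exit(EXIT_FAILURE);
}

/* Get a string representing the message type. */
static const char *
demo_message_type(DemoMsgType action)
{
	switch (action)
	{
		case DEMO_MSG_BEGIN:
			return "BEGIN";
		case DEMO_MSG_COMMIT:
			return "COMMIT";
		case DEMO_MSG_INSERT:
			return "INSERT";
		default:
			demo_fatal(action);
			break;
	}

	/* Not reached, but the compiler sees a return on every path. */
	return NULL;				/* keep compiler quiet */
}
```

demo_message_type(DEMO_MSG_INSERT) yields "INSERT"; an unknown byte ends in demo_fatal rather than falling off the end of a non-void function.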

Best regards,
Hou zj

#103houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: houzj.fnst@fujitsu.com (#102)
RE: Skipping logical replication transactions on subscriber side

Monday, August 16, 2021 3:00 PM Hou, Zhijie wrote:

On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Hi,

Thanks for the new patches.
I have a few comments on the v8-0001 patch.
3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?

Sorry, this comment wasn't correct, please ignore it.
Here is another comment:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+		case LOGICAL_REP_MSG_STREAM_END:
+			return "STREAM END";
...

I think most of the existing code uses "STREAM STOP" to describe the
LOGICAL_REP_MSG_STREAM_END message. Would it be better to return "STREAM STOP" in
the function logicalrep_message_type() too?

Best regards,
Hou zj

#104Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: houzj.fnst@fujitsu.com (#103)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 5:54 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Here is another comment:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+               case LOGICAL_REP_MSG_STREAM_END:
+                       return "STREAM END";
...

I think most the existing code use "STREAM STOP" to describe the
LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in
function logicalrep_message_type() too ?

+1
I think you're right, it should be "STREAM STOP" in that case.

Regards,
Greg Nancarrow
Fujitsu Australia

#105Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#93)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Another comment on the 0001 patch: as there is now a mix of setting
"apply_error_callback_arg" members directly and also through inline
functions, it might look better to have it done consistently with
functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);
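
As a rough sketch of what such accessors could look like (standalone C, not the patch itself): the struct layout, the rel and remote_attnum field names, and the -1 "unset" value are assumptions made for illustration, and all Demo/demo_ names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the relation map entry. */
typedef struct DemoRelMapEntry
{
	const char *nspname;
	const char *relname;
} DemoRelMapEntry;

/* Hypothetical stand-in for the apply error callback argument. */
typedef struct DemoApplyErrorCallbackArg
{
	int			command;		/* 0 means no command in progress */
	DemoRelMapEntry *rel;		/* NULL when no relation is being processed */
	int			remote_attnum;	/* -1 when no column is being processed */
} DemoApplyErrorCallbackArg;

static DemoApplyErrorCallbackArg demo_errarg = {0, NULL, -1};

/* Every write to the relation fields goes through one of these helpers. */
static inline void
demo_set_error_context_rel(DemoRelMapEntry *rel)
{
	demo_errarg.rel = rel;
}

static inline void
demo_reset_error_context_rel(void)
{
	demo_errarg.rel = NULL;
	/* assumption: clearing the relation also clears the column number */
	demo_errarg.remote_attnum = -1;
}

static inline void
demo_set_error_context_attnum(int remote_attnum)
{
	demo_errarg.remote_attnum = remote_attnum;
}
```

Keeping all writes behind helpers like these means a later change to the struct touches only one place.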

Regards,
Greg Nancarrow
Fujitsu Australia

#106Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#102)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 3:59 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Thu, Aug 12, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset cleanup to make
cfbot tests happy.

Hi,

Thanks for the new patches.
I have a few comments on the v8-0001 patch.

Thank you for the comments!

2)
+/*
+ * Get string representing LogicalRepMsgType.
+ */
+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+
+       elog(ERROR, "invalid logical replication message type \"%c\"", action);
+}

Some old compilers might complain that the function doesn't have a return value
at the end of the function, maybe we can code like the following:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
+       switch (action)
+       {
+               case LOGICAL_REP_MSG_BEGIN:
+                       return "BEGIN";
...
+               default:
+                       elog(ERROR, "invalid logical replication message type \"%c\"", action);
+       }
+       return NULL;                            /* keep compiler quiet */
+}

Fixed.

3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?

Yes. I think the v8-0001 patch already sets the xid and timestamp just
after parsing the stream_prepare message. Did you mean it's not necessary?

I'll submit the updated patches soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#107Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#104)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 16, 2021 at 5:54 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Here is another comment:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+               case LOGICAL_REP_MSG_STREAM_END:
+                       return "STREAM END";
...

I think most the existing code use "STREAM STOP" to describe the
LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in
function logicalrep_message_type() too ?

+1
I think you're right, it should be "STREAM STOP" in that case.

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found, there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think "STREAM STOP" would be
more appropriate.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#108Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#107)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 17, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 16, 2021 at 5:54 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Here is another comment:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+               case LOGICAL_REP_MSG_STREAM_END:
+                       return "STREAM END";
...

I think most the existing code use "STREAM STOP" to describe the
LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in
function logicalrep_message_type() too ?

+1
I think you're right, it should be "STREAM STOP" in that case.

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

--
With Regards,
Amit Kapila.

#109tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#93)
RE: Skipping logical replication transactions on subscriber side

On Thursday, August 12, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Hi

Thanks for your patch. I ran into a problem when using it: in some cases the log is not what I expected, although it works well in streaming mode.

For example:
------publisher------
create table test (a int primary key, b varchar);
create publication pub for table test;

------subscriber------
create table test (a int primary key, b varchar);
insert into test values (10000);
create subscription sub connection 'dbname=postgres port=5432' publication pub with(streaming=on);

------publisher------
insert into test values (10000);

Subscriber log:
2021-08-17 14:24:43.415 CST [3630341] ERROR: duplicate key value violates unique constraint "test_pkey"
2021-08-17 14:24:43.415 CST [3630341] DETAIL: Key (a)=(10000) already exists.

It didn't include the additional context information generated by the apply_error_callback function.

In streaming mode (which worked as I expected):
------publisher------
INSERT INTO test SELECT i, md5(i::text) FROM generate_series(1, 10000) s(i);

Subscriber log:
2021-08-17 14:26:26.521 CST [3630510] ERROR: duplicate key value violates unique constraint "test_pkey"
2021-08-17 14:26:26.521 CST [3630510] DETAIL: Key (a)=(10000) already exists.
2021-08-17 14:26:26.521 CST [3630510] CONTEXT: processing remote data during "INSERT" for replication target relation "public.test" in transaction id 710 with commit timestamp 2021-08-17 14:26:26.403214+08

I looked into it briefly and I think it is related to some code in the
apply_dispatch function. It sets the callback when apply_error_callback_arg.command
is 0 and pops the callback at the end of the function. But
apply_error_callback_arg.command is not reset to 0 there, so the callback is not
set the next time apply_dispatch is called.

I tried to fix it with the following change, thoughts?

@@ -2455,7 +2455,10 @@ apply_dispatch(StringInfo s)

        /* Pop the error context stack */
        if (set_callback)
+       {
                error_context_stack = errcallback.previous;
+               apply_error_callback_arg.command = 0;
+       }
 }

Besides, if we make the changes like this, do we still need to reset
apply_error_callback_arg.command in reset_apply_error_context_info function?
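
To make the failure mode concrete, here is a small standalone simulation of the logic described above. It is a sketch under assumptions, not the patch code: demo_command stands in for apply_error_callback_arg.command, the callback struct is reduced to a link field, and all demo_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoErrorContextCallback
{
	struct DemoErrorContextCallback *previous;
} DemoErrorContextCallback;

static DemoErrorContextCallback *demo_error_context_stack = NULL;
static int	demo_command = 0;	/* stands in for apply_error_callback_arg.command */

/*
 * Simulate one apply_dispatch() call. Returns true if an error raised while
 * applying this message would have found the apply callback on the stack.
 */
static bool
demo_apply_dispatch(int action, bool reset_command_on_pop)
{
	DemoErrorContextCallback errcallback;
	bool		set_callback = false;
	bool		has_context;

	/* Push the callback only if no command is recorded yet. */
	if (demo_command == 0)
	{
		errcallback.previous = demo_error_context_stack;
		demo_error_context_stack = &errcallback;
		set_callback = true;
	}

	demo_command = action;

	/* The change would be applied here; record whether context exists. */
	has_context = (demo_error_context_stack != NULL);

	/* Pop the error context stack. */
	if (set_callback)
	{
		demo_error_context_stack = errcallback.previous;
		if (reset_command_on_pop)
			demo_command = 0;	/* the proposed fix */
	}

	return has_context;
}
```

Without the reset, the first call installs the callback but leaves demo_command nonzero, so the second call runs with no context on the stack. With the reset, both calls have it.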

Regards
Tang

#110Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#108)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 17, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 16, 2021 at 5:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 16, 2021 at 5:54 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Here is another comment:

+char *
+logicalrep_message_type(LogicalRepMsgType action)
+{
...
+               case LOGICAL_REP_MSG_STREAM_END:
+                       return "STREAM END";
...

I think most the existing code use "STREAM STOP" to describe the
LOGICAL_REP_MSG_STREAM_END message, is it better to return "STREAM STOP" in
function logicalrep_message_type() too ?

+1
I think you're right, it should be "STREAM STOP" in that case.

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#111Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#110)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent, so it won't be a big deal if we aren't consistent here. I
am neutral on this one, so if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

--
With Regards,
Amit Kapila.

#112Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#111)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent so it won't be a big deal if we won't be consistent here. I
am neutral on this one, so, if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

In addition to code readability, there is a description in the doc
that mentions "Stream End" while the later description uses "Stream Stop",
which seems like a bug in the doc to me:

The following messages (Stream Start, Stream End, Stream Commit, and
Stream Abort) are available since protocol version 2.

</para>

(snip)

<varlistentry>
<term>
Stream Stop
</term>
<listitem>

Perhaps it's better to hear other opinions too, but I've attached the
patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-Rename-LOGICAL_REP_MSG_STREAM_END-to-LOGICAL_REP_MSG.patch (application/octet-stream)
#113Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#112)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent so it won't be a big deal if we won't be consistent here. I
am neutral on this one, so, if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

In addition of a code readability, there is a description in the doc
that mentions "Stream End" but we describe "Stream Stop" in the later
description, which seems a bug in the doc to me:

The doc changes look good to me. But I have a question about the code change:

--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -65,7 +65,7 @@ typedef enum LogicalRepMsgType
  LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
  LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
  LOGICAL_REP_MSG_STREAM_START = 'S',
- LOGICAL_REP_MSG_STREAM_END = 'E',
+ LOGICAL_REP_MSG_STREAM_STOP = 'E',
  LOGICAL_REP_MSG_STREAM_COMMIT = 'c',

As this changes the enum name, any extension (logical replication
extension) that has started using it would require a change. As this
is a recent change in PG-14, it might be okay, but OTOH, as this is
just a code readability change, shall we do it only for PG-15?
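
To illustrate the compatibility concern: renaming the identifier leaves the wire byte 'E' unchanged, but source code that spells the old name stops compiling. The sketch below is standalone C with hypothetical DEMO_ names; the alias macro is only an illustration of one way old code could keep building and is not something proposed in this thread.

```c
#include <stdbool.h>

/* Hypothetical subset of the message-type enum after the rename. */
typedef enum DemoLogicalRepMsgType
{
	DEMO_MSG_STREAM_START = 'S',
	DEMO_MSG_STREAM_STOP = 'E', /* renamed identifier, same wire byte */
	DEMO_MSG_STREAM_COMMIT = 'c'
} DemoLogicalRepMsgType;

/* Hypothetical alias so code written against the old name still compiles. */
#define DEMO_MSG_STREAM_END DEMO_MSG_STREAM_STOP

/* Classify a byte read from the protocol stream. */
static bool
demo_is_stream_stop(char wire_byte)
{
	return (DemoLogicalRepMsgType) wire_byte == DEMO_MSG_STREAM_STOP;
}
```

A byte 'E' received from the publisher is classified the same way before and after the rename; only the spelling in the source changes.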

--
With Regards,
Amit Kapila.

#114houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#106)
RE: Skipping logical replication transactions on subscriber side

On Tues, Aug 17, 2021 1:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 16, 2021 at 3:59 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?

Yes. I think that v8-0001 patch already set xid and timestamp just after parsing
stream_prepare message. You meant it's not necessary?

Sorry, I was mistaken; please ignore the above comment.

I'll submit the updated patches soon.

I was thinking about the place to set the errcallback.callback.

apply_dispatch(StringInfo s)
 {
 	LogicalRepMsgType action = pq_getmsgbyte(s);
+	ErrorContextCallback errcallback;
+	bool		set_callback = false;
+
+	/*
+	 * Push apply error context callback if not yet. Other fields will be
+	 * filled during applying the change.  Since this function can be called
+	 * recursively when applying spooled changes, we set the callback only
+	 * once.
+	 */
+	if (apply_error_callback_arg.command == 0)
+	{
+		errcallback.callback = apply_error_callback;
+		errcallback.previous = error_context_stack;
+		error_context_stack = &errcallback;
+		set_callback = true;
+	}
...
+	/* Pop the error context stack */
+	if (set_callback)
+		error_context_stack = errcallback.previous;

It seems we could put the above code in the function LogicalRepApplyLoop(),
around the call to apply_dispatch(); with that approach we don't need to worry
about the recursive case. What do you think?
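
A minimal standalone sketch of that idea, assuming a simplified callback struct and a string of message bytes in place of the real receive loop. All Demo/demo_ names are hypothetical, and 'c' is used here as a stand-in for a message that replays two spooled changes through nested dispatch calls.

```c
#include <stddef.h>

typedef struct DemoErrorContextCallback
{
	struct DemoErrorContextCallback *previous;
	void		(*callback) (void *arg);
	void	   *arg;
} DemoErrorContextCallback;

static DemoErrorContextCallback *demo_error_context_stack = NULL;
static int	demo_push_count = 0;
static int	demo_dispatch_count = 0;

static void
demo_apply_error_callback(void *arg)
{
	(void) arg;					/* would add the CONTEXT line here */
}

/* The dispatcher no longer touches the error context stack at all. */
static void
demo_apply_dispatch(char action)
{
	demo_dispatch_count++;

	/* 'c' replays two spooled changes via recursive calls. */
	if (action == 'c')
	{
		demo_apply_dispatch('I');
		demo_apply_dispatch('I');
	}
}

/* Push the callback once before the loop and pop it once afterwards. */
static void
demo_apply_loop(const char *messages)
{
	DemoErrorContextCallback errcallback;

	errcallback.callback = demo_apply_error_callback;
	errcallback.arg = NULL;
	errcallback.previous = demo_error_context_stack;
	demo_error_context_stack = &errcallback;
	demo_push_count++;

	for (const char *p = messages; *p != '\0'; p++)
		demo_apply_dispatch(*p);

	demo_error_context_stack = errcallback.previous;
}
```

However deep the nested dispatch calls go, the callback is pushed exactly once per loop, so no "set only once" bookkeeping is needed inside the dispatcher.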

Best regards,
Hou zj

#115Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#113)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 3:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent so it won't be a big deal if we won't be consistent here. I
am neutral on this one, so, if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

In addition of a code readability, there is a description in the doc
that mentions "Stream End" but we describe "Stream Stop" in the later
description, which seems a bug in the doc to me:

Doc changes looks good to me. But, I have question for code change:

--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -65,7 +65,7 @@ typedef enum LogicalRepMsgType
LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
LOGICAL_REP_MSG_STREAM_START = 'S',
- LOGICAL_REP_MSG_STREAM_END = 'E',
+ LOGICAL_REP_MSG_STREAM_STOP = 'E',
LOGICAL_REP_MSG_STREAM_COMMIT = 'c',

As this is changing the enum name and if any extension (logical
replication extension) has started using it then they would require a
change. As this is the latest change in PG-14, so it might be okay but
OTOH, as this is just a code readability change, shall we do it only
for PG-15?

I think that the doc changes could be backpatched to PG14 but I think
we should do the code change only for PG15.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#116houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#115)
RE: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 2:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 3:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In addition of a code readability, there is a description in the doc
that mentions "Stream End" but we describe "Stream Stop" in the
later description, which seems a bug in the doc to me:

Doc changes looks good to me. But, I have question for code change:

--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -65,7 +65,7 @@ typedef enum LogicalRepMsgType
LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
LOGICAL_REP_MSG_STREAM_START = 'S',
- LOGICAL_REP_MSG_STREAM_END = 'E',
+ LOGICAL_REP_MSG_STREAM_STOP = 'E',
LOGICAL_REP_MSG_STREAM_COMMIT = 'c',

As this is changing the enum name and if any extension (logical
replication extension) has started using it then they would require a
change. As this is the latest change in PG-14, so it might be okay but
OTOH, as this is just a code readability change, shall we do it only
for PG-15?

I think that the doc changes could be backpatched to PG14 but I think we
should do the code change only for PG15.

+1, and the patch looks good to me.

Best regards,
Hou zj

#117Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#114)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 3:33 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Aug 17, 2021 1:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 16, 2021 at 3:59 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?

Yes. I think that v8-0001 patch already set xid and timestamp just after parsing
stream_prepare message. You meant it's not necessary?

Sorry, I thought of something wrong, please ignore the above comment.

I'll submit the updated patches soon.

I was thinking about the place to set the errcallback.callback.

apply_dispatch(StringInfo s)
{
LogicalRepMsgType action = pq_getmsgbyte(s);
+       ErrorContextCallback errcallback;
+       bool            set_callback = false;
+
+       /*
+        * Push apply error context callback if not yet. Other fields will be
+        * filled during applying the change.  Since this function can be called
+        * recursively when applying spooled changes, we set the callback only
+        * once.
+        */
+       if (apply_error_callback_arg.command == 0)
+       {
+               errcallback.callback = apply_error_callback;
+               errcallback.previous = error_context_stack;
+               error_context_stack = &errcallback;
+               set_callback = true;
+       }
...
+       /* Pop the error context stack */
+       if (set_callback)
+               error_context_stack = errcallback.previous;

It seems we can put the above code in the function LogicalRepApplyLoop()
around invoking apply_dispatch(), and in that approach we don't need to worry
about the recursively case. What do you think ?

Thank you for the comment!

I think you're right. Maybe we can set the callback before entering
the main loop and pop it after breaking out of it. That would also fix
the problem reported by Tang[1]. One thing to note, though: we want to
reset apply_error_callback_arg.command at the end of apply_dispatch()
(otherwise we could end up attaching the apply error context to an
irrelevant error such as a network error). So when apply_dispatch() is
called recursively, we probably need to save
apply_error_callback_arg.command before setting the new command and
then restore the saved command afterwards. Is that right?
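
The save-and-restore idea can be sketched in standalone C as follows. This is an illustration under assumptions, not the v9 code: demo_command stands in for apply_error_callback_arg.command, 'c' stands in for a message whose handling dispatches a nested change, and the demo_ names are hypothetical.

```c
static int	demo_command = 0;	/* 0 means no command in progress */
static int	demo_command_after_nested = 0;	/* recorded for inspection */

static void
demo_apply_dispatch(int action)
{
	/* Remember the caller's command so a nested call cannot clobber it. */
	int			saved_command = demo_command;

	demo_command = action;

	if (action == 'c')
	{
		/* Nested dispatch, as when applying spooled changes. */
		demo_apply_dispatch('I');

		/* Back in the outer call: the command is 'c' again. */
		demo_command_after_nested = demo_command;
	}

	/*
	 * Restore on exit. At the top level this resets the command to 0, so a
	 * later unrelated error does not pick up stale apply context.
	 */
	demo_command = saved_command;
}
```

After a top-level demo_apply_dispatch('c') returns, demo_command is 0 again, while inside the outer call the command was correctly 'c' once the nested call had finished.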

Regards,

[1]: /messages/by-id/OS0PR01MB6113E5BC24922A2D05D16051FBFE9@OS0PR01MB6113.jpnprd01.prod.outlook.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#118Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#117)
5 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 5:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 3:33 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Aug 17, 2021 1:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Aug 16, 2021 at 3:59 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

3)
Do we need to invoke set_apply_error_context_xact() in the function
apply_handle_stream_prepare() to save the xid and timestamp ?

Yes. I think that v8-0001 patch already set xid and timestamp just after parsing
stream_prepare message. You meant it's not necessary?

Sorry, I thought of something wrong, please ignore the above comment.

I'll submit the updated patches soon.

I was thinking about the place to set the errcallback.callback.

apply_dispatch(StringInfo s)
{
LogicalRepMsgType action = pq_getmsgbyte(s);
+       ErrorContextCallback errcallback;
+       bool            set_callback = false;
+
+       /*
+        * Push apply error context callback if not yet. Other fields will be
+        * filled during applying the change.  Since this function can be called
+        * recursively when applying spooled changes, we set the callback only
+        * once.
+        */
+       if (apply_error_callback_arg.command == 0)
+       {
+               errcallback.callback = apply_error_callback;
+               errcallback.previous = error_context_stack;
+               error_context_stack = &errcallback;
+               set_callback = true;
+       }
...
+       /* Pop the error context stack */
+       if (set_callback)
+               error_context_stack = errcallback.previous;

It seems we can put the above code in the function LogicalRepApplyLoop()
around invoking apply_dispatch(), and in that approach we don't need to worry
about the recursively case. What do you think ?

Thank you for the comment!

I think you're right. Maybe we can set the callback before entering to
the main loop and pop it after breaking from it. It would also fix the
problem reported by Tang[1]. But one thing we need to note that since
we want to reset apply_error_callback_arg.command at the end of
apply_dispatch() (otherwise we could end up setting the apply error
context to an irrelevant error such as network error), when
apply_dispatch() is called recursively probably we need to save the
apply_error_callback_arg.command before setting the new command and
then revert back to the saved command. Is that right?

I've attached the updated version patches that incorporated all
comments I got so far unless I'm missing something. Please review
them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v9-0005-Move-shared-fileset-cleanup-to-before_shmem_exit.patch (application/octet-stream)
v9-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v9-0001-Add-logical-changes-details-to-errcontext-of-appl.patch (application/octet-stream)
v9-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v9-0002-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)
#119Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#109)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 17, 2021 at 5:21 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Thursday, August 12, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Hi

Thanks for your patch. I met a problem when using it. The log is not what I expected in some cases, but in streaming mode, they work well.

For example:
------publisher------
create table test (a int primary key, b varchar);
create publication pub for table test;

------subscriber------
create table test (a int primary key, b varchar);
insert into test values (10000);
create subscription sub connection 'dbname=postgres port=5432' publication pub with(streaming=on);

------publisher------
insert into test values (10000);

Subscriber log:
2021-08-17 14:24:43.415 CST [3630341] ERROR: duplicate key value violates unique constraint "test_pkey"
2021-08-17 14:24:43.415 CST [3630341] DETAIL: Key (a)=(10000) already exists.

It didn't give more context info generated by apply_error_callback function.

Thank you for reporting the issue! This issue should be fixed in the
latest (v9) patches I've just submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoCH4Jwn_NkJhvS6W5bZJKSaAYnC9inXqMJc6dLLvhvTQg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#120Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#115)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 18, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 3:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent so it won't be a big deal if we won't be consistent here. I
am neutral on this one, so, if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

In addition to code readability, there is a description in the doc
that mentions "Stream End" but we describe "Stream Stop" in the later
description, which seems like a bug in the doc to me:

The doc changes look good to me, but I have a question about the code change:

--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -65,7 +65,7 @@ typedef enum LogicalRepMsgType
LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
LOGICAL_REP_MSG_STREAM_START = 'S',
- LOGICAL_REP_MSG_STREAM_END = 'E',
+ LOGICAL_REP_MSG_STREAM_STOP = 'E',
LOGICAL_REP_MSG_STREAM_COMMIT = 'c',

As this is changing the enum name and if any extension (logical
replication extension) has started using it then they would require a
change. As this is the latest change in PG-14, so it might be okay but
OTOH, as this is just a code readability change, shall we do it only
for PG-15?

I think that the doc changes could be backpatched to PG14 but I think
we should do the code change only for PG15.

Okay, done that way!

--
With Regards,
Amit Kapila.

#121Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#118)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches that incorporated all
comments I got so far unless I'm missing something. Please review
them.

The comments I made on Aug 16 and Aug 17 for the v8-0001 patch don't
seem to be addressed in the v9-0001 patch (if you disagree with them
that's fine, but best to say so and why).

Regards,
Greg Nancarrow
Fujitsu Australia

#122Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#100)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 16, 2021 at 8:33 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 16, 2021 at 6:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, perhaps a message like "... in transaction 740 with commit
timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
consistency with other messages?

Yes, I think that would be more consistent.

On another note, for the 0001 patch, the elog ERROR at the bottom of
the logicalrep_message_type() function seems to assume that the
unrecognized "action" is a printable character (with its use of %c)
and also that the character is meaningful to the user in some way.
But given that the compiler normally warns of an unhandled enum value
when switching on an enum, such an error would most likely be when
action is some int value that wouldn't be meaningful to the user (as
it wouldn't be one of the LogicalRepMsgType enum values).
I therefore think it would be better to use %d in that ERROR:

i.e.

+ elog(ERROR, "invalid logical replication message type %d", action);

Similar comments apply to the apply_dispatch() function (and I realise
it used %c before your patch).

The action in apply_dispatch is always a single byte so not sure why
we need %d here. Also, if it is used as %c before the patch then I
think it is better not to change it in this patch.

--
With Regards,
Amit Kapila.

#123Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#121)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 2:18 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 19, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches that incorporated all
comments I got so far unless I'm missing something. Please review
them.

The comments I made on Aug 16 and Aug 17 for the v8-0001 patch don't
seem to be addressed in the v9-0001 patch (if you disagree with them
that's fine, but best to say so and why).

Oops, sorry about that. I had just missed those comments. Let's
discuss them and I'll incorporate those comments in the v10 patch if
we agree with the changes.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#124Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#122)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 3:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Aug 16, 2021 at 8:33 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 16, 2021 at 6:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, perhaps a message like "... in transaction 740 with commit
timestamp 2021-08-10 14:44:38.058174+05:30" is better in terms of
consistency with other messages?

Yes, I think that would be more consistent.

On another note, for the 0001 patch, the elog ERROR at the bottom of
the logicalrep_message_type() function seems to assume that the
unrecognized "action" is a printable character (with its use of %c)
and also that the character is meaningful to the user in some way.
But given that the compiler normally warns of an unhandled enum value
when switching on an enum, such an error would most likely be when
action is some int value that wouldn't be meaningful to the user (as
it wouldn't be one of the LogicalRepMsgType enum values).
I therefore think it would be better to use %d in that ERROR:

i.e.

+ elog(ERROR, "invalid logical replication message type %d", action);

Similar comments apply to the apply_dispatch() function (and I realise
it used %c before your patch).

The action in apply_dispatch is always a single byte so not sure why
we need %d here. Also, if it is used as %c before the patch then I
think it is better not to change it in this patch.

Yes, I agree that it's better not to change it in this patch since %c
is used before the patch. Also, I can see that some error messages in
walsender.c use %c too. If we conclude that it should use %d instead
of %c, we can change all of them in a separate patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#125Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#105)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 17, 2021 at 12:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Aug 12, 2021 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches. FYI I've included the patch
(v8-0005) that fixes the assertion failure during shared fileset
cleanup to make cfbot tests happy.

Thank you for the comment!

Another comment on the 0001 patch: as there is now a mix of setting
"apply_error_callback_arg" members directly and also through inline
functions, it might look better to have it done consistently with
functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

It might look consistent, but if we do that, won't we end up needing
a function for every field we update whenever we add new fields to
the struct in the future?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#126Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#122)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The action in apply_dispatch is always a single byte so not sure why
we need %d here. Also, if it is used as %c before the patch then I
think it is better not to change it in this patch.

As I explained before, the point is that all the known message types
are handled in the switch statement cases (and you will get a compiler
warning if you miss one of the enum values in the switch cases).
So anything NOT handled in the switch, will be some OTHER value (and
note that any "int" value can be assigned to an enum).
Who says its value will be a printable character (%c) in this case?
And even if it is printable, will it help?
I think in this case it would be better to know the exact value of the
byte ("%d" or "0x%x" etc.), not the character equivalent.
I'm OK if it's done as a separate patch.
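
To make the difference concrete, here is a minimal standalone sketch (not code from the patch) that formats the same unrecognized byte both ways. The function name format_invalid_type() is hypothetical, and plain snprintf() stands in for elog(); only the message wording follows the elog() quoted above.

```c
#include <stdio.h>

/*
 * Format an "invalid message type" error two ways for comparison.
 * Hypothetical helper: the real code would call elog(ERROR, ...).
 */
static void
format_invalid_type(int action, char *as_char, char *as_hex, size_t len)
{
    /* %c emits the raw byte, which may be an unprintable control character */
    snprintf(as_char, len,
             "invalid logical replication message type \"%c\"", action);

    /* 0x%02x always produces readable text, whatever the byte value is */
    snprintf(as_hex, len,
             "invalid logical replication message type 0x%02x",
             (unsigned int) action & 0xff);
}
```

For a byte such as 0x07 the %c form puts a BEL character into the log line, while the hex form prints 0x07; for a printable byte such as 'E' both forms are readable.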

Regards,
Greg Nancarrow
Fujitsu Australia

#127Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#120)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 3:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 10:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Aug 18, 2021 at 12:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Aug 18, 2021 at 6:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

It's right that we use "STREAM STOP" rather than "STREAM END" in many
places such as elog messages, a callback name, and source code
comments. As far as I have found there are two places where we’re
using "STREAM END": LOGICAL_REP_MSG_STREAM_END and a description in
doc/src/sgml/protocol.sgml. Isn't it better to fix these
inconsistencies in the first place? I think “STREAM STOP” would be
more appropriate.

I think keeping STREAM_END in the enum 'LOGICAL_REP_MSG_STREAM_END'
seems to be a bit better because of the value 'E' we use for it.

But I think we don't care about the actual value of
LOGICAL_REP_MSG_STREAM_END since we use the enum value rather than
'E'?

True, but here we are trying to be consistent with other enum values
where we try to use the first letter of the last word (which is E in
this case). I can see there are other cases where we are not
consistent so it won't be a big deal if we won't be consistent here. I
am neutral on this one, so, if you feel using STREAM_STOP would be
better from a code readability perspective then that is fine.

In addition to code readability, there is a description in the doc
that mentions "Stream End" but we describe "Stream Stop" in the later
description, which seems like a bug in the doc to me:

The doc changes look good to me, but I have a question about the code change:

--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -65,7 +65,7 @@ typedef enum LogicalRepMsgType
LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
LOGICAL_REP_MSG_STREAM_START = 'S',
- LOGICAL_REP_MSG_STREAM_END = 'E',
+ LOGICAL_REP_MSG_STREAM_STOP = 'E',
LOGICAL_REP_MSG_STREAM_COMMIT = 'c',

As this is changing the enum name and if any extension (logical
replication extension) has started using it then they would require a
change. As this is the latest change in PG-14, so it might be okay but
OTOH, as this is just a code readability change, shall we do it only
for PG-15?

I think that the doc changes could be backpatched to PG14 but I think
we should do the code change only for PG15.

Okay, done that way!

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#128houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#118)
RE: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 9:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches that incorporated all comments I
got so far unless I'm missing something. Please review them.

Thanks for the new version patches.
The v9-0001 patch looks good to me and I will start to review other patches.

Best regards,
Hou zj

#129Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#125)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 12:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

Another comment on the 0001 patch: as there is now a mix of setting
"apply_error_callback_arg" members directly and also through inline
functions, it might look better to have it done consistently with
functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

It might look consistent, but if we do that, won't we end up needing
a function for every field we update whenever we add new fields to
the struct in the future?

Yeah, I also think it is too much, but we can add comments wherever
we set the information for the error callback. I see one is missing
where the patch sets remote_attnum; please check other similar places
and add comments if they are not already there.

--
With Regards,
Amit Kapila.

#130Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#129)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 19, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 12:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

Another comment on the 0001 patch: as there is now a mix of setting
"apply_error_callback_arg" members directly and also through inline
functions, it might look better to have it done consistently with
functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

It might look consistent, but if we do that, won't we end up needing
a function for every field we update whenever we add new fields to
the struct in the future?

Yeah, I also think it is too much, but we can add comments wherever
we set the information for the error callback. I see one is missing
where the patch sets remote_attnum; please check other similar places
and add comments if they are not already there.

Agreed. Will add comments in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#131tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#119)
RE: Skipping logical replication transactions on subscriber side

On Thursday, August 19, 2021 9:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for reporting the issue! This issue must be fixed in the
latest (v9) patches I've just submitted[1].

Thanks for your patch.
I've confirmed the issue is fixed as you said.

Regards
Tang

#132Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#131)
Re: Skipping logical replication transactions on subscriber side

On Fri, Aug 20, 2021 at 6:14 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Thursday, August 19, 2021 9:53 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for reporting the issue! This issue must be fixed in the
latest (v9) patches I've just submitted[1].

Thanks for your patch.
I've confirmed the issue is fixed as you said.

Thanks for your confirmation!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#133Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#130)
5 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 19, 2021 at 10:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 19, 2021 at 9:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 19, 2021 at 12:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Aug 17, 2021 at 12:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

Another comment on the 0001 patch: as there is now a mix of setting
"apply_error_callback_arg" members directly and also through inline
functions, it might look better to have it done consistently with
functions having prototypes something like the following:

static inline void set_apply_error_context_rel(LogicalRepRelMapEntry *rel);
static inline void reset_apply_error_context_rel(void);
static inline void set_apply_error_context_attnum(int remote_attnum);

It might look consistent, but if we do that, won't we end up needing
a function for every field we update whenever we add new fields to
the struct in the future?

Yeah, I also think it is too much, but we can add comments wherever
we set the information for the error callback. I see one is missing
where the patch sets remote_attnum; please check other similar places
and add comments if they are not already there.

Agreed. Will add comments in the next version patch.

I've attached updated patches. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v10-0005-Move-shared-fileset-cleanup-to-before_shmem_exit.patch
v10-0001-Add-logical-changes-details-to-errcontext-of-app.patch
v10-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch
v10-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch
v10-0002-Add-pg_stat_subscription_errors-statistics-view.patch
#134tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#133)
RE: Skipping logical replication transactions on subscriber side

On Monday, August 23, 2021 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches. Please review them.

I tested the v10-0001 patch in both streaming and non-streaming mode. All tests work well.

I also tried two-phase commit feature, the error context was set as expected,
but please allow me to propose a fix suggestion on the error description:

CONTEXT: processing remote data during "INSERT" for replication target relation
"public.test" in transaction 714 with commit timestamp 2021-08-24
13:20:22.480532+08

It said "commit timestamp", but for 2pc feature, the timestamp could be "prepare timestamp" or "rollback timestamp", too.
Could we make some change to make the error log more comprehensive?

Regards
Tang

#135Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#134)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 24, 2021 at 11:44 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Monday, August 23, 2021 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches. Please review them.

I tested the v10-0001 patch in both streaming and non-streaming mode. All tests work well.

I also tried two-phase commit feature, the error context was set as expected,
but please allow me to propose a fix suggestion on the error description:

CONTEXT: processing remote data during "INSERT" for replication target relation
"public.test" in transaction 714 with commit timestamp 2021-08-24
13:20:22.480532+08

It said "commit timestamp", but for 2pc feature, the timestamp could be "prepare timestamp" or "rollback timestamp", too.
Could we make some change to make the error log more comprehensive?

I think we can write something like: (processing remote data during
"INSERT" for replication target relation "public.test" in transaction
714 at 2021-08-24 13:20:22.480532+08). Basically replacing "with
commit timestamp" with "at". This is similar to what we do in the
test_decoding module for transaction timestamp. The other idea could
be we print the exact operation like commit/prepare/rollback which is
also possible because we have that information while setting context
info but that might add a bit more complexity which I don't think is
worth it.

One more point about the v10-0001* patch: From the commit message
"Add logical changes details to errcontext of apply worker errors.",
it appears that the context will be added only for the apply worker
but won't it get added for tablesync worker as well during its sync
phase (when it tries to catch up with apply worker)?

--
With Regards,
Amit Kapila.

#136Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#135)
5 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Aug 24, 2021 at 10:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 24, 2021 at 11:44 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Monday, August 23, 2021 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches. Please review them.

I tested the v10-0001 patch in both streaming and non-streaming mode. All tests work well.

I also tried two-phase commit feature, the error context was set as expected,
but please allow me to propose a fix suggestion on the error description:

Thank you for the suggestion!

CONTEXT: processing remote data during "INSERT" for replication target relation
"public.test" in transaction 714 with commit timestamp 2021-08-24
13:20:22.480532+08

It said "commit timestamp", but for 2pc feature, the timestamp could be "prepare timestamp" or "rollback timestamp", too.
Could we make some change to make the error log more comprehensive?

I think we can write something like: (processing remote data during
"INSERT" for replication target relation "public.test" in transaction
714 at 2021-08-24 13:20:22.480532+08). Basically replacing "with
commit timestamp" with "at". This is similar to what we do in the
test_decoding module for transaction timestamp.

+1

The other idea could
be we print the exact operation like commit/prepare/rollback which is
also possible because we have that information while setting context
info but that might add a bit more complexity which I don't think is
worth it.

Agreed.

I replaced "with commit timestamp" with "at" and renamed the
'commit_ts' field to 'ts'.

One more point about the v10-0001* patch: From the commit message
"Add logical changes details to errcontext of apply worker errors.",
it appears that the context will be added only for the apply worker
but won't it get added for tablesync worker as well during its sync
phase (when it tries to catch up with apply worker)?

Right. I've updated the message.

Attached updated version patches. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v11-0001-Add-logical-change-details-to-logical-replicatio.patch
v11-0003-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch
v11-0004-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch
v11-0005-Move-shared-fileset-cleanup-to-before_shmem_exit.patch
v11-0002-Add-pg_stat_subscription_errors-statistics-view.patch
#137tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#136)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, August 25, 2021 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached updated version patches. Please review them.

Thanks for your new patch. The v11-0001 patch LGTM.

Regards
Tang

#138Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#136)
Re: Skipping logical replication transactions on subscriber side

On Wed, Aug 25, 2021 at 2:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached updated version patches. Please review them.

Regarding the v11-0001 patch, it looks OK to me, but I do have one point:
In apply_dispatch(), wouldn't it be better to NOT move the error
reporting for an invalid message type into the switch as the default
case - because then, if you add a new message type, you won't get a
compiler warning (when warnings are enabled) for a missing switch
case, which is a handy way to alert you that the new message type
needs to be added as a case to the switch.
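
As a rough illustration of the warning I mean (a standalone sketch, not code from the patch): with GCC or Clang, -Wswitch (part of -Wall) reports an enumerator that has no case label only when the switch has no default label. DemoMsgType and demo_message_name() are hypothetical, cut-down stand-ins for LogicalRepMsgType and the real function.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for LogicalRepMsgType */
typedef enum DemoMsgType
{
    DEMO_MSG_BEGIN = 'B',
    DEMO_MSG_COMMIT = 'C',
    DEMO_MSG_INSERT = 'I'
} DemoMsgType;

/*
 * There is deliberately no default label here. If a new enumerator is
 * added to DemoMsgType without a matching case, -Wswitch flags it.
 * Returns NULL for a value that is not a known message type.
 */
static const char *
demo_message_name(DemoMsgType action)
{
    switch (action)
    {
        case DEMO_MSG_BEGIN:
            return "BEGIN";
        case DEMO_MSG_COMMIT:
            return "COMMIT";
        case DEMO_MSG_INSERT:
            return "INSERT";
    }

    /* Reached only for a value outside the enum; real code would error out */
    return NULL;
}
```

Adding a "default:" label to that switch would silence the warning for any enumerator added later, which is the trade-off I'm pointing at.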

Regards,
Greg Nancarrow
Fujitsu Australia

#139Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#138)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 7:15 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Wed, Aug 25, 2021 at 2:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached updated version patches. Please review them.

Regarding the v11-0001 patch, it looks OK to me, but I do have one point:
In apply_dispatch(), wouldn't it be better to NOT move the error
reporting for an invalid message type into the switch as the default
case - because then, if you add a new message type, you won't get a
compiler warning (when warnings are enabled) for a missing switch
case, which is a handy way to alert you that the new message type
needs to be added as a case to the switch.

Do you have any suggestions on how to achieve that without adding some
additional variable? I think it is not a very hard requirement as we
don't follow the same at other places in code.

--
With Regards,
Amit Kapila.

#140Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#139)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 7:15 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Wed, Aug 25, 2021 at 2:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached updated version patches. Please review them.

Regarding the v11-0001 patch, it looks OK to me, but I do have one point:
In apply_dispatch(), wouldn't it be better to NOT move the error
reporting for an invalid message type into the switch as the default
case - because then, if you add a new message type, you won't get a
compiler warning (when warnings are enabled) for a missing switch
case, which is a handy way to alert you that the new message type
needs to be added as a case to the switch.

Do you have any suggestions on how to achieve that without adding some
additional variable? I think it is not a very hard requirement as we
don't follow the same at other places in code.

Yeah, I agree that it's a handy way to detect a missing switch case,
but I think that we don't necessarily need it in this case. There
are many places in the code that do similar things, and
apply_dispatch() is the entry function for handling incoming
messages, so it is unlikely that we would miss adding a switch case
before the patch gets committed. If we don't move it, we would end
up either adding code that resets
apply_error_callback_arg.command to every message type, adding a flag
indicating that the message was handled and checking it later, or having
a big if statement checking whether the incoming message type is valid, etc.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#141Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#139)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 1:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Do you have any suggestions on how to achieve that without adding some
additional variable? I think it is not a very hard requirement as we
don't follow the same at other places in code.

Sorry, forget my suggestion, I see it's not easy to achieve it and
still execute the non-error-case code after the switch.
(you'd have to use a variable set in the default case, defeating the
purpose, or have the switch in a separate function with return for
each case)
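
For the record, the "separate function" variant would look roughly like the sketch below. It is only an illustration under stated assumptions: DemoAction, demo_dispatch_inner(), demo_dispatch() and demo_current_command are hypothetical names, the last one standing in for apply_error_callback_arg.command, and an error return stands in for elog(ERROR).

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for LogicalRepMsgType */
typedef enum DemoAction
{
    DEMO_ACT_BEGIN = 'B',
    DEMO_ACT_COMMIT = 'C',
    DEMO_ACT_INSERT = 'I'
} DemoAction;

/* Stand-in for apply_error_callback_arg.command; 0 means "no command" */
static int demo_current_command = 0;

/*
 * The switch lives in its own function and every case returns, so no
 * default label is needed and -Wswitch still catches a missing case.
 */
static bool
demo_dispatch_inner(DemoAction action)
{
    switch (action)
    {
        case DEMO_ACT_BEGIN:
            /* the real handler call would go here */
            return true;
        case DEMO_ACT_COMMIT:
            return true;
        case DEMO_ACT_INSERT:
            return true;
    }

    return false;               /* not a known message type */
}

/* Returns 0 on success, -1 where the real code would raise an error */
static int
demo_dispatch(DemoAction action)
{
    int         saved_command = demo_current_command;

    demo_current_command = action;

    if (!demo_dispatch_inner(action))
        return -1;

    /* Common non-error-case code that runs after the switch */
    demo_current_command = saved_command;
    return 0;
}
```

It works, but it costs an extra function just to keep the warning, which is why I'm withdrawing the suggestion.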

So the 0001 patch LGTM.

Regards,
Greg Nancarrow
Fujitsu Australia

#142houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#136)
RE: Skipping logical replication transactions on subscriber side

On Wed, Aug 25, 2021 12:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached updated version patches. Please review them.

The v11-0001 patch LGTM.

Best regards,
Hou zj

#143Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#140)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 9:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, I agree that it's a handy way to detect a missing switch case,
but I think that we don't necessarily need it in this case. There
are many places in the code that do similar things, and
apply_dispatch() is the entry function for handling incoming
messages, so it is unlikely that we would miss adding a switch case
before the patch gets committed. If we don't move it, we would end
up either adding code that resets
apply_error_callback_arg.command to every message type, adding a flag
indicating that the message was handled and checking it later, or having
a big if statement checking whether the incoming message type is valid, etc.

I was reviewing and making minor edits to your v11-0001* patch and
noticed that the below parts of the code could be improved:
1.
+ if (errarg->rel)
+ appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
+ errarg->rel->remoterel.nspname,
+ errarg->rel->remoterel.relname);
+
+ if (errarg->remote_attnum >= 0)
+ appendStringInfo(&buf, _(" column \"%s\""),
+ errarg->rel->remoterel.attnames[errarg->remote_attnum]);

Isn't it better if the 'remote_attnum' check is inside the if (errarg->rel)
check? It will be weird to print column information without rel
information and in the current code, we don't set remote_attnum
without rel. The other possibility could be to have an Assert for rel
in 'remote_attnum' check.
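
For illustration, the nesting suggested in point 1 could look roughly like the sketch below. It is a standalone approximation and not the patch code: RemoteRel, ErrCallbackArg and format_apply_context() are hypothetical stand-ins for the real structures in worker.c, and plain snprintf() replaces appendStringInfo().

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in worker.c. */
typedef struct RemoteRel
{
    const char  *nspname;
    const char  *relname;
    const char **attnames;
} RemoteRel;

typedef struct ErrCallbackArg
{
    const RemoteRel *rel;           /* NULL when no relation is being processed */
    int              remote_attnum; /* -1 when no column is being processed */
} ErrCallbackArg;

/*
 * Build the relation/column part of the context message.  The column is
 * only considered when a relation is set, so rel is never dereferenced
 * when NULL and a column is never printed without its relation.
 * buflen must be at least 1.
 */
void
format_apply_context(const ErrCallbackArg *errarg, char *buf, size_t buflen)
{
    int used;

    buf[0] = '\0';
    if (errarg->rel == NULL)
        return;

    used = snprintf(buf, buflen, " for replication target relation \"%s.%s\"",
                    errarg->rel->nspname, errarg->rel->relname);

    /* Append the column only if the relation part fit into the buffer. */
    if (errarg->remote_attnum >= 0 && used > 0 && (size_t) used < buflen)
        snprintf(buf + used, buflen - used, " column \"%s\"",
                 errarg->rel->attnames[errarg->remote_attnum]);
}
```

The alternative mentioned above, an Assert for rel inside the 'remote_attnum' check, would keep the two checks separate and instead fail loudly in assert-enabled builds if the invariant were ever broken.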

2.
+ /* Reset relation for error callback */
+ apply_error_callback_arg.rel = NULL;
+
  logicalrep_rel_close(rel, NoLock);

end_replication_step();

Isn't it better to reset relation info as the last thing in
apply_handle_insert/update/delete as you do for a few other
parameters? There is very little chance of error from those two
functions but still, it will be good if they ever throw an error and
it might be clear for future edits in this function that this needs to
be set as the last thing in these functions.

Note - I can take care of the above points based on whatever we agree
with, you don't need to send a new version for this.

--
With Regards,
Amit Kapila.

#144Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#143)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 9:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, I agree that it's a handy way to detect missing a switch case
but I think that we don't necessarily need it in this case. Because
there are many places in the code where doing similar things and when
it comes to apply_dispatch() it's the entry function to handle the
incoming message so it will be unlikely that we miss adding a switch
case until the patch gets committed. If we don't move it, we would end
up either adding the code resetting the
apply_error_callback_arg.command to every message type, adding a flag
indicating the message is handled and checking later, or having a big
if statement checking if the incoming message type is valid etc.

I was reviewing and making minor edits to your v11-0001* patch and
noticed that the below parts of the code could be improved:
1.
+ if (errarg->rel)
+ appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
+ errarg->rel->remoterel.nspname,
+ errarg->rel->remoterel.relname);
+
+ if (errarg->remote_attnum >= 0)
+ appendStringInfo(&buf, _(" column \"%s\""),
+ errarg->rel->remoterel.attnames[errarg->remote_attnum]);

Isn't it better if 'remote_attnum' check is inside if (errarg->rel)
check? It will be weird to print column information without rel
information and in the current code, we don't set remote_attnum
without rel. The other possibility could be to have an Assert for rel
in 'remote_attnum' check.

2.
+ /* Reset relation for error callback */
+ apply_error_callback_arg.rel = NULL;
+
logicalrep_rel_close(rel, NoLock);

end_replication_step();

Isn't it better to reset relation info as the last thing in
apply_handle_insert/update/delete as you do for a few other
parameters? There is very little chance of error from those two
functions but still, it will be good if they ever throw an error and
it might be clear for future edits in this function that this needs to
be set as the last thing in these functions.

I see that resetting it before logicalrep_rel_close has an advantage
that we might not accidentally access some information after close
which is not there in rel. I am not sure if that is the reason you
have in mind for resetting it before close.

--
With Regards,
Amit Kapila.

#145Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#143)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 9:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, I agree that it's a handy way to detect missing a switch case
but I think that we don't necessarily need it in this case. Because
there are many places in the code where doing similar things and when
it comes to apply_dispatch() it's the entry function to handle the
incoming message so it will be unlikely that we miss adding a switch
case until the patch gets committed. If we don't move it, we would end
up either adding the code resetting the
apply_error_callback_arg.command to every message type, adding a flag
indicating the message is handled and checking later, or having a big
if statement checking if the incoming message type is valid etc.

I was reviewing and making minor edits to your v11-0001* patch and
noticed that the below parts of the code could be improved:

Thank you for the comments!

1.
+ if (errarg->rel)
+ appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
+ errarg->rel->remoterel.nspname,
+ errarg->rel->remoterel.relname);
+
+ if (errarg->remote_attnum >= 0)
+ appendStringInfo(&buf, _(" column \"%s\""),
+ errarg->rel->remoterel.attnames[errarg->remote_attnum]);

Isn't it better if 'remote_attnum' check is inside if (errarg->rel)
check? It will be weird to print column information without rel
information and in the current code, we don't set remote_attnum
without rel. The other possibility could be to have an Assert for rel
in 'remote_attnum' check.

Agreed to check 'remote_attnum' inside "if (errarg->rel)".

2.
+ /* Reset relation for error callback */
+ apply_error_callback_arg.rel = NULL;
+
logicalrep_rel_close(rel, NoLock);

end_replication_step();

Isn't it better to reset relation info as the last thing in
apply_handle_insert/update/delete as you do for a few other
parameters? There is very little chance of error from those two
functions but still, it will be good if they ever throw an error and
it might be clear for future edits in this function that this needs to
be set as the last thing in these functions.

On Thu, Aug 26, 2021 at 3:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I see that resetting it before logicalrep_rel_close has an advantage
that we might not accidentally access some information after close
which is not there in rel. I am not sure if that is the reason you
have in mind for resetting it before close.

Yes, that's why I reset the apply_error_callback_arg.rel before
logicalrep_rel_close(), not at the end of the function.

Since the callback function refers to apply_error_callback_arg.rel it
still needs to be valid when an error occurs. Moving it to the end of
the function is no problem for now, but if we always reset relation
info as the last thing, I think that we cannot allow adding changes
between setting relation info and the end of the function (i.e.,
resetting relation info) that could invalidate fields of
apply_error_callback_arg.rel (e.g., freeing a string value).
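
To make the ordering argument concrete, here is a minimal standalone sketch. All names (Rel, callback_arg, rel_open, rel_close, handle_change) are hypothetical, and unlike logicalrep_rel_close() the simplified close here frees the entry, purely to make the hazard obvious. The point is only that the pointer visible to the error callback is cleared before anything can invalidate what it points to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a relation map entry. */
typedef struct Rel
{
    char *relname;              /* released when the relation is closed */
} Rel;

/* Stand-in for apply_error_callback_arg: what the error callback reads. */
typedef struct CallbackArg
{
    Rel *rel;
} CallbackArg;

CallbackArg callback_arg = {NULL};

/* What an error callback would report if an error were raised right now. */
const char *
callback_relname(void)
{
    return callback_arg.rel != NULL ? callback_arg.rel->relname : "(none)";
}

Rel *
rel_open(const char *name)
{
    Rel *rel = malloc(sizeof(Rel));

    if (rel == NULL)
        return NULL;
    rel->relname = malloc(strlen(name) + 1);
    if (rel->relname == NULL)
    {
        free(rel);
        return NULL;
    }
    strcpy(rel->relname, name);
    return rel;
}

void
rel_close(Rel *rel)
{
    free(rel->relname);
    free(rel);
}

void
handle_change(const char *name)
{
    Rel *rel = rel_open(name);

    if (rel == NULL)
        return;

    callback_arg.rel = rel;     /* errors from here on report the relation */
    /* ... apply the change ... */

    callback_arg.rel = NULL;    /* reset first ... */
    rel_close(rel);             /* ... so the callback never sees a closed rel */
}
```

If the reset were moved after rel_close(), an error raised during the close would let the callback read an entry that is already gone.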

Note - I can take care of the above points based on whatever we agree
with, you don't need to send a new version for this.

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#146Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#145)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 4:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1.
+ if (errarg->rel)
+ appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
+ errarg->rel->remoterel.nspname,
+ errarg->rel->remoterel.relname);
+
+ if (errarg->remote_attnum >= 0)
+ appendStringInfo(&buf, _(" column \"%s\""),
+ errarg->rel->remoterel.attnames[errarg->remote_attnum]);

Isn't it better if 'remote_attnum' check is inside if (errarg->rel)
check? It will be weird to print column information without rel
information and in the current code, we don't set remote_attnum
without rel. The other possibility could be to have an Assert for rel
in 'remote_attnum' check.

Agreed to check 'remote_attnum' inside "if (errarg->rel)".

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made a few other cosmetic changes in the
attached patch. Please take a look and let me know what you think of it.

Note - I have just attached the first patch here, once this is
committed we can focus on others.

--
With Regards,
Amit Kapila.

Attachments:

v12-0001-Add-logical-change-details-to-logical-replicatio.patch (application/octet-stream)

#147Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#146)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 4:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

1.
+ if (errarg->rel)
+ appendStringInfo(&buf, _(" for replication target relation \"%s.%s\""),
+ errarg->rel->remoterel.nspname,
+ errarg->rel->remoterel.relname);
+
+ if (errarg->remote_attnum >= 0)
+ appendStringInfo(&buf, _(" column \"%s\""),
+ errarg->rel->remoterel.attnames[errarg->remote_attnum]);

Isn't it better if 'remote_attnum' check is inside if (errarg->rel)
check? It will be weird to print column information without rel
information and in the current code, we don't set remote_attnum
without rel. The other possibility could be to have an Assert for rel
in 'remote_attnum' check.

Agreed to check 'remote_attnum' inside "if (errarg->rel)".

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made few other cosmetic changes in the
attached patch. See and let me know what you think of it?

Thank you for the patch!

Agreed with these changes. The patch looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#148Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#147)
Re: Skipping logical replication transactions on subscriber side

On Thu, Aug 26, 2021 at 6:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made few other cosmetic changes in the
attached patch. See and let me know what you think of it?

Thank you for the patch!

Agreed with these changes. The patch looks good to me.

Pushed, feel free to rebase and send the remaining patch set.

--
With Regards,
Amit Kapila.

#149Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#148)
Re: Skipping logical replication transactions on subscriber side

On Fri, Aug 27, 2021 at 1:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 6:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made few other cosmetic changes in the
attached patch. See and let me know what you think of it?

Thank you for the patch!

Agreed with these changes. The patch looks good to me.

Pushed, feel free to rebase and send the remaining patch set.

Thanks!

I'll post the updated version patch on Monday.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#150Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#149)
4 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Aug 27, 2021 at 8:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Aug 27, 2021 at 1:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 26, 2021 at 6:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Aug 26, 2021 at 9:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, changed accordingly. Additionally, I have changed the code which
sets timestamp to (unset) when it is 0 so that it won't display the
timestamp in that case. I have made few other cosmetic changes in the
attached patch. See and let me know what you think of it?

Thank you for the patch!

Agreed with these changes. The patch looks good to me.

Pushed, feel free to rebase and send the remaining patch set.

Thanks!

I'll post the updated version patch on Monday.

I've attached rebased patches. The 0004 patch is outside the scope of
this patch set. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Regards,

[1]: /messages/by-id/CAFiTN-v-zFpmm7Ze1Dai5LJjhhNYXvK8QABs35X89WY0HDG4Ww@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v12-0001-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)
v12-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v12-0004-Using-fileset-more-effectively-in-the-apply-work.patch (application/octet-stream)
v12-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)

#151Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#150)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

I have some initial feedback on the v12-0001 patch.
Most of these are suggested improvements to wording, and some typo fixes.

(0) Patch comment

Suggestion to improve the patch comment:

BEFORE:
Add pg_stat_subscription_errors statistics view.

This commits adds new system view pg_stat_logical_replication_error,
showing errors happening during applying logical replication changes
as well as during performing initial table synchronization.

The subscription error entries are removed by autovacuum workers when
the table synchronization competed in table sync worker cases and when
dropping the subscription in apply worker cases.

It also adds SQL function pg_stat_reset_subscription_error() to
reset the single subscription error.

AFTER:
Add a subscription errors statistics view "pg_stat_subscription_errors".

This commit adds a new system view pg_stat_logical_replication_errors,
that records information about any errors which occur during application
of logical replication changes as well as during performing initial table
synchronization.

The subscription error entries are removed by autovacuum workers after
table synchronization completes in table sync worker cases and after
dropping the subscription in apply worker cases.

It also adds an SQL function pg_stat_reset_subscription_error() to
reset a single subscription error.

doc/src/sgml/monitoring.sgml:

(1)
BEFORE:
+      <entry>One row per error that happened on subscription, showing information about
+       the subscription errors.
AFTER:
+      <entry>One row per error that occurred on subscription, providing information about
+       each subscription error.
(2)
BEFORE:
+   The <structname>pg_stat_subscription_errors</structname> view will contain one
AFTER:
+   The <structname>pg_stat_subscription_errors</structname> view contains one
(3)
BEFORE:
+        Name of the database in which the subscription is created.
AFTER:
+        Name of the database in which the subscription was created.
(4)
BEFORE:
+       OID of the relation that the worker is processing when the
+       error happened.
AFTER:
+       OID of the relation that the worker was processing when the
+       error occurred.
(5)
BEFORE:
+        Name of command being applied when the error happened.  This
+        field is always NULL if the error is reported by
+        <literal>tablesync</literal> worker.
AFTER:
+        Name of command being applied when the error occurred.  This
+        field is always NULL if the error is reported by a
+        <literal>tablesync</literal> worker.
(6)
BEFORE:
+        Transaction ID of publisher node being applied when the error
+        happened.  This field is always NULL if the error is reported
+        by <literal>tablesync</literal> worker.
AFTER:
+        Transaction ID of the publisher node being applied when the error
+        happened.  This field is always NULL if the error is reported
+        by a <literal>tablesync</literal> worker.
(7)
BEFORE:
+        Type of worker reported the error: <literal>apply</literal> or
+        <literal>tablesync</literal>.
AFTER:
+        Type of worker reporting the error: <literal>apply</literal> or
+        <literal>tablesync</literal>.
(8)
BEFORE:
+       Number of times error happened on the worker.
AFTER:
+       Number of times the error occurred in the worker.

[or "Number of times the worker reported the error" ?]

(9)
BEFORE:
+       Time at which the last error happened.
AFTER:
+       Time at which the last error occurred.
(10)
BEFORE:
+       Error message which is reported last failure time.
AFTER:
+       Error message which was reported at the last failure time.

Maybe this should just say "Last reported error message" ?

(11)
You shouldn't call hash_get_num_entries() on a NULL pointer.

Suggest swapping lines, as shown below:

BEFORE:
+ int32 nerrors = hash_get_num_entries(subent->suberrors);
+
+ /* Skip this subscription if not have any errors */
+ if (subent->suberrors == NULL)
+    continue;
AFTER:
+ int32 nerrors;
+
+ /* Skip this subscription if not have any errors */
+ if (subent->suberrors == NULL)
+    continue;
+ nerrors = hash_get_num_entries(subent->suberrors);

(12)
Typo: legnth -> length

+ * contains the fixed-legnth error message string which is

src/backend/postmaster/pgstat.c

(13)
"Subscription stat entries" hashtable is created in two different
places, one with HASH_CONTEXT and the other without. Is this
intentional?
Shouldn't there be a single function for creating this?

(14)
+ * PgStat_MsgSubscriptionPurge Sent by the autovacuum purge the subscriptions.

Seems to be missing a word, is it meant to say "Sent by the autovacuum
to purge the subscriptions." ?

(15)
+ * PgStat_MsgSubscriptionErrPurge Sent by the autovacuum purge the subscription
+ * errors.

Seems to be missing a word, is it meant to say "Sent by the autovacuum
to purge the subscription errors." ?

Regards,
Greg Nancarrow
Fujitsu Australia

#152Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#150)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

I have a few comments on the v12-0002 patch:

(1) Patch comment

Has a typo and could be expressed a bit better.

Suggestion:

BEFORE:
RESET command is reuiqred by follow-up commit introducing to a new
parameter skip_xid to reset.
AFTER:
The RESET parameter for ALTER SUBSCRIPTION is required by the
follow-up commit that introduces a new resettable subscription
parameter "skip_xid".

doc/src/sgml/ref/alter_subscription.sgml

(2)
I don't think "RESET" is sufficiently described in
alter_subscription.sgml. Just putting it under "SET" and changing
"altered" to "set" doesn't explain what resetting does. It should say
something about setting the parameter back to its original (default)
value.

(3)
case ALTER_SUBSCRIPTION_RESET_OPTIONS

Some comments here would be helpful e.g. Reset the specified
parameters back to their default values.

Regards,
Greg Nancarrow
Fujitsu Australia

#153houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#150)
RE: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this patch. It's
borrowed from another thread[1] to fix the assertion failure for newly added
tests. Please review them.

Hi,

I reviewed the v12-0001 patch, here are some comments:

1)
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1441,7 +1441,6 @@ getinternalerrposition(void)
 	return edata->internalpos;
 }

-

This looks like an unintended change in elog.c.

2)

+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);

The document doesn't mention the column "stats_reset".

3)

+typedef struct PgStat_StatSubErrEntry
+{
+	Oid			subrelid;		/* InvalidOid if the apply worker, otherwise
+								 * the table sync worker. hash table key. */

From the comment on subrelid, I think one subscription has only one apply
worker error entry, right? If so, could we move the apply worker's
error entry into PgStat_StatSubEntry? With that approach, we wouldn't need to
build an inner hash table when there are no table sync error entries.

4)
Is it possible to add some test cases that cover deletion of subscription error entries?

Best regards,
Hou zj

#154houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#150)
1 attachment(s)
RE: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Hi,

I reviewed the 0002 patch and have a suggestion for it.

+				if (IsSet(opts.specified_opts, SUBOPT_SYNCHRONOUS_COMMIT))
+				{
+					values[Anum_pg_subscription_subsynccommit - 1] =
+						CStringGetTextDatum("off");
+					replaces[Anum_pg_subscription_subsynccommit - 1] = true;
+				}

Currently, the patch sets the default value outside of parse_subscription_options(),
but I think it would be more conventional to set the value inside
parse_subscription_options(). For example:

			if (!is_reset)
			{
				...
+			}
+			else
+				opts->synchronous_commit = "off";

And then, we can set the value like:

values[Anum_pg_subscription_subsynccommit - 1] =
CStringGetTextDatum(opts.synchronous_commit);

Besides, instead of adding a switch case like the following:
+		case ALTER_SUBSCRIPTION_RESET_OPTIONS:
+			{

We can add a bool flag(isReset) in AlterSubscriptionStmt and check the flag
when invoking parse_subscription_options(). In this approach, the code can be
shorter.
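
As a rough standalone illustration of that idea (not the attached diff), the sketch below lets the parser fill in the default when is_reset is true, so the caller stores opts->synchronous_commit the same way for SET and RESET. SubOpts, the bit value of SUBOPT_SYNCHRONOUS_COMMIT, parse_synchronous_commit() and new_subsynccommit() are simplified, hypothetical stand-ins; "off" is the default used in the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUBOPT_SYNCHRONOUS_COMMIT 0x01  /* hypothetical bit value */

/* Simplified stand-in for the parsed subscription options. */
typedef struct SubOpts
{
    unsigned    specified_opts;     /* bitmap of options the user named */
    const char *synchronous_commit;
} SubOpts;

/*
 * Parse one option.  With is_reset the user supplies no value, so the
 * parser itself fills in the default.
 */
void
parse_synchronous_commit(SubOpts *opts, const char *value, bool is_reset)
{
    opts->specified_opts |= SUBOPT_SYNCHRONOUS_COMMIT;
    if (!is_reset)
        opts->synchronous_commit = value;
    else
        opts->synchronous_commit = "off";
}

/*
 * Caller side: the value to store is taken from opts for both SET and
 * RESET; the current value is kept when the option was not named at all.
 */
const char *
new_subsynccommit(const SubOpts *opts, const char *current)
{
    if (opts->specified_opts & SUBOPT_SYNCHRONOUS_COMMIT)
        return opts->synchronous_commit;
    return current;
}
```

With this shape, the knowledge of each default lives in one place, and the catalog-update code does not need a separate branch per option for the RESET case.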

Attached is a diff file based on v12-0002 that changes the code as
suggested above.

Best regards,
Hou zj

Attachments:

0001-diff-for-0002_patch (application/octet-stream)

#155Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#150)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Some initial comments for the v12-0003 patch:

(1) Patch comment
"This commit introduces another way to skip the transaction in question."

I think it should further explain: "This commit introduces another way
to skip the transaction in question, other than manually updating the
subscriber's database or using pg_replication_origin_advance()."

doc/src/sgml/logical-replication.sgml
(2)

Suggested minor update:

BEFORE:
+   transaction that conflicts with the existing data.  When a conflict produce
+   an error, it is shown in <structname>pg_stat_subscription_errors</structname>
+   view as follows:
AFTER:
+   transaction that conflicts with the existing data.  When a conflict produces
+   an error, it is recorded in the <structname>pg_stat_subscription_errors</structname>
+   view as follows:

(3)
+ found from those outputs (transaction ID 740 in the above case). The transaction

Shouldn't it be transaction ID 716?

(4)
+ can be skipped by setting <replaceable>skip_xid</replaceable> to the subscription

Is it better to say here ... "on the subscription" ?

(5)
Just skipping a transaction could make a subscriber inconsistent, right?

Would it be better as follows?

BEFORE:
+   In either way, those should be used as a last resort. They skip the whole
+   transaction including changes that may not violate any constraint and easily
+   make subscriber inconsistent if a user specifies the wrong transaction ID or
+   the position of origin.
AFTER:
+   Either way, those transaction skipping methods should be used as a last resort.
+   They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if a
+   user specifies the wrong transaction ID or the position of origin.

(6)
The grammar is not great in the following description, so here's a
suggested improvement:

BEFORE:
+          incoming change or by skipping the whole transaction.  This option
+          specifies transaction ID that logical replication worker skips to
+          apply.  The logical replication worker skips all data modification
AFTER:
+          incoming changes or by skipping the whole transaction.  This option
+          specifies the ID of the transaction whose application is to be skipped
+          by the logical replication worker.  The logical replication worker
+          skips all data modification

src/backend/postmaster/pgstat.c
(7)
BEFORE:
+ * Tell the collector about clear the error of subscription.
AFTER:
+ * Tell the collector to clear the subscription error.

src/backend/replication/logical/worker.c
(8)
+ * subscription is invalidated and* MySubscription->skipxid gets changed or reset.

There is a "*" after "and".

(9)
Do these lines really need to be moved up?

+ /* extract XID of the top-level transaction */
+ stream_xid = logicalrep_read_stream_start(s, &first_segment);
+

src/backend/postmaster/pgstat.c
(10)

+ bool m_clear; /* clear all fields except for last_failure and
+ * last_errmsg */

I think it should say: clear all fields except for last_failure,
last_errmsg and stat_reset_timestamp.

Regards,
Greg Nancarrow
Fujitsu Australia

#156Mark Dilger
Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Masahiko Sawada (#150)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

Thanks for these patches, Sawada-san!

The first patch in your series, v12-0001, seems useful to me even before committing any of the rest. I would like to integrate the new pg_stat_subscription_errors view it creates into regression tests for other logical replication features under development.

In particular, it can be hard to write TAP tests that need to wait for subscriptions to catch up or fail. With your view committed, a new PostgresNode function to wait for catchup or for failure can be added, and then developers of different projects can all use that. I am attaching a version of such a function, plus some tests of your patch (since it does not appear to have any). Would you mind reviewing these and giving comments or including them in your next patch version?

Attachments:

0001-Adding-tests-of-subscription-errors.patch (application/octet-stream)

#157Mark Dilger
Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Masahiko Sawada (#150)
Re: Skipping logical replication transactions on subscriber side

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

Here are some review comments:

For the v12-0002 patch:

The documentation changes for ALTER SUBSCRIPTION .. RESET look strange to me. You grouped SET and RESET together, much like sql-altertable.html has them grouped, but I don't think it flows naturally here, as the two commands do not support the same set of parameters. It might look better if you documented these separately. It might also be good to order the parameters the same, so that the differences can more quickly be seen.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likely that users will hurt themselves with this tool?

I am thinking back to support calls I have attended. When a production system is down, there is often some hesitancy to perform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole process done as quickly as possible. If multiple transactions on the publisher fail on the subscriber, they will do so in series, not in parallel. The process of clearing these errors will amount to copying the xid of each failed transaction to the ALTER SUBSCRIPTION ... SET (skip_xid = xxx) command and running it, then the next, then the next, .... Perhaps the first couple times through the process, the customer will look to see that the failure is of the same type and on the same table, but after a short time they will likely just script something to clear the rest as quickly as possible. In the heat of the moment, they may not include a check of the failure message, but merely a grep of the failing xid.

If the user could instead clear all failed transactions of the same type, that might make it less likely that they unthinkingly also skip subsequent errors of some different type. Perhaps something like ALTER SUBSCRIPTION ... SET (skip_failures = 'duplicate key value violates unique constraint "test_pkey"')? This is arguably a different feature request, and not something your patch is required to address, but I wonder how much we should limit people shooting themselves in the foot? If we built something like this using your skip_xid feature, rather than instead of your skip_xid feature, would your feature need to be modified?

The docs could use some rewording, too:

+          If incoming data violates any constraints the logical replication
+          will stop until it is resolved. 

In my experience, logical replication doesn't stop, but instead goes into an infinite loop of retries.

+          The resolution can be done either
+          by changing data on the subscriber so that it doesn't conflict with
+          incoming change or by skipping the whole transaction.

I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE INSERT trigger on the conflicting table which suppresses or redirects conflicting rows somewhere else. Particularly for larger transactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy than skipping the transaction. I think your patch adds a useful tool to the toolkit, but maybe we should mention more alternatives in the docs? Something like, "changing the data on the subscriber so that it doesn't conflict with incoming changes, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to suppress or redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"?

Perhaps I'm reading your phrase "changing the data on the subscriber" too narrowly. To me, that means running DML (either a DELETE or an UPDATE) on the existing data in the table where the conflict arises. These other options are DDL and do not easily come to mind when I read that phrase.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#158Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#150)
Re: Skipping logical replication transactions on subscriber side

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

BTW, these patches need rebasing (broken by recent commits, patches
0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).

Regards,
Greg Nancarrow
Fujitsu Australia

#159Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Mark Dilger (#157)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likely that users will hurt themselves with this tool?

This won't do any more harm than currently, users can do via
pg_replication_slot_advance and the same is documented as well, see
[1]. This will be allowed to only superusers. Its effect will be
documented with a precautionary note to use it only when the other
available ways can't be used. Any better ideas?

I am thinking back to support calls I have attended. When a production system is down, there is often some hesitancy to perform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole process done as quickly as possible. If multiple transactions on the publisher fail on the subscriber, they will do so in series, not in parallel.

The subscriber will know only one transaction failure at a time, till
that is resolved, the apply won't move ahead and it won't know even if
there are other transactions that are going to fail in the future.

If the user could instead clear all failed transactions of the same type, that might make it less likely that they unthinkingly also skip subsequent errors of some different type. Perhaps something like ALTER SUBSCRIPTION ... SET (skip_failures = 'duplicate key value violates unique constraint "test_pkey"')?

I think if we want we can allow to skip particular error via
skip_error_code instead of via error message but not sure if it would
be better to skip a particular operation of the transaction rather
than the entire transaction. Normally from the atomicity purpose the
transaction can be either committed or rolled-back but not partially
done so I think it would be preferable to skip the entire transaction
rather than skipping it partially.

This is arguably a different feature request, and not something your patch is required to address, but I wonder how much we should limit people shooting themselves in the foot? If we built something like this using your skip_xid feature, rather than instead of your skip_xid feature, would your feature need to be modified?

Sawada-San can answer better but I don't see any problem building any
such feature on top of what is currently proposed.

I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE INSERT trigger on the conflicting table which suppresses or redirects conflicting rows somewhere else. Particularly for larger transactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy than skipping the transaction. I think your patch adds a useful tool to the toolkit, but maybe we should mention more alternatives in the docs? Something like, "changing the data on the subscriber so that it doesn't conflict with incoming changes, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to suppress or redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"?

+1 for extending the docs as per this suggestion.

--
With Regards,
Amit Kapila.

#160Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#159)
Re: Skipping logical replication transactions on subscriber side

On Sat, Sep 4, 2021 at 8:54 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likely that users will hurt themselves with this tool?

This won't do any more harm than currently, users can do via
pg_replication_slot_advance and the same is documented as well, see
[1].

Sorry, forgot to give the link.

[1]: https://www.postgresql.org/docs/devel/logical-replication-conflicts.html

--
With Regards,
Amit Kapila.

#161Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#151)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 2, 2021 at 12:06 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

I have some initial feedback on the v12-0001 patch.
Most of these are suggested improvements to wording, and some typo fixes.

Thank you for the comments!

(0) Patch comment

Suggestion to improve the patch comment:

BEFORE:
Add pg_stat_subscription_errors statistics view.

This commits adds new system view pg_stat_logical_replication_error,

Oops, I realized that it should be pg_stat_subscription_errors.

showing errors happening during applying logical replication changes
as well as during performing initial table synchronization.

The subscription error entries are removed by autovacuum workers when
the table synchronization competed in table sync worker cases and when
dropping the subscription in apply worker cases.

It also adds SQL function pg_stat_reset_subscription_error() to
reset the single subscription error.

AFTER:
Add a subscription errors statistics view "pg_stat_subscription_errors".

This commits adds a new system view pg_stat_logical_replication_errors,
that records information about any errors which occur during application
of logical replication changes as well as during performing initial table
synchronization.

I think that views don't have any data so "show information" seems
appropriate to me here. Thoughts?

The subscription error entries are removed by autovacuum workers after
table synchronization completes in table sync worker cases and after
dropping the subscription in apply worker cases.

It also adds an SQL function pg_stat_reset_subscription_error() to
reset a single subscription error.

doc/src/sgml/monitoring.sgml:

(1)
BEFORE:
+      <entry>One row per error that happened on subscription, showing
information about
+       the subscription errors.
AFTER:
+      <entry>One row per error that occurred on subscription,
providing information about
+       each subscription error.

Fixed.

(2)
BEFORE:
+   The <structname>pg_stat_subscription_errors</structname> view will
contain one
AFTER:
+   The <structname>pg_stat_subscription_errors</structname> view contains one

I think that descriptions of other statistics view also say "XXX view
will contain ...".

(3)
BEFORE:
+        Name of the database in which the subscription is created.
AFTER:
+        Name of the database in which the subscription was created.

Fixed.

(4)
BEFORE:
+       OID of the relation that the worker is processing when the
+       error happened.
AFTER:
+       OID of the relation that the worker was processing when the
+       error occurred.

Fixed.

(5)
BEFORE:
+        Name of command being applied when the error happened.  This
+        field is always NULL if the error is reported by
+        <literal>tablesync</literal> worker.
AFTER:
+        Name of command being applied when the error occurred.  This
+        field is always NULL if the error is reported by a
+        <literal>tablesync</literal> worker.

Fixed.

(6)
BEFORE:
+        Transaction ID of publisher node being applied when the error
+        happened.  This field is always NULL if the error is reported
+        by <literal>tablesync</literal> worker.
AFTER:
+        Transaction ID of the publisher node being applied when the error
+        happened.  This field is always NULL if the error is reported
+        by a <literal>tablesync</literal> worker.

Fixed.

(7)
BEFORE:
+        Type of worker reported the error: <literal>apply</literal> or
+        <literal>tablesync</literal>.
AFTER:
+        Type of worker reporting the error: <literal>apply</literal> or
+        <literal>tablesync</literal>.

Fixed.

(8)
BEFORE:
+       Number of times error happened on the worker.
AFTER:
+       Number of times the error occurred in the worker.

[or "Number of times the worker reported the error" ?]

I prefer "Number of times the error occurred in the worker."

(9)
BEFORE:
+       Time at which the last error happened.
AFTER:
+       Time at which the last error occurred.

Fixed.

(10)
BEFORE:
+       Error message which is reported last failure time.
AFTER:
+       Error message which was reported at the last failure time.

Maybe this should just say "Last reported error message" ?

Fixed.

(11)
You shouldn't call hash_get_num_entries() on a NULL pointer.

Suggest swapping lines, as shown below:

BEFORE:
+ int32 nerrors = hash_get_num_entries(subent->suberrors);
+
+ /* Skip this subscription if not have any errors */
+ if (subent->suberrors == NULL)
+    continue;
AFTER:
+ int32 nerrors;
+
+ /* Skip this subscription if not have any errors */
+ if (subent->suberrors == NULL)
+    continue;
+ nerrors = hash_get_num_entries(subent->suberrors);

Right. Fixed.
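
To illustrate the ordering issue in (11), here is a minimal standalone sketch. DemoHash, DemoSubEntry, demo_hash_num_entries() and demo_count_errors() are hypothetical stand-ins invented for this example; they are not the patch's code or the dynahash API. The only point is that the NULL test has to come before the call that dereferences the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table; only the entry count matters here. */
typedef struct DemoHash
{
    long        nentries;
} DemoHash;

/* Hypothetical stand-in for a per-subscription stats entry. */
typedef struct DemoSubEntry
{
    DemoHash   *suberrors;      /* NULL when the subscription has no errors */
} DemoSubEntry;

/* Like a real hash-table accessor, this must not be handed a NULL pointer. */
long
demo_hash_num_entries(const DemoHash *h)
{
    assert(h != NULL);
    return h->nentries;
}

/* Sum the error counts, skipping subscriptions without an error table. */
long
demo_count_errors(const DemoSubEntry *subs, int nsubs)
{
    long        total = 0;

    for (int i = 0; i < nsubs; i++)
    {
        long        nerrors;

        /* Check for NULL first ... */
        if (subs[i].suberrors == NULL)
            continue;

        /* ... and only then ask the table for its size. */
        nerrors = demo_hash_num_entries(subs[i].suberrors);
        total += nerrors;
    }
    return total;
}
```

With the original ordering, the accessor would have been called with a NULL pointer for any subscription without an error table.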

(12)
Typo: legnth -> length

+ * contains the fixed-legnth error message string which is

Fixed.

src/backend/postmaster/pgstat.c

(13)
"Subscription stat entries" hashtable is created in two different
places, one with HASH_CONTEXT and the other without. Is this
intentional?
Shouldn't there be a single function for creating this?

Yes, it's intentional. It's consistent with hash tables for other statistics.

(14)
+ * PgStat_MsgSubscriptionPurge Sent by the autovacuum purge the subscriptions.

Seems to be missing a word, is it meant to say "Sent by the autovacuum
to purge the subscriptions." ?

Yes, fixed.

(15)
+ * PgStat_MsgSubscriptionErrPurge Sent by the autovacuum purge the subscription
+ * errors.

Seems to be missing a word, is it meant to say "Sent by the autovacuum
to purge the subscription errors." ?

Thanks, fixed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#162Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#152)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 2, 2021 at 2:55 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

I have a few comments on the v12-0002 patch:

Thank you for the comments!

(1) Patch comment

Has a typo and could be expressed a bit better.

Suggestion:

BEFORE:
RESET command is reuiqred by follow-up commit introducing to a new
parameter skip_xid to reset.
AFTER:
The RESET parameter for ALTER SUBSCRIPTION is required by the
follow-up commit that introduces a new resettable subscription
parameter "skip_xid".

Fixed.

doc/src/sgml/ref/alter_subscription.sgml

(2)
I don't think "RESET" is sufficiently described in
alter_subscription.sgml. Just putting it under "SET" and changing
"altered" to "set" doesn't explain what resetting does. It should say
something about setting the parameter back to its original (default)
value.

Doesn't "RESET" normally mean to change the parameter back to its default value?

(3)
case ALTER_SUBSCRIPTION_RESET_OPTIONS

Some comments here would be helpful e.g. Reset the specified
parameters back to their default values.

Okay, added.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#163Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#155)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 2, 2021 at 9:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Thank you for the comments!

Some initial comments for the v12-0003 patch:

(1) Patch comment
"This commit introduces another way to skip the transaction in question."

I think it should further explain: "This commit introduces another way
to skip the transaction in question, other than manually updating the
subscriber's database or using pg_replication_origin_advance()."

Updated.

doc/src/sgml/logical-replication.sgml
(2)

Suggested minor update:

BEFORE:
+   transaction that conflicts with the existing data.  When a conflict produce
+   an error, it is shown in
<structname>pg_stat_subscription_errors</structname>
+   view as follows:
AFTER:
+   transaction that conflicts with the existing data.  When a conflict produces
+   an error, it is recorded in the
<structname>pg_stat_subscription_errors</structname>
+   view as follows:

Fixed.

(3)
+ found from those outputs (transaction ID 740 in the above case).
The transaction

Shouldn't it be transaction ID 716?

Right, fixed.

(4)
+ can be skipped by setting <replaceable>skip_xid</replaceable> to
the subscription

Is it better to say here ... "on the subscription" ?

Okay, fixed.

(5)
Just skipping a transaction could make a subscriber inconsistent, right?

Would it be better as follows?

BEFORE:
+   In either way, those should be used as a last resort. They skip the whole
+   transaction including changes that may not violate any constraint and easily
+   make subscriber inconsistent if a user specifies the wrong transaction ID or
+   the position of origin.
AFTER:
+   Either way, those transaction skipping methods should be used as a
last resort.
+   They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent,
especially if a
+   user specifies the wrong transaction ID or the position of origin.

Agreed, fixed.

(6)
The grammar is not great in the following description, so here's a
suggested improvement:

BEFORE:
+          incoming change or by skipping the whole transaction.  This option
+          specifies transaction ID that logical replication worker skips to
+          apply.  The logical replication worker skips all data modification
AFTER:
+          incoming changes or by skipping the whole transaction.  This option
+          specifies the ID of the transaction whose application is to
be skipped
+          by the logical replication worker.  The logical replication worker
+          skips all data modification

Fixed.

src/backend/postmaster/pgstat.c
(7)
BEFORE:
+ * Tell the collector about clear the error of subscription.
AFTER:
+ * Tell the collector to clear the subscription error.

Fixed.

src/backend/replication/logical/worker.c
(8)
+ * subscription is invalidated and* MySubscription->skipxid gets
changed or reset.

There is a "*" after "and".

Fixed.

(9)
Do these lines really need to be moved up?

+ /* extract XID of the top-level transaction */
+ stream_xid = logicalrep_read_stream_start(s, &first_segment);
+

I had forgotten to revert this change; fixed.

src/backend/postmaster/pgstat.c
(10)

+ bool m_clear; /* clear all fields except for last_failure and
+ * last_errmsg */

I think it should say: clear all fields except for last_failure,
last_errmsg and stat_reset_timestamp.

Fixed.

Those comments including your comments on the v12-0001 and v12-0002
are incorporated into local branch. I'll submit the updated patches
after incorporating all other comments.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#164Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#153)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 2, 2021 at 5:41 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this patch. It's
borrowed from another thread[1] to fix the assertion failure for newly added
tests. Please review them.

Hi,

I reviewed the v12-0001 patch, here are some comments:

Thank you for the comments!

1)
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1441,7 +1441,6 @@ getinternalerrposition(void)
return edata->internalpos;
}

-

This seems to be an unintended change in elog.c.

Fixed.

2)

+       TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+                                          TIMESTAMPTZOID, -1, 0);

The document doesn't mention the column "stats_reset".

Added.

3)

+typedef struct PgStat_StatSubErrEntry
+{
+       Oid                     subrelid;               /* InvalidOid if the apply worker, otherwise
+                                                                * the table sync worker. hash table key. */

From the comment on subrelid, I think one subscription has only one apply
worker error entry, right? If so, could we move the apply error entry
into PgStat_StatSubEntry? With that approach, we don't need to build an
inner hash table when there is no table sync error entry.

I wanted to avoid having unnecessary error entry fields when there is
no apply worker error but there is a table sync worker error. But
after more thought, the apply worker is more likely to raise an error
than table sync workers. So it might be better to have both a
PgStat_StatSubErrEntry for the apply worker error and a hash table for
table sync worker errors in PgStat_StatSubEntry.

4)
Is it possible to add some test cases that check deletion of the subscription error entry?

Do you mean the tests checking if subscription error entry is deleted
after DROP SUBSCRIPTION?

Those comments are incorporated into local branches. I'll submit the
updated patches after incorporating other comments.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#165Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#154)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 2, 2021 at 8:37 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Hi,

I reviewed the 0002 patch and have a suggestion for it.

+                               if (IsSet(opts.specified_opts, SUBOPT_SYNCHRONOUS_COMMIT))
+                               {
+                                       values[Anum_pg_subscription_subsynccommit - 1] =
+                                               CStringGetTextDatum("off");
+                                       replaces[Anum_pg_subscription_subsynccommit - 1] = true;
+                               }

Currently, the patch sets the default value outside of
parse_subscription_options(), but I think it might be more standard to
set the value inside parse_subscription_options(). Like:

if (!is_reset)
{
...
+                       }
+                       else
+                               opts->synchronous_commit = "off";

And then, we can set the value like:

values[Anum_pg_subscription_subsynccommit - 1] =
CStringGetTextDatum(opts.synchronous_commit);

You're right. Fixed.

Besides, instead of adding a switch case like the following:
+               case ALTER_SUBSCRIPTION_RESET_OPTIONS:
+                       {

We can add a bool flag (isReset) in AlterSubscriptionStmt and check the flag
when invoking parse_subscription_options(). With this approach, the code can be
shorter.

I've attached a diff file based on v12-0002 which changes the code as
suggested above.

Thank you for the patch!

@@ -3672,11 +3671,12 @@ typedef enum AlterSubscriptionType
 typedef struct AlterSubscriptionStmt
 {
        NodeTag         type;
-       AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_SET_OPTIONS, etc */
+       AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */
        char       *subname;            /* Name of the subscription */
        char       *conninfo;           /* Connection string to publisher */
        List       *publication;        /* One or more publication to subscribe to */
        List       *options;            /* List of DefElem nodes */
+       bool            isReset;                /* true if RESET option */
 } AlterSubscriptionStmt;

It seems unnatural to me that AlterSubscriptionStmt has an isReset flag
even though it already has 'kind' indicating the command. How about
having the RESET command use the same logic as SET, as you suggested,
while keeping both ALTER_SUBSCRIPTION_SET_OPTIONS and
ALTER_SUBSCRIPTION_RESET_OPTIONS?
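
A rough sketch of that shape, under stated assumptions: the Demo* types and demo_* functions below are hypothetical and only mirror the idea discussed here (two distinct 'kind' values sharing one parsing path, with the default filled in by the parser on RESET). It is not the patch's code, and the "off" default is simply the value used in the hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command kinds, standing in for AlterSubscriptionType. */
typedef enum DemoAlterSubKind
{
    DEMO_ALTER_SUBSCRIPTION_SET_OPTIONS,
    DEMO_ALTER_SUBSCRIPTION_RESET_OPTIONS
} DemoAlterSubKind;

/* Hypothetical parsed options; only one option is modelled. */
typedef struct DemoSubOpts
{
    const char *synchronous_commit;
} DemoSubOpts;

/*
 * Shared parser: on SET it takes the user-supplied value, on RESET it
 * fills in the default, so callers never hard-code the default themselves.
 */
void
demo_parse_subscription_options(const char *value, bool is_reset,
                                DemoSubOpts *opts)
{
    assert(opts != NULL);

    if (!is_reset)
        opts->synchronous_commit = value;
    else
        opts->synchronous_commit = "off";
}

/* Both kinds share one code path; 'kind' alone decides whether to reset. */
const char *
demo_alter_subscription(DemoAlterSubKind kind, const char *value)
{
    DemoSubOpts opts;

    switch (kind)
    {
        case DEMO_ALTER_SUBSCRIPTION_SET_OPTIONS:
        case DEMO_ALTER_SUBSCRIPTION_RESET_OPTIONS:
            demo_parse_subscription_options(
                value,
                kind == DEMO_ALTER_SUBSCRIPTION_RESET_OPTIONS,
                &opts);
            return opts.synchronous_commit;
    }
    return NULL;
}
```

In this arrangement no separate isReset field is needed in the statement node, because the reset flag is derived from 'kind' at the call site.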

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#166Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#158)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 3, 2021 at 3:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

BTW, these patches need rebasing (broken by recent commits, patches
0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).

Thanks! I'll submit the updated patches early this week.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#167houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#165)
RE: Skipping logical replication transactions on subscriber side

From Sun, Sep 5, 2021 9:58 PM Masahiko Sawada <sawada.mshk@gmail.com>:

On Thu, Sep 2, 2021 at 8:37 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:

From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

Hi,

I reviewed the 0002 patch and have a suggestion for it.

@@ -3672,11 +3671,12 @@ typedef enum AlterSubscriptionType
 typedef struct AlterSubscriptionStmt
 {
        NodeTag         type;
-       AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_SET_OPTIONS, etc */
+       AlterSubscriptionType kind; /* ALTER_SUBSCRIPTION_OPTIONS, etc */
        char       *subname;            /* Name of the subscription */
        char       *conninfo;           /* Connection string to publisher */
        List       *publication;        /* One or more publication to subscribe to */
        List       *options;            /* List of DefElem nodes */
+       bool            isReset;                /* true if RESET option */
 } AlterSubscriptionStmt;

It seems unnatural to me that AlterSubscriptionStmt has an isReset flag
even though it already has 'kind' indicating the command. How about
having the RESET command use the same logic as SET, as you suggested,
while keeping both ALTER_SUBSCRIPTION_SET_OPTIONS and
ALTER_SUBSCRIPTION_RESET_OPTIONS?

Yes, I agree with you it will look more natural with ALTER_SUBSCRIPTION_RESET_OPTIONS.

Best regards,
Hou zj

#168Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#159)
Re: Skipping logical replication transactions on subscriber side

On Sat, Sep 4, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 3, 2021 at 2:15 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful foot-gun. Can we do anything to make it less likely that users will hurt themselves with this tool?

This won't do any more harm than currently, users can do via
pg_replication_slot_advance and the same is documented as well, see
[1]. This will be allowed to only superusers. Its effect will be
documented with a precautionary note to use it only when the other
available ways can't be used.

Right.

I am thinking back to support calls I have attended. When a production system is down, there is often some hesitancy to perform ad-hoc operations on the database, but once the decision has been made to do so, people try to get the whole process done as quickly as possible. If multiple transactions on the publisher fail on the subscriber, they will do so in series, not in parallel.

The subscriber will know only one transaction failure at a time, till
that is resolved, the apply won't move ahead and it won't know even if
there are other transactions that are going to fail in the future.

If the user could instead clear all failed transactions of the same type, that might make it less likely that they unthinkingly also skip subsequent errors of some different type. Perhaps something like ALTER SUBSCRIPTION ... SET (skip_failures = 'duplicate key value violates unique constraint "test_pkey"')?

I think if we want we can allow to skip particular error via
skip_error_code instead of via error message but not sure if it would
be better to skip a particular operation of the transaction rather
than the entire transaction. Normally from the atomicity purpose the
transaction can be either committed or rolled-back but not partially
done so I think it would be preferable to skip the entire transaction
rather than skipping it partially.

I think the suggestion by Mark is to skip the entire transaction if
the kind of error matches the specified error.

I think my proposed feature is meant to be a tool for situations where
something that should not happen has happened, rather than a conflict
resolution mechanism. If users fall into a difficult situation where
they need to skip a lot of transactions with this skip_xid feature,
they should rebuild the logical replication from scratch. It seems to
me that skipping all transactions that failed due to the same type of
failure is problematic, for example, if the user forgets to reset it.
If we want to skip the particular operation that failed due to the
specified error, we should have a proper conflict resolution feature
that can handle various types of conflicts with various resolution
methods, like other RDBMSs support.

This is arguably a different feature request, and not something your patch is required to address, but I wonder how much we should limit people shooting themselves in the foot? If we built something like this using your skip_xid feature, rather than instead of your skip_xid feature, would your feature need to be modified?

Sawada-San can answer better but I don't see any problem building any
such feature on top of what is currently proposed.

If the feature you proposed is to skip the entire transaction, I also
don't see any problem building the feature on top of my patch. The
patch adds the mechanism to skip the entire transaction so what we
need to do for that feature is to extend how to trigger the skipping
behavior.

I'm having trouble thinking of an example conflict where skipping a transaction would be better than writing a BEFORE INSERT trigger on the conflicting table which suppresses or redirects conflicting rows somewhere else. Particularly for larger transactions containing multiple statements, suppressing the conflicting rows using a trigger would be less messy than skipping the transaction. I think your patch adds a useful tool to the toolkit, but maybe we should mention more alternatives in the docs? Something like, "changing the data on the subscriber so that it doesn't conflict with incoming changes, or dropping the conflicting constraint or unique index, or writing a trigger on the subscriber to suppress or redirect conflicting incoming changes, or as a last resort, by skipping the whole transaction"?

+1 for extending the docs as per this suggestion.

Agreed. I'll add such description to the doc.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#169Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#166)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sun, Sep 5, 2021 at 10:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 3, 2021 at 3:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches. 0004 patch is not the scope of this
patch. It's borrowed from another thread[1] to fix the assertion
failure for newly added tests. Please review them.

BTW, these patches need rebasing (broken by recent commits, patches
0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).

Thanks! I'll submit the updated patches early this week.

Sorry for the late response. I've attached the updated patches that
incorporate all comments unless I missed something. Please review
them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v13-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch
v13-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch
v13-0001-Add-pg_stat_subscription_errors-statistics-view.patch

#170Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#169)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 10, 2021 at 12:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the late response. I've attached the updated patches that
incorporate all comments unless I missed something. Please review
them.

Here's some review comments for the v13-0001 patch:

doc/src/sgml/monitoring.sgml

(1)
There's an extra space in the following line, before "processing":

+ OID of the relation that the worker was processing when the

(2) Suggested wording update:
BEFORE:
+        field is always NULL if the error is reported by
AFTER:
+        field is always NULL if the error is reported by the

(3) Suggested wording update:
BEFORE:
+        by <literal>tablesync</literal> worker.
AFTER:
+        by the <literal>tablesync</literal> worker.

(4)
Missing "." at end of following description (inconsistent with other doc):

+ Time at which these statistics were last reset

(5) Suggested wording update:
BEFORE:
+         can be granted EXECUTE to run the function.
AFTER:
+         can be granted EXECUTE privilege to run the function.

src/backend/postmaster/pgstat.c

(6) Suggested wording update:
BEFORE:
+ * for this relation already completes or the table is no
AFTER:
+ * for this relation already completed or the table is no

(7)
In the code below, since errmsg.m_nentries only ever gets incremented
by the 1st IF condition, it's probably best to include the 2nd IF
block within the 1st IF condition. Then we can avoid checking
"errmsg.m_nentries" on each loop iteration.

+ if (hash_search(not_ready_rels_htab, (void *) &(errent->relid),
+ HASH_FIND, NULL) == NULL)
+ errmsg.m_relids[errmsg.m_nentries++] = errent->relid;
+
+ /*
+ * If the message is full, send it out and reinitialize to
+ * empty
+ */
+ if (errmsg.m_nentries >= PGSTAT_NUM_SUBSCRIPTIONERRPURGE)
+ {
+ len = offsetof(PgStat_MsgSubscriptionErrPurge, m_relids[0])
+ + errmsg.m_nentries * sizeof(Oid);
+
+ pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
+ pgstat_send(&errmsg, len);
+ errmsg.m_nentries = 0;
+ }

(8)
+ * Tell the collector about reset the subscription error.

Is this meant to say "Tell the collector to reset the subscription error." ?

(9)
I think the following:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg);

should be:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg) + 1;

to account for the \0 terminator.

(10)
I don't think that using the following Assert is really correct here,
because PgStat_MsgSubscriptionErr is not set up to have the maximum
number of m_errmsg[] entries to fill up to PGSTAT_MAX_MSG_SIZE (as are
some of the other pgstat structs):

+ Assert(len < PGSTAT_MAX_MSG_SIZE);

(the max size of all of the pgstat structs is statically asserted anyway)

It would be correct to do the following instead:

+ Assert(strlen(errmsg) < PGSTAT_SUBSCRIPTIONERR_MSGLEN);

The overflow is guarded by the strlcpy() in any case.

(11)
Would be better to write:

+ rc = fwrite(&nerrors, sizeof(nerrors), 1, fpout);

instead of:

+ rc = fwrite(&nerrors, sizeof(int32), 1, fpout);

(12)
Would be better to write:

+ if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))

instead of:

+ if (fread(&nerrors, 1, sizeof(int32), fpin) != sizeof(int32))

src/include/pgstat.h

(13)
BEFORE:
+ * update/reset the error happening during logical
AFTER:
+ * update/reset the error occurring during logical

(14)
Typo: replicatoin -> replication

+ * an error that occurred during application of logical replicatoin or

(15) Suggested wording update:
BEFORE:
+ * there is no table sync error, where is the common case in practice.
AFTER:
+ * there is no table sync error, which is the common case in practice.

Regards,
Greg Nancarrow
Fujitsu Australia

#171Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#169)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 10, 2021 at 12:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the late response. I've attached the updated patches that
incorporate all comments unless I missed something. Please review
them.

A few review comments for the v13-0002 patch:

(1)
I suggest a small update to the patch comment:

BEFORE:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters. The parameters that can be set are streaming, binary,
synchronous_commit.
AFTER:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters to their default value. The parameters that can be reset
are streaming, binary, and synchronous_commit.

(2)
In the documentation, the RESETable parameters should be listed in the
same way and order as for SET:

BEFORE:
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
+     </para>
AFTER:
+     <para>
+       The parameters that can be reset are
<literal>synchronous_commit</literal>,
+       <literal>binary</literal>, and <literal>streaming</literal>.
+      </para>

Also I'm thinking it would be beneficial to say before this:

RESET is used to set parameters back to their default value.

(3)
I notice that if you try to reset the slot_name, you get the following message:

postgres=# alter subscription sub reset (slot_name);
ERROR: unrecognized subscription parameter: "slot_name"

This is a bit misleading, because slot_name IS a subscription
parameter, just not resettable.
It would be better if it said something like: ERROR: not a resettable
subscription parameter: "slot_name"

However, it seems that this is also an existing issue with SET (e.g.
for "refresh" or "two_phase")
postgres=# alter subscription sub set (refresh=true);
ERROR: unrecognized subscription parameter: "refresh"

Regards,
Greg Nancarrow
Fujitsu Australia

#172Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#170)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 10, 2021 at 8:46 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Fri, Sep 10, 2021 at 12:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the late response. I've attached the updated patches that
incorporate all comments unless I missed something. Please review
them.

Here are some review comments for the v13-0001 patch:

Thank you for the comments!

doc/src/sgml/monitoring.sgml

(1)
There's an extra space in the following line, before "processing":

+ OID of the relation that the worker was processing when the

Fixed.

(2) Suggested wording update:
BEFORE:
+        field is always NULL if the error is reported by
AFTER:
+        field is always NULL if the error is reported by the

Fixed.

(3) Suggested wording update:
BEFORE:
+        by <literal>tablesync</literal> worker.
AFTER:
+        by the <literal>tablesync</literal> worker.

Fixed.

(4)
Missing "." at end of following description (inconsistent with other doc):

+ Time at which these statistics were last reset

Fixed.

(5) Suggested wording update:
BEFORE:
+         can be granted EXECUTE to run the function.
AFTER:
+         can be granted EXECUTE privilege to run the function.

Descriptions of other stats reset functions don't use "EXECUTE
privilege", so I think it'd be better to leave it as is for consistency.

src/backend/postmaster/pgstat.c

(6) Suggested wording update:
BEFORE:
+ * for this relation already completes or the table is no
AFTER:
+ * for this relation already completed or the table is no

Fixed.

(7)
In the code below, since errmsg.m_nentries only ever gets incremented
by the 1st IF condition, it's probably best to include the 2nd IF
block within the 1st IF condition. Then we can avoid checking
"errmsg.m_nentries" on each loop iteration.

+ if (hash_search(not_ready_rels_htab, (void *) &(errent->relid),
+ HASH_FIND, NULL) == NULL)
+ errmsg.m_relids[errmsg.m_nentries++] = errent->relid;
+
+ /*
+ * If the message is full, send it out and reinitialize to
+ * empty
+ */
+ if (errmsg.m_nentries >= PGSTAT_NUM_SUBSCRIPTIONERRPURGE)
+ {
+ len = offsetof(PgStat_MsgSubscriptionErrPurge, m_relids[0])
+ + errmsg.m_nentries * sizeof(Oid);
+
+ pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
+ pgstat_send(&errmsg, len);
+ errmsg.m_nentries = 0;
+ }

Agreed. Instead of including the 2nd if block within the 1st if block,
I changed the 1st if condition to check the opposite condition and
continue the loop if it's true (i.e., the table is still under table
synchronization).
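
As a rough illustration of the loop shape described above (a sketch only, not the patch code), the following C fragment skips entries that are still being synchronized with continue, so the "message full" check runs only right after an entry has been appended. Batch, BATCH_MAX, send_batch() and is_not_ready() are hypothetical stand-ins for the pgstat message struct, PGSTAT_NUM_SUBSCRIPTIONERRPURGE, pgstat_send() and the not-ready-relations lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 4		/* hypothetical stand-in for PGSTAT_NUM_SUBSCRIPTIONERRPURGE */

typedef struct Batch
{
	int			nentries;
	unsigned	relids[BATCH_MAX];
} Batch;

/* Hypothetical sink: records how much was "sent" instead of calling pgstat_send(). */
static int	batches_sent = 0;
static int	ids_sent = 0;

static void
send_batch(Batch *b)
{
	assert(b->nentries > 0 && b->nentries <= BATCH_MAX);
	batches_sent++;
	ids_sent += b->nentries;
	b->nentries = 0;			/* reinitialize to empty */
}

/* Linear lookup standing in for the not-ready-relations check. */
static bool
is_not_ready(unsigned relid, const unsigned *not_ready, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (not_ready[i] == relid)
			return true;
	return false;
}

static void
purge_errors(const unsigned *err_relids, size_t nerr,
			 const unsigned *not_ready, size_t nnot_ready)
{
	Batch		b = {0};

	for (size_t i = 0; i < nerr; i++)
	{
		/* Still under table synchronization: keep the entry and move on. */
		if (is_not_ready(err_relids[i], not_ready, nnot_ready))
			continue;

		b.relids[b.nentries++] = err_relids[i];

		/* Reached only after an append, so the check is never wasted. */
		if (b.nentries >= BATCH_MAX)
			send_batch(&b);
	}

	/* Flush whatever is left over. */
	if (b.nentries > 0)
		send_batch(&b);
}
```

Nesting the second if inside the first, as originally suggested in (7), would behave the same way; the continue form just keeps the indentation flat.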

(8)
+ * Tell the collector about reset the subscription error.

Is this meant to say "Tell the collector to reset the subscription error." ?

Yes, fixed.

(9)
I think the following:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg);

should be:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg) + 1;

to account for the \0 terminator.

Fixed.

(10)
I don't think that using the following Assert is really correct here,
because PgStat_MsgSubscriptionErr is not set up to have the maximum
number of m_errmsg[] entries to fill up to PGSTAT_MAX_MSG_SIZE (as are
some of the other pgstat structs):

+ Assert(len < PGSTAT_MAX_MSG_SIZE);

(the max size of all of the pgstat structs is statically asserted anyway)

It would be correct to do the following instead:

+ Assert(strlen(errmsg) < PGSTAT_SUBSCRIPTIONERR_MSGLEN);

The overflow is guarded by the strlcpy() in any case.

Agreed. Fixed.
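
To make points (9) and (10) concrete, here is a minimal C sketch of sizing a message whose last member is a fixed-size text buffer. ErrMsg, ERR_MSGLEN and fill_err_msg() are made-up names for illustration; the real struct and constant are PgStat_MsgSubscriptionErr and PGSTAT_SUBSCRIPTIONERR_MSGLEN, and the patch uses strlcpy() rather than the memcpy() shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ERR_MSGLEN 256	/* hypothetical stand-in for PGSTAT_SUBSCRIPTIONERR_MSGLEN */

typedef struct ErrMsg
{
	int			hdr;					/* stand-in for the message header */
	unsigned	subid;
	char		errmsg[ERR_MSGLEN];		/* must remain the last member */
} ErrMsg;

/* Fills the message and returns the number of bytes that need to be sent. */
static size_t
fill_err_msg(ErrMsg *msg, unsigned subid, const char *text)
{
	size_t		textlen = strlen(text);

	/* Bound the string itself rather than the total message size. */
	assert(textlen < ERR_MSGLEN);

	/* Truncate in non-assert builds so the copy can never overflow. */
	if (textlen >= ERR_MSGLEN)
		textlen = ERR_MSGLEN - 1;

	memset(msg, 0, sizeof(*msg));
	msg->subid = subid;
	memcpy(msg->errmsg, text, textlen);
	msg->errmsg[textlen] = '\0';

	/* Fixed part + string bytes + 1 for the '\0' terminator. */
	return offsetof(ErrMsg, errmsg) + textlen + 1;
}
```

Without the "+ 1", the receiver would get the string bytes but not the terminator, and would have to rely on the buffer having been zeroed on its side.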

(11)
Would be better to write:

+ rc = fwrite(&nerrors, sizeof(nerrors), 1, fpout);

instead of:

+ rc = fwrite(&nerrors, sizeof(int32), 1, fpout);

(12)
Would be better to write:

+ if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))

instead of:

+ if (fread(&nerrors, 1, sizeof(int32), fpin) != sizeof(int32))

Agreed.
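
For reference, a small self-contained C sketch of the sizeof(variable) style from (11) and (12). It also shows why the two checks compare against different values: with an item count of 1, fwrite() returns 1 on success, while with an item size of 1, fread() returns the number of bytes read. write_count() and read_count() are hypothetical helper names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Write the counter as one item of sizeof(nerrors) bytes. */
static bool
write_count(FILE *fp, int32_t nerrors)
{
	/* fwrite() returns the number of items written: 1 on success. */
	return fwrite(&nerrors, sizeof(nerrors), 1, fp) == 1;
}

/* Read the counter back as sizeof(*nerrors) items of one byte each. */
static bool
read_count(FILE *fp, int32_t *nerrors)
{
	assert(nerrors != NULL);

	/* With an item size of 1, fread() returns the number of bytes read. */
	return fread(nerrors, 1, sizeof(*nerrors), fp) == sizeof(*nerrors);
}
```

Tying the size to the variable means the calls stay correct if the type of the counter is ever changed.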

src/include/pgstat.h

(13)
BEFORE:
+ * update/reset the error happening during logical
AFTER:
+ * update/reset the error occurring during logical

Fixed.

(14)
Typo: replicatoin -> replication

+ * an error that occurred during application of logical replicatoin or

Fixed.

(15) Suggested wording update:
BEFORE:
+ * there is no table sync error, where is the common case in practice.
AFTER:
+ * there is no table sync error, which is the common case in practice.

Fixed.

I'll submit the updated patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#173houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#169)
RE: Skipping logical replication transactions on subscriber side

From Thur, Sep 9, 2021 10:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the late response. I've attached the updated patches that incorporate
all comments unless I missed something. Please review them.

Thanks for the new version patches.
Here are some comments for the v13-0001 patch.

1)

+					pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
+					pgstat_send(&errmsg, len);
+					errmsg.m_nentries = 0;
+				}

It seems we can invoke pgstat_setheader once before the loop like the
following:

+			errmsg.m_nentries = 0;
+			errmsg.m_subid = subent->subid;
+			pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
2)
+					pgstat_setheader(&submsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
+					pgstat_send(&submsg, len);
Same as 1), we can invoke pgstat_setheader once before the loop like:
+		submsg.m_nentries = 0;
+		pgstat_setheader(&submsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);

3)

+/* ----------
+ * PgStat_MsgSubscriptionErrPurge	Sent by the autovacuum to purge the subscription
+ *									errors.

The comment says it's sent by autovacuum; would a manual vacuum also send
this message?

4)
+
+	pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_reset) + sizeof(bool));
+}

Would it look cleaner to use the offset of m_relid here, like the following?

pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_relid));

Best regards,
Hou zj

#174Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#173)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

Sorry for the late reply. I was on vacation.

On Tue, Sep 14, 2021 at 11:27 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

From Thur, Sep 9, 2021 10:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Sorry for the late response. I've attached the updated patches that incorporate
all comments unless I missed something. Please review them.

Thanks for the new version patches.
Here are some comments for the v13-0001 patch.

Thank you for the comments!

1)

+                                       pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
+                                       pgstat_send(&errmsg, len);
+                                       errmsg.m_nentries = 0;
+                               }

It seems we can invoke pgstat_setheader once before the loop like the
following:

+                       errmsg.m_nentries = 0;
+                       errmsg.m_subid = subent->subid;
+                       pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
2)
+                                       pgstat_setheader(&submsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
+                                       pgstat_send(&submsg, len);
Same as 1), we can invoke pgstat_setheader once before the loop like:
+               submsg.m_nentries = 0;
+               pgstat_setheader(&submsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);

But if we do that, we set the header even if there is no message to
send, right? Looking at other similar code in pgstat_vacuum_stat(), we
set the header just before sending the message. So I'd like to leave
them since it's cleaner.

3)

+/* ----------
+ * PgStat_MsgSubscriptionErrPurge      Sent by the autovacuum to purge the subscription
+ *                                                                     errors.

The comment says it's sent by autovacuum; would a manual vacuum also send
this message?

Right. Fixed.

4)
+
+       pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_reset) + sizeof(bool));
+}

Would it look cleaner to use the offset of m_relid here, like the following?

pgstat_send(&msg, offsetof(PgStat_MsgSubscriptionErr, m_relid));

Thank you for the suggestion. After more thought, it was a bit odd to
use PgStat_MsgSubscriptionErr to both report and reset the stats by
sending either part of the struct or the full struct. So in the latest
version, I've added a new message struct type to reset the subscription
error statistics.

I've attached the updated version patches. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v14-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v14-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v14-0001-Add-pg_stat_subscription_errors-statistics-view.patch (application/octet-stream)
#175Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Mark Dilger (#156)
Re: Skipping logical replication transactions on subscriber side

Hi,

On Fri, Sep 3, 2021 at 4:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

On Aug 30, 2021, at 12:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached rebased patches.

Thanks for these patches, Sawada-san!

Sorry for the very late response.

Thank you for the suggestions and the patch!

The first patch in your series, v12-0001, seems useful to me even before committing any of the rest. I would like to integrate the new pg_stat_subscription_errors view it creates into regression tests for other logical replication features under development.

In particular, it can be hard to write TAP tests that need to wait for subscriptions to catch up or fail. With your view committed, a new PostgresNode function to wait for catchup or for failure can be added, and then developers of different projects can all use that.

I like the idea of creating a common function that waits for the
subscription to be ready (i.e., all relations are either in 'r' or 's'
state). There are many places where we wait for all subscription
relations to be ready in existing TAP tests. We would be able to
replace that code with the function. But I'm not sure that it's
useful to have a function that waits for the subscriptions to either
be ready or raise an error. In tap tests, I think that if we wait for
the subscription to raise an error, we should wait only for the error
but not for the subscription to be ready. Thoughts?

I am attaching a version of such a function, plus some tests of your patch (since it does not appear to have any). Would you mind reviewing these and giving comments or including them in your next patch version?

I've looked at the patch and here are some comments:

+
+-- no errors should be reported
+SELECT * FROM pg_stat_subscription_errors;
+
+
+-- Test that the subscription errors view exists, and has the right columns
+-- If we expected any rows to exist, we would need to filter out unstable
+-- columns.  But since there should be no errors, we just select them all.
+select * from pg_stat_subscription_errors;

The patch adds checks of pg_stat_subscription_errors to test that the
subscription doesn't have any errors. But since the subscription
errors are updated asynchronously, we cannot say the subscription is
working fine by checking the view only once.

---
The TAP tests newly added by 025_errors.pl have two subscribers raise
a table sync error, which seems very similar to the tests that
024_skip_xact.pl adds. So I'm not sure we need this as a separate
TAP test file.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#176Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#174)
Re: Skipping logical replication transactions on subscriber side

On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

Some comments on the v14-0001 patch:

(1)
Patch comment

The existing patch comment doesn't read well. I suggest the following updates:

BEFORE:
Add pg_stat_subscription_errors statistics view.

This commits adds new system view pg_stat_logical_replication_error,
showing errors happening during applying logical replication changes
as well as during performing initial table synchronization.

The subscription error entries are removed by autovacuum workers when
the table synchronization competed in table sync worker cases and when
dropping the subscription in apply worker cases.

It also adds SQL function pg_stat_reset_subscription_error() to
reset the single subscription error.

AFTER:
Add a subscription errors statistics view "pg_stat_subscription_errors".

This commit adds a new system view pg_stat_logical_replication_errors,
that shows information about any errors which occur during application
of logical replication changes as well as during performing initial table
synchronization.

The subscription error entries are removed by autovacuum workers after
table synchronization completes in table sync worker cases and after
dropping the subscription in apply worker cases.

It also adds an SQL function pg_stat_reset_subscription_error() to
reset a single subscription error.

src/backend/postmaster/pgstat.c
(2)
In pgstat_read_db_statsfile_timestamp(), you've added the following
code for case 'S':

+ case 'S':
+ {
+ PgStat_StatSubEntry subbuf;
+ PgStat_StatSubErrEntry errbuf;
+ int32 nerrors;
+
+ if (fread(&subbuf, 1, sizeof(PgStat_StatSubEntry), fpin)
+ != sizeof(PgStat_StatSubEntry))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ FreeFile(fpin);
+ return false;
+ }
+
+ if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ goto done;
+ }
+
+ for (int i = 0; i < nerrors; i++)
+ {
+ if (fread(&errbuf, 1, sizeof(PgStat_StatSubErrEntry), fpin) !=
+ sizeof(PgStat_StatSubErrEntry))
+ {
+ ereport(pgStatRunningInCollector ? LOG : WARNING,
+ (errmsg("corrupted statistics file \"%s\"",
+ statfile)));
+ goto done;
+ }
+ }
+ }
+
+ break;
+

Why in the 2nd and 3rd instances of calling fread() and detecting a
corrupted statistics file, does it:
goto done;
instead of:
FreeFile(fpin);
return false;

?
(so ends up returning true for these instances)

It looks like a mistake, but if it's intentional then comments need to
be added to explain it.
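
The concern in (2) can be shown with a simplified reader, sketched below under the assumption that the "done" label is the normal success exit. If a short read jumped there, the function would report true for a truncated file; returning false directly on every short read avoids that. Record and read_records() are hypothetical and unrelated to the actual stats file layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Record
{
	uint32_t	id;
	uint32_t	value;
} Record;

/*
 * Reads a count followed by that many records.  Takes ownership of fp and
 * closes it on every path.  Returns false if the file is truncated.
 */
static bool
read_records(FILE *fp, int32_t *nread)
{
	int32_t		count;
	Record		rec;

	assert(fp != NULL && nread != NULL);
	*nread = 0;

	if (fread(&count, 1, sizeof(count), fp) != sizeof(count))
	{
		fclose(fp);
		return false;
	}

	for (int32_t i = 0; i < count; i++)
	{
		if (fread(&rec, 1, sizeof(rec), fp) != sizeof(rec))
		{
			/*
			 * Jumping to a shared success label here would fall through to
			 * "return true" below and hide the corruption from the caller.
			 */
			fclose(fp);
			return false;
		}
		(*nread)++;
	}

	fclose(fp);
	return true;
}
```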

(3)
In pgstat_get_subscription_error_entry(), there seems to be a bad comment.

Shouldn't:

+ /* Return the apply error worker */
+ return &(subent->apply_error);

be:

+ /* Return the apply worker error */
+ return &(subent->apply_error);

src/tools/pgindent/typedefs.list
(4)

"PgStat_MsgSubscriptionErrReset" is missing from the list.

Regards,
Greg Nancarrow
Fujitsu Australia

#177Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#174)
Re: Skipping logical replication transactions on subscriber side

On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

A few review comments for the v14-0002 patch:

(1)
I suggest a small update to the patch comment:

BEFORE:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters. The parameters that can be set are streaming, binary,
synchronous_commit.

AFTER:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters to their default value. The parameters that can be reset
are streaming, binary, and synchronous_commit.

(2)
In the documentation, the RESETable parameters should be listed in the
same way and order as for SET:

BEFORE:
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
+     </para>
AFTER:
+     <para>
+       The parameters that can be reset are
<literal>synchronous_commit</literal>,
+       <literal>binary</literal>, and <literal>streaming</literal>.
+      </para>

Also, I'm thinking it would be beneficial to say the following before this:

RESET is used to set parameters back to their default value.

(3)
I notice that if you try to reset the slot_name, you get the following message:
postgres=# alter subscription sub reset (slot_name);
ERROR: unrecognized subscription parameter: "slot_name"

This is a bit misleading, because "slot_name" actually IS a
subscription parameter, just not resettable.
It would be better in this case if it said something like:
ERROR: not a resettable subscription parameter: "slot_name"

However, it seems that this is also an existing issue with SET (e.g.
for "refresh" or "two_phase"):
postgres=# alter subscription sub set (refresh=true);
ERROR: unrecognized subscription parameter: "refresh"

Regards,
Greg Nancarrow
Fujitsu Australia

#178houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#174)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, September 21, 2021 12:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

Thanks for updating the patch.
Here are a few comments on the v14-0001 patch.

1)
+				hash_ctl.keysize = sizeof(Oid);
+				hash_ctl.entrysize = sizeof(SubscriptionRelState);
+				not_ready_rels_htab = hash_create("not ready relations in subscription",
+												  64,
+												  &hash_ctl,
+												  HASH_ELEM | HASH_BLOBS);
+

ISTM we can pass list_length(not_ready_rels_list) as the nelem to hash_create.

2)

+	/*
+	 * Search for all the dead subscriptions and error entries in stats
+	 * hashtable and tell the stats collector to drop them.
+	 */
+	if (subscriptionHash)
+	{
...
+		HTAB	   *htab;
+

It seems we already declare a "HTAB *htab;" in function pgstat_vacuum_stat();
can we use the existing htab here?

3)

 	PGSTAT_MTYPE_RESETREPLSLOTCOUNTER,
+	PGSTAT_MTYPE_SUBSCRIPTIONERR,
+	PGSTAT_MTYPE_SUBSCRIPTIONERRRESET,
+	PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE,
+	PGSTAT_MTYPE_SUBSCRIPTIONPURGE,
 	PGSTAT_MTYPE_AUTOVAC_START,

Can we append these values at the end of the enum, which won't affect the
other enum values?
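
A tiny C sketch of what comment 3) is pointing at: inserting members in the middle of an enum renumbers everything after them, while appending leaves the existing values alone. The enum names below are invented for illustration and are much shorter than the real message type list; whether the renumbering matters in practice depends on whether the values are ever persisted or shared across builds.

```c
#include <assert.h>

/* Existing layout (hypothetical, shortened). */
typedef enum MsgTypeOld
{
	OLD_RESETSLOT,
	OLD_AUTOVAC_START,
	OLD_VACUUM
} MsgTypeOld;

/* New members inserted in the middle: later members are renumbered. */
typedef enum MsgTypeMid
{
	MID_RESETSLOT,
	MID_SUBERR,
	MID_SUBERRRESET,
	MID_AUTOVAC_START,
	MID_VACUUM
} MsgTypeMid;

/* New members appended at the end: existing members keep their numbers. */
typedef enum MsgTypeEnd
{
	END_RESETSLOT,
	END_AUTOVAC_START,
	END_VACUUM,
	END_SUBERR,
	END_SUBERRRESET
} MsgTypeEnd;

/* Compile-time checks of the two outcomes. */
static_assert((int) MID_AUTOVAC_START != (int) OLD_AUTOVAC_START,
			  "inserting in the middle shifts later values");
static_assert((int) END_AUTOVAC_START == (int) OLD_AUTOVAC_START,
			  "appending keeps existing values");
```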

Best regards,
Hou zj

#179Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#174)
Re: Skipping logical replication transactions on subscriber side

On Tue, Sep 21, 2021 at 10:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

Review comments for v14-0001-Add-pg_stat_subscription_errors-statistics-view
==============================================================
1.
<entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>command</structfield> <type>text</type>
+      </para>
+      <para>
+        Name of command being applied when the error occurred.  This
+        field is always NULL if the error is reported by the
+        <literal>tablesync</literal> worker.
+      </para></entry>
+     </row>
..
..
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>xid</structfield> <type>xid</type>
+      </para>
+      <para>
+        Transaction ID of the publisher node being applied when the error
+        occurred.  This field is always NULL if the error is reported
+        by the <literal>tablesync</literal> worker.
+      </para></entry>

Shouldn't we display the command and transaction ID even for the table
sync worker if the error occurs during the sync phase (syncing with the
apply worker position)?

2.
+ /*
+ * The number of not-ready relations can be high for example right
+ * after creating a subscription, so we load the list of
+ * SubscriptionRelState into the hash table for faster lookups.
+ */

I am not sure this optimization of converting the not-ready relations
list to a hash table is worth it. Are we expecting thousands of
relations per subscription? I think that will be a rare case even if
it is there.

3.
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+ if (subscriptionHash == NULL)
+ return;
+
+ for (int i = 0; i < msg->m_nentries; i++)
+ {
+ PgStat_StatSubEntry *subent;
+
+ subent = pgstat_get_subscription_entry(msg->m_subids[i], false);
+
+ /*
+ * Nothing to do if the subscription entry is not found.  This could
+ * happen when the subscription is dropped and the message for
+ * dropping subscription entry arrived before the message for
+ * reporting the error.
+ */
+ if (subent == NULL)

Is the above comment true even during the purge? I can think of this
during normal processing but not during the purge.

4.
+typedef struct PgStat_MsgSubscriptionErr
+{
+ PgStat_MsgHdr m_hdr;
+
+ /*
+ * m_subid and m_subrelid are used to determine the subscription and the
+ * reporter of this error.  m_subrelid is InvalidOid if reported by the
+ * apply worker, otherwise by the table sync worker.  In table sync worker
+ * case, m_subrelid must be the same as m_relid.
+ */
+ Oid m_subid;
+ Oid m_subrelid;
+
+ /* Error information */
+ Oid m_relid;

Is m_subrelid used only to distinguish the type of worker? I think
it could be InvalidOid during the syncing phase in the table sync
worker.

5.
+/*
+ * Subscription error statistics kept in the stats collector, representing
+ * an error that occurred during application of logical replication or

The part of the message " ... application of logical replication ..."
sounds a little unclear. Shall we instead write: " ... application of
logical message ..."?

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication.  While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And what makes it avoid
building nested hash table?

--
With Regards,
Amit Kapila.

#180Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#179)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Sep 21, 2021 at 10:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

Review comments for v14-0001-Add-pg_stat_subscription_errors-statistics-view
==============================================================
1.
<entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>command</structfield> <type>text</type>
+      </para>
+      <para>
+        Name of command being applied when the error occurred.  This
+        field is always NULL if the error is reported by the
+        <literal>tablesync</literal> worker.
+      </para></entry>
+     </row>
..
..
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>xid</structfield> <type>xid</type>
+      </para>
+      <para>
+        Transaction ID of the publisher node being applied when the error
+        occurred.  This field is always NULL if the error is reported
+        by the <literal>tablesync</literal> worker.
+      </para></entry>

Shouldn't we display the command and transaction ID even for the table
sync worker if the error occurs during the sync phase (syncing with the
apply worker position)?

Right. I'll fix it.

2.
+ /*
+ * The number of not-ready relations can be high for example right
+ * after creating a subscription, so we load the list of
+ * SubscriptionRelState into the hash table for faster lookups.
+ */

I am not sure this optimization of converting the not-ready relations
list to a hash table is worth it. Are we expecting thousands of
relations per subscription? I think that will be a rare case even if
it is there.

Yeah, it seems overkill. I'll use the simple list. If this becomes a
problem, we can add such optimization later.

3.
+static void
+pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
+{
+ if (subscriptionHash == NULL)
+ return;
+
+ for (int i = 0; i < msg->m_nentries; i++)
+ {
+ PgStat_StatSubEntry *subent;
+
+ subent = pgstat_get_subscription_entry(msg->m_subids[i], false);
+
+ /*
+ * Nothing to do if the subscription entry is not found.  This could
+ * happen when the subscription is dropped and the message for
+ * dropping subscription entry arrived before the message for
+ * reporting the error.
+ */
+ if (subent == NULL)

Is the above comment true even during the purge? I can think of this
during normal processing but not during the purge.

Right, the comment is not true during the purge. Since subent could be
NULL if concurrent autovacuum workers do pgstat_vacuum_stat(), I'll
change the comment.

4.
+typedef struct PgStat_MsgSubscriptionErr
+{
+ PgStat_MsgHdr m_hdr;
+
+ /*
+ * m_subid and m_subrelid are used to determine the subscription and the
+ * reporter of this error.  m_subrelid is InvalidOid if reported by the
+ * apply worker, otherwise by the table sync worker.  In table sync worker
+ * case, m_subrelid must be the same as m_relid.
+ */
+ Oid m_subid;
+ Oid m_subrelid;
+
+ /* Error information */
+ Oid m_relid;

Is m_subrelid used only to distinguish the type of worker? I think
it could be InvalidOid during the syncing phase in the table sync
worker.

Right. I'll fix it.

5.
+/*
+ * Subscription error statistics kept in the stats collector, representing
+ * an error that occurred during application of logical replication or

The part of the message " ... application of logical replication ..."
sounds a little unclear. Shall we instead write: " ... application of
logical message ..."?

Will fix.

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication.  While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And what makes it avoid
building nested hash table?

In the previous patch, a subscription stats entry
(PgStat_StatSubEntry) had one hash table that had error entries of
both apply and table sync. Since a subscription can have one apply
worker and multiple table sync workers it makes sense to me to have
the subscription entry have a hash table for them. The reason why we
have one error entry for an apply error and a hash table for table
sync errors is that there is the common case where an apply error
happens whereas no table sync error does. With this optimization,
if the subscription has only an apply error, we can store it in the
apply_error field and avoid building a hash table for sync errors.
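
To make the optimization above concrete, here is a minimal standalone sketch. It is not the patch's code: the type and function names (ErrEntry, SubEntry, report_apply_error, report_sync_error) are hypothetical, Oid is a local stand-in, and a growable array stands in for the nested hash table. The point is only that the apply error lives inline in the subscription entry, while the container for table sync errors is allocated on the first table sync error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid */

/* Hypothetical per-error record; a real entry would carry more fields. */
typedef struct ErrEntry
{
    Oid relid;
    int failure_count;
} ErrEntry;

/* Hypothetical subscription entry, zero-initialized by the caller. */
typedef struct SubEntry
{
    Oid       subid;
    ErrEntry  apply_error;          /* stored inline, no allocation needed */
    ErrEntry *sync_errors;          /* NULL until the first table sync error */
    size_t    nsync;                /* number of sync error entries in use */
    size_t    capsync;              /* allocated capacity of sync_errors */
} SubEntry;

/* Record an apply error; never touches the sync error container. */
static void
report_apply_error(SubEntry *sub, Oid relid)
{
    assert(sub != NULL);
    sub->apply_error.relid = relid;
    sub->apply_error.failure_count++;
}

/* Record a table sync error. Returns 0 on success, -1 on allocation failure. */
static int
report_sync_error(SubEntry *sub, Oid relid)
{
    assert(sub != NULL);

    /* Reuse the existing entry for this relation, if any. */
    for (size_t i = 0; i < sub->nsync; i++)
    {
        if (sub->sync_errors[i].relid == relid)
        {
            sub->sync_errors[i].failure_count++;
            return 0;
        }
    }

    /* Allocate (or grow) the container only when it is actually needed. */
    if (sub->nsync == sub->capsync)
    {
        size_t    newcap = sub->capsync ? sub->capsync * 2 : 4;
        ErrEntry *grown = realloc(sub->sync_errors, newcap * sizeof(ErrEntry));

        if (grown == NULL)
            return -1;
        sub->sync_errors = grown;
        sub->capsync = newcap;
    }

    sub->sync_errors[sub->nsync].relid = relid;
    sub->sync_errors[sub->nsync].failure_count = 1;
    sub->nsync++;
    return 0;
}
```

With only apply errors reported, sync_errors stays NULL, which is the common case the comment in the patch refers to.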

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#181Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#177)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 24, 2021 at 5:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Sep 21, 2021 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

A few review comments for the v14-0002 patch:

Thank you for the comments!

(1)
I suggest a small update to the patch comment:

BEFORE:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters. The parameters that can be set are streaming, binary,
synchronous_commit.

AFTER:
ALTER SUBSCRIPTION ... RESET command resets subscription
parameters to their default value. The parameters that can be reset
are streaming, binary, and synchronous_commit.

(2)
In the documentation, the RESETable parameters should be listed in the
same way and order as for SET:

BEFORE:
+     <para>
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
+     </para>
AFTER:
+     <para>
+       The parameters that can be reset are
<literal>synchronous_commit</literal>,
+       <literal>binary</literal>, and <literal>streaming</literal>.
+      </para>

Also, I'm thinking it would be beneficial to say the following before this:

RESET is used to set parameters back to their default value.

I agreed with all of the above comments. I'll incorporate them into
the next version patch that I'm going to submit next Monday.

(3)
I notice that if you try to reset the slot_name, you get the following message:
postgres=# alter subscription sub reset (slot_name);
ERROR: unrecognized subscription parameter: "slot_name"

This is a bit misleading, because "slot_name" actually IS a
subscription parameter, just not resettable.
It would be better in this case if it said something like:
ERROR: not a resettable subscription parameter: "slot_name"

However, it seems that this is also an existing issue with SET (e.g.
for "refresh" or "two_phase"):
postgres=# alter subscription sub set (refresh=true);
ERROR: unrecognized subscription parameter: "refresh"
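
The distinction being asked for can be sketched as a three-way classification. This is a standalone illustration, not the parser in the patch: ParamStatus, known_params, classify_param and reset_error_format are hypothetical names, and the parameter table lists only the names mentioned in this thread, so it is deliberately incomplete.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum ParamStatus
{
    PARAM_UNKNOWN,          /* not a subscription parameter at all */
    PARAM_NOT_RESETTABLE,   /* a real parameter, but RESET does not apply */
    PARAM_RESETTABLE
} ParamStatus;

typedef struct ParamInfo
{
    const char *name;
    int         resettable;
} ParamInfo;

/* Illustrative subset only; taken from the names quoted in this thread. */
static const ParamInfo known_params[] = {
    {"streaming", 1},
    {"binary", 1},
    {"synchronous_commit", 1},
    {"slot_name", 0},
};

static ParamStatus
classify_param(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(known_params) / sizeof(known_params[0]); i++)
    {
        if (strcmp(known_params[i].name, name) == 0)
            return known_params[i].resettable ? PARAM_RESETTABLE
                                              : PARAM_NOT_RESETTABLE;
    }
    return PARAM_UNKNOWN;
}

/* Returns a printf-style message format, or NULL when there is no error. */
static const char *
reset_error_format(ParamStatus status)
{
    switch (status)
    {
        case PARAM_UNKNOWN:
            return "unrecognized subscription parameter: \"%s\"";
        case PARAM_NOT_RESETTABLE:
            return "not a resettable subscription parameter: \"%s\"";
        case PARAM_RESETTABLE:
            break;
    }
    return NULL;
}
```

The same split would apply to SET, where "refresh" is a known option that is simply not settable in that context.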

Good point. Maybe we can improve it in a separate patch?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#182Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#178)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 24, 2021 at 5:53 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, September 21, 2021 12:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated version patches. Please review them.

Thanks for updating the patch,
here are a few comments on the v14-0001 patch.

Thank you for the comments!

1)
+                               hash_ctl.keysize = sizeof(Oid);
+                               hash_ctl.entrysize = sizeof(SubscriptionRelState);
+                               not_ready_rels_htab = hash_create("not ready relations in subscription",
+                                                                                                 64,
+                                                                                                 &hash_ctl,
+                                                                                                 HASH_ELEM | HASH_BLOBS);
+

ISTM we can pass list_length(not_ready_rels_list) as the nelem to hash_create.

As Amit pointed out, it seems not necessary to build a temporary hash
table for this purpose.

2)

+       /*
+        * Search for all the dead subscriptions and error entries in stats
+        * hashtable and tell the stats collector to drop them.
+        */
+       if (subscriptionHash)
+       {
...
+               HTAB       *htab;
+

It seems we already declare a "HTAB *htab;" in function pgstat_vacuum_stat(),
can we use the existing htab here ?

Right. Will remove it.

3)

PGSTAT_MTYPE_RESETREPLSLOTCOUNTER,
+       PGSTAT_MTYPE_SUBSCRIPTIONERR,
+       PGSTAT_MTYPE_SUBSCRIPTIONERRRESET,
+       PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE,
+       PGSTAT_MTYPE_SUBSCRIPTIONPURGE,
PGSTAT_MTYPE_AUTOVAC_START,

Can we append these values at the end of the enum, so that the other
enum values are not affected?

Yes, I'll move them to the end of the Enum struct.
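
For readers unfamiliar with why the position matters, here is a small standalone illustration. The enumerator names are hypothetical and are not the real PgStat message types; the sketch only shows that inserting a member in the middle of an enum renumbers everything after it, while appending leaves the existing values unchanged.

```c
#include <assert.h>

/* Original enum: values are assigned 0, 1, 2 in declaration order. */
typedef enum MsgTypeBefore
{
    MSG_RESET_COUNTER,          /* 0 */
    MSG_AUTOVAC_START,          /* 1 */
    MSG_VACUUM                  /* 2 */
} MsgTypeBefore;

/* New member inserted in the middle: later members shift by one. */
typedef enum MsgTypeInserted
{
    MSG_I_RESET_COUNTER,        /* 0 */
    MSG_I_SUBSCRIPTION_ERR,     /* 1, new */
    MSG_I_AUTOVAC_START,        /* 2, was 1 */
    MSG_I_VACUUM                /* 3, was 2 */
} MsgTypeInserted;

/* New member appended at the end: existing members keep their values. */
typedef enum MsgTypeAppended
{
    MSG_A_RESET_COUNTER,        /* 0 */
    MSG_A_AUTOVAC_START,        /* 1 */
    MSG_A_VACUUM,               /* 2 */
    MSG_A_SUBSCRIPTION_ERR      /* 3, new */
} MsgTypeAppended;

/* Helper so the comparison reads naturally at a call site. */
static int
same_value(int a, int b)
{
    return a == b;
}
```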

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#183Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#180)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 24, 2021 at 6:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication.  While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And what makes it avoid
building nested hash table?

In the previous patch, a subscription stats entry
(PgStat_StatSubEntry) had one hash table that had error entries of
both apply and table sync. Since a subscription can have one apply
worker and multiple table sync workers it makes sense to me to have
the subscription entry have a hash table for them.

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.
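
A minimal sketch of the single-table idea, keyed by sub_id + rel_id, might look as follows. Everything here is an assumption made for illustration: Oid and InvalidOid are local stand-ins, WorkerKey/WorkerEntry/lookup_worker are hypothetical names, and a toy fixed-size linear-probing table is used instead of PostgreSQL's dynahash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)
#define TABLE_SIZE 64               /* toy capacity, must be a power of two */

/* Composite key: relid is InvalidOid for the apply worker. */
typedef struct WorkerKey
{
    Oid subid;
    Oid relid;
} WorkerKey;

typedef struct WorkerEntry
{
    WorkerKey key;
    int       used;
    int       error_count;
} WorkerEntry;

static WorkerEntry worker_table[TABLE_SIZE];

/* Simple multiplicative mix of both key parts; unsigned arithmetic wraps. */
static uint32_t
hash_key(WorkerKey key)
{
    return (key.subid * 2654435761u) ^ (key.relid * 40503u);
}

/*
 * Find the entry for (subid, relid). If it does not exist and create is
 * nonzero, a new zeroed entry is added. Returns NULL when the entry is
 * absent (create == 0) or the toy table is full.
 */
static WorkerEntry *
lookup_worker(Oid subid, Oid relid, int create)
{
    WorkerKey key = {subid, relid};
    uint32_t  start = hash_key(key);

    for (uint32_t i = 0; i < TABLE_SIZE; i++)
    {
        WorkerEntry *entry = &worker_table[(start + i) & (TABLE_SIZE - 1)];

        if (!entry->used)
        {
            if (!create)
                return NULL;
            entry->used = 1;
            entry->key = key;
            entry->error_count = 0;
            return entry;
        }
        if (entry->key.subid == subid && entry->key.relid == relid)
            return entry;
    }
    return NULL;
}
```

The apply worker of subscription 16384 would be found with lookup_worker(16384, InvalidOid, 0), and each table sync worker with its own relation OID, so the two kinds of worker never collide on a key.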

--
With Regards,
Amit Kapila.

#184Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#183)
Re: Skipping logical replication transactions on subscriber side

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 24, 2021 at 6:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 24, 2021 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

6.
+typedef struct PgStat_StatSubEntry
+{
+ Oid subid; /* hash table key */
+
+ /*
+ * Statistics of errors that occurred during logical replication.  While
+ * having the hash table for table sync errors we have a separate
+ * statistics value for apply error (apply_error), because we can avoid
+ * building a nested hash table for table sync errors in the case where
+ * there is no table sync error, which is the common case in practice.
+ *

The above comment is not clear to me. Why do you need to have a
separate hash table for table sync errors? And what makes it avoid
building nested hash table?

In the previous patch, a subscription stats entry
(PgStat_StatSubEntry) had one hash table that had error entries of
both apply and table sync. Since a subscription can have one apply
worker and multiple table sync workers it makes sense to me to have
the subscription entry have a hash table for them.

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

When a subscription is dropped, we can easily drop the subscription
entry along with those statistics including the errors from the hash
table.

Regards,

[1]: /messages/by-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199@OSBPR01MB4888.jpnprd01.prod.outlook.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#185Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#184)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 6:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

I think these additional stats will be displayed via
pg_stat_subscription, right? If so, the current stats of that view are
all in-memory and are per LogicalRepWorker which means that for those
stats also we will have different entries for apply and table sync
worker. If this understanding is correct, won't it be better to
represent this as below?

typedef struct PgStat_StatSubWorkerEntry
{
/* hash key */
Oid subid;
Oid relid;

/* worker stats which includes xact stats */
PgStat_SubWorkerStats worker_stats;

/* error stats */
PgStat_StatSubErrEntry worker_error_stats;
} PgStat_StatSubWorkerEntry;

typedef struct PgStat_SubWorkerStats
{
/* define existing stats here */
....

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
} PgStat_SubWorkerStats;

Now, at drop subscription, we do need to find and remove all the subid
+ relid entries.
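
The cleanup step mentioned above can be sketched like this. It is only an illustration under stated assumptions: the entries are held in a plain array rather than a hash table, and Oid, FlatWorkerEntry and drop_subscription_entries are hypothetical names, not anything from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid */

/* One entry per worker, identified by (subid, relid). */
typedef struct FlatWorkerEntry
{
    Oid subid;
    Oid relid;                      /* 0 for the apply worker */
    int error_count;
} FlatWorkerEntry;

/*
 * Remove every entry belonging to subid, whatever its relid, by compacting
 * the array in place. Returns the number of entries that remain.
 */
static size_t
drop_subscription_entries(FlatWorkerEntry *entries, size_t n, Oid subid)
{
    size_t kept = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].subid != subid)
            entries[kept++] = entries[i];
    }
    return kept;
}
```

Compared with a nested per-subscription table, the cost is a scan over all entries at DROP SUBSCRIPTION time instead of dropping one parent entry.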

--
With Regards,
Amit Kapila.

#186Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#175)
Re: Skipping logical replication transactions on subscriber side

On Fri, Sep 24, 2021 at 7:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 3, 2021 at 4:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I am attaching a version of such a function, plus some tests of your patch (since it does not appear to have any). Would you mind reviewing these and giving comments or including them in your next patch version?

I've looked at the patch and here are some comments:

+
+-- no errors should be reported
+SELECT * FROM pg_stat_subscription_errors;
+
+
+-- Test that the subscription errors view exists, and has the right columns
+-- If we expected any rows to exist, we would need to filter out unstable
+-- columns.  But since there should be no errors, we just select them all.
+select * from pg_stat_subscription_errors;

The patch adds checks of pg_stat_subscription_errors in order to test
if the subscription doesn't have any error. But since the subscription
errors are updated in an asynchronous manner, we cannot say the
subscription is working fine by checking the view only once.

One question I have here is, can we reliably write a few tests just for
the new view patch? Right now, it has no tests; having a few tests would
be better. Here, because the apply worker will keep on failing till we
stop it or resolve the conflict, can we rely on that fact? The idea
is that even if one of the entries is missed by the stats collector, a new
one (probably the same one) will be issued and we can wait till we see
one error in the view. We can add additional PostgresNode.pm
infrastructure once the main patch is committed.
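
The "wait till we see one error in view" idea is a bounded polling loop. The following standalone sketch shows the shape of it only; poll_until, FakeView and fake_view_has_error are hypothetical names, and the fake check stands in for running a query against pg_stat_subscription_errors, which this sketch does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*check_fn) (void *arg);

/*
 * Call check() up to max_attempts times and stop at the first success.
 * Returns true if the condition was observed, false on giving up.
 */
static bool
poll_until(check_fn check, void *arg, int max_attempts)
{
    assert(check != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++)
    {
        if (check(arg))
            return true;
        /* A real test would sleep here before retrying. */
    }
    return false;
}

/* Fake view: the error row becomes visible only from the Nth check on. */
typedef struct FakeView
{
    int calls;
    int visible_after;
} FakeView;

static bool
fake_view_has_error(void *arg)
{
    FakeView *view = arg;

    view->calls++;
    return view->calls >= view->visible_after;
}
```

Because the apply worker keeps failing and re-reporting, a lost stats message only delays the moment the check succeeds; it does not make the test fail as long as the attempt limit is generous.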

--
With Regards,
Amit Kapila.

#187Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#185)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 27, 2021 at 6:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

I think these additional stats will be displayed via
pg_stat_subscription, right? If so, the current stats of that view are
all in-memory and are per LogicalRepWorker which means that for those
stats also we will have different entries for apply and table sync
worker. If this understanding is correct, won't it be better to
represent this as below?

I was thinking that we have a different stats view for example
pg_stat_subscription_xacts that has entries per subscription. But your
idea seems better to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#188Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#187)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 12:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 27, 2021 at 6:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

I think these additional stats will be displayed via
pg_stat_subscription, right? If so, the current stats of that view are
all in-memory and are per LogicalRepWorker which means that for those
stats also we will have different entries for apply and table sync
worker. If this understanding is correct, won't it be better to
represent this as below?

I was thinking that we have a different stats view for example
pg_stat_subscription_xacts that has entries per subscription. But your
idea seems better to me.

I mean that showing statistics (including transaction statistics and
errors) per logical replication worker seems better to me, no matter
what view shows these statistics. I'll change the patch in that way.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#189Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#186)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 12:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 24, 2021 at 7:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 3, 2021 at 4:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I am attaching a version of such a function, plus some tests of your patch (since it does not appear to have any). Would you mind reviewing these and giving comments or including them in your next patch version?

I've looked at the patch and here are some comments:

+
+-- no errors should be reported
+SELECT * FROM pg_stat_subscription_errors;
+
+
+-- Test that the subscription errors view exists, and has the right columns
+-- If we expected any rows to exist, we would need to filter out unstable
+-- columns.  But since there should be no errors, we just select them all.
+select * from pg_stat_subscription_errors;

The patch adds checks of pg_stat_subscription_errors in order to test
if the subscription doesn't have any error. But since the subscription
errors are updated in an asynchronous manner, we cannot say the
subscription is working fine by checking the view only once.

One question I have here is, can we reliably write few tests just for
the new view patch? Right now, it has no test, having a few tests will
be better. Here, because the apply worker will keep on failing till we
stop it or resolve the conflict, can we rely on that fact? The idea
is that even if one of the entry is missed by stats collector, a new
one (probably the same one) will be issued and we can wait till we see
one error in view. We can add additional PostgresNode.pm
infrastructure once the main patch is committed.

Yes, the new tests added by 0003 patch (skip_xid patch) use that fact.
After the error is shown in the view, we fetch the XID from the view
to specify as skip_xid. The tests just for the
pg_stat_subscription_errors view will be a subset of these tests. So
probably we can add it in 0001 patch and 0003 patch can extend the
tests so that it tests skip_xid option.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#190Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#189)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 11:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Sep 24, 2021 at 7:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Sep 3, 2021 at 4:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

I am attaching a version of such a function, plus some tests of your patch (since it does not appear to have any). Would you mind reviewing these and giving comments or including them in your next patch version?

I've looked at the patch and here are some comments:

+
+-- no errors should be reported
+SELECT * FROM pg_stat_subscription_errors;
+
+
+-- Test that the subscription errors view exists, and has the right columns
+-- If we expected any rows to exist, we would need to filter out unstable
+-- columns.  But since there should be no errors, we just select them all.
+select * from pg_stat_subscription_errors;

The patch adds checks of pg_stat_subscription_errors in order to test
if the subscription doesn't have any error. But since the subscription
errors are updated in an asynchronous manner, we cannot say the
subscription is working fine by checking the view only once.

One question I have here is, can we reliably write few tests just for
the new view patch? Right now, it has no test, having a few tests will
be better. Here, because the apply worker will keep on failing till we
stop it or resolve the conflict, can we rely on that fact? The idea
is that even if one of the entry is missed by stats collector, a new
one (probably the same one) will be issued and we can wait till we see
one error in view. We can add additional PostgresNode.pm
infrastructure once the main patch is committed.

Yes, the new tests added by 0003 patch (skip_xid patch) use that fact.
After the error is shown in the view, we fetch the XID from the view
to specify as skip_xid. The tests just for the
pg_stat_subscription_errors view will be a subset of these tests. So
probably we can add it in 0001 patch and 0003 patch can extend the
tests so that it tests skip_xid option.

This makes sense to me.

--
With Regards,
Amit Kapila.

#191Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#188)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 11:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 27, 2021 at 6:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

I think these additional stats will be displayed via
pg_stat_subscription, right? If so, the current stats of that view are
all in-memory and are per LogicalRepWorker which means that for those
stats also we will have different entries for apply and table sync
worker. If this understanding is correct, won't it be better to
represent this as below?

I was thinking that we have a different stats view for example
pg_stat_subscription_xacts that has entries per subscription. But your
idea seems better to me.

I mean that showing statistics (including transaction statistics and
errors) per logical replication worker seems better to me, no matter
what view shows these statistics. I'll change the patch in that way.

Sounds good.

--
With Regards,
Amit Kapila.

#192Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#191)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Sep 27, 2021 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 27, 2021 at 11:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Sep 27, 2021 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 27, 2021 at 6:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Sep 25, 2021 at 4:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Sure, but each tablesync worker must have a separate relid. Why can't
we have a single hash table for both apply and table sync workers
which are hashed by sub_id + rel_id? For apply worker, the rel_id will
always be zero (InvalidOid) and tablesync workers will have a unique
OID for rel_id, so we should be able to uniquely identify each of
apply and table sync workers.

What I imagined is to extend the subscription statistics, for
instance, transaction stats[1]. By having a hash table for
subscriptions, we can store those statistics into an entry of the hash
table and we can think of subscription errors as also statistics of
the subscription. So we can have another hash table for errors in an
entry of the subscription hash table. For example, the subscription
entry struct will be something like:

typedef struct PgStat_StatSubEntry
{
Oid subid; /* hash key */

HTAB *errors; /* apply and table sync errors */

/* transaction stats of subscription */
PgStat_Counter xact_commit;
PgStat_Counter xact_commit_bytes;
PgStat_Counter xact_error;
PgStat_Counter xact_error_bytes;
PgStat_Counter xact_abort;
PgStat_Counter xact_abort_bytes;
PgStat_Counter failure_count;
} PgStat_StatSubEntry;

I think these additional stats will be displayed via
pg_stat_subscription, right? If so, the current stats of that view are
all in-memory and are per LogicalRepWorker which means that for those
stats also we will have different entries for apply and table sync
worker. If this understanding is correct, won't it be better to
represent this as below?

I was thinking that we have a different stats view for example
pg_stat_subscription_xacts that has entries per subscription. But your
idea seems better to me.

I mean that showing statistics (including transaction statistics and
errors) per logical replication worker seems better to me, no matter
what view shows these statistics. I'll change the patch in that way.

I've attached updated patches that incorporate all comments I got so
far. Please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v15-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v15-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v15-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch (application/octet-stream)

#193Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#192)
Re: Skipping logical replication transactions on subscriber side

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

I'm uneasy about the way the xids-to-be-skipped are presented as
subscriptions options, similar to settings such as "binary". I see how
that is convenient, but it's not really the same thing, in how you use
it, is it? Even if we share some details internally, I feel that there
should be a separate syntax somehow.

Also, what happens when you forget to reset the xid after it has passed?
Will it get skipped again after wraparound?

#194Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#193)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

I'm uneasy about the way the xids-to-be-skipped are presented as
subscriptions options, similar to settings such as "binary". I see how
that is convenient, but it's not really the same thing, in how you use
it, is it? Even if we share some details internally, I feel that there
should be a separate syntax somehow.

Since I was thinking that ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION, in the first several
version patches it added a separate syntax for this feature like ALTER
SUBSCRIPTION ... SET SKIP TRANSACTION xxx. But Amit was concerned
about an additional syntax and consistency with disable_on_error[1],
which is proposed by Mark Dilger[2], so I've changed it to a
subscription option. I tried to find a policy on this by checking the
existing syntax but could not find one. Interestingly, when it
comes to ALTER SUBSCRIPTION syntax, we support both the ENABLE/DISABLE
syntax and the SET (enabled = on/off) option.

Also, what happens when you forget to reset the xid after it has passed?
Will it get skipped again after wraparound?

Yes. Currently it's a user's responsibility. We thoroughly documented
the risk of this feature and thus it should be used as a last resort
since it may easily make the subscriber inconsistent, especially if a
user specifies the wrong transaction ID.

Regards,

[1]: /messages/by-id/CAA4eK1LjrU8x+x=bFazVD10pgOVy0PEE8mpz3nQhDG+mmU8ivQ@mail.gmail.com
[2]: /messages/by-id/DB35438F-9356-4841-89A0-412709EBD3AB@enterprisedb.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#195Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#194)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Also, what happens when you forget to reset the xid after it has passed?
Will it get skipped again after wraparound?

Yes.

Aren't we resetting the skip_xid once we skip that transaction in
stop_skipping_changes()? If so, it shouldn't be possible to skip it
again after the wraparound. Am I missing something?

Now, if the user has wrongly set some XID that we can't skip because
it is already in the past or something like that, then I think it is
the user's problem, and that's why it can be done only by superusers. I
think we even considered protecting against that by cross-checking with
the information in the view, but as the view data is lossy, we can't
rely on that. I think users can even set some valid XID that never had
any error and we will still skip it, which can also be done today with
pg_replication_origin_advance(). I am not sure we can do much
about such scenarios except to carefully document them.

--
With Regards,
Amit Kapila.

#196Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#194)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

I'm uneasy about the way the xids-to-be-skipped are presented as
subscriptions options, similar to settings such as "binary". I see how
that is convenient, but it's not really the same thing, in how you use
it, is it? Even if we share some details internally, I feel that there
should be a separate syntax somehow.

Since I was thinking that ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION, in the first several
version patches it added a separate syntax for this feature like ALTER
SUBSCRIPTION ... SET SKIP TRANSACTION xxx. But Amit was concerned
about an additional syntax and consistency with disable_on_error[1]
which is proposed by Mark Diliger[2], so I’ve changed it to a
subscription option.

Yeah, the basic idea is that this is not the only option we will
support for taking actions on error/conflict. For example, we might
want to disable subscriptions or allow skipping transactions based on
XID, LSN, etc. So, developing separate syntax for each of the options
doesn't seem like a good idea. However considering Peter's point, how
about something like:

Alter Subscription <sub_name> On Error ( subscription_parameter [=
value] [, ... ] );
OR
Alter Subscription <sub_name> On Conflict ( subscription_parameter [=
value] [, ... ] );

--
With Regards,
Amit Kapila.

#197Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#195)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 1, 2021 at 2:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Also, what happens when you forget to reset the xid after it has passed?
Will it get skipped again after wraparound?

Yes.

Aren't we resetting the skip_xid once we skip that transaction in
stop_skipping_changes()? If so, it shouldn't be possible to skip it
again after the wraparound. Am I missing something?

Oops, I'd misunderstood the question. Yes, Amit is right. Once we skip
the transaction, skip_xid is automatically reset. So users don't need
to reset it manually after skipping the transaction. Sorry for the
confusion.
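
To illustrate the one-shot behaviour described above, here is a minimal sketch. It is not code from the patch: the SkipState struct, INVALID_XID, and both function names are hypothetical stand-ins, and the real stop_skipping_changes() is only referenced in a comment. The point it tries to show is that clearing skip_xid when the skipped transaction ends means a later transaction that happens to reuse the same 32-bit XID after wraparound is applied normally.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_XID 0u          /* stand-in for InvalidTransactionId */

/* Hypothetical, simplified view of the state discussed in the thread. */
typedef struct SkipState
{
    uint32_t    skip_xid;       /* XID the user asked to skip, or INVALID_XID */
    bool        skipping;       /* true while changes of skip_xid are discarded */
} SkipState;

/* Called at the start of each remote transaction. */
void
begin_remote_xact(SkipState *st, uint32_t remote_xid)
{
    st->skipping = (st->skip_xid != INVALID_XID &&
                    st->skip_xid == remote_xid);
}

/*
 * Called when the remote transaction ends.  Returns true if the
 * transaction was skipped.  The reset below plays the role attributed to
 * stop_skipping_changes() in the discussion: the setting is consumed once.
 */
bool
end_remote_xact(SkipState *st)
{
    bool        skipped = st->skipping;

    if (skipped)
    {
        st->skip_xid = INVALID_XID;
        st->skipping = false;
    }
    return skipped;
}
```

With skip_xid set to 590, a transaction with XID 589 is applied and leaves the setting alone, the transaction with XID 590 is skipped and clears it, and a second transaction carrying XID 590 is applied.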

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#198Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#196)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 1, 2021 at 5:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

I'm uneasy about the way the xids-to-be-skipped are presented as
subscriptions options, similar to settings such as "binary". I see how
that is convenient, but it's not really the same thing, in how you use
it, is it? Even if we share some details internally, I feel that there
should be a separate syntax somehow.

Since I was thinking that ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION, in the first several
version patches it added a separate syntax for this feature like ALTER
SUBSCRIPTION ... SET SKIP TRANSACTION xxx. But Amit was concerned
about an additional syntax and consistency with disable_on_error[1]
which is proposed by Mark Diliger[2], so I’ve changed it to a
subscription option.

Yeah, the basic idea is that this is not the only option we will
support for taking actions on error/conflict. For example, we might
want to disable subscriptions or allow skipping transactions based on
XID, LSN, etc.

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#199Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#198)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 4, 2021 at 6:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 1, 2021 at 6:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 1, 2021 at 5:05 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 30.09.21 07:45, Masahiko Sawada wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

I'm uneasy about the way the xids-to-be-skipped are presented as
subscriptions options, similar to settings such as "binary". I see how
that is convenient, but it's not really the same thing, in how you use
it, is it? Even if we share some details internally, I feel that there
should be a separate syntax somehow.

Since I was thinking that ALTER SUBSCRIPTION ... SET is used to alter
parameters originally set by CREATE SUBSCRIPTION, in the first several
version patches it added a separate syntax for this feature like ALTER
SUBSCRIPTION ... SET SKIP TRANSACTION xxx. But Amit was concerned
about an additional syntax and consistency with disable_on_error[1]
which is proposed by Mark Diliger[2], so I’ve changed it to a
subscription option.

Yeah, the basic idea is that this is not the only option we will
support for taking actions on error/conflict. For example, we might
want to disable subscriptions or allow skipping transactions based on
XID, LSN, etc.

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time.

Okay, but they can still specify it by using "On Error" syntax.

Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Fair enough, I was mainly trying to combine the syntax for all actions
that we can take "On Error". We can allow setting them at either
CREATE SUBSCRIPTION or ALTER SUBSCRIPTION time.

I think the main point here is whether this addresses Peter's
concern that this patch should use a separate syntax. Peter E., can you
please confirm? Do let us know if you have something else in mind.

--
With Regards,
Amit Kapila.

#200Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#199)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 4, 2021 at 4:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think here the main point is that does this addresses Peter's
concern for this Patch to use a separate syntax? Peter E., can you
please confirm? Do let us know if you have something else going in
your mind?

Peter's concern seemed to be that the use of a subscription option,
though convenient, isn't an intuitive natural fit for providing this
feature (i.e. ability to skip a transaction by xid). I tend to have
that feeling about using a subscription option for this feature. I'm
not sure what possible alternative syntax he had in mind and currently
can't really think of a good one myself that fits the purpose.

I think that the 1st and 2nd patch are useful in their own right, but
couldn't this feature (i.e. the 3rd patch) be provided instead as an
additional Replication Management function (see 9.27.6)?
e.g. pg_replication_skip_xid

Regards,
Greg Nancarrow
Fujitsu Australia

#201osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#192)
RE: Skipping logical replication transactions on subscriber side

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far. Please
review them.

Hi

Sorry if I have misunderstood something, but
has anyone checked what happens when we
execute ALTER SUBSCRIPTION ... RESET (streaming)
in the middle of a transaction that is streamed to the subscriber in several chunks,
especially after part of the transaction has already been streamed?
My intention is that *if* we can find actual harm in this,
I would suggest adding a safeguard or some other measure to the patch.

An example)

Set the logical_decoding_work_mem = 64kB on the pub.
and create a table and subscription with streaming = true.
In addition, log_min_messages = DEBUG1 on the sub
is helpful to check the LOG on the sub in stream_open_file().

<Session 1> connect to the publisher

BEGIN;
INSERT INTO tab VALUES (generate_series(1, 1000)); -- this exceeds the memory limit
SELECT * FROM pg_stat_replication_slots; -- check the actual streaming bytes&counts just in case

<Session 2> connect to the subscriber
-- after checking some logs of "open file .... for streamed changes" on the sub
ALTER SUBSCRIPTION mysub RESET (streaming);

<Session 1>
INSERT INTO tab VALUES (generate_series(1001, 2000)); -- again, exceeds the limit
COMMIT;

I observed that the subscriber doesn't
receive STREAM_COMMIT in this case but gets BEGIN and COMMIT instead at the end.
I couldn't find any apparent or immediate issue from those steps,
but is this not a problem?
Does this kind of situation also apply to the other resettable options?

Best Regards,
Takamichi Osumi

#202Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#192)
Re: Skipping logical replication transactions on subscriber side

On Thu, Sep 30, 2021 at 3:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

Some comments about the v15-0001 patch:

(1) patch adds a whitespace error

Applying: Add a subscription errors statistics view
"pg_stat_subscription_errors".
.git/rebase-apply/patch:1656: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

(2) Patch comment says "This commit adds a new system view
pg_stat_logical_replication_errors ..."
BUT this is the wrong name, it should be "pg_stat_subscription_errors".

doc/src/sgml/monitoring.sgml

(3)
"Message of the error" doesn't sound right. I suggest just saying "The
error message".

(4) view column "last_failed_time"
I think it would be better to name this "last_error_time".

src/backend/postmaster/pgstat.c

(5) pgstat_vacuum_subworker_stats()

Spelling mistake in the following comment:

/* Create a map for mapping subscriptoin OID and database OID */

subscriptoin -> subscription

(6)
In the following functions:

pgstat_read_statsfiles
pgstat_read_db_statsfile_timestamp

The following comment should say "... struct describing subscription
worker statistics."
(i.e. need to remove the "a")

+ * 'S' A PgStat_StatSubWorkerEntry struct describing a
+ * subscription worker statistics.

(7) pgstat_get_subworker_entry

Suggest comment change:

BEFORE:
+ * Return the entry of subscription worker entry with the subscription
AFTER:
+ * Return subscription worker entry with the given subscription

(8) pgstat_recv_subworker_error

+ /*
+ * Update only the counter and timestamp if we received the same error
+ * again
+ */
+ if (wentry->relid == msg->m_relid &&
+ wentry->command == msg->m_command &&
+ wentry->xid == msg->m_xid &&
+ strncmp(wentry->message, msg->m_message, strlen(wentry->message)) == 0)
+ {

Is there a reason that the above check uses strncmp() with
strlen(wentry->message), instead of just strcmp()?
msg->m_message is treated as the same error message if it is the same
up to strlen(wentry->message)?
Perhaps if that is intentional, then the comment should be updated.
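
To make the concern concrete, here is a small standalone sketch (the two helper names are made up for illustration and do not appear in the patch). It contrasts the prefix-limited comparison used in the quoted check with a plain strcmp():

```c
#include <stdbool.h>
#include <string.h>

/*
 * Comparison in the style of the quoted check: only the first
 * strlen(stored) bytes of the incoming message are examined.
 */
bool
same_message_prefix(const char *stored, const char *incoming)
{
    return strncmp(stored, incoming, strlen(stored)) == 0;
}

/* Full comparison: both strings must be identical, terminator included. */
bool
same_message_exact(const char *stored, const char *incoming)
{
    return strcmp(stored, incoming) == 0;
}
```

With stored = "duplicate key" and incoming = "duplicate key value violates unique constraint", the prefix variant reports a match while the exact variant does not. An empty stored message would match any incoming message under the prefix variant.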

src/tools/pgindent/typedefs.list

(9)
The added "PgStat_SubWorkerError" should be removed from the
typedefs.list (as there is no such new typedef).

Regards,
Greg Nancarrow
Fujitsu Australia

#203osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#192)
RE: Skipping logical replication transactions on subscriber side

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far. Please
review them.

Hello

Two minor comments on the v15-0001 patch.

(1) a typo in pgstat_vacuum_subworker_stat()

+               /*
+                * This subscription is live.  The next step is that we search errors
+                * of the table sync workers who are already in sync state. These
+                * errors should be removed.
+                */

This subscription is "alive" ?

(2) Suggestion to add one comment next to '0' in ApplyWorkerMain()

+                       /* report the table sync error */
+                       pgstat_report_subworker_error(MyLogicalRepWorker->subid,
+                                                                                 MyLogicalRepWorker->relid,
+                                                                                 MyLogicalRepWorker->relid,
+                                                                                 0,
+                                                                                 InvalidTransactionId,
+                                                                                 errdata->message);

How about adding a comment such as /* no corresponding message type for table synchronization */?

Best Regards,
Takamichi Osumi

#204Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#198)
Re: Skipping logical replication transactions on subscriber side

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

#205Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#201)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 8, 2021 at 4:09 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far. Please
review them.

Hi

Sorry, if I misunderstand something but
did someone check what happens when we
execute ALTER SUBSCRIPTION ... RESET (streaming)
in the middle of one txn which has several streaming of data to the sub,
especially after some part of txn has been already streamed.
My intention of this is something like *if* we can find an actual harm of this,
I wanted to suggest the necessity of a safeguard or some measure into the patch.

An example)

Set the logical_decoding_work_mem = 64kB on the pub.
and create a table and subscription with streaming = true.
In addition, log_min_messages = DEBUG1 on the sub
is helpful to check the LOG on the sub in stream_open_file().

<Session 1> connect to the publisher

BEGIN;
INSERT INTO tab VALUES (generate_series(1, 1000)); -- this exceeds the memory limit
SELECT * FROM pg_stat_replication_slots; -- check the actual streaming bytes&counts just in case

<Session 2> connect to the subscriber
-- after checking some logs of "open file .... for streamed changes" on the sub
ALTER SUBSCRIPTION mysub RESET (streaming)

<Session 1>
INSERT INTO tab VALUES (generate_series(1001, 2000)); -- again, exceeds the limit
COMMIT;

I observed that the subscriber doesn't
accept STREAM_COMMIT in this case but gets BEGIN&COMMIT instead at the end.
I couldn't find any apparent and immediate issue from those steps
but is that no problem ?
Probably, this kind of situation applies to other reset target options ?

I think that if a subscription parameter such as 'streaming' or
'binary' is changed, the apply worker exits and the launcher starts a
new worker (see maybe_reread_subscription()). So I guess that in this
case, the apply worker exited while receiving streamed changes,
restarted, and received the same changes with streaming = off, and
therefore got BEGIN and COMMIT instead. I think this also happens
when using SET (streaming = off).

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#206osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#205)
RE: Skipping logical replication transactions on subscriber side

On Monday, October 11, 2021 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 8, 2021 at 4:09 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

Sorry, if I misunderstand something but did someone check what happens
when we execute ALTER SUBSCRIPTION ... RESET (streaming) in the middle
of one txn which has several streaming of data to the sub, especially
after some part of txn has been already streamed.
My intention of this is something like *if* we can find an actual harm
of this, I wanted to suggest the necessity of a safeguard or some measure

into the patch.

...

I observed that the subscriber doesn't accept STREAM_COMMIT in this
case but gets BEGIN&COMMIT instead at the end.
I couldn't find any apparent and immediate issue from those steps but
is that no problem ?
Probably, this kind of situation applies to other reset target options ?

I think that if a subscription parameter such as ‘streaming’ and ‘binary’ is
changed, an apply worker exits and the launcher starts a new worker (see
maybe_reread_subscription()). So I guess that in this case, the apply worker
exited during receiving streamed changes, restarted, and received the same
changes with ‘streaming = off’, therefore it got BEGIN and COMMIT instead. I
think that this happens even by using ‘SET (‘streaming’ = off)’.

You are right. I confirmed that the apply worker did exit
and that the new apply worker process handled the INSERT in the above case.
Setting streaming = false behaved the same way.

Thanks a lot for your explanation.

Best Regards,
Takamichi Osumi

#207Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#204)
Re: Skipping logical replication transactions on subscriber side

On Sun, Oct 10, 2021 at 11:04 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

Good point. I don’t think the skip option should be dumped and
restored using pg_dump, since transaction IDs are assigned
independently in another installation and an XID recorded here would
not refer to the same transaction there.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#208Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#202)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 8, 2021 at 8:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Thu, Sep 30, 2021 at 3:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

Some comments about the v15-0001 patch:

Thank you for the comments!

(1) patch adds a whitespace error

Applying: Add a subscription errors statistics view
"pg_stat_subscription_errors".
.git/rebase-apply/patch:1656: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Fixed.

(2) Patch comment says "This commit adds a new system view
pg_stat_logical_replication_errors ..."
BUT this is the wrong name, it should be "pg_stat_subscription_errors".

Fixed.

doc/src/sgml/monitoring.sgml

(3)
"Message of the error" doesn't sound right. I suggest just saying "The
error message".

Fixed.

(4) view column "last_failed_time"
I think it would be better to name this "last_error_time".

Okay, fixed.

src/backend/postmaster/pgstat.c

(5) pgstat_vacuum_subworker_stats()

Spelling mistake in the following comment:

/* Create a map for mapping subscriptoin OID and database OID */

subscriptoin -> subscription

Fixed.

(6)
In the following functions:

pgstat_read_statsfiles
pgstat_read_db_statsfile_timestamp

The following comment should say "... struct describing subscription
worker statistics."
(i.e. need to remove the "a")

+ * 'S' A PgStat_StatSubWorkerEntry struct describing a
+ * subscription worker statistics.

Fixed.

(7) pgstat_get_subworker_entry

Suggest comment change:

BEFORE:
+ * Return the entry of subscription worker entry with the subscription
AFTER:
+ * Return subscription worker entry with the given subscription

Fixed.

(8) pgstat_recv_subworker_error

+ /*
+ * Update only the counter and timestamp if we received the same error
+ * again
+ */
+ if (wentry->relid == msg->m_relid &&
+ wentry->command == msg->m_command &&
+ wentry->xid == msg->m_xid &&
+ strncmp(wentry->message, msg->m_message, strlen(wentry->message)) == 0)
+ {

Is there a reason that the above check uses strncmp() with
strlen(wentry->message), instead of just strcmp()?
msg->m_message is treated as the same error message if it is the same
up to strlen(wentry->message)?
Perhaps if that is intentional, then the comment should be updated.

It's better to use strcmp() in this case. Fixed.

src/tools/pgindent/typedefs.list

(9)
The added "PgStat_SubWorkerError" should be removed from the
typedefs.list (as there is no such new typedef).

Fixed.

I've attached updated patches.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v16-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v16-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch (application/octet-stream)
v16-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
#209Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#203)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 8, 2021 at 9:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far. Please
review them.

Hello

Minor two comments for v15-0001 patch.

(1) a typo in pgstat_vacuum_subworker_stat()

+               /*
+                * This subscription is live.  The next step is that we search errors
+                * of the table sync workers who are already in sync state. These
+                * errors should be removed.
+                */

This subscription is "alive" ?

(2) Suggestion to add one comment next to '0' in ApplyWorkerMain()

+                       /* report the table sync error */
+                       pgstat_report_subworker_error(MyLogicalRepWorker->subid,
+                                                                                 MyLogicalRepWorker->relid,
+                                                                                 MyLogicalRepWorker->relid,
+                                                                                 0,
+                                                                                 InvalidTransactionId,
+                                                                                 errdata->message);

How about writing /* no corresponding message type for table synchronization */ or something ?

Thank you for the comments! Those comments are incorporated into the
latest patches I just submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoDST8-ykrCLcWbWnTLj1u52-ZhiEP+bRU7kv5oBhfSy_Q@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#210Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#208)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Some comments for the v16-0003 patch:

(1) doc/src/sgml/logical-replication.sgml

The output from "SELECT * FROM pg_stat_subscription_errors;" still
shows "last_failed_time" instead of "last_error_time".

doc/src/sgml/ref/alter_subscription.sgml
(2)

Suggested update (and fix typo: restrited -> restricted):

BEFORE:
+          Setting and resetting of <literal>skip_xid</literal> option is
+          restrited to superusers.
AFTER:
+          The setting and resetting of the
<literal>skip_xid</literal> option is
+          restricted to superusers.

(3)
Suggested improvement to the wording:

BEFORE:
+          incoming change or by skipping the whole transaction.  This option
+          specifies transaction ID that logical replication worker skips to
+          apply.  The logical replication worker skips all data modification
AFTER:
+          incoming changes or by skipping the whole transaction.  This option
+          specifies the ID of the transaction whose application is to
be skipped
+          by the logical replication worker. The logical replication worker
+          skips all data modification

(4) src/backend/replication/logical/worker.c

Suggested improvement to the comment wording:

BEFORE:
+ * Stop the skipping transaction if enabled. Otherwise, commit the changes
AFTER:
+ * Stop skipping the transaction changes, if enabled. Otherwise,
commit the changes

(5) skip_xid value validation

The validation of the specified skip_xid XID value isn't great.
For example, the following values are accepted:

ALTER SUBSCRIPTION sub SET (skip_xid='123abcz');
ALTER SUBSCRIPTION sub SET (skip_xid='99$@*');
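
As a rough idea of what stricter validation could look like, here is a standalone sketch. It is an assumption on my part, not the patch's code: parse_xid_strict is a hypothetical helper, and it uses only the standard strtoul() with an end-pointer check so that trailing garbage is rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a transaction ID strictly.  Returns true and
 * stores the value only if the whole string is an unsigned decimal number
 * that fits in 32 bits.
 */
bool
parse_xid_strict(const char *str, uint32_t *xid)
{
    char       *end;
    unsigned long val;

    /* Reject empty input, signs and leading whitespace, which strtoul accepts. */
    if (str[0] < '0' || str[0] > '9')
        return false;

    errno = 0;
    val = strtoul(str, &end, 10);

    /* For input like "123abcz", parsing stops early and *end is 'a'. */
    if (errno == ERANGE || *end != '\0' || val > UINT32_MAX)
        return false;

    *xid = (uint32_t) val;
    return true;
}
```

Under this check both '123abcz' and '99$@*' would be rejected, while '590' would be accepted. Whether special XIDs should also be refused is a separate question that this sketch does not address.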

Regards,
Greg Nancarrow
Fujitsu Australia

#211Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#208)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Some comments for the v16-0001 patch:

src/backend/postmaster/pgstat.c

(1) pgstat_vacuum_subworker_stat()

Remove "the" from beginning of the following comment line:

+ * the all the dead subscription worker statistics.

(2) pgstat_reset_subscription_error_stats()

This function would be better named "pgstat_reset_subscription_subworker_error".

(3) pgstat_report_subworker_purge()

Improve comment:

BEFORE:
+ * Tell the collector about dead subscriptions.
AFTER:
+ * Tell the collector to remove dead subscriptions.

(4) pgstat_get_subworker_entry()

I notice that in the following code:

+ if (create && !found)
+ pgstat_reset_subworker_error(wentry, 0);

The newly-created PgStat_StatSubWorkerEntry doesn't get the "dbid"
member set, so I think it's a junk value in this case, yet the caller
of pgstat_get_subworker_entry(..., true) is referencing it:

+ /* Get the subscription worker stats */
+ wentry = pgstat_get_subworker_entry(msg->m_subid, msg->m_subrelid, true);
+ Assert(wentry);
+
+ /*
+ * Update only the counter and timestamp if we received the same error
+ * again
+ */
+ if (wentry->dbid == msg->m_dbid &&
+ wentry->relid == msg->m_relid &&
+ wentry->command == msg->m_command &&
+ wentry->xid == msg->m_xid &&
+ strcmp(wentry->message, msg->m_message) == 0)
+ {
+ wentry->count++;
+ wentry->timestamp = msg->m_timestamp;
+ return;
+ }

Maybe the cheapest solution is to just set dbid in
pgstat_reset_subworker_error()?
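
A cut-down sketch of that suggestion follows. The WorkerEntry struct and both functions are hypothetical stand-ins rather than the patch's PgStat_StatSubWorkerEntry code. The idea is simply that the reset routine initializes every field the "same error" check later reads, dbid included, so a freshly created entry never carries junk into the comparison.

```c
#include <stdint.h>
#include <string.h>

#define MSG_LEN 64

/* Hypothetical, cut-down stand-in for the stats entry discussed above. */
typedef struct WorkerEntry
{
    uint32_t    dbid;
    uint32_t    relid;
    uint32_t    xid;
    int         command;
    long        count;
    char        message[MSG_LEN];
} WorkerEntry;

/* Reset every field that the "same error" check reads, including dbid. */
void
reset_entry(WorkerEntry *e)
{
    memset(e, 0, sizeof(*e));
}

/* Record an error; only bump the counter when every key field matches. */
void
record_error(WorkerEntry *e, uint32_t dbid, uint32_t relid, uint32_t xid,
             int command, const char *msg)
{
    if (e->dbid == dbid && e->relid == relid && e->xid == xid &&
        e->command == command && strcmp(e->message, msg) == 0)
    {
        e->count++;
        return;
    }

    /* A different error: overwrite the entry and restart the counter. */
    e->dbid = dbid;
    e->relid = relid;
    e->xid = xid;
    e->command = command;
    e->count = 1;
    strncpy(e->message, msg, MSG_LEN - 1);
    e->message[MSG_LEN - 1] = '\0';
}
```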

src/backend/replication/logical/worker.c

(5) Fix typo

synchroniztion -> synchronization

+ * type for table synchroniztion.

Regards,
Greg Nancarrow
Fujitsu Australia

#212Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#208)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

A couple more comments for some issues that I noticed in the v16 patches:

v16-0002

doc/src/sgml/ref/alter_subscription.sgml

(1) Order of parameters that can be reset doesn't match those that can be set.
Also, it doesn't match the order specified in the documentation
updates in the v16-0003 patch.

Suggested change:

BEFORE:
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
AFTER:
+       The parameters that can be reset are:
<literal>synchronous_commit</literal>,
+       <literal>binary</literal>, <literal>streaming</literal>.

v16-0003

doc/src/sgml/ref/alter_subscription.sgml

(1) Documentation update says "slot_name" is a parameter that can be
reset, but this is not correct, it can't be reset.
Also, the doc update is missing "the" before "parameter".

Suggested change:

BEFORE:
+      The parameters that can be reset are: <literal>slot_name</literal>,
+      <literal>synchronous_commit</literal>, <literal>binary</literal>,
+      <literal>streaming</literal>, and following parameter:
AFTER:
+      The parameters that can be reset are:
<literal>synchronous_commit</literal>,
+      <literal>binary</literal>, <literal>streaming</literal>, and
the following
+      parameter:

Regards,
Greg Nancarrow
Fujitsu Australia

#213Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#210)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 12, 2021 at 7:58 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Some comments for the v16-0003 patch:

Thank you for the comments!

(1) doc/src/sgml/logical-replication.sgml

The output from "SELECT * FROM pg_stat_subscription_errors;" still
shows "last_failed_time" instead of "last_error_time".

Fixed.

doc/src/sgml/ref/alter_subscription.sgml
(2)

Suggested update (and fix typo: restrited -> restricted):

BEFORE:
+          Setting and resetting of <literal>skip_xid</literal> option is
+          restrited to superusers.
AFTER:
+          The setting and resetting of the
<literal>skip_xid</literal> option is
+          restricted to superusers.

Fixed.

(3)
Suggested improvement to the wording:

BEFORE:
+          incoming change or by skipping the whole transaction.  This option
+          specifies transaction ID that logical replication worker skips to
+          apply.  The logical replication worker skips all data modification
AFTER:
+          incoming changes or by skipping the whole transaction.  This option
+          specifies the ID of the transaction whose application is to
be skipped
+          by the logical replication worker. The logical replication worker
+          skips all data modification

Updated.

(4) src/backend/replication/logical/worker.c

Suggested improvement to the comment wording:

BEFORE:
+ * Stop the skipping transaction if enabled. Otherwise, commit the changes
AFTER:
+ * Stop skipping the transaction changes, if enabled. Otherwise,
commit the changes

Fixed.

(5) skip_xid value validation

The validation of the specified skip_xid XID value isn't great.
For example, the following values are accepted:

ALTER SUBSCRIPTION sub SET (skip_xid='123abcz');
ALTER SUBSCRIPTION sub SET (skip_xid='99$@*');

Hmm, this is probably a problem with the xid data type itself. For example:

postgres(1:12686)=# select 'aa123'::xid;
xid
-----
0
(1 row)

postgres(1:12686)=# select '123aa'::xid;
xid
-----
123
(1 row)

This seems like a problem to me. Perhaps we can fix it in a separate patch.
What do you think?
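
The psql output above (leading junk yields 0, trailing junk is ignored) is what one would see if the input were parsed with C's strtoul() without checking where parsing stopped. Whether the xid input function does exactly this is an assumption here. The following is only a minimal sketch contrasting that lenient behavior with a stricter check; parse_xid_lenient and parse_xid_strict are hypothetical names, not PostgreSQL functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Lenient parse: ignores where parsing stopped, so "123aa" becomes 123
 * and "aa123" becomes 0 (assumed to mirror the behavior shown above).
 */
static uint32_t
parse_xid_lenient(const char *str)
{
    return (uint32_t) strtoul(str, NULL, 0);
}

/*
 * Strict parse: returns 1 and stores the value only if at least one digit
 * was consumed, nothing follows the number, and it fits in 32 bits.
 */
static int
parse_xid_strict(const char *str, uint32_t *result)
{
    char          *end;
    unsigned long  val;

    assert(str != NULL && result != NULL);

    errno = 0;
    val = strtoul(str, &end, 0);
    if (end == str || *end != '\0' || errno == ERANGE || val > UINT32_MAX)
        return 0;

    *result = (uint32_t) val;
    return 1;
}
```

With the strict variant, inputs such as '123abcz' or '99$@*' would be rejected. Note that strtoul() itself still accepts leading whitespace and a sign, so a real fix would likely need further checks.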

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#214Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#211)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 13, 2021 at 10:59 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Some comments for the v16-0001 patch:

Thank you for the comments!

src/backend/postmaster/pgstat.c

(1) pgstat_vacuum_subworker_stat()

Remove "the" from beginning of the following comment line:

+ * the all the dead subscription worker statistics.

Fixed.

(2) pgstat_reset_subscription_error_stats()

This function would be better named "pgstat_reset_subscription_subworker_error".

'subworker' contains an abbreviation of 'subscription'. So it seems
redundant to me. No?

(3) pgstat_report_subworker_purge()

Improve comment:

BEFORE:
+ * Tell the collector about dead subscriptions.
AFTER:
+ * Tell the collector to remove dead subscriptions.

Fixed.

(4) pgstat_get_subworker_entry()

I notice that in the following code:

+ if (create && !found)
+ pgstat_reset_subworker_error(wentry, 0);

The newly-created PgStat_StatSubWorkerEntry doesn't get the "dbid"
member set, so I think it's a junk value in this case, yet the caller
of pgstat_get_subworker_entry(..., true) is referencing it:

+ /* Get the subscription worker stats */
+ wentry = pgstat_get_subworker_entry(msg->m_subid, msg->m_subrelid, true);
+ Assert(wentry);
+
+ /*
+ * Update only the counter and timestamp if we received the same error
+ * again
+ */
+ if (wentry->dbid == msg->m_dbid &&
+ wentry->relid == msg->m_relid &&
+ wentry->command == msg->m_command &&
+ wentry->xid == msg->m_xid &&
+ strcmp(wentry->message, msg->m_message) == 0)
+ {
+ wentry->count++;
+ wentry->timestamp = msg->m_timestamp;
+ return;
+ }

Maybe the cheapest solution is to just set dbid in
pgstat_reset_subworker_error()?

I've changed the code to reset dbid in pgstat_reset_subworker_error().
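
To illustrate the point under review, here is a minimal sketch of a reset routine that initializes every field the "same error" comparison reads, including dbid. SubWorkerEntry, reset_subworker_error and same_error are simplified, hypothetical stand-ins and do not reproduce the actual patch code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for PgStat_StatSubWorkerEntry. */
typedef struct SubWorkerEntry
{
    uint32_t dbid;
    uint32_t relid;
    int      command;
    uint32_t xid;
    long     count;
    int64_t  timestamp;
    char     message[64];
} SubWorkerEntry;

/* Reset all error fields so a freshly created entry holds no junk values. */
static void
reset_subworker_error(SubWorkerEntry *wentry, int64_t ts)
{
    assert(wentry != NULL);

    wentry->dbid = 0;           /* the member the review found unset */
    wentry->relid = 0;
    wentry->command = 0;
    wentry->xid = 0;
    wentry->count = 0;
    wentry->timestamp = ts;
    wentry->message[0] = '\0';
}

/* Return 1 if the reported error matches the one already stored. */
static int
same_error(const SubWorkerEntry *wentry, uint32_t dbid, uint32_t relid,
           int command, uint32_t xid, const char *message)
{
    return wentry->dbid == dbid &&
           wentry->relid == relid &&
           wentry->command == command &&
           wentry->xid == xid &&
           strcmp(wentry->message, message) == 0;
}
```

Because the reset covers dbid, the comparison on a newly created entry reads only defined values.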

src/backend/replication/logical/worker.c

(5) Fix typo

synchroniztion -> synchronization

+ * type for table synchroniztion.

Fixed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#215Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#212)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 14, 2021 at 5:45 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 12, 2021 at 4:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

A couple more comments for some issues that I noticed in the v16 patches:

v16-0002

doc/src/sgml/ref/alter_subscription.sgml

(1) Order of parameters that can be reset doesn't match those that can be set.
Also, it doesn't match the order specified in the documentation
updates in the v16-0003 patch.

Suggested change:

BEFORE:
+       The parameters that can be reset are: <literal>streaming</literal>,
+       <literal>binary</literal>, <literal>synchronous_commit</literal>.
AFTER:
+       The parameters that can be reset are:
<literal>synchronous_commit</literal>,
+       <literal>binary</literal>, <literal>streaming</literal>.

Fixed.

v16-0003

doc/src/sgml/ref/alter_subscription.sgml

(1) Documentation update says "slot_name" is a parameter that can be
reset, but this is not correct, it can't be reset.
Also, the doc update is missing "the" before "parameter".

Suggested change:

BEFORE:
+      The parameters that can be reset are: <literal>slot_name</literal>,
+      <literal>synchronous_commit</literal>, <literal>binary</literal>,
+      <literal>streaming</literal>, and following parameter:
AFTER:
+      The parameters that can be reset are:
<literal>synchronous_commit</literal>,
+      <literal>binary</literal>, <literal>streaming</literal>, and
the following
+      parameter:

Fixed.

I've attached updated patches that incorporate all comments I got so far.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v17-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v17-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v17-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch (application/octet-stream)

#216Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#206)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 11, 2021 at 12:57 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, October 11, 2021 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 8, 2021 at 4:09 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, September 30, 2021 2:45 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so
far. Please review them.

Sorry, if I misunderstand something but did someone check what happens
when we execute ALTER SUBSCRIPTION ... RESET (streaming) in the middle
of one txn which has several streaming of data to the sub, especially
after some part of txn has been already streamed.
My intention is that, *if* we can find actual harm in this, I wanted
to suggest adding a safeguard or some other measure to the patch.

...

I observed that the subscriber doesn't accept STREAM_COMMIT in this
case but gets BEGIN&COMMIT instead at the end.
I couldn't find any apparent and immediate issue from those steps but
is that no problem ?
Probably, this kind of situation applies to other reset target options ?

I think that if a subscription parameter such as ‘streaming’ and ‘binary’ is
changed, an apply worker exits and the launcher starts a new worker (see
maybe_reread_subscription()). So I guess that in this case, the apply worker
exited during receiving streamed changes, restarted, and received the same
changes with ‘streaming = off’, therefore it got BEGIN and COMMIT instead. I
think that this happens even by using ‘SET (‘streaming’ = off)’.

You are right. Yes, I checked that the apply worker did exit
and the new apply worker process dealt with the INSERT in the above case.
Also, setting streaming = false behaved the same way.

I think you can additionally verify that temporary streaming files get
removed after restart.

--
With Regards,
Amit Kapila.

#217Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#207)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 11, 2021 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Oct 10, 2021 at 11:04 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

Good point. I don’t think the skip option should be dumped and
restored using pg_dump since the utilization of transaction ids in
another installation is different.

This is an XID of the publisher that the subscriber wants to skip. So,
even if one restores the subscriber data in a different installation, why
would it matter as long as it points to the same publisher?

Either way, can't we handle this in pg_dump?

--
With Regards,
Amit Kapila.

#218Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#217)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 18, 2021 at 6:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Oct 11, 2021 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Oct 10, 2021 at 11:04 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

Good point. I don’t think the skip option should be dumped and
restored using pg_dump since the utilization of transaction ids in
another installation is different.

This is a xid of publisher which subscriber wants to skip. So, even if
one restores the subscriber data in a different installation why would
it matter till it points to the same publisher?

Either way, can't we handle this in pg_dump?

Because of backups (dumps), I think we cannot expect that the user
will restore it any time soon. If the dump is restored several months
later, the publisher could be a different installation (by rebuilding
from scratch) or XID of the publisher could already be wrapped around.
It might be useful to dump the skip_xid by pg_dump in some cases, but
I think it should be optional if we want to do that.
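
As a rough illustration of the wraparound concern, the sketch below models the publisher's XID as a plain 32-bit counter. This is a simplification: real PostgreSQL reserves a few special XID values, so the exact counts differ. advance_xid and should_skip are hypothetical helpers, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 32-bit transaction counter by n transactions (wraps modulo 2^32). */
static uint32_t
advance_xid(uint32_t xid, uint64_t n)
{
    return (uint32_t) (xid + n);
}

/* A stored skip target is matched purely on its 32-bit value. */
static int
should_skip(uint32_t skip_xid, uint32_t remote_xid)
{
    assert(skip_xid != 0);      /* 0 stands for "no transaction to skip" here */
    return skip_xid == remote_xid;
}
```

In this model, a skip_xid of 590 restored from an old dump would match again once the counter has advanced by 2^32, even though that later transaction is unrelated to the one originally meant to be skipped.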

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#219Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#218)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 19, 2021 at 8:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 18, 2021 at 6:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Oct 11, 2021 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Oct 10, 2021 at 11:04 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

Good point. I don’t think the skip option should be dumped and
restored using pg_dump since the utilization of transaction ids in
another installation is different.

This is a xid of publisher which subscriber wants to skip. So, even if
one restores the subscriber data in a different installation why would
it matter till it points to the same publisher?

Either way, can't we handle this in pg_dump?

Because of backups (dumps), I think we cannot expect that the user
restore it somewhere soon. If the dump is restored several months
later, the publisher could be a different installation (by rebuilding
from scratch) or XID of the publisher could already be wrapped around.
It might be useful to dump the skip_xid by pg_dump in some cases, but
I think it should be optional if we want to do that.

Agreed, I think it depends on the use case, so we can keep it
optional, or maybe in the initial version let's not dump it, and only
if we later see the use case then we can add an optional parameter in
pg_dump. Do you think we need any special handling if we decide not to
dump it? I think if we decide to dump it either optionally or
otherwise, then we do need changes in pg_dump.

--
With Regards,
Amit Kapila.

#220Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#219)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 19, 2021 at 12:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 19, 2021 at 8:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 18, 2021 at 6:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Oct 11, 2021 at 1:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Oct 10, 2021 at 11:04 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 04.10.21 02:31, Masahiko Sawada wrote:

I guess disabling subscriptions on error/conflict and skipping the
particular transactions are somewhat different types of functions.
Disabling subscriptions on error/conflict seems like a setting
parameter of subscriptions. The users might want to specify this
option at creation time. Whereas, skipping the particular transaction
is a repair function that the user might want to use on the spot in
case of a failure. I’m concerned a bit that combining these functions
to one syntax could confuse the users.

Also, would the skip option be dumped and restored using pg_dump? Maybe
there is an argument for yes, but if not, then we probably need a
different path of handling it separate from the more permanent options.

Good point. I don’t think the skip option should be dumped and
restored using pg_dump since the utilization of transaction ids in
another installation is different.

This is a xid of publisher which subscriber wants to skip. So, even if
one restores the subscriber data in a different installation why would
it matter till it points to the same publisher?

Either way, can't we handle this in pg_dump?

Because of backups (dumps), I think we cannot expect that the user
restore it somewhere soon. If the dump is restored several months
later, the publisher could be a different installation (by rebuilding
from scratch) or XID of the publisher could already be wrapped around.
It might be useful to dump the skip_xid by pg_dump in some cases, but
I think it should be optional if we want to do that.

Agreed, I think it depends on the use case, so we can keep it
optional, or maybe in the initial version let's not dump it, and only
if we later see the use case then we can add an optional parameter in
pg_dump.

Agreed. I prefer not to dump it in the first version since it's
difficult to remove the option once it's introduced.

Do you think we need any special handling if we decide not to
dump it? I think if we decide to dump it either optionally or
otherwise, then we do need changes in pg_dump.

Yeah, if we don't dump the skip_xid (which is the current patch
behavior), any special handling is not required for pg_dump. On the
other hand, if we do that in any way, we need changes for pg_dump.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#221houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#215)
RE: Skipping logical replication transactions on subscriber side

On Mon, Oct 18, 2021 9:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Hi,

Here are some minor comments for the patches.

v17-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch

1)

+	/* Clean up */
+	if (not_ready_rels != NIL)
+		list_free_deep(not_ready_rels);

Maybe we don't need the ' if (not_ready_rels != NIL)' check as
list_free_deep will do this check internally.
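
The pattern being relied on is a deep-free routine that treats an empty list as a no-op, which makes a caller-side emptiness check redundant. The sketch below shows that pattern with a hypothetical singly linked list (Node, push_int, free_list_deep); it is not PostgreSQL's List implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node owning a heap-allocated payload. */
typedef struct Node
{
    void        *data;
    struct Node *next;
} Node;

/* Prepend a heap-allocated int to the list and return the new head. */
static Node *
push_int(Node *list, int value)
{
    Node *node = malloc(sizeof(Node));
    int  *data = malloc(sizeof(int));

    assert(node != NULL && data != NULL);
    *data = value;
    node->data = data;
    node->next = list;
    return node;
}

/* Free every node and its payload; an empty (NULL) list is simply a no-op. */
static void
free_list_deep(Node *list)
{
    while (list != NULL)
    {
        Node *next = list->next;

        free(list->data);
        free(list);
        list = next;
    }
}
```

A caller can then write free_list_deep(rels) unconditionally, whether or not rels is empty.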

2)

+	for (int i = 0; i < msg->m_nentries; i++)
+	{
+		HASH_SEQ_STATUS sstat;
+		PgStat_StatSubWorkerEntry *wentry;
+
+		/* Remove all worker statistics of the subscription */
+		hash_seq_init(&sstat, subWorkerStatHash);
+		while ((wentry = (PgStat_StatSubWorkerEntry *) hash_seq_search(&sstat)) != NULL)
+		{
+			if (wentry->key.subid == msg->m_subids[i])
+				(void) hash_search(subWorkerStatHash, (void *) &(wentry->key),
+								   HASH_REMOVE, NULL);

Would it be a little faster if we scan the hash table in the outer loop
and scan the msg in the inner loop?
Like:
while ((wentry = (PgStat_StatSubWorkerEntry *) hash_seq_search(&sstat)) != NULL)
{
for (int i = 0; i < msg->m_nentries; i++)
...
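
A minimal sketch of the suggested loop order follows, using a plain array as a hypothetical stand-in for subWorkerStatHash (the real code would use hash_seq_search and hash_search with HASH_REMOVE). The table is walked once, and each entry is checked against the list of dead subscription IDs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the worker statistics table. */
typedef struct WorkerEntry
{
    uint32_t subid;
    uint32_t subrelid;
    bool     removed;
} WorkerEntry;

/*
 * Scan the table once (outer loop) and compare each live entry against the
 * dead subscription IDs (inner loop).  Returns the number of entries removed.
 */
static int
purge_dead_subscriptions(WorkerEntry *entries, size_t nentries,
                         const uint32_t *dead_subids, size_t ndead)
{
    int removed = 0;

    assert(entries != NULL || nentries == 0);

    for (size_t e = 0; e < nentries; e++)
    {
        if (entries[e].removed)
            continue;

        for (size_t i = 0; i < ndead; i++)
        {
            if (entries[e].subid == dead_subids[i])
            {
                entries[e].removed = true;
                removed++;
                break;          /* entry is gone; no need to check other IDs */
            }
        }
    }
    return removed;
}
```

The number of comparisons is similar in both orders, but this order starts only one sequential scan of the table instead of one per subscription ID in the message.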

v17-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command

1)
I noticed that we cannot RESET slot_name while we can SET it.
And slot_name has a default behavior: "use the name of the subscription for the slot name.".
So, is it possible to support RESET for it?

Best regards,
Hou zj

#222Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#215)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 18, 2021 at 12:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Minor comment on patch 17-0003

src/backend/replication/logical/worker.c

(1) Typo in apply_handle_stream_abort() comment:

/* Stop skipping transaction transaction, if enabled */
should be:
/* Stop skipping transaction changes, if enabled */

Regards,
Greg Nancarrow
Fujitsu Australia

#223Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#221)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 20, 2021 at 12:03 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Mon, Oct 18, 2021 9:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Hi,

Here are some minor comments for the patches.

Thank you for the comments!

v17-0001-Add-a-subscription-errors-statistics-view-pg_sta.patch

1)

+       /* Clean up */
+       if (not_ready_rels != NIL)
+               list_free_deep(not_ready_rels);

Maybe we don't need the ' if (not_ready_rels != NIL)' check as
list_free_deep will do this check internally.

Agreed.

2)

+       for (int i = 0; i < msg->m_nentries; i++)
+       {
+               HASH_SEQ_STATUS sstat;
+               PgStat_StatSubWorkerEntry *wentry;
+
+               /* Remove all worker statistics of the subscription */
+               hash_seq_init(&sstat, subWorkerStatHash);
+               while ((wentry = (PgStat_StatSubWorkerEntry *) hash_seq_search(&sstat)) != NULL)
+               {
+                       if (wentry->key.subid == msg->m_subids[i])
+                               (void) hash_search(subWorkerStatHash, (void *) &(wentry->key),
+                                                                  HASH_REMOVE, NULL);

Would it be a little faster if we scan hashtable in outerloop and
scan the msg in innerloop ?
Like:
while ((wentry = (PgStat_StatSubWorkerEntry *) hash_seq_search(&sstat)) != NULL)
{
for (int i = 0; i < msg->m_nentries; i++)
...

Agreed.

v17-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command

1)
I noticed that we cannot RESET slot_name while we can SET it.
And the slot_name have a default behavior that " use the name of the subscription for the slot name.".
So, is it possible to support RESET it ?

Hmm, I'm not sure resetting slot_name is useful. I think that it’s
common to change the slot name to NONE by ALTER SUBSCRIPTION and vice
versa. But I think resetting the slot name (i.e., changing a
non-default name to the default name) is not a common use case. If
the user wants to do that, it seems safer to explicitly specify the
slot name using ALTER SUBSCRIPTION ... SET (slot_name = 'XXX').

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#224Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#222)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 20, 2021 at 12:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Oct 18, 2021 at 12:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Minor comment on patch 17-0003

Thank you for the comment!

src/backend/replication/logical/worker.c

(1) Typo in apply_handle_stream_abort() comment:

/* Stop skipping transaction transaction, if enabled */
should be:
/* Stop skipping transaction changes, if enabled */

Fixed.

I've attached updated patches. In this version, in addition to the
review comments I got so far, I've changed the view name from
pg_stat_subscription_errors to pg_stat_subscription_workers as per the
discussion on including xact info in the view on another thread[1].
I’ve also changed the related code accordingly.

Regards,

[1]: /messages/by-id/CAD21AoDF7LmSALzMfmPshRw_xFcRz3WvB-me8T2gO6Ht=3zL2w@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v18-0003-Add-skip_xid-option-to-ALTER-SUBSCRIPTION.patch (application/octet-stream)
v18-0002-Add-RESET-command-to-ALTER-SUBSCRIPTION-command.patch (application/octet-stream)
v18-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)

#225Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#200)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 6, 2021 at 11:18 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Oct 4, 2021 at 4:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think the main point here is whether this addresses Peter's
concern that this patch should use a separate syntax. Peter E., can you
please confirm? Do let us know if you have something else in
mind.

Peter's concern seemed to be that the use of a subscription option,
though convenient, isn't an intuitive natural fit for providing this
feature (i.e. ability to skip a transaction by xid). I tend to have
that feeling about using a subscription option for this feature. I'm
not sure what possible alternative syntax he had in mind and currently
can't really think of a good one myself that fits the purpose.

I think that the 1st and 2nd patch are useful in their own right, but
couldn't this feature (i.e. the 3rd patch) be provided instead as an
additional Replication Management function (see 9.27.6)?
e.g. pg_replication_skip_xid

After some thoughts on the syntax, it's somewhat natural to me if we
support the skip transaction feature with another syntax like (I
prefer the former):

ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;

or

ALTER SUBSCRIPTION ... SKIP TRANSACTION xxx; (setting NONE as XID to
reset the XID to skip)

The primary reason to have another syntax is that the ability to skip a
transaction does not seem to be like other subscription parameters such as
slot_name, binary, streaming that are dumped by pg_dump. FWIW IMO the
ability to disable the subscription on an error would be a
subscription parameter. The user is likely to want to specify this
option also at CREATE SUBSCRIPTION and wants it to be dumped by
pg_dump. So I think we can think of the skip xid option separately
from this parameter.

Also, I think we can think of the syntax for this ability (skipping a
transaction) separately from the syntax of the general conflict
resolution feature. I guess that we might rather need a whole new
syntax for conflict resolution. In addition, the user will want to
dump the definitions of conflict resolution by pg_dump in common
cases, unlike the skip XID.

As Amit pointed out, we might want to allow users to skip changes
based on something other than XID but the candidates seem only a few
to me (LSN, time, and something else?). If these are only a few,
probably we don’t need to worry about syntax bloat.

Regarding an additional replication management function proposed by
Greg, it seems a bit unnatural to me; the subscription is created and
altered by DDL but why is only skipping the transaction option
specified by an SQL function?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#226Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#225)
Re: Skipping logical replication transactions on subscriber side

On Mon, Oct 25, 2021 at 7:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 6, 2021 at 11:18 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

I think that the 1st and 2nd patch are useful in their own right, but
couldn't this feature (i.e. the 3rd patch) be provided instead as an
additional Replication Management function (see 9.27.6)?
e.g. pg_replication_skip_xid

After some thoughts on the syntax, it's somewhat natural to me if we
support the skip transaction feature with another syntax like (I
prefer the former):

ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;

or

ALTER SUBSCRIPTION ... SKIP TRANSACTION xxx; (setting NONE as XID to
reset the XID to skip)

The primary reason to have another syntax is that the ability to skip a
transaction does not seem to be like other subscription parameters such as
slot_name, binary, streaming that are dumped by pg_dump. FWIW IMO the
ability to disable the subscription on an error would be a
subscription parameter. The user is likely to want to specify this
option also at CREATE SUBSCRIPTION and wants it to be dumped by
pg_dump. So I think we can think of the skip xid option separately
from this parameter.

Also, I think we can think of the syntax for this ability (skipping a
transaction) separately from the syntax of the general conflict
resolution feature. I guess that we might rather need a whole new
syntax for conflict resolution.

I agree that we will need a separate syntax for conflict resolution
but there is some similarity in what I proposed above (On
Error/Conflict [1]) with the existing syntax of Insert ... On
Conflict. I understand that here the context is different and we are
storing this information in the catalog but still there is some syntax
similarity and it will avoid adding new syntax variants.

In addition, the user will want to
dump the definitions of conflict resolution by pg_dump in common
cases, unlike the skip XID.

As Amit pointed out, we might want to allow users to skip changes
based on something other than XID but the candidates seem only a few
to me (LSN, time, and something else?). If these are only a few,
probably we don’t need to worry about syntax bloat.

I guess one might want to skip particular operations that cause an
error and that would be possible as we are providing the relevant
information via a view.

Regarding an additional replication management function proposed by
Greg, it seems a bit unnatural to me; the subscription is created and
altered by DDL but why is only skipping the transaction option
specified by an SQL function?

The one advantage I see is that it will be similar to what we already
have via pg_replication_origin_advance() for skipping WAL during
apply. The other thing could be that this feature can lead to problems
if not used carefully so maybe it is better to provide it only by
special functions. Having said that, I still feel we should do it via
Alter Subscription in some way as that will be convenient to use.

[1]: /messages/by-id/CAA4eK1+BOHXC=0S2kA7GkErWq3-QKj34oQvwAPfuTHq=epf34w@mail.gmail.com

--
With Regards,
Amit Kapila.

#227Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#226)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 26, 2021 at 5:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree that we will need a separate syntax for conflict resolution
but there is some similarity in what I proposed above (On
Error/Conflict [1]) with the existing syntax of Insert ... On
Conflict. I understand that here the context is different and we are
storing this information in the catalog but still there is some syntax
similarity and it will avoid adding new syntax variants.

The problem I see with the suggested syntax:

Alter Subscription <sub_name> On Error ( subscription_parameter [=
value] [, ... ] );
OR
Alter Subscription <sub_name> On Conflict ( subscription_parameter [=
value] [, ... ] );

is that "On Error ..." and "On Conflict" imply an action to be done on
a future condition (Error/Conflict), whereas at least in this case
(skip_xid) it's only AFTER the problem condition has occurred that we
know the XID of the failed transaction that we want to skip. So that
syntax looks a little confusing to me. Unless you had something else
in mind on how it would work?

Regards,
Greg Nancarrow
Fujitsu Australia

#228Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#227)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 26, 2021 at 2:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 26, 2021 at 5:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree that we will need a separate syntax for conflict resolution
but there is some similarity in what I proposed above (On
Error/Conflict [1]) with the existing syntax of Insert ... On
Conflict. I understand that here the context is different and we are
storing this information in the catalog but still there is some syntax
similarity and it will avoid adding new syntax variants.

The problem I see with the suggested syntax:

Alter Subscription <sub_name> On Error ( subscription_parameter [=
value] [, ... ] );
OR
Alter Subscription <sub_name> On Conflict ( subscription_parameter [=
value] [, ... ] );

is that "On Error ..." and "On Conflict" imply an action to be done on
a future condition (Error/Conflict), whereas at least in this case
(skip_xid) it's only AFTER the problem condition has occurred that we
know the XID of the failed transaction that we want to skip. So that
syntax looks a little confusing to me. Unless you had something else
in mind on how it would work?

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Instead of using Skip, we can use WITH as used in Alter Database
syntax but we are already using WITH in Create Subscription for a
different purpose, so that may not be a very good idea.

The basic idea is that I am trying to use options here rather than a
keyword-based syntax as there can be multiple such options.
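
To make the options-based idea a bit more concrete, here is a minimal standalone sketch of how a SKIP ( name = value ) list could be parsed one pair at a time. It is only an illustration under assumptions: DemoSkipOpts and demo_parse_skip_option() are hypothetical names, not part of any patch, and the only thing taken from PostgreSQL is the usual textual LSN format of two hexadecimal numbers separated by a slash.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the options given in SKIP ( ... ). */
typedef struct DemoSkipOpts
{
    bool     has_xid;
    uint32_t xid;
    bool     has_lsn;
    uint64_t lsn;
} DemoSkipOpts;

/*
 * Parse a single "name = value" pair. Returns false for an unknown
 * option name or a malformed value, so that new options can be added
 * later without touching the grammar.
 */
bool
demo_parse_skip_option(DemoSkipOpts *opts, const char *name, const char *value)
{
    if (strcmp(name, "xid") == 0)
    {
        char         *end;
        unsigned long v = strtoul(value, &end, 10);

        if (end == value || *end != '\0' || v > UINT32_MAX)
            return false;
        opts->has_xid = true;
        opts->xid = (uint32_t) v;
        return true;
    }
    if (strcmp(name, "lsn") == 0)
    {
        unsigned int hi;
        unsigned int lo;
        char         trailing;

        /* Exactly two hex fields and nothing after them. */
        if (sscanf(value, "%X/%X%c", &hi, &lo, &trailing) != 2)
            return false;
        opts->has_lsn = true;
        opts->lsn = ((uint64_t) hi << 32) | lo;
        return true;
    }
    return false;               /* unknown option name */
}
```

With this shape, adding another option later means adding one more branch rather than a new keyword.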

--
With Regards,
Amit Kapila.

#229Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#228)
Re: Skipping logical replication transactions on subscriber side

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 26, 2021 at 2:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 26, 2021 at 5:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree that we will need a separate syntax for conflict resolution
but there is some similarity in what I proposed above (On
Error/Conflict [1]) with the existing syntax of Insert ... On
Conflict. I understand that here the context is different and we are
storing this information in the catalog but still there is some syntax
similarity and it will avoid adding new syntax variants.

The problem I see with the suggested syntax:

Alter Subscription <sub_name> On Error ( subscription_parameter [=
value] [, ... ] );
OR
Alter Subscription <sub_name> On Conflict ( subscription_parameter [=
value] [, ... ] );

is that "On Error ..." and "On Conflict" imply an action to be done on
a future condition (Error/Conflict), whereas at least in this case
(skip_xid) it's only AFTER the problem condition has occurred that we
know the XID of the failed transaction that we want to skip. So that
syntax looks a little confusing to me. Unless you had something else
in mind on how it would work?

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful. And, if it’s essentially the same as
pg_replication_origin_advance(), we don’t need to have it.

The basic idea is that I am trying to use options here rather than a
keyword-based syntax as there can be multiple such options.

Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#230houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#224)
RE: Skipping logical replication transactions on subscriber side

On Thurs, Oct 21, 2021 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches. In this version, in addition to the review
comments I got so far, I've changed the view name from
pg_stat_subscription_errors to pg_stat_subscription_workers as per the
discussion on including xact info to the view on another thread[1].
I’ve also changed related codes accordingly.

When reviewing the v18-0002 patch, I noticed that
"RESET SYNCHRONOUS_COMMIT" does not take effect
(RESET doesn't change the value to 'off').

+			if (!is_reset)
+			{
+				opts->synchronous_commit = defGetString(defel);
-			...
+			}

I think we need to add an else branch here to set synchronous_commit to 'off'.
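
A minimal standalone sketch of the branch I have in mind, assuming a cut-down stand-in for SubOpts. DemoSubOpts and demo_set_synchronous_commit() are hypothetical names used only for illustration; this is not the patch code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SubOpts; only the field discussed here. */
typedef struct DemoSubOpts
{
    const char *synchronous_commit;
} DemoSubOpts;

/*
 * Mirrors the hunk above: SET takes the user-supplied value, while
 * RESET falls back to the default value 'off'.
 */
void
demo_set_synchronous_commit(DemoSubOpts *opts, bool is_reset, const char *value)
{
    if (!is_reset)
        opts->synchronous_commit = value;
    else
        opts->synchronous_commit = "off";   /* the missing else branch */
}
```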

Best regards,
Hou zj

#231Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#229)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

--
With Regards,
Amit Kapila.

#232Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: houzj.fnst@fujitsu.com (#230)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 2:28 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

When reviewing the v18-0002 patch, I noticed that
"RESET SYNCHRONOUS_COMMIT" does not take effect
(RESET doesn't change the value to 'off').

+                       if (!is_reset)
+                       {
+                               opts->synchronous_commit = defGetString(defel);
-                       ...
+                       }

I think we need to add an else branch here to set synchronous_commit to 'off'.

I agree that it doesn't seem to handle the RESET of synchronous_commit.
I think that for consistency, the default value "off" for
synchronous_commit should be set (in the SubOpts) near where the
default values of the boolean supported options are currently set -
near the top of parse_subscription_options().
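
Roughly, the defaults-first arrangement would look like the standalone sketch below. It only assumes a simplified stand-in for SubOpts; DemoSubOpts and demo_parse_options() are hypothetical names and the real parse_subscription_options() handles far more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for SubOpts. */
typedef struct DemoSubOpts
{
    bool        binary;
    bool        streaming;
    const char *synchronous_commit;
} DemoSubOpts;

/*
 * Set every default up front, then let SET override it. RESET needs no
 * special handling because the default is already in place.
 */
void
demo_parse_options(DemoSubOpts *opts, bool is_reset, const char *sync_commit_value)
{
    /* Defaults for the boolean options and for synchronous_commit. */
    opts->binary = false;
    opts->streaming = false;
    opts->synchronous_commit = "off";

    if (!is_reset && sync_commit_value != NULL)
        opts->synchronous_commit = sync_commit_value;
}
```

The point of this layout is that no per-option else branch is needed for RESET.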

Regards,
Greg Nancarrow
Fujitsu Australia

#233Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#231)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 12:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

I think that assumes a situation where the user already knows of
multiple transactions that cannot be applied on the subscription, but
how would they know?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#234Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#233)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 12:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

I think that assumes a situation where the user already knows of
multiple transactions that cannot be applied on the subscription, but
how would they know?

Either from the error messages in the server log or from the new view
we are planning to add. I think such a case is possible during the
initial synchronization phase, where the apply worker gets ahead of the
tablesync worker by skipping the changes for the corresponding table.
After that, it is possible that the tablesync worker fails during the
copy and the apply worker fails while processing some other rel.
Now, I think the only way to move forward is via LSNs. Currently,
figuring out which LSNs to skip is not straightforward, but improving
that area is the work of another patch.

--
With Regards,
Amit Kapila.

#235tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#224)
RE: Skipping logical replication transactions on subscriber side

On Thursday, October 21, 2021 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches. In this version, in addition to the
review comments I got so far, I've changed the view name from
pg_stat_subscription_errors to pg_stat_subscription_workers as per the
discussion on including xact info to the view on another thread[1].
I’ve also changed related codes accordingly.

Thanks for your patch.
I have some minor comments on your 0001 and 0002 patch.

1. For 0001 patch, src/backend/catalog/system_views.sql
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+	e.subid,
+	s.subname,
+	e.subrelid,
+	e.relid,
+	e.command,
+	e.xid,
+	e.count,
+	e.error_message,
+	e.last_error_time,
+	e.stats_reset
+    FROM (SELECT
+              oid as subid,
...

Some places use TABs; I think it's better to use spaces here, to be consistent
with other places in this file.

2. For 0002 patch, I think we can add some changes to tab-complete.c, maybe
something like this:

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ecae9df8ed..96665f6115 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1654,7 +1654,7 @@ psql_completion(const char *text, int start, int end)
        /* ALTER SUBSCRIPTION <name> */
        else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
                COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "RESET",
                                          "ADD PUBLICATION", "DROP PUBLICATION");
        /* ALTER SUBSCRIPTION <name> REFRESH PUBLICATION */
        else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) &&
@@ -1670,6 +1670,12 @@ psql_completion(const char *text, int start, int end)
        /* ALTER SUBSCRIPTION <name> SET ( */
        else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
                COMPLETE_WITH("binary", "slot_name", "streaming", "synchronous_commit");
+       /* ALTER SUBSCRIPTION <name> RESET */
+       else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "RESET"))
+               COMPLETE_WITH("(");
+       /* ALTER SUBSCRIPTION <name> RESET ( */
+       else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("RESET", "("))
+               COMPLETE_WITH("binary", "streaming", "synchronous_commit");
        /* ALTER SUBSCRIPTION <name> SET PUBLICATION */
        else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "PUBLICATION"))
        {

Regards
Tang

#236Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#224)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Few comments:
==============
1. Is the patch cleaning tablesync error entries other than via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop the subscription? I think the same applies to
the apply worker. I think we may want to track in some way whether an
error has occurred before sending the message, but relying completely
on vacuum could be a recipe for bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared.

2.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>count</structfield> <type>uint8</type>
+      </para>
+      <para>
+       Number of consecutive times the error occurred
+      </para></entry>

Shall we name this field as error_count as there will be other fields
in this view in the future that may not be directly related to the
error?

3.
+
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+ e.subid,
+ s.subname,
+ e.subrelid,
+ e.relid,
+ e.command,
+ e.xid,
+ e.count,
+ e.error_message,
+ e.last_error_time,
+ e.stats_reset
+    FROM (SELECT
+              oid as subid,
+              NULL as relid
+          FROM pg_subscription
+          UNION ALL
+          SELECT
+              srsubid as subid,
+              srrelid as relid
+          FROM pg_subscription_rel
+          WHERE srsubstate <> 'r') sr,
+          LATERAL pg_stat_get_subscription_worker(sr.subid, sr.relid) e

It might be better to use 'w' as an alias instead of 'e' as the
information is now not restricted to only errors.

4. +# Test if the error reported on pg_subscription_workers view is expected.

The view name is wrong in the above comment

5.
+# Check if the view doesn't show any entries after dropping the subscriptions.
+$node_subscriber->safe_psql(
+    'postgres',
+    q[
+DROP SUBSCRIPTION tap_sub;
+DROP SUBSCRIPTION tap_sub_streaming;
+]);
+$result = $node_subscriber->safe_psql('postgres',
+       "SELECT count(1) FROM pg_stat_subscription_workers");
+is($result, q(0), 'no error after dropping subscription');

Don't we need to wait after dropping the subscription and before
checking the view as there might be a slight delay in messages to get
cleared?

7.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
'$publisher_connstr application_name=$appname_streaming' PUBLICATION
tap_pub_streaming WITH (streaming = on, two_phase = on);");
+
+$node_publisher->wait_for_catchup($appname);
+$node_publisher->wait_for_catchup($appname_streaming);

How can we ensure that subscriber would have caught up when one of the
tablesync workers is constantly in the error loop? Isn't it possible
that the subscriber didn't send the latest lsn feedback till the table
sync worker is finished?

8.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.

The second sentence of the comment can be written as: "The table sync
for test_tab2 on tap_sub will enter into infinite error loop due to
violating the unique constraint."
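
On point 1 above, the idea of tracking whether an error was reported before sending a clear message could look roughly like this standalone sketch. All names here (DemoWorker, demo_report_error, demo_on_success) are hypothetical, and the counter merely stands in for an actual message to the stats collector.

```c
#include <stdbool.h>

/* Hypothetical per-worker state. */
typedef struct DemoWorker
{
    bool error_reported;    /* has an error been sent to the collector? */
    int  clear_msgs_sent;   /* stand-in for "clear stats" messages */
} DemoWorker;

/* Called whenever the worker reports a failure. */
void
demo_report_error(DemoWorker *w)
{
    w->error_reported = true;
}

/*
 * Called when the worker succeeds (e.g. tablesync finishes). A clear
 * message is sent only if there is something to clear, so the common
 * error-free path sends no extra messages.
 */
void
demo_on_success(DemoWorker *w)
{
    if (!w->error_reported)
        return;
    w->clear_msgs_sent++;
    w->error_reported = false;
}
```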

--
With Regards,
Amit Kapila.

#237Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#229)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Oct 26, 2021 at 2:27 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Oct 26, 2021 at 5:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree that we will need a separate syntax for conflict resolution
but there is some similarity in what I proposed above (On
Error/Conflict [1]) with the existing syntax of Insert ... On
Conflict. I understand that here the context is different and we are
storing this information in the catalog but still there is some syntax
similarity and it will avoid adding new syntax variants.

The problem I see with the suggested syntax:

Alter Subscription <sub_name> On Error ( subscription_parameter [=
value] [, ... ] );
OR
Alter Subscription <sub_name> On Conflict ( subscription_parameter [=
value] [, ... ] );

is that "On Error ..." and "On Conflict" imply an action to be done on
a future condition (Error/Conflict), whereas at least in this case
(skip_xid) it's only AFTER the problem condition has occurred that we
know the XID of the failed transaction that we want to skip. So that
syntax looks a little confusing to me. Unless you had something else
in mind on how it would work?

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

If we want to follow the above, then how do we allow users to reset
the parameter? One way is to allow the user to set xid as 0 which
would mean that we reset it. The other way is to allow SET/RESET
before SKIP but not sure if that is a good option. I was also thinking
about how we can extend the current syntax in the future if we want to
allow users to specify multiple xids. I guess we can either make xid
a list or allow it to be specified multiple times. We don't need to
do this now; I mention it only so that we are able to extend the syntax
later if required.

--
With Regards,
Amit Kapila.

#238Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#234)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 12:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

I think that assumes a situation where the user already knows of
multiple transactions that cannot be applied on the subscription, but
how would they know?

Either from the error messages in the server log or from the new view
we are planning to add. I think such a case is possible during the
initial synchronization phase, where the apply worker gets ahead of the
tablesync worker by skipping the changes for the corresponding table.
After that, it is possible that the tablesync worker fails during the
copy and the apply worker fails while processing some other rel.

Does it mean that if both initial copy for the corresponding table by
table sync worker and applying changes for other rels by apply worker
fail, we skip both by specifying LSN? If so, can't we disable the
initial copy for the table and skip only the changes for other rels
that cannot be applied?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#239Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#238)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

I think that assumes a situation where the user already knows of
multiple transactions that cannot be applied on the subscription, but
how would they know?

Either from the error messages in the server log or from the new view
we are planning to add. I think such a case is possible during the
initial synchronization phase, where the apply worker gets ahead of the
tablesync worker by skipping the changes for the corresponding table.
After that, it is possible that the tablesync worker fails during the
copy and the apply worker fails while processing some other rel.

Does it mean that if both initial copy for the corresponding table by
table sync worker and applying changes for other rels by apply worker
fail, we skip both by specifying LSN?

Yes.

If so, can't we disable the
initial copy for the table and skip only the changes for other rels
that cannot be applied?

But anyway you need some way to skip changes via a particular
tablesync worker so that it can mark the relation in 'ready' state. I
think one can also try to use disable_on_error option in such
scenarios depending on how we expose it. Say, if the option means that
all workers (apply or table sync) should be disabled on an error then
it would be a bit tricky but if we can come up with a way to behave
differently for different workers then it is possible to disable one
set of workers and skip the changes in another set of workers.

--
With Regards,
Amit Kapila.

#240Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#237)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

If we want to follow the above, then how do we allow users to reset
the parameter? One way is to allow the user to set xid as 0 which
would mean that we reset it. The other way is to allow SET/RESET
before SKIP but not sure if that is a good option.

After thinking some more on this, I think it is better to not use
SET/RESET keyword here. I think we can use a model similar to how we
allow setting some of the options in Alter Database:

# Set the connection limit for a database:
Alter Database akapila WITH connection_limit = 1;
# Reset the connection limit
Alter Database akapila WITH connection_limit = -1;

Thoughts?
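
For skip_xid, the analogue of connection_limit = -1 would be a sentinel value that means "not set". A minimal standalone sketch, assuming xid = 0 is that sentinel (0 is InvalidTransactionId in PostgreSQL); DemoSubscription and the demo_* functions are hypothetical names, not patch code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for TransactionId; 0 plays the role of InvalidTransactionId. */
typedef uint32_t DemoXid;
#define DEMO_INVALID_XID ((DemoXid) 0)

/* Hypothetical catalog row holding the XID to skip. */
typedef struct DemoSubscription
{
    DemoXid skip_xid;
} DemoSubscription;

/* SKIP (xid = N): a non-zero N arms the skip, N = 0 resets it. */
void
demo_alter_skip_xid(DemoSubscription *sub, DemoXid xid)
{
    sub->skip_xid = xid;
}

/* Is a transaction currently marked to be skipped? */
bool
demo_skip_is_set(const DemoSubscription *sub)
{
    return sub->skip_xid != DEMO_INVALID_XID;
}

/* Apply-side check: skip only the one remote transaction that matches. */
bool
demo_should_skip(const DemoSubscription *sub, DemoXid remote_xid)
{
    return demo_skip_is_set(sub) && sub->skip_xid == remote_xid;
}
```

This keeps a single code path for setting and resetting, at the cost of reserving one value as "unset".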

With Regards,
Amit Kapila.

#241Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#236)
Re: Skipping logical replication transactions on subscriber side

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries other than via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop the subscription? I think the same applies to
the apply worker. I think we may want to track in some way whether an
error has occurred before sending the message, but relying completely
on vacuum could be a recipe for bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared.

Yes, I think we can have the tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only after the transaction is
committed, since we can drop a subscription that doesn't have a
replication slot and then roll back the transaction. Probably we can send
the message only when the subscription does have a replication slot.

In other cases, we can remember the subscriptions being dropped and
send the message to drop their statistics after committing the
transaction, but I’m not sure it’s worth having it. FWIW, we completely
rely on pgstat_vacuum_stat() for cleaning up the dead tables and
functions. And we don't expect there to be many subscriptions on the
database.

What do you think?

2.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>count</structfield> <type>uint8</type>
+      </para>
+      <para>
+       Number of consecutive times the error occurred
+      </para></entry>

Shall we name this field as error_count as there will be other fields
in this view in the future that may not be directly related to the
error?

Agreed.

3.
+
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+ e.subid,
+ s.subname,
+ e.subrelid,
+ e.relid,
+ e.command,
+ e.xid,
+ e.count,
+ e.error_message,
+ e.last_error_time,
+ e.stats_reset
+    FROM (SELECT
+              oid as subid,
+              NULL as relid
+          FROM pg_subscription
+          UNION ALL
+          SELECT
+              srsubid as subid,
+              srrelid as relid
+          FROM pg_subscription_rel
+          WHERE srsubstate <> 'r') sr,
+          LATERAL pg_stat_get_subscription_worker(sr.subid, sr.relid) e

It might be better to use 'w' as an alias instead of 'e' as the
information is now not restricted to only errors.

Agreed.

4. +# Test if the error reported on pg_subscription_workers view is expected.

The view name is wrong in the above comment

Fixed.

5.
+# Check if the view doesn't show any entries after dropping the subscriptions.
+$node_subscriber->safe_psql(
+    'postgres',
+    q[
+DROP SUBSCRIPTION tap_sub;
+DROP SUBSCRIPTION tap_sub_streaming;
+]);
+$result = $node_subscriber->safe_psql('postgres',
+       "SELECT count(1) FROM pg_stat_subscription_workers");
+is($result, q(0), 'no error after dropping subscription');

Don't we need to wait after dropping the subscription and before
checking the view as there might be a slight delay in messages to get
cleared?

I think the test always passes without waiting for the statistics to
be updated since we fetch the subscription worker statistics from the
stats collector based on the entries of pg_subscription catalog. So
this test checks that statistics of an already-dropped subscription
don't show up in the view after DROP SUBSCRIPTION, but does not check
whether the subscription worker statistics entry in the stats collector
gets removed. The primary reason is that, as I mentioned above, the patch
relies on pgstat_vacuum_stat() for cleaning up the dead subscriptions.

7.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
'$publisher_connstr application_name=$appname_streaming' PUBLICATION
tap_pub_streaming WITH (streaming = on, two_phase = on);");
+
+$node_publisher->wait_for_catchup($appname);
+$node_publisher->wait_for_catchup($appname_streaming);

How can we ensure that subscriber would have caught up when one of the
tablesync workers is constantly in the error loop? Isn't it possible
that the subscriber didn't send the latest lsn feedback till the table
sync worker is finished?

I thought that even if tablesync for a table is still ongoing, the
apply worker can apply commit records, update write LSN and flush LSN,
and send the feedback to the wal sender. No?

8.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.

The second sentence of the comment can be written as: "The table sync
for test_tab2 on tap_sub will enter into infinite error loop due to
violating the unique constraint."

Fixed.
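
To illustrate the reply to point 5 above: the view is driven by the catalog, so rows for a dropped subscription disappear immediately, while the collector's entry lingers until a lazy cleanup pass. The standalone sketch below models only that behaviour; DemoStatsEntry and the demo_* functions are hypothetical names, and the cleanup function is merely in the spirit of pgstat_vacuum_stat(), not a copy of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoOid;

/* Hypothetical collector-side entry for one subscription worker. */
typedef struct DemoStatsEntry
{
    DemoOid subid;
    bool    in_use;
} DemoStatsEntry;

/* Is the given subscription OID still present in the catalog? */
bool
demo_oid_in(const DemoOid *oids, size_t n, DemoOid oid)
{
    for (size_t i = 0; i < n; i++)
        if (oids[i] == oid)
            return true;
    return false;
}

/* Rows the view shows: entries whose subscription is still in the catalog. */
size_t
demo_view_rows(const DemoStatsEntry *stats, size_t nstats,
               const DemoOid *catalog, size_t ncat)
{
    size_t rows = 0;

    for (size_t i = 0; i < nstats; i++)
        if (stats[i].in_use && demo_oid_in(catalog, ncat, stats[i].subid))
            rows++;
    return rows;
}

/* Entries still held by the collector, whether visible or not. */
size_t
demo_live_entries(const DemoStatsEntry *stats, size_t nstats)
{
    size_t live = 0;

    for (size_t i = 0; i < nstats; i++)
        if (stats[i].in_use)
            live++;
    return live;
}

/* Lazy cleanup: free entries whose subscription is gone; returns the count. */
size_t
demo_vacuum_stats(DemoStatsEntry *stats, size_t nstats,
                  const DemoOid *catalog, size_t ncat)
{
    size_t removed = 0;

    for (size_t i = 0; i < nstats; i++)
        if (stats[i].in_use && !demo_oid_in(catalog, ncat, stats[i].subid))
        {
            stats[i].in_use = false;
            removed++;
        }
    return removed;
}
```

So a test that only counts view rows cannot tell whether the collector entry was actually removed.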

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#242Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#239)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 10:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW how useful is specifying LSN instead of XID in practice? Given
that this skipping behavior is used to skip the particular transaction
(or its part of operations) in question, I’m not sure specifying LSN
or time is useful.

I think if the user wants to skip multiple xacts, she might want to
use the highest LSN to skip instead of specifying individual xids.

I think that assumes a situation where the user already knows of
multiple transactions that cannot be applied on the subscription, but
how would they know?

Either from the error messages in the server log or from the new view
we are planning to add. I think such a case is possible during the
initial synchronization phase, where the apply worker gets ahead of the
tablesync worker by skipping the changes for the corresponding table.
After that, it is possible that the tablesync worker fails during the
copy and the apply worker fails while processing some other rel.

Does it mean that if both initial copy for the corresponding table by
table sync worker and applying changes for other rels by apply worker
fail, we skip both by specifying LSN?

Yes.

If so, can't we disable the
initial copy for the table and skip only the changes for other rels
that cannot be applied?

But anyway you need some way to skip changes via a particular
tablesync worker so that it can mark the relation in 'ready' state.

Right.

I
think one can also try to use disable_on_error option in such
scenarios depending on how we expose it. Say, if the option means that
all workers (apply or table sync) should be disabled on an error then
it would be a bit tricky but if we can come up with a way to behave
differently for different workers then it is possible to disable one
set of workers and skip the changes in another set of workers.

Yes, I would prefer to skip individual transactions in question rather
than skip changes until the particular LSN. It’s not advisable to use
LSN to skip changes since it has a risk of skipping unrelated changes
too.
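
To make the risk concrete, here is a minimal standalone sketch (not
taken from the patch) that compares the two approaches on a made-up list
of pending remote transactions. The PendingXact struct, the sample XIDs
and LSNs, and both helper functions are hypothetical; the typedefs only
stand in for PostgreSQL's TransactionId and XLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's types; the real headers are not used here. */
typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical record of one remote transaction waiting to be applied. */
typedef struct PendingXact
{
    TransactionId xid;          /* remote transaction ID */
    XLogRecPtr    commit_lsn;   /* LSN of its commit record */
    bool          conflicts;    /* true if applying it would fail */
} PendingXact;

/* Made-up sample: 588 and 590 conflict, 589 and 591 would apply cleanly. */
static const PendingXact sample_xacts[] = {
    {588, 0x1000, true},
    {589, 0x1100, false},
    {590, 0x1200, true},
    {591, 0x1300, false},
};
static const size_t sample_count =
    sizeof(sample_xacts) / sizeof(sample_xacts[0]);

/* Healthy transactions lost when skipping everything up to skip_upto. */
static size_t
healthy_skipped_by_lsn(const PendingXact *xacts, size_t n,
                       XLogRecPtr skip_upto)
{
    size_t      lost = 0;

    for (size_t i = 0; i < n; i++)
        if (xacts[i].commit_lsn <= skip_upto && !xacts[i].conflicts)
            lost++;
    return lost;
}

/* Healthy transactions lost when skipping exactly one remote XID. */
static size_t
healthy_skipped_by_xid(const PendingXact *xacts, size_t n,
                       TransactionId skip_xid)
{
    size_t      lost = 0;

    for (size_t i = 0; i < n; i++)
        if (xacts[i].xid == skip_xid && !xacts[i].conflicts)
            lost++;
    return lost;
}
```

With this sample, skipping up to the commit LSN of XID 590 also drops
the healthy XID 589, whereas skipping XIDs 588 and 590 individually
drops nothing else.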

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#243Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#240)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

If we want to follow the above, then how do we allow users to reset
the parameter? One way is to allow the user to set xid as 0 which
would mean that we reset it. The other way is to allow SET/RESET
before SKIP but not sure if that is a good option.

After thinking some more on this, I think it is better to not use
SET/RESET keyword here. I think we can use a model similar to how we
allow setting some of the options in Alter Database:

# Set the connection limit for a database:
Alter Database akapila WITH connection_limit = 1;
# Reset the connection limit
Alter Database akapila WITH connection_limit = -1;

Thoughts?

Agreed.
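
As a rough illustration of the "reserved value means reset" model
discussed above, the following standalone sketch treats xid = 0 as
"nothing to skip", the same way connection_limit = -1 means "no limit".
The SubSkipState struct and the sub_* helpers are hypothetical names for
this example only; InvalidTransactionId being 0 matches PostgreSQL's
definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's TransactionId; 0 is the invalid XID. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical per-subscription skip state. */
typedef struct SubSkipState
{
    TransactionId skip_xid;     /* remote XID to skip, or 0 if unset */
} SubSkipState;

/* Models "SKIP (xid = N)"; passing 0 resets the setting. */
static void
sub_set_skip_xid(SubSkipState *sub, TransactionId xid)
{
    sub->skip_xid = xid;
}

/* True if the given remote transaction should be skipped. */
static bool
sub_should_skip(const SubSkipState *sub, TransactionId remote_xid)
{
    return sub->skip_xid != InvalidTransactionId &&
           sub->skip_xid == remote_xid;
}
```

The appeal of this model is that no extra keyword is needed: setting
the parameter to the reserved value and resetting it are the same
operation.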

Another thing I'm concerned about is that the syntax "SKIP (
subscription_parameter [=value] [, ...])" looks like we can specify
multiple options, for example, "SKIP (xid = '100', lsn =
'0/12345678')". Is there a case where we need to specify multiple
options? Perhaps when specifying the target XID and operations, for
example, "SKIP (xid = 100, action = 'insert, update')"?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#244Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#243)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

If we want to follow the above, then how do we allow users to reset
the parameter? One way is to allow the user to set xid as 0 which
would mean that we reset it. The other way is to allow SET/RESET
before SKIP but not sure if that is a good option.

After thinking some more on this, I think it is better to not use
SET/RESET keyword here. I think we can use a model similar to how we
allow setting some of the options in Alter Database:

# Set the connection limit for a database:
Alter Database akapila WITH connection_limit = 1;
# Reset the connection limit
Alter Database akapila WITH connection_limit = -1;

Thoughts?

Agreed.

Another thing I'm concerned about is that the syntax "SKIP (
subscription_parameter [=value] [, ...])" looks like we can specify
multiple options, for example, "SKIP (xid = '100', lsn =
'0/12345678')". Is there a case where we need to specify multiple
options? Perhaps when specifying the target XID and operations, for
example, "SKIP (xid = 100, action = 'insert, update')"?

Yeah, or maybe prepared transaction identifier and actions. BTW, if we
want to proceed without the SET/RESET keyword then you can prepare the
SKIP xid patch as the second in the series and we can probably work on
the RESET syntax as a completely independent patch.

--
With Regards,
Amit Kapila.

#245Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#242)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 10:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 1:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Either from the error messages in the server log or from the new view
we are planning to add. I think such a case is possible during the
initial synchronization phase, where the apply worker goes ahead of the
tablesync worker by skipping the changes for the corresponding table.
After that, it is possible that the tablesync worker fails during the
copy and the apply worker fails while processing some other rel.

Does it mean that if both initial copy for the corresponding table by
table sync worker and applying changes for other rels by apply worker
fail, we skip both by specifying LSN?

Yes.

If so, can't we disable the
initial copy for the table and skip only the changes for other rels
that cannot be applied?

But anyway you need some way to skip changes via a particular
tablesync worker so that it can mark the relation in 'ready' state.

Right.

I
think one can also try to use disable_on_error option in such
scenarios depending on how we expose it. Say, if the option means that
all workers (apply or table sync) should be disabled on an error then
it would be a bit tricky but if we can come up with a way to behave
differently for different workers then it is possible to disable one
set of workers and skip the changes in another set of workers.

Yes, I would prefer to skip individual transactions in question rather
than skip changes until the particular LSN. It’s not advisable to use
LSN to skip changes since it has a risk of skipping unrelated changes
too.

Fair enough, but I think providing the LSN is also useful if the user
can identify it easily, as otherwise there might be more
administrative work to make replication progress.

--
With Regards,
Amit Kapila.

#246Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#241)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be the recipe of bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only after the transaction is
committed, since we can drop a subscription that doesn't have a
replication slot and then roll back the transaction. Probably we can
send the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

In other cases, we can remember the subscriptions being dropped and
send the message to drop the statistics of them after committing the
transaction but I’m not sure it’s worth having it.

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

FWIW, we completely rely on pgstat_vacuum_stat() for cleaning up the
dead tables and functions. And we don't expect there are many
subscriptions on the database.

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

5.
+# Check if the view doesn't show any entries after dropping the subscriptions.
+$node_subscriber->safe_psql(
+    'postgres',
+    q[
+DROP SUBSCRIPTION tap_sub;
+DROP SUBSCRIPTION tap_sub_streaming;
+]);
+$result = $node_subscriber->safe_psql('postgres',
+       "SELECT count(1) FROM pg_stat_subscription_workers");
+is($result, q(0), 'no error after dropping subscription');

Don't we need to wait after dropping the subscription and before
checking the view as there might be a slight delay in messages to get
cleared?

I think the test always passes without waiting for the statistics to
be updated since we fetch the subscription worker statistics from the
stats collector based on the entries of pg_subscription catalog. So
this test checks that statistics of an already-dropped subscription
don't show up in the view after DROP SUBSCRIPTION, but does not check
whether the subscription worker statistics entry in the stats collector
gets removed. The primary reason is that, as I mentioned above, the patch
relies on pgstat_vacuum_stat() for cleaning up the dead subscriptions.
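
A simplified standalone model of that behavior, assuming the view only
emits rows for stats entries whose subscription still exists in the
catalog. StatEntry, sub_exists() and view_row_count() are hypothetical
names for this sketch, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

/* Hypothetical stats entry kept by the collector. */
typedef struct StatEntry
{
    Oid         subid;          /* owning subscription */
    Oid         subrelid;       /* table being synced, or 0 for apply */
} StatEntry;

/* True if subid is still listed in the (modeled) pg_subscription. */
static bool
sub_exists(const Oid *catalog, size_t ncat, Oid subid)
{
    for (size_t i = 0; i < ncat; i++)
        if (catalog[i] == subid)
            return true;
    return false;
}

/* Rows the view would show: only entries with a live subscription. */
static size_t
view_row_count(const StatEntry *stats, size_t nstats,
               const Oid *catalog, size_t ncat)
{
    size_t      rows = 0;

    for (size_t i = 0; i < nstats; i++)
        if (sub_exists(catalog, ncat, stats[i].subid))
            rows++;
    return rows;
}
```

In this model the row count drops to zero as soon as the catalog
entries are gone, even though the stats array itself is unchanged,
which is why the test cannot tell whether the collector has actually
discarded the entries.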

That makes sense.

7.
+# Create subscriptions. The table sync for test_tab2 on tap_sub will enter to
+# infinite error due to violating the unique constraint.
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
application_name=$appname' PUBLICATION tap_pub WITH (streaming = off,
two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION
'$publisher_connstr application_name=$appname_streaming' PUBLICATION
tap_pub_streaming WITH (streaming = on, two_phase = on);");
+
+$node_publisher->wait_for_catchup($appname);
+$node_publisher->wait_for_catchup($appname_streaming);

How can we ensure that subscriber would have caught up when one of the
tablesync workers is constantly in the error loop? Isn't it possible
that the subscriber didn't send the latest lsn feedback till the table
sync worker is finished?

I thought that even if tablesync for a table is still ongoing, the
apply worker can apply commit records, update write LSN and flush LSN,
and send the feedback to the wal sender. No?

You are right, this case will work.

--
With Regards,
Amit Kapila.

#247vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#224)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 21, 2021 at 10:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 20, 2021 at 12:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Oct 18, 2021 at 12:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Minor comment on patch 17-0003

Thank you for the comment!

src/backend/replication/logical/worker.c

(1) Typo in apply_handle_stream_abort() comment:

/* Stop skipping transaction transaction, if enabled */
should be:
/* Stop skipping transaction changes, if enabled */

Fixed.

I've attached updated patches.

I have started to have a look at the feature and review the patch, my
initial comments:
1) I could pass an invalid subscription id to
pg_stat_reset_subscription_worker, which causes an assertion failure:

+static void
+pgstat_recv_resetsubworkercounter(PgStat_MsgResetsubworkercounter
*msg, int len)
+{
+       PgStat_StatSubWorkerEntry *wentry;
+
+       Assert(OidIsValid(msg->m_subid));
+
+       /* Get subscription worker stats */
+       wentry = pgstat_get_subworker_entry(msg->m_subid,
msg->m_subrelid, false);

postgres=# select pg_stat_reset_subscription_worker(NULL, NULL);
pg_stat_reset_subscription_worker
-----------------------------------

(1 row)

TRAP: FailedAssertion("OidIsValid(msg->m_subid)", File: "pgstat.c",
Line: 5742, PID: 789588)
postgres: stats collector (ExceptionalCondition+0xd0)[0x55d33bba4778]
postgres: stats collector (+0x545a43)[0x55d33b90aa43]
postgres: stats collector (+0x541fad)[0x55d33b906fad]
postgres: stats collector (pgstat_start+0xdd)[0x55d33b9020e1]
postgres: stats collector (+0x54ae0c)[0x55d33b90fe0c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7f8509ccc1f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7f8509a78ac7]
postgres: stats collector (+0x548cab)[0x55d33b90dcab]
postgres: stats collector (PostmasterMain+0x134c)[0x55d33b90d5c6]
postgres: stats collector (+0x43b8be)[0x55d33b8008be]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7f8509992565]
postgres: stats collector (_start+0x2e)[0x55d33b48e4fe]

2) I was able to provide invalid relation id for
pg_stat_reset_subscription_worker? Should we add any validation for
this?
select pg_stat_reset_subscription_worker(16389, -1);

+pg_stat_reset_subscription_worker(PG_FUNCTION_ARGS)
+{
+       Oid                     subid = PG_GETARG_OID(0);
+       Oid                     relid;
+
+       if (PG_ARGISNULL(1))
+               relid = InvalidOid;             /* reset apply worker
error stats */
+       else
+               relid = PG_GETARG_OID(1);       /* reset table sync
worker error stats */
+
+       pgstat_reset_subworker_stats(subid, relid);
+
+       PG_RETURN_VOID();
+}

3) The 025_error_report test is failing because a recent commit
changed the way nodes are initialized in the TAP tests; corresponding
changes need to be made in 025_error_report:
t/025_error_report.pl .............. Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
t/100_bugs.pl ...................... ok

Regards,
Vignesh

#248Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#244)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 6:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 4:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Oct 27, 2021 at 8:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Oct 26, 2021 at 7:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

You have a point. The other alternatives on this line could be:

Alter Subscription <sub_name> SKIP ( subscription_parameter [=value] [, ... ] );

where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

Looks better.

If we want to follow the above, then how do we allow users to reset
the parameter? One way is to allow the user to set xid as 0 which
would mean that we reset it. The other way is to allow SET/RESET
before SKIP but not sure if that is a good option.

After thinking some more on this, I think it is better to not use
SET/RESET keyword here. I think we can use a model similar to how we
allow setting some of the options in Alter Database:

# Set the connection limit for a database:
Alter Database akapila WITH connection_limit = 1;
# Reset the connection limit
Alter Database akapila WITH connection_limit = -1;

Thoughts?

Agreed.

Another thing I'm concerned about is that the syntax "SKIP (
subscription_parameter [=value] [, ...])" looks like we can specify
multiple options, for example, "SKIP (xid = '100', lsn =
'0/12345678')". Is there a case where we need to specify multiple
options? Perhaps when specifying the target XID and operations, for
example, "SKIP (xid = 100, action = 'insert, update')"?

Yeah, or maybe prepared transaction identifier and actions.

Prepared transactions seem not to need to be skipped since those
changes are already successfully applied, though.

BTW, if we
want to proceed without the SET/RESET keyword then you can prepare the
SKIP xid patch as the second in the series and we can probably work on
the RESET syntax as a completely independent patch.

Right. If we do that, the second patch can be an independent patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#249Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#246)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be the recipe of bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only after the transaction is
committed, since we can drop a subscription that doesn't have a
replication slot and then roll back the transaction. Probably we can
send the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can keep the
apply worker stats until the subscription gets dropped. Since the
error reporting message could get lost, the absence of an entry in the
view doesn't mean the worker hasn't hit an issue.

In other cases, we can remember the subscriptions being dropped and
send the message to drop the statistics of them after committing the
transaction but I’m not sure it’s worth having it.

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

Agreed.

FWIW, we completely rely on pgstat_vacuum_stat() for cleaning up the
dead tables and functions. And we don't expect there are many
subscriptions on the database.

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

Agreed.

I've attached a new version patch. Since the syntax of skipping
transaction id is under the discussion I've attached only the error
reporting patch for now.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v19-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/x-patch)

#250Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#247)
Re: Skipping logical replication transactions on subscriber side

On Thu, Oct 28, 2021 at 7:47 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 20, 2021 at 12:33 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Oct 18, 2021 at 12:34 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches that incorporate all comments I got so far.

Minor comment on patch 17-0003

Thank you for the comment!

src/backend/replication/logical/worker.c

(1) Typo in apply_handle_stream_abort() comment:

/* Stop skipping transaction transaction, if enabled */
should be:
/* Stop skipping transaction changes, if enabled */

Fixed.

I've attached updated patches.

I have started to have a look at the feature and review the patch, my
initial comments:

Thank you for the comments!

1) I could pass an invalid subscription id to
pg_stat_reset_subscription_worker, which causes an assertion failure:

+static void
+pgstat_recv_resetsubworkercounter(PgStat_MsgResetsubworkercounter
*msg, int len)
+{
+       PgStat_StatSubWorkerEntry *wentry;
+
+       Assert(OidIsValid(msg->m_subid));
+
+       /* Get subscription worker stats */
+       wentry = pgstat_get_subworker_entry(msg->m_subid,
msg->m_subrelid, false);

postgres=# select pg_stat_reset_subscription_worker(NULL, NULL);
pg_stat_reset_subscription_worker
-----------------------------------

(1 row)

TRAP: FailedAssertion("OidIsValid(msg->m_subid)", File: "pgstat.c",
Line: 5742, PID: 789588)
postgres: stats collector (ExceptionalCondition+0xd0)[0x55d33bba4778]
postgres: stats collector (+0x545a43)[0x55d33b90aa43]
postgres: stats collector (+0x541fad)[0x55d33b906fad]
postgres: stats collector (pgstat_start+0xdd)[0x55d33b9020e1]
postgres: stats collector (+0x54ae0c)[0x55d33b90fe0c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7f8509ccc1f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7f8509a78ac7]
postgres: stats collector (+0x548cab)[0x55d33b90dcab]
postgres: stats collector (PostmasterMain+0x134c)[0x55d33b90d5c6]
postgres: stats collector (+0x43b8be)[0x55d33b8008be]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7f8509992565]
postgres: stats collector (_start+0x2e)[0x55d33b48e4fe]

Good catch. Fixed.
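
The actual fix is in the attached patch and is not reproduced in this
message. As a hedged illustration only, the kind of guard that avoids
this assertion could look like the standalone model below, which
rejects a NULL or invalid subscription OID before anything is sent to
the collector. NullableOid and reset_request_is_valid() are
hypothetical names; InvalidOid and OidIsValid mirror PostgreSQL's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */
#define InvalidOid      ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/* Hypothetical model of a nullable SQL argument of type oid. */
typedef struct NullableOid
{
    bool        isnull;
    Oid         value;
} NullableOid;

/*
 * Decide whether a reset request is safe to forward to the collector.
 * Rejecting NULL and InvalidOid on the sending side means the
 * receiver's Assert(OidIsValid(subid)) can never fire for this input.
 */
static bool
reset_request_is_valid(NullableOid subid)
{
    return !subid.isnull && OidIsValid(subid.value);
}
```

In a real SQL-callable function the NULL check would typically be done
with PG_ARGISNULL(0) before reading the argument, as the quoted code
already does for the second argument.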

2) I was able to provide invalid relation id for
pg_stat_reset_subscription_worker? Should we add any validation for
this?
select pg_stat_reset_subscription_worker(16389, -1);

+pg_stat_reset_subscription_worker(PG_FUNCTION_ARGS)
+{
+       Oid                     subid = PG_GETARG_OID(0);
+       Oid                     relid;
+
+       if (PG_ARGISNULL(1))
+               relid = InvalidOid;             /* reset apply worker
error stats */
+       else
+               relid = PG_GETARG_OID(1);       /* reset table sync
worker error stats */
+
+       pgstat_reset_subworker_stats(subid, relid);
+
+       PG_RETURN_VOID();
+}

I don't think that validation is strictly necessary. OID '-1' is
interpreted as 4294967295 and we don't reject it.
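
For reference, the arithmetic behind that value can be checked with a
few lines of standalone C, assuming Oid is an unsigned 32-bit integer
as it is in PostgreSQL. oid_from_signed() is a hypothetical helper for
this example only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */
#define InvalidOid      ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/*
 * Convert a signed 32-bit value to an Oid.  Conversion to an unsigned
 * type wraps modulo 2^32, so -1 becomes 4294967295.
 */
static Oid
oid_from_signed(int32_t value)
{
    return (Oid) value;
}
```

Since the result is nonzero, OidIsValid() accepts it; the value simply
does not match any existing relation.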

3) The 025_error_report test is failing because a recent commit
changed the way nodes are initialized in the TAP tests; corresponding
changes need to be made in 025_error_report:
t/025_error_report.pl .............. Dubious, test returned 2 (wstat 512, 0x200)
No subtests run
t/100_bugs.pl ...................... ok

Fixed.

These comments are incorporated into the latest version patch I just
submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoDY-9_x819F_m1_wfCVXXFJrGiSmR2MfC9Nw4nW8Om0qA@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#251Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#248)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 6:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 6:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Another thing I'm concerned about is that the syntax "SKIP (
subscription_parameter [=value] [, ...])" looks like we can specify
multiple options, for example, "SKIP (xid = '100', lsn =
'0/12345678')". Is there a case where we need to specify multiple
options? Perhaps when specifying the target XID and operations, for
example, "SKIP (xid = 100, action = 'insert, update')"?

Yeah, or maybe prepared transaction identifier and actions.

Prepared transactions seem not to need to be skipped since those
changes are already successfully applied, though.

I think it can also fail before apply of prepare is successful. Right
now, we are just logging xid in error cases but gid could also be
logged as we receive that in begin_prepare. I think currently xid is
sufficient but I have given this as an example for future
consideration.

--
With Regards,
Amit Kapila.

#252Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#249)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be the recipe of bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only after the transaction is
committed, since we can drop a subscription that doesn't have a
replication slot and then roll back the transaction. Probably we can
send the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped.

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

--
With Regards,
Amit Kapila.

#253Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#252)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 29, 2021 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped.

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

Don't we want these stats to be dealt with in the same way as tables and
functions as all the stats entries (subscription entries) are specific
to a particular database? If so, I think we should write/read these
to/from db specific stats file in the same way as we do for tables or
functions. I think in the current patch, it will unnecessarily read
and probably write subscription stats even when those are not
required.

--
With Regards,
Amit Kapila.

#254Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#252)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 29, 2021 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be the recipe of bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only after the transaction is
committed, since we can drop a subscription that doesn't have a
replication slot and then roll back the transaction. Probably we can
send the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped.

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped. Also, I’m concerned about a situation where a lot of
table syncs failed. In that case, if we don’t drop table sync worker
statistics after completing its job, we end up having a lot of entries
in the view unless the subscription is dropped.

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

I might be missing your points but I think that with the current
patch, the view has multiple entries for a subscription. That is,
there is one apply worker stats and multiple table sync worker stats
per subscription. And pg_stat_reset_subscription() function can reset
any stats by specifying subscription OID and relation OID.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#255Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#253)
Re: Skipping logical replication transactions on subscriber side

On Sat, Oct 30, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 29, 2021 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Oct 29, 2021 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped.

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

Don't we want these stats to be dealt with in the same way as tables and
functions as all the stats entries (subscription entries) are specific
to a particular database? If so, I think we should write/read these
to/from db specific stats file in the same way as we do for tables or
functions. I think in the current patch, it will unnecessarily read
and probably write subscription stats even when those are not
required.

Good point! So probably we should have PgStat_StatDBEntry have the
hash table for subscription worker statistics, right?
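
To make the idea concrete, here is a minimal standalone sketch of a per-database entry that owns the subscription worker statistics, keyed by (subid, subrelid). It is only a toy model, not the patch: DbStats, SubWorkerStats, db_get_subworker() and db_report_subworker_error() are hypothetical names, and a small fixed array stands in for the hash table that PgStat_StatDBEntry would hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)
#define MAX_SUBWORKERS 8        /* toy capacity, not a PostgreSQL limit */

/* Hypothetical stand-in for one subscription worker's error statistics. */
typedef struct SubWorkerStats
{
    Oid         subid;          /* subscription OID */
    Oid         subrelid;       /* InvalidOid for the apply worker */
    int         error_count;
    int         in_use;
} SubWorkerStats;

/* Hypothetical stand-in for a per-database entry owning the worker stats. */
typedef struct DbStats
{
    Oid            databaseid;
    SubWorkerStats subworkers[MAX_SUBWORKERS];
} DbStats;

/* Find the entry for (subid, subrelid); create it when "create" is set. */
SubWorkerStats *
db_get_subworker(DbStats *db, Oid subid, Oid subrelid, int create)
{
    SubWorkerStats *free_slot = NULL;

    assert(db != NULL);

    for (int i = 0; i < MAX_SUBWORKERS; i++)
    {
        SubWorkerStats *w = &db->subworkers[i];

        if (w->in_use && w->subid == subid && w->subrelid == subrelid)
            return w;
        if (!w->in_use && free_slot == NULL)
            free_slot = w;
    }

    if (!create || free_slot == NULL)
        return NULL;

    memset(free_slot, 0, sizeof(*free_slot));
    free_slot->subid = subid;
    free_slot->subrelid = subrelid;
    free_slot->in_use = 1;
    return free_slot;
}

/* Record one error for a worker; returns the new count, or -1 if full. */
int
db_report_subworker_error(DbStats *db, Oid subid, Oid subrelid)
{
    SubWorkerStats *w = db_get_subworker(db, subid, subrelid, 1);

    return w ? ++w->error_count : -1;
}
```

With this layout, only the database that owns the subscription carries its worker entries, so reading or writing another database's stats never touches them.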

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#256Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#249)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 4:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a new version patch. Since the syntax of skipping
transaction id is under the discussion I've attached only the error
reporting patch for now.

I have some comments on the v19-0001 patch:

v19-0001

(1) doc/src/sgml/monitoring.sgml
Seems to be missing the word "information":

BEFORE:
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
AFTER:
+      <entry>At least one row per subscription, showing information about
+      errors that occurred on subscription.

(2) pg_stat_reset_subscription_worker(subid Oid, relid Oid)
First of all, I think that the documentation for this function should
make it clear that a non-NULL "subid" parameter is required for both
reset cases (tablesync and apply).
Perhaps this could be done by simply changing the first sentence to say:
"Resets statistics of a single subscription worker error, for a worker
running on subscription with <parameter>subid</parameter>."
(and then can remove " running on the subscription with
<parameter>subid</parameter>" from the last sentence)

I think that the documentation for this function should say that it
should be used in conjunction with the "pg_stat_subscription_workers"
view in order to obtain the required subid/relid values for resetting.
(and should provide a link to the documentation for that view)
Also, I think that the function documentation should make it clear
that the tablesync error case is indicated by a NULL "command" in the
information returned from the "pg_stat_subscription_workers" view
(otherwise the user needs to look at the server log in order to
determine whether the error is for the apply/tablesync worker).

Finally, there are currently no tests for this new function.

(3) pg_stat_subscription_workers
In the documentation for this, the description for the "command"
column says: "This field is always NULL if the error was reported
during the initial data copy."
Some users may not realise that this refers to "tablesync", so perhaps
add " (tablesync)" to the end of this sentence, or similar.

Regards,
Greg Nancarrow
Fujitsu Australia

#257Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#254)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped.

Actually, I am not very sure how users can use the old error
information after we allowed skipping the conflicting xid. Say, if
they want to add/remove some constraints on the table based on
previous errors then they might want to refer to errors of both the
apply worker and table sync worker.

Also, I’m concerned about a situation where a lot of
table syncs failed. In that case, if we don’t drop table sync worker
statistics after completing its job, we end up having a lot of entries
in the view unless the subscription is dropped.

True, but the same could be said for apply workers where errors can be
accumulated over a period of time.

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

I might be missing your points but I think that with the current
patch, the view has multiple entries for a subscription. That is,
there is one apply worker stats and multiple table sync worker stats
per subscription.

Can't we have multiple entries for one apply worker?

And pg_stat_reset_subscription() function can reset
any stats by specifying subscription OID and relation OID.

Say, if the user has supplied just subscription OID then isn't it
better to reset all the error entries for that subscription?

--
With Regards,
Amit Kapila.

#258Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#255)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 1, 2021 at 7:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Oct 30, 2021 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Don't we want these stats to be dealt with in the same way as tables and
functions as all the stats entries (subscription entries) are specific
to a particular database? If so, I think we should write/read these
to/from db specific stats file in the same way as we do for tables or
functions. I think in the current patch, it will unnecessarily read
and probably write subscription stats even when those are not
required.

Good point! So probably we should have PgStat_StatDBEntry have the
hash table for subscription worker statistics, right?

Yes.

--
With Regards,
Amit Kapila.

#259Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#257)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped.

Actually, I am not very sure how users can use the old error
information after we allowed skipping the conflicting xid. Say, if
they want to add/remove some constraints on the table based on
previous errors then they might want to refer to errors of both the
apply worker and table sync worker.

I think that in general, statistics should be retained as long as a
corresponding object exists on the database, like other cumulative
statistic views. So I’m concerned that an entry of a cumulative stats
view is automatically removed by a non-stats-related function (i.e.,
ALTER SUBSCRIPTION SKIP), which seems to be a new behavior for cumulative
stats views.

We can retain the stats entries for table sync worker but what I want
to avoid is that the view shows many old entries that will never be
updated. I've sometimes seen cases where the user mistakenly restored
table data on the subscriber before creating a subscription, failed
table sync on many tables due to unique violation, and truncated
tables on the subscriber. I think that unlike the stats entries for
apply worker, retaining the stats entries for table sync could be
harmful since it’s likely to be a large amount (even hundreds of
entries). In particular, it could bloat the stats file since each entry
has an error message. So if we do that, I'd like to provide a function
for users to remove (not reset) stats entries manually. Even if we
removed stats entries after skipping the transaction in question, the
stats entries would be left if we resolve the conflict in another way.

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

I might be missing your points but I think that with the current
patch, the view has multiple entries for a subscription. That is,
there is one apply worker stats and multiple table sync worker stats
per subscription.

Can't we have multiple entries for one apply worker?

Umm, I think we have one stats entry per one logical replication
worker (apply worker or table sync worker). Am I missing something?

And pg_stat_reset_subscription() function can reset
any stats by specifying subscription OID and relation OID.

Say, if the user has supplied just subscription OID then isn't it
better to reset all the error entries for that subscription?

Agreed. So pg_stat_reset_subscription_worker(oid) removes all errors
for the subscription whereas pg_stat_reset_subscription_worker(oid,
null) resets only the apply worker error for the subscription?
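
As a rough illustration of the two forms, here is a standalone C sketch. It is not the patch code: WorkerError, reset_all_workers() and reset_one_worker() are hypothetical names, a NULL pointer stands in for the SQL NULL relid, and "reset" is modelled simply as zeroing the error count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for one row of the worker error statistics. */
typedef struct WorkerError
{
    Oid         subid;          /* subscription OID */
    Oid         subrelid;       /* InvalidOid marks the apply worker */
    int         error_count;
} WorkerError;

/* One-argument form: reset every worker entry of the subscription. */
int
reset_all_workers(WorkerError *entries, size_t n, Oid subid)
{
    int         nreset = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].subid == subid)
        {
            entries[i].error_count = 0;
            nreset++;
        }
    }
    return nreset;
}

/*
 * Two-argument form: a NULL relid selects the apply worker, otherwise the
 * tablesync worker for *relid.  Returns the number of entries reset.
 */
int
reset_one_worker(WorkerError *entries, size_t n, Oid subid, const Oid *relid)
{
    Oid         target = relid ? *relid : InvalidOid;
    int         nreset = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].subid == subid && entries[i].subrelid == target)
        {
            entries[i].error_count = 0;
            nreset++;
        }
    }
    return nreset;
}
```

In this model, reset_one_worker(entries, n, subid, NULL) touches only the apply worker entry, while reset_all_workers(entries, n, subid) clears the apply worker and every tablesync worker of that subscription.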

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#260tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#249)
RE: Skipping logical replication transactions on subscriber side

On Friday, October 29, 2021 1:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a new version patch. Since the syntax of skipping
transaction id is under the discussion I've attached only the error
reporting patch for now.

Thanks for your patch. Some comments on 026_error_report.pl file.

1. For test_tab_streaming table, the test only checks initial table sync and
doesn't check anything related to the new view pg_stat_subscription_workers. Do
you want to add more test cases for it?

2. The subscriptions are created with two_phase option on, but I didn't see two
phase transactions. Should we add some test cases for two phase transactions?

3. Errors reported by table sync worker will be cleaned up if the table sync
worker finishes; should we add this case to the test? (After checking the table
sync worker's error in the view, delete data which caused the error, then check
the view again after table sync worker finished.)

Regards
Tang

#261Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#259)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have another question in this regard. Currently, the reset function
seems to be resetting only the first stat entry for a subscription.
But can't we have multiple stat entries for a subscription considering
the view's cumulative nature?

I might be missing your points but I think that with the current
patch, the view has multiple entries for a subscription. That is,
there is one apply worker stats and multiple table sync worker stats
per subscription.

Can't we have multiple entries for one apply worker?

Umm, I think we have one stats entry per one logical replication
worker (apply worker or table sync worker). Am I missing something?

No, you are right. I got confused.

And pg_stat_reset_subscription() function can reset
any stats by specifying subscription OID and relation OID.

Say, if the user has supplied just subscription OID then isn't it
better to reset all the error entries for that subscription?

Agreed. So pg_stat_reset_subscription_worker(oid) removes all errors
for the subscription whereas pg_stat_reset_subscription_worker(oid,
null) resets only the apply worker error for the subscription?

Yes.

--
With Regards,
Amit Kapila.

#262Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#259)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped.

Actually, I am not very sure how users can use the old error
information after we allowed skipping the conflicting xid. Say, if
they want to add/remove some constraints on the table based on
previous errors then they might want to refer to errors of both the
apply worker and table sync worker.

I think that in general, statistics should be retained as long as a
corresponding object exists on the database, like other cumulative
statistic views. So I’m concerned that an entry of a cumulative stats
view is automatically removed by a non-stats-related function (i.e.,
ALTER SUBSCRIPTION SKIP), which seems to be a new behavior for cumulative
stats views.

We can retain the stats entries for table sync worker but what I want
to avoid is that the view shows many old entries that will never be
updated. I've sometimes seen cases where the user mistakenly restored
table data on the subscriber before creating a subscription, failed
table sync on many tables due to unique violation, and truncated
tables on the subscriber. I think that unlike the stats entries for
apply worker, retaining the stats entries for table sync could be
harmful since it’s likely to be a large amount (even hundreds of
entries). In particular, it could bloat the stats file since each entry
has an error message. So if we do that, I'd like to provide a function
for users to remove (not reset) stats entries manually.

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

--
With Regards,
Amit Kapila.

#263vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#249)
Re: Skipping logical replication transactions on subscriber side

On Fri, Oct 29, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be a recipe for bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only when the transaction is
committed since we can drop a subscription that doesn't have a
replication slot and roll back the transaction. Probably we can send
the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped. Since the
error reporting message could get lost, no entry in the view doesn’t
mean the worker doesn’t face an issue.

In other cases, we can remember the subscriptions being dropped and
send the message to drop the statistics of them after committing the
transaction but I’m not sure it’s worth having it.

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

Agreed.

FWIW, we completely rely on pg_stat_vacuum_stats() for cleaning up the
dead tables and functions. And we don't expect there are many
subscriptions on the database.

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

Agreed.

I've attached a new version patch. Since the syntax of skipping
transaction id is under the discussion I've attached only the error
reporting patch for now.

Thanks for the updated patch, few comments:
1) This check and return can be moved above CreateTemplateTupleDesc so
that the tuple descriptor need not be created if there is no worker
statistics
+       BlessTupleDesc(tupdesc);
+
+       /* Get subscription worker stats */
+       wentry = pgstat_fetch_subworker(subid, subrelid);
+
+       /* Return NULL if there is no worker statistics */
+       if (wentry == NULL)
+               PG_RETURN_NULL();
+
+       /* Initialise values and NULL flags arrays */
+       MemSet(values, 0, sizeof(values));
+       MemSet(nulls, 0, sizeof(nulls));
2) "NULL for the main apply worker" is mentioned as "null for the main
apply worker" in case of pg_stat_subscription view, we can mention it
similarly.
+      <para>
+       OID of the relation that the worker is synchronizing; NULL for the
+       main apply worker
+      </para></entry>
3) Variable assignment can be done during declaration and this
assignment can be removed
+       i = 0;
+       /* subid */
+       values[i++] = ObjectIdGetDatum(subid);

4) I noticed that the worker error is still present when queried from
pg_stat_subscription_workers even after conflict is resolved in the
subscriber and the worker proceeds with applying the other
transactions, should this be documented somewhere?

5) This needs to be aligned, the columns in select have used TAB, we
should align it using spaces.
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+       w.subid,
+       s.subname,
+       w.subrelid,
+       w.relid,
+       w.command,
+       w.xid,
+       w.error_count,
+       w.error_message,
+       w.last_error_time,
+       w.stats_reset
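
Items 1) and 3) above can be sketched in standalone C roughly as follows. This is a toy model rather than the patch code: fetch_worker() stands in for pgstat_fetch_subworker(), ResultRow stands in for the tuple descriptor plus values array, and rows_built is a hypothetical counter that only exists to show when the costly setup runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;

/* Hypothetical stand-in for a subscription worker stats entry. */
typedef struct WorkerEntry
{
    Oid         subid;
    Oid         subrelid;
    int         error_count;
} WorkerEntry;

/* Hypothetical stand-in for the tuple descriptor and values array. */
typedef struct ResultRow
{
    int         natts;
    long        values[3];
} ResultRow;

int         rows_built = 0;     /* counts how often the costly setup ran */

/* Return the matching entry, or NULL if there are no stats for it. */
const WorkerEntry *
fetch_worker(const WorkerEntry *entries, size_t n, Oid subid, Oid subrelid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].subid == subid && entries[i].subrelid == subrelid)
            return &entries[i];
    }
    return NULL;
}

ResultRow *
get_worker_row(const WorkerEntry *entries, size_t n, Oid subid, Oid subrelid)
{
    const WorkerEntry *wentry;
    ResultRow  *row;
    int         i = 0;          /* initialised at declaration, as in item 3 */

    /* Look up first: nothing else is needed when there are no stats. */
    wentry = fetch_worker(entries, n, subid, subrelid);
    if (wentry == NULL)
        return NULL;

    /* Only now pay for building the result. */
    row = calloc(1, sizeof(*row));
    if (row == NULL)
        return NULL;
    rows_built++;

    row->values[i++] = (long) wentry->subid;
    row->values[i++] = (long) wentry->subrelid;
    row->values[i++] = (long) wentry->error_count;
    row->natts = i;
    return row;
}
```

Calling get_worker_row() for a worker that has no entry returns NULL without allocating anything, which is the saving that moving the check above CreateTemplateTupleDesc is meant to achieve.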

Regards,
Vignesh

#264Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#262)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 3, 2021 at 12:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped.

Actually, I am not very sure how users can use the old error
information after we allowed skipping the conflicting xid. Say, if
they want to add/remove some constraints on the table based on
previous errors then they might want to refer to errors of both the
apply worker and table sync worker.

I think that in general, statistics should be retained as long as a
corresponding object exists on the database, like other cumulative
statistic views. So I’m concerned that an entry of a cumulative stats
view is automatically removed by a non-stats-related function (i.e.,
ALTER SUBSCRIPTION SKIP), which seems to be a new behavior for cumulative
stats views.

We can retain the stats entries for table sync worker but what I want
to avoid is that the view shows many old entries that will never be
updated. I've sometimes seen cases where the user mistakenly restored
table data on the subscriber before creating a subscription, failed
table sync on many tables due to unique violation, and truncated
tables on the subscriber. I think that unlike the stats entries for
apply worker, retaining the stats entries for table sync could be
harmful since it’s likely to be a large amount (even hundreds of
entries). In particular, it could bloat the stats file since each entry
has an error message. So if we do that, I'd like to provide a function
for users to remove (not reset) stats entries manually.

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Makes sense.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

Yeah, I also tend to prefer pg_stat_reset_subscription_worker name
since "single" isn't clear in the context of subscription worker. And
the behavior of the reset function for subscription workers is also
different from pg_stat_reset_single_xxx_counters.

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v20-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)

#265Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#263)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 5, 2021 at 12:57 AM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Oct 29, 2021 at 10:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 28, 2021 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 28, 2021 at 10:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Oct 27, 2021 at 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Oct 21, 2021 at 10:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached updated patches.

Thank you for the comments!

Few comments:
==============
1. Is the patch cleaning tablesync error entries except via vacuum? If
not, can't we send a message to remove tablesync errors once tablesync
is successful (say when we reset skip_xid or when tablesync is
finished) or when we drop subscription? I think the same applies to
apply worker. I think we may want to track it in some way whether an
error has occurred before sending the message but relying completely
on a vacuum might be a recipe for bloat. I think in the case of a
drop subscription we can simply send the message as that is not a
frequent operation. I might be missing something here because in the
tests after drop subscription you are expecting the entries from the
view to get cleared

Yes, I think we can have tablesync worker send a message to drop stats
once tablesync is successful. But if we do that also when dropping a
subscription, I think we need to do that only when the transaction is
committed since we can drop a subscription that doesn't have a
replication slot and roll back the transaction. Probably we can send
the message only when the subscription does have a replication slot.

Right. And probably for apply worker after updating skip xid.

I'm not sure it's better to drop apply worker stats after resetting
skip xid (i.e., after skipping the transaction). Since the view is a
cumulative view and has last_error_time, I thought we can have the
apply worker stats until the subscription gets dropped. Since the
error reporting message could get lost, no entry in the view doesn’t
mean the worker doesn’t face an issue.

In other cases, we can remember the subscriptions being dropped and
send the message to drop the statistics of them after committing the
transaction but I’m not sure it’s worth having it.

Yeah, let's not go to that extent. I think in most cases subscriptions
will have corresponding slots.

Agreed.

FWIW, we completely rely on pg_stat_vacuum_stats() for cleaning up the
dead tables and functions. And we don't expect there are many
subscriptions on the database.

True, but we do send it for the database, so let's do it for the cases
you explained in the first paragraph.

Agreed.

I've attached a new version patch. Since the syntax of skipping
transaction id is under the discussion I've attached only the error
reporting patch for now.

Thanks for the updated patch, few comments:
1) This check and return can be moved above CreateTemplateTupleDesc so
that the tuple descriptor need not be created if there is no worker
statistics
+       BlessTupleDesc(tupdesc);
+
+       /* Get subscription worker stats */
+       wentry = pgstat_fetch_subworker(subid, subrelid);
+
+       /* Return NULL if there is no worker statistics */
+       if (wentry == NULL)
+               PG_RETURN_NULL();
+
+       /* Initialise values and NULL flags arrays */
+       MemSet(values, 0, sizeof(values));
+       MemSet(nulls, 0, sizeof(nulls));
2) "NULL for the main apply worker" is mentioned as "null for the main
apply worker" in case of pg_stat_subscription view, we can mention it
similarly.
+      <para>
+       OID of the relation that the worker is synchronizing; NULL for the
+       main apply worker
+      </para></entry>
3) Variable assignment can be done during declaration and this
assignment can be removed
+       i = 0;
+       /* subid */
+       values[i++] = ObjectIdGetDatum(subid);

4) I noticed that the worker error is still present when queried from
pg_stat_subscription_workers even after conflict is resolved in the
subscriber and the worker proceeds with applying the other
transactions, should this be documented somewhere?

5) This needs to be aligned, the columns in select have used TAB, we
should align it using spaces.
+CREATE VIEW pg_stat_subscription_workers AS
+    SELECT
+       w.subid,
+       s.subname,
+       w.subrelid,
+       w.relid,
+       w.command,
+       w.xid,
+       w.error_count,
+       w.error_message,
+       w.last_error_time,
+       w.stats_reset
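
Comments 1 and 3 above can be illustrated with a small standalone sketch. It is not code from the patch: the demo_ names, the local Oid stand-in, and the counter standing in for CreateTemplateTupleDesc/BlessTupleDesc are all hypothetical, chosen only to show the lookup happening before any descriptor work and the index being initialized at its declaration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;            /* local stand-in, not the PostgreSQL header */
#define InvalidOid ((Oid) 0)

/* Hypothetical, simplified worker statistics entry. */
typedef struct DemoWorkerStats
{
    Oid         subid;
    Oid         subrelid;
    long        error_count;
} DemoWorkerStats;

/* Counts how often the (pretend) tuple descriptor was built. */
static int  demo_descriptors_built = 0;

static const DemoWorkerStats *
demo_fetch_subworker(const DemoWorkerStats *all, size_t n,
                     Oid subid, Oid subrelid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (all[i].subid == subid && all[i].subrelid == subrelid)
            return &all[i];
    }
    return NULL;
}

/* Stand-in for the cost of creating and blessing a tuple descriptor. */
static void
demo_build_descriptor(void)
{
    demo_descriptors_built++;
}

/* Fills values[0..2] and returns 3, or returns -1 if there is no entry. */
static int
demo_get_worker_row(const DemoWorkerStats *all, size_t n,
                    Oid subid, Oid subrelid, long values[3])
{
    const DemoWorkerStats *wentry;
    int         i = 0;               /* initialized at declaration (comment 3) */

    assert(values != NULL);

    /* Look up first: on a miss, no descriptor has been built (comment 1). */
    wentry = demo_fetch_subworker(all, n, subid, subrelid);
    if (wentry == NULL)
        return -1;

    demo_build_descriptor();

    values[i++] = (long) wentry->subid;
    values[i++] = (long) wentry->subrelid;
    values[i++] = wentry->error_count;
    return i;
}
```

With this ordering a lookup miss costs only the search itself.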

Thank you for the comments! These comments are incorporated into the
latest (v20) patch I just submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoAT42mhcqeB1jPfRL1+EUHbZk8MMY_fBgsyZvJeKNpG+w@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#266Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#264)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 8, 2021 at 1:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Thanks for the updated patch.
Some initial comments on the v20 patch:

doc/src/sgml/monitoring.sgml

(1) wording
The word "information" seems to be missing after "showing" (otherwise
it reads "showing about errors", which isn't correct grammar).
I suggest the following change:

BEFORE:
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
AFTER:
+      <entry>At least one row per subscription, showing information about
+      errors that occurred on subscription.

(2) pg_stat_reset_subscription_worker(subid Oid, relid Oid) function
documentation
The description doesn't read well. I'd suggest the following change:

BEFORE:
* Resets statistics of a single subscription worker statistics.
AFTER:
* Resets the statistics of a single subscription worker.

I think that the documentation for this function should make it clear
that a non-NULL "subid" parameter is required for both reset cases
(tablesync and apply).
Perhaps this could be done by simply changing the first sentence to say:
"Resets the statistics of a single subscription worker, for a worker
running on the subscription with <parameter>subid</parameter>."
(and then can remove " running on the subscription with
<parameter>subid</parameter>" from the last sentence)

I think that the documentation for this function should say that it
should be used in conjunction with the "pg_stat_subscription_workers"
view in order to obtain the required subid/relid values for resetting.
(and should provide a link to the documentation for that view)

Also, I think that the function documentation should make it clear how
to distinguish the tablesync vs apply worker statistics case.
e.g. the tablesync error case is indicated by a null "command" in the
information returned from the "pg_stat_subscription_workers" view
(otherwise it seems a user could only know this by looking at the server log).

Finally, there are currently no tests for this new function.

(3) pg_stat_subscription_workers
In the documentation for this, some users may not realise that "the
initial data copy" refers to "tablesync", so maybe say "the initial
data copy (tablesync)", or similar.

(4) stats_reset
"stats_reset" is currently documented as the last column of the
"pg_stat_subscription_workers" view - but it's actually no longer
included in the view.

(5) src/tools/pgindent/typedefs.list
The following current entries are bogus:
PgStat_MsgSubWorkerErrorPurge
PgStat_MsgSubWorkerPurge

The following entry is missing:
PgStat_MsgSubscriptionPurge

Regards,
Greg Nancarrow
Fujitsu Australia

#267Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Masahiko Sawada (#264)
Re: Skipping logical replication transactions on subscriber side

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

While reviewing the v20, I have some initial comments,

+     <row>
+      <entry><structname>pg_stat_subscription_workers</structname><indexterm><primary>pg_stat_subscription_workers</primary></indexterm></entry>
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
+      See <link linkend="monitoring-pg-stat-subscription-workers">
+      <structname>pg_stat_subscription_workers</structname></link> for details.
+      </entry>

1.
I don't like the fact that this view is very specific for showing the
errors but the name of the view is very generic. So are we keeping
this name to expand the scope of the view in the future? If this is
meant only for showing the errors then the name should be more
specific.

2.
Why comment says "At least one row per subscription"? this looks
confusing, I mean if there is no error then there will not be even one
row right?

+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription error reported by workers applying logical
+   replication changes and workers handling the initial data copy of the
+   subscribed tables.
+  </para>

3.
So there will only be one row per subscription? I did not read the
code, but suppose there was an error due to some constraint now if
that constraint is removed and there is a new error then the old error
will be removed immediately or it will be removed by auto vacuum? If
it is not removed immediately then there could be multiple errors per
subscription in the view so the comment is not correct.

4.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_time</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the last error occurred
+      </para></entry>
+     </row>

Will it be useful to know when the first time error occurred?

5.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>

The actual view does not contain this column.

6.
+       <para>
+        Resets statistics of a single subscription worker statistics.

/Resets statistics of a single subscription worker statistics/Resets
statistics of a single subscription worker

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#268Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#264)
Re: Skipping logical replication transactions on subscriber side

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 3, 2021 at 12:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Make sense.

We can document this point.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

Yeah, I also tend to prefer pg_stat_reset_subscription_worker name
since "single" isn't clear in the context of subscription worker. And
the behavior of the reset function for subscription workers is also
different from pg_stat_reset_single_xxx_counters.

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Do you have something specific in mind to discuss the details of how
stats should be handled?

Few comments/questions:
====================
1.
static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);

+
static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz now);

Spurious line addition.

2. Why now there is no code to deal with dead table sync entries as
compared to previous version of patch?

3. Why do we need two different functions
pg_stat_reset_subscription_worker_sub and
pg_stat_reset_subscription_worker_subrel to handle reset? Isn't it
sufficient to reset all entries for a subscription if relid is
InvalidOid?

4. It seems now stats_reset entry is not present in
pg_stat_subscription_workers? How will users find that information if
required?

--
With Regards,
Amit Kapila.

#269Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#267)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

While reviewing the v20, I have some initial comments,

+     <row>
+      <entry><structname>pg_stat_subscription_workers</structname><indexterm><primary>pg_stat_subscription_workers</primary></indexterm></entry>
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
+      See <link linkend="monitoring-pg-stat-subscription-workers">
+      <structname>pg_stat_subscription_workers</structname></link> for details.
+      </entry>

1.
I don't like the fact that this view is very specific for showing the
errors but the name of the view is very generic. So are we keeping
this name to expand the scope of the view in the future?

Yes, we are planning to display some other xact specific stats as well
corresponding to subscription workers. See [1][2].

[1]: /messages/by-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199@OSBPR01MB4888.jpnprd01.prod.outlook.com
[2]: /messages/by-id/CAA4eK1+1n3upCMB-Y_k9b1wPNCtNE7MEHan9kA1s6GNsZGB0Og@mail.gmail.com

--
With Regards,
Amit Kapila.

#270Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#268)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 3, 2021 at 12:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Make sense.

We can document this point.

Okay.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

Yeah, I also tend to prefer pg_stat_reset_subscription_worker name
since "single" isn't clear in the context of subscription worker. And
the behavior of the reset function for subscription workers is also
different from pg_stat_reset_single_xxx_counters.

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Do you have something specific in mind to discuss the details of how
stats should be handled?

As you commented, I removed stats_reset column from
pg_stat_subscription_workers view since tables and functions stats
view doesn't have it.

Few comments/questions:
====================
1.
static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);

+
static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz now);

Spurious line addition.

Will fix.

2. Why now there is no code to deal with dead table sync entries as
compared to previous version of patch?

I think we discussed that it's better if we keep the same behavior for
stats of apply and table sync workers. So the table sync entries are
dead after the subscription is dropped, like apply entries. No?

3. Why do we need two different functions
pg_stat_reset_subscription_worker_sub and
pg_stat_reset_subscription_worker_subrel to handle reset? Isn't it
sufficient to reset all entries for a subscription if relid is
InvalidOid?

Since setting InvalidOid to relid means an apply entry we cannot use
it for that purpose.
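
The point about InvalidOid can be sketched in a few lines of standalone C. This is an illustration only, not code from the patch: the demo_ names, the array-based storage, and the local Oid stand-in are hypothetical, and the split into "reset one worker" versus "reset every worker of a subscription" is an assumed reading of what the two functions do. The key (subid, InvalidOid) already names the apply worker, so the same value cannot also act as a wildcard.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;            /* local stand-in, not the PostgreSQL header */
#define InvalidOid ((Oid) 0)

/* Hypothetical key: subrelid == InvalidOid identifies the apply worker. */
typedef struct DemoSubWorkerKey
{
    Oid         subid;
    Oid         subrelid;
} DemoSubWorkerKey;

typedef struct DemoSubWorkerEntry
{
    DemoSubWorkerKey key;
    unsigned long error_count;
} DemoSubWorkerEntry;

/*
 * Reset exactly one worker.  Passing InvalidOid as subrelid selects the
 * apply worker's entry, so it cannot also mean "all entries".
 */
static size_t
demo_reset_one_worker(DemoSubWorkerEntry *entries, size_t n,
                      Oid subid, Oid subrelid)
{
    size_t      nreset = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].key.subid == subid &&
            entries[i].key.subrelid == subrelid)
        {
            entries[i].error_count = 0;
            nreset++;
        }
    }
    return nreset;
}

/* Reset every worker of one subscription; this needs its own entry point. */
static size_t
demo_reset_all_workers(DemoSubWorkerEntry *entries, size_t n, Oid subid)
{
    size_t      nreset = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].key.subid == subid)
        {
            entries[i].error_count = 0;
            nreset++;
        }
    }
    return nreset;
}
```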

4. It seems now stats_reset entry is not present in
pg_stat_subscription_workers? How will users find that information if
required?

Users can find it in pg_stat_database. The same is true for table and
function statistics -- they don't have a stats_reset column; resetting
them updates stats_reset of the database's entry in pg_stat_database.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#271Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#269)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 11:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 9, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

1.
I don't like the fact that this view is very specific for showing the
errors but the name of the view is very generic. So are we keeping
this name to expand the scope of the view in the future?

Yes, we are planning to display some other xact specific stats as well
corresponding to subscription workers. See [1][2].

[1] - /messages/by-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199@OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] - /messages/by-id/CAA4eK1+1n3upCMB-Y_k9b1wPNCtNE7MEHan9kA1s6GNsZGB0Og@mail.gmail.com

Thanks for pointing me to this thread, I will have a look.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#272Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#270)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 12:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 9, 2021 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4. It seems now stats_reset entry is not present in
pg_stat_subscription_workers? How will users find that information if
required?

Users can find it in pg_stat_database. The same is true for table and
function statistics -- they don't have stats_reset column but reset
stats_reset of its entry on pg_stat_database.

Okay, but isn't it better to deal with the reset of subscription
workers via pgstat_recv_resetsinglecounter by introducing subobjectid?
I think that will make code consistent for all database-related stats.

--
With Regards,
Amit Kapila.

#273Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#272)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 9, 2021 at 12:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 9, 2021 at 3:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

4. It seems now stats_reset entry is not present in
pg_stat_subscription_workers? How will users find that information if
required?

Users can find it in pg_stat_database. The same is true for table and
function statistics -- they don't have stats_reset column but reset
stats_reset of its entry on pg_stat_database.

Okay, but isn't it better to deal with the reset of subscription
workers via pgstat_recv_resetsinglecounter by introducing subobjectid?
I think that will make code consistent for all database-related stats.

Agreed. It's better to use the same function internally even if the
SQL-callable interfaces are different.
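
As a rough illustration of the subobjectid idea, the following standalone C sketch routes table, function, and subscription worker resets through one handler. None of it is taken from pgstat.c: the Demo types, the enum values, and the array-backed storage are hypothetical stand-ins, and the real message layout may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;            /* local stand-in, not the PostgreSQL header */
#define InvalidOid ((Oid) 0)

typedef enum DemoResetType
{
    DEMO_RESET_TABLE,
    DEMO_RESET_FUNCTION,
    DEMO_RESET_SUBWORKER
} DemoResetType;

/* One message shape for every kind of single-object reset. */
typedef struct DemoResetMsg
{
    DemoResetType type;
    Oid         objectid;            /* table, function, or subscription OID */
    Oid         subobjectid;         /* relation OID of a table sync worker,
                                      * otherwise InvalidOid */
} DemoResetMsg;

typedef struct DemoCounter
{
    Oid         objectid;
    Oid         subobjectid;
    long        count;
} DemoCounter;

typedef struct DemoDbStats
{
    DemoCounter *tables;
    size_t      ntables;
    DemoCounter *functions;
    size_t      nfunctions;
    DemoCounter *subworkers;
    size_t      nsubworkers;
} DemoDbStats;

/* Returns 1 if a matching counter was reset, 0 otherwise. */
static int
demo_reset_single_counter(DemoDbStats *db, const DemoResetMsg *msg)
{
    DemoCounter *arr;
    size_t      n;

    assert(db != NULL && msg != NULL);

    switch (msg->type)
    {
        case DEMO_RESET_TABLE:
            arr = db->tables;
            n = db->ntables;
            break;
        case DEMO_RESET_FUNCTION:
            arr = db->functions;
            n = db->nfunctions;
            break;
        case DEMO_RESET_SUBWORKER:
            arr = db->subworkers;
            n = db->nsubworkers;
            break;
        default:
            return 0;
    }

    /* Tables and functions always carry InvalidOid as subobjectid. */
    for (size_t i = 0; i < n; i++)
    {
        if (arr[i].objectid == msg->objectid &&
            arr[i].subobjectid == msg->subobjectid)
        {
            arr[i].count = 0;
            return 1;
        }
    }
    return 0;
}
```

The SQL-callable functions could then differ only in how they fill in the message, which is the consistency argued for above.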

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#274Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#271)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 1:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Nov 9, 2021 at 11:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 9, 2021 at 11:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

1.
I don't like the fact that this view is very specific for showing the
errors but the name of the view is very generic. So are we keeping
this name to expand the scope of the view in the future?

Yes, we are planning to display some other xact specific stats as well
corresponding to subscription workers. See [1][2].

[1] - /messages/by-id/OSBPR01MB48887CA8F40C8D984A6DC00CED199@OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] - /messages/by-id/CAA4eK1+1n3upCMB-Y_k9b1wPNCtNE7MEHan9kA1s6GNsZGB0Og@mail.gmail.com

Thanks for pointing me to this thread, I will have a look.

I think we can even add a line in the commit message stating that this
can be extended in the future to track other xact related stats for
subscription workers. I think it will help readers of the patch.

--
With Regards,
Amit Kapila.

#275vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#264)
Re: Skipping logical replication transactions on subscriber side

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 3, 2021 at 12:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 2, 2021 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 1, 2021 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Oct 29, 2021 at 8:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Fair enough. So statistics can be removed either by vacuum or drop
subscription. Also, if we go by this logic then there is no harm in
retaining the stat entries for tablesync errors. Why have different
behavior for apply and tablesync workers?

My understanding is that the subscription worker statistics entry
corresponds to workers (but not physical workers since the physical
process is changed after restarting). So if the worker finishes its
jobs, it is no longer necessary to show errors since further problems
will not occur after that. Table sync worker’s job finishes when
completing table copy (unless table sync is performed again by REFRESH
PUBLICATION) whereas apply worker’s job finishes when the subscription
is dropped.

Actually, I am not very sure how users can use the old error
information after we allowed skipping the conflicting xid. Say, if
they want to add/remove some constraints on the table based on
previous errors then they might want to refer to errors of both the
apply worker and table sync worker.

I think that in general, statistics should be retained as long as a
corresponding object exists on the database, like other cumulative
statistic views. So I’m concerned that an entry of a cumulative stats
view is automatically removed by a non-stats-related function (e.g.,
ALTER SUBSCRIPTION SKIP). Which seems a new behavior for cumulative
stats views.

We can retain the stats entries for table sync worker but what I want
to avoid is that the view shows many old entries that will never be
updated. I've sometimes seen cases where the user mistakenly restored
table data on the subscriber before creating a subscription, failed
table sync on many tables due to unique violation, and truncated
tables on the subscriber. I think that unlike the stats entries for
apply worker, retaining the stats entries for table sync could be
harmful since it’s likely to be a large amount (even hundreds of
entries). Especially, it could lead to bloat the stats file since it
has an error message. So if we do that, I'd like to provide a function
for users to remove (not reset) stats entries manually.

If we follow the idea of keeping stats at db level (in
PgStat_StatDBEntry) as discussed above then I think we already have a
way to remove stat entries via pg_stat_reset which removes the stats
corresponding to tables, functions and after this patch corresponding
to subscriptions as well for the current database. Won't that be
sufficient? I see your point but I think it may be better if we keep
the same behavior for stats of apply and table sync workers.

Make sense.

Following the tables, functions, I thought of keeping the name of the
reset function similar to "pg_stat_reset_single_table_counters" but I
feel the currently used name "pg_stat_reset_subscription_worker" in
the patch is better. Do let me know what you think?

Yeah, I also tend to prefer pg_stat_reset_subscription_worker name
since "single" isn't clear in the context of subscription worker. And
the behavior of the reset function for subscription workers is also
different from pg_stat_reset_single_xxx_counters.

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Thanks for the updated patch, Few comments:
1) should we change "Tables and functions hashes are initialized to
empty" to "Tables, functions and subworker hashes are initialized to
empty"
+       hash_ctl.keysize = sizeof(PgStat_StatSubWorkerKey);
+       hash_ctl.entrysize = sizeof(PgStat_StatSubWorkerEntry);
+       dbentry->subworkers = hash_create("Per-database subscription worker",
+                                         PGSTAT_SUBWORKER_HASH_SIZE,
+                                         &hash_ctl,
+                                         HASH_ELEM | HASH_BLOBS);
2) Since databaseid, tabhash, funchash and subworkerhash are members
of dbentry, can we remove the function arguments databaseid, tabhash,
funchash and subworkerhash and pass dbentry similar to
pgstat_write_db_statsfile function?
@@ -4370,12 +4582,14 @@ done:
  */
 static void
 pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
-                                                bool permanent)
+                                                HTAB *subworkerhash, bool permanent)
 {
        PgStat_StatTabEntry *tabentry;
        PgStat_StatTabEntry tabbuf;
        PgStat_StatFuncEntry funcbuf;
        PgStat_StatFuncEntry *funcentry;
+       PgStat_StatSubWorkerEntry subwbuf;
+       PgStat_StatSubWorkerEntry *subwentry;
3) Can we move pgstat_get_subworker_entry below pgstat_get_db_entry
and pgstat_get_tab_entry, so that the hash lookup can be together
consistently. Similarly pgstat_send_subscription_purge can be moved
after pgstat_send_slru.
+/* ----------
+ * pgstat_get_subworker_entry
+ *
+ * Return subscription worker entry with the given subscription OID and
+ * relation OID.  If subrelid is InvalidOid, it returns an entry of the
+ * apply worker otherwise of the table sync worker associated with subrelid.
+ * If no subscription entry exists, initialize it, if the create parameter
+ * is true.  Else, return NULL.
+ * ----------
+ */
+static PgStat_StatSubWorkerEntry *
+pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid, Oid subrelid,
+                                                  bool create)
+{
+       PgStat_StatSubWorkerEntry *subwentry;
+       PgStat_StatSubWorkerKey key;
+       bool            found;

4) This change can be removed from pgstat.c:
@@ -332,9 +339,11 @@ static bool pgstat_db_requested(Oid databaseid);
static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);

+
static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz now);
static void pgstat_send_funcstats(void);

5) I was able to compile without including
catalog/pg_subscription_rel.h, we can remove including
catalog/pg_subscription_rel.h if not required.
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -41,6 +41,8 @@
 #include "catalog/catalog.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_proc.h"
+#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"
6) Similarly replication/logicalproto.h also need not be included
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
 #include "pgstat.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
+#include "replication/logicalproto.h"
 #include "replication/slot.h"
 #include "storage/proc.h"
7) There is an extra ";", We can remove one ";" from below:
+       PgStat_StatSubWorkerKey key;
+       bool            found;
+       HASHACTION      action = (create ? HASH_ENTER : HASH_FIND);;
+
+       key.subid = subid;
+       key.subrelid = subrelid;
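
The find-or-create contract described in the pgstat_get_subworker_entry comment (comment 3 above) can be sketched without the dynahash API. The version below is a standalone approximation, not the patch's code: it swaps the hash table for a small fixed array, and the demo_ names, the capacity constant, and the local Oid stand-in are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;            /* local stand-in, not the PostgreSQL header */
#define InvalidOid ((Oid) 0)
#define DEMO_MAX_ENTRIES 8           /* hypothetical fixed capacity */

typedef struct DemoSubWorkerKey
{
    Oid         subid;
    Oid         subrelid;            /* InvalidOid means the apply worker */
} DemoSubWorkerKey;

typedef struct DemoSubWorkerEntry
{
    DemoSubWorkerKey key;
    long        error_count;
} DemoSubWorkerEntry;

typedef struct DemoSubWorkerTable
{
    DemoSubWorkerEntry entries[DEMO_MAX_ENTRIES];
    size_t      used;
} DemoSubWorkerTable;

/*
 * Return the entry for (subid, subrelid).  If none exists, create and
 * initialize one when "create" is true; otherwise return NULL.  Also
 * returns NULL when the demo table is full.
 */
static DemoSubWorkerEntry *
demo_get_subworker_entry(DemoSubWorkerTable *tab, Oid subid, Oid subrelid,
                         bool create)
{
    DemoSubWorkerEntry *entry;

    assert(tab != NULL);

    for (size_t i = 0; i < tab->used; i++)
    {
        entry = &tab->entries[i];
        if (entry->key.subid == subid && entry->key.subrelid == subrelid)
            return entry;
    }

    if (!create || tab->used == DEMO_MAX_ENTRIES)
        return NULL;

    entry = &tab->entries[tab->used++];
    entry->key.subid = subid;
    entry->key.subrelid = subrelid;
    entry->error_count = 0;
    return entry;
}
```

The "create" flag plays the role that choosing between HASH_ENTER and HASH_FIND plays in the quoted hunk.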

Regards,
Vignesh

#276Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#266)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 8, 2021 at 4:10 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 8, 2021 at 1:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

Thanks for the updated patch.
Some initial comments on the v20 patch:

Thank you for the comments!

doc/src/sgml/monitoring.sgml

(1) wording
The word "information" seems to be missing after "showing" (otherwise
it reads "showing about errors", which isn't correct grammar).
I suggest the following change:

BEFORE:
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
AFTER:
+      <entry>At least one row per subscription, showing information about
+      errors that occurred on subscription.

Fixed.

(2) pg_stat_reset_subscription_worker(subid Oid, relid Oid) function
documentation
The description doesn't read well. I'd suggest the following change:

BEFORE:
* Resets statistics of a single subscription worker statistics.
AFTER:
* Resets the statistics of a single subscription worker.

I think that the documentation for this function should make it clear
that a non-NULL "subid" parameter is required for both reset cases
(tablesync and apply).
Perhaps this could be done by simply changing the first sentence to say:
"Resets the statistics of a single subscription worker, for a worker
running on the subscription with <parameter>subid</parameter>."
(and then can remove " running on the subscription with
<parameter>subid</parameter>" from the last sentence)

Fixed.

I think that the documentation for this function should say that it
should be used in conjunction with the "pg_stat_subscription_workers"
view in order to obtain the required subid/relid values for resetting.
(and should provide a link to the documentation for that view)

I think it's not necessarily true that users should use
pg_stat_subscription_workers in order to obtain subid/relid since we
can obtain the same also from pg_subscription_rel. But I agree that it
should clarify that this function resets entries of
pg_stat_subscription_workers view. Fixed.

Also, I think that the function documentation should make it clear how
to distinguish the tablesync vs apply worker statistics case.
e.g. the tablesync error case is indicated by a null "command" in the
information returned from the "pg_stat_subscription_workers" view
(otherwise it seems a user could only know this by looking at the server log).

The documentation of pg_stat_subscription_workers explains that
subrelid is always NULL for apply workers. Is it not enough?

Finally, there are currently no tests for this new function.

I've added some tests.

(3) pg_stat_subscription_workers
In the documentation for this, some users may not realise that "the
initial data copy" refers to "tablesync", so maybe say "the initial
data copy (tablesync)", or similar.

Perhaps it's better not to use the term "tablesync" since we don't use
the term anywhere now. Instead, we should say it more clearly, e.g.,
"subscription worker handling initial data copy of the relation", as
the description of pg_stat_subscription says.

(4) stats_reset
"stats_reset" is currently documented as the last column of the
"pg_stat_subscription_workers" view - but it's actually no longer
included in the view.

Removed.

(5) src/tools/pgindent/typedefs.list
The following current entries are bogus:
PgStat_MsgSubWorkerErrorPurge
PgStat_MsgSubWorkerPurge

The following entry is missing:
PgStat_MsgSubscriptionPurge

Fixed.

I'll submit an updated patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#277Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Dilip Kumar (#267)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 9, 2021 at 3:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sun, Nov 7, 2021 at 7:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. In this version patch, subscription
worker statistics are collected per-database and handled in a similar
way to tables and functions. I think perhaps we still need to discuss
details of how the statistics should be handled but I'd like to share
the patch for discussion.

While reviewing the v20, I have some initial comments,

+     <row>
+      <entry><structname>pg_stat_subscription_workers</structname><indexterm><primary>pg_stat_subscription_workers</primary></indexterm></entry>
+      <entry>At least one row per subscription, showing about errors that
+      occurred on subscription.
+      See <link linkend="monitoring-pg-stat-subscription-workers">
+      <structname>pg_stat_subscription_workers</structname></link> for details.
+      </entry>

1.
I don't like the fact that this view is very specific for showing the
errors but the name of the view is very generic. So are we keeping
this name to expand the scope of the view in the future? If this is
meant only for showing the errors then the name should be more
specific.

As Amit already mentioned, we're planning to add more xact statistics
to this view. I've mentioned that in the commit message.

2.
Why comment says "At least one row per subscription"? this looks
confusing, I mean if there is no error then there will not be even one
row right?

+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription error reported by workers applying logical
+   replication changes and workers handling the initial data copy of the
+   subscribed tables.
+  </para>

Right. Fixed.

3.
So there will only be one row per subscription? I did not read the
code, but suppose there was an error due to some constraint now if
that constraint is removed and there is a new error then the old error
will be removed immediately or it will be removed by auto vacuum? If
it is not removed immediately then there could be multiple errors per
subscription in the view so the comment is not correct.

There is one row per subscription worker (apply worker and tablesync
worker). If the same error consecutively occurred, error_count is
incremented and last_error_time is updated. Otherwise, e.g., if a
different error occurred on the apply worker, all statistics are
updated.

4.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>last_error_time</structfield> <type>timestamp
with time zone</type>
+      </para>
+      <para>
+       Time at which the last error occurred
+      </para></entry>
+     </row>

Will it be useful to know when the first time error occurred?

Good idea. Users can know when the subscription stopped due to this
error. Added.

5.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with
time zone</type>
+      </para>
+      <para>

The actual view does not contain this column.

Removed.

6.
+       <para>
+        Resets statistics of a single subscription worker statistics.

/Resets statistics of a single subscription worker statistics/Resets
statistics of a single subscription worker

Fixed.
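
As an illustration of the update rule given in the reply to comment 3, here is a minimal standalone C sketch. It is not code from the patch: `WorkerErrorStats`, `report_error`, and the 256-byte `ERRMSG_LEN` are hypothetical, and it assumes that "the same error" means identical message text, which the patch may define differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define ERRMSG_LEN 256			/* hypothetical buffer size */

/* Hypothetical per-worker error statistics; not the patch's struct. */
typedef struct WorkerErrorStats
{
	long		error_count;
	time_t		last_error_time;
	char		message[ERRMSG_LEN];
} WorkerErrorStats;

/*
 * Record an error. A repeat of the stored error only bumps the counter;
 * a different error replaces the stored message and restarts the counter.
 * The timestamp is refreshed in both cases.
 */
void
report_error(WorkerErrorStats *stats, const char *errmsg, time_t now)
{
	assert(stats != NULL && errmsg != NULL);

	if (stats->error_count > 0 &&
		strncmp(stats->message, errmsg, ERRMSG_LEN - 1) == 0)
	{
		/* Same error again (assumption: compared by message text). */
		stats->error_count++;
	}
	else
	{
		/* First error, or a different one: reset everything. */
		stats->error_count = 1;
		snprintf(stats->message, ERRMSG_LEN, "%s", errmsg);
	}
	stats->last_error_time = now;
}
```

Calling `report_error` twice with the same text leaves `error_count` at 2; a later call with a different text sets it back to 1 and replaces the stored message.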

I'll submit an updated patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#278Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#275)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 10, 2021 at 12:49 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the updated patch, Few comments:

Thank you for the comments!

1) should we change "Tables and functions hashes are initialized to
empty" to "Tables, functions and subworker hashes are initialized to
empty"
+       hash_ctl.keysize = sizeof(PgStat_StatSubWorkerKey);
+       hash_ctl.entrysize = sizeof(PgStat_StatSubWorkerEntry);
+       dbentry->subworkers = hash_create("Per-database subscription worker",
+                                         PGSTAT_SUBWORKER_HASH_SIZE,
+                                         &hash_ctl,
+                                         HASH_ELEM | HASH_BLOBS);

Fixed.

2) Since databaseid, tabhash, funchash and subworkerhash are members
of dbentry, can we remove the function arguments databaseid, tabhash,
funchash and subworkerhash and pass dbentry similar to
pgstat_write_db_statsfile function?
@@ -4370,12 +4582,14 @@ done:
*/
static void
pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash,
-                                                bool permanent)
+                                                HTAB *subworkerhash, bool permanent)
{
PgStat_StatTabEntry *tabentry;
PgStat_StatTabEntry tabbuf;
PgStat_StatFuncEntry funcbuf;
PgStat_StatFuncEntry *funcentry;
+       PgStat_StatSubWorkerEntry subwbuf;
+       PgStat_StatSubWorkerEntry *subwentry;

As the comment of this function says, this function has the ability to
skip storing per-table or per-function (and/or
per-subscription-worker) data if NULL is passed for the
corresponding hash table, although that's not used at the moment. IMO
it'd be better to keep that behavior.

3) Can we move pgstat_get_subworker_entry below pgstat_get_db_entry
and pgstat_get_tab_entry, so that the hash lookup can be together
consistently. Similarly pgstat_send_subscription_purge can be moved
after pgstat_send_slru.
+/* ----------
+ * pgstat_get_subworker_entry
+ *
+ * Return subscription worker entry with the given subscription OID and
+ * relation OID.  If subrelid is InvalidOid, it returns an entry of the
+ * apply worker otherwise of the table sync worker associated with subrelid.
+ * If no subscription entry exists, initialize it, if the create parameter
+ * is true.  Else, return NULL.
+ * ----------
+ */
+static PgStat_StatSubWorkerEntry *
+pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid, Oid subrelid,
+                                                  bool create)
+{
+       PgStat_StatSubWorkerEntry *subwentry;
+       PgStat_StatSubWorkerKey key;
+       bool            found;

Agreed. Moved.

4) This change can be removed from pgstat.c:
@@ -332,9 +339,11 @@ static bool pgstat_db_requested(Oid databaseid);
static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);

+
static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, TimestampTz now);
static void pgstat_send_funcstats(void);

Removed.

5) I was able to compile without including
catalog/pg_subscription_rel.h, we can remove including
catalog/pg_subscription_rel.h if not required.
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -41,6 +41,8 @@
#include "catalog/catalog.h"
#include "catalog/pg_database.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_subscription.h"
+#include "catalog/pg_subscription_rel.h"

Removed.

6) Similarly replication/logicalproto.h also need not be included
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -24,6 +24,7 @@
#include "pgstat.h"
#include "postmaster/bgworker_internals.h"
#include "postmaster/postmaster.h"
+#include "replication/logicalproto.h"
#include "replication/slot.h"
#include "storage/proc.h"

Removed.

7) There is an extra ";", We can remove one ";" from below:
+       PgStat_StatSubWorkerKey key;
+       bool            found;
+       HASHACTION      action = (create ? HASH_ENTER : HASH_FIND);;
+
+       key.subid = subid;
+       key.subrelid = subrelid;

Fixed.
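
The lookup described in the pgstat_get_subworker_entry comment under item 3, together with the HASH_ENTER/HASH_FIND choice shown under item 7, can be illustrated with a standalone C sketch. It deliberately swaps PostgreSQL's hash table for a small linear array so that it compiles on its own; `SubWorkerTable`, `SubWorkerEntry`, `get_subworker_entry`, and `MAX_ENTRIES` are hypothetical names, and `Oid`/`InvalidOid` are local stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins mirroring PostgreSQL's Oid and InvalidOid. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

#define MAX_ENTRIES 64			/* hypothetical capacity */

typedef struct SubWorkerEntry
{
	Oid			subid;			/* subscription OID */
	Oid			subrelid;		/* InvalidOid for the apply worker */
	long		error_count;
} SubWorkerEntry;

typedef struct SubWorkerTable
{
	SubWorkerEntry entries[MAX_ENTRIES];
	size_t		nentries;
} SubWorkerTable;

/*
 * Return the entry keyed by (subid, subrelid). If it does not exist,
 * create it when "create" is true; otherwise return NULL.
 */
SubWorkerEntry *
get_subworker_entry(SubWorkerTable *tab, Oid subid, Oid subrelid, bool create)
{
	SubWorkerEntry *entry;
	size_t		i;

	assert(tab != NULL);

	/* Linear search stands in for the hash lookup. */
	for (i = 0; i < tab->nentries; i++)
	{
		entry = &tab->entries[i];
		if (entry->subid == subid && entry->subrelid == subrelid)
			return entry;
	}

	/* Not found: only create when asked, and only if there is room. */
	if (!create || tab->nentries >= MAX_ENTRIES)
		return NULL;

	entry = &tab->entries[tab->nentries++];
	entry->subid = subid;
	entry->subrelid = subrelid;
	entry->error_count = 0;
	return entry;
}
```

With this shape, the apply worker and each tablesync worker of one subscription get separate entries because the key includes both OIDs.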

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v21-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)

#279Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#278)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Regards,
Greg Nancarrow
Fujitsu Australia

#280Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#279)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

Fixed.

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Fixed.

Thank you for the comments! I've attached an updated version of the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v22-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)

#281vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#280)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

Fixed.

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Fixed.

Thank you for the comments! I've attached an updated version of the patch.

Thanks for the updated patch.
I found one issue:
This Assert can fail in a few cases:
+void
+pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
+                              LogicalRepMsgType command, TransactionId xid,
+                              const char *errmsg)
+{
+       PgStat_MsgSubWorkerError msg;
+       int                     len;
+
+       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
+       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) + strlen(errmsg) + 1;
+

I could reproduce the problem with the following scenario:
Publisher:
create table t1 (c1 varchar);
create publication pub1 for table t1;
insert into t1 values(repeat('abcd', 5000));

Subscriber:
create table t1(c1 smallint);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1 with ( two_phase = true);
postgres=# select * from pg_stat_subscription_workers;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Subscriber logs:
2021-11-15 19:27:56.380 IST [15685] LOG: logical replication apply
worker for subscription "sub1" has started
2021-11-15 19:27:56.384 IST [15687] LOG: logical replication table
synchronization worker for subscription "sub1", table "t1" has started
TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
File: "pgstat.c", Line: 1946, PID: 15687)
postgres: logical replication worker for subscription 16387 sync 16384
(ExceptionalCondition+0xd0)[0x55a18f3c727f]
postgres: logical replication worker for subscription 16387 sync 16384
(pgstat_report_subworker_error+0x7a)[0x55a18f126417]
postgres: logical replication worker for subscription 16387 sync 16384
(ApplyWorkerMain+0x493)[0x55a18f176611]
postgres: logical replication worker for subscription 16387 sync 16384
(StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54efc0)[0x55a18f134fc0]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54f3af)[0x55a18f1353af]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54e338)[0x55a18f134338]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x5498c2)[0x55a18f12f8c2]
postgres: logical replication worker for subscription 16387 sync 16384
(PostmasterMain+0x134c)[0x55a18f12f1dd]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x43c3d4)[0x55a18f0223d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
postgres: logical replication worker for subscription 16387 sync 16384
(_start+0x2e)[0x55a18ecaf4fe]
2021-11-15 19:27:56.483 IST [15645] LOG: background worker "logical
replication worker" (PID 15687) was terminated by signal 6: Aborted
2021-11-15 19:27:56.483 IST [15645] LOG: terminating any other active
server processes
2021-11-15 19:27:56.485 IST [15645] LOG: all server processes
terminated; reinitializing

Here it fails because of a long error message ""invalid input syntax
for type smallint:
\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
because we try to insert varchar type data into smallint type. Maybe
we should trim the error message in this case.
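
One way such trimming could look is sketched below in standalone C. This is an assumption-laden illustration, not the fix that went into the patch: `MSGLEN` stands in for the message buffer size and `copy_errmsg` is a hypothetical helper. It truncates by bytes, so a multibyte character at the cut point could be split; real code would need to account for the encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSGLEN 256				/* hypothetical message buffer size */

/*
 * Copy errmsg into dst (capacity dstsize), truncating if it does not fit
 * and always NUL-terminating. Returns the number of bytes stored, not
 * counting the terminating NUL.
 */
size_t
copy_errmsg(char *dst, size_t dstsize, const char *errmsg)
{
	size_t		len;

	assert(dst != NULL && errmsg != NULL && dstsize > 0);

	len = strlen(errmsg);
	if (len >= dstsize)
		len = dstsize - 1;		/* trim instead of asserting on length */

	memcpy(dst, errmsg, len);
	dst[len] = '\0';
	return len;
}
```

The returned length, rather than `strlen(errmsg)`, would then be the value used when computing how many bytes of the message to send.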

Regards,
Vignesh

#282Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#281)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

Fixed.

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Fixed.

Thank you for the comments! I've attached an updated version of the patch.

Thanks for the updated patch.
I found one issue:
This Assert can fail in a few cases:
+void
+pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
+                              LogicalRepMsgType command, TransactionId xid,
+                              const char *errmsg)
+{
+       PgStat_MsgSubWorkerError msg;
+       int                     len;
+
+       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
+       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) + strlen(errmsg) + 1;
+

I could reproduce the problem with the following scenario:
Publisher:
create table t1 (c1 varchar);
create publication pub1 for table t1;
insert into t1 values(repeat('abcd', 5000));

Subscriber:
create table t1(c1 smallint);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1 with ( two_phase = true);
postgres=# select * from pg_stat_subscription_workers;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Subscriber logs:
2021-11-15 19:27:56.380 IST [15685] LOG: logical replication apply
worker for subscription "sub1" has started
2021-11-15 19:27:56.384 IST [15687] LOG: logical replication table
synchronization worker for subscription "sub1", table "t1" has started
TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
File: "pgstat.c", Line: 1946, PID: 15687)
postgres: logical replication worker for subscription 16387 sync 16384
(ExceptionalCondition+0xd0)[0x55a18f3c727f]
postgres: logical replication worker for subscription 16387 sync 16384
(pgstat_report_subworker_error+0x7a)[0x55a18f126417]
postgres: logical replication worker for subscription 16387 sync 16384
(ApplyWorkerMain+0x493)[0x55a18f176611]
postgres: logical replication worker for subscription 16387 sync 16384
(StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54efc0)[0x55a18f134fc0]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54f3af)[0x55a18f1353af]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54e338)[0x55a18f134338]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x5498c2)[0x55a18f12f8c2]
postgres: logical replication worker for subscription 16387 sync 16384
(PostmasterMain+0x134c)[0x55a18f12f1dd]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x43c3d4)[0x55a18f0223d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
postgres: logical replication worker for subscription 16387 sync 16384
(_start+0x2e)[0x55a18ecaf4fe]
2021-11-15 19:27:56.483 IST [15645] LOG: background worker "logical
replication worker" (PID 15687) was terminated by signal 6: Aborted
2021-11-15 19:27:56.483 IST [15645] LOG: terminating any other active
server processes
2021-11-15 19:27:56.485 IST [15645] LOG: all server processes
terminated; reinitializing

Here it fails because of a long error message ""invalid input syntax
for type smallint:

Good catch!

\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
because we try to insert varchar type data into smallint type. Maybe
we should trim the error message in this case.

Right. I've fixed this issue and attached an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v23-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)

#283houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#282)
RE: Skipping logical replication transactions on subscriber side

On Tues, Nov 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

Thanks for updating the patch.
Here are a few comments.

1)

+ <function>pg_stat_reset_subscription_worker</function> ( <parameter>subid</parameter> <type>oid</type>, <optional> <parameter>relid</parameter> <type>oid</type> </optional> )

It seems we should put '<optional>' before the comma(',').

2)
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subrelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker
+      </para></entry>
+     </row>

Is 'subrelid' only used for distinguishing the worker type? If so, would it
be clearer to have a string value here? I recall that a previous version of the
patch had a failure_source column, but it was removed. Maybe I missed something.

3)
+extern void pgstat_reset_subworker_stats(Oid subid, Oid subrelid, bool allstats);

I couldn't find the definition of this function; maybe we can remove this declaration?

Best regards,
Hou zj

#284Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#283)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 9:13 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2)
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subrelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker
+      </para></entry>
+     </row>

Is the 'subrelid' only used for distinguishing the worker type ?

I think it will additionally tell which table sync worker as well.

If so, would it
be clearer to have a string value here? I recall that a previous version of the
patch had a failure_source column, but it was removed. Maybe I missed something.

I also don't remember the reason for this but would like to know.

I am also reviewing the latest version of the patch and will share
comments/questions sometime today.

--
With Regards,
Amit Kapila.

#285Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#284)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 17, 2021 at 9:13 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2)
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subrelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker
+      </para></entry>
+     </row>

Is the 'subrelid' only used for distinguishing the worker type ?

I think it will additionally tell which table sync worker as well.

Right.

If so, would it
be clearer to have a string value here? I recall that a previous version of the
patch had a failure_source column, but it was removed. Maybe I missed something.

I also don't remember the reason for this but would like to know.

I felt it's a bit redundant. Setting subrelid to NULL already means
that it’s an entry for the apply worker, and a non-NULL value means a
tablesync worker. If users want a value like “apply” or “tablesync”
for each entry, they can derive it from the subrelid value.
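
Going by the column description quoted above (subrelid is null for the main apply worker) and the InvalidOid convention in the pgstat_get_subworker_entry comment, deriving such a label is a one-line test. The C sketch below is illustrative only: `worker_type` is a hypothetical helper, and `Oid`/`InvalidOid` are local stand-ins so that the snippet compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins mirroring PostgreSQL's Oid and InvalidOid. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical helper: label a statistics entry by its subrelid.
 * An invalid (null) subrelid marks the apply worker; any valid relation
 * OID marks the tablesync worker for that relation.
 */
const char *
worker_type(Oid subrelid)
{
	return (subrelid == InvalidOid) ? "apply" : "tablesync";
}
```

At the SQL level the same distinction would be a null test on the subrelid column of the view.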

I am also reviewing the latest version of the patch and will share
comments/questions sometime today.

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#286vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#282)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

Fixed.

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Fixed.

Thank you for the comments! I've attached an updated version of the patch.

Thanks for the updated patch.
I found one issue:
This Assert can fail in a few cases:
+void
+pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
+                              LogicalRepMsgType command, TransactionId xid,
+                              const char *errmsg)
+{
+       PgStat_MsgSubWorkerError msg;
+       int                     len;
+
+       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
+       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) + strlen(errmsg) + 1;
+

I could reproduce the problem with the following scenario:
Publisher:
create table t1 (c1 varchar);
create publication pub1 for table t1;
insert into t1 values(repeat('abcd', 5000));

Subscriber:
create table t1(c1 smallint);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1 with ( two_phase = true);
postgres=# select * from pg_stat_subscription_workers;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Subscriber logs:
2021-11-15 19:27:56.380 IST [15685] LOG: logical replication apply
worker for subscription "sub1" has started
2021-11-15 19:27:56.384 IST [15687] LOG: logical replication table
synchronization worker for subscription "sub1", table "t1" has started
TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
File: "pgstat.c", Line: 1946, PID: 15687)
postgres: logical replication worker for subscription 16387 sync 16384
(ExceptionalCondition+0xd0)[0x55a18f3c727f]
postgres: logical replication worker for subscription 16387 sync 16384
(pgstat_report_subworker_error+0x7a)[0x55a18f126417]
postgres: logical replication worker for subscription 16387 sync 16384
(ApplyWorkerMain+0x493)[0x55a18f176611]
postgres: logical replication worker for subscription 16387 sync 16384
(StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54efc0)[0x55a18f134fc0]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54f3af)[0x55a18f1353af]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54e338)[0x55a18f134338]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x5498c2)[0x55a18f12f8c2]
postgres: logical replication worker for subscription 16387 sync 16384
(PostmasterMain+0x134c)[0x55a18f12f1dd]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x43c3d4)[0x55a18f0223d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
postgres: logical replication worker for subscription 16387 sync 16384
(_start+0x2e)[0x55a18ecaf4fe]
2021-11-15 19:27:56.483 IST [15645] LOG: background worker "logical
replication worker" (PID 15687) was terminated by signal 6: Aborted
2021-11-15 19:27:56.483 IST [15645] LOG: terminating any other active
server processes
2021-11-15 19:27:56.485 IST [15645] LOG: all server processes
terminated; reinitializing

Here it fails because of a long error message ""invalid input syntax
for type smallint:

Good catch!

\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
because we try to insert varchar type data into smallint type. Maybe
we should trim the error message in this case.

Right. I've fixed this issue and attached an updated patch.

Thanks for the updated patch. The issue is fixed in the patch provided.
I found that in one of the scenarios the statistics are getting lost:
Test steps:
Step 1:
Setup Publisher(create 100 publications pub1...pub100 for t1...t100) like below:
===============================================
create table t1(c1 int);
create publication pub1 for table t1;
insert into t1 values(10);
insert into t1 values(10);
create table t2(c1 int);
create publication pub2 for table t2;
insert into t2 values(10);
insert into t2 values(10);
....

Script can be generated using:
# start the counter at 0 so the loop creates t1..t100 and pub1..pub100
a=0
while [ $a -lt 100 ]
do
a=`expr $a + 1`
echo "./psql -d postgres -p 5432 -c \"create table t$a(c1 int);\"" >> publisher.sh
echo "./psql -d postgres -p 5432 -c \"create publication pub$a for table t$a;\"" >> publisher.sh
echo "./psql -d postgres -p 5432 -c \"insert into t$a values(10);\"" >> publisher.sh
echo "./psql -d postgres -p 5432 -c \"insert into t$a values(10);\"" >> publisher.sh
done

Step 2:
Setup Subscriber(create 100 subscriptions):
===============================================
create table t1(c1 int primary key);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1;
create table t2(c1 int primary key);
create subscription sub2 connection 'dbname=postgres port=5432'
publication pub2;
....

Script can be generated using:
# start the counter at 0 so the loop creates t1..t100 and sub1..sub100
a=0
while [ $a -lt 100 ]
do
a=`expr $a + 1`
echo "./psql -d postgres -p 5433 -c \"create table t$a(c1 int primary key);\"" >> subscriber.sh
echo "./psql -d postgres -p 5433 -c \"create subscription sub$a connection 'dbname=postgres port=5432' publication pub$a;\"" >> subscriber.sh
done

Step 3:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 17 | duplicate key value violates unique constraint "t1_pkey" | 2021-11-17 12:01:46.141086+05:30 | 2021-11-17 12:03:13.175698+05:30
16395 | sub2 | 16390 | 16390 | | | 16 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:15.512249+05:30
16401 | sub3 | 16396 | 16396 | | | 16 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:15.802225+05:30
16407 | sub4 | 16402 | 16402 | | | 16 | duplicate key value violates unique constraint "t4_pkey" | 2021-11-17 12:01:51.390638+05:30 | 2021-11-17 12:03:14.709496+05:30
16413 | sub5 | 16408 | 16408 | | | 16 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:15.257235+05:30

Step 4:
Then restart the publisher

Step 5:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 1 | could not create replication slot "pg_16389_sync_16384_7031422794938304519": FATAL: terminating connection due to administrator command+| 2021-11-17 12:03:28.201247+05:30 | 2021-11-17 12:03:28.201247+05:30
| | | | | | | server closed the connection unexpectedly +| |
| | | | | | | This probably means the server terminated abnormally +| |
| | | | | | | before or while proce | |
16395 | sub2 | 16390 | 16390 | | | 18 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:23.832585+05:30
16401 | sub3 | 16396 | 16396 | | | 18 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:26.567873+05:30
16407 | sub4 | 16402 | 16402 | | | 1 | could not create replication slot "pg_16407_sync_16402_7031422794938304519": FATAL: terminating connection due to administrator command+| 2021-11-17 12:03:28.196958+05:30 | 2021-11-17 12:03:28.196958+05:30
| | | | | | | server closed the connection unexpectedly +| |
| | | | | | | This probably means the server terminated abnormally +| |
| | | | | | | before or while proce | |
16413 | sub5 | 16408 | 16408 | | | 18 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:25.595697+05:30

Step 6:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 1 | duplicate key value violates unique constraint "t1_pkey" | 2021-11-17 12:03:33.346514+05:30 | 2021-11-17 12:03:33.346514+05:30
16395 | sub2 | 16390 | 16390 | | | 19 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:33.437505+05:30
16401 | sub3 | 16396 | 16396 | | | 19 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:33.482954+05:30
16407 | sub4 | 16402 | 16402 | | | 1 | duplicate key value violates unique constraint "t4_pkey" | 2021-11-17 12:03:33.327489+05:30 | 2021-11-17 12:03:33.327489+05:30
16413 | sub5 | 16408 | 16408 | | | 19 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:33.374522+05:30

We can see that the statistics for sub1 and sub4 are lost: the old
error_count value is gone. I'm not sure whether this behavior is OK. Thoughts?

Regards,
Vignesh

#287Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#286)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 3:52 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates all comments I got so
far. Please review it.

Thanks for the updated patch.
A few minor comments:

doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml

(1) tab in doc updates

There's a tab before "Otherwise,":

+ copy of the relation with <parameter>relid</parameter>.
Otherwise,

Fixed.

src/backend/utils/adt/pgstatfuncs.c

(2) The function comment for "pg_stat_reset_subscription_worker_sub"
seems a bit long and I expected it to be multi-line (did you run
pg_indent?)

I ran pg_indent on pgstatfuncs.c but it didn't become a multi-line comment.

src/include/pgstat.h

(3) Remove PgStat_StatSubWorkerEntry.dbid?

The "dbid" member of the new PgStat_StatSubWorkerEntry struct doesn't
seem to be used, so I think it should be removed.
(I could remove it and everything builds OK and tests pass).

Fixed.

Thank you for the comments! I've attached an updated version of the patch.

Thanks for the updated patch.
I found one issue:
This Assert can fail in a few cases:
+void
+pgstat_report_subworker_error(Oid subid, Oid subrelid, Oid relid,
+                                                         LogicalRepMsgType command, TransactionId xid,
+                                                         const char *errmsg)
+{
+       PgStat_MsgSubWorkerError msg;
+       int                     len;
+
+       Assert(strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN);
+       len = offsetof(PgStat_MsgSubWorkerError, m_message[0]) + strlen(errmsg) + 1;
+

I could reproduce the problem with the following scenario:
Publisher:
create table t1 (c1 varchar);
create publication pub1 for table t1;
insert into t1 values(repeat('abcd', 5000));

Subscriber:
create table t1(c1 smallint);
create subscription sub1 connection 'dbname=postgres port=5432'
publication pub1 with ( two_phase = true);
postgres=# select * from pg_stat_subscription_workers;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Subscriber logs:
2021-11-15 19:27:56.380 IST [15685] LOG: logical replication apply
worker for subscription "sub1" has started
2021-11-15 19:27:56.384 IST [15687] LOG: logical replication table
synchronization worker for subscription "sub1", table "t1" has started
TRAP: FailedAssertion("strlen(errmsg) < PGSTAT_SUBWORKERERROR_MSGLEN",
File: "pgstat.c", Line: 1946, PID: 15687)
postgres: logical replication worker for subscription 16387 sync 16384
(ExceptionalCondition+0xd0)[0x55a18f3c727f]
postgres: logical replication worker for subscription 16387 sync 16384
(pgstat_report_subworker_error+0x7a)[0x55a18f126417]
postgres: logical replication worker for subscription 16387 sync 16384
(ApplyWorkerMain+0x493)[0x55a18f176611]
postgres: logical replication worker for subscription 16387 sync 16384
(StartBackgroundWorker+0x23c)[0x55a18f11f7e2]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54efc0)[0x55a18f134fc0]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54f3af)[0x55a18f1353af]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x54e338)[0x55a18f134338]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7feef84371f0]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x57)[0x7feef81e3ac7]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x5498c2)[0x55a18f12f8c2]
postgres: logical replication worker for subscription 16387 sync 16384
(PostmasterMain+0x134c)[0x55a18f12f1dd]
postgres: logical replication worker for subscription 16387 sync 16384
(+0x43c3d4)[0x55a18f0223d4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7feef80fd565]
postgres: logical replication worker for subscription 16387 sync 16384
(_start+0x2e)[0x55a18ecaf4fe]
2021-11-15 19:27:56.483 IST [15645] LOG: background worker "logical
replication worker" (PID 15687) was terminated by signal 6: Aborted
2021-11-15 19:27:56.483 IST [15645] LOG: terminating any other active
server processes
2021-11-15 19:27:56.485 IST [15645] LOG: all server processes
terminated; reinitializing

Here it fails because of a long error message ""invalid input syntax
for type smallint:

Good catch!

\"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabc...."
because we try to insert varchar type data into smallint type. Maybe
we should trim the error message in this case.

Right. I've fixed this issue and attached an updated patch.
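
For readers following along, the trimming idea suggested above can be sketched in standalone C. This is only an illustration under stated assumptions, not the code from the updated patch: the helper name copy_truncated_errmsg is hypothetical, and only the PGSTAT_SUBWORKERERROR_MSGLEN value of 256 comes from the patch excerpt quoted in this thread. The copy truncates by bytes, so a real implementation would also have to avoid cutting a multibyte character in half.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buffer size as shown in the patch excerpt quoted in this thread. */
#define PGSTAT_SUBWORKERERROR_MSGLEN 256

/*
 * Hypothetical helper (not from the patch): copy errmsg into a fixed-size
 * buffer, truncating an over-long message instead of asserting on its length.
 * Returns the number of bytes stored, excluding the terminating NUL.
 */
static size_t
copy_truncated_errmsg(char *dst, const char *errmsg)
{
    assert(dst != NULL && errmsg != NULL);

    /* snprintf stores at most size - 1 bytes and always NUL-terminates. */
    snprintf(dst, PGSTAT_SUBWORKERERROR_MSGLEN, "%s", errmsg);

    return strlen(dst);
}
```

With a helper like this, the 20000-character message from the reproduction scenario would be stored as its first 255 bytes, and the message length sent to the collector would be computed from the stored string rather than from the original.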

Thanks for the updated patch. The issue is fixed in the patch provided.
I found that in one of the scenarios the statistics are getting lost:

Thank you for the tests!!

Step 3:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 17 | duplicate key value violates unique constraint "t1_pkey" | 2021-11-17 12:01:46.141086+05:30 | 2021-11-17 12:03:13.175698+05:30
16395 | sub2 | 16390 | 16390 | | | 16 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:15.512249+05:30
16401 | sub3 | 16396 | 16396 | | | 16 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:15.802225+05:30
16407 | sub4 | 16402 | 16402 | | | 16 | duplicate key value violates unique constraint "t4_pkey" | 2021-11-17 12:01:51.390638+05:30 | 2021-11-17 12:03:14.709496+05:30
16413 | sub5 | 16408 | 16408 | | | 16 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:15.257235+05:30

Step 4:
Then restart the publisher

Step 5:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 1 | could not create replication slot "pg_16389_sync_16384_7031422794938304519": FATAL: terminating connection due to administrator command+| 2021-11-17 12:03:28.201247+05:30 | 2021-11-17 12:03:28.201247+05:30
| | | | | | | server closed the connection unexpectedly +| |
| | | | | | | This probably means the server terminated abnormally +| |
| | | | | | | before or while proce | |
16395 | sub2 | 16390 | 16390 | | | 18 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:23.832585+05:30
16401 | sub3 | 16396 | 16396 | | | 18 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:26.567873+05:30
16407 | sub4 | 16402 | 16402 | | | 1 | could not create replication slot "pg_16407_sync_16402_7031422794938304519": FATAL: terminating connection due to administrator command+| 2021-11-17 12:03:28.196958+05:30 | 2021-11-17 12:03:28.196958+05:30
| | | | | | | server closed the connection unexpectedly +| |
| | | | | | | This probably means the server terminated abnormally +| |
| | | | | | | before or while proce | |
16413 | sub5 | 16408 | 16408 | | | 18 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:25.595697+05:30

Step 6:
postgres=# select * from pg_stat_subscription_workers order by subid;
subid | subname | subrelid | relid | command | xid | error_count | error_message | first_error_time | last_error_time
-------+---------+----------+-------+---------+-----+-------------+------------------------------------------------------------+----------------------------------+----------------------------------
16389 | sub1 | 16384 | 16384 | | | 1 | duplicate key value violates unique constraint "t1_pkey" | 2021-11-17 12:03:33.346514+05:30 | 2021-11-17 12:03:33.346514+05:30
16395 | sub2 | 16390 | 16390 | | | 19 | duplicate key value violates unique constraint "t2_pkey" | 2021-11-17 12:01:51.337055+05:30 | 2021-11-17 12:03:33.437505+05:30
16401 | sub3 | 16396 | 16396 | | | 19 | duplicate key value violates unique constraint "t3_pkey" | 2021-11-17 12:01:51.352157+05:30 | 2021-11-17 12:03:33.482954+05:30
16407 | sub4 | 16402 | 16402 | | | 1 | duplicate key value violates unique constraint "t4_pkey" | 2021-11-17 12:03:33.327489+05:30 | 2021-11-17 12:03:33.327489+05:30
16413 | sub5 | 16408 | 16408 | | | 19 | duplicate key value violates unique constraint "t5_pkey" | 2021-11-17 12:01:51.418825+05:30 | 2021-11-17 12:03:33.374522+05:30

We can see that sub1 and sub4 statistics are lost, old error_count
value is lost. I'm not sure if this behavior is ok or not. Thoughts?

Looking at the outputs of steps 3, 5, and 6, the error messages are
different. In the current design, error_count is incremented only when
the exact same error (i.e., the xid, command, relid, and error message
are all the same) occurs again. Since a different kind of error happened
on the subscription, the error_count was reset. Similarly, the
first_error_time value was also reset.
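
To make the rule concrete, here is a minimal standalone C sketch of the counting behavior described above. It is an illustration only, not the patch's code: the struct, the field layout, and the record_error function are hypothetical stand-ins, and plain integer typedefs replace the PostgreSQL types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Stand-ins for the PostgreSQL typedefs, so the sketch compiles on its own. */
typedef unsigned int Oid;
typedef unsigned int TransactionId;

#define ERRMSG_LEN 256

/* Hypothetical per-worker statistics entry (not the patch's struct). */
typedef struct SubWorkerErrorStats
{
    Oid           relid;
    int           command;
    TransactionId xid;
    int           error_count;
    time_t        first_error_time;
    time_t        last_error_time;
    char          error_message[ERRMSG_LEN];
} SubWorkerErrorStats;

/* True only if every identifying attribute matches the stored error. */
static bool
same_error(const SubWorkerErrorStats *e, Oid relid, int command,
           TransactionId xid, const char *msg)
{
    return e->error_count > 0 &&
        e->relid == relid &&
        e->command == command &&
        e->xid == xid &&
        /* compare only as many bytes as the entry can store */
        strncmp(e->error_message, msg, ERRMSG_LEN - 1) == 0;
}

static void
record_error(SubWorkerErrorStats *e, Oid relid, int command,
             TransactionId xid, const char *msg, time_t now)
{
    assert(e != NULL && msg != NULL);

    if (same_error(e, relid, command, xid, msg))
    {
        /* Same error again: bump the counter, keep first_error_time. */
        e->error_count++;
        e->last_error_time = now;
        return;
    }

    /* A different error: the old counter and first_error_time are lost. */
    e->relid = relid;
    e->command = command;
    e->xid = xid;
    e->error_count = 1;
    e->first_error_time = now;
    e->last_error_time = now;
    snprintf(e->error_message, ERRMSG_LEN, "%s", msg);
}
```

Under this rule, the "could not create replication slot" error seen in step 5 replaces the stored duplicate-key error, so when the duplicate-key error comes back in step 6 it starts again from error_count 1.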

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#288vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#282)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Few comments:
1) should we set subwentry to NULL to handle !create && !found case
or we could return NULL similar to the earlier function.
+static PgStat_StatSubWorkerEntry *
+pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid, Oid subrelid,
+                                                  bool create)
+{
+       PgStat_StatSubWorkerEntry *subwentry;
+       PgStat_StatSubWorkerKey key;
+       bool            found;
+       HASHACTION      action = (create ? HASH_ENTER : HASH_FIND);
+
+       key.subid = subid;
+       key.subrelid = subrelid;
+       subwentry = (PgStat_StatSubWorkerEntry *) hash_search(dbentry->subworkers,
+                                           (void *) &key,
+                                           action, &found);
+
+       /* If not found, initialize the new one */
+       if (create && !found)
2) Should we keep the line width to 80 chars:
+/* ----------
+ * PgStat_MsgSubWorkerError            Sent by the apply worker or the table sync worker to
+ *                                                             report the error occurred during logical replication.
+ * ----------
+ */
+#define PGSTAT_SUBWORKERERROR_MSGLEN 256
+typedef struct PgStat_MsgSubWorkerError
+{

Regards,
Vignesh

#289Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#282)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Few comments/questions:
=====================
1.
+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription error reported by workers applying logical
+   replication changes and workers handling the initial data copy of the
+   subscribed tables.  The statistics entry is removed when the subscription
+   the worker is running on is removed.
+  </para>

The last line of this paragraph is not clear to me. First "the" before
"worker" in the following part of the sentence seems unnecessary
"..when the subscription the worker..". Then the part "running on is
removed" is unclear because it could also mean that we remove the
entry when a subscription is disabled. Can we rephrase it to: "The
statistics entry is removed when the corresponding subscription is
dropped"?

2.
Between the v20 and v23 versions of the patch, the size of the hash table,
PGSTAT_SUBWORKER_HASH_SIZE, was increased from 32 to 256. I might have
missed the comment that led to this change. Can you point me to it, or,
if you changed it for some other reason, can you let me know what that
reason was?

3.
+
+ /*
+ * Repeat for subscription workers.  Similarly, we needn't bother
+ * in the common case where no function stats are being collected.
+ */

s/function/subscription workers/

4.
+      <para>
+       Name of command being applied when the error occurred.  This field
+       is always NULL if the error was reported during the initial data
+       copy.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>xid</structfield> <type>xid</type>
+      </para>
+      <para>
+       Transaction ID of the publisher node being applied when the error
+       occurred.  This field is always NULL if the error was reported
+       during the initial data copy.
+      </para></entry>

Is it important to stress 'always' in the above two descriptions?

5.
The current description of first/last_error_time seems slightly
misleading, as one can interpret these as being about different errors.
Let's slightly change the description of first/last_error_time as
follows, or something along those lines:

</para>
+      <para>
+       Time at which the first error occurred
+      </para></entry>
+     </row>

First time at which this error occurred

<structfield>last_error_time</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the last error occurred

Last time at which this error occurred. This will be the same as
first_error_time except when the same error occurred more than once
consecutively.

6.
+        </indexterm>
+        <function>pg_stat_reset_subscription_worker</function> ( <parameter>subid</parameter> <type>oid</type>, <optional> <parameter>relid</parameter> <type>oid</type> </optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets the statistics of a single subscription worker running on the
+        subscription with <parameter>subid</parameter> shown in the
+        <structname>pg_stat_subscription_worker</structname> view.  If the
+        argument <parameter>relid</parameter> is not <literal>NULL</literal>,
+        resets statistics of the subscription worker handling the initial data
+        copy of the relation with <parameter>relid</parameter>.  Otherwise,
+        resets the subscription worker statistics of the main apply worker.
+        If the argument <parameter>relid</parameter> is omitted, resets the
+        statistics of all subscription workers running on the subscription
+        with <parameter>subid</parameter>.
+       </para>

The first line of this description seems to indicate that we can only
reset the stats of a single worker but the later part indicates that
we can reset stats of all subscription workers. Can we change the
first line as: "Resets the statistics of subscription workers running
on the subscription with <parameter>subid</parameter> shown in the
<structname>pg_stat_subscription_worker</structname> view.".

7.
pgstat_vacuum_stat()
{
..
+ pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
+ spmsg.m_databaseid = MyDatabaseId;
+ spmsg.m_nentries = 0;
..
}

Do we really need to set the header here? It seems to be getting set
in pgstat_send_subscription_purge() while sending this message.

8.
pgstat_vacuum_stat()
{
..
+
+ if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
+ != NULL)
+ continue;
+
+ /* This subscription is dead, add the subid to the message */
+ spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
..
}

I think it is better to use a separate variable here for subid as we
are using for funcid and tableid. That will make this part of the code
easier to follow and look consistent.

9.
+/* ----------
+ * PgStat_MsgSubWorkerError Sent by the apply worker or the table sync worker to
+ * report the error occurred during logical replication.
+ * ----------

In this comment "during logical replication" sounds too generic. Can
we instead use "while processing changes." or something like that to
make it a bit more specific?

--
With Regards,
Amit Kapila.

#290Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#288)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 4:16 PM vignesh C <vignesh21@gmail.com> wrote:

Few comments:
1) should we set subwentry to NULL to handle !create && !found case
or we could return NULL similar to the earlier function.

I think it is good to be consistent with the nearby code in this case.

--
With Regards,
Amit Kapila.

#291houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#282)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

I have a few comments on the test cases.

1)

+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION '$publisher_connstr application_name=$appname_streaming' PUBLICATION tap_pub_streaming WITH (streaming = on, two_phase = on);");
+

I think we can remove the 'application_name=$appname', so that the command
could be shorter.

2)
+...(streaming = on, two_phase = on);");
Besides, is there some reason to set two_phase to on? If so,
it might be better to add a comment about it.

3)
+CREATE PUBLICATION tap_pub_streaming FOR TABLE test_tab_streaming;
+]);
+

It seems there are no tests that use the table test_tab_streaming. I guess
this table is meant for testing errors in streamed changes; maybe we can add
some tests for it?

Best regards,
Hou zj

#292tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#282)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for the table sync worker [1]: the
statistics are retained after the table sync worker finishes its job, and the
user can remove them via the pg_stat_reset_subscription_worker function.

But I notice that, once a table sync worker has finished its job, the error
reported by this worker is not shown in the pg_stat_subscription_workers view.
(This seems to be caused by the condition "WHERE srsubstate <> 'r'".) Is it
intentional? I think users may then not know that the statistics still exist,
and so won't remove them manually. That is not friendly to users' storage, right?

[1]: /messages/by-id/CAD21AoAT42mhcqeB1jPfRL1+EUHbZk8MMY_fBgsyZvJeKNpG+w@mail.gmail.com

Regards
Tang

#293Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#292)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 18, 2021 at 5:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for table sync worker[1], the
statistics are retained after table sync worker finished its jobs and user can remove
them via pg_stat_reset_subscription_worker function.

But I notice that, if a table sync worker finished its jobs, the error reported by
this worker will not be shown in the pg_stat_subscription_workers view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it intentional? I think this may cause a result that users don't know the statistics are still exist, and won't remove the statistics manually. And that is not friendly to users' storage, right?

You're right. The condition "WHERE srsubstate <> 'r'" should be removed.
I'll make that change in the next version of the patch. Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#294Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#283)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 12:43 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

Thanks for updating the patch.
Here are few comments.

Thank you for the comments!

1)

+ <function>pg_stat_reset_subscription_worker</function> ( <parameter>subid</parameter> <type>oid</type>, <optional> <parameter>relid</parameter> <type>oid</type> </optional> )

It seems we should put '<optional>' before the comma(',').

Will fix.

2)
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subrelid</structfield> <type>oid</type>
+      </para>
+      <para>
+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker
+      </para></entry>
+     </row>

Is the 'subrelid' only used for distinguishing the worker type ? If so, would it
be clear to have a string value here. I recalled the previous version patch has
failure_source column but was removed. Maybe I missed something.

As Amit mentioned, users can use this to check which table sync worker it is.

3)
.
+extern void pgstat_reset_subworker_stats(Oid subid, Oid subrelid, bool allstats);

I didn't find the code for this function; maybe we can remove this declaration?

Will remove.

I'll submit an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#295Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#288)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 7:46 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 11:43 PM vignesh C <vignesh21@gmail.com> wrote:

On Mon, Nov 15, 2021 at 2:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 15, 2021 at 4:49 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Nov 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Few comments:

Thank you for the comments!

1) should we set subwentry to NULL to handle !create && !found case
or we could return NULL similar to the earlier function.
+static PgStat_StatSubWorkerEntry *
+pgstat_get_subworker_entry(PgStat_StatDBEntry *dbentry, Oid subid, Oid subrelid,
+                                                  bool create)
+{
+       PgStat_StatSubWorkerEntry *subwentry;
+       PgStat_StatSubWorkerKey key;
+       bool            found;
+       HASHACTION      action = (create ? HASH_ENTER : HASH_FIND);
+
+       key.subid = subid;
+       key.subrelid = subrelid;
+       subwentry = (PgStat_StatSubWorkerEntry *) hash_search(dbentry->subworkers,
+                                           (void *) &key,
+                                           action, &found);
+
+       /* If not found, initialize the new one */
+       if (create && !found)

It's better to return NULL if !create && !found. Will fix.
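
The convention agreed on here (return NULL when the entry is missing and the caller did not ask for creation) can be illustrated with a small standalone C sketch. It is an assumption-laden toy, not the patch: a fixed-size array replaces the dynahash table that hash_search operates on, and every name in it (SubWorkerTable, get_subworker_entry, MAX_ENTRIES) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for the PostgreSQL typedef */

#define MAX_ENTRIES 16          /* hypothetical capacity of the toy table */

typedef struct SubWorkerKey
{
    Oid subid;
    Oid subrelid;
} SubWorkerKey;

typedef struct SubWorkerEntry
{
    SubWorkerKey key;
    bool         in_use;
    int          error_count;
} SubWorkerEntry;

typedef struct SubWorkerTable
{
    SubWorkerEntry entries[MAX_ENTRIES];
} SubWorkerTable;

/*
 * Look up the entry for (subid, subrelid).  If it does not exist, create
 * and zero-initialize it when "create" is true; otherwise return NULL so
 * the caller never sees an uninitialized or stale pointer.
 */
static SubWorkerEntry *
get_subworker_entry(SubWorkerTable *tab, Oid subid, Oid subrelid, bool create)
{
    SubWorkerEntry *free_slot = NULL;

    assert(tab != NULL);

    for (size_t i = 0; i < MAX_ENTRIES; i++)
    {
        SubWorkerEntry *e = &tab->entries[i];

        if (!e->in_use)
        {
            /* remember the first free slot in case we must create */
            if (free_slot == NULL)
                free_slot = e;
            continue;
        }
        if (e->key.subid == subid && e->key.subrelid == subrelid)
            return e;           /* found */
    }

    /* Not found: NULL unless asked to create (or if the toy table is full). */
    if (!create || free_slot == NULL)
        return NULL;

    /* If not found, initialize the new one. */
    memset(free_slot, 0, sizeof(*free_slot));
    free_slot->key.subid = subid;
    free_slot->key.subrelid = subrelid;
    free_slot->in_use = true;
    return free_slot;
}
```

Callers that only read statistics would pass create = false and check for NULL, while the code path that records an error would pass create = true.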

2) Should we keep the line width to 80 chars:
+/* ----------
+ * PgStat_MsgSubWorkerError            Sent by the apply worker or the table sync worker to
+ *                                                             report the error occurred during logical replication.
+ * ----------
+ */
+#define PGSTAT_SUBWORKERERROR_MSGLEN 256
+typedef struct PgStat_MsgSubWorkerError
+{

Hmm, pg_indent seems not to fix it. Anyway, will fix.

I'll submit an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#296Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#282)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 16, 2021 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

A couple of comments for the v23 patch:

doc/src/sgml/monitoring.sgml
(1) inconsistent description
I think that the following description seems inconsistent with the
previous description given above it in the patch (i.e. "One row per
subscription worker, showing statistics about errors that occurred on
that subscription worker"):

"The <structname>pg_stat_subscription_workers</structname> view will
contain one row per subscription error reported by workers applying
logical replication changes and workers handling the initial data copy
of the subscribed tables."

I think it is inconsistent because it implies there could be multiple
subscription error rows for the same worker.
Maybe the following wording could be used instead, or something similar:

"The <structname>pg_stat_subscription_workers</structname> view will
contain one row per subscription worker on which errors have occurred,
for workers applying logical replication changes and workers handling
the initial data copy of the subscribed tables."

(2) null vs NULL
The "subrelid" column description uses "null" but the "command" column
description uses "NULL".
I think "NULL" should be used for consistency.

Regards,
Greg Nancarrow
Fujitsu Australia

#297Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#293)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for table sync worker[1], the
statistics are retained after table sync worker finished its jobs and user can remove
them via pg_stat_reset_subscription_worker function.

But I notice that, if a table sync worker finished its jobs, the error reported by
this worker will not be shown in the pg_stat_subscription_workers view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it intentional? I think this may cause a result that users don't know the statistics are still exist, and won't remove the statistics manually. And that is not friendly to users' storage, right?

You're right. The condition "WHERE substate <> 'r'" should be removed.
I'll do that change in the next version patch. Thanks!

One more thing you might want to consider for the next version is
whether to rename the columns as discussed in the related thread [1]?
I think we should consider future work and name them accordingly.

[1]: /messages/by-id/CAA4eK1KR41bRUuPeNBSGv2+q7ROKukS3myeAUqrZMD8MEwR0DQ@mail.gmail.com

--
With Regards,
Amit Kapila.

#298vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#297)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for table sync worker[1], the
statistics are retained after table sync worker finished its jobs and user can remove
them via pg_stat_reset_subscription_worker function.

But I notice that, if a table sync worker finished its jobs, the error reported by
this worker will not be shown in the pg_stat_subscription_workers view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it intentional? I think this may cause a result that users don't know the statistics are still exist, and won't remove the statistics manually. And that is not friendly to users' storage, right?

You're right. The condition "WHERE substate <> 'r'" should be removed.
I'll do that change in the next version patch. Thanks!

One more thing you might want to consider for the next version is
whether to rename the columns as discussed in the related thread [1]?
I think we should consider future work and name them accordingly.

[1] - /messages/by-id/CAA4eK1KR41bRUuPeNBSGv2+q7ROKukS3myeAUqrZMD8MEwR0DQ@mail.gmail.com

Since the statistics collector process uses a UDP socket, the ordering
of messages is not guaranteed. Will there be a problem if a
subscription is dropped, the stats collector receives
PGSTAT_MTYPE_SUBSCRIPTIONPURGE first and removes the subscription worker
entry, and then receives PGSTAT_MTYPE_SUBWORKERERROR? (This order
can happen because of the UDP socket.) I'm not sure if the Assert will be
a problem in this case. If this scenario is possible, we could just
silently return in that case.

+static void
+pgstat_recv_subworker_error(PgStat_MsgSubWorkerError *msg, int len)
+{
+       PgStat_StatDBEntry *dbentry;
+       PgStat_StatSubWorkerEntry *subwentry;
+
+       dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+       /* Get the subscription worker stats */
+       subwentry = pgstat_get_subworker_entry(dbentry, msg->m_subid,
+                                              msg->m_subrelid, true);
+       Assert(subwentry);
+
+       /*
+        * Update only the counter and last error timestamp if we received
+        * the same error again
+        */

Thoughts?

Regards,
Vignesh

#299Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: vignesh C (#298)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 4:39 PM vignesh C <vignesh21@gmail.com> wrote:

Since the statistics collector process uses UDP socket, the sequencing
of the messages is not guaranteed. Will there be a problem if
Subscription is dropped and stats collector receives
PGSTAT_MTYPE_SUBSCRIPTIONPURGE first and the subscription worker entry
is removed and then receives PGSTAT_MTYPE_SUBWORKERERROR(this order
can happen because of UDP socket). I'm not sure if the Assert will be
a problem in this case. If this scenario is possible we could just
silently return in that case.

Given that the message sequencing is not guaranteed, it looks like
that Assert and the current code after it won't handle that scenario
well. Silently returning if subwentry is NULL does seem like the way
to deal with that possibility.
Doesn't this possibility of out-of-sequence messaging due to UDP
similarly mean that "first_error_time" and "last_error_time" may not
be currently handled correctly?

Regards,
Greg Nancarrow
Fujitsu Australia

#300Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#298)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 11:09 AM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for table sync worker[1], the
statistics are retained after table sync worker finished its jobs and user can remove
them via pg_stat_reset_subscription_worker function.

But I notice that, if a table sync worker finished its jobs, the error reported by
this worker will not be shown in the pg_stat_subscription_workers view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it intentional? I think this may cause a result that users don't know the statistics are still exist, and won't remove the statistics manually. And that is not friendly to users' storage, right?

You're right. The condition "WHERE substate <> 'r'" should be removed.
I'll do that change in the next version patch. Thanks!

One more thing you might want to consider for the next version is
whether to rename the columns as discussed in the related thread [1]?
I think we should consider future work and name them accordingly.

[1] - /messages/by-id/CAA4eK1KR41bRUuPeNBSGv2+q7ROKukS3myeAUqrZMD8MEwR0DQ@mail.gmail.com

Since the statistics collector process uses UDP socket, the sequencing
of the messages is not guaranteed. Will there be a problem if
Subscription is dropped and stats collector receives
PGSTAT_MTYPE_SUBSCRIPTIONPURGE first and the subscription worker entry
is removed and then receives PGSTAT_MTYPE_SUBWORKERERROR(this order
can happen because of UDP socket). I'm not sure if the Assert will be
a problem in this case.

Why would that Assert be hit? We seem to always pass 'create' as
true, so it should create a new entry. I think a similar situation can
happen for functions, and it will probably be cleaned up in the next
vacuum cycle.
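
To illustrate the point about 'create' being true, here is a minimal standalone sketch of a get-or-create lookup. It is not the patch's code: the toy_* names, the fixed-size table, and MAX_ENTRIES are all assumptions made up for this example. It only shows why an error report arriving after a purge re-creates the entry instead of yielding NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8			/* toy capacity, not a PostgreSQL constant */

/* Hypothetical stand-in for a per-worker stats entry. */
typedef struct ToyEntry
{
	bool		used;
	unsigned int subid;
	unsigned int subrelid;
	long		error_count;
} ToyEntry;

static ToyEntry toy_table[MAX_ENTRIES];

/* Find the entry for (subid, subrelid); create it if missing and 'create' is set. */
static ToyEntry *
toy_get_entry(unsigned int subid, unsigned int subrelid, bool create)
{
	ToyEntry   *free_slot = NULL;

	for (size_t i = 0; i < MAX_ENTRIES; i++)
	{
		ToyEntry   *e = &toy_table[i];

		if (e->used && e->subid == subid && e->subrelid == subrelid)
			return e;
		if (!e->used && free_slot == NULL)
			free_slot = e;
	}

	if (!create || free_slot == NULL)
		return NULL;

	free_slot->used = true;
	free_slot->subid = subid;
	free_slot->subrelid = subrelid;
	free_slot->error_count = 0;
	return free_slot;
}

/* Drop every entry of one subscription, as a purge message would. */
static void
toy_purge_subscription(unsigned int subid)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++)
	{
		if (toy_table[i].used && toy_table[i].subid == subid)
			toy_table[i].used = false;
	}
}

/* Handle an error report; returns the entry's error count after the update. */
static long
toy_recv_error(unsigned int subid, unsigned int subrelid)
{
	ToyEntry   *e = toy_get_entry(subid, subrelid, true);

	assert(e != NULL);			/* holds unless the toy table is full */
	return ++e->error_count;
}
```

In this toy model, a late error report leaves a stale entry behind, which corresponds to the leftover that a later cleanup pass would have to remove.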

--
With Regards,
Amit Kapila.

#301vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#300)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 19, 2021 at 11:09 AM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Nov 19, 2021 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 5:45 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Thanks for your patch.

I read the discussion about stats entries for table sync worker[1], the
statistics are retained after table sync worker finished its jobs and user can remove
them via pg_stat_reset_subscription_worker function.

But I notice that, if a table sync worker finished its jobs, the error reported by
this worker will not be shown in the pg_stat_subscription_workers view. (It seemed caused by this condition: "WHERE srsubstate <> 'r'") Is it intentional? I think this may cause a result that users don't know the statistics are still exist, and won't remove the statistics manually. And that is not friendly to users' storage, right?

You're right. The condition "WHERE substate <> 'r'" should be removed.
I'll do that change in the next version patch. Thanks!

One more thing you might want to consider for the next version is
whether to rename the columns as discussed in the related thread [1]?
I think we should consider future work and name them accordingly.

[1] - /messages/by-id/CAA4eK1KR41bRUuPeNBSGv2+q7ROKukS3myeAUqrZMD8MEwR0DQ@mail.gmail.com

Since the statistics collector process uses UDP socket, the sequencing
of the messages is not guaranteed. Will there be a problem if
Subscription is dropped and stats collector receives
PGSTAT_MTYPE_SUBSCRIPTIONPURGE first and the subscription worker entry
is removed and then receives PGSTAT_MTYPE_SUBWORKERERROR(this order
can happen because of UDP socket). I'm not sure if the Assert will be
a problem in this case.

Why that Assert will hit? We seem to be always passing 'create' as
true so it should create a new entry. I think a similar situation can
happen for functions and it will be probably cleaned in the next
vacuum cycle.

Since we are passing true, that Assert will not be hit; sorry, I failed to
notice that. It will create a new entry, as you rightly pointed out.
Since the cleanup is handled by vacuum and the current code already works
that way, I see no need to make any change.

Regards,
Vignesh

#302Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#300)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Why that Assert will hit? We seem to be always passing 'create' as
true so it should create a new entry. I think a similar situation can
happen for functions and it will be probably cleaned in the next
vacuum cycle.

Oops, I missed that too. So at worst, vacuum will clean it up in the
out-of-order SUBSCRIPTIONPURGE,SUBWORKERERROR case.

But I still think the current code may not correctly handle
first_error_time/last_error_time timestamps if out-of-order
SUBWORKERERROR messages occur, right?

Regards,
Greg Nancarrow
Fujitsu Australia

#303Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#302)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 1:22 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Fri, Nov 19, 2021 at 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Why that Assert will hit? We seem to be always passing 'create' as
true so it should create a new entry. I think a similar situation can
happen for functions and it will be probably cleaned in the next
vacuum cycle.

Oops, I missed that too. So at worst, vacuum will clean it up in the
out-of-order SUBSCRIPTIONPURGE,SUBWORKERERROR case.

But I still think the current code may not correctly handle
first_error_time/last_error_time timestamps if out-of-order
SUBWORKERERROR messages occur, right?

Yeah in such a case last_error_time can be shown as a time before
first_error_time but I don't think that will be a big problem, the
next message will fix it. I don't see what we can do about it and the
same is true for other cases like pg_stat_archiver where the success
and failure times can be out of order. If we want we can remove one of
those times but I don't think this happens frequently enough to be
considered a problem. Anyway, these stats are not expected to always
reflect the latest info.

--
With Regards,
Amit Kapila.

#304Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Amit Kapila (#303)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 8:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah in such a case last_error_time can be shown as a time before
first_error_time but I don't think that will be a big problem, the
next message will fix it. I don't see what we can do about it and the
same is true for other cases like pg_stat_archiver where the success
and failure times can be out of order. If we want we can remove one of
those times but I don't think this happens frequently enough to be
considered a problem. Anyway, these stats are not considered to be
updated with the most latest info.

Couldn't the code block in pgstat_recv_subworker_error() that
increments error_count just compare the new msg timestamp against the
existing first_error_time and last_error_time and, based on the
result, update those if required?
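
As a rough sketch of what I mean (this is not code from the patch; ToyErrorStats, ToyTimestamp and toy_record_repeat_error are names invented for this example, and the integer timestamp is only a stand-in for TimestampTz), the update for a repeated error could be made order-independent like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t ToyTimestamp;	/* stand-in for a timestamp type */

/* Hypothetical per-worker error statistics. */
typedef struct ToyErrorStats
{
	long		error_count;
	ToyTimestamp first_error_time;
	ToyTimestamp last_error_time;
} ToyErrorStats;

/*
 * Record one more occurrence of the same error.  The timestamps are only
 * widened, so the result does not depend on message arrival order.
 */
static void
toy_record_repeat_error(ToyErrorStats *stats, ToyTimestamp msg_time)
{
	assert(stats != NULL);

	if (stats->error_count == 0)
	{
		/* First report seen: both bounds start at this message's time. */
		stats->first_error_time = msg_time;
		stats->last_error_time = msg_time;
	}
	else
	{
		if (msg_time < stats->first_error_time)
			stats->first_error_time = msg_time;
		if (msg_time > stats->last_error_time)
			stats->last_error_time = msg_time;
	}

	stats->error_count++;
}
```

With this shape, a report that arrives late can move first_error_time backwards but never moves last_error_time backwards.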

Regards,
Greg Nancarrow
Fujitsu Australia

#305Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Greg Nancarrow (#304)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 3:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Fri, Nov 19, 2021 at 8:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah in such a case last_error_time can be shown as a time before
first_error_time but I don't think that will be a big problem, the
next message will fix it. I don't see what we can do about it and the
same is true for other cases like pg_stat_archiver where the success
and failure times can be out of order. If we want we can remove one of
those times but I don't think this happens frequently enough to be
considered a problem. Anyway, these stats are not considered to be
updated with the most latest info.

Couldn't the code block in pgstat_recv_subworker_error() that
increments error_count just compare the new msg timestamp against the
existing first_error_time and last_error_time and, based on the
result, update those if required?

I don't see any problem with that but let's see what Sawada-San has to
say about this?

--
With Regards,
Amit Kapila.

#306Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#305)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 19, 2021 at 7:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 19, 2021 at 3:00 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Fri, Nov 19, 2021 at 8:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah in such a case last_error_time can be shown as a time before
first_error_time but I don't think that will be a big problem, the
next message will fix it. I don't see what we can do about it and the
same is true for other cases like pg_stat_archiver where the success
and failure times can be out of order. If we want we can remove one of
those times but I don't think this happens frequently enough to be
considered a problem. Anyway, these stats are not considered to be
updated with the most latest info.

Couldn't the code block in pgstat_recv_subworker_error() that
increments error_count just compare the new msg timestamp against the
existing first_error_time and last_error_time and, based on the
result, update those if required?

I don't see any problem with that but let's see what Sawada-San has to
say about this?

IMO, I'm not sure we should do that. Since the stats collector is
unlikely to receive the same error report frequently in practice (5 sec
interval by default), this problem is unlikely to happen.
Even if the same messages are reported frequently enough to cause this
problem, the next message will also be reported soon, fixing it soon,
as Amit mentioned. Also, IIUC once we have the shared memory based
stats collector, we won’t need to worry about this problem. Given that
this kind of problem potentially exists also in other stats views that
have timestamp values, I’m not sure it's worth dealing with this
problem only in pg_stat_subscription_workers view.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#307Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#291)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 18, 2021 at 12:52 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

I have few comments for the testcases.

1)

+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION '$publisher_connstr application_name=$appname_streaming' PUBLICATION tap_pub_streaming WITH (streaming = on, two_phase = on);");
+

I think we can remove the 'application_name=$appname', so that the command
could be shorter.

But we wait for the subscription to catch up by using
wait_for_catchup() with application_name, no?

2)
+...(streaming = on, two_phase = on);");
Besides, is there some reason to set two_phase to on? If so,
it might be better to add some comments about it.

Yes, two_phase = on is required by the tests for the skip transaction
patch. Will remove it.

3)
+CREATE PUBLICATION tap_pub_streaming FOR TABLE test_tab_streaming;
+]);
+

It seems there's no tests to use the table test_tab_streaming. I guess this
table is used to test streaming change error, maybe we can add some tests for
it ?

Oops, similarly this is also required by the skip transaction tests.
Will remove it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#308Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#307)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 12:52 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

I have few comments for the testcases.

1)

+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION '$publisher_connstr application_name=$appname_streaming' PUBLICATION tap_pub_streaming WITH (streaming = on, two_phase = on);");
+

I think we can remove the 'application_name=$appname', so that the command
could be shorter.

But we wait for the subscription to catch up by using
wait_for_catchup() with application_name, no?

Yeah, but you can directly use the subscription name in
wait_for_catchup because we internally use that as
fallback_application_name. If application_name is not specified in the
connection string as suggested by Hou-San then
fallback_application_name will be considered. Both ways are okay and I
see we use both ways in the tests but it seems there are more places
where we use the method Hou-San is suggesting in subscription tests.
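
A toy model of the name resolution described above, for illustration only: toy_effective_app_name is a hypothetical helper, not a libpq or PostgreSQL function, and it only models the rule that an explicit application_name wins and the fallback is used otherwise.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the application name a connection would report in this toy model:
 * the explicit application_name if one was given, else the fallback.
 * Passing NULL for application_name means "not specified".
 */
static const char *
toy_effective_app_name(const char *application_name,
					   const char *fallback_application_name)
{
	assert(fallback_application_name != NULL);

	if (application_name != NULL)
		return application_name;
	return fallback_application_name;
}
```

Under that assumption, a test that omits application_name from the connection string can wait on the subscription name, because the fallback is what ends up being reported.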

--
With Regards,
Amit Kapila.

#309Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#308)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 12:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 24, 2021 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 12:52 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

I have few comments for the testcases.

1)

+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION '$publisher_connstr application_name=$appname_streaming' PUBLICATION tap_pub_streaming WITH (streaming = on, two_phase = on);");
+

I think we can remove the 'application_name=$appname', so that the command
could be shorter.

But we wait for the subscription to catch up by using
wait_for_catchup() with application_name, no?

Yeah, but you can directly use the subscription name in
wait_for_catchup because we internally use that as
fallback_application_name. If application_name is not specified in the
connection string as suggested by Hou-San then
fallback_application_name will be considered. Both ways are okay and I
see we use both ways in the tests but it seems there are more places
where we use the method Hou-San is suggesting in subscription tests.

Okay, thanks! I referred to tests that set application_name. ISTM it's
better to unify them to avoid confusion in future tests.

Anyway, I'll remove it in the next version patch that I'll submit soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#310Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#309)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 1:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 24, 2021 at 12:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 24, 2021 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 18, 2021 at 12:52 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, November 16, 2021 2:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Hi,

I have few comments for the testcases.

1)

+my $appname = 'tap_sub';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off, two_phase = on);");
+my $appname_streaming = 'tap_sub_streaming';
+$node_subscriber->safe_psql(
+    'postgres',
+    "CREATE SUBSCRIPTION tap_sub_streaming CONNECTION '$publisher_connstr application_name=$appname_streaming' PUBLICATION tap_pub_streaming WITH (streaming = on, two_phase = on);");
+

I think we can remove the 'application_name=$appname', so that the command
could be shorter.

But we wait for the subscription to catch up by using
wait_for_catchup() with application_name, no?

Yeah, but you can directly use the subscription name in
wait_for_catchup because we internally use that as
fallback_application_name. If application_name is not specified in the
connection string as suggested by Hou-San then
fallback_application_name will be considered. Both ways are okay and I
see we use both ways in the tests but it seems there are more places
where we use the method Hou-San is suggesting in subscription tests.

Okay, thanks! I referred to tests that set application_name. ISTM it's
better to unite them so as not to confuse them in future tests.

Agreed, but let's do this clean-up as a separate patch. Feel free to
submit the patch for the same in a separate thread.

--
With Regards,
Amit Kapila.

#311Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#289)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

Few comments/questions:
=====================
1.
+  <para>
+   The <structname>pg_stat_subscription_workers</structname> view will contain
+   one row per subscription error reported by workers applying logical
+   replication changes and workers handling the initial data copy of the
+   subscribed tables.  The statistics entry is removed when the subscription
+   the worker is running on is removed.
+  </para>

The last line of this paragraph is not clear to me. First "the" before
"worker" in the following part of the sentence seems unnecessary
"..when the subscription the worker..". Then the part "running on is
removed" is unclear because it could also mean that we remove the
entry when a subscription is disabled. Can we rephrase it to: "The
statistics entry is removed when the corresponding subscription is
dropped"?

Agreed. Fixed.

2.
Between v20 and v23 versions of patch the size of hash table
PGSTAT_SUBWORKER_HASH_SIZE is increased from 32 to 256. I might have
missed the comment which lead to this change, can you point me to the
same or if you changed it for some other reason, can you let me know
the same?

I'd missed reverting this change. I considered increasing this value
since the lifetime of a subscription is long. But given that an
unshared hash table can be expanded on the fly, it's better to start
with a small value. Reverted.

3.
+
+ /*
+ * Repeat for subscription workers.  Similarly, we needn't bother
+ * in the common case where no function stats are being collected.
+ */

/function/subscription workers'

Fixed.

4.
+      <para>
+       Name of command being applied when the error occurred.  This field
+       is always NULL if the error was reported during the initial data
+       copy.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>xid</structfield> <type>xid</type>
+      </para>
+      <para>
+       Transaction ID of the publisher node being applied when the error
+       occurred.  This field is always NULL if the error was reported
+       during the initial data copy.
+      </para></entry>

Is it important to stress on 'always' in the above two descriptions?

No, removed.

5.
The current description of first/last_error_time seems slightly
misleading as one can interpret that these are about different errors.
Let's slightly change the description of first/last_error_time as
follows or something on those lines:

</para>
+      <para>
+       Time at which the first error occurred
+      </para></entry>
+     </row>

First time at which this error occurred

<structfield>last_error_time</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which the last error occurred

Last time at which this error occurred. This will be the same as
first_error_time except when the same error occurred more than once
consecutively.

Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.

6.
+        </indexterm>
+        <function>pg_stat_reset_subscription_worker</function> ( <parameter>subid</parameter> <type>oid</type>, <optional> <parameter>relid</parameter> <type>oid</type> </optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets the statistics of a single subscription worker running on the
+        subscription with <parameter>subid</parameter> shown in the
+        <structname>pg_stat_subscription_worker</structname> view.  If the
+        argument <parameter>relid</parameter> is not <literal>NULL</literal>,
+        resets statistics of the subscription worker handling the initial data
+        copy of the relation with <parameter>relid</parameter>.  Otherwise,
+        resets the subscription worker statistics of the main apply worker.
+        If the argument <parameter>relid</parameter> is omitted, resets the
+        statistics of all subscription workers running on the subscription
+        with <parameter>subid</parameter>.
+       </para>

The first line of this description seems to indicate that we can only
reset the stats of a single worker but the later part indicates that
we can reset stats of all subscription workers. Can we change the
first line as: "Resets the statistics of subscription workers running
on the subscription with <parameter>subid</parameter> shown in the
<structname>pg_stat_subscription_worker</structname> view.".

Changed.
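
For reference, the three cases in the quoted description can be summarized with a small sketch. This is only my reading of the documentation text above, not the function's implementation; ToyResetTarget and toy_reset_target are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Which worker statistics a reset call would target, per the quoted doc text. */
typedef enum ToyResetTarget
{
	TOY_RESET_ALL_WORKERS,		/* relid argument omitted */
	TOY_RESET_APPLY_WORKER,		/* relid given as NULL */
	TOY_RESET_TABLESYNC_WORKER	/* relid given and not NULL */
} ToyResetTarget;

/*
 * relid_given:   whether the optional relid argument was passed at all.
 * relid_is_null: whether the passed relid is NULL (ignored if not given).
 */
static ToyResetTarget
toy_reset_target(bool relid_given, bool relid_is_null)
{
	if (!relid_given)
		return TOY_RESET_ALL_WORKERS;
	if (relid_is_null)
		return TOY_RESET_APPLY_WORKER;
	return TOY_RESET_TABLESYNC_WORKER;
}
```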

7.
pgstat_vacuum_stat()
{
..
+ pgstat_setheader(&spmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONPURGE);
+ spmsg.m_databaseid = MyDatabaseId;
+ spmsg.m_nentries = 0;
..
}

Do we really need to set the header here? It seems to be getting set
in pgstat_send_subscription_purge() while sending this message.

Removed.

8.
pgstat_vacuum_stat()
{
..
+
+ if (hash_search(htab, (void *) &(subwentry->key.subid), HASH_FIND, NULL)
+ != NULL)
+ continue;
+
+ /* This subscription is dead, add the subid to the message */
+ spmsg.m_subids[spmsg.m_nentries++] = subwentry->key.subid;
..
}

I think it is better to use a separate variable here for subid as we
are using for funcid and tableid. That will make this part of the code
easier to follow and look consistent.

Agreed, and changed.
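
The shape of that loop, with a separate subid variable, might look roughly like the standalone sketch below. It is not the patch's code: the toy_* helpers and the plain arrays standing in for the hash tables and the purge message are assumptions made for this example, and skipping already-queued subids is my own addition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Linear membership test; a real implementation would use a hash lookup. */
static bool
toy_contains(const unsigned int *ids, size_t n, unsigned int id)
{
	for (size_t i = 0; i < n; i++)
	{
		if (ids[i] == id)
			return true;
	}
	return false;
}

/*
 * Walk the subscription IDs of all stats entries and collect those whose
 * subscription no longer exists.  Returns the number of IDs stored in
 * 'dead', never more than dead_cap.
 */
static size_t
toy_collect_dead_subids(const unsigned int *entry_subids, size_t nentries,
						const unsigned int *live_subids, size_t nlive,
						unsigned int *dead, size_t dead_cap)
{
	size_t		ndead = 0;

	assert(dead != NULL || dead_cap == 0);

	for (size_t i = 0; i < nentries; i++)
	{
		unsigned int subid = entry_subids[i];	/* separate variable */

		if (toy_contains(live_subids, nlive, subid))
			continue;			/* subscription still exists */
		if (toy_contains(dead, ndead, subid))
			continue;			/* already queued for purge */
		if (ndead == dead_cap)
			break;				/* toy message is full */

		/* This subscription is dead, add the subid to the list */
		dead[ndead++] = subid;
	}

	return ndead;
}
```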

9.
+/* ----------
+ * PgStat_MsgSubWorkerError Sent by the apply worker or the table sync worker to
+ * report the error occurred during logical replication.
+ * ----------

In this comment "during logical replication" sounds too generic. Can
we instead use "while processing changes." or something like that to
make it a bit more specific?

"while processing changes" sounds good.

I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v24-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)
#312Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#311)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.

We also agreed to change the column names to start with last_error_*
[1]. Is there a reason to not make those changes? Do you think that we
can change it just before committing that patch? I thought it might be
better to do it that way now itself.

[1]: /messages/by-id/CAD21AoCQ8z5goy3BCqfk2gn5p8NVH5B-uxO3Xc-dXN-MXVfnKg@mail.gmail.com

--
With Regards,
Amit Kapila.

#313vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#311)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

One very minor comment:
The word "conflict" can be moved to the next line to keep the comment
within the 80-character boundary wherever possible:
+# Initial table setup on both publisher and subscriber. On subscriber we create
+# the same tables but with primary keys. Also, insert some data that will conflict
+# with the data replicated from publisher later.
+$node_publisher->safe_psql(
Similarly in the below:
+# Insert more data to test_tab1, raising an error on the subscriber due to violation
+# of the unique constraint on test_tab1.
+my $xid = $node_publisher->safe_psql(

The rest of the patch looks good.

Regards,
Vignesh

#314Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#311)
Re: Skipping logical replication transactions on subscriber side

On Wed, Nov 24, 2021 at 10:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.

Only a couple of minor points:

src/backend/postmaster/pgstat.c
(1) pgstat_get_subworker_entry

In the following comment, it should say "returns an entry ...":

+ * apply worker otherwise returns entry of the table sync worker associated

src/include/pgstat.h
(2) typedef struct PgStat_StatDBEntry

"subworker" should be "subworkers" in the following comment, to match
the struct member name:

* subworker is the hash table of PgStat_StatSubWorkerEntry which stores

Otherwise, the patch LGTM.

Regards,
Greg Nancarrow
Fujitsu Australia

#315Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#312)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 25, 2021 at 1:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Changed. I've removed first_error_time as per discussion on the thread
for adding xact stats.

We also agreed to change the column names to start with last_error_*
[1]. Is there a reason to not make those changes? Do you think that we
can change it just before committing that patch? I thought it might be
better to do it that way now itself.

Oh, I thought you meant that we would change the column names when
adding xact stats to the view. But these names also make sense even
without the xact stats. I've attached an updated patch. It also
incorporates the comments from Vignesh and Greg.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v25-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)
#316Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#313)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 25, 2021 at 7:36 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 17, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Nov 16, 2021 at 12:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Right. I've fixed this issue and attached an updated patch.

One very minor comment:
conflict can be moved to next line to keep it within 80 chars boundary
wherever possible
+# Initial table setup on both publisher and subscriber. On subscriber we create
+# the same tables but with primary keys. Also, insert some data that will conflict
+# with the data replicated from publisher later.
+$node_publisher->safe_psql(
Similarly in the below:
+# Insert more data to test_tab1, raising an error on the subscriber due to violation
+# of the unique constraint on test_tab1.
+my $xid = $node_publisher->safe_psql(

The rest of the patch looks good.

Thank you for the comments! These are incorporated into v25 patch I
just submitted.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#317Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#314)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 25, 2021 at 9:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Wed, Nov 24, 2021 at 10:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. Unless I miss something, all
comments I got so far have been incorporated into this patch. Please
review it.

Only a couple of minor points:

src/backend/postmaster/pgstat.c
(1) pgstat_get_subworker_entry

In the following comment, it should say "returns an entry ...":

+ * apply worker otherwise returns entry of the table sync worker associated

src/include/pgstat.h
(2) typedef struct PgStat_StatDBEntry

"subworker" should be "subworkers" in the following comment, to match
the struct member name:

* subworker is the hash table of PgStat_StatSubWorkerEntry which stores

Otherwise, the patch LGTM.

Thank you for the comments! These are incorporated into v25 patch I
just submitted.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#318houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#315)
RE: Skipping logical replication transactions on subscriber side

On Thur, Nov 25, 2021 8:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 25, 2021 at 1:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Changed. I've removed first_error_time as per discussion on the
thread for adding xact stats.

We also agreed to change the column names to start with last_error_*
[1]. Is there a reason to not make those changes? Do you think that we
can change it just before committing that patch? I thought it might be
better to do it that way now itself.

Oh, I thought that you think that we change the column names when adding xact
stats to the view. But these names also make sense even without the xact stats.
I've attached an updated patch. It also incorporated comments from Vignesh
and Greg.

Hi,

I only noticed some minor things in the testcases

1)
+$node_publisher->append_conf('postgresql.conf',
+			     qq[
+logical_decoding_work_mem = 64kB
+]);

It seems we don't need to set logical_decoding_work_mem since we don't test streaming?

2)
+$node_publisher->safe_psql('postgres',
+			   q[
+CREATE PUBLICATION tap_pub FOR TABLE test_tab1, test_tab2;
+]);

There are a few places where only one command exists in the 'q[' or 'qq[' like the above code.
To be consistent, I think it might be better to remove the wrap here, maybe we can write like:
$node_publisher->safe_psql('postgres',
' CREATE PUBLICATION tap_pub FOR TABLE test_tab1, test_tab2;');

The others LGTM.

Best regards,
Hou zj

#319Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: houzj.fnst@fujitsu.com (#318)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Nov 25, 2021 at 10:06 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Thur, Nov 25, 2021 8:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 25, 2021 at 1:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Nov 24, 2021 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Changed. I've removed first_error_time as per discussion on the
thread for adding xact stats.

We also agreed to change the column names to start with last_error_*
[1]. Is there a reason to not make those changes? Do you think that we
can change it just before committing that patch? I thought it might be
better to do it that way now itself.

Oh, I thought that you think that we change the column names when adding xact
stats to the view. But these names also make sense even without the xact stats.
I've attached an updated patch. It also incorporated comments from Vignesh
and Greg.

Hi,

I only noticed some minor things in the testcases

1)
+$node_publisher->append_conf('postgresql.conf',
+                            qq[
+logical_decoding_work_mem = 64kB
+]);

It seems we don't need to set logical_decoding_work_mem since we don't test streaming?

2)
+$node_publisher->safe_psql('postgres',
+                          q[
+CREATE PUBLICATION tap_pub FOR TABLE test_tab1, test_tab2;
+]);

There are a few places where only one command exists in the 'q[' or 'qq[' like the above code.
To be consistent, I think it might be better to remove the wrap here, maybe we can write like:
$node_publisher->safe_psql('postgres',
' CREATE PUBLICATION tap_pub FOR TABLE test_tab1, test_tab2;');

Indeed. Attached an updated patch. Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v26-0001-Add-a-subscription-worker-statistics-view-pg_sta.patch (application/octet-stream)
#320tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#319)
RE: Skipping logical replication transactions on subscriber side

On Friday, November 26, 2021 9:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Indeed. Attached an updated patch. Thanks!

Thanks for your patch. A small comment:

+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker

Should we modify it to "OID of the relation that the worker was synchronizing ..."?

The rest of the patch LGTM.

Regards
Tang

#321Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#319)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 26, 2021 at 6:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Indeed. Attached an updated patch. Thanks!

I have made a number of changes in the attached patch:

(a) The patch was trying to register multiple array entries for the
same subscription, which doesn't seem to be required; see the changes
in pgstat_vacuum_stat.
(b) Multiple changes in the test: reduced wal_retrieve_retry_interval
to 2s, which has cut the test time in half; removed the check related
to resetting of stats, as there is no guarantee that the message will
be received by the collector and we were not sending it again; and
changed the test case file name to 026_stats, as we can add more
subscription-related stats to this test file itself.
(c) Added/edited multiple comments.
(d) Updated PGSTAT_FILE_FORMAT_ID.

Do let me know what you think of the attached?

--
With Regards,
Amit Kapila.

Attachments:

v27-0001-Add-a-view-to-show-the-stats-of-subscription-wor.patch (application/octet-stream)
#322Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#320)
Re: Skipping logical replication transactions on subscriber side

On Fri, Nov 26, 2021 at 7:45 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Friday, November 26, 2021 9:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Indeed. Attached an updated patch. Thanks!

Thanks for your patch. A small comment:

+       OID of the relation that the worker is synchronizing; null for the
+       main apply worker

Should we modify it to "OID of the relation that the worker was synchronizing ..."?

I don't think this change is required, see the description of the
similar column in pg_stat_subscription.

--
With Regards,
Amit Kapila.

#323Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#321)
Re: Skipping logical replication transactions on subscriber side

On Sat, Nov 27, 2021 at 7:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Nov 26, 2021 at 6:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Indeed. Attached an updated patch. Thanks!

Thank you for updating the patch!

I have made a number of changes in the attached patch which includes
(a) the patch was trying to register multiple array entries for the
same subscription which doesn't seem to be required, see changes in
pgstat_vacuum_stat, (b) multiple changes in the test like reduced the
wal_retrieve_retry_interval to 2s which has reduced the test time to
half, remove the check related to resetting of stats as there is no
guarantee that the message will be received by the collector and we
were not sending it again, changed the test case file name to
026_stats as we can add more subscription-related stats in this test
file itself

Since we have pg_stat_subscription view, how about 026_worker_stats.pl?

The rest looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#324Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#323)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 29, 2021 at 7:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Nov 27, 2021 at 7:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch!

I have made a number of changes in the attached patch which includes
(a) the patch was trying to register multiple array entries for the
same subscription which doesn't seem to be required, see changes in
pgstat_vacuum_stat, (b) multiple changes in the test like reduced the
wal_retrieve_retry_interval to 2s which has reduced the test time to
half, remove the check related to resetting of stats as there is no
guarantee that the message will be received by the collector and we
were not sending it again, changed the test case file name to
026_stats as we can add more subscription-related stats in this test
file itself

Since we have pg_stat_subscription view, how about 026_worker_stats.pl?

Sounds better. Updated patch attached.

The rest looks good to me.

Okay, I'll push this patch tomorrow unless there are more comments.

--
With Regards,
Amit Kapila.

Attachments:

v28-0001-Add-a-view-to-show-the-stats-of-subscription-wor.patch (application/octet-stream)
#325vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#324)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 29, 2021 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 29, 2021 at 7:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Nov 27, 2021 at 7:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Thank you for updating the patch!

I have made a number of changes in the attached patch which includes
(a) the patch was trying to register multiple array entries for the
same subscription which doesn't seem to be required, see changes in
pgstat_vacuum_stat, (b) multiple changes in the test like reduced the
wal_retrieve_retry_interval to 2s which has reduced the test time to
half, remove the check related to resetting of stats as there is no
guarantee that the message will be received by the collector and we
were not sending it again, changed the test case file name to
026_stats as we can add more subscription-related stats in this test
file itself

Since we have pg_stat_subscription view, how about 026_worker_stats.pl?

Sounds better. Updated patch attached.

Thanks for the updated patch, the v28 patch looks good to me.

Regards,
Vignesh

#326Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#325)
Re: Skipping logical replication transactions on subscriber side

On Mon, Nov 29, 2021 at 11:38 AM vignesh C <vignesh21@gmail.com> wrote:

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and
I am responding here as the fix requires some more discussion.

Looking at the result the test actually got, we had two error entries
for test_tab1 instead of one:

# Failed test 'check the error reported by the apply worker'
# at t/026_worker_stats.pl line 33.
# got: 'tap_sub|INSERT|test_tab1|t
# tap_sub||test_tab1|t'
# expected: 'tap_sub|INSERT|test_tab1|t'

The possible scenarios are:

The table sync worker for test_tab1 failed due to an error unrelated
to apply changes:

2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
OID 2 is already active for PID 23706

At this time, the view had one error entry for the table sync worker.
After retrying table sync, it succeeded:

2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab1"
has finished

Then, after a row was inserted on the publisher, the apply worker tried
to insert the row but failed due to a unique key violation, which is
expected:

2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
violates unique constraint "test_tab1_pkey"
2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
during "INSERT" for replication target relation "public.test_tab1" in
transaction 721 at 2021-11-30 06:24:04.305096+01

As a result, we had two error entries for test_tab1: the table sync
worker error and the apply worker error. I didn't expect the table
sync worker for test_tab1 to fail with the "replication origin with
OID 2 is already active for PID 23706" error.

Looking at test_subscription_error() in 026_worker_stats.pl, we have
two checks; in the first check, we wait for the view to show the error
entry for the given relation name and xid. This check passed since
we had the second error (i.e., the apply worker error). In the second
check, we get error entries from pg_stat_subscription_workers by
specifying only the relation name. Therefore, we ended up getting two
entries and the test failed.

To fix this issue, I think that in the second check, we can get the
error from pg_stat_subscription_workers by specifying the relation
name *and* xid like the first check does. I've attached the patch.
What do you think?
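
For illustration only, the failure described above can be modeled outside
PostgreSQL. This is a minimal Python sketch, not the test's actual Perl
code. The WorkerError class, its field names, and the two helper functions
are hypothetical; the row values are taken from the buildfarm log quoted
above, and the null command and xid on the table sync row are assumptions
inferred from the empty column in the "got" output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WorkerError:
    """Hypothetical stand-in for one row of the error view."""
    subname: str
    subrelid: Optional[str]   # None models NULL (main apply worker)
    relid: str
    command: Optional[str]
    xid: Optional[int]
    message: str


# The two entries that existed for test_tab1 in the failing run.
rows = [
    WorkerError("tap_sub", "test_tab1", "test_tab1", None, None,
                "replication origin with OID 2 is already active for PID 23706"),
    WorkerError("tap_sub", None, "test_tab1", "INSERT", 721,
                'duplicate key value violates unique constraint "test_tab1_pkey"'),
]


def by_relation(entries: List[WorkerError], relid: str) -> List[WorkerError]:
    # Models the second check as originally written: relation name only.
    return [e for e in entries if e.relid == relid]


def by_relation_and_xid(entries: List[WorkerError], relid: str,
                        xid: int) -> List[WorkerError]:
    # Models the proposed fix: relation name *and* xid, like the first check.
    return [e for e in entries if e.relid == relid and e.xid == xid]


def format_row(e: WorkerError) -> str:
    # Mimics the pipe-separated output compared by the test.
    return f"{e.subname}|{e.command or ''}|{e.relid}|t"


loose = [format_row(e) for e in by_relation(rows, "test_tab1")]
strict = [format_row(e) for e in by_relation_and_xid(rows, "test_tab1", 721)]
```

With the relation-only filter, both the table sync entry and the apply
entry come back, which matches the two-line "got" output; adding the xid
leaves only the apply worker entry.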

I think this will fix the reported failure but there is another race
condition in the test. Isn't it possible that for table test_tab2, we
get an error "replication origin with OID ..." or some other error
before the copy? In that case also, we will proceed past the second
call of test_subscription_error(), which is not what we expect in the
test. Shouldn't we somehow check that the error message also starts with
"duplicate key value violates ..."?

[1]: /messages/by-id/CAD21AoChP5wOT2AYziF+-j7vvThF2NyAs7wr+yy+8hsnu=8Rgg@mail.gmail.com

--
With Regards,
Amit Kapila.

#327Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#326)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 29, 2021 at 11:38 AM vignesh C <vignesh21@gmail.com> wrote:

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and
I am responding here as the fix requires some more discussion.

Looking at the result the test actually got, we had two error entries
for test_tab1 instead of one:

# Failed test 'check the error reported by the apply worker'
# at t/026_worker_stats.pl line 33.
# got: 'tap_sub|INSERT|test_tab1|t
# tap_sub||test_tab1|t'
# expected: 'tap_sub|INSERT|test_tab1|t'

The possible scenarios are:

The table sync worker for test_tab1 failed due to an error unrelated
to apply changes:

2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
OID 2 is already active for PID 23706

At this time, the view had one error entry for the table sync worker.
After retrying table sync, it succeeded:

2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab1"
has finished

Then, after a row was inserted on the publisher, the apply worker tried
to insert the row but failed due to a unique key violation, which is
expected:

2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
violates unique constraint "test_tab1_pkey"
2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
during "INSERT" for replication target relation "public.test_tab1" in
transaction 721 at 2021-11-30 06:24:04.305096+01

As a result, we had two error entries for test_tab1: the table sync
worker error and the apply worker error. I didn't expect the table
sync worker for test_tab1 to fail with the "replication origin with
OID 2 is already active for PID 23706" error.

Looking at test_subscription_error() in 026_worker_stats.pl, we have
two checks; in the first check, we wait for the view to show the error
entry for the given relation name and xid. This check passed since
we had the second error (i.e., the apply worker error). In the second
check, we get error entries from pg_stat_subscription_workers by
specifying only the relation name. Therefore, we ended up getting two
entries and the test failed.

To fix this issue, I think that in the second check, we can get the
error from pg_stat_subscription_workers by specifying the relation
name *and* xid like the first check does. I've attached the patch.
What do you think?

I think this will fix the reported failure but there is another race
condition in the test. Isn't it possible that for table test_tab2, we
get an error "replication origin with OID ..." or some other error
before copy, in that case also, we will proceed from the second call
of test_subscription_error() which is not what we expect in the test?

Right.

Shouldn't we somehow check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.
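
As a rough illustration of the idea above (one condition shared by the
wait step and the fetch step), here is a small Python model. It is a
sketch under stated assumptions, not the patch itself: make_filter,
poll_until, and FakeView are hypothetical names, the dictionary keys only
mimic the view's columns, and the row values come from the buildfarm log
quoted earlier in the thread.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


def make_filter(subrelid, relid, xid, msg_prefix: str) -> Callable[[Row], bool]:
    """Build one predicate that both checks reuse (the shared WHERE clause)."""
    def matches(row: Row) -> bool:
        return (
            row["subrelid"] == subrelid
            and row["last_error_relid"] == relid
            and row["last_error_xid"] == xid
            and str(row["last_error_message"]).startswith(msg_prefix)
        )
    return matches


def poll_until(fetch_rows: Callable[[], List[Row]],
               matches: Callable[[Row], bool],
               attempts: int = 5, delay: float = 0.0) -> List[Row]:
    """Re-read the rows until at least one matches, or give up."""
    for _ in range(attempts):
        hits = [row for row in fetch_rows() if matches(row)]
        if hits:
            return hits
        time.sleep(delay)
    return []


class FakeView:
    """Hypothetical view whose contents change between reads."""
    def __init__(self, snapshots: List[List[Row]]):
        self._snapshots = snapshots
        self._index = 0

    def fetch(self) -> List[Row]:
        snapshot = self._snapshots[min(self._index, len(self._snapshots) - 1)]
        self._index += 1
        return snapshot


tablesync_error: Row = {
    "subrelid": "test_tab1", "last_error_relid": "test_tab1",
    "last_error_command": None, "last_error_xid": None,
    "last_error_message":
        "replication origin with OID 2 is already active for PID 23706",
}
apply_error: Row = {
    "subrelid": None, "last_error_relid": "test_tab1",
    "last_error_command": "INSERT", "last_error_xid": 721,
    "last_error_message":
        'duplicate key value violates unique constraint "test_tab1_pkey"',
}

# First read sees only the table sync error; later reads see both entries.
view = FakeView([[tablesync_error], [tablesync_error, apply_error]])
is_apply_dup_key = make_filter(None, "test_tab1", 721,
                               "duplicate key value violates")

waited = poll_until(view.fetch, is_apply_dup_key)                  # first check
fetched = [row for row in view.fetch() if is_apply_dup_key(row)]   # second check
```

Because both steps use the same predicate, an unrelated table sync error
can neither satisfy the wait early nor add a second row to the result.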

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#328Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#327)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 29, 2021 at 11:38 AM vignesh C <vignesh21@gmail.com> wrote:

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and
I am responding here as the fix requires some more discussion.

Looking at the result the test actually got, we had two error entries
for test_tab1 instead of one:

# Failed test 'check the error reported by the apply worker'
# at t/026_worker_stats.pl line 33.
# got: 'tap_sub|INSERT|test_tab1|t
# tap_sub||test_tab1|t'
# expected: 'tap_sub|INSERT|test_tab1|t'

The possible scenarios are:

The table sync worker for test_tab1 failed due to an error unrelated
to apply changes:

2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
OID 2 is already active for PID 23706

At this time, the view had one error entry for the table sync worker.
After retrying table sync, it succeeded:

2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab1"
has finished

Then, after a row was inserted on the publisher, the apply worker tried
to insert the row but failed due to a unique key violation, which is
expected:

2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
violates unique constraint "test_tab1_pkey"
2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
during "INSERT" for replication target relation "public.test_tab1" in
transaction 721 at 2021-11-30 06:24:04.305096+01

As a result, we had two error entries for test_tab1: the table sync
worker error and the apply worker error. I didn't expect the table
sync worker for test_tab1 to fail with the "replication origin with
OID 2 is already active for PID 23706" error.

Looking at test_subscription_error() in 026_worker_stats.pl, we have
two checks; in the first check, we wait for the view to show the error
entry for the given relation name and xid. This check passed since
we had the second error (i.e., the apply worker error). In the second
check, we get error entries from pg_stat_subscription_workers by
specifying only the relation name. Therefore, we ended up getting two
entries and the test failed.

To fix this issue, I think that in the second check, we can get the
error from pg_stat_subscription_workers by specifying the relation
name *and* xid like the first check does. I've attached the patch.
What do you think?

I think this will fix the reported failure but there is another race
condition in the test. Isn't it possible that for table test_tab2, we
get an error "replication origin with OID ..." or some other error
before copy, in that case also, we will proceed from the second call
of test_subscription_error() which is not what we expect in the test?

Right.

Shouldn't we somehow check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-Fix-regression-test-failure-caused-by-commit-8d74fc9.patch (application/octet-stream)
#329vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#328)
Re: Skipping logical replication transactions on subscriber side

On Tue, Nov 30, 2021 at 7:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 29, 2021 at 11:38 AM vignesh C <vignesh21@gmail.com> wrote:

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and
I am responding here as the fix requires some more discussion.

Looking at the result the test actually got, we had two error entries
for test_tab1 instead of one:

# Failed test 'check the error reported by the apply worker'
# at t/026_worker_stats.pl line 33.
# got: 'tap_sub|INSERT|test_tab1|t
# tap_sub||test_tab1|t'
# expected: 'tap_sub|INSERT|test_tab1|t'

The possible scenarios are:

The table sync worker for test_tab1 failed due to an error unrelated
to apply changes:

2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
OID 2 is already active for PID 23706

At this time, the view had one error entry for the table sync worker.
After retrying table sync, it succeeded:

2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
synchronization worker for subscription "tap_sub", table "test_tab1"
has finished

Then, after a row was inserted on the publisher, the apply worker tried
to insert the row but failed due to a unique key violation, which is
expected:

2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
violates unique constraint "test_tab1_pkey"
2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
during "INSERT" for replication target relation "public.test_tab1" in
transaction 721 at 2021-11-30 06:24:04.305096+01

As a result, we had two error entries for test_tab1: the table sync
worker error and the apply worker error. I didn't expect the table
sync worker for test_tab1 to fail with the "replication origin with
OID 2 is already active for PID 23706" error.

Looking at test_subscription_error() in 026_worker_stats.pl, we have
two checks; in the first check, we wait for the view to show the error
entry for the given relation name and xid. This check passed since
we had the second error (i.e., the apply worker error). In the second
check, we get error entries from pg_stat_subscription_workers by
specifying only the relation name. Therefore, we ended up getting two
entries and the test failed.

To fix this issue, I think that in the second check, we can get the
error from pg_stat_subscription_workers by specifying the relation
name *and* xid like the first check does. I've attached the patch.
What do you think?

I think this will fix the reported failure but there is another race
condition in the test. Isn't it possible that for table test_tab2, we
get an error "replication origin with OID ..." or some other error
before copy, in that case also, we will proceed from the second call
of test_subscription_error() which is not what we expect in the test?

Right.

Shouldn't we somehow check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

Thanks for the updated patch, the patch applies neatly and make
check-world passes. Also I ran the failing test in a loop and found it
to be passing always.

Regards,
Vignesh

#330houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#328)
RE: Skipping logical replication transactions on subscriber side

On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 29, 2021 at 11:38 AM vignesh C <vignesh21@gmail.com> wrote:

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1]
and I am responding here as the fix requires some more discussion.

Looking at the result the test actually got, we had two error
entries for test_tab1 instead of one:

# Failed test 'check the error reported by the apply worker'
# at t/026_worker_stats.pl line 33.
# got: 'tap_sub|INSERT|test_tab1|t
# tap_sub||test_tab1|t'
# expected: 'tap_sub|INSERT|test_tab1|t'

The possible scenarios are:

The table sync worker for test_tab1 failed due to an error
unrelated to apply changes:

2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin
with OID 2 is already active for PID 23706

At this time, the view had one error entry for the table sync worker.
After retrying table sync, it succeeded:

2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication
table synchronization worker for subscription "tap_sub", table
"test_tab1" has finished

Then, after a row was inserted on the publisher, the apply worker
tried to insert the row but failed due to a unique key violation,
which is expected:

2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
violates unique constraint "test_tab1_pkey"
2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote
data during "INSERT" for replication target relation
"public.test_tab1" in transaction 721 at 2021-11-30
06:24:04.305096+01

As a result, we had two error entries for test_tab1: the table
sync worker error and the apply worker error. I didn't expect
the table sync worker for test_tab1 to fail with the "replication
origin with OID 2 is already active for PID 23706" error.

Looking at test_subscription_error() in 026_worker_stats.pl, we
have two checks. In the first check, we wait for the view to show
the error entry for the given relation name and xid. This check
passed because we had the second error (i.e., the apply worker
error). In the second check, we get error entries from
pg_stat_subscription_workers by specifying only the relation name.
Therefore, we ended up getting two entries and the test failed.

To fix this issue, I think that in the second check, we can get
the error from pg_stat_subscription_workers by specifying the
relation name *and* xid like the first check does. I've attached the patch.
What do you think?

I think this will fix the reported failure but there is another race
condition in the test. Isn't it possible that for table test_tab2,
we get an error "replication origin with OID ..." or some other
error before copy? In that case also, we will proceed from the
second call of test_subscription_error(), which is not what we
expect in the test.

Right.

Shouldn't we someway check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

I have a question about the testcase (I could be wrong here).

Is it possible for the race condition to happen between the apply worker (test_tab1)
and the table sync worker (test_tab2)? If so, it seems the error ("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible for the error ("replication origin with OID") to happen again at
place [1]? In that case, the error message we have checked could be replaced by
another error ("replication origin ...") and then the test would fail.

Best regards,
Hou zj

#331Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#330)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Shouldn't we someway check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

I have a question about the testcase (I could be wrong here).

Is it possible that the race condition happen between apply worker(test_tab1)
and table sync worker(test_tab2) ? If so, it seems the error("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible that the error("replication origin with OID") happen again at the
place [1]. In this case, the error message we have checked could be replaced by
another error("replication origin ...") and then the test fail ?

Once we get the "duplicate key violation ..." error before * [1] * via
the apply worker, we shouldn't get a replication origin-specific error
because the origin setup is done before starting to apply changes.
Also, even if that or some other error happens after * [1] *, the test
should still succeed because of the errmsg_prefix check. Does that make sense?

--
With Regards,
Amit Kapila.

#332houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#331)
RE: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 11:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Shouldn't we someway check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

I have a question about the testcase (I could be wrong here).

Is it possible that the race condition happen between apply worker(test_tab1)
and table sync worker(test_tab2) ? If so, it seems the error("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible that the error("replication origin with OID") happen again at the
place [1]. In this case, the error message we have checked could be replaced by
another error("replication origin ...") and then the test fail ?

Once we get the "duplicate key violation ..." error before * [1] * via
apply_worker then we shouldn't get replication origin-specific error
because the origin set up is done before starting to apply changes.
Also, even if that or some other happens after * [1] * because of
errmsg_prefix check it should still succeed. Does that make sense?

Oh, I missed the point that the origin set up is done once we get the expected error.
Thanks for the explanation, and I think the patch looks good.

Best regards,
Hou zj

#333Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#331)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Shouldn't we someway check that the error message also starts with
"duplicate key value violates ..."?

Yeah, I think it's a good idea to make the checks more specific. That
is, probably we can specify the prefix of the error message and
subrelid in addition to the current conditions: relid and xid. That
way, we can check what error was reported by which workers (tablesync
or apply) for which relations. And both check queries in
test_subscription_error() can have the same WHERE clause.

I've attached a patch that fixes this issue. Please review it.

I have a question about the testcase (I could be wrong here).

Is it possible that the race condition happen between apply worker(test_tab1)
and table sync worker(test_tab2) ? If so, it seems the error("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible that the error("replication origin with OID") happen again at the
place [1]. In this case, the error message we have checked could be replaced by
another error("replication origin ...") and then the test fail ?

Once we get the "duplicate key violation ..." error before * [1] * via
apply_worker then we shouldn't get replication origin-specific error
because the origin set up is done before starting to apply changes.

Right.

Also, even if that or some other happens after * [1] * because of
errmsg_prefix check it should still succeed.

In this case, the old error ("duplicate key violation ...") is
overwritten by a new error (e.g., a connection error; I'm not sure how
likely that is) and the test fails because the query returns no
entries, no? If so, the result from the second check_sql is unstable
and it's probably better to check the result only once. That is, the
first check_sql includes the command and we exit from the function
once we confirm the error entry is updated as expected.
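
To illustrate the "check only once" idea, here is a minimal sketch in Python rather than the actual TAP test code. It assumes a toy view that keeps only the latest error; StatsView, entry_matches and poll_until are hypothetical names for illustration, and the second error message is made up.

```python
class StatsView:
    """Toy stand-in for the error view: it keeps only the latest error."""

    def __init__(self):
        self.last_error = None

    def report(self, command, relname, xid, message):
        # A newer error overwrites the previous entry.
        self.last_error = {
            "command": command,
            "relname": relname,
            "xid": xid,
            "message": message,
        }


def entry_matches(view, command, relname, xid, errmsg_prefix):
    """The single, specific condition used by the one-shot check."""
    e = view.last_error
    return (
        e is not None
        and e["command"] == command
        and e["relname"] == relname
        and e["xid"] == xid
        and e["message"].startswith(errmsg_prefix)
    )


def poll_until(condition, attempts=10):
    """Re-evaluate the condition until it holds or attempts run out."""
    for _ in range(attempts):
        if condition():
            return True
    return False


view = StatsView()
expected = ("INSERT", "test_tab1", 721, "duplicate key value violates")

view.report("INSERT", "test_tab1", 721,
            'duplicate key value violates unique constraint "test_tab1_pkey"')
# One poll with the full condition; the check is done once this succeeds.
confirmed = poll_until(lambda: entry_matches(view, *expected))

# Hypothetical unrelated failure arriving after the poll succeeded.
view.report(None, None, None, "could not connect to the publisher")
# A second, separate query would now find no matching entry.
second_query_matches = entry_matches(view, *expected)
```

In this toy model the poll result stays valid once observed, while a follow-up query can see a different entry, which is why a second check is unstable.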

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#334Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#333)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

I have a question about the testcase (I could be wrong here).

Is it possible that the race condition happen between apply worker(test_tab1)
and table sync worker(test_tab2) ? If so, it seems the error("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible that the error("replication origin with OID") happen again at the
place [1]. In this case, the error message we have checked could be replaced by
another error("replication origin ...") and then the test fail ?

Once we get the "duplicate key violation ..." error before * [1] * via
apply_worker then we shouldn't get replication origin-specific error
because the origin set up is done before starting to apply changes.

Right.

Also, even if that or some other happens after * [1] * because of
errmsg_prefix check it should still succeed.

In this case, the old error ("duplicate key violation ...") is
overwritten by a new error (e.g., connection error. not sure how
possible it is)

Yeah, or probably some memory allocation failure. I think the
probability of such failures is very low but OTOH why take the chance.

and the test fails because the query returns no
entries, no?

Right.

If so, the result from the second check_sql is unstable
and it's probably better to check the result only once. That is, the
first check_sql includes the command and we exit from the function
once we confirm the error entry is expectedly updated.

Yeah, I think that should be fine.

With Regards,
Amit Kapila.

#335Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#334)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 8:24 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

I have a question about the testcase (I could be wrong here).

Is it possible that the race condition happen between apply worker(test_tab1)
and table sync worker(test_tab2) ? If so, it seems the error("replication
origin with OID") could happen randomly until we resolve the conflict.
Based on this, for the following code:
-----
# Wait for the error statistics to be updated.
my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
$node->poll_query_until(
'postgres', $check_sql,
) or die "Timed out while waiting for statistics to be updated";

* [1] *

$check_sql =
qq[
SELECT subname, last_error_command, last_error_relid::regclass,
last_error_count > 0 ] . $part_sql;
my $result = $node->safe_psql('postgres', $check_sql);
is($result, $expected, $msg);
-----

Is it possible that the error("replication origin with OID") happen again at the
place [1]. In this case, the error message we have checked could be replaced by
another error("replication origin ...") and then the test fail ?

Once we get the "duplicate key violation ..." error before * [1] * via
apply_worker then we shouldn't get replication origin-specific error
because the origin set up is done before starting to apply changes.

Right.

Also, even if that or some other happens after * [1] * because of
errmsg_prefix check it should still succeed.

In this case, the old error ("duplicate key violation ...") is
overwritten by a new error (e.g., connection error. not sure how
possible it is)

Yeah, or probably some memory allocation failure. I think the
probability of such failures is very low but OTOH why take chance.

and the test fails because the query returns no
entries, no?

Right.

If so, the result from the second check_sql is unstable
and it's probably better to check the result only once. That is, the
first check_sql includes the command and we exit from the function
once we confirm the error entry is expectedly updated.

Yeah, I think that should be fine.

Okay, I've attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Fix-regression-test-failure-caused-by-commit-8d74.patch (application/octet-stream)

#336houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#335)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, December 1, 2021 1:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

If so, the result from the second check_sql is unstable and it's
probably better to check the result only once. That is, the first
check_sql includes the command and we exit from the function once we
confirm the error entry is expectedly updated.

Yeah, I think that should be fine.

Okay, I've attached an updated patch. Please review it.

I agree that checking the result only once makes the test more stable.
The patch looks good to me.

Best regards,
Hou zj

#337Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#336)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 1, 2021 at 11:57 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 1, 2021 1:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Okay, I've attached an updated patch. Please review it.

I agreed that checking the result only once makes the test more stable.
The patch looks good to me.

Pushed.

Now, coming back to the skip_xid patch. To summarize the discussion in
that regard so far, we have discussed various alternatives for the
syntax like:

a. ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;
b. Alter Subscription <sub_name> SET ( subscription_parameter [=value]
[, ... ] );
c. Alter Subscription <sub_name> On Error ( subscription_parameter
[=value] [, ... ] );
d. Alter Subscription <sub_name> SKIP ( subscription_parameter
[=value] [, ... ] );
where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

We didn't prefer (a) as it can lead to more keywords as we add more
options; (b) as we want these new skip options to behave and be set
differently than existing subscription properties because of the
difference in their behavior; or (c) as that sounds more like an action
to be performed on a future condition (error/conflict), whereas here we
already know that an error has happened.

As per the discussion so far, option (d) seems preferable. In this, we
need to see how and what to allow as options. The simplest way for the
first version is to just allow one xid to be specified at a time, which
would mean that specifying multiple xids should error out. We can also
additionally allow specifying operations like 'insert', 'update',
etc., and then a relation list (list of oids). What that would mean is
that for a transaction we can specify which particular operations and
relations we want to skip.
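
As a rough illustration of the "one xid at a time" rule for option (d), here is a minimal sketch in Python. It is not the patch's parser; parse_skip_options is a hypothetical helper, and it assumes the first version accepts only xid and rejects everything else, including lsn.

```python
MAX_XID = 2**32 - 1  # transaction IDs are 32-bit unsigned values


def parse_skip_options(options):
    """Validate (name, value) pairs from a SKIP ( ... ) clause.

    Hypothetical helper: accepts exactly one 'xid' option and raises
    ValueError for anything else.
    """
    xid = None
    for name, value in options:
        if name != "xid":
            raise ValueError(f"unrecognized skip option: {name}")
        if xid is not None:
            # Specifying multiple xids errors out in the first version.
            raise ValueError("xid specified more than once")
        try:
            parsed = int(value)
        except ValueError:
            raise ValueError(f"invalid transaction ID: {value}") from None
        if not 0 <= parsed <= MAX_XID:
            raise ValueError(f"invalid transaction ID: {value}")
        xid = parsed
    if xid is None:
        raise ValueError("xid is required")
    return xid
```

For example, parse_skip_options([("xid", "590")]) returns 590, while passing two xid entries raises ValueError. Options such as operation or relation lists could later be added as further accepted names without changing the syntax.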

I am not sure what exactly we can provide to users to allow skipping
initial table sync as we can't specify XID there. One option that
comes to mind is to allow specifying a combination of copy_data and
relid to skip table sync for a particular relation. We might think of
not doing anything for table sync workers but not sure if that is a
good option.

Thoughts?

--
With Regards,
Amit Kapila.

#338Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#337)
Re: Skipping logical replication transactions on subscriber side

On 02.12.21 07:48, Amit Kapila wrote:

a. ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;
b. Alter Subscription <sub_name> SET ( subscription_parameter [=value]
[, ... ] );
c. Alter Subscription <sub_name> On Error ( subscription_parameter
[=value] [, ... ] );
d. Alter Subscription <sub_name> SKIP ( subscription_parameter
[=value] [, ... ] );
where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

As per discussion till now, option (d) seems preferable.

I agree.

In this, we
need to see how and what to allow as options. The simplest way for the
first version is to just allow one xid to be specified at a time which
would mean that specifying multiple xids should error out. We can also
additionally allow specifying operations like 'insert', 'update',
etc., and then relation list (list of oids). What that would mean is
that for a transaction we can allow which particular operations and
relations we want to skip.

I don't know how difficult it would be, but allowing multiple xids might
be desirable. But this syntax gives you flexibility, so we can also
start with a simple implementation.

I am not sure what exactly we can provide to users to allow skipping
initial table sync as we can't specify XID there. One option that
comes to mind is to allow specifying a combination of copy_data and
relid to skip table sync for a particular relation. We might think of
not doing anything for table sync workers but not sure if that is a
good option.

I don't think this feature should affect tablesync. The semantics are
not clear, and it's not really needed. If the tablesync doesn't work,
you can try the setup again from scratch.

#339Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#338)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 2, 2021 at 8:38 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.12.21 07:48, Amit Kapila wrote:

a. ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;
b. Alter Subscription <sub_name> SET ( subscription_parameter [=value]
[, ... ] );
c. Alter Subscription <sub_name> On Error ( subscription_parameter
[=value] [, ... ] );
d. Alter Subscription <sub_name> SKIP ( subscription_parameter
[=value] [, ... ] );
where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

As per discussion till now, option (d) seems preferable.

I agree.

In this, we
need to see how and what to allow as options. The simplest way for the
first version is to just allow one xid to be specified at a time which
would mean that specifying multiple xids should error out. We can also
additionally allow specifying operations like 'insert', 'update',
etc., and then relation list (list of oids). What that would mean is
that for a transaction we can allow which particular operations and
relations we want to skip.

I don't know how difficult it would be, but allowing multiple xids might
be desirable.

Are there many cases where there could be multiple xid failures that
the user can skip? The apply worker always keeps looping on the same
failure, so the user wouldn't know of the second xid failure (if any)
till the first failure is resolved. I could think of one such case
where it is possible: during the initial synchronization phase, the
apply worker can go ahead of the tablesync worker by skipping the
changes on the corresponding table. After that, it is possible that
the tablesync worker fails during the catch-up phase and the apply
worker fails during the processing of some other rel.

But this syntax gives you flexibility, so we can also
start with a simple implementation.

Yeah, I also think so. BTW, what do you think of providing extra
flexibility of giving other options like 'operation', 'rel' along with
xid? I think such options could be useful for large transactions that
operate on multiple tables as it is quite possible that only a
particular operation from the entire transaction is the cause of
failure. Now, on one side, we can argue that skipping the entire
transaction is better from the consistency point of view but I think
it is already possible that we just skip a particular update/delete
(if the corresponding tuple doesn't exist on the subscriber). For the
sake of simplicity, we can just allow providing xid at this stage and
then extend it later as required but I am not very sure of that point.

I am not sure what exactly we can provide to users to allow skipping
initial table sync as we can't specify XID there. One option that
comes to mind is to allow specifying a combination of copy_data and
relid to skip table sync for a particular relation. We might think of
not doing anything for table sync workers but not sure if that is a
good option.

I don't think this feature should affect tablesync. The semantics are
not clear, and it's not really needed. If the tablesync doesn't work,
you can try the setup again from scratch.

Okay, that makes sense. But note it is possible that tablesync workers
might also need to skip some xids during the catchup phase to complete
the sync.

--
With Regards,
Amit Kapila.

#340Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#339)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 3, 2021 at 11:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 2, 2021 at 8:38 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 02.12.21 07:48, Amit Kapila wrote:

a. ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;
b. Alter Subscription <sub_name> SET ( subscription_parameter [=value]
[, ... ] );
c. Alter Subscription <sub_name> On Error ( subscription_parameter
[=value] [, ... ] );
d. Alter Subscription <sub_name> SKIP ( subscription_parameter
[=value] [, ... ] );
where subscription_parameter can be one of:
xid = <xid_val>
lsn = <lsn_val>
...

As per discussion till now, option (d) seems preferable.

I agree.

+1

In this, we
need to see how and what to allow as options. The simplest way for the
first version is to just allow one xid to be specified at a time which
would mean that specifying multiple xids should error out. We can also
additionally allow specifying operations like 'insert', 'update',
etc., and then relation list (list of oids). What that would mean is
that for a transaction we can allow which particular operations and
relations we want to skip.

I don't know how difficult it would be, but allowing multiple xids might
be desirable.

Are there many cases where there could be multiple xid failures that
the user can skip? Apply worker always keeps looping at the same error
failure so the user wouldn't know of the second xid failure (if any)
till the first failure is resolved. I could think of one such case
where it is possible during the initial synchronization phase where
apply worker went ahead then tablesync worker by skipping to apply the
changes on the corresponding table. After that, it is possible, that
the table sync worker failed during the catch-up phase and apply
worker fails during the processing of some other rel.

But this syntax gives you flexibility, so we can also
start with a simple implementation.

Yeah, I also think so. BTW, what do you think of providing extra
flexibility of giving other options like 'operation', 'rel' along with
xid? I think such options could be useful for large transactions that
operate on multiple tables as it is quite possible that only a
particular operation from the entire transaction is the cause of
failure. Now, on one side, we can argue that skipping the entire
transaction is better from the consistency point of view but I think
it is already possible that we just skip a particular update/delete
(if the corresponding tuple doesn't exist on the subscriber). For the
sake of simplicity, we can just allow providing xid at this stage and
then extend it later as required but I am not very sure of that point.

+1

Skipping a whole transaction by specifying xid would be a good start.
Ideally, we'd like to automatically skip only operations within the
transaction that fail but it seems not easy to achieve. If we allow
specifying operations and/or relations, probably multiple operations
or relations need to be specified in some cases. Otherwise, the
subscriber cannot continue logical replication if the transaction has
multiple operations on different relations that fail. But similar to
the idea of specifying multiple xids, we need to note the fact that
user wouldn't know of the second operation failure unless the apply
worker applies the change. So I'm not sure there are many use cases in
practice where users can specify multiple operations and relations in
order to skip applies that fail.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#341Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#340)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Dec 3, 2021 at 11:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

But this syntax gives you flexibility, so we can also
start with a simple implementation.

Yeah, I also think so. BTW, what do you think of providing extra
flexibility of giving other options like 'operation', 'rel' along with
xid? I think such options could be useful for large transactions that
operate on multiple tables as it is quite possible that only a
particular operation from the entire transaction is the cause of
failure. Now, on one side, we can argue that skipping the entire
transaction is better from the consistency point of view but I think
it is already possible that we just skip a particular update/delete
(if the corresponding tuple doesn't exist on the subscriber). For the
sake of simplicity, we can just allow providing xid at this stage and
then extend it later as required but I am not very sure of that point.

+1

Skipping a whole transaction by specifying xid would be a good start.

Okay, that sounds reasonable, so let's do that for now.

--
With Regards,
Amit Kapila.

#342Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#341)
Re: Skipping logical replication transactions on subscriber side

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Dec 3, 2021 at 11:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

But this syntax gives you flexibility, so we can also
start with a simple implementation.

Yeah, I also think so. BTW, what do you think of providing extra
flexibility of giving other options like 'operation', 'rel' along with
xid? I think such options could be useful for large transactions that
operate on multiple tables as it is quite possible that only a
particular operation from the entire transaction is the cause of
failure. Now, on one side, we can argue that skipping the entire
transaction is better from the consistency point of view but I think
it is already possible that we just skip a particular update/delete
(if the corresponding tuple doesn't exist on the subscriber). For the
sake of simplicity, we can just allow providing xid at this stage and
then extend it later as required but I am not very sure of that point.

+1

Skipping a whole transaction by specifying xid would be a good start.

Okay, that sounds reasonable, so let's do that for now.

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be a bit tricky:

First of all, since skip-xid is in the pg_subscription catalog, we need
to do a catalog update in a transaction and commit it to disable it. I
think we need to set the origin-lsn and timestamp of the transaction
being skipped on the transaction that does the catalog update. That is,
while skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we did the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left set in case of a server crash between them. Also, we cannot set
origin-lsn and timestamp on an empty transaction.
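
The point about doing the catalog update in the same transaction can be sketched as follows. This is a toy model in Python, not the apply worker code; SkippingSubscriber and apply_remote_transaction are hypothetical names, and durable state is modeled as a dictionary that changes only through a single commit step.

```python
import copy


class SkippingSubscriber:
    """Toy subscriber whose durable state changes only through commit()."""

    def __init__(self, skip_xid=None):
        self.durable = {"skip_xid": skip_xid, "origin_lsn": 0, "rows": []}

    def commit(self, new_rows=(), **catalog_updates):
        # Build the new state on the side and swap it in at once, so a
        # crash can never expose only part of one local transaction.
        state = copy.deepcopy(self.durable)
        state["rows"].extend(new_rows)
        state.update(catalog_updates)
        self.durable = state


def apply_remote_transaction(sub, xid, changes, commit_lsn):
    if sub.durable["skip_xid"] == xid:
        # Discard the remote changes; clear skip_xid and record the
        # origin position in the same local commit.
        sub.commit(skip_xid=None, origin_lsn=commit_lsn)
    else:
        sub.commit(new_rows=changes, origin_lsn=commit_lsn)


sub = SkippingSubscriber(skip_xid=721)
apply_remote_transaction(sub, 721, ["(1)"], commit_lsn=100)  # skipped
apply_remote_transaction(sub, 722, ["(2)"], commit_lsn=200)  # applied
```

Because clearing skip_xid and advancing the origin position happen in one commit, there is no window in which the transaction has been skipped but skip_xid is still set.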

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at commit prepared and rollback
prepared time, we skip it since there is no prepared transaction
on the subscriber. Currently, handling rollback prepared already
behaves so; it first checks whether we have prepared the transaction
or not and skips it if we haven’t. So I think we need to do that also
for the commit prepared case. This requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.
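
The prepared-transaction flow described above could look roughly like this. It is a simplified sketch in Python with hypothetical names (TwoPhaseSubscriber and its handlers); in particular, it looks up the local prepared transaction by gid alone, whereas the text above says the real check also needs prepare-lsn and prepare-time.

```python
class TwoPhaseSubscriber:
    """Toy subscriber handling prepare / commit prepared / rollback prepared."""

    def __init__(self, skip_xid=None):
        self.skip_xid = skip_xid
        self.prepared = {}   # gid -> changes held by a local prepared transaction
        self.rows = []
        self.origin_lsn = 0

    def handle_prepare(self, xid, gid, changes, prepare_lsn):
        if self.skip_xid == xid:
            # Commit (not prepare) a local transaction that only clears skip_xid.
            self.skip_xid = None
        else:
            self.prepared[gid] = list(changes)
        self.origin_lsn = prepare_lsn

    def handle_commit_prepared(self, gid, commit_lsn):
        changes = self.prepared.pop(gid, None)
        if changes is not None:
            self.rows.extend(changes)
        # Otherwise nothing was prepared locally, so there is nothing to commit.
        self.origin_lsn = commit_lsn

    def handle_rollback_prepared(self, gid, rollback_lsn):
        # Already tolerant of a missing local prepared transaction.
        self.prepared.pop(gid, None)
        self.origin_lsn = rollback_lsn


sub = TwoPhaseSubscriber(skip_xid=721)
sub.handle_prepare(721, "gid_a", ["(1)"], prepare_lsn=100)   # skipped
sub.handle_commit_prepared("gid_a", commit_lsn=110)          # nothing to commit
sub.handle_prepare(722, "gid_b", ["(2)"], prepare_lsn=200)   # prepared normally
sub.handle_commit_prepared("gid_b", commit_lsn=210)
```

After this sequence only the changes of transaction 722 are applied, and skip_xid has been cleared at prepare time.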

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment says it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#343Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#339)
Re: Skipping logical replication transactions on subscriber side

On 03.12.21 03:53, Amit Kapila wrote:

I don't know how difficult it would be, but allowing multiple xids might
be desirable.

Are there many cases where there could be multiple xid failures that
the user can skip? Apply worker always keeps looping at the same error
failure so the user wouldn't know of the second xid failure (if any)
till the first failure is resolved.

Yeah, nevermind, doesn't make sense.

Yeah, I also think so. BTW, what do you think of providing extra
flexibility of giving other options like 'operation', 'rel' along with
xid? I think such options could be useful for large transactions that
operate on multiple tables as it is quite possible that only a
particular operation from the entire transaction is the cause of
failure. Now, on one side, we can argue that skipping the entire
transaction is better from the consistency point of view but I think
it is already possible that we just skip a particular update/delete
(if the corresponding tuple doesn't exist on the subscriber). For the
sake of simplicity, we can just allow providing xid at this stage and
then extend it later as required but I am not very sure of that point.

Skipping transactions partially sounds dangerous, especially when
exposed as an option to users. Needs more careful thought.

#344Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#342)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be a bit tricky:

First of all, since skip-xid is in pg_subscription catalog, we need to
do a catalog update in a transaction and commit it to disable it. I
think we need to set origin-lsn and timestamp of the transaction being
skipped to the transaction that does the catalog update. That is,
during skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we do the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left in case of a server crash between them.

But if we haven't updated origin_lsn/timestamp before the crash, won't
it request the same transaction again from the publisher? If so, it
will be again able to skip it because skip_xid is still not updated.

Also, we cannot set
origin-lsn and timestamp to an empty transaction.

But won't we update the catalog for skip_xid in that case?

Do we see any advantage of updating the skip_xid in the same
transaction vs. doing it in a separate transaction? If not then
probably we can choose either of those ways and add some comments to
indicate the possibility of doing it another way.

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at the commit prepared and rollback
prepared time, we skip it since there is not the prepared transaction
on the subscriber.

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

Currently, handling rollback prepared already
behaves this way; it first checks whether we have prepared the transaction
or not and skips it if we haven’t. So I think we need to do the same for
the commit prepared case. With that, this requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment says it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

I think this is not clear to me. Why would one have multiple
subscriptions for the same publication? I thought it is possible when
say some publisher doesn't publish any data of prepared transaction
say because the corresponding action is not published or something
like that. I don't deny that someday we want to optimize this case but
it might be better if we don't need to do it along with this patch.

--
With Regards,
Amit Kapila.

#345Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#344)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be a bit tricky:

First of all, since skip-xid is in pg_subscription catalog, we need to
do a catalog update in a transaction and commit it to disable it. I
think we need to set origin-lsn and timestamp of the transaction being
skipped to the transaction that does the catalog update. That is,
during skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we do the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left in case of a server crash between them.

But if we haven't updated origin_lsn/timestamp before the crash, won't
it request the same transaction again from the publisher? If so, it
will be again able to skip it because skip_xid is still not updated.

Yes. I mean that if we update origin_lsn and origin_timestamp when
committing the skipped transaction and then update the catalog in the
next transaction it doesn't work in case of a crash. But it's not
possible in the first place since the first transaction is empty and
we cannot set origin_lsn and origin_timestamp to it.

Also, we cannot set
origin-lsn and timestamp to an empty transaction.

But won't we update the catalog for skip_xid in that case?

Yes. Probably my explanation was not clear. Even if we skip all
changes of the transaction, the transaction doesn't become empty since
we update the catalog.

Do we see any advantage of updating the skip_xid in the same
transaction vs. doing it in a separate transaction? If not then
probably we can choose either of those ways and add some comments to
indicate the possibility of doing it another way.

I think that since the skipped transaction is always empty there is
always one transaction. What we need to consider is when we update
origin_lsn and origin_timestamp. In non-prepared transaction cases,
the only option is when updating the catalog.

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at the commit prepared and rollback
prepared time, we skip it since there is not the prepared transaction
on the subscriber.

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right? If
so, since these are separate transactions it can be a problem in case
of a crash between these two commits.

Currently, handling rollback prepared already
behaves this way; it first checks whether we have prepared the transaction
or not and skips it if we haven’t. So I think we need to do the same for
the commit prepared case. With that, this requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment says it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

I think this is not clear to me. Why would one have multiple
subscriptions for the same publication? I thought it is possible when
say some publisher doesn't publish any data of prepared transaction
say because the corresponding action is not published or something
like that. I don't deny that someday we want to optimize this case but
it might be better if we don't need to do it along with this patch.

I imagined that the publisher has two publications (say pub-A and
pub-B) that publish different sets of relations in the database, and
there are two subscribers, each subscribing to one of them
(e.g., subscriber-A subscribes to pub-A and subscriber-B
subscribes to pub-B). If many prepared transactions happen on the
publisher and these transactions modify only relations published by
pub-A, both subscriber-A and subscriber-B would prepare the same
number of transactions, but all of them on subscriber-B are empty.
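
As a rough illustration of that scenario, the following C sketch counts how many prepared transactions arrive empty at a subscriber. It is only a toy model: relations are represented as bits in a mask, and `count_empty_prepares` is a hypothetical helper, not anything in PostgreSQL.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Each transaction is a bitmask of the relations it modifies, and a
 * publication is a bitmask of the relations it publishes (toy model).
 * A prepared transaction is "empty" for a subscriber when it touches
 * none of the relations in the publication that subscriber uses.
 */
size_t
count_empty_prepares(const uint32_t *txn_rels, size_t ntxns,
                     uint32_t published_rels)
{
    size_t empty = 0;

    for (size_t i = 0; i < ntxns; i++)
    {
        if ((txn_rels[i] & published_rels) == 0)
            empty++;
    }
    return empty;
}
```

With three transactions that touch only pub-A's relations, the count is 0 for pub-A's mask and 3 for a disjoint pub-B mask, which is the case where every prepare on subscriber-B is empty.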

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#346Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#345)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be a bit tricky:

First of all, since skip-xid is in pg_subscription catalog, we need to
do a catalog update in a transaction and commit it to disable it. I
think we need to set origin-lsn and timestamp of the transaction being
skipped to the transaction that does the catalog update. That is,
during skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we do the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left in case of a server crash between them.

But if we haven't updated origin_lsn/timestamp before the crash, won't
it request the same transaction again from the publisher? If so, it
will be again able to skip it because skip_xid is still not updated.

Yes. I mean that if we update origin_lsn and origin_timestamp when
committing the skipped transaction and then update the catalog in the
next transaction it doesn't work in case of a crash. But it's not
possible in the first place since the first transaction is empty and
we cannot set origin_lsn and origin_timestamp to it.

Also, we cannot set
origin-lsn and timestamp to an empty transaction.

But won't we update the catalog for skip_xid in that case?

Yes. Probably my explanation was not clear. Even if we skip all
changes of the transaction, the transaction doesn't become empty since
we update the catalog.

Do we see any advantage of updating the skip_xid in the same
transaction vs. doing it in a separate transaction? If not then
probably we can choose either of those ways and add some comments to
indicate the possibility of doing it another way.

I think that since the skipped transaction is always empty there is
always one transaction. What we need to consider is when we update
origin_lsn and origin_timestamp. In non-prepared transaction cases,
the only option is when updating the catalog.

Your last sentence is not completely clear to me but it seems you
agree that we can use one transaction instead of two to skip the
changes, perform a catalog update, and update origin_lsn/timestamp.

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at the commit prepared and rollback
prepared time, we skip it since there is not the prepared transaction
on the subscriber.

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

If
so, since these are separate transactions it can be a problem in case
of a crash between these two commits.

Currently, handling rollback prepared already
behaves this way; it first checks whether we have prepared the transaction
or not and skips it if we haven’t. So I think we need to do the same for
the commit prepared case. With that, this requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment says it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

I think this is not clear to me. Why would one have multiple
subscriptions for the same publication? I thought it is possible when
say some publisher doesn't publish any data of prepared transaction
say because the corresponding action is not published or something
like that. I don't deny that someday we want to optimize this case but
it might be better if we don't need to do it along with this patch.

I imagined that the publisher has two publications (say pub-A and
pub-B) that publish different sets of relations in the database, and
there are two subscribers, each subscribing to one of them
(e.g., subscriber-A subscribes to pub-A and subscriber-B
subscribes to pub-B). If many prepared transactions happen on the
publisher and these transactions modify only relations published by
pub-A, both subscriber-A and subscriber-B would prepare the same
number of transactions, but all of them on subscriber-B are empty.

Okay, I understand those cases, but note that always checking whether the
prepared xact exists during commit prepared has a cost, and that is why
we avoided it in the first place. There is a separate effort in
progress [1] where we want to avoid sending empty transactions in the
first place. So, it is better to avoid this cost via that effort
rather than adding additional cost at commit of each prepared
transaction. OTOH, if there are other strong reasons to do it then we
can probably consider it.

[1]: https://commitfest.postgresql.org/36/3093/

--
With Regards,
Amit Kapila.

#347Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#346)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 2:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 7, 2021 at 5:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 6, 2021 at 2:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll submit the patch tomorrow.

While updating the patch, I realized that skipping a transaction that
is prepared on the publisher will be a bit tricky:

First of all, since skip-xid is in pg_subscription catalog, we need to
do a catalog update in a transaction and commit it to disable it. I
think we need to set origin-lsn and timestamp of the transaction being
skipped to the transaction that does the catalog update. That is,
during skipping the (not prepared) transaction, we skip all
data-modification changes coming from the publisher, do a catalog
update, and commit the transaction. If we do the catalog update in the
next transaction after skipping the whole transaction, skip_xid could
be left in case of a server crash between them.

But if we haven't updated origin_lsn/timestamp before the crash, won't
it request the same transaction again from the publisher? If so, it
will be again able to skip it because skip_xid is still not updated.

Yes. I mean that if we update origin_lsn and origin_timestamp when
committing the skipped transaction and then update the catalog in the
next transaction it doesn't work in case of a crash. But it's not
possible in the first place since the first transaction is empty and
we cannot set origin_lsn and origin_timestamp to it.

Also, we cannot set
origin-lsn and timestamp to an empty transaction.

But won't we update the catalog for skip_xid in that case?

Yes. Probably my explanation was not clear. Even if we skip all
changes of the transaction, the transaction doesn't become empty since
we update the catalog.

Do we see any advantage of updating the skip_xid in the same
transaction vs. doing it in a separate transaction? If not then
probably we can choose either of those ways and add some comments to
indicate the possibility of doing it another way.

I think that since the skipped transaction is always empty there is
always one transaction. What we need to consider is when we update
origin_lsn and origin_timestamp. In non-prepared transaction cases,
the only option is when updating the catalog.

Your last sentence is not completely clear to me but it seems you
agree that we can use one transaction instead of two to skip the
changes, perform a catalog update, and update origin_lsn/timestamp.

Yes.

In prepared transaction cases, I think that when handling a prepare
message, we need to commit the transaction to update the catalog,
instead of preparing it. And at the commit prepared and rollback
prepared time, we skip it since there is not the prepared transaction
on the subscriber.

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

But in case of a crash between these two transactions, given that
skip_xid is already cleared how do we know the prepared transaction
that was supposed to be skipped?

If
so, since these are separate transactions it can be a problem in case
of a crash between these two commits.

Currently, handling rollback prepared already
behaves this way; it first checks whether we have prepared the transaction
or not and skips it if we haven’t. So I think we need to do the same for
the commit prepared case. With that, this requires protocol changes so
that the subscriber can get prepare-lsn and prepare-time when handling
commit prepared.

So I’m writing a separate patch to add prepare-lsn and timestamp to
commit_prepared message, which will be a building block for skipping
prepared transactions. Actually, I think it’s beneficial even today;
we can skip preparing the transaction if it’s an empty transaction.
Although the comment says it’s not a common case, I think that it could
happen quite often in some cases:

* XXX, We can optimize such that at commit prepared time, we first check
* whether we have prepared the transaction or not but that doesn't seem
* worthwhile because such cases shouldn't be common.
*/

For example, if the publisher has multiple subscriptions and there are
many prepared transactions that modify the particular table subscribed
by one publisher, many empty transactions are replicated to other
subscribers.

I think this is not clear to me. Why would one have multiple
subscriptions for the same publication? I thought it is possible when
say some publisher doesn't publish any data of prepared transaction
say because the corresponding action is not published or something
like that. I don't deny that someday we want to optimize this case but
it might be better if we don't need to do it along with this patch.

I imagined that the publisher has two publications (say pub-A and
pub-B) that publish different sets of relations in the database, and
there are two subscribers, each subscribing to one of them
(e.g., subscriber-A subscribes to pub-A and subscriber-B
subscribes to pub-B). If many prepared transactions happen on the
publisher and these transactions modify only relations published by
pub-A, both subscriber-A and subscriber-B would prepare the same
number of transactions, but all of them on subscriber-B are empty.

Okay, I understand those cases, but note that always checking whether the
prepared xact exists during commit prepared has a cost, and that is why
we avoided it in the first place. There is a separate effort in
progress [1] where we want to avoid sending empty transactions in the
first place. So, it is better to avoid this cost via that effort
rather than adding additional cost at commit of each prepared
transaction. OTOH, if there are other strong reasons to do it then we
can probably consider it.

Thank you for the information. Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#348Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#347)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 12:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

But in case of a crash between these two transactions, given that
skip_xid is already cleared how do we know the prepared transaction
that was supposed to be skipped?

I was thinking of doing it as one transaction at the time of
commit_prepare. Say, in function apply_handle_commit_prepared(), if we
check whether the skip_xid is the same as prepare_data.xid then update
the catalog and set origin_lsn/timestamp in the same transaction. Why
do we need two transactions for it?

--
With Regards,
Amit Kapila.

#349Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#348)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 12:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

But in case of a crash between these two transactions, given that
skip_xid is already cleared how do we know the prepared transaction
that was supposed to be skipped?

I was thinking of doing it as one transaction at the time of
commit_prepare. Say, in function apply_handle_commit_prepared(), if we
check whether the skip_xid is the same as prepare_data.xid then update
the catalog and set origin_lsn/timestamp in the same transaction. Why
do we need two transactions for it?

I meant the two transactions are the prepared transaction and the
transaction that updates the catalog. If I understand your idea
correctly, in apply_handle_commit_prepared(), we update the catalog
and set origin_lsn/timestamp. These are done in the same transaction.
Then, we commit the prepared transaction, right? If the server crashes
between them, skip_xid is already cleared and logical replication
starts from the LSN after COMMIT PREPARED. But the prepared
transaction still exists on the subscriber.
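
The failure mode described above can be sketched with a small toy model in C. Nothing here is PostgreSQL code: `SubState`, `commit_prepared_origin_first`, and `will_be_resent` are hypothetical names, and the model assumes the publisher resends only messages whose LSN is past the subscriber's origin_lsn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the durable subscriber state; all names are hypothetical. */
typedef struct SubState
{
    uint32_t skip_xid;      /* 0 means "nothing to skip" */
    uint64_t origin_lsn;    /* replication restarts after this LSN */
    bool     has_prepared;  /* a prepared transaction exists locally */
} SubState;

/*
 * Handle COMMIT PREPARED at `lsn` in the order questioned above:
 *   step 1: one transaction clears skip_xid and advances origin_lsn;
 *   step 2: the prepared transaction is finished afterwards.
 * `crash_between` simulates a server crash after step 1 is durable.
 */
void
commit_prepared_origin_first(SubState *s, uint32_t xid, uint64_t lsn,
                             bool crash_between)
{
    if (s->skip_xid == xid)
        s->skip_xid = 0;
    s->origin_lsn = lsn;        /* step 1 committed */

    if (crash_between)
        return;                 /* step 2 never happens */

    s->has_prepared = false;    /* step 2: prepared transaction finished */
}

/* After a restart, only messages past origin_lsn are sent again. */
bool
will_be_resent(const SubState *s, uint64_t lsn)
{
    return lsn > s->origin_lsn;
}
```

After a simulated crash between the two steps, the COMMIT PREPARED message is not resent while `has_prepared` is still true, which is the orphaned prepared transaction in question.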

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#350Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#347)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 4:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Okay, I understand those cases, but note that always checking whether the
prepared xact exists during commit prepared has a cost, and that is why
we avoided it in the first place.

BTW what costs were we concerned about? Looking at LookupGXact(), we
look for the 2PC state data on shmem while acquiring TwoPhaseStateLock
in shared mode. And we check origin_lsn and origin_timestamp of 2PC by
reading WAL or 2PC state file only if gid matched. On the other hand,
committing the prepared transaction does WAL logging, waits for
synchronous replication, and calls post-commit callbacks, and removes
2PC state file etc. And it requires acquiring TwoPhaseStateLock in
exclusive mode to remove 2PC state entry. So it looks like always
checking if the prepared transaction exists and skipping it if not is
cheaper than always committing prepared transactions.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#351Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#349)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 8, 2021 at 4:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 12:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

But in case of a crash between these two transactions, given that
skip_xid is already cleared how do we know the prepared transaction
that was supposed to be skipped?

I was thinking of doing it as one transaction at the time of
commit_prepare. Say, in function apply_handle_commit_prepared(), if we
check whether the skip_xid is the same as prepare_data.xid then update
the catalog and set origin_lsn/timestamp in the same transaction. Why
do we need two transactions for it?

I meant the two transactions are the prepared transaction and the
transaction that updates the catalog. If I understand your idea
correctly, in apply_handle_commit_prepared(), we update the catalog
and set origin_lsn/timestamp. These are done in the same transaction.
Then, we commit the prepared transaction, right?

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.
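
A minimal sketch of this ordering, as a toy model in C rather than PostgreSQL code. `SubState`, `CrashPoint`, and `commit_prepared_catalog_first` are hypothetical names, and the model assumes that the origin update and finishing the prepared transaction become durable together in the second transaction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the durable subscriber state; all names are hypothetical. */
typedef struct SubState
{
    uint32_t skip_xid;      /* 0 means "nothing to skip" */
    uint64_t origin_lsn;    /* replication restarts after this LSN */
    bool     has_prepared;  /* a prepared transaction exists locally */
} SubState;

typedef enum CrashPoint
{
    CRASH_NONE,
    CRASH_AFTER_CATALOG_UPDATE
} CrashPoint;

/*
 * Handle COMMIT PREPARED at `lsn`:
 *   txn 1: clear skip_xid in the catalog and commit;
 *   txn 2: advance origin_lsn and finish the prepared transaction
 *          (assumed here to commit as one unit).
 * Returns true when the message was fully processed.
 */
bool
commit_prepared_catalog_first(SubState *s, uint32_t xid, uint64_t lsn,
                              CrashPoint crash)
{
    if (s->skip_xid == xid)
        s->skip_xid = 0;            /* txn 1 committed */

    if (crash == CRASH_AFTER_CATALOG_UPDATE)
        return false;               /* origin_lsn not advanced yet */

    s->origin_lsn = lsn;            /* txn 2 */
    s->has_prepared = false;
    return true;
}
```

In this model, a crash after the first transaction leaves origin_lsn unchanged, so the message is delivered again; the second delivery finds skip_xid already cleared and only runs the second transaction.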

--
With Regards,
Amit Kapila.

#352Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#351)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 4:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 12:36 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 8, 2021 at 3:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 8, 2021 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Can't we think of just allowing prepare in this case and updating the
skip_xid only at commit time? I see that in this case, we would be
doing prepare for a transaction that has no changes but as such cases
won't be common, isn't that acceptable?

In this case, we will end up committing both the prepared (empty)
transaction and the transaction that updates the catalog, right?

Can't we do this catalog update before committing the prepared
transaction? If so, both in prepared and non-prepared cases, our
implementation could be the same and we have a reason to accomplish
the catalog update in the same transaction for which we skipped the
changes.

But in case of a crash between these two transactions, given that
skip_xid is already cleared how do we know the prepared transaction
that was supposed to be skipped?

I was thinking of doing it as one transaction at the time of
commit_prepare. Say, in function apply_handle_commit_prepared(), if we
check whether the skip_xid is the same as prepare_data.xid then update
the catalog and set origin_lsn/timestamp in the same transaction. Why
do we need two transactions for it?

I meant the two transactions are the prepared transaction and the
transaction that updates the catalog. If I understand your idea
correctly, in apply_handle_commit_prepared(), we update the catalog
and set origin_lsn/timestamp. These are done in the same transaction.
Then, we commit the prepared transaction, right?

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies the XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping an already-prepared transaction,
since such a transaction doesn't conflict with anything regardless of
whether it has changes or not.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#353Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#352)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping already-prepared transaction
since such transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

--
With Regards,
Amit Kapila.

#354Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#353)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping already-prepared transaction
since such transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I’ve been thinking we could add a safeguard for the case where
the user specifies the wrong XID. For example, could we make use of the
stats in pg_stat_subscription_workers? One idea is that the logical
replication worker fetches the XID from the stats when reading the
subscription and skips the transaction only if that XID matches
subskipxid. That is, the worker checks the error reported by the
worker that previously worked on the same subscription. That error might
not be a conflict error (e.g., a connection error), or it might have
been cleared by the reset function, but given that the worker is in an
error loop, it can eventually get the XID in question. This way we can
prevent an unrelated transaction from being skipped unexpectedly. It
does not seem like a stable solution, though. Alternatively, it might be
enough to warn users when they specify an XID that doesn’t match
last_error_xid. Anyway, I think it’s better to have more discussion on
this. Any ideas?
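
As a rough illustration of the "warn on mismatch" variant mentioned above, the check could look like the sketch below. Everything here is an assumption made for illustration: `check_skip_xid` and the `SkipCheck` values are hypothetical names, and the last reported error XID is simply passed in rather than read from the stats view.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical outcomes of validating a user-supplied skip XID. */
typedef enum SkipCheck
{
    SKIP_OK,              /* matches the last reported error XID */
    SKIP_WARN_NO_ERROR,   /* no error XID is currently recorded */
    SKIP_WARN_MISMATCH    /* differs from the last reported error XID */
} SkipCheck;

/*
 * Compare the XID the user asked to skip with the XID of the last reported
 * apply error.  The caller would only emit a warning on a non-OK result;
 * the setting itself is still accepted.
 */
SkipCheck
check_skip_xid(Xid requested, Xid last_error_xid)
{
    if (last_error_xid == INVALID_XID)
        return SKIP_WARN_NO_ERROR;
    if (requested != last_error_xid)
        return SKIP_WARN_MISMATCH;
    return SKIP_OK;
}
```

Keeping the result advisory reflects the concern raised above that the stats may be missing or unrelated to a conflict, so they cannot serve as a hard gate.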

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transactio.patch (application/octet-stream)
#355Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#354)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping already-prepared transaction
since such transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I’ve been thinking we can do something safeguard for the case where
the user specified the wrong xid. For example, can we somewhat use the
stats in pg_stat_subscription_workers? An idea is that logical
replication worker fetches the xid from the stats when reading the
subscription and skips the transaction if the xid matches to
subskipxid. That is, the worker checks the error reported by the
worker previously working on the same subscription. The error could
not be a conflict error (e.g., connection error etc.) or might have
been cleared by the reset function, But given the worker is in an
error loop, the worker can eventually get xid in question. We can
prevent an unrelated transaction from being skipped unexpectedly. It
seems not a stable solution though. Or it might be enough to warn
users when they specified an XID that doesn’t match to last_error_xid.

I think the idea is good, but because it is not predictable, as you
pointed out, we might want to just issue a LOG/WARNING. If it is not
already mentioned, then please do mention in the docs the possibility of
skipping non-errored transactions.

Few comments/questions:
=====================
1.
+          Specifies the ID of the transaction whose application is to be skipped
+          by the logical replication worker. Setting -1 means to reset the
+          transaction ID.

Can we change it to something like: "Specifies the ID of the
transaction whose changes are to be skipped by the logical replication
worker. ...."

2.
@@ -104,6 +104,16 @@ GetSubscription(Oid subid, bool missing_ok)
Assert(!isnull);
sub->publications = textarray_to_stringlist(DatumGetArrayTypeP(datum));

+ /* Get skip XID */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subskipxid,
+ &isnull);
+ if (!isnull)
+ sub->skipxid = DatumGetTransactionId(datum);
+ else
+ sub->skipxid = InvalidTransactionId;

Can't we assign it as we do for other fixed columns like subdbid,
subowner, etc.?

3.
+ * Also, we don't skip receiving the changes in streaming cases, since we decide
+ * whether or not to skip applying the changes when starting to apply changes.

But why so? Can't we even skip streaming (and writing to file all such
messages)? If we can do this then we can avoid even collecting all
messages in a file.

4.
+ * Also, one might think that we can skip preparing the skipped transaction.
+ * But if we do that, PREPARE WAL record won’t be sent to its physical
+ * standbys, resulting in that users won’t be able to find the prepared
+ * transaction entry after a fail-over.
+ *
..
+ */
+ if (skipping_changes)
+ stop_skipping_changes(false);

Why do we need such a Prepare's entry either at current subscriber or
on its physical standby? I think it is to allow Commit-prepared. If
so, how about if we skip even commit prepared as well? Even on
physical standby, we would be having the value of skip_xid which can
help us to skip there as well after failover.

--
With Regards,
Amit Kapila.

#356Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#355)
Re: Skipping logical replication transactions on subscriber side

On Sat, Dec 11, 2021 at 3:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping already-prepared transaction
since such transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I’ve been thinking we can do something safeguard for the case where
the user specified the wrong xid. For example, can we somewhat use the
stats in pg_stat_subscription_workers? An idea is that logical
replication worker fetches the xid from the stats when reading the
subscription and skips the transaction if the xid matches to
subskipxid. That is, the worker checks the error reported by the
worker previously working on the same subscription. The error could
not be a conflict error (e.g., connection error etc.) or might have
been cleared by the reset function, But given the worker is in an
error loop, the worker can eventually get xid in question. We can
prevent an unrelated transaction from being skipped unexpectedly. It
seems not a stable solution though. Or it might be enough to warn
users when they specified an XID that doesn’t match to last_error_xid.

I think the idea is good but because it is not predictable as pointed
by you so we might want to just issue a LOG/WARNING. If not already
mentioned, then please do mention in docs the possibility of skipping
non-errored transactions.

Few comments/questions:
=====================
1.
+          Specifies the ID of the transaction whose application is to
be skipped
+          by the logical replication worker. Setting -1 means to reset the
+          transaction ID.

Can we change it to something like: "Specifies the ID of the
transaction whose changes are to be skipped by the logical replication
worker. ...."

Agreed.

2.
@@ -104,6 +104,16 @@ GetSubscription(Oid subid, bool missing_ok)
Assert(!isnull);
sub->publications = textarray_to_stringlist(DatumGetArrayTypeP(datum));

+ /* Get skip XID */
+ datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+ tup,
+ Anum_pg_subscription_subskipxid,
+ &isnull);
+ if (!isnull)
+ sub->skipxid = DatumGetTransactionId(datum);
+ else
+ sub->skipxid = InvalidTransactionId;

Can't we assign it as we do for other fixed columns like subdbid,
subowner, etc.?

Yeah, I think we can use InvalidTransactionId as the initial value
instead of setting NULL. Then, we can change this code.

3.
+ * Also, we don't skip receiving the changes in streaming cases,
since we decide
+ * whether or not to skip applying the changes when starting to apply changes.

But why so? Can't we even skip streaming (and writing to file all such
messages)? If we can do this then we can avoid even collecting all
messages in a file.

IIUC, in streaming cases a transaction can be sent to the subscriber
split into multiple chunks of changes. In the meantime, skip_xid can
be changed. If the user changes or clears skip_xid after the subscriber
has skipped some streamed changes, the subscriber won't be able to have
the complete changes of the transaction.

4.
+ * Also, one might think that we can skip preparing the skipped transaction.
+ * But if we do that, PREPARE WAL record won’t be sent to its physical
+ * standbys, resulting in that users won’t be able to find the prepared
+ * transaction entry after a fail-over.
+ *
..
+ */
+ if (skipping_changes)
+ stop_skipping_changes(false);

Why do we need such a Prepare's entry either at current subscriber or
on its physical standby? I think it is to allow Commit-prepared. If
so, how about if we skip even commit prepared as well? Even on
physical standby, we would be having the value of skip_xid which can
help us to skip there as well after failover.

It's true that skip_xid would also be set on the physical standby. When it
comes to preparing the skipped transaction on the current subscriber,
if we want to skip commit-prepared, I think we need protocol changes so
that subscribers know prepare_lsn and prepare_timestamp and can look
up the prepared transaction when doing commit-prepared. I proposed this
idea before. This change would be beneficial as of now since the
publisher sends even empty transactions. But considering the proposed
patch[1] that makes the publisher not send empty transactions, this
protocol change would be an optimization only for this feature.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#357Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#354)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 10, 2021 at 4:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I have some review comments:

(1) Patch comment - some suggested wording improvements

BEFORE:
If incoming change violates any constraint, logical replication stops
AFTER:
If an incoming change violates any constraint, logical replication stops

BEFORE:
The user can specify XID by ALTER SUBSCRIPTION ... SKIP (xid = XXX),
updating pg_subscription.subskipxid field, telling the apply worker to
skip the transaction.
AFTER:
The user can specify the XID of the transaction to skip using
ALTER SUBSCRIPTION ... SKIP (xid = XXX), updating the pg_subscription.subskipxid
field, telling the apply worker to skip the transaction.

doc/src/sgml/logical-replication.sgml
(2) Some suggested wording improvements

(i) Missing "the"
BEFORE:
+   the existing data.  When a conflict produce an error, it is shown in
AFTER:
+   the existing data.  When a conflict produce an error, it is shown in the
(ii) Suggest starting a new sentence
BEFORE:
+   and it is also shown in subscriber's server log as follows:
AFTER:
+   The error is also shown in the subscriber's server log as follows:
(iii) Context message should say "at ..." instead of "with commit
timestamp ...", to match the actual output from the current code
BEFORE:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 with commit timestamp 2021-09-29 15:52:45.165754+00
AFTER:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 at 2021-09-29 15:52:45.165754+00

(iv) The following paragraph seems out of place, with the information
presented in the wrong order:

+  <para>
+   In this case, you need to consider changing the data on the subscriber so that it
+   doesn't conflict with incoming changes, or dropping the conflicting constraint or
+   unique index, or writing a trigger on the subscriber to suppress or redirect
+   conflicting incoming changes, or as a last resort, by skipping the whole transaction.
+   They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.
+  </para>

How about rearranging it as follows:

+  <para>
+   These methods skip the whole transaction, including changes that may not violate
+   any constraint. They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin, and should
+   be used as a last resort.
+   Alternatively, you might consider changing the data on the subscriber so that it
+   doesn't conflict with incoming changes, or dropping the conflicting constraint or
+   unique index, or writing a trigger on the subscriber to suppress or redirect
+   conflicting incoming changes.
+  </para>

doc/src/sgml/ref/alter_subscription.sgml
(3)

(i) Doc needs clarification
BEFORE:
+      the whole transaction.  The logical replication worker skips all data
AFTER:
+      the whole transaction.  For the latter case, the logical
replication worker skips all data

(ii) "Setting -1 means to reset the transaction ID"

Shouldn't it be explained what resetting actually does and when it can
be, or is needed to be, done? Isn't it automatically reset?
I notice that negative values (other than -1) seem to be regarded as
valid - is that right?
Also, what happens if this option is set multiple times? Does it just
override and use the latest setting? (other option handling errors out
with errorConflictingDefElem()).
e.g. alter subscription sub skip (xid = 721, xid = 722);
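
To make the questions above concrete, here is a sketch of one possible stricter option parser: it rejects a repeated xid option and any negative value other than -1. It is an illustration only and says nothing about how the patch actually behaves; `Option`, `ParseResult`, and `parse_skip_options` are hypothetical names, and mapping -1 to 0 as the "reset" value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value pair, as a parser might hand it over. */
typedef struct Option
{
    const char *name;
    const char *value;
} Option;

typedef enum ParseResult
{
    PARSE_OK,
    PARSE_MISSING,      /* no xid option given */
    PARSE_UNKNOWN,      /* unrecognized option name */
    PARSE_DUPLICATE,    /* xid specified more than once */
    PARSE_BAD_VALUE     /* not a number, or out of range */
} ParseResult;

/* On PARSE_OK, *xid_out holds the XID, or 0 when the user asked for a reset. */
ParseResult
parse_skip_options(const Option *opts, int nopts, uint32_t *xid_out)
{
    int      seen_xid = 0;
    uint32_t xid = 0;

    for (int i = 0; i < nopts; i++)
    {
        char      *end;
        long long  v;

        if (strcmp(opts[i].name, "xid") != 0)
            return PARSE_UNKNOWN;
        if (seen_xid)
            return PARSE_DUPLICATE;     /* error out instead of overriding */
        seen_xid = 1;

        errno = 0;
        v = strtoll(opts[i].value, &end, 10);
        if (errno != 0 || end == opts[i].value || *end != '\0')
            return PARSE_BAD_VALUE;
        if (v == -1)
            xid = 0;                    /* -1 means reset in this sketch */
        else if (v < 1 || v > UINT32_MAX)
            return PARSE_BAD_VALUE;     /* rejects other negative values */
        else
            xid = (uint32_t) v;
    }

    if (!seen_xid)
        return PARSE_MISSING;
    *xid_out = xid;                     /* written only on success */
    return PARSE_OK;
}
```

With this shape, the example statement with `xid = 721, xid = 722` would be reported as a duplicate rather than silently using the last value.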

src/backend/replication/logical/worker.c
(4) Shouldn't the "done skipping logical replication transaction"
message also include the skipped XID value at the end?

src/test/subscription/t/027_skip_xact.pl
(5) Some suggested wording improvements

(i)
BEFORE:
+# Test skipping the transaction. This function must be called after the caller
+# inserting data that conflict with the subscriber.  After waiting for the
+# subscription worker stats are updated, we skip the transaction in question
+# by ALTER SUBSCRIPTION ... SKIP. Then, check if logical replication
can continue
+# working by inserting $nonconflict_data on the publisher.
AFTER:
+# Test skipping the transaction. This function must be called after the caller
+# inserts data that conflicts with the subscriber.  After waiting for the
+# subscription worker stats to be updated, we skip the transaction in question
+# by ALTER SUBSCRIPTION ... SKIP. Then, check if logical replication
can continue
+# working by inserting $nonconflict_data on the publisher.
(ii)
BEFORE:
+# will conflict with the data replicated from publisher later.
AFTER:
+# will conflict with the data replicated later from the publisher.

Regards,
Greg Nancarrow
Fujitsu Australia

#358Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#356)
Re: Skipping logical replication transactions on subscriber side

On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Dec 11, 2021 at 3:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

3.
+ * Also, we don't skip receiving the changes in streaming cases,
since we decide
+ * whether or not to skip applying the changes when starting to apply changes.

But why so? Can't we even skip streaming (and writing to file all such
messages)? If we can do this then we can avoid even collecting all
messages in a file.

IIUC in streaming cases, a transaction can be sent to the subscriber
while splitting into multiple chunks of changes. In the meanwhile,
skip_xid can be changed. If the user changed or cleared skip_xid after
the subscriber skips some streamed changes, the subscriber won't be able
to have complete changes of the transaction.

Yeah, I think that if we want, we can handle this by writing into the
stream xid file whether the changes need to be skipped, so that
consecutive streams can check that in the file, or maybe by somehow not
allowing skip_xid to be changed in the worker if it is already skipping
some xact. If we don't want to do anything for this, then it is better
to at least reflect this reasoning in the comments.
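
The first option above (recording the decision once per streamed transaction) can be modeled as follows. This is a sketch under assumptions, not the patch's code: `StreamState` and `stream_chunk_should_skip` are hypothetical names, and an in-memory struct stands in for whatever would be written to the stream xid file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical per-transaction record kept alongside the streamed changes. */
typedef struct StreamState
{
    Xid  xid;       /* remote XID of the streamed transaction */
    bool decided;   /* has the skip decision been recorded yet? */
    bool skip;      /* the recorded decision */
} StreamState;

/*
 * Decide on the first chunk whether the transaction is skipped, and reuse
 * that recorded decision for every later chunk, regardless of what skip_xid
 * has been changed to in the meantime.
 */
bool
stream_chunk_should_skip(StreamState *s, Xid current_skip_xid)
{
    if (!s->decided)
    {
        s->skip = (current_skip_xid != INVALID_XID &&
                   current_skip_xid == s->xid);
        s->decided = true;
    }
    return s->skip;
}
```

Because later chunks never consult the live setting, the subscriber either skips all chunks of the transaction or none of them.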

4.
+ * Also, one might think that we can skip preparing the skipped transaction.
+ * But if we do that, PREPARE WAL record won’t be sent to its physical
+ * standbys, resulting in that users won’t be able to find the prepared
+ * transaction entry after a fail-over.
+ *
..
+ */
+ if (skipping_changes)
+ stop_skipping_changes(false);

Why do we need such a Prepare's entry either at current subscriber or
on its physical standby? I think it is to allow Commit-prepared. If
so, how about if we skip even commit prepared as well? Even on
physical standby, we would be having the value of skip_xid which can
help us to skip there as well after failover.

It's true that skip_xid would be set also on physical standby. When it
comes to preparing the skipped transaction on the current subscriber,
if we want to skip commit-prepared I think we need protocol changes in
order for subscribers to know prepare_lsn and prepare_timestamp so
that it can lookup the prepared transaction when doing
commit-prepared. I proposed this idea before. This change would be
beneficial as of now since the publisher sends even empty transactions.
But considering the proposed patch[1] that makes the publisher not
send empty transaction, this protocol change would be an optimization
only for this feature.

I was thinking of comparing the xid received as part of the
commit_prepared message with the value of skip_xid to skip the
commit_prepared, but I guess the user could change it between prepare
and commit prepared, and then we won't be able to detect it, right? I
think we can handle this and the streaming case if we disallow users
from changing the value of skip_xid when we are already skipping changes,
or don't let the new skip_xid take effect in the apply worker if we are
already skipping some other transaction. What do you think?

--
With Regards,
Amit Kapila.

#359Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#358)
Re: Skipping logical replication transactions on subscriber side

On Mon, Dec 13, 2021 at 1:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Dec 11, 2021 at 3:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

3.
+ * Also, we don't skip receiving the changes in streaming cases,
since we decide
+ * whether or not to skip applying the changes when starting to apply changes.

But why so? Can't we even skip streaming (and writing to file all such
messages)? If we can do this then we can avoid even collecting all
messages in a file.

IIUC in streaming cases, a transaction can be sent to the subscriber
while splitting into multiple chunks of changes. In the meanwhile,
skip_xid can be changed. If the user changed or cleared skip_xid after
the subscriber skips some streamed changes, the subscriber won't be able
to have complete changes of the transaction.

Yeah, I think if we want we can handle this by writing into the stream
xid file whether the changes need to be skipped and then the
consecutive streams can check that in the file or may be in some way
don't allow skip_xid to be changed in worker if it is already skipping
some xact. If we don't want to do anything for this then it is better
to at least reflect this reasoning in the comments.

Yes. Given that we still need to apply messages other than
data-modification messages, we need to skip writing only these changes
to the stream file.

4.
+ * Also, one might think that we can skip preparing the skipped transaction.
+ * But if we do that, PREPARE WAL record won’t be sent to its physical
+ * standbys, resulting in that users won’t be able to find the prepared
+ * transaction entry after a fail-over.
+ *
..
+ */
+ if (skipping_changes)
+ stop_skipping_changes(false);

Why do we need such a Prepare's entry either at current subscriber or
on its physical standby? I think it is to allow Commit-prepared. If
so, how about if we skip even commit prepared as well? Even on
physical standby, we would be having the value of skip_xid which can
help us to skip there as well after failover.

It's true that skip_xid would be set also on physical standby. When it
comes to preparing the skipped transaction on the current subscriber,
if we want to skip commit-prepared I think we need protocol changes in
order for subscribers to know prepare_lsn and prepare_timestamp so
that it can lookup the prepared transaction when doing
commit-prepared. I proposed this idea before. This change would be
beneficial as of now since the publisher sends even empty transactions.
But considering the proposed patch[1] that makes the publisher not
send empty transaction, this protocol change would be an optimization
only for this feature.

I was thinking to compare the xid received as part of the
commit_prepared message with the value of skip_xid to skip the
commit_prepared but I guess the user would change it between prepare
and commit prepare and then we won't be able to detect it, right? I
think we can handle this and the streaming case if we disallow users
to change the value of skip_xid when we are already skipping changes
or don't let the new skip_xid to reflect in the apply worker if we are
already skipping some other transaction. What do you think?

In streaming cases, we don’t know when stream-commit or stream-abort
will arrive, and another conflict could occur on the subscription in the
meantime. But given that (we expect) this feature is used after the
apply worker enters an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared will arrive, and
another conflict could occur in the meantime. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow changing skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip a second
transaction that conflicts with data on the subscriber.

From the application's perspective, which behavior is preferable in the
first place: skipping the prepare of a transaction, or preparing an empty
transaction? In terms of resource consumption and the like, skipping
the prepare seems better. On the other hand, if we skipped
preparing the transaction, the application would not be able to find
the prepared transaction after a fail-over to the subscriber.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#360Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#359)
Re: Skipping logical replication transactions on subscriber side

On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 13, 2021 at 1:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

4.
+ * Also, one might think that we can skip preparing the skipped transaction.
+ * But if we do that, PREPARE WAL record won’t be sent to its physical
+ * standbys, resulting in that users won’t be able to find the prepared
+ * transaction entry after a fail-over.
+ *
..
+ */
+ if (skipping_changes)
+ stop_skipping_changes(false);

Why do we need such a Prepare's entry either at current subscriber or
on its physical standby? I think it is to allow Commit-prepared. If
so, how about if we skip even commit prepared as well? Even on
physical standby, we would be having the value of skip_xid which can
help us to skip there as well after failover.

It's true that skip_xid would be set also on physical standby. When it
comes to preparing the skipped transaction on the current subscriber,
if we want to skip commit-prepared I think we need protocol changes in
order for subscribers to know prepare_lsn and prepare_timestamp so
that it can lookup the prepared transaction when doing
commit-prepared. I proposed this idea before. This change would be
beneficial as of now since the publisher sends even empty transactions.
But considering the proposed patch[1] that makes the publisher not
send empty transaction, this protocol change would be an optimization
only for this feature.

I was thinking to compare the xid received as part of the
commit_prepared message with the value of skip_xid to skip the
commit_prepared but I guess the user would change it between prepare
and commit prepare and then we won't be able to detect it, right? I
think we can handle this and the streaming case if we disallow users
to change the value of skip_xid when we are already skipping changes
or don't let the new skip_xid to reflect in the apply worker if we are
already skipping some other transaction. What do you think?

In streaming cases, we don’t know when stream-commit or stream-abort
comes and another conflict could occur on the subscription in the
meanwhile. But given that (we expect) this feature is used after the
apply worker enters into an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared comes and
another conflict could occur in the meanwhile. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow changing skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip the second
transaction that conflicts with data on the subscriber.

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

From the application perspective, which behavior is preferable between
skipping preparing a transaction and preparing an empty transaction,
in the first place? From the resource consumption etc., skipping
preparing transactions seems better. On the other hand, if we skipped
preparing the transaction, the application would not be able to find
the prepared transaction after a fail-over to the subscriber.

I am not sure how much it matters that such prepares are not present
because we wanted to somehow skip the corresponding commit prepared
as well. I think your previous point is a good enough reason as to why
we should allow such prepares.

--
With Regards,
Amit Kapila.

#361vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#354)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies the XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping an already-prepared transaction
since such a transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I’ve been thinking we can add some safeguard for the case where
the user specified the wrong xid. For example, can we somehow use the
stats in pg_stat_subscription_workers? An idea is that the logical
replication worker fetches the xid from the stats when reading the
subscription and skips the transaction if the xid matches
subskipxid. That is, the worker checks the error reported by the
worker previously working on the same subscription. The error might
not be a conflict error (e.g., connection error etc.) or might have
been cleared by the reset function. But given the worker is in an
error loop, the worker can eventually get the xid in question. We can
prevent an unrelated transaction from being skipped unexpectedly. It
seems not a stable solution though. Or it might be enough to warn
users when they specified an XID that doesn’t match last_error_xid.
Anyway, I think it’s better to have more discussion on this. Any
ideas?

If the user specifies another transaction to skip while the worker is
still skipping the one specified earlier, the new value will be reset
by the worker when it clears the skip xid. I felt that once the worker
has identified the skip xid and is about to skip it, the worker can
acquire a lock to prevent concurrency issues:
+static void
+clear_subscription_skip_xid(void)
+{
+       Relation        rel;
+       HeapTuple       tup;
+       bool            nulls[Natts_pg_subscription];
+       bool            replaces[Natts_pg_subscription];
+       Datum           values[Natts_pg_subscription];
+
+       memset(values, 0, sizeof(values));
+       memset(nulls, false, sizeof(nulls));
+       memset(replaces, false, sizeof(replaces));
+
+       if (!IsTransactionState())
+               StartTransactionCommand();
+
+       rel = table_open(SubscriptionRelationId, RowExclusiveLock);
+
+       /* Fetch the existing tuple. */
+       tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,
+                                 ObjectIdGetDatum(MySubscription->oid));
+
+       if (!HeapTupleIsValid(tup))
+               elog(ERROR, "subscription \"%s\" does not exist",
+                    MySubscription->name);
+
+       /* Set subskipxid to null */
+       nulls[Anum_pg_subscription_subskipxid - 1] = true;
+       replaces[Anum_pg_subscription_subskipxid - 1] = true;
+
+       /* Update the system catalog to reset the skip XID */
+       tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+                                                       replaces);
+       CatalogTupleUpdate(rel, &tup->t_self, tup);
+
+       heap_freetuple(tup);
+       table_close(rel, RowExclusiveLock);
+}
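
The lost update described above can be illustrated with a toy model outside PostgreSQL. This is only a sketch under stated assumptions: ToySubscription, toy_clear_unconditional and toy_clear_if_unchanged are hypothetical names, not part of the patch. It shows why an unconditional reset discards a value the user set in the meantime, and that comparing against the XID that was actually skipped is one alternative. In a real system the comparison and the reset would still have to happen atomically, which is what a lock would provide.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

/* Hypothetical stand-in for the subskipxid column of one subscription. */
typedef struct ToySubscription
{
    ToyXid      skip_xid;
} ToySubscription;

/* In effect what the quoted function does: reset whatever is stored. */
void
toy_clear_unconditional(ToySubscription *sub)
{
    sub->skip_xid = TOY_INVALID_XID;
}

/*
 * Alternative (an assumption, not the patch): reset only if the stored
 * value is still the XID that was skipped. Returns true if it was cleared.
 */
bool
toy_clear_if_unchanged(ToySubscription *sub, ToyXid skipped)
{
    if (sub->skip_xid != skipped)
        return false;           /* the user set a different XID meanwhile */

    sub->skip_xid = TOY_INVALID_XID;
    return true;
}
```

With the unconditional variant, an XID set by the user while the worker was skipping another transaction is silently dropped; with the conditional variant it survives.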

Regards,
Vignesh

#362Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: vignesh C (#361)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 3:23 PM vignesh C <vignesh21@gmail.com> wrote:

If the user specifies another transaction to skip while the worker is
still skipping the one specified earlier, the new value will be reset
by the worker when it clears the skip xid. I felt that once the worker
has identified the skip xid and is about to skip it, the worker can
acquire a lock to prevent concurrency issues:

That's a good point.
If only the last_error_xid could be skipped, then this wouldn't be an
issue, right?
If a different xid to skip is specified while the worker is currently
skipping a transaction, should that even be allowed?

Regards,
Greg Nancarrow
Fujitsu Australia

#363Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Masahiko Sawada (#340)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Skipping a whole transaction by specifying xid would be a good start.
Ideally, we'd like to automatically skip only operations within the
transaction that fail but it seems not easy to achieve. If we allow
specifying operations and/or relations, probably multiple operations
or relations need to be specified in some cases. Otherwise, the
subscriber cannot continue logical replication if the transaction has
multiple operations on different relations that fail. But similar to
the idea of specifying multiple xids, we need to note the fact that
user wouldn't know of the second operation failure unless the apply
worker applies the change. So I'm not sure there are many use cases in
practice where users can specify multiple operations and relations in
order to skip applies that fail.

I think there would be use cases for specifying the relations or
operation, e.g. if the user finds an issue in inserting in a
particular relation then maybe based on some manual investigation he
finds that the table has some constraint due to which it is failing on
the subscriber side but on the publisher side that constraint is not
there so maybe the user is okay to skip the changes for this table and
not for other tables, or there might be a few more tables which are
designed based on the same principle and can have similar error so
isn't it good to provide an option to give the list of all such
tables.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#364Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#360)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In streaming cases, we don’t know when stream-commit or stream-abort
comes and another conflict could occur on the subscription in the
meanwhile. But given that (we expect) this feature is used after the
apply worker enters into an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared comes and
another conflict could occur in the meanwhile. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow changing skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip the second
transaction that conflicts with data on the subscriber.

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions,
users can decide whether they want to skip or not only once we start
applying, no? I mean only once we start applying the changes we can
get some errors and by that time we must be having all the changes for
the transaction. So I do not understand the point we are trying to
discuss here?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#365Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#364)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 1:07 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Dec 14, 2021 at 8:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In streaming cases, we don’t know when stream-commit or stream-abort
comes and another conflict could occur on the subscription in the
meanwhile. But given that (we expect) this feature is used after the
apply worker enters into an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared comes and
another conflict could occur in the meanwhile. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow changing skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip the second
transaction that conflicts with data on the subscriber.

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions,
users can decide whether they want to skip or not only once we start
applying, no? I mean only once we start applying the changes we can
get some errors and by that time we must be having all the changes for
the transaction.

That is right and as per my understanding, the patch is trying to
accomplish the same.

So I do not understand the point we are trying to
discuss here?

The point is whether we can skip the changes while streaming
itself like when we get the changes and write to a stream file. Now,
it is possible that streams from multiple transactions can be
interleaved and users can change the skip_xid in between. It is not
that we can't handle this but that would require a more complex design
and it doesn't seem worth it because we can anyway skip the changes
while applying as you mentioned in the previous paragraph.

--
With Regards,
Amit Kapila.

#366vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#354)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I am thinking that we can start a transaction, update the catalog,
commit that transaction. Then start a new one to update
origin_lsn/timestamp, finishprepared, and commit it. Now, if it
crashes after the first transaction, only commit prepared will be
resent again and this time we don't need to update the catalog as that
entry would be already cleared.

Sounds good. In the crash case, it should be fine since we will just
commit an empty transaction. The same is true for the case where
skip_xid has been changed after skipping and preparing the transaction
and before handling commit_prepared.

Regarding the case where the user specifies the XID of the transaction
after it is prepared on the subscriber (i.e., the transaction is not
empty), we won’t skip committing the prepared transaction. But I think
that we don't need to support skipping an already-prepared transaction
since such a transaction doesn't conflict with anything regardless of
having changed or not.

Yeah, this makes sense to me.

I've attached an updated patch. The new syntax is like "ALTER
SUBSCRIPTION testsub SKIP (xid = '123')".

I’ve been thinking we can add some safeguard for the case where
the user specified the wrong xid. For example, can we somehow use the
stats in pg_stat_subscription_workers? An idea is that the logical
replication worker fetches the xid from the stats when reading the
subscription and skips the transaction if the xid matches
subskipxid. That is, the worker checks the error reported by the
worker previously working on the same subscription. The error might
not be a conflict error (e.g., connection error etc.) or might have
been cleared by the reset function. But given the worker is in an
error loop, the worker can eventually get the xid in question. We can
prevent an unrelated transaction from being skipped unexpectedly. It
seems not a stable solution though. Or it might be enough to warn
users when they specified an XID that doesn’t match last_error_xid.
Anyway, I think it’s better to have more discussion on this. Any
ideas?

Few comments:
1) Should we check if conflicting option is specified like others above:
+               else if (strcmp(defel->defname, "xid") == 0)
+               {
+                       char *xid_str = defGetString(defel);
+                       TransactionId xid;
+
+                       if (strcmp(xid_str, "-1") == 0)
+                       {
+                               /* Setting -1 to xid means to reset it */
+                               xid = InvalidTransactionId;
+                       }
+                       else
+                       {
2) Currently only superusers can set skip xid, we can add this in the
documentation:
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to set %s", "skip_xid")));
3) There is an extra tab before "The resolution can be done ...", it
can be removed.
+      Skip applying changes of the particular transaction.  If incoming data
+      violates any constraints the logical replication will stop until it is
+      resolved. The resolution can be done either by changing data on the
+      subscriber so that it doesn't conflict with incoming change or by skipping
+      the whole transaction.  The logical replication worker skips all data
4) xid with -2 is currently allowed; maybe that is OK. If it is fine, we
can remove it from the fail section.
+-- fail
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1.1);
+ERROR:  invalid transaction id: 1.1
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = -2);
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 0);
+ERROR:  invalid transaction id: 0
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1);

Regards,
Vignesh

#367Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#365)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions,
users can decide whether they want to skip or not only once we start
applying, no? I mean only once we start applying the changes we can
get some errors and by that time we must be having all the changes for
the transaction.

That is right and as per my understanding, the patch is trying to
accomplish the same.

So I do not understand the point we are trying to
discuss here?

The point is whether we can skip the changes while streaming
itself like when we get the changes and write to a stream file. Now,
it is possible that streams from multiple transactions can be
interleaved and users can change the skip_xid in between. It is not
that we can't handle this but that would require a more complex design
and it doesn't seem worth it because we can anyway skip the changes
while applying as you mentioned in the previous paragraph.

Actually, I was trying to understand the use case for skipping while
streaming. During streaming we are not doing any database operation,
which means it will not generate any error. So IIUC, there
is no use case for skipping while streaming itself? Is there any use
case which I am not aware of?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#368Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#367)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 3:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions,
users can decide whether they want to skip or not only once we start
applying, no? I mean only once we start applying the changes we can
get some errors and by that time we must be having all the changes for
the transaction.

That is right and as per my understanding, the patch is trying to
accomplish the same.

So I do not understand the point we are trying to
discuss here?

The point is whether we can skip the changes while streaming
itself like when we get the changes and write to a stream file. Now,
it is possible that streams from multiple transactions can be
interleaved and users can change the skip_xid in between. It is not
that we can't handle this but that would require a more complex design
and it doesn't seem worth it because we can anyway skip the changes
while applying as you mentioned in the previous paragraph.

Actually, I was trying to understand the use case for skipping while
streaming. During streaming we are not doing any database operation,
which means it will not generate any error.

Say, there is an error the first time when we start to apply changes
for such a transaction. So, such a transaction will be streamed again.
Say, the user has set the skip_xid before we stream a second time, so
this time, we can skip it either during the stream phase or apply
phase. I think the patch is skipping it during apply phase.
Sawada-San, please confirm if my understanding is correct?

--
With Regards,
Amit Kapila.

#369Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#368)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 8:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 14, 2021 at 3:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Dec 14, 2021 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions,
users can decide whether they want to skip or not only once we start
applying, no? I mean only once we start applying the changes we can
get some errors and by that time we must be having all the changes for
the transaction.

That is right and as per my understanding, the patch is trying to
accomplish the same.

So I do not understand the point we are trying to
discuss here?

The point is whether we can skip the changes while streaming
itself like when we get the changes and write to a stream file. Now,
it is possible that streams from multiple transactions can be
interleaved and users can change the skip_xid in between. It is not
that we can't handle this but that would require a more complex design
and it doesn't seem worth it because we can anyway skip the changes
while applying as you mentioned in the previous paragraph.

Actually, I was trying to understand the use case for skipping while
streaming. During streaming we are not doing any database operation,
which means it will not generate any error.

Say, there is an error the first time when we start to apply changes
for such a transaction. So, such a transaction will be streamed again.
Say, the user has set the skip_xid before we stream a second time, so
this time, we can skip it either during the stream phase or apply
phase. I think the patch is skipping it during apply phase.
Sawada-San, please confirm if my understanding is correct?

My understanding is the same. The patch doesn't skip the streaming
phase but starts skipping when starting to apply changes. That is, we
receive streamed changes and write them to the stream file anyway
regardless of skip_xid. When receiving the stream-commit message, we
check whether or not we skip this transaction, and if so we process all
messages in the stream file except the data modification messages.
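
The behavior described above can be sketched with a small standalone model. It is not the patch's code: the message kinds, toy_is_data_modification and toy_replay_stream are hypothetical, and which message types count as "data modification" is an assumption made for illustration. The point is only that changes are spooled regardless of skip_xid, and the skip decision is made once, when the spooled transaction is replayed at stream-commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message kinds; not the real logical replication protocol. */
typedef enum ToyMsgType
{
    TOY_MSG_RELATION,           /* metadata, always processed */
    TOY_MSG_INSERT,
    TOY_MSG_UPDATE,
    TOY_MSG_DELETE
} ToyMsgType;

/* Assumed classification of data modification messages. */
bool
toy_is_data_modification(ToyMsgType type)
{
    return type == TOY_MSG_INSERT ||
           type == TOY_MSG_UPDATE ||
           type == TOY_MSG_DELETE;
}

/*
 * Replay a spooled transaction at stream-commit time. The transaction's
 * XID is compared with skip_xid once (0 means "nothing to skip").
 * Returns the number of data modification messages actually applied.
 */
size_t
toy_replay_stream(const ToyMsgType *msgs, size_t nmsgs,
                  uint32_t xid, uint32_t skip_xid)
{
    bool        skipping = (skip_xid != 0 && xid == skip_xid);
    size_t      applied = 0;

    for (size_t i = 0; i < nmsgs; i++)
    {
        if (!toy_is_data_modification(msgs[i]))
            continue;           /* metadata is handled either way */
        if (skipping)
            continue;           /* drop the change, keep reading the file */
        applied++;              /* stand-in for applying the change */
    }
    return applied;
}
```

Replaying the same spooled messages with a matching skip_xid applies no changes, while any other skip_xid value leaves the transaction untouched.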

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#370Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#362)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Dec 14, 2021 at 3:23 PM vignesh C <vignesh21@gmail.com> wrote:

If the user specifies another transaction to skip while the worker is
still skipping the one specified earlier, the new value will be reset
by the worker when it clears the skip xid. I felt that once the worker
has identified the skip xid and is about to skip it, the worker can
acquire a lock to prevent concurrency issues:

That's a good point.
If only the last_error_xid could be skipped, then this wouldn't be an
issue, right?
If a different xid to skip is specified while the worker is currently
skipping a transaction, should that even be allowed?

We don't expect such usage but yes, it could happen and seems not
good. I thought we can acquire Share lock on pg_subscription during
the skip but not sure it's a good idea. It would be better if we can
find a way to allow users to specify only XID that has failed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#371Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#368)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Actually, I was trying to understand the use case for skipping while
streaming. During streaming we are not doing any database operation,
which means it will not generate any error.

Say, there is an error the first time when we start to apply changes
for such a transaction. So, such a transaction will be streamed again.
Say, the user has set the skip_xid before we stream a second time, so
this time, we can skip it either during the stream phase or apply
phase. I think the patch is skipping it during apply phase.
Sawada-San, please confirm if my understanding is correct?

Got it, thanks for clarifying.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#372Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#370)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 15, 2021 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Dec 14, 2021 at 3:23 PM vignesh C <vignesh21@gmail.com> wrote:

If the user specifies another transaction to skip while the worker is
still skipping the one specified earlier, the new value will be reset
by the worker when it clears the skip xid. I felt that once the worker
has identified the skip xid and is about to skip it, the worker can
acquire a lock to prevent concurrency issues:

That's a good point.
If only the last_error_xid could be skipped, then this wouldn't be an
issue, right?
If a different xid to skip is specified while the worker is currently
skipping a transaction, should that even be allowed?

We don't expect such usage but yes, it could happen and seems not
good. I thought we can acquire Share lock on pg_subscription during
the skip but not sure it's a good idea. It would be better if we can
find a way to allow users to specify only XID that has failed.

Yeah, but as we don't have a definite way to allow specifying only
failed XID, I think it is better to use share lock on that particular
subscription. We are already using it for add/update rel state (see,
AddSubscriptionRelState, UpdateSubscriptionRelState), so this will be
another place to use a similar technique.

--
With Regards,
Amit Kapila.

#373Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#363)
Re: Skipping logical replication transactions on subscriber side

On Tue, Dec 14, 2021 at 11:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Skipping a whole transaction by specifying xid would be a good start.
Ideally, we'd like to automatically skip only operations within the
transaction that fail but it seems not easy to achieve. If we allow
specifying operations and/or relations, probably multiple operations
or relations need to be specified in some cases. Otherwise, the
subscriber cannot continue logical replication if the transaction has
multiple operations on different relations that fail. But similar to
the idea of specifying multiple xids, we need to note the fact that
user wouldn't know of the second operation failure unless the apply
worker applies the change. So I'm not sure there are many use cases in
practice where users can specify multiple operations and relations in
order to skip applies that fail.

I think there would be use cases for specifying the relations or
operation, e.g. if the user finds an issue in inserting in a
particular relation then maybe based on some manual investigation he
finds that the table has some constraint due to which it is failing on
the subscriber side but on the publisher side that constraint is not
there so maybe the user is okay to skip the changes for this table and
not for other tables, or there might be a few more tables which are
designed based on the same principle and can have similar error so
isn't it good to provide an option to give the list of all such
tables.

That's right and I agree there could be some use case for it and even
specifying the operation but I think we can always extend the existing
feature for it if the need arises. Note that the user can anyway only
specify a single relation or an operation because there is a way to
know only one error and till that is resolved, the apply process won't
proceed. We have discussed providing these additional options in this
thread but thought of doing it later once we have the base feature and
based on the feedback from users.

--
With Regards,
Amit Kapila.

#374Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#373)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 15, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 14, 2021 at 11:40 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

That's right and I agree there could be some use case for it and even
specifying the operation but I think we can always extend the existing
feature for it if the need arises. Note that the user can anyway only
specify a single relation or an operation because there is a way to
know only one error and till that is resolved, the apply process won't
proceed. We have discussed providing these additional options in this
thread but thought of doing it later once we have the base feature and
based on the feedback from users.

Yeah, I only wanted to make the point that this could be useful, it
seems we are on the same page. I agree we can extend it in the future
as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#375Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#370)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 15, 2021 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

We don't expect such usage but yes, it could happen and seems not
good. I thought we can acquire Share lock on pg_subscription during
the skip but not sure it's a good idea. It would be better if we can
find a way to allow users to specify only XID that has failed.

Yes, I agree that would be better.
If you didn't do that, I think you'd need to queue the XIDs to be
skipped (rather than locking).

Regards,
Greg Nancarrow
Fujitsu Australia

#376Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#372)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 15, 2021 at 1:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 15, 2021 at 8:19 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 14, 2021 at 2:35 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Dec 14, 2021 at 3:23 PM vignesh C <vignesh21@gmail.com> wrote:

If the user specifies another transaction to skip while the worker is
still skipping the one specified earlier, the new value will be reset
by the worker when it clears the skip xid. I felt that once the worker
has identified the skip xid and is about to skip it, the worker can
acquire a lock to prevent concurrency issues:

That's a good point.
If only the last_error_xid could be skipped, then this wouldn't be an
issue, right?
If a different xid to skip is specified while the worker is currently
skipping a transaction, should that even be allowed?

We don't expect such usage but yes, it could happen and seems not
good. I thought we can acquire Share lock on pg_subscription during
the skip but not sure it's a good idea. It would be better if we can
find a way to allow users to specify only XID that has failed.

Yeah, but as we don't have a definite way to allow specifying only
failed XID, I think it is better to use share lock on that particular
subscription. We are already using it for add/update rel state (see,
AddSubscriptionRelState, UpdateSubscriptionRelState), so this will be
another place to use a similar technique.

Yes, but it seems to mean that we disallow users to change skip_xid
while the apply worker is skipping changes so we will end up having
the same problem we discussed so far;

In the current patch, we don't clear skip_xid at prepare time but do
that at commit-prepared time. But we cannot keep holding the lock until
the commit-prepared message arrives because we don’t know when it will
arrive, and another conflict could occur before then. Clearing skip_xid
only at prepare time is not enough either, because it doesn’t solve the
concurrency problem at commit-prepared time. So if my understanding is
correct, we need to both clear skip_xid and release the lock at prepare
time, and commit the prepared (empty) transaction at commit-prepared
time (I assume that we prepare even empty transactions).

Suppose that at prepare time we clear skip_xid (and release the lock)
and then prepare the transaction. If the server crashes right after
clearing skip_xid, skip_xid is already cleared but the transaction
will be sent again, so the user has to specify skip_xid again. So let’s
change the order: we prepare the transaction and then clear skip_xid.
But if the server crashes between the two steps, the transaction won’t
be sent again while skip_xid is left set, and the user has to clear it.
The leftover skip_xid can be cleared automatically at commit-prepared
time if the XID in the commit-prepared message matches skip_xid, but
this doesn’t actually solve the concurrency problem: if the user
changed skip_xid before commit-prepared, we would end up clearing that
value. So we might want to hold the lock until we clear skip_xid, but
we want to avoid that, as I explained at the beginning. It seems like
we have entered a loop.
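
To make the two orderings concrete, here is a minimal, self-contained C sketch of the crash window at prepare time. It is only an illustration and not code from the patch: the State struct, the Order enum, and the three functions are hypothetical names, and the model assumes a crash can happen only between the two steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what survives on the subscriber after prepare. */
typedef struct State
{
    bool skip_xid_set;  /* skip_xid is still set in the catalog */
    bool prepared;      /* the (empty) transaction has been prepared */
} State;

typedef enum Order
{
    CLEAR_THEN_PREPARE,
    PREPARE_THEN_CLEAR
} Order;

/*
 * Run the two prepare-time steps in the given order. If crash_between is
 * true, the server "crashes" after the first step and the second step
 * never happens.
 */
static State
run_prepare(Order order, bool crash_between)
{
    State s = { true, false };  /* skip_xid set, nothing prepared yet */

    assert(order == CLEAR_THEN_PREPARE || order == PREPARE_THEN_CLEAR);

    if (order == CLEAR_THEN_PREPARE)
    {
        s.skip_xid_set = false;
        if (crash_between)
            return s;
        s.prepared = true;
    }
    else
    {
        s.prepared = true;
        if (crash_between)
            return s;
        s.skip_xid_set = false;
    }
    return s;
}

/* The transaction will be resent, but skip_xid is gone. */
static bool
user_must_set_skip_xid_again(State s)
{
    return !s.prepared && !s.skip_xid_set;
}

/* The transaction will not be resent, but skip_xid is still set. */
static bool
stale_skip_xid_left(State s)
{
    return s.prepared && s.skip_xid_set;
}
```

In this model, a crash under CLEAR_THEN_PREPARE leaves the user having to set skip_xid again, while a crash under PREPARE_THEN_CLEAR leaves a stale skip_xid behind. These are the two failure cases described above.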

Among these ideas, clearing skip_xid and then preparing the
transaction sounds better. Or we might want to revisit the idea of
storing skip_xid in shmem (e.g., ReplicationState) instead of the
catalog.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#377Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#376)
Re: Skipping logical replication transactions on subscriber side

On Wed, Dec 15, 2021 at 8:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, but it seems to mean that we disallow users to change skip_xid
while the apply worker is skipping changes so we will end up having
the same problem we discussed so far;

I thought we just want to take the lock before clearing skip_xid,
something like: take the lock, check whether the skip_xid in the
catalog is the same as the one we have skipped, clear it if it is the
same, and otherwise leave it as it is. How will that disallow users
from changing skip_xid while we are skipping changes?
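
The lock-then-compare step described here can be sketched as follows. This is a standalone C illustration under stated assumptions, not the patch's code: SubRow stands in for the subscription's catalog row, the locked flag merely models the lock on the subscription, and all names (SubRow, sub_lock, clear_skip_xid_if_unchanged, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;               /* stand-in for TransactionId */
#define INVALID_XID ((Xid) 0)

/* Hypothetical stand-in for the subscription's catalog row. */
typedef struct SubRow
{
    bool locked;    /* models the lock taken on the subscription */
    Xid  skip_xid;  /* XID the user asked to skip, or INVALID_XID */
} SubRow;

static void
sub_lock(SubRow *row)
{
    assert(!row->locked);
    row->locked = true;
}

static void
sub_unlock(SubRow *row)
{
    assert(row->locked);
    row->locked = false;
}

/*
 * Clear skip_xid only if it still names the transaction we just skipped.
 * Returns true if the value was cleared, false if the user had changed it
 * in the meantime, in which case the new value is left alone.
 */
static bool
clear_skip_xid_if_unchanged(SubRow *row, Xid skipped_xid)
{
    bool cleared = false;

    sub_lock(row);
    if (row->skip_xid == skipped_xid)
    {
        row->skip_xid = INVALID_XID;
        cleared = true;
    }
    sub_unlock(row);

    return cleared;
}
```

The lock is held only around the compare-and-clear, not for the whole time the worker is skipping changes, so the user can still set a different skip_xid in the meantime.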

--
With Regards,
Amit Kapila.

#378Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#377)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#379Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#378)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

--
With Regards,
Amit Kapila.

#380Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#379)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#381Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Greg Nancarrow (#357)
Re: Skipping logical replication transactions on subscriber side

On 13.12.21 04:12, Greg Nancarrow wrote:

(ii) "Setting -1 means to reset the transaction ID"

Shouldn't it be explained what resetting actually does and when it can
be, or is needed to be, done? Isn't it automatically reset?
I notice that negative values (other than -1) seem to be regarded as
valid - is that right?
Also, what happens if this option is set multiple times? Does it just
override and use the latest setting? (other option handling errors out
with errorConflictingDefElem()).
e.g. alter subscription sub skip (xid = 721, xid = 722);

Let's not use magic numbers and instead use a syntax that is more
explicit, like SKIP (xid = NONE) or RESET SKIP or something like that.
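
As a rough illustration of the "no magic numbers" idea, the sketch below parses an option value into an explicit reset flag plus an XID, instead of overloading -1. It is a standalone C example, not PostgreSQL's option parser: SkipXidOption, is_none(), and parse_skip_xid() are hypothetical names, and treating any unsigned 32-bit number as acceptable is an assumption made for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parsed form of the option: explicit reset, no sentinel. */
typedef struct SkipXidOption
{
    bool     reset;  /* true when the user wrote NONE */
    uint32_t xid;    /* meaningful only when reset is false */
} SkipXidOption;

/* Case-insensitive match against the keyword "none". */
static bool
is_none(const char *s)
{
    const char *kw = "none";

    while (*s != '\0' && *kw != '\0')
    {
        if (tolower((unsigned char) *s) != *kw)
            return false;
        s++;
        kw++;
    }
    return *s == '\0' && *kw == '\0';
}

/*
 * Parse "NONE" or an unsigned decimal XID. Returns false, leaving *out
 * untouched, for anything else: negative numbers, trailing garbage,
 * empty strings, or values that do not fit in 32 bits.
 */
static bool
parse_skip_xid(const char *value, SkipXidOption *out)
{
    char          *end;
    unsigned long  parsed;

    assert(value != NULL && out != NULL);

    if (is_none(value))
    {
        out->reset = true;
        out->xid = 0;
        return true;
    }

    /* strtoul() would accept a leading sign or spaces, so reject them here. */
    if (!isdigit((unsigned char) value[0]))
        return false;

    errno = 0;
    parsed = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || parsed > UINT32_MAX)
        return false;

    out->reset = false;
    out->xid = (uint32_t) parsed;
    return true;
}
```

With this shape, "-1" and other negative inputs are simply rejected rather than silently treated as valid values, which addresses the concern about negative values raised in the quoted review.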

#382Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#381)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 17, 2021 at 3:23 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 13.12.21 04:12, Greg Nancarrow wrote:

(ii) "Setting -1 means to reset the transaction ID"

Shouldn't it be explained what resetting actually does and when it can
be, or is needed to be, done? Isn't it automatically reset?
I notice that negative values (other than -1) seem to be regarded as
valid - is that right?
Also, what happens if this option is set multiple times? Does it just
override and use the latest setting? (other option handling errors out
with errorConflictingDefElem()).
e.g. alter subscription sub skip (xid = 721, xid = 722);

Let's not use magic numbers and instead use a syntax that is more
explicit, like SKIP (xid = NONE) or RESET SKIP or something like that.

+1 for using SKIP (xid = NONE) because otherwise first we need to
introduce RESET syntax for this command.

--
With Regards,
Amit Kapila.

#383Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#382)
Re: Skipping logical replication transactions on subscriber side

On Fri, Dec 17, 2021 at 7:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 17, 2021 at 3:23 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 13.12.21 04:12, Greg Nancarrow wrote:

(ii) "Setting -1 means to reset the transaction ID"

Shouldn't it be explained what resetting actually does and when it can
be, or is needed to be, done? Isn't it automatically reset?
I notice that negative values (other than -1) seem to be regarded as
valid - is that right?
Also, what happens if this option is set multiple times? Does it just
override and use the latest setting? (other option handling errors out
with errorConflictingDefElem()).
e.g. alter subscription sub skip (xid = 721, xid = 722);

Let's not use magic numbers and instead use a syntax that is more
explicit, like SKIP (xid = NONE) or RESET SKIP or something like that.

+1 for using SKIP (xid = NONE) because otherwise first we need to
introduce RESET syntax for this command.

Agreed. Thank you for the comment!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#384Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#380)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow the owner of the origin to advance
it:

    /* Make sure it's not used by somebody else */
    if (replication_state->acquired_by != 0)
    {
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("replication origin with OID %d is already active for PID %d",
                        replication_state->roident,
                        replication_state->acquired_by)));
    }

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin directly instead of committing
the catalog change along with the origin update, we cannot record the
origin timestamp. This behavior makes sense to me because we skipped
the transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit the origin timestamp.

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid in
shmem (e.g., ReplicationState)? That way, we can always advance the
origin by replorigin_advance() and don’t need to worry about complex
cases such as the server crashing while preparing the transaction. I
haven’t considered the downsides enough yet, though.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#385Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#384)
Re: Skipping logical replication transactions on subscriber side

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow to advance the origin by the owner:

    /* Make sure it's not used by somebody else */
    if (replication_state->acquired_by != 0)
    {
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("replication origin with OID %d is already active for PID %d",
                        replication_state->roident,
                        replication_state->acquired_by)));
    }

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

--
With Regards,
Amit Kapila.

#386Dilip Kumar
Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#385)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 5, 2022 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Do you mean to say that you want to omit it even when we are
committing the changes?

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

I agree that if we don't keep it in the catalog, then after a restart,
if the transaction is replayed again, the user has to set the skip xid
again. That would be pretty inconvenient because the user might have
to analyze the failure again and repeat the same process as before the
restart. But OTOH the combination of a restart and the skip xid might
not be very frequent, so this might not be a very bad option.
Basically, I am in favor of storing it in the catalog, as that solution
looks cleaner at least from the user's pov, but if we think there are a
lot of complexities from the implementation pov then we might analyze
the approach of storing it in shmem as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#387Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#386)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 5, 2022 at 9:48 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Jan 5, 2022 at 9:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Do you mean to say that you want to omit it even when we are
committing the changes?

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

I agree, that if we don't keep it in the catalog then after restart if
the transaction replayed again then the user has to set the skip xid
again and that would be pretty inconvenient because the user might
have to analyze the failure again and repeat the same process he did
before restart. But OTOH the combination of restart and the skip xid
might not be very frequent so this might not be a very bad option.
Basically, I am in favor of storing it in a catalog as that solution
looks cleaner at least from the user pov but if we think there are a
lot of complexities from the implementation pov then we might analyze
the approach of storing in shmem as well.

Fair point, but I think it is better to see the patch, or the problems
that can't be solved, if we pursue storing it in the catalog. Even if
we decide to store it in shmem, we need to invent some way to inform
the user that we have not honored the previous setting of skip_xid and
that it needs to be set again.

--
With Regards,
Amit Kapila.

#388Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#385)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow to advance the origin by the owner:

    /* Make sure it's not used by somebody else */
    if (replication_state->acquired_by != 0)
    {
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("replication origin with OID %d is already active for PID %d",
                        replication_state->roident,
                        replication_state->acquired_by)));
    }

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

Yes.

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Yes, for consistency it would be better to record only the origin LSN.

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems might not be critical in situations where
users have to use this feature: the subscriber has already entered an
error loop, so they can find the xid again, and it’s uncommon that
they need to restart while skipping changes.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#389Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#388)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 7, 2022 at 6:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow to advance the origin by the owner:

    /* Make sure it's not used by somebody else */
    if (replication_state->acquired_by != 0)
    {
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("replication origin with OID %d is already active for PID %d",
                        replication_state->roident,
                        replication_state->acquired_by)));
    }

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

Yes.

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Yes, it would be better to record only origin lsn in terms of consistency.

I am not so sure about this point, because then what purpose will the
origin timestamp serve in the code?

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems could not be critical in the situation
where users have to use this feature; the subscriber already entered
an error loop so they can know xid again and it’s an uncommon case
that they need to restart during skipping changes.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Okay, that sounds reasonable.

--
With Regards,
Amit Kapila.

#390Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#388)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow to advance the origin by the owner:

/* Make sure it's not used by somebody else */
if (replication_state->acquired_by != 0)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
errmsg("replication origin with OID %d is already
active for PID %d",
replication_state->roident,
replication_state->acquired_by)));
}

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

Yes.

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Yes, it would be better to record only origin lsn in terms of consistency.

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems are probably not critical in the situation
where users have to use this feature; the subscriber has already entered
an error loop, so they can find the xid again, and it’s uncommon
that they need to restart while skipping changes.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v2-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)
#391vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#390)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 7, 2022 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow the owner to advance its own origin:

    /* Make sure it's not used by somebody else */
    if (replication_state->acquired_by != 0)
    {
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("replication origin with OID %d is already active for PID %d",
                        replication_state->roident,
                        replication_state->acquired_by)));
    }

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

Yes.

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Yes, it would be better to record only origin lsn in terms of consistency.

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems are probably not critical in the situation
where users have to use this feature; the subscriber has already entered
an error loop, so they can find the xid again, and it’s uncommon
that they need to restart while skipping changes.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Attached an updated patch. Please review it.

Thanks for the updated patch. A few comments:
1) Should this be case insensitive, to support NONE too?
+                       /* Setting xid = NONE is treated as resetting xid */
+                       if (strcmp(xid_str, "none") == 0)
+                               xid = InvalidTransactionId;

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers? Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specifies last_subscription_error, it should pick
last_error_xid from pg_stat_subscription_workers.
As this is a critical operation, an option that could automatically
pick and set the XID from pg_stat_subscription_workers would be
useful.

3) The following syntax is currently accepted; I think it
should throw an error:
postgres=# alter subscription sub1 set ( XID = 100);
ALTER SUBSCRIPTION

4) You might need to rebase the patch:
git am v2-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Applying: Add ALTER SUBSCRIPTION ... SKIP to skip the transaction on
subscriber nodes
error: patch failed: doc/src/sgml/logical-replication.sgml:333
error: doc/src/sgml/logical-replication.sgml: patch does not apply
Patch failed at 0001 Add ALTER SUBSCRIPTION ... SKIP to skip the
transaction on subscriber nodes
hint: Use 'git am --show-current-patch=diff' to see the failed patch

5) You might have to rename 027_skip_xact to 028_skip_xact, as
027_nosuperuser.pl already exists:
diff --git a/src/test/subscription/t/027_skip_xact.pl b/src/test/subscription/t/027_skip_xact.pl
new file mode 100644
index 0000000000..a63c9c345e
--- /dev/null
+++ b/src/test/subscription/t/027_skip_xact.pl
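
For comment 1 above, a case-insensitive check could look roughly like the sketch below. It is a standalone illustration: parse_skip_xid() is a hypothetical helper, it uses POSIX strcasecmp() where PostgreSQL code would normally use pg_strcasecmp(), and it only range-checks the number rather than validating it as a usable transaction ID.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>            /* strcasecmp (POSIX) */

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Parses the value given for the xid option. "none" in any letter case
 * resets the xid; otherwise the string must be a plain decimal number that
 * fits in 32 bits. Returns false on malformed input.
 */
static bool
parse_skip_xid(const char *xid_str, TransactionId *xid)
{
    char          *end;
    unsigned long  val;

    assert(xid_str != NULL && xid != NULL);

    if (strcasecmp(xid_str, "none") == 0)
    {
        *xid = InvalidTransactionId;
        return true;
    }

    /* Reject empty strings, signs, and leading whitespace up front. */
    if (xid_str[0] < '0' || xid_str[0] > '9')
        return false;

    errno = 0;
    val = strtoul(xid_str, &end, 10);
    if (errno != 0 || *end != '\0' || val > UINT32_MAX)
        return false;

    *xid = (TransactionId) val;
    return true;
}
```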

Regards,
Vignesh

#392Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#380)
Re: Skipping logical replication transactions on subscriber side

On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

IIUC, the changes corresponding to above in the latest patch are as follows:

--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node,
  LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
  /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
+ replication_state->acquired_by != MyProcPid)
  {
...
clear_subscription_skip_xid()
{
..
+ else if (!XLogRecPtrIsInvalid(origin_lsn))
+ {
+ /*
+ * User has already changed subskipxid before clearing the subskipxid, so
+ * don't change the catalog but just advance the replication origin.
+ */
+ replorigin_advance(replorigin_session_origin, origin_lsn,
+    GetXLogInsertRecPtr(),
+    false, /* go_backward */
+    true /* wal_log */);
+ }
..
}

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..
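
The effect of that branch can be sketched in isolation as below. effective_start_lsn() is a hypothetical name used only for illustration; in the server the adjustment happens inline in CreateDecodingContext() against slot->data.confirmed_flush.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Returns the position decoding actually starts from. A start_lsn older
 * than what the subscriber has already confirmed is moved forward, so a
 * stale replication origin does not cause confirmed changes to be re-sent.
 */
static XLogRecPtr
effective_start_lsn(XLogRecPtr start_lsn, XLogRecPtr confirmed_flush)
{
    assert(confirmed_flush != InvalidXLogRecPtr);

    if (start_lsn == InvalidXLogRecPtr)
        return confirmed_flush; /* no position given: continue from the slot */
    if (start_lsn < confirmed_flush)
        return confirmed_flush; /* client is behind what it confirmed */
    return start_lsn;
}
```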

Few other comments on the latest patch:
=================================
1.
A conflict will produce an error and will stop the replication; it must be
    resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in <xref
linkend="monitoring-pg-stat-subscription-workers"/> and the
subscriber's server log."? I think we can commit this change
separately as this is true even without this patch.

2.
    The resolution can be done either by changing data on the subscriber so
-   that it does not conflict with the incoming change or by skipping the
-   transaction that conflicts with the existing data.  The transaction can be
-   skipped by calling the <link linkend="pg-replication-origin-advance">
+   that it does not conflict with the incoming changes or by skipping the whole
+   transaction.  This option specifies the ID of the transaction whose
+   application is to be skipped by the logical replication worker.  The logical
+   replication worker skips all data modification transaction conflicts with
+   the existing data. When a conflict produce an error, it is shown in
+   <structname>pg_stat_subscription_workers</structname> view as follows:

I don't think most of the additional text added in the above paragraph
is required. We can rephrase it as: "The resolution can be done either
by changing data on the subscriber so that it does not conflict with
the incoming change or by skipping the transaction that conflicts with
the existing data. When a conflict produces an error, it is shown in
<structname>pg_stat_subscription_workers</structname> view as
follows:". After that keep the text, you have.

3.
They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Can we slightly reword the above text as: "Skipping the whole
transaction includes skipping the changes that may not violate any
constraint. This can easily make the subscriber inconsistent,
especially if a user specifies the wrong transaction ID or the
position of origin."?

4.
The logical replication worker skips all data
+      modification changes within the specified transaction.  Therefore, since
+      it skips the whole transaction including the changes that may not violate
+      the constraint, it should only be used as a last resort. This option has
+      no effect for the transaction that is already prepared with enabling
+      <literal>two_phase</literal> on susbscriber.

Let's slightly reword the above text as: "The logical replication
worker skips all data modification changes within the specified
transaction including the changes that may not violate the constraint,
so, it should only be used as a last resort. This option has no effect
on the transaction that is already prepared by enabling
<literal>two_phase</literal> on the subscriber."

5.
+          by the logical replication worker. Setting
<literal>NONE</literal> means
+          to reset the transaction ID.

Let's slightly reword the second part of the sentence as: "Setting
<literal>NONE</literal> resets the transaction ID."

6.
Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if the subscription invalidated and MySubscription->skipxid gets
changed or reset.

/subscription invalidated/subscription is invalidated

What do you mean by subscription invalidated and how is it related to
this feature? I think we should mention something on these lines in
the docs as well.

7.
"Please refer to the comments in these functions for details.". We can
slightly modify this part of the comment as: "Please refer to the
comments in corresponding functions for details."

--
With Regards,
Amit Kapila.

#393Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#391)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 10, 2022 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers. Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specified last_subscription_error, it should pick
last_error_xid from pg_stat_subscription_workers.
As this operation is a critical operation, if there is an option which
could automatically pick and set from pg_stat_subscription_workers, it
would be useful.

I think having some automatic functionality around this would be good
but I am not so sure about this idea because it is possible that the
error has not reached the stats collector and the user might be
referring to server logs to set the skip xid. In such cases, even
though an error has occurred, we won't be able to set the
required xid. Now, one can imagine that if we don't get the required
value from pg_stat_subscription_workers then we can return an error to
the user indicating that she can cross-verify the server logs and set
the appropriate xid value but IMO it could be confusing. I feel even
if we want some automatic functionality like you are proposing or
something else, it could be done as a separate patch but let's wait
and see what Sawada-San or others think about this?

--
With Regards,
Amit Kapila.

#394vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#393)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 7:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 10, 2022 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers. Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specified last_subscription_error, it should pick
last_error_xid from pg_stat_subscription_workers.
As this operation is a critical operation, if there is an option which
could automatically pick and set from pg_stat_subscription_workers, it
would be useful.

I think having some automatic functionality around this would be good
but I am not so sure about this idea because it is possible that the
error has not reached the stats collector and the user might be
referring to server logs to set the skip xid. In such cases, even
though an error has occurred, we won't be able to set the
required xid. Now, one can imagine that if we don't get the required
value from pg_stat_subscription_workers then we can return an error to
the user indicating that she can cross-verify the server logs and set
the appropriate xid value but IMO it could be confusing. I feel even
if we want some automatic functionality like you are proposing or
something else, it could be done as a separate patch but let's wait
and see what Sawada-San or others think about this?

If we are ok with the suggested idea then it can be done as a separate
patch, I agree that it need not be part of the existing patch.

Regards,
Vignesh

#395Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#392)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

IIUC, the changes corresponding to above in the latest patch are as follows:

--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node,
  LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);
  /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
+ replication_state->acquired_by != MyProcPid)
  {
...
clear_subscription_skip_xid()
{
..
+ else if (!XLogRecPtrIsInvalid(origin_lsn))
+ {
+ /*
+ * User has already changed subskipxid before clearing the subskipxid, so
+ * don't change the catalog but just advance the replication origin.
+ */
+ replorigin_advance(replorigin_session_origin, origin_lsn,
+    GetXLogInsertRecPtr(),
+    false, /* go_backward */
+    true /* wal_log */);
+ }
..
}

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Anyway, according to your analysis, I think we don't necessarily need
to do replorigin_advance() in this case.

Few other comments on the latest patch:
=================================
1.
A conflict will produce an error and will stop the replication; it must be
resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in <xref
linkend="monitoring-pg-stat-subscription-workers"/> and the
subscriber's server log."?

Will fix it.

I think we can commit this change
separately as this is true even without this patch.

Right. It seems an oversight of 8d74fc96db. I've attached the patch.

2.
The resolution can be done either by changing data on the subscriber so
-   that it does not conflict with the incoming change or by skipping the
-   transaction that conflicts with the existing data.  The transaction can be
-   skipped by calling the <link linkend="pg-replication-origin-advance">
+   that it does not conflict with the incoming changes or by skipping the whole
+   transaction.  This option specifies the ID of the transaction whose
+   application is to be skipped by the logical replication worker.  The logical
+   replication worker skips all data modification transaction conflicts with
+   the existing data. When a conflict produce an error, it is shown in
+   <structname>pg_stat_subscription_workers</structname> view as follows:

I don't think most of the additional text added in the above paragraph
is required. We can rephrase it as: "The resolution can be done either
by changing data on the subscriber so that it does not conflict with
the incoming change or by skipping the transaction that conflicts with
the existing data. When a conflict produces an error, it is shown in
<structname>pg_stat_subscription_workers</structname> view as
follows:". After that keep the text, you have.

Agreed, will fix.

3.
They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Can we slightly reword the above text as: "Skipping the whole
transaction includes skipping the changes that may not violate any
constraint. This can easily make the subscriber inconsistent,
especially if a user specifies the wrong transaction ID or the
position of origin."?

Will fix.

4.
The logical replication worker skips all data
+      modification changes within the specified transaction.  Therefore, since
+      it skips the whole transaction including the changes that may not violate
+      the constraint, it should only be used as a last resort. This option has
+      no effect for the transaction that is already prepared with enabling
+      <literal>two_phase</literal> on susbscriber.

Let's slightly reword the above text as: "The logical replication
worker skips all data modification changes within the specified
transaction including the changes that may not violate the constraint,
so, it should only be used as a last resort. This option has no effect
on the transaction that is already prepared by enabling
<literal>two_phase</literal> on the subscriber."

Will fix.

5.
+          by the logical replication worker. Setting
<literal>NONE</literal> means
+          to reset the transaction ID.

Let's slightly reword the second part of the sentence as: "Setting
<literal>NONE</literal> resets the transaction ID."

Will fix.

6.
Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if the subscription invalidated and MySubscription->skipxid gets
changed or reset.

/subscription invalidated/subscription is invalidated

Will fix.

What do you mean by subscription invalidated and how is it related to
this feature? I think we should mention something on these lines in
the docs as well.

I meant that MySubscription, a cache of pg_subscription entry, is
invalidated by the catalog change. IIUC while applying changes we
don't re-read pg_subscription (i.e., not calling
maybe_reread_subscription()). Similarly, while skipping changes, we
also don't do that. Therefore, even if skip_xid has been changed while
skipping changes, we don't stop skipping changes.

7.
"Please refer to the comments in these functions for details.". We can
slightly modify this part of the comment as: "Please refer to the
comments in corresponding functions for details."

Will fix.
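
To illustrate the reply to comment 6 above: the decision to skip is made once, when the remote transaction begins, and is not revisited until it ends. The sketch below models that with hypothetical names (ApplyState, begin_remote_xact() and friends); cached_skip_xid stands in for MySubscription->skipxid, and none of this is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct ApplyState
{
    TransactionId cached_skip_xid;  /* stands in for MySubscription->skipxid */
    TransactionId skipping_xid;     /* valid only while a skip is in progress */
} ApplyState;

/* Decide once, at the start of the remote transaction, whether to skip it. */
static void
begin_remote_xact(ApplyState *st, TransactionId remote_xid)
{
    assert(st->skipping_xid == InvalidTransactionId);

    if (remote_xid != InvalidTransactionId &&
        remote_xid == st->cached_skip_xid)
        st->skipping_xid = remote_xid;
}

/* Consulted for every change; deliberately ignores cached_skip_xid. */
static bool
should_skip_change(const ApplyState *st)
{
    return st->skipping_xid != InvalidTransactionId;
}

/* Skipping ends only when the remote transaction ends. */
static void
end_remote_xact(ApplyState *st)
{
    st->skipping_xid = InvalidTransactionId;
}
```

Because should_skip_change() never looks at the cached value, a change or reset of skip_xid in the middle of the transaction has no effect on the skip already in progress.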

I'll submit an updated patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

doc_update.patch (application/octet-stream)
#396Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#393)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 11:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 10, 2022 at 2:57 PM vignesh C <vignesh21@gmail.com> wrote:

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers. Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specified last_subscription_error, it should pick
last_error_xid from pg_stat_subscription_workers.
As this operation is a critical operation, if there is an option which
could automatically pick and set from pg_stat_subscription_workers, it
would be useful.

I think having some automatic functionality around this would be good
but I am not so sure about this idea because it is possible that the
error has not reached the stats collector and the user might be
referring to server logs to set the skip xid. In such cases, even
though an error has occurred, we won't be able to set the
required xid. Now, one can imagine that if we don't get the required
value from pg_stat_subscription_workers then we can return an error to
the user indicating that she can cross-verify the server logs and set
the appropriate xid value but IMO it could be confusing. I feel even
if we want some automatic functionality like you are proposing or
something else, it could be done as a separate patch but let's wait
and see what Sawada-San or others think about this?

Agreed. Automatically setting the XID would be a good idea, but we can
do that in a separate patch so we can keep the first patch simple.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#397Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#395)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Yeah, in the worst case, it will lead to conflict again and the user
needs to set the xid again.

Anyway, according to your analysis, I think we don't necessarily need
to do replorigin_advance() in this case.

Right.

5.
+          by the logical replication worker. Setting
<literal>NONE</literal> means
+          to reset the transaction ID.

Let's slightly reword the second part of the sentence as: "Setting
<literal>NONE</literal> resets the transaction ID."

Will fix.

6.
Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if the subscription invalidated and MySubscription->skipxid gets
changed or reset.

/subscription invalidated/subscription is invalidated

Will fix.

What do you mean by subscription invalidated and how is it related to
this feature? I think we should mention something on these lines in
the docs as well.

I meant that MySubscription, a cache of pg_subscription entry, is
invalidated by the catalog change. IIUC while applying changes we
don't re-read pg_subscription (i.e., not calling
maybe_reread_subscription()). Similarly, while skipping changes, we
also don't do that. Therefore, even if skip_xid has been changed while
skipping changes, we don't stop skipping changes.

Okay, but I don't think we need to mention subscription is invalidated
as that could be confusing, the other part of the comment is quite
clear.

--
With Regards,
Amit Kapila.

#398Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#397)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Yeah, in the worst case, it will lead to conflict again and the user
needs to set the xid again.

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above. Therefore, if we
accept this situation because of its low probability, probably we can
do the same things for other cases too, which makes the patch simple
especially for prepare and commit/rollback-prepared cases. What do you
think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#399Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#398)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Yeah, in the worst case, it will lead to conflict again and the user
needs to set the xid again.

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

--
With Regards,
Amit Kapila.

#400Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#395)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few other comments on the latest patch:
=================================
1.
A conflict will produce an error and will stop the replication; it must be
resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in <xref
linkend="monitoring-pg-stat-subscription-workers"/> and the
subscriber's server log."?

Will fix it.

I think we can commit this change
separately as this is true even without this patch.

Right. It seems an oversight of 8d74fc96db. I've attached the patch.

Pushed.

--
With Regards,
Amit Kapila.

#401Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#399)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Yeah, in the worst case, it will lead to conflict again and the user
needs to set the xid again.

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking of committing the catalog change first in a separate
transaction without updating the origin LSN, and then preparing an empty
transaction that updates the origin LSN. If the server crashes between
the two, skip_xid is cleared but the transaction will be resent.
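
The crash window described here can be sketched with a small standalone model. Everything in it is hypothetical (SubscriberState, finish_skip(), conflict_reappears()), and it deliberately ignores the publisher-side confirmed_flush position; it only shows why a crash between the two steps leaves skip_xid cleared while the transaction is still resent.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical model of the durable subscriber state relevant here. */
typedef struct SubscriberState
{
    TransactionId skip_xid;     /* XID the user asked to skip, or invalid */
    XLogRecPtr    origin_lsn;   /* replication origin progress */
} SubscriberState;

/*
 * Model the two-step sequence: (1) commit the catalog change that clears
 * skip_xid, (2) advance the origin LSN. If crash_between is true, the
 * server stops after step 1 and step 2 never becomes durable.
 */
void
finish_skip(SubscriberState *st, XLogRecPtr end_lsn, bool crash_between)
{
    st->skip_xid = InvalidTransactionId;    /* step 1 */
    if (crash_between)
        return;
    st->origin_lsn = end_lsn;               /* step 2 */
}

/*
 * After a restart, anything past origin_lsn is resent. The conflict comes
 * back if the transaction is resent and is no longer marked to be skipped.
 */
bool
conflict_reappears(const SubscriberState *st, TransactionId xid,
                   XLogRecPtr end_lsn)
{
    bool resent = st->origin_lsn < end_lsn;
    bool skipped = (st->skip_xid == xid);

    return resent && !skipped;
}
```

With crash_between set, the model ends up with skip_xid cleared and origin_lsn unchanged, so the user would have to set skip_xid again.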

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#402Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#400)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 11, 2022 at 7:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Few other comments on the latest patch:
=================================
1.
A conflict will produce an error and will stop the replication; it must be
resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+   <xref linkend="monitoring-pg-stat-subscription-workers"/> as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in <xref
linkend="monitoring-pg-stat-subscription-workers"/> and the
subscriber's server log."?

Will fix it.

I think we can commit this change
separately as this is true even without this patch.

Right. It seems an oversight of 8d74fc96db. I've attached the patch.

Pushed.

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#403Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#401)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

--
With Regards,
Amit Kapila.

#404Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#403)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

Right. I was thinking that if we accept the situation where the user
has to set skip_xid again after a server crash, we might also be able
to accept the situation where the user has to clear skip_xid after a
server crash. But the former seems less problematic.

I've attached an updated patch that incorporated all comments I got so far.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v3-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)
#405Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#391)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 10, 2022 at 6:27 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Jan 7, 2022 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I thought we just want to lock before clearing the skip_xid something
like take the lock, check if the skip_xid in the catalog is the same
as we have skipped, if it is the same then clear it, otherwise, leave
it as it is. How will that disallow users to change skip_xid when we
are skipping changes?
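
The "take the lock, compare, then clear" idea in the paragraph above can be sketched as follows. This is a standalone illustration, not code from the patch: clear_skip_xid_if_unchanged() is a hypothetical name, the catalog value is modeled as a plain variable, and the caller is assumed to already hold whatever lock protects it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical helper, meant to run while the subscription lock is held.
 * Clears the stored skip_xid only if it still names the transaction that
 * was just skipped. Returns true if the value was cleared.
 */
bool
clear_skip_xid_if_unchanged(TransactionId *catalog_skip_xid,
                            TransactionId skipped_xid)
{
    /* The user changed skip_xid in the meantime: leave it as it is. */
    if (*catalog_skip_xid != skipped_xid)
        return false;

    *catalog_skip_xid = InvalidTransactionId;
    return true;
}
```

Because the lock is only needed around this check, the user stays free to change skip_xid while the worker is still skipping changes.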

Oh I thought we wanted to keep holding the lock while skipping changes
(changing skip_xid requires acquiring the lock).

So if skip_xid is already changed, the apply worker would do
replorigin_advance() with WAL logging, instead of committing the
catalog change?

Right. BTW, how are you planning to advance the origin? Normally, a
commit transaction would do it but when we are skipping all changes,
the commit might not do it as there won't be any transaction id
assigned.

I've not tested it yet but replorigin_advance() with wal_log = true
seems to work for this case.

I've tested it and realized that we cannot use replorigin_advance()
for this purpose without changes. That is, the current
replorigin_advance() doesn't allow the owner to advance its own origin:

/* Make sure it's not used by somebody else */
if (replication_state->acquired_by != 0)
{
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_IN_USE),
             errmsg("replication origin with OID %d is already active for PID %d",
                    replication_state->roident,
                    replication_state->acquired_by)));
}

So we need to change it so that the origin owner can advance its
origin, which makes sense to me.

Also, when we have to update the origin instead of committing the
catalog change while updating the origin, we cannot record the origin
timestamp.

Is it because we currently update the origin timestamp with commit record?

Yes.

This behavior makes sense to me because we skipped the
transaction. But ISTM it’s not good if we omit the origin timestamp
only when directly updating the origin. So probably we need to always
omit origin timestamp.

Do you mean to say that you want to omit it even when we are
committing the changes?

Yes, it would be better to record only origin lsn in terms of consistency.

Apart from that, I'm vaguely concerned that the logic seems to be
getting complex. Probably it comes from the fact that we store
skip_xid in the catalog and update the catalog to clear/set the
skip_xid. It might be worth revisiting the idea of storing skip_xid on
shmem (e.g., ReplicationState)?

IIRC, the problem with that idea was that we won't remember skip_xid
information after server restart and the user won't even know that it
has to set it again.

Right, I agree that it’s not convenient when the server restarts or
crashes, but these problems could not be critical in the situation
where users have to use this feature; the subscriber already entered
an error loop so they can know xid again and it’s an uncommon case
that they need to restart during skipping changes.

Anyway, I'll submit an updated patch soon so we can discuss complexity
vs. convenience.

Attached an updated patch. Please review it.

Thank you for the comments!

Thanks for the updated patch, few comments:
1) Should this be case insensitive to support NONE too:
+                       /* Setting xid = NONE is treated as resetting xid */
+                       if (strcmp(xid_str, "none") == 0)
+                               xid = InvalidTransactionId;

I think the string value is always lowercase, so we don't need to use
strcasecmp here.

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers. Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specified last_subscription_error, it should pick
last_error_xid from pg_stat_subscription_workers.
As this operation is a critical operation, if there is an option which
could automatically pick and set from pg_stat_subscription_workers, it
would be useful.

As I mentioned before in another mail, I think we can do that in a
separate patch.

3) Currently the following syntax is being supported, I felt this
should throw an error:
postgres=# alter subscription sub1 set ( XID = 100);
ALTER SUBSCRIPTION

Fixed.

4) You might need to rebase the patch:
git am v2-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Applying: Add ALTER SUBSCRIPTION ... SKIP to skip the transaction on
subscriber nodes
error: patch failed: doc/src/sgml/logical-replication.sgml:333
error: doc/src/sgml/logical-replication.sgml: patch does not apply
Patch failed at 0001 Add ALTER SUBSCRIPTION ... SKIP to skip the
transaction on subscriber nodes
hint: Use 'git am --show-current-patch=diff' to see the failed patch

5) You might have to rename 027_skip_xact to 028_skip_xact as
027_nosuperuser.pl already exists
diff --git a/src/test/subscription/t/027_skip_xact.pl
b/src/test/subscription/t/027_skip_xact.pl
new file mode 100644
index 0000000000..a63c9c345e
--- /dev/null
+++ b/src/test/subscription/t/027_skip_xact.pl

I've resolved these conflicts.

These comments are incorporated into the latest v3 patch I just submitted[1].
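
Regarding comment 1) above, the following standalone sketch shows what a case-sensitive comparison means for the option value. It is an illustration only: parse_skip_xid() is a hypothetical function, it uses plain libc calls rather than the backend's option-parsing helpers, and it makes no claim about how the grammar normalizes the value before it reaches this point.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical parser for the value of the xid option. "none" resets the
 * setting; anything else must be a decimal number that fits in 32 bits.
 * Returns false on malformed input.
 */
bool
parse_skip_xid(const char *xid_str, TransactionId *xid)
{
    char           *end;
    unsigned long   val;

    /* Case-sensitive on purpose, mirroring the strcmp() in the patch. */
    if (strcmp(xid_str, "none") == 0)
    {
        *xid = InvalidTransactionId;
        return true;
    }

    errno = 0;
    val = strtoul(xid_str, &end, 10);
    if (errno != 0 || end == xid_str || *end != '\0' || val > UINT32_MAX)
        return false;

    *xid = (TransactionId) val;
    return true;
}
```

In this sketch an uppercase "NONE" is rejected, so the plain strcmp() is only sufficient if the caller always passes a lowercase string.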

Regards,

[1]: /messages/by-id/CAD21AoD9JXah2V8uFURUpZbK_ewsut+jb1ESm6YQkrhQm3nJRg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#406vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#404)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

Right. I was thinking that if we accept the situation that the user
has to set skip_xid again in case of the server crashes, we might be
able to accept also the situation that the user has to clear skip_xid
in a case of the server crashes. But it seems the former is less
problematic.

I've attached an updated patch that incorporated all comments I got so far.

Thanks for the updated patch, few comments:
1) Currently skip xid is not displayed in describe subscriptions, can
we include it too:
\dRs+ sub1
                                              List of subscriptions
 Name |  Owner  | Enabled | Publication | Binary | Streaming | Two phase commit | Synchronous commit |            Conninfo
------+---------+---------+-------------+--------+-----------+------------------+--------------------+--------------------------------
 sub1 | vignesh | t       | {pub1}      | f      | f         | e                | off                | dbname=postgres host=localhost
(1 row)

2) This import "use PostgreSQL::Test::Utils;" is not required:
+# Tests for skipping logical replication transactions.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
3) Some of the comments end with a punctuation mark and some do not.
Should we keep it consistent:
+    # Wait for worker error
+    $node_subscriber->poll_query_until(
+       'postgres',
+    # Set skip xid
+    $node_subscriber->safe_psql(
+       'postgres',
+# Create publisher node.
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+# Create subscriber node.
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
4) Should this be changed:
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid gets changed or
to:
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
+ * changes, we don't stop it until we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid gets changed or

In "stop it until the we skip all changes", the word "the" is not required.

Regards,
Vignesh

#407tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#404)
RE: Skipping logical replication transactions on subscriber side

On Wed, Jan 12, 2022 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporated all comments I got so far.

Thanks for updating the patch. Here are some comments:

1)
+ Skip applying changes of the particular transaction. If incoming data

Should "Skip" be "Skips" ?

2)
+      prepared by enabling <literal>two_phase</literal> on susbscriber.  After h
+      the logical replication successfully skips the transaction, the transaction

The "h" after word "After" seems redundant.

3)
+ Skipping the whole transaction includes skipping the cahnge that may not violate

"cahnge" should be "changes" I think.

4)
+/*
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
...
+ */
+static TransactionId skipping_xid = InvalidTransactionId;
+#define is_skipping_changes() (TransactionIdIsValid(skipping_xid))

Maybe we should modify this comment. Something like:
skipping_xid is valid if we are skipping all data modification changes ...

5)
+					if (!superuser())
+						ereport(ERROR,
+								(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+								 errmsg("must be superuser to set %s", "skip_xid")));

Should we change the message to "must be superuser to skip xid"?
Because the SQL stmt is "ALTER SUBSCRIPTION ... SKIP (xid = XXX)".

Regards,
Tang

#408Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#406)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 12, 2022 at 11:10 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

Right. I was thinking that if we accept the situation that the user
has to set skip_xid again in case of the server crashes, we might be
able to accept also the situation that the user has to clear skip_xid
in a case of the server crashes. But it seems the former is less
problematic.

I've attached an updated patch that incorporated all comments I got so far.

Thanks for the updated patch, few comments:

Thank you for the comments!

1) Currently skip xid is not displayed in describe subscriptions, can
we include it too:
\dRs+ sub1
                                              List of subscriptions
 Name |  Owner  | Enabled | Publication | Binary | Streaming | Two phase commit | Synchronous commit |            Conninfo
------+---------+---------+-------------+--------+-----------+------------------+--------------------+--------------------------------
 sub1 | vignesh | t       | {pub1}      | f      | f         | e                | off                | dbname=postgres host=localhost
(1 row)

2) This import "use PostgreSQL::Test::Utils;" is not required:
+# Tests for skipping logical replication transactions.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
3) Some of the comments end with a punctuation mark and some do not.
Should we keep it consistent:
+    # Wait for worker error
+    $node_subscriber->poll_query_until(
+       'postgres',
+    # Set skip xid
+    $node_subscriber->safe_psql(
+       'postgres',
+# Create publisher node.
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+# Create subscriber node.
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
4) Should this be changed:
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
+ * changes, we don't stop it until the we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid gets changed or
to:
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
+ * changes, we don't stop it until we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid gets changed or

In "stop it until the we skip all changes", the word "the" is not required.

I agree with all the comments above. I've attached an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v4-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)
#409Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#407)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jan 13, 2022 at 10:07 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Wed, Jan 12, 2022 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporated all comments I got so far.

Thanks for updating the patch. Here are some comments:

Thank you for the comments!

1)
+ Skip applying changes of the particular transaction. If incoming data

Should "Skip" be "Skips" ?

2)
+      prepared by enabling <literal>two_phase</literal> on susbscriber.  After h
+      the logical replication successfully skips the transaction, the transaction

The "h" after word "After" seems redundant.

3)
+ Skipping the whole transaction includes skipping the cahnge that may not violate

"cahnge" should be "changes" I think.

4)
+/*
+ * True if we are skipping all data modification changes (INSERT, UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we start skipping
...
+ */
+static TransactionId skipping_xid = InvalidTransactionId;
+#define is_skipping_changes() (TransactionIdIsValid(skipping_xid))

Maybe we should modify this comment. Something like:
skipping_xid is valid if we are skipping all data modification changes ...

5)
+                                       if (!superuser())
+                                               ereport(ERROR,
+                                                               (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                                                                errmsg("must be superuser to set %s", "skip_xid")));

Should we change the message to "must be superuser to skip xid"?
Because the SQL stmt is "ALTER SUBSCRIPTION ... SKIP (xid = XXX)".

I agree with all the comments above. These are incorporated into the
latest v4 patch I've just submitted[1].

Regards,

[1]: postgresql.org/message-id/CAD21AoBZC87nY1pCaexk1uBA68JSBmy2-UqLGirT9g-RVMhjKw%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#410vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#408)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 11:10 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

Right. I was thinking that if we accept the situation that the user
has to set skip_xid again in case of the server crashes, we might be
able to accept also the situation that the user has to clear skip_xid
in a case of the server crashes. But it seems the former is less
problematic.

I've attached an updated patch that incorporated all comments I got so far.

Thanks for the updated patch, few comments:

Thank you for the comments!

1) Currently skip xid is not displayed in describe subscriptions, can
we include it too:
\dRs+ sub1
                                                        List of subscriptions
 Name |  Owner  | Enabled | Publication | Binary | Streaming | Two phase commit | Synchronous commit |            Conninfo
------+---------+---------+-------------+--------+-----------+------------------+--------------------+--------------------------------
 sub1 | vignesh | t       | {pub1}      | f      | f         | e                | off                | dbname=postgres host=localhost
(1 row)

2) This import "use PostgreSQL::Test::Utils;" is not required:
+# Tests for skipping logical replication transactions.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
3) Some of the comments end with a punctuation mark and some do not.
Should we keep it consistent:
+    # Wait for worker error
+    $node_subscriber->poll_query_until(
+       'postgres',
+    # Set skip xid
+    $node_subscriber->safe_psql(
+       'postgres',
+# Create publisher node.
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+# Create subscriber node.
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
4) Should this be changed:
+ * True if we are skipping all data modification changes (INSERT,
UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we
start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid
gets changed or
to:
+ * True if we are skipping all data modification changes (INSERT,
UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we
start skipping
+ * changes, we don't stop it until we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid
gets changed or

In "stop it until the we skip all changes", the word "the" is not required.

I agree with all the comments above. I've attached an updated patch.

Thanks for the updated patch, few minor comments:
1) Should "SKIP" be "SKIP (" here:
@@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end)
        /* ALTER SUBSCRIPTION <name> */
        else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
                COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP",
2) We could add a test for this if possible:
+               case ALTER_SUBSCRIPTION_SKIP:
+                       {
+                               if (!superuser())
+                                       ereport(ERROR,
+                                                       (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                                                        errmsg("must be superuser to skip transaction")));

3) There was one typo in the commit message, "transaciton" should be "transaction":
After skipping the transaciton the apply worker clears
pg_subscription.subskipxid.

Another small typo, susbscriber should be subscriber:
+      prepared by enabling <literal>two_phase</literal> on susbscriber.  After
+      the logical replication successfully skips the transaction, the
transaction
4) Should skipsubxid be mentioned as subskipxid here:
+      * Clear the subskipxid of pg_subscription catalog.  This catalog
+      * update must be committed before finishing prepared transaction.
+      * Because otherwise, in a case where the server crashes between
+      * finishing prepared transaction and the catalog update, COMMIT
+      * PREPARED won’t be resent but skipsubxid is left.

Regards,
Vignesh

#411Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#408)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I agree with all the comments above. I've attached an updated patch.

Review comments
================
1.
+
+  <para>
+   In this case, you need to consider changing the data on the
subscriber so that it

The start of this sentence doesn't make sense to me. How about
changing it to something like: "To resolve conflicts, you need to ..."

2.
+      <structname>pg_subscription</structname>.<structfield>subskipxid</structfield>)
+      is cleared.  See <xref linkend="logical-replication-conflicts"/> for
+      the details of logical replication conflicts.
+     </para>
+
+     <para>
+      <replaceable>skip_option</replaceable> specifies options for
this operation.
+      The supported option is:
+
+      <variablelist>
+       <varlistentry>
+        <term><literal>xid</literal> (<type>xid</type>)</term>
+        <listitem>
+         <para>
+          Specifies the ID of the transaction whose changes are to be skipped
+          by the logical replication worker. Setting
<literal>NONE</literal> resets
+          the transaction ID.
+         </para>

Empty spaces after line finish are inconsistent. I personally use a
single space before a new line but I see that others use two spaces
and the nearby documentation also uses two spaces in this regard so I
am fine either way but let's be consistent.

3.
+ case ALTER_SUBSCRIPTION_SKIP:
+ {
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to skip transaction")));
+
+ parse_subscription_options(pstate, stmt->options, SUBOPT_XID, &opts);
+
+ if (IsSet(opts.specified_opts, SUBOPT_XID))
..
..

Is there a case when the above 'if (IsSet(..' won't be true? If not,
then probably there should be Assert instead of 'if'.
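
To illustrate the point in comment 3, here is a minimal, self-contained sketch (not code from the patch). It assumes the option parser rejects anything other than a well-formed xid option, so the bit is always set once parsing succeeds. SUBOPT_XID_SKETCH, SketchOpts and both functions are hypothetical stand-ins, and the IsSet() definition shown is an assumption about how the macro behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the patch defines the real SUBOPT_XID. */
#define SUBOPT_XID_SKETCH 0x0001u

/* Assumed definition: true when every bit in "bits" is set in "val". */
#define IsSet(val, bits) (((val) & (bits)) == (bits))

typedef struct SketchOpts
{
	uint32_t	specified_opts;	/* bitmask of options seen by the parser */
	uint32_t	xid;			/* value of the xid option */
} SketchOpts;

/*
 * Stand-in for the option parser. It either fails (the real code would
 * raise an error) or records the xid option and sets its bit.
 */
static bool
parse_skip_options_sketch(bool xid_given, uint32_t xid, SketchOpts *opts)
{
	if (!xid_given)
		return false;
	opts->specified_opts |= SUBOPT_XID_SKETCH;
	opts->xid = xid;
	return true;
}

/*
 * Stand-in for the ALTER_SUBSCRIPTION_SKIP branch. Since a successful
 * parse always sets the bit, the check documents an invariant rather
 * than selecting a code path, so an assertion is enough.
 */
static uint32_t
skip_xid_from_opts_sketch(const SketchOpts *opts)
{
	assert(IsSet(opts->specified_opts, SUBOPT_XID_SKETCH));
	return opts->xid;
}
```

If some future option made the xid bit optional, the assertion would have to become a real branch again.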

4.
+static TransactionId skipping_xid = InvalidTransactionId;

I find this variable name a bit odd. Can we name it skip_xid?

5.
+ * skipping_xid is valid if we are skipping all data modification changes
+ * (INSERT, UPDATE, etc.) of the specified transaction at
MySubscription->skipxid.
+ * Once we start skipping changes, we don't stop it until we skip all changes

I think it would be better to write the first line of comment as: "We
enable skipping all data modification changes (INSERT, UPDATE, etc.)
for the subscription if the user has specified skip_xid. Once we ..."

6.
+static void
+maybe_start_skipping_changes(TransactionId xid)
+{
+ Assert(!is_skipping_changes());
+ Assert(!in_remote_transaction);
+ Assert(!in_streamed_transaction);
+
+ /* Make sure subscription cache is up-to-date */
+ maybe_reread_subscription();

Why do we need to update the cache here by calling
maybe_reread_subscription() and at other places in the patch? It is
sufficient to get the skip_xid value at the start of the worker via
GetSubscription().

7. In maybe_reread_subscription(), isn't there a need to check whether
skip_xid is changed where we exit and launch the worker and compare
other subscription parameters?

8.
+static void
+clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn,
+ TimestampTz origin_timestamp)
+{
+ Relation rel;
+ Form_pg_subscription subform;
+ HeapTuple tup;
+ bool nulls[Natts_pg_subscription];
+ bool replaces[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));
+
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
+ LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
+ AccessShareLock);

It is important to add a comment as to why we need a lock here.

9.
+ * needs to be set subskipxid again.  We can reduce the possibility by
+ * logging a replication origin WAL record to advance the origin LSN
+ * instead but it doesn't seem to be worth since it's a very minor case.

You can also add here that there is no way to advance origin_timestamp
so that would be inconsistent.

10.
+clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn,
+ TimestampTz origin_timestamp)
{
..
..
+ if (!IsTransactionState())
+ StartTransactionCommand();
..
..
+ CommitTransactionCommand();
..
}

The transaction should be committed in this function if it is started
here otherwise it should be the responsibility of the caller to commit
it.
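
As a rough illustration of comment 10 (a sketch, not the patch's code), the usual way to express this is to remember whether the function itself opened the transaction and to commit only in that case. The three transaction functions below are simplified stubs that only track a flag, and clear_skip_xid_sketch() and its argument are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stubs standing in for the backend's transaction machinery. */
static bool in_transaction = false;
static int	commit_count = 0;

static bool
IsTransactionState(void)
{
	return in_transaction;
}

static void
StartTransactionCommand(void)
{
	in_transaction = true;
}

static void
CommitTransactionCommand(void)
{
	in_transaction = false;
	commit_count++;
}

/*
 * Clear the skip XID. The function commits only a transaction that it
 * started itself; a transaction opened by the caller is left open so
 * that the caller stays responsible for committing it.
 */
static void
clear_skip_xid_sketch(unsigned int *subskipxid)
{
	bool		started_tx = false;

	if (!IsTransactionState())
	{
		StartTransactionCommand();
		started_tx = true;
	}

	*subskipxid = 0;			/* stands in for the catalog update */

	if (started_tx)
		CommitTransactionCommand();
}
```

Called outside a transaction, the sketch starts and commits one; called inside a caller's transaction, it leaves that transaction open.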

--
With Regards,
Amit Kapila.

#412Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#410)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 14, 2022 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the updated patch, few minor comments:
1) Should "SKIP" be "SKIP (" here:
@@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end)
/* ALTER SUBSCRIPTION <name> */
else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP",

Won't the other rule added by the patch, shown below, be sufficient for
what you are asking?
+ /* ALTER SUBSCRIPTION <name> SKIP */
+ else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "SKIP"))
+ COMPLETE_WITH("(");

I might be missing something, but why do you think the handling of SKIP
should be any different from what we are doing for SET?
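
To make the two-step behaviour concrete, here is a small stand-alone sketch. It is not psql's tab-completion code: complete_sketch() is a hypothetical function that uses exact string comparison, while the real rules use the Matches() and COMPLETE_WITH() macros. The keyword list is the one shown in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keywords offered after ALTER SUBSCRIPTION <name>, as in the quoted hunk. */
static const char *const after_name[] = {
	"CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
	"RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP", NULL
};

/* Offered after ALTER SUBSCRIPTION <name> SKIP. */
static const char *const after_skip[] = {"(", NULL};

/*
 * Return the NULL-terminated list of candidates for the next word, or
 * NULL when no rule applies. The first rule offers the bare keyword
 * "SKIP"; the second rule offers "(" once "SKIP" has been typed.
 */
static const char *const *
complete_sketch(const char *const *words, int nwords)
{
	if (nwords < 3 ||
		strcmp(words[0], "ALTER") != 0 ||
		strcmp(words[1], "SUBSCRIPTION") != 0)
		return NULL;
	if (nwords == 3)
		return after_name;
	if (nwords == 4 && strcmp(words[3], "SKIP") == 0)
		return after_skip;
	return NULL;
}
```

With this split, the opening parenthesis is offered by the second rule, so the first rule does not need to list "SKIP (".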

--
With Regards,
Amit Kapila.

#413Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#411)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 15, 2022 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I agree with all the comments above. I've attached an updated patch.

Review comments
================

Thank you for the comments!

1.
+
+  <para>
+   In this case, you need to consider changing the data on the
subscriber so that it

The start of this sentence doesn't make sense to me. How about
changing it to something like: "To resolve conflicts, you need to ..."

Fixed.

2.
+      <structname>pg_subscription</structname>.<structfield>subskipxid</structfield>)
+      is cleared.  See <xref linkend="logical-replication-conflicts"/> for
+      the details of logical replication conflicts.
+     </para>
+
+     <para>
+      <replaceable>skip_option</replaceable> specifies options for
this operation.
+      The supported option is:
+
+      <variablelist>
+       <varlistentry>
+        <term><literal>xid</literal> (<type>xid</type>)</term>
+        <listitem>
+         <para>
+          Specifies the ID of the transaction whose changes are to be skipped
+          by the logical replication worker. Setting
<literal>NONE</literal> resets
+          the transaction ID.
+         </para>

Empty spaces after line finish are inconsistent. I personally use a
single space before a new line but I see that others use two spaces
and the nearby documentation also uses two spaces in this regard so I
am fine either way but let's be consistent.

Fixed.

3.
+ case ALTER_SUBSCRIPTION_SKIP:
+ {
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("must be superuser to skip transaction")));
+
+ parse_subscription_options(pstate, stmt->options, SUBOPT_XID, &opts);
+
+ if (IsSet(opts.specified_opts, SUBOPT_XID))
..
..

Is there a case when the above 'if (IsSet(..' won't be true? If not,
then probably there should be Assert instead of 'if'.

Fixed.

4.
+static TransactionId skipping_xid = InvalidTransactionId;

I find this variable name a bit odd. Can we name it skip_xid?

Okay, renamed.

5.
+ * skipping_xid is valid if we are skipping all data modification changes
+ * (INSERT, UPDATE, etc.) of the specified transaction at
MySubscription->skipxid.
+ * Once we start skipping changes, we don't stop it until we skip all changes

I think it would be better to write the first line of comment as: "We
enable skipping all data modification changes (INSERT, UPDATE, etc.)
for the subscription if the user has specified skip_xid. Once we ..."

Changed.

6.
+static void
+maybe_start_skipping_changes(TransactionId xid)
+{
+ Assert(!is_skipping_changes());
+ Assert(!in_remote_transaction);
+ Assert(!in_streamed_transaction);
+
+ /* Make sure subscription cache is up-to-date */
+ maybe_reread_subscription();

Why do we need to update the cache here by calling
maybe_reread_subscription() and at other places in the patch? It is
sufficient to get the skip_xid value at the start of the worker via
GetSubscription().

MySubscription could be out-of-date after a user changes the catalog.
In non-skipping change cases, we check it when starting the
transaction in begin_replication_step() which is called, e.g., when
applying an insert change. But I think we need to make sure it’s
up-to-date at the beginning of applying changes, that is, before
starting a transaction. Otherwise, we may end up skipping the
transaction based on an out-of-date subscription cache.

The reason for calling maybe_reread_subscription() in both
apply_handle_commit_prepared() and apply_handle_rollback_prepared() is
the same: MySubscription could be out-of-date when applying
commit-prepared or rollback-prepared since we have not called
begin_replication_step() to open a new transaction.

7. In maybe_reread_subscription(), isn't there a need to check whether
skip_xid is changed where we exit and launch the worker and compare
other subscription parameters?

IIUC we relaunch the worker here when a subscription parameter such as
slot_name is changed. In the current implementation, I think that
relaunching the worker is not strictly necessary when skip_xid is
changed. For instance, when skipping a prepared transaction, we
deliberately don’t clear subskipxid of pg_subscription at prepare time
and do that at commit-prepared or rollback-prepared instead. There is a
chance that the user changes skip_xid before commit-prepared or
rollback-prepared, but we tolerate this case.

Also, in non-streaming and non-2PC cases, while skipping changes we
don’t call maybe_reread_subscription() until all changes are skipped.
So changing skip_xid cannot cancel skipping that has already started.
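
The behaviour described above can be sketched as a small latch (a simplified model, not the worker.c code). cached_subskipxid stands in for the cached MySubscription value, and begin_remote_xact(), apply_change() and commit_remote_xact() are hypothetical names. The decision to skip is taken once at the start of the remote transaction and then held until its end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define TransactionIdIsValid(xid)	((xid) != InvalidTransactionId)

/* Hypothetical stand-in for the cached pg_subscription.subskipxid. */
static TransactionId cached_subskipxid = InvalidTransactionId;

/* Worker-local latch: valid only while a transaction is being skipped. */
static TransactionId skip_xid = InvalidTransactionId;

#define is_skipping_changes()	(TransactionIdIsValid(skip_xid))

/* Decide once, when the remote transaction begins. */
static void
begin_remote_xact(TransactionId xid)
{
	if (TransactionIdIsValid(cached_subskipxid) && cached_subskipxid == xid)
		skip_xid = xid;
}

/* Returns true if the change would be applied, false if it is skipped. */
static bool
apply_change(void)
{
	return !is_skipping_changes();
}

/* At commit, clear both the catalog value and the local latch. */
static void
commit_remote_xact(void)
{
	if (is_skipping_changes())
	{
		cached_subskipxid = InvalidTransactionId;
		skip_xid = InvalidTransactionId;
	}
}
```

Because apply_change() consults only the latch, resetting cached_subskipxid in the middle of the transaction has no effect until the next transaction begins.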

8.
+static void
+clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn,
+ TimestampTz origin_timestamp)
+{
+ Relation rel;
+ Form_pg_subscription subform;
+ HeapTuple tup;
+ bool nulls[Natts_pg_subscription];
+ bool replaces[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, false, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));
+
+ if (!IsTransactionState())
+ StartTransactionCommand();
+
+ LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
+ AccessShareLock);

It is important to add a comment as to why we need a lock here.

Added.

9.
+ * needs to be set subskipxid again.  We can reduce the possibility by
+ * logging a replication origin WAL record to advance the origin LSN
+ * instead but it doesn't seem to be worth since it's a very minor case.

You can also add here that there is no way to advance origin_timestamp
so that would be inconsistent.

Added.

10.
+clear_subscription_skip_xid(TransactionId xid, XLogRecPtr origin_lsn,
+ TimestampTz origin_timestamp)
{
..
..
+ if (!IsTransactionState())
+ StartTransactionCommand();
..
..
+ CommitTransactionCommand();
..
}

The transaction should be committed in this function if it is started
here otherwise it should be the responsibility of the caller to commit
it.

Fixed.

I've attached an updated patch that incorporates these comments except
for 6 and 7, which probably need more discussion. The comments
from Vignesh are also incorporated.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v5-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)

#414Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#410)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 14, 2022 at 9:05 PM vignesh C <vignesh21@gmail.com> wrote:

On Fri, Jan 14, 2022 at 7:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 11:10 PM vignesh C <vignesh21@gmail.com> wrote:

On Wed, Jan 12, 2022 at 11:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 12, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 12, 2022 at 5:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 11, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 11, 2022 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On second thought, the same is true for other cases, for example,
preparing the transaction and clearing skip_xid while handling a
prepare message. That is, currently we don't clear skip_xid while
handling a prepare message but do that while handling commit/rollback
prepared message, in order to avoid the worst case. If we do both
while handling a prepare message and the server crashes between them,
it ends up that skip_xid is cleared and the transaction will be
resent, which is identical to the worst-case above.

How are you thinking to update the skip xid before prepare? If we do
it in the same transaction then the changes in the catalog will be
part of the prepared xact but won't be committed. Now, say if we do it
after prepare, then the situation won't be the same because after
restart the same xact won't appear again.

I was thinking to commit the catalog change first in a separate
transaction while not updating origin LSN and then prepare an empty
transaction while updating origin LSN.

But, won't it complicate the handling if in the future we try to
enhance this API such that it skips partial changes like skipping only
for particular relation(s) or particular operations as discussed
previously in this thread?

Right. I was thinking that if we accept the situation that the user
has to set skip_xid again in case of the server crashes, we might be
able to accept also the situation that the user has to clear skip_xid
in a case of the server crashes. But it seems the former is less
problematic.

I've attached an updated patch that incorporated all comments I got so far.

Thanks for the updated patch, few comments:

Thank you for the comments!

1) Currently skip xid is not displayed in describe subscriptions, can
we include it too:
\dRs+ sub1
                                                        List of subscriptions
 Name |  Owner  | Enabled | Publication | Binary | Streaming | Two phase commit | Synchronous commit |            Conninfo
------+---------+---------+-------------+--------+-----------+------------------+--------------------+--------------------------------
 sub1 | vignesh | t       | {pub1}      | f      | f         | e                | off                | dbname=postgres host=localhost
(1 row)

2) This import "use PostgreSQL::Test::Utils;" is not required:
+# Tests for skipping logical replication transactions.
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 6;
3) Some of the comments end with a punctuation mark and some do not.
Should we keep it consistent:
+    # Wait for worker error
+    $node_subscriber->poll_query_until(
+       'postgres',
+    # Set skip xid
+    $node_subscriber->safe_psql(
+       'postgres',
+# Create publisher node.
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+# Create subscriber node.
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
4) Should this be changed:
+ * True if we are skipping all data modification changes (INSERT,
UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we
start skipping
+ * changes, we don't stop it until the we skip all changes of the
transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid
gets changed or
to:
+ * True if we are skipping all data modification changes (INSERT,
UPDATE, etc.) of
+ * the specified transaction at MySubscription->skipxid.  Once we
start skipping
+ * changes, we don't stop it until we skip all changes of the transaction even
+ * if pg_subscription is updated that and MySubscription->skipxid
gets changed or

In "stop it until the we skip all changes", the word "the" is not required.

I agree with all the comments above. I've attached an updated patch.

Thanks for the updated patch, few minor comments:

Thank you for the comments.

1) Should "SKIP" be "SKIP (" here:
@@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end)
/* ALTER SUBSCRIPTION <name> */
else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP",

As Amit mentioned, it's consistent with the SET option.

2) We could add a test for this if possible:
+               case ALTER_SUBSCRIPTION_SKIP:
+                       {
+                               if (!superuser())
+                                       ereport(ERROR,
+                                                       (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                                                        errmsg("must be superuser to skip transaction")));

3) There was one typo in the commit message, "transaciton" should be "transaction":
After skipping the transaciton the apply worker clears
pg_subscription.subskipxid.

Another small typo, susbscriber should be subscriber:
+      prepared by enabling <literal>two_phase</literal> on susbscriber.  After
+      the logical replication successfully skips the transaction, the
transaction
4) Should skipsubxid be mentioned as subskipxid here:
+      * Clear the subskipxid of pg_subscription catalog.  This catalog
+      * update must be committed before finishing prepared transaction.
+      * Because otherwise, in a case where the server crashes between
+      * finishing prepared transaction and the catalog update, COMMIT
+      * PREPARED won’t be resent but skipsubxid is left.

The above comments were incorporated into the latest v5 patch I just
submitted[1].

Regards,

[1]: /messages/by-id/CAD21AoCd3Y2-b67+pVrzrdteUmup1XG6JeHYOa5dGjh8qZ3VuQ@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#415Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#413)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 9:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jan 15, 2022 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

6.
+static void
+maybe_start_skipping_changes(TransactionId xid)
+{
+ Assert(!is_skipping_changes());
+ Assert(!in_remote_transaction);
+ Assert(!in_streamed_transaction);
+
+ /* Make sure subscription cache is up-to-date */
+ maybe_reread_subscription();

Why do we need to update the cache here by calling
maybe_reread_subscription() and at other places in the patch? It is
sufficient to get the skip_xid value at the start of the worker via
GetSubscription().

MySubscription could be out-of-date after a user changes the catalog.
In non-skipping change cases, we check it when starting the
transaction in begin_replication_step() which is called, e.g., when
applying an insert change. But I think we need to make sure it’s
up-to-date at the beginning of applying changes, that is, before
starting a transaction. Otherwise, we may end up skipping the
transaction based on an out-of-date subscription cache.

I thought the user would normally set skip_xid only after an error
which means that the value should be as new as the time of the start
of the worker. I am slightly worried about the cost we might need to
pay for this additional look-up in case skip_xid is not changed. Do
you see any valid user scenario where we might not see the required
skip_xid? I am okay with calling this if we really need it.

7. In maybe_reread_subscription(), isn't there a need to check whether
skip_xid is changed where we exit and launch the worker and compare
other subscription parameters?

IIUC we relaunch the worker here when a subscription parameter such as
slot_name is changed. In the current implementation, I think that
relaunching the worker is not strictly necessary when skip_xid is
changed. For instance, when skipping a prepared transaction, we
deliberately don’t clear subskipxid of pg_subscription at prepare time
and do that at commit-prepared or rollback-prepared instead. There is a
chance that the user changes skip_xid before commit-prepared or
rollback-prepared, but we tolerate this case.

I think between prepare and commit prepared, the user only needs to
change it if there is another error, in which case we will restart
anyway and load the new value. But I understand that we
don't need to restart if skip_xid is changed, as it might not impact
the remote connection in any way, so I am fine with not doing anything
for this.

--
With Regards,
Amit Kapila.

#416Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#415)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 17, 2022 at 9:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jan 15, 2022 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

6.
+static void
+maybe_start_skipping_changes(TransactionId xid)
+{
+ Assert(!is_skipping_changes());
+ Assert(!in_remote_transaction);
+ Assert(!in_streamed_transaction);
+
+ /* Make sure subscription cache is up-to-date */
+ maybe_reread_subscription();

Why do we need to update the cache here by calling
maybe_reread_subscription() and at other places in the patch? It is
sufficient to get the skip_xid value at the start of the worker via
GetSubscription().

MySubscription could be out-of-date after a user changes the catalog.
In non-skipping change cases, we check it when starting the
transaction in begin_replication_step() which is called, e.g., when
applying an insert change. But I think we need to make sure it’s
up-to-date at the beginning of applying changes, that is, before
starting a transaction. Otherwise, we may end up skipping the
transaction based on an out-of-date subscription cache.

I thought the user would normally set skip_xid only after an error
which means that the value should be as new as the time of the start
of the worker. I am slightly worried about the cost we might need to
pay for this additional look-up in case skip_xid is not changed. Do
you see any valid user scenario where we might not see the required
skip_xid? I am okay with calling this if we really need it.

Fair point. I've changed the code accordingly.

7. In maybe_reread_subscription(), isn't there a need to check whether
skip_xid is changed where we exit and launch the worker and compare
other subscription parameters?

IIUC we relaunch the worker here when a subscription parameter such as
slot_name is changed. In the current implementation, I think that
relaunching the worker is not strictly necessary when skip_xid is
changed. For instance, when skipping a prepared transaction, we
deliberately don’t clear subskipxid of pg_subscription at prepare time
and do that at commit-prepared or rollback-prepared instead. There is a
chance that the user changes skip_xid before commit-prepared or
rollback-prepared, but we tolerate this case.

I think between prepare and commit prepared, the user only needs to
change it if there is another error, in which case we will restart
anyway and load the new value. But I understand that we
don't need to restart if skip_xid is changed, as it might not impact
the remote connection in any way, so I am fine with not doing anything
for this.

I'll leave this part for now. We can change it later if others think
it's necessary.

I've attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v6-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)

#417osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#416)
RE: Skipping logical replication transactions on subscriber side

On Monday, January 17, 2022 3:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Hi, thank you for sharing a new patch.
A few comments on v6.

(1) doc/src/sgml/ref/alter_subscription.sgml

+ resort. This option has no effect on the transaction that is already

One TAB exists between "resort" and "This".

(2) Minor improvement suggestion of comment in src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use 'streaming cases', because
the phrase sounds a bit broader than your actual implementation.
We do skip transactions in streaming cases, just not during the spooling phase, right?

I suggest below.

"We don't skip receiving the changes at the phase to spool streaming transactions"

(3) in the comment of apply_handle_prepare_internal, two full-width characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+ * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check?

(4) typo

+ * the subscription if hte user has specified skip_xid. Once we start skipping

"hte" should be "the"?

(5)

I may be missing something here, but in one of
the past discussions there seemed to be a consensus that
if the user specifies the XID of a subtransaction,
it would be better to skip only the subtransaction.

Is this out of scope for the patch this time?
If so, I suggest you include some description of it
either in the commit message or near the related code.

(6)

I feel it would be better for the TAP test for this feature
to include a test that skipping an aborted streaming
transaction clears the XID, in order to cover
the various new code paths. Did you have any special reason
to omit that case?

(7)

I'd like more explanation of why the subscriber is restarted
in the TAP test, because this is not a mandatory operation.
(The TAP tests pass without this restart.)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, the following would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for a new error to launch a new apply worker.

Best Regards,
Takamichi Osumi

#418osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#417)
RE: Skipping logical replication transactions on subscriber side

On Monday, January 17, 2022 5:03 PM I wrote:

Hi, thank you for sharing a new patch.
Few comments on the v6.

(1) doc/src/sgml/ref/alter_subscription.sgml

+      resort.  This option has no effect on the transaction that is already

One TAB exists between "resort" and "This".

(2) Minor improvement suggestion of comment in
src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use "streaming cases", because what "streaming
cases" means sounds a bit broader than your actual implementation.
We do skip streamed transactions, just not during the spooling phase,
right?

I suggest the following:

"We don't skip receiving the changes at the phase to spool streaming
transactions"

(3) in the comment of apply_handle_prepare_internal, two full-width
characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+ * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check ?

(4) typo

+ * the subscription if hte user has specified skip_xid. Once we start skipping

"hte" should be "the"?

(5)

I can miss something here but, in one of the past discussions, there seems a
consensus that if the user specifies XID of a subtransaction, it would be better
to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it either in the commit
message or around codes related to it.

(6)

I feel it's a better idea to include a test whether to skip aborted streaming
transaction clears the XID in the TAP test for this feature, in a sense to cover
various new code paths. Did you have any special reason to omit the case ?

(7)

I want more explanation for the reason to restart the subscriber in the TAP test
because this is not mandatory operation.
(We can pass the TAP tests without this restart)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, the following would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for a new error to launch a new apply worker.

A few more minor comments:

(8) another full-width char in apply_handle_commit_prepared

+ * PREPARED won't be resent but subskipxid is left.

Kindly check "won't" ?

(9) the header comments of clear_subscription_skip_xid

+/* clear subskipxid of pg_subscription catalog */

Should this start with an uppercase letter?

(10) some variable declarations and initialization of clear_subscription_skip_xid

There's no harm in moving the code below into the conditional branch
taken when the user has not changed subskipxid before
the apply worker clears it.

+       bool            nulls[Natts_pg_subscription];
+       bool            replaces[Natts_pg_subscription];
+       Datum           values[Natts_pg_subscription];
+
+       memset(values, 0, sizeof(values));
+       memset(nulls, false, sizeof(nulls));
+       memset(replaces, false, sizeof(replaces));

Best Regards,
Takamichi Osumi

#419Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#417)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 5:03 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, January 17, 2022 3:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Hi, thank you for sharing a new patch.
Few comments on the v6.

Thank you for the comments!

(1) doc/src/sgml/ref/alter_subscription.sgml

+ resort. This option has no effect on the transaction that is already

One TAB exists between "resort" and "This".

Will remove.

(2) Minor improvement suggestion of comment in src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use "streaming cases", because
what "streaming cases" means sounds a bit broader than your actual implementation.
We do skip streamed transactions, just not during the spooling phase, right?

I suggest below.

"We don't skip receiving the changes at the phase to spool streaming transactions"

I might be missing your point but I think it's correct that we don't
skip receiving the change of the transaction that is sent via
streaming protocol. And it doesn't sound broader to me. Could you
elaborate on that?

(3) in the comment of apply_handle_prepare_internal, two full-width characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+ * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check ?

Which characters in "won't" are full-width characters? I could not find them.

(4) typo

+ * the subscription if hte user has specified skip_xid. Once we start skipping

"hte" should "the" ?

Will fix.

(5)

I can miss something here but, in one of
the past discussions, there seems a consensus that
if the user specifies XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it
either in the commit message or around codes related to it.

How can the user know the subtransaction XID? I suppose you are referring to
streaming protocol cases, but while applying spooled changes we report the
subtransaction XID neither in the server log nor in
pg_stat_subscription_workers.

(6)

I feel it's a better idea to include a test whether
to skip aborted streaming transaction clears the XID
in the TAP test for this feature, in a sense to cover
various new code paths. Did you have any special reason
to omit the case ?

Which code path is newly covered by this aborted streaming transaction
test? I think that this patch is already covered by the test for
a committed-and-streamed transaction. It doesn't matter whether the
streamed transaction is committed or aborted because an error occurs
while applying the spooled changes.

(7)

I want more explanation for the reason to restart the subscriber
in the TAP test because this is not mandatory operation.
(We can pass the TAP tests without this restart)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, below would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for a new error to launch a new apply worker.

I could not understand why the proposed sentence has more information.
Does it mean you want to mention "As an optimization to finish tests
earlier"?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#420Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#418)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 9:35 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, January 17, 2022 5:03 PM I wrote:

Hi, thank you for sharing a new patch.
Few comments on the v6.

(1) doc/src/sgml/ref/alter_subscription.sgml

+      resort.  This option has no effect on the transaction that is already

One TAB exists between "resort" and "This".

(2) Minor improvement suggestion of comment in
src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use "streaming cases", because what "streaming
cases" means sounds a bit broader than your actual implementation.
We do skip streamed transactions, just not during the spooling phase,
right?

I suggest below.

"We don't skip receiving the changes at the phase to spool streaming
transactions"

(3) in the comment of apply_handle_prepare_internal, two full-width
characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+ * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check ?

(4) typo

+ * the subscription if hte user has specified skip_xid. Once we start skipping

"hte" should "the" ?

(5)

I can miss something here but, in one of the past discussions, there seems a
consensus that if the user specifies XID of a subtransaction, it would be better
to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it either in the commit
message or around codes related to it.

(6)

I feel it's a better idea to include a test whether to skip aborted streaming
transaction clears the XID in the TAP test for this feature, in a sense to cover
various new code paths. Did you have any special reason to omit the case ?

(7)

I want more explanation for the reason to restart the subscriber in the TAP test
because this is not mandatory operation.
(We can pass the TAP tests without this restart)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, below would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for a new error to launch a new apply worker.

Few more minor comments

Thank you for the comments!

(8) another full-width char in apply_handle_commit_prepared

+ * PREPARED won't be resent but subskipxid is left.

Kindly check "won't" ?

Again, I don't follow what you mean by full-width character in this context.

(9) the header comments of clear_subscription_skip_xid

+/* clear subskipxid of pg_subscription catalog */

Should start with an upper letter ?

Okay, I'll change it.

(10) some variable declarations and initialization of clear_subscription_skip_xid

There's no harm in moving below codes into a condition case
where the user didn't change the subskipxid before
apply worker clearing it.

+       bool            nulls[Natts_pg_subscription];
+       bool            replaces[Natts_pg_subscription];
+       Datum           values[Natts_pg_subscription];
+
+       memset(values, 0, sizeof(values));
+       memset(nulls, false, sizeof(nulls));
+       memset(replaces, false, sizeof(replaces));

Will move.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#421Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#419)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 6:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

(5)

I can miss something here but, in one of
the past discussions, there seems a consensus that
if the user specifies XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it
either in the commit message or around codes related to it.

How can the user know subtransaction XID? I suppose you refer to
streaming protocol cases but while applying spooled changes we don't
report subtransaction XID neither in server log nor
pg_stat_subscription_workers.

I also think that in the current system users won't be aware of a
subtransaction's XID, but I feel Osumi-San's point is valid that we
should at least state in the docs that we allow skipping only top-level
xacts. Also, it is not hard to imagine that in the future we
could make the subtransaction's XID info available to users as well, as we have
that in the case of streaming xacts (see subxact_data).

Few minor points:
===============
1.
+ * the subscription if hte user has specified skip_xid.

Typo. /hte/the

2.
+ * PREPARED won’t be resent but subskipxid is left.

In the diffmerge tool, "won't" shows some funny chars. When I manually
removed the 't and added it again, everything was fine. I am not sure why
that is. I think Osumi-San has also raised this complaint.

3.
+ /*
+ * We don't expect that the user set the XID of the transaction that is
+ * rolled back but if the skip XID is set, clear it.
+ */

/user set/user to set/

--
With Regards,
Amit Kapila.

#422Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#416)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 5:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Some review comments for the v6 patch:

doc/src/sgml/logical-replication.sgml

(1) Expanded output

Since the view output is shown in "expanded output" mode, perhaps the
doc should say that, or alternatively add the following lines prior to
it, to make it clear:

postgres=# \x
Expanded display is on.

(2) Message output in server log

The actual CONTEXT text now just says "at ..." instead of "with commit
timestamp ...", so the doc needs to be updated as follows:

BEFORE:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 with commit timestamp 2021-09-29 15:52:45.165754+00
AFTER:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 at 2021-09-29 15:52:45.165754+00

(3)
The wording "the change" doesn't seem right here, so I suggest the
following update:

BEFORE:
+   Skipping the whole transaction includes skipping the change that may not violate
AFTER:
+   Skipping the whole transaction includes skipping changes that may not violate

doc/src/sgml/ref/alter_subscription.sgml

(4)
I have a number of suggested wording improvements:

BEFORE:
+      Skips applying changes of the particular transaction.  If incoming data
+      violates any constraints the logical replication will stop until it is
+      resolved.  The resolution can be done either by changing data on the
+      subscriber so that it doesn't conflict with incoming change or by skipping
+      the whole transaction.  The logical replication worker skips all data
+      modification changes within the specified transaction including the changes
+      that may not violate the constraint, so, it should only be used as a last
+      resort. This option has no effect on the transaction that is already
+      prepared by enabling <literal>two_phase</literal> on subscriber.
AFTER:
+      Skips applying all changes of the specified transaction.  If incoming data
+      violates any constraints, logical replication will stop until it is
+      resolved.  The resolution can be done either by changing data on the
+      subscriber so that it doesn't conflict with incoming change or by skipping
+      the whole transaction.  Using the SKIP option, the logical replication worker skips all data
+      modification changes within the specified transaction, including changes
+      that may not violate the constraint, so, it should only be used as a last
+      resort. This option has no effect on transactions that are already
+      prepared by enabling <literal>two_phase</literal> on the subscriber.

(5)
change -> changes

BEFORE:
+      subscriber so that it doesn't conflict with incoming change or by skipping
AFTER:
+      subscriber so that it doesn't conflict with incoming changes or by skipping

src/backend/replication/logical/worker.c

(6) Missing word?
The following should say "worth doing" or "worth it"?

+ * doesn't seem to be worth since it's a very minor case.

src/test/regress/sql/subscription.sql

(7) Misleading test case
I think the following test case is misleading and should be removed,
because the "1.1" xid value is only regarded as invalid because "1" is
an invalid xid (and there's already a test case for a "1" xid) - the
fractional part gets thrown away, and doesn't affect the validity
here.

+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1.1);
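
To illustrate the point in (7), here is a minimal Python sketch. It assumes the option value is reduced to an integer by discarding the fractional part before it is validated, as described above; the patch's actual parsing code is not shown in this thread. `parse_skip_xid` is a hypothetical helper, not a PostgreSQL function. The threshold mirrors PostgreSQL's FirstNormalTransactionId (3), below which transaction IDs are reserved.

```python
# Hypothetical model of the validation discussed in (7); not the patch's code.
FIRST_NORMAL_XID = 3  # PostgreSQL's FirstNormalTransactionId; 0, 1 and 2 are reserved


def parse_skip_xid(raw):
    """Drop any fractional part (assumed behavior), then reject reserved XIDs."""
    xid = int(float(raw))
    if xid < FIRST_NORMAL_XID:
        raise ValueError(f"invalid transaction ID: {raw}")
    return xid


# "1" and "1.1" take the same rejection path, so under this assumption the
# second test case adds no coverage beyond the first.
for value in ("1", "1.1", "716.9"):
    try:
        print(value, "->", parse_skip_xid(value))
    except ValueError as err:
        print(value, "->", err)
```

Under this model the "1.1" case is rejected only because it truncates to the reserved XID 1, which is the behavior the existing "1" test already exercises.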

Regards,
Greg Nancarrow
Fujitsu Australia

#423Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#421)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 at 10:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 17, 2022 at 6:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

(5)

I can miss something here but, in one of
the past discussions, there seems a consensus that
if the user specifies XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it
either in the commit message or around codes related to it.

How can the user know subtransaction XID? I suppose you refer to
streaming protocol cases but while applying spooled changes we don't
report subtransaction XID neither in server log nor
pg_stat_subscription_workers.

I also think in the current system users won't be aware of
subtransaction's XID but I feel Osumi-San's point is valid that we
should at least add it in docs that we allow to skip only top-level
xacts. Also, in the future, it won't be impossible to imagine that we
can have subtransaction's XID info also available to users as we have
that in the case of streaming xacts (See subxact_data).

Fair point and more accurate, but I'm a bit concerned that using these
words could confuse the user. There are some places in the doc where
we use the words "top-level transaction" and "sub transactions" but
these are not commonly used in the doc. The user normally would not be
aware that sub transactions are used to implement SAVEPOINTs. Also,
the publisher's subtransaction ID doesn't appear anywhere on the
subscriber. So if we want to mention it, I think we should use other
words instead, but I don't have a good idea for that. Do you
have any ideas?

Few minor points:
===============
1.
+ * the subscription if hte user has specified skip_xid.

Typo. /hte/the

Will fix.

2.
+ * PREPARED won’t be resent but subskipxid is left.

In diffmerge tool, won't is showing some funny chars. When I manually
removed 't and added it again, everything is fine. I am not sure why
it is so? I think Osumi-San has also raised this complaint.

Oh I didn't realize that. I'll check it again by using diffmerge tool.

3.
+ /*
+ * We don't expect that the user set the XID of the transaction that is
+ * rolled back but if the skip XID is set, clear it.
+ */

/user set/user to set/

Will fix.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#424Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#422)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 10:36 AM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Mon, Jan 17, 2022 at 5:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Some review comments for the v6 patch:

Thank you for the comments!

doc/src/sgml/logical-replication.sgml

(1) Expanded output

Since the view output is shown in "expanded output" mode, perhaps the
doc should say that, or alternatively add the following lines prior to
it, to make it clear:

postgres=# \x
Expanded display is on.

I'm not sure it's really necessary. A similar example would be
perform.sgml but it doesn't say "\x".

(2) Message output in server log

The actual CONTEXT text now just says "at ..." instead of "with commit
timestamp ...", so the doc needs to be updated as follows:

BEFORE:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 with commit timestamp 2021-09-29 15:52:45.165754+00
AFTER:
+CONTEXT:  processing remote data during "INSERT" for replication target relation "public.test" in transaction 716 at 2021-09-29 15:52:45.165754+00

Will fix.

(3)
The wording "the change" doesn't seem right here, so I suggest the
following update:

BEFORE:
+   Skipping the whole transaction includes skipping the change that may not violate
AFTER:
+   Skipping the whole transaction includes skipping changes that may not violate

doc/src/sgml/ref/alter_subscription.sgml

Will fix.

(4)
I have a number of suggested wording improvements:

BEFORE:
+      Skips applying changes of the particular transaction.  If incoming data
+      violates any constraints the logical replication will stop until it is
+      resolved.  The resolution can be done either by changing data on the
+      subscriber so that it doesn't conflict with incoming change or by skipping
+      the whole transaction.  The logical replication worker skips all data
+      modification changes within the specified transaction including the changes
+      that may not violate the constraint, so, it should only be used as a last
+      resort. This option has no effect on the transaction that is already
+      prepared by enabling <literal>two_phase</literal> on subscriber.
AFTER:
+      Skips applying all changes of the specified transaction.  If incoming data
+      violates any constraints, logical replication will stop until it is
+      resolved.  The resolution can be done either by changing data on the
+      subscriber so that it doesn't conflict with incoming change or by skipping
+      the whole transaction.  Using the SKIP option, the logical replication worker skips all data
+      modification changes within the specified transaction, including changes
+      that may not violate the constraint, so, it should only be used as a last
+      resort. This option has no effect on transactions that are already
+      prepared by enabling <literal>two_phase</literal> on the subscriber.

Will fix.

(5)
change -> changes

BEFORE:
+      subscriber so that it doesn't conflict with incoming change or by skipping
AFTER:
+      subscriber so that it doesn't conflict with incoming changes or by skipping

Will fix.

src/backend/replication/logical/worker.c

(6) Missing word?
The following should say "worth doing" or "worth it"?

+ * doesn't seem to be worth since it's a very minor case.

Will fix.

src/test/regress/sql/subscription.sql

(7) Misleading test case
I think the following test case is misleading and should be removed,
because the "1.1" xid value is only regarded as invalid because "1" is
an invalid xid (and there's already a test case for a "1" xid) - the
fractional part gets thrown away, and doesn't affect the validity
here.

+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 1.1);

Good point. Will remove.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#425Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#423)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 8:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Jan 17, 2022 at 10:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 17, 2022 at 6:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

(5)

I can miss something here but, in one of
the past discussions, there seems a consensus that
if the user specifies XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it
either in the commit message or around codes related to it.

How can the user know subtransaction XID? I suppose you refer to
streaming protocol cases but while applying spooled changes we don't
report subtransaction XID neither in server log nor
pg_stat_subscription_workers.

I also think in the current system users won't be aware of
subtransaction's XID but I feel Osumi-San's point is valid that we
should at least add it in docs that we allow to skip only top-level
xacts. Also, in the future, it won't be impossible to imagine that we
can have subtransaction's XID info also available to users as we have
that in the case of streaming xacts (See subxact_data).

Fair point and more accurate, but I'm a bit concerned that using these
words could confuse the user. There are some places in the doc where
we use the words "top-level transaction" and "sub transactions" but
these are not commonly used in the doc. The user normally would not be
aware that sub transactions are used to implement SAVEPOINTs. Also,
the publisher's subtransaction ID doesn't appear anywhere on the
subscriber. So if we want to mention it, I think we should use other
words instead, but I don't have a good idea for that. Do you
have any ideas?

How about changing existing text:
+          Specifies the ID of the transaction whose changes are to be skipped
+          by the logical replication worker.  Setting <literal>NONE</literal>
+          resets the transaction ID.

to

Specifies the top-level transaction identifier whose changes are to be
skipped by the logical replication worker. We don't support skipping
individual subtransactions. Setting <literal>NONE</literal> resets
the transaction ID.

--
With Regards,
Amit Kapila.

#426tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#416)
RE: Skipping logical replication transactions on subscriber side

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Thanks for updating the patch. Few comments:

1)
		/* Two_phase is only supported in v15 and higher */
 		if (pset.sversion >= 150000)
 			appendPQExpBuffer(&buf,
-							  ", subtwophasestate AS \"%s\"\n",
-							  gettext_noop("Two phase commit"));
+							  ", subtwophasestate AS \"%s\"\n"
+							  ", subskipxid AS \"%s\"\n",
+							  gettext_noop("Two phase commit"),
+							  gettext_noop("Skip XID"));

appendPQExpBuffer(&buf,
", subsynccommit AS \"%s\"\n"

I think "skip xid" should be mentioned in the comment. Maybe it could be changed to:
"Two_phase and skip XID are only supported in v15 and higher"

2) The following two places are not consistent in whether "= value" is surrounded
with square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

3)
+	 * Protect subskip_xid of pg_subscription from being concurrently updated
+	 * while clearing it.

"subskip_xid" should be "subskipxid" I think.

4)
+/*
+ * Start skipping changes of the transaction if the given XID matches the
+ * transaction ID specified by skip_xid option.
+ */

The option name was "skip_xid" in the previous version, and it is "xid" in
the latest patch. So should we modify "skip_xid option" to "skip xid option", or
"skip option xid", or something else?

Also the following place has a similar issue:
+ * the subscription if hte user has specified skip_xid. Once we start skipping

Regards,
Tang

#427osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#419)
RE: Skipping logical replication transactions on subscriber side

On Monday, January 17, 2022 9:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the comments!

..

(2) Minor improvement suggestion of comment in
src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use "streaming cases", because what
"streaming cases" means sounds a bit broader than your actual
implementation.
We do skip streamed transactions, just not during the spooling phase,
right?

I suggest the following:

"We don't skip receiving the changes at the phase to spool streaming
transactions"

I might be missing your point but I think it's correct that we don't skip receiving
the change of the transaction that is sent via streaming protocol. And it doesn't
sound broader to me. Could you elaborate on that?

OK. Excuse me for the lack of explanation.

When "streaming cases" is used to explain something for the first time,
I feel it implies a contrast with "non-streaming cases",
and that contrast is what I imagined when I saw it.

Thus, I thought "streaming cases" meant the
whole flow of a streaming transaction, which consists of messages
surrounded by stream start and stream stop and which is finished by
stream commit/stream abort (including the 2PC variations).

Coming back to the subject, you wrote the following in the comment:

"we don't skip receiving the changes in streaming cases,
since we decide whether or not to skip applying the changes
when starting to apply changes"

The first part of this sentence
("we don't skip receiving the changes in streaming cases")
gives me the impression that we don't skip changes in the streaming cases
(as I understood the term above), but the last part
("we decide whether or not to skip applying the changes
when starting to apply changes") means we do skip streamed transactions at the apply phase.

So this sentence looked slightly confusing to me.
Thus, I suggested the following (shown here joined with the existing part):

"we don't skip receiving the changes at the phase to spool streaming transactions
since we decide whether or not to skip applying the changes when starting to apply changes"

This looked better to me, but of course it is only a suggestion.
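
The distinction between the spooling phase and the apply phase can be sketched with a toy model in Python. This is only an illustration of the behavior described in the comment under discussion, not the patch's code; the class `ToyApplyWorker` and its method names are hypothetical. It assumes that streamed changes are always received and spooled, that the skip decision is taken only when the spooled changes start to be applied, and that the skip XID is cleared once the transaction has been skipped.

```python
class ToyApplyWorker:
    """Toy model (hypothetical names) of spooling vs. applying streamed changes."""

    def __init__(self, skip_xid=None):
        self.skip_xid = skip_xid  # stands in for the subscription's skip XID
        self.spool = {}           # xid -> changes received via streaming
        self.applied = []         # changes that were actually applied

    def handle_stream_change(self, xid, change):
        # Spooling phase: the change is always received and stored;
        # no skip check happens here.
        self.spool.setdefault(xid, []).append(change)

    def handle_stream_commit(self, xid):
        # Apply phase: the skip decision is made when starting to apply.
        changes = self.spool.pop(xid, [])
        if xid == self.skip_xid:
            self.skip_xid = None  # clear the skip XID after skipping
            return 0
        self.applied.extend(changes)
        return len(changes)


worker = ToyApplyWorker(skip_xid=716)
worker.handle_stream_change(716, "INSERT into public.test")
worker.handle_stream_change(717, "INSERT into public.other")
worker.handle_stream_commit(716)  # spooled, but skipped at apply time
worker.handle_stream_commit(717)  # applied normally
print(worker.applied)
```

In this model the changes of transaction 716 are still received and spooled, which is why the comment can say that receiving is not skipped while the transaction as a whole is.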

(3) in the comment of apply_handle_prepare_internal, two full-width
characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+ * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check ?

Which characters in "won't" are full-width characters? I could not find them.

All the characters I found and reported as full-width are single quotes.

It might be good to check the entire patch once
with some tool that helps you detect them.
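
One possible way to run such a check is sketched below in Python. It is only an illustration, not a tool anyone in this thread used: `find_non_ascii` is a hypothetical helper that scans the added lines of a unified diff and reports every non-ASCII character with its Unicode name. The sample text reuses two comment lines quoted in this thread, with the typographic apostrophe written as the escape `\u2019`.

```python
import unicodedata


def find_non_ascii(patch_text):
    """Return (line, column, char, unicode_name) for each non-ASCII character
    found on an added line (leading '+') of a unified diff."""
    hits = []
    for lineno, line in enumerate(patch_text.splitlines(), start=1):
        # Inspect only added lines; skip the '+++ b/file' header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


sample = (
    "+++ b/src/backend/replication/logical/worker.c\n"
    "+ * won\u2019t be resent in a case where the server crashes between them.\n"
    "+ * PREPARED won't be resent but subskipxid is left.\n"
)
for hit in find_non_ascii(sample):
    print(hit)
```

Only the line containing the typographic apostrophe is reported; the line with a plain ASCII apostrophe passes.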

(5)

I can miss something here but, in one of the past discussions, there
seems a consensus that if the user specifies XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the range of the patch ?
If so, I suggest you include some description about it either in the
commit message or around codes related to it.

How can the user know the subtransaction XID? I suppose you refer to streaming
protocol cases, but while applying spooled changes we report the
subtransaction XID neither in the server log nor in pg_stat_subscription_workers.

Yeah, usually subtransaction XID is not exposed to the users. I agree.

But clarifying that this feature targets only top-level transactions
sounds better to me. Thank you Amit-san for your support
about how we should write it in [1] !

(6)

I feel it's a better idea to include a test of whether skipping an aborted
streaming transaction clears the XID in the TAP test for this feature,
in order to cover the various new code paths. Did you have any special
reason to omit this case ?

Which code path is newly covered by this aborted streaming transaction tests?
I think that this patch is already covered even by the test for a
committed-and-streamed transaction. It doesn't matter whether the streamed
transaction is committed or aborted because an error occurs while applying the
spooled changes.

Oh, this was my mistake. What I expressed as a new path is
apply_handle_stream_abort -> clear_subscription_skip_xid.
But, this was totally wrong as you explained.

(7)

I want more explanation for the reason to restart the subscriber in
the TAP test because this is not a mandatory operation.
(We can pass the TAP tests without this restart)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, below would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for new error to launch a new apply worker.

I could not understand why the proposed sentence has more information.
Does it mean you want to mention "As an optimization to finish tests earlier"?

Yes, exactly. The point is to add "As an optimization to finish tests earlier".

Probably, I should have asked a simple question "why do you restart the subscriber" ?
At first sight, I couldn't understand the meaning for the restart and
you don't explain the reason itself.

[1]: /messages/by-id/CAA4eK1JHUF7fVNHQ1ZRRgVsdE8XDY8BruU9dNP3Q3jizNdpEbg@mail.gmail.com

Best Regards,
Takamichi Osumi

#428Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#426)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 8:34 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2) The following two places are not consistent in whether "= value" is surrounded
with square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

Good observation. Do we really need [, ... ] as currently, we support
only one value for XID?

--
With Regards,
Amit Kapila.

#429Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#428)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 18, 2022 at 8:34 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2) The following two places are not consistent in whether "= value" is surrounded
with square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

Good observation. Do we really need [, ... ] as currently, we support
only one value for XID?

I think no. In the doc, it should be:

ALTER SUBSCRIPTION name SKIP ( skip_option = value )

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#430Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: tanghy.fnst@fujitsu.com (#426)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 12:04 PM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. Please review it.

Thanks for updating the patch. Few comments:

1)
/* Two_phase is only supported in v15 and higher */
if (pset.sversion >= 150000)
appendPQExpBuffer(&buf,
-                                                         ", subtwophasestate AS \"%s\"\n",
-                                                         gettext_noop("Two phase commit"));
+                                                         ", subtwophasestate AS \"%s\"\n"
+                                                         ", subskipxid AS \"%s\"\n",
+                                                         gettext_noop("Two phase commit"),
+                                                         gettext_noop("Skip XID"));

appendPQExpBuffer(&buf,
", subsynccommit AS \"%s\"\n"

I think "skip xid" should be mentioned in the comment. Maybe it could be changed to:
"Two_phase and skip XID are only supported in v15 and higher"

Added.

2) The following two places are not consistent in whether "= value" is surrounded
with square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

Good catch. Fixed.

3)
+        * Protect subskip_xid of pg_subscription from being concurrently updated
+        * while clearing it.

"subskip_xid" should be "subskipxid" I think.

Fixed.

4)
+/*
+ * Start skipping changes of the transaction if the given XID matches the
+ * transaction ID specified by skip_xid option.
+ */

The option name was "skip_xid" in the previous version, and it is "xid" in
latest patch. So should we modify "skip_xid option" to "skip xid option", or
"skip option xid", or something else?

Also the following place has similar issue:
+ * the subscription if hte user has specified skip_xid. Once we start skipping

Fixed.

I've attached an updated patch. All comments I got so far were
incorporated into this patch unless I'm missing something.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v7-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/x-patch)
#431Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#427)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 12:20 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Monday, January 17, 2022 9:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the comments!

..

(2) Minor improvement suggestion of comment in
src/backend/replication/logical/worker.c

+ * reset during that.  Also, we don't skip receiving the changes in streaming
+ * cases, since we decide whether or not to skip applying the changes when

I suggest that you don't use 'streaming cases', because what
"streaming cases" means sounds a bit broader than your actual
implementation.

We do skip transactions in streaming cases but not during the spooling phase, right ?

I suggest below.

"We don't skip receiving the changes at the phase to spool streaming transactions"

I might be missing your point but I think it's correct that we don't skip receiving
the change of the transaction that is sent via streaming protocol. And it doesn't
sound broader to me. Could you elaborate on that?

OK. Excuse me for the lack of explanation.

I felt "streaming cases" implies "non-streaming cases",
as if to compare the difference (in my head), when it is
first used to explain something.
I imagined the contrast between those when I saw it.

Thus, I thought "streaming cases" meant
the whole flow of streaming transactions which consists of messages
surrounded by stream start and stream stop and which are finished by
stream commit/stream abort (including 2PC variations).

When I come back to the subject, you wrote below in the comment

"we don't skip receiving the changes in streaming cases,
since we decide whether or not to skip applying the changes
when starting to apply changes"

The first part of this sentence
("we don't skip receiving the changes in streaming cases")
gives me the impression that we don't skip changes in the streaming cases
(of my understanding above), but the last part
("we decide whether or not to skip applying the changes
when starting to apply change") means we skip transactions for streaming at the apply phase.

So, this sentence looked slightly confusing to me.
Thus, I suggested the text below (shown here connected with the existing part):

"we don't skip receiving the changes at the phase to spool streaming transactions
since we decide whether or not to skip applying the changes when starting to apply changes"

For me this looked better, but of course, this is a suggestion.

Thank you for your explanation.

I've modified the comment with some changes since "the phase to spool
streaming transaction" does not seem to be commonly used in worker.c.
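
The distinction discussed in (2) can be illustrated with a minimal standalone sketch. This is not the patch code: `SpooledXact`, `spool_change` and `apply_spooled_changes` are hypothetical names, and the spool file is reduced to a counter. It only assumes what the thread states, namely that streamed changes are always received and spooled, and that the skip decision is made once when applying starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same underlying type PostgreSQL uses for XIDs; redefined here so the
 * sketch is standalone. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical stand-in for a streamed transaction's spool file. */
typedef struct SpooledXact
{
    TransactionId xid;      /* remote top-level XID */
    int           nchanges; /* changes received and spooled so far */
} SpooledXact;

/* Receiving phase: every streamed change is spooled. The skip XID is
 * deliberately not consulted here. */
void
spool_change(SpooledXact *xact)
{
    xact->nchanges++;
}

/* Apply phase: the skip decision is made once, when applying starts.
 * Returns the number of changes that would actually be applied. */
int
apply_spooled_changes(const SpooledXact *xact, TransactionId skipxid)
{
    bool skipping = (skipxid != InvalidTransactionId &&
                     skipxid == xact->xid);

    return skipping ? 0 : xact->nchanges;
}
```

With a skip XID of 590 set, three calls to `spool_change` on a transaction with XID 590 still leave three spooled changes; only `apply_spooled_changes` drops them.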

(3) in the comment of apply_handle_prepare_internal, two full-width characters.

3-1
+ * won’t be resent in a case where the server crashes between them.

3-2
+        * COMMIT PREPARED or ROLLBACK PREPARED. But that’s okay because this

You have full-width characters for "won't" and "that's".
Could you please check ?

Which characters in "won't" are full-width characters? I could not find them.

All characters I found and mentioned as full-width are single quotes.

It might be good to check the entire patch once
with a tool that can detect them.

Thanks!

(5)

I may be missing something here, but in one of the past discussions there
seemed to be a consensus that if the user specifies the XID of a subtransaction,
it would be better to skip only the subtransaction.

This time, is it out of the scope of the patch ?
If so, I suggest you include some description about it either in the
commit message or around the code related to it.

How can the user know the subtransaction XID? I suppose you refer to streaming
protocol cases, but while applying spooled changes we report the
subtransaction XID neither in the server log nor in pg_stat_subscription_workers.

Yeah, usually subtransaction XID is not exposed to the users. I agree.

But clarifying that this feature targets only top-level transactions
sounds better to me. Thank you Amit-san for your support
about how we should write it in [1] !

Yes, I've included the sentence proposed by Amit in the latest patch.

(6)

I feel it's a better idea to include a test of whether skipping an aborted
streaming transaction clears the XID in the TAP test for this feature,
in order to cover the various new code paths. Did you have any special
reason to omit this case ?

Which code path is newly covered by this aborted streaming transaction tests?
I think that this patch is already covered even by the test for a
committed-and-streamed transaction. It doesn't matter whether the streamed
transaction is committed or aborted because an error occurs while applying the
spooled changes.

Oh, this was my mistake. What I expressed as a new path is
apply_handle_stream_abort -> clear_subscription_skip_xid.
But, this was totally wrong as you explained.

(7)

I want more explanation for the reason to restart the subscriber in
the TAP test because this is not a mandatory operation.
(We can pass the TAP tests without this restart)

From :
# Restart the subscriber node to restart logical replication with no interval

IIUC, below would be better.

To :
# As an optimization to finish tests earlier, restart the subscriber with no interval,
# rather than waiting for new error to launch a new apply worker.

I could not understand why the proposed sentence has more information.
Does it mean you want to mention "As an optimization to finish tests earlier"?

Yes, exactly. The point is to add "As an optimization to finish tests earlier".

Probably, I should have asked a simple question "why do you restart the subscriber" ?
At first sight, I couldn't understand the meaning for the restart and
you don't explain the reason itself.

I thought "to restart logical replication with no interval" explains
the reason why we restart the subscriber. I left this part but we can
change it later if others also want to do that change.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#432osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#430)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, January 18, 2022 1:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. All comments I got so far were incorporated
into this patch unless I'm missing something.

Hi, thank you for your new patch v7.
For your information, I've encountered a failure to apply patch v7
on top of the latest commit (d3f4532)

$ git am v7-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Applying: Add ALTER SUBSCRIPTION ... SKIP to skip the transaction on subscriber nodes
error: patch failed: src/backend/parser/gram.y:9954
error: src/backend/parser/gram.y: patch does not apply

Could you please rebase it when it's necessary ?

Best Regards,
Takamichi Osumi

#433Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#432)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 2:37 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, January 18, 2022 1:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch. All comments I got so far were incorporated
into this patch unless I'm missing something.

Hi, thank you for your new patch v7.
For your information, I've encountered a failure to apply patch v7
on top of the latest commit (d3f4532)

$ git am v7-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch
Applying: Add ALTER SUBSCRIPTION ... SKIP to skip the transaction on subscriber nodes
error: patch failed: src/backend/parser/gram.y:9954
error: src/backend/parser/gram.y: patch does not apply

Could you please rebase it when it's necessary ?

Thank you for reporting!

I've attached a rebased patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v8-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/octet-stream)
#434osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#433)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, January 18, 2022 3:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a rebased patch.

Thank you for your rebase !

Several review comments on v8.

(1) doc/src/sgml/logical-replication.sgml

+
+  <para>
+   To resolve conflicts, you need to consider changing the data on the subscriber so
+   that it doesn't conflict with incoming changes, or dropping the conflicting constraint
+   or unique index, or writing a trigger on the subscriber to suppress or redirect
+   conflicting incoming changes, or as a last resort, by skipping the whole transaction.
+   Skipping the whole transaction includes skipping changes that may not violate
+   any constraint.  This can easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.
+  </para>

The first sentence is too long and slightly hard to read.
One idea for organizing the listed items is to use "itemizedlist".
For instance, I imagined something like below.

<para>
To resolve conflicts, you need to consider the following actions:
<itemizedlist>
<listitem>
<para>
Change the data on the subscriber so that it doesn't conflict with incoming changes
</para>
</listitem>
...
<listitem>
<para>
As a last resort, skip the whole transaction
</para>
</listitem>
</itemizedlist>
....
</para>

What do you think ?

By the way, in case you want to keep the current sentence style,
I have one more question. Do we need "by" in the part
"by skipping the whole transaction" ? If we focus on only this action,
I think the sentence becomes "you need to consider skipping the whole transaction".
If this is true, we don't need "by" in the part.

(2)

Also, in the same paragraph, we write

+ ... This can easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Should the subject of this sentence be "Those" or "Some of those" ?
I ask because we want to mention either the "new skip xid feature" or
"pg_replication_origin_advance".

(3) doc/src/sgml/ref/alter_subscription.sgml

Below change contains unnecessary spaces.
+ the whole transaction. Using <command> ALTER SUBSCRIPTION ... SKIP </command>

Need to change
From:
<command> ALTER SUBSCRIPTION ... SKIP </command>
To:
<command>ALTER SUBSCRIPTION ... SKIP</command>

(4) comment in clear_subscription_skip_xid

+        * the flush position the transaction will be sent again and the user
+        * needs to be set subskipxid again.  We can reduce the possibility by

Should change
From:
the user needs to be set...
To:
the user needs to set...

(5) clear_subscription_skip_xid

+       if (!HeapTupleIsValid(tup))
+               elog(ERROR, "subscription \"%s\" does not exist", MySubscription->name);

Can we change it to ereport with ERRCODE_UNDEFINED_OBJECT ?
This suggestion has another benefit: within one patch, we would not mix
ereport and elog.

(6) apply_handle_stream_abort

@@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s)

logicalrep_read_stream_abort(s, &xid, &subxid);

+       /*
+        * We don't expect the user to set the XID of the transaction that is
+        * rolled back but if the skip XID is set, clear it.
+        */
+       if (MySubscription->skipxid == xid || MySubscription->skipxid == subxid)
+               clear_subscription_skip_xid(MySubscription->skipxid, InvalidXLogRecPtr, 0);
+

In my humble opinion, this still cares about the subtransaction xid.
If we want to be consistent with top level transactions only,
I felt checking MySubscription->skipxid == xid should be sufficient.

Below is an *insane* (in a sense not correct usage) scenario
to hit the "MySubscription->skipxid == subxid".
Sorry if it is not perfect.

-------
Set logical_decoding_work_mem = 64.
Create tables named 'tab' with a column id (integer);
Create pub and sub with streaming = true.
No initial data is required on both nodes
because we just want to issue stream_abort
after executing skip xid feature.

<Session1> to the publisher
begin;
select pg_current_xact_id(); -- for reference
insert into tab values (1);
savepoint s1;
insert into tab values (2);
savepoint s2;
insert into tab values (generate_series(1001, 2000));
select ctid, xmin, xmax, id from tab where id in (1, 2, 1001);

<Session2> to the subscriber
select subname, subskipxid from pg_subscription; -- shows 0
alter subscription mysub skip (xid = xxx); -- xxx is that of xmin for 1001 on the publisher
select subname, subskipxid from pg_subscription; -- check it shows xxx just in case

<Session1>
rollback to s1;
commit;
select * from tab; -- shows only data '1'.

<Session2>
select subname, subskipxid from pg_subscription; -- shows 0. subskipxid was reset by the skip xid feature
select count(1) = 1 from tab; -- shows true

FYI: the results of those last two commands.
postgres=# select subname, subskipxid from pg_subscription;
 subname | subskipxid
---------+------------
 mysub   |          0
(1 row)

postgres=# select count(1) = 1 from tab;
 ?column?
----------
 t
(1 row)

Thus, it still cares about subtransactions and clears the subskipxid.
Should we fix this behavior for consistency ?
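
The difference raised in (6) can be shown with a small standalone sketch that contrasts the two conditions. It is an illustration only, not the patch: the function names are hypothetical, the XID values in the usage note are made up, and the first function simply mirrors the condition quoted from the v8 diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same underlying type PostgreSQL uses for XIDs; redefined here so the
 * sketch is standalone. */
typedef uint32_t TransactionId;

/* Mirrors the v8 condition in apply_handle_stream_abort: the skip XID is
 * cleared if it matches either the top-level XID or the aborted
 * subtransaction's XID. */
bool
clears_skip_xid_v8(TransactionId skipxid, TransactionId xid,
                   TransactionId subxid)
{
    return skipxid == xid || skipxid == subxid;
}

/* The suggested alternative: only the top-level XID is considered, which
 * matches "We don't support skipping individual subtransactions". */
bool
clears_skip_xid_toplevel_only(TransactionId skipxid, TransactionId xid)
{
    return skipxid == xid;
}
```

For example, with a top-level XID of 740 and an aborted subtransaction XID of 742, a skip XID of 742 is cleared by the first condition but left untouched by the second.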

Best Regards,
Takamichi Osumi

#435vignesh C
vignesh C
vignesh21@gmail.com
In reply to: Amit Kapila (#412)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 15, 2022 at 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 14, 2022 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the updated patch, few minor comments:
1) Should "SKIP" be "SKIP (" here:
@@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end)
/* ALTER SUBSCRIPTION <name> */
else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP",

Won't the other rule added by the patch, as follows, be sufficient for what
you are asking?
+ /* ALTER SUBSCRIPTION <name> SKIP */
+ else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "SKIP"))
+ COMPLETE_WITH("(");

I might be missing something, but why do you think the handling of SKIP
should be any different from what we are doing for SET?

In case of "ALTER SUBSCRIPTION sub1 SET" there are 2 possible tab
completion options, user can either specify "ALTER SUBSCRIPTION sub1
SET PUBLICATION pub1" or "ALTER SUBSCRIPTION sub1 SET ( SET option
like STREAMING,etc = 'on')", that is why we have 2 possible options as
below:
postgres=# ALTER SUBSCRIPTION sub1 SET
( PUBLICATION

Whereas in the case of SKIP there is only one possible tab completion
option, i.e. XID. We handle the WITH option similarly: we specify
"WITH (" in case of tab completion for "CREATE PUBLICATION pub1"
postgres=# CREATE PUBLICATION pub1
FOR ALL TABLES FOR ALL TABLES IN SCHEMA FOR TABLE
WITH (

Regards,
Vignesh

#436Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#433)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 5:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a rebased patch.

A couple of comments for the v8 patch:

doc/src/sgml/logical-replication.sgml

(1)
Strictly-speaking it's the transaction, not transaction ID, that
contains changes, so suggesting minor change:

BEFORE:
+   The transaction ID that contains the change violating the constraint can be
AFTER:
+   The ID of the transaction that contains the change violating the
constraint can be

doc/src/sgml/ref/alter_subscription.sgml

(2) apply_handle_commit_internal
It's not entirely apparent what commits the clearing of subskipxid
here, so I suggest the following addition:

BEFORE:
+ * clear subskipxid of pg_subscription.
AFTER:
+ * clear subskipxid of pg_subscription, then commit.

Regards,
Greg Nancarrow
Fujitsu Australia

#437Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#434)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 19, 2022 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, January 18, 2022 3:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a rebased patch.

Thank you for your rebase !

Several review comments on v8.

Thank you for the comments!

(1) doc/src/sgml/logical-replication.sgml

+
+  <para>
+   To resolve conflicts, you need to consider changing the data on the subscriber so
+   that it doesn't conflict with incoming changes, or dropping the conflicting constraint
+   or unique index, or writing a trigger on the subscriber to suppress or redirect
+   conflicting incoming changes, or as a last resort, by skipping the whole transaction.
+   Skipping the whole transaction includes skipping changes that may not violate
+   any constraint.  This can easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.
+  </para>

The first sentence is too long and slightly hard to read.
One idea for organizing the listed items is to use "itemizedlist".
For instance, I imagined something like below.

<para>
To resolve conflicts, you need to consider the following actions:
<itemizedlist>
<listitem>
<para>
Change the data on the subscriber so that it doesn't conflict with incoming changes
</para>
</listitem>
...
<listitem>
<para>
As a last resort, skip the whole transaction
</para>
</listitem>
</itemizedlist>
....
</para>

What do you think ?

By the way, in case you want to keep the current sentence style,
I have one more question. Do we need "by" in the part
"by skipping the whole transaction" ? If we focus on only this action,
I think the sentence becomes "you need to consider skipping the whole transaction".
If this is true, we don't need "by" in the part.

I personally prefer to keep the current sentence since listing them
seems not suitable in this case. But I agree that "by" is not
necessary here.

(2)

Also, in the same paragraph, we write

+ ... This can easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Should the subject of this sentence be "Those" or "Some of those" ?
I ask because we want to mention either the "new skip xid feature" or
"pg_replication_origin_advance".

I think "This" in the sentence refers to "Skipping the whole
transaction". In the previous paragraph, we describe that there are
two methods for skipping the whole transaction: this new feature and
pg_replication_origin_advance(). And in this paragraph, we don't
mention any specific methods for skipping the whole transaction but
describe that skipping the whole transaction per se can easily make
the subscriber inconsistent. The current structure is fine with me.

(3) doc/src/sgml/ref/alter_subscription.sgml

Below change contains unnecessary spaces.
+ the whole transaction. Using <command> ALTER SUBSCRIPTION ... SKIP </command>

Need to change
From:
<command> ALTER SUBSCRIPTION ... SKIP </command>
To:
<command>ALTER SUBSCRIPTION ... SKIP</command>

Will remove.

(4) comment in clear_subscription_skip_xid

+        * the flush position the transaction will be sent again and the user
+        * needs to be set subskipxid again.  We can reduce the possibility by

Should change
From:
the user needs to be set...
To:
the user needs to set...

Will remove.

(5) clear_subscription_skip_xid

+       if (!HeapTupleIsValid(tup))
+               elog(ERROR, "subscription \"%s\" does not exist", MySubscription->name);

Can we change it to ereport with ERRCODE_UNDEFINED_OBJECT ?
This suggestion has another benefit: within one patch, we would not mix
ereport and elog.

I don’t think we need to set errcode since this error is a
should-not-happen error.

(6) apply_handle_stream_abort

@@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s)

logicalrep_read_stream_abort(s, &xid, &subxid);

+       /*
+        * We don't expect the user to set the XID of the transaction that is
+        * rolled back but if the skip XID is set, clear it.
+        */
+       if (MySubscription->skipxid == xid || MySubscription->skipxid == subxid)
+               clear_subscription_skip_xid(MySubscription->skipxid, InvalidXLogRecPtr, 0);
+

In my humble opinion, this still cares about the subtransaction xid.
If we want to be consistent with top level transactions only,
I felt checking MySubscription->skipxid == xid should be sufficient.

I thought that if we can clear a subskipxid whose value has already been
processed on the subscriber at a reasonable cost, it makes sense to
do that, because it reduces the possibility that the XID wraps around
while a wrong value is left in subskipxid. But as you pointed
out, the current behavior doesn’t match the description in the doc:

After the logical replication successfully skips the transaction, the
transaction ID (stored in pg_subscription.subskipxid) is cleared.

and

We don't support skipping individual subtransactions.

I'll remove it in the next version patch.
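
The wraparound concern mentioned above can be sketched in a few lines of standalone C. This is an illustration under stated assumptions, not code from the patch: `xid_advance` and `would_skip` are hypothetical helpers, the former modeled on how PostgreSQL advances a 32-bit XID past the reserved values below `FirstNormalTransactionId`, the latter a plain equality check against the stored skip XID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same underlying type and constants PostgreSQL uses for XIDs; redefined
 * here so the sketch is standalone. */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Advance a 32-bit XID by one. After the counter wraps past 2^32 - 1, the
 * reserved values 0..2 are skipped, so numbering restarts at 3. */
TransactionId
xid_advance(TransactionId xid)
{
    xid++;
    if (xid < FirstNormalTransactionId)
        xid = FirstNormalTransactionId;
    return xid;
}

/* Hypothetical skip check: a stored skip XID matches any remote
 * transaction carrying the same 32-bit value, however old the setting. */
bool
would_skip(TransactionId skipxid, TransactionId remote_xid)
{
    return skipxid != InvalidTransactionId && skipxid == remote_xid;
}
```

Because the comparison is on the raw 32-bit value, a skip XID that is never cleared would match again once the publisher's counter has wrapped around and reached the same number, which is the situation that clearing the value early was meant to make less likely.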

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#438Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#437)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 19, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 19, 2022 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

(6) apply_handle_stream_abort

@@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s)

logicalrep_read_stream_abort(s, &xid, &subxid);

+       /*
+        * We don't expect the user to set the XID of the transaction that is
+        * rolled back but if the skip XID is set, clear it.
+        */
+       if (MySubscription->skipxid == xid || MySubscription->skipxid == subxid)
+               clear_subscription_skip_xid(MySubscription->skipxid, InvalidXLogRecPtr, 0);
+

In my humble opinion, this still cares about the subtransaction xid.
If we want to be consistent with top level transactions only,
I felt checking MySubscription->skipxid == xid should be sufficient.

I thought that if we can clear a subskipxid whose value has already been
processed on the subscriber at a reasonable cost, it makes sense to
do that, because it reduces the possibility that the XID wraps around
while a wrong value is left in subskipxid.

I guess that could happen if the user sets some unrelated XID value.
So, I think it should be okay to not clear this but we can add a
comment in the code at that place that we don't clear subtransaction's
XID as we don't support skipping individual subtransactions or
something like that.

--
With Regards,
Amit Kapila.

#439Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#433)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On 18.01.22 07:05, Masahiko Sawada wrote:

I've attached a rebased patch.

I think this is now almost done. Attached I have a small fixup patch
with some documentation proof-reading, and removing some comments I felt
are redundant. Some others have also sent you some documentation
updates, so feel free to merge mine in with them.

Some other comments:

parse_subscription_options() and AlterSubscriptionStmt mix regular
options and skip options in ways that confuse me. It seems to work
correctly, though. I guess for now it's okay, but if we add more skip
options, it might be better to separate those more cleanly.

I think the superuser check in AlterSubscription() might no longer be
appropriate. Subscriptions can now be owned by non-superusers. Please
check that.

The display order in psql \dRs+ is a bit odd. I would put it at the
end, certainly not between Two phase commit and Synchronous commit.

Please run pgperltidy over 028_skip_xact.pl.

Is the setting of logical_decoding_work_mem in the test script required?
If so, comment why.

Please document arguments origin_lsn and origin_timestamp of
stop_skipping_changes(). Otherwise, one has to dig quite deep to find
out what they are for.

This is all minor stuff, so I think when this and the nearby comments
are addressed, this is fine by me.

Attachments:

0001-fixup-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-tran.patch (text/plain; charset=UTF-8)

#440Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#439)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 1:18 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 18.01.22 07:05, Masahiko Sawada wrote:

I've attached a rebased patch.

I think this is now almost done. Attached I have a small fixup patch
with some documentation proof-reading, and removing some comments I felt
are redundant. Some others have also sent you some documentation
updates, so feel free to merge mine in with them.

Thank you for reviewing the patch and attaching the fixup patch!

Some other comments:

parse_subscription_options() and AlterSubscriptionStmt mixes regular
options and skip options in ways that confuse me. It seems to work
correctly, though. I guess for now it's okay, but if we add more skip
options, it might be better to separate those more cleanly.

Agreed.

I think the superuser check in AlterSubscription() might no longer be
appropriate. Subscriptions can now be owned by non-superusers. Please
check that.

IIUC we don't allow non-superuser to own the subscription yet. We
still have the following superuser checks:

In CreateSubscription():

    if (!superuser())
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be superuser to create subscriptions")));

and in AlterSubscriptionOwner_internal():

    /* New owner must be a superuser */
    if (!superuser_arg(newOwnerId))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied to change owner of subscription \"%s\"",
                        NameStr(form->subname)),
                 errhint("The owner of a subscription must be a superuser.")));

Also, doing superuser check here seems to be consistent with
pg_replication_origin_advance() which is another way to skip
transactions and also requires superuser permission.

The display order in psql \dRs+ is a bit odd. I would put it at the
end, certainly not between Two phase commit and Synchronous commit.

Fixed.

Please run pgperltidy over 028_skip_xact.pl.

Fixed.

Is the setting of logical_decoding_work_mem in the test script required?
If so, comment why.

Yes, it makes the tests check streaming logical replication cases
easily. Added the comment.

Please document arguments origin_lsn and origin_timestamp of
stop_skipping_changes(). Otherwise, one has to dig quite deep to find
out what they are for.

Added.

Also, after reading the documentation updates, I realized that there
are two paragraphs describing almost the same things so merged them.
Please check the doc updates in the latest patch.

I've attached an updated patch that incorporates these comments as
well as other comments I got so far.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v9-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transac.patch (application/x-patch)

#441Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: vignesh C (#435)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 19, 2022 at 3:32 PM vignesh C <vignesh21@gmail.com> wrote:

On Sat, Jan 15, 2022 at 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 14, 2022 at 5:35 PM vignesh C <vignesh21@gmail.com> wrote:

Thanks for the updated patch, few minor comments:
1) Should "SKIP" be "SKIP (" here:
@@ -1675,7 +1675,7 @@ psql_completion(const char *text, int start, int end)
/* ALTER SUBSCRIPTION <name> */
else if (Matches("ALTER", "SUBSCRIPTION", MatchAny))
COMPLETE_WITH("CONNECTION", "ENABLE", "DISABLE", "OWNER TO",
-                                         "RENAME TO", "REFRESH PUBLICATION", "SET",
+                                         "RENAME TO", "REFRESH PUBLICATION", "SET", "SKIP",
Won't the other rule added by the patch, shown below, be sufficient for
what you are asking?
+ /* ALTER SUBSCRIPTION <name> SKIP */
+ else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, "SKIP"))
+ COMPLETE_WITH("(");

I might be missing something, but why do you think the handling of SKIP
should be any different from what we are doing for SET?

In case of "ALTER SUBSCRIPTION sub1 SET" there are 2 possible tab
completion options, user can either specify "ALTER SUBSCRIPTION sub1
SET PUBLICATION pub1" or "ALTER SUBSCRIPTION sub1 SET ( SET option
like STREAMING,etc = 'on')", that is why we have 2 possible options as
below:
postgres=# ALTER SUBSCRIPTION sub1 SET
( PUBLICATION

Whereas in the case of SKIP there is only one possible tab completion
option, i.e. XID. We handle the WITH option similarly: we specify
"WITH (" for tab completion after "CREATE PUBLICATION pub1":
postgres=# CREATE PUBLICATION pub1
FOR ALL TABLES FOR ALL TABLES IN SCHEMA FOR TABLE
WITH (

Right. I've incorporated this comment into the latest v9 patch[1].

Regards,

[1]: /messages/by-id/CAD21AoDOuNtvFUfU2wH2QgTJ6AyMXXh_vdA87qX0mUibdsrYTg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#442Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Greg Nancarrow (#436)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 19, 2022 at 4:14 PM Greg Nancarrow <gregn4422@gmail.com> wrote:

On Tue, Jan 18, 2022 at 5:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached a rebased patch.

A couple of comments for the v8 patch:

Thank you for the comments!

doc/src/sgml/logical-replication.sgml

(1)
Strictly-speaking it's the transaction, not transaction ID, that
contains changes, so suggesting minor change:

BEFORE:
+   The transaction ID that contains the change violating the constraint can be
AFTER:
+   The ID of the transaction that contains the change violating the
constraint can be

doc/src/sgml/ref/alter_subscription.sgml

(2) apply_handle_commit_internal
It's not entirely apparent what commits the clearing of subskipxid
here, so I suggest the following addition:

BEFORE:
+ * clear subskipxid of pg_subscription.
AFTER:
+ * clear subskipxid of pg_subscription, then commit.

These comments are merged with Peter's comments and incorporated into
the latest v9 patch[1].

Regards,

[1]: /messages/by-id/CAD21AoDOuNtvFUfU2wH2QgTJ6AyMXXh_vdA87qX0mUibdsrYTg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#443Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#438)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 19, 2022 at 5:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 19, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 19, 2022 at 12:22 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

(6) apply_handle_stream_abort

@@ -1209,6 +1300,13 @@ apply_handle_stream_abort(StringInfo s)

logicalrep_read_stream_abort(s, &xid, &subxid);

+       /*
+        * We don't expect the user to set the XID of the transaction that is
+        * rolled back but if the skip XID is set, clear it.
+        */
+       if (MySubscription->skipxid == xid || MySubscription->skipxid == subxid)
+               clear_subscription_skip_xid(MySubscription->skipxid, InvalidXLogRecPtr, 0);
+

In my humble opinion, this still cares about the subtransaction xid.
If we want to be consistent with top level transactions only,
I felt checking MySubscription->skipxid == xid should be sufficient.

I thought that if we can clear a subskipxid whose value has already
been processed on the subscriber at a reasonable cost, it makes sense
to do so, because it reduces the chance that the XID wraps around
while a stale value is left in subskipxid.

I guess that could happen if the user sets some unrelated XID value.
So, I think it should be okay to not clear this but we can add a
comment in the code at that place that we don't clear subtransaction's
XID as we don't support skipping individual subtransactions or
something like that.

Agreed and added the comment in the latest patch[1].
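
The agreed behaviour in the stream-abort path can be sketched as below. This is only a self-contained model, not the patch code: should_clear_skip_xid() is a hypothetical helper, and TransactionId is re-declared locally as a 32-bit unsigned integer to mirror the PostgreSQL type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the PostgreSQL definitions (assumed for this sketch). */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical helper: decide whether the configured skip XID should be
 * cleared when a streamed abort message arrives.
 *
 * Only the top-level XID is compared.  The subtransaction XID is ignored
 * on purpose, because skipping individual subtransactions is not supported.
 */
static bool
should_clear_skip_xid(TransactionId skipxid, TransactionId xid,
                      TransactionId subxid)
{
    (void) subxid;              /* deliberately not checked */

    if (skipxid == InvalidTransactionId)
        return false;           /* nothing is set, so nothing to clear */

    return skipxid == xid;
}
```

With this shape, a skip XID that happens to equal a subtransaction's XID is left alone, which matches the comment added to the patch.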

Regards,

[1]: /messages/by-id/CAD21AoDOuNtvFUfU2wH2QgTJ6AyMXXh_vdA87qX0mUibdsrYTg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#444Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#440)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 8:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 21, 2022 at 1:18 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

I think the superuser check in AlterSubscription() might no longer be
appropriate. Subscriptions can now be owned by non-superusers. Please
check that.

IIUC we don't allow non-superuser to own the subscription yet. We
still have the following superuser checks:

In CreateSubscription():

if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to create subscriptions")));

and in AlterSubscriptionOwner_internal();

/* New owner must be a superuser */
if (!superuser_arg(newOwnerId))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of
subscription \"%s\"",
NameStr(form->subname)),
errhint("The owner of a subscription must be a superuser.")));

Also, doing superuser check here seems to be consistent with
pg_replication_origin_advance() which is another way to skip
transactions and also requires superuser permission.

+1. I think this feature has the potential to make data inconsistent
and should only be used as a last resort to resolve conflicts, so it is
better to allow it only for superusers.

--
With Regards,
Amit Kapila.

#445Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#429)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 18, 2022 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 18, 2022 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 18, 2022 at 8:34 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2) The following two places are not consistent in whether "= value" is surrounded
by square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

Good observation. Do we really need [, ... ] as currently, we support
only one value for XID?

I think no. In the doc, it should be:

ALTER SUBSCRIPTION name SKIP ( skip_option = value )

In the latest patch, I see:
+ <varlistentry>
+ <term><literal>SKIP ( <replaceable
class="parameter">skip_option</replaceable> = <replaceable
class="parameter">value</replaceable> [, ... ] )</literal></term>

What do we want to indicate by [, ... ]? To me, it appears like
multiple options but that is not what we support currently.

--
With Regards,
Amit Kapila.

#446Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#445)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 18, 2022 at 9:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 18, 2022 at 12:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 18, 2022 at 8:34 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:

On Mon, Jan 17, 2022 2:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2) The following two places are not consistent in whether "= value" is surrounded
by square brackets.

+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> [= <replaceable class="parameter">value</replaceable>] [, ... ] )

+ <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )</literal></term>

Should we modify the first place to:
+ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</replaceable> [, ... ] )

Because currently there is only one skip_option - xid, and a parameter must be
specified when using it.

Good observation. Do we really need [, ... ] as currently, we support
only one value for XID?

I think no. In the doc, it should be:

ALTER SUBSCRIPTION name SKIP ( skip_option = value )

In the latest patch, I see:
+ <varlistentry>
+ <term><literal>SKIP ( <replaceable
class="parameter">skip_option</replaceable> = <replaceable
class="parameter">value</replaceable> [, ... ] )</literal></term>

What do we want to indicate by [, ... ]? To me, it appears like
multiple options but that is not what we support currently.

You're right. It's an oversight.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#447osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#440)
RE: Skipping logical replication transactions on subscriber side

On Friday, January 21, 2022 12:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates these comments as well as
other comments I got so far.

Thank you for your update !

Few minor comments.

(1) trivial question

For the users,
is it perfectly clear from the current doc description in v9 that,
in a cascading logical replication setup,
we can't selectively skip an arbitrary transaction from one of the upstream nodes
without also skipping its execution on all subsequent nodes?

IIUC, this is because we don't write the changes to WAL either and
can't propagate the contents to subsequent nodes.

I tested this case and it didn't, as I expected.
This can apply to other measures for conflicts, though.

(2) suggestion

There's no harm in writing a notification for a committer
"Bump catalog version" in the commit log,
as the patch changes the catalog.

(3) minor question

In the past, there was a discussion that
it might be better if we reset the XID
according to a change of subconninfo,
which might be an opportunity to connect another
publisher of a different XID space.
Currently, we can regard it as user's responsibility.
Was this correct ?

Best Regards,
Takamichi Osumi

#448Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#447)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 10:32 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, January 21, 2022 12:08 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates these comments as well as
other comments I got so far.

Thank you for your update !

Few minor comments.

(1) trivial question

For the users,
is it perfectly clear from the current doc description in v9 that,
in a cascading logical replication setup,
we can't selectively skip an arbitrary transaction from one of the upstream nodes
without also skipping its execution on all subsequent nodes?

IIUC, this is because we don't write the changes to WAL either and
can't propagate the contents to subsequent nodes.

I tested this case and it didn't, as I expected.
This can apply to other measures for conflicts, though.

Right, there is nothing new, as the user will see the same effect when
she uses the existing function pg_replication_origin_advance(). So, I am
not sure if we want to add something specific to this.

(3) minor question

In the past, there was a discussion that
it might be better if we reset the XID
according to a change of subconninfo,
which might be an opportunity to connect another
publisher of a different XID space.
Currently, we can regard it as user's responsibility.
Was this correct ?

I think if the user points to another publisher, doesn't the user
similarly need to change slot_name as well? If so, I think this can be
treated in a similar way.

--
With Regards,
Amit Kapila.

#449Greg Nancarrow
Greg Nancarrow
gregn4422@gmail.com
In reply to: Masahiko Sawada (#440)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 2:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates these comments as
well as other comments I got so far.

src/backend/replication/logical/worker.c

(1)
Didn't you mean to say "check the" instead of "clear" in the following
comment? (the subtransaction's XID was never being cleared before,
just checked against the skipxid, and now that check has been removed)

+ * ...      .  Since we don't
+ * support skipping individual subtransactions we don't clear
+ * subtransaction's XID.

Other than that, the patch LGTM.

Regards,
Greg Nancarrow
Fujitsu Australia

#450osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#448)
RE: Skipping logical replication transactions on subscriber side

On Friday, January 21, 2022 2:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:32 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, January 21, 2022 12:08 PM Masahiko Sawada

<sawada.mshk@gmail.com> wrote:

I've attached an updated patch that incorporates these comments as
well as other comments I got so far.

Thank you for your update !

Few minor comments.

(1) trivial question

For the users,
is it perfectly clear from the current doc description in v9 that, in a
cascading logical replication setup, we can't selectively skip an
arbitrary transaction from one of the upstream nodes without also
skipping its execution on all subsequent nodes?

IIUC, this is because we don't write the changes to WAL either and can't
propagate the contents to subsequent nodes.

I tested this case and it didn't, as I expected.
This can apply to other measures for conflicts, though.

Right, there is nothing new, as the user will see the same effect when she
uses the existing function pg_replication_origin_advance(). So, I am not sure
if we want to add something specific to this.

Okay, thank you for clarifying this !
That's good to know.

(3) minor question

In the past, there was a discussion that it might be better if we
reset the XID according to a change of subconninfo, which might be an
opportunity to connect another publisher of a different XID space.
Currently, we can regard it as user's responsibility.
Was this correct ?

I think if the user points to another publisher, doesn't the user similarly need
to change slot_name as well? If so, I think this can be treated in a similar way.

I see. Then, since switching slot_name in AlterSubscription()
doesn't affect other columns, we don't need any special measure
for this case either, IIUC.
Thanks !

Best Regards,
Takamichi Osumi

#451Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#446)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 21, 2022 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

What do we want to indicate by [, ... ]? To me, it appears like
multiple options but that is not what we support currently.

You're right. It's an oversight.

I have fixed this and a few other things in the attached patch.
1.
The newly added column needs to be updated in the following statement:
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
substream, subtwophasestate, subslotname, subsynccommit,
subpublications)
ON pg_subscription TO public;

2.
+stop_skipping_changes(bool clear_subskipxid, XLogRecPtr origin_lsn,
+   TimestampTz origin_timestamp)
+{
+ Assert(is_skipping_changes());
+
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction %u",
+ skip_xid)));

Isn't it better to move this LOG at the end of this function? Because
clear* functions can give an error, so it is better to move it after
that. I have done that in the attached.
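
The ordering argument in point 2 can be sketched with a small standalone model. Everything here is hypothetical (SkipState, clear_skip_xid(), stop_skipping()); a failure of the clearing step is modelled as a false return value, whereas in the server it would be raised as an ERROR.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the state touched while ending a skip. */
typedef struct SkipState
{
    bool        skipping;       /* currently skipping a transaction? */
    unsigned    skip_xid;       /* XID being skipped, 0 if none */
    bool        logged_done;    /* was the "done skipping" message emitted? */
} SkipState;

/*
 * Stand-in for the catalog update that clears subskipxid.  Returns false
 * to model a failure.
 */
static bool
clear_skip_xid(SkipState *st, bool simulate_failure)
{
    if (simulate_failure)
        return false;
    st->skip_xid = 0;
    return true;
}

/*
 * Finish skipping.  The "done" message is emitted last, so it is never
 * logged when the clearing step fails.
 */
static bool
stop_skipping(SkipState *st, bool simulate_failure)
{
    unsigned    xid = st->skip_xid;

    if (!clear_skip_xid(st, simulate_failure))
        return false;           /* bail out before logging anything */

    st->skipping = false;
    printf("done skipping logical replication transaction %u\n", xid);
    st->logged_done = true;
    return true;
}
```

Logging first would leave a misleading "done skipping" line in the log whenever the clear step fails afterwards.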

3.
+-- fail - must be superuser
+SET SESSION AUTHORIZATION 'regress_subscription_user2';
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 100);
+ERROR:  must be owner of subscription regress_testsub

This test doesn't seem to be right. You want to get the error for the
superuser but the error is for the owner. I have changed this test to
do what it intends to do.

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase. For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

--
With Regards,
Amit Kapila.

Attachments:

v10-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)

#452Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#451)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase. For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

I noticed that the test newly added by this patch is on the slow side.
See the comparison with the subscription test that takes the most
time:
[17:38:49] t/028_skip_xact.pl ................. ok 9298 ms
[17:38:59] t/100_bugs.pl ...................... ok 11349 ms

I think we can reduce the time by removing some stream tests without
much impact on coverage, possibly those related to 2PC and streaming
together, and if you do that we probably don't need a subscription with
both 2PC and streaming enabled.

--
With Regards,
Amit Kapila.

#453Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#440)
Re: Skipping logical replication transactions on subscriber side

On 21.01.22 04:08, Masahiko Sawada wrote:

I think the superuser check in AlterSubscription() might no longer be
appropriate. Subscriptions can now be owned by non-superusers. Please
check that.

IIUC we don't allow non-superuser to own the subscription yet. We
still have the following superuser checks:

In CreateSubscription():

if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to create subscriptions")));

and in AlterSubscriptionOwner_internal();

/* New owner must be a superuser */
if (!superuser_arg(newOwnerId))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of
subscription \"%s\"",
NameStr(form->subname)),
errhint("The owner of a subscription must be a superuser.")));

Also, doing superuser check here seems to be consistent with
pg_replication_origin_advance() which is another way to skip
transactions and also requires superuser permission.

I'm referring to commit a2ab9c06ea15fbcb2bfde570986a06b37f52bcca. You
still have to be superuser to create a subscription, but you can change
the owner to a nonprivileged user and it will observe table permissions
on the subscriber.

Assuming my understanding of that commit is correct, I think it would be
sufficient in your patch to check that the current user is the owner of
the subscription.

#454David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#451)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily
repetitive. Consider:
"""
Skips applying all changes of the specified remote transaction, whose value
should be obtained from pg_stat_subscription_workers.last_error_xid. This
will avoid the last error on the subscription, thus
allowing it to resume working. See "link to a more holistic description in
the Logical Replication chapter" for alternative means of resolving
subscription errors. Removing an entire transaction from the history of a
table should be considered a last resort as it can leave the system in a
very inconsistent state.

Note, this feature will not accept transactions prepared under two-phase
commit.

This command sets pg_subscription.subskipxid field upon issuance and the
system clears the same field upon seeing and successfully skipping the
identified transaction. Issuing this command again while a skipped
transaction is pending replaces the existing transaction with the new one.
"""

Then change the subskipxid column description to be:
"""
ID of the transaction whose changes are to be skipped. It is 0 when there
are no pending skips. This is set by issuing ALTER SUBSCRIPTION SKIP and
resets back to 0 when the identified transaction passes through the
subscription stream and is successfully ignored.
"""

I don't understand why/how ", if a valid transaction ID;" comes into play
(how would we know whether it is valid, or if we do ALTER SUBSCRIPTION SKIP
should prohibit the invalid value from being chosen).

I'm against mentioning subtransactions in the skip_option description.

The Logical Replication page changes provide good content overall but I
dislike going into detail about how to perform conflict resolution in the
third paragraph and then summarize the various forms of conflict resolution
in the newly added fourth. Maybe re-work things like:

1. Logical replication behaves...
2. A conflict will produce...details can be found in places...
3. Resolving conflicts can be done by...
4. (split and reworded) If choosing to simply skip the offending
transaction you take the pg_stat_subscription_workers.last_error_xid value
(716 in the example above) and provide it while executing ALTER
SUBSCRIPTION SKIP...
5. (split and reworded) Prior to v15 ALTER SUBSCRIPTION SKIP was not
available and instead you had to use the pg_replication_origin_advance()
function...

Don't just list out two options for the user to perform the same action.
Tell a story about why we felt compelled to add ALTER SUBSCRIPTION SKIP and why
either the function is now deprecated or is useful given different
circumstances (the former seems likely).

David J.

#455Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#453)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 7:23 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 21.01.22 04:08, Masahiko Sawada wrote:

I think the superuser check in AlterSubscription() might no longer be
appropriate. Subscriptions can now be owned by non-superusers. Please
check that.

IIUC we don't allow non-superuser to own the subscription yet. We
still have the following superuser checks:

In CreateSubscription():

if (!superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to create subscriptions")));

and in AlterSubscriptionOwner_internal();

/* New owner must be a superuser */
if (!superuser_arg(newOwnerId))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied to change owner of
subscription \"%s\"",
NameStr(form->subname)),
errhint("The owner of a subscription must be a superuser.")));

Also, doing superuser check here seems to be consistent with
pg_replication_origin_advance() which is another way to skip
transactions and also requires superuser permission.

I'm referring to commit a2ab9c06ea15fbcb2bfde570986a06b37f52bcca. You
still have to be superuser to create a subscription, but you can change
the owner to a nonprivileged user and it will observe table permissions
on the subscriber.

Assuming my understanding of that commit is correct, I think it would be
sufficient in your patch to check that the current user is the owner of
the subscription.

Won't we already do that for Alter Subscription command which means
nothing special needs to be done for this? However, it seems to me
that the idea we are trying to follow here is that as this option can
lead to data inconsistency, it is good to allow only superusers to
specify this option. The owner of the subscription can be changed to
non-superuser as well in which case I think it won't be a good idea to
allow this option. OTOH, if we think it is okay to allow such an
option to users that don't have superuser privilege then I think
allowing it to the owner of the subscription makes sense to me.
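
The two policies under discussion can be contrasted with a small model. This is a sketch under assumptions only: SkipPolicy and may_set_skip_xid() are hypothetical names, and the ownership test treats superusers as passing it, as ownership checks in PostgreSQL generally do.

```c
#include <stdbool.h>

/* Hypothetical policy switch for who may run ALTER SUBSCRIPTION ... SKIP. */
typedef enum SkipPolicy
{
    SKIP_REQUIRES_SUPERUSER,    /* what the patch does at this point */
    SKIP_REQUIRES_OWNER         /* alternative raised in review */
} SkipPolicy;

static bool
may_set_skip_xid(SkipPolicy policy, bool is_superuser, bool is_owner)
{
    /* ALTER SUBSCRIPTION already requires ownership (superusers pass). */
    if (!is_owner && !is_superuser)
        return false;

    if (policy == SKIP_REQUIRES_SUPERUSER)
        return is_superuser;    /* extra check on top of ownership */

    return true;                /* ownership alone is sufficient */
}
```

Under the superuser policy a non-superuser owner is rejected, which is the case the corrected regression test in v10 exercises.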

--
With Regards,
Amit Kapila.

#456Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: David G. Johnston (#454)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily repetitive. Consider:
"""
Skips applying all changes of the specified remote transaction, whose value should be obtained from pg_stat_subscription_workers.last_error_xid.

Here, you can also say that the value can be found from server logs as well.

This will avoid the last error on the
subscription, thus allowing it to resume working. See "link to a more
holistic description in the Logical Replication chapter" for
alternative means of resolving subscription errors. Removing an
entire transaction from the history of a table should be considered a
last resort as it can leave the system in a very inconsistent state.

Note, this feature will not accept transactions prepared under two-phase commit.

This command sets pg_subscription.subskipxid field upon issuance and the system clears the same field upon seeing and successfully skipping the identified transaction. Issuing this command again while a skipped transaction is pending replaces the existing transaction with the new one.
"""

The proposed text sounds better to me except for a minor change as
suggested above.

Then change the subskipxid column description to be:
"""
ID of the transaction whose changes are to be skipped. It is 0 when there are no pending skips. This is set by issuing ALTER SUBSCRIPTION SKIP and resets back to 0 when the identified transaction passes through the subscription stream and is successfully ignored.
"""

Users can manually reset it by specifying NONE, so that should be
covered in the above text, otherwise, looks good.

I don't understand why/how ", if a valid transaction ID;" comes into play (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION SKIP should prohibit the invalid value from being chosen).

What do you mean by invalid value here? Is it a value less than
FirstNormalTransactionId or a value that belongs to a non-error
transaction? For the former, we already have a check in the patch and
for the latter we can't identify it with any certainty because the error
stats are collected by the stats collector.

I'm against mentioning subtransactions in the skip_option description.

We have mentioned that because currently, we don't support it but in
the future one can come up with an idea to support it. What problem do
you see with it?

The Logical Replication page changes provide good content overall but I dislike going into detail about how to perform conflict resolution in the third paragraph and then summarizing the various forms of conflict resolution in the newly added fourth. Maybe re-work things like:

1. Logical replication behaves...
2. A conflict will produce...details can be found in places...
3. Resolving conflicts can be done by...
4. (split and reworded) If choosing to simply skip the offending transaction you take the pg_stat_subscription_workers.last_error_xid value (716 in the example above) and provide it while executing ALTER SUBSCRIPTION SKIP...
5. (split and reworded) Prior to v15 ALTER SUBSCRIPTION SKIP was not available and instead you had to use the pg_replication_origin_advance() function...

Don't just list out two options for the user to perform the same action. Tell a story about why we felt compelled to add ALTER SUBSCRIPTION SKIP and why either the function is now deprecated or is useful given different circumstances (the former seems likely).

Personally, I don't see much value in the split (especially giving
context like "Prior to v15 ..") but I do see value in specifying the
circumstances where each of the options could be useful.

--
With Regards,
Amit Kapila.

#457David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#456)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily
repetitive. Consider:
"""
Skips applying all changes of the specified remote transaction, whose
value should be obtained from pg_stat_subscription_workers.last_error_xid.

Here, you can also say that the value can be found from server logs as
well.

subscriber's server logs, right? I would agree that adding that for
completeness is warranted.

Then change the subskipxid column description to be:
"""
ID of the transaction whose changes are to be skipped. It is 0 when
there are no pending skips. This is set by issuing ALTER SUBSCRIPTION SKIP
and resets back to 0 when the identified transaction passes through the
subscription stream and is successfully ignored.
"""

Users can manually reset it by specifying NONE, so that should be
covered in the above text, otherwise, looks good.

I agree with incorporating "reset" into the paragraph somehow - does not
have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family
friendly abbreviation...) is what does it.

I don't understand why/how ", if a valid transaction ID;" comes into
play (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION
SKIP should prohibit the invalid value from being chosen).

What do you mean by invalid value here? Is it a value less than
FirstNormalTransactionId or a value that belongs to a non-error
transaction? For the former, we already have a check in the patch and
for the latter we can't identify it with any certainty because the error
stats are collected by the stats collector.

The original proposal qualifies the non-zero transaction id in
subskipxid as being a "valid transaction ID" and that invalid ones (which
is how "otherwise" is interpreted given the "valid" qualification preceding
it) are shown as 0. As an end-user that makes me wonder what it means for
a transaction ID to be invalid. My point is that dropping the mention of
"valid transaction ID" avoids that and lets the reader operate with an
understanding that things should "just work". If I see a non-zero in the
column I have a pending skip and if I see zero I do not. My wording
assumes it is that simple. If it isn't I would need some clarity as to why
it is not in order to write something I could read and understand from my
inexperienced user-centric point-of-view.

I get that I may provide a transaction ID that is invalid such that the
system could never see it (or at least not for a long while) - say we
error on transaction 102 and I typo it as 1002 or 101. But I would expect
either an error where I make the typo or the numbers 1002 or 101 to appear
on the table. I would not expect my 101 typo to result in a 0 appearing on
the table (and if it does so today I argue that is a POLA violation).
Thus, "if a valid transaction ID" from the original text just doesn't make
sense to me.

In typical usage it would seem strange to allow a skip to be recorded if
there is no existing error in the subscription. Should we (do we, haven't
read the code) warn in that situation?

*Or, why even force them to specify a number instead of just saying SKIP
and if there is a current error we skip its transaction, otherwise we warn
them that nothing happened because there is no last error.*

Additionally, the description for pg_stat_subscription_workers should
describe what happens once the transaction represented by last_error_xid
has either been successfully processed or skipped. Does this "last error"
stick around until another error happens (which is hopefully very rare) or
does it reset to blanks? Seems like it should reset, which really makes
this more of an "active_error" instead of a "last_error". This system is
linear, we are stuck until this error is resolved, making it active.

I'm against mentioning subtransactions in the skip_option description.

We have mentioned that because currently, we don't support it but in
the future one can come up with an idea to support it. What problem do
you see with it?

If you ever get around to implementing the feature then by all means add
it. My main issue is that we basically never talk about subtransactions in
the user-facing documentation and it doesn't seem desirable to do so here.
Knowing that a whole transaction is skipped is all I need to care about as
a user. I believe that no users will be asking "what about subtransactions
(savepoints)" but by mentioning it less experienced ones will now have
something to be curious about that they really do not need to be.

The Logical Replication page changes provide good content overall but I
dislike going into detail about how to perform conflict resolution in the
third paragraph and then summarizing the various forms of conflict resolution
in the newly added fourth. Maybe re-work things like:

Personally, I don't see much value in the split (especially giving
context like "Prior to v15 ..") but I do see value in specifying the
circumstances where each of the options could be useful.

Yes, I've been reminded of the desire to avoid mentioning versions and
agree doing so here is correct. The added context is desired, the style
depends on the content.

David J.

#458Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: David G. Johnston (#457)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 22, 2022 at 12:41 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

The paragraph describing ALTER SUBSCRIPTION SKIP seems unnecessarily repetitive. Consider:
"""
Skips applying all changes of the specified remote transaction, whose value should be obtained from pg_stat_subscription_workers.last_error_xid.

Here, you can also say that the value can be found from server logs as well.

subscriber's server logs, right?

Right.

I would agree that adding that for completeness is warranted.

Then change the subskipxid column description to be:
"""
ID of the transaction whose changes are to be skipped. It is 0 when there are no pending skips. This is set by issuing ALTER SUBSCRIPTION SKIP and resets back to 0 when the identified transaction passes through the subscription stream and is successfully ignored.
"""

Users can manually reset it by specifying NONE, so that should be
covered in the above text, otherwise, looks good.

I agree with incorporating "reset" into the paragraph somehow - does not have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family friendly abbreviation...) is what does it.

It is not clear to me what you have in mind here but to me in this
context saying "Setting <literal>NONE</literal> resets the transaction
ID." seems quite reasonable.

I don't understand why/how ", if a valid transaction ID;" comes into play (how would we know whether it is valid, or if we do ALTER SUBSCRIPTION SKIP should prohibit the invalid value from being chosen).

What do you mean by invalid value here? Is it a value less than
FirstNormalTransactionId or a value that belongs to a non-error
transaction? For the former, we already have a check in the patch and
for the latter we can't identify it with any certainty because the error
stats are collected by the stats collector.

The original proposal qualifies the non-zero transaction id in subskipxid as being a "valid transaction ID" and that invalid ones (which is how "otherwise" is interpreted given the "valid" qualification preceding it) are shown as 0. As an end-user that makes me wonder what it means for a transaction ID to be invalid. My point is that dropping the mention of "valid transaction ID" avoids that and lets the reader operate with an understanding that things should "just work". If I see a non-zero in the column I have a pending skip and if I see zero I do not. My wording assumes it is that simple. If it isn't I would need some clarity as to why it is not in order to write something I could read and understand from my inexperienced user-centric point-of-view.

I get that I may provide a transaction ID that is invalid such that the system could never see it (or at least not for a long while) - say we error on transaction 102 and I typo it as 1002 or 101. But I would expect either an error where I make the typo or the numbers 1002 or 101 to appear on the table. I would not expect my 101 typo to result in a 0 appearing on the table (and if it does so today I argue that is a POLA violation). Thus, "if a valid transaction ID" from the original text just doesn't make sense to me.

In typical usage it would seem strange to allow a skip to be recorded if there is no existing error in the subscription. Should we (do we, haven't read the code) warn in that situation?

Yeah, we will error in that situation. The only invalid values are
system reserved values (1,2).
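The check on reserved values can be pictured with a small C sketch. It assumes PostgreSQL's usual XID constants (0 is the invalid XID, 1 and 2 are reserved, 3 is FirstNormalTransactionId); the helper name skip_xid_is_acceptable is hypothetical and is not the patch's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Values mirroring PostgreSQL's special transaction IDs. */
#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical helper: true when xid is a normal transaction ID that a
 * user could ask to skip. The reserved values 1 and 2 are rejected; 0 is
 * also rejected here because "no pending skip" is expressed with NONE.
 */
static bool
skip_xid_is_acceptable(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}
```

With this rule, a typo such as 1002 instead of 102 is still accepted, which matches the point above that a non-error transaction cannot be identified with certainty.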

Or, why even force them to specify a number instead of just saying SKIP and if there is a current error we skip its transaction, otherwise we warn them that nothing happened because there is no last error.

The idea is that we might extend this feature to skip specific
operations on relations or maybe by having other identifiers. One idea
we discussed was to automatically fetch the last error xid but then
decided it can be done as a later patch.

Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped. Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

Seems like it should reset, which really makes this more of an "active_error" instead of a "last_error". This system is linear, we are stuck until this error is resolved, making it active.

I'm against mentioning subtransactions in the skip_option description.

We have mentioned that because currently, we don't support it but in
the future one can come up with an idea to support it. What problem do
you see with it?

If you ever get around to implementing the feature then by all means add it. My main issue is that we basically never talk about subtransactions in the user-facing documentation and it doesn't seem desirable to do so here. Knowing that a whole transaction is skipped is all I need to care about as a user. I believe that no users will be asking "what about subtransactions (savepoints)" but by mentioning it less experienced ones will now have something to be curious about that they really do not need to be.

It is not that we don't mention subtransactions in the docs but I see
your point and I think we can avoid mentioning it in this case.

--
With Regards,
Amit Kapila.

#459David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#458)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 22, 2022 at 2:41 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jan 22, 2022 at 12:41 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:00 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Fri, Jan 21, 2022 at 4:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I agree with incorporating "reset" into the paragraph somehow - does not
have to mention NONE, just that ALTER SUBSCRIPTION SKIP (not a family
friendly abbreviation...) is what does it.

It is not clear to me what you have in mind here but to me in this
context saying "Setting <literal>NONE</literal> resets the transaction
ID." seems quite reasonable.

OK

Yeah, we will error in that situation. The only invalid values are
system reserved values (1,2).

So long as the ALTER command errors when asked to skip those IDs there
isn't any reason for an end-user, who likely doesn't know or care that 1
and 2 are special, to be concerned about them (the only two invalid values)
while reading the docs.

Or, why even force them to specify a number instead of just saying SKIP
and if there is a current error we skip its transaction, otherwise we warn
them that nothing happened because there is no last error.

The idea is that we might extend this feature to skip specific
operations on relations or maybe by having other identifiers.

Again, you've already got syntax reserved that lets you add more features
to this command in the future; and removing warnings or errors because new
features make them moot is easy. Let's document and code what we are
willing to implement today. A single top-level transaction xid that is
presently blocking the worker from applying any more WAL.

One idea we discussed was to automatically fetch the last error xid but then
decided it can be done as a later patch.

This seems backwards. The user-friendly approach is to not make them type
in anything at all. That said, this particular UX seems like it could use
some safety. Thus I would propose at this time that attempting to set the
skip_option to anything but THE active_error_xid for the named subscription
results in an error. Once you add new features the user can set the
skip_option to other things without provoking errors. Again, I consider
this a safety feature since the user now has to accurately match the xid to
the name in the SQL in order to perform a successful skip - and the to-be
affected transaction has to be one that is preventing replication from
moving forward. I'm not interested in providing a foot-gun where an
arbitrary future transaction can be scheduled to be skipped. Running the
command twice with the same values should provoke an error since the first
run should be allowed to finish (?). Also, we handle the situation where
the state of the worker changes between when the user saw the error and
wrote down the xid to skip and the actual execution of the alter command.
Maybe not highly anticipated scenarios but this is an easy win to deal with
them.
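To make the proposed safety rule concrete, here is a minimal C sketch. It is only an illustration of the proposal in this mail, not of the patch as posted: the enum, the function name check_skip_request and the idea of passing the active error xid as a plain argument are all hypothetical.

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical outcomes of ALTER SUBSCRIPTION ... SKIP under the proposal. */
typedef enum SkipRequestResult
{
    SKIP_ACCEPTED,          /* xid matches the error blocking the worker */
    SKIP_NO_ACTIVE_ERROR,   /* nothing is blocking replication right now */
    SKIP_XID_MISMATCH,      /* xid is not the transaction currently failing */
    SKIP_ALREADY_PENDING    /* the same skip has already been requested */
} SkipRequestResult;

/*
 * Accept a skip request only for the transaction that is presently
 * preventing the worker from moving forward.
 */
static SkipRequestResult
check_skip_request(TransactionId active_error_xid,
                   TransactionId pending_skip_xid,
                   TransactionId requested_xid)
{
    if (active_error_xid == InvalidTransactionId)
        return SKIP_NO_ACTIVE_ERROR;
    if (requested_xid != active_error_xid)
        return SKIP_XID_MISMATCH;
    if (pending_skip_xid == requested_xid)
        return SKIP_ALREADY_PENDING;
    return SKIP_ACCEPTED;
}
```

Under this sketch, erroring on transaction 102 and typing 1002 or 101 yields SKIP_XID_MISMATCH instead of silently scheduling a skip of some future transaction.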

Additionally, the description for pg_stat_subscription_workers should
describe what happens once the transaction represented by last_error_xid
has either been successfully processed or skipped. Does this "last error"
stick around until another error happens (which is hopefully very rare) or
does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

I really dislike the user experience this provides, and given it is new in
v15 (and right now this table seems to exist solely to support this
feature) changing this seems within the realm of possibility. I have to
imagine these workers have a sense of local state that would just be "no
errors, no need to touch pg_stat_subscription_workers at the end of this
transaction's commit". It would save a local state of the error_xid and if
a successfully committed transaction has that xid it would clear the
error. The skip code path would also check for and see the matching xid
value and clear the error. Even if the local state thing doesn't work, one
catalog lookup per transaction seems like potentially reasonable overhead
to incur here.

David J.

#460David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#459)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 22, 2022 at 9:21 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sat, Jan 22, 2022 at 2:41 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

Additionally, the description for pg_stat_subscription_workers should
describe what happens once the transaction represented by last_error_xid
has either been successfully processed or skipped. Does this "last error"
stick around until another error happens (which is hopefully very rare) or
does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

I really dislike the user experience this provides, and given it is new in
v15 (and right now this table seems to exist solely to support this
feature) changing this seems within the realm of possibility. I have to
imagine these workers have a sense of local state that would just be "no
errors, no need to touch pg_stat_subscription_workers at the end of this
transaction's commit". It would save a local state of the error_xid and if
a successfully committed transaction has that xid it would clear the
error. The skip code path would also check for and see the matching xid
value and clear the error. Even if the local state thing doesn't work, one
catalog lookup per transaction seems like potentially reasonable overhead
to incur here.

It shouldn't even need to be that overhead intensive. Once an error is
encountered the system stops. By construction it must be told to redo, at
which point the information about "last error" is no longer relevant and
can be removed (for skipping the user/system will have already done
everything with the xid that is needed before the redo is issued). In the
steady-state it then is simply empty until a new error arises at which
point it becomes populated again; and stays that way until the system goes
into redo mode as instructed by the user via one of several methods.

David J.

#461Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#452)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase. For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

I noticed that the time taken by the test newly added by this patch is on
the higher side. See the comparison with the subscription test that takes
the most time:
[17:38:49] t/028_skip_xact.pl ................. ok 9298 ms
[17:38:59] t/100_bugs.pl ...................... ok 11349 ms

I think we can reduce time by removing some stream tests without much
impacting on coverage, possibly related to 2PC and streaming together,
and if you do that we probably don't need a subscription with both 2PC
and streaming enabled.

Agreed.

In addition to that, after some tests, I realized that the two tests
of ROLLBACK PREPARED are not stable. If the walsender detects a
concurrent abort of the transaction that is being decoded, it’s
possible that it sends only the begin_prepare and prepare messages.
If this happens before setting skip_xid, a unique key
constraint violation doesn’t occur on the subscription, and
consequently, skip_xid is not cleared. We can reduce the possibility
by setting a very high value for wal_retrieve_retry_interval but I
think it’s better to remove them. What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#462Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#451)
Re: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 at 8:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 21, 2022 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

What do we want to indicate by [, ... ]? To me, it appears like
multiple options but that is not what we support currently.

You're right. It's an oversight.

I have fixed this and a few other things in the attached patch.

Thank you for updating the patch!

1.
The newly added column needs to be updated in the following statement:
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
substream, subtwophasestate, subslotname, subsynccommit,
subpublications)
ON pg_subscription TO public;

2.
+stop_skipping_changes(bool clear_subskipxid, XLogRecPtr origin_lsn,
+   TimestampTz origin_timestamp)
+{
+ Assert(is_skipping_changes());
+
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction %u",
+ skip_xid)));

Isn't it better to move this LOG at the end of this function? Because
clear* functions can give an error, so it is better to move it after
that. I have done that in the attached.

3.
+-- fail - must be superuser
+SET SESSION AUTHORIZATION 'regress_subscription_user2';
+ALTER SUBSCRIPTION regress_testsub SKIP (xid = 100);
+ERROR:  must be owner of subscription regress_testsub

This test doesn't seem to be right. You want to get the error for the
superuser but the error is for the owner. I have changed this test to
do what it intends to do.

Apart from this, I have changed a few comments and ran pgindent. Do
let me know what you think of the changes?

Agree with these changes.

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase.

Agreed.

For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

Yeah, after some tests, it's enough to insert 500 rows as follows:

INSERT INTO test_tab_streaming SELECT i, md5(i::text) FROM
generate_series(1, 500) s(i);

I've just sent another email suggesting that we can probably remove the two
tests for ROLLBACK PREPARED, so I’ll update the patch to include
this point as well.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#463Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#461)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 8:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Jan 21, 2022 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 21, 2022 at 10:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Few things that I think we can improve in 028_skip_xact.pl are as follows:

After CREATE SUBSCRIPTION, wait for initial sync to be over and
two_phase state to be enabled. Please see 021_twophase. For the
streaming case, we might be able to ensure streaming even with lesser
data. Can you please try that?

I noticed that the time taken by the test newly added by this patch is on
the higher side. See the comparison with the subscription test that takes
the most time:
[17:38:49] t/028_skip_xact.pl ................. ok 9298 ms
[17:38:59] t/100_bugs.pl ...................... ok 11349 ms

I think we can reduce time by removing some stream tests without much
impacting on coverage, possibly related to 2PC and streaming together,
and if you do that we probably don't need a subscription with both 2PC
and streaming enabled.

Agreed.

In addition to that, after some tests, I realized that the two tests
of ROLLBACK PREPARED are not stable. If the walsender detects a
concurrent abort of the transaction that is being decoded, it’s
possible that it sends only the begin_prepare and prepare messages.
If this happens before setting skip_xid, a unique key
constraint violation doesn’t occur on the subscription, and
consequently, skip_xid is not cleared. We can reduce the possibility
by setting a very high value for wal_retrieve_retry_interval but I
think it’s better to remove them.

+1.

--
With Regards,
Amit Kapila.

#464Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: David G. Johnston (#459)
Re: Skipping logical replication transactions on subscriber side

On Sat, Jan 22, 2022 at 9:51 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

So long as the ALTER command errors when asked to skip those IDs there isn't any reason for an end-user, who likely doesn't know or care that 1 and 2 are special, to be concerned about them (the only two invalid values) while reading the docs.

In this matter, I don't see any problem with the current text proposed
and there are many others who have also reviewed it. I am fine to
change if others also think that the current text needs to be changed.

Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped. Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

I really dislike the user experience this provides, and given it is new in v15 (and right now this table seems to exist solely to support this feature) changing this seems within the realm of possibility. I have to imagine these workers have a sense of local state that would just be "no errors, no need to touch pg_stat_subscription_workers at the end of this transaction's commit". It would save a local state of the error_xid and if a successfully committed transaction has that xid it would clear the error. The skip code path would also check for and see the matching xid value and clear the error. Even if the local state thing doesn't work, one catalog lookup per transaction seems like potentially reasonable overhead to incur here.

Are you suggesting that we update the catalog to save error_xid when an error
occurs? If so, that has many challenges; for example, we are not supposed to
perform any such operations when the transaction is in an error state.
We discussed this and other ideas at the beginning. I don't find
any of your arguments convincing enough to change the basic approach here, but
I would like to see what others think on this matter.

--
With Regards,
Amit Kapila.

#465David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#464)
Re: Skipping logical replication transactions on subscriber side

On Sun, Jan 23, 2022 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I really dislike the user experience this provides, and given it is new
in v15 (and right now this table seems to exist solely to support this
feature) changing this seems within the realm of possibility. I have to
imagine these workers have a sense of local state that would just be "no
errors, no need to touch pg_stat_subscription_workers at the end of this
transaction's commit". It would save a local state of the error_xid and if
a successfully committed transaction has that xid it would clear the
error. The skip code path would also check for and see the matching xid
value and clear the error. Even if the local state thing doesn't work, one
catalog lookup per transaction seems like potentially reasonable overhead
to incur here.

Are you suggesting that we update the catalog to save error_xid when an error
occurs? If so, that has many challenges; for example, we are not supposed to
perform any such operations when the transaction is in an error state.
We discussed this and other ideas at the beginning. I don't find
any of your arguments convincing enough to change the basic approach here, but
I would like to see what others think on this matter.

Then how does the table get updated to that state in the first place since
it doesn't know the error details until there is an error?

In any case, clearing out the entries in the table would not happen while
it is applying the replication stream, in an error state or otherwise.

in = while streaming
out = not streaming

1(in). replication stream is working
2(in). replication stream fails; capture error information
3(in->out). stop replication stream; perform rollback on xid
4(out). update pg_stat_subscription_workers to report the failure, including
xid of the transaction
5(out). wait for the user to manually restart the replication stream
[if they do so by skipping the xid, save the xid from
pg_stat_subscription_workers into pg_subscription.subskipxid - possibly
requiring the user to confirm the xid]
[user has now done their thing and requested that the replication stream
resume]
6(out). clear the error information from pg_stat_subscription_workers; it is
no longer useful/doesn't exist because the user just took action to avoid
that very error, one way (skipping its transaction) or another.
7(out->in). resume the replication stream, return to step 1

You are already doing steps 1-5 and 7 today however you are forced to deal
with transactions and catalog access. I am just adding step 6, which turns
last_error_xid into current_error_xid because it is the current value of the
error in the stream during step 5 when the user needs to decide how to
recover from the error. Once the user decides and the stream resumes that
error information has no value (go look in the logs if you want history).
Thus when 7 comes around and the stream is restarted the error info in
pg_stat_subscription_workers is empty waiting for the next error to happen.
If the user did nothing in step 5 then when that same WAL is replayed at
step 2 the error will come back.

The main thing is how many ways can the user exit step 5 and to make sure
that no matter which way they exit step 6 happens before step 7.
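
As a rough illustration of the ordering constraint above, here is a minimal sketch in C of the in/out workflow. Everything in it (WorkerModel, model_fail, model_resume) is a hypothetical model written for this outline, not code from the patch, and it assumes the user must confirm the reported xid when choosing to skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical model of one apply worker; not a PostgreSQL structure. */
typedef struct WorkerModel
{
    bool          streaming;  /* true = "in", false = "out" */
    TransactionId error_xid;  /* models the xid reported with the failure */
    TransactionId skip_xid;   /* models pg_subscription.subskipxid */
} WorkerModel;

/* Steps 2-4: the stream fails, stops, and the failing xid is reported. */
static void
model_fail(WorkerModel *w, TransactionId xid)
{
    assert(w->streaming);
    w->streaming = false;
    w->error_xid = xid;
}

/*
 * Steps 5-7: the user asks to resume, optionally skipping the failed xid.
 * Returns false (and stays "out") if the confirmed xid does not match the
 * reported one. Otherwise the error information is cleared (step 6) before
 * streaming resumes (step 7), whichever way the user exited step 5.
 */
static bool
model_resume(WorkerModel *w, bool skip, TransactionId confirm_xid)
{
    assert(!w->streaming);
    if (skip)
    {
        if (confirm_xid != w->error_xid)
            return false;
        w->skip_xid = w->error_xid;
    }
    w->error_xid = InvalidTransactionId;  /* step 6 */
    w->streaming = true;                  /* step 7 */
    return true;
}
```

In this model a resume without a skip also clears the error information, so if the same WAL fails again at step 2 the error is simply reported afresh.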

David J.

#466tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
tanghy.fnst@fujitsu.com
In reply to: Amit Kapila (#451)
RE: Skipping logical replication transactions on subscriber side

On Fri, Jan 21, 2022 7:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

2.
+stop_skipping_changes(bool clear_subskipxid, XLogRecPtr origin_lsn,
+   TimestampTz origin_timestamp)
+{
+ Assert(is_skipping_changes());
+
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction %u",
+ skip_xid)));

Isn't it better to move this LOG at the end of this function? Because
clear* functions can give an error, so it is better to move it after
that. I have done that in the attached.

+	/* Stop skipping changes */
+	skip_xid = InvalidTransactionId;
+
+	ereport(LOG,
+			(errmsg("done skipping logical replication transaction %u",
+					skip_xid)));

I think we can move the LOG before resetting skip_xid, otherwise skip_xid would
always be 0 in the LOG.
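
A minimal standalone sketch of the ordering suggested here, assuming a simplified stand-in for the worker state: the message is formatted while skip_xid still holds the skipped xid, and only then is skip_xid reset. The buffer-based signature is hypothetical and replaces ereport() purely so the sketch compiles outside the server.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical stand-in for the worker's skip state; not the patch's code. */
static TransactionId skip_xid = InvalidTransactionId;

static int
is_skipping_changes(void)
{
    return skip_xid != InvalidTransactionId;
}

/*
 * Format the "done skipping" message first, then reset skip_xid.
 * Returns the xid that was reported in the message.
 */
static TransactionId
stop_skipping_changes(char *buf, size_t buflen)
{
    TransactionId reported;

    assert(is_skipping_changes());

    /* Report while skip_xid is still valid ... */
    reported = skip_xid;
    snprintf(buf, buflen,
             "done skipping logical replication transaction %u",
             (unsigned) reported);

    /* ... and only then stop skipping changes. */
    skip_xid = InvalidTransactionId;
    return reported;
}
```

With the two statements in the opposite order, the message would always print 0, which is the problem described above.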

Regards,
Tang

#467Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#465)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 1:49 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sun, Jan 23, 2022 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I really dislike the user experience this provides, and given it is new in v15 (and right now this table seems to exist solely to support this feature) changing this seems within the realm of possibility. I have to imagine these workers have a sense of local state that would just be "no errors, no need to touch pg_stat_subscription_workers at the end of this transaction's commit". It would save a local state of the error_xid and if a successfully committed transaction has that xid it would clear the error. The skip code path would also check for and see the matching xid value and clear the error. Even if the local state thing doesn't work, one catalog lookup per transaction seems like potentially reasonable overhead to incur here.

Are you telling to update the catalog to save error_xid when an error
occurs? If so, that has many challenges like we are not supposed to
perform any such operations when the transaction is in an error state.
We have discussed this and other ideas in the beginning. I don't find
any of your arguments convincing to change the basic approach here but
I would like to see what others think on this matter?

Then how does the table get updated to that state in the first place since it doesn't know the error details until there is an error?

I think your idea is based on storing error information, including the
XID, in the system catalog. I think that the reasons why we use
the stats collector to store error information including
last_error_xid are (1) as Amit mentioned, it would have many
challenges if updating the catalog when the transaction is in an error
state, and (2) we can store more information such as error messages,
action, etc. other than XID so that users can identify that the
reported error is a conflict error but not other types of error such
as OOM error. For these reasons to me, it makes sense to store
subscribers' error information by using the stats collector.

When it comes to reporting a message to the stats collector, we need
to note that it's not guaranteed that all messages arrive at the stats
collector. Therefore, last_error_xid doesn't necessarily get
updated after the worker reports an error. Similarly, the same is true
for clearing subskipxid. I agree that it's useful if
pg_subscription.subskipxid is automatically set when executing ALTER
SUBSCRIPTION SKIP but it might not work in some cases because of this
restriction.

There is another idea of storing error XID on shmem (e.g., in
ReplicationState) in addition to reporting error details to the stats
collector and using the XID when skipping the transaction, but I'm not
sure whether it's a reliable way.

Anyway, even if subskipxid is automatically set when executing ALTER
SUBSCRIPTION SKIP, I think we need to provide a way to clear it as the
current patch does (setting NONE) just in case.

In any case, clearing out the entries in the table would not happen while it is applying the replication stream, in an error state or otherwise.

in = while streaming
out = not streaming

1(in). replication stream is working
2(in). replication stream fails; capture error information
3(in->out). stop replication stream; perform rollback on xid
4(out). update pg_stat_subscription_worker to report the failure, including xid of the transaction
5(out). wait for the user to manually restart the replication stream

Do you mean that there always is user intervention after error so the
replication stream can resume?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#468David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Masahiko Sawada (#467)
Re: Skipping logical replication transactions on subscriber side

On Sun, Jan 23, 2022 at 11:55 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Mon, Jan 24, 2022 at 1:49 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sun, Jan 23, 2022 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I really dislike the user experience this provides, and given it is new in
v15 (and right now this table seems to exist solely to support this
feature) changing this seems within the realm of possibility. I have to
imagine these workers have a sense of local state that would just be "no
errors, no need to touch pg_stat_subscription_workers at the end of this
transaction's commit". It would save a local state of the error_xid and if
a successfully committed transaction has that xid it would clear the
error. The skip code path would also check for and see the matching xid
value and clear the error. Even if the local state thing doesn't work, one
catalog lookup per transaction seems like potentially reasonable overhead
to incur here.

Are you telling to update the catalog to save error_xid when an error
occurs? If so, that has many challenges like we are not supposed to
perform any such operations when the transaction is in an error state.
We have discussed this and other ideas in the beginning. I don't find
any of your arguments convincing to change the basic approach here but
I would like to see what others think on this matter?

Then how does the table get updated to that state in the first place
since it doesn't know the error details until there is an error?

I think your idea is based on storing error information, including the
XID, in the system catalog. I think that the reasons why we use
the stats collector

I noticed this dynamic while skimming the patch (and also pondering why the
new worker table was not in a catalog chapter) but am only now fully
beginning to appreciate its impact on this discussion.

to store error information including
last_error_xid are (1) as Amit mentioned, it would have many
challenges if updating the catalog when the transaction is in an error
state, and

I'm going on faith right now that this is a problem. But from my prior
outline I hope you can see why I find it surprising. Don't try to update a
catalog while in an error state. Get out of the error state first. e.g.,
A transient "holding pattern" would seem to work. Upon a server restart
the transient state would be forgotten, it would attempt to reapply the
wal, would see the same error, and would then go back into the transient
holding pattern. I do intend to read the other discussion on this
particular topic so a detailed rebuttal, if warranted, can be withheld.

(2) we can store more information such as error messages,
action, etc. other than XID so that users can identify that the
reported error is a conflict error but not other types of error such
as OOM error.

I mentioned only XID because of the focus on SKIP. The other data already
present in that table is ok. Whether we use a catalog or the stats
collector seems irrelevant. If anything the catalog makes more sense -
calling an error message a statistic is a bit of a reach.

Similarly, the same is true
for clearing subskipxid. I agree that it's useful if
pg_subscription.subskipxid is automatically set when executing ALTER
SUBSCRIPTION SKIP but it might not work in some cases because of this
restriction. For these reasons to me, it makes sense to store
subscribers' error information by using the stats collector.

I'm confused - pg_subscription is a catalog, not a stat view. Why is it
affected?

I don't see how point 2 prevents using a system catalog. I accept point 1
as true but will need to read some of the prior discussion to really
understand it.

When it comes to reporting a message to the stats collector, we need
to note that it's not guaranteed that all messages arrive at the stats
collector. Therefore, last_error_xid doesn't necessarily get
updated after the worker reports an error.

You'll forgive me for not considering this due to its apparent lack of
mention in the documentation [*] and its arguable classification as a POLA
violation.

[*]
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION

What I do read there seems compatible with the desired user experience.
500ms lag, idle transaction oriented, reset upon unclean shutdown, and
consumers seeing a stable transactional view: none of these seem like
show-stoppers.

Anyway, even if subskipxid is automatically set when executing ALTER
SUBSCRIPTION SKIP, I think we need to provide a way to clear it as the
current patch does (setting NONE) just in case.

With my suggestion of requiring a matching xid the whole option for
skip_xid = { xid | NONE } remains.

5(out). wait for the user to manually restart the replication stream

Do you mean that there always is user intervention after error so the
replication stream can resume?

That is my working assumption. It doesn't seem like the system would
auto-resume without a DBA doing something (I'll attribute a server crash to
the DBA for convenience).

Apparently I need to read more about how the system works today to
understand how this varies from and integrates with today's user experience.

That said, at present my two dislikes:

1) ALTER SUBSCRIPTION SKIP accepts any xid value (I need to consider further the
timing of when this resets to zero)
2) pg_stat_subscription_worker.last_error_* fields remain populated even
while the system is in a normal operating state.

are preventing me from preferring this patch over the status quo (yes, I
know the 2nd point is about a committed feature). Regardless of how far
off I may be regarding our technical ability to change them to a more (IMO)
user-friendly design.

David J.

#469Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: David G. Johnston (#468)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 1:30 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

That said, at present my two dislikes:

1) ALTER SUBSCRIPTION SKIP accepts any xid value (I need to consider further the timing of when this resets to zero)

I think this is required for future extension of this feature, wherein
there could be multiple such xids, say when we support parallel
apply workers. I think we can find a good way to do it even after the
first version, for example by making the xid an optional parameter.

--
With Regards,
Amit Kapila.

#470Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#468)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 5:00 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sun, Jan 23, 2022 at 11:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Similarly, the same is true
for clearing subskipxid.

I'm confused - pg_subscription is a catalog, not a stat view. Why is it affected?

Sorry, I mistook last_error_xid for subskipxid here.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#471Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#455)
Re: Skipping logical replication transactions on subscriber side

On 22.01.22 03:54, Amit Kapila wrote:

Won't we already do that for Alter Subscription command which means
nothing special needs to be done for this? However, it seems to me
that the idea we are trying to follow here is that as this option can
lead to data inconsistency, it is good to allow only superusers to
specify this option. The owner of the subscription can be changed to
non-superuser as well in which case I think it won't be a good idea to
allow this option. OTOH, if we think it is okay to allow such an
option to users that don't have superuser privilege then I think
allowing it to the owner of the subscription makes sense to me.

I don't think this functionality allows a nonprivileged user to do
anything they couldn't otherwise do. You can create inconsistent data
in the sense that you can choose not to apply certain replicated data.
But a subscription owner has to have write access to the target tables
of the subscription, so they already have the ability to write or not
write any data they want.

#472Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#458)
Re: Skipping logical replication transactions on subscriber side

On 22.01.22 10:41, Amit Kapila wrote:

Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped. Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

Is this going to be a problem with transaction ID wraparound? Do we
need to use 64-bit xids for this?

#473David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#469)
Re: Skipping logical replication transactions on subscriber side

On Monday, January 24, 2022, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 24, 2022 at 1:30 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

That said, at present my two dislikes:

1) ALTER SUBSCRIPTION SKIP accepts any xid value (I need to consider further
the timing of when this resets to zero)

I think this is required for future extension of this feature, wherein
there could be multiple such xids, say when we support parallel
apply workers. I think we can find a good way to do it even after the
first version, for example by making the xid an optional parameter.

Extending the behavior is doable, and maybe we end up without this
limitation in the future, so be it. But I’m having a hard time imagining a
scenario where the xid is not already known to the system, and the user,
and wants to be in effect for a very short window.

David J.

#474Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#471)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 7:36 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 22.01.22 03:54, Amit Kapila wrote:

Won't we already do that for Alter Subscription command which means
nothing special needs to be done for this? However, it seems to me
that the idea we are trying to follow here is that as this option can
lead to data inconsistency, it is good to allow only superusers to
specify this option. The owner of the subscription can be changed to
non-superuser as well in which case I think it won't be a good idea to
allow this option. OTOH, if we think it is okay to allow such an
option to users that don't have superuser privilege then I think
allowing it to the owner of the subscription makes sense to me.

I don't think this functionality allows a nonprivileged user to do
anything they couldn't otherwise do. You can create inconsistent data
in the sense that you can choose not to apply certain replicated data.

I thought this will be the only primary way to skip applying certain
transactions. The other could be via pg_replication_origin_advance().
Or are you talking about the case where we skip applying update/delete
where the corresponding rows are not found?

I see the point that if we can allow the owner to skip applying
updates/deletes in certain cases then probably this should also be
okay. Kindly let us know if you have something else in mind as well?

--
With Regards,
Amit Kapila.

#475Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#472)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 7:40 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 22.01.22 10:41, Amit Kapila wrote:

Additionally, the description for pg_stat_subscription_workers should describe what happens once the transaction represented by last_error_xid has either been successfully processed or skipped. Does this "last error" stick around until another error happens (which is hopefully very rare) or does it reset to blanks?

It will be reset only on subscription drop, otherwise, it will stick
around until another error happens.

Is this going to be a problem with transaction ID wraparound?

I think to avoid this we can send a message to clear this (at least to
clear XID in the view) after skipping the xact but there is no
guarantee that it will be received by the stats collector.
Additionally, the worker can periodically (say after every N (100,
500, etc.) successful transactions) send a clear message after
successful apply. This will ensure that eventually the error entry
will be cleared.

Do we
need to use 64-bit xids for this?

For 64-bit XIDs, as this reported XID is for the remote transactions,
I think we need to add 4 bytes to each transaction message (say Begin)
and that could be costly for small transactions. We also probably need
to make logical decoding aware of 64-bit XID? Note that XIDs in WAL
records are still 32-bit XID. I don't think this feature deserves such
a big (in terms of WAL and network message size) change.

--
With Regards,
Amit Kapila.

#476Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#474)
Re: Skipping logical replication transactions on subscriber side

On 25.01.22 03:54, Amit Kapila wrote:

I don't think this functionality allows a nonprivileged user to do
anything they couldn't otherwise do. You can create inconsistent data
in the sense that you can choose not to apply certain replicated data.

I thought this will be the only primary way to skip applying certain
transactions. The other could be via pg_replication_origin_advance().
Or are you talking about the case where we skip applying update/delete
where the corresponding rows are not found?

I see the point that if we can allow the owner to skip applying
updates/deletes in certain cases then probably this should also be
okay. Kindly let us know if you have something else in mind as well?

Let's start this again: The question at hand is whether ALTER
SUBSCRIPTION ... SKIP should be allowed for subscription owners that are
not superusers. The argument raised against that was that this would
allow the owner to create "inconsistent" data. But it hasn't been
explained what that actually means or why it is dangerous.

#477Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#475)
Re: Skipping logical replication transactions on subscriber side

On 25.01.22 06:18, Amit Kapila wrote:

I think to avoid this we can send a message to clear this (at least to
clear XID in the view) after skipping the xact but there is no
guarantee that it will be received by the stats collector.
Additionally, the worker can periodically (say after every N (100,
500, etc) successful transaction) send a clear message after
successful apply. This will ensure that eventually the error entry
will be cleared.

Well, I think we need *some* solution for now. We can't leave a footgun
where you say, "skip transaction 700", somehow transaction 700 doesn't
happen, the whole thing gets forgotten, but then 3 months later, the
next transaction 700 mysteriously gets dropped.
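
To make the footgun concrete, here is a small sketch of why a stale skip request can match a later, unrelated transaction. It assumes a simplified 32-bit counter that just wraps around (real PostgreSQL also skips a few reserved XIDs, which is ignored here); xid_after and stale_skip_matches are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Simplified model: the xid assigned n transactions after 'start',
 * with plain modulo-2^32 wraparound.
 */
static TransactionId
xid_after(TransactionId start, uint64_t n)
{
    return (TransactionId) ((uint64_t) start + n);
}

/* A skip request left behind matches purely by 32-bit value. */
static bool
stale_skip_matches(TransactionId skip_xid, TransactionId remote_xid)
{
    assert(skip_xid != 0);  /* 0 models "no skip requested" */
    return skip_xid == remote_xid;
}
```

Under this model, xid_after(700, (uint64_t) 1 << 32) is 700 again, so a forgotten "skip 700" would silently match the next transaction that is assigned that value.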

#478David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Peter Eisentraut (#477)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 5:52 AM Peter Eisentraut <
peter.eisentraut@enterprisedb.com> wrote:

On 25.01.22 06:18, Amit Kapila wrote:

I think to avoid this we can send a message to clear this (at least to
clear XID in the view) after skipping the xact but there is no
guarantee that it will be received by the stats collector.
Additionally, the worker can periodically (say after every N (100,
500, etc) successful transaction) send a clear message after
successful apply. This will ensure that eventually the error entry
will be cleared.

Well, I think we need *some* solution for now. We can't leave a footgun
where you say, "skip transaction 700", somehow transaction 700 doesn't
happen, the whole thing gets forgotten, but then 3 months later, the
next transaction 700 mysteriously gets dropped.

This is indeed part of why I feel that the xid being skipped should be
validated. As the feature is presented the user is supposed to read the
xid from the system (the new stat view or the error log) and supply it and
then the worker, when it goes to skip, should find that the very first
transaction xid it encounters is the one it is being told to skip. It
skips that transaction, clears the skipxid, and puts the system back into
normal operating mode. If that first transaction xid isn't the one being
specified to skip the worker should error with "skipping transaction
failed, xid 123 expected but 456 found".
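
A minimal sketch of that validation, assuming the worker compares the requested skip xid against the first remote transaction it sees after restarting. SkipOutcome and check_first_xact are hypothetical names, and the message is written into a buffer rather than raised through ereport() so the sketch stands alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef enum SkipOutcome
{
    SKIP_NOT_REQUESTED,  /* no skip xid set: apply normally */
    SKIP_APPLIED,        /* first xact matched: skip it, clear the setting */
    SKIP_MISMATCH        /* first xact differs: report an error */
} SkipOutcome;

/*
 * Validate the requested skip xid against the first transaction received.
 * On a match the skip xid is cleared, returning the system to normal
 * operation. On a mismatch it is left in place and msg receives the error.
 */
static SkipOutcome
check_first_xact(TransactionId *subskipxid, TransactionId first_remote_xid,
                 char *msg, size_t msglen)
{
    assert(subskipxid != NULL && msg != NULL && msglen > 0);
    msg[0] = '\0';

    if (*subskipxid == InvalidTransactionId)
        return SKIP_NOT_REQUESTED;

    if (*subskipxid == first_remote_xid)
    {
        *subskipxid = InvalidTransactionId;
        return SKIP_APPLIED;
    }

    snprintf(msg, msglen,
             "skipping transaction failed, xid %u expected but %u found",
             (unsigned) *subskipxid, (unsigned) first_remote_xid);
    return SKIP_MISMATCH;
}
```

Leaving the skip xid set on a mismatch is one possible choice; clearing it and logging a warning instead would only change the SKIP_MISMATCH branch.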

This whole lack of a guarantee of the availability and accuracy regarding
the data that this process should be reliant upon needs to be engineered
away.

David J.

#479Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#478)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 11:35 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 5:52 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 25.01.22 06:18, Amit Kapila wrote:

I think to avoid this we can send a message to clear this (at least to
clear XID in the view) after skipping the xact but there is no
guarantee that it will be received by the stats collector.
Additionally, the worker can periodically (say after every N (100,
500, etc) successful transaction) send a clear message after
successful apply. This will ensure that eventually the error entry
will be cleared.

Well, I think we need *some* solution for now. We can't leave a footgun
where you say, "skip transaction 700", somehow transaction 700 doesn't
happen, the whole thing gets forgotten, but then 3 months later, the
next transaction 700 mysteriously gets dropped.

This is indeed part of why I feel that the xid being skipped should be validated. As the feature is presented the user is supposed to read the xid from the system (the new stat view or the error log) and supply it and then the worker, when it goes to skip, should find that the very first transaction xid it encounters is the one it is being told to skip. It skips that transaction, clears the skipxid, and puts the system back into normal operating mode. If that first transaction xid isn't the one being specified to skip the worker should error with "skipping transaction failed, xid 123 expected but 456 found".

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#480David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Masahiko Sawada (#479)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest
having the worker continue applying changes (after resetting subskipxid,
and - arguably - the ?_error_* fields). Log the transaction xid mis-match
as a warning in the log file as opposed to an error.

I was supposing to make it an error and have the worker stop again since in
a system where the xid is verified and the code is bug-free I would expect
the situation to be a "can't happen" one and I'd rather error in that
circumstance than warn. The DBA will have to go and ALTER SUBSCRIPTION
SKIP (xid = NONE) to get the worker working again but I find that
acceptable in this case.

David J.

#481Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#480)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#482David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Masahiko Sawada (#481)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 8:09 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest
having the worker continue applying changes (after resetting subskipxid,
and - arguably - the ?_error_* fields). Log the transaction xid mis-match
as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

If it remains possible for the system to accept a wrongly specified XID I
would agree that this behavior is preferable. At least when the user
wonders why the skip didn't work and they are seeing the same error again
they will have a log entry warning telling them their XID choice was
incorrect. I would prefer that the system not accept a wrongly specified
XID and the user be told directly and sooner that their XID choice was
incorrect.

David J.

#483Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#482)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 12:14 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 8:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

If it remains possible for the system to accept a wrongly specified XID I would agree that this behavior is preferable. At least when the user wonders why the skip didn't work and they are seeing the same error again they will have a log entry warning telling them their XID choice was incorrect.

Yes.

I would prefer that the system not accept a wrongly specified XID and the user be told directly and sooner that their XID choice was incorrect.

Given that we cannot rely on the pg_stat_subscription_workers view
for this purpose, we would need either a new sub-system that tracks
each logical replication status so the system can set the error XID to
subskipxid, or to wait for shared-memory based stats collector. While
I agree that ideally we need such a sub-system, I'm not sure that
everyone will agree to add that complexity for this feature. That having
been said, if there is a significant need for it, we can implement it
as an improvement.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#484David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Masahiko Sawada (#483)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 8:33 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Given that we cannot rely on the pg_stat_subscription_workers view
for this purpose, we would need either a new sub-system that tracks
each logical replication status so the system can set the error XID to
subskipxid, or to wait for shared-memory based stats collector.

I'm reading over the monitoring-stats page to try and get my head around
all of this. First of all, it defines two kinds of views:

1. PostgreSQL's statistics collector is a subsystem that supports
collection and reporting of information about server activity.
2. PostgreSQL also supports reporting dynamic information ... This facility
is independent of the collector process.

It then has two tables:

28.1 Dynamic Statistics Views (describing #2 above)
28.2 Collected Statistics Views (describing #1 above)

Apparently the "collector process" is UDP-like, not reliable. The
documentation fails to mention this fact. I'd argue that this is a
documentation bug.

I do see that the pg_stat_subscription_workers view is correctly placed in
Table 28.2

Reviewing the other views listed in that table only pg_stat_archiver abuses
the statistics collector in a similar fashion. All of the others are
actually metric oriented.

I don't care for the specification: "will contain one row per subscription
worker on which errors have occurred, for workers applying logical
replication changes and workers handling the initial data copy of the
subscribed tables."

I would much rather have this behave similar to pg_stat_activity (which, of
course, is a Dynamic Statistics View...) in that it shows only and all
workers that are presently working. The tablesync workers should go away
when they have finished synchronizing. I should not have to manually
intervene to get rid of unreliable expired data. The log file feels like a
superior solution to this monitoring view.

Alternatively, if the tablesync workers are done but we've been
accumulating real statistics for them, then by all means keep them included
in the view - but regardless of whether they encountered an error. But
maybe the view can right join in pg_stat_subscription and show a column for
"(pid is not null) AS is_active".

Maybe we need to add a track_finished_tablesync_workers GUC so the DBA can
decide whether to devote storage and processing resources to that
historical information.

If you had kept the original view name, "pg_stat_subscription_error", this
whole issue goes away. But you decided to make it more generic and call it
"pg_stat_subscription_workers" - which means you need to get rid of the
error-specific condition in the WHERE clause for the view. Show all
workers - I can filter on is_active. Showing only active workers is also
acceptable. You won't get to change your mind so decide whether this wants
to show only current and running state or whether historical statistics for
now defunct tablesync workers are desired. Personally, I would just show
active workers and if someone wants to add the feature they can add a
track_tablesync_worker_stats GUC and a matching view.

From that, every apply worker should be sending a statistics message to the
collector periodically. If error info is not present and the state is "all
is well", clear out any existing error info from the view. The attempt to
include an actual statistic field here doesn't seem useful nor redeeming.
I would add a "state" field in its place (well, after subrelid). And I
would still rename the columns to current_error_* and note that these
should be null unless the status field shows error (there may be some
additional complexity here). Just get rid of last_error_count.

David J.

P.S. I saw the discussion regarding pg_dump'ing the subskipxid field. I
didn't notice any discussion around creating and restoring a basebackup.
It seems like during server startup subskipxid should just be cleared out.
Then it doesn't matter what one does during backup.

#485Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#476)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 6:18 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 25.01.22 03:54, Amit Kapila wrote:

I don't think this functionality allows a nonprivileged user to do
anything they couldn't otherwise do. You can create inconsistent data
in the sense that you can choose not to apply certain replicated data.

I thought this will be the only primary way to skip applying certain
transactions. The other could be via pg_replication_origin_advance().
Or are you talking about the case where we skip applying update/delete
where the corresponding rows are not found?

I see the point that if we can allow the owner to skip applying
updates/deletes in certain cases then probably this should also be
okay. Kindly let us know if you have something else in mind as well?

Let's start this again: The question at hand is whether ALTER
SUBSCRIPTION ... SKIP should be allowed for subscription owners that are
not superusers. The argument raised against that was that this would
allow the owner to create "inconsistent" data. But it hasn't been
explained what that actually means or why it is dangerous.

There are two reasons in my mind: (a) Because the entire transaction
is skipped, we also skip unrelated data changes that are not the
direct cause of the conflict. It is possible to unintentionally skip
genuine insert/update/delete/truncate changes to some relations, which
can then cause future changes to conflict or not be applied at all.
For example, after a TRUNCATE is skipped, INSERTs in following
transactions can fail with "duplicate key .."; similarly, if some
INSERT is skipped, a following UPDATE/DELETE won't find the
corresponding row to operate on. (b) Users can specify some random
XID. The discussion below is trying to detect this and raise a
WARNING/ERROR, but it could still cause some valid transaction (one
that would not generate any conflict/error) to be skipped.

These can lead to some missing data in the subscriber which the user
might not have expected.
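
The two examples in (a) can be reproduced with a small toy model. This is only an illustrative sketch, not PostgreSQL code: the table is a plain dict, and the names apply_transaction, replay and the XIDs 700/701/800/801 are made up for the illustration. It assumes an UPDATE or DELETE that finds no row is simply not applied, as described above.

```python
class DuplicateKeyError(Exception):
    """An INSERT collided with an existing primary key."""


def apply_transaction(table, changes):
    """Apply one replicated transaction to `table` (a dict of key -> value).

    The transaction is all-or-nothing: changes go to a copy first.
    Returns how many UPDATE/DELETE changes found no matching row.
    """
    working = dict(table)
    not_found = 0
    for action, key, value in changes:
        if action == "TRUNCATE":
            working.clear()
        elif action == "INSERT":
            if key in working:
                raise DuplicateKeyError(f"Key ({key}) already exists")
            working[key] = value
        elif action == "UPDATE":
            if key in working:
                working[key] = value
            else:
                not_found += 1
        elif action == "DELETE":
            if key in working:
                del working[key]
            else:
                not_found += 1
        else:
            raise ValueError(f"unknown action: {action}")
    table.clear()
    table.update(working)
    return not_found


def replay(initial, transactions, skip_xid=None):
    """Replay (xid, changes) pairs, skipping `skip_xid` entirely.

    Stops at the first conflict, as an apply worker would.
    Returns the final table and a per-transaction outcome log.
    """
    table = dict(initial)
    log = []
    for xid, changes in transactions:
        if xid == skip_xid:
            log.append((xid, "skipped"))
            continue
        try:
            not_found = apply_transaction(table, changes)
        except DuplicateKeyError:
            log.append((xid, "duplicate key"))
            break
        log.append((xid, "applied" if not_found == 0 else "row not found"))
    return table, log


# Skipping a TRUNCATE makes a later INSERT conflict.
truncate_then_insert = [
    (700, [("TRUNCATE", None, None)]),
    (701, [("INSERT", 1, "new")]),
]
# Skipping an INSERT leaves a later UPDATE with nothing to update.
insert_then_update = [
    (800, [("INSERT", 2, "a")]),
    (801, [("UPDATE", 2, "b")]),
]
```

Calling replay({1: "old"}, truncate_then_insert, skip_xid=700) ends with a duplicate-key conflict on XID 701, while replay({}, insert_then_update, skip_xid=800) leaves the table empty because the UPDATE finds no row.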

--
With Regards,
Amit Kapila.

#486David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#468)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jan 24, 2022 at 12:59 AM David G. Johnston <
david.g.johnston@gmail.com> wrote:

5(out). wait for the user to manually restart the replication stream

Do you mean that there always is user intervention after error so the
replication stream can resume?

That is my working assumption. It doesn't seem like the system would
auto-resume without a DBA doing something (I'll attribute a server crash to
the DBA for convenience).

Apparently I need to read more about how the system works today to
understand how this varies from and integrates with today's user experience.

I've done some code reading. My understanding is that a background worker
for the main apply of a given subscription is created from the launcher
code (not reviewed) which is initialized at server startup (or as needed
sometime thereafter). This goes into a for(;;) loop in LogicalRepApplyLoop
under a PG_TRY in ApplyWorkerMain. When a message is applied that provokes
an error the PG_CATCH() in ApplyWorkerMain takes over and then this worker
dies. While in that PG_CATCH() we have an aborted transaction and so are
limited in what we can change. We PG_RE_THROW(); back to the background
worker infrastructure and let it perform logging and cleanup, which
includes destroying this instance of the background worker. The
background worker that is destroyed is replaced and its replacement is
identical to the original so far as the statistics collector is concerned.
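
The control flow described above can be modelled roughly as follows. This is a hedged sketch in Python rather than the actual C code: apply_worker_main and run_with_restarts are stand-ins named after the description, the try/except plays the role of PG_TRY/PG_CATCH, and the bare raise plays the role of PG_RE_THROW(). The restart limit is an assumption of the model only, so that the example terminates.

```python
class ApplyError(Exception):
    """Stand-in for an ERROR raised while applying a change."""


def apply_worker_main(stream, state, apply_change):
    """Model of one worker lifetime: loop until done or until an error."""
    try:
        while state["next"] < len(stream):
            apply_change(stream[state["next"]])
            state["next"] += 1  # progress advances only after a successful apply
    except ApplyError as exc:
        state["last_error"] = str(exc)  # minimal bookkeeping before dying
        raise  # plays the role of PG_RE_THROW(): this worker ends here


def run_with_restarts(stream, apply_change, max_restarts):
    """Model of the launcher: replace a dead worker up to `max_restarts` times."""
    state = {"next": 0, "last_error": None}
    restarts = 0
    while True:
        try:
            apply_worker_main(stream, state, apply_change)
            return state, restarts
        except ApplyError:
            if restarts >= max_restarts:
                return state, restarts
            restarts += 1


def failing_on(bad_xid):
    """Build an apply function that always fails on `bad_xid`."""
    def apply_change(xid):
        if xid == bad_xid:
            raise ApplyError(f"conflict in transaction {xid}")
    return apply_change
```

With a stream of [589, 590, 591] and failing_on(590), every replacement worker resumes at the same position and hits the same error, which is the error loop the rest of the thread is concerned with.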

I haven't traced out when the replacement apply worker gets recreated. It
seems like doing so immediately, and then it going and just encountering
the same error, would be an undesirable choice, and so I've assumed it does
not. But I also wasn't expecting the apply worker to PG_RE_THROW() either,
but instead continue on running in a different for(;;) loop waiting for
some signal from the system that something has changed that may avoid the
error that put it in timeout.

So my more detailed goal would be to get rid of PG_RE_THROW(); (I assume
doing so would entail transaction rollback) and stay in the worker. Update
pg_subscription with the error information (having removed PG_RE_THROW we
have new things to consider re: pg_stat_subscription_workers). Go into a
for(;;) loop, maybe polling pg_subscription for an indication that it is OK
to retry applying the last transaction. (can an inter-process signal be
sent from a normal backend process to a background worker process?). The
SKIP command then matches XID values on pg_subscription; the resumption
sees the subskipxid, updates pg_subscription to remove the error info and
subskipxid, skips the next transaction assuming it has the matching XID, and
then continues applying as normal. Adapt to deal with crash conditions as
needed though clearing before reapplying seems like a safe default. Again,
upon worker startup maybe they should be cleared too (making pg_dump and
other backup considerations moot - as noted in my P.S. in the previous
email).

I'm not sure we are paranoid enough regarding the locking of
pg_subscription for purposes of reading and writing subskipxid. I'd
probably rather serialize access to it, and maybe even not allow changing
from one non-zero XID to another non-zero XID. It shouldn't be needed in
practice (more so if the XID has to be the one that is present from
current_error_xid) and the user can always reset first.
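
A minimal sketch of that rule, with access serialized by a lock and a non-zero XID never overwritten by a different non-zero XID. The class SkipXidSlot and its methods are hypothetical names for illustration; nothing here reflects how pg_subscription is actually locked.

```python
import threading

INVALID_XID = 0  # "no skip requested"


class SkipXidSlot:
    """Hypothetical guarded holder for a subscription's skip XID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._xid = INVALID_XID

    def set(self, xid):
        """Set or reset the skip XID; refuse non-zero -> different non-zero."""
        with self._lock:
            if xid != INVALID_XID and self._xid not in (INVALID_XID, xid):
                raise ValueError("skip XID already set; reset it first")
            self._xid = xid

    def consume_if_matches(self, xid):
        """Atomically clear the slot and return True if it holds `xid`."""
        with self._lock:
            if self._xid != INVALID_XID and self._xid == xid:
                self._xid = INVALID_XID
                return True
            return False

    def get(self):
        """Return the current skip XID (0 if unset)."""
        with self._lock:
            return self._xid
```

A user who wants to change the target would call set(0) first and then set the new XID, matching the "reset first" behaviour suggested above.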

In worker.c I was and still am confused as to the meaning of 'c' and 'w' in
LogicalRepApplyLoop. In apply_dispatch in that file, enums are used to
compare against the message byte; it would be helpful for the inexperienced
reader if 'c' and 'w' were done as enums as well.
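
To illustrate the readability point, here is a sketch of what named constants could look like. It assumes that 'w' and 'k' are the XLogData and primary keepalive message bytes of the streaming replication protocol; the enum and function names are invented for this example and do not exist in worker.c.

```python
from enum import Enum


class StreamMessage(Enum):
    """Hypothetical names for the first byte of a replication stream message."""
    XLOG_DATA = b"w"  # WAL data carrying logical replication messages
    KEEPALIVE = b"k"  # primary keepalive


def classify(message: bytes) -> StreamMessage:
    """Map a raw message to its type; raises ValueError for unknown bytes."""
    return StreamMessage(message[:1])
```

A comparison such as classify(msg) is StreamMessage.XLOG_DATA then documents itself in a way that a bare character literal does not.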

David J.

#487Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#481)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

IIUC, the proposal is to compare the skip_xid with the very
transaction the apply worker received to apply and raise a warning if
it doesn't match with skip_xid and then continue. This seems like a
reasonable idea but can we guarantee that it is always the first
transaction that we want to skip? We seem to guarantee that we won't
get something again once it is written durably/flushed on the
subscriber side. I guess here it can happen that before the errored
transaction, there is some empty xact, or maybe part of the stream
(consider streaming transactions) of some xact, or there could be
other cases as well where the server will send those xacts again.

Now, if the above reasoning is correct then I think your proposal to
clear the skip_xid in the catalog as soon as we have applied the first
transaction successfully seems reasonable to me.

--
With Regards,
Amit Kapila.

#488Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: David G. Johnston (#486)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 7:31 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Mon, Jan 24, 2022 at 12:59 AM David G. Johnston <david.g.johnston@gmail.com> wrote:

So my more detailed goal would be to get rid of PG_RE_THROW();

I don't think that will be possible, consider the FATAL/PANIC error
case. Also, there are reasons why we always restart apply worker on
ERROR even without this work. If we want to change that, we might need
to redesign the apply side mechanism which I don't think we should try
to do as part of this patch.

--
With Regards,
Amit Kapila.

#489Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#487)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

IIUC, the proposal is to compare the skip_xid with the very
transaction the apply worker received to apply and raise a warning if
it doesn't match with skip_xid and then continue. This seems like a
reasonable idea but can we guarantee that it is always the first
transaction that we want to skip? We seem to guarantee that we won't
get something again once it is written durably/flushed on the
subscriber side. I guess here it can happen that before the errored
transaction, there is some empty xact, or maybe part of the stream
(consider streaming transactions) of some xact, or there could be
other cases as well where the server will send those xacts again.

Good point.

I guess that when the worker has entered an error loop, we can
guarantee that it fails while applying the first non-empty
transaction since starting logical replication, and that transaction
is the one we’d like to skip. If a transaction that can be applied
without an error is resent after a restart, that is a problem in
logical replication itself. As you pointed out, there may be some
empty transactions before the transaction in question, since we don't
advance the replication origin LSN if the transaction is empty. The
same is probably true for a streamed transaction that is rolled back,
or for ROLLBACK PREPARED transactions. So, could we also skip clearing
subskipxid if the transaction is empty? That is, we make sure to clear
it only after applying the first non-empty transaction. We would need
to think about this solution carefully; otherwise ALTER SUBSCRIPTION
SKIP ends up not working at all in some cases.
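
One possible reading of this rule, as a rough sketch only: empty transactions leave subskipxid alone, a matching transaction is skipped, and the first non-empty transaction that applies successfully clears a non-matching value with a warning. The function name, the dict standing in for the catalog row, and the warning text are all assumptions made for the illustration.

```python
INVALID_XID = 0  # "no skip requested"


def handle_transaction(sub, xid, changes, apply_changes, warnings):
    """Process one remote transaction under a pending skip request.

    `sub` is a dict holding the hypothetical catalog field 'skipxid'.
    Returns 'empty', 'skipped', or 'applied'.
    """
    if not changes:
        # Empty transactions do not count as "the first transaction",
        # so a pending skip request survives them.
        return "empty"
    if sub["skipxid"] != INVALID_XID and sub["skipxid"] == xid:
        sub["skipxid"] = INVALID_XID
        return "skipped"
    apply_changes(changes)  # if this raises, skipxid stays set for the retry
    if sub["skipxid"] != INVALID_XID:
        warnings.append(
            f"skip xid {sub['skipxid']} did not match applied xid {xid}"
        )
        sub["skipxid"] = INVALID_XID
    return "applied"
```

If apply_changes raises, the request is left in place, so a correctly specified XID is still honoured when the restarted worker receives the transaction again.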

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#490Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#489)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I think it's a good idea to clear the subskipxid after the first
transaction regardless of whether the worker skipped it.

So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.

Agreed, I think it's better to log a warning than to raise an error.
In the case where the user specified the wrong XID, the worker should
fail again due to the same error.

IIUC, the proposal is to compare the skip_xid with the very
transaction the apply worker received to apply and raise a warning if
it doesn't match with skip_xid and then continue. This seems like a
reasonable idea but can we guarantee that it is always the first
transaction that we want to skip? We seem to guarantee that we won't
get something again once it is written durably/flushed on the
subscriber side. I guess here it can happen that before the errored
transaction, there is some empty xact, or maybe part of the stream
(consider streaming transactions) of some xact, or there could be
other cases as well where the server will send those xacts again.

Good point.

I guess that in the situation the worker entered an error loop, we can
guarantee that the worker fails while applying the first non-empty
transaction since starting logical replication. And the transaction is
what we’d like to skip. If the transaction that can be applied without
an error is resent after a restart, it’s a problem of logical
replication. As you pointed out, it's possible that there are some
empty transactions before the transaction in question since we don't
advance replication origin LSN if the transaction is empty. Also,
probably the same is true for a streamed transaction that is rolled
back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
subskipxid if the transaction is empty? That is, we make sure to clear
it after applying the first non-empty transaction. We would need to
carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
ends up not working at all in some cases.

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#491Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#490)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 8:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

IIUC, the proposal is to compare the skip_xid with the very
transaction the apply worker received to apply and raise a warning if
it doesn't match with skip_xid and then continue. This seems like a
reasonable idea but can we guarantee that it is always the first
transaction that we want to skip? We seem to guarantee that we won't
get something again once it is written durably/flushed on the
subscriber side. I guess here it can happen that before the errored
transaction, there is some empty xact, or maybe part of the stream
(consider streaming transactions) of some xact, or there could be
other cases as well where the server will send those xacts again.

Good point.

I guess that in the situation the worker entered an error loop, we can
guarantee that the worker fails while applying the first non-empty
transaction since starting logical replication. And the transaction is
what we’d like to skip. If the transaction that can be applied without
an error is resent after a restart, it’s a problem of logical
replication. As you pointed out, it's possible that there are some
empty transactions before the transaction in question since we don't
advance replication origin LSN if the transaction is empty. Also,
probably the same is true for a streamed transaction that is rolled
back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
subskipxid if the transaction is empty? That is, we make sure to clear
it after applying the first non-empty transaction. We would need to
carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
ends up not working at all in some cases.

I think it is okay to clear after the first successful application of
any transaction. What I was not sure about was the idea of giving a
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, nor is any
XID reported in the view for them.

--
With Regards,
Amit Kapila.

#492Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#491)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 26, 2022 at 8:55 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 11:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

IIUC, the proposal is to compare the skip_xid with the very
transaction the apply worker received to apply and raise a warning if
it doesn't match with skip_xid and then continue. This seems like a
reasonable idea but can we guarantee that it is always the first
transaction that we want to skip? We seem to guarantee that we won't
get something again once it is written durably/flushed on the
subscriber side. I guess here it can happen that before the errored
transaction, there is some empty xact, or maybe part of the stream
(consider streaming transactions) of some xact, or there could be
other cases as well where the server will send those xacts again.

Good point.

I guess that in the situation the worker entered an error loop, we can
guarantee that the worker fails while applying the first non-empty
transaction since starting logical replication. And the transaction is
what we’d like to skip. If the transaction that can be applied without
an error is resent after a restart, it’s a problem of logical
replication. As you pointed out, it's possible that there are some
empty transactions before the transaction in question since we don't
advance replication origin LSN if the transaction is empty. Also,
probably the same is true for a streamed transaction that is rolled
back or ROLLBACK-PREPARED transactions. So, we can also skip clearing
subskipxid if the transaction is empty? That is, we make sure to clear
it after applying the first non-empty transaction. We would need to
carefully think about this solution otherwise ALTER SUBSCRIPTION SKIP
ends up not working at all in some cases.

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, and neither any
XID is reported in the view for them.

If the tablesync worker raises an error while applying changes after
finishing the copy, it also reports the error XID.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#493Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#484)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 7:05 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 8:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Given that we cannot rely on the pg_stat_subscription_workers view
for this purpose, we would need either a new sub-system that tracks
each logical replication status so the system can set the error XID to
subskipxid, or to wait for shared-memory based stats collector.

I'm reading over the monitoring-stats page to try and get my head around all of this. First of all, it defines two kinds of views:

1. PostgreSQL's statistics collector is a subsystem that supports collection and reporting of information about server activity.
2. PostgreSQL also supports reporting dynamic information ... This facility is independent of the collector process.

It then has two tables:

28.1 Dynamic Statistics Views (describing #2 above)
28.2 Collected Statistics Views (describing #1 above)

Apparently the "collector process" is UDP-like, not reliable. The documentation fails to mention this fact. I'd argue that this is a documentation bug.

I do see that the pg_stat_subscription_workers view is correctly placed in Table 28.2

Reviewing the other views listed in that table only pg_stat_archiver abuses the statistics collector in a similar fashion. All of the others are actually metric oriented.

I don't care for the specification: "will contain one row per subscription worker on which errors have occurred, for workers applying logical replication changes and workers handling the initial data copy of the subscribed tables."

I would much rather have this behave similar to pg_stat_activity (which, of course, is a Dynamic Statistics View...) in that it shows only and all workers that are presently working.

I have no objection against having a dynamic statistics view showing
the status of each running worker but I think it should be implemented
in a separate view and not be something that replaces the
pg_stat_subscription_workers. I think pg_stat_subscription would be
the right place for it.

The tablesync workers should go away when they have finished synchronizing. I should not have to manually intervene to get rid of unreliable expired data. The log file feels like a superior solution to this monitoring view.

Alternatively, if the tablesync workers are done but we've been accumulating real statistics for them, then by all means keep them included in the view - but regardless of whether they encountered an error. But maybe the view can right join in pg_stat_subscription and show a column for "(pid is not null) AS is_active".

Maybe we need to add a track_finished_tablesync_workers GUC so the DBA can decide whether to devote storage and processing resources to that historical information.

If you had kept the original view name, "pg_stat_subscription_error", this whole issue goes away. But you decided to make it more generic and call it "pg_stat_subscription_workers" - which means you need to get rid of the error-specific condition in the WHERE clause for the view. Show all workers - I can filter on is_active. Showing only active workers is also acceptable. You won't get to change your mind so decide whether this wants to show only current and running state or whether historical statistics for now defunct tablesync workers are desired. Personally, I would just show active workers and if someone wants to add the feature they can add a track_tablesync_worker_stats GUC and a matching view.

We plan to clear/remove tablesync entries that have finished synchronization.

It’s better not to merge dynamic statistics such as pid and is_active
and accumulative statistics into one view. I think we can have both
views: pg_stat_subscription_workers view with some changes based on
the review comments (e.g., removing defunct tablesync entry), and
another view showing dynamic statistics such as the worker status.

From that, every apply worker should be sending a statistics message to the collector periodically. If error info is not present and the state is "all is well", clear out any existing error info from the view. The attempt to include an actual statistic field here doesn't seem useful nor redeeming. I would add a "state" field in its place (well, after subrelid). And I would still rename the columns to current_error_* and note that these should be null unless the status field shows error (there may be some additional complexity here). Just get rid of last_error_count.

I don't think that using the stats collector to show the current
status of each worker is a good idea because of 500ms lag, UDP
connection etc. Even if error info is not present and the state is
good according to the view, it might be out-of-date or simply not
true. If we want to do that, it’s much better to prepare something on
shmem so each worker can store its status (running or error, error
xid, etc.) and have pg_stat_subscription (or another view) show the
information. One thing we need to consider is that the status must
remain available even after the apply/tablesync worker exits, but we
don't know at startup time how many worker status slots we need to
allocate in shmem.
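
The sizing concern can be made concrete with a toy model of a fixed-capacity status table. This is only a sketch under stated assumptions: WorkerStatusTable, its methods, and the OIDs used with it are hypothetical, and a Python list stands in for a shared-memory array sized once at startup.

```python
class WorkerStatusTable:
    """Fixed-capacity stand-in for a shared-memory array of worker statuses."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # sized once, like shmem at startup

    def report(self, subid, relid, state, error_xid=0):
        """Store the latest status for (subid, relid); False if no slot is free."""
        key = (subid, relid)
        target = None
        for i, slot in enumerate(self._slots):
            if slot is not None and slot["key"] == key:
                target = i  # reuse the slot already owned by this worker
                break
            if slot is None and target is None:
                target = i  # remember the first free slot
        if target is None:
            return False
        self._slots[target] = {"key": key, "state": state, "error_xid": error_xid}
        return True

    def lookup(self, subid, relid):
        """Return the stored status, which outlives the reporting worker."""
        for slot in self._slots:
            if slot is not None and slot["key"] == (subid, relid):
                return slot
        return None
```

Because entries must outlive the worker that wrote them, slots are never freed implicitly here, and a table sized too small at startup simply refuses new workers; that is the allocation problem raised above.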

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#494Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#492)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?

I am fine with clearing the skip_xid after the first successful
application. But note, we shouldn't do catalog access for this, we can
check if it is set in MySubscription.

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, and neither any
XID is reported in the view for them.

If the tablesync worker raises an error while applying changes after
finishing the copy, it also reports the error XID.

Right and agreed with your assessment for the same.

--
With Regards,
Amit Kapila.

#495David G. Johnston
David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Kapila (#494)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jan 25, 2022 at 9:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, and neither any
XID is reported in the view for them.

If the tablesync worker raises an error while applying changes after
finishing the copy, it also reports the error XID.

Right and agreed with your assessment for the same.

IIUC each tablesync process also performs an apply stage but only applies
the messages related to the single table it is responsible for. Once all
tablesync workers synchronize they are all destroyed and the main apply
worker takes over and applies transactions to all subscribed tables.

We probably should just provide an option for the user to specify
"subrelid". If null, only the main apply worker will skip the given xid,
otherwise only the worker tasked with syncing that particular table will do
so. It might take a sequence of ALTER SUBSCRIPTION SET commands to get a
broken initial table synchronization to load completely but at least there
will not be any surprises as to which tables had transactions skipped and
which did not.
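
The routing rule proposed here is small enough to state as a predicate. A sketch only: worker_should_skip and its parameter names are hypothetical, None stands for a null subrelid (the main apply worker), and the OIDs in any call are arbitrary.

```python
INVALID_XID = 0  # "no skip requested"


def worker_should_skip(worker_relid, skip_relid, skip_xid, remote_xid):
    """Decide whether this worker skips `remote_xid`.

    worker_relid: None for the main apply worker, else the table OID that
                  a tablesync worker is responsible for.
    skip_relid:   the user-specified subrelid (None targets the apply worker).
    """
    if skip_xid == INVALID_XID or remote_xid != skip_xid:
        return False
    return worker_relid == skip_relid
```

With skip_relid left null only the main apply worker acts on the request; with a table OID only the tablesync worker for that table does, so there is no ambiguity about which worker consumes it.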

It may even make sense, eventually for the main apply worker to skip on a
subrelid basis. Since the main apply worker isn't applying transactions at
the same time as the tablesync workers the non-null subrelid can also be
interpreted by the main apply worker.

David J.

#496Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David G. Johnston (#495)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Tue, Jan 25, 2022 at 9:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 26, 2022 at 9:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Probably, we also need to consider the case where the tablesync worker
entered an error loop and the user wants to skip the transaction? The
apply worker is also running at the same time but it should not clear
subskipxid. Similarly, the tablesync worker should not clear
subskipxid if the apply worker wants to skip the transaction.

I think for tablesync workers, the skip_xid set via this mechanism
won't work as we don't have any remote_xid for them, and neither any
XID is reported in the view for them.

If the tablesync worker raises an error while applying changes after
finishing the copy, it also reports the error XID.

Right and agreed with your assessment for the same.

IIUC each tablesync process also performs an apply stage but only applies the messages related to the single table it is responsible for. Once all tablesync workers synchronize they are all destroyed and the main apply worker takes over and applies transactions to all subscribed tables.

We probably should just provide an option for the user to specify "subrelid". If null, only the main apply worker will skip the given xid, otherwise only the worker tasked with syncing that particular table will do so. It might take a sequence of ALTER SUBSCRIPTION SET commands to get a broken initial table synchronization to load completely but at least there will not be any surprises as to which tables had transactions skipped and which did not.

That would work, but I’m concerned about whether users can specify it
properly. Also, we would need to change the errcontext message
generated by apply_error_callback() so the user can tell whether the
error occurred in the apply worker or a tablesync worker.

Or, as another idea, since an error during table synchronization is
not common and could in practice be resolved by truncating the table
and restarting the synchronization, there might be no need to go this
far and we can support it only for apply worker errors.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#497Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#496)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

We probably should just provide an option for the user to specify "subrelid". If null, only the main apply worker will skip the given xid, otherwise only the worker tasked with syncing that particular table will do so. It might take a sequence of ALTER SUBSCRIPTION SET commands to get a broken initial table synchronization to load completely but at least there will not be any surprises as to which tables had transactions skipped and which did not.

That would work, but I’m concerned about whether users can specify it
properly. Also, we would need to change the errcontext message
generated by apply_error_callback() so the user can tell whether the
error occurred in the apply worker or a tablesync worker.

Or, as another idea, since an error during table synchronization is
not common and could in practice be resolved by truncating the table
and restarting the synchronization, there might be no need to go this
far and we can support it only for apply worker errors.

Yes, that is what I have also in mind. We can always extend this
feature for tablesync process because it can not only fail for the
specified skip_xid but also for many other reasons during the initial
copy.

--
With Regards,
Amit Kapila.

#498Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#497)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jan 26, 2022 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jan 26, 2022 at 12:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 26, 2022 at 1:43 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

We probably should just provide an option for the user to specify "subrelid". If null, only the main apply worker will skip the given xid, otherwise only the worker tasked with syncing that particular table will do so. It might take a sequence of ALTER SUBSCRIPTION SET commands to get a broken initial table synchronization to load completely but at least there will not be any surprises as to which tables had transactions skipped and which did not.

That would work, but I’m concerned about whether users can specify it
properly. Also, we would need to change the errcontext message
generated by apply_error_callback() so the user can tell whether the
error occurred in the apply worker or a tablesync worker.

Or, as another idea, since an error during table synchronization is
not common and could in practice be resolved by truncating the table
and restarting the synchronization, there might be no need to go this
far and we can support it only for apply worker errors.

Yes, that is what I have also in mind. We can always extend this
feature for tablesync process because it can not only fail for the
specified skip_xid but also for many other reasons during the initial
copy.

I'll update the patch accordingly to test and verify this approach.

In the meantime, I’d like to discuss the possible ideas of storing the
error XID somewhere the worker can see it even after a restart. It has
been proposed that the worker updates the catalog when an error
occurs, which was criticized as updating the catalog in such a
situation is not a good idea.

The next idea I considered was to store the error XID somewhere in
shmem (e.g., ReplicationState). But in principle it requires at least
as many entries as there are subscriptions, not
max_logical_replication_workers. Since we don’t know that number at
startup time, we would need to use DSM or a cache with a fixed number
of entries. That seems like overkill to me.

The third idea, which is slightly better than others, is to update the
catalog by the launcher process, not the worker process; when an error
occurs, the apply worker stores the error XID (and maybe its
subscription OID) into its LogicalRepWorker entry, and the launcher
updates the corresponding entry of pg_subscription catalog before
launching workers. After the worker restarts, it clears the error XID
on the catalog if it successfully applied the transaction with the
error XID. The user can enable the skipping transaction behavior by a
query say ALTER SUBSCRIPTION SKIP ENABLED. The user cannot enable the
skipping behavior if the error XID is not set. If the skipping
behavior is enabled and the error XID is a valid value, the worker
skips the transaction and then clears both the error XID and a flag of
skipping behavior on the catalog.
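
The life cycle described in this third idea can be sketched as a small state machine. This is only an illustrative model under stated assumptions: the class, the function names, and the "apply_succeeds" flag are hypothetical, and the real proposal would store these fields in pg_subscription rather than in a Python object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubscriptionEntry:
    """Hypothetical stand-in for the per-subscription catalog fields."""
    error_xid: Optional[int] = None
    skip_enabled: bool = False


def launcher_record_error(sub: SubscriptionEntry, error_xid: int) -> None:
    # The launcher copies the XID reported by the failed worker into the
    # catalog entry before relaunching the worker.
    sub.error_xid = error_xid


def enable_skip(sub: SubscriptionEntry) -> None:
    # Models "ALTER SUBSCRIPTION SKIP ENABLED": refused if no error XID is set.
    if sub.error_xid is None:
        raise ValueError("no error XID recorded; nothing to skip")
    sub.skip_enabled = True


def worker_handle_transaction(sub: SubscriptionEntry,
                              remote_xid: int,
                              apply_succeeds: bool) -> str:
    if sub.skip_enabled and sub.error_xid == remote_xid:
        # Skip the transaction, then clear both the XID and the flag.
        sub.error_xid = None
        sub.skip_enabled = False
        return "skipped"
    if not apply_succeeds:
        # The error XID stays in place for the next restart.
        return "error"
    if sub.error_xid == remote_xid:
        # The previously failing transaction applied cleanly this time.
        sub.error_xid = None
    return "applied"
```

The point of the model is that the skip flag can only ever apply to the one XID that was recorded from an actual failure.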

With this idea, we don’t need a complex mechanism to store the error
XID for each subscription and can ensure that only the transaction in
question is skipped. But my concern is that the launcher updates the
catalog. Since it doesn’t connect to any database, it probably cannot
open the catalog indexes (because that requires looking up pg_class).
Therefore, we have to use in-place updates here. Through quick tests,
I’ve confirmed that using heap_inplace_update() to update the error
XID on pg_subscription tuples seems to work, but I’m not sure whether
using an in-place update here is a legitimate approach.

What do you think and any ideas?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#499Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Masahiko Sawada (#492)
Re: Skipping logical replication transactions on subscriber side

On 26.01.22 05:05, Masahiko Sawada wrote:

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?

I think a warning would be sensible. If the user specifies to skip a
certain transaction and then that doesn't happen, we should at least say
something.

#500Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#499)
2 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jan 27, 2022 at 10:42 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 26.01.22 05:05, Masahiko Sawada wrote:

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?

I think a warning would be sensible. If the user specifies to skip a
certain transaction and then that doesn't happen, we should at least say
something.

While waiting for comments on the discussion about the designs of
both pg_stat_subscription_workers and the ALTER SUBSCRIPTION SKIP
feature, I’ve incorporated some (minor) comments into the current
design patch, which includes:

* Use LSN instead of XID.
* Raise a warning if the user specifies to skip a certain transaction
and then that doesn’t happen.
* Skip-LSN has an effect on the first non-empty transaction. That is,
it’s cleared after successfully committing a non-empty transaction,
preventing a wrong user-specified LSN from remaining.
* Remove some unnecessary tap tests to reduce the test time.

I think we all agree with the first point regardless of where we store
error information. And speaking of the current design, I think we all
agree on other points. Since the design discussion is ongoing, I’ll
incorporate other comments according to the result of the discussion.

The attached 0001 patch modifies the pg_stat_subscription_workers to
report LSN instead of XID, which is required by ALTER SUBSCRIPTION
SKIP patch, the 0002 patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v11-0001-Report-error-transaction-s-commit-LSN-instead-of.patch (application/octet-stream)
v11-0002-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)
#501Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#500)
Re: Skipping logical replication transactions on subscriber side

On Fri, Feb 11, 2022 at 7:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Jan 27, 2022 at 10:42 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 26.01.22 05:05, Masahiko Sawada wrote:

I think it is okay to clear after the first successful application of
any transaction. What I was not sure was about the idea of giving
WARNING/ERROR if the first xact to be applied is not the same as
skip_xid.

Do you prefer not to do anything in this case?

I think a warning would be sensible. If the user specifies to skip a
certain transaction and then that doesn't happen, we should at least say
something.

While waiting for comments on the discussion about the designs of
both pg_stat_subscription_workers and the ALTER SUBSCRIPTION SKIP
feature, I’ve incorporated some (minor) comments into the current
design patch, which includes:

* Use LSN instead of XID.

I think exposing LSN is a better approach as it doesn't have the
dangers of wraparound. And, I think users can use it with the existing
function pg_replication_origin_advance() which will save us from
adding additional code for this feature. We can explain/expand in docs
how users can use the error information from view/error_logs and use
the existing function to skip conflicting transactions. We might want
to even expose error_origin to make it a bit easier for users but not
sure. I feel the need for the new syntax (and then added code
complexity due to that) isn't warranted if we expose error_LSN and let
users use it with the existing functions.
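
To illustrate the wraparound concern with XIDs mentioned above, here is a simplified model of the 32-bit XID space. It assumes only that XIDs are 32 bits wide, that the values 0 to 2 are reserved, and that ordering of normal XIDs is decided by a signed 32-bit difference; the function names are hypothetical and this is not PostgreSQL source code.

```python
FIRST_NORMAL_XID = 3            # XIDs 0, 1 and 2 are reserved values
XID_SPACE = 2 ** 32             # XIDs are 32-bit counters
NORMAL_XIDS = XID_SPACE - FIRST_NORMAL_XID


def advance_xid(xid: int, n: int) -> int:
    """Return the XID reached after assigning n further normal XIDs."""
    return FIRST_NORMAL_XID + (xid - FIRST_NORMAL_XID + n) % NORMAL_XIDS


def xid_precedes(a: int, b: int) -> bool:
    """Modulo-2^32 comparison of two normal XIDs via a signed 32-bit diff."""
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 2 ** 31:
        diff -= 2 ** 32
    return diff < 0
```

In this model the value 590 is handed out again after one full cycle of the counter, and "older than" only makes sense within a window of about two billion transactions. An LSN, being a 64-bit position in the WAL stream, does not repeat in this way, which is the advantage referred to above.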

Do you see any problem with the same?

--
With Regards,
Amit Kapila.

#502Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Amit Kapila (#501)
Re: Skipping logical replication transactions on subscriber side

On 14.02.22 10:16, Amit Kapila wrote:

I think exposing LSN is a better approach as it doesn't have the
dangers of wraparound. And, I think users can use it with the existing
function pg_replication_origin_advance() which will save us from
adding additional code for this feature. We can explain/expand in docs
how users can use the error information from view/error_logs and use
the existing function to skip conflicting transactions. We might want
to even expose error_origin to make it a bit easier for users but not
sure. I feel the need for the new syntax (and then added code
complexity due to that) isn't warranted if we expose error_LSN and let
users use it with the existing functions.

Well, the whole point of this feature is to provide a higher-level
interface instead of pg_replication_origin_advance(). Replication
origins are currently not something the users have to deal with
directly. We already document that you can use
pg_replication_origin_advance() to skip erroring transactions. But that
seems unsatisfactory. It'd be like using pg_surgery to fix unique
constraint violations.

#503Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Eisentraut (#502)
3 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Feb 15, 2022 at 7:35 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 14.02.22 10:16, Amit Kapila wrote:

I think exposing LSN is a better approach as it doesn't have the
dangers of wraparound. And, I think users can use it with the existing
function pg_replication_origin_advance() which will save us from
adding additional code for this feature. We can explain/expand in docs
how users can use the error information from view/error_logs and use
the existing function to skip conflicting transactions. We might want
to even expose error_origin to make it a bit easier for users but not
sure. I feel the need for the new syntax (and then added code
complexity due to that) isn't warranted if we expose error_LSN and let
users use it with the existing functions.

Well, the whole point of this feature is to provide a higher-level
interface instead of pg_replication_origin_advance(). Replication
origins are currently not something the users have to deal with
directly. We already document that you can use
pg_replication_origin_advance() to skip erroring transactions. But that
seems unsatisfactory. It'd be like using pg_surgery to fix unique
constraint violations.

+1

I’ve considered a plan for the skipping logical replication
transaction feature toward PG15. Several ideas and patches have been
proposed here and another related thread[1][2] for the skipping
logical replication transaction feature as follows:

A. Change pg_stat_subscription_workers (committed 7a8507329085)
B. Add origin name and commit-LSN to logical replication worker
errcontext (proposed[2])
C. Store error information (e.g., the error message and commit-LSN) to
the system catalog
D. Introduce ALTER SUBSCRIPTION SKIP
E. Record the skipped data somewhere: server logs or a table

Given the remaining time for PG15, it’s unlikely to complete all of
them for PG15 by the feature freeze. The most realistic plan for PG15
in my mind is to complete B and D. With these two items, the LSN of
the error-ed transaction is shown in the server log, and we can ask
users to check server logs for the LSN and use it with ALTER
SUBSCRIPTION SKIP command. If the community agrees with B+D, we will
have a user-visible feature for PG15 which can be further
extended/improved in PG16 by adding C and E. I started a new thread[2]
for B yesterday. In this thread, I'd like to discuss D.

I've attached an updated patch for D and here is the summary:

* Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
'0/1234'). The user can get the commit-LSN of the transaction in
question from the server logs thanks to B[2].
* The user-specified LSN (say skip-LSN) is stored in the
pg_subscription catalog.
* The apply worker skips the whole transaction if the transaction's
commit-LSN exactly matches the skip-LSN.
* The skip-LSN has an effect on only the first non-empty transaction
since the worker started to apply changes. IOW it's cleared after
either skipping the whole transaction or successfully committing a
non-empty transaction, preventing the skip-LSN from remaining in the
catalog. Also, since the latter case means that the user set the wrong
skip-LSN, we clear it with a warning.
* ALTER SUBSCRIPTION SKIP doesn't support tablesync workers. But it
would not be a problem in practice since an error during table
synchronization is not common and could be resolved by truncating the
table and restarting the synchronization.

For the above reasons, ALTER SUBSCRIPTION SKIP command is safer than
the existing way of using pg_replication_origin_advance().
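
The skip and clear rules summarized in the list above can be modelled roughly as follows. This is a sketch under stated assumptions, not the patch itself: the class, the method, the return strings, and the log messages are hypothetical, and a transaction is reduced to its finish LSN and a count of changes.

```python
from dataclasses import dataclass, field
from typing import List

INVALID_LSN = 0  # stands in for InvalidXLogRecPtr (0/0)


@dataclass
class ApplyWorker:
    skip_lsn: int = INVALID_LSN          # models pg_subscription.subskiplsn
    log: List[str] = field(default_factory=list)

    def handle_transaction(self, finish_lsn: int, n_changes: int) -> str:
        # Exact match only: the whole transaction is skipped, then the
        # skip-LSN is cleared so it cannot affect later transactions.
        if self.skip_lsn != INVALID_LSN and finish_lsn == self.skip_lsn:
            self.skip_lsn = INVALID_LSN
            self.log.append("LOG: skipped transaction")
            return "skipped"
        # Empty transactions leave the skip-LSN untouched.
        if n_changes == 0:
            return "empty"
        # A non-empty transaction committed while a skip-LSN was still
        # set: the user gave a wrong LSN, so clear it with a warning.
        if self.skip_lsn != INVALID_LSN:
            self.log.append("WARNING: skip-LSN did not match; cleared")
            self.skip_lsn = INVALID_LSN
        return "applied"
```

Because the skip-LSN never survives the first non-empty transaction, a mistyped value cannot silently drop some unrelated transaction much later.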

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Regards,

[1]: /messages/by-id/20220125063131.4cmvsxbz2tdg6g65@alap3.anarazel.de
[2]: /messages/by-id/CAD21AoBarBf2oTF71ig2g_o=3Z_Dt6_sOpMQma1kFgbnA5OZ_w@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)
v12-0001-Use-complete-sentences-in-logical-replication-wo.patch (application/octet-stream)
v12-0002-Add-the-origin-name-and-remote-commit-LSN-to-log.patch (application/octet-stream)
#504Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#503)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I’ve considered a plan for the skipping logical replication
transaction feature toward PG15. Several ideas and patches have been
proposed here and another related thread[1][2] for the skipping
logical replication transaction feature as follows:

A. Change pg_stat_subscription_workers (committed 7a8507329085)
B. Add origin name and commit-LSN to logical replication worker
errcontext (proposed[2])
C. Store error information (e.g., the error message and commit-LSN) to
the system catalog
D. Introduce ALTER SUBSCRIPTION SKIP
E. Record the skipped data somewhere: server logs or a table

Given the remaining time for PG15, it’s unlikely to complete all of
them for PG15 by the feature freeze. The most realistic plan for PG15
in my mind is to complete B and D. With these two items, the LSN of
the error-ed transaction is shown in the server log, and we can ask
users to check server logs for the LSN and use it with ALTER
SUBSCRIPTION SKIP command.

It makes sense to me to try to finish B and D from the above list for
PG-15. I can review the patch for D in detail if others don't have an
objection to it.

Peter E., others, any opinion on this matter?

If the community agrees with B+D, we will
have a user-visible feature for PG15 which can be further
extended/improved in PG16 by adding C and E.

Agreed.

I've attached an updated patch for D and here is the summary:

* Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
'0/1234'). The user can get the commit-LSN of the transaction in
question from the server logs thanks to B[2].
* The user-specified LSN (say skip-LSN) is stored in the
pg_subscription catalog.
* The apply worker skips the whole transaction if the transaction's
commit-LSN exactly matches the skip-LSN.
* The skip-LSN has an effect on only the first non-empty transaction
since the worker started to apply changes. IOW it's cleared after
either skipping the whole transaction or successfully committing a
non-empty transaction, preventing the skip-LSN from remaining in the
catalog. Also, since the latter case means that the user set the wrong
skip-LSN, we clear it with a warning.

As this will be displayed only in server logs and by background apply
worker, should it be LOG or WARNING?

--
With Regards,
Amit Kapila.

#505osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#503)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch along with two patches for cfbot tests since the
main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another thread[2].

Hi, few comments on v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.

(1) doc/src/sgml/ref/alter_subscription.sgml

+    <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</r$
...
+      ...After logical replication
+      successfully skips the transaction or commits non-empty transaction,
+      the LSN (stored in
+      <structname>pg_subscription</structname>.<structfield>subskiplsn</structfield>)
+      is cleared.  See <xref linkend="logical-replication-conflicts"/> for
+      the details of logical replication conflicts.
+     </para>
...
+        <term><literal>lsn</literal> (<type>pg_lsn</type>)</term>
+        <listitem>
+         <para>
+          Specifies the commit LSN of the remote transaction whose changes are to be skipped
+          by the logical replication worker.  Skipping
+          individual subtransactions is not supported.  Setting <literal>NONE</literal>
+          resets the LSN.

I think we'll extend the SKIP option choices in the future besides the 'lsn' option.
Then, one sentence "After logical replication successfully skips the transaction or commits non-empty
transaction, the LSN .. is cleared" should be moved to the explanation for 'lsn' section,
if we think this behavior to reset LSN is unique for 'lsn' option ?

(2) doc/src/sgml/catalogs.sgml

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subskiplsn</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       Commit LSN of the transaction whose changes are to be skipped, if a valid
+       LSN; otherwise <literal>0/0</literal>.
+      </para></entry>
+     </row>
+

We need to cover the PREPARE that keeps causing errors on the subscriber.
This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn)

(3) apply_handle_commit_internal comments

/*
* Helper function for apply_handle_commit and apply_handle_stream_commit.
+ * Return true if the transaction was committed, otherwise return false.
*/

If we want to make the newly added line aligned with other functions in worker.c,
we should insert one blank line before it ?

(4) apply_worker_post_transaction

I'm not sure if the current refactoring is good or not.
For example, the current HEAD calls pgstat_report_stat(false)
for a commit case if we are in a transaction in apply_handle_commit_internal.
On the other hand, your refactoring calls pgstat_report_stat unconditionally
for apply_handle_commit path. I'm not sure if there
are many cases to call apply_handle_commit without opening a transaction,
but is that acceptable ?

Also, the name is a bit broad.
How about making a function only for stopping and resetting LSN at this stage ?

(5) comments for clear_subscription_skip_lsn

How about changing the comment like below ?

From:
Clear subskiplsn of pg_subscription catalog
To:
Clear subskiplsn of pg_subscription catalog with origin state update

Best Regards,
Takamichi Osumi

#506Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#503)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Few comments on 0003:
=====================
1.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subskiplsn</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       Commit LSN of the transaction whose changes are to be skipped, if a valid
+       LSN; otherwise <literal>0/0</literal>.
+      </para></entry>
+     </row>

Can't this be prepared LSN or rollback prepared LSN? Can we say
Finish/End LSN and then add some details which all LSNs can be there?

2. The conflict resolution explanation needs an update after the
latest commits and we should probably change the commit LSN
terminology as mentioned in the previous point.

3. The text in alter_subscription.sgml looks a bit repetitive to me
(similar to what we have in logical-replication.sgml related to
conflicts). Here also we refer to only commit LSN which needs to be
changed as mentioned in the previous two points.

4.
if (strcmp(lsn_str, "none") == 0)
+ {
+ /* Setting lsn = NONE is treated as resetting LSN */
+ lsn = InvalidXLogRecPtr;
+ }
+ else
+ {
+ /* Parse the argument as LSN */
+ lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
+ CStringGetDatum(lsn_str)));
+
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL location (LSN): %s", lsn_str)));

Is there a reason that we don't want to allow setting 0
(InvalidXLogRecPtr) for skip LSN?

5.
+# The subscriber will enter an infinite error loop, so we don't want
+# to overflow the server log with error messages.
+$node_subscriber->append_conf(
+ 'postgresql.conf',
+ qq[
+wal_retrieve_retry_interval = 2s
+]);

Can we change this test to use disable_on_error feature? I am thinking
if the disable_on_error feature got committed first, maybe we can have
one test file for this and disable_on_error feature (something like
conflicts.pl).

--
With Regards,
Amit Kapila.

#507Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#505)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 10, 2022 at 2:10 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Wednesday, March 2, 2022 12:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch along with two patches for cfbot tests since the
main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another thread[2].

Hi, few comments on v12-0003-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch.

Thank you for the comments.

(1) doc/src/sgml/ref/alter_subscription.sgml

+    <term><literal>SKIP ( <replaceable class="parameter">skip_option</replaceable> = <replaceable class="parameter">value</r$
...
+      ...After logical replication
+      successfully skips the transaction or commits non-empty transaction,
+      the LSN (stored in
+      <structname>pg_subscription</structname>.<structfield>subskiplsn</structfield>)
+      is cleared.  See <xref linkend="logical-replication-conflicts"/> for
+      the details of logical replication conflicts.
+     </para>
...
+        <term><literal>lsn</literal> (<type>pg_lsn</type>)</term>
+        <listitem>
+         <para>
+          Specifies the commit LSN of the remote transaction whose changes are to be skipped
+          by the logical replication worker.  Skipping
+          individual subtransactions is not supported.  Setting <literal>NONE</literal>
+          resets the LSN.

I think we'll extend the SKIP option choices in the future besides the 'lsn' option.
Then, one sentence "After logical replication successfully skips the transaction or commits non-empty
transaction, the LSN .. is cleared" should be moved to the explanation for 'lsn' section,
if we think this behavior to reset LSN is unique for 'lsn' option ?

Hmm, I think that regardless of the type of option (e.g., relid, xid,
and action whatever), resetting the specified something after that is
specific to SKIP command. SKIP command should have an effect on only
the first non-empty transaction. Otherwise, we could end up leaving it
if the user mistakenly specifies the wrong one.

(2) doc/src/sgml/catalogs.sgml

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subskiplsn</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       Commit LSN of the transaction whose changes are to be skipped, if a valid
+       LSN; otherwise <literal>0/0</literal>.
+      </para></entry>
+     </row>
+

We need to cover the PREPARE that keeps causing errors on the subscriber.
This would apply to the entire patch (e.g. the rename of skip_xact_commit_lsn)

Fixed.

(3) apply_handle_commit_internal comments

/*
* Helper function for apply_handle_commit and apply_handle_stream_commit.
+ * Return true if the transaction was committed, otherwise return false.
*/

If we want to make the newly added line aligned with other functions in worker.c,
we should insert one blank line before it ?

This part is removed.

(4) apply_worker_post_transaction

I'm not sure if the current refactoring is good or not.
For example, the current HEAD calls pgstat_report_stat(false)
for a commit case if we are in a transaction in apply_handle_commit_internal.
On the other hand, your refactoring calls pgstat_report_stat unconditionally
for apply_handle_commit path. I'm not sure if there
are many cases to call apply_handle_commit without opening a transaction,
but is that acceptable ?

Also, the name is a bit broad.
How about making a function only for stopping and resetting LSN at this stage ?

Agreed, it seems to be overkill. I'll revert that change.

(5) comments for clear_subscription_skip_lsn

How about changing the comment like below ?

From:
Clear subskiplsn of pg_subscription catalog
To:
Clear subskiplsn of pg_subscription catalog with origin state update

Updated.

I'll submit an updated patch that incorporated comments I got so far
and is rebased to disable_on_error patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#508Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#506)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 10, 2022 at 9:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch along with two patches for cfbot tests
since the main patch (0003) depends on the other two patches. Both
0001 and 0002 patches are the same ones I attached on another
thread[2].

Few comments on 0003:
=====================
1.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subskiplsn</structfield> <type>pg_lsn</type>
+      </para>
+      <para>
+       Commit LSN of the transaction whose changes are to be skipped, if a valid
+       LSN; otherwise <literal>0/0</literal>.
+      </para></entry>
+     </row>

Can't this be prepared LSN or rollback prepared LSN? Can we say
Finish/End LSN and then add some details which all LSNs can be there?

Right, changed to finish LSN.

2. The conflict resolution explanation needs an update after the
latest commits and we should probably change the commit LSN
terminology as mentioned in the previous point.

Updated.

3. The text in alter_subscription.sgml looks a bit repetitive to me
(similar to what we have in logical-replication.sgml related to
conflicts). Here also we refer to only commit LSN which needs to be
changed as mentioned in the previous two points.

Updated.

4.
if (strcmp(lsn_str, "none") == 0)
+ {
+ /* Setting lsn = NONE is treated as resetting LSN */
+ lsn = InvalidXLogRecPtr;
+ }
+ else
+ {
+ /* Parse the argument as LSN */
+ lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
+ CStringGetDatum(lsn_str)));
+
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL location (LSN): %s", lsn_str)));

Is there a reason that we don't want to allow setting 0
(InvalidXLogRecPtr) for skip LSN?

0 is obviously an invalid value for skip LSN, which should not be
allowed, similar to other options (like setting '' to slot_name). Also,
we use 0 (InvalidXLogRecPtr) internally to reset the subskiplsn when
NONE is specified.

5.
+# The subscriber will enter an infinite error loop, so we don't want
+# to overflow the server log with error messages.
+$node_subscriber->append_conf(
+ 'postgresql.conf',
+ qq[
+wal_retrieve_retry_interval = 2s
+]);

Can we change this test to use disable_on_error feature? I am thinking
if the disable_on_error feature got committed first, maybe we can have
one test file for this and disable_on_error feature (something like
conflicts.pl).

Good idea. Updated.
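
To make the answer to point 4 concrete, here is a minimal standalone sketch of the option handling described there: "none" resets the value, and an explicit zero LSN is rejected. It is not the patch's code. lsn_t, INVALID_LSN and parse_skip_lsn() are hypothetical stand-ins for XLogRecPtr, InvalidXLogRecPtr and the parsing done in parse_subscription_options(), and plain sscanf() replaces pg_lsn_in so that it compiles outside the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for XLogRecPtr and InvalidXLogRecPtr. */
typedef uint64_t lsn_t;
#define INVALID_LSN ((lsn_t) 0)

/*
 * Hypothetical helper: parse the argument of SKIP (lsn = ...).
 * Returns true and stores the result in *out on success.  "none" maps to
 * INVALID_LSN (reset); an explicit zero LSN such as "0/0" is rejected,
 * because zero is reserved internally to mean "nothing to skip".
 */
static bool
parse_skip_lsn(const char *str, lsn_t *out)
{
    unsigned int hi;
    unsigned int lo;
    int          consumed = 0;

    assert(str != NULL && out != NULL);

    if (strcmp(str, "none") == 0)
    {
        *out = INVALID_LSN;
        return true;
    }

    /* Expect "X/X" in hexadecimal with nothing after it. */
    if (sscanf(str, "%8X/%8X%n", &hi, &lo, &consumed) != 2 ||
        str[consumed] != '\0')
        return false;

    *out = ((lsn_t) hi << 32) | lo;

    /* A parsed value of zero is not a usable skip LSN. */
    return *out != INVALID_LSN;
}
```

With this sketch, parse_skip_lsn("0/14C0378", &lsn) succeeds and parse_skip_lsn("0/0", &lsn) fails, while "none" succeeds and yields INVALID_LSN.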

I've attached an updated version patch. This patch can be applied on
top of the latest disable_on_error patch[1].

Regards,

[1]: /messages/by-id/CAA4eK1Kes9TsMpGL6m+AJNHYCGRvx6piYQt5v6TEbH_t9jh8nA@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v13-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)
#509osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#508)
RE: Skipping logical replication transactions on subscriber side

On Friday, March 11, 2022 5:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on top of the
latest disable_on_error patch[1].

Hi, thank you for the patch. I'll share my review comments on v13.

(a) src/backend/commands/subscriptioncmds.c

@@ -84,6 +86,8 @@ typedef struct SubOpts
        bool            streaming;
        bool            twophase;
        bool            disableonerr;
+       XLogRecPtr      lsn;                    /* InvalidXLogRecPtr for resetting purpose,
+                                                                * otherwise a valid LSN */

I think this explanation is slightly odd and can be improved.
Strictly speaking, I feel a *valid* LSN serves the purpose of skipping a transaction
from the functional perspective. Also, the wording "resetting purpose"
is unclear by itself. I suggest the change below.

From:
InvalidXLogRecPtr for resetting purpose, otherwise a valid LSN
To:
A valid LSN when we skip transaction, otherwise InvalidXLogRecPtr

(b) The code position of additional append in describeSubscriptions

+
+               /* Skip LSN is only supported in v15 and higher */
+               if (pset.sversion >= 150000)
+                       appendPQExpBuffer(&buf,
+                                                         ", subskiplsn AS \"%s\"\n",
+                                                         gettext_noop("Skip LSN"));

I suggest to combine this code after subdisableonerr.

(c) parse_subscription_options

+                               /* Parse the argument as LSN */
+                               lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,

Here, shouldn't we call DatumGetLSN, instead of DatumGetTransactionId ?

(d) parse_subscription_options

+                       if (strcmp(lsn_str, "none") == 0)
+                       {
+                               /* Setting lsn = NONE is treated as resetting LSN */
+                               lsn = InvalidXLogRecPtr;
+                       }
+

We should remove this pair of curly brackets, since it encloses a single statement.

(e) src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction, where the later avoids the mistakenly specified subskiplsn from
+ * being left.

typo "the later" -> "the latter"

At the same time, I feel the last part of this sentence can be an independent sentence.
From:
, where the later avoids the mistakenly specified subskiplsn from being left
To:
. The latter prevents the mistakenly specified subskiplsn from being left

* Note that my comments below apply only if we choose not to merge the disable_on_error test with the skip LSN tests.

(f) src/test/subscription/t/030_skip_xact.pl

+use Test::More tests => 4;

It's better to utilize the new style for the TAP test.
Then, probably we should introduce done_testing()
at the end of the test.

(g) src/test/subscription/t/030_skip_xact.pl

I think there's no need to create two types of subscriptions.
Just one subscription with two_phase = on and streaming = on
would be sufficient for the tests(normal commit, commit prepared,
stream commit cases). I think this point of view will reduce
the number of the table and the publication, which will
make the whole test simpler.
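
As a side note on point (c), the following standalone sketch shows why the wrong accessor matters. It assumes a 64-bit Datum, a 64-bit LSN and a 32-bit transaction ID; the type and function names are hypothetical stand-ins for Datum, XLogRecPtr, TransactionId, DatumGetTransactionId and DatumGetLSN, not the server's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the widths are the assumption that matters here. */
typedef uint64_t datum_t;   /* Datum on a 64-bit build */
typedef uint64_t lsn_t;     /* XLogRecPtr */
typedef uint32_t xid_t;     /* TransactionId */

/* Stand-in for DatumGetTransactionId: keeps only the low 32 bits. */
static xid_t
datum_get_xid(datum_t d)
{
    return (xid_t) d;
}

/* Stand-in for DatumGetLSN: keeps all 64 bits. */
static lsn_t
datum_get_lsn(datum_t d)
{
    return (lsn_t) d;
}

/* What the v13 hunk effectively does: the result is widened after truncation. */
static lsn_t
skip_lsn_buggy(datum_t d)
{
    return datum_get_xid(d);
}

/* What the review comment asks for. */
static lsn_t
skip_lsn_fixed(datum_t d)
{
    return datum_get_lsn(d);
}
```

Under these assumptions an LSN such as 1/14C0378 comes back as 0/14C0378 from the buggy variant, while any LSN below 1/0 passes through unchanged, which would explain why a test on a freshly initialized cluster does not notice the problem.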

Best Regards,
Takamichi Osumi

#510shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#508)
1 attachment(s)
RE: Skipping logical replication transactions on subscriber side

On Fri, Mar 11, 2022 4:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on
top of the latest disable_on_error patch[1].

Thanks for your patch. Here are some comments for the v13 patch.

1. doc/src/sgml/ref/alter_subscription.sgml
+ Specifies the transaction's finish LSN of the remote transaction whose changes

Could it be simplified to "Specifies the finish LSN of the remote transaction
whose ...".

2.
I met a failed assertion, the backtrace is attached. This is caused by the
following code in maybe_start_skipping_changes().

+		/*
+		 * It's a rare case; a past subskiplsn was left because the server
+		 * crashed after preparing the transaction and before clearing the
+		 * subskiplsn. We clear it without a warning message so as not confuse
+		 * the user.
+		 */
+		if (unlikely(MySubscription->skiplsn < lsn))
+		{
+			clear_subscription_skip_lsn(MySubscription->skiplsn, InvalidXLogRecPtr, 0,
+										false);
+			Assert(!IsTransactionState());
+		}

We want to clear subskiplsn in the case mentioned in the comment. But if the next
transaction is a streaming transaction and this function is called by
apply_spooled_messages(), we are inside a transaction here. So, I think this
assertion is not suitable for streaming transactions. Thoughts?

3.
+	XLogRecPtr	subskiplsn;		/* All changes which committed at this LSN are
+								 * skipped */

To be consistent, should the comment be changed to "All changes which finished
at this LSN are skipped"?

4.
+      After logical replication worker successfully skips the transaction or commits
+      non-empty transaction, the LSN (stored in
+      <structname>pg_subscription</structname>.<structfield>subskiplsn</structfield>)
+      is cleared.

Besides "commits non-empty transaction", subskiplsn would also be cleared in
some two-phase commit cases I think. Like prepare/commit/rollback a transaction,
even if it is an empty transaction. So, should we change it for these cases?

5.
+ * Clear subskiplsn of pg_subscription catalog with origin state update.

Should "with origin state update" be modified to "with origin state updated"?

Regards,
Shi yu

Attachments:

backtrace.txt (text/plain)
#511osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#508)
RE: Skipping logical replication transactions on subscriber side

On Friday, March 11, 2022 5:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on top of the
latest disable_on_error patch[1].

Hi, few extra comments on v13.

(1) src/backend/replication/logical/worker.c

With regard to clear_subscription_skip_lsn,
There are cases that we conduct origin state update twice.

For instance, the case we reset subskiplsn by executing an
irrelevant non-empty transaction. The first update is
conducted at apply_handle_commit_internal and the second one
is at clear_subscription_skip_lsn. In the second change,
we update replorigin_session_origin_lsn by smaller value(commit_lsn),
compared to the first update(end_lsn). Were those intentional and OK ?

(2) src/backend/replication/logical/worker.c

+ * Both origin_lsn and origin_timestamp are the remote transaction's end_lsn
+ * and commit timestamp, respectively.
+ */
+static void
+stop_skipping_changes(XLogRecPtr origin_lsn, TimestampTz origin_ts)

Typo. Should change 'origin_timestamp' to 'origin_ts',
because the name of the argument is the latter.

Also, here we handle not only commit but also prepare.
You need to fix the comment "commit timestamp" as well.

(3) src/backend/replication/logical/worker.c

+/*
+ * Clear subskiplsn of pg_subscription catalog with origin state update.
+ *
+ * if with_warning is true, we raise a warning when clearing the subskipxid.

It's better to insert this second sentence as the last sentence of
the other comments. It should start with capital letter as well.

Best Regards,
Takamichi Osumi

#512Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: shiy.fnst@fujitsu.com (#510)
Re: Skipping logical replication transactions on subscriber side

On Mon, Mar 14, 2022 at 6:50 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Fri, Mar 11, 2022 4:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on
top of the latest disable_on_error patch[1].

Thanks for your patch. Here are some comments for the v13 patch.

Thank you for the comments!

1. doc/src/sgml/ref/alter_subscription.sgml
+ Specifies the transaction's finish LSN of the remote transaction whose changes

Could it be simplified to "Specifies the finish LSN of the remote transaction
whose ...".

Fixed.

2.
I met a failed assertion, the backtrace is attached. This is caused by the
following code in maybe_start_skipping_changes().

+               /*
+                * It's a rare case; a past subskiplsn was left because the server
+                * crashed after preparing the transaction and before clearing the
+                * subskiplsn. We clear it without a warning message so as not confuse
+                * the user.
+                */
+               if (unlikely(MySubscription->skiplsn < lsn))
+               {
+                       clear_subscription_skip_lsn(MySubscription->skiplsn, InvalidXLogRecPtr, 0,
+                                                                               false);
+                       Assert(!IsTransactionState());
+               }

We want to clear subskiplsn in the case mentioned in the comment. But if the next
transaction is a streaming transaction and this function is called by
apply_spooled_messages(), we are inside a transaction here. So, I think this
assertion is not suitable for streaming transactions. Thoughts?

Good catch. After more thought, I realized that the assumption behind this
if statement is wrong, and we don't necessarily need to do this here since
the leftover skip-LSN will eventually be cleared when the next transaction
finishes. So I removed this part.

3.
+       XLogRecPtr      subskiplsn;             /* All changes which committed at this LSN are
+                                                                * skipped */

To be consistent, should the comment be changed to "All changes which finished
at this LSN are skipped"?

Fixed.

4.
+      After logical replication worker successfully skips the transaction or commits
+      non-empty transaction, the LSN (stored in
+      <structname>pg_subscription</structname>.<structfield>subskiplsn</structfield>)
+      is cleared.

Besides "commits non-empty transaction", subskiplsn would also be cleared in
some two-phase commit cases I think. Like prepare/commit/rollback a transaction,
even if it is an empty transaction. So, should we change it for these cases?

Fixed.

5.
+ * Clear subskiplsn of pg_subscription catalog with origin state update.

Should "with origin state update" be modified to "with origin state updated"?

Fixed.

I'll submit an updated patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#513Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#511)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

Hi,

On Fri, Mar 11, 2022 at 8:37 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, March 11, 2022 5:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on top of the
latest disable_on_error patch[1].

Hi, thank you for the patch. I'll share my review comments on v13.

(a) src/backend/commands/subscriptioncmds.c

@@ -84,6 +86,8 @@ typedef struct SubOpts
bool            streaming;
bool            twophase;
bool            disableonerr;
+       XLogRecPtr      lsn;                    /* InvalidXLogRecPtr for resetting purpose,
+                                                                * otherwise a valid LSN */

I think this explanation is slightly odd and can be improved.
Strictly speaking, I feel a *valid* LSN serves the purpose of skipping a transaction
from the functional perspective. Also, the wording "resetting purpose"
is unclear by itself. I suggest the change below.

From:
InvalidXLogRecPtr for resetting purpose, otherwise a valid LSN
To:
A valid LSN when we skip transaction, otherwise InvalidXLogRecPtr

"when we skip transaction" sounds incorrect to me since it's just an
option value but does not indicate that we really skip the transaction
that has that LSN. I realized that we directly use InvalidXLogRecPtr
for subskiplsn so I think no need to mention it.

(b) The code position of additional append in describeSubscriptions

+
+               /* Skip LSN is only supported in v15 and higher */
+               if (pset.sversion >= 150000)
+                       appendPQExpBuffer(&buf,
+                                                         ", subskiplsn AS \"%s\"\n",
+                                                         gettext_noop("Skip LSN"));

I suggest to combine this code after subdisableonerr.

I got the comment[1] from Peter to put it at the end, which looks better to me.

(c) parse_subscription_options

+                               /* Parse the argument as LSN */
+                               lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,

Here, shouldn't we call DatumGetLSN, instead of DatumGetTransactionId ?

Right, fixed.

(d) parse_subscription_options

+                       if (strcmp(lsn_str, "none") == 0)
+                       {
+                               /* Setting lsn = NONE is treated as resetting LSN */
+                               lsn = InvalidXLogRecPtr;
+                       }
+

We should remove this pair of curly brackets, since it encloses a single statement.

I moved the comment on top of the if statement and removed the brackets.

(e) src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction, where the later avoids the mistakenly specified subskiplsn from
+ * being left.

typo "the later" -> "the latter"

At the same time, I feel the last part of this sentence can be an independent sentence.
From:
, where the later avoids the mistakenly specified subskiplsn from being left
To:
. The latter prevents the mistakenly specified subskiplsn from being left

Fixed.

* Note that my comments below apply only if we choose not to merge the disable_on_error test with the skip LSN tests.

(f) src/test/subscription/t/030_skip_xact.pl

+use Test::More tests => 4;

It's better to utilize the new style for the TAP test.
Then, probably we should introduce done_testing()
at the end of the test.

Fixed.

(g) src/test/subscription/t/030_skip_xact.pl

I think there's no need to create two types of subscriptions.
Just one subscription with two_phase = on and streaming = on
would be sufficient for the tests(normal commit, commit prepared,
stream commit cases). I think this point of view will reduce
the number of the table and the publication, which will
make the whole test simpler.

Good point, fixed.

On Mon, Mar 14, 2022 at 9:39 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Friday, March 11, 2022 5:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch. This patch can be applied on top of the
latest disable_on_error patch[1].

Hi, few extra comments on v13.

(1) src/backend/replication/logical/worker.c

With regard to clear_subscription_skip_lsn,
There are cases that we conduct origin state update twice.

For instance, the case we reset subskiplsn by executing an
irrelevant non-empty transaction. The first update is
conducted at apply_handle_commit_internal and the second one
is at clear_subscription_skip_lsn. In the second change,
we update replorigin_session_origin_lsn by smaller value(commit_lsn),
compared to the first update(end_lsn). Were those intentional and OK ?

Good catch, this part is removed in the latest patch.

(2) src/backend/replication/logical/worker.c

+ * Both origin_lsn and origin_timestamp are the remote transaction's end_lsn
+ * and commit timestamp, respectively.
+ */
+static void
+stop_skipping_changes(XLogRecPtr origin_lsn, TimestampTz origin_ts)

Typo. Should change 'origin_timestamp' to 'origin_ts',
because the name of the argument is the latter.

Also, here we handle not only commit but also prepare.
You need to fix the comment "commit timestamp" as well.

Fixed.

(3) src/backend/replication/logical/worker.c

+/*
+ * Clear subskiplsn of pg_subscription catalog with origin state update.
+ *
+ * if with_warning is true, we raise a warning when clearing the subskipxid.

It's better to insert this second sentence as the last sentence of
the other comments.

with_warning is removed in the latest patch.

I've attached an updated version patch.

Regards,

[1]: /messages/by-id/09b80566-c790-704b-35b4-33f87befc41f@enterprisedb.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v14-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)
#514Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#513)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

Review:
=======
1.
+++ b/doc/src/sgml/logical-replication.sgml
@@ -366,15 +366,19 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
    transaction, the subscription needs to be disabled temporarily by
    <command>ALTER SUBSCRIPTION ... DISABLE</command> first or
alternatively, the
    subscription can be used with the
<literal>disable_on_error</literal> option.
-   Then, the transaction can be skipped by calling the
+   Then, the transaction can be skipped by using
+   <command>ALTER SUBSCRITPION ... SKIP</command> with the finish LSN
+   (i.e., LSN 0/14C0378). After that the replication
+   can be resumed by <command>ALTER SUBSCRIPTION ... ENABLE</command>.
+   Alternatively, the transaction can also be skipped by calling the

Do we really need to disable the subscription for the skip feature? I
think that is required for origin_advance. Also, probably, we can say
Finish LSN could be Prepare LSN, Commit LSN, etc.

2.
+ /*
+ * Quick return if it's not requested to skip this transaction. This
+ * function is called every start of applying changes and we assume that
+ * skipping the transaction is not used in many cases.
+ */
+ if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||

The second part of this comment (especially ".. every start of
applying changes ..") sounds slightly odd to me. How about changing it
to: "This function is called for every remote transaction and we
assume that skipping the transaction is not used in many cases."

3.
+
+ ereport(LOG,
+ errmsg("start skipping logical replication transaction which
finished at %X/%X",
...
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",

No need of 'which' in above LOG messages. I think the message will be
clear without the use of which in above message.

4.
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",
+ LSN_FORMAT_ARGS(skip_xact_finish_lsn))));
+
+ /* Stop skipping changes */
+ skip_xact_finish_lsn = InvalidXLogRecPtr;

Let's reverse the order of these statements to make them consistent
with the corresponding maybe_start_* function.

5.
+
+ if (myskiplsn != finish_lsn)
+ ereport(WARNING,
+ errmsg("skip-LSN of logical replication subscription \"%s\"
cleared", MySubscription->name),

Shouldn't this be a LOG instead of a WARNING as this will be displayed
only in server logs and by background apply worker?

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

7.
+ /*
+ * Start a new transaction to clear the subskipxid, if not started
+ * yet. The transaction is committed below.
+ */
+ if (!IsTransactionState())

I think the second part of the comment: "The transaction is committed
below." is not required.

8.
+ XLogRecPtr subskiplsn; /* All changes which finished at this LSN are
+ * skipped */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -109,6 +112,8 @@ typedef struct Subscription
  bool disableonerr; /* Indicates if the subscription should be
  * automatically disabled if a worker error
  * occurs */
+ XLogRecPtr skiplsn; /* All changes which finished at this LSN are
+ * skipped */

No need for 'which' in the above comments.

9.
Can we merge 029_disable_on_error in 030_skip_xact and name it as
029_on_error (or 029_on_error_skip_disable or some variant of it)?
Both seem to be related features. I am slightly worried at the pace at
which the number of test files is growing in the subscription tests.

--
With Regards,
Amit Kapila.

#515osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#513)
RE: Skipping logical replication transactions on subscriber side

On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

A couple of minor comments on v14.

(1) apply_handle_commit_internal

+       if (is_skipping_changes())
+       {
+               stop_skipping_changes();
+
+               /*
+                * Start a new transaction to clear the subskipxid, if not started
+                * yet. The transaction is committed below.
+                */
+               if (!IsTransactionState())
+                       StartTransactionCommand();
+       }
+

I suppose we can move this condition check and stop_skipping_changes() call
to the inside of the block we enter when IsTransactionState() returns true.

As the comment of apply_handle_commit_internal() mentions,
it's the helper function for apply_handle_commit() and
apply_handle_stream_commit().

I can't think of a case where either caller fails to open
a transaction before calling apply_handle_commit_internal().
For applying spooled messages, we call begin_replication_step as well.

I may be missing something, but the only time we receive a COMMIT message
without having opened a transaction would be for empty transactions
in which the subscription (and its subscription worker) is not interested.
If this is true, the patch's current code includes
such cases within the range of the is_skipping_changes() check.

(2) clear_subscription_skip_lsn's comments.

The comments for this function shouldn't mention
updating the origin state, now that we no longer update it.

+/*
+ * Clear subskiplsn of pg_subscription catalog with origin state updated.
+ *

This applies to other comments.

+       /*
+        * Update the subskiplsn of the tuple to InvalidXLogRecPtr.  If user has
+        * already changed subskiplsn before clearing it we don't update the
+        * catalog and don't advance the replication origin state.  
...
+        *            ....                We can reduce the possibility by
+        * logging a replication origin WAL record to advance the origin LSN
+        * instead but there is no way to advance the origin timestamp and it
+        * doesn't seem to be worth doing anything about it since it's a very rare
+        * case.
+        */

Best Regards,
Takamichi Osumi

#516Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#514)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#517Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#516)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
======================
1. IIRC, earlier, we thought of allowing the use of this option (SKIP)
only for superusers (as this can lead to inconsistent data if not used
carefully) but I don't see that check in the latest patch. What is the
reason for that?

2.
+ /*
+ * Update the subskiplsn of the tuple to InvalidXLogRecPtr.

I think we can change the above part of the comment to "Clear subskiplsn."

3.
+ * Since we already have

Isn't it better to say here: Since we have already ...?

--
With Regards,
Amit Kapila.

#518Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#515)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 15, 2022 at 7:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

A couple of minor comments on v14.

(1) apply_handle_commit_internal

+       if (is_skipping_changes())
+       {
+               stop_skipping_changes();
+
+               /*
+                * Start a new transaction to clear the subskipxid, if not started
+                * yet. The transaction is committed below.
+                */
+               if (!IsTransactionState())
+                       StartTransactionCommand();
+       }
+

I suppose we can move this condition check and stop_skipping_changes() call
to the inside of the block we enter when IsTransactionState() returns true.

As the comment of apply_handle_commit_internal() mentions,
it's the helper function for apply_handle_commit() and
apply_handle_stream_commit().

I can't think of a case where either caller fails to open
a transaction before calling apply_handle_commit_internal().
For applying spooled messages, we call begin_replication_step as well.

I may be missing something, but the only time we receive a COMMIT message
without having opened a transaction would be for empty transactions
in which the subscription (and its subscription worker) is not interested.

I think when we skip non-streamed transactions we don't start a
transaction. So, if we do what you are suggesting, we will fail to
clear the skip_lsn after skipping the transaction.

--
With Regards,
Amit Kapila.

#519Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#517)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

I feel it is better to at least add a comment suggesting that we skip
only data modification changes because the other part of message
handle_stream_* is there in other message handlers as well. It will
make it easier to add a similar check in future message handlers.

--
With Regards,
Amit Kapila.

#520Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#517)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
======================

Some cosmetic suggestions:
======================
1.
+# Create subscriptions. Both subscription sets disable_on_error to on
+# so that they get disabled when a conflict occurs.
+$node_subscriber->safe_psql(
+ 'postgres',
+ qq[
+CREATE SUBSCRIPTION $subname CONNECTION '$publisher_connstr'
PUBLICATION tap_pub WITH (streaming = on, two_phase = on,
disable_on_error = on);
+]);

I don't understand what you mean by 'Both subscription ...' in the
above comments.

2.
+ # Check the log indicating that successfully skipped the transaction,

How about slightly rephrasing this to: "Check the log to ensure that
the transaction is skipped...."?

--
With Regards,
Amit Kapila.

#521Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#514)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

Review:
=======

Thank you for the comments.

1.
+++ b/doc/src/sgml/logical-replication.sgml
@@ -366,15 +366,19 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
transaction, the subscription needs to be disabled temporarily by
<command>ALTER SUBSCRIPTION ... DISABLE</command> first or
alternatively, the
subscription can be used with the
<literal>disable_on_error</literal> option.
-   Then, the transaction can be skipped by calling the
+   Then, the transaction can be skipped by using
+   <command>ALTER SUBSCRIPTION ... SKIP</command> with the finish LSN
+   (i.e., LSN 0/14C0378). After that the replication
+   can be resumed by <command>ALTER SUBSCRIPTION ... ENABLE</command>.
+   Alternatively, the transaction can also be skipped by calling the

Do we really need to disable the subscription for the skip feature? I
think that is required for origin_advance. Also, probably, we can say
Finish LSN could be Prepare LSN, Commit LSN, etc.

Not necessary to disable the subscription for skip feature. Fixed.

2.
+ /*
+ * Quick return if it's not requested to skip this transaction. This
+ * function is called every start of applying changes and we assume that
+ * skipping the transaction is not used in many cases.
+ */
+ if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||

The second part of this comment (especially ".. every start of
applying changes ..") sounds slightly odd to me. How about changing it
to: "This function is called for every remote transaction and we
assume that skipping the transaction is not used in many cases."

Fixed.

3.
+
+ ereport(LOG,
+ errmsg("start skipping logical replication transaction which
finished at %X/%X",
...
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",

No need of 'which' in above LOG messages. I think the message will be
clear without the use of which in above message.

Removed.

4.
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",
+ LSN_FORMAT_ARGS(skip_xact_finish_lsn))));
+
+ /* Stop skipping changes */
+ skip_xact_finish_lsn = InvalidXLogRecPtr;

Let's reverse the order of these statements to make them consistent
with the corresponding maybe_start_* function.

But we cannot simply reverse the order since skip_xact_finish_lsn is
used in the log message. Do we want to use a variable for it?

5.
+
+ if (myskiplsn != finish_lsn)
+ ereport(WARNING,
+ errmsg("skip-LSN of logical replication subscription \"%s\"
cleared", MySubscription->name),

Shouldn't this be a LOG instead of a WARNING as this will be displayed
only in server logs and by background apply worker?

WARNINGs are also used by other auxiliary processes such as the archiver,
autovacuum workers, and the launcher. So I think we can use it here.

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

I'd leave it as is as I mentioned in another email. But I've added
some comments as you suggested.

7.
+ /*
+ * Start a new transaction to clear the subskipxid, if not started
+ * yet. The transaction is committed below.
+ */
+ if (!IsTransactionState())

I think the second part of the comment: "The transaction is committed
below." is not required.

Removed.

8.
+ XLogRecPtr subskiplsn; /* All changes which finished at this LSN are
+ * skipped */
+
#ifdef CATALOG_VARLEN /* variable-length fields start here */
/* Connection string to the publisher */
text subconninfo BKI_FORCE_NOT_NULL;
@@ -109,6 +112,8 @@ typedef struct Subscription
bool disableonerr; /* Indicates if the subscription should be
* automatically disabled if a worker error
* occurs */
+ XLogRecPtr skiplsn; /* All changes which finished at this LSN are
+ * skipped */

No need for 'which' in the above comments.

Removed.

9.
Can we merge 029_disable_on_error in 030_skip_xact and name it as
029_on_error (or 029_on_error_skip_disable or some variant of it)?
Both seem to be related features. I am slightly worried about the pace at
which the number of test files is growing in the subscription tests.

Yes, we can merge them.

I'll submit an updated version patch after incorporating all comments I got.
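
On point 4 above, one way to reverse the order while still logging the
LSN is to copy it into a local variable first. The following is only a
sketch under assumptions: format_lsn(), the "_sketch" function, and
writing the message into a caller-supplied buffer instead of calling
ereport() are all made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Finish LSN of the transaction being skipped; invalid when not skipping. */
static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;

/* Hypothetical helper: format an LSN as the usual high/low hex pair. */
static void
format_lsn(XLogRecPtr lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Reset the state first, then report using the saved copy of the LSN. */
static void
stop_skipping_changes_sketch(char *logbuf, size_t logbuflen)
{
    XLogRecPtr  finished_lsn = skip_xact_finish_lsn;
    char        lsnbuf[32];

    assert(finished_lsn != InvalidXLogRecPtr);

    /* Stop skipping changes */
    skip_xact_finish_lsn = InvalidXLogRecPtr;

    format_lsn(finished_lsn, lsnbuf, sizeof(lsnbuf));
    snprintf(logbuf, logbuflen,
             "done skipping logical replication transaction finished at %s",
             lsnbuf);
}
```

Whether the extra variable is worth it just for consistency with the
maybe_start_* function is a judgment call.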

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#522osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#518)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, March 16, 2022 11:33 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

A couple of minor comments on v14.

(1) apply_handle_commit_internal

+       if (is_skipping_changes())
+       {
+               stop_skipping_changes();
+
+               /*
+                * Start a new transaction to clear the subskipxid, if not started
+                * yet. The transaction is committed below.
+                */
+               if (!IsTransactionState())
+                       StartTransactionCommand();
+       }
+

I suppose we can move this condition check and stop_skipping_changes()
call to the inside of the block we enter when IsTransactionState() returns
true.

As the comment of apply_handle_commit_internal() mentions, it's the
helper function for apply_handle_commit() and
apply_handle_stream_commit().

So I could not think of a case where either caller does not open a
transaction before calling apply_handle_commit_internal().
For applying spooled messages, we call begin_replication_step as well.

I may be missing something, but the only time we receive a COMMIT message
without having opened a transaction would be for empty transactions
in which the subscription (and its subscription worker) is not interested.

I think when we skip non-streamed transactions we don't start a transaction.
So, if we do what you are suggesting, we will fail to clear the skip_lsn after
skipping the transaction.

OK, this is what I missed.

On the other hand, what I was worried about is that an empty
transaction can start skipping changes if the subskiplsn equals the
finish LSN of that empty transaction. The reason is that we call
maybe_start_skipping_changes even for empty transactions and set
skip_xact_finish_lsn to the finish LSN in that case.

I confirmed that I could make this happen using a debugger and some
logging of LSNs. I set up two pub/sub pairs and made a change on one
of them, after setting a breakpoint in logicalrep_write_begin on the
walsender that would issue an empty transaction.
Then I checked the finish LSN of that transaction and ran an
ALTER SUBSCRIPTION ... SKIP command with this LSN value.
As a result, the empty transaction calls stop_skipping_changes
in apply_handle_commit_internal and then enters the block for
IsTransactionState() == true, which would not happen before applying
the patch.

Also, this behavior looks contradictory to a comment in worker.c,
"The subskiplsn is cleared after successfully skipping the transaction
or applying non-empty transaction.", so I was confused and
wrote the above comment.

I think this would not happen in practice, so it might be OK
without a special measure for it, but I wasn't sure.

Best Regards,
Takamichi Osumi

#523osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: osumi.takamichi@fujitsu.com (#522)
RE: Skipping logical replication transactions on subscriber side

On Wednesday, March 16, 2022 3:37 PM I wrote:

On Wednesday, March 16, 2022 11:33 AM Amit Kapila
<amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:30 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

A couple of minor comments on v14.

(1) apply_handle_commit_internal

+       if (is_skipping_changes())
+       {
+               stop_skipping_changes();
+
+               /*
+                * Start a new transaction to clear the subskipxid, if not started
+                * yet. The transaction is committed below.
+                */
+               if (!IsTransactionState())
+                       StartTransactionCommand();
+       }
+

I suppose we can move this condition check and
stop_skipping_changes() call to the inside of the block we enter
when IsTransactionState() returns true.

As the comment of apply_handle_commit_internal() mentions, it's the
helper function for apply_handle_commit() and
apply_handle_stream_commit().

So I could not think of a case where either caller does not open a
transaction before calling apply_handle_commit_internal().
For applying spooled messages, we call begin_replication_step as well.

I may be missing something, but the only time we receive a COMMIT message
without having opened a transaction would be for empty transactions
in which the subscription (and its subscription worker) is not interested.

I think when we skip non-streamed transactions we don't start a transaction.
So, if we do what you are suggesting, we will fail to clear the
skip_lsn after skipping the transaction.

OK, this is what I missed.

On the other hand, what I was worried about is that an empty transaction can
start skipping changes if the subskiplsn equals the finish LSN of that empty
transaction. The reason is that we call maybe_start_skipping_changes even for
empty transactions and set skip_xact_finish_lsn to the finish LSN in that case.

I confirmed that I could make this happen using a debugger and some logging of
LSNs. I set up two pub/sub pairs and made a change on one of them, after
setting a breakpoint in logicalrep_write_begin on the walsender that would
issue an empty transaction.
Then I checked the finish LSN of that transaction and ran an
ALTER SUBSCRIPTION ... SKIP command with this LSN value.
As a result, the empty transaction calls stop_skipping_changes in
apply_handle_commit_internal and then enters the block for
IsTransactionState() == true, which would not happen before applying the patch.

Also, this behavior looks contradictory to a comment in worker.c, "The
subskiplsn is cleared after successfully skipping the transaction or applying
non-empty transaction.", so I was confused and wrote the above comment.

Sorry, my understanding was not correct.

Even when the subskiplsn is cleared by an empty transaction,
that still counts as successfully skipping the transaction.
So this behavior, and allowing an empty transaction to match the LSN
specified by ALTER SUBSCRIPTION, is fine.

Sorry for the noise.
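
To make the case concrete, here is a small self-contained model of the
behavior described above. It is a sketch under assumptions, not the
patch: the subskiplsn variable stands in for MySubscription->skiplsn,
the "_sketch" functions are hypothetical, and a transaction is reduced
to a finish LSN plus a number of changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-in for MySubscription->skiplsn. */
static XLogRecPtr subskiplsn = InvalidXLogRecPtr;

/* Finish LSN of the transaction being skipped; invalid when not skipping. */
static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;

static bool
is_skipping_changes(void)
{
    return skip_xact_finish_lsn != InvalidXLogRecPtr;
}

/* Called for every remote transaction, including empty ones. */
static void
maybe_start_skipping_changes_sketch(XLogRecPtr finish_lsn)
{
    assert(!is_skipping_changes());

    /* Quick return if it's not requested to skip this transaction. */
    if (subskiplsn == InvalidXLogRecPtr || subskiplsn != finish_lsn)
        return;

    skip_xact_finish_lsn = finish_lsn;
}

/* Returns the number of changes that were applied rather than skipped. */
static int
apply_remote_transaction_sketch(XLogRecPtr finish_lsn, int nchanges)
{
    int applied = 0;

    maybe_start_skipping_changes_sketch(finish_lsn);

    for (int i = 0; i < nchanges; i++)
    {
        if (!is_skipping_changes())
            applied++;
    }

    if (is_skipping_changes())
    {
        /* Skipped successfully (possibly an empty transaction). */
        skip_xact_finish_lsn = InvalidXLogRecPtr;
        subskiplsn = InvalidXLogRecPtr;
    }
    else if (nchanges > 0)
    {
        /* A non-empty transaction was applied, so the request is cleared. */
        subskiplsn = InvalidXLogRecPtr;
    }

    return applied;
}
```

In this model an empty transaction whose finish LSN matches clears the
subskiplsn just as a skipped non-empty one does, while an empty
transaction with a different finish LSN leaves it untouched.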

Best Regards,
Takamichi Osumi

#524Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#517)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 11:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
======================
1. IIRC, earlier, we thought of allowing the use of this option (SKIP)
only for superusers (as this can lead to inconsistent data if not used
carefully) but I don't see that check in the latest patch. What is the
reason for that?

I thought the non-superuser subscription owner can resolve the
conflict by manually manipulating the relations, which has the same
result as skipping all data modification changes with the ALTER
SUBSCRIPTION SKIP feature. But after more thought, it would not be
exactly the same, since the skipped transaction might include changes
to relations that the owner doesn't have permission on.

2.
+ /*
+ * Update the subskiplsn of the tuple to InvalidXLogRecPtr.

I think we can change the above part of the comment to "Clear subskiplsn."

Fixed.

3.
+ * Since we already have

Isn't it better to say here: Since we have already ...?

Fixed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#525Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#520)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
TupleTableSlot *remoteslot;
MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

Since we already have the check of applying the change on the spot at
the beginning of the handlers I feel it's better to add
is_skipping_changes() to the check than add a new if statement to
apply_dispatch, but do you prefer to check it in one central place in
apply_dispatch?

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
======================

Some cosmetic suggestions:
======================
1.
+# Create subscriptions. Both subscription sets disable_on_error to on
+# so that they get disabled when a conflict occurs.
+$node_subscriber->safe_psql(
+ 'postgres',
+ qq[
+CREATE SUBSCRIPTION $subname CONNECTION '$publisher_connstr'
PUBLICATION tap_pub WITH (streaming = on, two_phase = on,
disable_on_error = on);
+]);

I don't understand what you mean by 'Both subscription ...' in the
above comments.

Fixed.

2.
+ # Check the log indicating that successfully skipped the transaction,

How about slightly rephrasing this to: "Check the log to ensure that
the transaction is skipped...."?

Fixed.

I've attached an updated version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

v15-0001-Add-ALTER-SUBSCRIPTION-.-SKIP-to-skip-the-transa.patch (application/octet-stream)

#526shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Masahiko Sawada (#525)
RE: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 4:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

Thanks for updating the patch. Here are some comments for the v15 patch.

1. src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction. The latter prevents the mistakenly specified subskiplsn from

Should "applying non-empty transaction" be modified to "finishing a
transaction"? To be consistent with the description in the
alter_subscription.sgml.

2. src/test/subscription/t/029_on_error.pl

+# Test of logical replication subscription self-disabling feature.

Should we add something about "skip logical replication transactions" in this
comment?

Regards,
Shi yu

#527Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: shiy.fnst@fujitsu.com (#526)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 17, 2022 at 8:13 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Wed, Mar 16, 2022 4:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

Thanks for updating the patch. Here are some comments for the v15 patch.

1. src/backend/replication/logical/worker.c

+ * to skip applying the changes when starting to apply changes.  The subskiplsn is
+ * cleared after successfully skipping the transaction or applying non-empty
+ * transaction. The latter prevents the mistakenly specified subskiplsn from

Should "applying non-empty transaction" be modified to "finishing a
transaction"? To be consistent with the description in the
alter_subscription.sgml.

The current wording in the patch seems okay to me as it is good to
emphasize non-empty transactions.

2. src/test/subscription/t/029_on_error.pl

+# Test of logical replication subscription self-disabling feature.

Should we add something about "skip logical replication transactions" in this
comment?

How about: "Tests for disable_on_error and SKIP transaction features."?

I am making some other minor edits in the patch and will take care of
whatever we decide for these comments.

--
With Regards,
Amit Kapila.

#528Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#525)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

I am planning to commit this early next week (on Monday) unless there
are more comments/suggestions.

--
With Regards,
Amit Kapila.

Attachments:

v16-0001-Add-ALTER-SUBSCRIPTION-.-SKIP.patch (application/octet-stream)

#529osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Amit Kapila (#528)
RE: Skipping logical replication transactions on subscriber side

On Thursday, March 17, 2022 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

Hi, thank you for the patch. Few minor comments.

(1) comment of maybe_start_skipping_changes

+       /*
+        * Quick return if it's not requested to skip this transaction. This
+        * function is called for every remote transaction and we assume that
+        * skipping the transaction is not used often.
+        */

I feel this comment should explain more about our intention and
what it confirms. In a case when user requests skip,
but it doesn't match the condition, we don't start
skipping changes, strictly speaking.

From:
Quick return if it's not requested to skip this transaction.

To:
Quick return if we can't ensure possible skiplsn is set
and it equals to the finish LSN of this transaction.

(2) 029_on_error.pl

+       my $contents = slurp_file($node_subscriber->logfile, $offset);
+       $contents =~
+         qr/processing remote data for replication origin \"pg_\d+\" during "INSERT" for replication target relation "public.tbl" in transaction \d+ finishe$
+         or die "could not get error-LSN";

I think we shouldn't use a lot of new words.

How about a change below ?

From:
could not get error-LSN
To:
failed to find expected error message that contains finish LSN for SKIP option

(3) apply_handle_commit_internal

Lastly, may I ask why both stop_skipping_changes and
clear_subscription_skip_lsn are called in this function,
instead of at the end of apply_handle_commit and
apply_handle_stream_commit?

IMHO, this structure creates extra conditional branches
in apply_handle_commit_internal.

Also, because of this code, when we call stop_skipping_changes
in apply_handle_commit_internal after checking that
is_skipping_changes() returns true, we check
is_skipping_changes() again at the top of stop_skipping_changes.

OTOH, in other cases like apply_handle_prepare and apply_handle_stream_prepare,
we call those two functions (or one of them) as needed,
after the existing commits and during the closing processing.
(In the case of rollback_prepare, it's also called after the existing commit.)

I feel that if we move those two functions to the end
of apply_handle_commit and apply_handle_stream_commit,
the code will be better aligned and more readable.
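
Regarding (2), the thing the test needs from the log is the finish
LSN. As an illustration only, the extraction could look like the
following in C. The helper name is hypothetical, and the marker text
"finished at " is assumed from the "%X/%X" LOG message format quoted
earlier in this thread; the TAP test itself uses a Perl regex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: find "finished at %X/%X" in a log line and
 * return the LSN through *lsn. Returns false if no LSN can be parsed.
 */
static bool
extract_finish_lsn(const char *line, XLogRecPtr *lsn)
{
    const char *marker = "finished at ";
    const char *p;
    unsigned int hi;
    unsigned int lo;

    assert(line != NULL && lsn != NULL);

    p = strstr(line, marker);
    if (p == NULL)
        return false;

    /* The LSN is printed as two hexadecimal halves separated by '/'. */
    if (sscanf(p + strlen(marker), "%X/%X", &hi, &lo) != 2)
        return false;

    *lsn = ((XLogRecPtr) hi << 32) | (XLogRecPtr) lo;
    return true;
}
```

A failure here means the expected message was not found at all, which
is what the proposed wording of the die message tries to say.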

Best Regards,
Takamichi Osumi

#530Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: osumi.takamichi@fujitsu.com (#529)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 17, 2022 at 12:39 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, March 17, 2022 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

Hi, thank you for the patch. Few minor comments.

(1) comment of maybe_start_skipping_changes

+       /*
+        * Quick return if it's not requested to skip this transaction. This
+        * function is called for every remote transaction and we assume that
+        * skipping the transaction is not used often.
+        */

I feel this comment should explain more about our intention and
what it confirms. In a case when user requests skip,
but it doesn't match the condition, we don't start
skipping changes, strictly speaking.

From:
Quick return if it's not requested to skip this transaction.

To:
Quick return if we can't ensure possible skiplsn is set
and it equals to the finish LSN of this transaction.

Hmm, the current comment seems more appropriate. What you are
suggesting is almost writing the code in sentence form.

(2) 029_on_error.pl

+       my $contents = slurp_file($node_subscriber->logfile, $offset);
+       $contents =~
+         qr/processing remote data for replication origin \"pg_\d+\" during "INSERT" for replication target relation "public.tbl" in transaction \d+ finishe$
+         or die "could not get error-LSN";

I think we shouldn't use a lot of new words.

How about a change below ?

From:
could not get error-LSN
To:
failed to find expected error message that contains finish LSN for SKIP option

(3) apply_handle_commit_internal

...

I feel if we move those two functions at the end
of the apply_handle_commit and apply_handle_stream_commit,
then we will have more aligned codes and improve readability.

I think the intention is to avoid duplicate code as we have a common
function that gets called from both of those. OTOH, if Sawada-San or
others also prefer your approach to rearrange the code then I am fine
with it.

--
With Regards,
Amit Kapila.

#531Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#530)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 17, 2022 at 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 17, 2022 at 12:39 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, March 17, 2022 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

Hi, thank you for the patch. Few minor comments.

(1) comment of maybe_start_skipping_changes

+       /*
+        * Quick return if it's not requested to skip this transaction. This
+        * function is called for every remote transaction and we assume that
+        * skipping the transaction is not used often.
+        */

I feel this comment should explain more about our intention and
what it confirms. In a case when user requests skip,
but it doesn't match the condition, we don't start
skipping changes, strictly speaking.

From:
Quick return if it's not requested to skip this transaction.

To:
Quick return if we can't ensure possible skiplsn is set
and it equals to the finish LSN of this transaction.

Hmm, the current comment seems more appropriate. What you are
suggesting is almost writing the code in sentence form.

(2) 029_on_error.pl

+       my $contents = slurp_file($node_subscriber->logfile, $offset);
+       $contents =~
+         qr/processing remote data for replication origin \"pg_\d+\" during "INSERT" for replication target relation "public.tbl" in transaction \d+ finishe$
+         or die "could not get error-LSN";

I think we shouldn't use a lot of new words.

How about a change below ?

From:
could not get error-LSN
To:
failed to find expected error message that contains finish LSN for SKIP option

(3) apply_handle_commit_internal

...

I feel if we move those two functions at the end
of the apply_handle_commit and apply_handle_stream_commit,
then we will have more aligned codes and improve readability.

I think we cannot just move them to the end of apply_handle_commit()
and apply_handle_stream_commit(). Because if we do that, we end up
failing to update replication_session_origin_lsn/timestamp when
clearing the subskiplsn if we're skipping a non-stream transaction.

Basically, the apply worker handles 2pc transactions and non-2pc
transactions differently; we always prepare even empty transactions,
whereas we don't commit empty non-2pc transactions. So I think we
don't have to handle both in the same way.

I think the intention is to avoid duplicate code as we have a common
function that gets called from both of those.

Yes.
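
The point about keeping this logic in the common function can be
modeled roughly as follows. This is a sketch under assumptions, not
worker.c: in_transaction stands in for IsTransactionState(),
session_origin_lsn for the replication origin LSN of the session, and
the "_sketch" function is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical stand-ins for the worker state referred to above. */
static bool in_transaction = false;
static XLogRecPtr session_origin_lsn = InvalidXLogRecPtr;
static XLogRecPtr subskiplsn = InvalidXLogRecPtr;
static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;

static bool
is_skipping_changes(void)
{
    return skip_xact_finish_lsn != InvalidXLogRecPtr;
}

static void
apply_handle_commit_internal_sketch(XLogRecPtr end_lsn)
{
    if (is_skipping_changes())
    {
        /* Stop skipping changes */
        skip_xact_finish_lsn = InvalidXLogRecPtr;

        /*
         * A skipped non-streamed transaction never started a local
         * transaction, so start one in order to clear the subskiplsn.
         */
        if (!in_transaction)
            in_transaction = true;
    }

    if (in_transaction)
    {
        /* Clearing and origin advancement happen in the same commit. */
        subskiplsn = InvalidXLogRecPtr;
        session_origin_lsn = end_lsn;
        in_transaction = false;
    }

    assert(!in_transaction && !is_skipping_changes());
}
```

In this model, clearing the subskiplsn after the function returns would
need a second transaction that does not carry the origin LSN update,
which is the problem with moving the calls to the two callers.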

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#532osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
osumi.takamichi@fujitsu.com
In reply to: Masahiko Sawada (#531)
RE: Skipping logical replication transactions on subscriber side

On Thursday, March 17, 2022 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Mar 17, 2022 at 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 17, 2022 at 12:39 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:

On Thursday, March 17, 2022 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in
the attached patch. Kindly let me know what you think of the attached?

Hi, thank you for the patch. Few minor comments.

(3) apply_handle_commit_internal

...

I feel if we move those two functions at the end of the
apply_handle_commit and apply_handle_stream_commit, then we will
have more aligned codes and improve readability.

I think we cannot just move them to the end of apply_handle_commit() and
apply_handle_stream_commit(). Because if we do that, we end up missing
updating replication_session_origin_lsn/timestamp when clearing the
subskiplsn if we're skipping a non-stream transaction.

Basically, the apply worker differently handles 2pc transactions and non-2pc
transactions; we always prepare even empty transactions whereas we don't
commit empty non-2pc transactions. So I think we don’t have to handle both in
the same way.

Okay. Thank you so much for your explanation.
Then the code looks good to me.

Best Regards,
Takamichi Osumi

#533Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#528)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 17, 2022 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

Thank you for updating the patch. It looks good to me.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#534Euler Taveira
Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#528)
Re: Skipping logical replication transactions on subscriber side

On Thu, Mar 17, 2022, at 3:03 AM, Amit Kapila wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

I am planning to commit this early next week (on Monday) unless there
are more comments/suggestions.

I reviewed this last version and I have a few comments.

+                * If the user set subskiplsn, we do a sanity check to make
+                * sure that the specified LSN is a probable value.

... user *sets*...

+                       ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("skip WAL location (LSN) must be greater than origin LSN %X/%X",
+                                       LSN_FORMAT_ARGS(remote_lsn))));

Shouldn't we add the LSN to be skipped in the "(LSN)"?
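
A minimal standalone sketch of what that suggestion could look like. It assumes an LSN is a 64-bit value printed as two 32-bit hex halves, which is how the %X/%X format in the patch is used. The helper name format_skip_lsn_error and the exact wording of the message are hypothetical and are not taken from the committed code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the error text with both the LSN the user
 * asked to skip and the origin LSN.  Each LSN is printed as the high
 * 32 bits, a slash, and the low 32 bits, all in hexadecimal.
 * Returns whatever snprintf() returns.
 */
int
format_skip_lsn_error(char *buf, size_t buflen,
                      uint64_t skip_lsn, uint64_t origin_lsn)
{
    return snprintf(buf, buflen,
                    "skip WAL location (LSN %" PRIX32 "/%" PRIX32 ") "
                    "must be greater than origin LSN %" PRIX32 "/%" PRIX32,
                    (uint32_t) (skip_lsn >> 32), (uint32_t) skip_lsn,
                    (uint32_t) (origin_lsn >> 32), (uint32_t) origin_lsn);
}
```

With both values in the message, a user who mistyped the skip LSN can see the mistake without querying pg_subscription.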

+        * Start a new transaction to clear the subskipxid, if not started
+        * yet.

It seems it means subskiplsn.

+ * subskipxid in order to inform users for cases e.g., where the user mistakenly
+ * specified the wrong subskiplsn.

It seems it means subskiplsn.

+sub test_skip_xact
+{

It seems this function should be named test_skip_lsn. Unless the intention is
to cover other skip options in the future.

src/test/subscription/t/029_disable_on_error.pl | 94 ----------
src/test/subscription/t/029_on_error.pl | 183 +++++++++++++++++++

It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
I would also rename 029_on_error.pl to something else, such as 030_skip_lsn.pl
or a generic name like 030_skip_option.pl.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#535Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#534)
Re: Skipping logical replication transactions on subscriber side

On Mon, Mar 21, 2022 at 7:09 AM Euler Taveira <euler@eulerto.com> wrote:

src/test/subscription/t/029_disable_on_error.pl | 94 ----------
src/test/subscription/t/029_on_error.pl | 183 +++++++++++++++++++

It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.

We have covered the same test in the new test file. See "CREATE
SUBSCRIPTION sub CONNECTION '$publisher_connstr' PUBLICATION pub WITH
(disable_on_error = true, ...". This will test the cases we were
earlier testing via 'disable_on_error'.

I would also rename 029_on_error.pl to something else, such as 030_skip_lsn.pl
or a generic name like 030_skip_option.pl.

The reason to keep the name 'on_error' is that it has tests for both
'disable_on_error' option and 'skip_lsn'. The other option could be
'on_error_action' or something like that. Now, does this make sense to
you?

--
With Regards,
Amit Kapila.

#536Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#534)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Mon, Mar 21, 2022 at 7:09 AM Euler Taveira <euler@eulerto.com> wrote:

On Thu, Mar 17, 2022, at 3:03 AM, Amit Kapila wrote:

On Wed, Mar 16, 2022 at 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated version patch.

The patch LGTM. I have made minor changes in comments and docs in the
attached patch. Kindly let me know what you think of the attached?

I am planning to commit this early next week (on Monday) unless there
are more comments/suggestions.

I reviewed this last version and I have a few comments.

+                * If the user set subskiplsn, we do a sanity check to make
+                * sure that the specified LSN is a probable value.

... user *sets*...

+                       ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("skip WAL location (LSN) must be greater than origin LSN %X/%X",
+                                       LSN_FORMAT_ARGS(remote_lsn))));

Shouldn't we add the LSN to be skipped in the "(LSN)"?

+        * Start a new transaction to clear the subskipxid, if not started
+        * yet.

It seems it means subskiplsn.

+ * subskipxid in order to inform users for cases e.g., where the user mistakenly
+ * specified the wrong subskiplsn.

It seems it means subskiplsn.

+sub test_skip_xact
+{

It seems this function should be named test_skip_lsn. Unless the intention is
to cover other skip options in the future.

I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

src/test/subscription/t/029_disable_on_error.pl | 94 ----------
src/test/subscription/t/029_on_error.pl | 183 +++++++++++++++++++

It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
I would also rename 029_on_error.pl to something else, such as 030_skip_lsn.pl
or a generic name like 030_skip_option.pl.

As explained in my previous email, I don't think any change is
required for this comment but do let me know if you still think so?

--
With Regards,
Amit Kapila.

Attachments:

v17-0001-Add-ALTER-SUBSCRIPTION-.-SKIP.patch (application/octet-stream)
#537Euler Taveira
Euler Taveira
euler@eulerto.com
In reply to: Amit Kapila (#536)
Re: Skipping logical replication transactions on subscriber side

On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:

I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

Looks good to me.

src/test/subscription/t/029_disable_on_error.pl | 94 ----------
src/test/subscription/t/029_on_error.pl | 183 +++++++++++++++++++

It seems you are removing a test for 705e20f8550c0e8e47c0b6b20b5f5ffd6ffd9e33.
I would also rename 029_on_error.pl to something else, such as 030_skip_lsn.pl
or a generic name like 030_skip_option.pl.

As explained in my previous email, I don't think any change is
required for this comment but do let me know if you still think so?

Oh, sorry about the noise. I saw mixed tests between the 2 new features and I
was confused if it was intentional or not.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#538Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Euler Taveira (#537)
Re: Skipping logical replication transactions on subscriber side

On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:

I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

Looks good to me.

This patch is committed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).
Today, I have marked the corresponding entry in CF as committed.

--
With Regards,
Amit Kapila.

#539Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#538)
Re: Skipping logical replication transactions on subscriber side

On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:

On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

Looks good to me.

This patch is committed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

The members have not actually uploaded these failures, due to an OOM in the
Perl process driving the buildfarm script. I think the OOM is due to a need
for excess RAM to capture 029_on_error_subscriber.log, which is 27MB here. I
will move the members to 64-bit Perl. (AIX 32-bit binaries OOM easily:
https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX.)

#540Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#539)
Re: Skipping logical replication transactions on subscriber side

On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:

On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

Looks good to me.

This patch is committed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Thank you for the report. I'm investigating this issue.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#541Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#540)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Apr 1, 2022 at 5:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Mar 29, 2022 at 10:43:00AM +0530, Amit Kapila wrote:

On Mon, Mar 21, 2022 at 5:51 PM Euler Taveira <euler@eulerto.com> wrote:

On Mon, Mar 21, 2022, at 12:25 AM, Amit Kapila wrote:
I have fixed all the above comments as per your suggestion in the
attached. Do let me know if something is missed?

Looks good to me.

This patch is committed
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=208c5d65bbd60e33e272964578cb74182ac726a8).

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Thank you for the report. I'm investigating this issue.

Looking at the subscriber logs, it successfully fetched the correct
error-LSN from the server logs and set it to ALTER SUBSCRIPTION …
SKIP:

2022-03-30 09:48:36.617 UTC [17039636:4] CONTEXT: processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788
2022-03-30 09:48:36.617 UTC [17039636:5] LOG: logical replication
subscription "sub" has been disabled due to an error
:
2022-03-30 09:48:36.670 UTC [17039640:1] [unknown] LOG: connection
received: host=[local]
2022-03-30 09:48:36.672 UTC [17039640:2] [unknown] LOG: connection
authorized: user=nm database=postgres application_name=029_on_error.pl
2022-03-30 09:48:36.675 UTC [17039640:3] 029_on_error.pl LOG:
statement: ALTER SUBSCRIPTION sub SKIP (lsn = '0/1D30788')
2022-03-30 09:48:36.676 UTC [17039640:4] 029_on_error.pl LOG:
disconnection: session time: 0:00:00.006 user=nm database=postgres
host=[local]
:
2022-03-30 09:48:36.762 UTC [28246036:2] ERROR: duplicate key value
violates unique constraint "tbl_pkey"
2022-03-30 09:48:36.762 UTC [28246036:3] DETAIL: Key (i)=(1) already exists.
2022-03-30 09:48:36.762 UTC [28246036:4] CONTEXT: processing remote
data for replication origin "pg_16391" during "INSERT" for replication
target relation "public.tbl" in transaction 725 finished at 0/1D30788

However, the worker could not start skipping changes of the error
transaction for some reason. Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub'" didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

add_logs.patch (application/octet-stream)
#542Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#541)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:

On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub'" didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Logs attached. I ran this outside the buildfarm script environment. Most
notably, I didn't override PG_TEST_TIMEOUT_DEFAULT like my buildfarm
configuration does, so the total log size is smaller.

Attachments:

log-subscription-20220401.tar.xz (application/octet-stream)
#543Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#542)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 2, 2022 at 5:41 AM Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:

On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub'" didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Logs attached.

Thank you.

Looking at the logs below:
----
....
2022-04-01 18:19:34.710 CUT [58327402] LOG: not started skipping
changes: my_skiplsn 14EB7D8/B0706F72 finish_lsn 0/14EB7D8
...
----

It seems that the value of skiplsn read in GetSubscription is wrong
which makes the apply worker think it doesn't need to skip the
transaction. Now, in Alter/Create Subscription, we are using
LSNGetDatum() to store skiplsn value in pg_subscription but while
reading it in GetSubscription(), we are not converting back the datum
to LSN by using DatumGetLSN(). Is it possible that on this machine it
might be leading to not getting the right value for skiplsn? I think
it is worth trying to see if this fixes the problem.

Any other thoughts?

--
With Regards,
Amit Kapila.

Attachments:

datum_to_lsn_skiplsn_1.patch (application/octet-stream)
#544Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#543)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:

On Sat, Apr 2, 2022 at 5:41 AM Noah Misch <noah@leadboat.com> wrote:

On Fri, Apr 01, 2022 at 09:25:52PM +0900, Masahiko Sawada wrote:

On Fri, Apr 1, 2022 at 4:44 PM Noah Misch <noah@leadboat.com> wrote:

src/test/subscription/t/029_on_error.pl has been failing reliably on the five
AIX buildfarm members:

# poll_query_until timed out executing this query:
# SELECT subskiplsn = '0/0' FROM pg_subscription WHERE subname = 'sub'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
timed out waiting for match: (?^:LOG: done skipping logical replication transaction finished at 0/1D30788) at t/029_on_error.pl line 50.

I've posted five sets of logs (2.7 MiB compressed) here:
https://drive.google.com/file/d/16NkyNIV07o0o8WM7GwcaAYFQDPTkULkR/view?usp=sharing

Given that "SELECT subskiplsn = '0/0'
FROM pg_subscription WHERE subname = 'sub'" didn't return true, some
value was set to subskiplsn even after the unique key error.

So I'm guessing that the apply worker could not get the updated value
of the subskiplsn or its MySubscription->skiplsn could not match with
the transaction's finish LSN. Also, given that the test is failing on
all AIX buildfarm members, there might be something specific to AIX.

Noah, to investigate this issue further, is it possible for you to
apply the attached patch and run the 029_on_error.pl test? The patch
adds some logs to get additional information.

Logs attached.

Thank you.

Looking at the logs below:
----
....
2022-04-01 18:19:34.710 CUT [58327402] LOG: not started skipping
changes: my_skiplsn 14EB7D8/B0706F72 finish_lsn 0/14EB7D8
...
----

It seems that the value of skiplsn read in GetSubscription is wrong
which makes the apply worker think it doesn't need to skip the
transaction. Now, in Alter/Create Subscription, we are using
LSNGetDatum() to store skiplsn value in pg_subscription but while
reading it in GetSubscription(), we are not converting back the datum
to LSN by using DatumGetLSN(). Is it possible that on this machine it
might be leading to not getting the right value for skiplsn? I think
it is worth trying to see if this fixes the problem.

After applying datum_to_lsn_skiplsn_1.patch, I get another failure. Logs
attached.

Attachments:

log-subscription-20220401b.tar.xz (application/octet-stream)
#545Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#544)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 2, 2022 at 7:29 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:

After applying datum_to_lsn_skiplsn_1.patch, I get another failure. Logs
attached.

The failure is for the same reason. I noticed that even when skip lsn
value should be 0/0, it is some invalid value, see: "LOG: not started
skipping changes: my_skiplsn 0/B0706F72 finish_lsn 0/14EB7D8". Here,
my_skiplsn should be 0/0 instead of 0/B0706F72. Now, I am not sure why
the LSN's 4 bytes are correct and the other 4 bytes have some random
value. A similar problem is there when we have set the valid value of
skip lsn, see: "LOG: not started skipping changes: my_skiplsn
14EB7D8/B0706F72 finish_lsn 0/14EB7D8". Here the value of my_skiplsn
should be 0/14EB7D8 instead of 14EB7D8/B0706F72.

I am sure that if you create a subscription with the below test and
check the skip lsn value, it will be correct, otherwise, you would
have seen failure in subscription.sql as well. If possible, can you
please check the following example to rule out the possibility:

For example,
Publisher:
Create table t1(c1 int);
Create Publication pub1 for table t1;

Subscriber:
Create table t1(c1 int);
Create Subscription sub1 connection 'dbname = postgres' Publication pub1;
Select subname, subskiplsn from pg_subscription; -- subskiplsn should be 0/0

Alter Subscription sub1 SKIP (LSN = '0/14EB7D8');
Select subname, subskiplsn from pg_subscription; -- subskiplsn should be 0/14EB7D8

Assuming the above is correct and we are still getting the wrong value
in apply worker, the only remaining suspect is the following code in
GetSubscription:
sub->skiplsn = DatumGetLSN(subform->subskiplsn);

I don't know what is wrong with this, because subskiplsn is stored as
pg_lsn, which is a fixed-length type, so we should be able to access it
through the struct. Do you see any problem with this?

--
With Regards,
Amit Kapila.

#546Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#545)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 2, 2022 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 7:29 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 06:49:20AM +0530, Amit Kapila wrote:

After applying datum_to_lsn_skiplsn_1.patch, I get another failure. Logs
attached.

The failure is for the same reason. I noticed that even when skip lsn
value should be 0/0, it is some invalid value, see: "LOG: not started
skipping changes: my_skiplsn 0/B0706F72 finish_lsn 0/14EB7D8". Here,
my_skiplsn should be 0/0 instead of 0/B0706F72. Now, I am not sure why
the LSN's 4 bytes are correct and the other 4 bytes have some random
value.

It seems that 0/B0706F72 is not a random value. Two subscriber logs
show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
might show the next field in the pg_subscription catalog, i.e.,
subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
PUBLICATION pub WITH (disable_on_error = true, streaming = on,
two_phase = on)".

Given that subscription.sql passes, something must be going wrong when we
read the subskiplsn value with code like "sub->skiplsn = subform->subskiplsn;".
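
The byte-by-byte reading above can be checked with a small standalone sketch. The helper name word_to_ascii is hypothetical. It splits a 32-bit word into its four bytes, most significant first, and shows each printable ASCII byte as its character and anything else as a dot.

```c
#include <stdint.h>

/*
 * Hypothetical helper: render the four bytes of a 32-bit word, most
 * significant byte first.  Printable ASCII bytes (0x20..0x7E) appear
 * as themselves; any other byte appears as '.'.
 */
void
word_to_ascii(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
    {
        unsigned char b = (unsigned char) (word >> (24 - 8 * i));

        out[i] = (b >= 0x20 && b <= 0x7E) ? (char) b : '.';
    }
    out[4] = '\0';
}
```

For 0xB0706F72 this gives ".por", which lines up with the start of 'port=57851 ...'. The leading 0xB0 is not printable. One plausible reading, which is an assumption here, is a one-byte varlena header on a big-endian build: the high bit marks a short header and the low seven bits give a total length of 48, which would fit the 47-character connection string plus the header byte.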

Is it possible to run the test again with the attached patch?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

add_logs_v2.patch (application/octet-stream)
#547Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#546)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:

It seems that 0/B0706F72 is not a random value. Two subscriber logs
show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
might show the next field in the pg_subscription catalog, i.e.,
subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
PUBLICATION pub WITH (disable_on_error = true, streaming = on,
two_phase = on)".

Given that subscription.sql passes, something must be going wrong when we
read the subskiplsn value with code like "sub->skiplsn = subform->subskiplsn;".

That's a good clue. We've never made pg_type.typalign able to represent
alignment as it works on AIX. A uint64 like pg_lsn has 8-byte alignment, so
the C struct follows from that. At the typalign level, we have only these:

#define TYPALIGN_CHAR 'c' /* char alignment (i.e. unaligned) */
#define TYPALIGN_SHORT 's' /* short alignment (typically 2 bytes) */
#define TYPALIGN_INT 'i' /* int alignment (typically 4 bytes) */
#define TYPALIGN_DOUBLE 'd' /* double alignment (often 8 bytes) */

On AIX, they are:

#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2

uint64 and pg_lsn use TYPALIGN_DOUBLE. For AIX, they really need a typalign
corresponding to ALIGNOF_LONG. Hence, the C struct layout doesn't match the
tuple layout. Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1;
attrelid │ attname
─────────────────┼──────────────
pg_sequence │ seqstart
pg_sequence │ seqincrement
pg_sequence │ seqmax
pg_sequence │ seqmin
pg_sequence │ seqcache
pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.
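
To make the mismatch concrete, here is a minimal sketch of the two offset calculations. The function names are hypothetical. The figure of 81 bytes of fixed-width columns ahead of subskiplsn, used in the note after the code, is an assumption for illustration (for instance two 4-byte oids, a 64-byte name, another oid, and five 1-byte flags) and does not come from this thread.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
size_t
align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Offset the C compiler would choose for a uint64 struct member that
 * follows bytes_before bytes of earlier members: 8-byte alignment.
 */
size_t
struct_offset_of_uint64(size_t bytes_before)
{
    return align_up(bytes_before, 8);
}

/*
 * Offset at which a typalign 'd' column would be stored in a tuple,
 * given the platform's ALIGNOF_DOUBLE (4 on AIX per the list above).
 */
size_t
tuple_offset_of_double_aligned(size_t bytes_before, size_t alignof_double)
{
    return align_up(bytes_before, alignof_double);
}
```

With 81 bytes in front, the struct member lands at offset 88, while the tuple column lands at 84 when ALIGNOF_DOUBLE is 4, so a read through the struct starts four bytes late. With exactly 8 bytes in front, as for the pg_sequence columns, both calculations give 8 and the layouts agree.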

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.
- Introduce a new typalign value suitable for uint64. This is more intrusive,
but it's more future-proof. Looking beyond catalog columns, it might
improve performance by avoiding unaligned reads.
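
A toy model of why the first option helps, written as a sketch under stated assumptions. It assumes big-endian byte order, a tuple offset of 84 and a struct offset of 88 for the current column position (both figures are assumptions for illustration), and that the bytes B0 'p' 'o' 'r' follow the LSN in the row. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store v at p as eight big-endian bytes. */
void
put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char) (v >> (56 - 8 * i));
}

/* Load eight big-endian bytes from p. */
uint64_t
get_be64(const unsigned char *p)
{
    uint64_t    v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Build a toy row with lsn stored at tuple_off, followed directly by
 * the bytes of the next column, then read eight bytes at struct_off
 * the way a C struct access would.  Returns 0 if the offsets do not
 * fit in the 128-byte toy row.
 */
uint64_t
read_through_struct(size_t tuple_off, size_t struct_off, uint64_t lsn,
                    const unsigned char *next, size_t next_len)
{
    unsigned char row[128] = {0};

    if (tuple_off + 8 + next_len > sizeof(row) ||
        struct_off + 8 > sizeof(row))
        return 0;

    put_be64(row + tuple_off, lsn);
    memcpy(row + tuple_off + 8, next, next_len);
    return get_be64(row + struct_off);
}
```

Writing 0/14EB7D8 at offset 84 and reading at 88 yields 14EB7D8/B0706F72, and writing 0/0 yields 0/B0706F72, the same shapes as the values in the subscriber logs. When the column sits at offset 8 in both layouts, as it would right after two 4-byte oids, the read returns the stored LSN unchanged.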

Is it possible to run the test again with the attached patch?

Logs attached. The test "passed", though it printed "poll_query_until timed
out" three times and took awhile.

Attachments:

log-subscription-20220401c.tar.xz (application/octet-stream)
#548Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#547)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:

It seems that 0/B0706F72 is not a random value. Two subscriber logs
show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
might show the next field in the pg_subscription catalog, i.e.,
subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
PUBLICATION pub WITH (disable_on_error = true, streaming = on,
two_phase = on)".

Given that subscription.sql passes, something must be going wrong when we
read the subskiplsn value with code like "sub->skiplsn = subform->subskiplsn;".

That's a good clue. We've never made pg_type.typalign able to represent
alignment as it works on AIX. A uint64 like pg_lsn has 8-byte alignment, so
the C struct follows from that. At the typalign level, we have only these:

#define TYPALIGN_CHAR 'c' /* char alignment (i.e. unaligned) */
#define TYPALIGN_SHORT 's' /* short alignment (typically 2 bytes) */
#define TYPALIGN_INT 'i' /* int alignment (typically 4 bytes) */
#define TYPALIGN_DOUBLE 'd' /* double alignment (often 8 bytes) */

On AIX, they are:

#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2

uint64 and pg_lsn use TYPALIGN_DOUBLE. For AIX, they really need a typalign
corresponding to ALIGNOF_LONG. Hence, the C struct layout doesn't match the
tuple layout. Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1;
attrelid │ attname
─────────────────┼──────────────
pg_sequence │ seqstart
pg_sequence │ seqincrement
pg_sequence │ seqmax
pg_sequence │ seqmin
pg_sequence │ seqcache
pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.

+1 to any one of the above. I mildly prefer the first option as that
will allow us to access the value directly instead of going via
SysCacheGetAttr but I am fine either way.

- Introduce a new typalign value suitable for uint64. This is more intrusive,
but it's more future-proof. Looking beyond catalog columns, it might
improve performance by avoiding unaligned reads.

Is it possible to run the test again with the attached patch?

Logs attached. The test "passed", though it printed "poll_query_until timed
out" three times and took awhile.

Thanks for helping in figuring out the problem.

--
With Regards,
Amit Kapila.

#549Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#548)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 04:33:44PM +0900, Masahiko Sawada wrote:

It seems that 0/B0706F72 is not a random value. Two subscriber logs
show the same value. Since 0x70 = 'p', 0x6F = 'o', and 0x72 = 'r', it
might show the next field in the pg_subscription catalog, i.e.,
subconninfo. The subscription is created by "CREATE SUBSCRIPTION sub
CONNECTION 'port=57851 host=/tmp/6u2vRwQYik dbname=postgres'
PUBLICATION pub WITH (disable_on_error = true, streaming = on,
two_phase = on)".

Given that subscription.sql passes, something must be going wrong when we
read the subskiplsn value with code like "sub->skiplsn = subform->subskiplsn;".

That's a good clue. We've never made pg_type.typalign able to represent
alignment as it works on AIX. A uint64 like pg_lsn has 8-byte alignment, so
the C struct follows from that. At the typalign level, we have only these:

#define TYPALIGN_CHAR 'c' /* char alignment (i.e. unaligned) */
#define TYPALIGN_SHORT 's' /* short alignment (typically 2 bytes) */
#define TYPALIGN_INT 'i' /* int alignment (typically 4 bytes) */
#define TYPALIGN_DOUBLE 'd' /* double alignment (often 8 bytes) */

On AIX, they are:

#define ALIGNOF_DOUBLE 4
#define ALIGNOF_INT 4
#define ALIGNOF_LONG 8
/* #undef ALIGNOF_LONG_LONG_INT */
/* #undef ALIGNOF_PG_INT128_TYPE */
#define ALIGNOF_SHORT 2

uint64 and pg_lsn use TYPALIGN_DOUBLE. For AIX, they really need a typalign
corresponding to ALIGNOF_LONG. Hence, the C struct layout doesn't match the
tuple layout. Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1;
attrelid │ attname
─────────────────┼──────────────
pg_sequence │ seqstart
pg_sequence │ seqincrement
pg_sequence │ seqmax
pg_sequence │ seqmin
pg_sequence │ seqcache
pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.

Thanks for helping with the investigation!

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.

+1 to any one of the above. I mildly prefer the first option as that
will allow us to access the value directly instead of going via
SysCacheGetAttr but I am fine either way.

+1. I also prefer the first option.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#550Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#549)
Re: Skipping logical replication transactions on subscriber side

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.

+1 to any one of the above. I mildly prefer the first option as that
will allow us to access the value directly instead of going via
SysCacheGetAttr but I am fine either way.

+1. I also prefer the first option.

Sounds good to me.

#551Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#550)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.
- Move subskiplsn to the CATALOG_VARLEN section, despite its fixed length.

+1 to any one of the above. I mildly prefer the first option as that
will allow us to access the value directly instead of going via
SysCacheGetAttr but I am fine either way.

+1. I also prefer the first option.

Sounds good to me.

I've attached the patch for the first option.

- Introduce a new typalign value suitable for uint64. This is more intrusive,
but it's more future-proof. Looking beyond catalog columns, it might
improve performance by avoiding unaligned reads.

The third option would be a good item for PG16 or later.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

make_subskiplsn_aligned.patch (application/octet-stream)

#552Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#551)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.

--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
Oid			subdbid BKI_LOOKUP(pg_database);	/* Database the
* subscription is in. */
+
+	/*
+	 * All changes finished at this LSN are skipped.
+	 *
+	 * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
+	 * (TYPALIGN_DOUBLE) and it does not match the alignment on some platforms
+	 * such as AIX.  Therefore subskiplsn needs to be placed here so it is
+	 * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

On most hosts, the C alignment of an XLogRecPtr is 8 bytes, and
TYPALIGN_DOUBLE==8. On AIX, C alignment is still 8 bytes, but
TYPALIGN_DOUBLE==4. The tuples on disk and in shared buffers use
TYPALIGN_DOUBLE to decide how much padding to insert, and that amount of
padding needs to match the C alignment padding. Placing the field here
reduces the padding to zero, making that invariant hold trivially.

+	 */
+	XLogRecPtr	subskiplsn;
+
NameData	subname;		/* Name of the subscription */

Oid subowner BKI_LOOKUP(pg_authid); /* Owner of the subscription */
@@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
bool subdisableonerr; /* True if a worker error should cause the
* subscription to be disabled */

- XLogRecPtr subskiplsn; /* All changes finished at this LSN are
- * skipped */

Some code sites list pg_subscription fields in field order. Please update
them so they continue to list fields in field order. CreateSubscription() is
one example.

#553Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#552)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 4, 2022 at 8:01 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.

--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
Oid                     subdbid BKI_LOOKUP(pg_database);        /* Database the
* subscription is in. */
+
+     /*
+      * All changes finished at this LSN are skipped.
+      *
+      * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
+      * (TYPALIGN_DOUBLE) and it does not match the alignment on some platforms
+      * such as AIX.  Therefore subskiplsn needs to be placed here so it is
+      * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

+      */
+     XLogRecPtr      subskiplsn;
+
NameData        subname;                /* Name of the subscription */

Oid subowner BKI_LOOKUP(pg_authid); /* Owner of the subscription */
@@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
bool subdisableonerr; /* True if a worker error should cause the
* subscription to be disabled */

- XLogRecPtr subskiplsn; /* All changes finished at this LSN are
- * skipped */

Some code sites list pg_subscription fields in field order. Please update
them so they continue to list fields in field order. CreateSubscription() is
one example.

Another minor point is that I think it is better to use DatumGetLSN to
read this in GetSubscription as we use LSNGetDatum while storing it. I
am not sure if there is any direct problem due to this but that looks
consistent to me.

--
With Regards,
Amit Kapila.

#554Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#553)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 4, 2022 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Apr 4, 2022 at 8:01 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.

--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
Oid                     subdbid BKI_LOOKUP(pg_database);        /* Database the
* subscription is in. */
+
+     /*
+      * All changes finished at this LSN are skipped.
+      *
+      * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
+      * (TYPALIGN_DOUBLE) and it does not match the alignment on some platforms
+      * such as AIX.  Therefore subskiplsn needs to be placed here so it is
+      * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

+      */
+     XLogRecPtr      subskiplsn;
+
NameData        subname;                /* Name of the subscription */

Oid subowner BKI_LOOKUP(pg_authid); /* Owner of the subscription */
@@ -71,9 +82,6 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
bool subdisableonerr; /* True if a worker error should cause the
* subscription to be disabled */

- XLogRecPtr subskiplsn; /* All changes finished at this LSN are
- * skipped */

Some code sites list pg_subscription fields in field order. Please update
them so they continue to list fields in field order. CreateSubscription() is
one example.

Another minor point is that I think it is better to use DatumGetLSN to
read this in GetSubscription as we use LSNGetDatum while storing it. I
am not sure if there is any direct problem due to this but that looks
consistent to me.

But it seems not consistent with other usages since we don't normally
use DatumGetXXX to get values directly from C struct.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#555Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#554)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 4, 2022 at 8:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Apr 4, 2022 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Another minor point is that I think it is better to use DatumGetLSN to
read this in GetSubscription as we use LSNGetDatum while storing it. I
am not sure if there is any direct problem due to this but that looks
consistent to me.

But it seems not consistent with other usages since we don't normally
use DatumGetXXX to get values directly from C struct.

Okay, I see that for sequences also we don't use it, so we can
probably leave it as it is.

--
With Regards,
Amit Kapila.

#556Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#553)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

On Mon, Apr 4, 2022 at 8:01 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.

--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
Oid                     subdbid BKI_LOOKUP(pg_database);        /* Database the
* subscription is in. */
+
+     /*
+      * All changes finished at this LSN are skipped.
+      *
+      * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
+      * (TYPALIGN_DOUBLE) and it does not match the alignment on some platforms
+      * such as AIX.  Therefore subskiplsn needs to be placed here so it is
+      * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations. If adding such a test would materially delay getting the
buildfarm green, putting the comment in pg_subscription.h works for me.

#557Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#556)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

On Mon, Apr 4, 2022 at 8:01 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 10:28:30AM +0900, Masahiko Sawada wrote:

On Sun, Apr 3, 2022 at 9:45 AM Noah Misch <noah@leadboat.com> wrote:

On Sat, Apr 02, 2022 at 08:44:45PM +0900, Masahiko Sawada wrote:

On Sat, Apr 2, 2022 at 7:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Apr 2, 2022 at 1:43 PM Noah Misch <noah@leadboat.com> wrote:

Some options:
- Move subskiplsn after subdbid, so it's always aligned anyway. I've
confirmed that this lets the test pass, in 44s.

--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -54,6 +54,17 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
Oid                     subdbid BKI_LOOKUP(pg_database);        /* Database the
* subscription is in. */
+
+     /*
+      * All changes finished at this LSN are skipped.
+      *
+      * Note that XLogRecPtr, pg_lsn in the catalog, is 8-byte alignment
+      * (TYPALIGN_DOUBLE) and it does not match the alignment on some platforms
+      * such as AIX.  Therefore subskiplsn needs to be placed here so it is
+      * always aligned.

I'm reading this comment as saying that TYPALIGN_DOUBLE is always 8 bytes, but
the problem arises precisely because TYPALIGN_DOUBLE==4 on AIX.

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#558Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#557)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Something like this:

select
attrelid::regclass,
attname,
array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum order by pa.attnum) AS types_before,
(select sum(attlen)
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │ attname │ types_before │ len_before
─────────────────┼──────────────┼─────────────────────────────────────────────┼────────────
pg_sequence │ seqstart │ {oid,oid} │ 8
pg_sequence │ seqincrement │ {oid,oid,int8} │ 16
pg_sequence │ seqmax │ {oid,oid,int8,int8} │ 24
pg_sequence │ seqmin │ {oid,oid,int8,int8,int8} │ 32
pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} │ 40
pg_subscription │ subskiplsn │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.

#559Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#558)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 5, 2022 at 9:21 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Something like this:

select
attrelid::regclass,
attname,
array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum order by pa.attnum) AS types_before,
(select sum(attlen)
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │ attname │ types_before │ len_before
─────────────────┼──────────────┼─────────────────────────────────────────────┼────────────
pg_sequence │ seqstart │ {oid,oid} │ 8
pg_sequence │ seqincrement │ {oid,oid,int8} │ 16
pg_sequence │ seqmax │ {oid,oid,int8,int8} │ 24
pg_sequence │ seqmin │ {oid,oid,int8,int8,int8} │ 32
pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} │ 40
pg_subscription │ subskiplsn │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.

Yes, in this case, we can detect the violated column order even
without considering padding. On the other hand, I think this
calculation could not detect some patterns of order. For instance,
suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
int8}, the len_before is 16 but offset of int8 column including
padding is 20 on ALIGNOF_DOUBLE==4 environment.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#560Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#559)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 9:21 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Something like this:

select
attrelid::regclass,
attname,
array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum order by pa.attnum) AS types_before,
(select sum(attlen)
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │ attname │ types_before │ len_before
─────────────────┼──────────────┼─────────────────────────────────────────────┼────────────
pg_sequence │ seqstart │ {oid,oid} │ 8
pg_sequence │ seqincrement │ {oid,oid,int8} │ 16
pg_sequence │ seqmax │ {oid,oid,int8,int8} │ 24
pg_sequence │ seqmin │ {oid,oid,int8,int8,int8} │ 32
pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} │ 40
pg_subscription │ subskiplsn │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.

Yes, in this case, we can detect the violated column order even
without considering padding. On the other hand, I think this
calculation could not detect some patterns of order. For instance,
suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
int8}, the len_before is 16 but offset of int8 column including
padding is 20 on ALIGNOF_DOUBLE==4 environment.

Correct. Feel free to make it more precise. If you do want to add a
function, it could be a regress.c function rather than an always-installed
part of PostgreSQL. Again, getting the buildfarm green is a priority; we can
always add tests later.

#561Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#560)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 5, 2022 at 10:46 AM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 9:21 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Something like this:

select
attrelid::regclass,
attname,
array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum order by pa.attnum) AS types_before,
(select sum(attlen)
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │ attname │ types_before │ len_before
─────────────────┼──────────────┼─────────────────────────────────────────────┼────────────
pg_sequence │ seqstart │ {oid,oid} │ 8
pg_sequence │ seqincrement │ {oid,oid,int8} │ 16
pg_sequence │ seqmax │ {oid,oid,int8,int8} │ 24
pg_sequence │ seqmin │ {oid,oid,int8,int8,int8} │ 32
pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} │ 40
pg_subscription │ subskiplsn │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.

Yes, in this case, we can detect the violated column order even
without considering padding. On the other hand, I think this
calculation could not detect some patterns of order. For instance,
suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
int8}, the len_before is 16 but offset of int8 column including
padding is 20 on ALIGNOF_DOUBLE==4 environment.

Correct. Feel free to make it more precise. If you do want to add a
function, it could be a regress.c function rather than an always-installed
part of PostgreSQL. Again, getting the buildfarm green is a priority; we can
always add tests later.

Agreed. I'll update and submit the patch as soon as possible.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#562Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#561)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 5, 2022 at 12:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Apr 5, 2022 at 10:46 AM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 10:13:06AM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 9:21 AM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 06:55:45PM +0900, Masahiko Sawada wrote:

On Mon, Apr 4, 2022 at 3:26 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 04, 2022 at 08:20:08AM +0530, Amit Kapila wrote:

How about a comment like: "It has to be kept at 8-byte alignment
boundary so as to be accessed directly via C struct as it uses
TYPALIGN_DOUBLE for storage which has 4-byte alignment on platforms
like AIX."? Can you please suggest a better comment if you don't like
this one?

I'd write it like this, though I'm not sure it's an improvement on your words:

When ALIGNOF_DOUBLE==4 (e.g. AIX), the C ABI may impose 8-byte alignment on
some of the C types that correspond to TYPALIGN_DOUBLE SQL types. To ensure
catalog C struct layout matches catalog tuple layout, arrange for the tuple
offset of each fixed-width, attalign='d' catalog column to be divisible by 8
unconditionally. Keep such columns before the first NameData column of the
catalog, since packagers can override NAMEDATALEN to an odd number.

Thanks!

The best place for such a comment would be in one of
src/test/regress/sql/*sanity*.sql, next to a test written to detect new
violations.

Agreed.

IIUC in the new test, we would need a new SQL function to calculate
the offset of catalog columns including padding, is that right? Or do
you have an idea to do that by using existing functionality?

Something like this:

select
attrelid::regclass,
attname,
array(select typname
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum order by pa.attnum) AS types_before,
(select sum(attlen)
from pg_type t join pg_attribute pa on t.oid = pa.atttypid
where pa.attrelid = a.attrelid and pa.attnum > 0 and pa.attnum < a.attnum) AS len_before
from pg_attribute a
join pg_class c on c.oid = attrelid
where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1
order by attrelid::regclass::text, attnum;
attrelid │ attname │ types_before │ len_before
─────────────────┼──────────────┼─────────────────────────────────────────────┼────────────
pg_sequence │ seqstart │ {oid,oid} │ 8
pg_sequence │ seqincrement │ {oid,oid,int8} │ 16
pg_sequence │ seqmax │ {oid,oid,int8,int8} │ 24
pg_sequence │ seqmin │ {oid,oid,int8,int8,int8} │ 32
pg_sequence │ seqcache │ {oid,oid,int8,int8,int8,int8} │ 40
pg_subscription │ subskiplsn │ {oid,oid,name,oid,bool,bool,bool,char,bool} │ 81
(6 rows)

That doesn't count padding, but hazardous column changes will cause a diff in
the output.

Yes, in this case, we can detect the violated column order even
without considering padding. On the other hand, I think this
calculation could not detect some patterns of order. For instance,
suppose the column order is {oid, bool, bool, oid, bool, bool, oid,
int8}, the len_before is 16 but offset of int8 column including
padding is 20 on ALIGNOF_DOUBLE==4 environment.

Correct. Feel free to make it more precise. If you do want to add a
function, it could be a regress.c function rather than an always-installed
part of PostgreSQL. Again, getting the buildfarm green is a priority; we can
always add tests later.

Agreed. I'll update and submit the patch as soon as possible.

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

make_subskiplsn_aligned_v2.patch (application/octet-stream)
#563Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#562)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

When the context of a "git grep skiplsn" match involves several struct fields
in struct order, please change to the new order. In other words, do for all
"git grep skiplsn" matches what the v2 patch does in GetSubscription(). The
v2 patch does not do this for catalogs.sgml, but it ought to. I didn't check
all the other "git grep" matches; please do so.

The changes present in this patch all look good.

#564Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#563)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 5, 2022 at 4:08 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

When the context of a "git grep skiplsn" match involves several struct fields
in struct order, please change to the new order. In other words, do for all
"git grep skiplsn" matches what the v2 patch does in GetSubscription(). The
v2 patch does not do this for catalogs.sgml, but it ought to. I didn't check
all the other "git grep" matches; please do so.

Oops, I missed many places. I checked all "git grep" matches and fixed them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

make_subskiplsn_aligned_v3.patch (application/x-patch)
#565Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#564)
Re: Skipping logical replication transactions on subscriber side

On Tue, Apr 05, 2022 at 04:41:28PM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 4:08 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

When the context of a "git grep skiplsn" match involves several struct fields
in struct order, please change to the new order. In other words, do for all
"git grep skiplsn" matches what the v2 patch does in GetSubscription(). The
v2 patch does not do this for catalogs.sgml, but it ought to. I didn't check
all the other "git grep" matches; please do so.

Oops, I missed many places. I checked all "git grep" matches and fixed them.

--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1285,8 +1285,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-              substream, subtwophasestate, subdisableonerr, subskiplsn, subslotname,
+GRANT SELECT (oid, subdbid, subname, subskiplsn, subowner, subenabled,
+              subbinary, substream, subtwophasestate, subdisableonerr, subslotname,
subsynccommit, subpublications)

subskiplsn comes before subname. Other than that, this looks done. I
recommend committing it with that change.

#566Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#565)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Wed, Apr 6, 2022 at 12:21 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 04:41:28PM +0900, Masahiko Sawada wrote:

On Tue, Apr 5, 2022 at 4:08 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Apr 05, 2022 at 03:05:10PM +0900, Masahiko Sawada wrote:

I've attached an updated patch. The patch includes a regression test
to detect the new violation as we discussed. I've confirmed that
Cirrus CI tests pass. Please confirm on AIX and review the patch.

When the context of a "git grep skiplsn" match involves several struct fields
in struct order, please change to the new order. In other words, do for all
"git grep skiplsn" matches what the v2 patch does in GetSubscription(). The
v2 patch does not do this for catalogs.sgml, but it ought to. I didn't check
all the other "git grep" matches; please do so.

Oops, I missed many places. I checked all "git grep" matches and fixed them.

--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1285,8 +1285,8 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
-- All columns of pg_subscription except subconninfo are publicly readable.
REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-              substream, subtwophasestate, subdisableonerr, subskiplsn, subslotname,
+GRANT SELECT (oid, subdbid, subname, subskiplsn, subowner, subenabled,
+              subbinary, substream, subtwophasestate, subdisableonerr, subslotname,
subsynccommit, subpublications)

subskiplsn comes before subname.

Right. I've attached an updated patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

make_subskiplsn_aligned_v4.patch (application/octet-stream)
#567Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#566)
Re: Skipping logical replication transactions on subscriber side

On Wed, Apr 6, 2022 at 9:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 6, 2022 at 12:21 PM Noah Misch <noah@leadboat.com> wrote:

Right. I've attached an updated patch.

Thanks, this looks good to me as well. Noah, would you like to commit it?

--
With Regards,
Amit Kapila.

#568Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Noah Misch (#547)
Re: Skipping logical replication transactions on subscriber side

On 02.04.22 10:13, Noah Misch wrote:

uint64 and pg_lsn use TYPALIGN_DOUBLE. For AIX, they really need a typalign
corresponding to ALIGNOF_LONG. Hence, the C struct layout doesn't match the
tuple layout. Columns potentially affected:

[local] test=*# select attrelid::regclass, attname from pg_attribute a join pg_class c on c.oid = attrelid where attalign = 'd' and relkind = 'r' and attnotnull and attlen <> -1;
    attrelid     │   attname
─────────────────┼──────────────
 pg_sequence     │ seqstart
 pg_sequence     │ seqincrement
 pg_sequence     │ seqmax
 pg_sequence     │ seqmin
 pg_sequence     │ seqcache
 pg_subscription │ subskiplsn
(6 rows)

The pg_sequence fields evade trouble, because there's exactly eight bytes (two
oids) before them.

Yes, we carefully did this when we ran into this the last time. See
</messages/by-id/76ce2ca3-40f2-d291-eae2-17b599f29ba0@2ndquadrant.com>
and commit f3b421da5f4addc95812b9db05a24972b8fd9739.

#569Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#567)
Re: Skipping logical replication transactions on subscriber side

On Wed, Apr 6, 2022 at 10:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Apr 6, 2022 at 9:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Apr 6, 2022 at 12:21 PM Noah Misch <noah@leadboat.com> wrote:

Right. I've attached an updated patch.

Thanks, this looks good to me as well. Noah, would you like to commit it?

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

--
With Regards,
Amit Kapila.

#570Amit Kapila
Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#569)
Re: Skipping logical replication transactions on subscriber side

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

--
With Regards,
Amit Kapila.

#571Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#570)
Re: Skipping logical replication transactions on subscriber side

On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

Thanks!

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#572Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#571)
1 attachment(s)
Re: Skipping logical replication transactions on subscriber side

On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:

On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

Thanks!

I took a closer look at the test case. The "get_column_offset(coltypes) % 8"
part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
platform. Instead of testing the start of the typalign='d' column, let's test
the first offset beyond the previous column. The difference between those two
values depends on ALIGNOF_DOUBLE. While there, ignore typbyval; it doesn't
affect disk tuple layout, so this test shouldn't care. I plan to push the
attached patch.
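
To make the difference between the two checks concrete, here is a small Python sketch. It is a stand-in for the idea only, not the regress.c function or the SQL in sanity_check. It assumes the default 64-byte NameData and uses the pre-fix pg_subscription column types from the query output earlier in the thread; end_of_previous and start_of_d_column are hypothetical names.

```python
# (size, alignment) for the types preceding subskiplsn in the old layout.
# Assumption: NAMEDATALEN is the default 64 and name has 1-byte alignment.
TYPES = {"bool": (1, 1), "char": (1, 1), "oid": (4, 4), "name": (64, 1)}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def end_of_previous(types_before):
    """First offset beyond the column that precedes the typalign='d' column."""
    offset = 0
    for typname in types_before:
        size, alignment = TYPES[typname]
        offset = align_up(offset, alignment) + size
    return offset


def start_of_d_column(types_before, alignof_double):
    """Where the typalign='d' column starts in the tuple on this platform."""
    return align_up(end_of_previous(types_before), alignof_double)


old_layout = ["oid", "oid", "name", "oid", "bool", "bool", "bool", "char", "bool"]

for alignof_double in (4, 8):
    start_check = start_of_d_column(old_layout, alignof_double) % 8 != 0
    end_check = end_of_previous(old_layout) % 8 != 0
    print(alignof_double, start_check, end_check)
# 4 True True
# 8 False True
```

On an ALIGNOF_DOUBLE==8 platform the start of the column is always a multiple of 8, so the first check cannot fire there; the end-of-previous-column check gives the same answer on both platforms.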

Attachments:

sanity_check-skiplsn-v1.patch (text/plain; charset=us-ascii)
#573Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#572)
Re: Skipping logical replication transactions on subscriber side

On Fri, Apr 15, 2022 at 4:26 PM Noah Misch <noah@leadboat.com> wrote:

On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:

On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

Thanks!

I took a closer look at the test case. The "get_column_offset(coltypes) % 8"
part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
platform. Instead of testing the start of the typalign='d' column, let's test
the first offset beyond the previous column. The difference between those two
values depends on ALIGNOF_DOUBLE.

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.
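
A quick way to check this example by hand, sketched in Python with the usual sizes assumed (oid 4 bytes, bool 1 byte); the variable names are made up for illustration:

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# {oid, bool, XLogRecPtr}: oid occupies bytes 0..3, bool occupies byte 4.
end_of_previous = 4 + 1

# The revised test looks at the first offset beyond the previous column.
flagged = end_of_previous % 8 != 0

# Actual start of the 8-byte column for each possible ALIGNOF_DOUBLE.
starts = {alignment: align_up(end_of_previous, alignment) for alignment in (4, 8)}

print(flagged)  # True
print(starts)   # {4: 8, 8: 8}
```

The column starts at offset 8 either way, so the layouts agree, yet the row is still reported because 5 is not a multiple of 8.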

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#574Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Masahiko Sawada (#573)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 18, 2022 at 10:45:50AM +0900, Masahiko Sawada wrote:

On Fri, Apr 15, 2022 at 4:26 PM Noah Misch <noah@leadboat.com> wrote:

On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:

On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

Thanks!

I took a closer look at the test case. The "get_column_offset(coltypes) % 8"
part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
platform. Instead of testing the start of the typalign='d' column, let's test
the first offset beyond the previous column. The difference between those two
values depends on ALIGNOF_DOUBLE.

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout. If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".

#575Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Noah Misch (#574)
Re: Skipping logical replication transactions on subscriber side

On Mon, Apr 18, 2022 at 12:22 PM Noah Misch <noah@leadboat.com> wrote:

On Mon, Apr 18, 2022 at 10:45:50AM +0900, Masahiko Sawada wrote:

On Fri, Apr 15, 2022 at 4:26 PM Noah Misch <noah@leadboat.com> wrote:

On Thu, Apr 07, 2022 at 08:39:58PM +0900, Masahiko Sawada wrote:

On Thu, Apr 7, 2022 at 7:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Apr 7, 2022 at 8:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

I'll take care of this today. I think we can mark the new function
get_column_offset() being introduced by this patch as parallel safe.

Pushed.

Thanks!

I took a closer look at the test case. The "get_column_offset(coltypes) % 8"
part would have caught the problem only when run on an ALIGNOF_DOUBLE==4
platform. Instead of testing the start of the typalign='d' column, let's test
the first offset beyond the previous column. The difference between those two
values depends on ALIGNOF_DOUBLE.

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout. If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".

That makes sense.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#576Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#574)
Re: Skipping logical replication transactions on subscriber side

On Sun, Apr 17, 2022 at 11:22 PM Noah Misch <noah@leadboat.com> wrote:

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout. If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".

I realize that it was necessary to get something committed quickly
here to unbreak the buildfarm, but this is really a mess. As I
understand it, the problem here is that typalign='d' is either 4 bytes
or 8 depending on how the 'double' type is aligned on that platform,
but we use that typalign value also for some other data types that may
not be aligned in the same way as 'double'. Consequently, it's
possible to have a situation where the behavior of the C compiler
diverges from the behavior of heap_form_tuple(). To avoid that, we
need every catalog column that uses typalign=='d' to begin on an
8-byte boundary. We also want all such columns to occur before the
first NameData column in the catalog, to guard against the possibility
that NAMEDATALEN has been redefined to an odd value. I think this set
of constraints is a nuisance and that it's mostly good luck we haven't
run into any really awkward problems here so far.
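
To spell out that divergence, here is a small Python model. It is only a sketch under AIX-like assumptions (the compiler aligns 8-byte integers to 8 bytes, while typalign 'd' follows ALIGNOF_DOUBLE == 4) and it assumes the default 64-byte NameData. COLS and offset_of_last are hypothetical names for this example.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# (size, alignment). None marks the 8-byte column whose alignment depends on
# who lays it out: the C compiler or the tuple-forming code.
COLS = {
    "oid": (4, 4),
    "name": (64, 1),   # assumption: default NAMEDATALEN
    "bool": (1, 1),
    "char": (1, 1),
    "lsn": (8, None),
}


def offset_of_last(columns, eight_byte_alignment):
    """Offset at which the last listed column starts."""
    offset = 0
    for col in columns:
        size, alignment = COLS[col]
        if alignment is None:
            alignment = eight_byte_alignment
        start = align_up(offset, alignment)
        offset = start + size
    return start


# Column types of pg_subscription up to subskiplsn, before and after the fix.
old = ["oid", "oid", "name", "oid", "bool", "bool", "bool", "char", "bool", "lsn"]
new = ["oid", "oid", "lsn"]

for label, layout in (("old", old), ("new", new)):
    c_struct = offset_of_last(layout, 8)   # compiler: 8-byte aligned
    tuple_off = offset_of_last(layout, 4)  # tuple: typalign 'd' == 4
    print(label, c_struct, tuple_off)
# old 88 84
# new 8 8
```

With the old order the struct member and the tuple data sit four bytes apart; with the 8-byte column right after the two oids, both agree.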

In many of our catalogs, the first member is an OID and the second
member of the struct is of type NameData: pg_namespace, pg_class,
pg_proc, etc. That common design pattern is in direct contradiction to
the desires of this test case. As soon as someone wants to add a
typalign='d' member to any of those system catalogs, the struct layout
is going to have to get shuffled around -- and then it will look
different from all the other ones. Or else we'd have to rearrange them
all to move all the NameData columns to the end. I feel like it's
weird to introduce a test case that so obviously flies in the face of
how catalog layout has been done up to this point, especially for the
sake of a hypothetical user who wants to set NAMEDATALEN to an odd
number. I doubt such scenarios have been thoroughly tested, or ever
will be. Perhaps instead we ought to legislate that NAMEDATALEN must
be a multiple of 8, or some such thing.
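
For a sense of how NAMEDATALEN enters into this, here is a minimal Python sketch. It assumes a catalog that starts with two oid columns, then a NameData column with 1-byte alignment, then an 8-byte column, on a platform where the compiler aligns that column to 8 but typalign 'd' means 4. The value 100 is just a hypothetical non-default setting, and offsets_after_name is a made-up helper.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def offsets_after_name(namedatalen):
    """Return (C struct offset, tuple offset) of an 8-byte column after
    {oid, oid, name}, under the assumptions stated above."""
    end = 4 + 4 + namedatalen
    return align_up(end, 8), align_up(end, 4)


for namedatalen in (64, 100):
    print(namedatalen, offsets_after_name(namedatalen))
# 64 (72, 72)
# 100 (112, 108)
```

With the default of 64 the two layouts happen to agree after two oids; once the name length is not a multiple of 8, they drift apart.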

The other constraint, that typalign='d' fields must always fall on an
8 byte boundary, is probably less annoying in practice, but it's easy
to imagine a future catalog running into trouble. Let's say we want to
introduce a new catalog that has only an Oid column and a float8
column. Perhaps with 0-3 bool or uint8 columns as well, or with any
number of NameData columns as well. Well, the only way to satisfy this
constraint is to put the float8 column first and the Oid column after
it, which immediately makes it look different from every other catalog
we have. It's hard to feel like that would be a good solution here. I
think we ought to try to engineer a solution where heap_form_tuple()
is going to do the same thing as the C compiler without the sorts of
extra rules that this test case enforces.

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

I also don't like the fact that the test case doesn't even catch
exactly the problematic set of cases, but rather a superset, leaving
it up to future patch authors to make a correct judgment about whether
a certain new column can be listed as an expected output of the test
case or whether the catalog representation must be changed. The idea
that we'll reliably get that right might be optimistic. Again, I don't
mean to say that this is the fault of this test case since, without
the test case, we'd have no idea that there was even a potential
problem, which would not be better. But it feels to me like we're
hacking around the real problem instead of fixing it, and it seems to
me that we should try to do better.

--
Robert Haas
EDB: http://www.enterprisedb.com

#577Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#576)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jun 13, 2022 at 11:25 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Apr 17, 2022 at 11:22 PM Noah Misch <noah@leadboat.com> wrote:

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout. If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".

I realize that it was necessary to get something committed quickly
here to unbreak the buildfarm, but this is really a mess. As I
understand it, the problem here is that typalign='d' is either 4 bytes
or 8 depending on how the 'double' type is aligned on that platform,
but we use that typalign value also for some other data types that may
not be aligned in the same way as 'double'. Consequently, it's
possible to have a situation where the behavior of the C compiler
diverges from the behavior of heap_form_tuple(). To avoid that, we
need every catalog column that uses typalign=='d' to begin on an
8-byte boundary. We also want all such columns to occur before the
first NameData column in the catalog, to guard against the possibility
that NAMEDATALEN has been redefined to an odd value. I think this set
of constraints is a nuisance and that it's mostly good luck we haven't
run into any really awkward problems here so far.

In many of our catalogs, the first member is an OID and the second
member of the struct is of type NameData: pg_namespace, pg_class,
pg_proc, etc. That common design pattern is in direct contradiction to
the desires of this test case. As soon as someone wants to add a
typalign='d' member to any of those system catalogs, the struct layout
is going to have to get shuffled around -- and then it will look
different from all the other ones. Or else we'd have to rearrange them
all to move all the NameData columns to the end. I feel like it's
weird to introduce a test case that so obviously flies in the face of
how catalog layout has been done up to this point, especially for the
sake of a hypothetical user who wants to set NAMEDATALEN to an odd
number. I doubt such scenarios have been thoroughly tested, or ever
will be. Perhaps instead we ought to legislate that NAMEDATALEN must
be a multiple of 8, or some such thing.

The other constraint, that typalign='d' fields must always fall on an
8 byte boundary, is probably less annoying in practice, but it's easy
to imagine a future catalog running into trouble. Let's say we want to
introduce a new catalog that has only an Oid column and a float8
column. Perhaps with 0-3 bool or uint8 columns as well, or with any
number of NameData columns as well. Well, the only way to satisfy this
constraint is to put the float8 column first and the Oid column after
it, which immediately makes it look different from every other catalog
we have. It's hard to feel like that would be a good solution here. I
think we ought to try to engineer a solution where heap_form_tuple()
is going to do the same thing as the C compiler without the sorts of
extra rules that this test case enforces.

These seem to be valid concerns.

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

Introducing new typalign values seems a good idea to me as it's more
future-proof. This item would be for PG16, right? The main concern
seems to be that what this test case enforces would be a nuisance when
introducing a new system catalog or adding a new column to an existing
catalog, but given that we're post PG15-beta1, that is unlikely to
happen in PG15.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#578Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#577)
Re: Skipping logical replication transactions on subscriber side

On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

Introducing new typalign values seems a good idea to me as it's more
future-proof. This item would be for PG16, right? The main concern
seems to be that what this test case enforces would be a nuisance when
introducing a new system catalog or adding a new column to an existing
catalog, but given that we're post PG15-beta1, that is unlikely to
happen in PG15.

I agree that we're not likely to introduce a new typalign value any
sooner than v16. There are a couple of things that bother me about
that solution. One is that I don't know how many different behaviors
exist out there in the wild. If we distinguish the alignment of double
from the alignment of int8, is that good enough, or are there other
data types whose properties aren't necessarily the same as either of
those? The other is that 32-bit systems are already relatively rare
and probably will become more rare until they disappear completely. It
doesn't seem like a ton of fun to engineer solutions to problems that
may go away by themselves with the passage of time. On the other hand,
if the alternative is to live with this kind of ugliness for another 5
years, maybe the time it takes to craft a solution is effort well
spent.

--
Robert Haas
EDB: http://www.enterprisedb.com

#579Masahiko Sawada
Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#578)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jun 16, 2022 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

Introducing new typalign values seems a good idea to me as it's more
future-proof. This item would be for PG16, right? The main concern
seems to be that what this test case enforces would be a nuisance when
introducing a new system catalog or adding a new column to an existing
catalog, but given that we're post PG15-beta1, that is unlikely to
happen in PG15.

I agree that we're not likely to introduce a new typalign value any
sooner than v16. There are a couple of things that bother me about
that solution. One is that I don't know how many different behaviors
exist out there in the wild. If we distinguish the alignment of double
from the alignment of int8, is that good enough, or are there other
data types whose properties aren't necessarily the same as either of
those?

Yeah, there might be.

The other is that 32-bit systems are already relatively rare
and probably will become more rare until they disappear completely. It
doesn't seem like a ton of fun to engineer solutions to problems that
may go away by themselves with the passage of time.

IIUC the systems affected by this problem are not necessarily 32-bit
systems. For instance, hoverfly on the buildfarm is a 64-bit system but
was affected by this problem. According to the XLC manual[1], there is
no difference between 32-bit systems and 64-bit systems in terms of
alignment for double. FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.

Regards,

[1]: https://support.scinet.utoronto.ca/Manuals/xlC++-proguide.pdf; Table 11 on page 10.

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#580Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#579)
Re: Skipping logical replication transactions on subscriber side

On Thu, Jun 16, 2022 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.

Well if that can work it sure seems better.

--
Robert Haas
EDB: http://www.enterprisedb.com

#581Peter Eisentraut
Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Robert Haas (#580)
Re: Skipping logical replication transactions on subscriber side

On 16.06.22 18:35, Robert Haas wrote:

On Thu, Jun 16, 2022 at 3:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.

Well if that can work it sure seems better.

That means changing the system's ABI, so in the extreme case you then
need to compile everything else to match as well.

#582Robert Haas
Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#581)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jun 20, 2022 at 9:52 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

That means changing the system's ABI, so in the extreme case you then
need to compile everything else to match as well.

I think we wouldn't want to do that in a minor release, but doing it
in a new major release seems fine -- especially if only AIX is
affected.

--
Robert Haas
EDB: http://www.enterprisedb.com

#583Noah Misch
Noah Misch
noah@leadboat.com
In reply to: Robert Haas (#582)
Re: Skipping logical replication transactions on subscriber side

On Mon, Jun 13, 2022 at 10:25:24AM -0400, Robert Haas wrote:

On Sun, Apr 17, 2022 at 11:22 PM Noah Misch <noah@leadboat.com> wrote:

Yes, but it could produce false positives in some cases. For instance,
the column order {oid, bool, XLogRecPtr} should be okay on both
ALIGNOF_DOUBLE == 4 and 8 platforms, but the new test fails.

I'm happy with that, because the affected author should look for padding-free
layouts before settling on your example layout. If the padding-free layouts
are all unacceptable, the author should update the expected sanity_check.out
to show the one row where the test "fails".

Perhaps instead we ought to legislate that NAMEDATALEN must
be a multiple of 8, or some such thing.

The other constraint, that typalign='d' fields must always fall on an
8 byte boundary, is probably less annoying in practice, but it's easy
to imagine a future catalog running into trouble. Let's say we want to
introduce a new catalog that has only an Oid column and a float8
column. Perhaps with 0-3 bool or uint8 columns as well, or with any
number of NameData columns as well. Well, the only way to satisfy this
constraint is to put the float8 column first and the Oid column after
it, which immediately makes it look different from every other catalog
we have.

AFAICS, we could do that by:

1. De-supporting platforms that have this problem, or
2. Introducing new typalign values, as Noah proposed back on April 2, or
3. Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms

On Mon, Jun 20, 2022 at 10:04:06AM -0400, Robert Haas wrote:

On Mon, Jun 20, 2022 at 9:52 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

That means changing the system's ABI, so in the extreme case you then
need to compile everything else to match as well.

I think we wouldn't want to do that in a minor release, but doing it
in a new major release seems fine -- especially if only AIX is
affected.

"Everything" isn't limited to PostgreSQL. The Perl ABI exposes large structs
to plperl; a field of type double could require the AIX user to rebuild Perl
with the same compiler option.

Overall, this could be a textbook example of choosing between:

- Mild harm (unaesthetic column order) to many people.
- Considerable harm (dump/reload instead of pg_upgrade) to a small, unknown,
possibly-zero quantity of people.

Here's how I rank the options, from most-preferred to least-preferred:

1. Put new eight-byte fields at the front of each catalog, when in doubt.
2. On systems where double alignment differs from int64 alignment, require
NAMEDATALEN%8==0. Upgrading to v16 would require dump/reload for AIX users
changing NAMEDATALEN to conform to the new restriction.
3. Introduce new typalign values. Upgrading to v16 would require dump/reload
for all AIX users.
4. De-support AIX.
5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms".
Upgrading to v16 would require dump/reload for all AIX users.
6. Require -qalign=natural on AIX. Upgrading to v16 would require dump/reload
and possible system library rebuilds for all AIX users.

I gather (1) isn't at the top of your ranking, or you wouldn't have written
in. What do you think of (2)?

#584 Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#583)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 22, 2022 at 12:28 AM Noah Misch <noah@leadboat.com> wrote:

"Everything" isn't limited to PostgreSQL. The Perl ABI exposes large structs
to plperl; a field of type double could require the AIX user to rebuild Perl
with the same compiler option.

Oh, that isn't so great, then.

Here's how I rank the options, from most-preferred to least-preferred:

1. Put new eight-byte fields at the front of each catalog, when in doubt.
2. On systems where double alignment differs from int64 alignment, require
NAMEDATALEN%8==0. Upgrading to v16 would require dump/reload for AIX users
changing NAMEDATALEN to conform to the new restriction.
3. Introduce new typalign values. Upgrading to v16 would require dump/reload
for all AIX users.
4. De-support AIX.
5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms".
Upgrading to v16 would require dump/reload for all AIX users.
6. Require -qalign=natural on AIX. Upgrading to v16 would require dump/reload
and possible system library rebuilds for all AIX users.

I gather (1) isn't at the top of your ranking, or you wouldn't have written
in. What do you think of (2)?

(2) pleases me in the sense that it seems to inconvenience very few
people, perhaps no one, in order to avoid inconveniencing a larger
number of people. However, it doesn't seem sufficient. If I understand
correctly, even a catalog that includes no NameData column can have a
problem.

Regarding (1), it is my opinion that the only real value of typalign
is for system catalogs, and specifically that it lets you put the
fields in an order that is aesthetically pleasing rather than worrying
about alignment considerations. After all, if we just ordered the
fields by descending alignment requirement, we could get rid of
typalign altogether (at least, if we didn't care about backward
compatibility). User tables would get smaller because we'd get rid of
alignment padding, and I don't think we'd see much impact on
performance because, for user tables, we copy the values into a datum
array before doing anything interesting with them. So (1) seems to me
to be conceding that typalign is unfit for the only purpose it has.
Perhaps that's just how things are, but it doesn't seem like a good
way for things to be.
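
The padding argument above can be illustrated with a small sketch. It is a simplified model, not PostgreSQL code: it ignores the tuple header, null bitmap, and variable-length columns, and the names ColumnSpec, align_up, and row_data_size are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column descriptor: fixed byte length and required alignment. */
typedef struct
{
    size_t len;
    size_t align;               /* 1, 2, 4 or 8 */
} ColumnSpec;

/* Round off up to the next multiple of align (a power of two). */
static size_t
align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Bytes of column data used when the columns are stored in the given order. */
static size_t
row_data_size(const ColumnSpec *cols, size_t ncols)
{
    size_t off = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        off = align_up(off, cols[i].align);     /* padding, if any */
        off += cols[i].len;
    }
    return off;
}
```

For one 1-byte, one 4-byte, and two 8-byte columns, the order (1, 8, 4, 8) needs 32 bytes in this model, while the descending-alignment order (8, 8, 4, 1) needs 21, because no column ever needs padding in front of it.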

--
Robert Haas
EDB: http://www.enterprisedb.com

#585 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#584)
Re: Skipping logical replication transactions on subscriber side

[ sorry for not having tracked this thread more closely ... ]

Robert Haas <robertmhaas@gmail.com> writes:

Regarding (1), it is my opinion that the only real value of typalign
is for system catalogs, and specifically that it lets you put the
fields in an order that is aesthetically pleasing rather than worrying
about alignment considerations. After all, if we just ordered the
fields by descending alignment requirement, we could get rid of
typalign altogether (at least, if we didn't care about backward
compatibility). User tables would get smaller because we'd get rid of
alignment padding, and I don't think we'd see much impact on
performance because, for user tables, we copy the values into a datum
array before doing anything interesting with them. So (1) seems to me
to be conceding that typalign is unfit for the only purpose it has.

That's a fundamental misreading of the situation. typalign is essential
on alignment-picky architectures, else you will get a SIGBUS fault
when trying to fetch a multibyte value (whether it's just going to get
stored into a Datum array is not very relevant here).

It appears that what we've got on AIX is that typalign 'd' overstates the
actual alignment requirement for 'double', which is safe from the SIGBUS
angle. However, it is a problem for our usage with system catalogs,
where our C struct declarations may not line up with the way that a
tuple is constructed by the tuple assembly routines.

I concur that Noah's description of #2 is not an accurate statement
of the rules we'd have to impose to be sure that the C structs line up
with the actual tuple layouts. I don't think we want rules exactly;
what we need is mechanical verification that the field orderings in
use are safe. The last time I looked at this thread, what was being
discussed was (a) re-ordering pg_subscription's columns and (b)
adding some kind of regression test to verify that all catalogs meet
the expectation of 'd'-aligned fields not needing alignment padding
that an AIX compiler might choose not to insert. That still seems
like the most plausible answer to me. I don't especially want to
invent an additional typalign code that we could only test on legacy
platforms.
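
As a rough sketch of what such a mechanical check could look like (this is not the test that was actually committed; CatalogColumn and catalog_layout_is_safe are hypothetical names, and the alignments of 2 and 4 bytes for 's' and 'i' are assumed typical values):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one fixed-width catalog column. */
typedef struct
{
    const char *name;
    size_t      len;
    char        typalign;       /* 'c', 's', 'i' or 'd' */
} CatalogColumn;

/* Round off up to the next multiple of align (a power of two). */
static size_t
align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Return true if every 'd'-aligned column already starts on an 8-byte
 * boundary, i.e. the layout never relies on padding before such a column.
 */
static bool
catalog_layout_is_safe(const CatalogColumn *cols, size_t ncols)
{
    size_t off = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        switch (cols[i].typalign)
        {
            case 'c':
                break;
            case 's':
                off = align_up(off, 2);     /* assumed short alignment */
                break;
            case 'i':
                off = align_up(off, 4);     /* assumed int alignment */
                break;
            case 'd':
                if (off % 8 != 0)
                    return false;           /* padding would be needed here */
                break;
            default:
                assert(!"unknown typalign");
        }
        off += cols[i].len;
    }
    return true;
}
```

Under this model, a 4-byte column followed directly by a 'd' column is flagged, while two 4-byte columns followed by the 'd' column pass, since the 'd' column then starts at offset 8 with no padding.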

regards, tom lane

#586 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#585)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 22, 2022 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That's a fundamental misreading of the situation. typalign is essential
on alignment-picky architectures, else you will get a SIGBUS fault
when trying to fetch a multibyte value (whether it's just going to get
stored into a Datum array is not very relevant here).

I mean, that problem is easily worked around. Maybe you think memcpy
would be a lot slower than a direct assignment, but "essential" is a
strong word.
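
For illustration, the memcpy workaround amounts to something like the following sketch. The function name fetch_int64_unaligned is made up for this example and is not an existing PostgreSQL routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Fetch a 64-bit integer from an address that may not be 8-byte aligned.
 * Copying byte-wise into a properly aligned local avoids the misaligned
 * load that a direct pointer dereference could trigger on strict platforms.
 */
static int64_t
fetch_int64_unaligned(const void *src)
{
    int64_t value;

    assert(src != NULL);
    memcpy(&value, src, sizeof(value));
    return value;
}
```

Whether this is measurably slower than a direct assignment depends on the compiler and platform; many compilers turn a fixed-size memcpy like this into a plain load where that is safe.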

I concur that Noah's description of #2 is not an accurate statement
of the rules we'd have to impose to be sure that the C structs line up
with the actual tuple layouts. I don't think we want rules exactly,
what we need is mechanical verification that the field orderings in
use are safe. The last time I looked at this thread, what was being
discussed was (a) re-ordering pg_subscription's columns and (b)
adding some kind of regression test to verify that all catalogs meet
the expectation of 'd'-aligned fields not needing alignment padding
that an AIX compiler might choose not to insert. That still seems
like the most plausible answer to me. I don't especially want to
invent an additional typalign code that we could only test on legacy
platforms.

I agree with that, but I don't think that having the developers
enforce alignment rules by reordering catalog columns for the sake of
legacy platforms is appealing either.

--
Robert Haas
EDB: http://www.enterprisedb.com

#587 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#586)
Re: Skipping logical replication transactions on subscriber side

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 22, 2022 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't especially want to
invent an additional typalign code that we could only test on legacy
platforms.

I agree with that, but I don't think that having the developers
enforce alignment rules by reordering catalog columns for the sake of
legacy platforms is appealing either.

Given that we haven't run into this before, it seems like a reasonable
bet that the problem will seldom arise. So as long as we have a
cross-check I'm all right with calling it good and moving on. Expending
a whole lot of work to improve the situation seems uncalled-for.

When and if we get to a point where we're ready to break on-disk
compatibility for user tables, perhaps revisiting the alignment
rules would be an appropriate component of that. I don't see that
happening in the foreseeable future, though.

regards, tom lane

#588 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#587)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 22, 2022 at 11:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given that we haven't run into this before, it seems like a reasonable
bet that the problem will seldom arise. So as long as we have a
cross-check I'm all right with calling it good and moving on. Expending
a whole lot of work to improve the situation seems uncalled-for.

All right. Well, I'm on record as not liking that solution, but
obviously you can and do feel differently.

--
Robert Haas
EDB: http://www.enterprisedb.com

#589 Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#585)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 22, 2022 at 09:50:02AM -0400, Robert Haas wrote:

On Wed, Jun 22, 2022 at 12:28 AM Noah Misch <noah@leadboat.com> wrote:

Here's how I rank the options, from most-preferred to least-preferred:

1. Put new eight-byte fields at the front of each catalog, when in doubt.
2. On systems where double alignment differs from int64 alignment, require
NAMEDATALEN%8==0. Upgrading to v16 would require dump/reload for AIX users
changing NAMEDATALEN to conform to the new restriction.
3. Introduce new typalign values. Upgrading to v16 would require dump/reload
for all AIX users.
4. De-support AIX.
5. From above, "Somehow forcing values that are sometimes 4-byte aligned and
sometimes 8-byte aligned to be 8-byte alignment on all platforms".
Upgrading to v16 would require dump/reload for all AIX users.
6. Require -qalign=natural on AIX. Upgrading to v16 would require dump/reload
and possible system library rebuilds for all AIX users.

I gather (1) isn't at the top of your ranking, or you wouldn't have written
in. What do you think of (2)?

(2) pleases me in the sense that it seems to inconvenience very few
people, perhaps no one, in order to avoid inconveniencing a larger
number of people. However, it doesn't seem sufficient.

Here's a more-verbose description of (2), with additions about what it does
and doesn't achieve:

2. On systems where double alignment differs from int64 alignment, require
NAMEDATALEN%8==0. Modify the test from commits 79b716c and c1da0ac to stop
treating "name" fields specially. The test will still fail for AIX
compatibility violations, but "name" columns no longer limit your field
position candidates like they do today (today == option (1)). Upgrading to
v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
to the new restriction. (I'm not sure pg_upgrade checks NAMEDATALEN
compatibility, but it should require at least one of: same NAMEDATALEN, or
absence of "name" columns in user tables.)
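
A small sketch of why NAMEDATALEN%8==0 helps, assuming "name" columns are char-aligned and exactly NAMEDATALEN bytes wide (the helper names are made up for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offset of the column that follows nnames consecutive "name" columns. */
static size_t
offset_after_names(size_t start, size_t namedatalen, size_t nnames)
{
    assert(namedatalen > 0);
    return start + namedatalen * nnames;
}

/* True if off is a multiple of 8. */
static bool
is_8byte_aligned(size_t off)
{
    return off % 8 == 0;
}
```

Starting from an 8-byte-aligned offset, any number of 64-byte name columns leaves the next column 8-byte aligned, so name columns stop mattering for where an eight-byte field may go. With a hypothetical NAMEDATALEN of 60, a single name column already moves the next offset to a boundary that is only 4-byte aligned.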

If I understand
correctly, even a catalog that includes no NameData column can have a
problem.

Correct.

On Wed, Jun 22, 2022 at 10:39:20AM -0400, Tom Lane wrote:

It appears that what we've got on AIX is that typalign 'd' overstates the
actual alignment requirement for 'double', which is safe from the SIGBUS
angle.

On AIX, typalign='d' states the exact alignment requirement for 'double'. It
understates the alignment requirement for int64_t.
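
To make the mismatch concrete, here is a sketch that walks a list of fields twice: once the way tuple assembly would place them (with 'd' meaning the platform's double alignment, passed in as a parameter) and once the way a C compiler would place the struct members. FieldSpec, tuple_align, and first_layout_mismatch are hypothetical names, and the 2- and 4-byte alignments for 's' and 'i' are assumed typical values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field description. */
typedef struct
{
    size_t len;         /* fixed width in bytes */
    char   typalign;    /* 'c', 's', 'i' or 'd' */
    size_t c_align;     /* alignment the C compiler gives the struct member */
} FieldSpec;

/* Round off up to the next multiple of align (a power of two). */
static size_t
align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Alignment used when a tuple is assembled from column values. */
static size_t
tuple_align(char typalign, size_t alignof_double)
{
    switch (typalign)
    {
        case 'c':
            return 1;
        case 's':
            return 2;
        case 'i':
            return 4;
        default:
            assert(typalign == 'd');
            return alignof_double;
    }
}

/*
 * Return the index of the first field whose offset in the tuple differs
 * from its offset in the C struct, or -1 if the two layouts agree.
 */
static int
first_layout_mismatch(const FieldSpec *f, int n, size_t alignof_double)
{
    size_t tup_off = 0;
    size_t c_off = 0;

    for (int i = 0; i < n; i++)
    {
        tup_off = align_up(tup_off, tuple_align(f[i].typalign, alignof_double));
        c_off = align_up(c_off, f[i].c_align);
        if (tup_off != c_off)
            return i;
        tup_off += f[i].len;
        c_off += f[i].len;
    }
    return -1;
}
```

With a 4-byte Oid followed by an int64 column, passing alignof_double = 4 puts the int64 at offset 4 in the tuple but at offset 8 in the struct, so field 1 is reported. Passing 8, or placing the eight-byte field first, makes the two layouts agree.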

I don't think we want rules exactly, what we need is mechanical verification
that the field orderings in use are safe.

Commits 79b716c and c1da0ac did that.

#590 Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#589)
Re: Skipping logical replication transactions on subscriber side

On Wed, Jun 22, 2022 at 10:48 PM Noah Misch <noah@leadboat.com> wrote:

Here's a more-verbose description of (2), with additions about what it does
and doesn't achieve:

2. On systems where double alignment differs from int64 alignment, require
NAMEDATALEN%8==0. Modify the test from commits 79b716c and c1da0ac to stop
treating "name" fields specially. The test will still fail for AIX
compatibility violations, but "name" columns no longer limit your field
position candidates like they do today (today == option (1)). Upgrading to
v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
to the new restriction. (I'm not sure pg_upgrade checks NAMEDATALEN
compatibility, but it should require at least one of: same NAMEDATALEN, or
absence of "name" columns in user tables.)

Doing this much seems pretty close to free to me. I doubt anyone
really cares about using a NAMEDATALEN value that is not a multiple of
8 on any platform. I also think there are few people who care about
AIX. The intersection must be very small indeed, or so I would think.

--
Robert Haas
EDB: http://www.enterprisedb.com