Minimal logical decoding on standbys

robertmhaas@gmail.com

over 7 years ago

In reply to: Andres Freund (#1)

Re: Minimal logical decoding on standbys

On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <andres@anarazel.de> wrote:

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.

To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it. Could you be
a bit more explicit about that?

I also don't think we should actually enforce hot_standby_feedback being
enabled - there's use-cases where that's not convenient, and it's not
bullet proof anyway (can be enabled/disabled without using logical
decoding inbetween). I think when there's a conflict we should have the
HINT mention that hs_feedback can be used to prevent such conflicts,
that ought to be enough.

If we can make that work, +1 from me.

I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).

That doesn't look trivial. It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either. What do you have in mind?

Another alternative would be to just prevent such index deletions for
catalog tables when wal_level = logical.

That doesn't sound like a very nice idea.

If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]

OK.

- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists

Why do we need this?

- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.

OK.

- Have a nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriately here, after all it's likely that the
replication topology would be broken afterwards.

I guess the viable options are approximately -- (1) drop the slot, (2)
advance the slot, (3) mark the slot as "failed" but leave it in
existence as a tombstone, (4) wait until something changes. I like
(3) better than (1). (4) seems pretty unfortunate unless there's some
other system for having the slot advance automatically. Seems like a
way for replication to hang indefinitely without anybody understanding
why it's happened (or, maybe, noticing).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

andres@anarazel.de

over 7 years ago

In reply to: Robert Haas (#2)

Re: Minimal logical decoding on standbys

Hi,

On 2018-12-13 19:32:19 -0500, Robert Haas wrote:

On Wed, Dec 12, 2018 at 3:41 PM Andres Freund <andres@anarazel.de> wrote:

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.

To me, this paragraph appears to say that you don't like Craig's
approach without quite explaining why you don't like it. Could you be
a bit more explicit about that?

I think the conflict system introduced in Craig's patch is quite
complicated, relies on logging new wal records on a regular basis, adds
needs to be more conservative about the xmin horizon, which is obviously
not great for performance.

If you look at Craig's patch, it currently relies on blocking out
concurrent checkpoints:
/*
* We must prevent a concurrent checkpoint, otherwise the catalog xmin
* advance xlog record with the new value might be written before the
* checkpoint but the checkpoint may still see the old
* oldestCatalogXmin value.
*/
if (!LWLockConditionalAcquire(CheckpointLock, LW_SHARED))
/* Couldn't get checkpointer lock; will retry later */
return;
which on its own seems unacceptable, given that CheckpointLock can be
held by checkpointer for a very long time. While that's ongoing the
catalog xmin horizon doesn't advance.

Looking at the code it seems hard, to me, to make that approach work
nicely. But I might just be tired.

I'm wondering if it's time to move the latestRemovedXid computation for
this type of record to the primary - it's likely to be cheaper there and
avoids this kind of complication. Secondarily, it'd have the advantage
of making pluggable storage integration easier - there we have the
problem that we don't know which type of relation we're dealing with
during recovery, so such lookups make pluggability harder (zheap just
adds extra flags to signal that, but that's not extensible).

That doesn't look trivial. It seems like _bt_delitems_delete() would
need to get an array of XIDs, but that gets called from
_bt_vacuum_one_page(), which doesn't have that information available.
It doesn't look like there is a particularly cheap way of getting it,
either. What do you have in mind?

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't call
can't determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Writing that message right now.

- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists

Why do we need this?

Currently the conflict routines are only called when hot standby is
on. There's also no way to use logical decoding (including just advancing the
slot), without hot-standby being enabled, so I think that'd be a pretty
harmless restriction.

- Have a nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriately here, after all it's likely that the
replication topology would be broken afterwards.

I guess the viable options are approximately --

(1) drop the slot

Doable.

(2) advance the slot

That's not realistically possible, I think. We'd need to be able to use
most of the logical decoding infrastructure in that context, and we
don't have that available. It's also possible to deadlock, where
advancing the slot's xmin horizon would need further WAL, but WAL replay
is blocked on advancing the slot.

(3) mark the slot as "failed" but leave it in existence as a tombstone

We currently don't have that, but it'd be doable, I think.

(4) wait until something changes.
(4) seems pretty unfortunate unless there's some other system for
having the slot advance automatically. Seems like a way for
replication to hang indefinitely without anybody understanding why
it's happened (or, maybe, noticing).

On the other hand, it would often allow whatever user of the slot to
continue using it, till the conflict is "resolved". To me it seems about
as easy to debug physical replication being blocked, as somehow the slot
being magically deleted or marked as invalid.

Thanks for looking,

Andres Freund

Petr Jelinek

petr@2ndquadrant.com

over 7 years ago

In reply to: Andres Freund (#1)

Re: Minimal logical decoding on standbys

Hi,

On 12/12/2018 21:41, Andres Freund wrote:

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.

IIRC I was originally advocating adding that xmin announcement to the
standby snapshot message, but this seems better.

If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.
- Have a nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriately here, after all it's likely that the
replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
optimized (e.g. check wal_level before opening rel etc).

Once we have this logic, it can be used to implement something like
failover slots on-top, by having having a mechanism that occasionally
forwards slots on standbys using pg_replication_slot_advance().

Looking at this from the failover slots perspective. Wouldn't blocking
on conflict mean that we stop physical replication on catalog xmin
advance when there is lagging logical replication on primary? It might
not be too big deal as in that use-case it should only happen if
hs_feedback was off at some point, but just wanted to point out this
potential problem.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: Petr Jelinek (#4)

Re: Minimal logical decoding on standbys

Hi,

While testingï¿½ this featureï¿½ found that - if lots of insert happened on
the master cluster then pg_recvlogical is not showing the DATA
informationï¿½ on logical replication slot which created on SLAVE.

Please refer this scenario -

1)
Create a Master cluster with wal_level=logcal and create logical
replication slot -
ï¿½SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');

2)
Create a Standbyï¿½ cluster using pg_basebackup ( ./pg_basebackup -D
slave/ -v -R)ï¿½ and create logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');

3)
X terminal - startï¿½ pg_recvlogicalï¿½ , provide port=5555 ( slave
cluster)ï¿½ and specify slot=standby_slot
./pg_recvlogical -d postgresï¿½ -p 5555 -s 1 -F 1ï¿½ -v --slot=standby_slotï¿½
--start -f -

Y terminal - startï¿½ pg_recvlogicalï¿½ , provide port=5432 ( master
cluster)ï¿½ and specify slot=master_slot
./pg_recvlogical -d postgresï¿½ -p 5432 -s 1 -F 1ï¿½ -v --slot=master_slotï¿½
--start -f -

Z terminal - run pg_benchï¿½ against Master cluster ( ./pg_bench -i -s 10
postgres)

Able to see DATA information on Y terminalï¿½ but not on X.

but same able to see by firing this below query on SLAVE cluster -

SELECT * FROM pg_logical_slot_get_changes('standby_slot', NULL, NULL);

Is it expected ?

regards,
tushar

On 12/17/2018 10:46 PM, Petr Jelinek wrote:

Hi,

On 12/12/2018 21:41, Andres Freund wrote:

I don't like the approach of managing the catalog horizon via those
periodically logged catalog xmin announcements. I think we instead
should build ontop of the records we already have and use to compute
snapshot conflicts. As of HEAD we don't know whether such tables are
catalog tables, but that's just a bool that we need to include in the
records, a basically immeasurable overhead given the size of those
records.

IIRC I was originally advocating adding that xmin announcement to the
standby snapshot message, but this seems better.

If we were to go with this approach, there'd be at least the following
tasks:
- adapt tests from [2]
- enforce hot-standby to be enabled on the standby when logical slots
are created, and at startup if a logical slot exists
- fix issue around btree_xlog_delete_get_latestRemovedXid etc mentioned
above.
- Have a nicer conflict handling than what I implemented here. Craig's
approach deleted the slots, but I'm not sure I like that. Blocking
seems more appropriately here, after all it's likely that the
replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be
optimized (e.g. check wal_level before opening rel etc).

Once we have this logic, it can be used to implement something like
failover slots on-top, by having having a mechanism that occasionally
forwards slots on standbys using pg_replication_slot_advance().

Looking at this from the failover slots perspective. Wouldn't blocking
on conflict mean that we stop physical replication on catalog xmin
advance when there is lagging logical replication on primary? It might
not be too big deal as in that use-case it should only happen if
hs_feedback was off at some point, but just wanted to point out this
potential problem.

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

andres@anarazel.de

about 7 years ago

In reply to: tushar (#5)

Re: Minimal logical decoding on standbys

Hi,

On 2019-03-01 13:33:23 +0530, tushar wrote:

While testingï¿½ this featureï¿½ found that - if lots of insert happened on the
master cluster then pg_recvlogical is not showing the DATA informationï¿½ on
logical replication slot which created on SLAVE.

Please refer this scenario -

1)
Create a Master cluster with wal_level=logcal and create logical replication
slot -
ï¿½SELECT * FROM pg_create_logical_replication_slot('master_slot',
'test_decoding');

2)
Create a Standbyï¿½ cluster using pg_basebackup ( ./pg_basebackup -D slave/ -v
-R)ï¿½ and create logical replication slot -
SELECT * FROM pg_create_logical_replication_slot('standby_slot',
'test_decoding');

So, if I understand correctly you do *not* have a phyiscal replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Thanks,

Andres

[1]: /messages/by-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#3)

Re: Minimal logical decoding on standbys

On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't call
can't determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases
(012_logical_decoding_on_replica.pl) that he had attached in [1]/messages/by-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com.
Here, I have also attached his original file
(Craigs_012_logical_decoding_on_replica.pl).

Also attached are rebased versions of couple of Andres's implementation patches.

I have added a new test scenario :
DROP TABLE from master *before* the logical records of the table
insertions are retrieved from standby. The logical records should be
successfully retrieved.

Regarding the test result failures, I could see that when we drop a
logical replication slot at standby server, then the catalog_xmin of
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing :

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584

Other than the above, there is this test scenario which I had to remove :

#########################################################
# Conflict with recovery: xmin cancels decoding session
#########################################################
#
# Start a transaction on the replica then perform work that should cause a
# recovery conflict with it. We'll check to make sure the client gets
# terminated with recovery conflict.
#
# Temporarily disable hs feedback so we can test recovery conflicts.
# It's fine to continue using a physical slot, the xmin should be
# cleared. We only check hot_standby_feedback when establishing
# a new decoding session so this approach circumvents the safeguards
# in place and forces a conflict.

This test starts pg_recvlogical, and expects it to be terminated due
to recovery conflict because hs feedback is disabled.
But that does not happen; instead, pg_recvlogical does not return.

But I am not sure why it does not terminate with Andres's patch; it
was expected to terminate with Craig Ringer's patch.

Further, there are subsequent test scenarios that test pg_recvlogical
with hs_feedback disabled, which I have removed because pg_recvlogical
does not return. I am yet to clearly understand why that happens. I
suspect that is only because hs_feedback is disabled.

Also, the testcases verify pg_controldata's oldestCatalogXmin values,
which are now not present with Andres's patch; so I removed tracking
of oldestCatalogXmin.

Thanks
-Amit Khandekar

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: Andres Freund (#6)

Re: Minimal logical decoding on standbys

On 03/01/2019 11:16 PM, Andres Freund wrote:

So, if I understand correctly you do*not* have a phyiscal replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Ok, This time around - I performed like this -

.)Master cluster (set wal_level=logical and hot_standby_feedback=on in
postgresql.conf) , start the server and create a physical replication slot

postgres=# SELECT * FROM
pg_create_physical_replication_slot('decoding_standby');
slot_name | lsn
------------------+-----
decoding_standby |
(1 row)

.)Perform pg_basebackup using --slot=decoding_standby with option -R .
modify port=5555 , start the server

.)Connect to slave and create a logical replication slot

postgres=# create table t(n int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
postgres=#

postgres=# SELECT * FROM
pg_create_logical_replication_slot('standby_slot', 'test_decoding');
slot_name | lsn
--------------+-----------
standby_slot | 0/2000060
(1 row)

run pgbench (./pgbench -i -s 10 postgres) against master and
simultaneously- start pg_recvlogical , provide port=5555 ( slave
cluster) and specify slot=standby_slot
./pg_recvlogical -d postgres -p 5555 -s 1 -F 1 -v --slot=standby_slot
--start -f -

[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres -p 5555 -s 1 -F
1 -v --slot=standby_slot --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot standby_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot standby_slot)
pg_recvlogical: confirming write up to 0/30194E8, flush to 0/30194E8
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3019590, flush to 0/3019590
(slot standby_slot)
pg_recvlogical: confirming write up to 0/301D558, flush to 0/301D558
(slot standby_slot)
BEGIN 476
COMMIT 476
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
pg_recvlogical: confirming write up to 0/3034B40, flush to 0/3034B40
(slot standby_slot)
BEGIN 477
COMMIT 477

If we do the same for the logical replication slot which created on
Master cluster

[centos@centos-cpula bin]$ ./pg_recvlogical -d postgres -s 1 -F 1 -v
--slot=master_slot --start -f -
pg_recvlogical: starting log streaming at 0/0 (slot master_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot master_slot)
table public.pgbench_accounts: INSERT: aid[integer]:65057 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65058 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65059 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65060 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65061 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65062 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65063 bid[integer]:1
abalance[integer]:0 filler[character]:' '
table public.pgbench_accounts: INSERT: aid[integer]:65064 bid[integer]:1
abalance[integer]:0 filler[character]:' '

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: tushar (#8)

Re: Minimal logical decoding on standbys

On 03/04/2019 04:54 PM, tushar wrote:

.)Perform pg_basebackup using --slot=decoding_standby with option -R
. modify port=5555 , start the server

set primary_slot_name = 'decoding_standby' in the postgresql.conf file
of slave.

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#10

andres@anarazel.de

about 7 years ago

In reply to: tushar (#8)

Re: Minimal logical decoding on standbys

Hi,

On 2019-03-04 16:54:32 +0530, tushar wrote:

On 03/01/2019 11:16 PM, Andres Freund wrote:

So, if I understand correctly you do*not* have a phyiscal replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Ok, This time aroundï¿½ - I performed like this -

.)Master cluster (set wal_level=logical and hot_standby_feedback=on in
postgresql.conf) , start the server and create a physical replication slot

Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).

Thanks,

Andres

#11

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: Andres Freund (#10)

Re: Minimal logical decoding on standbys

On 03/04/2019 10:57 PM, Andres Freund wrote:

Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).

Right, This parameter was enabled on both Master and slave.

Is someone able to reproduce this issue ?

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#12

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: Andres Freund (#10)

Re: Minimal logical decoding on standbys

There is an another issue , where i am getting error while executing
"pg_logical_slot_get_changes" on SLAVE

Master (running on port=5432) - run "make installcheck" after setting
PATH=<installation/bin:$PATH ) and export PGDATABASE=postgres from
regress/ folder
Slave (running on port=5555) - Connect to regression database and
select pg_logical_slot_get_changes

[centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
You are now connected to database "regression" as user "centos".
slot_name | lsn
-----------+-----------
m61 | 1/D437AD8
(1 row)

psql:t.sql:3: ERROR: could not resolve cmin/cmax of catalog tuple

[centos@mail-arts bin]$ cat t.sql
\c regression
SELECT * from pg_create_logical_replication_slot('m61', 'test_decoding');
select * from pg_logical_slot_get_changes('m61',null,null);

regards,

On 03/04/2019 10:57 PM, Andres Freund wrote:

Hi,

On 2019-03-04 16:54:32 +0530, tushar wrote:

On 03/01/2019 11:16 PM, Andres Freund wrote:

So, if I understand correctly you do*not* have a phyiscal replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Ok, This time around - I performed like this -

.)Master cluster (set wal_level=logical and hot_standby_feedback=on in
postgresql.conf) , start the server and create a physical replication slot

Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).

Thanks,

Andres

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#13

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#7)

Re: Minimal logical decoding on standbys

On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't call
can't determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases

Hi Andres,

I am trying to come up with new testcases to test the recovery
conflict handling. Before that I have some queries :

With Craig Ringer's approach, the way to reproduce the recovery
conflict was, I believe, easy : Do a checkpoint, which will log the
global-catalog-xmin-advance WAL record, due to which the standby -
while replaying the message - may find out that it's a recovery
conflict. But with your approach, the latestRemovedXid is passed only
during specific vacuum-related WAL records, so to reproduce the
recovery conflict error, we need to make sure some specific WAL
records are logged, such as XLOG_BTREE_DELETE. So we need to create a
testcase such that while creating an index tuple, it erases dead
tuples from a page, so that it eventually calls
_bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
XLOG_BTREE_DELETE record.

I tried to come up with this reproducible testcase without success.
This seems difficult. Do you have an easier option ? May be we can use
some other WAL records that may have easier more reliable test case
for showing up recovery conflict ?

Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
just throws a WARNING error level; so the wal receiver would not make
the backends throw an error; hence the test case won't catch the
error. Is that right ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#14

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: tushar (#12)

Re: Minimal logical decoding on standbys

Hi ,

I am getting a server crash on standby while executing
pg_logical_slot_get_changes function , please refer this scenario

Master cluster( ./initdb -D master)
set wal_level='hot_standby in master/postgresql.conf file
start the server , connect to psql terminal and create a physical
replication slot ( SELECT * from
pg_create_physical_replication_slot('p1');)

perform pg_basebackup using --slot 'p1' (./pg_basebackup -D slave/ -R
--slot p1 -v))
set wal_level='logical' , hot_standby_feedback=on,
primary_slot_name='p1' in slave/postgresql.conf file
start the server , connect to psql terminal and create a logical
replication slot ( SELECT * from
pg_create_logical_replication_slot('t','test_decoding');)

run pgbench ( ./pgbench -i -s 10 postgres) on master and select
pg_logical_slot_get_changes on Slave database

postgres=# select * from pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.274 IST [26817] LOG: starting logical decoding for
slot "t"
2019-03-13 20:34:50.274 IST [26817] DETAIL: Streaming transactions
committing after 0/6C000060, reading WAL from 0/6C000028.
2019-03-13 20:34:50.274 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
2019-03-13 20:34:50.275 IST [26817] LOG: logical decoding found
consistent point at 0/6C000028
2019-03-13 20:34:50.275 IST [26817] DETAIL: There are no running
transactions.
2019-03-13 20:34:50.275 IST [26817] STATEMENT: select * from
pg_logical_slot_get_changes('t',null,null);
TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
"decode.c", Line: 977)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2019-03-13
20:34:50.276 IST [26809] LOG: server process (PID 26817) was terminated
by signal 6: Aborted

Stack trace -

(gdb) bt
#0 0x00007f370e673277 in raise () from /lib64/libc.so.6
#1 0x00007f370e674968 in abort () from /lib64/libc.so.6
#2 0x0000000000a30edf in ExceptionalCondition (conditionName=0xc36090
"!(data == tupledata + tuplelen)", errorType=0xc35f5c "FailedAssertion",
fileName=0xc35d70 "decode.c",
    lineNumber=977) at assert.c:54
#3 0x0000000000843c6f in DecodeMultiInsert (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:977
#4 0x0000000000842b32 in DecodeHeap2Op (ctx=0x2ba1ac8,
buf=0x7ffd7a5136d0) at decode.c:375
#5 0x00000000008424dd in LogicalDecodingProcessRecord (ctx=0x2ba1ac8,
record=0x2ba1d88) at decode.c:125
#6 0x000000000084830d in pg_logical_slot_get_changes_guts
(fcinfo=0x2b95838, confirm=true, binary=false) at logicalfuncs.c:307
#7 0x000000000084846a in pg_logical_slot_get_changes (fcinfo=0x2b95838)
at logicalfuncs.c:376
#8 0x00000000006e5b9f in ExecMakeTableFunctionResult
(setexpr=0x2b93ee8, econtext=0x2b93d98, argContext=0x2b99940,
expectedDesc=0x2b97970, randomAccess=false) at execSRF.c:233
#9 0x00000000006fb738 in FunctionNext (node=0x2b93c80) at
nodeFunctionscan.c:94
#10 0x00000000006e52b1 in ExecScanFetch (node=0x2b93c80,
accessMtd=0x6fb67b <FunctionNext>, recheckMtd=0x6fba77
<FunctionRecheck>) at execScan.c:93
#11 0x00000000006e5326 in ExecScan (node=0x2b93c80, accessMtd=0x6fb67b
<FunctionNext>, recheckMtd=0x6fba77 <FunctionRecheck>) at execScan.c:143
#12 0x00000000006fbac1 in ExecFunctionScan (pstate=0x2b93c80) at
nodeFunctionscan.c:270
#13 0x00000000006e3293 in ExecProcNodeFirst (node=0x2b93c80) at
execProcnode.c:445
#14 0x00000000006d8253 in ExecProcNode (node=0x2b93c80) at
../../../src/include/executor/executor.h:241
#15 0x00000000006daa4e in ExecutePlan (estate=0x2b93a28,
planstate=0x2b93c80, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0,
    direction=ForwardScanDirection, dest=0x2b907e0, execute_once=true)
at execMain.c:1643
#16 0x00000000006d8865 in standard_ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:362
#17 0x00000000006d869b in ExecutorRun (queryDesc=0x2afff28,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:306
#18 0x00000000008ccef1 in PortalRunSelect (portal=0x2b36168,
forward=true, count=0, dest=0x2b907e0) at pquery.c:929
#19 0x00000000008ccb90 in PortalRun (portal=0x2b36168,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2b907e0, altdest=0x2b907e0, completionTag=0x7ffd7a513e90 "")
    at pquery.c:770
#20 0x00000000008c6b58 in exec_simple_query (query_string=0x2adc1e8
"select * from pg_logical_slot_get_changes('t',null,null);") at
postgres.c:1215
#21 0x00000000008cae88 in PostgresMain (argc=1, argv=0x2b06590,
dbname=0x2b063d0 "postgres", username=0x2ad8da8 "centos") at postgres.c:4256
#22 0x0000000000828464 in BackendRun (port=0x2afe3b0) at postmaster.c:4399
#23 0x0000000000827c42 in BackendStartup (port=0x2afe3b0) at
postmaster.c:4090
#24 0x0000000000824036 in ServerLoop () at postmaster.c:1703
#25 0x00000000008238ec in PostmasterMain (argc=3, argv=0x2ad6d00) at
postmaster.c:1376
#26 0x0000000000748aab in main (argc=3, argv=0x2ad6d00) at main.c:228
(gdb)

regards,

On 03/07/2019 09:03 PM, tushar wrote:

There is an another issue , where i am getting error while executing
"pg_logical_slot_get_changes" on SLAVE

Master (running on port=5432) - run "make installcheck" after
setting PATH=<installation/bin:$PATH ) and export
PGDATABASE=postgres from regress/ folder
Slave (running on port=5555) - Connect to regression database and
select pg_logical_slot_get_changes

[centos@mail-arts bin]$ ./psql postgres -p 5555 -f t.sql
You are now connected to database "regression" as user "centos".
slot_name |    lsn
-----------+-----------
m61       | 1/D437AD8
(1 row)

psql:t.sql:3: ERROR: could not resolve cmin/cmax of catalog tuple

[centos@mail-arts bin]$ cat t.sql
\c regression
SELECT * from   pg_create_logical_replication_slot('m61',
'test_decoding');
select * from pg_logical_slot_get_changes('m61',null,null);

regards,

On 03/04/2019 10:57 PM, Andres Freund wrote:

Hi,

On 2019-03-04 16:54:32 +0530, tushar wrote:

On 03/01/2019 11:16 PM, Andres Freund wrote:

So, if I understand correctly you do*not* have a phyiscal replication
slot for this standby? For the feature to work reliably that needs to
exist, and you need to have hot_standby_feedback enabled. Does having
that fix the issue?

Ok, This time around - I performed like this -

.)Master cluster (set wal_level=logical and hot_standby_feedback=on in
postgresql.conf) , start the server and create a physical
replication slot

Note that hot_standby_feedback=on needs to be set on a standby, not on
the primary (although it doesn't do any harm there).

Thanks,

Andres

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#15

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#13)

Re: Minimal logical decoding on standbys

On Fri, 8 Mar 2019 at 20:59, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Mon, 4 Mar 2019 at 14:09, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On Fri, 14 Dec 2018 at 06:25, Andres Freund <andres@anarazel.de> wrote:

I've a prototype attached, but let's discuss the details in a separate
thread. This also needs to be changed for pluggable storage, as we don't
know about table access methods in the startup process, so we can't call
can't determine which AM the heap is from during
btree_xlog_delete_get_latestRemovedXid() (and sibling routines).

Attached is a WIP test patch
0003-WIP-TAP-test-for-logical-decoding-on-standby.patch that has a
modified version of Craig Ringer's test cases

Hi Andres,

I am trying to come up with new testcases to test the recovery
conflict handling. Before that I have some queries :

With Craig Ringer's approach, the way to reproduce the recovery
conflict was, I believe, easy : Do a checkpoint, which will log the
global-catalog-xmin-advance WAL record, due to which the standby -
while replaying the message - may find out that it's a recovery
conflict. But with your approach, the latestRemovedXid is passed only
during specific vacuum-related WAL records, so to reproduce the
recovery conflict error, we need to make sure some specific WAL
records are logged, such as XLOG_BTREE_DELETE. So we need to create a
testcase such that while creating an index tuple, it erases dead
tuples from a page, so that it eventually calls
_bt_vacuum_one_page()=>_bt_delitems_delete(), thus logging a
XLOG_BTREE_DELETE record.

I tried to come up with this reproducible testcase without success.
This seems difficult. Do you have an easier option ? May be we can use
some other WAL records that may have easier more reliable test case
for showing up recovery conflict ?

I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
2. Running : VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477

But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.

Further, with your patch, in ResolveRecoveryConflictWithSlots(), it
just throws a WARNING error level; so the wal receiver would not make
the backends throw an error; hence the test case won't catch the
error. Is that right ?

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#16

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#15)

Re: Minimal logical decoding on standbys

On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
2. Running : VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477

But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.

I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.

As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#17

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#16)

Re: Minimal logical decoding on standbys

Hi,

On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:

On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
2. Running : VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477

But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.

I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.

Cool.

As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.

I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.

Greetings,

Andres Freund

#18

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#17)

Re: Minimal logical decoding on standbys

On Tue, 2 Apr 2019 at 21:34, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-04-02 15:26:52 +0530, Amit Khandekar wrote:

On Thu, 14 Mar 2019 at 15:00, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

I managed to get a recovery conflict by :
1. Setting hot_standby_feedback to off
2. Creating a logical replication slot on standby
3. Creating a table on master, and insert some data.
2. Running : VACUUM FULL;

This gives WARNING messages in the standby log file.
2019-03-14 14:57:56.833 IST [40076] WARNING: slot decoding_standby w/
catalog xmin 474 conflicts with removed xid 477
2019-03-14 14:57:56.833 IST [40076] CONTEXT: WAL redo at 0/3069E98
for Heap2/CLEAN: remxid 477

But I did not add such a testcase into the test file, because with the
current patch, it does not do anything with the slot; it just keeps on
emitting WARNING in the log file; so we can't test this scenario as of
now using the tap test.

I am going ahead with drop-the-slot way of handling the recovery
conflict. I am trying out using ReplicationSlotDropPtr() to drop the
slot. It seems the required locks are already in place inside the for
loop of ResolveRecoveryConflictWithSlots(), so we can directly call
ReplicationSlotDropPtr() when the slot xmin conflict is found.

Cool.

As explained above, the only way I could reproduce the conflict is by
turning hot_standby_feedback off on slave, creating and inserting into
a table on master and then running VACUUM FULL. But after doing this,
I am not able to verify whether the slot is dropped, because on slave,
any simple psql command thereon, waits on a lock acquired on sys
catache, e.g. pg_authid. Working on it.

I think that indicates a bug somewhere. If replay progressed, it should
have killed the slot, and continued replaying past the VACUUM
FULL. Those symptoms suggest replay is stuck somewhere. I suggest a)
compiling with WAL_DEBUG enabled, and turning on wal_debug=1, b) looking
at a backtrace of the startup process.

Oops, it was my own change that caused the hang. Sorry for the noise.
After using wal_debug, found out that after replaying the LOCK records
for the catalog pg_auth, it was not releasing it because it had
actually got stuck in ReplicationSlotDropPtr() itself. In
ResolveRecoveryConflictWithSlots(), a shared
ReplicationSlotControlLock was already held before iterating through
the slots, and now ReplicationSlotDropPtr() again tries to take the
same lock in exclusive mode for setting slot->in_use, leading to a
deadlock. I fixed that by releasing the shared lock before calling
ReplicationSlotDropPtr(), and then re-starting the slots' scan over
again since we released it. We do similar thing for
ReplicationSlotCleanup().

Attached is a rebased version of your patch
logical-decoding-on-standby.patch. This v2 version also has the above
changes. It also includes the tap test file which is still in WIP
state, mainly because I have yet to add the conflict recovery handling
scenarios.

I see that you have already committed the
move-latestRemovedXid-computation-for-nbtree-xlog related changes.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#19

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#18)

Re: Minimal logical decoding on standbys

On Wed, 3 Apr 2019 at 19:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

Oops, it was my own change that caused the hang. Sorry for the noise.
After using wal_debug, found out that after replaying the LOCK records
for the catalog pg_auth, it was not releasing it because it had
actually got stuck in ReplicationSlotDropPtr() itself. In
ResolveRecoveryConflictWithSlots(), a shared
ReplicationSlotControlLock was already held before iterating through
the slots, and now ReplicationSlotDropPtr() again tries to take the
same lock in exclusive mode for setting slot->in_use, leading to a
deadlock. I fixed that by releasing the shared lock before calling
ReplicationSlotDropPtr(), and then re-starting the slots' scan over
again since we released it. We do similar thing for
ReplicationSlotCleanup().

Attached is a rebased version of your patch
logical-decoding-on-standby.patch. This v2 version also has the above
changes. It also includes the tap test file which is still in WIP
state, mainly because I have yet to add the conflict recovery handling
scenarios.

Attached v3 patch includes a new scenario to test conflict recovery
handling by verifying that the conflicting slot gets dropped.

WIth this, I am done with the test changes, except the below question
that I had posted earlier which I would like to have inputs :

I am not sure what is expected. What actually happens is : the
physical xlot catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on standby.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#20

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#19)

Re: Minimal logical decoding on standbys

Hi,

Thanks for the new version of the patch. Btw, could you add Craig as a
co-author in the commit message of the next version of the patch? Don't
want to forget him.

On 2019-04-05 17:08:39 +0530, Amit Khandekar wrote:

Regarding the test result failures, I could see that when we drop a
logical replication slot at standby server, then the catalog_xmin of
physical replication slot becomes NULL, whereas the test expects it to
be equal to xmin; and that's the reason a couple of test scenarios are
failing :

ok 33 - slot on standby dropped manually
Waiting for replication conn replica's replay_lsn to pass '0/31273E0' on master
done
not ok 34 - physical catalog_xmin still non-null
not ok 35 - xmin and catalog_xmin equal after slot drop
# Failed test 'xmin and catalog_xmin equal after slot drop'
# at t/016_logical_decoding_on_replica.pl line 272.
# got:
# expected: 2584

I am not sure what is expected. What actually happens is : the
physical xlot catalog_xmin remains NULL initially, but becomes
non-NULL after the logical replication slot is created on standby.

That seems like the correct behaviour to me - why would we still have a
catalog xmin if there's no slot logical slot?

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 006446b..5785d2f 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1064,6 +1064,85 @@ ReplicationSlotReserveWal(void)
}
}

+void
+ResolveRecoveryConflictWithSlots(Oid dboid, TransactionId xid)
+{
+	int			i;
+	bool		found_conflict = false;
+
+	if (max_replication_slots <= 0)
+		return;
+
+restart:
+	if (found_conflict)
+	{
+		CHECK_FOR_INTERRUPTS();
+		/*
+		 * Wait awhile for them to die so that we avoid flooding an
+		 * unresponsive backend when system is heavily loaded.
+		 */
+		pg_usleep(100000);
+		found_conflict = false;
+	}
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+	for (i = 0; i < max_replication_slots; i++)
+	{
+		ReplicationSlot *s;
+		NameData	slotname;
+		TransactionId slot_xmin;
+		TransactionId slot_catalog_xmin;
+
+		s = &ReplicationSlotCtl->replication_slots[i];
+
+		/* cannot change while ReplicationSlotCtlLock is held */
+		if (!s->in_use)
+			continue;
+
+		/* not our database, skip */
+		if (s->data.database != InvalidOid && s->data.database != dboid)
+			continue;
+
+		SpinLockAcquire(&s->mutex);
+		slotname = s->data.name;
+		slot_xmin = s->data.xmin;
+		slot_catalog_xmin = s->data.catalog_xmin;
+		SpinLockRelease(&s->mutex);
+
+		if (TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid))
+		{
+			found_conflict = true;
+
+			ereport(WARNING,
+					(errmsg("slot %s w/ xmin %u conflicts with removed xid %u",
+							NameStr(slotname), slot_xmin, xid)));
+		}
+
+		if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
+		{
+			found_conflict = true;
+
+			ereport(WARNING,
+					(errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
+							NameStr(slotname), slot_catalog_xmin, xid)));
+		}
+
+
+		if (found_conflict)
+		{
+			elog(WARNING, "Dropping conflicting slot %s", s->data.name.data);
+			LWLockRelease(ReplicationSlotControlLock);	/* avoid deadlock */
+			ReplicationSlotDropPtr(s);
+
+			/* We released the lock above; so re-scan the slots. */
+			goto restart;
+		}
+	}

I think this should be refactored so that the two found_conflict cases
set a 'reason' variable (perhaps an enum?) to the particular reason, and
then only one warning should be emitted. I also think that LOG might be
more appropriate than WARNING - as confusing as that is, LOG is more
severe than WARNING (see docs about log_min_messages).

@@ -0,0 +1,386 @@
+# Demonstrate that logical can follow timeline switches.
+#
+# Test logical decoding on a standby.
+#
+use strict;
+use warnings;
+use 5.8.0;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 55;
+use RecursiveCopy;
+use File::Copy;
+
+my ($stdin, $stdout, $stderr, $ret, $handle, $return);
+my $backup_name;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1, has_archiving => 1);
+$node_master->append_conf('postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+log_min_messages = 'debug2'
+log_error_verbosity = verbose
+# send status rapidly so we promptly advance xmin on master
+wal_receiver_status_interval = 1
+# very promptly terminate conflicting backends
+max_standby_streaming_delay = '2s'
+});
+$node_master->dump_info;
+$node_master->start;
+
+$node_master->psql('postgres', q[CREATE DATABASE testdb]);
+
+$node_master->safe_psql('testdb', q[SELECT * FROM pg_create_physical_replication_slot('decoding_standby');]);
+$backup_name = 'b1';
+my $backup_dir = $node_master->backup_dir . "/" . $backup_name;
+TestLib::system_or_bail('pg_basebackup', '-D', $backup_dir, '-d', $node_master->connstr('testdb'), '--slot=decoding_standby');
+
+sub print_phys_xmin
+{
+	my $slot = $node_master->slot('decoding_standby');
+	return ($slot->{'xmin'}, $slot->{'catalog_xmin'});
+}
+
+my ($xmin, $catalog_xmin) = print_phys_xmin();
+# After slot creation, xmins must be null
+is($xmin, '', "xmin null");
+is($catalog_xmin, '', "catalog_xmin null");
+
+my $node_replica = get_new_node('replica');
+$node_replica->init_from_backup(
+	$node_master, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+$node_replica->append_conf('postgresql.conf',
+	q[primary_slot_name = 'decoding_standby']);
+
+$node_replica->start;
+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+
+# with hot_standby_feedback off, xmin and catalog_xmin must still be null
+($xmin, $catalog_xmin) = print_phys_xmin();
+is($xmin, '', "xmin null after replica join");
+is($catalog_xmin, '', "catalog_xmin null after replica join");
+
+$node_replica->append_conf('postgresql.conf',q[
+hot_standby_feedback = on
+]);
+$node_replica->restart;
+sleep(2); # ensure walreceiver feedback sent

Can we make this more robust? E.g. by waiting till pg_stat_replication
shows the change on the primary? Because I can guarantee that this'll
fail on slow buildfarm machines (say the valgrind animals).

+$node_master->wait_for_catchup($node_replica, 'replay', $node_master->lsn('flush'));
+sleep(2); # ensure walreceiver feedback sent

Similar.

Greetings,

Andres Freund

#21

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#20)

#22

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: tushar (#14)

#23

andres@anarazel.de

about 7 years ago

In reply to: tushar (#22)

#24

tushar.ahuja@enterprisedb.com

about 7 years ago

In reply to: Andres Freund (#23)

#25

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#23)

#26

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#25)

#27

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#26)

#28

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#27)

#29

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#28)

#30

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#21)

#31

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#28)

#32

robertmhaas@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#31)

#33

Sergei Kornilov

sk@zsrv.org

about 7 years ago

In reply to: Robert Haas (#32)

#34

robertmhaas@gmail.com

about 7 years ago

In reply to: Sergei Kornilov (#33)

#35

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#31)

#36

andres@anarazel.de

about 7 years ago

In reply to: Robert Haas (#34)

#37

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#35)

#38

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#37)

#39

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#38)

#40

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#39)

#41

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Amit Khandekar (#40)

#42

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#41)

#43

amitdkhan.pg@gmail.com

about 7 years ago

In reply to: Andres Freund (#42)

#44

andres@anarazel.de

about 7 years ago

In reply to: Amit Khandekar (#43)

#45

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#44)

#46

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#45)

#47

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#46)

#48

andres@anarazel.de

almost 7 years ago

In reply to: Amit Khandekar (#46)

#49

andres@anarazel.de

almost 7 years ago

In reply to: Amit Khandekar (#47)

#50

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#49)

#51

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#50)

#52

alvherre@2ndquadrant.com

almost 7 years ago

In reply to: Andres Freund (#36)

#53

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#51)

#54

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#30)

#55

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Alvaro Herrera (#52)

#56

andres@anarazel.de

almost 7 years ago

In reply to: Amit Khandekar (#53)

#57

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: tushar (#5)

#58

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#56)

#59

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#56)

#60

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#59)

#61

robertmhaas@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#58)

#62

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Robert Haas (#61)

#63

tushar.ahuja@enterprisedb.com

almost 7 years ago

In reply to: Amit Khandekar (#62)

#64

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: tushar (#63)

#65

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#64)

#66

andres@anarazel.de

almost 7 years ago

In reply to: Amit Khandekar (#62)

#67

robertmhaas@gmail.com

almost 7 years ago

In reply to: Andres Freund (#66)

#68

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#66)

#69

tushar.ahuja@enterprisedb.com

almost 7 years ago

In reply to: Amit Khandekar (#68)

#70

andres@anarazel.de

almost 7 years ago

In reply to: tushar (#69)

#71

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Andres Freund (#70)

#72

tushar.ahuja@enterprisedb.com

almost 7 years ago

In reply to: Andres Freund (#70)

#73

amitdkhan.pg@gmail.com

almost 7 years ago

In reply to: Amit Khandekar (#68)

#74

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Amit Khandekar (#73)

#75

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#74)

#76

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#75)

#77

robertmhaas@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#76)

#78

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#77)

#79

robertmhaas@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#78)

#80

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#79)

#81

robertmhaas@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#80)

#82

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#81)

#83

robertmhaas@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#82)

#84

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Robert Haas (#83)

#85

Craig Ringer

craig@2ndquadrant.com

over 6 years ago

In reply to: Andres Freund (#20)

#86

Craig Ringer

craig@2ndquadrant.com

over 6 years ago

In reply to: Robert Haas (#83)

#87

nilsocket

nilsocket@gmail.com

over 6 years ago

In reply to: Craig Ringer (#86)

#88

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Craig Ringer (#86)

#89

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#88)

#90

Rahila Syed

rahila.syed@2ndquadrant.com

over 6 years ago

In reply to: Amit Khandekar (#89)

#91

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Rahila Syed (#90)

#92

Rahila Syed

rahila.syed@2ndquadrant.com

over 6 years ago

In reply to: Amit Khandekar (#91)

#93

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Rahila Syed (#92)

#94

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Amit Khandekar (#93)

#95

Rahila Syed

rahila.syed@2ndquadrant.com

over 6 years ago

In reply to: Amit Khandekar (#94)

#96

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Rahila Syed (#95)

#97

Andreas Joseph Krogh

andreas@visena.com

over 6 years ago

In reply to: Amit Khandekar (#96)

#98

amitdkhan.pg@gmail.com

over 6 years ago

In reply to: Andreas Joseph Krogh (#97)

#99

Andreas Joseph Krogh

andreas@visena.com

over 6 years ago

In reply to: Amit Khandekar (#98)

#100

James Sewell

james.sewell@jirotech.com

over 6 years ago

In reply to: Andreas Joseph Krogh (#99)

#101

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Amit Khandekar (#96)

#102

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Alvaro Herrera (#101)

#103