Creating many tables gets logical replication stuck
Reproducer on 12.4:
On the primary:
CREATE TABLE repli (id bigint PRIMARY KEY, val text NOT NULL);
CREATE PUBLICATION repli_pub FOR TABLE repli;
On the standby:
CREATE TABLE repli (id bigint PRIMARY KEY, val text NOT NULL);
CREATE SUBSCRIPTION repli_sub CONNECTION '...' PUBLICATION repli_pub;
Logical replication works fine.
Now connect to the primary and run:
BEGIN;
SELECT format('CREATE TABLE table%s (id SERIAL PRIMARY KEY, somename text UNIQUE);', foo) FROM generate_series(1,13200) AS foo \gexec
13200 tables are created.
Don't commit the transaction yet and verify that logical replication is still working.
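One way to verify this is a quick round trip from a second session (the values are arbitrary):
INSERT INTO repli VALUES (-1, 'canary');
On the standby, the row should appear shortly:
SELECT * FROM repli WHERE id = -1;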
Now commit the transaction; logical replication stops working.
The log on the primary has lots of:
2020-08-21 12:41:47.899 CEST [25222] LOG: connection received: host=[local]
2020-08-21 12:41:47.900 CEST [25222] LOG: replication connection authorized: user=postgres application_name=repli_sub
2020-08-21 12:41:47.901 CEST [25222] ERROR: replication slot "repli_sub" is active for PID 23911
The standby log has:
2020-08-21 12:41:47.897 CEST [25221] LOG: logical replication apply worker for subscription "repli_sub" has started
2020-08-21 12:41:47.901 CEST [25221] ERROR: could not start WAL streaming: ERROR: replication slot "repli_sub" is active for PID 23911
2020-08-21 12:41:47.903 CEST [22901] LOG: background worker "logical replication worker" (PID 25221) exited with exit code 1
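While it is stuck like this, the state can also be inspected on the primary; this is just a diagnostic sketch using the v12 catalog views:
SELECT slot_name, active, active_pid, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'repli_sub';
SELECT pid, application_name, state, sent_lsn, flush_lsn
FROM pg_stat_replication;
The slot shows as active for the stuck PID, and confirmed_flush_lsn does not advance.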
When I try a fast shutdown on the primary server, the WAL sender just doesn't stop; it can only be killed with SIGKILL.
I set "wal_sender_timeout = 6000s" and "log_min_messages = debug3" on the primary
and "wal_receiver_timeout = 6000s" on the standby and restarted the servers.
I see messages like this for the WAL sender:
2020-08-21 13:53:34.321 CEST [33594] LOG: connection received: host=[local]
2020-08-21 13:53:34.321 CEST [33594] DEBUG: postmaster child[33594]: starting with (
2020-08-21 13:53:34.321 CEST [33594] DEBUG: postgres
2020-08-21 13:53:34.321 CEST [33594] DEBUG: )
2020-08-21 13:53:34.321 CEST [33594] DEBUG: InitPostgres
2020-08-21 13:53:34.321 CEST [33594] LOG: replication connection authorized: user=postgres application_name=repli_sub
2020-08-21 13:53:34.326 CEST [33594] DEBUG: received replication command: IDENTIFY_SYSTEM
2020-08-21 13:53:34.326 CEST [33594] DEBUG: received replication command: START_REPLICATION SLOT "repli_sub" LOGICAL 28/48643248 (proto_version '1', publication_names '"repli_pub"')
2020-08-21 13:53:34.326 CEST [33594] DEBUG: cannot stream from 28/48643248, minimum is 28/48705180, forwarding
2020-08-21 13:53:34.326 CEST [33594] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-12/lib/pgoutput"
2020-08-21 13:53:34.326 CEST [33594] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-12/lib/pgoutput.so"
2020-08-21 13:53:34.326 CEST [33594] LOG: starting logical decoding for slot "repli_sub"
2020-08-21 13:53:34.326 CEST [33594] DETAIL: Streaming transactions committing after 28/48705180, reading WAL from 28/322FEDB8.
2020-08-21 13:53:34.326 CEST [33594] DEBUG: switched to timeline 1 valid until 0/0
2020-08-21 13:53:34.326 CEST [33594] DEBUG: sending replication keepalive
2020-08-21 13:53:34.326 CEST [33594] LOG: logical decoding found consistent point at 28/322FEDB8
2020-08-21 13:53:34.326 CEST [33594] DETAIL: There are no running transactions.
2020-08-21 13:53:34.326 CEST [33594] DEBUG: write 28/48705180 flush 28/48705180 apply 28/48705180 reply_time 2020-08-21 13:53:34.326751+02
2020-08-21 13:53:34.335 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.349 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.362 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.377 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.391 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:34.406 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
[...]
2020-08-21 13:53:38.388 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.405 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.423 CEST [33594] DEBUG: spill 4096 changes in XID 191852506 to disk
2020-08-21 13:53:38.438 CEST [33594] DEBUG: purged committed transactions from 0 to 0, xmin: 191852506, xmax: 191852506
2020-08-21 13:53:38.438 CEST [33594] DEBUG: xmin: 191852506, xmax: 191852506, oldest running: 191852506, oldest xmin: 191852506
2020-08-21 13:53:38.439 CEST [33594] DEBUG: purged committed transactions from 0 to 0, xmin: 191852506, xmax: 191852506
2020-08-21 13:53:38.439 CEST [33594] DEBUG: xmin: 191852506, xmax: 191852506, oldest running: 191852506, oldest xmin: 191852506
2020-08-21 13:53:38.501 CEST [33594] DEBUG: found top level transaction 191852506, with catalog changes
2020-08-21 13:53:38.501 CEST [33594] DEBUG: adding a new snapshot to 191852506 at 28/48705180
2020-08-21 13:53:38.766 CEST [33594] DEBUG: spill 1953 changes in XID 191852506 to disk
2020-08-21 13:54:12.207 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:54:48.343 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:55:25.801 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:56:00.470 CEST [33594] DEBUG: restored 4096/976801 changes from disk
2020-08-21 13:56:33.711 CEST [33594] DEBUG: restored 4096/976801 changes from disk
[...]
This just keeps going with a message roughly every 30 seconds.
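The walsender appears CPU-bound rather than blocked; polling pg_stat_activity (a sketch, using the v12 columns) mostly shows no wait event for it:
SELECT pid, backend_type, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'walsender';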
A typical stack trace of the WAL sender looks like this:
#0 0x00000000008c7d87 in hash_seq_search (status=status@entry=0x7ffc4e5446e0) at dynahash.c:1428
#1 0x00000000008b54c8 in RelfilenodeMapInvalidateCallback (arg=<optimized out>, relid=386381) at relfilenodemap.c:79
#2 0x00000000008a6786 in LocalExecuteInvalidationMessage (msg=0x7ff297f62b38) at inval.c:595
#3 0x0000000000750ba6 in ReorderBufferExecuteInvalidations (rb=<optimized out>, txn=<optimized out>) at reorderbuffer.c:2149
#4 ReorderBufferCommit (rb=0x1b270a0, xid=xid@entry=191852506, commit_lsn=173014012288, end_lsn=<optimized out>,
commit_time=commit_time@entry=651321281503986, origin_id=origin_id@entry=0, origin_lsn=0) at reorderbuffer.c:1770
#5 0x0000000000746a0a in DecodeCommit (xid=191852506, parsed=0x7ffc4e544aa0, buf=0x7ffc4e544c50, ctx=0x1b11120) at decode.c:640
#6 DecodeXactOp (ctx=0x1b11120, buf=buf@entry=0x7ffc4e544c50) at decode.c:248
#7 0x0000000000746e22 in LogicalDecodingProcessRecord (ctx=0x1b11120, record=0x1b11398) at decode.c:117
#8 0x00000000007581e5 in XLogSendLogical () at walsender.c:2848
#9 0x000000000075a3f3 in WalSndLoop (send_data=send_data@entry=0x758180 <XLogSendLogical>) at walsender.c:2199
#10 0x000000000075b12c in StartLogicalReplication (cmd=0x7ffc4e544d20) at walsender.c:1129
#11 exec_replication_command (
cmd_string=cmd_string@entry=0x1a425f0 "START_REPLICATION SLOT \"repli_sub\" LOGICAL 28/48643248 (proto_version '1', publication_names '\"repli_pub\"')") at walsender.c:1545
#12 0x00000000007a72d8 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1a99298, dbname=<optimized out>, username=<optimized out>)
at postgres.c:4243
#13 0x0000000000734ef2 in BackendRun (port=0x1a90520) at postmaster.c:4448
#14 BackendStartup (port=0x1a90520) at postmaster.c:4139
#15 ServerLoop () at postmaster.c:1704
#16 0x0000000000735df5 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1a3c1e0) at postmaster.c:1377
#17 0x00000000004cd6a1 in main (argc=3, argv=0x1a3c1e0) at main.c:228
perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres
+ 8.10% [.] hash_seq_search
+ 4.19% [.] LocalExecuteInvalidationMessage
+ 3.49% [.] hash_search_with_hash_value
+ 3.37% [.] CatCacheInvalidate
+ 1.65% [.] CallSyscacheCallbacks
+ 0.93% [.] uint32_hash
+ 0.82% [.] ReorderBufferCommit
+ 0.79% [.] hash_search
+ 0.76% [.] RelfilenodeMapInvalidateCallback
0.63% [.] InvalidateCatalogSnapshot
+ 0.62% [.] SysCacheInvalidate
What could be causing this?
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Dear Laurenz, thank you for your analysis and report.
On 21/8/20 4:00 p.m., Laurenz Albe wrote:
Reproducer on 12.4:
This is the identical problem to the one reported here: /messages/by-id/6fa054d8-ad14-42a2-8926-5d79c97ecd65@matrix.gatewaynet.com
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Hi All,
There was a similar problem in this discussion:
Logical decoding CPU-bound w/ large number of tables
/messages/by-id/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
In that thread, RelfilenodeMapHash was reduced from 1024 entries to 64.
That change reduced the performance impact.
However, I think the underlying problem remains: there are too many invalidations of RelfilenodeMapHash.
As you report, when many tables are created/dropped/truncated, the walsender process can get stuck.
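For example, a DROP-based variant of Laurenz's reproducer (an untested sketch mirroring his script) should hit the same code path:
BEGIN;
SELECT format('DROP TABLE table%s;', foo) FROM generate_series(1,13200) AS foo \gexec
COMMIT;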
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
Hello Keisuke
On 25/8/20 9:50 a.m., Keisuke Kuroda wrote:
Hi All,
There was a similar problem in this discussion:
Logical decoding CPU-bound w/ large number of tables
/messages/by-id/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
In that thread, RelfilenodeMapHash was reduced from 1024 entries to 64.
That change reduced the performance impact.
However, I think the underlying problem remains: there are too many invalidations of RelfilenodeMapHash.
As you report, when many tables are created/dropped/truncated,
Right! IIRC a massive drop also caused the very same problem during my latest reproduction attempts, but I did not focus on that scenario.
The walsender process can get stuck.
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
On Tue, 2020-08-25 at 15:44 +0300, Achilleas Mantzios wrote:
There was a similar problem in this discussion:
Logical decoding CPU-bound w/ large number of tables
/messages/by-id/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
Right! IIRC a massive drop also caused the very same problem during my latest reproduction attempts, but I did not focus on that scenario.
Thank you both, that is indeed the same problem, and the linked thread helps in understanding it.
Yours,
Laurenz Albe