CREATE REPLICATION SLOT fails on a timeout

Started by Steve Singeralmost 12 years ago6 messageshackers
Jump to latest
#1Steve Singer
steve@ssinger.info

I am finding that my logical walsender connections are being terminated
due to a timeout on the CREATE REPLICATION SLOT command. with
"terminating walsender process due to replication timeout"

Below is the stack trace when this happens

#3 0x000000000067df28 in WalSndCheckTimeOut
(now=now@entry=453585463823871) at walsender.c:1748
#4 0x000000000067eedc in WalSndWaitForWal (loc=691727888) at
walsender.c:1216
#5 logical_read_xlog_page (state=<optimized out>,
targetPagePtr=691724288, reqLen=<optimized out>,
targetRecPtr=<optimized out>, cur_page=0x18476e0 "}\320\005",
pageTLI=<optimized out>) at walsender.c:741
#6 0x00000000004f41bf in ReadPageInternal (state=state@entry=0x17aa140,
pageptr=pageptr@entry=691724288,
reqLen=reqLen@entry=3600) at xlogreader.c:525
#7 0x00000000004f454a in XLogReadRecord (state=0x17aa140,
RecPtr=691727856, RecPtr@entry=0,
errormsg=errormsg@entry=0x7fff7f668c28) at xlogreader.c:228
#8 0x0000000000675e5c in DecodingContextFindStartpoint
(ctx=ctx@entry=0x17a0358) at logical.c:460
#9 0x0000000000680f16 in CreateReplicationSlot (cmd=0x1798b50) at
walsender.c:800
#10 exec_replication_command (
cmd_string=cmd_string@entry=0x17f1478 "CREATE_REPLICATION_SLOT
\"slon_4_1\" LOGICAL \"slony1_funcs.2.2.0\"")
at walsender.c:1291
#11 0x00000000006bf4a1 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x177db50, dbname=0x177db30 "test1",

(gdb) p last_reply_timestamp
$1 = 0

I propose the attached patch sets last_reply_timestamp to now() it
starts processing a command. Since receiving a command is hearing
something from the client.

Steve

Attachments:

timeout_init.difftext/x-patch; name=timeout_init.diffDownload+1-0
#2Andres Freund
andres@anarazel.de
In reply to: Steve Singer (#1)
Re: CREATE REPLICATION SLOT fails on a timeout

Hi,

On 2014-05-16 16:37:16 -0400, Steve Singer wrote:

I am finding that my logical walsender connections are being terminated due
to a timeout on the CREATE REPLICATION SLOT command. with "terminating
walsender process due to replication timeout"

Below is the stack trace when this happens

#3 0x000000000067df28 in WalSndCheckTimeOut (now=now@entry=453585463823871)
at walsender.c:1748
#4 0x000000000067eedc in WalSndWaitForWal (loc=691727888) at
walsender.c:1216
...
#9 0x0000000000680f16 in CreateReplicationSlot (cmd=0x1798b50) at
walsender.c:800
#10 exec_replication_command () at walsender.c:1291
#11 0x00000000006bf4a1 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x177db50, dbname=0x177db30 "test1",

(gdb) p last_reply_timestamp
$1 = 0

I propose the attached patch sets last_reply_timestamp to now() it starts
processing a command. Since receiving a command is hearing something from
the client.

Hm. Yes, that's a problem.

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 5c11d68..56a2f10
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*************** exec_replication_command(const char *cmd
*** 1276,1281 ****
--- 1276,1282 ----
parse_rc))));

cmd_node = replication_parse_result;
+ last_reply_timestamp = GetCurrentTimestamp();

switch (cmd_node->type)
{

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
= 0 during timeout checking is more robust.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#2)
Re: CREATE REPLICATION SLOT fails on a timeout

On 05/16/2014 04:43 PM, Andres Freund wrote:

Hi,

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
= 0 during timeout checking is more robust.

Greetings,

Andres Freund

That makes sense.
A patch that does that is attached.

Steve

Attachments:

timeout_nonzero.difftext/x-patch; name=timeout_nonzero.diffDownload+2-2
#4Andres Freund
andres@anarazel.de
In reply to: Steve Singer (#3)
Re: CREATE REPLICATION SLOT fails on a timeout

On 2014-05-16 17:02:33 -0400, Steve Singer wrote:

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
= 0 during timeout checking is more robust.

That makes sense.
A patch that does that is attached.

I've attached a heavily revised version of that patch. Didn't touch all
the necessary places...

Survives creation of logical replication slots under 'intensive
pressure', with a wal_sender_timeout=10ms.

Thanks for the bugreport!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Don-t-pay-heed-to-wal_sender_timeout-while-creating-.patchtext/x-patch; charset=us-asciiDownload+21-5
#5Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: CREATE REPLICATION SLOT fails on a timeout

Hi,

On 2014-05-17 01:34:25 +0200, Andres Freund wrote:

On 2014-05-16 17:02:33 -0400, Steve Singer wrote:

I don't think that's going to cut it though. The creation can take
longer than whatever wal_sender_timeout is set to (when there's lots of
longrunning transactions). I think checking whether last_reply_timestamp
= 0 during timeout checking is more robust.

That makes sense.
A patch that does that is attached.

I've attached a heavily revised version of that patch. Didn't touch all
the necessary places...

Survives creation of logical replication slots under 'intensive
pressure', with a wal_sender_timeout=10ms.

Thanks for the bugreport!

Pushed a fix for it. I am pretty sure it will, but could you still test
that it fixes your problem?

Thanks!

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Steve Singer
steve@ssinger.info
In reply to: Andres Freund (#5)
Re: CREATE REPLICATION SLOT fails on a timeout

On 05/28/2014 06:41 PM, Andres Freund wrote:

Hi,

Pushed a fix for it. I am pretty sure it will, but could you still test
that it fixes your problem?

Thanks!

The fix seems to work (I am no longer getting the timeout on slot creation)

Thanks

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers