Issue with pg_basebackup v.11

Started by Ninad Shah over 4 years ago, 5 messages, general
#1 Ninad Shah
nshah.postgres@gmail.com

Hello experts,

I am facing an issue on a customer's production server while trying to
take a backup using pg_basebackup.

Below is the log from the pg_basebackup run.

115338208/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115355616/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115372640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115389568/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115405792/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115423776/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.1)
115440640/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out
pg_basebackup: removing contents of data directory "/u01/PostgreSQL/11/datastaging"

It copied nearly 110 GB of data and then exited. Initially, we suspected a
network or OS issue. However, we then copied a 150 GB file over the
network, and that finished successfully.

What I observed is that a couple of hours pass between the 2 lines below.

115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out

In other words, it runs for an hour, and then it takes 2 hours before it
times out.

Can someone please help me out here?

Regards,
Ninad Shah

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ninad Shah (#1)
Re: Issue with pg_basebackup v.11

Ninad Shah <nshah.postgres@gmail.com> writes:

What I observed is that a couple of hours pass between the 2 lines below.

115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out

We have heard reports of network connections dropping while pg_basebackup
is busy doing something disk-intensive such as fsync'ing. The apparent
2-hour delay here does not mean that pg_basebackup was out to lunch for
2 hours; more likely that reflects the TCP timeout delay before the kernel
realizes that the connection is lost. The actual blame probably resides
with some firewall or router that has a short timeout for idle
connections.
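
To put rough numbers on that delay, here is a small sketch of the usual keepalive arithmetic. It assumes the client is a Linux host still using the stock kernel defaults (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9); the thread does not say which OS or settings are in use, so treat the figures as illustrative only. The function name is made up for this example.

```python
def keepalive_detection_seconds(idle, interval, count):
    """Approximate time before the kernel gives up on a silently dropped,
    idle TCP connection: wait `idle` seconds before the first probe, then
    send `count` probes spaced `interval` seconds apart."""
    return idle + interval * count


# ASSUMPTION: stock Linux sysctl defaults, not values reported in this thread.
linux_defaults = keepalive_detection_seconds(idle=7200, interval=75, count=9)

# keepalives_idle=30 is the value suggested in this message; the interval
# and count used here are illustrative assumptions.
aggressive = keepalive_detection_seconds(idle=30, interval=10, count=5)

print(f"Linux defaults: {linux_defaults} s (~{linux_defaults / 3600:.1f} h)")
print(f"keepalives_idle=30: {aggressive} s")
```

Under those assumptions the default case works out to a little over two hours, which is in the same range as the gap seen in the log. With a 30-second idle setting, probes would also start flowing long before a typical firewall idle timeout expires, which may be enough to keep the connection from being dropped in the first place.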

I'd try turning on fairly aggressive TCP keepalive settings for the
connection, say keepalives_idle=30 or so.

regards, tom lane
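
As a sketch of how the suggestion above could be tried: the keepalive options are ordinary libpq connection parameters, so they can be passed to pg_basebackup in a connection string via -d. The helper below only builds and prints the command line. The host name, user name, and the keepalives_interval/keepalives_count values are placeholders and assumptions, not details from this thread; only keepalives_idle=30 and the target directory come from the messages above.

```python
import shlex


def build_conninfo(host, user, keepalives_idle=30,
                   keepalives_interval=10, keepalives_count=5):
    """Build a libpq connection string with TCP keepalives turned on.
    Only handles simple values without spaces or quotes."""
    params = {
        "host": host,
        "user": user,
        "keepalives": 1,                             # enable client-side keepalives
        "keepalives_idle": keepalives_idle,          # idle seconds before first probe
        "keepalives_interval": keepalives_interval,  # seconds between probes (assumed value)
        "keepalives_count": keepalives_count,        # lost probes before giving up (assumed value)
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


def basebackup_argv(conninfo, target_dir):
    """Return the pg_basebackup argument list; -P prints progress."""
    return ["pg_basebackup", "-d", conninfo, "-D", target_dir, "-P"]


# "primary.example.com" and "replicator" are hypothetical placeholders.
conninfo = build_conninfo("primary.example.com", "replicator")
print(shlex.join(basebackup_argv(conninfo, "/u01/PostgreSQL/11/datastaging")))
```

The printed command can be pasted into a shell after substituting the real host and user. The server has analogous settings (tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count in postgresql.conf) that could be tried as well; whether either side resolves this particular failure is something only an experiment on the affected network can show.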

#3 Ninad Shah
nshah.postgres@gmail.com
In reply to: Tom Lane (#2)
Re: Issue with pg_basebackup v.11

Hey Tom,

Thank you for your response. Actually, when we copy data using scp/rsync,
it works without any issue. It fails only when we attempt the transfer
using pg_basebackup.

Would keepalive setting address and mitigate the issue?

Regards,
Ninad Shah

On Fri, 22 Oct 2021 at 21:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ninad Shah <nshah.postgres@gmail.com> writes:

What I observed is that it takes a couple of hours between below 2 lines.

115454656/1304172127 kB (8%), 0/1 tablespace (...atastaging/base/115868/154220.2)
pg_basebackup: could not read COPY data: could not receive data from server: Connection timed out

We have heard reports of network connections dropping while pg_basebackup
is busy doing something disk-intensive such as fsync'ing. The apparent
2-hour delay here does not mean that pg_basebackup was out to lunch for
2 hours; more likely that reflects the TCP timeout delay before the kernel
realizes that the connection is lost. The actual blame probably resides
with some firewall or router that has a short timeout for idle
connections.

I'd try turning on fairly aggressive TCP keepalive settings for the
connection, say keepalives_idle=30 or so.

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ninad Shah (#3)
Re: Issue with pg_basebackup v.11

Ninad Shah <nshah.postgres@gmail.com> writes:

Would keepalive setting address and mitigate the issue?

[ shrug... ] Maybe; nobody else has more information about this
situation than you do. I suggested something to experiment with.

regards, tom lane

#5 Ninad Shah
nshah.postgres@gmail.com
In reply to: Tom Lane (#4)
Re: Issue with pg_basebackup v.11

Thanks Tom.

Regards,
Ninad Shah

On Sat, 23 Oct 2021 at 20:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ninad Shah <nshah.postgres@gmail.com> writes:

Would keepalive setting address and mitigate the issue?

[ shrug... ] Maybe; nobody else has more information about this
situation than you do. I suggested something to experiment with.

regards, tom lane