Strange message from pg_receivexlog

Started by Sergey Konoplevover 12 years ago5 messagesgeneral
Jump to latest
#1Sergey Konoplev
gray.ru@gmail.com

Hi all,

My WAL archiving script based on pg_receivexlog reported the following
error several days ago (just ignore everything before
'pg_receivexlog', it's a message my script generates).

Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured
during WAL archiving: pg_receivexlog: could not send feedback packet:
server closed the connection unexpectedly

At the same time postgres reported this error in log:

2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local]
[vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating
walsender process due to replication timeout

Both pg_receivexlog and postgres run at the same machive,
pg_receivexlog connects to postgres locally. /var/log/messages has
absolutely nothing about it. I also have a hot standby on another
machine connecting to the same master, but there is nothing strange in
its logs either.

Any thoughts what it was?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Magnus Hagander
magnus@hagander.net
In reply to: Sergey Konoplev (#1)
Re: Strange message from pg_receivexlog

On Tue, Aug 20, 2013 at 8:17 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

Hi all,

My WAL archiving script based on pg_receivexlog reported the following
error several days ago (just ignore everything before
'pg_receivexlog', it's a message my script generates).

Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured
during WAL archiving: pg_receivexlog: could not send feedback packet:
server closed the connection unexpectedly

At the same time postgres reported this error in log:

2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local]
[vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating
walsender process due to replication timeout

Both pg_receivexlog and postgres run at the same machive,
pg_receivexlog connects to postgres locally. /var/log/messages has
absolutely nothing about it. I also have a hot standby on another
machine connecting to the same master, but there is nothing strange in
its logs either.

Any thoughts what it was?

It looks like something outside postgres or pg_receivexlog did
terminate the connection. pg_receievexlog noticed it was closed,
whereas postgres waited for the timeout (but probably would've noticed
if it had actually had some other data to send maybe?). Do you have
some iptables connection tracking or something like that which could
be interfering?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Sergey Konoplev
gray.ru@gmail.com
In reply to: Magnus Hagander (#2)
Re: Strange message from pg_receivexlog

On Tue, Aug 20, 2013 at 2:10 AM, Magnus Hagander <magnus@hagander.net> wrote:

It looks like something outside postgres or pg_receivexlog did
terminate the connection. pg_receievexlog noticed it was closed,
whereas postgres waited for the timeout (but probably would've noticed
if it had actually had some other data to send maybe?). Do you have
some iptables connection tracking or something like that which could
be interfering?

AFAIU, just standard things:

-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited

Nothing looks suspicious for me.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Fujii Masao
masao.fujii@gmail.com
In reply to: Sergey Konoplev (#1)
Re: Strange message from pg_receivexlog

On Tue, Aug 20, 2013 at 3:17 PM, Sergey Konoplev <gray.ru@gmail.com> wrote:

Hi all,

My WAL archiving script based on pg_receivexlog reported the following
error several days ago (just ignore everything before
'pg_receivexlog', it's a message my script generates).

Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured
during WAL archiving: pg_receivexlog: could not send feedback packet:
server closed the connection unexpectedly

At the same time postgres reported this error in log:

2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local]
[vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating
walsender process due to replication timeout

Both pg_receivexlog and postgres run at the same machive,
pg_receivexlog connects to postgres locally. /var/log/messages has
absolutely nothing about it. I also have a hot standby on another
machine connecting to the same master, but there is nothing strange in
its logs either.

Any thoughts what it was?

Is the value of replication_timeout sufficiently-larger than the status-interval
of pg_receivexlog?

Regards,

--
Fujii Masao

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Sergey Konoplev
gray.ru@gmail.com
In reply to: Fujii Masao (#4)
Re: Strange message from pg_receivexlog

On Wed, Aug 21, 2013 at 5:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Thu Aug 15 18:33:09 MSK 2013 ERROR archive_wal.sh: Problem occured
during WAL archiving: pg_receivexlog: could not send feedback packet:
server closed the connection unexpectedly

At the same time postgres reported this error in log:

2013-08-15 18:32:51 MSK 30945 postgres@[unknown] from [local]
[vxid:53/0 txid:0] [streaming 2A97/6FA48000] LOG: terminating
walsender process due to replication timeout

Is the value of replication_timeout sufficiently-larger than the status-interval
of pg_receivexlog?

The replication_timeout is 60s.

The archive_wal.sh (script-wrapper around pg_receivexlog) reports its
status straight away as it falls with an error. Below is the
explanation of how it works.

This is the core of archive_wal.sh:

(
flock -xn 543 || exit 0
result=$($PGRECEIVEXLOG -n -D $WAL_ARCHIVE_DIR 2>&1) || \
die "Problem occured during WAL archiving: $result."
) 543>$WAL_LOCK_FILE

And it is set to run by cron once a minute reporting me by email on occasions:

MAILTO=gray.ru@gmail.com

* * * * * /bin/bash /var/lib/pgsql/tmsdb/archive_wal.sh

/var/log/tmsdb/archive_wal.log

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general