Streaming replica refuses to run restore_command
Hello All,
This is how I set up the DB:
The slave is a streaming replica.
We configured the slave to run pg_dump, which usually lasts about 12 hours.
We have limited pg_xlog space on the slave.
At one point the pg_xlog on the slave filled up while pg_dump was still in progress:
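For reference, the standby's recovery configuration on 9.1 is roughly the following sketch; the connection string and archive path here are placeholders, not the actual values from this setup:

```ini
# recovery.conf on the standby (PostgreSQL 9.1)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
# When streaming cannot supply a segment, the standby runs restore_command
# to fetch it from the WAL archive: %f = segment file name, %p = target path.
restore_command = 'cp /path/to/archive/%f "%p"'
```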
2014-08-11 09:39:23.226 CDT,,,25779,,53d26b30.64b3,2,,2014-07-25 09:35:28
CDT,,0,FATAL,53100,"could not write to file ""pg_xlog/xlogtemp.25779"": No
space left on device",,,,,,,,,""
After a few restartpoints, the slave keeps trying to reconnect to the primary and refuses to run restore_command.
Is this expected behaviour?
Full log:
2014-08-11 09:39:23.226 CDT,,,25779,,53d26b30.64b3,2,,2014-07-25 09:35:28
CDT,,0,FATAL,53100,"could not write to file ""pg_xlog/xlogtemp.25779"": No
space left on device",,,,,,,,,""
2014-08-11 09:46:36.163 CDT,,,14394,,53c582ea.383a,14031,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: time",,,,,,,,,""
2014-08-11 09:49:33.786 CDT,,,14394,,53c582ea.383a,14032,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 21862 buffers
(4.2%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=176.358 s, sync=1.243 s, total=177.623 s; sync files=683,
longest=0.159 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:49:33.787 CDT,,,14394,,53c582ea.383a,14033,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC5/36CE52A0","last
completed transaction was at log time 2014-08-10
23:01:55.754838-05",,,,,,,,""
2014-08-11 09:49:33.790 CDT,,,14394,,53c582ea.383a,14034,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:51:37.256 CDT,,,14394,,53c582ea.383a,14035,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 76725 buffers
(14.6%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=122.170 s, sync=1.284 s, total=123.465 s; sync files=770,
longest=0.424 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:51:37.256 CDT,,,14394,,53c582ea.383a,14036,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC5/BE0D9590","last
completed transaction was at log time 2014-08-11
01:10:49.220537-05",,,,,,,,""
2014-08-11 09:51:37.258 CDT,,,14394,,53c582ea.383a,14037,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:54:23.552 CDT,,,14394,,53c582ea.383a,14038,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 49243 buffers
(9.4%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=164.859 s, sync=1.428 s, total=166.293 s; sync files=794,
longest=0.157 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:54:23.552 CDT,,,14394,,53c582ea.383a,14039,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC6/571657C8","last
completed transaction was at log time 2014-08-11
04:43:16.23604-05",,,,,,,,""
2014-08-11 09:54:23.561 CDT,,,14394,,53c582ea.383a,14040,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 09:57:51.829 CDT,,,14394,,53c582ea.383a,14041,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 215682 buffers
(41.1%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=206.897 s, sync=1.364 s, total=208.267 s; sync files=697,
longest=0.110 s, average=0.001 s",,,,,,,,,""
2014-08-11 09:57:51.829 CDT,,,14394,,53c582ea.383a,14042,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC7/FBE0D48","last
completed transaction was at log time 2014-08-11
06:42:22.051994-05",,,,,,,,""
2014-08-11 09:57:51.832 CDT,,,14394,,53c582ea.383a,14043,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:00:18.828 CDT,,,14394,,53c582ea.383a,14044,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 127358 buffers
(24.3%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=145.721 s, sync=1.263 s, total=146.995 s; sync files=905,
longest=0.038 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:00:18.828 CDT,,,14394,,53c582ea.383a,14045,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC7/E80EDE30","last
completed transaction was at log time 2014-08-11
07:15:48.30725-05",,,,,,,,""
2014-08-11 10:00:18.833 CDT,,,14394,,53c582ea.383a,14046,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:03:27.063 CDT,,,14394,,53c582ea.383a,14047,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 168653 buffers
(32.2%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=186.808 s, sync=1.409 s, total=188.229 s; sync files=886,
longest=0.151 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:03:27.063 CDT,,,14394,,53c582ea.383a,14048,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC8/6D003160","last
completed transaction was at log time 2014-08-11
07:40:58.390313-05",,,,,,,,""
2014-08-11 10:03:27.065 CDT,,,14394,,53c582ea.383a,14049,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:06:43.148 CDT,,,14394,,53c582ea.383a,14050,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 220017 buffers
(42.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=194.483 s, sync=1.590 s, total=196.083 s; sync files=890,
longest=0.245 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:06:43.149 CDT,,,14394,,53c582ea.383a,14051,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC9/2F02EEF0","last
completed transaction was at log time 2014-08-11
08:09:49.370415-05",,,,,,,,""
2014-08-11 10:06:43.151 CDT,,,14394,,53c582ea.383a,14052,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:09:40.521 CDT,,,14394,,53c582ea.383a,14053,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 174348 buffers
(33.3%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=175.910 s, sync=1.448 s, total=177.370 s; sync files=882,
longest=0.172 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:09:40.521 CDT,,,14394,,53c582ea.383a,14054,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2AC9/FB00D4B8","last
completed transaction was at log time 2014-08-11
08:40:41.226457-05",,,,,,,,""
2014-08-11 10:09:40.526 CDT,,,14394,,53c582ea.383a,14055,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:12:05.971 CDT,,,14394,,53c582ea.383a,14056,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 172634 buffers
(32.9%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=143.857 s, sync=1.578 s, total=145.445 s; sync files=909,
longest=0.087 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:12:05.971 CDT,,,14394,,53c582ea.383a,14057,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2ACA/E6895640","last
completed transaction was at log time 2014-08-11
08:59:41.72795-05",,,,,,,,""
2014-08-11 10:12:05.979 CDT,,,14394,,53c582ea.383a,14058,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:14:46.943 CDT,,,14394,,53c582ea.383a,14059,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint complete: wrote 144919 buffers
(27.6%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=159.633 s, sync=1.323 s, total=160.964 s; sync files=865,
longest=0.071 s, average=0.001 s",,,,,,,,,""
2014-08-11 10:14:46.943 CDT,,,14394,,53c582ea.383a,14060,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"recovery restart point at 2ACB/7D006098","last
completed transaction was at log time 2014-08-11
09:25:45.324747-05",,,,,,,,""
2014-08-11 10:14:46.970 CDT,,,14394,,53c582ea.383a,14061,,2014-07-15
14:37:14 CDT,,0,LOG,00000,"restartpoint starting: xlog",,,,,,,,,""
2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,1,,2014-08-11 10:15:53
CDT,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,,""
2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53
CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58
CDT,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,,""
2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,2,,2014-08-11 10:15:58
CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:16:04.037 CDT,,,29151,,53e8de34.71df,1,,2014-08-11 10:16:04
CDT,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,,""
2014-08-11 10:16:04.037 CDT,,,29151,,53e8de34.71df,2,,2014-08-11 10:16:04
CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
2014-08-11 10:16:09.432 CDT,,,29157,,53e8de39.71e5,1,,2014-08-11 10:16:09
CDT,,0,LOG,00000,"streaming replication successfully connected to
primary",,,,,,,,,""
2014-08-11 10:16:09.432 CDT,,,29157,,53e8de39.71e5,2,,2014-08-11 10:16:09
CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
requested WAL segment 0000000300002ACC000000A6 has already been removed
",,,,,,,,,""
--
Regards,
Soni Maula Harriz
On Tue, Aug 12, 2014 at 2:10 PM, Soni M <diptatapa@gmail.com> wrote:
> This is how i set up the db :
> Slave using streaming replica.
> We configure slave to run pg_dump which usually last for about 12 hours.
> We have limited pg_xlog on slave.
> Once the pg_xlog on slave is full while pg_dump still in progress.

What is the version of Postgres you are using?

> after some restart point, the slave is trying to connect to primary and
> refuse to do restore_command.
> Is this an expected behaviour?
>
> 2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53
> CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
> requested WAL segment 0000000300002ACC000000A6 has already been removed
> ",,,,,,,,,""
> 2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58
> CDT,,0,LOG,00000,"streaming replication successfully connected to
> primary",,,,,,,,,""
A restart point is created each time a standby server shuts down
during WAL replay. It seems that you stopped it for too long, and the
WAL files that this server expected to find are gone because it was out
of sight for too long. Then, when a standby starts and enters recovery,
it first tries to recover the necessary WAL files from the archives with
restore_command, and then switches to streaming mode if it cannot find
what it needs there. Are you sure that you kept enough WAL history in
your archives and that restore_command points to the right path/host?
Regards,
--
Michael
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
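When checking the archive by hand for Michael's question about WAL history, it helps that a WAL segment file name is 24 hex digits built from three 8-digit fields: timeline ID, log ID, and segment number. A quick sketch (the archive path below is a placeholder):

```shell
# Rebuild the failing segment's name from its three components:
# timeline 3, log 0x2ACC, segment 0xA6 -> 0000000300002ACC000000A6
seg=$(printf '%08X%08X%08X' 3 0x2ACC 0xA6)
echo "$seg"
# If archiving is healthy, that exact file should be present in the
# archive, e.g. (placeholder path):
#   ls /path/to/archive/"$seg"
```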
On Tue, Aug 12, 2014 at 12:37 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Aug 12, 2014 at 2:10 PM, Soni M <diptatapa@gmail.com> wrote:
>> This is how i set up the db :
>> Slave using streaming replica.
>> We configure slave to run pg_dump which usually last for about 12 hours.
>> We have limited pg_xlog on slave.
>> Once the pg_xlog on slave is full while pg_dump still in progress.
>
> What is the version of Postgres you are using?

Pg 9.1 from the RHEL packages.

>> after some restart point, the slave is trying to connect to primary and
>> refuse to do restore_command.
>> Is this an expected behaviour?
>>
>> 2014-08-11 10:15:53.298 CDT,,,29141,,53e8de29.71d5,2,,2014-08-11 10:15:53
>> CDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:
>> requested WAL segment 0000000300002ACC000000A6 has already been removed
>> ",,,,,,,,,""
>> 2014-08-11 10:15:58.550 CDT,,,29146,,53e8de2e.71da,1,,2014-08-11 10:15:58
>> CDT,,0,LOG,00000,"streaming replication successfully connected to
>> primary",,,,,,,,,""
>
> A restart point is created each time a standby server shuts down
> during WAL replay. It seems that you stopped it for too long, and the
> WAL files that this server expected to find are gone because it was out
> of sight for too long. Then, when a standby starts and enters recovery,
> it first tries to recover the necessary WAL files from the archives with
> restore_command, and then switches to streaming mode if it cannot find
> what it needs there. Are you sure that you kept enough WAL history in
> your archives and that restore_command points to the right path/host?
>
> Regards,
> --
> Michael
The needed WAL files are still in the archives, and the path is correct.
I tried the following and it solved the problem: I moved all the pg_xlog
files to bigger storage, manually copied the needed WAL files from the
archive, and restarted the server. It then ran smoothly.
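A sketch of those steps, demonstrated against a throwaway directory rather than a live cluster; the data directory, big-disk, and archive paths are placeholders and would differ per installation:

```shell
# $tmp/data stands in for the real PGDATA of the standby.
tmp=$(mktemp -d)
mkdir -p "$tmp/data/pg_xlog" "$tmp/bigdisk"
touch "$tmp/data/pg_xlog/0000000300002ACC000000A5"   # stand-in WAL segment

# 1. Move pg_xlog onto the bigger volume and symlink it back into PGDATA.
mv "$tmp/data/pg_xlog" "$tmp/bigdisk/pg_xlog"
ln -s "$tmp/bigdisk/pg_xlog" "$tmp/data/pg_xlog"

# 2. Copy the segments the standby still needs from the WAL archive, e.g.
#      cp /path/to/archive/0000000300002ACC000000A6 "$tmp/data/pg_xlog/"
#    (placeholder path; on the real system this is the archive location).

# 3. Restart the standby, e.g.:
#      pg_ctl -D "$PGDATA" restart
ls -l "$tmp/data/pg_xlog/"
```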
--
Regards,
Soni Maula Harriz