FATAL: could not receive data from WAL stream

Started by Patrick B over 9 years ago | 7 messages | general
#1Patrick B
patrickbakerbr@gmail.com

Hi guys,

I have a slave server running Postgres 9.2 with streaming replication and
wal_archive on an EC2 instance at Amazon.

Postgres logs are showing me this error:

restored log file "000000020000179A000000F8" from archive
invalid record length at 179A/F8FFF3D0
WAL segment `/var/lib/pgsql/9.2/archive/00000003.history` not found
streaming replication successfully connected to primary
FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 000000020000179A000000F8 has already been removed

However, the file 000000020000179A000000F8 is in the
/var/lib/pgsql/9.2/archive directory:

postgres@devops:/var/lib/pgsql/9.2/archive$ ls -la | grep 000000020000179A000000F8
-rw------- 1 postgres postgres 16777216 Sep 16 05:16 000000020000179A000000F8

It's an Ubuntu instance, so my recovery.conf is:

*/etc/postgresql/9.2/main/recovery.conf:*

restore_command = 'exec /var/lib/pgsql/bin/restore_wal_segment.bash "/var/lib/pgsql/9.2/wal_archive/%f" "%p"'
archive_cleanup_command = '/var/lib/postgresql/bin/pg_archivecleaup_mv.bash'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=IP_MY_SLAVE port=5432 user=replicator application_name=devops'

What could be happening, given that the file is there?

Thanks
Patrick
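
For reference, the LSN in the "invalid record length" message and the segment named in the FATAL error can be related with a little arithmetic. The sketch below is only an illustration, assuming the default 16 MB WAL segment size (consistent with the 16777216-byte file in the listing above) and timeline 2 as shown in the file name. The function name segment_for_lsn is made up for this example and is not part of PostgreSQL.

```python
# Assumption: default WAL segment size of 16 MB (matches the 16777216-byte file).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_for_lsn(timeline, lsn):
    """Return (segment file name, byte offset inside that segment).

    `lsn` is a string such as '179A/F8FFF3D0': the high 32 bits and the
    low 32 bits of the WAL position, both in hexadecimal.
    """
    hi_hex, lo_hex = lsn.split("/")
    hi = int(hi_hex, 16)
    lo = int(lo_hex, 16)
    seg = lo // WAL_SEGMENT_SIZE      # which 16 MB slice of the low word
    offset = lo % WAL_SEGMENT_SIZE    # position inside that slice
    # File name layout: 8 hex digits each for timeline, high word, segment.
    return "%08X%08X%08X" % (timeline, hi, seg), offset


name, offset = segment_for_lsn(2, "179A/F8FFF3D0")
print(name, hex(offset))  # 000000020000179A000000F8 0xfff3d0
```

Under these assumptions the invalid record sits at offset 0xFFF3D0, about 3 KB before the end of the segment, and it lies in the same segment the standby then requests from the primary.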

#2Venkata B Nagothi
nag1010@gmail.com
In reply to: Patrick B (#1)
Re: FATAL: could not receive data from WAL stream

Do you mean that the WAL file "000000020000179A000000F8" is
available in the "/var/lib/pgsql/9.2/archive" location?

Regards,
Venkata B N

Fujitsu Australia

#3drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: Venkata B Nagothi (#2)
Re: FATAL: could not receive data from WAL stream

Yes.....

#4drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: drum.lucas@gmail.com (#3)
Re: FATAL: could not receive data from WAL stream

Oops, sorry... sent to the wrong email.

#5Patrick B
patrickbakerbr@gmail.com
In reply to: Venkata B Nagothi (#2)
Re: FATAL: could not receive data from WAL stream

Yes!

#6Michael Paquier
michael@paquier.xyz
In reply to: Patrick B (#5)
Re: FATAL: could not receive data from WAL stream

On Tue, Sep 20, 2016 at 1:30 PM, Patrick B <patrickbakerbr@gmail.com> wrote:

2016-09-20 15:14 GMT+12:00 Venkata B Nagothi <nag1010@gmail.com>:

Do you mean to say that the WAL file "000000020000179A000000F8" is
available @ "/var/lib/pgsql/9.2/archive" location ?

Yes!

Timeline 2 has visibly reached its end at segment
000000020000179A000000F8, and the standby cannot find in the archive the
history file that would tell it which timeline to fetch from next. As the
timeline history file cannot be found, it then attempts to fetch the
segment, which it thinks is complete, from the master itself.

Didn't you trigger a promotion, which would have moved the master to
timeline 3? And are you sure that 00000003.history is not in the
archives?
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
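
The check suggested above (is the history file for the next timeline in the archive?) can be sketched in a few lines. This is a rough illustration, not a PostgreSQL tool: the helper names are hypothetical, and the archive path is taken from the log output in the first message. Note that recovery.conf in that message points restore_command at a wal_archive directory instead, so the path to check on a real system is an assumption.

```python
import os


def parse_segment_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def history_file_name(timeline):
    """Timeline history files are named <8 hex digits>.history."""
    return "%08X.history" % timeline


def next_history_present(archive_dir, segment_name):
    """Return (history file name for the next timeline, whether it exists)."""
    timeline, _log, _seg = parse_segment_name(segment_name)
    candidate = history_file_name(timeline + 1)
    return candidate, os.path.exists(os.path.join(archive_dir, candidate))


# Path as it appears in the log output; adjust to the directory that
# restore_command actually reads from (assumption).
ARCHIVE_DIR = "/var/lib/pgsql/9.2/archive"
print(next_history_present(ARCHIVE_DIR, "000000020000179A000000F8"))
```

If the second element of the result is False, the standby has no way to learn from the archive where timeline 3 branched off, which matches the "00000003.history not found" line in the log.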

#7Patrick B
patrickbakerbr@gmail.com
In reply to: Michael Paquier (#6)
Re: FATAL: could not receive data from WAL stream

The server went down, and when it came back online I got those errors.

I also found this in the logs: systemd[1]: Removed slice User
Slice of postgres.

I believe something happened with the postgres user, and when the server
came back online it started Postgres in a new path that did not include
recovery.conf, so the server might have been promoted to master.

Does this mean I'll have to rebuild the DB?

Patrick