Hot standby stops after a few days of inactivity (i.e. no new WAL)

Started by Marc Schablewski over 17 years ago, 6 messages, general
#1 Marc Schablewski
ms@clickware.de

Hi,

We are running PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
kernel) as a hot standby. After some maintenance work the WAL files
couldn't be shipped to that system (which had nothing to do with
postgres, as we found out later). The problem was not noticed for about
a week. When looking for a reason why the WAL files weren't shipped, we
found the following error message:

2008-10-31 17:07:52 CET 9162LOG: received smart shutdown request
2008-10-31 17:07:52 CET 9178FATAL: could not restore file
"000000010000008600000018" from archive: return code 15
2008-10-31 17:07:52 CET 9162LOG: startup process (PID 9178) exited with
exit code 1
2008-10-31 17:07:52 CET 9162LOG: aborting startup due to startup
process failure

This message occurred about 3 1/2 days after the last log was shipped. I
searched the postgres docs and Google for the meaning of "return code
15" but couldn't find anything.

After copying the missing WAL files from our master system and restarting
postgres, everything worked fine again, but I'm still curious what made
postgres stop waiting for WAL. It seems to me that there is some kind of
timeout that triggers if there are no new WAL files for a couple of days,
but that would seem a bit strange. I'd expect postgres to wait forever
unless it is told manually to leave recovery mode. The manual's
"Recovery Settings" section didn't help either. I'm not sure whether it
is a bug, but it is at least strange.

Regards,
Marc

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marc Schablewski (#1)
Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

Marc Schablewski wrote:

> Hi,
>
> we are running a PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
> kernel) as a hot standby. After some maintenance work the WAL files
> couldn't be shipped to that system (which had nothing to do with
> postgres, as we found out later). The problem was not noticed for about
> a week. When looking for a reason why the WAL files weren't shipped, we
> found the following error message:
>
> 2008-10-31 17:07:52 CET 9162LOG: received smart shutdown request
> 2008-10-31 17:07:52 CET 9178FATAL: could not restore file
> "000000010000008600000018" from archive: return code 15

This server was stopped intentionally by someone or something, external
to Postgres itself. "Smart shutdown" means the postmaster got SIGTERM.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3 Merlin Moncure
mmoncure@gmail.com
In reply to: Marc Schablewski (#1)
Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

On Tue, Nov 4, 2008 at 5:50 AM, Marc Schablewski <ms@clickware.de> wrote:

> Hi,
>
> we are running a PostgreSQL 8.3.3 on a Linux box (SuSE 10.3, 2.6.22
> kernel) as a hot standby. After some maintenance work the WAL files

I'm assuming you meant 'warm standby'...hot standby servers can
serve queries. This feature is proposed for PostgreSQL 8.4.

merlin

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Schablewski (#1)
Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

Marc Schablewski <ms@clickware.de> writes:

> ... When looking for a reason why the WAL weren't shipped, we found
> the following error message:
>
> 2008-10-31 17:07:52 CET 9162LOG: received smart shutdown request
> 2008-10-31 17:07:52 CET 9178FATAL: could not restore file
> "000000010000008600000018" from archive: return code 15

Something sent SIGTERM to both your postmaster (hence the "smart
shutdown" message) and the restore_command script (causing it to
exit with code 15, which is probably SIGTERM though you might want
to check kill -l to be sure). You need to find out what's doing that
and make it stop.

regards, tom lane
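
The signal-number lookup suggested above can also be done programmatically. The following is a minimal sketch, not taken from the thread: `describe_signal` and `run_and_terminate` are hypothetical helper names, and the sleeping child process is only a stand-in for a restore script that is waiting for the next WAL file. It assumes a POSIX system, where Python's `subprocess` module reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


def run_and_terminate() -> int:
    """Start a sleeping child (a stand-in for a waiting restore script),
    send it SIGTERM, and return the status that Python reports for it."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGTERM)
    return child.wait()


if __name__ == "__main__":
    # Same information as looking up 15 in the output of `kill -l`.
    print(describe_signal(15))

    status = run_and_terminate()
    # On POSIX, a negative status means the child was killed by that signal.
    if status < 0:
        print("child was killed by", describe_signal(-status))
```

How the number is presented depends on who reports it: Python negates the signal number, while other callers may show the raw wait status or a shell-style 128 + N value, so the 15 in the log above should be read with that in mind.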

#5 Marc Schablewski
ms@clickware.de
In reply to: Alvaro Herrera (#2)
Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

Ah, ok. I somehow missed the first line of the message, and the rest of it
left the impression that "something" must be wrong with replication.

I guess one of my colleagues might have shut down the database by
accident and forgot to tell me.

Anyway, thanks for your reply.

Marc

#6 Marc Schablewski
ms@clickware.de
In reply to: Merlin Moncure (#3)
Re: Hot standby stops after a few days of inactivity (i.e. no new WAL)

Yes, 'warm standby' was what I intended to write. This must have been
some kind of wishful thinking. ;)
But I'd really appreciate 'hot standby' in a future version of postgres.

Marc
