pg_internal.init is hazardous to your health

Started by Tom Lane about 19 years ago, 8 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Dirk Lutzebaeck and I just spent a tense couple of hours trying to
figure out why a large database Down Under wasn't coming up after being
reloaded from a base backup plus PITR recovery. The symptoms were that
the recovery went fine, but backend processes would fail at startup or
soon after with "could not open relation XX/XX/XX: No such file" type of
errors.

The answer that ultimately emerged was that they'd been running a
nightly maintenance script that did REINDEX SYSTEM (among other things
I suppose). The PITR base backup included pg_internal.init files that
were appropriate when it was taken, and the PITR recovery process did
nothing whatsoever to update 'em :-(. So incoming backends picked up
init files with obsolete relfilenode values.
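To make the failure mode concrete, here is a toy Python model. It is a sketch, not PostgreSQL code. The cluster is a dict, the relation name `sys_index` and the relfilenode numbers are made up, and pg_internal.init is modeled as a cached copy of the catalog. The model takes only two behaviors from the message above: on a live server the relation move invalidates the init file, and PITR replay left the init file untouched.

```python
import copy

def make_cluster(catalog):
    """Toy cluster: catalog maps relation name -> relfilenode; init_file is a cached copy or None."""
    return {"catalog": dict(catalog), "files": set(catalog.values()), "init_file": None}

def move_relation(cluster, rel, new_node):
    """What REINDEX does to a relation here: new file, old file dropped, catalog updated."""
    old = cluster["catalog"][rel]
    cluster["files"].add(new_node)
    cluster["files"].discard(old)
    cluster["catalog"][rel] = new_node

def reindex_live(cluster, rel, new_node):
    # On a running server the change also invalidates (removes) the init file.
    move_relation(cluster, rel, new_node)
    cluster["init_file"] = None

def replay_reindex(cluster, rel, new_node):
    # During PITR replay, as described in the thread, only the catalog and files
    # change; nothing touches the init file.
    move_relation(cluster, rel, new_node)

def backend_start(cluster, rel):
    # A new backend trusts the init file if one exists, otherwise rebuilds it from the catalog.
    if cluster["init_file"] is None:
        cluster["init_file"] = dict(cluster["catalog"])
    node = cluster["init_file"][rel]
    if node not in cluster["files"]:
        raise FileNotFoundError(f"could not open relation {node}: No such file")
    return node

primary = make_cluster({"sys_index": 1000})   # hypothetical relation and relfilenode
backend_start(primary, "sys_index")           # first backend writes the init file
backup = copy.deepcopy(primary)               # base backup captures the init file too
reindex_live(primary, "sys_index", 2000)      # nightly REINDEX SYSTEM after the backup
replay_reindex(backup, "sys_index", 2000)     # PITR replays the same change
```

At this point `backend_start(backup, "sys_index")` raises `FileNotFoundError`, because the stale cached entry still points at relfilenode 1000. Setting `backup["init_file"] = None` (the model's equivalent of deleting the file) lets the next call rebuild the cache and succeed.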

We don't actually need to *update* the file, per se, we only need to
remove it if no longer valid --- the next incoming backend will rebuild
it. I could see fixing this by making WAL recovery run around and zap
all the .init files (only problem is to find 'em), or we could add a new
kind of WAL record saying "remove the .init file for database XYZ"
to be emitted whenever someone removes the active one. Thoughts?
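The two proposed fixes differ in where the knowledge lives. The following toy Python sketch contrasts them. The record format, the record kind `REMOVE_INIT_FILE`, and the database names are all hypothetical, and real WAL records look nothing like these tuples. Each database's init file is represented simply by the database name appearing in a set.

```python
from typing import List, Set, Tuple

# Hypothetical WAL record: (record kind, database name).
Record = Tuple[str, str]
REMOVE_INIT_FILE = "REMOVE_INIT_FILE"   # hypothetical record kind for option 2

def recover_zap_all(init_files: Set[str]) -> None:
    """Option 1: during recovery, remove every init file that can be found."""
    init_files.clear()

def recover_with_records(init_files: Set[str], wal: List[Record]) -> None:
    """Option 2: remove only the init files that a logged record says are invalid."""
    for kind, db in wal:
        if kind == REMOVE_INIT_FILE:
            init_files.discard(db)   # the next backend in this database rebuilds it

# Databases whose init file came along with the base backup.
restored = {"db1", "db2"}
# Only db2 had its init file removed on the primary after the backup.
wal = [("INSERT", "db1"), (REMOVE_INIT_FILE, "db2")]
```

In this model option 2 keeps db1's still-valid cache, but it only works if every removal on the primary is logged. Option 1 needs no new record type, but recovery has to locate every file. Presumably that is harder than it sounds because databases can live in other tablespaces.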

Meanwhile, if you're trying to recover from a PITR backup and it's not
working, try removing any pg_internal.init files you can find.
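A minimal Python sketch of that workaround follows. It assumes it is run against the restored data directory with the server stopped. The function name and the dry-run flag are hypothetical conveniences, not part of PostgreSQL. The sketch follows symlinks so that tablespace directories linked from pg_tblspc are searched too.

```python
import os
from typing import List

def remove_init_files(datadir: str, dry_run: bool = True) -> List[str]:
    """Find (and, unless dry_run, delete) every pg_internal.init under datadir.

    followlinks=True makes os.walk descend into symlinked directories,
    such as tablespace links under pg_tblspc.
    """
    found = []
    for root, _dirs, files in os.walk(datadir, followlinks=True):
        if "pg_internal.init" in files:
            path = os.path.join(root, "pg_internal.init")
            found.append(path)
            if not dry_run:
                os.remove(path)   # the next incoming backend rebuilds it
    return sorted(found)
```

Running it once with the default `dry_run=True` lists what would be removed. Passing `dry_run=False` deletes those files and leaves everything else alone.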

regards, tom lane

#2 Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#1)
Re: pg_internal.init is hazardous to your health

On Tue, 17 Oct 2006, Tom Lane wrote:

> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery. The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.
>
> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose). The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(. So incoming backends picked up
> init files with obsolete relfilenode values.

Ouch.

> We don't actually need to *update* the file, per se, we only need to
> remove it if no longer valid --- the next incoming backend will rebuild
> it. I could see fixing this by making WAL recovery run around and zap
> all the .init files (only problem is to find 'em), or we could add a new
> kind of WAL record saying "remove the .init file for database XYZ"
> to be emitted whenever someone removes the active one. Thoughts?

The latter seems the Right Way except, I guess, that the decision to
remove the file is buried deep inside inval.c.

Thanks,

Gavin

#3 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#1)
Re: pg_internal.init is hazardous to your health

On Tue, 2006-10-17 at 22:29 -0400, Tom Lane wrote:

> Dirk Lutzebaeck and I just spent a tense couple of hours trying to
> figure out why a large database Down Under wasn't coming up after being
> reloaded from a base backup plus PITR recovery. The symptoms were that
> the recovery went fine, but backend processes would fail at startup or
> soon after with "could not open relation XX/XX/XX: No such file" type of
> errors.

Understand the tension...

> The answer that ultimately emerged was that they'd been running a
> nightly maintenance script that did REINDEX SYSTEM (among other things
> I suppose). The PITR base backup included pg_internal.init files that
> were appropriate when it was taken, and the PITR recovery process did
> nothing whatsoever to update 'em :-(. So incoming backends picked up
> init files with obsolete relfilenode values.

OK, I'm looking at this now for later discussion.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#4 Simon Riggs
simon@2ndquadrant.com
In reply to: Gavin Sherry (#2)
Re: pg_internal.init is hazardous to your health

On Wed, 2006-10-18 at 12:49 +1000, Gavin Sherry wrote:

>> We don't actually need to *update* the file, per se, we only need to
>> remove it if no longer valid --- the next incoming backend will rebuild
>> it. I could see fixing this by making WAL recovery run around and zap
>> all the .init files (only problem is to find 'em), or we could add a new
>> kind of WAL record saying "remove the .init file for database XYZ"
>> to be emitted whenever someone removes the active one. Thoughts?

Yes, that assessment seems good.

> The latter seems the Right Way except, I guess, that the decision to
> remove the file is buried deep inside inval.c.

I'd prefer the zap everything approach, but emitting a WAL record looks
mostly straightforward and just as good.

RelationCacheInitFileInvalidate() can easily emit a WAL record. This is
called twice in succession, so we would emit WAL on the
RelationCacheInitFileInvalidate(true) call only. I'll work out a patch
for that, using a new record type, XLOG_XACT_RELCACHE_INVALIDATE.
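Simon's emission plan can be modeled in a few lines of Python. This is a toy sketch with hypothetical names. `ToyRelcache` and `init_file_invalidate` merely stand in for RelationCacheInitFileInvalidate(), and the record is a plain tuple. The two points taken from the message are that the function is called twice in succession and that only the call with `true` logs. Putting the `True` call first is an assumption of the sketch.

```python
from typing import List, Tuple

class ToyRelcache:
    """Toy stand-in for the relcache init-file logic; all names here are hypothetical."""

    def __init__(self, dbname: str) -> None:
        self.dbname = dbname
        self.init_file_exists = True
        self.wal: List[Tuple[str, str]] = []

    def init_file_invalidate(self, flag: bool) -> None:
        # Mirrors RelationCacheInitFileInvalidate(flag). In this toy model both calls
        # remove the file, but only the flag=True call writes a WAL record, so each
        # pair of calls yields exactly one record for recovery to act on.
        if flag:
            self.wal.append(("XLOG_XACT_RELCACHE_INVALIDATE", self.dbname))
        self.init_file_exists = False

def commit_catalog_change(rc: ToyRelcache) -> None:
    # The two calls "in succession"; the order shown is an assumption.
    rc.init_file_invalidate(True)
    rc.init_file_invalidate(False)
```

Emitting the record from inside the invalidation function, rather than from each caller, means any code path that invalidates the init file gets logged without extra bookkeeping.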

RelationCacheInitFileInvalidate() is also called on each
FinishPreparedTransaction(). If that is called 100% of the time, then we
can skip writing an additional record for prepared transactions by
triggering the removal of pg_internal.init when we see a
XLOG_XACT_COMMIT_PREPARED during replay.
Not sure whether we need to do that, Heikki? Anyone?
I'm guessing no, but it seems sensible to check.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#4)
Re: pg_internal.init is hazardous to your health

"Simon Riggs" <simon@2ndquadrant.com> writes:

> RelationCacheInitFileInvalidate() is also called on each
> FinishPreparedTransaction().

Surely not...

regards, tom lane

#6 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: pg_internal.init is hazardous to your health

On Wed, 2006-10-18 at 13:24 -0400, Tom Lane wrote:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>
>> RelationCacheInitFileInvalidate() is also called on each
>> FinishPreparedTransaction().
>
> Surely not...

I take that to mean there's nothing special about prepared transactions
and invalidating the rel cache, so we *do* need to have a separate WAL
record in all cases.

OK, I'll write up a patch later today (working in the US for a few days).

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#7 Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Simon Riggs (#4)
Re: pg_internal.init is hazardous to your health

Simon Riggs wrote:

> RelationCacheInitFileInvalidate() is also called on each
> FinishPreparedTransaction().

It's only called if the prepared transaction invalidated the init file.

> If that is called 100% of the time, then we
> can skip writing an additional record for prepared transactions by
> triggering the removal of pg_internal.init when we see a
> XLOG_XACT_COMMIT_PREPARED during replay.
> Not sure whether we need to do that, Heikki? Anyone?
> I'm guessing no, but it seems sensible to check.

If you write the WAL record in RelationCacheInitFileInvalidate(true),
that's enough. No extra handling for prepared transactions is needed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8 Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#3)
Re: pg_internal.init is hazardous to your health

On Wed, 2006-10-18 at 15:56 +0100, Simon Riggs wrote:

> On Tue, 2006-10-17 at 22:29 -0400, Tom Lane wrote:
>
>> The answer that ultimately emerged was that they'd been running a
>> nightly maintenance script that did REINDEX SYSTEM (among other things
>> I suppose). The PITR base backup included pg_internal.init files that
>> were appropriate when it was taken, and the PITR recovery process did
>> nothing whatsoever to update 'em :-(. So incoming backends picked up
>> init files with obsolete relfilenode values.
>
> OK, I'm looking at this now for later discussion.

I've coded a patch and am just testing now.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com