pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Started by Andres Freundalmost 12 years ago78 messagesbugs
Jump to latest
#1Andres Freund
andres@anarazel.de

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

When vac_truncate_clog() calls TruncateMultiXact() that'll scan
pg_multixact/offsets to find the earliest existing segment. That'll be
0000. If the to-be-truncated data is older than the last existing
segment it returns. Then it'll try to determine the last required data
in members/ by accessing the oldest data in offsets/.

Unfortunately, due to the existing 0000/ segment, that means it'll
sometimes try to access a nonexistant offsets/ file. Preventing vacuum
et al from succeeding.

It seems to me the fix for this is to a) rmtree("pg_multixact/members",
false) in copy_clog_xlog_xid() in the oldcluster < 9.3 case b) add a
warning to the release notes that everyone that used pg_upgrade and has
a 0000 file lying around should report to the mailinglist.

b) is a bit unsatisfactory, but I don't want to suggest removing the
file. In too many situations it might actually still be needed.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Hi,

On 2014-05-30 14:16:31 +0200, Andres Freund wrote:

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

When vac_truncate_clog() calls TruncateMultiXact() that'll scan
pg_multixact/offsets to find the earliest existing segment. That'll be
0000. If the to-be-truncated data is older than the last existing
segment it returns. Then it'll try to determine the last required data
in members/ by accessing the oldest data in offsets/.

Unfortunately, due to the existing 0000/ segment, that means it'll
sometimes try to access a nonexistant offsets/ file. Preventing vacuum
et al from succeeding.

For the sake of search engines, the error that triggers is:
ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#1)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Fri, May 30, 2014 at 02:16:31PM +0200, Andres Freund wrote:

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

When vac_truncate_clog() calls TruncateMultiXact() that'll scan
pg_multixact/offsets to find the earliest existing segment. That'll be
0000. If the to-be-truncated data is older than the last existing
segment it returns. Then it'll try to determine the last required data
in members/ by accessing the oldest data in offsets/.

Unfortunately, due to the existing 0000/ segment, that means it'll
sometimes try to access a nonexistant offsets/ file. Preventing vacuum
et al from succeeding.

It seems to me the fix for this is to a) rmtree("pg_multixact/members",
false) in copy_clog_xlog_xid() in the oldcluster < 9.3 case b) add a
warning to the release notes that everyone that used pg_upgrade and has
a 0000 file lying around should report to the mailinglist.

b) is a bit unsatisfactory, but I don't want to suggest removing the
file. In too many situations it might actually still be needed.

This is a bug in 9.3 pg_upgrade as well? Why has no one reported it
before?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#3)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:

This is a bug in 9.3 pg_upgrade as well?

Yes.

Why has no one reported it before?

My guess is that it wasn't attributed to pg_upgrade in the
past. Typically the error will only occur a fair amount of time
later. You'll just see vacuums randomly erroring out with slru.c errors
about nonexistant files :(.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#4)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Fri, May 30, 2014 at 03:32:44PM +0200, Andres Freund wrote:

On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:

This is a bug in 9.3 pg_upgrade as well?

Yes.

Why has no one reported it before?

My guess is that it wasn't attributed to pg_upgrade in the
past. Typically the error will only occur a fair amount of time
later. You'll just see vacuums randomly erroring out with slru.c errors
about nonexistant files :(.

But how much later? pg_upgrade is pretty popular now but I am just not
seeing the number of errors as I would expect:

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

I am not saying there is no bug, but from your analysis it would seem to
be 100% of pg_upgrade'ed clusters that use multi-xacts. Is that true?
If so, it seems we would need to tell everyone to remove the 0000 files
if there are higher numbered ones with numbering gaps. Is this
something our next minor release should fix in the multi-xacts code?
Fixing pg_upgrade seems like the easy part.

--
Bruce_ Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#5)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On 2014-05-30 11:00:06 -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 03:32:44PM +0200, Andres Freund wrote:

On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:

This is a bug in 9.3 pg_upgrade as well?

Yes.

Why has no one reported it before?

My guess is that it wasn't attributed to pg_upgrade in the
past. Typically the error will only occur a fair amount of time
later. You'll just see vacuums randomly erroring out with slru.c errors
about nonexistant files :(.

But how much later? pg_upgrade is pretty popular now but I am just not
seeing the number of errors as I would expect:

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

I am not saying there is no bug, but from your analysis it would seem to
be 100% of pg_upgrade'ed clusters that use multi-xacts.

It'd need to be clusters that used more multixacts than fit onto two
slru offsets/ segments in < 9.2. Otherwise things will probably just
continue to work because there's no hole.
Also the new cluster needs to have used more than
vacuum_multixact_freeze_min_age multis after the upgrade to make the
problem visible. I think.

If so, it seems we would need to tell everyone to remove the 0000 files
if there are higher numbered ones with numbering gaps.

The problem is that it's not actually that easy to define whether
there's a gap - after a multixact id wraparound the 0000 file might
exist again. So I'd rather not give a simple instruction that might
delete critical data.

Is this something our next minor release should fix in the multi-xacts
code?

I wondered whether we could detect that case and deal with it
transparently, but haven't come up with something smart. Alvaro?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#6)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Fri, May 30, 2014 at 05:13:17PM +0200, Andres Freund wrote:

On 2014-05-30 11:00:06 -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 03:32:44PM +0200, Andres Freund wrote:

On 2014-05-30 09:29:16 -0400, Bruce Momjian wrote:

This is a bug in 9.3 pg_upgrade as well?

Yes.

Why has no one reported it before?

My guess is that it wasn't attributed to pg_upgrade in the
past. Typically the error will only occur a fair amount of time
later. You'll just see vacuums randomly erroring out with slru.c errors
about nonexistant files :(.

But how much later? pg_upgrade is pretty popular now but I am just not
seeing the number of errors as I would expect:

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

I am not saying there is no bug, but from your analysis it would seem to
be 100% of pg_upgrade'ed clusters that use multi-xacts.

It'd need to be clusters that used more multixacts than fit onto two
slru offsets/ segments in < 9.2. Otherwise things will probably just
continue to work because there's no hole.

Do you mean <= 9.2 here for the old cluster? Multi-xacts were added in
9.3:

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed Jan 23 12:04:59 2013 -0300

Improve concurrency of foreign key locking

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#7)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On 2014-05-30 16:35:15 -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 05:13:17PM +0200, Andres Freund wrote:

It'd need to be clusters that used more multixacts than fit onto two
slru offsets/ segments in < 9.2. Otherwise things will probably just
continue to work because there's no hole.

Do you mean <= 9.2 here for the old cluster?

yes.

Multi-xacts were added in 9.3:

That's not really correct. They were added in 8.2 or something...

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed Jan 23 12:04:59 2013 -0300

Improve concurrency of foreign key locking

That just expanded the usage to also be able to track updated rows.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#8)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Fri, May 30, 2014 at 11:03:53PM +0200, Andres Freund wrote:

On 2014-05-30 16:35:15 -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 05:13:17PM +0200, Andres Freund wrote:

It'd need to be clusters that used more multixacts than fit onto two
slru offsets/ segments in < 9.2. Otherwise things will probably just
continue to work because there's no hole.

Do you mean <= 9.2 here for the old cluster?

yes.

Multi-xacts were added in 9.3:

That's not really correct. They were added in 8.2 or something...

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed Jan 23 12:04:59 2013 -0300

Improve concurrency of foreign key locking

That just expanded the usage to also be able to track updated rows.

I guess I meant the new multixacts file format was in 9.3.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#9)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Fri, May 30, 2014 at 05:25:26PM -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 11:03:53PM +0200, Andres Freund wrote:

On 2014-05-30 16:35:15 -0400, Bruce Momjian wrote:

On Fri, May 30, 2014 at 05:13:17PM +0200, Andres Freund wrote:

It'd need to be clusters that used more multixacts than fit onto two
slru offsets/ segments in < 9.2. Otherwise things will probably just
continue to work because there's no hole.

Do you mean <= 9.2 here for the old cluster?

yes.

Multi-xacts were added in 9.3:

That's not really correct. They were added in 8.2 or something...

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed Jan 23 12:04:59 2013 -0300

Improve concurrency of foreign key locking

That just expanded the usage to also be able to track updated rows.

I guess I meant the new multixacts file format was in 9.3.

It appears this item is on hold until Alvaro can comment. I can
probably find the impact and correct the problem for previous upgrades,
but it might be 1-2 months until I can get to it. The fix for future
upgrades is simple.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#1)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Andres Freund wrote:

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

When vac_truncate_clog() calls TruncateMultiXact() that'll scan
pg_multixact/offsets to find the earliest existing segment. That'll be
0000. If the to-be-truncated data is older than the last existing
segment it returns. Then it'll try to determine the last required data
in members/ by accessing the oldest data in offsets/.

I'm trying to understand the mechanism of this bug, and I'm not
succeeding. If the offset/0000 was created by initdb, how come we try
to delete a file that's not also members/0000? I mean, surely the file
as created by initdb is empty (zeroed). In your sample error message
downthread,

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

what prompted the status of that multixid to be sought? I see one
possible path to this error message, which is SlruPhysicalReadPage().
(There are other paths that lead to similar errors, but they use
"transaction 0" instead, so we can rule those out; and we can rule out
anything that uses MultiXactMemberCtl because of the path given in
DETAIL.)

There are four callsites that lead to that:

RecordNewMultiXact
GetMultiXactIdMembers (2x)
TrimMultiXact

Of those, only GetMultiXactIdMembers is likely to be called from vacuum
(actually RecordNewMultiXact can too, in a few cases, if it happens to
freeze a multi by creating another multi; should be pretty rare).
But you were talking about vacuum truncating pg_multixact -- and I don't
see how that's related to these functions.

Is it possible that you pasted the wrong error message?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#12Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#11)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On 2014-06-13 13:51:51 -0400, Alvaro Herrera wrote:

Andres Freund wrote:

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

When vac_truncate_clog() calls TruncateMultiXact() that'll scan
pg_multixact/offsets to find the earliest existing segment. That'll be
0000. If the to-be-truncated data is older than the last existing
segment it returns. Then it'll try to determine the last required data
in members/ by accessing the oldest data in offsets/.

I'm trying to understand the mechanism of this bug, and I'm not
succeeding. If the offset/0000 was created by initdb, how come we try
to delete a file that's not also members/0000? I mean, surely the file
as created by initdb is empty (zeroed). In your sample error message
downthread,

The bit you're missing is essentially the following in TruncateMultiXact():
/*
* Note we can't just plow ahead with the truncation; it's possible that
* there are no segments to truncate, which is a problem because we are
* going to attempt to read the offsets page to determine where to
* truncate the members SLRU. So we first scan the directory to determine
* the earliest offsets page number that we can read without error.
*/
That check is thwarted due to the 0000 segment. So the segment
containing the oldestMXact has already been removed, but we don't notice
that that's the case.

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

what prompted the status of that multixid to be sought? I see one
possible path to this error message, which is SlruPhysicalReadPage().
(There are other paths that lead to similar errors, but they use
"transaction 0" instead, so we can rule those out; and we can rule out
anything that uses MultiXactMemberCtl because of the path given in
DETAIL.)

Same function:
/*
* First, compute the safe truncation point for MultiXactMember. This is
* the starting offset of the multixact we were passed as MultiXactOffset
* cutoff.
*/
{
int pageno;
int slotno;
int entryno;
MultiXactOffset *offptr;

/* lock is acquired by SimpleLruReadPage_ReadOnly */

pageno = MultiXactIdToOffsetPage(oldestMXact);
entryno = MultiXactIdToOffsetEntry(oldestMXact);

slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno,
oldestMXact);
offptr = (MultiXactOffset *)
MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
oldestOffset = *offptr;

LWLockRelease(MultiXactOffsetControlLock);
}

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#10)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Bruce Momjian wrote:

It appears this item is on hold until Alvaro can comment. I can
probably find the impact and correct the problem for previous upgrades,
but it might be 1-2 months until I can get to it. The fix for future
upgrades is simple.

TBH I think future upgrades is what we need to fix now, so that people
don't do broken upgrades from 9.2 to 9.4 once the latter is out. (I
would assume that upgrades from 9.3 will not have a problem, unless the
9.3 setup itself is already broken due to a prior pg_upgrade). Also, we
need to fix pg_upgrade to protect people upgrading to 9.3.5 from <= 9.2.

As for detecting the case in existing installs, I think we can supply a
query in the next minor release notes that uses pg_ls_dir() to see
whether there is a 0000 file outside the working range of
multixact/offset, whose result if nonempty would indicate the need to
delete (or more likely rename) a file.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#3)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Bruce Momjian wrote:

On Fri, May 30, 2014 at 02:16:31PM +0200, Andres Freund wrote:

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

I've been playing with this a bit. Here's a patch that just does
rmtree() of the problematic file during pg_upgrade, as proposed by
Andres, which solves the problem. Note that this patch removes the 0000
file in both cases: when upgrading from 9.2, and when upgrading from
9.3+. The former case is the bug that Andres reported. In the second
case, we overwrite the files with the ones from the old cluster; if
there's a lingering 0000 file in the new cluster, it would cause
problems as well. (Really, I don't see any reason to think that these
two cases are any different.)

This is a bug in 9.3 pg_upgrade as well? Why has no one reported it
before?

I think one reason is that not all upgrades see an issue here; for old
clusters that haven't gone beyond the 0000 offset file, there is no
problem. For clusters that have gone beyond 0000 but not by much, the
file will be deleted during the first truncation. It only becomes a
problem if the cluster is close enough to 2^31. Another thing to keep
in consideration is that initdb initializes all databases' datminmxid to
1. If the old cluster was past the 2^31 point, it means the datminmxid
doesn't move from 1 until the actual wraparound.

I noticed another problem while trying to reproduce it. It only happens
in assert-enabled builds: when FreezeMultiXactId() sees an old multi, it
asserts that it mustn't be running anymore, which is fine except that if
the multi value is one that survived a pg_upgrade cycle from 9.2 or
older (i.e. it was a share-locked tuple in the old cluster), an error is
raised when the assertion is checked. Since it doesn't make sense to
examine its running status at that point, the attached patch simply
disables the assertion in that case.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

multiupgrade.patchtext/x-diff; charset=us-asciiDownload+19-1
#15Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#14)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Wed, Jun 18, 2014 at 06:51:31PM -0400, Alvaro Herrera wrote:

Bruce Momjian wrote:

On Fri, May 30, 2014 at 02:16:31PM +0200, Andres Freund wrote:

Hi,

When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
multixacts around because they won't be read after the upgrade (and
aren't compatible). It just resets the new cluster's nextMulti to the
old + 1.
Unfortunately that means that there'll be a offsets/0000 file created by
initdb around. Sounds harmless enough, but that'll actually cause
problems if the old cluster had a nextMulti that's bigger than that
page.

I've been playing with this a bit. Here's a patch that just does
rmtree() of the problematic file during pg_upgrade, as proposed by
Andres, which solves the problem. Note that this patch removes the 0000
file in both cases: when upgrading from 9.2, and when upgrading from
9.3+. The former case is the bug that Andres reported. In the second
case, we overwrite the files with the ones from the old cluster; if
there's a lingering 0000 file in the new cluster, it would cause
problems as well. (Really, I don't see any reason to think that these
two cases are any different.)

I wasn't happy with having that delete code added there when we do
directory delete in the function above. I instead broke apart the
delete and copy code and called the delete code where needed, in the
attached patch.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

pg_upgrade.difftext/x-diff; charset=us-asciiDownload+31-9
#16Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#14)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Wed, Jun 18, 2014 at 06:51:31PM -0400, Alvaro Herrera wrote:

This is a bug in 9.3 pg_upgrade as well? Why has no one reported it
before?

I think one reason is that not all upgrades see an issue here; for old
clusters that haven't gone beyond the 0000 offset file, there is no
problem. For clusters that have gone beyond 0000 but not by much, the
file will be deleted during the first truncation. It only becomes a
problem if the cluster is close enough to 2^31. Another thing to keep
in consideration is that initdb initializes all databases' datminmxid to
1. If the old cluster was past the 2^31 point, it means the datminmxid
doesn't move from 1 until the actual wraparound.

OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
problem? That might explain the rare reporting of this bug. What would
the test query look like so we can tell people when to remove the '0000'
files? Would we need to see the existence of '0000' and high-numbered
files? How high? What does a 2^31 file look like?

Also, is there a reason you didn't remove the 'members/0000' file in your
patch? I have removed it in my version.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#17Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#16)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

On Wed, Jun 18, 2014 at 09:52:05PM -0400, Bruce Momjian wrote:

On Wed, Jun 18, 2014 at 06:51:31PM -0400, Alvaro Herrera wrote:

This is a bug in 9.3 pg_upgrade as well? Why has no one reported it
before?

I think one reason is that not all upgrades see an issue here; for old
clusters that haven't gone beyond the 0000 offset file, there is no
problem. For clusters that have gone beyond 0000 but not by much, the
file will be deleted during the first truncation. It only becomes a
problem if the cluster is close enough to 2^31. Another thing to keep
in consideration is that initdb initializes all databases' datminmxid to
1. If the old cluster was past the 2^31 point, it means the datminmxid
doesn't move from 1 until the actual wraparound.

OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
problem? That might explain the rare reporting of this bug. What would
the test query look like so we can tell people when to remove the '0000'
files? Would we need to see the existence of '0000' and high-numbered
files? How high? What does a 2^31 file look like?

Also, what would a legitimate 0000 file at wrap-around time look like?
Would there have to be an 'ffff' or 'ffffff' file?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#17)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Bruce Momjian wrote:

I wasn't happy with having that delete code added there when we do
directory delete in the function above. I instead broke apart the
delete and copy code and called the delete code where needed, in the
attached patch.

Makes sense, yeah. I didn't look closely enough to realize that the
function that does the copying also does the rmtree part.

I also now realize why the other case (upgrade from 9.3 to 9.4) does not
have a bug: we are already deleting the files in that path.

Bruce Momjian wrote:

I think one reason is that not all upgrades see an issue here; for old
clusters that haven't gone beyond the 0000 offset file, there is no
problem. For clusters that have gone beyond 0000 but not by much, the
file will be deleted during the first truncation. It only becomes a
problem if the cluster is close enough to 2^31. Another thing to keep
in consideration is that initdb initializes all databases' datminmxid to
1. If the old cluster was past the 2^31 point, it means the datminmxid
doesn't move from 1 until the actual wraparound.

OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
problem? That might explain the rare reporting of this bug. What would
the test query look like so we can tell people when to remove the '0000'
files? Would we need to see the existence of '0000' and high-numbered
files? How high? What does a 2^31 file look like?

I misspoke.

I ran a few more upgrades, and then tried vacuuming all databases, which
is when the truncate code is run. Say the original cluster had an
oldestmulti of 10 million. If you just run VACUUM in the new cluster
after the upgrade, the 0000 file is not deleted: it's not yet old enough
in terms of multixact age. An error is not thrown, because we're still
not attempting a truncate. But if you lower the
vacuum_multixact_freeze_table_age to 10 million minus one, then we will
try the deletion and that will raise the error.

I think (didn't actually try) if you just let 150 million multixacts be
generated, that's the first time you will get the error.

Now if you run a VACUUM FREEZE after the upgrade, the file will be
deleted with no error.

I now think that the reason most people haven't hit the problem is that
they don't generate enough multis after upgrading a database that had
enough multis in the old database. This seems a bit curious

Also, is there a reason you didn't remove the 'members/0000' file in your
patch? I have removed it in my version.

There's no point. That file is the starting point for new multis
anyway, and it's compatible with the new format (because it's all
zeroes).

I guess you could get in trouble if you initdb the 9.3 database,
generate a few multis there, and then drop all tables and use that
modified cluster as a "new" cluster for pg_upgrade. I would question
the sanity of a person doing something like that, however.

Bruce Momjian wrote:

OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
problem? That might explain the rare reporting of this bug. What would
the test query look like so we can tell people when to remove the '0000'
files? Would we need to see the existence of '0000' and high-numbered
files? How high? What does a 2^31 file look like?

Also, what would a legitimate 0000 file at wrap-around time look like?
Would there have to be an 'ffff' or 'ffffff' file?

Since I was wrong, there is no point in further research here. Anyway
the last file before wrapping around in pg_multixact/members is FFFF.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#18)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

BTW I hacked up pg_resetxlog a bit to make it generate the necessary
pg_multixact/offset file when -m is given. This is not acceptable for
commit because it duplicates the #defines from pg_multixact.c, but maybe
we want this functionality enough that we're interested in a more
complete version of this patch; also it unconditionally writes one zero
byte to the file, which is of course wrong if the file exists and
already contains data.

It'd be nice to be able to write this in a way that lets it work for all
existing SLRU users (pg_clog is the most common, but
pg_multixact/members and the predicate locking stuff might also be
useful). Not sure what would that look like.

Another consideration is that it doesn't remove existing files that are
outside of the new valid range according to freezing parameters and
such, but I'm not sure this is really doable considering that we might
need to change the relminmxid and datminmxid values, etc.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

resetxlog_multi.patchtext/x-diff; charset=us-asciiDownload+41-0
#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#18)
Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

Alvaro Herrera wrote:

Bruce Momjian wrote:

Bruce Momjian wrote:

OK, so the xid has to be beyond 2^31 during pg_upgrade to trigger a
problem? That might explain the rare reporting of this bug. What would
the test query look like so we can tell people when to remove the '0000'
files? Would we need to see the existence of '0000' and high-numbered
files? How high? What does a 2^31 file look like?

Also, what would a legitimate 0000 file at wrap-around time look like?
Would there have to be an 'ffff' or 'ffffff' file?

Since I was wrong, there is no point in further research here. Anyway
the last file before wrapping around in pg_multixact/members is FFFF.

Oops, I meant the last file before wrap in pg_multixact/offsets is FFFF,
which is what we're talking about in this thread.

For members it's 14078, but it's not relevant here.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#21Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#18)
#22Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#19)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#20)
#26Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#23)
#27Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
#30Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#18)
#31Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#25)
#32Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#32)
#35Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#32)
#36Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#35)
#37Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#36)
#38Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#36)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#34)
#40Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#39)
#41Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#38)
#42Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#41)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#38)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#43)
#45Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#43)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#45)
#47Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#44)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#47)
#50Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#49)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#50)
#52Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#51)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#52)
#54Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#54)
#56Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#55)
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#57)
#59Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#57)
#60Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#58)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#60)
#62Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#61)
#63Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#62)
#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#63)
#65Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#64)
#66Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#64)
#67Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#66)
#68Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#67)
#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#68)
#70Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#69)
#71Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#70)
#72Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#71)
#73Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#72)
#74Stuart Bishop
stuart@stuartbishop.net
In reply to: Tom Lane (#43)
#75Andres Freund
andres@anarazel.de
In reply to: Stuart Bishop (#74)
#76Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#64)
#77Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#76)
#78Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#77)