Transactional sequence stuff breaks pg_upgrade
I believe I've identified the reason why skink and some other buildfarm
members have been failing the pg_upgrade test recently. It is that
recent changes in sequence support have caused binary-upgrade restore
runs to do some sequence OID/relfilenode assignments without any heed
to the OIDs that pg_upgrade tried to impose on those sequences. Once
those sequences have relfilenodes other than the intended ones, they
are land mines for all subsequent pg_upgrade-controlled table OID
assignments.
I am not very sure why it's so hard to duplicate the misbehavior; perhaps,
in order to make the failure happen with the current regression tests,
it's necessary for a background auto-analyze to happen and consume some
OIDs (for pg_statistic TOAST entries) at just the wrong time. However,
I can definitely demonstrate that there are uncontrolled relfilenode
assignments happening during pg_upgrade's restore run. I stuck an
elog() call into GetNewObjectId(), along with generation of a stack
trace using backtrace(), and here is one example:
[593daad3.4863:2243] LOG: generated OID 16735
[593daad3.4863:2244] STATEMENT:
-- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('46851'::pg_catalog.oid);
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('46852'::pg_catalog.oid);
ALTER TABLE "itest10" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS IDENTITY (
SEQUENCE NAME "itest10_a_seq"
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1
);
postgres: postgres regression [local] ALTER TABLE(GetNewObjectId+0xda) [0x50397a]
postgres: postgres regression [local] ALTER TABLE(GetNewRelFileNode+0xec) [0x52430c]
postgres: postgres regression [local] ALTER TABLE(RelationSetNewRelfilenode+0x79) [0x851d59]
postgres: postgres regression [local] ALTER TABLE(AlterSequence+0x1cd) [0x5d976d]
postgres: postgres regression [local] ALTER TABLE() [0x75d279]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x75cb1d]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x759f0b]
postgres: postgres regression [local] ALTER TABLE() [0x75ae91]
postgres: postgres regression [local] ALTER TABLE(PortalRun+0x250) [0x75b740]
postgres: postgres regression [local] ALTER TABLE() [0x757be7]
postgres: postgres regression [local] ALTER TABLE(PostgresMain+0xe08) [0x759968]
postgres: postgres regression [local] ALTER TABLE(PostmasterMain+0x1a99) [0x6e21a9]
postgres: postgres regression [local] ALTER TABLE(main+0x6b8) [0x65b958]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3f3bc1ed1d]
postgres: postgres regression [local] ALTER TABLE() [0x473899]
Judging by when we started to see buildfarm failures, I think that
commit 3d79013b9 probably broke it, but the problem seems latent
in the whole concept of transactional sequence information.
Not sure what we want to do about it. One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.
(I'm also tempted to make GetNewRelFileNode complain if IsBinaryUpgrade
is true, but that's a separate matter.)
In any case, this is a "must fix" problem IMO, so I'll go add it to the
open items list.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
I believe I've identified the reason why skink and some other buildfarm
members have been failing the pg_upgrade test recently.
...
Not sure what we want to do about it. One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.
On closer inspection, the only thing that pg_upgrade needs is to be
able to do ALTER SEQUENCE OWNED BY without a relfilenode bump. PFA
two versions of a patch that fixes this problem, at least to the
extent that it gets through check-world without triggering the Assert
I added to GetNewRelFileNode (which HEAD doesn't). The first one
is a minimally-invasive hack; the second one puts the responsibility
for deciding if a sequence rewrite is needed into init_params. That's
bulkier but might be useful if we ever grow additional ALTER SEQUENCE
options that don't need a rewrite. OTOH, I'm not very clear on what
such options might look like, so maybe the extra flexibility is useless.
Thoughts?
regards, tom lane
Tom Lane wrote:
On closer inspection, the only thing that pg_upgrade needs is to be
able to do ALTER SEQUENCE OWNED BY without a relfilenode bump. PFA
two versions of a patch that fixes this problem, at least to the
extent that it gets through check-world without triggering the Assert
I added to GetNewRelFileNode (which HEAD doesn't). The first one
is a minimally-invasive hack; the second one puts the responsibility
for deciding if a sequence rewrite is needed into init_params. That's
bulkier but might be useful if we ever grow additional ALTER SEQUENCE
options that don't need a rewrite. OTOH, I'm not very clear on what
such options might look like, so maybe the extra flexibility is useless.
Thoughts?
I vote for the second patch. I don't have a clear idea either, but I'm
pretty sure the logical-replication people is going to be hacking on
sequences some more, yet, and this is likely to come in handy.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2017-06-11 17:17:07 -0400, Tom Lane wrote:
I am not very sure why it's so hard to duplicate the misbehavior; perhaps,
in order to make the failure happen with the current regression tests,
it's necessary for a background auto-analyze to happen and consume some
OIDs (for pg_statistic TOAST entries) at just the wrong time. However,
I can definitely demonstrate that there are uncontrolled relfilenode
assignments happening during pg_upgrade's restore run. I stuck an
elog() call into GetNewObjectId(), along with generation of a stack
trace using backtrace(), and here is one example:
Yea, that's not all that surprising :/
Judging by when we started to see buildfarm failures, I think that
commit 3d79013b9 probably broke it, but the problem seems latent
in the whole concept of transactional sequence information.
Hm.
Not sure what we want to do about it. One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.
I think what you proposed downthread makes sense - I'd ripped out fairly
similar code in my recent changes, because it was only introduced to
address the concurrency problems the new sequence code had, and it
didn't seem to buy sufficiently much...
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2017-06-11 20:03:28 -0400, Tom Lane wrote:
I wrote:
I believe I've identified the reason why skink and some other buildfarm
members have been failing the pg_upgrade test recently.
...
Not sure what we want to do about it. One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.On closer inspection, the only thing that pg_upgrade needs is to be
able to do ALTER SEQUENCE OWNED BY without a relfilenode bump.
That indeed seems appropriate. More on the reliability of that below,
though.
PFA two versions of a patch that fixes this problem, at least to the
extent that it gets through check-world without triggering the Assert
I added to GetNewRelFileNode (which HEAD doesn't). The first one
is a minimally-invasive hack; the second one puts the responsibility
for deciding if a sequence rewrite is needed into init_params. That's
bulkier but might be useful if we ever grow additional ALTER SEQUENCE
options that don't need a rewrite. OTOH, I'm not very clear on what
such options might look like, so maybe the extra flexibility is useless.
Thoughts?
I think the flexibility isn't a bad idea, there's certainly other
options (cycle, cache, and with more work additional ones) that could be
changed without a rewrite.
diff --git a/src/backend/catalog/catalog.c b/src/backend/catalog/catalog.c index 11ee536..43ef4cd 100644 --- a/src/backend/catalog/catalog.c +++ b/src/backend/catalog/catalog.c @@ -391,6 +391,13 @@ GetNewRelFileNode(Oid reltablespace, Rel bool collides; BackendId backend;+ /* + * If we ever get here during pg_upgrade, there's something wrong; all + * relfilenode assignments during a binary-upgrade run should be + * determined by commands in the dump script. + */ + Assert(!IsBinaryUpgrade); +
I'm very doubtful that a) this doesn't get hit in practice, and b) that
we can rely on it going forward. At least until we change toasting to
not use the global oid counter. A number of catalog tables have
toastable columns, and the decision when to toast will every now and
then change. Even without changing the compression algorithm, added
columns will push things across boundaries.
I'm not yet sure what the best fix for that will be, but I don't think
we should bake in the assumption that the oid counter won't be touched.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2017-06-11 20:03:28 -0400, Tom Lane wrote:
@@ -391,6 +391,13 @@ GetNewRelFileNode(Oid reltablespace, Rel
bool collides;
BackendId backend;+ /* + * If we ever get here during pg_upgrade, there's something wrong; all + * relfilenode assignments during a binary-upgrade run should be + * determined by commands in the dump script. + */ + Assert(!IsBinaryUpgrade); +
I'm very doubtful that a) this doesn't get hit in practice, and b) that
we can rely on it going forward. At least until we change toasting to
not use the global oid counter.
This is not about assignments from the global OID counter; the function
it's touching is GetNewRelFileNode() not GetNewObjectId().
GetNewObjectId() definitely does get hit during a binary-upgrade restore,
for all object types that pg_upgrade doesn't try to control the OIDs of
--- which is most. We don't care, for the most part. But we *do* care
about relfilenode assignments, for precisely the reason seen in this bug.
*All* assignments of relfilenodes have to be shortcircuited by pg_upgrade
override calls during a binary-restore run, or we risk filename
collisions. So if this assert ever gets hit, we have something to fix.
I intend to not only commit this, but back-patch it. There's enough
changes in relevant code paths that logic that is fine in HEAD might
not be fine in back branches.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-06-12 17:13:34 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-06-11 20:03:28 -0400, Tom Lane wrote:
@@ -391,6 +391,13 @@ GetNewRelFileNode(Oid reltablespace, Rel
bool collides;
BackendId backend;+ /* + * If we ever get here during pg_upgrade, there's something wrong; all + * relfilenode assignments during a binary-upgrade run should be + * determined by commands in the dump script. + */ + Assert(!IsBinaryUpgrade); +I'm very doubtful that a) this doesn't get hit in practice, and b) that
we can rely on it going forward. At least until we change toasting to
not use the global oid counter.This is not about assignments from the global OID counter; the function
it's touching is GetNewRelFileNode() not GetNewObjectId().
Ah, that makes more sense. You'd put the backtrace() into
GetNewObjectId() your original message, that's probably why I thought
about it.
We don't care, for the most part. But we *do* care about relfilenode
assignments, for precisely the reason seen in this bug.
Even there I don't think that's a sane assumption *for the future*. We
just need a slight change in the rules about when a toast table is needed
- and that stuff seriously need overhauling - and it doesn't work
anymore.
In my opinion the problem of:
assignments of relfilenodes have to be shortcircuited by pg_upgrade
override calls during a binary-restore run, or we risk filename
collisions.
should instead be solved by simply not even trying to preserve
relfilenodes. We can "just" copy/link files to the the new
relfilenodes, there's no need to preserve them, in contrast to
pg_class.oid etc. But that's obviously something for the future.
I intend to not only commit this, but back-patch it. There's enough
changes in relevant code paths that logic that is fine in HEAD might
not be fine in back branches.
Hm.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
In my opinion the problem of:
On 2017-06-12 17:13:34 -0400, Tom Lane wrote:
assignments of relfilenodes have to be shortcircuited by pg_upgrade
override calls during a binary-restore run, or we risk filename
collisions.
should instead be solved by simply not even trying to preserve
relfilenodes. We can "just" copy/link files to the the new
relfilenodes, there's no need to preserve them, in contrast to
pg_class.oid etc. But that's obviously something for the future.
Right. But until somebody gets around to fixing that, it's a problem.
Also, even if we fix this, we still have the issue of type OIDs residing
on-disk in places like array headers; that results in pg_upgrade having
to control type OID assignments as well. I thought about trying to insert
an assert in GetNewOidWithIndex to ensure that no type OIDs are selected
automatically during upgrade, but it seemed a bit too complicated. Might
be a good idea to figure it out though, unless you have an idea for
removing that requirement.
I intend to not only commit this, but back-patch it. There's enough
changes in relevant code paths that logic that is fine in HEAD might
not be fine in back branches.
Hm.
Just found out that 9.4 (and probably older) fail the assertion because
of the temp table created in get_rel_infos(). That's *probably* all
right because the table is *probably* gone from disk by the time we
start the actual restore run. But we start the new postmaster only
once, with -b, so the assertion applies to the preparatory steps as
well as the restore proper.
Later versions avoid that by using a CTE instead of a temp table.
I'm not excited enough about this to back-port the patch that
changed it, so I'm thinking of just adding the assert back to 9.5.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-06-12 17:35:37 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
In my opinion the problem of:
On 2017-06-12 17:13:34 -0400, Tom Lane wrote:
assignments of relfilenodes have to be shortcircuited by pg_upgrade
override calls during a binary-restore run, or we risk filename
collisions.should instead be solved by simply not even trying to preserve
relfilenodes. We can "just" copy/link files to the the new
relfilenodes, there's no need to preserve them, in contrast to
pg_class.oid etc. But that's obviously something for the future.Right. But until somebody gets around to fixing that, it's a problem.
Also, even if we fix this, we still have the issue of type OIDs residing
on-disk in places like array headers; that results in pg_upgrade having
to control type OID assignments as well.
Yes, but those should be largely controlled and controllable. With the
exception of the type oids of toast tables, more on that below.
I thought about trying to insert an assert in GetNewOidWithIndex to
ensure that no type OIDs are selected automatically during upgrade,
but it seemed a bit too complicated. Might be a good idea to figure
it out though, unless you have an idea for removing that requirement.
I think the only type oids assigned during pg_upgrade are the oids for
toast type types, right? I can think of two decent ways to deal with
those:
a) Do not create a corresponding composite type for toast tables. Not
super pretty, but I doubt it'd be a huge issue.
b) Use *one* composite type for all of the toast tables. That'd not be
entirely trivial because of pg_type.typrelid.
Both of these would have the advantage of removing some quite redundant
content from the catalogs.
I think with such a change, we'd have no issue with *additional* toast
tables being created during the run? We already take care of toast
tables not being created, by forcing the creation in binary upgrade mode
if a toast oid is set.
Later versions avoid that by using a CTE instead of a temp table.
I'm not excited enough about this to back-port the patch that
changed it, so I'm thinking of just adding the assert back to 9.5.
I'm still a bit doubtful that we know enough to consider whether the
assert is actually safe in practice, to backpatch it. On the other
hand, it'd only affect people building with asserts...
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2017-06-12 17:35:37 -0400, Tom Lane wrote:
I thought about trying to insert an assert in GetNewOidWithIndex to
ensure that no type OIDs are selected automatically during upgrade,
but it seemed a bit too complicated. Might be a good idea to figure
it out though, unless you have an idea for removing that requirement.
I think the only type oids assigned during pg_upgrade are the oids for
toast type types, right?
Perhaps that was a problem once, but it isn't now; see
binary_upgrade_set_next_toast_pg_type_oid().
In fact, we get through the pg_upgrade regression test with the attached
patch, and I really think we ought to commit it.
Having said that, I wouldn't mind trying to reduce the catalog overhead
for toast tables in v11 or beyond.
a) Do not create a corresponding composite type for toast tables. Not
super pretty, but I doubt it'd be a huge issue.
I suspect there are places that assume all tables have type OIDs.
b) Use *one* composite type for all of the toast tables. That'd not be
entirely trivial because of pg_type.typrelid.
That might work, with some klugery. Peter might have some insight about
this --- I'm not sure whether the CREATE TABLE OF TYPE syntax shares
a type OID across all the tables.
regards, tom lane
Attachments:
check-upgrade-type-and-relfilenodes-all-assigned.patchtext/x-diff; charset=us-ascii; name=check-upgrade-type-and-relfilenodes-all-assigned.patchDownload+16-0
On Mon, Jun 12, 2017 at 5:25 PM, Andres Freund <andres@anarazel.de> wrote:
Even there I don't think that's a sane assumption *for the future*. We
just need a slight change in the rules about when a toast table is needed
- and that stuff seriously need overhauling - and it doesn't work
anymore.
The problem is that if a relfilenode ever gets assigned by
GetNewRelFileNode() during a binary-upgrade restore, that OID may turn
out to be used by some other object later in the dump. And then
you're dead, because the dump restore will fail later on complaining
about, well, I forget the error message wording exactly, but,
basically, an OID collision. So if we change the rules in such a way
that objects which currently lack TOAST tables acquire them, we need
to first restore all of the objects *without* adding any new TOAST
tables, and then at the end create any new TOAST tables once we have a
complete list of the relfilenodes that are actually used.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
The problem is that if a relfilenode ever gets assigned by
GetNewRelFileNode() during a binary-upgrade restore, that OID may turn
out to be used by some other object later in the dump. And then
you're dead, because the dump restore will fail later on complaining
about, well, I forget the error message wording exactly, but,
basically, an OID collision.
Right.
So if we change the rules in such a way
that objects which currently lack TOAST tables acquire them, we need
to first restore all of the objects *without* adding any new TOAST
tables, and then at the end create any new TOAST tables once we have a
complete list of the relfilenodes that are actually used.
toasting.c explains why this currently doesn't seem necessary:
/*
* In binary-upgrade mode, create a TOAST table if and only if
* pg_upgrade told us to (ie, a TOAST table OID has been provided).
*
* This indicates that the old cluster had a TOAST table for the
* current table. We must create a TOAST table to receive the old
* TOAST file, even if the table seems not to need one.
*
* Contrariwise, if the old cluster did not have a TOAST table, we
* should be able to get along without one even if the new version's
* needs_toast_table rules suggest we should have one. There is a lot
* of daylight between where we will create a TOAST table and where
* one is really necessary to avoid failures, so small cross-version
* differences in the when-to-create heuristic shouldn't be a problem.
* If we tried to create a TOAST table anyway, we would have the
* problem that it might take up an OID that will conflict with some
* old-cluster table we haven't seen yet.
*/
I don't really see any near-term reason why we would need to change this.
In the long run, it would certainly be cleaner if pg_upgrade dropped
the force-the-relfilenode-assignment approach and instead remapped
relfilenodes from old cluster to new. But I think it's just for
cleanliness rather to fix any live or foreseeable bug.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 13, 2017 at 9:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the long run, it would certainly be cleaner if pg_upgrade dropped
the force-the-relfilenode-assignment approach and instead remapped
relfilenodes from old cluster to new. But I think it's just for
cleanliness rather to fix any live or foreseeable bug.
Honestly, I think that would probably be *less* robust overall. I
think we ought to be driving in the direction of having more and more
things common between the old and new clusters, rather than trying to
cope with them being different. It's pretty easy for users to have
hidden dependencies on values that we think are only for internal use
- e.g. somebody's got a table column of type regclass, and it breaks
if you changed the table OIDs. A table with relfilenode values in it
is perhaps less likely, but it is not beyond imagining that someone
(who wrote a replication system?) has a use for it. Now maybe you're
going to say "that's not a good idea", and maybe you're right, but
users don't enjoy being told that something they've been doing for
years works all the time *except* when you use pg_upgrade.
Also, I think that if we did it that way, it would be significantly
harder to debug. Right now, if something goes boom, you can look at
the old and new clusters and figure out what doesn't match, but if
pg_upgrade renumbered everything, you would no longer be able to do
that, or at least not easily.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 13, 2017 at 11:14:02AM -0400, Robert Haas wrote:
Also, I think that if we did it that way, it would be significantly
harder to debug. Right now, if something goes boom, you can look at
the old and new clusters and figure out what doesn't match, but if
pg_upgrade renumbered everything, you would no longer be able to do
that, or at least not easily.
FYI, pg_upgrade is designed to go boom if something doesn't look right
because it can't anticipate what changes might be made to Postgres in
the future.
boom == feature!
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers