BUG #15309: ERROR: catalog is missing 1 attribute(s) for relid 760676 when max_parallel_maintenance_workers > 0
The following bug has been logged on the website:
Bug reference: 15309
Logged by: death lock
Email address: deathlock13@gmail.com
PostgreSQL version: 11beta2
Operating system: Debian10 64b
Description:
11beta2 and git REL_11_STABLE, gcc 8.2, pg.conf : autovacuum=off
DB restored from plain-format backup; the first 'vacuum full' ends with ERR:
"catalog is missing 1 attribute(s) for relid ..." -- this seems to point only to
a primary key, sometimes the same PK.
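Given the subject line, a quick way to confirm whether parallel index builds
are implicated (a suggested diagnostic, not part of the original report) is to
disable maintenance parallelism and retry:
SET max_parallel_maintenance_workers = 0;
VACUUM FULL;
-- If the error disappears with workers disabled, the parallel
-- CREATE INDEX path is the likely culprit.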
PG Bug reporting form <noreply@postgresql.org> writes:
11beta2 and git REL_11_STABLE, gcc 8.2, pg.conf : autovacuum=off
DB restored from plain-format backup; the first 'vacuum full' ends with ERR:
"catalog is missing 1 attribute(s) for relid ..." -- this seems to point only to
a primary key, sometimes the same PK.
If you could provide a self-contained example, this would be very
interesting, but there's nothing we can do with just the information
you've provided here.
Perhaps you can strip and/or anonymize your backup dump file down
to a postable example that reproduces this problem?
regards, tom lane
On Fri, Aug 3, 2018 at 6:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Perhaps you can strip and/or anonymize your backup dump file down
to a postable example that reproduces this problem?
It would also be interesting to see the result of this query, which
relies on the v11 amcheck extension (so "CREATE EXTENSION amcheck"
first):
SELECT bt_index_parent_check(index => c.oid, heapallindexed => true),
c.relname,
c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;
If that doesn't raise an error, you could try the same query, but
remove "AND n.nspname = 'pg_catalog'". That will take considerably
longer, but probably won't be intolerable.
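For instance, to spot-check a single suspect index (the index name here is
only a placeholder), a call along these lines should work:
SELECT bt_index_parent_check(index => 'some_table_pkey'::regclass,
                             heapallindexed => true);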
Thanks
--
Peter Geoghegan
On August 3, 2018 9:51:01 AM PDT, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Aug 3, 2018 at 6:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Perhaps you can strip and/or anonymize your backup dump file down
to a postable example that reproduces this problem?
It would also be interesting to see the result of this query, which
relies on the v11 amcheck extension (so "CREATE EXTENSION amcheck"
first):
Maybe I'm missing something, but doesn't that error likely point to a bug in the new parallel index creation code, rather than the created indexes?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, Aug 3, 2018 at 10:11 AM, Andres Freund <andres@anarazel.de> wrote:
Maybe I'm missing something, but doesn't that error likely point to a bug in the new parallel index creation code, rather than the created indexes?
I've certainly seen error messages like that with cases of
catalog corruption. I recall repairing pg_attribute by hand when this
happened, and then having to reindex the system catalogs.
I don't really know what's wrong here, but I wouldn't be surprised if
amcheck detected a problem. Let's see.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Fri, Aug 3, 2018 at 10:11 AM, Andres Freund <andres@anarazel.de> wrote:
Maybe I'm missing something, but doesn't that error likely point to a bug in the new parallel index creation code, rather than the created indexes?
I've certainly seen error messages like that with cases of
catalog corruption. I recall repairing pg_attribute by hand when this
happened, and then having to reindex the system catalogs.
I don't really know what's wrong here, but I wouldn't be surprised if
amcheck detected a problem. Let's see.
I think the OP's time would be more usefully spent on creating a
submittable test case. We will want that whether or not amcheck
complains, because the mere fact of the complaint is not likely to be
enough to find the problem. (I do agree that this smells like a problem
in parallel index build.)
regards, tom lane
On Fri, Aug 3, 2018 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think the OP's time would be more usefully spent on creating a
submittable test case. We will want that whether or not amcheck
complains, because the mere fact of the complaint is not likely to be
enough to find the problem. (I do agree that this smells like a problem
in parallel index build.)
A test case would be ideal, but it's unlikely to take very long to run
amcheck. If verification is limited to the system catalogs, then it
will probably take just a few seconds.
--
Peter Geoghegan
On Fri, Aug 3, 2018 at 6:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
11beta2 and git REL_11_STABLE, gcc 8.2, pg.conf : autovacuum=off
DB restored from plain-format backup; the first 'vacuum full' ends with ERR:
"catalog is missing 1 attribute(s) for relid ..." -- this seems to point only to
a primary key, sometimes the same PK.
If you could provide a self-contained example, this would be very
interesting, but there's nothing we can do with just the information
you've provided here.
I got an off-list testcase from the OP, who writes:
"""
I've attached my 'SHOW ALL' output and a backup of the example db from the PG
wiki; I just kept renaming schemas/tables until I got a large enough number of
indexes -- that seems to be key to reproducing the error(s).
2018-08-06 11:31:44 CEST|TEST|postgres|[local]| ERROR: duplicate key
value violates unique constraint "pg_class_relname_nsp_index"
2018-08-06 11:31:44 CEST|TEST|postgres|[local]| DETAIL: Key (relname,
relnamespace)=(pg_class_tblspc_relfilenode_index, 11) already exists.
2018-08-06 11:31:44 CEST|TEST|postgres|[local]| STATEMENT: vacuum
full analyze ;
or 2018-08-06 11:20:13 CEST|PUMA|postgres|[local]| DETAIL: Key
(relname, relnamespace)=(idx_last_name, 16498) already exists.
Got this after running your query with amcheck before 'VACUUM FULL':
2018-08-06 11:51:10 CEST|||| PANIC: could not open critical system index 2676
2018-08-06 11:51:10 CEST|TEST|postgres|[local]| ERROR: could not open
critical system index 2676
2018-08-06 11:51:10 CEST|TEST|postgres|[local]| CONTEXT: parallel worker
2018-08-06 11:51:10 CEST|TEST|postgres|[local]| STATEMENT: vacuum full;
2018-08-06 11:51:10 CEST|||| LOG: background worker "parallel worker"
(PID 16618) was terminated by signal 6: Aborted
2018-08-06 11:51:10 CEST|||| LOG: terminating any other active server processes
2018-08-06 12:11:24 CEST|||| ERROR: could not open relation with OID 2696
2018-08-06 12:11:24 CEST|TEST|postgres|[local]| ERROR: could not open
relation with OID 2696
2018-08-06 12:11:24 CEST|TEST|postgres|[local]| CONTEXT: parallel worker
2018-08-06 12:11:24 CEST|TEST|postgres|[local]| STATEMENT: vacuum full;
2018-08-06 12:11:24 CEST|||| LOG: background worker "parallel worker"
(PID 16930) exited with exit code 1
"""
I can reproduce this with the backup provided. This seems to boil down
to the following:
1. Create a database with tens of thousands of relations, each of
which contains no data.
2. Run an unqualified VACUUM FULL.
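A sketch of a setup along those lines (the table name and count are arbitrary,
and the exact number needed to trigger a parallel build may vary):
DO $$
BEGIN
  FOR i IN 1..30000 LOOP
    -- Every table and its PK index add rows to pg_attribute and pg_class.
    EXECUTE format('CREATE TABLE bloat_%s (id int PRIMARY KEY)', i);
  END LOOP;
END $$;
VACUUM FULL;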
This is what I see after the restore:
pg@foodb[22648]=# vacuum FULL ;
ERROR: duplicate key value violates unique constraint
"pg_class_relname_nsp_index"
DETAIL: Key (relname, relnamespace)=(customer_pkey, 16445) already exists.
pg@foodb[22648]=# :amcheck
ERROR: heap tuple (358,1) from table "pg_attribute" lacks matching
index tuple within index "pg_attribute_relid_attnam_index"
I'll work to isolate and diagnose the problem today. It likely has
something to do with corrupting the state needed by a catalog parallel
index build in the context of the VACUUM FULL. pg_attribute grows to
several tens of megabytes here, which is enough to get a parallel
index build.
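Something like the following should confirm that pg_attribute is past the
parallel-build threshold (the defaults noted are stock v11 settings):
SELECT pg_size_pretty(pg_relation_size('pg_attribute'));
SHOW min_parallel_table_scan_size;      -- 8MB by default
SHOW max_parallel_maintenance_workers;  -- 2 by default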
--
Peter Geoghegan
On Mon, Aug 6, 2018 at 10:43 AM, Peter Geoghegan <pg@bowt.ie> wrote:
I'll work to isolate and diagnose the problem today. It likely has
something to do with corrupting the state needed by a catalog parallel
index build in the context of the VACUUM FULL. pg_attribute grows to
several tens of megabytes here, which is enough to get a parallel
index build.
This repro can be further simplified by just doing a VACUUM FULL on
pg_attribute alone. There is no index corruption prior to that point.
After that point, there is -- both pg_attribute_relid_attnam_index and
pg_attribute_relid_attnum_index seem to become corrupt. All other
symptoms probably stem from this initial corruption, so I'm focusing
on it.
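So the simplified reproducer, against the restored database, is just:
VACUUM FULL pg_attribute;
-- Then check the two indexes that appear to become corrupt:
SELECT bt_index_parent_check('pg_attribute_relid_attnam_index'::regclass, true);
SELECT bt_index_parent_check('pg_attribute_relid_attnum_index'::regclass, true);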
What I see if I look at the corrupt pg_attribute_relid_attnum_index
structure is that the index does actually have an entry for a heap
tuple that amcheck complains about lacking an entry for -- at least,
it has a key match. The problem that amcheck noticed was that the heap
item pointer was not as it should be (i.e. the index tuple points to
the wrong heap tuple). I also noticed that nearby index tuples had
duplicate entries, the first pointing to approximately the same place
in the heap that the corrupt-to-amcheck tuple points to, and the
second pointing to approximately the same place in the heap that
amcheck expected to find it at (amcheck was complaining about an
adjacent entry, so it's only approximately the same place in the
heap).
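One way to inspect the index like this from SQL is pageinspect's
bt_page_items() -- block 1 here is just an example leaf page:
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT itemoffset, ctid, data  -- ctid is the heap tuple each index tuple points to
FROM bt_page_items('pg_attribute_relid_attnum_index', 1);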
I suspect that the problem is that parallel workers have a different
idea about which relfilenode they need to scan, or something along
those lines. Maybe cluster_rel() needs to be taught about parallel
CREATE INDEX. I must have missed some detail within cluster.c prior to
parallel CREATE INDEX going in.
--
Peter Geoghegan
On Mon, Aug 6, 2018 at 1:31 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I suspect that the problem is that parallel workers have a different
idea about which relfilenode they need to scan, or something along
those lines. Maybe cluster_rel() needs to be taught about parallel
CREATE INDEX. I must have missed some detail within cluster.c prior to
parallel CREATE INDEX going in.
To be clear, I mean that the leader process's worker state has the
right relfilenode (the leader process always participates as a
worker), but all worker processes have the stale relfilenode.
--
Peter Geoghegan
On Mon, Aug 6, 2018 at 1:37 PM, Peter Geoghegan <pg@bowt.ie> wrote:
To be clear, I mean that the leader process's worker state has the
right relfilenode (the leader process always participates as a
worker), but all worker processes have the stale relfilenode.
Sure enough, that's what the bug is -- a few debugging calls to
RelationMapFilenodeToOid() within nbtsort.c prove it. Several
approaches to fixing the bug occur to me:
* Ban parallel CREATE INDEX for all catalogs. This was how things were
up until several weeks before the original patch was committed.
* Ban parallel CREATE INDEX for mapped catalogs only.
* Find a way to propagate the state necessary to have parallel workers
agree with the leader on the correct relfilenode.
We could probably propagate backend-local state like
active_local_updates without too much difficulty, which looks like it
would fix the problem. Note that we did something very similar with
reindex-pending-indexes lists in commit 29d58fd3. That commit
similarly involved propagating more backend-local state so that
parallel index builds (or at least REINDEX) on catalogs could be
enabled/work reliably. Maybe we should continue down the road of
making parallel builds work on catalogs, on general principle.
Thoughts?
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
Sure enough, that's what the bug is -- a few debugging calls to
RelationMapFilenodeToOid() within nbtsort.c prove it. Several
approaches to fixing the bug occur to me:
* Ban parallel CREATE INDEX for all catalogs. This was how things were
up until several weeks before the original patch was committed.
* Ban parallel CREATE INDEX for mapped catalogs only.
* Find a way to propagate the state necessary to have parallel workers
agree with the leader on the correct relfilenode.
We could probably propagate backend-local state like
active_local_updates without too much difficulty, which looks like it
would fix the problem. Note that we did something very similar with
reindex-pending-indexes lists in commit 29d58fd3. That commit
similarly involved propagating more backend-local state so that
parallel index builds (or at least REINDEX) on catalogs could be
enabled/work reliably. Maybe we should continue down the road of
making parallel builds work on catalogs, on general principle.
Hm. Post-beta3, I think I'd vote for a conservative fix in v11,
which seems to be "ban for mapped catalogs". Feel free to make
it work in HEAD, though.
regards, tom lane
On Mon, Aug 6, 2018 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. Post-beta3, I think I'd vote for a conservative fix in v11,
which seems to be "ban for mapped catalogs". Feel free to make
it work in HEAD, though.
Makes sense. I'm not sure if it's worth pursuing parallel mapped
catalog index builds much further, though. I doubt that ordinary users
care about whether or not this is supported, so this is a matter of
principle. I don't feel strongly on whether or not I should make
mapped builds work on HEAD, so I defer to you, and anyone else that
might have an interest. Does it matter, do you think?
It might be worth teaching heap_beginscan_parallel() to
cross-check each worker's heap relation's rd_smgr.smgr_node against a copy
of the same field from the leader, stored in shared memory (in the
ParallelHeapScanDesc). That way, any future recurrence of a similar
bug will be far easier to detect. A "can't happen" error along these
lines seems like it would be worthwhile.
--
Peter Geoghegan
On 2018-Aug-06, Peter Geoghegan wrote:
On Mon, Aug 6, 2018 at 1:37 PM, Peter Geoghegan <pg@bowt.ie> wrote:
To be clear, I mean that the leader process's worker state has the
right relfilenode (the leader process always participates as a
worker), but all worker processes have the stale relfilenode.
Sure enough, that's what the bug is -- a few debugging calls to
RelationMapFilenodeToOid() within nbtsort.c prove it.
Uh, that's weird, isn't it? I mean, why is the relfilenode changing
underneath? Why isn't it blocked by the CREATE INDEX? Or is CREATE
INDEX inflicting that upon itself?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Aug-06, Peter Geoghegan wrote:
Sure enough, that's what the bug is - a few debugging calls to
RelationMapFilenodeToOid() within nbtsort.c proves it.
Uh, that's weird, isn't it? I mean, why is the relfilenode changing
underneath?
Because we're building a brand new index in a new file.
Why isn't it blocked by the CREATE INDEX? Or is CREATE
INDEX inflicting that upon itself?
The problem is (or so I assume) failure to propagate process-local
state of relmapper.c into the worker processes. So the leader knows
that we've assigned a new relfilenode to some mapped index, but the
workers don't.
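The mapping is visible from SQL, for what it's worth: mapped relations carry
relfilenode = 0 in pg_class, and the real value comes from the relation mapper:
SELECT relname,
       relfilenode,                                -- 0 for mapped relations
       pg_relation_filenode(oid) AS real_filenode  -- consults the relmapper
FROM pg_class
WHERE relname IN ('pg_attribute', 'pg_attribute_relid_attnum_index');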
regards, tom lane
On Mon, Aug 6, 2018 at 3:10 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Uh, that's weird, isn't it? I mean, why is the relfilenode changing
underneath? Why isn't it blocked by the CREATE INDEX? Or is CREATE
INDEX inflicting that upon itself?
CREATE INDEX/nbtsort.c does not take any special interest in
relfilenode, or anything like that.
As Tom said, this is a failure to propagate the process-local state of
relmapper.c into the worker processes. Or a failure to account for the
fact that that doesn't happen, perhaps.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Mon, Aug 6, 2018 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. Post-beta3, I think I'd vote for a conservative fix in v11,
which seems to be "ban for mapped catalogs". Feel free to make
it work in HEAD, though.
Makes sense. I'm not sure if it's worth pursuing parallel mapped
catalog index builds much further, though. I doubt that ordinary users
care about whether or not this is supported, so this is a matter of
principle. I don't feel strongly on whether or not I should make
mapped builds work on HEAD, so I defer to you, and anyone else that
might have an interest. Does it matter, do you think?
Apparently there are people out there with catalogs big enough
to justify parallel reindex. I don't mind if the first version of
the feature doesn't handle that, but probably we should make it work
eventually.
It might be worth teaching heap_beginscan_parallel() to
cross-check each worker's heap relation's rd_smgr.smgr_node against a copy
of the same field from the leader, stored in shared memory (in the
ParallelHeapScanDesc). That way, any future recurrence of a similar
bug will be far easier to detect. A "can't happen" error along these
lines seems like it would be worthwhile.
Well, maybe, but why is that field particularly vulnerable? I'd think you
should crosscheck the index's relfilenode too, at least, if you're going
to worry about that.
regards, tom lane
On Mon, Aug 6, 2018 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Apparently there are people out there with catalogs big enough
to justify parallel reindex. I don't mind if the first version of
the feature doesn't handle that, but probably we should make it work
eventually.
Okay. I'll pursue this in HEAD.
Well, maybe, but why is that field particularly vulnerable? I'd think you
should crosscheck the index's relfilenode too, at least, if you're going
to worry about that.
I agree.
I'd like to hear more opinions on the general idea. I'm particularly
interested in what Robert thinks.
--
Peter Geoghegan
On Mon, Aug 6, 2018 at 3:21 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Aug 6, 2018 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Apparently there are people out there with catalogs big enough
to justify parallel reindex. I don't mind if the first version of
the feature doesn't handle that, but probably we should make it work
eventually.
Okay. I'll pursue this in HEAD.
I wrote this in a little over an hour, and it works fine. I'll post it
shortly, once I polish it up some more. I'll also post an alternative
"no mapped relation parallel builds" fix for v11, which will be
trivial.
Well, maybe, but why is that field particularly vulnerable? I'd think you
should crosscheck the index's relfilenode too, at least, if you're going
to worry about that.
I agree.
I'd like to hear more opinions on the general idea. I'm particularly
interested in what Robert thinks.
On second thought, the crosscheck of the index's relfilenode seems
like independent work. I won't be working on it as part of the fix for
this bug.
The question of index relfilenode becoming stale or inconsistent
doesn't come up with parallel CREATE INDEX, since currently the
workers never write to the new index relfilenode. If we were going to
do a similar crosscheck for parallel index scan, the new code would
need to be spread across places like btbeginscan() and
RelationGetIndexScan(). It's not something that can just be done in
passing.
--
Peter Geoghegan
On Mon, Aug 6, 2018 at 7:32 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I wrote this in a little over an hour, and it works fine. I'll post it
shortly, once I polish it up some more. I'll also post an alternative
"no mapped relation parallel builds" fix for v11, which will be
trivial.
Attached are two patches that each fix the issue -- a conservative
patch for v11, as well as a patch that actually propagates relmapper.c
state, for the master branch. It would be good to get a +1 on both
before pushing.
I'll pursue the parallel heap scan relfilenode cross-check thing
separately. Would still like to hear something from Robert on that
before proceeding.
--
Peter Geoghegan