"tuple concurrently updated" in pg_restore --jobs

Started by Justin Pryzbyalmost 6 years ago6 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

I hit this issue intermittently (roughly half the time) while working with a
patch David submitted, and finally found a recipe to reproduce it on an
unpatched v12 instance.

I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
parallel. Note different session IDs and PIDs:

2020-07-05 23:31:27.448 CDT,"pryzbyj","secondary_dump",24037,"[local]",5f02a91f.5de5,70,,LOG,00000,"statement: REVOKE SELECT ON TABLE pg_catalog.pg_proc FROM PUBLIC; ",,,,,,,,,"pg_restore","client backend"
2020-07-05 23:31:27.448 CDT,"pryzbyj","secondary_dump",24036,"[local]",5f02a91f.5de4,78,,LOG,00000,"statement: GRANT SELECT(tableoid) ON TABLE pg_catalog.pg_proc TO PUBLIC; ",,,,,,,,,"pg_restore","client backend"
2020-07-05 23:31:27.450 CDT,"pryzbyj","secondary_dump",24036,"[local]",5f02a91f.5de4,79,,LOG,00000,"statement: GRANT SELECT(oid) ON TABLE pg_catalog.pg_proc TO PUBLIC; ",,,,,,,,,"pg_restore","client backend"
2020-07-05 23:31:27.450 CDT,"pryzbyj","secondary_dump",24037,"[local]",5f02a91f.5de5,71,,ERROR,XX000,"tuple concurrently updated",,,,,,"REVOKE SELECT ON TABLE pg_catalog.pg_proc FROM PUBLIC;

postgres=# CREATE DATABASE pryzbyj;
postgres=# \c pryzbyj
pryzbyj=# REVOKE ALL ON pg_proc FROM postgres;
pryzbyj=# GRANT SELECT (tableoid, oid, proname) ON pg_proc TO public;
pryzbyj=# \dp+ pg_catalog.pg_proc
Schema | Name | Type | Access privileges | Column privileges | Policies
------------+---------+-------+-------------------+-------------------+----------
pg_catalog | pg_proc | table | =r/postgres | tableoid: +|
| | | | =r/postgres +|
| | | | oid: +|
| | | | =r/postgres +|
| | | | proname: +|
| | | | =r/postgres |

[pryzbyj@database ~]$ pg_dump pryzbyj -Fc -f pg_dump.out
[pryzbyj@database ~]$ pg_restore pg_dump.out -j2 -d pryzbyj --clean -v
...
pg_restore: entering main parallel loop
pg_restore: launching item 3744 ACL TABLE pg_proc
pg_restore: launching item 3745 ACL COLUMN pg_proc.proname
pg_restore: creating ACL "pg_catalog.TABLE pg_proc"
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.proname"
pg_restore:pg_restore: while PROCESSING TOC:
finished item 3745 ACL COLUMN pg_proc.proname
pg_restore: from TOC entry 3744; 0 0 ACL TABLE pg_proc postgres
pg_restore: error: could not execute query: ERROR: tuple concurrently updated
Command was: REVOKE ALL ON TABLE pg_catalog.pg_proc FROM postgres;

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#1)
Re: "tuple concurrently updated" in pg_restore --jobs

Justin Pryzby <pryzby@telsasoft.com> writes:

I hit this issue intermittently (roughly half the time) while working with a
patch David submitted, and finally found a recipe to reproduce it on an
unpatched v12 instance.

I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
parallel.

It's not pre-data. But it's true that pg_restore figures it can restore
ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
will not emit two different ACL entries for the same object, so that we
can do all the catalog updates in parallel without conflicts.

This works about 99% of the time, in fact. It falls down in the --clean
case if we have to revoke existing table permissions, because in that case
the REVOKE at table level is required to clear the table's per-column ACLs
as well, so that that ACL entry involves touching the same catalog rows
that the per-column ACLs want to touch.

I think the right fix is to give the per-column ACL entries dependencies
on the per-table ACL, if there is one. This will not fix the problem
for the case of restoring from an existing pg_dump archive that lacks
such dependency links --- but given the lack of field complaints, I'm
okay with that.

This looks straightforward, if somewhat tedious because we'll have to
change the API of pg_dump's dumpACL() function, which is called by
a lot of places. Barring objections, I'll go do that.

regards, tom lane

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#2)
Re: "tuple concurrently updated" in pg_restore --jobs

On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

I hit this issue intermittently (roughly half the time) while working with a
patch David submitted, and finally found a recipe to reproduce it on an
unpatched v12 instance.

I was surprised to see pg_restore -j2 is restoring ACLs in pre-data in
parallel.

It's not pre-data. But it's true that pg_restore figures it can restore
ACLs in parallel during the ACL-restoring pass, on the theory that pg_dump
will not emit two different ACL entries for the same object, so that we
can do all the catalog updates in parallel without conflicts.

This works about 99% of the time, in fact. It falls down in the --clean

Note that this fails for me (sometimes) even without --clean.

$ pg_restore pg_dump.out -j2 -d pryzbyj -v --section pre-data

pg_restore: entering main parallel loop
pg_restore: launching item 3395 ACL TABLE pg_proc
pg_restore: launching item 3396 ACL COLUMN pg_proc.proname
pg_restore: creating ACL "pg_catalog.TABLE pg_proc"
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.proname"
pg_restore: finished item 3395 ACL TABLE pg_proc
pg_restore: launching item 3397 ACL COLUMN pg_proc.pronamespace
pg_restore: creating ACL "pg_catalog.COLUMN pg_proc.pronamespace"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3396; 0 0 ACL COLUMN pg_proc.proname postgres
pg_restore: error: could not execute query: ERROR: tuple concurrently updated
Command was: GRANT SELECT(proname) ON TABLE pg_catalog.pg_proc TO PUBLIC;

--
Justin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#3)
Re: "tuple concurrently updated" in pg_restore --jobs

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:

This works about 99% of the time, in fact. It falls down in the --clean

Note that this fails for me (sometimes) even without --clean.

Oh, I was thinking that REVOKE would only be issued in the --clean
case, but apparently that's not so. Doesn't really affect the fix
proposal though. I just finished a patch for HEAD, as attached.

(I flushed the "CatalogId objCatId" argument of dumpACL, which was
not used.)

I'm not sure how far to back-patch it -- I think the parallel restore
of ACLs behavior is not very old, but we might want to teach older
pg_dump versions to insert the extra dependency anyway, for safety.

regards, tom lane

Attachments:

make-column-ACLs-depend-on-table-ACLs-1.patchtext/x-diff; charset=us-ascii; name=make-column-ACLs-depend-on-table-ACLs-1.patchDownload+64-35
#5Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#4)
Re: "tuple concurrently updated" in pg_restore --jobs

On Fri, Jul 10, 2020 at 05:36:28PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Jul 10, 2020 at 04:54:40PM -0400, Tom Lane wrote:

This works about 99% of the time, in fact. It falls down in the --clean

Note that this fails for me (sometimes) even without --clean.

Oh, I was thinking that REVOKE would only be issued in the --clean
case, but apparently that's not so. Doesn't really affect the fix
proposal though. I just finished a patch for HEAD, as attached.

(I flushed the "CatalogId objCatId" argument of dumpACL, which was
not used.)

I'm not sure how far to back-patch it -- I think the parallel restore
of ACLs behavior is not very old, but we might want to teach older
pg_dump versions to insert the extra dependency anyway, for safety.

Yes, and the test case in David's patch on other thread [0]https://commitfest.postgresql.org/28/2568/ can't be
backpatched further than this patch is. A variant on his test case could just
as well be included in this patch (with pg_dump writing to a seekable FD) and
then amended later to also test writing to an unseekable FD.

[0]: https://commitfest.postgresql.org/28/2568/

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#5)
Re: "tuple concurrently updated" in pg_restore --jobs

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Jul 10, 2020 at 05:36:28PM -0400, Tom Lane wrote:

I'm not sure how far to back-patch it -- I think the parallel restore
of ACLs behavior is not very old, but we might want to teach older
pg_dump versions to insert the extra dependency anyway, for safety.

Yes, and the test case in David's patch on other thread [0] can't be
backpatched further than this patch is.

Actually, the answer seems to be that we'd better back-patch all the way,
because this is a live bug much further back than I'd guessed. pg_restore
is willing to run these ACL restores in parallel in all active branches.
The given test case only shows a failure back to 9.6, because older
versions don't dump ACLs on system catalogs; but of course you can just
try it with a user table instead.

Oddly, I could not get the "tuple concurrently updated" syndrome to
appear on 9.5. Not sure why not; the GRANT/REVOKE code looks the
same as in 9.6. What I *could* demonstrate in 9.5 is that sometimes
the post-restore state is flat out wrong: the column-level grants go
missing, presumably as a result of the table-level REVOKE executing
after the column-level GRANTs. Probably that syndrome occurs sometimes
in later branches too, depending on timing; but I didn't look.

regards, tom lane