pgsql: Prevent instability in contrib/pageinspect's regression test.

Started by Tom Laneabout 3 years ago13 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Prevent instability in contrib/pageinspect's regression test.

pageinspect has occasionally failed on slow buildfarm members,
with symptoms indicating that the expected effects of VACUUM
FREEZE didn't happen. This is presumably because a background
transaction such as auto-analyze was holding back global xmin.

We can work around that by using a temp table in the test.
Since commit a7212be8b, that will use an up-to-date cutoff xmin
regardless of other processes. And pageinspect itself shouldn't
really care whether the table is temp.

Back-patch to v14. There would be no point in older branches
without back-patching a7212be8b, which seems like more trouble
than the problem is worth.

Discussion: /messages/by-id/2892135.1668976646@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/e2933a6e11791191050cd925d52d34e785eece77

Modified Files
--------------
contrib/pageinspect/expected/page.out | 3 ++-
contrib/pageinspect/sql/page.sql | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

#2Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Hi,

On 2022-11-21 15:51:05 +0000, Tom Lane wrote:

Prevent instability in contrib/pageinspect's regression test.

pageinspect has occasionally failed on slow buildfarm members,
with symptoms indicating that the expected effects of VACUUM
FREEZE didn't happen. This is presumably because a background
transaction such as auto-analyze was holding back global xmin.

We can work around that by using a temp table in the test.
Since commit a7212be8b, that will use an up-to-date cutoff xmin
regardless of other processes. And pageinspect itself shouldn't
really care whether the table is temp.

Back-patch to v14. There would be no point in older branches
without back-patching a7212be8b, which seems like more trouble
than the problem is worth.

Looks like a chunk of the buildfarm doesn't like this - presumably because
they use force_parallel_mode = regress. Seems ok to just force that to off in
this test?

Greetings,

Andres Freund

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Andres Freund <andres@anarazel.de> writes:

Looks like a chunk of the buildfarm doesn't like this - presumably because
they use force_parallel_mode = regress. Seems ok to just force that to off in
this test?

Ugh ... didn't occur to me to try that. I'll take a look.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

I wrote:

Andres Freund <andres@anarazel.de> writes:

Looks like a chunk of the buildfarm doesn't like this - presumably because
they use force_parallel_mode = regress. Seems ok to just force that to off in
this test?

Ugh ... didn't occur to me to try that. I'll take a look.

Hmm, so the problem is:

SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
ERROR: cannot access temporary tables during a parallel operation

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

On Mon, Nov 21, 2022 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

Looks like a chunk of the buildfarm doesn't like this - presumably because
they use force_parallel_mode = regress. Seems ok to just force that to off in
this test?

Ugh ... didn't occur to me to try that. I'll take a look.

Hmm, so the problem is:

SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
ERROR: cannot access temporary tables during a parallel operation

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

I suspect that restriction was overlooked when evaluating the marking
of this function.

--
Robert Haas
EDB: http://www.enterprisedb.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 21, 2022 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, so the problem is:

SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
ERROR: cannot access temporary tables during a parallel operation

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

I suspect that restriction was overlooked when evaluating the marking
of this function.

So it would seem. PARALLEL RESTRICTED should work, though.

I'll check to see if any sibling functions have the same issue,
and push a patch to adjust them.

Presumably the parallel labeling has to be fixed as far back as
it's marked that way (didn't look). Maybe we should push the
test change further back too, just to exercise this.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
1 attachment(s)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

I wrote:

I'll check to see if any sibling functions have the same issue,
and push a patch to adjust them.
Presumably the parallel labeling has to be fixed as far back as
it's marked that way (didn't look). Maybe we should push the
test change further back too, just to exercise this.

Hmm, so this is easy enough to fix in HEAD and v15, as attached.
However, there's a problem in older branches: their newest
versions of pageinspect are older than 1.10, so this doesn't
work as-is.

We could imagine inventing versions like 1.9.1, and providing
a script pageinspect--1.9--1.9.1.sql to do what's done here
as well as (in later branches) pageinspect--1.9.1--1.10.sql that
duplicates pageinspect--1.9--1.10.sql, and then the same again for
1.8 and 1.7 for the older in-support branches. That seems like
an awful lot of trouble for something that there have been no
field complaints about.

I'm inclined to apply this in HEAD and v15, and revert the
test change in v14, and call it good.

regards, tom lane

Attachments:

fix-pageinspect-parallel-markings.patchtext/x-diff; charset=us-ascii; name=fix-pageinspect-parallel-markings.patchDownload
diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index 5c0736564a..ad5a3ac511 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -13,7 +13,8 @@ OBJS = \
 	rawpage.o
 
 EXTENSION = pageinspect
-DATA =  pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
+DATA =  pageinspect--1.10--1.11.sql \
+	pageinspect--1.9--1.10.sql pageinspect--1.8--1.9.sql \
 	pageinspect--1.7--1.8.sql pageinspect--1.6--1.7.sql \
 	pageinspect--1.5.sql pageinspect--1.5--1.6.sql \
 	pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
diff --git a/contrib/pageinspect/meson.build b/contrib/pageinspect/meson.build
index 3ec50b9445..25fa7dc20c 100644
--- a/contrib/pageinspect/meson.build
+++ b/contrib/pageinspect/meson.build
@@ -33,6 +33,7 @@ install_data(
   'pageinspect--1.7--1.8.sql',
   'pageinspect--1.8--1.9.sql',
   'pageinspect--1.9--1.10.sql',
+  'pageinspect--1.10--1.11.sql',
   'pageinspect.control',
   kwargs: contrib_data_args,
 )
diff --git a/contrib/pageinspect/pageinspect--1.10--1.11.sql b/contrib/pageinspect/pageinspect--1.10--1.11.sql
new file mode 100644
index 0000000000..8fa5e105bc
--- /dev/null
+++ b/contrib/pageinspect/pageinspect--1.10--1.11.sql
@@ -0,0 +1,28 @@
+/* contrib/pageinspect/pageinspect--1.10--1.11.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pageinspect UPDATE TO '1.11'" to load this file. \quit
+
+--
+-- Functions that fetch relation pages must be PARALLEL RESTRICTED,
+-- not PARALLEL SAFE, otherwise they will fail when run on a
+-- temporary table in a parallel worker process.
+--
+
+ALTER FUNCTION get_raw_page(text, int8) PARALLEL RESTRICTED;
+ALTER FUNCTION get_raw_page(text, text, int8) PARALLEL RESTRICTED;
+-- tuple_data_split must be restricted because it may fetch TOAST data.
+ALTER FUNCTION tuple_data_split(oid, bytea, integer, integer, text) PARALLEL RESTRICTED;
+ALTER FUNCTION tuple_data_split(oid, bytea, integer, integer, text, bool) PARALLEL RESTRICTED;
+-- heap_page_item_attrs must be restricted because it calls tuple_data_split.
+ALTER FUNCTION heap_page_item_attrs(bytea, regclass, bool) PARALLEL RESTRICTED;
+ALTER FUNCTION heap_page_item_attrs(bytea, regclass) PARALLEL RESTRICTED;
+ALTER FUNCTION bt_metap(text) PARALLEL RESTRICTED;
+ALTER FUNCTION bt_page_stats(text, int8) PARALLEL RESTRICTED;
+ALTER FUNCTION bt_page_items(text, int8) PARALLEL RESTRICTED;
+ALTER FUNCTION hash_bitmap_info(regclass, int8) PARALLEL RESTRICTED;
+-- brin_page_items might be parallel safe, because it seems to touch
+-- only index metadata, but I don't think there's a point in risking it.
+-- Likewise for gist_page_items.
+ALTER FUNCTION brin_page_items(bytea, regclass) PARALLEL RESTRICTED;
+ALTER FUNCTION gist_page_items(bytea, regclass) PARALLEL RESTRICTED;
diff --git a/contrib/pageinspect/pageinspect.control b/contrib/pageinspect/pageinspect.control
index 7cdf37913d..f277413dd8 100644
--- a/contrib/pageinspect/pageinspect.control
+++ b/contrib/pageinspect/pageinspect.control
@@ -1,5 +1,5 @@
 # pageinspect extension
 comment = 'inspect the contents of database pages at a low level'
-default_version = '1.10'
+default_version = '1.11'
 module_pathname = '$libdir/pageinspect'
 relocatable = true
#8Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Hi,

On 2022-11-21 12:52:01 -0500, Robert Haas wrote:

On Mon, Nov 21, 2022 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Andres Freund <andres@anarazel.de> writes:

Looks like a chunk of the buildfarm doesn't like this - presumably because
they use force_parallel_mode = regress. Seems ok to just force that to off in
this test?

Ugh ... didn't occur to me to try that. I'll take a look.

Hmm, so the problem is:

SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
ERROR: cannot access temporary tables during a parallel operation

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

I suspect that restriction was overlooked when evaluating the marking
of this function.

It's somewhat sad to add this restriction - I've used get_raw_page() (+
other functions) to scan a whole database for a bug. IIRC that actually
did end up using parallelism, albeit likely not very efficiently.

Don't really have a better idea though.

It may be worth inventing a framework where a function could analyze its
arguments (presumably via prosupport) to determine the degree of
parallel safety, but this doesn't seem sufficient reason...

Greetings

Andres Freund

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Andres Freund <andres@anarazel.de> writes:

On 2022-11-21 12:52:01 -0500, Robert Haas wrote:

On Mon, Nov 21, 2022 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

It's somewhat sad to add this restriction - I've used get_raw_page() (+
other functions) to scan a whole database for a bug. IIRC that actually
did end up using parallelism, albeit likely not very efficiently.
Don't really have a better idea though.

Me either.

It may be worth inventing a framework where a function could analyze its
arguments (presumably via prosupport) to determine the degree of
parallel safety, but this doesn't seem sufficient reason...

Maybe, but in this example you could only decide you were parallel
safe if the argument is an OID constant, which'd be pretty limiting.

If I were trying to find a better fix I'd be looking for ways for
parallel workers to be able to read the parent's temp tables.
(Perhaps that could tie in with the blue-sky discussion we had
the other day about allowing autovacuum on temp tables??)

regards, tom lane

#10David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#9)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

On Mon, Nov 21, 2022 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2022-11-21 12:52:01 -0500, Robert Haas wrote:

On Mon, Nov 21, 2022 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Why in the world is get_raw_page() marked as parallel safe?
It clearly isn't, given this restriction.

It's somewhat sad to add this restriction - I've used get_raw_page() (+
other functions) to scan a whole database for a bug. IIRC that actually
did end up using parallelism, albeit likely not very efficiently.
Don't really have a better idea though.

Me either.

It may be worth inventing a framework where a function could analyze its
arguments (presumably via prosupport) to determine the degree of
parallel safety, but this doesn't seem sufficient reason...

Maybe, but in this example you could only decide you were parallel
safe if the argument is an OID constant, which'd be pretty limiting.

If I were trying to find a better fix I'd be looking for ways for
parallel workers to be able to read the parent's temp tables.
(Perhaps that could tie in with the blue-sky discussion we had
the other day about allowing autovacuum on temp tables??)

I don't suppose we want to just document the fact that these power-user
non-core functions are unable to process temporary tables safely without
first disabling parallelism for the session.

David J.

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Hi,

On 2022-11-21 15:12:15 -0500, Tom Lane wrote:

If I were trying to find a better fix I'd be looking for ways for
parallel workers to be able to read the parent's temp tables.
(Perhaps that could tie in with the blue-sky discussion we had
the other day about allowing autovacuum on temp tables??)

That'd be a nontrivial change, because we explicitly don't use any
locking for anything relating to localbuf.c. One possible benefit could
be that we could substantially reduce the code duplication between
"normal" bufmgr.c and localbuf.c.

Greetings,

Andres Freund

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

Andres Freund <andres@anarazel.de> writes:

On 2022-11-21 15:12:15 -0500, Tom Lane wrote:

If I were trying to find a better fix I'd be looking for ways for
parallel workers to be able to read the parent's temp tables.
(Perhaps that could tie in with the blue-sky discussion we had
the other day about allowing autovacuum on temp tables??)

That'd be a nontrivial change, because we explicitly don't use any
locking for anything relating to localbuf.c. One possible benefit could
be that we could substantially reduce the code duplication between
"normal" bufmgr.c and localbuf.c.

I didn't say this was easy ;-). Aside from locking, the local buffers
are inside the process's address space and not accessible from outside.
Maybe they could be mapped into a shared memory region instead?
And there are optimizations like commit a7212be8b that depend on the
assumption that nothing else is accessing our process's temp tables.
That'd need a lot of thought, if we don't want to give up all the
performance benefits of temp tables.

regards, tom lane

#13Greg Stark
stark@mit.edu
In reply to: Andres Freund (#8)
Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

On Mon, 21 Nov 2022 at 15:01, Andres Freund <andres@anarazel.de> wrote:

It's somewhat sad to add this restriction - I've used get_raw_page() (+
other functions) to scan a whole database for a bug. IIRC that actually
did end up using parallelism, albeit likely not very efficiently.

Don't really have a better idea though.

Given how specific the use case is here a simple solution would be to
just have a dedicated get_raw_temp_page() and restrict get_raw_page()
to persistent tables.

I suppose slightly gilding it would be to make a get_raw_page_temp()
and get_raw_page_persistent() and then you could have get_raw_page()
call the appropropriate one. They would be parallel restricted except
for get_raw_page_persistent() and if you explicitly called it you
could get parallel scans otherwise you wouldn't.

--
greg