PGOPTIONS="-fh" make check gets stuck since Postgres 11

Started by Michael Paquierover 6 years ago22 messages
#1Michael Paquier
michael@paquier.xyz

Hi all,

I have begun playing with regressplans.sh which enforces various
combinations of "-f s|i|n|m|h" when running the regression tests, and
I have noticed that -fh can cause the server to become stuck in the
test join_hash.sql with this query (not sure which portion of the SET
LOCAL parameters are involved) :
select count(*) from simple r join extremely_skewed s using (id);

This does not happen with REL_10_STABLE where the test executes
immediately, so we has visibly an issue caused by v11 here.

Any thoughts?
--
Michael

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#1)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Mon, Jul 8, 2019 at 5:53 PM Michael Paquier <michael@paquier.xyz> wrote:

I have begun playing with regressplans.sh which enforces various
combinations of "-f s|i|n|m|h" when running the regression tests, and
I have noticed that -fh can cause the server to become stuck in the
test join_hash.sql with this query (not sure which portion of the SET
LOCAL parameters are involved) :
select count(*) from simple r join extremely_skewed s using (id);

This does not happen with REL_10_STABLE where the test executes
immediately, so we has visibly an issue caused by v11 here.

If you don't allow hash joins it makes this plan:

Aggregate
-> Nested Loop
Join Filter: (r.id = s.id)
-> Seq Scan on simple r
-> Materialize
-> Seq Scan on extremely_skewed s

"simple" has 20k rows and "extremely_skewed" has 20k rows but the
planner thinks it only has 2. So this going to take O(n^2) time and n
is 20k. Not sure what to do about that. Maybe "join_hash" should be
skipped for the -h (= no hash joins please) case?

--
Thomas Munro
https://enterprisedb.com

#3Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#2)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Mon, Jul 08, 2019 at 06:49:44PM +1200, Thomas Munro wrote:

"simple" has 20k rows and "extremely_skewed" has 20k rows but the
planner thinks it only has 2. So this going to take O(n^2) time and n
is 20k. Not sure what to do about that. Maybe "join_hash" should be
skipped for the -h (= no hash joins please) case?

Ah, thanks. Yes that's going to take a while :)

Well, another thing I'd like to think about is if there is any point
to keep regressplans.sh and the relevant options in postgres at this
stage. From the top of the file one can read that:
# This script runs the Postgres regression tests with all useful combinations
# of the backend options that disable various query plan types. If the
# results are not all the same, it may indicate a bug in a particular
# plan type, or perhaps just a regression test whose results aren't fully
# determinate (eg, due to lack of an ORDER BY keyword).

However if you run any option with make check, then in all runs there
are tests failing. We can improve the situation for some of them with
ORDER BY queries by looking at the query outputs, but some EXPLAIN
queries are sensitive to that, and the history around regressplans.sh
does not play in favor of it (some people really use it?). If you
look at the latest commits for it, it has not been really touched in
19 years.

So I would be rather in favor in nuking it.
--
Michael

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#1)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Michael Paquier <michael@paquier.xyz> writes:

I have begun playing with regressplans.sh which enforces various
combinations of "-f s|i|n|m|h" when running the regression tests, and
I have noticed that -fh can cause the server to become stuck in the
test join_hash.sql with this query (not sure which portion of the SET
LOCAL parameters are involved) :
select count(*) from simple r join extremely_skewed s using (id);

This does not happen with REL_10_STABLE where the test executes
immediately, so we has visibly an issue caused by v11 here.

Yeah, these test cases were added by fa330f9ad in v11.

What it looks like to me is that some of these test cases force
"enable_mergejoin = off", so if you also have enable_hashjoin off then
you are going to get a nestloop plan, and it's hardly surprising that
that takes an unreasonable amount of time on the rather large test
tables used in these tests.

Given the purposes of this test, I think it'd be reasonable to force
both enable_hashjoin = on and enable_mergejoin = off at the very top
of the join_hash script, or the corresponding place in join.sql in
v11. Thomas, was there a specific reason for forcing enable_mergejoin
= off for only some of these tests?

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#3)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Michael Paquier <michael@paquier.xyz> writes:

Well, another thing I'd like to think about is if there is any point
to keep regressplans.sh and the relevant options in postgres at this
stage. From the top of the file one can read that:

The point of regressplans.sh is to see if anything goes seriously
wrong when forcing non-default plan choices --- seriously wrong being
defined as crashes or semantically wrong answers. It's not expected
that the regression tests will automatically pass when you do that,
because of their dependencies on output row ordering, not to mention
all the EXPLAINs. I'm not for removing it --- the fact that its
results require manual evaluation doesn't make it useless.

Having said that, join_hash.sql in particular seems to have zero
value if it's not testing hash joins, so I think it'd be reasonable
for it to override a global enable_hashjoin = off setting. None of
the other regression test scripts seem to take nearly as much of a
performance hit from globally forcing poor plans.

regards, tom lane

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
1 attachment(s)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:

Having said that, join_hash.sql in particular seems to have zero
value if it's not testing hash joins, so I think it'd be reasonable
for it to override a global enable_hashjoin = off setting. None of
the other regression test scripts seem to take nearly as much of a
performance hit from globally forcing poor plans.

I am a bit confused here. Don't you mean to have enable_hashjoin =
*on* at the top of hash_join.sql instead like in the attached?
--
Michael

Attachments:

hash-join-fix.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/test/regress/expected/join_hash.out b/src/test/regress/expected/join_hash.out
index 9eee39bdd3..0890608acf 100644
--- a/src/test/regress/expected/join_hash.out
+++ b/src/test/regress/expected/join_hash.out
@@ -1,6 +1,9 @@
 --
 -- exercises for the hash join code
 --
+-- Enforce the use of hash joins in this test, which could get disabled
+-- at system-level.
+set enable_hashjoin = on;
 begin;
 set local min_parallel_table_scan_size = 0;
 set local parallel_setup_cost = 0;
diff --git a/src/test/regress/sql/join_hash.sql b/src/test/regress/sql/join_hash.sql
index ae352e9b0b..b38081d1d5 100644
--- a/src/test/regress/sql/join_hash.sql
+++ b/src/test/regress/sql/join_hash.sql
@@ -2,6 +2,10 @@
 -- exercises for the hash join code
 --
 
+-- Enforce the use of hash joins in this test, which could get disabled
+-- at system-level.
+set enable_hashjoin = on;
+
 begin;
 
 set local min_parallel_table_scan_size = 0;
#7Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#4)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the purposes of this test, I think it'd be reasonable to force
both enable_hashjoin = on and enable_mergejoin = off at the very top
of the join_hash script, or the corresponding place in join.sql in
v11. Thomas, was there a specific reason for forcing enable_mergejoin
= off for only some of these tests?

Based on a suggestion from Andres (if I recall correctly), I wrapped
each individual test in savepoint/rollback, and then set just the GUCs
needed to get the plan shape and execution code path I wanted to
exercise, and I guess I found that I only needed to disable merge
joins for some of them. The idea was that the individual tests could
be understood independently.

--
Thomas Munro
https://enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#6)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Jul 08, 2019 at 03:21:41PM -0400, Tom Lane wrote:

Having said that, join_hash.sql in particular seems to have zero
value if it's not testing hash joins, so I think it'd be reasonable
for it to override a global enable_hashjoin = off setting. None of
the other regression test scripts seem to take nearly as much of a
performance hit from globally forcing poor plans.

I am a bit confused here. Don't you mean to have enable_hashjoin =
*on* at the top of hash_join.sql instead like in the attached?

Right, overriding any enable_hashjoin = off that might've come from
PGOPTIONS or wherever.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#7)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Thomas Munro <thomas.munro@gmail.com> writes:

On Tue, Jul 9, 2019 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the purposes of this test, I think it'd be reasonable to force
both enable_hashjoin = on and enable_mergejoin = off at the very top
of the join_hash script, or the corresponding place in join.sql in
v11. Thomas, was there a specific reason for forcing enable_mergejoin
= off for only some of these tests?

Based on a suggestion from Andres (if I recall correctly), I wrapped
each individual test in savepoint/rollback, and then set just the GUCs
needed to get the plan shape and execution code path I wanted to
exercise, and I guess I found that I only needed to disable merge
joins for some of them. The idea was that the individual tests could
be understood independently.

But per this discussion, they can only be "understood independently"
if you make some assumptions about the prevailing values of the
planner GUCs.

regards, tom lane

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#9)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 9, 2019 at 2:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Based on a suggestion from Andres (if I recall correctly), I wrapped
each individual test in savepoint/rollback, and then set just the GUCs
needed to get the plan shape and execution code path I wanted to
exercise, and I guess I found that I only needed to disable merge
joins for some of them. The idea was that the individual tests could
be understood independently.

But per this discussion, they can only be "understood independently"
if you make some assumptions about the prevailing values of the
planner GUCs.

Yeah. I had obviously never noticed that test script. +1 for just
enabling hash joins the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

--
Thomas Munro
https://enterprisedb.com

#11Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#10)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:

Yeah. I had obviously never noticed that test script. +1 for just
enabling hash joins the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

Right, I did not pay much attention to REL_11_STABLE. In this case
the test begins around line 2030 and reaches the bottom of the file.
I would actually add a RESET at the bottom of it to avoid any tests to
be impacted, as usually bug-fix tests are just appended. Thomas,
perhaps you would prefer fixing it yourself? Or should I?
--
Michael

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#11)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Michael Paquier <michael@paquier.xyz> writes:

On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:

Yeah. I had obviously never noticed that test script. +1 for just
enabling hash joins the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

Right, I did not pay much attention to REL_11_STABLE. In this case
the test begins around line 2030 and reaches the bottom of the file.
I would actually add a RESET at the bottom of it to avoid any tests to
be impacted, as usually bug-fix tests are just appended.

Agreed that the scope should be limited. But in 12/HEAD, I think the
relevant tests are all wrapped into one transaction block, so that
using SET LOCAL would be enough. Not sure if 11 is the same.

regards, tom lane

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#11)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 9, 2019 at 2:45 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 09, 2019 at 02:30:51PM +1200, Thomas Munro wrote:

Yeah. I had obviously never noticed that test script. +1 for just
enabling hash joins the top of join_hash.sql in 12+, and the
equivalent section in 11's join.sql (which is luckily at the end of
the file).

Right, I did not pay much attention to REL_11_STABLE. In this case
the test begins around line 2030 and reaches the bottom of the file.
I would actually add a RESET at the bottom of it to avoid any tests to
be impacted, as usually bug-fix tests are just appended. Thomas,
perhaps you would prefer fixing it yourself? Or should I?

It's my mistake. I'll fix it later today.

--
Thomas Munro
https://enterprisedb.com

#14Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#13)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 09, 2019 at 03:04:27PM +1200, Thomas Munro wrote:

It's my mistake. I'll fix it later today.

Thanks!
--
Michael

#15Melanie Plageman
melanieplageman@gmail.com
In reply to: Tom Lane (#5)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Mon, Jul 8, 2019 at 12:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The point of regressplans.sh is to see if anything goes seriously
wrong when forcing non-default plan choices --- seriously wrong being
defined as crashes or semantically wrong answers. It's not expected
that the regression tests will automatically pass when you do that,
because of their dependencies on output row ordering, not to mention
all the EXPLAINs. I'm not for removing it --- the fact that its
results require manual evaluation doesn't make it useless.

It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?

--
Melanie Plageman

#16Michael Paquier
michael@paquier.xyz
In reply to: Melanie Plageman (#15)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:

It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?

Not that I recall.
--
Michael

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#16)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Wed, Jul 10, 2019 at 10:12 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 09, 2019 at 11:54:29AM -0700, Melanie Plageman wrote:

It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?

Not that I recall.

It would be good if we can come up with something like that. It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates. As of now, we try to add Order By
or do some extra magic to get consistent row ordering.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#18Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#17)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:

It would be good if we can come up with something like that. It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates. As of now, we try to add Order By
or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?). Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..
--
Michael

#19Melanie Plageman
melanieplageman@gmail.com
In reply to: Michael Paquier (#18)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Wed, Jul 10, 2019 at 12:40 AM Michael Paquier <michael@paquier.xyz>
wrote:

On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:

It would be good if we can come up with something like that. It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates. As of now, we try to add Order By
or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?). Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..
--
Michael

We have a tool that does this. gpdiff [1]https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/gpdiff.pl is used for results
post-processing
and it uses a perl module called atmsort [2]https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/atmsort.pl to deal with the specific
ORDER BY
case discussed here.

[1]: https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/gpdiff.pl
https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/gpdiff.pl
[2]: https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/atmsort.pl
https://github.com/greenplum-db/gpdb/blob/master/src/test/regress/atmsort.pl

--
Melanie Plageman

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#18)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:

It would be good if we can come up with something like that. It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates. As of now, we try to add Order By
or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?). Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..

Yeah, I do not want to get into a situation where we can't test
queries that lack ORDER BY. Also, the fact that tableam X doesn't
reproduce heap's row ordering is not a good reason to relax the
strength of the tests for heap. So I'm wondering about some
postprocessing that we could optionally apply. Perhaps the tools
Melanie mentions could help.

regards, tom lane

#21Ashwin Agrawal
aagrawal@pivotal.io
In reply to: Tom Lane (#20)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Wed, Jul 10, 2019 at 6:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Jul 10, 2019 at 12:51:28PM +0530, Amit Kapila wrote:

It would be good if we can come up with something like that. It will
be helpful for zheap, where in some cases we get different row
ordering due to in-place updates. As of now, we try to add Order By
or do some extra magic to get consistent row ordering.

That was an issue for me as well when working with Postgres-XC when
the row ordering was not guaranteed depending on the number of nodes
(speaking of which Greenplum has the same issues, no?). Adding ORDER
BY clauses to a set of tests may make sense, but then this may impact
the plans generated for some of them..

Yeah, I do not want to get into a situation where we can't test
queries that lack ORDER BY. Also, the fact that tableam X doesn't
reproduce heap's row ordering is not a good reason to relax the
strength of the tests for heap. So I'm wondering about some
postprocessing that we could optionally apply. Perhaps the tools
Melanie mentions could help.

Surprisingly, I have been working from a couple of days to use those
Perl tools from Greenplum for Zedstore. As for Zedstore plans differ
for many regress tests because relation size not being the same as
heap and all. Also, for similar reasons, row orders change as
well. So, to effectively use the test untouched to validate Zedstore
and yes was thinking will help Zheap testing as well. I also tested
the same for regressplans.sh and it will lift a lot of manual burden
of investigating the results. As one can specify to completely ignore
explain plan outputs from the comparison between results and
expected. Will post patch for the tool, once I get in little decent
shape.

#22Michael Paquier
michael@paquier.xyz
In reply to: Ashwin Agrawal (#21)
Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

On Wed, Jul 10, 2019 at 09:11:41AM -0700, Ashwin Agrawal wrote:

Will post patch for the tool, once I get in little decent shape.

That would be nice! We may be able to get something into v13 this way
then.
--
Michael