Regression test failure in regression test temp.sql

Started by Michael Paquierover 6 years ago8 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

While browsing the buildfarm failures, I have found this problem on
anole for the test temp:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2019-08-07%2006%3A39%3A35
select relname from pg_class where relname like 'temp_parted_oncommit_test%';
relname
----------------------------
- temp_parted_oncommit_test
temp_parted_oncommit_test1
(2 rows)

  drop table temp_parted_oncommit_test;
  --- 276,283 ----
  select relname from pg_class where relname like 'temp_parted_oncommit_test%';
             relname
  ----------------------------
   temp_parted_oncommit_test1
+  temp_parted_oncommit_test
  (2 rows)

This could be solved just with an ORDER BY as per the attached. Any
objections?

Thanks,
--
Michael

Attachments:

temp-partition-test.patchtext/x-diff; charset=us-asciiDownload+4-2
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#1)
Re: Regression test failure in regression test temp.sql

Michael Paquier <michael@paquier.xyz> writes:

While browsing the buildfarm failures, I have found this problem on
anole for the test temp:
...
This could be solved just with an ORDER BY as per the attached. Any
objections?

There's no reason to expect stability of row order in pg_class, so
in principle this is a reasonable fix, but I kind of wonder why it's
necessary. The plan I get for this query is

regression=# explain select relname from pg_class where relname like 'temp_parted_oncommit_test%';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Only Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..4.30 rows=1 width=64)
Index Cond: ((relname >= 'temp'::text) AND (relname < 'temq'::text))
Filter: (relname ~~ 'temp_parted_oncommit_test%'::text)
(3 rows)

which ought to deliver sorted rows natively. Adding ORDER BY doesn't
change this plan one bit. So what actually happened on anole to cause
a non-sorted result?

Not objecting to the patch, exactly, just feeling like there's
more here than meets the eye. Not quite sure if it's worth
investigating closer, or what we'd even need to do to do so.

BTW, I realize from looking at the plan that LIKE is interpreting the
underscores as wildcards. Maybe it's worth s/_/\_/ while you're
at it.

regards, tom lane

#3Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
Re: Regression test failure in regression test temp.sql

On Wed, Aug 07, 2019 at 10:17:25AM -0400, Tom Lane wrote:

Not objecting to the patch, exactly, just feeling like there's
more here than meets the eye. Not quite sure if it's worth
investigating closer, or what we'd even need to do to do so.

Yes, something's weird here. I'd think that the index only scan
ensures a proper ordering in this case, so it could be possible that a
different plan got selected here? That would mean that the plan
selected would not be an index-only scan or an index scan. So perhaps
that was a bitmap scan?

BTW, I realize from looking at the plan that LIKE is interpreting the
underscores as wildcards. Maybe it's worth s/_/\_/ while you're

Right. Looking around there are much more tests which have the same
problem. This could become a problem if other tests running in
parallel use relation names with the same pattern, which is not a
issue as of HEAD, so I'd rather just back-patch the ORDER BY part of
it (temp.sql is the only test missing that). What do you think about
the attached?
--
Michael

Attachments:

regress-test-bugs.patchtext/x-diff; charset=us-asciiDownload+58-56
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#3)
Re: Regression test failure in regression test temp.sql

Michael Paquier <michael@paquier.xyz> writes:

On Wed, Aug 07, 2019 at 10:17:25AM -0400, Tom Lane wrote:

Not objecting to the patch, exactly, just feeling like there's
more here than meets the eye. Not quite sure if it's worth
investigating closer, or what we'd even need to do to do so.

Yes, something's weird here. I'd think that the index only scan
ensures a proper ordering in this case, so it could be possible that a
different plan got selected here? That would mean that the plan
selected would not be an index-only scan or an index scan. So perhaps
that was a bitmap scan?

I hacked temp.sql to print a couple different plans (doing it that way,
rather than manually, just to ensure that I was getting plans matching
what would actually happen right there). And what I see, as attached,
is that IOS and plain index and bitmap scans all have pretty much the
same total cost. The planner then ought to prefer IOS or plain on the
secondary grounds of cheaper startup cost. However, it's not so hard
to believe that it might switch to bitmap if something caused the cost
estimates to change by a few percent. So probably we should write this
off as "something affected the plan choice" and just add the ORDER BY
as you suggest.

BTW, I realize from looking at the plan that LIKE is interpreting the
underscores as wildcards. Maybe it's worth s/_/\_/ while you're

Right. Looking around there are much more tests which have the same
problem. This could become a problem if other tests running in
parallel use relation names with the same pattern, which is not a
issue as of HEAD, so I'd rather just back-patch the ORDER BY part of
it (temp.sql is the only test missing that). What do you think about
the attached?

Hmm, I wasn't thinking of changing anything more than this one query.
I'm not sure that a wide-ranging patch is going to be worth the
potential back-patching land mines it'd introduce. However, if you
want to do it anyway, please at least patch v12 as well --- that
should still be a pretty painless back-patch, even if it's not so
easy to go further.

BTW, most of the problem here seems to be that the SQL committee
made an infelicitous choice of wildcard characters for LIKE.
I wonder if it'd be saner to fix this by switching to regexes?

regression=# explain select relname from pg_class where relname like 'temp_parted_oncommit_test%';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Only Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..4.30 rows=1 width=64)
Index Cond: ((relname >= 'temp'::text) AND (relname < 'temq'::text))
Filter: (relname ~~ 'temp_parted_oncommit_test%'::text)
(3 rows)

regression=# explain select relname from pg_class where relname ~ '^temp_parted_oncommit_test';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..4.30 rows=1 width=64)
Index Cond: ((relname >= 'temp_parted_oncommit_test'::text) AND (relname < 'temp_parted_oncommit_tesu'::text))
Filter: (relname ~ '^temp_parted_oncommit_test'::text)
(3 rows)

regards, tom lane

Attachments:

regression.diffstext/x-diff; charset=us-ascii; name=regression.diffsDownload+32-0
#5Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#4)
Re: Regression test failure in regression test temp.sql

On Sun, Aug 11, 2019 at 03:59:06PM -0400, Tom Lane wrote:

I hacked temp.sql to print a couple different plans (doing it that way,
rather than manually, just to ensure that I was getting plans matching
what would actually happen right there). And what I see, as attached,
is that IOS and plain index and bitmap scans all have pretty much the
same total cost. The planner then ought to prefer IOS or plain on the
secondary grounds of cheaper startup cost. However, it's not so hard
to believe that it might switch to bitmap if something caused the cost
estimates to change by a few percent. So probably we should write this
off as "something affected the plan choice" and just add the ORDER BY
as you suggest.

That matches what I was seeing, except that I have done those tests
manually. Still my plans matched with yours.

Hmm, I wasn't thinking of changing anything more than this one query.
I'm not sure that a wide-ranging patch is going to be worth the
potential back-patching land mines it'd introduce. However, if you
want to do it anyway, please at least patch v12 as well --- that
should still be a pretty painless back-patch, even if it's not so
easy to go further.

Okay, I have gone with a minimal fix of only changing some of the
quals in temp.sql as it could become a problem if other tests begin to
use relations beginning with "temp". If it proves that we have other
problems in this area later on, let's address it at this time.

BTW, most of the problem here seems to be that the SQL committee
made an infelicitous choice of wildcard characters for LIKE.
I wonder if it'd be saner to fix this by switching to regexes?

So that enforces the start of the string to match. This has the merit
to make the relation name cleaner to grab. I have gone with your
suggestion, thanks for the advice!
--
Michael

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#5)
Re: Regression test failure in regression test temp.sql

On Tue, Aug 13, 2019 at 1:58 PM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Aug 11, 2019 at 03:59:06PM -0400, Tom Lane wrote:

I hacked temp.sql to print a couple different plans (doing it that way,
rather than manually, just to ensure that I was getting plans matching
what would actually happen right there). And what I see, as attached,
is that IOS and plain index and bitmap scans all have pretty much the
same total cost. The planner then ought to prefer IOS or plain on the
secondary grounds of cheaper startup cost. However, it's not so hard
to believe that it might switch to bitmap if something caused the cost
estimates to change by a few percent. So probably we should write this
off as "something affected the plan choice" and just add the ORDER BY
as you suggest.

That matches what I was seeing, except that I have done those tests
manually. Still my plans matched with yours.

Here's another one that seems to fit that pattern.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2019-08-11%2007%3A33%3A39

+++ /home/andres/build/buildfarm/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/regress/results/collate.icu.utf8.out
2019-08-11 08:29:11.792695714 +0000
@@ -1622,15 +1622,15 @@
 SELECT typname FROM pg_type WHERE typname LIKE 'int_' AND typname <>
'INT2'::text COLLATE case_insensitive;
  typname
 ---------
- int4
  int8
+ int4
 (2 rows)

--
Thomas Munro
https://enterprisedb.com

#7Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#6)
Re: Regression test failure in regression test temp.sql

On Tue, Aug 13, 2019 at 02:51:03PM +1200, Thomas Munro wrote:

Here's another one that seems to fit that pattern.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&amp;dt=2019-08-11%2007%3A33%3A39

Indeed. Good catch! Perhaps you would like to fix it? There are two
queries in need of an ORDER BY, and the second query even uses two
semicolons (spoiler warning: that's a nit).
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: Regression test failure in regression test temp.sql

On Tue, Aug 13, 2019 at 12:15:26PM +0900, Michael Paquier wrote:

Indeed. Good catch! Perhaps you would like to fix it? There are two
queries in need of an ORDER BY, and the second query even uses two
semicolons (spoiler warning: that's a nit).

And fixed. The test case was new as of v12.
--
Michael