BUG #16863: Assert failed in set_plain_rel_size() on processing ~* with a long prefix
The following bug has been logged on the website:
Bug reference: 16863
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 13.2
Operating system: Ubuntu 20.04
Description:
On executing the query:
CREATE TABLE test (t text);
SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
I get an assertion failure with the following callstack:
Core was generated by `postgres: law regression [local] SELECT
'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f2fec581859 in __GI_abort () at abort.c:79
#2 0x0000559da7963ff8 in ExceptionalCondition (
conditionName=conditionName@entry=0x559da7ab4690 "rel->rows > 0 ||
IS_DUMMY_REL(rel)",
errorType=errorType@entry=0x559da79bf028 "FailedAssertion",
fileName=fileName@entry=0x559da7ab43b1 "allpaths.c",
lineNumber=lineNumber@entry=462) at assert.c:67
#3 0x0000559da7731c94 in set_rel_size (root=root@entry=0x559da8c23528,
rel=rel@entry=0x559da8c24058, rti=rti@entry=1,
rte=rte@entry=0x559da8b43978) at allpaths.c:462
#4 0x0000559da7731cd7 in set_base_rel_sizes
(root=root@entry=0x559da8c23528) at allpaths.c:323
#5 0x0000559da7733ae8 in make_one_rel (root=root@entry=0x559da8c23528,
joinlist=joinlist@entry=0x559da8c24700)
at allpaths.c:185
#6 0x0000559da775a68f in query_planner (root=root@entry=0x559da8c23528,
qp_callback=qp_callback@entry=0x559da775e617 <standard_qp_callback>,
qp_extra=qp_extra@entry=0x7ffe0e874eb0)
at planmain.c:269
#7 0x0000559da77617bd in grouping_planner
(root=root@entry=0x559da8c23528,
inheritance_update=inheritance_update@entry=false,
tuple_fraction=<optimized out>, tuple_fraction@entry=0)
at planner.c:2058
#8 0x0000559da7763d5a in subquery_planner (glob=glob@entry=0x559da8b43748,
parse=parse@entry=0x559da8b43860,
parent_root=parent_root@entry=0x0,
hasRecursion=hasRecursion@entry=false,
tuple_fraction=tuple_fraction@entry=0)
at planner.c:1015
#9 0x0000559da77643f0 in standard_planner (parse=0x559da8b43860,
query_string=<optimized out>, cursorOptions=256,
boundParams=<optimized out>) at planner.c:405
#10 0x0000559da77649e4 in planner (parse=parse@entry=0x559da8b43860,
query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at planner.c:275
#11 0x0000559da783e71c in pg_plan_query
(querytree=querytree@entry=0x559da8b43860,
query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at postgres.c:875
#12 0x0000559da783e7d9 in pg_plan_queries (querytrees=0x559da8c234f0,
query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));",
cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at postgres.c:966
#13 0x0000559da783eca9 in exec_simple_query (
query_string=query_string@entry=0x559da8b42550 "SELECT * FROM test WHERE
t ~* ('^' || repeat('-', 500));")
at postgres.c:1158
#14 0x0000559da7840cb0 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x559da8b6d7c8, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4329
#15 0x0000559da77ac39f in BackendRun (port=port@entry=0x559da8b63ad0) at
postmaster.c:4526
#16 0x0000559da77af4fa in BackendStartup (port=port@entry=0x559da8b63ad0) at
postmaster.c:4210
#17 0x0000559da77af741 in ServerLoop () at postmaster.c:1739
#18 0x0000559da77b0c6a in PostmasterMain (argc=3, argv=<optimized out>) at
postmaster.c:1412
#19 0x0000559da76f9e16 in main (argc=3, argv=0x559da8b3c980) at main.c:210
12.02.2021 22:00, PG Bug reporting form wrote:
SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
...
#2 0x0000559da7963ff8 in ExceptionalCondition (
conditionName=conditionName@entry=0x559da7ab4690 "rel->rows > 0 ||
IS_DUMMY_REL(rel)",
errorType=errorType@entry=0x559da79bf028 "FailedAssertion",
fileName=fileName@entry=0x559da7ab43b1 "allpaths.c",
lineNumber=lineNumber@entry=462) at assert.c:67
With the attached debugging patch applied I see that rel->rows there is
NaN. At the end of the walk by the following calls:
set_plain_rel_size -> set_baserel_size_estimates ->
clauselist_selectivity -> clauselist_selectivity_simple ->
clause_selectivity -> restriction_selectivity -> icregexeqsel ->
patternsel -> patternsel_common -> pattern_fixed_prefix ->
regex_fixed_prefix -> regex_selectivity
I've found a division that produces NaN:
sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);
The complete output with the debugging code:
psql:500.sql:2: INFO: sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 500, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
psql:500.sql:2: INFO: sel: NaN
psql:500.sql:2: INFO: rel->rows: NaN; nrows: NaN
psql:500.sql:2: INFO: rel->rows: NaN
psql:500.sql:2: server closed the connection unexpectedly
With the prefix length 400 the output is different:
psql:400.sql:2: INFO: sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002582249878086965166223199366465225696632326517795503118666931124080740300000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 400, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002582249878086966144821031402096464952544460966116815567078211594740854000000000000000000000000000000000000000000000000000
psql:400.sql:2: INFO: sel: 1.000000
psql:400.sql:2: INFO: rel->rows: 7.000000; nrows: 6.800000
psql:400.sql:2: INFO: rel->rows: 7.000000
On the master with the length 500 there is no assertion failure but the
row count is strange:
psql:master500.sql:3: INFO: sel:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,
FIXED_CHAR_SEL: 0.200000, fixed_prefix_len: 500, pow:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
psql:master500.sql:3: INFO: sel: NaN
psql:master500.sql:3: INFO: rel->rows:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000;
nrows: NaN
psql:master500.sql:3: INFO: rel->rows:
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000
By the way, if the following check in restriction_selectivity() is
intended to throw error on an invalid selectivity, shouldn't the
isnan(result) test be appended here?:
if (result < 0.0 || result > 1.0)
elog(ERROR, "invalid restriction selectivity: %f", result);
Best regards,
Alexander
Attachments:
debug_sel.patchtext/x-patch; charset=UTF-8; name=debug_sel.patchDownload+4-0
Alexander Lakhin <exclusion@gmail.com> writes:
I've found a division that produces NaN:
sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);
Hmm. I'm not getting a NaN AFAICT, but I am getting pretty darn weird
estimates. I agree this needs some kind of clamp.
regression=# create table test (t text);
CREATE TABLE
regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
...
Seq Scan on test (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)
...
BTW, the message you're answering hasn't shown up here, nor is it
in the mailing list archives. Odd.
regards, tom lane
Hello Tom,
12.02.2021 23:37, Tom Lane wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
I've found a division that produces NaN:
sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);Hmm. I'm not getting a NaN AFAICT, but I am getting pretty darn weird
estimates. I agree this needs some kind of clamp.regression=# create table test (t text);
CREATE TABLE
regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
...
Seq Scan on test (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)
The same number I've seen on the master branch. But on REL_13_STABLE I
get a NaN and then the assertion failure. (I've chosen the version 13.2
in the bug report, but it's really delayed somewhere.)
Though that division produced a NaN for me on both branches.
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
12.02.2021 23:37, Tom Lane wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
I've found a division that produces NaN:
sel /= pow(FIXED_CHAR_SEL, fixed_prefix_len);
Hmm. I'm not getting a NaN AFAICT, but I am getting pretty darn weird
estimates. I agree this needs some kind of clamp.regression=# create table test (t text);
CREATE TABLE
regression=# explain SELECT * FROM test WHERE t ~* ('^' || repeat('-', 500));
...
Seq Scan on test (cost=0.00..27.00 rows=10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 width=32)
The same number I've seen on the master branch. But on REL_13_STABLE I
get a NaN and then the assertion failure.
The difference in behavior is evidently explained by commit a90c950fc,
which has decided to insert a random number in place of a NaN estimate.
Well, it's not really random, it's supposed to be 1e100 ... but EXPLAIN
doesn't know there's only 16 or so significant digits there. Maybe we
need to work a bit harder on making that print nicely. And maybe we
should rethink the idea that it's okay to gloss over a NaN estimate
that way. I certainly don't see a defensible reason for assuming that
NaN means "a large value".
regards, tom lane