[sqlsmith] Failed assertion in joinrels.c
Hi,
sqlsmith triggered the following assertion in master (c188204).
TRAP: FailedAssertion("!(!bms_overlap(joinrelids, sjinfo->min_lefthand))", File: "joinrels.c", Line: 500)
As usual, the query is against the regression database. It is rather
unwieldy… I wonder if I should stop working on new grammar rules and
instead work on some post-processing that prunes the AST as much as
possible while maintaining the failure mode.
regards,
andreas
select
subq_647409.c0 as c0,
subq_647409.c0 as c1
from
public.customer as rel_4116461
left join public.clstr_tst_s as rel_4116555
left join information_schema.columns as rel_4116556
on (rel_4116555.rf_a = rel_4116556.ordinal_position )
right join pg_catalog.pg_roles as rel_4116557
on (rel_4116556.maximum_cardinality = rel_4116557.rolconnlimit )
on (rel_4116461.passwd = rel_4116557.rolpassword )
left join (select
subq_647410.c8 as c0
from
public.char_tbl as rel_4116611,
lateral (select
rel_4116612.name as c0,
rel_4116612.comment as c1,
rel_4116612.nslots as c2,
rel_4116612.comment as c3,
rel_4116612.nslots as c4,
rel_4116612.nslots as c5,
rel_4116612.comment as c6,
rel_4116612.comment as c7,
rel_4116612.nslots as c8,
rel_4116612.nslots as c9,
rel_4116612.nslots as c10
from
public.hub as rel_4116612
where rel_4116612.comment ~>=~ rel_4116612.comment
fetch first 116 rows only) as subq_647410
where (subq_647410.c7 !~~* subq_647410.c7)
or ((subq_647410.c3 = subq_647410.c3)
and ((subq_647410.c7 ~* subq_647410.c6)
and (subq_647410.c7 @@ subq_647410.c7)))
fetch first 152 rows only) as subq_647409
inner join public.int4_tbl as rel_4116661
inner join public.shoe as rel_4116662
on (rel_4116661.f1 = rel_4116662.sh_avail )
inner join public.rtest_vview3 as rel_4116663
on (rel_4116661.f1 = rel_4116663.a )
on (subq_647409.c0 = rel_4116662.sh_avail )
on (rel_4116555.b = rel_4116661.f1 )
where ((rel_4116557.rolvaliduntil is NULL)
or (rel_4116663.b !~ rel_4116461.name))
or (rel_4116661.f1 is not NULL)
fetch first 80 rows only;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andreas Seltenreich <seltenreich@gmx.de> writes:
sqlsmith triggered the following assertion in master (c188204).
TRAP: FailedAssertion("!(!bms_overlap(joinrelids, sjinfo->min_lefthand))", File: "joinrels.c", Line: 500)
Cool, I'll take a look.
As usual, the query is against the regression database. It is rather
unwieldy… I wonder if I should stop working on new grammar rules and
instead work on some post-processing that prunes the AST as much as
possible while maintaining the failure mode.
Probably not really worth the trouble; I find it's usually easy to
produce a minimized test case after the failure cause is understood.
What concerns me more is that what you're finding is only cases that trip
an assertion sanity check. It seems likely that you're also managing to
trigger other bugs with less drastic consequences, such as "could not
devise a query plan" failures or just plain wrong answers. I'm not sure
how we could identify wrong answers automatically :-( but it might be
worth checking for XX000 SQLSTATE responses, since generally that should
be a can't-happen case. (Or if it can happen, we need to change the
errcode.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane writes:
What concerns me more is that what you're finding is only cases that trip
an assertion sanity check. It seems likely that you're also managing to
trigger other bugs with less drastic consequences, such as "could not
devise a query plan" failures or just plain wrong answers.
Ja, some of these are logged as well[1]smith=# select * from report24h; count | error -------+-------------------------------------------------------------------------- 43831 | ERROR: unsupported XML feature 39496 | ERROR: invalid regular expression: quantifier operand invalid 27261 | ERROR: canceling statement due to statement timeout 21386 | ERROR: operator does not exist: point = point 8580 | ERROR: cannot compare arrays of different element types 5019 | ERROR: invalid regular expression: brackets [] not balanced 4646 | ERROR: could not determine which collation to use for string comparison 2583 | ERROR: invalid regular expression: nfa has too many states 2248 | ERROR: operator does not exist: xml = xml 1198 | ERROR: operator does not exist: polygon = polygon 1171 | ERROR: cache lookup failed for index 16862 677 | ERROR: invalid regular expression: parentheses () not balanced 172 | ERROR: cache lookup failed for index 257148 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976 42 | ERROR: operator does not exist: city_budget = city_budget 13 | ERROR: could not find commutator for operator 34538 10 | ERROR: could not identify a comparison function for type xid 4 | Connection to database failed 4 | ERROR: cache lookup failed for index 2619 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: cache lookup failed for index 12322 2 | ERROR: failed to assign all NestLoopParams to plan nodes 2 | ERROR: invalid regular expression: invalid character range 1 | ERROR: could not find pathkey item to sort (25 rows) Time: 1158,990 ms, but most of them are really as
undrastic as can get, and I was afraid reporting them would be more of a
nuisance. I analysed a couple of the cache lookup failures, and they
all had a similar severreness than the example in the README[2]https://github.com/anse1/sqlsmith/blob/master/README.org. The
operator ones I analysed seem due to intentionally broken operators in
the regression db. The NestLoopParams and subplan reference one sound
interesting though…
I'm not sure how we could identify wrong answers automatically :-(
Csmith isn't doing this either. They discuss differential testing
though in their papers, i.e., comparing the results of different
products. Maybe a simple metric like numbers of rows returned might
already be valuable for correctness checks.
I also thought about doing some sampling on the data and simulating
relational operations and check for witness tuples, but it is probably
not appropriate to start implementing a mini-rdbms on the client side.
but it might be worth checking for XX000 SQLSTATE responses, since
generally that should be a can't-happen case. (Or if it can happen,
we need to change the errcode.)
The sqlstate is currently missing in the reports because libpqxx is not
putting it in it's exceptions :-/.
regards,
Andreas
Footnotes:
[1]: smith=# select * from report24h; count | error -------+-------------------------------------------------------------------------- 43831 | ERROR: unsupported XML feature 39496 | ERROR: invalid regular expression: quantifier operand invalid 27261 | ERROR: canceling statement due to statement timeout 21386 | ERROR: operator does not exist: point = point 8580 | ERROR: cannot compare arrays of different element types 5019 | ERROR: invalid regular expression: brackets [] not balanced 4646 | ERROR: could not determine which collation to use for string comparison 2583 | ERROR: invalid regular expression: nfa has too many states 2248 | ERROR: operator does not exist: xml = xml 1198 | ERROR: operator does not exist: polygon = polygon 1171 | ERROR: cache lookup failed for index 16862 677 | ERROR: invalid regular expression: parentheses () not balanced 172 | ERROR: cache lookup failed for index 257148 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976 42 | ERROR: operator does not exist: city_budget = city_budget 13 | ERROR: could not find commutator for operator 34538 10 | ERROR: could not identify a comparison function for type xid 4 | Connection to database failed 4 | ERROR: cache lookup failed for index 2619 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: cache lookup failed for index 12322 2 | ERROR: failed to assign all NestLoopParams to plan nodes 2 | ERROR: invalid regular expression: invalid character range 1 | ERROR: could not find pathkey item to sort (25 rows) Time: 1158,990 ms
count | error
-------+--------------------------------------------------------------------------
43831 | ERROR: unsupported XML feature
39496 | ERROR: invalid regular expression: quantifier operand invalid
27261 | ERROR: canceling statement due to statement timeout
21386 | ERROR: operator does not exist: point = point
8580 | ERROR: cannot compare arrays of different element types
5019 | ERROR: invalid regular expression: brackets [] not balanced
4646 | ERROR: could not determine which collation to use for string comparison
2583 | ERROR: invalid regular expression: nfa has too many states
2248 | ERROR: operator does not exist: xml = xml
1198 | ERROR: operator does not exist: polygon = polygon
1171 | ERROR: cache lookup failed for index 16862
677 | ERROR: invalid regular expression: parentheses () not balanced
172 | ERROR: cache lookup failed for index 257148
84 | ERROR: could not find member 1(34520,34520) of opfamily 1976
55 | ERROR: missing support function 1(34516,34516) in opfamily 1976
42 | ERROR: operator does not exist: city_budget = city_budget
13 | ERROR: could not find commutator for operator 34538
10 | ERROR: could not identify a comparison function for type xid
4 | Connection to database failed
4 | ERROR: cache lookup failed for index 2619
3 | ERROR: plan should not reference subplan's variable
2 | ERROR: cache lookup failed for index 12322
2 | ERROR: failed to assign all NestLoopParams to plan nodes
2 | ERROR: invalid regular expression: invalid character range
1 | ERROR: could not find pathkey item to sort
(25 rows)
Time: 1158,990 ms
[2]: https://github.com/anse1/sqlsmith/blob/master/README.org
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andreas Seltenreich <seltenreich@gmx.de> writes:
Tom Lane writes:
What concerns me more is that what you're finding is only cases that trip
an assertion sanity check. It seems likely that you're also managing to
trigger other bugs with less drastic consequences, such as "could not
devise a query plan" failures or just plain wrong answers.
Ja, some of these are logged as well[1], but most of them are really as
undrastic as can get, and I was afraid reporting them would be more of a
nuisance.
Well, I certainly think all of these represent bugs:
3 | ERROR: plan should not reference subplan's variable
2 | ERROR: failed to assign all NestLoopParams to plan nodes
1 | ERROR: could not find pathkey item to sort
This I'm not sure about; it could be that the query gave conflicting
collation specifiers, but on the other hand we've definitely had bugs
with people forgetting to run assign_query_collations on subexpressions:
4646 | ERROR: could not determine which collation to use for string comparison
This one's pretty darn odd, because 2619 is pg_statistic and not an index
at all:
4 | ERROR: cache lookup failed for index 2619
These seem likely to be bugs as well, though maybe they are race
conditions during a DROP and not worth fixing:
1171 | ERROR: cache lookup failed for index 16862
172 | ERROR: cache lookup failed for index 257148
84 | ERROR: could not find member 1(34520,34520) of opfamily 1976
55 | ERROR: missing support function 1(34516,34516) in opfamily 1976
13 | ERROR: could not find commutator for operator 34538
2 | ERROR: cache lookup failed for index 12322
I would say anything of the sort that is repeatable definitely deserves
investigation, because even if it's an expectable error condition, we
should be throwing a more user-friendly error message.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane writes:
Well, I certainly think all of these represent bugs:
[...]
thanks for priorizing them. I'll try to digest them somewhat before
posting.
This one's pretty darn odd, because 2619 is pg_statistic and not an index
at all:4 | ERROR: cache lookup failed for index 2619
This is actually the one from the README :-). Quoting to spare media
discontinuity:
--8<---------------cut here---------------start------------->8---
Taking a closer look at it reveals that it happens when you query a
certain catalog view like this:
self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL;
FEHLER: cache lookup failed for index 2619
This is because the planner then puts pg_get_indexdef(oid) in a context
where it sees non-index-oids, which causes it to croak:
QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=17.60..30.65 rows=9 width=4)
Hash Cond: (i.oid = x.indexrelid)
-> Seq Scan on pg_class i (cost=0.00..12.52 rows=114 width=8)
Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char"))
-> Hash (cost=17.31..17.31 rows=23 width=4)
-> Hash Join (cost=12.52..17.31 rows=23 width=4)
Hash Cond: (x.indrelid = c.oid)
-> Seq Scan on pg_index x (cost=0.00..4.13 rows=113 width=8)
-> Hash (cost=11.76..11.76 rows=61 width=8)
-> Seq Scan on pg_class c (cost=0.00..11.76 rows=61 width=8)
Filter: (relkind = ANY ('{r,m}'::"char"[]))
--8<---------------cut here---------------end--------------->8---
thanks,
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/01/2015 05:59 PM, Tom Lane wrote:
Well, I certainly think all of these represent bugs:
1 | ERROR: could not find pathkey item to sort
sqlsmith was able to find these two queries that trigger the error on my
machine:
ERROR: could not find pathkey item to sort
STATEMENT: select
rel_16564180.id1 as c0
from
public.bprime as rel_16564178
left join (select
rel_16564179.b as c0,
rel_16564179.c as c1
from
public.clstr_tst as rel_16564179
where rel_16564179.c < rel_16564179.d) as subq_2610919
inner join public.num_result as rel_16564180
on (subq_2610919.c0 = rel_16564180.id1 )
on (rel_16564178.thousand = subq_2610919.c0 )
where subq_2610919.c1 = subq_2610919.c1
fetch first 122 rows only;
STATEMENT: select
subq_1991714.c0 as c0,
subq_1991712.c0 as c1,
rel_12624817.a as c2
from
(select
rel_12624653.id2 as c0,
rel_12624653.id2 as c1
from
public.num_exp_add as rel_12624653
where rel_12624653.expected >= rel_12624653.expected
fetch first 137 rows only) as subq_1991697
inner join (select
rel_12624805.z as c0
from
public.insert_tbl as rel_12624805
where rel_12624805.y <= rel_12624805.y
fetch first 108 rows only) as subq_1991712
left join public.clstr_tst as rel_12624817
left join public.main_table as rel_12624818
on (rel_12624817.b = rel_12624818.a )
on (subq_1991712.c0 = rel_12624818.a )
on (subq_1991697.c0 = rel_12624818.a ),
lateral (select
rel_12624819.tgdeferrable as c0
from
pg_catalog.pg_trigger as rel_12624819
where (rel_12624819.tgconstraint >= rel_12624819.tgrelid)
and ((rel_12624819.tgconstrindid <= rel_12624819.tgrelid)
or (EXISTS (
select
rel_12624820.grantor as c0,
rel_12624820.grantee as c1,
rel_12624820.is_grantable as c2,
rel_12624820.routine_schema as c3,
rel_12624820.specific_schema as c4,
rel_12624820.grantee as c5,
rel_12624820.routine_catalog as c6,
rel_12624820.routine_schema as c7,
rel_12624820.specific_name as c8
from
information_schema.role_routine_grants as
rel_12624820
where 22 >= 29
fetch first 138 rows only)))
fetch first 93 rows only) as subq_1991714
where rel_12624817.d = rel_12624817.c
fetch first 97 rows only;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
sqlsmith triggered the following assertion in master (c188204).
Thanks for writing sqlsmith. It seems like a great tool.
I wonder, are you just running the tool with assertions enabled when
PostgreSQL is built? If so, it might make sense to make various
problems more readily detected. As you may know, Clang has a pretty
decent option called AddressSanitizer that can detect memory errors as
they occur with an overhead that is not excessive. One might use the
following configure arguments when building PostgreSQL to use
AddressSanitizer:
./configure CC=clang CFLAGS='-O1 -g -fsanitize=address
-fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert
Of course, it remains to be seen if this pays for itself. Apparently
the tool has about a 2x overhead [1]http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan. I'm really not sure that you'll
find any more bugs this way, but it's certainly possible that you'll
find a lot more. Given your success in finding bugs without using
AddressSanitizer, introducing it may be premature.
[1]: http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane writes:
Well, I certainly think all of these represent bugs:
3 | ERROR: plan should not reference subplan's variable
2 | ERROR: failed to assign all NestLoopParams to plan nodes
These appear to be related. The following query produces the former,
but if you replace the very last reference of provider with the literal
'bar', it raises the latter error.
select 1 from
pg_catalog.pg_shseclabel as rel_09
inner join public.rtest_view2 as rel_32
left join pg_catalog.pg_roles as rel_33
on (rel_32.a = rel_33.rolconnlimit )
on (rel_09.provider = rel_33.rolpassword )
left join pg_catalog.pg_user as rel_35
on (rel_33.rolconfig = rel_35.useconfig )
where ( ((rel_09.provider ~<~ 'foo')
and (rel_35.usename ~* rel_09.provider)));
,----[ FWIW: git bisect run ]
| first bad commit: [e83bb10d6dcf05a666d4ada00d9788c7974ad378]
| Adjust definition of cheapest_total_path to work better with LATERAL.
`----
regards,
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
On 08/01/2015 05:59 PM, Tom Lane wrote:
Well, I certainly think all of these represent bugs:
1 | ERROR: could not find pathkey item to sort
sqlsmith was able to find these two queries that trigger the error on my
machine:
Hmm ... I see no error with these queries as of today's HEAD or
back-branch tips. I surmise that this was triggered by one of the other
recently-fixed bugs, though the connection isn't obvious offhand.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/02/2015 10:18 PM, Tom Lane wrote:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
On 08/01/2015 05:59 PM, Tom Lane wrote:
Well, I certainly think all of these represent bugs:
1 | ERROR: could not find pathkey item to sort
sqlsmith was able to find these two queries that trigger the error
on my machine:Hmm ... I see no error with these queries as of today's HEAD or
back-branch tips. I surmise that this was triggered by one of the
other recently-fixed bugs, though the connection isn't obvious
offhand.
Yeah, I'm no longer able to reproduce it either.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/01/2015 05:59 PM, Tom Lane wrote:
Well, I certainly think all of these represent bugs:
How about this one?
1 ERROR: could not find RelOptInfo for given relids
It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query
and the attached one:
select 1
from public.nums as r5006875
inner join information_schema.domains as r5006876
on (r5006875.n = r5006876.character_maximum_length )
inner join (select
r5006878.z as c5
from testxmlschema.test2 as r5006878
where r5006878.q >= r5006878.q) as subq_771817
on (r5006875.n = subq_771817.c5 )
right join public.shoelace_arrive as r5006879
on (r5006875.n = r5006879.arr_quant )
left join pg_catalog.pg_tablespace as r5006966
left join pg_catalog.pg_stat_user_functions as r5006967
on (r5006966.spcowner = r5006967.funcid )
inner join pg_catalog.pg_authid as r5006968
on (r5006967.funcname = r5006968.rolname )
on (subq_771817.c5 = r5006968.rolconnlimit )
where r5006875.n is NULL;
Attachments:
Peter Geoghegan writes:
On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
sqlsmith triggered the following assertion in master (c188204).
Thanks for writing sqlsmith. It seems like a great tool.
I wonder, are you just running the tool with assertions enabled when
PostgreSQL is built?
Right. I have to admit my testing setup is still more tailored towards
testing sqlsmith than postgres.
If so, it might make sense to make various problems more readily
detected. As you may know, Clang has a pretty decent option called
AddressSanitizer that can detect memory errors as they occur with an
overhead that is not excessive.
I didn't known this clang feature yet, thanks for pointing it out. I
considered running some instances under valgrind to detect these, but
the performance penalty seemed not worth it.
One might use the following configure arguments when building
PostgreSQL to use AddressSanitizer:./configure CC=clang CFLAGS='-O1 -g -fsanitize=address
-fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert
A quick attempt to sneak these in made my ansible playbooks unhappy due
to "make check" failures and other generated noise. I'll try to have an
instance with the AddressSanitizer active soon though.
Of course, it remains to be seen if this pays for itself. Apparently
the tool has about a 2x overhead [1]. I'm really not sure that you'll
find any more bugs this way, but it's certainly possible that you'll
find a lot more. Given your success in finding bugs without using
AddressSanitizer, introducing it may be premature.
Piotr also suggested on IRC to run coverage tests w/ sqlsmith. This
could yield valuable hints in which direction to extend sqlsmith's
grammar.
Thanks,
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
How about this one?
1 ERROR: could not find RelOptInfo for given relids
That would be a bug, for sure ...
It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query
and the attached one:
... but I can't reproduce it on HEAD with either of these queries.
Not clear why you're getting different results.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/03/2015 09:18 PM, Tom Lane wrote:
... but I can't reproduce it on HEAD with either of these queries.
Not clear why you're getting different results.
I'm terribly sorry, but I didn't notice that postgresql.conf was modified...
Set join_collapse_limit = 32 and you should see the error.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andreas Seltenreich <seltenreich@gmx.de> writes:
Tom Lane writes:
Well, I certainly think all of these represent bugs:
3 | ERROR: plan should not reference subplan's variable
2 | ERROR: failed to assign all NestLoopParams to plan nodes
These appear to be related. The following query produces the former,
but if you replace the very last reference of provider with the literal
'bar', it raises the latter error.
Fixed that, thanks for the test case!
,----[ FWIW: git bisect run ]
| first bad commit: [e83bb10d6dcf05a666d4ada00d9788c7974ad378]
| Adjust definition of cheapest_total_path to work better with LATERAL.
`----
There's still something fishy about your git bisect results; they don't
have much to do with what seems to me to be the triggering condition.
I suspect the problem is that git bisect doesn't allow for the possibility
that the symptom might appear and disappear over time, ie it might have
been visible at some early stage of the LATERAL work but been fixed later,
and then reintroduced by still-later optimization efforts.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi there,I've been following the sqlsmith work and wanted to jump in and try it out. I took Peter's idea and tried building postgres with the flags suggested but it was hard to get anything working.
I'm on commit 85e5e222b1dd02f135a8c3bf387d0d6d88e669bd (Tue Aug 4 14:55:32 2015 -0400)
Configure arguments:./configure --prefix=$HOME/pkg CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert
I had to make a simple leak suppression file:
$ cat leak.supp
leak:save_ps_display_args
leak:__GI___strdup
$ export LSAN_OPTIONS=suppressions=leak.supp
And then I could run postgres. After 50,000 queries, I'm left with the following report:
queries: 50514
AST stats (avg): height = 7.29877 nodes = 37.8156
296 ERROR: canceling statement due to statement timeout
166 ERROR: invalid regular expression: quantifier operand invalid
26 ERROR: could not determine which collation to use for string comparison
23 ERROR: cannot compare arrays of different element types
12 ERROR: invalid regular expression: brackets [] not balanced
5 ERROR: cache lookup failed for index 2619
2 ERROR: invalid regular expression: parentheses () not balanced
error rate: 0.0104921
AddressSanitizer didn't fire except for the suppressed leaks. The suppressed leaks were only hit at the beginning:
-----------------------------------------------------
Suppressions used:
count bytes template
1 520 save_ps_display_args
1 10 __GI___strdup
-----------------------------------------------------
sqlsmith is a cool little piece of kit and I see a lot of room for on going work (performance bumps for more queries per second; more db back ends; different fuzzers).
Yours,Ewan Higgs
From: Peter Geoghegan <pg@heroku.com>
To: Andreas Seltenreich <seltenreich@gmx.de>
Cc: Pg Hackers <pgsql-hackers@postgresql.org>
Sent: Sunday, 2 August 2015, 10:39
Subject: Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
sqlsmith triggered the following assertion in master (c188204).
Thanks for writing sqlsmith. It seems like a great tool.
I wonder, are you just running the tool with assertions enabled when
PostgreSQL is built? If so, it might make sense to make various
problems more readily detected. As you may know, Clang has a pretty
decent option called AddressSanitizer that can detect memory errors as
they occur with an overhead that is not excessive. One might use the
following configure arguments when building PostgreSQL to use
AddressSanitizer:
./configure CC=clang CFLAGS='-O1 -g -fsanitize=address
-fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert
Of course, it remains to be seen if this pays for itself. Apparently
the tool has about a 2x overhead [1]http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan. I'm really not sure that you'll
find any more bugs this way, but it's certainly possible that you'll
find a lot more. Given your success in finding bugs without using
AddressSanitizer, introducing it may be premature.
[1]: http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
On 08/03/2015 09:18 PM, Tom Lane wrote:
... but I can't reproduce it on HEAD with either of these queries.
Not clear why you're getting different results.
I'm terribly sorry, but I didn't notice that postgresql.conf was modified...
Set join_collapse_limit = 32 and you should see the error.
Ah ... now I get that error on the smaller query, but the larger one
(that you put in an attachment) still doesn't show any problem.
Do you have any other nondefault settings?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 08/05/2015 02:24 AM, Tom Lane wrote:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
On 08/03/2015 09:18 PM, Tom Lane wrote:
... but I can't reproduce it on HEAD with either of these queries.
Not clear why you're getting different results.I'm terribly sorry, but I didn't notice that postgresql.conf was modified...
Set join_collapse_limit = 32 and you should see the error.Ah ... now I get that error on the smaller query, but the larger one
(that you put in an attachment) still doesn't show any problem.
Do you have any other nondefault settings?
Sorry, that needs from_collapse_limit = 32 as well.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
On 08/05/2015 02:24 AM, Tom Lane wrote:
Piotr Stefaniak <postgres@piotr-stefaniak.me> writes:
Set join_collapse_limit = 32 and you should see the error.
Ah ... now I get that error on the smaller query, but the larger one
(that you put in an attachment) still doesn't show any problem.
Do you have any other nondefault settings?
Sorry, that needs from_collapse_limit = 32 as well.
Yeah, I assumed as much, but it still doesn't happen for me. Possibly
something platform-dependent in statistics, or something like that.
Anyway, I fixed the problem exposed by the smaller query; would you
check that the larger one is okay for you now?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane writes:
On 08/01/2015 05:59 PM, Tom Lane wrote:
Well, I certainly think all of these represent bugs:
1 | ERROR: could not find pathkey item to sort
Hmm ... I see no error with these queries as of today's HEAD or
back-branch tips. I surmise that this was triggered by one of the other
recently-fixed bugs, though the connection isn't obvious offhand.
I still see this error in master as of b8cbe43, but the queries are
indeed a pita to reproduce. The one below is the only one so far that
is robust against running ANALYZE on the regression db, and also
reproduces when I run it as an EXTRA_TEST with make check.
regards,
Andreas
select
rel_217088662.a as c0,
rel_217088554.a as c1,
rel_217088662.b as c2,
subq_34235266.c0 as c3,
rel_217088660.id2 as c4,
rel_217088660.id2 as c5
from
public.clstr_tst as rel_217088554
inner join (select
rel_217088628.a as c0
from
public.rtest_vview3 as rel_217088628
where (rel_217088628.b !~ rel_217088628.b)
and ((((rel_217088628.b ~~* rel_217088628.b)
or (rel_217088628.b ~* rel_217088628.b))
or (rel_217088628.b <> rel_217088628.b))
or (rel_217088628.b = rel_217088628.b))) as subq_34235266
inner join public.num_exp_mul as rel_217088660
inner join public.onek2 as rel_217088661
on (rel_217088660.id1 = rel_217088661.unique1 )
on (subq_34235266.c0 = rel_217088660.id1 )
inner join public.main_table as rel_217088662
on (rel_217088661.unique2 = rel_217088662.a )
on (rel_217088554.b = rel_217088660.id1 )
where rel_217088554.d = rel_217088554.c
fetch first 94 rows only;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers