pg_partition_tree crashes for a non-defined relation

Started by Michael Paquierover 7 years ago36 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Hi all,

While testing another patch, I have bumped into the issue of
$subject... I should have put some more negative testing from the start
on this stuff, here is a culprit query when passing directly an OID:
select pg_partition_tree(0);

I think that we should make the function return NULL if the relation
defined does not exist, as we usually do for system-facing functions.
It is also easier for the caller to know that the relation does not
exist instead of having a plpgsql try/catch wrapper or such.

Thoughts?
--
Michael

Attachments:

partition-tree-invalid.patchtext/x-diff; charset=us-asciiDownload+11-0
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: pg_partition_tree crashes for a non-defined relation

On Fri, Dec 07, 2018 at 10:04:06AM +0900, Michael Paquier wrote:

While testing another patch, I have bumped into the issue of
$subject... I should have put some more negative testing from the start
on this stuff, here is a culprit query when passing directly an OID:
select pg_partition_tree(0);

I think that we should make the function return NULL if the relation
defined does not exist, as we usually do for system-facing functions.
It is also easier for the caller to know that the relation does not
exist instead of having a plpgsql try/catch wrapper or such.

Thoughts?

Are there any objections about fixing this issue? I would rather fix it
sonner than later.
--
Michael

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#2)
Re: pg_partition_tree crashes for a non-defined relation

Hi,

Sorry for not replying sooner.

On Sat, Dec 8, 2018 at 8:06 Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Dec 07, 2018 at 10:04:06AM +0900, Michael Paquier wrote:

While testing another patch, I have bumped into the issue of
$subject... I should have put some more negative testing from the start
on this stuff, here is a culprit query when passing directly an OID:
select pg_partition_tree(0);

I think that we should make the function return NULL if the relation
defined does not exist, as we usually do for system-facing functions.
It is also easier for the caller to know that the relation does not
exist instead of having a plpgsql try/catch wrapper or such.

Thoughts?

Are there any objections about fixing this issue? I would rather fix it
sonner than later.

Thanks for noticing it and creating the patch. The fix makes sense.

Regards,
Amit

#4Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#3)
Re: pg_partition_tree crashes for a non-defined relation

On Sat, Dec 08, 2018 at 12:28:53PM +0900, Amit Langote wrote:

Thanks for noticing it and creating the patch. The fix makes sense.

Thanks a lot for looking at it!
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#2)
Re: pg_partition_tree crashes for a non-defined relation

Michael Paquier <michael@paquier.xyz> writes:

On Fri, Dec 07, 2018 at 10:04:06AM +0900, Michael Paquier wrote:

I think that we should make the function return NULL if the relation
defined does not exist, as we usually do for system-facing functions.
It is also easier for the caller to know that the relation does not
exist instead of having a plpgsql try/catch wrapper or such.

Are there any objections about fixing this issue? I would rather fix it
sonner than later.

Return NULL seems a reasonable behavior.

How about cases where the relation OID exists but it's the wrong
kind of relation?

regards, tom lane

#6Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
Re: pg_partition_tree crashes for a non-defined relation

On Fri, Dec 07, 2018 at 11:33:32PM -0500, Tom Lane wrote:

How about cases where the relation OID exists but it's the wrong
kind of relation?

Such cases already return an error:
=# create sequence popo;
CREATE SEQUENCE
=# select pg_partition_tree('popo');
ERROR: 42809: "popo" is not a table, a foreign table, or an index
LOCATION: pg_partition_tree, partitionfuncs.c:54

I think that's a sensible choice, because it makes no sense to look for
the inheritors of unsupported relkinds. Perhaps you have a different
view on the matter?
--
Michael

#7Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#6)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Fri, Dec 07, 2018 at 11:33:32PM -0500, Tom Lane wrote:

How about cases where the relation OID exists but it's the wrong
kind of relation?

Such cases already return an error:
=# create sequence popo;
CREATE SEQUENCE
=# select pg_partition_tree('popo');
ERROR: 42809: "popo" is not a table, a foreign table, or an index
LOCATION: pg_partition_tree, partitionfuncs.c:54

I think that's a sensible choice, because it makes no sense to look for
the inheritors of unsupported relkinds. Perhaps you have a different
view on the matter?

We should really have a more clearly defined policy around this, but my
recollection is that we often prefer to return NULL rather than throwing
an error for the convenience of people doing things like querying
pg_class using similar functions.

I wonder if we maybe should have a regression test for every such
function which just queries the catalog in a way to force the function
to be called for every relation defined in the regression tests, to
ensure that it doesn't segfault or throw an error..

Thanks!

Stephen

#8Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#7)
Re: pg_partition_tree crashes for a non-defined relation

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

We should really have a more clearly defined policy around this, but my
recollection is that we often prefer to return NULL rather than throwing
an error for the convenience of people doing things like querying
pg_class using similar functions.

Yes, that's visibly right. At least that's what I can see from the
various pg_get_*def and pg_*_is_visible. Returning NULL would indeed
be more consistent.

I wonder if we maybe should have a regression test for every such
function which just queries the catalog in a way to force the function
to be called for every relation defined in the regression tests, to
ensure that it doesn't segfault or throw an error..

Like sqlsmith? It looks hard to me to make something like that part of
the main regression test suite, as that's going to be costly and hard to
scale with.
--
Michael

#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
Re: pg_partition_tree crashes for a non-defined relation

On Sun, Dec 09, 2018 at 08:15:07AM +0900, Michael Paquier wrote:

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

We should really have a more clearly defined policy around this, but my
recollection is that we often prefer to return NULL rather than throwing
an error for the convenience of people doing things like querying
pg_class using similar functions.

Yes, that's visibly right. At least that's what I can see from the
various pg_get_*def and pg_*_is_visible. Returning NULL would indeed
be more consistent.

Thinking more about your argument, scanning fully pg_class is quite
sensible as well because there is no need to apply an extra qual on
relkind, so let's change the function as you suggest, by returning NULL
on invalid relation type. Any opinions about the attached then which
does the switch?
--
Michael

Attachments:

partition-tree-invalid-v2.patchtext/x-diff; charset=us-asciiDownload+22-6
#10Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#8)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

I wonder if we maybe should have a regression test for every such
function which just queries the catalog in a way to force the function
to be called for every relation defined in the regression tests, to
ensure that it doesn't segfault or throw an error..

Like sqlsmith? It looks hard to me to make something like that part of
the main regression test suite, as that's going to be costly and hard to
scale with.

No, I mean something like:

with x as (select pg_partition_tree(relname) from pg_class)
select 1 from x limit 1;

or whatever it takes to make sure that the function is run against every
entry in pg_class (or whatever is appropriate) while not returning the
results (since we don't actually care about the output, we just want to
make sure it doesn't ERROR or crash).

sqlsmith, as I recall, doesn't care about ERROR cases, it's just looking
for crashes, so it's not quite the same thing.

Thanks!

Stephen

#11Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#9)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 09, 2018 at 08:15:07AM +0900, Michael Paquier wrote:

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

We should really have a more clearly defined policy around this, but my
recollection is that we often prefer to return NULL rather than throwing
an error for the convenience of people doing things like querying
pg_class using similar functions.

Yes, that's visibly right. At least that's what I can see from the
various pg_get_*def and pg_*_is_visible. Returning NULL would indeed
be more consistent.

Thinking more about your argument, scanning fully pg_class is quite
sensible as well because there is no need to apply an extra qual on
relkind, so let's change the function as you suggest, by returning NULL
on invalid relation type. Any opinions about the attached then which
does the switch?

Looks alright on a quick glance, but shouldn't you also update the
comment..?

Thanks!

Stephen

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#10)
Re: pg_partition_tree crashes for a non-defined relation

Stephen Frost <sfrost@snowman.net> writes:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

I wonder if we maybe should have a regression test for every such
function which just queries the catalog in a way to force the function
to be called for every relation defined in the regression tests, to
ensure that it doesn't segfault or throw an error..

No, I mean something like:
with x as (select pg_partition_tree(relname) from pg_class)
select 1 from x limit 1;
or whatever it takes to make sure that the function is run against every
entry in pg_class (or whatever is appropriate) while not returning the
results (since we don't actually care about the output, we just want to
make sure it doesn't ERROR or crash).

I'm going to object to that on cost grounds. It is not reasonable to
run moderately-expensive functions like this on more than a thousand
pg_class entries in order to test what are just a few distinct cases,
especially in code that's highly unlikely to break once written.
Or at least, it's not reasonable to do that every time anybody runs
the regression tests for the rest of our lives.

Moreover, this would only help check a specific new function if the
author or reviewer remembered to add a test case that does it.
Since the whole problem here is "I forgot to consider this", I don't
have much faith in that happening.

Maybe we should have some sort of checklist of things to think about
when adding new SQL-visible functions, rather than trying to
memorialize every such consideration as a regression test no matter
how expensive. Admittedly, "I forgot to go over the checklist" is
still a problem; but at least it's not penalizing every developer and
every buildfarm run till kingdom come.

regards, tom lane

#13Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#12)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Michael Paquier (michael@paquier.xyz) wrote:

On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:

I wonder if we maybe should have a regression test for every such
function which just queries the catalog in a way to force the function
to be called for every relation defined in the regression tests, to
ensure that it doesn't segfault or throw an error..

No, I mean something like:
with x as (select pg_partition_tree(relname) from pg_class)
select 1 from x limit 1;
or whatever it takes to make sure that the function is run against every
entry in pg_class (or whatever is appropriate) while not returning the
results (since we don't actually care about the output, we just want to
make sure it doesn't ERROR or crash).

I'm going to object to that on cost grounds. It is not reasonable to
run moderately-expensive functions like this on more than a thousand
pg_class entries in order to test what are just a few distinct cases,
especially in code that's highly unlikely to break once written.

Yeah, I had been wondering if it would be too expensive also.

I don't entirely buy off on the argument that it's code that's 'highly
unlikely to break once written' though- we do add new relkinds from time
to time, for example. Perhaps we could have these functions run just
once per relkind.

Moreover, this would only help check a specific new function if the
author or reviewer remembered to add a test case that does it.

We could possibly write a test which would run every function that
accepts the typical data types (oid/regclass/name/etc) instead of
depending on the author to remember to add it.

Since the whole problem here is "I forgot to consider this", I don't
have much faith in that happening.

Yeah, I agree that we'd want something more automated as, otherwise, it
would definitely be likely to be forgotten.

Maybe we should have some sort of checklist of things to think about
when adding new SQL-visible functions, rather than trying to
memorialize every such consideration as a regression test no matter
how expensive. Admittedly, "I forgot to go over the checklist" is
still a problem; but at least it's not penalizing every developer and
every buildfarm run till kingdom come.

This seems like something we should do regardless. Moreover, I'd
suggest that we start documenting some of these policies in the way that
we have a style guide for errors and such- we need a "system function
style guide" that could start with something like "prefer to return NULL
instead of ERROR on unexpected but otherwise valid inputs, and test that
the code doesn't segfault when given a variety of inputs".

Thanks!

Stephen

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#13)
Re: pg_partition_tree crashes for a non-defined relation

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

... especially in code that's highly unlikely to break once written.

I don't entirely buy off on the argument that it's code that's 'highly
unlikely to break once written' though- we do add new relkinds from time
to time, for example. Perhaps we could have these functions run just
once per relkind.

Well, the relevant code is likely to be "if relkind is not x, y, or z,
then PG_RETURN_NULL". If we add a new relkind and forget to consider the
function, the outcome is a NULL result that perhaps should not have been
NULL ... but a test like this won't help us notice that.

regards, tom lane

#15Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#14)
Re: pg_partition_tree crashes for a non-defined relation

On Sun, Dec 09, 2018 at 02:07:29PM -0500, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

I don't entirely buy off on the argument that it's code that's 'highly
unlikely to break once written' though- we do add new relkinds from time
to time, for example. Perhaps we could have these functions run just
once per relkind.

Well, the relevant code is likely to be "if relkind is not x, y, or z,
then PG_RETURN_NULL". If we add a new relkind and forget to consider the
function, the outcome is a NULL result that perhaps should not have been
NULL ... but a test like this won't help us notice that.

Yes, in order to prevent problems with newly-introduced relkinds, I
think that the checks within functions should be careful to check only
for relkinds that they support, and not list those they do not support.
--
Michael

#16Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#11)
Re: pg_partition_tree crashes for a non-defined relation

On Sun, Dec 09, 2018 at 12:49:18PM -0500, Stephen Frost wrote:

Looks alright on a quick glance, but shouldn't you also update the
comment..?

The comment on HEAD or with the patch is that:
/* Only allow relation types that can appear in partition trees. */

This still looks adapted to me. Or would you reword it because "allow"
rather implies that an ERROR is returned? Would you prefer changing it
something like that perhaps:
"Return NULL for relation types that do not appear in partition trees."
--
Michael

#17Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#15)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 09, 2018 at 02:07:29PM -0500, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

I don't entirely buy off on the argument that it's code that's 'highly
unlikely to break once written' though- we do add new relkinds from time
to time, for example. Perhaps we could have these functions run just
once per relkind.

Well, the relevant code is likely to be "if relkind is not x, y, or z,
then PG_RETURN_NULL". If we add a new relkind and forget to consider the
function, the outcome is a NULL result that perhaps should not have been
NULL ... but a test like this won't help us notice that.

Yes, in order to prevent problems with newly-introduced relkinds, I
think that the checks within functions should be careful to check only
for relkinds that they support, and not list those they do not support.

Yes, but it's certainly possible for someone to add another relkind to
that list without looking through the rest of the function carefully
enough. Perhaps not the best example. I still don't like the
assumption that the code won't be broken once it's written correctly the
first time though, and I continue to feel that it would be good to have
regression tests which run these functions with interesting arguments.

I also agree that we don't want to make the regression tests take a lot
of additional time.

Thanks!

Stephen

#18Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#16)
Re: pg_partition_tree crashes for a non-defined relation

Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:

On Sun, Dec 09, 2018 at 12:49:18PM -0500, Stephen Frost wrote:

Looks alright on a quick glance, but shouldn't you also update the
comment..?

The comment on HEAD or with the patch is that:
/* Only allow relation types that can appear in partition trees. */

This still looks adapted to me. Or would you reword it because "allow"
rather implies that an ERROR is returned? Would you prefer changing it
something like that perhaps:
"Return NULL for relation types that do not appear in partition trees."

Yes.

Thanks!

Stephen

#19Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#18)
Re: pg_partition_tree crashes for a non-defined relation

On Mon, Dec 10, 2018 at 08:21:43AM -0500, Stephen Frost wrote:

* Michael Paquier (michael@paquier.xyz) wrote:

This still looks adapted to me. Or would you reword it because "allow"
rather implies that an ERROR is returned? Would you prefer changing it
something like that perhaps:
"Return NULL for relation types that do not appear in partition trees."

Yes.

OK. Sure, let's do as you suggest then. I'll wait a couple of days
before committing the patch so as if someone has extra comments they
have the time to post. So please feel free to comment!
--
Michael

#20Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#19)
Re: pg_partition_tree crashes for a non-defined relation

On Mon, Dec 10, 2018 at 10:52:37PM +0900, Michael Paquier wrote:

OK. Sure, let's do as you suggest then. I'll wait a couple of days
before committing the patch so as if someone has extra comments they
have the time to post. So please feel free to comment!

And done this way. Thanks for the input, Stephen and others!
--
Michael

#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#14)
#24Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#23)
#25Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#24)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#24)
#27Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#26)
#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#31)
#33Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Amit Langote (#33)
#35Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#26)
#36Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#35)