Safer and faster get_attstatsslot()

Started by Tom Lanealmost 9 years ago3 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Monday's round of security patches was a lot more exciting than I would
have liked, because code that worked fine for Peter and me failed
erratically in the buildfarm. What eventually emerged was that I'd added
some missing free_attstatsslot() calls in rangetypes_selfuncs.c, and
naively copied the first argument (atttype) from the matching
get_attstatsslot() calls. One of those atttype values was in fact
wrong for the slot in question; this had been missed for years because
get_attstatsslot() doesn't actually do anything with that argument.
I think that at one point we had, or at least in the original conception
intended to have, an Assert that the atttype matched the actual stats
array element type found in the pg_statistic row; but we had to remove
it because in some cases the type in pg_statistic is only
binary-compatible with the datatype the applied operator is expecting.

So the existing API for get_attstatsslot()/free_attstatsslot() is just
seriously bug-prone. It would be better if the caller did not have
to supply any type information; indeed really what we'd want is for
get_attstatsslot() to pass back the actual type it found in pg_statistic.

I also realized as I looked at the code that it's exceedingly inefficient
if the array element type is pass-by-reference --- then it'll incur a
separate palloc, copy, and pfree for each element. We'd be a lot better
off to copy the stats array as a whole, especially since that would come
for free in the probably-common case that the array has to be detoasted.
This code was written with very small stats targets in mind, like about
10, and it just doesn't look very good when you're imagining 1000 or more
entries in the stats array.

So attached is a proposed redesign that makes the API for
get_attstatsslot()/free_attstatsslot() simpler and hopefully more
foolproof. I've not made any particular attempt to performance-test
it, but it really ought to be a significant win for pass-by-ref element
types. It will add an array-copying step that wasn't there before when
the element type is pass-by-value and the array isn't toasted, but that
seems like an acceptable price.

BTW, this patch makes the skewColType and skewColTypmod fields of Hash
plan nodes unnecessary, as the only reason for them was to satisfy
get_attstatsslot(). I didn't remove them here but it would make sense
to do so.

Comments? Is this something that'd be OK to push now, or had I better
sit on it till v11? Being freshly burned, I kinda want to fix it now,
but I recognize that my judgment may be colored by that.

regards, tom lane

Attachments:

fix-get-free-attstatsslot-api-1.patchtext/x-diff; charset=us-ascii; name=fix-get-free-attstatsslot-api-1.patchDownload+1001-990
#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#1)
Re: Safer and faster get_attstatsslot()

On 11 May 2017 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...because code that worked fine for Peter and me failed
erratically in the buildfarm.

I think its always a little bit too exciting for me also.

I suggest we have a commit tree and a main tree, with automatic
copying from commit -> main either
1. 24 hours after commit
2. or earlier if we have a full set of green results from people
running the full suite on the commit tree

That way we don't have to polute the main tree with all this jumping around.

Many alternate ideas possible.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#2)
Re: Safer and faster get_attstatsslot()

Simon Riggs <simon@2ndquadrant.com> writes:

On 11 May 2017 at 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...because code that worked fine for Peter and me failed
erratically in the buildfarm.

I think its always a little bit too exciting for me also.

I suggest we have a commit tree and a main tree, with automatic
copying from commit -> main either
1. 24 hours after commit
2. or earlier if we have a full set of green results from people
running the full suite on the commit tree

Meh. We don't really need that in normal development, and for security
patches there's still a problem: we don't want to wait around 24 hours
after the code is public.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers