constraint exclusion analysis caching

Started by Andrew Dunstan over 17 years ago, 13 messages
#1 Andrew Dunstan
andrew@dunslane.net

Yesterday a client and I were sad to discover that the overhead of
constraint exclusion is apparently O(n) in the number of partitions, and
that where we had ~180 partitions each with a simple constraint (check
(field = nnn)) the overhead appeared to amount to about 0.25s on some
quite performant hardware, which is way too high for our application.
Actual execution of the query in question was taking one tenth of that
time.

For now we're going to work around this by directing the queries
directly to the child tables, although this does involve fairly large
application changes.

However, I wondered if we couldn't mitigate this by caching the results
of constraint exclusion analysis for a particular table + condition. I
have no idea how hard this would be, but in principle it seems silly to
keep paying the same penalty over and over again.
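
A minimal sketch of the caching idea, not PostgreSQL code: it assumes each child table has a single CHECK (field = nnn) constraint, models the per-query analysis as a linear pass over all children, and memoises the outcome per (table, condition value). The names PARTITIONS, surviving_partitions, cached_surviving_partitions and invalidate_exclusion_cache are hypothetical.

```python
from functools import lru_cache

# Hypothetical model: child table name -> the value its CHECK (field = nnn) allows.
PARTITIONS = {f"child_{n}": n for n in range(180)}


def surviving_partitions(value):
    """Uncached analysis: test the condition against every child's constraint (O(n))."""
    return tuple(name for name, allowed in PARTITIONS.items() if allowed == value)


@lru_cache(maxsize=1024)
def cached_surviving_partitions(table, value):
    """Memoise the analysis result per (table, condition value)."""
    return surviving_partitions(value)


def invalidate_exclusion_cache():
    """Must be called whenever a constraint or the set of children changes."""
    cached_surviving_partitions.cache_clear()
```

The second and later calls with the same (table, value) pair skip the linear pass. The hard part in a real system would be the invalidation: any change to a constraint or to the set of children has to flush the cached results.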

Thoughts?

cheers

andrew

#2 Csaba Nagy
nagy@ecircle-ag.com
In reply to: Andrew Dunstan (#1)
Re: constraint exclusion analysis caching

On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote:

However, I wondered if we couldn't mitigate this by caching the results
of constraint exclusion analysis for a particular table + condition. I
have no idea how hard this would be, but in principle it seems silly to
keep paying the same penalty over and over again.

This would be a perfect candidate for the plan-branch based on actual
parameters capability, in association with globally cached plans
mentioned here:

http://archives.postgresql.org/pgsql-hackers/2008-04/msg00920.php

Cheers,
Csaba.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: constraint exclusion analysis caching

Andrew Dunstan <andrew@dunslane.net> writes:

Yesterday a client and I were sad to discover that the overhead of
constraint exclusion is apparently O(n) in the number of partitions, and
that where we had ~180 partitions each with a simple constraint (check
(field = nnn)) the overhead appeared to amount to about 0.25s on some
quite performant hardware, which is way too high for our application.

I would think that any sort of formal partitioning feature would fix the
problem, because the planner would understand directly about
partitioning instead of having to prove the correctness of not scanning
each one of the other 179 partitions. The existing feature is cool in
the sense of obtaining useful behavior from generalized spare parts,
but it was never designed or expected to give great planning speed
with large numbers of partitions. TFM points out that constraint
exclusion cannot scale beyond perhaps a hundred partitions ...
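
The contrast drawn here can be illustrated with a small sketch. It is only a model under stated assumptions: 180 children, each with a CHECK (field = nnn) constraint, and a hypothetical declared mapping from key value to child table standing in for "formal partitioning". None of the names below are PostgreSQL APIs.

```python
# Hypothetical model: child table name -> the value its CHECK (field = nnn) allows.
CHECK_VALUES = {f"child_{n}": n for n in range(180)}


def route_by_proof(value):
    """Constraint-exclusion style: examine every child, keep those not refuted.

    Returns (surviving tables, number of constraints examined).
    """
    checks = 0
    survivors = []
    for table, allowed in CHECK_VALUES.items():
        checks += 1
        if allowed == value:
            survivors.append(table)
    return survivors, checks


# Declared partitioning metadata (assumed): key value -> child table.
PARTITION_FOR_VALUE = {allowed: table for table, allowed in CHECK_VALUES.items()}


def route_by_declaration(value):
    """Formal-partitioning style: a single lookup in the declared mapping."""
    table = PARTITION_FOR_VALUE.get(value)
    return ([table] if table is not None else []), 1
```

Both functions return the same child for a given value. The first examines all 180 constraints to get there, the second performs one lookup.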

regards, tom lane

#4 Gregory Stark
stark@enterprisedb.com
In reply to: Andrew Dunstan (#1)
Re: constraint exclusion analysis caching

"Andrew Dunstan" <andrew@dunslane.net> writes:

Actual execution of the query in question was taking one tenth of that
time.
...
but in principle it seems silly to keep paying the same penalty over and
over again.

I would think constraint_exclusion only really makes sense if you're spending
a lot more time executing than planning queries. Either that means you're
preparing queries once and then executing them many many times or you're
planning much slower queries where planning time is insignificant compared to
the time to execute them.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

#5 Simon Riggs
simon@2ndquadrant.com
In reply to: Andrew Dunstan (#1)
Re: constraint exclusion analysis caching

On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote:

Yesterday a client and I were sad to discover that the overhead of
constraint exclusion is apparently O(n) in the number of partitions, and
that where we had ~180 partitions each with a simple constraint (check
(field = nnn)) the overhead appeared to amount to about 0.25s on some
quite performant hardware, which is way too high for our application.
Actual execution of the query in question was taking one tenth of that
time.

For now we're going to work around this by directing the queries
directly to the child tables, although this does involve fairly large
application changes.

However, I wondered if we couldn't mitigate this by caching the results
of constraint exclusion analysis for a particular table + condition. I
have no idea how hard this would be, but in principle it seems silly to
keep paying the same penalty over and over again.

I think the only way forward is to put an index across the constraints,
to allow the exclusion time to be O(logN).

Currently the constraints are all independent of each other and can even
overlap. So we would need a way of

* confirming that the partitions are non-overlapping
* defining some structure to them, to allow them to be organised in a
sequence that allows either a bsearch or an index to exist

The latter requires some kind of top-down definition, which hopefully is
on the way from Gavin.

This can then allow exclusion to take place dynamically within the
executor, to allow a form of nested join.
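
The two requirements above can be sketched as follows. This is an illustration only, assuming range partitions described as (lower inclusive, upper exclusive, table name) tuples. build_index and find_partition are hypothetical names, and a plain sorted list with a binary search stands in for "an index across the constraints".

```python
from bisect import bisect_right


def build_index(ranges):
    """Sort ranges by lower bound and confirm that no two of them overlap.

    ranges: iterable of (lower_inclusive, upper_exclusive, table_name).
    """
    ordered = sorted(ranges)
    for (lo1, hi1, name1), (lo2, hi2, name2) in zip(ordered, ordered[1:]):
        if hi1 > lo2:
            raise ValueError(f"partitions {name1} and {name2} overlap")
    lowers = [lo for lo, _, _ in ordered]
    return lowers, ordered


def find_partition(index, value):
    """Binary search (O(log N)) for the one partition that can hold value."""
    lowers, ordered = index
    pos = bisect_right(lowers, value) - 1
    if pos < 0:
        return None  # value is below the lowest range
    lo, hi, name = ordered[pos]
    return name if lo <= value < hi else None  # None if value falls in a gap
```

Because the lookup needs only the sorted bounds and a value, it could in principle run at execution time as well as at plan time, which is what makes the dynamic exclusion described above conceivable.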

My other requirements are noted here...
http://wiki.postgresql.org/wiki/Image:Partitioning_Requirements.pdf

I'm not working on this at all at the moment.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#6 Stephen Frost
sfrost@snowman.net
In reply to: Gregory Stark (#4)
Re: constraint exclusion analysis caching

* Gregory Stark (stark@enterprisedb.com) wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Actual execution of the query in question was taking one tenth of that
time.
...
but in principle it seems silly to keep paying the same penalty over and
over again.

I would think constraint_exclusion only really makes sense if you're spending
a lot more time executing than planning queries. Either that means you're
preparing queries once and then executing them many many times or you're
planning much slower queries where planning time is insignificant compared to
the time to execute them.

Would it be possible to change the application to use prepared queries?
Seems like that'd make more sense than changing it to use the child
tables directly.. Just my 2c.

Thanks,

Stephen

#7 Andrew Dunstan
andrew@dunslane.net
In reply to: Stephen Frost (#6)
Re: constraint exclusion analysis caching

Stephen Frost wrote:

* Gregory Stark (stark@enterprisedb.com) wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Actual execution of the query in question was taking one tenth of that
time.
...
but in principle it seems silly to keep paying the same penalty over and
over again.

I would think constraint_exclusion only really makes sense if you're spending
a lot more time executing than planning queries. Either that means you're
preparing queries once and then executing them many many times or you're
planning much slower queries where planning time is insignificant compared to
the time to execute them.

Would it be possible to change the application to use prepared queries?
Seems like that'd make more sense than changing it to use the child
tables directly.. Just my 2c.

This is actually a technique already used elsewhere in the app, so it
will fit quite well. Thanks for the suggestion, though.

(BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's
more) ?)

cheers

andrew

#8 Stephen Frost
sfrost@snowman.net
In reply to: Andrew Dunstan (#7)
Re: constraint exclusion analysis caching

* Andrew Dunstan (andrew@dunslane.net) wrote:

Seems like that'd make more sense than changing it to use the child
tables directly.. Just my 2c.

This is actually a technique already used elsewhere in the app, so it
will fit quite well. Thanks for the suggestion, though.

Sure.

(BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's
more) ?)

I'm amazed at the number of people who ask me this.. Guess it's just
different for different communities. Basically, I like to keep my mail
in the different folders it belongs in, so I'd rather get responses to
my emails through the list than directly to me. Additionally, I don't
really need to get two copies of every email sent to me on a mailing
list.

It's actually really frowned upon in the Debian community to not respect
MFT and it's common to have it set to just the mailing list.

More information about it: http://cr.yp.to/proto/replyto.html

Enjoy,

Stephen

#9 Gregory Stark
stark@enterprisedb.com
In reply to: Stephen Frost (#8)
Re: constraint exclusion analysis caching

"Stephen Frost" <sfrost@snowman.net> writes:

I'd rather get responses to my emails through the list than directly to me.
Additionally, I don't really need to get two copies of every email sent to
me on a mailing list.

Then doesn't setting it to:
Andrew Dunstan <andrew@dunslane.net>,PostgreSQL-development <pgsql-hackers@postgresql.org>

do precisely the opposite of what you would want?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

#10 Andrew Dunstan
andrew@dunslane.net
In reply to: Stephen Frost (#8)
Re: constraint exclusion analysis caching

Stephen Frost wrote:

(BTW, why does your MUA set Mail-Followup-To: (and do it badly, what's
more) ?)

I'm amazed at the number of people who ask me this.. Guess it's just
different for different communities. Basically, I like to keep my mail
in the different folders it belongs in, so I'd rather get responses to
my emails through the list than directly to me. Additionally, I don't
really need to get two copies of every email sent to me on a mailing
list.

I am amazed that you don't see that what your MUA is doing is both
wrong and an inconvenience to other people.

For example, because it put *my* address in the list for your message
above, it caused my MUA quite correctly to add a To: line to myself,
which I certainly didn't want to do.

And it's completely unnecessary. For example, I have set my majordomo
preferences for the postgresql.org lists not to send me copies of emails
where I am also in the To: or Cc: lines. After doing that I get no
duplicates.

And I don't cause anyone else to have to edit the addresses when they
reply to my mail.

If you want to ensure that you reply to a list, use an MUA that has a
reply-to-list command - I see you use mutt, which has such a command IIRC.

cheers

andrew

#11 Stephen Frost
sfrost@snowman.net
In reply to: Andrew Dunstan (#10)
Re: constraint exclusion analysis caching

Andrew,

* Andrew Dunstan (andrew@dunslane.net) wrote:

For example, because it put *my* address in the list for your message
above, it caused my MUA quite correctly to add a To: line to myself,
which I certainly didn't want to do.

Honestly, I suspect thunderbird just doesn't know your addresses if
it's adding your address back in. Adding your address isn't for you-
it's for other people. The, completely reasonable, assumption is that
if your address was included in a To or Cc that you're not on the list
and stripping that out would mean you'd be left out.

And it's completely unnecessary. For example, I have set my majordomo
preferences for the postgresql.org lists not to send me copies of emails
where I am also in the To: or Cc: lines. After doing that I get no
duplicates.

This doesn't help at all, actually. As I pointed out previously, I
*want* the mail through the list, what I *don't* want is people sending
list mail directly to me.

And I don't cause anyone else to have to edit the addresses when they
reply to my mail.

Are you sure thunderbird recognizes the email address you use for
posting as a local identity/account? Mutt has a specific 'alternates'
configuration to let it know what addresses are local.

If you want to ensure that you reply to a list, use an MUA that has a
reply-to-list command - I see you use mutt, which has such a command
IIRC.

Indeed, and it's exactly what I use when replying to list mail. The
issue isn't making sure that *I* reply to a list, it's asking other
people to reply through the list rather than to me.

Thanks,

Stephen

#12 Andrew Dunstan
andrew@dunslane.net
In reply to: Stephen Frost (#11)
Re: constraint exclusion analysis caching

Stephen Frost wrote:

Andrew,

* Andrew Dunstan (andrew@dunslane.net) wrote:

For example, because it put *my* address in the list for your message
above, it caused my MUA quite correctly to add a To: line to myself,
which I certainly didn't want to do.

Honestly, I suspect thunderbird just doesn't know your addresses if
it's adding your address back in. Adding your address isn't for you-
it's for other people. The, completely reasonable, assumption is that
if your address was included in a To or Cc that you're not on the list
and stripping that out would mean you'd be left out.

And it's completely unnecessary. For example, I have set my majordomo
preferences for the postgresql.org lists not to send me copies of emails
where I am also in the To: or Cc: lines. After doing that I get no
duplicates.

This doesn't help at all, actually. As I pointed out previously, I
*want* the mail through the list, what I *don't* want is people sending
list mail directly to me.

And I don't cause anyone else to have to edit the addresses when they
reply to my mail.

Are you sure thunderbird recognizes the email address you use for
posting as a local identity/account? Mutt has a specific 'alternates'
configuration to let it know what addresses are local.

If you want to ensure that you reply to a list, use an MUA that has a
reply-to-list command - I see you use mutt, which has such a command
IIRC.

Indeed, and it's exactly what I use when replying to list mail. The
issue isn't making sure that *I* reply to a list, it's asking other
people to reply through the list rather than to me.

a. I don't use Thunderbird.
b. Of course the MUA knows what my address is.
c. Yours are pretty much the *only* settings of all the users of this
list that cause me issues. Judging by your own words I am not alone in
being thus inconvenienced (otherwise, why would "an amazing number" of
people ask you about it?). If you don't care about that then there's
nothing much I can do. Alvaro used to have a similar setup. When I
complained he very kindly fixed it.
d. Your "completely reasonable" assumption above is, of course, bogus.
Most people when replying to a list reply to all addresses. Assuming that
the non-list addresses are for people not on the list is nonsense.

cheers

andrew

#13 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Stephen Frost (#11)
Re: constraint exclusion analysis caching

Stephen Frost wrote:

And it's completely unnecessary. For example, I have set my majordomo
preferences for the postgresql.org lists not to send me copies of emails
where I am also in the To: or Cc: lines. After doing that I get no
duplicates.

This doesn't help at all, actually. As I pointed out previously, I
*want* the mail through the list, what I *don't* want is people sending
list mail directly to me.

Wouldn't it make sense, then, to filter any email which is Cc'ed to a
list, into that list's folder? Add to that a bit of duplicate removal
(say, procmail's, or whatever you use) and you're set.
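
A rough sketch of that filtering, using Python's standard email module rather than procmail. The assumptions are: messages arrive as raw RFC 2822 strings, anything with the list address in To: or Cc: belongs in the list folder, and duplicates are recognised by Message-ID. file_messages and the folder names are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

# List address as it appears earlier in this thread.
LIST_ADDRESS = "pgsql-hackers@postgresql.org"


def file_messages(raw_messages, list_address=LIST_ADDRESS):
    """Sort messages into 'list' or 'inbox', dropping repeated Message-IDs."""
    seen_ids = set()
    folders = {"list": [], "inbox": []}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue  # second copy of a mail already filed
            seen_ids.add(msg_id)
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        to_list = any(addr.lower() == list_address for _, addr in recipients)
        folders["list" if to_list else "inbox"].append(msg)
    return folders
```

With this in place a direct copy and the list copy of the same mail end up as a single message in the list folder, whichever of the two arrives first.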

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.