Query plans for plpgsql triggers

Started by Eric Ridgeabout 20 years ago6 messagesgeneral
Jump to latest
#1Eric Ridge
ebr@tcdi.com

I've found a few performance issues with an internal database
application and I'm sure it's related to my misunderstanding of how
and when queries are planned when used in a plpgsql function. This
is against Postgres 7.4.

For example, suppose this function is defined as a per-statement
update trigger on "some_table":

CREATE OR REPLACE FUNCTION foo() RETURNS TRIGGER LANGUAGE 'plpsql' AS
'DECLARE
my_id int8;
BEGIN
my_id := 12; -- some arbitrary value
UPDATE some_other_table SET field = 'value' WHERE id = my_id::int8;
RETURN NULL;
END;'

The above function is (obviously) a stripped down version of
something real, but it accurately represents the basics.

When is the UPDATE statement inside foo() planned? When the trigger
is first created, or when it's first used per backend, or every time
it's used per backend? It's gotta be one of the former, because it
sure ain't the latter.

I dunno what plan is being generated, but it's gotta be using a
sequential scan. "some_other_table" contains roughly 2 million rows
and the "id" column is uniquely indexed, yet the UPDATE takes 35-40
seconds when run via the trigger, but only milliseconds if the
equivalent UPDATE statement is played into psql. The database is
freshly vacuumed and analyzed. And of course an EXPLAIN via psql
shows an index scan.

If I turn on statement logging I can plainly see that the embedded
UPDATE statement is the query taking a long time. Additionally, a
gdb stacktrace of the backend shows Postgres is somewhere inside the
pl_pgsql call handlers.

Changing the UPDATE to be:
EXECUTE ''UPDATE some_other_table SET field = ''''value'''' WHERE id
= '' || my_id || ''::int8'';

seems to "solve" the bad planning problem, but this sure is ugly.

Maybe I missed it in the 7.4 docs, but I can't find any information
on query planning for plpgsql functions. Any insight into how this
works would be greatly appreciated. Also, any mention of how PG
8.1.3 differs in this regard would also be handy.

thanks!

eric

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Ridge (#1)
Re: Query plans for plpgsql triggers

"Eric B. Ridge" <ebr@tcdi.com> writes:

When is the UPDATE statement inside foo() planned? When the trigger
is first created, or when it's first used per backend, or every time
it's used per backend?

First use per backend, ignoring corner cases such as replacing the
function definition.

I dunno what plan is being generated, but it's gotta be using a
sequential scan.

The issue is probably that the planner is seeing a parameterized
query. Try this:

prepare foo(int8) as update some_other_table SET field = 'value' WHERE id = $1;
explain execute foo(42);

and see what plan you get. If the id field has sufficiently
discouraging statistics then the planner may think that a seqscan
is the safest plan. In a "normal" query where you're comparing id
to a constant, the planner can see whether the constant matches any
of the most common values for the column --- if it doesn't then an
indexscan is a good plan.

If you really want a replan every time, you can get it by using
EXECUTE.

regards, tom lane

#3Eric Ridge
ebr@tcdi.com
In reply to: Tom Lane (#2)
Re: Query plans for plpgsql triggers

On Mar 24, 2006, at 11:39 PM, Tom Lane wrote:

The issue is probably that the planner is seeing a parameterized
query. Try this:

prepare foo(int8) as update some_other_table SET field = 'value'
WHERE id = $1;
explain execute foo(42);

I should have mentioned that while the UPDATE statement in the
trigger function really is as simple as the above, "some_other_table"
is actually a view with the requisite ON UPDATE DO INSTEAD rule:

CREATE OR REPLACE RULE some_other_table_update AS ON UPDATE TO
some_other_real_table DO INSTEAD
(
UPDATE some_other_real_table_1 SET field = NEW.field WHERE id =
OLD.id::int8;
UPDATE some_other_real_table_2 SET field = NEW.field WHERE id =
OLD.id::int8;
...
UPDATE some_other_real_table_39 SET field = NEW.field WHERE id =
OLD.id::int8;
UPDATE some_other_real_table SET field = NEW.field WHERE id =
OLD.id::int8;
);

The explain for your "prepare foo(42)" suggestion shows the correct
index scans for each of the 40 actual tables being updated by the RULE.

and see what plan you get. If the id field has sufficiently
discouraging statistics then the planner may think that a seqscan
is the safest plan. In a "normal" query where you're comparing id
to a constant, the planner can see whether the constant matches any
of the most common values for the column --- if it doesn't then an
indexscan is a good plan.

the "id" column, for *each* of the tables referenced in the RULE is
defined as
id int8 NOT NULL PRIMARY KEY

No value should be any more common than the other.

Could the fact that "some_other_table" is a view influence the
planner in some way?

If you really want a replan every time, you can get it by using
EXECUTE.

Indeed. If big-ugly-updateable-views can't influence the planner,
what positive impact would changing the statistics threshold have on
a primary key column?

As an aside, has there ever been any discussion/thought into some
ability to force all plpgsql queries to by dynamically planned w/o
the need to explicitly wrap them inside EXPLAIN? Maybe something like:

CREATE OR REPLACE FUNCTION foo() LANGUAGE 'plpgsql' OPTIONS
'dynamic_plans=on' AS '....';

or maybe a plpgsql, named 'plpgsql_dont_preplan_my_queries'?

Something like the above would at least make for "prettier" function
sources.

thanks for your time.

eric

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Ridge (#3)
Re: Query plans for plpgsql triggers

"Eric B. Ridge" <ebr@tcdi.com> writes:

I should have mentioned that while the UPDATE statement in the
trigger function really is as simple as the above, "some_other_table"
is actually a view with the requisite ON UPDATE DO INSTEAD rule:

This is the sort of detail that you really should not omit.

If you were using something newer than 7.4 then I'd ask for a complete
test case so I could look into improving the behavior --- but as it is,
I'd first suggest upgrading and seeing if the problem is already fixed.

regards, tom lane

#5Eric Ridge
ebr@tcdi.com
In reply to: Tom Lane (#4)
Re: Query plans for plpgsql triggers

On Mar 25, 2006, at 12:24 AM, Tom Lane wrote:

This is the sort of detail that you really should not omit.

Yeah, it didn't even occur to me until I ran the "explain execute foo
(42)" thing you suggested. We've been using these update rules for
so long that I just think of the views as regular tables (rules are
great, btw).

If you were using something newer than 7.4 then I'd ask for a complete
test case so I could look into improving the behavior --- but as it
is,
I'd first suggest upgrading and seeing if the problem is already
fixed.

We're working towards an upgrade to 8.1.3, and a new schema. Both of
which will likely provide all sorts of new "behaviors."

I'm now curious if complex rules can influence the planner in
negative ways. I don't see how they could -- I've never seen
unexpected EXPLAIN output via psql. However, I can try to work up a
test case against 7.4.12 if you think it'll be beneficial. It'll
take a few days and if you wanted 2 million-ish sample rows, be very
large.

eric

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Ridge (#5)
Re: Query plans for plpgsql triggers

"Eric B. Ridge" <ebr@tcdi.com> writes:

On Mar 25, 2006, at 12:24 AM, Tom Lane wrote:

This is the sort of detail that you really should not omit.

I'm now curious if complex rules can influence the planner in
negative ways.

It's possible. I'm not certain that that is really what you are
seeing, but it could be.

... However, I can try to work up a
test case against 7.4.12 if you think it'll be beneficial.

I doubt we'd consider patching such a problem in 7.4. At this point
the only issues that will get patched in 7.4 are security and data-loss
risks, not performance problems.

Again: demonstrating the issue in 8.1 or CVS HEAD would be a good step
towards getting people motivated to fix it.

regards, tom lane