stored procedures vs pg_stat_statements

Started by Merlin Moncureabout 1 year ago6 messages
#1Merlin Moncure
mmoncure@gmail.com

Hello,

I've noticed that in pg_stat_statements, stored procedures are almost
always given a unique query id unless the query is textually identical.
pg_stat_statements.c makes this pretty clear as procedures are considered
utility statements and no normalization analysis is done against them.
Client side invoked statements appear to be able to be parameterized so
that they might be normalized correctly, but AFAICT there is no way to do
this from the server or any non-parameterizing client (say, dblink).

Suffice it to say, pg_stat_statements is an administrator's dream and
heavily procedure wrapped databases might struggle to generate useful
statistics leading to lack of insight.

From pg_stat_statements.c in ProcessUtility():

/*
* Force utility statements to get queryId zero. We do this even
in cases
* where the statement contains an optimizable statement for which a
* queryId could be derived (such as EXPLAIN or DECLARE CURSOR).
For such
* cases, runtime control will first go through ProcessUtility and
then
* the executor, and we don't want the executor hooks to do
anything,
* since we are already measuring the statement's costs at the
utility
* level.
*
* Note that this is only done if pg_stat_statements is enabled and
* configured to track utility statements, in the unlikely
possibility
* that user configured another extension to handle utility
statements
* only.
*/
Can this simply be disabled for stored procedures as a special case? I'm
hoping this might do something useful that is also safe. Curious if anyone
has any thoughts on this.

merlin

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: stored procedures vs pg_stat_statements

Merlin Moncure <mmoncure@gmail.com> writes:

Can this simply be disabled for stored procedures as a special case? I'm
hoping this might do something useful that is also safe. Curious if anyone
has any thoughts on this.

No, I don't think that would help. The restriction on utility
statements would cover CREATE PROCEDURE/FUNCTION, not calls
of those things which is what I suppose you care about.

Do you have pg_stat_statements.track set to "all"? That should
allow statements within stored procedures to be tracked, which
again is what I'm guessing you care about.

regards, tom lane

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#2)
Re: stored procedures vs pg_stat_statements

On Mon, Dec 23, 2024 at 3:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Can this simply be disabled for stored procedures as a special case? I'm
hoping this might do something useful that is also safe. Curious if

anyone

has any thoughts on this.

No, I don't think that would help. The restriction on utility
statements would cover CREATE PROCEDURE/FUNCTION, not calls
of those things which is what I suppose you care about.

Do you have pg_stat_statements.track set to "all"? That should
allow statements within stored procedures to be tracked, which
again is what I'm guessing you care about.

I'm aware of that and will set it -- it's the only option if I'm following
you. The way I've been doing things lately for bulk processing is a lot
of orchestrated procedures that are organized for purposes of monitoring
and easy administration, and telemetry would tend to be at that level, but
more granular tracking will get the job done and ought to be perfectly fine
as long as overhead is reasonable.

Mainly, I was curious if the behavior not to parse constants out of stored
procedure invocations was an unintentional artifact of the
utility statement approach. I guess it might be, but also that there is
nothing to solve here. Thanks for taking the time to respond.

merlin

#4Michael Paquier
michael@paquier.xyz
In reply to: Merlin Moncure (#3)
Re: stored procedures vs pg_stat_statements

On Mon, Dec 23, 2024 at 10:06:58PM -0600, Merlin Moncure wrote:

I'm aware of that and will set it -- it's the only option if I'm following
you. The way I've been doing things lately for bulk processing is a lot
of orchestrated procedures that are organized for purposes of monitoring
and easy administration, and telemetry would tend to be at that level, but
more granular tracking will get the job done and ought to be perfectly fine
as long as overhead is reasonable.

Mainly, I was curious if the behavior not to parse constants out of stored
procedure invocations was an unintentional artifact of the
utility statement approach. I guess it might be, but also that there is
nothing to solve here.

Hmm. So you mean that a combination of track_utility = off and track
= 'all' leads to the internals of CALL to not be normalized on first
call, while if we have track_utility = on and track = 'all' then the
internals of CALL are normalized. The behavior is different depending
on if the procedure is cached or not, as well, the second call
following the traces of the first call and we ignore the combinations
of the two GUCs. This kind of inconsistency is what I would call a
bug. I'm pretty sure that it is saner to say that we should apply
normalization all the time if we can, not avoid normalization in some
cases like the one you are pointing at.

Testing quickly down to 14, this is the same behavior on all the
branches up to HEAD.
--
Michael

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Michael Paquier (#4)
Re: stored procedures vs pg_stat_statements

On Mon, Dec 23, 2024 at 11:01 PM Michael Paquier <michael@paquier.xyz>
wrote:

On Mon, Dec 23, 2024 at 10:06:58PM -0600, Merlin Moncure wrote:

I'm aware of that and will set it -- it's the only option if I'm

following

you. The way I've been doing things lately for bulk processing is a lot
of orchestrated procedures that are organized for purposes of monitoring
and easy administration, and telemetry would tend to be at that level,

but

more granular tracking will get the job done and ought to be perfectly

fine

as long as overhead is reasonable.

Mainly, I was curious if the behavior not to parse constants out of

stored

procedure invocations was an unintentional artifact of the
utility statement approach. I guess it might be, but also that there is
nothing to solve here.

Hmm. So you mean that a combination of track_utility = off and track
= 'all' leads to the internals of CALL to not be normalized on first
call, while if we have track_utility = on and track = 'all' then the
internals of CALL are normalized. The behavior is different depending
on if the procedure is cached or not, as well, the second call
following the traces of the first call and we ignore the combinations
of the two GUCs. This kind of inconsistency is what I would call a
bug. I'm pretty sure that it is saner to say that we should apply
normalization all the time if we can, not avoid normalization in some
cases like the one you are pointing at.

Actually, I hadn't gotten that far yet; I was just noticing that:
CALL foo(1,2,3);
CALL foo(2,3,4);
...resolved to different queryids and if that was expected, and if not, if
some logic tune-ups in the extension improve behavior without deeper
changes.

With client side preparation, you can work around this (e.g. CALL foo($1,
$2)) but it's impossible from any non-paramaterizing client or the server
(for top level) since explicit prepared statements are not allowed for
stored procedures -- that's pretty limiting IMNSHO.

If there are issues with the non-top level approach -- that makes it
worse. For my part, I'm going to tweak the extension to see if there's any
relief there. Thanks for taking a look.

merlin

#6Michael Paquier
michael@paquier.xyz
In reply to: Merlin Moncure (#5)
Re: stored procedures vs pg_stat_statements

On Tue, Dec 24, 2024 at 08:23:57AM -0600, Merlin Moncure wrote:

Actually, I hadn't gotten that far yet; I was just noticing that:
CALL foo(1,2,3);
CALL foo(2,3,4);
...resolved to different queryids and if that was expected, and if not, if
some logic tune-ups in the extension improve behavior without deeper
changes.

Oh, yeah, definitely. This has been a severe issue for some customers
I'm dealing with that have a heavy workload based on CALL who also
wish to track utilities on top of non-utility queries. This level of
normalization is supported since 17~ and commit 11c34b342bd7, so you
would need to upgrade if you want that support natively as this
requires in-core backend adjustments:
CREATE OR REPLACE PROCEDURE sum_two(i int, j int) AS $$
DECLARE
r int;
BEGIN
SELECT (i+j)::int INTO r;
END; $$ LANGUAGE plpgsql;
CALL sum_two(1,1);
CALL sum_two(2,1000);
CALL sum_two(3,3);
SELECT calls, query FROM pg_stat_statements WHERE query ~ 'CALL';
calls | query
-------+---------------------
3 | CALL sum_two($1,$2)
(1 row)

One piece of facility that was lacking is support for query jumbling
with the parsing nodes of utilities, which was larger than this
specific commit, still it allowed the change. If you see other areas
that could make pgss more useful with more utility normalization, feel
free to point them out. The point would be then to figure out which
portions of the parsing nodes need to be ignored in the query
jumbling and the constant locations.

contrib/pg_stat_statements/sql/utility.sql has a test section
dedicated to CALL, FWIW.
--
Michael