[PATCH] Query Jumbling for CALL and SET utility statements

Started by Bertrand Drouvotover 3 years ago45 messageshackers
Jump to latest
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com

Hi hackers,

While query jumbling is provided for function calls that’s currently not
the case for procedures calls.
The reason behind this is that all utility statements are currently
discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the
lack of jumbling on procedure calls with pg_stat_statements and
pg_stat_statements.track_utility enabled (think an application with a
high rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see
https://twitter.com/jer_s/status/1560003560116342785) and Nikolay
reported that he also had to work on a similar performance issue with
SET being used.

That’s why we think it would make sense to allow jumbling for those 2
utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:

CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
 set seq_page_cost=$1              |     2 |    0
 CALL MINUS_TWO($1)                |     2 |    0
 set enable_seqscan=$1             |     2 |    0
 CALL SUM_TWO($1, $2)              |     2 |    0

Looking forward to your feedback,

Thanks,

Jeremy & Bertrand

--
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com

Attachments:

v1-0001-JumbleQuery-on-Call-and-Set.patchtext/plain; charset=UTF-8; name=v1-0001-JumbleQuery-on-Call-and-Set.patchDownload+196-5
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi

st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand <bdrouvot@amazon.com>
napsal:

Hi hackers,

While query jumbling is provided for function calls that’s currently not
the case for procedures calls.
The reason behind this is that all utility statements are currently
discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the
lack of jumbling on procedure calls with pg_stat_statements and
pg_stat_statements.track_utility enabled (think an application with a high
rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see
https://twitter.com/jer_s/status/1560003560116342785) and Nikolay
reported that he also had to work on a similar performance issue with SET
being used.

That’s why we think it would make sense to allow jumbling for those 2
utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:
CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1 | 2 | 0
CALL MINUS_TWO($1) | 2 | 0
set enable_seqscan=$1 | 2 | 0
CALL SUM_TWO($1, $2) | 2 | 0

Looking forward to your feedback,

The idea is good, but I think you should use pg_stat_functions instead.
Maybe it is supported already (I didn't test it). I am not sure so SET
statement should be traced in pg_stat_statements - it is usually pretty
fast, and without context it says nothing. It looks like just overhead.

Regards

Pavel

Show quoted text

Thanks,

Jeremy & Bertrand

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

#3Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#1)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:

While query jumbling is provided for function calls that’s currently not the
case for procedures calls.
The reason behind this is that all utility statements are currently
discarded for jumbling.
[...]
That’s why we think it would make sense to allow jumbling for those 2
utility statements: CALL and SET.

Yes, I've seen this be an issue repeatedly. Although more heavily for PREPARE
/ EXECUTE than either of the two cases you handle here. IME not tracking
PREPARE / EXECUTE can distort statistics substantially - there's appears to be
a surprising number of applications / frameworks resorting to them. Basically
requiring that track utility is turned on.

I suspect we should carve out things like CALL, PREPARE, EXECUTE from
track_utility - it's more or less an architectural accident that they're
utility statements. It's a bit less clear that SET should be dealt with that
way.

@@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node)
APP_JUMB(var->varlevelsup);
}
break;
+		case T_CallStmt:
+			{
+				CallStmt   *stmt = (CallStmt *) node;
+				FuncExpr   *expr = stmt->funcexpr;
+
+				APP_JUMB(expr->funcid);
+				JumbleExpr(jstate, (Node *) expr->args);
+			}
+			break;

Why do we need to take the arguments into account?

+		case T_VariableSetStmt:
+			{
+				VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+				APP_JUMB_STRING(stmt->name);
+				JumbleExpr(jstate, (Node *) stmt->args);
+			}
+			break;

Same?

+		case T_A_Const:
+			{
+				int			loc = ((const A_Const *) node)->location;
+
+				RecordConstLocation(jstate, loc);
+			}
+			break;

I suspect we only need this because of the jumbling of unparsed arguments I
questioned above? If we do end up needing it, shouldn't we include the type
in the jumbling?

Greetings,

Andres Freund

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#2)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

st 31. 8. 2022 v 17:50 odesílatel Pavel Stehule <pavel.stehule@gmail.com>
napsal:

Hi

st 31. 8. 2022 v 17:34 odesílatel Drouvot, Bertrand <bdrouvot@amazon.com>
napsal:

Hi hackers,

While query jumbling is provided for function calls that’s currently not
the case for procedures calls.
The reason behind this is that all utility statements are currently
discarded for jumbling.

We’ve recently seen performance impacts (LWLock contention) due to the
lack of jumbling on procedure calls with pg_stat_statements and
pg_stat_statements.track_utility enabled (think an application with a high
rate of procedure calls with unique parameters for each call).

Jeremy has had this conversation on twitter (see
https://twitter.com/jer_s/status/1560003560116342785) and Nikolay
reported that he also had to work on a similar performance issue with SET
being used.

That’s why we think it would make sense to allow jumbling for those 2
utility statements: CALL and SET.

Please find attached a patch proposal for doing so.

With the attached patch we would get things like:
CALL MINUS_TWO(3);
CALL MINUS_TWO(7);
CALL SUM_TWO(3, 8);
CALL SUM_TWO(7, 5);
set enable_seqscan=false;
set enable_seqscan=true;
set seq_page_cost=2.0;
set seq_page_cost=1.0;

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
query | calls | rows
-----------------------------------+-------+------
set seq_page_cost=$1 | 2 | 0
CALL MINUS_TWO($1) | 2 | 0
set enable_seqscan=$1 | 2 | 0
CALL SUM_TWO($1, $2) | 2 | 0

Looking forward to your feedback,

The idea is good, but I think you should use pg_stat_functions instead.
Maybe it is supported already (I didn't test it). I am not sure so SET
statement should be traced in pg_stat_statements - it is usually pretty
fast, and without context it says nothing. It looks like just overhead.

I was wrong - there is an analogy with SELECT fx, and the statistics are in
pg_stat_statements, and in pg_stat_function too.

Regards

Pavel

Show quoted text

Regards

Pavel

Thanks,

Jeremy & Bertrand

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com

#5Jeremy Schneider
schnjere@amazon.com
In reply to: Bertrand Drouvot (#1)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On 8/31/22 8:33 AM, Drouvot, Bertrand wrote:

We’ve recently seen performance impacts (LWLock contention) due to the
lack of jumbling on procedure calls with pg_stat_statements and
pg_stat_statements.track_utility enabled (think an application with a
high rate of procedure calls with unique parameters for each call).

I ran some performance tests with the patch that Bertrand wrote to get
numbers. From my perspective, this patch is scoped very minimally and is
low risk; I don’t think it should need an enormous amount of validation.
It does appear to address the issues with both SET and CALL statements
that Nikolay and I respectively encountered. Honestly, this almost seems
like it was just an minor oversight in the original patch that added
support for CALL and procedures.

I used an r5.large EC2 instance running Linux and tested Bertrand’s
patch using the PostgreSQL 14.4 code base, compiled without and with
Bertrand’s patch. The difference is a lot more extreme on big servers
with lots of cores, but the difference is obvious even on a small
instance like this one.

As a side note: while I certainly don't want to build a database
primarily based on benchmarks, it's nice when benchmarks showcase the
database's strength. Without this patch, HammerDB completely falls over
in stored procedure mode, since one of the procedure arguments is a
time-based unique value on every call. Someone else at Amazon running
HammerDB was how I originally became aware of this problem.

-Jeremy

===== Setup:
$ psql -c "create procedure test(x int) as 'begin return; end' language
plpgsql;"
CREATE PROCEDURE
$ echo -e "\set x random(1,100000000) \n call test(:x)" >test-call.pgbench
$ echo -e "\set x random(1,100000000) \n set application_name=':x'"

test-set.pgbench

===== CALL results without patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 728748
latency average = 2.051 ms
initial connection time = 91.844 ms
tps = 48755.446492 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
2.046 set application_name=':x'

pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:26:35 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 6
pg_stat_statements | 95
BgWriterMain | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(8 rows)
...

===== CALL results with patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-call.pgbench
pgbench (14.4)
transaction type: test-call.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 1098776
latency average = 1.361 ms
initial connection time = 89.002 ms
tps = 73491.904878 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
1.383 call test(:x)

pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:42:51 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 99
BgWriterHibernate | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
ClientRead | 2
LogicalLauncherMain | 1
(8 rows)
...

===== SET results without patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 728748
latency average = 2.051 ms
initial connection time = 91.844 ms
tps = 48755.446492 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
2.046 set application_name=':x'

pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:26:35 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 6
pg_stat_statements | 95
BgWriterMain | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(8 rows)
...

===== SET results with patch:

[postgres@ip-172-31-44-176 ~]$ pgbench -n -c 100 -j 100 -T15 -r -f
test-set.pgbench
pgbench (14.4)
transaction type: test-set.pgbench
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 100
duration: 15 s
number of transactions actually processed: 1178844
latency average = 1.268 ms
initial connection time = 89.159 ms
tps = 78850.178814 (without initial connection time)
statement latencies in milliseconds:
0.000 \set x random(1,100000000)
1.270 set application_name=':x'

pg-14.4 rw postgres@postgres=# select wait_event, count(*) from
pg_stat_activity group by wait_event; \watch 1
...
Tue 30 Aug 2022 08:44:30 PM UTC (every 1s)

wait_event | count
---------------------+-------
[NULL] | 101
BgWriterHibernate | 1
ArchiverMain | 1
WalWriterMain | 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
(7 rows)
...

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#6Jeremy Schneider
schnjere@amazon.com
In reply to: Andres Freund (#3)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On 8/31/22 9:08 AM, Andres Freund wrote:

I suspect we should carve out things like CALL, PREPARE, EXECUTE from
track_utility - it's more or less an architectural accident that they're
utility statements. It's a bit less clear that SET should be dealt with that
way.

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code. Nikolay's incident (on peak shopping
day for an eCommerce corp) evidently involved an application which
leveraged this, but as a result the contention on the pg_stat_statements
LWLock in exclusive mode effectively caused an outage for the retailer?
Or nearly did? My description here is based on Nikolay's public twitter
comment.

I've seen a lot of applications that make heavy use of temp tables,
where DDL would be pretty important to track as part of the regular
workload. So that probably should be added to the list alongside
prepared statements. And I'd want to spend a little more time thinking
about what other use cases might be missing. I'm hesitant about the
general idea of carving out some utility statements away from this
"track_utility" GUC.

Personally, at this point, I think pg_stat_statements is critical
infrastructure for anyone running PostgreSQL at scale. The information
it provides is indispensable. I don't think it's really defensible to
tell people that if they want to scale, then they need to fly blind on
any utility statements.

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#7Andres Freund
andres@anarazel.de
In reply to: Jeremy Schneider (#6)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 2022-08-31 11:00:05 -0700, Jeremy Schneider wrote:

On 8/31/22 9:08 AM, Andres Freund wrote:

I suspect we should carve out things like CALL, PREPARE, EXECUTE from
track_utility - it's more or less an architectural accident that they're
utility statements. It's a bit less clear that SET should be dealt with that
way.

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.

Personally, at this point, I think pg_stat_statements is critical
infrastructure for anyone running PostgreSQL at scale. The information
it provides is indispensable. I don't think it's really defensible to
tell people that if they want to scale, then they need to fly blind on
any utility statements.

I wasn't suggesting doing so at all.

Greetings,

Andres Freund

#8Jeremy Schneider
schnjere@amazon.com
In reply to: Andres Freund (#7)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On 8/31/22 12:06 PM, Andres Freund wrote:

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.

Hey Andres, sorry for misunderstanding your email!

Based on this quick test I just now ran (transcript below), I think that
PREPARE/EXECUTE is already excluded from track_utility?

I get your point about CALL, maybe it does make sense to also exclude
this. It might also be worth a small update to the doc for track_utility
about how it behaves, in this regard.

https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9

Example updated sentence:

|pg_stat_statements.track_utility| controls whether <<most>> utility

commands are tracked by the module. Utility commands are all those other
than |SELECT|, |INSERT|, |UPDATE| and |DELETE| <<, but this parameter
does not disable tracking of PREPARE, EXECUTE or CALL>>. The default
value is |on|. Only superusers can change this setting.

=====

pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=on;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)
pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
 ?column?
----------
        1
(1 row)

pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
                  substr
-------------------------------------------
 prepare test as select /* unique123 */ $1
 select pg_stat_statements_reset()
 set application_name=$1
(3 rows)

=====

pg-14.4 rw root@db1=# set pg_stat_statements.track_utility=off;
SET
pg-14.4 rw root@db1=# select pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)

pg-14.4 rw root@db1=# prepare test as select /* unique123 */ 1;
PREPARE
pg-14.4 rw root@db1=# execute test;
 ?column?
----------
        1
(1 row)

pg-14.4 rw root@db1=# set application_name='test';
SET
pg-14.4 rw root@db1=# select substr(query,1,50) from pg_stat_statements;
                  substr
-------------------------------------------
 prepare test as select /* unique123 */ $1
 select pg_stat_statements_reset()
(2 rows)

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#3)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 8/31/22 6:08 PM, Andres Freund wrote:

Hi,

On 2022-08-31 17:33:44 +0200, Drouvot, Bertrand wrote:

@@ -383,6 +384,30 @@ JumbleExpr(JumbleState *jstate, Node *node)
APP_JUMB(var->varlevelsup);
}
break;
+             case T_CallStmt:
+                     {
+                             CallStmt   *stmt = (CallStmt *) node;
+                             FuncExpr   *expr = stmt->funcexpr;
+
+                             APP_JUMB(expr->funcid);
+                             JumbleExpr(jstate, (Node *) expr->args);
+                     }
+                     break;

Why do we need to take the arguments into account?

Thanks for looking at it!

Agree that It's not needed to "solve" the Lock contention issue, but I
think it's needed for the "render".

Without it we would get, things like:

postgres=# call MY_PROC(10);
CALL
postgres=# call MY_PROC(100000000);
CALL
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
 select pg_stat_statements_reset() |     1 |    1
 call MY_PROC(10)                  |     2 |    0
(2 rows)

instead of

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
 select pg_stat_statements_reset() |     1 |    1
 call MY_PROC($1)                  |     2 |    0
(2 rows)

+             case T_VariableSetStmt:
+                     {
+                             VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                             APP_JUMB_STRING(stmt->name);
+                             JumbleExpr(jstate, (Node *) stmt->args);
+                     }
+                     break;

Same?

yeah, same reason. Without it we would get things like:

postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
 select pg_stat_statements_reset() |     1 |    1
 set enable_seqscan=false          |     2 |    0
(2 rows)

instead of

postgres=# SELECT query, calls, rows FROM pg_stat_statements;
               query               | calls | rows
-----------------------------------+-------+------
 set enable_seqscan=$1             |     2 |    0
 select pg_stat_statements_reset() |     1 |    1
(2 rows)

+             case T_A_Const:
+                     {
+                             int                     loc = ((const A_Const *) node)->location;
+
+                             RecordConstLocation(jstate, loc);
+                     }
+                     break;

I suspect we only need this because of the jumbling of unparsed arguments I
questioned above?

Right but only for the T_VariableSetStmt case.

If we do end up needing it, shouldn't we include the type
in the jumbling?

I don't think so as this is only for the T_VariableSetStmt case.

And looking closer I don't see such as thing as "consttype" (that we can
find in the Const struct) in the A_Const struct.

Regards,

--
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com

#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Jeremy Schneider (#8)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 8/31/22 10:05 PM, Jeremy Schneider wrote:

On 8/31/22 12:06 PM, Andres Freund wrote:

Regarding SET, the compelling use case was around "application_name"
whose purpose is to provide a label in pg_stat_activity and on log
lines, which can be used to improve observability and connect queries to
their source in application code.

I wasn't saying that SET shouldn't be jumbled, just that it seems more
reasonable to track it only when track_utility is enabled, rather than doing
so even when that's disabled. Which I do think makes sense for executing a
prepared statement and calling a procedure, since they're really only utility
statements by accident.

I get your point about CALL, maybe it does make sense to also exclude
this.

That's a good point and i think we should track CALL whatever the value
of pgss_track_utility is.

I think so because we are tracking function calls in all the cases
(because "linked" to select aka not a utility) and i don't see any
reasons why not to do the same for procedure calls.

Please find attached v2 as an attempt to do so.

With v2 we get things like:

postgres=# set pg_stat_statements.track_utility=on;
SET
postgres=# call MY_PROC(20);
CALL
postgres=# call MY_PROC(10);
CALL
postgres=# set enable_seqscan=false;
SET
postgres=# set enable_seqscan=true;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
       queryid       |                  query                  | calls
---------------------+-----------------------------------------+-------
 4670878543381973400 | set pg_stat_statements.track_utility=$1 |     1
 -640317129591544054 | set enable_seqscan=$1 |     2
  492647827690744963 | select pg_stat_statements_reset() |     1
 6541399678435597534 | call MY_PROC($1) |     2

and

postgres=# set pg_stat_statements.track_utility=off;
SET
postgres=# call MY_PROC(10);
CALL
postgres=# call MY_PROC(20);
CALL
postgres=# set enable_seqscan=true;
SET
postgres=# set enable_seqscan=false;
SET
postgres=# select queryid,query,calls from pg_stat_statements;
       queryid       |                  query                  | calls
---------------------+-----------------------------------------+-------
 4670878543381973400 | set pg_stat_statements.track_utility=$1 |     1
  492647827690744963 | select pg_stat_statements_reset() |     1
 6541399678435597534 | call MY_PROC($1) |     2
(3 rows)

It might also be worth a small update to the doc for track_utility
about how it behaves, in this regard.

https://www.postgresql.org/docs/14/pgstatstatements.html#id-1.11.7.39.9

Example updated sentence:

|pg_stat_statements.track_utility| controls whether <<most>> utility

commands are tracked by the module. Utility commands are all those
other than |SELECT|, |INSERT|, |UPDATE| and |DELETE| <<, but this
parameter does not disable tracking of PREPARE, EXECUTE or CALL>>. The
default value is |on|. Only superusers can change this setting.

Agree, wording added to v2.

Regards,

--
Bertrand Drouvot
Amazon Web Services:https://aws.amazon.com

Attachments:

v2-0001-JumbleQuery-on-Call-and-Set.patchtext/plain; charset=UTF-8; name=v2-0001-JumbleQuery-on-Call-and-Set.patchDownload+275-9
#11Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#10)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Please find attached v2 as an attempt to do so.

+1 to the idea.
I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.

Thanks

--
Sami Imseih
Amazon Web Services (AWS)

#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Sami Imseih (#11)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 9/1/22 5:13 PM, Imseih (AWS), Sami wrote:

Please find attached v2 as an attempt to do so.

+1 to the idea.

Thanks for looking at it!

I think it will be better to evaluate jstate instead of

JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements

In the future without having to teach pg_stat_statements about them.

If a jstate is set for the utility statements, pgss will do the right
thing.

Fair point, thanks!

v3 including this change is attached.

Thanks,

--

Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

Attachments:

v3-0001-JumbleQuery-on-Call-and-Set.patchtext/plain; charset=UTF-8; name=v3-0001-JumbleQuery-on-Call-and-Set.patchDownload+275-9
#13Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#12)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

This is ready for committer but I suggest the following for the
doc changes:

1.
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) are
combined into a single pg_stat_statements entry whenever they have
identical query structures according to an internal hash calculation.
Typically, two queries will be considered the same for this purpose
if they are semantically equivalent except for the values of literal
constants appearing in the query. Utility commands (that is, all other commands)
are compared strictly on the basis of their textual query strings, however.
-- to --
Plannable queries (that is, SELECT, INSERT, UPDATE, and DELETE) as
well as CALL and SET commands are combined into a single
pg_stat_statements entry whenever they have identical query
structures according to an internal hash calculation.
Typically, two queries will be considered the same for this purpose
if they are semantically equivalent except for the values of literal
constants appearing in the command. All other commands are compared
strictly on the basis of their textual query strings, however.

2.

pg_stat_statements.track_utility controls whether utility
commands are tracked by the module. Utility commands
are all those other than SELECT, INSERT, UPDATE and DELETE.
The default value is on. Only superusers can change this setting.
-- to --
pg_stat_statements.track_utility controls whether utility commands
are tracked by the module. Tracked utility commands are all those
other than SELECT, INSERT, UPDATE, DELETE, CALL and SET.
The default value is on. Only superusers can change this setting.

--
Thanks,
Sami Imseih
Amazon Web Services (AWS)

From: "Drouvot, Bertrand" <bdrouvot@amazon.com>
Date: Friday, September 2, 2022 at 4:06 AM
To: "Imseih (AWS), Sami" <simseih@amazon.com>, "Schneider (AWS), Jeremy" <schnjere@amazon.com>, Andres Freund <andres@anarazel.de>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Pavel Stehule <pavel.stehule@gmail.com>, Nikolay Samokhvalov <samokhvalov@gmail.com>
Subject: Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,
On 9/1/22 5:13 PM, Imseih (AWS), Sami wrote:

Please find attached v2 as an attempt to do so.

+1 to the idea.

Thanks for looking at it!
I think it will be better to evaluate jstate instead of
JUMBLE_UTILITY, such as:

if (query->utilityStmt && !jstate)

instead of

if (query->utilityStmt && !JUMBLE_UTILITY(query->utilityStmt))

This will allow for support of potentially other utility statements
In the future without having to teach pg_stat_statements about them.
If a jstate is set for the utility statements, pgss will do the right thing.

Fair point, thanks!

v3 including this change is attached.

Thanks,
--

Bertrand Drouvot

PostgreSQL Contributors Team

RDS Open Source Databases

Amazon Web Services: https://aws.amazon.com

#14Jeremy Schneider
schnjere@amazon.com
In reply to: Bertrand Drouvot (#12)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On 9/2/22 2:06 AM, Drouvot, Bertrand wrote:

v3 including this change is attached.

FYI: "reset all" core dumps with v3

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

[postgres@ip-172-31-44-176 data]$ gdb /usr/local/pgsql-14.4/bin/postgres
core.27217
...
Core was generated by `postgres: postgres postgres [local]
RESET                                '.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
...
(gdb) bt
#0  0x00007f7776ae4821 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1  0x00000000008e061c in JumbleExpr (jstate=0x1cf7f80, node=<optimized
out>) at queryjumble.c:400
#2  0x00000000008dfdd8 in JumbleQueryInternal (jstate=0x1cf7f80,
query=0x1cf7e70) at queryjumble.c:247
#3  0x00000000008e0b4b in JumbleQuery (query=query@entry=0x1cf7e70,
querytext=querytext@entry=0x1cf72f8 "reset all;") at queryjumble.c:127
#4  0x000000000056ba4b in parse_analyze (parseTree=0x1cf7ce0,
sourceText=0x1cf72f8 "reset all;", paramTypes=0x0, numParams=<optimized
out>, queryEnv=0x0) at analyze.c:130
#5  0x000000000079df63 in pg_analyze_and_rewrite
(parsetree=parsetree@entry=0x1cf7ce0,
query_string=query_string@entry=0x1cf72f8 "reset all;",
    paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0,
queryEnv=queryEnv@entry=0x0) at postgres.c:657
#6  0x000000000079e472 in exec_simple_query (query_string=0x1cf72f8
"reset all;") at postgres.c:1130
#7  0x000000000079f9d3 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffd0c341f80, dbname=0x1d44948 "postgres",
username=<optimized out>) at postgres.c:4496
#8  0x000000000048c9f3 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4530
#9  BackendStartup (port=0x1d3bdd0) at postmaster.c:4252
#10 ServerLoop () at postmaster.c:1745
#11 0x0000000000721332 in PostmasterMain (argc=argc@entry=5,
argv=argv@entry=0x1cf1e10) at postmaster.c:1417
#12 0x000000000048da6e in main (argc=5, argv=0x1cf1e10) at main.c:209

--
Jeremy Schneider
Database Engineer
Amazon Web Services

#15Michael Paquier
michael@paquier.xyz
In reply to: Jeremy Schneider (#14)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD? That would be the target branch for a new
feature.
--
Michael

#16Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#15)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 9/8/22 7:23 AM, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

Thanks Jeremy for reporting the issue!

What happens on HEAD? That would be the target branch for a new
feature.

Just tested and i can see the same issue on HEAD.

Issue is on stmt->name being NULL here:

Breakpoint 2, JumbleExpr (jstate=0x55d60e769e30, node=0x55d60e769b60) at
queryjumble.c:364
364             if (node == NULL)
(gdb) n
368             check_stack_depth();
(gdb)
374             APP_JUMB(node->type);
(gdb)
376             switch (nodeTag(node))
(gdb)
398                                     VariableSetStmt *stmt =
(VariableSetStmt *) node;
(gdb) n
400 APP_JUMB_STRING(stmt->name);
(gdb) p stmt->name
$1 = 0x0

I'll have a closer look.

Regards,

--

Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

#17Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#15)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD? That would be the target branch for a new
feature.

It would be the same AFAICS. From v3:

+		case T_VariableSetStmt:
+			{
+				VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+				APP_JUMB_STRING(stmt->name);
+				JumbleExpr(jstate, (Node *) stmt->args);
+			}

For a RESET ALL command stmt->name is NULL.

#18Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Julien Rouhaud (#17)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 9/8/22 8:50 AM, Julien Rouhaud wrote:

Thanks for looking at it!

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD? That would be the target branch for a new
feature.

It would be the same AFAICS. From v3:

+               case T_VariableSetStmt:
+                       {
+                               VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                               APP_JUMB_STRING(stmt->name);
+                               JumbleExpr(jstate, (Node *) stmt->args);
+                       }

For a RESET ALL command stmt->name is NULL.

Right, please find attached v4 addressing the issue and also Sami's
comments [1]/messages/by-id/82A35172-BEB3-4DFA-B11C-AE5E50A0F932@amazon.com.

[1]: /messages/by-id/82A35172-BEB3-4DFA-B11C-AE5E50A0F932@amazon.com
/messages/by-id/82A35172-BEB3-4DFA-B11C-AE5E50A0F932@amazon.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

Attachments:

v4-0001-JumbleQuery-on-Call-and-Set.patchtext/plain; charset=UTF-8; name=v4-0001-JumbleQuery-on-Call-and-Set.patchDownload+286-15
#19Julien Rouhaud
rjuju123@gmail.com
In reply to: Bertrand Drouvot (#18)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote:

Hi,

On 9/8/22 8:50 AM, Julien Rouhaud wrote:

Thanks for looking at it!

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD? That would be the target branch for a new
feature.

It would be the same AFAICS. From v3:

+               case T_VariableSetStmt:
+                       {
+                               VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                               APP_JUMB_STRING(stmt->name);
+                               JumbleExpr(jstate, (Node *) stmt->args);
+                       }

For a RESET ALL command stmt->name is NULL.

Right, please find attached v4 addressing the issue and also Sami's comments
[1].

(Sorry I've not been following this thread until now)

IME if your application relies on 2PC it's very likely that you will hit the
exact same problems described in your original email. What do you think about
normalizing those too while working on the subject?

#20Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Julien Rouhaud (#19)
Re: [PATCH] Query Jumbling for CALL and SET utility statements

Hi,

On 9/8/22 1:29 PM, Julien Rouhaud wrote:

Hi,

On Thu, Sep 08, 2022 at 11:06:51AM +0200, Drouvot, Bertrand wrote:

Hi,

On 9/8/22 8:50 AM, Julien Rouhaud wrote:

Thanks for looking at it!

On Thu, Sep 08, 2022 at 02:23:19PM +0900, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 06:19:42PM -0700, Jeremy Schneider wrote:

I didn't fully debug yet, but here's the backtrace on my 14.4 build with
the patch

What happens on HEAD? That would be the target branch for a new
feature.

It would be the same AFAICS. From v3:

+               case T_VariableSetStmt:
+                       {
+                               VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+                               APP_JUMB_STRING(stmt->name);
+                               JumbleExpr(jstate, (Node *) stmt->args);
+                       }

For a RESET ALL command stmt->name is NULL.

Right, please find attached v4 addressing the issue and also Sami's comments
[1].

(Sorry I've not been following this thread until now)

IME if your application relies on 2PC it's very likely that you will hit the
exact same problems described in your original email.

Agree

What do you think about
normalizing those too while working on the subject?

That sounds reasonable, I'll have a look at those too while at it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

#21Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#20)
#22bt22kawamotok
bt22kawamotok@oss.nttdata.com
In reply to: Bertrand Drouvot (#21)
#23Julien Rouhaud
rjuju123@gmail.com
In reply to: bt22kawamotok (#22)
#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Julien Rouhaud (#23)
#25Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#21)
#26Fujii Masao
masao.fujii@gmail.com
In reply to: Bertrand Drouvot (#21)
#27Sami Imseih
samimseih@gmail.com
In reply to: Fujii Masao (#26)
#28Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Fujii Masao (#26)
#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#28)
#30Fujii Masao
masao.fujii@gmail.com
In reply to: Bertrand Drouvot (#29)
#31Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Fujii Masao (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#29)
#33Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#31)
#34Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#32)
#35Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#36)
#38Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#36)
#39Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#37)
#40Julien Rouhaud
rjuju123@gmail.com
In reply to: Bertrand Drouvot (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#40)
#42Sami Imseih
samimseih@gmail.com
In reply to: Bertrand Drouvot (#39)
#43Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#42)
#44Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#43)
#45Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#41)