Add parallel columns for pg_stat_statements
Hello,
This patch was a bit discussed on [1], and with more details on [2]. It's
based on another patch sent in 2022 (see [3]). It introduces seven new
columns in pg_stat_statements:
* parallelized_queries_planned, number of times the query has been planned
to be parallelized,
* parallelized_queries_launched, number of times the query has been
executed with parallelization,
* parallelized_workers_planned, number of parallel workers planned for
this query,
* parallelized_workers_launched, number of parallel workers executed for
this query,
* parallelized_nodes, number of parallelized nodes,
* parallelized_nodes_all_workers, number of parallelized nodes which had
all requested workers,
* parallelized_nodes_no_worker, number of parallelized nodes which had no
requested workers.
As Benoit said yesterday, the intent is to help administrators evaluate the
usage of parallel workers in their databases and help them configure
parallelism.
A test script (test2.sql) is attached. You can execute it with "psql -Xef
test2.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).
Here is its result, a bit commented:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
-------------------------------
2024-08-29 18:00:35.314557+02
(1 row)
DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
(0 rows)
SELECT * FROM t1 LIMIT 1;
id
----
1
(1 row)
SELECT pg_sleep(1);
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]------------------+--------------------------
query | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned | 0
parallelized_queries_launched | 0
parallelized_workers_planned | 0
parallelized_workers_launched | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker | 0
==> no parallelization
SELECT count(*) FROM t1;
count
----------
10000000
(1 row)
SELECT pg_sleep(1);
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]------------------+--------------------------
query | SELECT count(*) FROM t1
parallelized_queries_planned | 1
parallelized_queries_launched | 1
parallelized_workers_planned | 2
parallelized_workers_launched | 2
parallelized_nodes | 1
parallelized_nodes_all_workers | 1
parallelized_nodes_no_worker | 0
-[ RECORD 2 ]------------------+--------------------------
query | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned | 0
parallelized_queries_launched | 0
parallelized_workers_planned | 0
parallelized_workers_launched | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker | 0
==> one parallelized query
==> I have the default configuration, so 2 for
max_parallel_workers_per_gather
==> hence two workers, with one node with all workers
SET max_parallel_workers_per_gather TO 5;
SET
SELECT count(*) FROM t1;
count
----------
10000000
(1 row)
SELECT pg_sleep(1);
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]------------------+--------------------------
query | SELECT count(*) FROM t1
parallelized_queries_planned | 2
parallelized_queries_launched | 2
parallelized_workers_planned | 6
parallelized_workers_launched | 6
parallelized_nodes | 2
parallelized_nodes_all_workers | 2
parallelized_nodes_no_worker | 0
-[ RECORD 2 ]------------------+--------------------------
query | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned | 0
parallelized_queries_launched | 0
parallelized_workers_planned | 0
parallelized_workers_launched | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker | 0
==> another parallelized query
==> with 5 as max_parallel_workers_per_gather, but only 4 workers to use
==> hence four workers, with one node with all workers
The biggest issue with this patch is that it's unable to track workers for
maintenance queries (CREATE INDEX for btree indexes, and VACUUM).
Documentation is done, tests are missing. Once there's an agreement on this
patch, we'll work on the tests.
This has been a collective work with Benoit Lobréau, Jehan-Guillaume de
Rorthais, and Franck Boudehen.
Thanks.
Regards.
[1]: /messages/by-id/b4220d15-2e21-0e98-921b-b9892543cc93@dalibo.com
[2]: /messages/by-id/d657df20-c4bf-63f6-e74c-cb85a81d0383@dalibo.com
[3]: /messages/by-id/6acbe570-068e-bd8e-95d5-00c737b865e8@gmail.com
--
Guillaume.
On Thu, Aug 29, 2024 at 10:08:23PM +0200, Guillaume Lelarge wrote:
This patch was a bit discussed on [1], and with more details on [2]. It's
based on another patch sent in 2022 (see [3]). It introduces seven new
columns in pg_stat_statements:
* parallelized_queries_planned, number of times the query has been planned
to be parallelized,
* parallelized_queries_launched, number of times the query has been
executed with parallelization,
Comparing the numbers of workers planned and launched with the number
of times a query has been called and planned should provide a rather
good equivalent, no? I am not sure that these two are mandatory to
have.
* parallelized_workers_planned, number of parallel workers planned for
this query,
* parallelized_workers_launched, number of parallel workers executed for
this query,
Yep. Definitely OK with these two. There is an overlap with what
Benoit has sent here when it comes to publish this data to the
executor state:
/messages/by-id/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
* parallelized_nodes, number of parallelized nodes,
* parallelized_nodes_all_workers, number of parallelized nodes which had
all requested workers,
* parallelized_nodes_no_worker, number of parallelized nodes which had no
requested workers.
I can see why you want to register this extra data on a node-basis,
but how does that help when it comes to tune the parallel GUCs? We
cannot control them at node level and the launched/planned ratio
offers an equivalent of that. Not exactly, but that's enough to get a
picture if there is a drought.
A test script (test2.sql) is attached. You can execute it with "psql -Xef
test2.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).
Let's add proper regression tests instead, including
oldextversions.sql as this bumps the version of the module. See for
example the tests of 6fd5071909a2 that can force workers to spawn
for BRIN and btree queries, validating some of the stats published
here.
--
Michael
Hi Michael,
Le jeu. 3 oct. 2024 à 09:15, Michael Paquier <michael@paquier.xyz> a écrit :
On Thu, Aug 29, 2024 at 10:08:23PM +0200, Guillaume Lelarge wrote:
This patch was a bit discussed on [1], and with more details on [2]. It's
based on another patch sent in 2022 (see [3]). It introduces seven new
columns in pg_stat_statements:
* parallelized_queries_planned, number of times the query has been planned
to be parallelized,
* parallelized_queries_launched, number of times the query has been
executed with parallelization,

Comparing the numbers of workers planned and launched with the number
of times a query has been called and planned should provide a rather
good equivalent, no? I am not sure that these two are mandatory to
have.
I'm not sure I follow. That would mean that every time a query is executed,
it always gets the same amount of workers. Which is not guaranteed to be
true.
I would agree, though, that parallelized_queries_launched is probably not
that interesting. I could get rid of it if you think it should go away.
* parallelized_workers_planned, number of parallel workers planned for
this query,
* parallelized_workers_launched, number of parallel workers executed for
this query,

Yep. Definitely OK with these two. There is an overlap with what
Benoit has sent here when it comes to publish this data to the
executor state:
/messages/by-id/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
Well, I don't see this as an overlap. Rather more information.
* parallelized_nodes, number of parallelized nodes,
* parallelized_nodes_all_workers, number of parallelized nodes which had
all requested workers,
* parallelized_nodes_no_worker, number of parallelized nodes which had no
requested workers.
I can see why you want to register this extra data on a node-basis,
but how does that help when it comes to tune the parallel GUCs? We
cannot control them at node level and the launched/planned ratio
offers an equivalent of that. Not exactly, but that's enough to get a
picture if there is a drought.
On this, I would agree with you. They are not that particularly useful to
get better setting for parallel GUCs. I can drop them if you want.
A test script (test2.sql) is attached. You can execute it with "psql -Xef
test2.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).
Let's add proper regression tests instead, including
oldextversions.sql as this bumps the version of the module. See for
example the tests of 6fd5071909a2 that can force workers to spawn
for BRIN and btree queries, validating some of the stats published
here.
Did this on the v2 version of the patch (attached here).
Thanks for your review. If you want the parallelized_queries_launched
column and the parallelized_nodes_* columns dropped, I can do that on a v3
patch.
Regards.
--
Guillaume.
Attachments:
v2-0001-Add-parallel-columns-to-pg_stat_statements.patch (text/x-patch)
On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote:
I'm not sure I follow. That would mean that every time a query is executed,
it always gets the same amount of workers. Which is not guaranteed to be
true.

I would agree, though, that parallelized_queries_launched is probably not
that interesting. I could get rid of it if you think it should go away.
My point is that these stats are useful to know which action may have
to be taken when reaching a mean, and numbers in pg_stat_statements
offer hints that something is going wrong and that a closer lookup at
an EXPLAIN plan may be required, particularly if the total number of
workers planned and launched aggregated in the counters is unbalanced
across queries. If the planned/launched ratio is balanced across most
queries, a GUC adjustment may be OK. If the ratio is very
unbalanced in a lower set of queries, I'd also look at tweaking GUCs
instead like the per_gather. These counters give information that one
or the other may be required.
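Concretely, the kind of imbalance described here could be spotted with
something like the following (a sketch, assuming the column names from the
v2 patch):

```sql
-- Queries where the launched/planned worker ratio is most unbalanced.
SELECT query, calls,
       parallelized_workers_planned AS planned,
       parallelized_workers_launched AS launched
FROM pg_stat_statements
WHERE parallelized_workers_planned > parallelized_workers_launched
ORDER BY parallelized_workers_planned - parallelized_workers_launched DESC
LIMIT 10;
```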
Well, I don't see this as an overlap. Rather more information.
Later versions of Benoit's patch have been accumulating this data in
the executor state. v4 posted at [1] has the following diffs:
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -724,6 +724,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ int es_workers_launched;
+ int es_workers_planned;
} EState;
Your v2 posted on this thread has that:
@@ -707,6 +707,12 @@ typedef struct EState
struct EPQState *es_epq_active;
bool es_use_parallel_mode; /* can we use parallel workers? */
+ bool es_used_parallel_mode; /* was executed in parallel */
+ int es_parallelized_workers_launched;
+ int es_parallelized_workers_planned;
+ int es_parallelized_nodes; /* # of parallelized nodes */
+ int es_parallelized_nodes_all_workers; /* # of nodes with all workers launched */
+ int es_parallelized_nodes_no_worker; /* # of nodes with no workers launched */
es_parallelized_workers_launched and es_workers_launched are the same
thing in both.
On this, I would agree with you. They are not that particularly useful to
get better setting for parallel GUCs. I can drop them if you want.
Yep. I would remove them for now. This leads to more bloat.
Did this on the v2 version of the patch (attached here).
Thanks for your review. If you want the parallelized_queries_launched
column and the parallelized_nodes_* columns dropped, I can do that on a v3
patch.
I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter.
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.
[1]: /messages/by-id/6ecad3ad-835c-486c-9ebd-da87a9a97634@dalibo.com
[2]: /messages/by-id/Zv46wTMjLTuu2t9J@paquier.xyz
--
Michael
Le lun. 7 oct. 2024 à 02:18, Michael Paquier <michael@paquier.xyz> a écrit :
On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote:
I'm not sure I follow. That would mean that every time a query is executed,
it always gets the same amount of workers. Which is not guaranteed to be
true.

I would agree, though, that parallelized_queries_launched is probably not
that interesting. I could get rid of it if you think it should go away.

My point is that these stats are useful to know which action may have
to be taken when reaching a mean, and numbers in pg_stat_statements
offer hints that something is going wrong and that a closer lookup at
an EXPLAIN plan may be required, particularly if the total number of
workers planned and launched aggregated in the counters is unbalanced
across queries. If the planned/launched ratio is balanced across most
queries, a GUC adjustment may be OK. If the ratio is very
unbalanced in a lower set of queries, I'd also look at tweaking GUCs
instead like the per_gather. These counters give information that one
or the other may be required.

Well, I don't see this as an overlap. Rather more information.

Later versions of Benoit's patch have been accumulating this data in
the executor state. v4 posted at [1] has the following diffs:

--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -724,6 +724,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ int es_workers_launched;
+ int es_workers_planned;
} EState;

Your v2 posted on this thread has that:
@@ -707,6 +707,12 @@ typedef struct EState
struct EPQState *es_epq_active;

bool es_use_parallel_mode; /* can we use parallel workers? */
+ bool es_used_parallel_mode; /* was executed in parallel */
+ int es_parallelized_workers_launched;
+ int es_parallelized_workers_planned;
+ int es_parallelized_nodes; /* # of parallelized nodes */
+ int es_parallelized_nodes_all_workers; /* # of nodes with all workers launched */
+ int es_parallelized_nodes_no_worker; /* # of nodes with no workers launched */

es_parallelized_workers_launched and es_workers_launched are the same
thing in both.
My bad. I agree this is the way to go. See patch v3-0001 attached.
On this, I would agree with you. They are not that particularly useful to
get better setting for parallel GUCs. I can drop them if you want.

Yep. I would remove them for now. This leads to more bloat.
Done. See patch v3-0002 attached.
Did this on the v2 version of the patch (attached here).
Thanks for your review. If you want the parallelized_queries_launched
column and the parallelized_nodes_* columns dropped, I can do that on a v3
patch.
I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter.
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.
The first two are done. The last one is beyond my scope.
I'm now working on Benoit's patch to make it work with my v3-0001 patch.
I'll send the resulting patch on his thread.
[1]: /messages/by-id/6ecad3ad-835c-486c-9ebd-da87a9a97634@dalibo.com
[2]: /messages/by-id/Zv46wTMjLTuu2t9J@paquier.xyz
--
Michael
Regards.
--
Guillaume.
Attachments:
v3-0001-Introduce-two-new-counters-in-EState.patch (text/x-patch)
v3-0002-Add-parallel-columns-to-pg_stat_statements.patch (text/x-patch)
On Mon, Oct 07, 2024 at 10:00:13AM +0200, Guillaume Lelarge wrote:
Le lun. 7 oct. 2024 à 02:18, Michael Paquier <michael@paquier.xyz> a écrit :
I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter.
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.

The first two are done. The last one is beyond my scope.
That's fair. I have put my hands on this patch set, finishing with
the attached.
A couple of notes:
- I've been struggling a bit on the "planned" vs "launched" terms used
in the names for the counters. It is inconsistent with the backend
state, where we talk about workers "to launch" and workers "launched".
"planned" does not really apply to utilities, as this may not be
planned per se.
- The test in parallel.sql can be cheaper; tweaking the right GUCs the
right way, data in the table is not even required to spawn a set of
parallel workers.
- Meson was not updated for the new test and the files to install.
0001 and 0002 are the parts of the patch that I can see myself
applying; it is pretty cool to see pg_stat_statements complain that
the launched/to_launch ratio can get unbalanced really quickly when I
do something stupid. The CI is stable with these.
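For reference, a quick way to see that imbalance once the patch is in (a
sketch; the final column names following the "to launch"/"launched"
renaming are an assumption here):

```sql
-- Sketch using the "to launch"/"launched" naming discussed above.
SELECT query, calls,
       parallel_workers_to_launch,
       parallel_workers_launched
FROM pg_stat_statements
WHERE parallel_workers_to_launch <> parallel_workers_launched;
```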
0003 has the remaining bits with the 3rd and 4th counters, able to
apply on top of 0002.
--
Michael
Attachments:
v4-0001-Introduce-two-new-counters-in-EState-for-parallel.patch (text/x-diff)
v4-0002-Add-parallel-columns-to-pg_stat_statements.patch (text/x-diff)
v4-0003-Introduce-two-more-counters-in-pg_stat_statements.patch (text/x-diff)
Hi,
Le mar. 8 oct. 2024 à 09:29, Michael Paquier <michael@paquier.xyz> a écrit :
On Mon, Oct 07, 2024 at 10:00:13AM +0200, Guillaume Lelarge wrote:
Le lun. 7 oct. 2024 à 02:18, Michael Paquier <michael@paquier.xyz> a
écrit :
I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter.
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.

The first two are done. The last one is beyond my scope.
That's fair. I have put my hands on this patch set, finishing with
the attached.

A couple of notes:
- I've been struggling a bit on the "planned" vs "launched" terms used
in the names for the counters. It is inconsistent with the backend
state, where we talk about workers "to launch" and workers "launched".
"planned" does not really apply to utilities, as this may not be
planned per se.
You're right. Much better to keep them consistent.
- The test in parallel.sql can be cheaper; tweaking the right GUCs the
right way, data in the table is not even required to spawn a set of
parallel workers.
Yeah, it could be cheaper. As it is already quick, I didn't go the extra
mile to make it cheaper.
- Meson was not updated for the new test and the files to install.
Oops, sorry. Didn't know that Meson had to be updated.
0001 and 0002 are the parts of the patch that I can see myself
applying; it is pretty cool to see pg_stat_statements complain that
the launched/to_launch ratio can get unbalanced really quickly when I
do something stupid. The CI is stable with these.
Great, thanks :)
0003 has the remaining bits with the 3rd and 4th counters, able to
apply on top of 0002.
Still think they are interesting but I understand your concerns.
I've done a bit of testing with the three patches, and didn't find any
issue with them.
--
Guillaume.
On Tue, Oct 08, 2024 at 03:53:16PM +0200, Guillaume Lelarge wrote:
I've done a bit of testing with the three patches, and didn't find any
issue with them.
Okay, applied 0001 and 0002 then after a second lookup. I'll spend
some more time thinking about 0003 and the other threads.
--
Michael
Le mer. 9 oct. 2024 à 01:33, Michael Paquier <michael@paquier.xyz> a écrit :
On Tue, Oct 08, 2024 at 03:53:16PM +0200, Guillaume Lelarge wrote:
I've done a bit of testing with the three patches, and didn't find any
issue with them.

Okay, applied 0001 and 0002 then after a second lookup. I'll spend
some more time thinking about 0003 and the other threads.
Thanks a lot.
--
Guillaume.
On Wed, Oct 09, 2024 at 08:32:52AM +0900, Michael Paquier wrote:
Okay, applied 0001 and 0002 then after a second lookup. I'll spend
some more time thinking about 0003 and the other threads.
Considered 0003, and I'm still not sure that this is something that
is really required, based on the correlations that are now possible with
the number of times a query has been called and the number of
planned/launched workers.
So I am marking the entry as committed. Let's see about the threads
for the addition of this data at table-level and at the
database-level.
--
Michael
Le jeu. 7 nov. 2024 à 04:19, Michael Paquier <michael@paquier.xyz> a écrit :
On Wed, Oct 09, 2024 at 08:32:52AM +0900, Michael Paquier wrote:
Okay, applied 0001 and 0002 then after a second lookup. I'll spend
some more time thinking about 0003 and the other threads.

Considered 0003, and I'm still not sure that this is something that
is really required, based on the correlations that are now possible with
the number of times a query has been called and the number of
planned/launched workers.
I'm fine with your decision. After using the new metrics, we'll probably
see more clearly if that's enough.
So I am marking the entry as committed. Let's see about the threads
for the addition of this data at table-level and at the
database-level.
Sounds good!
Table level is probably not the most important in my view. Database-level
and logging are what really matters to me.
Thanks.
--
Guillaume.